I've built a dict-of-counts style DataFrame to make a bar plot from, and I'd like to sort the bars in descending order by size. In the R world I'd do something like this, and I was wondering 1) what a tidy way to do this would be and 2) whether something like it exists in Gadfly.
    2 Answers
        My hack is this:
using DataFrames
using DataStructures
using Gadfly
# test data
list = ["E", "E", "E", "E", "E",
        "B", "B", "B", "B",
        "C", "C", "C",
        "D", "D", "D",
        "A"]
# I am making a dict-of-counts to turn into a df
# empty string->int dict
countsDict = Dict{String,Int}()
# a function to count occurrences of a given string in an array
function countStrInArray(str::String, arr::Vector{String})::Int
    count(==(str), arr)
end
# for every item in the list 
for item in list
    # if we don't have it in the dict, add, with count as value
    if !haskey(countsDict, item)
        countsDict[item] = countStrInArray(item, list)
    end
end
# this gives me the structure I want but I lose 'lookup' functionality
sortedTuples = sort(collect(zip(values(countsDict),
                    keys(countsDict))), rev = true)
# ...so I created an order-preserving dict
sortedCountsDict = OrderedDict{String,Int}()
# map our tuples to it
for item in sortedTuples
    sortedCountsDict[item[2]] = item[1]
end
# make it into a dataframe
df = DataFrame(group = [i for i in keys(sortedCountsDict)],
               count = [i for i in values(sortedCountsDict)])
# plot it!
plot(df, x = :group, y = :count, Geom.bar) |> SVG("HackyButWorks.svg")
Does anyone know a cleaner way to do this?
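One possible shortening, offered as a sketch rather than a definitive answer: let DataFrames do the counting and sorting. It assumes a reasonably recent DataFrames.jl in which `combine(groupby(...), nrow => :count)` is available, and it relies on Gadfly drawing the bars in row order, as the code above already does. The name `counts` is only illustrative.

```julia
using DataFrames
using Gadfly

# same test data as above
list = ["E", "E", "E", "E", "E",
        "B", "B", "B", "B",
        "C", "C", "C",
        "D", "D", "D",
        "A"]

df = DataFrame(group = list)

# count the rows in each group, then sort by count, largest first
counts = sort(combine(groupby(df, :group), nrow => :count), :count, rev = true)

# bars follow the row order of `counts` (assumption: same behaviour as the hack above)
p = plot(counts, x = :group, y = :count, Geom.bar)
```

This replaces the two dicts and the tuple shuffling with a single grouped count, at the cost of an extra DataFrame.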
        Sweasonable Doubt
        
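using DataFrames, Gadfly
df = DataFrame(
    list = ["E", "E", "E", "E", "E", "B", "B", "B", "B",
            "C", "C", "C", "D", "D", "D", "A"]
)
# default bar order
p1 = plot(df, x = :list, Geom.histogram)
# bar order set explicitly through `levels` (here ascending by count; reverse the vector for descending)
p2 = plot(df, x = :list, Geom.histogram, Scale.x_discrete(levels = ["A", "D", "C", "B", "E"]))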
See the Gadfly tutorial
        Mattriks
        
                    Aha! I missed the `levels` argument when I read the docs. It looks like I can pass the keys of my sorted dict right to it and avoid all the other df. Thank you :) – Sweasonable Doubt Mar 28 '21 at 18:25
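A minimal sketch of the idea in the comment above, assuming the counts are kept in a plain `Dict` and the sorted keys are handed to `levels`. The names `counts` and `levels_desc` are illustrative and not from the thread, and ties are broken alphabetically here, which is a choice rather than anything Gadfly requires.

```julia
using DataFrames
using Gadfly

list = ["E", "E", "E", "E", "E",
        "B", "B", "B", "B",
        "C", "C", "C",
        "D", "D", "D",
        "A"]

# count every string in a single pass
counts = Dict{String,Int}()
for item in list
    counts[item] = get(counts, item, 0) + 1
end

# keys ordered by descending count; equal counts fall back to alphabetical order
levels_desc = sort(collect(keys(counts)); by = k -> (-counts[k], k))

# Geom.histogram does the counting for the plot; `levels` only fixes the bar order
p = plot(DataFrame(list = list), x = :list, Geom.histogram,
         Scale.x_discrete(levels = levels_desc))
```

With this approach the counts are only needed to derive the order, so no sorted dict or second DataFrame has to be built.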
 
