In the context of cluster profiling, I am trying to visualize the distribution of categorical variables in each cluster compared to the overall population.
In order to make them comparable, I use the Relative Frequency.
For numerical variables this is pretty straightforward, because I can easily overlay histograms.
For categorical variables, instead, I would like to obtain something like this:
In which the external piechart visualizes the Relative Frequency of Cluster 1 and the internal piechart represents the Relative Frequency of the Overall Population.
A reproducible example is:
mydf <- data.frame(week_day = as.factor(c(rep("monday",10), rep("monday",5), rep("tuesday",5))), cluster = c(rep(1,10), rep(2,10)))
Here, Cluster 1 is composed exclusively of "monday", whereas the Overall Population is 75% "monday" and 25% "tuesday".
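To double-check those proportions outside of ggplot, a minimal base R sketch could look like the following. The names overall_freq and cluster1_freq are only illustrative and are not part of my actual code:

```r
# Same example data as above
mydf <- data.frame(
  week_day = as.factor(c(rep("monday", 10), rep("monday", 5), rep("tuesday", 5))),
  cluster = c(rep(1, 10), rep(2, 10))
)

# Relative frequency of each week day in the overall population
overall_freq <- prop.table(table(mydf$week_day))

# Relative frequency of each week day within Cluster 1.
# week_day is a factor, so "tuesday" is kept as a level with frequency 0.
cluster1_freq <- prop.table(table(mydf$week_day[mydf$cluster == 1]))

print(overall_freq)
print(cluster1_freq)
```

These two vectors are the values that the internal and external rings, respectively, should display.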
The Relative Frequency within ggplot aes can be easily computed using:
y = (..count..)/sum(..count..)
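For context, this is roughly how that expression is used for the overall population in a plain bar chart. It is only a sketch of the relative frequency computation, not the nested pie chart I am after. It assumes ggplot2 3.3.0 or later, where after_stat(count) is the newer spelling of ..count.. (the dot-dot notation is deprecated since ggplot2 3.4.0); the object name p is just illustrative:

```r
library(ggplot2)

# Same example data as above
mydf <- data.frame(
  week_day = as.factor(c(rep("monday", 10), rep("monday", 5), rep("tuesday", 5))),
  cluster = c(rep(1, 10), rep(2, 10))
)

# Bar heights are counts divided by the total count of the layer,
# i.e. the relative frequency of each week day in the overall population
p <- ggplot(mydf, aes(x = week_day)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  ylab("Relative Frequency")

print(p)
```

Note that sum(count) is taken over everything in the layer, so getting the per-cluster relative frequency requires subsetting the data for that layer (or normalizing within groups) rather than reusing the same expression unchanged.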
