I have the dataset below. It contains a variable indicating whether a person used their phone (used_phone, "Yes" or "No"), the person's ID, and the district and sub-district they live in. Note that the same person may have been recorded two or more times under different sub-districts. However, I want to count each such person only once, that is, consider only unique IDs.
district sub_district   id  used_phone
    A   SX  1   Yes
    A   SX  2   Yes
    A   SX  3   No
    A   SX  4   No
    A   SY  4   No
    A   SY  5   Yes
    A   SZ  6   Yes
    A   SX  6   Yes
    A   SZ  7   No
    B   RX  8   No
    B   RV  9   No
    B   RX  9   No
    B   RV  10  Yes
    B   RV  11  Yes
    B   RT  12  Yes
    B   RT  13  Yes
    B   RV  13  Yes
    B   RT  14  No
    B   RX  14  No
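
For reproducibility, the table above can be rebuilt as a data frame roughly as follows. This is only a sketch: the name df matches the plotting code in this post, and the factor level order ("No" before "Yes") is an assumption, since the post does not state it.

```r
# Rebuild the sample data shown above (19 rows, 14 unique ids).
df <- data.frame(
  district = rep(c("A", "B"), times = c(9, 10)),
  sub_district = c("SX", "SX", "SX", "SX", "SY", "SY", "SZ", "SX", "SZ",
                   "RX", "RV", "RX", "RV", "RV", "RT", "RT", "RV", "RT", "RX"),
  id = c(1, 2, 3, 4, 4, 5, 6, 6, 7,
         8, 9, 9, 10, 11, 12, 13, 13, 14, 14),
  # Level order is assumed; the original post only says this is a factor.
  used_phone = factor(
    c("Yes", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No",
      "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No"),
    levels = c("No", "Yes")
  )
)
```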
  
N.B.: used_phone is a factor variable.
For the above dataset, I want to plot the distribution of "whether a person used a phone". I am currently using the following code:
  library(ggplot2)

  # Bar chart of counts per used_phone level (counts every row, including repeated IDs)
  ggplot(df, aes(x = used_phone)) +
    geom_bar(color = "black", fill = "aquamarine4", position = "dodge") +
    labs(x = "Used phone", y = "Number of people") +
    ggtitle("Whether person used phone") +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5))
  
This code works fine. However, I want to do two things:
- Add % labels above the bar for each group (Yes and No), while the y-axis continues to show the count
- Plot the graph so that it only considers unique IDs, i.e. each person is counted once
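
One possible way to get both, sketched under the assumption that each ID carries the same used_phone value in all of its records (true for the sample data above). The helper name count_unique_users and the object names counts and pct are made up for illustration; df is the data frame used in the plotting code in this post.

```r
library(ggplot2)

# Hypothetical helper: keep the first record per id, then count each
# used_phone level and compute its percentage share of unique people.
count_unique_users <- function(data) {
  unique_people <- data[!duplicated(data$id), ]
  counts <- as.data.frame(table(used_phone = unique_people$used_phone))
  counts$pct <- 100 * counts$Freq / sum(counts$Freq)
  counts
}

# Hypothetical helper: bars show counts on the y-axis, labels show percentages.
plot_unique_users <- function(data) {
  counts <- count_unique_users(data)
  ggplot(counts, aes(x = used_phone, y = Freq)) +
    geom_col(color = "black", fill = "aquamarine4") +
    geom_text(aes(label = sprintf("%.1f%%", pct)), vjust = -0.5) +
    labs(x = "Used phone", y = "Number of people") +
    ggtitle("Whether person used phone") +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5))
}
```

Calling plot_unique_users(df) would then draw the chart. If an ID could have conflicting Yes/No records, keeping the first record is no longer a safe choice and a rule for resolving the conflict would be needed first.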
 
I look forward to solving this with your help, as I am a novice in R.
Thanks, Rachita




