user_a - 3
user_b - 4
user_c - 1
user_d - 4
I want to show the distribution of the number of tweets per author in R using a histogram. The original file has 1,048,575 such rows.
I tried hist(df$twitter_count, nrow(df)), but I don't think it's correct.
        Mehru
        
- 
                    please include your data as editable text instead of link to an image – Imran Ali Oct 22 '17 at 04:37
- 
                    Hi Mehru, welcome to SO. It would help me help you if I knew a little more about your data; see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Your nrow(df) is specifying the breaks in the histogram. If you are looking at doing some conditional histograms (e.g. number of tweets per day/week/month/year per author), you might consider using lattice or ggplot2. – James Thomas Durant Oct 22 '17 at 04:40
- 
                    If you want the histogram of twitter counts, just use `hist(df$twitter_count)` – kangaroo_cliff Oct 22 '17 at 04:44
- 
                    See [here](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r/46860693#46860693) – vaettchen Oct 22 '17 at 05:19
- 
                    Possible duplicate of [Constructing histogram from 2 variables in 1 column in R](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r) – vaettchen Oct 22 '17 at 05:20
3 Answers
3
            
            
        It seems I have misunderstood the question. I think the following could be what the OP is looking for.
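library(ggplot2)
# Example data: one row per user with a random tweet count
df <- data.frame(user = letters, 
                 twitter_count = sample.int(200, 26))
# One bar per user, bar height = that user's tweet count
ggplot(df, aes(user, twitter_count)) +
  geom_col()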
If you are instead looking for multiple histograms (one per user), use the code below.
Replace `user` with the respective variable name in your data.frame.
# Example data
df <- data.frame(user = iris$Species, 
                 twitter_count= round(iris[, 1]*10))
# Histograms using ggplot2 package
library(ggplot2)
ggplot(df, aes(x = twitter_count)) +
  geom_histogram() + facet_grid(.~user)
If your data contain many Twitter users, one panel per user quickly becomes unreadable, so it is best to use an alternative method to see the distribution of tweet counts.
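One possible alternative (not necessarily the one the answerer had in mind) is to tabulate how many authors share each tweet count and plot that table. This is a minimal sketch on simulated data; the column names `user` and `twitter_count` and the Poisson-generated counts are assumptions standing in for the real file.

```r
set.seed(42)

# Simulated stand-in for the real data: one row per author (assumption)
n_users <- 10000
df <- data.frame(user = paste0("user_", seq_len(n_users)),
                 twitter_count = rpois(n_users, lambda = 3) + 1)

# Number of authors observed at each distinct tweet count
count_freq <- table(df$twitter_count)

# One bar per distinct tweet count, regardless of how many authors there are
barplot(count_freq,
        xlab = "Tweets per author",
        ylab = "Number of authors",
        main = "Distribution of tweets per author")
```

Because the plot has one bar per distinct count rather than one panel or bar per author, it stays readable even with around a million rows.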
 
    
    
        kangaroo_cliff
        
1
            If each row of the data.frame represents a user -
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)
hist(df$twitter_count)
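As noted in the comments, the second positional argument of `hist()` is `breaks`, so `hist(df$twitter_count, nrow(df))` requests roughly one bin per row rather than a sensible number of bins. The sketch below, which assumes integer counts and reuses the simulated data above, sets the bin edges explicitly so that each integer count gets its own bin.

```r
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)

# Bin edges at 0.5, 1.5, ..., max + 0.5, so each integer count sits in its own bin
edges <- seq(0.5, max(df$twitter_count) + 0.5, by = 1)

h <- hist(df$twitter_count, breaks = edges,
          xlab = "Tweets per author",
          ylab = "Number of authors",
          main = "Distribution of tweets per author")
```

For heavily skewed counts, wider bins or a log-scaled axis may be more readable; that choice depends on the real data.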
 
    
    
        James Thomas Durant
        
0
            
            
        Since you said distribution for 'each user', I think it should be a bar plot:
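library(data.table)
# Read the sample rows; fread names the columns V1 (user), V2 ("-"), V3 (count)
dat <- fread("
  user_a - 3
  user_b - 4
  user_c - 1
  user_d - 4"
)
# One bar per user, bar height = tweet count
barplot(height = as.numeric(dat$V3), names.arg = dat$V1)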
or if you are looking for histograms, then:
hist(as.numeric(dat$V3), xlab = "", main="Histogram")
 
    
    
        LeMarque
        

