I have a dataset of 600 responses with a "Free_Text" variable that contains the respondents' feedback/comments. I want to calculate the number of words in each respondent's comment. How should I do it? I am new to R and am working in RStudio.
Please do not ask for help without [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Jun 24 '14 at 11:30
4 Answers
Consider using stri_extract_words from the stringi package, especially if your text is not in English. It uses ICU's BreakIterator for this task, which applies a sophisticated set of word-breaking rules.
library(stringi)
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.")
stri_extract_words(str)
## [[1]]
## [1] "How"   "many"  "words" "are"   "there"
## 
## [[2]]
##  [1] "R"                "язык"             "программирования" "для"              "статистической"  
##  [6] "обработки"        "данных"           "и"                "работы"           "с"               
## [11] "графикой"         "а"                "также"            "свободная"        "программная"     
## [16] "среда"            "вычислений"       "с"                "открытым"         "исходным"        
## [21] "кодом"            "в"                "рамках"           "проекта"          "GNU"   
sapply(stri_extract_words(str), length) # how many words are there in each character string?
## [1]  5 25
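If you only need the counts and your comments sit in a data frame column, something along these lines may be enough. This is a minimal sketch: the data frame `df` and its two sample comments are made up for illustration (only the column name Free_Text comes from the question), and stri_count_words is stringi's counting counterpart of stri_extract_words.

```r
library(stringi)

# Hypothetical stand-in for the real 600-response dataset
df <- data.frame(
  Free_Text = c("How many words are there?", "Great service, thanks!"),
  stringsAsFactors = FALSE
)

# One word count per respondent, using ICU's word-breaking rules
df$word_count <- stri_count_words(df$Free_Text)
df$word_count
```

Storing the result as a new column keeps each count next to the comment it belongs to.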
 
    
    
        gagolews
        
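Splitting the string on spaces and counting the elements is a simple way to get started.
str <- "This is a string."
# strsplit() returns a list with one element per input string; [[1]] picks the first
# note: consecutive spaces would produce empty elements that are counted as words
str_length <- length(strsplit(str, " ")[[1]])
str_length
## [1] 4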
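To apply the same idea to a whole column at once, a base R sketch might look like the following. The data frame `df` and its sample comments are hypothetical (only the column name Free_Text comes from the question). Splitting on runs of whitespace after trimming is my own adjustment so that repeated or leading spaces are not counted as words.

```r
# Hypothetical stand-in for the real 600-response dataset
df <- data.frame(
  Free_Text = c("This is a string.", "  Two   spaces here ", ""),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, split on runs of whitespace,
# then count the pieces for each row
words <- strsplit(trimws(df$Free_Text), "\\s+")
df$word_count <- lengths(words)
df$word_count
```

Note that missing comments (NA) are not handled here and would need a separate check.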
 
    
    
        AGS
        
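Maybe this helps:
 str1 <- c("How many words are in this sentence","How many words")
 # counts the separators between words and adds 1, so each string must contain at least two words (a one-word or empty string would return 2)
 sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1
 #[1] 7 3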
Also,
 library(qdap)
 word_count(str1)
#[1] 7 3
 str2 <- "How many words?."  
 word_count(str2)
 #[1] 3
 
    
    
        akrun
        
And one more method, using the stringr package, to list the individual words:
library(stringr)
str1 <- c("How many words are in this sentence","How many words")
length(unlist(str_match_all(str1, "\\S+"))) # "\\S+" matches each run of non-whitespace characters (a word); unlist() pools the matches from all strings, so length() gives the total across both strings, not one count per string
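Since the question asks for one count per respondent, a per-string variant may be closer to what is needed. A minimal sketch, assuming stringr's str_count with the same "\\S+" pattern (so a "word" here is simply any run of non-whitespace characters):

```r
library(stringr)

str1 <- c("How many words are in this sentence", "How many words")

# Count the runs of non-whitespace characters in each string separately
counts <- str_count(str1, "\\S+")
counts
```

The result has one element per input string, so it can be assigned directly to a new column of the dataset.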
 
    
    
        lawyeR
        
