Suppose I have a character vector like
"Hi,  this is a   good  time to start working  together.".
I just want to have
" Hi, this is a good time to start working together."
That is, only one whitespace character between any two words. How can I do this in R?
gsub is your friend:
test <- "Hi,  this is a   good  time to start working  together."
gsub("\\s+", " ", test)
#[1] "Hi, this is a good time to start working together."
\\s+ matches a run of one or more whitespace characters (space, tab, newline, etc.), and gsub replaces each such run with a single space " ". Note that leading and trailing whitespace is collapsed to one space, not removed.
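To make the behaviour of \\s+ concrete, here is a small sketch in base R. The input strings are made up for illustration; they are not from the question.

```r
# Illustrative input containing tabs, a newline, and repeated spaces
x <- "Hi,\t\tthis  is \n a   good time."

# \\s+ treats tabs and newlines as whitespace too
collapsed <- gsub("\\s+", " ", x)

# " +" only collapses literal spaces; tabs and newlines are left alone
spaces_only <- gsub(" +", " ", x)

# Leading/trailing runs are reduced to one space, not removed
padded <- gsub("\\s+", " ", "  a  b  ")

print(collapsed)
print(spaces_only)
print(padded)
```

If only literal spaces should be collapsed and tabs or newlines must survive, " +" may be the safer pattern.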
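Another option is the str_squish function from the stringr package, which collapses repeated whitespace inside the string and also trims leading and trailing whitespace: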
library(stringr)
string <- "Hi,  this is a   good  time to start working  together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."
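If stringr is not available, roughly the same effect as str_squish can be sketched in base R by combining gsub with trimws. The helper name squish_base is hypothetical, and this is an approximation rather than stringr's actual implementation.

```r
# Hypothetical helper: collapse whitespace runs, then trim both ends
squish_base <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Works element-wise on character vectors
res <- squish_base(c("  Hi,   this  is\ta good   time.  ", "a    b"))
print(res)
```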
Since the title of the question is "remove the extra whitespace between words", here is a solution that collapses whitespace only between words, without touching leading and trailing whitespace (assuming "words" are chunks of non-whitespace characters):
gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace character in each matched run
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")
See regex demo #1 and regex demo #2 and this R demo.
Regex details:
(\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
\s{2,} - two or more whitespace chars (in Regex #2, it is wrapped with parentheses to form a capturing group with ID 2 (\2))
(?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location.
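A short worked example may help show how the two patterns differ. The sample strings below are assumptions made up for illustration.

```r
# Made-up input with leading/trailing whitespace and a tab inside one run
text <- "  Hi,   this  is \t a good   time.  "

# Regex #1: every inner run of 2+ whitespace chars becomes one space;
# leading and trailing whitespace is left untouched
r1 <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl = TRUE)

# Regex #2: the last whitespace char of each run is kept instead,
# so a run ending in a tab collapses to a tab
r2 <- gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", "a \tb", perl = TRUE)

print(r1)
print(r2)
```

The trailing two spaces are not matched because the lookahead (?=\S) finds no non-whitespace character after them, and the leading ones are skipped because no non-whitespace character precedes them.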