I'd like to extract everything after "-" in vector of strings in R.
For example in :
test = c("Pierre-Pomme","Jean-Poire","Michel-Fraise")
I'd like to get
c("Pomme","Poire","Fraise")
Thanks !
I'd like to extract everything after "-" in vector of strings in R.
For example in :
test = c("Pierre-Pomme","Jean-Poire","Michel-Fraise")
I'd like to get
c("Pomme","Poire","Fraise")
Thanks !
 
    
    With str_extract. \\b is a zero-length token that matches a word-boundary. This includes any non-word characters:
library(stringr)
str_extract(test, '\\b\\w+$')
# [1] "Pomme"  "Poire"  "Fraise"
We can also use a back reference with sub. \\1 refers to string matched by the first capture group (.+), which is any character one or more times following a - at the end:
sub('.+-(.+)', '\\1', test)
# [1] "Pomme"  "Poire"  "Fraise"
This also works with str_replace if that is already loaded:
library(stringr)
str_replace(test, '.+-(.+)', '\\1')
# [1] "Pomme"  "Poire"  "Fraise"
Third option would be using strsplit and extract the second word from each element of the list (similar to word from @akrun's answer):
sapply(strsplit(test, '-'), `[`, 2)
# [1] "Pomme"  "Poire"  "Fraise"
stringr also has  str_split variant to this:
str_split(test, '-', simplify = TRUE)[,2]
# [1] "Pomme"  "Poire"  "Fraise"
 
    
    We can use sub to match characters (.*) until the - and in the replacement specify ""
sub(".*-", "", test)
Or another option is word
library(stringr)
word(test, 2, sep="-")
 
    
    I think the other answers might be what you're looking for, but if you don't want to lose the original context you can try something like this:
library(tidyverse)
tibble(test) %>% 
    separate(test, c("first", "last"), remove = F)
This will return a dataframe containing the original strings plus components, which might be more useful down the road:
# A tibble: 3 x 3
  test          first  last  
  <chr>         <chr>  <chr> 
1 Pierre-Pomme  Pierre Pomme 
2 Jean-Poire    Jean   Poire 
3 Michel-Fraise Michel Fraise
For some reason the responses here didn't work for my particular string. I found this response more helpful (i.e., using Stringr's lookbehind function): stringr str_extract capture group capturing everything.
