How to extract everything after a specific string?

Question

I'd like to extract everything after "-" in a vector of strings in R.

For example, in:

test = c("Pierre-Pomme","Jean-Poire","Michel-Fraise")

I'd like to get

c("Pomme","Poire","Fraise")

Thanks !

See also: [Extract a substring according to a pattern](https://stackoverflow.com/questions/17215789) — GKi, Jun 14 '23 at 06:59

acylam · Accepted Answer · 2020-05-22T13:48:20.987

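With str_extract. \\b is a zero-width assertion that matches a word boundary, i.e. the position between a word character and a non-word character such as -. \\w+$ then matches the run of word characters at the end of the string: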

library(stringr)
str_extract(test, '\\b\\w+$')
# [1] "Pomme"  "Poire"  "Fraise"

We can also use a back reference with sub. \\1 refers to the string matched by the first capture group (.+), which here is one or more characters following the last - (the leading .+ is greedy):

sub('.+-(.+)', '\\1', test)
# [1] "Pomme"  "Poire"  "Fraise"

This also works with str_replace if stringr is already loaded:

library(stringr)
str_replace(test, '.+-(.+)', '\\1')
# [1] "Pomme"  "Poire"  "Fraise"

A third option is to use strsplit and extract the second element of each list component (similar to word in @akrun's answer):

sapply(strsplit(test, '-'), `[`, 2)
# [1] "Pomme"  "Poire"  "Fraise"
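If some elements may contain no hyphen, or more than one, the split-and-index approach above can be made a little more explicit. This is only a minimal base R sketch; the input vector x and the names parts, second and last are illustrative and not part of the original answer:

```r
# Illustrative input: one hyphen, two hyphens, and no hyphen
x <- c("Pierre-Pomme", "Jean-Pierre-Poire", "Fraise")

# fixed = TRUE treats "-" as a literal string rather than a regex
parts <- strsplit(x, "-", fixed = TRUE)

# Second piece of each element; NA where there is no second piece
second <- vapply(parts, function(p) p[2], character(1))

# Last piece of each element, whatever the number of hyphens
last <- vapply(parts, function(p) p[length(p)], character(1))
```

vapply is used instead of sapply here so that the result is guaranteed to be a character vector, even for unusual input.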

stringr also has a str_split variant of this:

str_split(test, '-', simplify = TRUE)[,2]
# [1] "Pomme"  "Poire"  "Fraise"

score 17 · Answer 2 · answered Jul 25 '19 at 14:42

We can use sub to match everything up to and including the - (.*-) and replace it with "":

sub(".*-", "", test)
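Because .* is greedy, the pattern above removes everything up to the last hyphen. A small sketch of the difference between "after the last -" and "after the first -", assuming an illustrative vector x that is not from the question:

```r
# Illustrative input: one hyphen, two hyphens, and no hyphen
x <- c("Pierre-Pomme", "Jean-Pierre-Poire", "Fraise")

# Greedy .* runs to the last hyphen, so only the final piece is kept
after_last <- sub(".*-", "", x)

# [^-]* cannot cross a hyphen, so this strips only up to the first one
after_first <- sub("^[^-]*-", "", x)
```

Strings without a hyphen do not match either pattern and are returned unchanged.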

Another option is word from stringr, taking the second piece with - as the separator:

library(stringr)
word(test, 2, sep="-")

akrun


score 5 · Answer 3 · answered Jul 25 '19 at 15:15

I think the other answers might be what you're looking for, but if you don't want to lose the original context you can try something like this:

library(tidyverse)

tibble(test) %>%
    separate(test, c("first", "last"), remove = FALSE)

This will return a data frame containing the original strings plus their components, which might be more useful down the road:

# A tibble: 3 x 3
  test          first  last  
  <chr>         <chr>  <chr> 
1 Pierre-Pomme  Pierre Pomme 
2 Jean-Poire    Jean   Poire 
3 Michel-Fraise Michel Fraise

How do you specify where to separate the text if there were more than one "-" in the test column? – ORStudent Sep 08 '20 at 08:06
@ORStudent you can try using more complex regex in the `sep` argument. You can also use integers to specify exact positions, which means you can use something like `str_locate_all` to find all occurrences of a separator and then specify which one, exactly, should be separated on. – Sep 09 '20 at 11:21
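As a rough illustration of the position-based idea in the comment above, here is a base R sketch that splits each string at its last hyphen while keeping the original column. It uses gregexpr as a stand-in for str_locate_all; the data frame df and the variables hits and pos are made up for this example:

```r
# Illustrative data: one value with a single hyphen, one with two
df <- data.frame(test = c("Pierre-Pomme", "Jean-Pierre-Poire"),
                 stringsAsFactors = FALSE)

# Positions of every "-" in each string; keep only the last one
hits <- gregexpr("-", df$test, fixed = TRUE)
pos <- vapply(hits, function(m) m[length(m)], integer(1))

# Everything before and after that position
df$first <- substr(df$test, 1, pos - 1)
df$last  <- substring(df$test, pos + 1)
```

To split at the first hyphen instead, take m[1] rather than m[length(m)]. Within tidyr, separate(..., sep = "-", extra = "merge") should give a similar first-hyphen split.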

score 0 · Answer 4 · answered Nov 06 '22 at 15:16

For some reason the responses here didn't work for my particular string. I found this answer more helpful (it uses a regex lookbehind with stringr): "stringr str_extract capture group capturing everything".
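The linked answer is not reproduced here, so the following is only a guess at what a lookbehind version could look like for the vector in the question. It is written in base R so that it runs without extra packages:

```r
test <- c("Pierre-Pomme", "Jean-Poire", "Michel-Fraise")

# (?<=-) asserts that a hyphen sits just before the match without consuming it;
# perl = TRUE is needed for lookbehind in base R
m <- regexpr("(?<=-).*", test, perl = TRUE)
after <- regmatches(test, m)

# Likely stringr equivalent: stringr::str_extract(test, "(?<=-).*")
```

Note that regmatches drops elements with no match, whereas str_extract returns NA for them, so the two can differ in length on input that lacks a hyphen.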

rtk19

This answer could be more helpful if you could kindly provide a short summary of your reference and a simple showcase. – X Zhang Nov 10 '22 at 00:46
