How to replace text in R using regular expression?

Question

I have a vector of strings that looks something like this: c("abc@40gmail.com", "xyz@50gmail.com"). For some reason, there are random digits after the @, and I'm trying to remove them. Using a regular expression, how can I tell R to remove or replace the digits that come after "@", so that I end up with c("abc@gmail.com", "xyz@gmail.com")? I don't know much about regex, so I'd really appreciate it if someone could provide not just the code but also a brief explanation of it. Thanks!

@Thomas how is that a dupe? From now on, is every text replacement question a dupe of `gsub("e", "", x)`? The regex in the "dupe" is an exact match, while the one in this question is a bit more complicated – David Arenburg May 17 '15 at 17:23

score 3 · Answer 1 · answered May 17 '15 at 15:38

One option is

x <- c("abc@40gmail.com", "xyz@50gmail.com")
sub("@\\d+", "@", x)
## [1] "abc@gmail.com" "xyz@gmail.com"
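
To unpack the call above, here is a brief annotated sketch (the variable names `pattern` and `cleaned` are illustrative and not part of the original answer). The pattern matches a literal "@" followed by one or more digits, and the replacement puts the "@" back:

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# "@"    matches a literal at sign
# "\\d+" matches one or more digits (the backslash is doubled inside an R string)
pattern <- "@\\d+"

# sub() replaces only the first match in each element. The "@" is part of
# the match, so it has to be written again in the replacement.
cleaned <- sub(pattern, "@", x)
print(cleaned)
## [1] "abc@gmail.com" "xyz@gmail.com"

# A string with no digits after "@" does not match and is returned unchanged.
unchanged <- sub(pattern, "@", "abc@gmail.com")
print(unchanged)
## [1] "abc@gmail.com"
```

If a string could contain several "@digits" runs, gsub() with the same arguments would replace all of them instead of only the first.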

David Arenburg

@ColonelBeauvel, ok, nvm then. – David Arenburg May 19 '15 at 09:41

Avinash Raj · Accepted Answer · 2015-05-17T15:46:01.467

1

You could use a positive lookbehind or \K.

sub("(?<=@)\\d+", "", x, perl = TRUE)

\\d+ matches one or more digit characters. The lookbehind (?<=@) makes the regex engine check that the position immediately before the digits is an @ symbol, without including the @ in the match, so only the digits are replaced. Since lookarounds are a PCRE feature, you need to set perl=TRUE.

OR

sub("@\\K\\d+", "", x, perl = TRUE)
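
In the second pattern, \K tells PCRE to discard what has been matched so far (here the @) from the reported match, so again only the digits are replaced. As a quick check, the sketch below runs both variants next to the simpler sub("@\\d+", "@", x) and compares the results (the variable names are illustrative, not from the answer):

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# Lookbehind: match digits only where the preceding character is "@".
with_lookbehind <- sub("(?<=@)\\d+", "", x, perl = TRUE)

# \K: match "@" first, then drop it from the match so it is not replaced.
with_keep <- sub("@\\K\\d+", "", x, perl = TRUE)

# Default engine: match "@" plus digits and put the "@" back.
with_plain <- sub("@\\d+", "@", x)

print(with_lookbehind)
## [1] "abc@gmail.com" "xyz@gmail.com"

# All three approaches give the same result for this input.
identical(with_lookbehind, with_keep) && identical(with_keep, with_plain)
## [1] TRUE
```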

edited May 17 '15 at 15:46

answered May 17 '15 at 15:39

Thanks a lot! Is there any reason why you wouldn't just use the simpler `sub("@\\d+", "@", x)`? – hsl May 17 '15 at 15:55
@hsl because it's already mentioned. We could write at least two answers for a single regex-based question. :-) That's the beauty of regex. – Avinash Raj May 17 '15 at 15:57
