I have a vector of strings that looks something like this: c("abc@40gmail.com", "xyz@50gmail.com"). For some reason, there are varying digits right after the @, and I'm trying to remove them. Using a regular expression, how can I tell R to remove or replace the digits that come after "@", so that I end up with c("abc@gmail.com", "xyz@gmail.com")? I don't know much about regex, so I'd really appreciate it if someone could provide not just the code, but also a brief explanation of it. Thanks!
    -2
            
            
         
    
    
        hsl
        
- 
                    @Thomas how is that a dupe? From now on, is every text replacement question a dupe of `gsub("e", "", x)`? The regex in the "dupe" is an exact match, while the one in this question is a bit more complicated. – David Arenburg May 17 '15 at 17:23
2 Answers
3
            
            
        One option is
x <- c("abc@40gmail.com", "xyz@50gmail.com")
sub("@\\d+", "@", x)
## [1] "abc@gmail.com" "xyz@gmail.com"
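
Since the question also asks for an explanation, here is a minimal annotated sketch of how that pattern reads. The variable names `cleaned` and `cleaned_class` and the `[0-9]` variant are illustrative additions, not part of the original answer.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# "@"     matches a literal at-sign
# "\\d+"  matches one or more digits (the backslash is doubled inside an R string)
# The whole match ("@40", "@50") is replaced by "@", so only the digits disappear.
cleaned <- sub("@\\d+", "@", x)
print(cleaned)

# Same idea with an explicit character class instead of \\d
cleaned_class <- sub("@[0-9]+", "@", x)
print(identical(cleaned, cleaned_class))
```

`sub()` replaces only the first match in each element, which is enough here because each address contains a single "@"; `gsub()` would replace every match.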
 
    
    
        David Arenburg
        
1
            You could use a positive lookbehind or \K:
sub("(?<=@)\\d+", "", x, perl = TRUE)
\\d+ matches one or more digit characters. The lookbehind (?<=@) makes the regex engine check that the match starts immediately after an @ symbol, without making the @ part of the match, so only the digits that follow it are replaced. Since lookarounds are a PCRE feature, you need to set perl = TRUE.
OR
sub("@\\K\\d+", "", x, perl = TRUE)
Here the @ is matched first, and \K then discards everything matched so far, so again only the digits are replaced.
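
To see that both variants touch only the digits, the following sketch runs them side by side and extracts what the lookbehind pattern actually matches. The variable names `lookbehind`, `keep_out`, and `matched` are hypothetical and chosen for illustration only.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# Lookbehind: "@" must precede the digits but is not part of the match
lookbehind <- sub("(?<=@)\\d+", "", x, perl = TRUE)

# \K: "@" is matched first, then \K drops it from the reported match
keep_out <- sub("@\\K\\d+", "", x, perl = TRUE)

# Extract the text the lookbehind pattern matches in each element
matched <- regmatches(x, regexpr("(?<=@)\\d+", x, perl = TRUE))

print(lookbehind)
print(keep_out)
print(matched)
```

Because the "@" never becomes part of the match, the replacement string can be empty, whereas the plain `sub("@\\d+", "@", x)` approach has to put the "@" back.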
 
    
    
        Avinash Raj
        
- 
                    Thanks a lot! Is there any reason why you wouldn't just use the simpler `sub("@\\d+", "@", x)`? – hsl May 17 '15 at 15:55
- 
                    @hsl Because it had already been posted. We could write at least two answers for a single regex-based question. :-) That's the beauty of regex. – Avinash Raj May 17 '15 at 15:57