I'm trying to find a way of using pipes to group data by part of a character vector using a function. The data is in this format: ampXXi or ampXXXi, where XX or XXX are the unique site codes and i denotes sub-sites within each site. Is there a way of grouping the data by each ampXXi or ampXXXi? I tried to sort this out with a function using grepl(), but that didn't work. Thanks for any advice.
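Some background on why grepl() alone doesn't give you groups: it only reports whether each string matches a pattern (TRUE/FALSE), so by itself it can split the data into at most two groups, not one group per code. Below is a minimal base-R sketch that extracts the code itself with sub() and a capture group. It assumes each code is "amp", then two or three digits, then a single lowercase sub-site letter. That pattern and the sample codes are hypothetical.

```r
# Hypothetical sample codes in the ampXXi / ampXXXi format
codes <- c("amp22a", "amp333b", "amp22b", "amp11a")

# grepl() only answers "does this match?", giving one TRUE/FALSE per element
grepl("^amp22", codes)
# [1]  TRUE FALSE  TRUE FALSE

# sub() with a capture group returns the matched text, which can act as a
# grouping key. Assumed pattern: "amp" + 2-3 digits (site) + 1 letter (sub-site).
# Strings that don't match the pattern are returned unchanged by sub().
site    <- sub("^amp([0-9]{2,3})[a-z]$", "\\1", codes)  # site code only
subsite <- sub("^amp([0-9]{2,3}[a-z])$", "\\1", codes)  # site + sub-site
site
subsite
```

Either vector can then serve as the grouping variable. Use `site` to pool sub-sites together, or `subsite` to keep each ampXXi separate.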
    
        Dennis Kozevnikoff
        
- 2,078
- 3
- 19
- 29
 
    
    
        user14014863
        
- 5
- 1
- 
You need to provide a [small reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Onyambu Aug 07 '20 at 15:29
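A common way to share a small reproducible example in R is dput(), which prints R code that recreates an object exactly. Here is a minimal sketch using a made-up data frame in the question's format. The values are hypothetical.

```r
# Hypothetical data in the ampXXi / ampXXXi format
df <- data.frame(
  x = c("amp22a", "amp333a", "amp22b"),
  y = c(1.5, 2.0, 3.5),
  stringsAsFactors = FALSE
)

# Print R code that rebuilds df; this output is what you paste into a question
dput(df)

# Sanity check: evaluating the dput() text yields an identical object
rebuilt <- eval(parse(text = paste(capture.output(dput(df)), collapse = "\n")))
identical(rebuilt, df)
```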
1 Answer
Use substr() to extract part of a string variable for grouping
You could use substr() to extract the unique site IDs into a new variable and then group your data by that variable.
Example data frame:
```r
df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)
df
#         x y
# 1  amp22i 1
# 2 amp333i 2
# 3  amp11i 3
# 4  amp22i 4
# 5  amp11i 5
# 6 amp333i 6
```
Use substr() to make a group ID variable from a portion of the string:
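```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Start at character 4 to drop the "amp" prefix; nchar(x) keeps the rest
df %<>%
  mutate(id = substr(x, 4, nchar(x)))
df
#         x y   id
# 1  amp22i 1  22i
# 2 amp333i 2 333i
# 3  amp11i 3  11i
# 4  amp22i 4  22i
# 5  amp11i 5  11i
# 6 amp333i 6 333i
```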
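In this step, `df %<>% mutate(...)` is magrittr's shorthand for `df <- df %>% mutate(...)`. For comparison, here is a base-R sketch of the same idea without dplyr or magrittr. It assumes every code starts with the three-letter prefix "amp", so starting at character 4 works for both two- and three-digit site codes. Because substr() is vectorised, `nchar()` supplies a separate end position for each element. The tapply() call is a swapped-in base-R alternative to group_by()/summarize().

```r
# Same example data as in the answer
df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Base-R equivalent of df %<>% mutate(id = substr(x, 4, nchar(x))):
# skip the 3-character "amp" prefix and keep the rest of each string
df$id <- substr(df$x, 4, nchar(df$x))

# Group means without dplyr: tapply() splits y by id and applies mean()
tapply(df$y, df$id, mean)
```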
Group with pipes and group_by(), then get the group means:
```r
df %>% 
  group_by(id) %>% 
  summarize(mean = mean(y))
# # A tibble: 3 x 2
#   id     mean
#   <chr> <dbl>
# 1 11i     4  
# 2 22i     2.5
# 3 333i    4 
```
There are tidyverse alternatives for the above, e.g. stringr's str_sub() and str_length() within mutate().
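Here is a sketch of that stringr route, assuming the dplyr and stringr packages are installed. It reuses the answer's example data. Note that str_sub() has a default `end = -1`, meaning "through the last character", so `str_sub(x, 4)` alone gives the same result and str_length() is optional.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  x = c("amp22i", "amp333i", "amp11i", "amp22i", "amp11i", "amp333i"),
  y = 1:6,
  stringsAsFactors = FALSE
)

# Drop the leading "amp"; str_sub(x, 4) would give the same result
df <- df %>%
  mutate(id = str_sub(x, 4, str_length(x)))

# Group by the extracted id and average y within each group
res <- df %>%
  group_by(id) %>%
  summarize(mean = mean(y))
res
```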
– Tfsnuff