
I am trying to filter specific rows of my tibble using the dplyr::filter() function.

Here is part of my tibble head(raw.tb):

# A tibble: 738 x 4
      geno   ind     X     Y
     <chr> <chr> <int> <int>
 1 san1w16    A1   467   383
 2 san1w16    A1   465   378
 3 san1w16    A1   464   378
 4 san1w16    A1   464   377
 5 san1w16    A1   464   376
 6 san1w16    A1   464   375
 7 san1w16    A1   463   375
 8 san1w16    A1   463   374
 9 san1w16    A1   463   373
10 san1w16    A1   463   372
# ... with 728 more rows

When I ask for: raw.tb %>% dplyr::filter(ind == contains("A"))

I get: Error in filter_impl(.data, quo) : Evaluation error: No tidyselect variables were registered

In my tibble unique(raw.tb$ind) is:

 [1] "A1"  "A10" "A11" "A12" "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "B1"
[14] "B10" "B11" "B12" "B2"  "B3"  "B4"  "B5"  "B6"  "B7"  "B8"  "B9"  "C1"  "C10"
[27] "C11" "C12" "C2"  "C3"  "C4"  "C5"  "C6"  "C7"  "C8"  "C9"  "D1"  "D10" "D11"
[40] "D12" "D2"  "D3"  "D4"  "D5"  "D6"  "D7"  "D8"  "D9"  "E1"  "E10" "E11" "E12"
[53] "E2"  "E3"  "E4"  "E5"  "E6"  "E7"  "E8"  "E9"  "F1"  "F10" "F11" "F12" "F2" 
[66] "F3"  "F4"  "F5"  "F6"  "F7"  "F8"  "F9"  "G1"  "G10" "G11" "G2"  "G3"  "G4" 
[79] "G5"  "G6"  "G7"  "G8"  "G9"  "H1"  "H10" "H11"

And I would like to extract only the rows where raw.tb$ind starts with "A" using the tidyverse language.

(I know how to do that in base R, but my goal here is to use tidyverse).

Thanks a lot for any feedback

moodymudskipper
Al3xEP
    You may need `raw.tb %>% dplyr::filter(grepl("A", ind))` or `raw.tb %>% dplyr::filter(str_detect(ind, "A"))` as `contains` is used in a different context to select variables – akrun Feb 04 '18 at 12:17
  • `dplyr::filter(str_detect(ind, "A"))` works! Many thanks – Al3xEP Feb 04 '18 at 12:22

2 Answers


filter() expects a logical vector that says which rows to keep. contains() is a select helper (see ?select_helpers): it picks columns of the dataset whose names match a pattern, so it cannot be used inside filter(). To filter the rows, we can use either grepl from base R:

raw.tb %>%
   dplyr::filter(grepl("A", ind)) 

or str_detect from stringr (one of the packages in the tidyverse):

raw.tb %>%
  dplyr::filter(stringr::str_detect(ind, "A"))
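
Note that the pattern "A" matches an "A" anywhere in the string. Since the question asks for values that start with "A", the pattern can be anchored with ^. The following is only a minimal sketch, assuming dplyr and stringr are installed; the toy tibble demo.tb and its values are made up for illustration (in particular, "BA1" does not occur in the original data):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; "BA1" is included only to show the difference
demo.tb <- tibble(
  ind = c("A1", "A10", "B1", "BA1"),
  X   = c(467L, 465L, 464L, 463L)
)

# Unanchored: matches "A" anywhere, so "BA1" is kept as well
anywhere <- demo.tb %>% filter(str_detect(ind, "A"))

# Anchored with ^: keeps only values that start with "A"
prefix_regex <- demo.tb %>% filter(str_detect(ind, "^A"))

# Base R equivalent of the anchored match
prefix_base <- demo.tb %>% filter(grepl("^A", ind))

prefix_regex$ind
```

With the ind values listed in the question, both patterns return the same rows, because "A" only ever appears as the first character there; the anchor matters once a value such as "BA1" can occur.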
akrun

Simply writing out akrun's comment; @akrun, feel free to take over this answer.

Create some data:

# reconstructed from the output of dput(raw.tb)
raw.tb <- structure(list(geno = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "san1w16", class = "factor"), ind = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 1L), .Label = c("A1", "B1", "C1", 
"D1", "E1"), class = "factor"), X = c(467L, 465L, 464L, 464L, 
464L, 464L, 463L, 463L, 463L, 463L), Y = c(383L, 378L, 378L, 
377L, 376L, 375L, 375L, 374L, 373L, 372L)), .Names = c("geno", 
"ind", "X", "Y"), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10"), class = c("tbl_df", "tbl", "data.frame"
))

The data:

raw.tb
#> # A tibble: 10 x 4
#>       geno    ind     X     Y
#>  *  <fctr> <fctr> <int> <int>
#>  1 san1w16     A1   467   383
#>  2 san1w16     A1   465   378
#>  3 san1w16     B1   464   378
#>  4 san1w16     B1   464   377
#>  5 san1w16     C1   464   376
#>  6 san1w16     C1   464   375
#>  7 san1w16     D1   463   375
#>  8 san1w16     D1   463   374
#>  9 san1w16     E1   463   373
#> 10 san1w16     A1   463   372

Method #1

raw.tb %>% dplyr::filter(stringr::str_detect(ind, "A"))
#> # A tibble: 3 x 4
#>      geno    ind     X     Y
#>    <fctr> <fctr> <int> <int>
#> 1 san1w16     A1   467   383
#> 2 san1w16     A1   465   378
#> 3 san1w16     A1   463   372

Method #2

raw.tb %>% dplyr::filter(grepl("A", ind))
#> # A tibble: 3 x 4
#>      geno    ind     X     Y
#>    <fctr> <fctr> <int> <int>
#> 1 san1w16     A1   467   383
#> 2 san1w16     A1   465   378
#> 3 san1w16     A1   463   372
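
In the example data above, ind is a factor rather than a character column. grepl() accepts anything that can be coerced with as.character(), and the output above shows str_detect() handling the factor too. Base R's startsWith(), by contrast, accepts only character vectors, so a factor has to be converted first. A small sketch, assuming dplyr is installed; demo.tb is a hypothetical toy tibble, not the original data:

```r
library(dplyr)

# Hypothetical toy data with a factor column, mirroring the structure above
demo.tb <- tibble(
  ind = factor(c("A1", "B1", "A1")),
  X   = c(467L, 464L, 463L)
)

# startsWith() requires character input, so convert the factor explicitly
a_rows <- demo.tb %>% filter(startsWith(as.character(ind), "A"))

a_rows
```

startsWith() does a literal prefix comparison without any regular expression, which can be convenient when the prefix contains characters that are special in a regex.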
Eric Fail