I have a data frame df with a variable x. However, two different expressions that check for NA give me different results. Can anyone explain why?
sum(is.na(df$x))
#[1] 41
df %>% filter(x==NA)
#A tibble: 0 x 1
Note that a comparison with NA via == (nearly) always evaluates to NA. This is easily demonstrated with:
x <- c(1, 2, NA, 4)
x == NA
#[1] NA NA NA NA
See help("NA") and help("=="). From the latter documentation:
Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA.
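The "even to themselves" part can be checked in base R with a small sketch. The variable names na_self and nan_self are made up for illustration:

```r
# Comparing a missing value to itself still gives NA, not TRUE
na_self <- NA == NA
nan_self <- NaN == NaN

# is.na() is the test that returns a proper TRUE/FALSE;
# it reports TRUE for both NA and NaN
na_flag <- is.na(NA)
nan_flag <- is.na(NaN)

print(c(na_self, nan_self))
print(c(na_flag, nan_flag))
```

Because the result of == is itself missing, it cannot be used to pick out the missing entries.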
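filter() keeps only the rows where the condition is TRUE and drops rows where it evaluates to NA, which is why x == NA returns zero rows. So your dplyr code should be: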
df %>% filter(is.na(x))
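As a rough end-to-end check, here is a minimal sketch on a made-up four-row tibble. The data and the names n_eq, n_isna and n_complete are assumptions for illustration, not the asker's actual df:

```r
library(dplyr)

# Hypothetical example data with a single missing value
df <- tibble(x = c(1, 2, NA, 4))

# x == NA is NA for every row, and filter() drops rows whose condition is NA
n_eq <- df %>% filter(x == NA) %>% nrow()

# is.na() returns TRUE/FALSE, so the row with the missing value is kept
n_isna <- df %>% filter(is.na(x)) %>% nrow()

# Negating the test keeps only the rows without a missing value
n_complete <- df %>% filter(!is.na(x)) %>% nrow()

print(c(n_eq = n_eq, n_isna = n_isna, n_complete = n_complete))
```

On this toy data, the row count from filter(is.na(x)) agrees with sum(is.na(df$x)), which is the consistency the question was looking for.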