sum(is.na(df$x)) gives me a different answer than dplyr df %>% filter(x == NA)

Question

I have a data frame df with a variable x. However, two different expressions that check for NA give me different results. Can anyone explain why?

sum(is.na(df$x))
#[1] 41

df %>% filter(x==NA)
# A tibble: 0 x 1

Perhaps the answer to this question will clear some things up: https://stackoverflow.com/questions/25100974/na-matches-na-but-is-not-equal-to-na-why — sumshyftw, Feb 25 '19 at 19:02

score 0 · Answer 1 · answered Feb 25 '19 at 20:06

Note that a comparison with NA via == (nearly) always evaluates to NA. This is easily demonstrated with:

x <- c(1, 2, NA, 4)
x == NA
#[1] NA NA NA NA

See help("NA") and help("=="). From the latter documentation:

Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA.
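To make the quoted rule concrete, here is a minimal base R sketch. The vector `x` is toy data chosen for illustration, not the asker's `df`. It contrasts `==` with `is.na()`, which is the function intended for testing missingness.

```r
# Toy vector for illustration (assumed data, not the asker's df)
x <- c(1, 2, NA, 4)

# Comparing with == propagates NA, so no element ever comes out TRUE
eq <- x == NA
print(eq)            # NA NA NA NA

# NA and NaN are not even equal to themselves
self_na  <- NA == NA
self_nan <- NaN == NaN
print(self_na)       # NA
print(self_nan)      # NA

# is.na() returns a proper TRUE/FALSE vector
miss <- is.na(x)
print(miss)          # FALSE FALSE TRUE FALSE

# Summing that logical vector counts the missing values
n_missing <- sum(miss)
print(n_missing)     # 1
```

Because `is.na()` never returns NA itself, its result is safe to use for counting and subsetting.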

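filter() keeps only the rows where the condition is TRUE, so a condition that is NA for every row returns zero rows. Your dplyr code should instead be: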

df %>% filter(is.na(x))
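As a rough check of the difference, the sketch below runs both filters on a small made-up tibble. The data frame `df` and the counter names are assumptions for illustration only, and the example assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data frame
df <- tibble(x = c(1, NA, 3, NA))

# x == NA is NA for every row; filter() drops rows whose condition is NA
n_eq <- df %>% filter(x == NA) %>% nrow()
print(n_eq)        # 0

# is.na(x) is TRUE exactly for the missing rows
n_isna <- df %>% filter(is.na(x)) %>% nrow()
print(n_isna)      # 2

# This matches the base R count of missing values
n_base <- sum(is.na(df$x))
print(n_base)      # 2

# Negate the test to keep only the non-missing rows
n_complete <- df %>% filter(!is.na(x)) %>% nrow()
print(n_complete)  # 2
```

With `is.na()`, the row count from `filter()` agrees with `sum(is.na(df$x))`, which is the agreement the question was expecting.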
