The difference is that complete.cases returns a logical vector with one element per row of the dataset, while na.omit removes the rows that have at least one NA and returns the remaining data. Using the reproducible example defined in the data section at the end,
complete.cases(auto)
#[1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE
As we can see, it is a logical vector with no NAs. It gives TRUE for rows that don't have any NAs and FALSE otherwise. Because complete.cases never returns NA, calling summary on its result reports zero NA's.
summary(complete.cases(auto))
# Mode FALSE TRUE NA's
#logical 4 6 0
To get the same result as na.omit, use the logical vector to subset the original dataset:
autoN <- auto[complete.cases(auto),]
auto1 <- na.omit(auto)
dim(autoN)
#[1] 6 2
dim(auto1)
#[1] 6 2
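Matching dimensions alone do not prove that the same rows were kept. A minimal sketch of a stricter check follows, using a small hand-made data frame (`toy` is a hypothetical example, not the `auto` data from this answer) so that it does not depend on the random seed:

```r
# Hypothetical toy data (not the `auto` data from this answer)
toy <- data.frame(v1 = c(1, NA, 3, 4),
                  v2 = c(NA, 2, 3, 4))

keep      <- complete.cases(toy)  # one logical value per row
by_subset <- toy[keep, ]          # manual subsetting
by_omit   <- na.omit(toy)         # same rows, plus an "na.action" attribute

# Compare the values only, ignoring attributes
same_values <- all.equal(by_subset, by_omit, check.attributes = FALSE)

# Both approaches keep the original row names
same_rows <- identical(rownames(by_subset), rownames(by_omit))
```

Here `same_values` and `same_rows` should both be TRUE, since the two results differ only in the extra attribute that na.omit attaches.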
Although the resulting rows are the same, na.omit also attaches an na.action attribute that records which rows were removed:
str(autoN)
#'data.frame': 6 obs. of 2 variables:
# $ v1: int 1 2 2 2 3 3
# $ v2: int 3 3 3 1 4 2
str(auto1)
#'data.frame': 6 obs. of 2 variables:
# $ v1: int 1 2 2 2 3 3
# $ v2: int 3 3 3 1 4 2
# - attr(*, "na.action")=Class 'omit' Named int [1:4] 2 7 8 10
# .. ..- attr(*, "names")= chr [1:4] "2" "7" "8" "10"
na.omit is also slower than subsetting with complete.cases, based on the benchmarks shown below.
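The na.action attribute seen in the str output can be useful in its own right, for example to look up which rows were dropped. The following is a rough sketch on a hypothetical `toy` data frame (again, not the `auto` data), assuming the attribute is read directly with attr():

```r
# Hypothetical toy data (not the `auto` data from this answer)
toy <- data.frame(v1 = c(1, NA, 3, 4),
                  v2 = c(NA, 2, 3, 4))

cleaned <- na.omit(toy)

# Positions of the removed rows, stored by na.omit as an attribute
dropped <- as.integer(attr(cleaned, "na.action"))

# The rows that were removed from the original data
removed_rows <- toy[dropped, ]

# The same positions derived from complete.cases
dropped_cc <- which(!complete.cases(toy))

# Removing the attribute leaves the same object as manual subsetting
stripped <- cleaned
attr(stripped, "na.action") <- NULL
same_object <- identical(stripped, toy[complete.cases(toy), ])
```

If the bookkeeping is not needed, subsetting with complete.cases avoids creating the attribute in the first place.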
Benchmarks
set.seed(238)
df1 <- data.frame(v1 = sample(c(NA, 1:9), 1e7, replace=TRUE),
v2 = sample(c(NA, 1:50), 1e7, replace=TRUE))
system.time(na.omit(df1))
# user system elapsed
# 2.50 0.19 2.69
system.time(df1[complete.cases(df1),])
# user system elapsed
# 0.61 0.09 0.70
Data
set.seed(24)
auto <- data.frame(v1 = sample(c(NA, 1:3), 10, replace=TRUE),
v2 = sample(c(NA, 1:4), 10, replace=TRUE))