We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(x)) and group by the first column, i.e. "X1": if a group has only one observation, return that row; otherwise remove all rows whose "X2" value is duplicated and return only the unique rows.
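The question's input x isn't reproduced here; a small data frame consistent with the outputs shown below (an assumption, not necessarily the asker's exact data) would be:

```r
# hypothetical input -- chosen only to be consistent with the outputs below
x <- data.frame(X1 = c(1, 1, 1, 2, 1),
                X2 = c(3, 4, 2, 5, 2))
```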
library(data.table)
setDT(x)[, if(.N==1) .SD else
.SD[!(duplicated(X2)|duplicated(X2, fromLast=TRUE))], X1]
# X1 X2
#1: 1 3
#2: 1 4
#3: 2 5
If we are using both "X1" and "X2" as grouping variables, we can subset with the row indices (.I) of the groups that have a single row (.N == 1)
setDT(x)[x[, .I[.N==1], .(X1, X2)]$V1]
# X1 X2
#1: 1 3
#2: 1 4
#3: 2 5
NOTE: data.table is very fast and the code is compact.
Or, without using any group-by option, with base R we can remove every row that occurs more than once
x[!(duplicated(x)|duplicated(x, fromLast=TRUE)),]
# X1 X2
#1 1 3
#2 1 4
#4 2 5
Or with tally() from dplyr
library(dplyr)
x %>%
group_by_(.dots= names(x)) %>%
tally() %>%
filter(n==1) %>%
select(-n)
Note that this should be faster than the group_by()/filter(n() == 1) dplyr solution (see the benchmarks below).
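In later dplyr versions, group_by_() with .dots is deprecated; assuming a current dplyr, an equivalent without the underscore verbs uses add_count() (the sample data here is assumed, since the question's x isn't shown):

```r
library(dplyr)

# assumed sample data
x <- data.frame(X1 = c(1, 1, 1, 2, 1),
                X2 = c(3, 4, 2, 5, 2))

# add_count() appends a column n with per-group row counts;
# keep the groups of size one, then drop the helper column
x %>%
  add_count(X1, X2) %>%
  filter(n == 1) %>%
  select(-n)
```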
Benchmarks
library(data.table)
library(dplyr)
Sample data
set.seed(24)
x1 <- data.frame(X1 = sample(1:5000, 1e6, replace=TRUE),
X2 = sample(1:10000, 1e6, replace=TRUE))
x2 <- copy(as.data.table(x1))
Base R approaches
system.time(x1[with(x1, ave(X2, sprintf("%s__%s", X1, X2), FUN = length)) == 1, ])
# user system elapsed
# 20.245 0.002 20.280
system.time(x1[!(duplicated(x1)|duplicated(x1, fromLast=TRUE)), ])
# user system elapsed
# 1.994 0.000 1.998
dplyr approaches
system.time(x1 %>% group_by(X1, X2) %>% filter(n() == 1))
# user system elapsed
# 33.400 0.006 33.467
system.time(x1 %>% group_by_(.dots= names(x2)) %>% tally() %>% filter(n==1) %>% select(-n))
# user system elapsed
# 2.331 0.000 2.333
data.table approaches
system.time(x2[x2[, .I[.N==1], list(X1, X2)]$V1])
# user system elapsed
# 1.128 0.001 1.131
system.time(x2[, .N, by = list(X1, X2)][N == 1][, N := NULL][])
# user system elapsed
# 0.320 0.000 0.323
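The fastest variant above works by counting rows per (X1, X2) group with .N and keeping only the singleton groups; on a toy input (assumed here, as the question's x isn't shown) it reads:

```r
library(data.table)

# assumed sample data
dt <- data.table(X1 = c(1, 1, 1, 2, 1),
                 X2 = c(3, 4, 2, 5, 2))

# count rows per group, keep groups with N == 1, then drop the count column
dt[, .N, by = .(X1, X2)][N == 1][, N := NULL][]
```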
Summary: The "data.table" approaches win hands down, but if you're unable to use the package for some reason, using duplicated from base R also performs quite well.