I need to remove duplicated rows where all values are duplicates apart from two columns. How do I go about this?
The posted solutions are along the lines of this:
df[!duplicated(df[ , c("x","y")]),]
Where the duplicated rows are selected based on if the value appears again in the stated columns, i.e. x and y in the example above. Therefore the suggested answer by the person that closed my question is not helpful (R dataframe: drop duplicates based on certain columns [duplicate]).
Some of my data is duplicated in all 40 columns except two. I would therefore like to use these two columns as the condition (to ignore that is). I can remove the duplicated rows if I don't take into account one column like the example below. Hence, the second suggested answer is also redundant (Select all rows which are duplicates except for one column).
df[!duplicated(df[, -5]), ]
This is by using the column number but I can't get it to work for two columns.
Here is an example data frame where rows (see x5) 4, 10 and 16 need to be removed since they are duplicated in each column except x5 and x6.
df1 <- data.frame(x1 = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3"),
x2 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
x3 = c("A1", "A1", "A2", "A2", "A3", "A3", "B1", "B1", "B2", "B2", "B3", "B3", "C1", "C1", "C2", "C2", "C3", "C3"),
x4 = c("35", "43", "33", "33", "63", "24", "14", "25", "77", "77", "94", "51", "34", "36", "55", "55", "73", "72"),
x5 = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18"),
x6 = c("XA", "XB", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XK", "XL", "XM", "XN", "XO", "XP", "XQ", "XR"))
This is my desired outcome.
df2 <- data.frame(x1 = c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"),
x2 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
x3 = c("A1", "A1", "A2", "A3", "A3", "B1", "B1", "B2", "B3", "B3", "C1", "C1", "C2", "C3", "C3"),
x4 = c("35", "43", "33", "63", "24", "14", "25", "77", "94", "51", "34", "36", "55", "73", "72"),
x5 = c("1", "2", "3", "5", "6", "7", "8", "9", "11", "12", "13", "14", "15", "17", "18"),
x6 = c("XA", "XB", "XC", "XE", "XF", "XG", "XH", "XI", "XK", "XL", "XM", "XN", "XO", "XQ", "XR"))