I have a big dataset df (354903 rows) with two columns named df$CompleteName and df$CompleteName.1:
> head(df)
       CompleteName       CompleteName.1
1   Lefebvre Arnaud Lefebvre Schuhl Anne
1.1 Lefebvre Arnaud              Abe Lyu
1.2 Lefebvre Arnaud              Abe Lyu
1.3 Lefebvre Arnaud       Louvet Nicolas
1.4 Lefebvre Arnaud   Muller Jean Michel
1.5 Lefebvre Arnaud  De Dinechin Florent
I am trying to create a label for each row indicating whether the two names are the same (1 if they are the same, 0 if not). When I try a small subset, it seems to work:
> match(df$CompleteName[1], df$CompleteName.1[1], nomatch = 0)
[1] 0
> match(df$CompleteName[1:10], df$CompleteName.1[1:10], nomatch = 0)
[1] 0 0 0 0 0 0 0 0 0 0
But as soon as I pass in the complete columns, it gives me completely different values, which look like nonsense to me:
> match(df$CompleteName, df$CompleteName.1, nomatch = 0)
[1] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[23] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[45] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
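One possible explanation, sketched on a made-up toy data frame (the rows below are hypothetical and only reuse names from the head() output): match(x, table) does not compare row i with row i. It looks up each element of x in the whole table vector and returns the position of its first occurrence there. If that is what is happening here, 101 would simply be the first row of CompleteName.1 that contains "Lefebvre Arnaud", and the zeros on the 10-row subset would only mean that the name does not occur among those 10 values.

```r
# Toy data: hypothetical rows, NOT taken from the real 354903-row dataset.
toy <- data.frame(
  CompleteName   = c("Lefebvre Arnaud", "Lefebvre Arnaud", "Abe Lyu", "Louvet Nicolas"),
  CompleteName.1 = c("Abe Lyu", "Muller Jean Michel", "Lefebvre Arnaud", "Louvet Nicolas"),
  stringsAsFactors = FALSE
)

# On the first two rows, "Lefebvre Arnaud" is not among the two values of
# CompleteName.1, so match() falls back to nomatch for both elements.
pos_subset <- match(toy$CompleteName[1:2], toy$CompleteName.1[1:2], nomatch = 0)

# On the full columns, each name is looked up in ALL of CompleteName.1 and the
# position of its first occurrence is returned, regardless of the current row.
pos_full <- match(toy$CompleteName, toy$CompleteName.1, nomatch = 0)

print(pos_subset)
print(pos_full)
```

In this sketch the repeated value in pos_full is a position in CompleteName.1, not a same/different label, which would be consistent with the repeated 101 above.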
Should I use sapply? I could not figure it out; I tried the following, but it gives an error:
 sapply(df, function(x) match(x$CompleteName, x$CompleteName.1, nomatch = 0))
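A minimal sketch of a row-wise comparison, again on a made-up toy data frame (the rows are hypothetical). It assumes that "the same" means exact string equality, and it wraps both columns in as.character() in case they were read in as factors, since comparing factors with different level sets fails. Note that sapply(df, f) calls f once per column, so x inside the function is a plain vector rather than a data frame, which is a likely reason x$CompleteName raises an error.

```r
# Toy data: hypothetical rows, NOT taken from the real dataset.
toy <- data.frame(
  CompleteName   = c("Lefebvre Arnaud", "Lefebvre Arnaud", "Abe Lyu", "Louvet Nicolas"),
  CompleteName.1 = c("Abe Lyu", "Muller Jean Michel", "Lefebvre Arnaud", "Louvet Nicolas"),
  stringsAsFactors = FALSE
)

# `==` is already vectorised: it compares element i of the left vector with
# element i of the right vector, so no sapply() or loop is needed.
same <- as.character(toy$CompleteName) == as.character(toy$CompleteName.1)

# Convert TRUE/FALSE to the 1/0 labels described in the question.
toy$label <- as.integer(same)

print(toy$label)
```

Because the comparison is vectorised, the same two lines should scale to the full columns without an explicit loop.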
Please help!!!