I am working on a large data set of customer names. Each name has to be checked against a master file of correct names (about 300 KB), and if it matches, the master name should be appended to the customer file as a new column value. The approach from my previous question only worked for small data sets.
Both the customer and the master file have been cleaned using tm. I have tried several approaches, but they only work on a small sample and are not effective on the full files. In my opinion plain pattern matching does not help here, because no name follows an exact pattern.
Customer file:
1           chang chun petrochemical  
2                chang chun plastics  
3                     church  dwight  
4        citrix systems asia pacific  
5          cnh industrial services srl
6                   conoco phillips   
7                    conocophillips   
8                  dfk laurence varnay
9                       dtz worldwide 
10  electro motive maintenance operati
11                enterasys networks  
12                   esso  resources  
13                          expedia   
14                            expedia 
15        exponential interactive aust
16        exxonmobil asia pacific pte 
17    exxonmobil chemical asia pac div
18                     exxonmobil png 
19         formula  world championship
20      fortitech asia pacific sdn bhd
Master file:
1                     chang chun group
2                     church  dwight  
3        citrix systems asia pacific  
4                    cnh industrial nv
5                      conoco phillips
6                  dfk laurence varnay
7                  dtz group  zealand 
8                         caterpillar 
9                 enterasys networks  
10                   exxon mobil group
11                       expedia group
12        exponential interactive aust
13         formula  world championship
14      fortitech asia pacific sdn bhd
15                frhi hotels  resorts
16          gardner denver industries 
17  glencore xstrata international plc
18                            grace   
19                       incomm   nz  
20              information resources 
21                    kbr holdings llc
22                       kennametal   
23                            komatsu 
24     leonhard hofstetter pelzdesign 
25          communications corporation
26              manhattan associates  
27                             mattel 
28                        mmg finance 
29                     nokia oyj group
30                           nortek  
I have tried this simple loop:
for (i in 1:100) {
  # agrep() can return zero or several matches, so keep the first one (or NA)
  hits <- agrep(result$ICIS_Cust_Names[i], result1$Master_Names, value = TRUE,
                max.distance = list(deletions = 0.2, insertions = 0.3, substitutions = 0.4))
  result$x[i] <- if (length(hits) > 0) hits[1] else NA_character_
  # use value = FALSE instead to get the index of the master name (column Y)
}
Result:
1           chang chun petrochemical                             <NA> NA
2                chang chun plastics                             <NA> NA
3                     church  dwight                 church  dwight    2
4        citrix systems asia pacific    citrix systems asia pacific    3
5          cnh industrial services srl                           <NA> NA
6                   conoco phillips                  church  dwight    2
7                    conocophillips                              <NA> NA
8                  dfk laurence varnay                           <NA> NA
9                       dtz worldwide                church  dwight    2
10  electro motive maintenance operati                           <NA> NA
11                enterasys networks                             <NA> NA
12                   esso  resources                 church  dwight    2
13                          expedia                              <NA> NA
14                            expedia                            <NA> NA
15        exponential interactive aust               church  dwight    2
16        exxonmobil asia pacific pte                            <NA> NA
17    exxonmobil chemical asia pac div                           <NA> NA
18                     exxonmobil png                church  dwight    2
19         formula  world championship                           <NA> NA
20      fortitech asia pacific sdn bhd 
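Since agrep() returns every approximate match (zero, one or several), a vectorised variant of the loop above can be sketched as follows. This is only a minimal sketch on toy data: the helper name first_match, the value max.distance = 0.2 and the made-up third customer name are assumptions for illustration, not something tuned on the real files.

```r
# Hypothetical helper: index of the first approximate match, or NA if none
first_match <- function(name, table, ...) {
  idx <- agrep(name, table, ...)          # integer indices of all matches
  if (length(idx) == 0) NA_integer_ else idx[1]
}

# Toy data; the last customer name is invented so that nothing matches
cus    <- c("conocophillips", "expedia", "zzzzzz qqqqqq")
master <- c("conoco phillips", "expedia group", "caterpillar")

# vapply() guarantees exactly one integer per customer name
idx <- vapply(cus, first_match, integer(1), table = master,
              max.distance = 0.2, USE.NAMES = FALSE)

# One row per customer: matched master name (or NA) and its index
matched <- data.frame(cus = cus, master = master[idx], idx = idx,
                      stringsAsFactors = FALSE)
print(matched)
```

Because the helper always returns a single value, the result has the same number of rows as the customer file, which avoids the length problem when assigning a column.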
I also tried lapply, without success. As you can see, my master file is large, and sometimes I get an error saying that the row lengths do not match.
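# assuming levenshteinDist() comes from the RecordLinkage package:
# distances from each customer name to all master names
mm <- lapply(result$ICIS_Cust_Names, function(x) levenshteinDist(x, result1$Master_Names))
# loop version: check each customer name against all the master names
for (i in seq_len(nrow(result))) {
  d <- levenshteinDist(result$ICIS_Cust_Names[i], result1$Master_Names)
  # a distance of 0 means an exact match
  if (any(d == 0)) print(result1$Master_Names[d == 0])
}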
Which method would be best for this? I referred to a few similar questions on Stack Overflow, but they were not much help.
My approach might be naive, but it misbehaves when applied to huge data sets. Could anybody familiar with R correct the levenshteinDist code above?
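For reference, the Levenshtein idea can also be sketched with base R's adist(), which computes the whole customer-by-master distance matrix in one call. This is a minimal sketch on a few rows from the listings above; the normalisation by the longer string and the threshold of 0.5 are assumptions for illustration and would need tuning on the real data.

```r
# Toy data taken from the listings above (a few rows only)
cus    <- c("chang chun petrochemical", "conocophillips", "expedia", "dtz worldwide")
master <- c("chang chun group", "conoco phillips", "expedia group", "caterpillar")

# adist() (base R, package utils) returns a length(cus) x length(master)
# matrix of generalized Levenshtein distances
d <- adist(cus, master)

# Normalise by the longer of the two strings so that long and short
# names are comparable (assumption: one of several possible scalings)
len <- outer(nchar(cus), nchar(master), pmax)
nd  <- d / len

best      <- apply(nd, 1, which.min)           # closest master name per customer
best_dist <- nd[cbind(seq_along(cus), best)]   # its normalised distance

threshold <- 0.5  # hypothetical cut-off, to be tuned on real data
out <- data.frame(
  cus    = cus,
  master = ifelse(best_dist <= threshold, master[best], NA_character_),
  dist   = round(best_dist, 2),
  stringsAsFactors = FALSE
)
print(out)
```

The full matrix needs memory proportional to the number of customers times the number of master names, so for very large files the customer names would probably have to be processed in chunks.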
code:
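# check each customer name against every master name; if the similarity is
# above 0.90, return the master name
# (assuming jarowinkler() comes from the RecordLinkage package)
jar <- matrix(NA_real_, nrow(gr1), nrow(gr2))  # similarity scores
gr1$res <- NA_character_
for (i in seq_len(nrow(gr1))) {
  for (j in seq_len(nrow(gr2))) {
    jar[i, j] <- jarowinkler(gr1$ICIS_Cust_Names[i], gr2$Master_Names[j])
    if (jar[i, j] > 0.90)
      gr1$res[i] <- as.character(gr2$Master_Names[j])
  }
}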
Please let me know if there is any small error in this code.
If anybody has worked with this kind of data in R, please help!
 
    