I have two data frames. The first (dt) consists entirely of character columns, and the second (TargetWord) is a dictionary that also contains character values. I use pmatch to find which words in dt occur in TargetWord and to return their positions in TargetWord. This works fine when the data frames are small. With large data frames, however, it returns word positions only for the first column; all the other columns come back as NA.
## Data Table
word_1 <- c("conflict","", "resolved", "", "", "")
word_2 <- c("", "one", "tricky", "one", "", "one")
word_3 <- c("thanks","", "", "comments", "par","")
word_4 <- c("thanks","", "", "comments", "par","")
word_5 <- c("", "one", "tricky", "one", "", "one")
dt <- data.frame(word_1, word_2, word_3,word_4, word_5, stringsAsFactors = FALSE)
## Targeted Words
TargetWord <- data.frame(cbind(c("conflict", "thanks", "tricky", "one", "two", "three")))
## convert into matrix (needed)
dt <- as.matrix(dt)
TargetWord <- as.matrix(TargetWord)
result <- `dim<-`(pmatch(dt, TargetWord, duplicates.ok=TRUE), dim(dt))
print(result)
This returns:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA    2    2   NA
[2,]   NA    4   NA   NA    4
[3,]   NA    3   NA   NA    3
[4,]   NA    4   NA   NA    4
[5,]   NA   NA   NA   NA   NA
[6,]   NA    4   NA   NA    4
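For reference, here is a minimal sketch of the pmatch rules that produce the NA entries above: an empty string never matches, an exact match takes priority over partial matches, and a partial match counts only if it is unique. The vector `tbl` is a made-up lookup table for illustration and is not part of the original data.

```r
# Hypothetical lookup table (illustration only, not the original TargetWord)
tbl <- c("conflict", "thanks", "tricky", "one", "ones")

# Exact match wins even though "ones" also starts with "one"
r_exact <- pmatch("one", tbl, duplicates.ok = TRUE)

# Unique partial match: only "conflict" starts with "conf"
r_partial <- pmatch("conf", tbl, duplicates.ok = TRUE)

# Ambiguous partial match: both "thanks" and "tricky" start with "t"
r_ambiguous <- pmatch("t", tbl, duplicates.ok = TRUE)

# An empty string matches nothing
r_empty <- pmatch("", tbl, duplicates.ok = TRUE)

print(c(r_exact, r_partial, r_ambiguous, r_empty))
```

With a large dictionary, the ambiguous case becomes much more likely for any word that is not an exact entry.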
Now, after reading two .csv files as below, I get a result only for the first column, whereas I want one for all columns, as in the result above. Below, dt1 is a 79 x 50 data frame and word_dict is a 13901 x 1 data frame.
#################### on big data #####################################
dt1 <- read.csv("C:/Users/Wonderland/Downloads/string_feature.csv", stringsAsFactors = FALSE)
word_dict <- read.csv("C:/Users/Wonderland/Downloads/word_dict.csv", stringsAsFactors = FALSE)
dt1 <- as.matrix(dt1)
word_dict <- as.matrix(word_dict)
result <- `dim<-`(pmatch(dt1, word_dict, duplicates.ok=TRUE), dim(dt1))
print(result)
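One thing that may be worth checking is whether the values read from the CSV files carry stray whitespace or differ in case, since pmatch is case-sensitive and a trailing space prevents a match. The sketch below is only a diagnostic idea under that assumption, not a confirmed cause. `dt1_demo` and `dict_demo` are hypothetical stand-ins for the real files, `clean` is a helper introduced here, and `match` is swapped in for `pmatch`, so it does exact matching only.

```r
# Hypothetical stand-ins for the CSV contents (the real files are not shown)
dt1_demo  <- matrix(c("conflict", " one", "Tricky", "thanks ", "", "one"), nrow = 3)
dict_demo <- c("conflict", "thanks", "tricky", "one")

# Helper (assumption): trim surrounding whitespace and lower-case
clean <- function(x) tolower(trimws(x))

# Original approach on the raw values
raw_hits <- pmatch(dt1_demo, dict_demo, duplicates.ok = TRUE)

# Exact matching on the normalised values
clean_hits <- match(clean(dt1_demo), clean(dict_demo))

# Restore the matrix shape, as in the question
raw_result   <- `dim<-`(raw_hits, dim(dt1_demo))
clean_result <- `dim<-`(clean_hits, dim(dt1_demo))

print(raw_result)
print(clean_result)
```

If the two printed matrices differ on the real data, the NA values come from formatting differences between the files rather than from the size of the data frames.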