I have a data.table where one of the columns contains JSON. I am trying to extract the content so that each variable is a column.
library(jsonlite)
library(data.table)
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
                   '{"tag_id":"4","response_id":1,"other":4}',
                   '{"tag_id":"34"}'),stringsAsFactors=F)
The desired result, that does not refer to the "other" variable:
   tag_id response_id 
1     "34"      2 
2     "4"       1 
3     "34"      NA
I have tried several versions of:
parseLog <- function(x){
  if (is.na(x))
    e=c(tag_id=NA,response_id=NA)
  else{
    j=fromJSON(x)
    e=c(tag_id=as.integer(j$tag_id),response_id=j$response_id)
  }
  e
}
that seems to work well to retrieve a list of vectors (or lists if c is replaced by list) but when I try to convert the list to data.table something doesn´t work as expected.
   parsed<-lapply(df$a,parseLog)
   rparsed<-do.call(rbind.data.frame,parsed)
   colnames(rparsed)<-c("tag_id","response_id")
Because of the missing value in the third row. How can I solve it in a R-ish clean way? How can I make that my parse method returns an NA for the missing value. Alternative, Is there a parameter "fill" like there is for rbind that can be used in rbind.data.frame or analogous method?
The dataset I am using has 11M rows so performance is important.
Additionally, there is an equivalent method to rbind.data.frame to obtain a data.table. How would that be used? When I check the documentation it refers me to rbindlist but it complains the parameter is not used and if call directly(without do.call it complains about the type of parsed):
rparsed<-do.call(rbindlist,fill=T,parsed)
EDIT: The case I need to cover is more general, in a set of 11M records all the possible circumstances happen:
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
                   '{"trash":"34","useless":2}',                          
                   '{"tag_id":"4","response_id":1,"other":4}',
                   NA,
                   '{"response_id":"34"}', 
                   '{"tag_id":"34"}'),stringsAsFactors=F)
and the output should only contain tag_id and response_id columns.