I have a data.table, DT containing one string variable, s, from which I want to create additional variables based on the positions of each character in s. The varname, start and end positions and vartype of each variable are given in a dictionary, dic. Here is an example, including the desired output:
DT <- DT <- data.table(s=c('a191','b292','c393'))
dic <- data.table(varname=c('bla,ble,bli'),start=c(1,2,3),end=c(1,2,4),vartype=c('c','i','i')
DTdesired <- data.table(bla=c('a','b','c'),ble=c(1,2,3),bli=c(91,92,93))
This is the same problem as importing data from a fixed width file (fwf), which has been covered in this SO question. The answer to that post includes a data table solution
DT[ , lapply(seq_len(length(cols$beg)), function(ii) {
stri_sub(V1, cols$beg[ii], cols$end[ii])
})]
But I am looking for a solution that besides start and end would also use the varname and coltype variables of the dictionary
Edit: In reality, DT would be a large Census file, around 200 million observations. Each element of s is a string with 200 characters. And the dictionary, dic, can contain about 10 to 50 variables to be extracted, depending on the year.