I'm dealing with huge .txt data frames generated from microscopy data. Each .txt output file is about 3 to 4 GB, and I have a couple of hundred of them.
Each of these monster files has a couple of hundred features (columns), some categorical and some numeric.
Here is a simplified example of the data frame:
    df <- read.csv("output.txt", sep = "\t", skip = 9, header = TRUE, fill = TRUE)
    df
    Row  Column stimulation Compound Concentration treatmentsum Pid_treatmentsum  var1 var2  var3  ...
    1    1      uns         Drug1    3             uns_Drug1_3  Jack_uns_Drug1_3  15.0 20.2  3.568 ...
    1    1      uns         Drug1    3             uns_Drug1_3  Jack_uns_Drug1_3  55.0 0.20  9.068
    1    1      uns         Drug2    5             uns_Drug2_5  Jack_uns_Drug2_5  100  50.2  3.568
    1    1      uns         Drug2    5             uns_Drug2_5  Jack_uns_Drug2_5  75.0 60.2  13.68
    1    1      3S          Drug1    3             3s_Drug3_3   Jack_3s_Drug1_3   65.0 30.8  6.58
    1    1      4S          Drug1    3             4s_Drug3_3   Jack_4s_Drug1_3   35.0 69.3  2.98
    .....
I would like to split the data frame by the values of one categorical column, treatmentsum, so that all cells treated with the same drug at the same dose end up together, i.e. all "uns_Drug1_3" rows go to one output .txt file.
Following similar posts, I used split():
    sptdf <- split(df, df$treatmentsum)
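To make the split step reproducible without a 3-4 GB file, here is a minimal sketch on a hypothetical toy data frame built from a few rows and columns of the example above. split() returns a named list with one data frame per distinct treatmentsum value, and the list names are those values.

```r
# Hypothetical toy data mirroring some rows/columns of the example above
df <- data.frame(
  stimulation      = c("uns", "uns", "uns", "uns", "3S", "4S"),
  Compound         = c("Drug1", "Drug1", "Drug2", "Drug2", "Drug1", "Drug1"),
  Concentration    = c(3, 3, 5, 5, 3, 3),
  treatmentsum     = c("uns_Drug1_3", "uns_Drug1_3", "uns_Drug2_5",
                       "uns_Drug2_5", "3s_Drug3_3", "4s_Drug3_3"),
  Pid_treatmentsum = c("Jack_uns_Drug1_3", "Jack_uns_Drug1_3", "Jack_uns_Drug2_5",
                       "Jack_uns_Drug2_5", "Jack_3s_Drug1_3", "Jack_4s_Drug1_3"),
  var1             = c(15, 55, 100, 75, 65, 35),
  stringsAsFactors = FALSE
)

# One data frame per distinct treatmentsum, named by that value
sptdf <- split(df, df$treatmentsum)
names(sptdf)
sapply(sptdf, nrow)  # rows per group
```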
It worked: sptdf is now a list of data frames. Next I want to write each one out as a table. Ideally I would use the "Pid_treatmentsum" value as each file's name, since all rows in a split group should share exactly the same "Pid_treatmentsum". I don't know how to do that, so for now I type the patient ID in manually and join it to the group name with paste():
    lapply(names(sptdf), function(x){write.table(sptdf[[x]], file = paste("Jack", x, sep = "_"))})
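A minimal sketch of naming each file after its group's own Pid_treatmentsum value instead of typing the patient ID. The helper name write_groups and the folder split_output are hypothetical. By default write.table() writes space-separated text with quoted strings and a row-name column, and paste() adds no extension; the sketch instead writes tab-separated .txt files without row names or quotes, which spreadsheet programs usually open as plain delimited text. quote = FALSE assumes no field contains a tab.

```r
# Hypothetical helper: write each split data frame to
# <out_dir>/<Pid_treatmentsum>.txt as tab-separated text
write_groups <- function(sptdf, out_dir = "split_output") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- vapply(sptdf, function(d) {
    pid <- unique(d$Pid_treatmentsum)
    # Every row of a group should share one Pid_treatmentsum; stop if not
    if (length(pid) != 1) {
      stop("expected 1 Pid_treatmentsum per group, found ", length(pid))
    }
    path <- file.path(out_dir, paste0(pid, ".txt"))
    write.table(d, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
    path
  }, character(1))
  invisible(paths)  # file paths, named by treatmentsum
}

# Usage on a small hypothetical data frame
toy <- data.frame(
  treatmentsum     = c("uns_Drug1_3", "uns_Drug1_3", "uns_Drug2_5"),
  Pid_treatmentsum = c("Jack_uns_Drug1_3", "Jack_uns_Drug1_3", "Jack_uns_Drug2_5"),
  var1             = c(15, 55, 100),
  stringsAsFactors = FALSE
)
paths <- write_groups(split(toy, toy$treatmentsum),
                      out_dir = file.path(tempdir(), "split_output"))
```

Because the file name comes from the data, the patient ID no longer has to be typed in per file.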
This works in the sense that it writes out the individual files with the correct names, but they have no .txt extension, and when I open them in Excel I get warnings that they are corrupted. Meanwhile, R gives me this error:
    Error in file(file, ifelse(append, "a", "w")) : 
      cannot open the connection
Where did I go wrong?
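One possible cause of "cannot open the connection", offered as an assumption rather than a diagnosis: file() fails when the path cannot be created, for example because a group name contains a character that is not allowed in file names (such as / or :), or because the target folder does not exist or is not writable. The hypothetical helpers below flag and replace such characters and give a rough check of the folder.

```r
# Hypothetical helper: replace characters that are illegal in file names on
# common file systems (Windows is the strictest) with "_", after trimming spaces
safe_name <- function(x) {
  gsub('[\\\\/:*?"<>|]', "_", trimws(x), perl = TRUE)
}

# Hypothetical helper: names that safe_name() would change, i.e. candidates
# for making file() fail with "cannot open the connection"
bad_names <- function(nms) nms[nms != safe_name(nms)]

# Hypothetical helper: rough check that the folder exists and is writable
can_write <- function(dir) dir.exists(dir) && file.access(dir, mode = 2) == 0

# Usage with made-up group names; in practice pass names(sptdf)
nms <- c("uns_Drug1_3", "uns_Drug2/5", "3S: Drug1 3")
bad_names(nms)
safe_name(nms)
can_write(getwd())
```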
Given the sheer size of each microscope output file (3-4 GB), is this the best way to do it?
And to push this further: could I put all of those hundreds of huge files in one folder and write a loop to automate the process, instead of splitting one file at a time? The only problem I foresee is that the microscope output files always have the same name, "output".
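Reading a whole 3-4 GB file with read.csv() holds all of it in memory, and split() then builds roughly a second copy, so memory may become the bottleneck. One alternative, sketched here under assumptions (the helper name is hypothetical; the 9 skipped lines and tab separator come from the read.csv() call above), is to stream the file through an open connection in chunks and append each chunk's groups to per-group .txt files, so only one chunk is in memory at a time. Columns are read as character so values are written back unchanged.

```r
# Hypothetical helper: split one big tab-separated file into per-group .txt
# files, reading `chunk_rows` rows at a time instead of the whole file
split_file_chunked <- function(infile, out_dir, group_col = "Pid_treatmentsum",
                               skip = 9, chunk_rows = 100000) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  con <- file(infile, open = "r")
  on.exit(close(con))
  if (skip > 0) readLines(con, n = skip)  # discard the preamble lines
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  written <- character(0)
  repeat {
    peek <- readLines(con, n = 1)
    if (length(peek) == 0) break  # end of file reached
    if (!nzchar(peek)) next       # skip blank lines
    pushBack(peek, con)           # put the line back for read.table()
    # read.table() continues from the connection's current position
    chunk <- read.table(con, sep = "\t", header = FALSE, nrows = chunk_rows,
                        col.names = header, colClasses = "character",
                        fill = TRUE, quote = "", comment.char = "",
                        check.names = FALSE)
    for (d in split(chunk, chunk[[group_col]])) {
      path <- file.path(out_dir, paste0(d[[group_col]][1], ".txt"))
      first <- !(path %in% written)
      # Header only on the first write; rows from later chunks are appended
      write.table(d, path, sep = "\t", quote = FALSE, row.names = FALSE,
                  col.names = first, append = !first)
      written <- union(written, path)
    }
  }
  invisible(written)
}

# Usage on a tiny hypothetical file: 9 preamble lines, a header, 3 rows
infile <- tempfile(fileext = ".txt")
writeLines(c(paste("meta", 1:9),
             "Row\tPid_treatmentsum\tvar1",
             "1\tJack_uns_Drug1_3\t15.0",
             "1\tJack_uns_Drug2_5\t100",
             "1\tJack_uns_Drug1_3\t55.0"), infile)
out_dir <- file.path(tempdir(), "chunked_output")
written <- split_file_chunked(infile, out_dir, chunk_rows = 2)
```

chunk_rows trades memory for speed; larger chunks mean fewer appends per group file.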
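Because every file is called "output", a loop needs some other way to tell runs apart. The sketch below assumes, hypothetically, that each run's output.txt sits in its own sub-folder (for example data_root/plate01/output.txt) and uses that sub-folder's name as the output folder for the run; the function and folder names are made up. It splits on Pid_treatmentsum directly, which should give the same groups as splitting on treatmentsum if each group carries a single Pid_treatmentsum, as described above.

```r
# Hypothetical layout: each run's "output.txt" lives in its own sub-folder,
# e.g. data_root/plate01/output.txt, data_root/plate02/output.txt
split_all_runs <- function(data_root, out_root, skip = 9) {
  inputs <- list.files(data_root, pattern = "^output\\.txt$",
                       recursive = TRUE, full.names = TRUE)
  written <- character(0)
  for (f in inputs) {
    # All inputs share one base name, so the sub-folder name keeps runs apart
    run_dir <- file.path(out_root, basename(dirname(f)))
    dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)
    # Read one file, split it, write the pieces, then move on to the next
    df <- read.delim(f, skip = skip, header = TRUE, fill = TRUE,
                     stringsAsFactors = FALSE)
    for (d in split(df, df$Pid_treatmentsum)) {
      path <- file.path(run_dir, paste0(d$Pid_treatmentsum[1], ".txt"))
      write.table(d, path, sep = "\t", quote = FALSE, row.names = FALSE)
      written <- c(written, path)
    }
  }
  invisible(written)
}

# Usage on two tiny hypothetical runs
root <- file.path(tempdir(), "runs_demo")
for (run in c("plate01", "plate02")) {
  dir.create(file.path(root, run), recursive = TRUE, showWarnings = FALSE)
  writeLines(c(paste("meta", 1:9),
               "treatmentsum\tPid_treatmentsum\tvar1",
               "uns_Drug1_3\tJack_uns_Drug1_3\t15.0",
               "uns_Drug2_5\tJack_uns_Drug2_5\t100"),
             file.path(root, run, "output.txt"))
}
res <- split_all_runs(root, file.path(tempdir(), "split_demo"))
```

If the files were renamed instead of kept in separate folders, changing the pattern passed to list.files() and deriving the output folder from basename(f) would serve the same purpose.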
Thank you in advance, and sorry for the long post.
Cheers, ML