I have data.frames with character columns containing numbers (such as '0123' and '1234'). When I write them to CSV and read them back, these columns end up as numeric. The write.csv and read.csv functions have quote arguments and, by default, should quote character strings on output and respect those quotes on input, so this behavior is unexpected.
How can I avoid this, without manually specifying colClasses when I read the file back in?
Reproducible example:
# dummy data
fake_data <-
  data.frame(num=1:25, char=letters[1:25], charnum=as.character(1:25),
             stringsAsFactors=FALSE)
# check out col classes - all good
sapply(fake_data, class)
#       num        char     charnum 
# "integer" "character" "character" 
# write it to a file and read it back
fpath <- '~/Desktop/fake_data.csv'
write.csv(fake_data, fpath, row.names=FALSE)
fake_data2 <- read.csv(fpath, stringsAsFactors=FALSE)
# but now look, different classes!
sapply(fake_data2, class)
#       num        char     charnum 
# "integer" "character"   "integer"
The problem seems to be on the read side, since the file is written with quotes:
> cat(readLines(fpath))
"num","char","charnum" 1,"a","1" 2,"b","2" 3,"c","3" 4,"d","4" 5,"e","5" 6,"f","6" 7,"g","7" 8,"h","8" 9,"i","9" 10,"j","10" 11,"k","11" 12,"l","12" 13,"m","13" 14,"n","14" 15,"o","15" 16,"p","16" 17,"q","17" 18,"r","18" 19,"s","19" 20,"t","20" 21,"u","21" 22,"v","22" 23,"w","23" 24,"x","24" 25,"y","25"
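A minimal sketch of what appears to happen, using a small inline CSV instead of the file on disk. The `csv_text` string and the `orig` data frame are hypothetical stand-ins for the written file and `fake_data`. It assumes that read.csv() drops the quote characters while splitting fields and only afterwards guesses each column's type with type.convert(), which is consistent with the output above. The last part derives colClasses from the original data frame rather than typing it by hand, assuming that data frame is still available when reading.

```r
# Assumption: read.csv() strips quotes during field splitting, then calls
# type.convert() on each column, so a quoted "1" looks the same as 1.
csv_text <- '"num","char","charnum"\n1,"a","1"\n2,"b","2"\n3,"c","03"'

# Default read: charnum becomes integer and the leading zero of "03" is lost
df_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)
sapply(df_default, class)

# The same type guess applied directly to the unquoted strings
class(type.convert(c("1", "2", "03"), as.is = TRUE))

# Hypothetical workaround: reuse the classes of the original data frame
# ('orig' stands in for fake_data) instead of writing colClasses manually
orig <- data.frame(num = 1:3, char = c("a", "b", "c"),
                   charnum = c("1", "2", "03"), stringsAsFactors = FALSE)
df_kept <- read.csv(text = csv_text, colClasses = sapply(orig, class))
sapply(df_kept, class)
```

This still uses colClasses under the hood; it only avoids listing the classes by hand, so it helps only when the original data frame (or a saved copy of its classes) is at hand.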
sessionInfo:
R version 3.1.1 (2014-07-10) | Platform: x86_64-apple-darwin13.1.0 (64-bit)