Some .csv files with numerical data that I work with contain errors, and each error is marked by a random string. For example, after reading a file in, the data frame could look like this:
set.seed(123)
rand.str <-  paste0(letters[sample(10)], collapse="")
wrong.output <- data.frame(a=1:5, b=c(4:5, rand.str, 7:8), stringsAsFactors=FALSE)
In this case the proper output is:
proper.output <- data.frame(a=1:5, b=c(4:5, NA, 7:8))
After reading with read.csv, each column containing at least one non-numeric value is treated as a character column (or as a factor when stringsAsFactors=TRUE).
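To make this behaviour reproducible without a file on disk, here is a minimal sketch. The inline text `csv.text` is my own stand-in for the real file and simply mirrors `wrong.output`.

```r
# Hypothetical in-memory CSV mirroring wrong.output (not the real file)
csv.text <- "a,b\n1,4\n2,5\n3,chdgfajibe\n4,7\n5,8"

# read.csv accepts literal text through its 'text' argument
df <- read.csv(text = csv.text, stringsAsFactors = FALSE)

class(df$a)  # "integer": every value parsed as a number
class(df$b)  # "character": one bad string changes the whole column
```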
Can I mark the errors (random strings) as NAs while reading in the file? If not, what is the most convenient, proper or fastest method for substituting them with NAs?
There is the na.strings argument in read.csv, but it is a solution only in simpler cases where the error markers are known in advance, so that it can be used like: na.strings=c("-", "unavailable")
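A small sketch of that limitation, using made-up inline data (`simple.text` and `messy.text` are illustrative names, not my real files): known markers are converted to NA, but an unpredictable string still turns the column into character.

```r
# Hypothetical data: only known markers ("-", "unavailable") appear
simple.text <- "a,b\n1,4\n2,-\n3,unavailable\n4,7\n5,8"
simple <- read.csv(text = simple.text,
                   na.strings = c("-", "unavailable"),
                   stringsAsFactors = FALSE)
class(simple$b)   # numeric type, markers became NA

# Hypothetical data: a random string that cannot be listed in advance
messy.text <- "a,b\n1,4\n2,-\n3,unavailable\n4,7\n5,chdgfajibe"
messy <- read.csv(text = messy.text,
                  na.strings = c("-", "unavailable"),
                  stringsAsFactors = FALSE)
class(messy$b)    # still character because of the random string
```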
(I can't find any duplicate question, so I guess there is a simple workaround.)
The colClasses suggestion does not work:
read.csv("test.txt", sep=",", colClasses = c("numeric", "numeric"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  scan() expected 'a real', got 'chdgfajibe'
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'test.txt'
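For anyone who wants to reproduce the failure, a minimal sketch follows. It assumes a temporary file whose contents mirror `wrong.output`; the file is my own stand-in for test.txt, and `tryCatch` is used only to capture the error message instead of stopping the script.

```r
# Write a hypothetical test file mirroring wrong.output
tmp <- tempfile(fileext = ".txt")
writeLines(c("a,b", "1,4", "2,5", "3,chdgfajibe", "4,7", "5,8"), tmp)

# Forcing both columns to numeric makes scan() fail on the random string;
# capture the error message rather than aborting
res <- tryCatch(
  read.csv(tmp, sep = ",", colClasses = c("numeric", "numeric")),
  error = function(e) conditionMessage(e)
)
res

unlink(tmp)  # clean up the temporary file
```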
 
     
    