new to R and trying to make the most of it. This is a task that needs to be done numerous times per week.
At work we get a SAS-code text file that contains the following data:
            1,2,3,201,202,203 = "Screening"
            101,102,301,302,404,405,1001= "Cycle 1 Day 1"
            1002 = "Cycle 1 Day 2"
            1003 = "Cycle 1 Day 3"
            103,104,303,304,407,408,409,410 = "Cycle 1 Day 8"
            105,106,305,306,412,413 = "Cycle 1 Day 15"
            107,108,307,308,414,415,416,417,1022= "Cycle 1 Day 22"
            1023 = "Cycle 1 Day 23"
            1024 = "Cycle 1 Day 24"
            109,110,309,310,418,419,420,421,2001 = "Cycle 2 Day 1"
            2002= "Cycle 2 Day 2"
            2003= "Cycle 2 Day 3"
            111,112,422,423 = "Cycle 2 Day 8"
            113,114,311,312,424,425 = "Cycle 2 Day 15"
            115,116,426,427= "Cycle 2 Day 22"
            117,118,119,313,314,315 = "Cycle 2 End of Cycle"
            120,121,316,317,430,431 = "Cycle 3 Day 1"
            122,123,432,433 = "Cycle 3 Day 8"
            124,125,318,319,434,435 = "Cycle 3 Day 15"
            126,127,436,437 = "Cycle 3 Day 22"
            128,129,320,321,438,439 = "Cycle 4 Day 1"
I also have an Excel file with the following content:
Visit No    Vis Label
1   Screening Day -14 to -1
2   Screening Day -14 to -1
3   Screening Day -14 to -1
101 Cycle 1 Day 1 to 3
102 Cycle 1 Day 1 to 3
103 Cycle 1 Day 8     
104 Cycle 1 Day 8     
105 Cycle 1 Day 15     
106 Cycle 1 Day 15     
107 Cycle 1 Day 22     
108 Cycle 1 Day 22     
109 Cycle 2 Day 1     
110 Cycle 2 Day 1     
111 Cycle 2 Day 8     
112 Cycle 2 Day 8     
113 Cycle 2 Day 15     
114 Cycle 2 Day 15     
115 Cycle 2 Day 22     
116 Cycle 2 Day 22     
117 Cycle 2 End of Cycle
118 Cycle 2 End of Cycle
119 Cycle 2 End of Cycle
Now I have to compare the SAS-code txt file to the Excel one, and jot down which ones are missing from the SAS file.
I tried reading the SAS txt file, which became filled with white space and was able to remove that and get the content by this:
d <- read.delim("testSAS.txt", sep = ":", strip.white = TRUE, skip = 2, header = FALSE)
which results in this:
                               V1             V2
1               1,2,3,201,202,203      Screening
2    101,102,301,302,404,405,1001  Cycle 1 Day 1
3                            1002  Cycle 1 Day 2
4                            1003  Cycle 1 Day 3
5 103,104,303,304,407,408,409,410  Cycle 1 Day 8
6         105,106,305,306,412,413 Cycle 1 Day 15
Now I want to separate the V1 values such that each one makes a new row with the same value attached to it, as shown below:
1   Screening
2   Screening
3   Screening
201 Screening
202 Screening
203 Screening
I have used the following code to make it more malleable but it seems it returns a deformed list:
df<- data.frame(matrix(unlist(d), nrow = 61, byrow = T),stringsAsFactors = FALSE)
And now want to compare it to the Excel file in order to finally generate either a txt or xlsx saying which V1 values were not present.
Is there a faster way to do this than by manipulating the SAS txt file? How should I proceed with this? is it faster doing it from the excel to the txt or vice versa?
Any advice is welcomed !
 
     
    