I have a data frame of character variables containing long paragraphs, which I need to split at positions determined by certain phrases. The problem is that in many cases these phrases are fused to the preceding word, with no space in between.
Here is what I am doing:
data  <- readLines(n=2)
= DAY 1 CHALLENGES = syndicated.= DAY 2 CHALLENGES = Red Sea.= DAY 3 CHALLENGES = framework.= DAY 4 CHALLENGES = Did ;-)= DAY 5 CHALLENGES = Paste ...= DAY 6 CHALLENGES = Name 
= DAY 1 CHALLENGES = very high.= DAY 2 CHALLENGES = Rank understand.= DAY 3 CHALLENGES = buy....= DAY 4 CHALLENGES = result.= DAY 5 CHALLENGES = coffee.= DAY 6 CHALLENGES = Bla.
df  <- as.data.frame(data)
delim  <- c("= DAY 1 CHALLENGES = ",
            "= DAY 2 CHALLENGES = ",
            "= DAY 3 CHALLENGES = ",
            "= DAY 4 CHALLENGES = ",
            "= DAY 5 CHALLENGES = ",
            "= DAY 6 CHALLENGES = ")
y  <- data.frame(do.call('rbind',
                         strsplit(as.character(df$data), delim, fixed = FALSE)))
y
                               X1
1                                
2 = DAY 1 CHALLENGES = very high.
                                                                                    X2
1 syndicated.= DAY 2 CHALLENGES = Red Sea.= DAY 3 CHALLENGES = framework.= DAY 4 CHALLENGES = Did ;-)= DAY 5 CHALLENGES = Paste ...= DAY 6 CHALLENGES = Name 
2                               Rank understand.= DAY 3 CHALLENGES = buy....= DAY 4 CHALLENGES = result.= DAY 5 CHALLENGES = coffee.= DAY 6 CHALLENGES = Bla.
I would like each = DAY x CHALLENGES = segment, together with the text up to the next such segment, to end up in a separate variable.
Thanks!
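A likely cause of the output above: when `split` has more than one element, `strsplit` recycles it along the input, so the first string is split only on the DAY 1 pattern and the second only on the DAY 2 pattern. A minimal sketch, assuming every header has the form `= DAY <number> CHALLENGES = ` and every string contains the same number of headers, is to use a single regular expression instead. The shortened input strings and the name `pat` are illustrative only.

```r
# Shortened versions of the two input lines (illustrative).
x <- c("= DAY 1 CHALLENGES = syndicated.= DAY 2 CHALLENGES = Red Sea.= DAY 3 CHALLENGES = framework.",
       "= DAY 1 CHALLENGES = very high.= DAY 2 CHALLENGES = Rank understand.= DAY 3 CHALLENGES = buy....")

# One pattern that matches any day header, instead of a vector of delimiters.
pat <- "= DAY [0-9]+ CHALLENGES = "

# Split every string on every header.
parts <- strsplit(x, pat)

# Each string starts with a header, so the first piece is an empty string; drop it.
parts <- lapply(parts, function(p) p[-1])

# Bind the pieces row-wise: one row per input string, one column per day.
# This assumes all strings produced the same number of pieces.
y <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
y
```

This does not fuse headers with preceding words, because the pattern does not require a space before the leading `=`.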
Update with proposed methods:
> a  <- scan(file ="~/Desktop/alm/a.txt", what="")
Read 1 item
> a
[1] "= DAY 1 CHALLENGES = very high.= DAY 2 CHALLENGES = Rank understand.= DAY 3 CHALLENGES = buy....= DAY 4 CHALLENGES = result. = DAY 5 CHALLENGES = Paste the link(s) that you think is Paid Media.http://lebron11.nikeinc.com/ DAY 5 CHALLENGE: Paste the link(s) that you think is Owned Media.http://www.nike.com/ ; https://www.pinterest.com/nikewomen DAY 5 CHALLENGE: Paste the link(s) that you think is BONUS QUESTION DAY 5 = DAY 6 CHALLENGES = Bla."
> b  <- scan(file ="~/Desktop/alm/b.txt", what="")
Read 1 item
> b
[1] "= DAY 1 CHALLENGES = very high.= DAY 2 CHALLENGES = Rank understand.= DAY 3 CHALLENGES = buy....= DAY 4 CHALLENGES = result. Paste the link(s) that you think is Paid Media.http://lebron11.nikeinc.com/ DAY 5 CHALLENGE: Paste the link(s) that you think is Owned Media.http://www.nike.com/ ; https://www.pinterest.com/nikewomen DAY 5 CHALLENGE: Paste the link(s) that you think is BONUS QUESTION DAY 5 ?= DAY 6 CHALLENGES = Bla."
> c <- c(a,b)
> df  <- as.data.frame(c)
> lst <- strsplit(gsub(" (?=\\= DAY)", ".", c, perl=TRUE), 
+                 '(?<=[.)])(?=\\=)', perl=TRUE)
> out <-  do.call(cbind, lapply(lst, function(x) sub('^=.*= ', '', x)))
Warning message:
In (function (..., deparse.level = 1)  :
  number of rows of result is not a multiple of vector length (arg 2)
> out
     [,1]                                                                                                                                                                                                                                                                                
[1,] "very high."                                                                                                                                                                                                                                                                        
[2,] "Rank understand."                                                                                                                                                                                                                                                                  
[3,] "buy...."                                                                                                                                                                                                                                                                           
[4,] "result.."                                                                                                                                                                                                                                                                          
[5,] "Paste the link(s) that you think is Paid Media.http://lebron11.nikeinc.com/ DAY 5 CHALLENGE: Paste the link(s) that you think is Owned Media.http://www.nike.com/ ; https://www.pinterest.com/nikewomen DAY 5 CHALLENGE: Paste the link(s) that you think is BONUS QUESTION DAY 5."
[6,] "Bla."                                                                                                                                                                                                                                                                              
     [,2]              
[1,] "very high."      
[2,] "Rank understand."
[3,] "buy...."         
[4,] "Bla." #this is not the value from the input file           
[5,] "very high." #this is missing in the input file, yet a value is getting output      
[6,] "Rank understand." #incorrect recognition of ?= DAY 6 CHALLENGES =; the same happens with := and != or similar
The problems are marked in the comments above. Where a segment is missing from the input, I would prefer a missing-value indicator (such as NA) rather than a seemingly random value being inserted.
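The warning and the repeated values in column 2 are consistent with `cbind` recycling a shorter vector: the second string yields only four pieces, so the first two are reused to fill six rows. One way to get NA instead, sketched here under the assumption that headers always look like `= DAY <number> CHALLENGES =` and that the day number identifies the target column, is to locate each header, read its number, and write the following text into that column of a matrix pre-filled with NA. The function name `split_days`, the argument `n_days`, and the shortened input are hypothetical.

```r
# Hypothetical helper: one row per input string, one column per day number.
split_days <- function(x, n_days = 6) {
  pat <- "= DAY [0-9]+ CHALLENGES ="
  # Start with NA everywhere so missing days stay visibly missing.
  res <- matrix(NA_character_, nrow = length(x), ncol = n_days,
                dimnames = list(NULL, paste0("day", seq_len(n_days))))
  for (i in seq_along(x)) {
    mm <- gregexpr(pat, x[i])
    m <- mm[[1]]
    if (m[1] == -1) next                      # no header found in this string
    headers <- regmatches(x[i], mm)[[1]]
    days <- as.integer(gsub("[^0-9]", "", headers))
    # Text for a header runs from just after it to just before the next header.
    starts <- as.integer(m) + attr(m, "match.length")
    ends <- c(as.integer(m)[-1] - 1L, nchar(x[i]))
    body <- trimws(substring(x[i], starts, ends))
    keep <- days >= 1 & days <= n_days        # ignore out-of-range day numbers
    res[i, days[keep]] <- body[keep]
  }
  as.data.frame(res, stringsAsFactors = FALSE)
}

# Shortened example: the second string has no DAY 2 header and a "?=" before DAY 3.
x <- c("= DAY 1 CHALLENGES = very high.= DAY 2 CHALLENGES = Rank understand.= DAY 3 CHALLENGES = buy....",
       "= DAY 1 CHALLENGES = very high. extra text ?= DAY 3 CHALLENGES = Bla.")
out <- split_days(x, n_days = 3)
out
```

Because the pattern places no requirement on the character before the leading `=`, a header preceded by `?`, `:` or `!` should still be found, and the stray punctuation stays with the preceding text.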
 
    