I have a txt file with 100,000+ lines of data. I want to turn it into a dataframe, but I do not need every line of data. An example of a data entry looks like this:
FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Yang, Qiang
   Liu, Yang
   Chen, Tianjian
   Tong, Yongxin
TI Federated Machine Learning: Concept and Applications
SO ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
VL 10
IS 2
AR 12
DI 10.1145/3298981
DT Article
PD FEB 2019
PY 2019
AB Today's artificial intelligence still faces two major challenges (...) etc. 
I only want the rows that begin with TI, AU, PD, or AB, and I want to extract them into correspondingly named columns. This is as far as I have gotten, and I am really struggling!
read.table("groupprojectdatabase.txt", header = FALSE, sep = ",", quote = "",
           row.names = c("TI", "AU", "PD", "AB"),
           col.names = c("title_col", "author_col", "date_col", "summary_col"),
           stringsAsFactors = FALSE)
Any help would be really appreciated, even pointers to which functions I should look up or whether I am on the right track. I was thinking the sep = argument is relevant, but I couldn't work out how to tell it to skip everything except the TI, AU, PD, and AB rows.
In particular, I am not sure how to get R to treat an entire line of text as a single value rather than splitting it into one field per word. When I run the code above I get:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 4 elements
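For context on the approach I have been considering: read.table() assumes a regular delimited layout, so a tagged format like this is probably easier to parse with readLines() and some string handling. Below is a rough sketch of that idea, written as a helper function (parse_wos is just a name I made up, not anything from a package). It assumes the layout shown in my sample: a two-letter tag in columns 1-2, the value starting at column 4, indented continuation lines belonging to the previous tag, and every record starting with a PT line.

```r
# Sketch: parse a Web of Science-style plain-text export into a data frame,
# keeping only the requested fields. Assumptions are noted inline.
parse_wos <- function(lines, fields = c("TI", "AU", "PD", "AB")) {
  tag   <- substr(lines, 1, 2)          # two-letter field tag in columns 1-2
  value <- trimws(substring(lines, 4))  # value starts at column 4

  # Indented continuation lines (extra authors, wrapped abstracts) have a
  # blank tag; carry the previous tag forward so they join their field
  for (i in seq_along(tag)) {
    if (i > 1 && trimws(tag[i]) == "") tag[i] <- tag[i - 1]
  }

  rec  <- cumsum(tag == "PT")           # assumes each record begins with PT
  keep <- rec > 0 & tag %in% fields
  d    <- data.frame(rec = rec[keep], tag = tag[keep], value = value[keep],
                     stringsAsFactors = FALSE)

  # Collapse multi-line fields into one string per record (you may prefer
  # "; " as the separator for the author field)
  agg  <- aggregate(value ~ rec + tag, data = d, FUN = paste, collapse = " ")

  # Pivot to one row per record, one column per tag
  wide <- reshape(agg, idvar = "rec", timevar = "tag", direction = "wide")
  names(wide) <- sub("^value\\.", "", names(wide))
  wide
}
```

Usage would then be something like `wos <- parse_wos(readLines("groupprojectdatabase.txt"))`, after which the TI/AU/PD/AB columns can be renamed to title_col, author_col, date_col, and summary_col. I have not tested this against a full export, so treat it as a starting point rather than a finished solution.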