I have ~100 XML files of publication data, each over 10 GB, formatted like this:
<?xml version="1.0" encoding="UTF-8"?> 
<records xmlns="http://website">
<REC rid="this is a test">
    <UID>ABCD123</UID>
    <data_1>
        <fullrecord_metadata>
            <references count="3">
                <reference>
                    <uid>ABCD2345</uid>
                </reference>
                <reference>
                    <uid>ABCD3456</uid>
                </reference>
                <reference>
                    <uid>ABCD4567</uid>
                </reference>
            </references>
        </fullrecord_metadata>
    </data_1>
</REC>
<REC rid="this is a test">
    <UID>XYZ0987</UID>
    <data_1>
        <fullrecord_metadata>
            <references count="N">
            </references>
        </fullrecord_metadata>
    </data_1>
</REC>
</records>
There is variation in the number of references for each unique entry (indexed by UID), and some entries have no references at all.
The goal is to create one simple data.frame per XML file, as follows:
UID        reference
ABCD123    ABCD2345
ABCD123    ABCD3456
ABCD123    ABCD4567
XYZ0987    NULL
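In R terms, the target for the sample above would be something like the following (I am assuming NA is the right stand-in for the NULL shown in the table):

# desired result for the sample file, built by hand;
# NA marks a record with zero references (the NULL row above)
data.frame(
  UID       = c("ABCD123", "ABCD123", "ABCD123", "XYZ0987"),
  reference = c("ABCD2345", "ABCD3456", "ABCD4567", NA),
  stringsAsFactors = FALSE
)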
Due to the size of the files and the need to loop efficiently over many of them, I have been exploring xmlEventParse to limit memory usage. I can successfully extract the unique UID for each REC and build a data.frame with the following code, adapted from prior questions:
library(XML)

branchFunction <- function() {
  store <- new.env()
  func <- function(x, ...) {
    # x is the branch node; pull the UID out of it
    ns <- getNodeSet(x, path = "//UID")
    key <- xmlValue(ns[[1]])
    value <- xmlValue(ns[[1]])  # key and value are both just the UID here
    print(value)                # debug output; worth dropping on 10 GB files
    store[[key]] <- value
  }
  getStore <- function() { as.list(store) }
  list(UID = func, getStore = getStore)
}

myfunctions <- branchFunction()

xmlEventParse(
  file = "test.xml",
  handlers = NULL,
  branches = myfunctions
)

DF <- do.call(rbind.data.frame, myfunctions$getStore())
But I cannot successfully store the reference data or handle the variable number of references per UID.
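To make the question concrete, here is a rough sketch of the direction I have in mind, adapted from the code above: branch on REC rather than UID, so the handler sees the whole record subtree and can collect every nested reference/uid. I have not been able to get something like this working on the real files; in particular I am unsure whether the XPath survives the default namespace (if it comes back empty, getNodeSet's namespaces argument may be needed), and the helper names are just mine.

library(XML)

branchFunction <- function() {
  store <- new.env()
  func <- function(x, ...) {
    # x is the complete <REC> subtree, so the nested references are in scope
    uid  <- xmlValue(getNodeSet(x, path = "//UID")[[1]])
    refs <- sapply(getNodeSet(x, path = "//reference/uid"), xmlValue)
    if (length(refs) == 0) refs <- NA_character_  # record with zero references
    store[[uid]] <- refs                          # one entry per UID, any length
  }
  getStore <- function() { as.list(store) }
  # branch on REC (not UID) so each call sees a full record
  list(REC = func, getStore = getStore)
}

myfunctions <- branchFunction()
xmlEventParse(file = "test.xml", handlers = NULL, branches = myfunctions)

# flatten the UID -> references list into the two-column data.frame
store_list <- myfunctions$getStore()
DF <- data.frame(
  UID       = rep(names(store_list), lengths(store_list)),
  reference = unlist(store_list, use.names = FALSE),
  stringsAsFactors = FALSE
)

For the ~100 files I would presumably wrap this in an lapply over the file names, but the per-record reference extraction is the part I cannot get right. Thanks for any suggestions!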