I have huge XML datasets (2-40 GB). Some of the data is confidential, so I need to mask it. I have a long list of values that need to be masked; for example, if the list contains the ID 'GYT-1064', I need to find and replace every instance of it. These values can appear in different fields, levels, and subclasses: one object might have 'Order-ID = GYT-1064' while another has 'PO-Name = GYT-1064'. I have looked into iterparse, but I cannot figure out how to edit the XML file in place instead of building the entire new tree in memory, since I have to loop through the file multiple times to find each instance of each ID.
Ideal functionality:
For each element, if a given string is in the element's text, replace it and update that line in the XML file.
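To make "in place" concrete: a file can only be overwritten byte for byte when the replacement has exactly the same length, so the usual workaround is to stream into a temporary file and then swap it over the original. A minimal sketch of that idea, assuming plain line-level text replacement is acceptable (it does not parse the XML, so it would also touch tag names and attributes); `rewrite_in_place` and `transform_line` are hypothetical names, not part of any library.

```python
import os
import tempfile

def rewrite_in_place(path, transform_line, encoding="utf8"):
    """Stream `path` line by line through `transform_line`, then swap the result in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(path, encoding=encoding, newline="") as src:
            for line in src:
                dst.write(transform_line(line))
        os.replace(tmp_path, path)  # swap the finished copy over the original
    except BaseException:
        os.unlink(tmp_path)  # do not leave a half-written copy behind
        raise
```

Only one line is held in memory at a time, but the disk needs room for a second copy of the file while the rewrite runs.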
I have a solution that works when the dataset is small enough to load into memory, but I can't figure out how to leverage iterparse correctly. I've also read the existing answers about lxml iterparse, but since I need to iterate through the entire file multiple times, I need to be able to edit it in place.
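One way the multiple passes might be avoided is to apply the whole list in a single pass over each piece of text. A minimal sketch, assuming the old-to-new mapping can be built up front; `build_masker` and the replacement values are made up for illustration, and the result can differ from sequential replacement if one masked value happens to contain another value from the list.

```python
import re

def build_masker(mapping):
    """Return a function that replaces every key of `mapping` in one pass."""
    if not mapping:
        return lambda text: text  # nothing to mask
    # Longest keys first so 'GYT-1064' wins over a shorter key such as 'GYT-10'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

# Made-up replacement values, standing in for whatever the masking utility returns.
mapping = {"GYT-1064": "MASK-0001", "GMX-103": "MASK-0002"}
mask = build_masker(mapping)
print(mask("Order-ID = GYT-1064, PO-Name = GMX-103"))
```

With this, each element's text is scanned once regardless of how many values are in the list, so the file itself only needs to be read once.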
Simple version that works, but has to load the whole XML into memory (and isn't in-place):
import xml.etree.ElementTree as ET  # assuming ET is the standard-library ElementTree

values_to_mask = ['val1', 'GMX-103', 'etc-555']  # imported list of values to mask

with open(dataset_name, encoding='utf8') as f:
    tree = ET.parse(f)
root = tree.getroot()
for old in values_to_mask:
    new = mu.generateNew(old, randomnumber)  # utility that generates the new value
    for elem in root.iter():
        if elem.text:  # elem.text is None for elements without text
            elem.text = elem.text.replace(old, new)
tree.write(output_name, encoding='utf8')
What I attempted with iterparse:
from lxml import etree

with open(output_name, mode='rb+') as f:
    context = etree.iterparse(f)
    for old in values_to_mask:
        new = mu.generateNew(old, randomnumber)
        mu.fast_iter(context, mu.replace_if_exists, old, new, f)

# in the mu utility module:
def replace_if_exists(elem, old, new, xf):
    try:
        if old in elem.text:
            elem.text = elem.text.replace(old, new)
            xf.write(elem)
    except (AttributeError, TypeError):  # elem.text is None for empty elements
        pass
It runs but doesn't replace any text, and print(context.root) shows 'Null'. It also doesn't look like it would write back to the file in place correctly.
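For comparison, a streaming rewrite into a second file (rather than a true in-place edit) can be sketched with the standard library's SAX writer instead of iterparse. This is only a sketch under assumptions: `MaskingWriter` and `mask_xml` are hypothetical names, `mask` is any text-to-text function, only element text is masked (not attributes), and comments or a DOCTYPE in the input are not carried over by this handler.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class MaskingWriter(XMLGenerator):
    """Re-emit the XML stream, applying `mask` to each run of element text."""

    def __init__(self, out, mask):
        super().__init__(out, encoding="utf-8")
        self._mask = mask
        self._text_parts = []  # the parser may deliver one text node in pieces

    def _emit_text(self):
        # Join the buffered pieces so a value split across chunks is still found.
        if self._text_parts:
            super().characters(self._mask("".join(self._text_parts)))
            self._text_parts = []

    def characters(self, content):
        self._text_parts.append(content)

    def startElement(self, name, attrs):
        self._emit_text()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._emit_text()
        super().endElement(name)

def mask_xml(src, dst, mask):
    """Parse `src` (file name or binary file object) and write masked XML to `dst`."""
    xml.sax.parse(src, MaskingWriter(dst, mask))

# Tiny in-memory demonstration with a made-up replacement value.
src = io.BytesIO(b"<Package><ID>GYT-1064</ID><Quantity>900</Quantity></Package>")
dst = io.StringIO()
mask_xml(src, dst, lambda text: text.replace("GYT-1064", "MASK-0001"))
print(dst.getvalue())
```

For real files, `src` would be the input path and `dst` a file opened for writing with `encoding="utf-8"`; memory use stays bounded by the largest single text node rather than the whole tree.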
The XML data generally looks like this (hierarchical objects with subclasses):
<Master_Data_Object>
  <Package>
    <PackageNr>1000</PackageNr>
    <Quantity>900</Quantity>
    <ID>FAKE_CONFIDENTIALGYO421</ID>
    <Item_subclass>
      <ItemType>C</ItemType>
      <MasterPackageID>FAKE_CONFIDENTIALGYO421</MasterPackageID>
    </Item_subclass>
  </Package>
  <Other_Types>...</Other_Types>
</Master_Data_Object>