I've been trying to parse some huge XML files that lxml won't grok, so I'm forced to parse them with xml.sax.
from xml import sax

class SpamExtractor(sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "spam":
            print("We found a spam!")
            # now what?
The problem is that I don't understand how to actually return, or better, yield, the things that this handler finds to the caller, without waiting for the entire file to be parsed. So far, I've been messing around with threading.Thread and Queue.Queue, but that leads to all kinds of issues with threads that are really distracting me from the actual problem I'm trying to solve.
I know I could run the SAX parser in a separate process, but I feel there must be a simpler way to get the data out. Is there?
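For what it's worth, one thread-free possibility is the incremental interface of xml.sax: the parser returned by `xml.sax.make_parser()` (expat-based by default) has `feed()` and `close()` methods, so a generator can push the file through in chunks and yield whatever the handler has collected after each chunk. Below is a minimal sketch of that idea, not a definitive implementation. The `found` buffer, the `iter_spam` function, and the chunk size are hypothetical names and choices made for illustration, and the handler is assumed to care only about the attributes of each `spam` element.

```python
import io
import xml.sax
from collections import deque


class SpamExtractor(xml.sax.ContentHandler):
    """Buffers the attributes of every <spam> element as it is encountered."""

    def __init__(self):
        super().__init__()
        self.found = deque()  # hypothetical buffer, drained by iter_spam()

    def startElement(self, name, attrs):
        if name == "spam":
            # Copy the attributes; the attrs object belongs to the parser.
            self.found.append(dict(attrs.items()))


def iter_spam(stream, chunk_size=64 * 1024):
    """Yield <spam> attribute dicts while the input is still being parsed."""
    handler = SpamExtractor()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # parse only this chunk
        while handler.found:  # hand over everything found so far
            yield handler.found.popleft()
    parser.close()  # signal end of input to the parser
    yield from handler.found  # anything reported during close()


if __name__ == "__main__":
    data = b"<root><spam id='1'/><egg/><spam id='2'/></root>"
    for item in iter_spam(io.BytesIO(data), chunk_size=8):
        print(item)
```

The caller sees results at most one chunk late, so a smaller `chunk_size` trades throughput for latency. Note that if the generator is abandoned before the input is exhausted, `parser.close()` is never reached in this sketch.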
As to deleting nodes, I don't see where that is needed, could you explain? (Explained seconds later by larsmans.) – Gareth Latty Jan 15 '12 at 22:31