Now that MiFID II has been introduced, I would like to analyze the LEI data from GLEIF.
The data is in XML format, but boy! It is hard to parse.
I tried the code below; it almost completely freezes my machine and then fails with this error:
AttributeError: no such child: {http://www.gleif.org/data/schema/leidata/2016}pyval. 
The structure of the data is really simple, but the files are large. Nevertheless, I suspect the main culprit is the special characters in the tags, i.e. the colon in the "lei:" prefix. See this shortened example:
<lei:LEIData xmlns:gleif="http://www.gleif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">
    <lei:LEIRecords>
         <lei:LEIRecord>
              <lei:LEI>029200137F2K8AH5C573</lei:LEI>
         </lei:LEIRecord>
     </lei:LEIRecords>
</lei:LEIData>
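For reference, here is a minimal sketch of how a namespace-aware parser sees these tags. It assumes the standard library's xml.etree.ElementTree rather than lxml, and the names SNIPPET and NS are only illustrative. The "lei:" prefix is expanded to the namespace URI declared on the root element, which gives the {uri}name form that also appears in the error message above.

```python
import xml.etree.ElementTree as ET

# The shortened example from the question, as an in-memory string.
SNIPPET = """\
<lei:LEIData xmlns:gleif="http://www.gleif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">
    <lei:LEIRecords>
        <lei:LEIRecord>
            <lei:LEI>029200137F2K8AH5C573</lei:LEI>
        </lei:LEIRecord>
    </lei:LEIRecords>
</lei:LEIData>
"""

root = ET.fromstring(SNIPPET)

# Tags are reported as "{namespace-uri}LocalName", not "lei:LocalName".
print(root.tag)

# Illustrative prefix map: lets path expressions use the short "lei:" form.
NS = {"lei": "http://www.gleif.org/data/schema/leidata/2016"}
lei = root.find("lei:LEIRecords/lei:LEIRecord/lei:LEI", NS)
print(lei.text)
```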
Any help?
I posted a larger sample on pastebin, with the lei:LEIHeader section removed: https://pastebin.com/UbrM5mVp
See the Python code below (borrowed from Wes McKinney's book, Section 6.1):
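import pandas as pd
from lxml import objectify

path = '20180104-gleif-concatenated-file-lei2.xml'
data = []

# Parse the whole file into memory and get the root element
parsed = objectify.parse(open(path))
root = parsed.getroot()

# Print tag and attributes to inspect the structure
for child in root:
    print(child.tag, child.attrib)

# Build one dict per element, mapping child tag -> Python value
# (INDICATOR is the element name from the book's example)
for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        el_data[child.tag] = child.pyval
    data.append(el_data)

perf = pd.DataFrame(data)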
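For comparison, here is a minimal sketch of a streaming approach. It uses the standard library's xml.etree.ElementTree.iterparse instead of lxml.objectify, so the file is read record by record rather than loaded in one go. The helper names (local_name, iter_lei_records) and the flattening rule (keep every leaf element that has text, keyed by its local name) are assumptions made for illustration, not something taken from the book or from GLEIF. The sketch runs on the shortened example rather than the real file.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI that the "lei:" prefix stands for (copied from the sample).
LEI_NS = "http://www.gleif.org/data/schema/leidata/2016"
RECORD_TAG = f"{{{LEI_NS}}}LEIRecord"


def local_name(tag):
    """Strip the '{namespace}' part from a tag in Clark notation."""
    return tag.rsplit("}", 1)[-1]


def iter_lei_records(source):
    """Yield one flat dict per LEIRecord, reading the input incrementally.

    `source` may be a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != RECORD_TAG:
            continue
        row = {}
        for node in elem.iter():
            text = (node.text or "").strip()
            # Keep only leaf elements that actually carry text.
            if len(node) == 0 and text:
                row[local_name(node.tag)] = text
        yield row
        # Drop the finished record's children so they can be freed.
        elem.clear()


# Small in-memory stand-in for the real file (the shortened example).
SAMPLE = b"""\
<lei:LEIData xmlns:gleif="http://www.gleif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">
    <lei:LEIRecords>
        <lei:LEIRecord>
            <lei:LEI>029200137F2K8AH5C573</lei:LEI>
        </lei:LEIRecord>
    </lei:LEIRecords>
</lei:LEIData>
"""

rows = list(iter_lei_records(io.BytesIO(SAMPLE)))
print(rows)  # [{'LEI': '029200137F2K8AH5C573'}]
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as data above. Note that this flattening keeps only the last value when two leaf elements inside one record share a local name, so nested sections with repeated names would need a more careful key scheme.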
 
    