I have a folder of .xml files which look like this:
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
      <PMID Version="1">23458631</PMID>
      <DateCreated>
        <Year>2013</Year>
        <Month>04</Month>
        <Day>08</Day>
      </DateCreated>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Animals</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Calcium</DescriptorName>
          <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Calcium Chloride</DescriptorName>
          <QualifierName MajorTopicYN="N">administration &amp; dosage</QualifierName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="Publisher" Owner="NLM">
      <PMID Version="1">23458629</PMID>
      <DateCreated>
        <Year>2013</Year>
        <Month>03</Month>
        <Day>20</Day>
      </DateCreated>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Adolescent</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Adult</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Anthropometry</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
I would like to use Python to parse the XML files and extract the PMID, the DateCreated, and every DescriptorName with its MajorTopicYN attribute for each article, then save the result as a .txt file that looks like this:
ArticleID|CreatedDate|MeSH|IsMajor
23458631|20130408|Animals|N
23458631|20130408|Calcium|N
23458631|20130408|Calcium Chloride|N
23458629|20130320|Adolescent|N
23458629|20130320|Adult|N
23458629|20130320|Anthropometry|N
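
For reference, here is a minimal sketch of the kind of script I have in mind, using only the standard library (`xml.etree.ElementTree` and `glob`). The function names `extract_rows` and `convert_folder` are hypothetical, as are the folder and output paths in the usage note. It assumes every `DateCreated` element contains `Year`, `Month`, and `Day`, and it zero-pads the month and day so the date always comes out as eight digits.

```python
import glob
import os
import xml.etree.ElementTree as ET


def extract_rows(root):
    """Yield (pmid, created, descriptor, is_major) for each DescriptorName."""
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID", default="").strip()

        # Build YYYYMMDD; assumes Year, Month and Day are all present.
        created = ""
        date = citation.find("DateCreated")
        if date is not None:
            year = date.findtext("Year", default="").strip()
            month = date.findtext("Month", default="").strip().zfill(2)
            day = date.findtext("Day", default="").strip().zfill(2)
            created = year + month + day

        # One output row per DescriptorName; QualifierName is ignored.
        for descriptor in citation.iter("DescriptorName"):
            name = (descriptor.text or "").strip()
            yield pmid, created, name, descriptor.get("MajorTopicYN", "")


def convert_folder(xml_dir, out_path):
    """Write one pipe-delimited file covering every .xml file in xml_dir."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("ArticleID|CreatedDate|MeSH|IsMajor\n")
        for path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            root = ET.parse(path).getroot()
            for row in extract_rows(root):
                out.write("|".join(row) + "\n")
```

It would be called as something like `convert_folder("xml_folder", "mesh.txt")`. Note that `ET.parse` raises `ParseError` on a file that is not well-formed XML, for example one containing a bare `&` instead of `&amp;`.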