How to find duplicate id entries in an XML file

Question

How can I quickly find duplicate id entries in an XML file? For example, given:

<entry id="A">...
<entry id="B">...
<entry id="A">...

I'd like to output them like this:

id="A" dup 2 times

Just to let you know, I'm a total noob: I don't even know how to run any code. If you have code for this problem, can you at least tell me the name of the software I need to run it? I'll look it up from there.

Michael Kay · Answer 1 · 2023-07-10T07:30:56.687

Here's an XSLT 2.0 stylesheet that does it:

<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <out>
    <xsl:for-each-group select="//entry" group-by="@id">
      <xsl:if test="count(current-group()) > 1">
        <duplicate id="{current-grouping-key()}" count="{count(current-group())}"/>
      </xsl:if>
    </xsl:for-each-group>
  </out>
</xsl:template>
</xsl:transform>

You can run this by (for example) downloading Saxon-HE from SourceForge and running, from the command line:

java -jar saxon9he.jar -s:input.xml -xsl:count-dupes.xsl

where input.xml is your XML input and count-dupes.xsl is the stylesheet.

I have formatted the output as XML, but of course you can change the output format if you like.
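As an illustration of changing the output format, here is a minimal sketch of a plain-text variant that prints one line per duplicated id in the form shown in the question. It is not part of the original stylesheet: the `xsl:output method="text"` declaration and the exact line wording are assumptions based on the sample output the question asks for.

```xml
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Assumption: emit plain text rather than XML -->
<xsl:output method="text"/>
<xsl:template match="/">
  <!-- Group every entry element by the value of its id attribute -->
  <xsl:for-each-group select="//entry" group-by="@id">
    <!-- Report only ids that occur more than once -->
    <xsl:if test="count(current-group()) > 1">
      <!-- Writes a line such as: id="A" dup 2 times -->
      <xsl:value-of select="concat('id=&quot;', current-grouping-key(), '&quot; dup ', count(current-group()), ' times&#10;')"/>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>
</xsl:transform>
```

This should run the same way as the stylesheet above: save it as an .xsl file and pass that file name to the -xsl: option.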
