How can I quickly find duplicate id entries in an XML file? For example:

<entry id="A">...
<entry id="B">...
<entry id="A">...

and output them like this:

id="A" dup 2 times

Just so you know, I'm a total beginner and don't even know how to run code. If you have code for this problem, could you at least tell me the name of the software I need to run it? I'll look it up from there.

Glorfindel

1 Answer

Here's an XSLT 2.0 stylesheet that does it:

<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <out>
      <!-- Group every entry element by its id attribute -->
      <xsl:for-each-group select="//entry" group-by="@id">
        <!-- Report only ids that occur more than once -->
        <xsl:if test="count(current-group()) > 1">
          <duplicate id="{current-grouping-key()}" count="{count(current-group())}"/>
        </xsl:if>
      </xsl:for-each-group>
    </out>
  </xsl:template>

</xsl:transform>

You can run this by (for example) installing Java, downloading Saxon-HE from SourceForge, and running the following from the command line:

java -jar saxon9he.jar -s:input.xml -xsl:count-dupes.xsl

where input.xml is your XML input and count-dupes.xsl is the stylesheet.

I have formatted the output as XML but of course you can change the output format if you like.
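If installing Java and Saxon is more than you want to take on, the same grouping idea can be sketched with nothing but Python's standard library. This is an alternative to the stylesheet, not a translation of it by its author. The file name `count_dupes.py` and the function names `find_duplicate_ids` and `format_report` are made up for illustration, and the sketch assumes the `entry` elements are not in an XML namespace.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(root):
    """Return {id: count} for every id used by more than one <entry>."""
    # root.iter("entry") visits every entry element in the tree,
    # much like //entry does in the stylesheet.
    ids = (e.get("id") for e in root.iter("entry"))
    # Entries without an id attribute are skipped.
    counts = Counter(i for i in ids if i is not None)
    return {key: n for key, n in counts.items() if n > 1}


def format_report(duplicates):
    """Build one line per duplicated id, in the form asked for in the question."""
    return [f'id="{key}" dup {n} times' for key, n in duplicates.items()]


# Usage (hypothetical file name): python count_dupes.py input.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    tree_root = ET.parse(sys.argv[1]).getroot()
    for line in format_report(find_duplicate_ids(tree_root)):
        print(line)
```

For a well-formed file whose entries carry the ids A, B and A, this prints `id="A" dup 2 times`. You would need Python 3 installed to run it.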