
I want to read a pom.xml ('Project Object Model' of Maven) and extract the version information. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>com.sybase.jconnect</groupId>
            <artifactId>jconnect</artifactId>
            <version>6.05-26023</version>
        </dependency>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
        <dependency>
            <groupId>com.sun.jdmk</groupId>
            <artifactId>jmxtools</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.easymock</groupId>
            <artifactId>easymock</artifactId>
            <version>2.4</version>
        </dependency>
    </dependencies>
</project>

How can I extract the version '1.0.74-SNAPSHOT' from above?

I would love to be able to do this with simple bash scripting (sed or awk). Otherwise, a simple Python script is preferred.

EDIT

  1. Constraint

    The Linux box is in a corporate environment, so I can only use tools that are already installed (it's not that I cannot request a utility such as xml2, but I would have to go through a lot of red tape). Some of the solutions are very good (I have learned a few new tricks already), but they may not be applicable due to the restricted environment.

  2. Updated XML listing

    I added the dependencies tag to the original listing. This shows that some hacky solutions may not work in this case.

  3. Distro

    The distro I am using is RHEL4.

Cyrus
Anthony Kong

14 Answers


xml2 can convert xml to/from line-oriented format:

xml2 < pom.xml  | grep /project/version= | sed 's/.*=//'
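xml2 works by flattening the document into one path=value line per text node, which is why a plain grep on /project/version= skips the dependency versions. A rough Python sketch of the same idea, standard library only, is below. The POM is a trimmed copy of the question's listing embedded as a string, and flatten()/local() are hypothetical helper names. The sketch drops namespace URIs and ignores attributes, so its output only approximates real xml2 output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def local(tag):
    # "{http://maven.apache.org/POM/4.0.0}version" -> "version"
    return tag.rsplit("}", 1)[-1]


def flatten(elem, prefix=""):
    """Yield xml2-like 'path=value' lines for every element that has text."""
    path = prefix + "/" + local(elem.tag)
    text = (elem.text or "").strip()
    if text:
        yield path + "=" + text
    for child in elem:
        yield from flatten(child, path)


lines = list(flatten(ET.fromstring(POM)))
# Rough equivalent of: grep /project/version= | sed 's/.*=//'
version = [line.split("=", 1)[1] for line in lines if line.startswith("/project/version=")][0]
print(version)  # 1.0.74-SNAPSHOT
```

The dependency version shows up as /project/dependencies/dependency/version=1.5.2, so the prefix filter never confuses it with the project's own version.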
Vi.

Another way: xmlgrep and XPath:

xmlgrep --text_only '/project/version' pom.xml

Disadvantage: slow

Vi.

Using Python

$ python -c 'from xml.etree.ElementTree import ElementTree; print(ElementTree(file="pom.xml").findtext("{http://maven.apache.org/POM/4.0.0}version"))'
1.0.74-SNAPSHOT

Using xmlstarlet

$ xml sel -N x="http://maven.apache.org/POM/4.0.0" -t -m 'x:project/x:version' -v . pom.xml
1.0.74-SNAPSHOT

Using xmllint

$ echo -e 'setns x=http://maven.apache.org/POM/4.0.0\ncat /x:project/x:version/text()' | xmllint --shell pom.xml | grep -v /
1.0.74-SNAPSHOT
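All three commands bind a prefix (x) to the POM namespace because every element in the file lives in http://maven.apache.org/POM/4.0.0, so an unprefixed project/version path matches nothing. Python's ElementTree behaves the same way with its limited XPath support. A minimal sketch, assuming a trimmed copy of the question's POM embedded as a string (the prefix name "x" is arbitrary):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""

# Prefix -> namespace URI map, like setns / -N in xmllint / xmlstarlet
NS = {"x": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM)  # root is the <project> element
# Path relative to <project>: only its direct <version> child matches
print(root.findtext("x:version", namespaces=NS))  # 1.0.74-SNAPSHOT
# Without the namespace, the unqualified name finds nothing
print(root.findtext("version"))  # None
```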
kev

Clojure way. Requires only a JVM and the Clojure jar file:

java -cp clojure.jar clojure.main -e "(use 'clojure.xml) (->> (java.io.File. \"pom.xml\") (clojure.xml/parse) (:content) (filter #(= (:tag %) :version)) (first) (:content) (first) (println))"

Scala way:

java -Xbootclasspath/a:scala-library.jar -cp scala-compiler.jar scala.tools.nsc.MainGenericRunner -e 'import scala.xml._; println((XML.load(new java.io.FileInputStream("pom.xml")) match { case <project>{children @ _*}</project> => for (i <- children if (i  match { case <version>{children @ _*}</version> => true; case _ => false;  }))  yield i })(0) match { case <version>{Text(x)}</version> => x })'

Groovy way:

java -classpath groovy-all.jar groovy.ui.GroovyMain -e 'println (new XmlParser().parse(new File("pom.xml")).value().findAll({ it.name().getLocalPart()=="version" }).first().value().first())'
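The Clojure, Scala and Groovy one-liners share one approach: parse the file, look only at the direct children of the root <project>, keep the first one whose local name is version, and print its text. Because the dependency versions sit deeper in the tree, they are never considered. A Python sketch of that approach, assuming a trimmed sample POM embedded as a string:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""

root = ET.fromstring(POM)  # the <project> element
# Only direct children of <project> are inspected; the namespace part of each
# tag ("{http://maven.apache.org/POM/4.0.0}") is stripped before comparing.
version = next(
    child.text
    for child in root
    if child.tag.rsplit("}", 1)[-1] == "version"
)
print(version)  # 1.0.74-SNAPSHOT
```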
Vi.

Here's an alternative in Perl:

$ perl -MXML::Simple -e'print XMLin("pom.xml")->{version}."\n"'
1.0.74-SNAPSHOT

It works with the revised/extended example in the question, which has multiple "version" elements at different depths.
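XML::Simple gets away with ->{version} because XMLin turns the root's children into hash keys, so the dependency versions end up nested under dependencies rather than at the top level. A very rough Python imitation of that mapping is sketched below; to_dict() is a hypothetical helper, it ignores attributes, and it does not reproduce any of XML::Simple's real options.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def local(tag):
    # Strip the "{namespace}" prefix from an ElementTree tag
    return tag.rsplit("}", 1)[-1]


def to_dict(elem):
    """Leaf element -> its text; otherwise a dict keyed by child local names."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        key, value = local(child.tag), to_dict(child)
        if key in result:
            # Repeated element names are collected into a list
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


pom = to_dict(ET.fromstring(POM))  # like XMLin, the root itself is not a key
print(pom["version"])  # 1.0.74-SNAPSHOT
```

The dependency version is only reachable as pom["dependencies"]["dependency"]["version"], which is why the top-level lookup is unambiguous.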


Hacky way:

perl -e '$_ = join "", <>; m!<project[^>]*>.*\n(?:    |\t)<version[^>]*>\s*([^<]+?)\s*</version>.*</project>!s and print "$1\n"' pom.xml

This relies on the required <version> element being indented by exactly four spaces or one tab.
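The regex works only because the project's own <version> is the one indented by exactly four spaces (or a tab), while dependency versions are indented further. A Python sketch of the same trick, assuming the trimmed sample POM below (hacky_version() is a hypothetical name), shows both the match and how re-indenting the file silently breaks it:

```python
import re

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""

# A <version> line indented by exactly four spaces or one tab, mirroring the
# (?:    |\t) part of the Perl one-liner
PATTERN = re.compile(r"^(?:    |\t)<version[^>]*>\s*([^<]+?)\s*</version>", re.MULTILINE)


def hacky_version(text):
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(hacky_version(POM))  # 1.0.74-SNAPSHOT
# Re-indenting with two spaces per level silently breaks the match
print(hacky_version(POM.replace("    ", "  ")))  # None
```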

Vi.

I worked out a very clumsy one-liner solution:

python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([n for n in dom.getElementsByTagName('version') if n.parentNode == dom.childNodes[0]][0].toxml())" | sed -e "s/.*>\(.*\)<.*/\1/g"

The sed at the end is very ugly, but I was not able to print out the text of the node with minidom alone.

Update from _Vi:

Less hacky Python version:

python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([i.childNodes.item(0).nodeValue for i in dom.firstChild.childNodes if i.nodeName == 'version'].pop())"

Update from me:

Another version:

python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([n.firstChild.data for n in dom.childNodes[0].childNodes if n.firstChild and n.tagName == 'version'])"
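The sed step in the first one-liner is only needed because toxml() serializes the whole element. In minidom the text is held in child Text nodes, so their data can be joined directly. A sketch using parseString on an embedded sample POM (text_of() is a hypothetical helper); it uses documentElement rather than childNodes[0], since the latter can be a comment or processing instruction if one precedes the root:

```python
from xml.dom.minidom import parseString

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""

dom = parseString(POM)
project = dom.documentElement  # the root <project> element


def text_of(node):
    # Join the data of the element's direct Text children
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


# Only direct element children of <project> named "version"
versions = [
    text_of(n)
    for n in project.childNodes
    if n.nodeType == n.ELEMENT_NODE and n.tagName == "version"
]
print(versions[0])  # 1.0.74-SNAPSHOT
```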
Anthony Kong

XSLT way:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>

        <xsl:template match="/">
                <xsl:for-each select="*[local-name()='project']">
                    <xsl:for-each select="*[local-name()='version']">
                        <xsl:value-of select="text()"/>
                    </xsl:for-each>
                </xsl:for-each>
        </xsl:template>
</xsl:stylesheet>
Save the stylesheet as x.xsl and run it with:

xalan -xsl x.xsl -in pom.xml
Vi.

If "there are a lot of version tags in the XML", then you had better forget about doing it with "simple tools" and regexps; that won't work.

Try this Python (no dependencies beyond the standard library):

from xml.dom.minidom import parse

dom = parse('pom.xml')
project = dom.getElementsByTagName('project')[0]
for node in project.childNodes:
    if node.nodeType == node.ELEMENT_NODE and node.tagName == 'version':
        print(node.firstChild.nodeValue)
Samus_

awk works fine without using any extra tools.
cat pod.xml

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.networks.app</groupId>
  <artifactId>operation-platform</artifactId>
  <version>1.0.0</version>
  <packaging>tar.xz</packaging>
  <description>POM was created by Sonatype Nexus</description>
</project>

A simple and legible way to get the value of the <packaging> tag:

awk -F'[<>]' '/packaging/{print $3}' pod.xml
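awk -F'[<>]' makes every < and > a field separator, so on a line like <packaging>tar.xz</packaging> the first field is whatever precedes the opening tag, $2 is the tag name and $3 is the value. A small Python imitation of that splitting on the pod.xml sample above (awk_field3() is a hypothetical name). The pattern is a plain, case-sensitive match on the whole line: here "version" skips <modelVersion>, but on the question's POM it would also print every dependency version.

```python
import re

# The pod.xml sample from this answer, embedded so the sketch runs standalone
POD = """<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.networks.app</groupId>
  <artifactId>operation-platform</artifactId>
  <version>1.0.0</version>
  <packaging>tar.xz</packaging>
  <description>POM was created by Sonatype Nexus</description>
</project>
"""


def awk_field3(text, pattern):
    """Imitate awk -F'[<>]' '/pattern/{print $3}' on a string."""
    results = []
    for line in text.splitlines():
        if re.search(pattern, line):
            fields = re.split(r"[<>]", line)
            # awk's $3 is the third field (index 2); awk prints "" if it is missing
            results.append(fields[2] if len(fields) > 2 else "")
    return results


print(awk_field3(POD, "packaging"))  # ['tar.xz']
print(awk_field3(POD, "version"))  # ['1.0.0']; modelVersion has a capital V
```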

Here is a one-liner using sed:

sed '/<dependencies>/,/<\/dependencies>/d;/<version>/!d;s/ *<\/\?version> *//g' pom.xml
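The sed script runs three steps in order: delete the range from <dependencies> to </dependencies>, delete every remaining line that lacks <version>, and strip the tags from what is left. A line-by-line Python imitation of those steps, assuming an embedded sample POM (sed_version() is a hypothetical name), makes the order explicit:

```python
import re

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def sed_version(text):
    # Imitates the sed one-liner above, one input line at a time
    out = []
    in_deps = False
    for line in text.splitlines():
        # Step 1: delete the <dependencies> ... </dependencies> range, both ends
        # included; like sed, the end pattern is only checked from the next line
        if in_deps:
            if "</dependencies>" in line:
                in_deps = False
            continue
        if "<dependencies>" in line:
            in_deps = True
            continue
        # Step 2: delete every line that does not contain <version>
        if "<version>" not in line:
            continue
        # Step 3: strip the opening/closing tags and the spaces around them
        out.append(re.sub(r" *</?version> *", "", line))
    return out


print(sed_version(POM))  # ['1.0.74-SNAPSHOT']
```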

I know your question says Linux, but if you need to do this on Windows without any third-party tools (so that you can put it in a batch file), PowerShell can extract any node from your pom.xml file like so:

powershell -Command "& {select-xml //pom:project/pom:properties/pom:mypluginversion -path pom.xml -Namespace  @{pom='http://maven.apache.org/POM/4.0.0'} | foreach {$_.Node.Innerxml}}" > myPluginVersion.txt
Here, try this:

Return_text_val=$(xmllint --xpath "//*[local-name()='$TagElmnt']" "$FILE")

where:

$TagElmnt - tag name
$FILE - XML file to parse
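//*[local-name()='...'] matches elements with that local name at any depth, whatever their namespace. For the question's POM that includes the dependency versions, so the result needs further filtering (or a path such as /*[local-name()='project']/*[local-name()='version']). A Python sketch of the same matching with ElementTree's iter(), assuming an embedded sample POM (local_name_matches() is a hypothetical name):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def local_name_matches(xml_text, tag_name):
    """Rough equivalent of //*[local-name()='tag_name']: the text of every
    element with that local name, at any depth, in document order."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter() if e.tag.rsplit("}", 1)[-1] == tag_name]


print(local_name_matches(POM, "version"))  # ['1.0.74-SNAPSHOT', '1.5.2']
```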
Kunal
sed -n "/<name>project-parent/{n;s/.*>\(.*\)<.*/\1/p;q}" pom.xml

The -n option suppresses the automatic printing of every line. The address /<name>project-parent/ matches the line just before the one holding the wanted text; the n command then reads the next line into the pattern space, where s extracts the value through a capturing group (\(...\)) and a backreference (\1). The p flag prints the result, and q quits.
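The same "match, then look at the next line" step can be imitated in Python, which makes the dependence on line layout explicit: it only works while <version> sits on the line directly after <name>project-parent</name>. A sketch, assuming an embedded sample POM (line_after() is a hypothetical name; unlike sed, the marker is a plain substring rather than a regex):

```python
import re

# Trimmed copy of the question's pom.xml, embedded so the sketch runs standalone
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def line_after(text, marker):
    # Find the first line containing marker (the sed address), step to the
    # next line (sed's n), capture the text between tags (the s command),
    # and stop after the first match either way (sed's q).
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            if i + 1 < len(lines):
                match = re.match(r".*>(.*)<.*", lines[i + 1])
                if match:
                    return match.group(1)
            return None
    return None


print(line_after(POM, "<name>project-parent"))  # 1.0.74-SNAPSHOT
```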

SΛLVΘ