Use saxon with python

Question

I need to process XSLT using Python. I'm currently using lxml, which only supports XSLT 1, but now I need to process XSLT 2. Is there any way to use the Saxon XSLT processor with Python?

Michael Kay · Accepted Answer · 2018-07-17T08:56:41.793

17

There are two possible approaches:

set up an HTTP service that accepts transformation requests and implements them by invoking Saxon from Java; you can then send the transformation requests from Python over HTTP
use the Saxon/C product~~, currently available on prerelease~~: details here: http://www.saxonica.com/saxon-c/index.xml
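For the first approach, the Python side could look roughly like the sketch below. The service URL, the `xsl` query parameter, and the assumption that the service returns the transformed document in the response body are all made up for illustration; they depend entirely on how the Java service is written.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint of a Java service that wraps Saxon (assumption).
SERVICE_URL = "http://localhost:8080/transform"


def build_transform_request(xml_bytes, stylesheet_name, service_url=SERVICE_URL):
    """Build a POST request carrying the source XML in the body.

    The `xsl` query parameter is an invented convention for telling the
    service which stylesheet to apply.
    """
    query = urllib.parse.urlencode({"xsl": stylesheet_name})
    return urllib.request.Request(
        f"{service_url}?{query}",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def transform_over_http(xml_bytes, stylesheet_name, timeout=30):
    """Send the request and return the response body as bytes."""
    request = build_transform_request(xml_bytes, stylesheet_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a service like that running, `transform_over_http(open("file.xml", "rb").read(), "file.xslt")` would return the transformed document.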

edited Jul 17 '18 at 08:56

answered Apr 04 '15 at 09:03

Michael Kay


@Maliqf, which approach did you end up taking? and how was your experience with it – Vijay Kumar Dec 16 '15 at 17:41
3

I wrap Saxon/C in a thin Boost.Python wrapper. It's not difficult to do provided you know a bit of C/C++; it's just a bit of boilerplate on top of the C++ examples given on Saxon's website. You can use the supplied PHP API as a guide on how to structure your Python API. I did it for exactly the reasons stated: there is no XSLT 3 support native to Python. It works well for me. Specifically, it's fast, unlike forking a child Saxon process or making HTTP requests. – Phil Jul 12 '18 at 12:29

score 10 · Answer 2 · answered Oct 18 '19 at 09:53


Saxon/C release 1.2.0 is now out, with XSLT 3.0 support for Python 3. See details:

http://www.saxonica.com/saxon-c/index.xml

ond1

4

By now, this should be promoted to correct answer. Also cf. https://stackoverflow.com/questions/59059768/making-saxon-c-available-in-python for a step-by-step description. – Chiarcos Mar 14 '22 at 12:48
SaxonC 11 has since been released. – ond1 Mar 15 '22 at 10:04

score 8 · Answer 3 · answered Jul 20 '16 at 10:03


A Python interface for Saxon/C is in development and worth a look:

https://github.com/ajelenak/pysaxon

ond1

Bruno · Answer 4 · 2016-10-21T03:39:05.213

5

At the moment there is not, but you could use the subprocess module to use the Saxon processor:

import subprocess

subprocess.call(["saxon", "-o:output.xml", "-s:file.xml", "file.xslt"])
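If you go this route, it is worth checking whether the transformation actually succeeded. The following is a minimal sketch using `subprocess.run`; it assumes, as above, that a `saxon` launcher is on your PATH and that it passes Saxon's exit status through. The helper names are my own.

```python
import subprocess


def build_saxon_command(source, stylesheet, output, executable="saxon"):
    """Return the argument list for the Saxon command line."""
    return [executable, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]


def run_saxon(source, stylesheet, output):
    """Run Saxon and raise if the process reports a failure."""
    command = build_saxon_command(source, stylesheet, output)
    # check=True raises CalledProcessError on a non-zero exit status;
    # capture_output keeps the process output so it can be inspected.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stderr
```

Calling `run_saxon("file.xml", "file.xslt", "output.xml")` then raises `subprocess.CalledProcessError` instead of failing silently.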

edited Oct 21 '16 at 03:39

answered Oct 21 '16 at 03:17

Bruno


score 3 · Answer 5 · answered Mar 07 '23 at 09:29


On January 13, 2023, Saxonica released its own maintained pip package for Saxon 12:

saxonche

Now all we need is:

pip install saxonche

Mihail-Cosmin Munteanu

score 1 · Answer 6 · answered Dec 16 '19 at 02:39

If you're using Windows:

Download the zip file Saxon-HE 9.9 for Java from http://saxon.sourceforge.net/#F9.9HE and unzip the file to C:\saxon

Use this Python code:

import os
import subprocess

def file_path(relative_path):
    folder = os.path.dirname(os.path.abspath(__file__))
    path_parts = relative_path.split("/")
    new_path = os.path.join(folder, *path_parts)
    return new_path

def transform(xml_file, xsl_file, output_file):
    """all args take relative paths from Python script"""
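    source = file_path(xml_file)  # renamed so it does not shadow the built-in input()
    output = file_path(output_file)
    xslt = file_path(xsl_file)

    # Pass the arguments as a list so paths containing spaces survive;
    # the raw string keeps the backslashes in the jar path literal.
    subprocess.call(["java", "-cp", r"C:\saxon\saxon9he.jar", "net.sf.saxon.Transform", "-t", f"-s:{source}", f"-xsl:{xslt}", f"-o:{output}"])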

score 0 · Answer 7 · answered May 08 '23 at 16:32

This is in addition to the above answers suggesting subprocess and saxonche.

The example code on saxonche's PyPI page is slightly flawed in that essential indentation is missing.

Also, I know it's just an example, but it would instantiate a new_xslt30_processor() for each and every xml file you need to transform. That wouldn't be very efficient.

My use case is that I periodically get a bunch of xml files (MARC21) that I need to transform with one and the same xslt-sheet (XSLT 2.0). So assume that the xslt-sheet 'o2a.xml' produces the desired output when I run

transform -s:my.xml -xsl:o2a.xml -o:my_output.xml

So I wrote this:

from saxonche import PySaxonProcessor
from pathlib import Path

class Xslt_proc():
    proc = PySaxonProcessor(license = False)
    nuproc = proc.new_xslt30_processor()
    xform = nuproc.compile_stylesheet(stylesheet_file='o2a.xsl')
    
def transform(processor, infile, sfx):
    outfname = f'{Path(infile).stem}_{sfx}.xml'
    doc = processor.proc.parse_xml(xml_file_name=infile)
    out = processor.xform.transform_to_string(xdm_node=doc)
    # assumes the stylesheet serializes as UTF-8 (the default)
    with open(outfname, 'w', encoding='utf-8') as f:
        f.write(out)

def main():
    f_xml = 'some_xml_file.xml'
    P = Xslt_proc()
    transform(P, f_xml, '_done')
    
if __name__ == "__main__":
    main()

I was curious which method would be faster, subprocess or the code above.

So I ran 20 iterations on 5 input files, first using a subprocess call to transform.exe, and then again, on the same 5 input files, with my own module, like this:

from pathlib import Path
import saxonche_transform as st

flist = [f.name for f in Path('.').glob('*.xml')]

P = st.Xslt_proc()

for i in range(20):
    for f in flist:
        st.transform(P, f, '_python')

The latter was 100 times faster, 2.6 seconds against 258 seconds for the subprocess test.
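The timing code itself isn't shown above. One way to take such a measurement is sketched below; `time_runs` is a hypothetical helper, not part of saxonche, and it measures wall-clock time only.

```python
import time


def time_runs(transform_one, files, iterations=20):
    """Return the elapsed wall-clock seconds for iterations x files calls.

    transform_one is any callable that takes a single file name.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        for name in files:
            transform_one(name)
    return time.perf_counter() - start
```

For the module above, that would be something like `time_runs(lambda f: st.transform(P, f, '_python'), flist)`, and the same function can wrap the subprocess variant for comparison.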

So thank you, Saxonica.
