I need to process XSLT using Python. Currently I'm using lxml, which only supports XSLT 1.0, but now I need to process XSLT 2.0. Is there any way to use the Saxon XSLT processor with Python?
7 Answers
There are two possible approaches:
set up an HTTP service that accepts transformation requests and implements them by invoking Saxon from Java; you can then send the transformation requests from Python over HTTP
use the Saxon/C product, currently available on prerelease; details here: http://www.saxonica.com/saxon-c/index.xml
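As a rough sketch of the first approach, the Python side could look something like the following. The service URL, the `xsl` query parameter and the request/response format are all hypothetical; they depend entirely on how you write the Java service that wraps Saxon.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint of a self-written Java service that wraps Saxon.
# This is an assumption for illustration, not something Saxon ships.
SERVICE_URL = "http://localhost:8080/transform"


def build_transform_request(xml_bytes, stylesheet_name, service_url=SERVICE_URL):
    """Build a POST request carrying the source XML in the body.

    The stylesheet is selected by name through a made-up `xsl` query parameter.
    """
    query = urllib.parse.urlencode({"xsl": stylesheet_name})
    return urllib.request.Request(
        f"{service_url}?{query}",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def transform_via_http(xml_bytes, stylesheet_name, timeout=30):
    """Send the request and return the transformed document as bytes."""
    request = build_transform_request(xml_bytes, stylesheet_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The request is built separately from the call that sends it, so the Python side can be checked without the Java service running.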
 
@Maliqf, which approach did you end up taking, and how was your experience with it? – Vijay Kumar Dec 16 '15 at 17:41
I wrap Saxon/C in a thin Boost-Python wrapper. It's not difficult to do, provided you know a bit of C/C++; it's just a bit of boilerplate on top of the C++ examples given on Saxon's website. You can use the supplied PHP API as a guide on how to structure your Python API. I did it for exactly the reasons stated: no XSLT 3 support native to Python. It works well for me; specifically, it's fast, unlike forking a child Saxon process or making HTTP requests. – Phil Jul 12 '18 at 12:29
 
Saxon/C release 1.2.0 is now out with XSLT 3.0 support for Python 3; see details:
 
By now, this should be promoted to correct answer. Also cf. https://stackoverflow.com/questions/59059768/making-saxon-c-available-in-python for a step-by-step description. – Chiarcos Mar 14 '22 at 12:48
 
At the moment there is not, but you could use the subprocess module to use the Saxon processor:
import subprocess
subprocess.call(["saxon", "-o:output.xml", "-s:file.xml", "file.xslt"])
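A slightly more defensive variant of the same idea might look like this. It is only a sketch: it assumes, as above, that a `saxon` launcher is on your PATH, and the helper names `build_saxon_command` and `run_saxon` are made up for illustration. The `-s:`, `-xsl:` and `-o:` options are Saxon's usual command-line switches.

```python
import subprocess


def build_saxon_command(source, stylesheet, output, executable="saxon"):
    """Return the argument list for one Saxon command-line transformation."""
    return [executable, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]


def run_saxon(source, stylesheet, output, executable="saxon"):
    """Run Saxon and raise if it reports a failure instead of ignoring it."""
    command = build_saxon_command(source, stylesheet, output, executable)
    # capture_output collects Saxon's diagnostics so they can be reported.
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Saxon failed ({result.returncode}): {result.stderr}")
    return result
```

Unlike a bare `subprocess.call`, this surfaces stylesheet errors as a Python exception rather than as a return code that is easy to overlook.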
 
On January 13, 2023, Saxonica released its own maintained pip package for Saxon 12:
Now all we need is:
pip install saxonche
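Once it is installed, a minimal in-memory transformation could look roughly like this. Treat it as a sketch: the helper name `transform_text` and the sample stylesheet are made up for illustration, and the saxonche calls are the ones I believe the package exposes (`new_xslt30_processor`, `compile_stylesheet`, `parse_xml`, `transform_to_string`).

```python
def transform_text(xml_text, xslt_text):
    """Apply an XSLT stylesheet (given as a string) to an XML string."""
    # Imported lazily so this module still loads where saxonche is missing.
    from saxonche import PySaxonProcessor

    with PySaxonProcessor(license=False) as proc:
        xslt_processor = proc.new_xslt30_processor()
        executable = xslt_processor.compile_stylesheet(stylesheet_text=xslt_text)
        document = proc.parse_xml(xml_text=xml_text)
        return executable.transform_to_string(xdm_node=document)


# Illustrative stylesheet: upper-case() is an XPath 2.0 function,
# so this would not run under an XSLT 1.0 processor such as lxml's.
XSLT = """<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="upper-case(greeting)"/>
  </xsl:template>
</xsl:stylesheet>"""
```

Calling `transform_text("<greeting>hello</greeting>", XSLT)` should then return the upper-cased text of the `greeting` element.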
 
If you're using Windows:
Download the zip file Saxon-HE 9.9 for Java from http://saxon.sourceforge.net/#F9.9HE and unzip the file to C:\saxon
Use this Python code:
import os
import subprocess
def file_path(relative_path):
    folder = os.path.dirname(os.path.abspath(__file__))
    path_parts = relative_path.split("/")
    new_path = os.path.join(folder, *path_parts)
    return new_path
def transform(xml_file, xsl_file, output_file):
    """all args take relative paths from Python script"""
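    source = file_path(xml_file)  # renamed so the built-in input() is not shadowed
    output = file_path(output_file)
    xslt = file_path(xsl_file)
    # Pass the arguments as a list so paths containing spaces survive,
    # and use a raw string so the backslashes in the Windows path are literal.
    subprocess.call([
        "java", "-cp", r"C:\saxon\saxon9he.jar", "net.sf.saxon.Transform",
        "-t", f"-s:{source}", f"-xsl:{xslt}", f"-o:{output}",
    ])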
 
This is in addition to the above answers suggesting subprocess and saxonche.
The example code on saxonche's PyPI page is slightly flawed in that essential indentation is missing.
Also, I know it's just an example, but it would instantiate a new_xslt30_processor() for each and every XML file you need to transform. That wouldn't be very efficient.
My use case is that I periodically get a bunch of xml files (MARC21) that I need to transform with one and the same xslt-sheet (XSLT 2.0). So assume that the xslt-sheet 'o2a.xml' produces the desired output when I run
transform -s:my.xml -xsl:o2a.xml -o:my_output.xml
So I wrote this:
from saxonche import PySaxonProcessor
from pathlib import Path
class Xslt_proc():
    proc = PySaxonProcessor(license = False)
    nuproc = proc.new_xslt30_processor()
    xform = nuproc.compile_stylesheet(stylesheet_file='o2a.xsl')
    
def transform(processor, infile, sfx):
    outfname = f'{Path(infile).stem}_{sfx}.xml'
    doc = processor.proc.parse_xml(xml_file_name=infile)
    out = processor.xform.transform_to_string(xdm_node=doc)
    with open(outfname, 'w') as f:
        f.write(out)
def main():
    f_xml = 'some_xml_file.xml'
    P = Xslt_proc()
    transform(P, f_xml, '_done')
    
if __name__ == "__main__":
    main()
  
I was curious which method would be faster, subprocess or the code above.
So I ran 20 iterations on 5 input files, first using a subprocess call to transform.exe, and then another 20 iterations on the same 5 input files with my own module, like this:
from pathlib import Path
import saxonche_transform as st
flist = [f.name for f in Path('.').glob('*.xml')]
P = st.Xslt_proc()
for i in range(20):
    for f in flist:
        st.transform(P, f, '_python')
The latter was roughly 100 times faster: 2.6 seconds against 258 seconds for the subprocess test.
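If you want to repeat that kind of comparison yourself, a timing harness along these lines would do. It is a generic sketch, not the exact script used for the numbers above; `time_batch` and `transform_one` are made-up names, and `transform_one` stands for whichever method you are measuring.

```python
import time


def time_batch(transform_one, inputs, iterations=20):
    """Call transform_one on every input, `iterations` times over,
    and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        for item in inputs:
            transform_one(item)
    return time.perf_counter() - start
```

For instance, `time_batch(lambda f: st.transform(P, f, '_python'), flist)` would time the in-process variant, and passing a function that wraps the subprocess call would time the other.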
So thank you, Saxonica.