I have the following xml generated from Gnote:
<?xml version="1.0"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text><last-change-date>2023-02-19T12:20:06.551763Z</last-change-date><last-metadata-change-date>2023-02-19T12:20:06.553010Z</last-metadata-change-date><create-date>2023-02-19T10:40:01.309068Z</create-date><cursor-position>90</cursor-position><selection-bound-position>-1</selection-bound-position><width>649</width><height>282</height></note>
I want all the text content in <note-content></note-content> without extra newlines. This includes text content in list/list-item elements. The content requested is the following including the format:
things
sheets
test
eval
asd
After much trial and error, parsing the XML with
xmllint --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text() | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml
(with or without --noblanks) yields output separated by extra blank lines (there is supposed to be a blank line after asd, but the code block isn't showing it):
things
sheets
test
eval
asd
Removing the newlines from the XML file and running the same xmllint command produces the desired output with no extra blank lines, so I don't know whether Gnote is producing something non-standard.
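One way to check where the blank lines come from is to compare the length of a raw text node with its whitespace-trimmed length. The sketch below assumes xmllint is on the PATH and uses a trimmed copy of the note written to a temporary file (the temp file and variable names are only for illustration). If the raw length is larger, the newline is part of the text node itself, which would suggest the XML is well-formed and the newline is simply preserved content rather than something non-standard.

```shell
#!/bin/sh
# Trimmed sample with the same line breaks as the note above
# (hypothetical temporary file, used only for this check).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<note xmlns="http://beatniksoftware.com/tomboy"><text xml:space="preserve"><note-content version="0.1">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content></text></note>
EOF

# Text of the first list-item ("sheets" followed by a line break in the file).
first="(//*[local-name()='list-item'])[1]/text()"

# Length of the raw text node, including any trailing newline.
raw_len=$(xmllint --xpath "string-length($first)" "$sample")
# Length after normalize-space() strips leading and trailing whitespace.
trimmed_len=$(xmllint --xpath "string-length(normalize-space($first))" "$sample")

echo "raw=$raw_len trimmed=$trimmed_len"
rm -f "$sample"
```

A difference of one between the two numbers corresponds to the newline that sits before the closing list-item tag.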
I tried looking at the comments and answers at https://stackoverflow.com/questions/11776910/xpath-expression-to-remove-whitespace/11777638, but I've been unsuccessful. Some observations:
- When I tried to execute the following (notice the |):
  xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()) | normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text())" a.xml
  I get:
  XPath error : Invalid type
  XPath evaluation failure
- Even if I reworked my script to execute multiple xmllint invocations, I'm left with a single string where the newlines are removed, which is good, but which string I get has to be chosen manually beforehand. For example, here are both the normalize-space and translate(normalize-space, ' ', '') variations for the note-content element path:
  xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text())" a.xml
  xmllint --xpath "translate(normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()), ' ', '')" a.xml
  Both yield one of the same two items in the note-content element without newlines ([1] is things and [2] is asd). I can choose between the two by appending [1] or [2] to text(), but this doesn't work if I have an undefined number of items. (I don't know if there is a way to just get all of the text items this way.)
- Some answers suggest using [normalize-space() = 'desiredtext'], but this doesn't work if I can't predict the text in the generated XML.
- If I just put [normalize-space()] after text():
  xmllint --noblanks --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()[normalize-space()] | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml
  I'm left with the same output I started with.
- When I tried appending [not(.='')] after text(), I got the same output.
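For context on the first observation: in XPath 1.0 the | operator only combines node-sets, while normalize-space() returns a string, which would explain the "Invalid type" error. The per-index idea from the second observation can be generalized by asking xmllint for count() first and then looping over the indices. This is only a sketch, not a definitive solution: note_lines is a hypothetical helper name, the shortened //*[local-name()='note-content']//text() path is my assumption for "all text under note-content", and the sample file is a trimmed copy of the note.

```shell
#!/bin/sh
# Hypothetical helper (not part of xmllint): print every non-blank text node
# under <note-content>, one per line, with surrounding whitespace stripped.
note_lines() {
    file=$1
    # Text nodes anywhere below note-content that are not whitespace-only.
    nodes="//*[local-name()='note-content']//text()[normalize-space()]"
    # count() yields a number, which xmllint prints as plain text.
    n=$(xmllint --xpath "count($nodes)" "$file")
    i=1
    while [ "$i" -le "$n" ]; do
        # normalize-space() of the i-th node is a string, so xmllint prints it
        # followed by a single newline.
        xmllint --xpath "normalize-space(($nodes)[$i])" "$file"
        i=$((i + 1))
    done
}

# Trimmed copy of the note above, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<note version="0.3" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text></note>
EOF

note_lines "$sample"
rm -f "$sample"
```

Two caveats with this approach: it runs xmllint once per text node, and normalize-space() also collapses internal whitespace, so a text node that spans several lines would come out joined by single spaces on one line.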
The question: I want to know whether this excessive blank-line behavior is caused by incorrect xmllint/XPath commands or by the way Gnote generated the XML, and what the correct xmllint/XPath commands are, if there are any. I am not looking to use xmlstarlet because it doesn't appear to be maintained anymore. This question is not asking for a way to pipe this into a command that removes extra newlines.