
I have the following XML generated by Gnote:

<?xml version="1.0"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size" xmlns="http://beatniksoftware.com/tomboy"><title>things</title><text xml:space="preserve"><note-content version="0.1" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size">things
<list><list-item dir="ltr">sheets
</list-item><list-item dir="ltr">test
</list-item><list-item dir="ltr">eval</list-item></list>
asd
</note-content>
</text><last-change-date>2023-02-19T12:20:06.551763Z</last-change-date><last-metadata-change-date>2023-02-19T12:20:06.553010Z</last-metadata-change-date><create-date>2023-02-19T10:40:01.309068Z</create-date><cursor-position>90</cursor-position><selection-bound-position>-1</selection-bound-position><width>649</width><height>282</height></note>

I want all the text content in <note-content></note-content> without extra newlines. This includes text content in list/list-item elements. The content requested is the following including the format:

things
sheets
test
eval
asd

After much trial and error, I ended up with the following command (with or without --noblanks):

xmllint --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text() | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml

It yields output separated by extra blank lines (there is supposed to be a blank line after asd, but the code block isn't showing it):

things

sheets

test

eval

asd

Removing the newlines from the XML file and running the same xmllint command produces the desired output with no extra blank lines, so I don't know whether Gnote is producing something non-standard.
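To see where the blank lines may come from, here is a minimal Python sketch (not xmllint itself) that lists the text nodes under note-content in a trimmed copy of the note. The XML literal, the NS prefix map and the variable names are mine. The "one extra newline after each node" step is an assumption about how recent xmllint versions print a node-set, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements omitted).
XML = (
    '<note version="0.3" xmlns="http://beatniksoftware.com/tomboy">'
    '<title>things</title><text xml:space="preserve">'
    '<note-content version="0.1">things\n'
    '<list><list-item dir="ltr">sheets\n'
    '</list-item><list-item dir="ltr">test\n'
    '</list-item><list-item dir="ltr">eval</list-item></list>\n'
    'asd\n'
    '</note-content>\n'
    '</text></note>'
)

# Hypothetical prefix "t" bound to the note's default namespace.
NS = {"t": "http://beatniksoftware.com/tomboy"}
content = ET.fromstring(XML).find(".//t:note-content", NS)

# Every non-empty text node under note-content, in document order.
nodes = list(content.itertext())
for node in nodes:
    print(repr(node))

# Assumption: the printer emits one extra "\n" after each selected node.
emulated = "".join(node + "\n" for node in nodes)
print(emulated, end="")
```

Most of the text nodes already end in a newline, because the line break sits before the closing tag. Under the stated assumption, the added separator then produces the blank lines, which would make the XML itself perfectly standard.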

I tried looking at the comments and answers at https://stackoverflow.com/questions/11776910/xpath-expression-to-remove-whitespace/11777638, but I've been unsuccessful. Some observations:

  1. When I execute (notice the |) xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()) | normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text())" a.xml I get:

XPath error : Invalid type

XPath evaluation failure

  2. Even if I rework my script to run multiple xmllint invocations, each one leaves me with a single string with the newlines removed, which is good, but I have to choose beforehand which string I get. For example, here are the normalize-space and translate(normalize-space, ' ', '') variations for the note-content element path:
     xmllint --xpath "normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text())" a.xml
     xmllint --xpath "translate(normalize-space(//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()), ' ', '')" a.xml
     Both yield only one of the two text items in the note-content element, without newlines ([1] is things and [2] is asd). I can choose between the two by appending [1] or [2] to text(), but this doesn't work if I have an undefined number of items. (I don't know if there is a way to get all of the text items this way.)
  3. Some answers suggest using [normalize-space() = 'desiredtext'], but this doesn't work because I can't know the text in the generated XML in advance.
  4. If I just put [normalize-space()] after text(): xmllint --noblanks --xpath "//*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/text()[normalize-space()] | //*[local-name()='note']/*[local-name()='text']/*[local-name()='note-content']/*[local-name()='list']/*[local-name()='list-item'][@dir='ltr']/text()[normalize-space()]" a.xml I'm left with the same output I started with.
  5. When I append [not(.='')] after text(), I get the same output.
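Two XPath 1.0 rules seem to explain the observations above. The | operator only accepts node-sets, so a union of two normalize-space() strings is a type error. And normalize-space() applied to a node-set converts it to a string using only its first node in document order, which is why a single item comes back. What the observations are reaching for is normalization applied to each text node separately. A minimal Python sketch of that idea follows, on a trimmed copy of the note. The normalize_space helper is a hypothetical stand-in that only approximates the XPath function (Python's split() treats a few more characters as whitespace).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements omitted).
XML = (
    '<note version="0.3" xmlns="http://beatniksoftware.com/tomboy">'
    '<title>things</title><text xml:space="preserve">'
    '<note-content version="0.1">things\n'
    '<list><list-item dir="ltr">sheets\n'
    '</list-item><list-item dir="ltr">test\n'
    '</list-item><list-item dir="ltr">eval</list-item></list>\n'
    'asd\n'
    '</note-content>\n'
    '</text></note>'
)

# Hypothetical prefix "t" bound to the note's default namespace.
NS = {"t": "http://beatniksoftware.com/tomboy"}


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())


content = ET.fromstring(XML).find(".//t:note-content", NS)

# Normalize every text node on its own, then drop the ones left empty.
items = [normalize_space(node) for node in content.itertext()]
items = [item for item in items if item]

print("\n".join(items))
```

This works for any number of items because nothing is indexed with [1] or [2]; each node is handled as it is encountered.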

The question: I want to know whether this excessive blank-line behavior is caused by incorrect xmllint/XPath commands or by the way Gnote generated the XML, and what the correct xmllint/XPath commands are, if there are any. I am not looking to use xmlstarlet because it doesn't appear to be maintained anymore. This question is not asking for a way to pipe the output into a command that removes extra newlines.

Yetoo

1 Answer

Although you have expressed a preference to avoid xmlstarlet, it will do exactly what you want:

xmlstarlet sel -t -v '//_:note-content' -n xmlfile

Output

things
sheets
test
eval
asd
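As I understand it, -v prints the string-value of the selected element, that is, all of its descendant text nodes concatenated in document order with nothing added between them. Here is a minimal Python sketch of the same idea on a trimmed copy of the note; the XML literal, the NS prefix map and the variable names are mine, not part of either tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (metadata elements omitted).
XML = (
    '<note version="0.3" xmlns="http://beatniksoftware.com/tomboy">'
    '<title>things</title><text xml:space="preserve">'
    '<note-content version="0.1">things\n'
    '<list><list-item dir="ltr">sheets\n'
    '</list-item><list-item dir="ltr">test\n'
    '</list-item><list-item dir="ltr">eval</list-item></list>\n'
    'asd\n'
    '</note-content>\n'
    '</text></note>'
)

# Hypothetical prefix "t" bound to the note's default namespace.
NS = {"t": "http://beatniksoftware.com/tomboy"}
content = ET.fromstring(XML).find(".//t:note-content", NS)

# String-value of the element: descendant text joined with no separator.
value = "".join(content.itertext())

print(value, end="")
```

The XPath 1.0 string() function returns this same string-value, so an expression of the form string(//*[local-name()="note-content"]) may be worth trying with xmllint --xpath as well; I have not verified that here.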

Using xmllint I cannot avoid the blank lines that are part of the element value text:

xmllint --xpath '//*[local-name()="note-content"]//text()' xmlfile

Output

things

sheets

test

eval

asd

After having spent some time with xmllint I would suggest that you simply remove the blank lines. (Not ideal, but certainly effective.)

xmllint … | grep .

Output

things
sheets
test
eval
asd
Chris Davies