Newlines can legitimately appear in XML data, so newline-delimited output is ambiguous. A more robust approach is to delimit xpath results with a character that is guaranteed not to occur in XML data. The null character (NUL), U+0000 in the Universal Coded Character Set, is one such character.
Note that the code point U+0000, assigned to the null control
character, is the only character encoded in Unicode and ISO/IEC 10646
that is always invalid in any XML 1.0 and 1.1 document.
– https://en.wikipedia.org/wiki/Valid_characters_in_XML
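To make the ambiguity concrete, here is a minimal bash sketch. The two-element results array is made-up data standing in for xpath output, and mapfile -d needs bash 4.4 or later. It suggests why newline-delimited output cannot be split back into the original records, whereas NUL-delimited output can.

```shell
#!/usr/bin/env bash
# Two hypothetical xpath results, each containing an embedded newline.
results=($'1\nnewline' $'2\nnewline')

# Newline-delimited: the reader sees four records instead of two.
mapfile -t by_newline < <(printf '%s\n' "${results[@]}")
echo "newline-delimited: ${#by_newline[@]} records"   # prints 4 records

# NUL-delimited: the original two records survive intact.
mapfile -t -d '' by_nul < <(printf '%s\0' "${results[@]}")
echo "NUL-delimited: ${#by_nul[@]} records"           # prints 2 records
```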
@Cyker's merge request for xmllint included the addition of an -xpath0 option that would delimit xpath results by NUL. A new feature request for this functionality was opened as well.
Hopefully, xmllint will gain this feature soon.
xmlstarlet
In the meantime, another xpath command-line tool, xmlstarlet, can be coaxed into achieving this goal now. xmlstarlet does not currently support outputting NULs directly, but we can make it output U+FFFF, which, like NUL, is guaranteed not to occur in XML data (source). We then just need to translate U+FFFF to U+0000, and we'll have NUL-delimited xpath results.
In the following examples, I'll use this partial HTML file. It's the same example as in the OP's question, except that I added newlines for testing purposes.
cat >data.html <<'EOF'
<textarea name="command" class="setting-input fixed-width" rows="9">1
newline</textarea>
<textarea name="command" class="setting-input fixed-width" rows="5">2
newline</textarea>
EOF
Here is how to use xmlstarlet and sed to delimit the xpath results with NULs:
xmlstarlet fo -H -R data.html \
| xmlstarlet sel -t -m '//textarea[@name="command"]' -v '.' -o $'\uffff' \
| sed s/$'\uFFFF'/\\x00/g
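perl could be used instead of sed, if you prefer: perl -CS -0xFFFF -l0 -pe ''. Here -CS treats the standard streams as UTF-8, -0xFFFF sets the input record separator to U+FFFF, and -l0 strips that separator from each record and writes NUL as the output record separator.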
Note: I ran the HTML through xmlstarlet fo -H -R (-H parses the input as HTML, -R recovers whatever is parsable), as shown in @TheDudeAbides's answer.
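If you would rather not depend on how the shell and sed treat U+FFFF in the current locale, the translation can also be done on raw bytes. The following is only a sketch: ffff_to_nul is a hypothetical helper name, it assumes GNU sed (for \xHH escapes in the pattern and replacement), and it assumes the upstream tool emits UTF-8, where U+FFFF is the byte sequence EF BF BF. The input is synthetic so that the snippet runs without xmlstarlet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every UTF-8 encoded U+FFFF (bytes EF BF BF)
# on stdin with a NUL byte. LC_ALL=C makes sed match raw bytes, so the
# result does not depend on the caller's locale. Assumes GNU sed.
ffff_to_nul() {
  LC_ALL=C sed 's/\xef\xbf\xbf/\x00/g'
}

sep=$'\xef\xbf\xbf'   # U+FFFF encoded as UTF-8

# Synthetic input standing in for xmlstarlet output: two records, each
# with an embedded newline and a trailing U+FFFF. od shows the raw bytes.
printf '1\nnewline%s2\nnewline%s' "$sep" "$sep" \
  | ffff_to_nul \
  | od -An -c
```

In the pipelines shown here, the helper would take the place of the sed stage.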
Now that the xpath results are delimited by NULs, we can process the results with the help of xargs -0. Example:
xmlstarlet fo -H -R data.html \
| xmlstarlet sel -t -m '//textarea[@name="command"]' -v '.' -o $'\uffff' \
| sed s/$'\uFFFF'/\\x00/g \
| xargs -0 -n 1 printf '%q\n'
Result:
'1 '$'\n'' newline'
'2 '$'\n'' newline'
or load it into a bash array:
mapfile -t -d '' a < <(
xmlstarlet fo -H -R data.html \
| xmlstarlet sel -t -m '//textarea[@name="command"]' -v '.' -o $'\uffff' \
| sed s/$'\uFFFF'/\\x00/g
)
declare -p a
Result:
declare -a a=([0]=$'1 \n newline' [1]=$'2 \n newline')
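A third option is to loop over the records in the current shell with read -d ''. This is a minimal sketch of that common bash idiom, not something taken from the tools' documentation: produce is a hypothetical stand-in for the xmlstarlet and sed pipeline, emitting the same kind of NUL-terminated records.

```shell
#!/usr/bin/env bash
# Synthetic NUL-terminated stream standing in for the xmlstarlet | sed pipeline.
produce() {
  printf '%s\0' $'1\nnewline' $'2\nnewline'
}

count=0
# IFS= keeps leading and trailing whitespace, -r keeps backslashes literal,
# and -d '' makes read stop at each NUL instead of at each newline.
while IFS= read -r -d '' item; do
  count=$((count + 1))
  printf 'item %d: %q\n' "$count" "$item"
done < <(produce)

echo "processed $count records"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, so variables such as count are still set afterwards.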
saxon
The same technique works with saxon instead of xmlstarlet. Here $CP must be set to a classpath that contains the Saxon jar:
xmllint --html data.html --dropdtd --xmlout \
| java -cp "$CP" net.sf.saxon.Query -s:- -qs:'//textarea[@name="command"]' '!method=text' '!item-separator='$'\uFFFF' \
| sed s/$'\uFFFF'/\\x00/g \
| xargs -0 -n 1 printf '%q\n'
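One caveat when consuming such a stream with a read loop instead of xargs -0: a separator-style option presumably places the delimiter between items only, so the last record may not end in NUL. I have not checked Saxon's exact output, so treat that as an assumption. The sketch below uses synthetic input (produce is a hypothetical stand-in) and shows a loop that keeps an unterminated final record.

```shell
#!/usr/bin/env bash
# Synthetic stream with NUL *between* records only (no trailing NUL).
# That layout is an assumption about separator-style output.
produce() {
  printf '%s\0%s' $'1\nnewline' $'2\nnewline'
}

items=()
# read returns non-zero at end of input but still fills "item" with any
# unterminated tail; the || test keeps that final record.
while IFS= read -r -d '' item || [[ -n $item ]]; do
  items+=("$item")
done < <(produce)

echo "records: ${#items[@]}"   # prints: records: 2
```

The same loop also works when every record is NUL-terminated, so it is a reasonable default when the producer's behavior is uncertain.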