My apologies, I must have made some mistakes when performing the initial tests: after putting everything into a single script, the xxd output does indeed always match the stdout output.
The entire script is here: https://pastebin.pl/view/454913ec. I'm updating my question and leaving the original (but wrong) question below.
The output of the script that I'm getting is the following:
$ ./test.sh
# Case 1A: echo -n $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 1B: echo -n -e $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 1C: echo -n "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 1D: echo -n -e "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 1E: printf "%s" $TEST1
helloworld
00000000: 6865 6c6c 6f77 6f72 6c64 helloworld
Case 1F: $ printf "%s" "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
--------------------------------
Case 2A: $ echo -n $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64 hello\nworld
Case 2B: echo -n -e $TEST2
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 2C: echo -n "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64 hello\nworld
Case 2D: echo -n -e "$TEST2"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 2E: printf "%s" $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64 hello\nworld
Case 2F: printf "%s" "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64 hello\nworld
So at least the xxd output is always the same whenever the stdout output is the same. Again, apologies for that!
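To avoid repeating the mistake of running each command twice by hand, a case can be captured once and then both printed and dumped. The following is only a minimal sketch of that idea, not the script linked above: run_case is a hypothetical helper name, and it assumes mktemp and od are available (od stands in for xxd here).

```shell
TEST1="hello
world"

# run_case (hypothetical helper): run a command once, then show its output
# as text and as hex, so that both views come from the same bytes.
run_case() {
    tmp=$(mktemp) || return 1
    "$@" > "$tmp"          # capture the raw bytes once
    cat "$tmp"; echo       # text view (the extra newline is only for readability)
    od -An -tx1 < "$tmp"   # hex view of the captured bytes
    rm -f "$tmp"
}

run_case printf '%s' "$TEST1"
```

Because the text view and the hex view read the same temporary file, they cannot drift apart the way two separate manual runs can.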
So the remaining questions for me are:
Why does Case 1E result in the output helloworld?
Which byte sequences are REALLY contained in TEST1 and TEST2, and what is the proper, portable way to figure that out?
How can I make printf interpret the type of newline encoded in TEST2?
Is the following assignment portable (in the sense that it will always result in the same binary content in the variables)?
$ TEST1="hello
world"
$ TEST2="hello\nworld"
In another question I read that the locale only applies at expansion time, so that should mean the assignment is portable, right?
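For the question about the real byte content, this is a minimal sketch of how the variables could be dumped without word splitting or echo getting in the way. It assumes a POSIX-style shell with od and tr available; hex_dump is a hypothetical helper name, and od is used instead of xxd only because it is specified by POSIX.

```shell
# hex_dump (hypothetical helper): print the bytes of its first argument as hex.
# printf '%s' with a quoted argument writes the value unchanged,
# with no trailing newline and no escape processing.
hex_dump() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

TEST1="hello
world"
TEST2="hello\nworld"

hex_dump "$TEST1"   # 68656c6c6f0a776f726c64   (0a = real newline)
hex_dump "$TEST2"   # 68656c6c6f5c6e776f726c64 (5c 6e = backslash, then n)
```

These values agree with the xxd output of Case 1F and Case 2F above, which are the two cases that pass the quoted variable to printf "%s".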
Original (but wrong) question:
I performed the following tests using git bash:
$ TEST1="hello
> world"
$ TEST2="hello\nworld"
Case 1A:
$ echo -n $TEST1
hello world
$ echo -n $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 1B:
$ echo -n -e $TEST1
hello world
$ echo -n -e $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 1C:
$ echo -n "$TEST1"
hello
world
$ echo -n "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 1D:
$ echo -n -e "$TEST1"
hello
world
$ echo -n -e "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 1E:
$ printf "%s" $TEST1
helloworld
$ printf "%s" $TEST1 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64 helloworld
Case 1F:
$ printf "%s" "$TEST1"
hello
world
$ printf "%s" "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
$
--------------------------------
Case 2A:
$ echo -n $TEST2
hello\nworld
$ echo -n $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 2B:
$ echo -n -e $TEST2
hello
world
$ echo -n -e $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64 hello world
Case 2C:
$ echo -n "$TEST2"
hello\nworld
$ echo -n "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 2D:
$ echo -n -e "$TEST2"
hello
world
$ echo -n -e "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
Case 2E:
$ printf "%s" $TEST2
hello\nworld
$ printf "%s" $TEST2 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64 helloworld
Case 2F:
$ printf "%s" "$TEST2"
hello\nworld
$ printf "%s" "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64 hello.world
$
First: I find this frustrating. Also, I wish I could add custom colors to code blocks on Stack Overflow to visualize the problem better (for example, showing equal outputs in equal colors).
Second: with that off my chest, can someone help me make sense of these outputs by explaining the basic rules that affect these results?
Some things that confuse me, for example:
Even though the printed stdout outputs are different for TEST1 and TEST2 (for example, Case 1A results in different output than Case 2A), it seems that the actual bytes that xxd receives as input are identical in all respective TEST1 and TEST2 cases (by "all respective cases" I mean that Case 1x always has the same xxd output as Case 2x, even though the corresponding stdout outputs of the same command are not equal). How is that possible?
Obviously the contents of TEST1 and TEST2 must differ somehow, otherwise it wouldn't be possible that echoing/printing them could result in different stdout outputs. So, how can I properly output the ACTUAL bits (as hex or whatever, doesn't matter so long as it's a clear representation of the actual variable content) contained in those variables?
The TEST1 cases would indicate that xxd receives a 0A newline ASCII character exactly when the printout also shows a line break. However, in the TEST2 cases, Case 2B prints a line break but doesn't result in a 0A character, and Case 2F does not print a line break while nevertheless resulting in a 0A character.
I kiiinda get that it seems the linebreak is differently encoded in the TEST1 and TEST2 variables, and that when echoing double-quoting seems to expand (is that the right terminology?) the kind of linebreak contained in TEST1, while the -e flag to echo seems to interpret the kind of linebreak encoded in TEST2, but that doesn't explain the xxd outputs as well as the printf cases.
Why does Case 1E ($ printf "%s" $TEST1) result in helloworld?
How can printf be made to apply the kind of linebreaks encoded in the TEST2 variable?
What should be the most important lesson learned here?
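My guess about quoting and -e can be checked with a small sketch like the one below. It is only an illustration, assuming a POSIX-style shell with the default IFS (space, tab, newline); count_args is a hypothetical helper name. It counts how many arguments a command actually receives, and it uses the %b conversion of printf, which interprets backslash escapes in its argument.

```shell
# count_args (hypothetical helper): report how many arguments were received.
count_args() { echo "$#"; }

TEST1="hello
world"
TEST2="hello\nworld"

count_args $TEST1      # 2: unquoted, the value is split at the newline
count_args "$TEST1"    # 1: quoted, the newline stays inside one argument
count_args $TEST2      # 1: backslash and n are not whitespace, so no split

# printf reuses its format for every argument, so two words and "%s"
# are written back to back with nothing in between.
printf '%s' $TEST1; echo             # helloworld

# %b expands the two characters \n into one real newline (0a).
printf '%b' "$TEST2" | od -An -tx1 | tr -d ' \n'; echo   # 68656c6c6f0a776f726c64
```

If this reading is right, the unquoted echo cases print "hello world" because echo joins its two arguments with a space, while printf "%s" joins them with nothing.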
notes: I refrained from adding
$ TEST3="hello\n
world"
to keep the question short.
I also tested using single quotes ' ' instead of double quotes " " when defining the variables, which does not seem to affect the results.
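The observation about single quotes can be reproduced with a short comparison. This is a minimal sketch assuming a POSIX-style shell; the variable names D1, S1, D2 and S2 are made up for the example.

```shell
D1="hello\nworld"   # double quotes: a backslash before n is kept literally
S1='hello\nworld'   # single quotes: every character is literal
D2="hello
world"              # double quotes: the real newline is part of the value
S2='hello
world'              # single quotes: same

[ "$D1" = "$S1" ] && echo "TEST2-style values are identical"
[ "$D2" = "$S2" ] && echo "TEST1-style values are identical"
```

The two quoting styles would only differ for values containing characters that double quotes treat specially, such as $, a backquote, or a backslash followed by one of $, `, ", \ or a newline.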