
My apologies: I must have made some mistakes when performing the initial tests. After putting everything into a single script, the xxd output does indeed always match the stdout output.

The entire script is here: https://pastebin.pl/view/454913ec

I'm updating my question and leaving the original (but wrong) question below.

The output of the script that I'm getting is the following:

$ ./test.sh
Case 1A: echo -n $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 1B: echo -n -e $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 1C: echo -n "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 1D: echo -n -e "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 1E: printf "%s" $TEST1
helloworld
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld

Case 1F: printf "%s" "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

--------------------------------

Case 2A: echo -n $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld

Case 2B: echo -n -e $TEST2
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 2C: echo -n "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld

Case 2D: echo -n -e "$TEST2"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 2E: printf "%s" $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld

Case 2F: printf "%s" "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld
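The pastebin script is not reproduced in this post, so the following is only a hedged sketch of a harness that could produce output like the listing above. The function name run_case is hypothetical, and od -An -tx1 (a POSIX tool) stands in for xxd. Each command is stored as a string and run with eval, so its quoting and word splitting behave as if it had been typed at the prompt.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of a test harness (not the original pastebin script).
# od -An -tx1 is used instead of xxd so that only POSIX tools are required.

TEST1="hello
world"
TEST2="hello\nworld"

# run_case LABEL COMMAND: print the command, its stdout, then the bytes it wrote.
# eval re-parses COMMAND, so quoted and unquoted expansions behave as typed.
run_case() {
  printf 'Case %s: %s\n' "$1" "$2"
  eval "$2"; echo              # stdout as the terminal shows it, plus a newline
  eval "$2" | od -An -tx1      # the raw bytes the command produced
  echo
}

run_case 1A 'echo -n $TEST1'
run_case 1B 'echo -n -e $TEST1'
run_case 1C 'echo -n "$TEST1"'
run_case 1D 'echo -n -e "$TEST1"'
run_case 1E 'printf "%s" $TEST1'
run_case 1F 'printf "%s" "$TEST1"'
run_case 2A 'echo -n $TEST2'
run_case 2B 'echo -n -e $TEST2'
run_case 2C 'echo -n "$TEST2"'
run_case 2D 'echo -n -e "$TEST2"'
run_case 2E 'printf "%s" $TEST2'
run_case 2F 'printf "%s" "$TEST2"'
```

Swapping od -An -tx1 back to xxd should reproduce the column layout shown in the listing.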

So at least the xxd output now always matches the stdout output. Again, apologies for that!

So the remaining questions for me are:

  1. Why does Case 1E result in the output helloworld?

  2. Which byte sequences are REALLY contained in TEST1 and TEST2, and what is the proper, portable way to figure that out?

  3. How can I make printf interpret the type of newline encoded in TEST2?

  4. Is the following assignment portable (in the sense that it will always result in the same binary content in the variables)?

$ TEST1="hello
world"
$ TEST2="hello\nworld"

In another question I read that the locale only applies at expansion time, so that should mean it is, right?


Original (but wrong) question:

I performed the following tests using git bash:

$ TEST1="hello
> world"
$ TEST2="hello\nworld"

Case 1A:

$ echo -n $TEST1
hello world
$ echo -n $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 1B:

$ echo -n -e $TEST1
hello world
$ echo -n -e $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 1C:

$ echo -n "$TEST1"
hello
world
$ echo -n "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 1D:

$ echo -n -e "$TEST1"
hello
world
$ echo -n -e "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 1E:

$ printf "%s" $TEST1
helloworld
$ printf "%s" $TEST1 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld

Case 1F:

$ printf "%s" "$TEST1"
hello
world
$ printf "%s" "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

--------------------------------

Case 2A:

$ echo -n $TEST2
hello\nworld
$ echo -n $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 2B:

$ echo -n -e $TEST2
hello
world
$ echo -n -e $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world

Case 2C:

$ echo -n "$TEST2"
hello\nworld
$ echo -n "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 2D:

$ echo -n -e "$TEST2"
hello
world
$ echo -n -e "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

Case 2E:

$ printf "%s" $TEST2
hello\nworld
$ printf "%s" $TEST2 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld

Case 2F:

$ printf "%s" "$TEST2"
hello\nworld
$ printf "%s" "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world

First: I find this frustrating. Also, I wish I could add custom colors to code blocks on Stack Overflow to visualize the problem better (for example, coloring equal outputs in the same color).

Second: with that off my chest, can someone help me make sense of these outputs by explaining the basic rules that produce these results?

Some of the things that confuse me, for example:

  1. Even though the printed stdout outputs differ between TEST1 and TEST2 (for example, Case 1A produces different output than Case 2A), the actual bytes that xxd receives seem to be identical in all corresponding TEST1 and TEST2 cases (by corresponding I mean that Case 1x always has the same xxd output as Case 2x, even though the stdout outputs of the same command are not equal). How is that possible?

  2. Obviously the contents of TEST1 and TEST2 must differ somehow, otherwise it wouldn't be possible that echoing/printing them could result in different stdout outputs. So, how can I properly output the ACTUAL bits (as hex or whatever, doesn't matter so long as it's a clear representation of the actual variable content) contained in those variables?

  3. The TEST1 cases would indicate that xxd receives a 0A newline ASCII character exactly when the printout also shows a linebreak. However, in the TEST2 cases, Case 2B prints a linebreak but doesn't produce a 0A character, and Case 2F does not print a linebreak but does produce a 0A character.

I kiiinda get that it seems the linebreak is differently encoded in the TEST1 and TEST2 variables, and that when echoing double-quoting seems to expand (is that the right terminology?) the kind of linebreak contained in TEST1, while the -e flag to echo seems to interpret the kind of linebreak encoded in TEST2, but that doesn't explain the xxd outputs as well as the printf cases.

  1. Why does Case 1E result in

     $ printf "%s" $TEST1
     helloworld
    
  2. How can printf be made to apply the kind of linebreaks encoded in the TEST2 variable?

  3. What should be the most important lesson learned here?

Notes: I refrained from adding

$ TEST3="hello\n
world"

to keep the question short.

I also tested using single quotes ' ' instead of double quotes " " when defining the variables, which does not seem to affect the results.

2 Answers


Even though the printed stdout outputs are different for TEST1 and TEST2 (for example Case 1A results in different output than case 2A), it seems that the actual bytes that xxd receives as input are identical in all respective TEST1 and TEST2 cases (with all respective cases I mean Case 1x has always the same xxd output as Case 2x, even though the corresponding stdout outputs of the same command are not equal). How is that possible?

They're not identical. I cannot reproduce your results using Bash on Linux nor using Git's MSYS Bash on Windows.

when echoing double-quoting seems to expand (is that the right terminology?) the kind of linebreak contained in TEST1

If you quote a variable expansion, its value will remain as-is. If you don't quote a variable expansion, its value will be split into multiple parameters at whitespace (more precisely, at the characters in $IFS, which by default are space, tab and newline). This is done by the shell itself, and happens regardless of which command you're using.
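A minimal bash sketch of the splitting rule just described; count_args is a hypothetical helper that only reports how many arguments it received. The last line also shows that an empty IFS turns field splitting off, which is one way to confirm that the newline in TEST1 is what the split happens on.

```bash
#!/usr/bin/env bash
TEST1="hello
world"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

count_args $TEST1          # 2: unquoted, split at the newline into "hello" and "world"
count_args "$TEST1"        # 1: quoted, the value is passed through unchanged

# With IFS set to the empty string, field splitting is disabled entirely.
( IFS=; count_args $TEST1 )  # 1
```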

(Exception: Expansions done as part of scalar variable assignments aren't split. For example, foo=$TEST1 will preserve the original value.

However, expansions done as part of array assignments are split. For example, foo=($TEST1) will result in a two-element array containing hello and world.)
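A short bash sketch of that exception, assuming bash arrays are available (they are not part of POSIX sh). declare -p prints both variables as they are actually stored.

```bash
#!/usr/bin/env bash
TEST1="hello
world"

foo=$TEST1        # scalar assignment: no word splitting, the newline is kept
arr=($TEST1)      # array assignment: split just like an unquoted argument list

declare -p foo arr            # shows both values exactly as stored
echo "${#arr[@]}"             # 2
printf '[%s]\n' "${arr[@]}"   # [hello] then [world]
```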

Later on, when the echo command receives multiple arguments, it always joins them using a single space.
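To see that joining in action, a small bash sketch; the variable spaced is an illustrative value, not from the question. Because splitting happens before echo runs, any run of whitespace, newlines included, collapses into the single space echo puts between its arguments.

```bash
#!/usr/bin/env bash
TEST1="hello
world"
spaced="hello     world"   # illustrative value with a run of spaces

echo $TEST1       # hello world: two arguments, joined by one space
echo "$TEST1"     # hello / world on two lines: one argument, newline kept
echo $spaced      # hello world: the run of spaces is lost during splitting
echo "$spaced"    # hello     world: quoted, spacing preserved
```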

Obviously the contents of TEST1 and TEST2 must differ somehow, otherwise it wouldn't be possible that echoing/printing them could result in different stdout outputs. So, how can I properly output the ACTUAL bits (as hex or whatever, doesn't matter so long as it's a clear representation of the actual variable content) contained in those variables?

Use typeset -p TEST1 or declare -p TEST2. (I think Ksh/Zsh prefer typeset and Bash prefers declare; both do the same thing.)

Using printf %s "$TEST1" works for strings, though the above two also handle arrays. You can also use the %q expansion which will backslash-escape any special characters in the printed value (using $''-style quoting, which can then be used in a shell script again).

> printf %q "$TEST1"
$'hello\nworld'

> printf %q "$TEST2"
hello\\nworld
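For a raw byte view that does not depend on Bash-specific features, a hedged sketch: write the value with printf '%s' and a quoted expansion, so nothing is split or escape-processed, and pipe it into od. The -A and -t options of od are specified by POSIX, so unlike xxd (which ships with Vim) this should work on most systems, although the exact column spacing can differ between od implementations.

```bash
#!/usr/bin/env bash
TEST1="hello
world"
TEST2="hello\nworld"

# printf '%s' writes the value verbatim: no trailing newline, no backslash processing.
# Quoting the expansion prevents word splitting, so od sees exactly the stored bytes.
printf '%s' "$TEST1" | od -An -tx1   # bytes: 68 65 6c 6c 6f 0a 77 6f 72 6c 64
printf '%s' "$TEST2" | od -An -tx1   # bytes: 68 65 6c 6c 6f 5c 6e 77 6f 72 6c 64

# Byte counts differ: TEST1 holds one newline, TEST2 a backslash followed by "n".
printf '%s' "$TEST1" | wc -c
printf '%s' "$TEST2" | wc -c
```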

why does Case 1E result in helloworld

As mentioned before, an unquoted variable expansion causes its value to be split at whitespace and passed as multiple parameters. So the command in Case 1E is equivalent to:

printf "%s" "hello" "world"

and while it might seem nonsensical in most other languages that have printf(), the printf command in Bash reuses the format string until it has consumed all of its arguments (this is standard behaviour of the POSIX printf utility), meaning that the above is actually equivalent to:

printf %s "hello"
printf %s "world"
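A few bash one-liners illustrating the format reuse; the arguments a, b and c are arbitrary placeholders. When the last round of the format runs short of arguments, the missing %s is filled with an empty string.

```bash
#!/usr/bin/env bash
TEST1="hello
world"

printf '%s' $TEST1        # helloworld: format reused for "hello" and "world", no separator
printf '\n'
printf '%s\n' $TEST1      # one line per argument: hello, then world
printf '%s-%s\n' a b c    # a-b, then c- (a missing argument counts as an empty string)
```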

How can printf be made to apply the kind of linebreaks encoded in the TEST2 variable?

The %b expansion works like %s but additionally expands the backslash-escapes found in the argument.

$ printf %b 'Hello\t,\nworld\t!'
Hello   ,
world   !
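Applied to the TEST2 variable from the question, a minimal bash sketch; TEST2_EXPANDED is a hypothetical name used only to show how the expanded form could be stored.

```bash
#!/usr/bin/env bash
TEST2="hello\nworld"     # a backslash followed by "n", not a newline

printf '%s\n' "$TEST2"   # hello\nworld: %s prints the backslash literally
printf '%b\n' "$TEST2"   # hello, then world: %b expands \n found in the argument

# Storing the expanded form (command substitution strips only trailing newlines).
TEST2_EXPANDED=$(printf '%b' "$TEST2")
printf '%s' "$TEST2_EXPANDED" | od -c
```

Passing the variable as the format string (printf "$TEST2") would also expand \n, but any % in the data would then be treated as a conversion specification, so %b with a separate argument is the safer choice.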

What should be the most important lesson learned here?

Don't write shell scripts.

Quote variables in shell scripts, unless you know exactly when not to.

grawity

I'm not sure that this accounts for all the differences, but I believe that the difference is that TEST1 contains a carriage-return (\r) and not a newline (\n).

In addition, this carriage return is part of the string as a binary character and needs no escape interpretation to be output.

You can see the differences with the following code:

$ echo $TEST1 | od -w32 -t x1c
0000000  68  65  6c  6c  6f  20  3e  20  77  6f  72  6c  64  0a
          h   e   l   l   o       >       w   o   r   l   d  \n

$ echo $TEST2 | od -w32 -t x1c
0000000  68  65  6c  6c  6f  5c  6e  77  6f  72  6c  64  0a
          h   e   l   l   o   \   n   w   o   r   l   d  \n
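To check directly which line-ending characters a value holds, instead of inferring it from echo output, a small bash sketch. It assumes $'...' quoting is available (bash, ksh93, zsh); line_endings is a hypothetical helper name.

```bash
#!/usr/bin/env bash
TEST1="hello
world"

# Hypothetical helper: report which line-ending characters a value contains.
line_endings() {
  case $1 in *$'\r'*) printf 'CR ' ;; esac
  case $1 in *$'\n'*) printf 'LF ' ;; esac
  echo
}

line_endings "$TEST1"
line_endings $'hello\r\nworld'   # a CRLF value, as a Windows editor might produce
printf '%s' "$TEST1" | od -c     # od -c shows \r and \n as distinct escapes
```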

One should also remember that \r and \n are interpreted by the terminal, not by Bash. This means that mixing up their handling by Bash and the terminal can come up with various results according to the order in which the operations were done.
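A hedged bash sketch of that point, using an illustrative value s that is not from the question: the terminal redraws the line when it sees \r, but the bytes themselves are untouched.

```bash
#!/usr/bin/env bash
s=$'hello\rJ'   # illustrative value: "hello", a carriage return, then "J"

printf '%s\n' "$s"          # a terminal typically shows "Jello": \r returns the cursor to column 0
printf '%s' "$s" | od -c    # yet all seven bytes are still present: h e l l o \r J
printf '%s' "$s" | wc -c    # 7
```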

harrymc