(git) Bash: how exactly does a line break differ from \n in a variable?

Question

My apologies: I must have made some mistakes when performing the initial tests. After putting everything into a single script, the xxd output does indeed always match the stdout output.

I'm updating my question and leaving the original (but wrong) question below. The entire script is here: https://pastebin.pl/view/454913ec

The output of the script that I'm getting is the following:

$ ./test.sh
# Case 1A: echo -n $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 1B: echo -n -e $TEST1
hello world
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 1C: echo -n "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 1D: echo -n -e "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 1E: printf "%s" $TEST1
helloworld
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld
Case 1F: $ printf "%s" "$TEST1"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
--------------------------------
Case 2A: $ echo -n $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld
Case 2B: echo -n -e $TEST2
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 2C: echo -n "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld
Case 2D: echo -n -e "$TEST2"
hello
world
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 2E: printf "%s" $TEST2
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld
Case 2F: printf "%s" "$TEST2"
hello\nworld
00000000: 6865 6c6c 6f5c 6e77 6f72 6c64            hello\nworld

So the xxd output at least is the same for the same stdout output. Again, apologies for that!

So the remaining questions for me are:

Why does Case 1E result in the output helloworld?
Which byte sequences are REALLY contained in TEST1 and TEST2, and what is the proper, portable way to figure that out?
How can I make printf interpret the type of newline encoded in TEST2?
Is the following assignment portable (in the sense that it will always result in the same binary content in the variables)?

$ TEST1="hello
> world"
$ TEST2="hello\nworld"

In another question I read that the locale only applies at expansion time, so that should mean the assignment is portable, right?

Original (but wrong) question:

I performed the following tests using git bash:

$ TEST1="hello
> world"
$ TEST2="hello\nworld"
Case 1A:
$ echo -n $TEST1
hello world
$ echo -n $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 1B:
$ echo -n -e $TEST1
hello world
$ echo -n -e $TEST1 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 1C:
$ echo -n "$TEST1"
hello
world
$ echo -n "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 1D:
$ echo -n -e "$TEST1"
hello
world
$ echo -n -e "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 1E:
$ printf "%s" $TEST1
helloworld
$ printf "%s" $TEST1 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld
Case 1F:
$ printf "%s" "$TEST1"
hello
world
$ printf "%s" "$TEST1" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
$
--------------------------------
Case 2A:
$ echo -n $TEST2
hello\nworld
$ echo -n $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 2B:
$ echo -n -e $TEST2
hello
world
$ echo -n -e $TEST2 | xxd
00000000: 6865 6c6c 6f20 776f 726c 64              hello world
Case 2C:
$ echo -n "$TEST2"
hello\nworld
$ echo -n "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 2D:
$ echo -n -e "$TEST2"
hello
world
$ echo -n -e "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
Case 2E:
$ printf "%s" $TEST2
hello\nworld
$ printf "%s" $TEST2 | xxd
00000000: 6865 6c6c 6f77 6f72 6c64                 helloworld
Case 2F:
$ printf "%s" "$TEST2"
hello\nworld
$ printf "%s" "$TEST2" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 64              hello.world
$

First: I find this frustrating. Also, I wish I could add some custom colors to code blocks on Stack Overflow to visualize the problem better (for example, coloring equal outputs in equal colors).

Second: with that off my chest, can someone help me make sense of these outputs by explaining the base rules that affect these results?

So, some things that confuse me for example:

Even though the printed stdout outputs are different for TEST1 and TEST2 (for example Case 1A results in different output than case 2A), it seems that the actual bytes that xxd receives as input are identical in all respective TEST1 and TEST2 cases (with all respective cases I mean Case 1x has always the same xxd output as Case 2x, even though the corresponding stdout outputs of the same command are not equal). How is that possible?
Obviously the contents of TEST1 and TEST2 must differ somehow, otherwise it wouldn't be possible that echoing/printing them could result in different stdout outputs. So, how can I properly output the ACTUAL bits (as hex or whatever, doesn't matter so long as it's a clear representation of the actual variable content) contained in those variables?
The TEST1 cases would indicate that xxd receives a 0A newline ASCII character exactly when the printout also shows a line break. However, in the TEST2 cases, Case 2B prints a line break but doesn't result in a 0A character, while Case 2F does not print a line break yet does result in a 0A character.

I kiiinda get that it seems the linebreak is differently encoded in the TEST1 and TEST2 variables, and that when echoing double-quoting seems to expand (is that the right terminology?) the kind of linebreak contained in TEST1, while the -e flag to echo seems to interpret the kind of linebreak encoded in TEST2, but that doesn't explain the xxd outputs as well as the printf cases.

Why does Case 1E result in the following?
```
 $ printf "%s" $TEST1
 helloworld
```
How can printf be made to apply the kind of linebreaks encoded in the TEST2 variable?
What should be the most important lesson learned here?

notes: I refrained from adding

$ TEST3="hello\n
world"

to keep the question short.

I also tested using single quotes ' ' instead of double quotes " " when defining the variables, which does not seem to affect the results.

score 1 · Answer 1 · answered Mar 01 '21 at 15:52

Even though the printed stdout outputs are different for TEST1 and TEST2 (for example Case 1A results in different output than case 2A), it seems that the actual bytes that xxd receives as input are identical in all respective TEST1 and TEST2 cases (with all respective cases I mean Case 1x has always the same xxd output as Case 2x, even though the corresponding stdout outputs of the same command are not equal). How is that possible?

They're not identical. I cannot reproduce your results using either Bash on Linux or Git's MSYS Bash on Windows.

when echoing double-quoting seems to expand (is that the right terminology?) the kind of linebreak contained in TEST1

If you quote a variable expansion, its value will remain as-is. If you don't quote a variable expansion, its value will be split into multiple parameters at whitespace (more precisely, at the characters in IFS, which by default are space, tab, and newline). This is done by the shell itself, and happens regardless of which command you're using.

(Exception: expansions done as part of string variable assignments aren't split. For example, foo=$TEST1 will preserve the original value.

However, expansions done as part of array assignments are split. For example, foo=($TEST1) will result in a two-element array containing hello and world.)

Later on, when the echo command receives multiple arguments, it always joins them using a single space.
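The splitting and the joining can both be observed directly. The following is a minimal sketch, assuming the default IFS; count_args is a hypothetical helper introduced only for illustration and is not part of the question's script.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

TEST1="hello
world"

unquoted=$(count_args $TEST1)     # the shell splits the value at the newline
quoted=$(count_args "$TEST1")     # the value is passed as a single argument
joined=$(echo $TEST1)             # echo joins its two arguments with a space

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 2, quoted: 1
echo "joined: $joined"                        # joined: hello world
```

The newline never reaches echo in the unquoted case: it is consumed by the shell as a separator, which is why Case 1A shows byte 20 where Case 1C shows byte 0a.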

Obviously the contents of TEST1 and TEST2 must differ somehow, otherwise it wouldn't be possible that echoing/printing them could result in different stdout outputs. So, how can I properly output the ACTUAL bits (as hex or whatever, doesn't matter so long as it's a clear representation of the actual variable content) contained in those variables?

Use typeset -p TEST1 or declare -p TEST2. (I think Ksh/Zsh prefer typeset and Bash prefers declare; both do the same thing.)

Using printf %s "$TEST1" works for strings, though the above two also handle arrays. You can also use the %q expansion which will backslash-escape any special characters in the printed value (using $''-style quoting, which can then be used in a shell script again).

> printf %q "$TEST1"
$'hello\nworld'
> printf %q "$TEST2"
hello\nworld
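If the raw bytes are wanted rather than a re-quotable string, a hex dump of the quoted expansion is another option. This is a sketch only: hex_of is a hypothetical helper, and it uses od (specified by POSIX) where the question used xxd.

```shell
# hex_of is a hypothetical helper: it prints the bytes of its argument as hex.
hex_of() {
    # Quoting "$1" keeps the value intact; tr strips the spacing od adds.
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

TEST1="hello
world"
# Single quotes here: the backslash and the n stay two literal characters.
TEST2='hello\nworld'

echo "TEST1: $(hex_of "$TEST1") (${#TEST1} characters)"
echo "TEST2: $(hex_of "$TEST2") (${#TEST2} characters)"
```

TEST1 comes out as 11 bytes with a single 0a between the words, while TEST2 comes out as 12 bytes with 5c 6e (a backslash and the letter n) in that position, matching the xxd lines for Cases 1F and 2F in the updated question.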

why does Case 1E result in helloworld

As mentioned before, an unquoted variable expansion causes its value to be split at whitespace and provided as multiple parameters. So the command in Case 1E is equivalent to:

printf "%s" "hello" "world"

and while it might seem nonsensical in most other languages carrying printf(), the printf command in Bash will repeat the format until it completely runs out of arguments, meaning that the above is actually equivalent to:

printf %s "hello"
printf %s "world"
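The format reuse is easier to see with a format string that adds visible delimiters. A small sketch with made-up arguments:

```shell
# The format is applied once per remaining argument.
out=$(printf '[%s]' hello world)
echo "$out"      # [hello][world]

# With two conversions in the format, arguments are consumed in pairs.
pairs=$(printf '%s=%s;' a 1 b 2)
echo "$pairs"    # a=1;b=2;
```

Since the format "%s" contains no separator of its own, the two words in Case 1E end up directly adjacent.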

How can printf be made to apply the kind of linebreaks encoded in the TEST2 variable?

The %b expansion works like %s but additionally expands the backslash-escapes found in the argument.

$ printf %b 'Hello\t,\nworld\t!'
Hello   ,
world   !
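Applied to the TEST2 value from the question, the difference between %s and %b can be checked against a string that holds a real line break. This is a sketch; the variable names as_s, as_b, and expected are introduced here for illustration.

```shell
TEST2='hello\nworld'

# %s prints the argument verbatim: backslash and n remain two characters.
as_s=$(printf '%s' "$TEST2")

# %b expands backslash escapes in the argument, so \n becomes byte 0a.
as_b=$(printf '%b' "$TEST2")

# For comparison, a value that holds a real line break.
expected="hello
world"

printf 'with %%s: %s\n' "$as_s"
printf 'with %%b: %s\n' "$as_b"
if [ "$as_b" = "$expected" ]; then
    printf 'the %%b result equals the string with a real line break\n'
fi
```

The results are printed with printf rather than echo because some echo implementations expand backslash escapes themselves, which would hide the difference.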

What should be the most important lesson learned here?

~~Don't write shell scripts.~~

Quote variables in shell scripts, unless you know exactly when not to.

score 0 · Answer 2 · answered Mar 01 '21 at 16:00

I'm not sure that this accounts for all the differences, but I believe that the difference is that TEST1 contains a carriage-return (\r) and not a newline (\n).

In addition, this carriage return is part of the string as a literal character and needs no interpretation to be emitted.

You can see the differences with the following commands:

$ echo $TEST1 | od -w32 -t x1c
0000000  68  65  6c  6c  6f  20  3e  20  77  6f  72  6c  64  0a
          h   e   l   l   o       >       w   o   r   l   d  \n
$ echo $TEST2 | od -w32 -t x1c
0000000  68  65  6c  6c  6f  5c  6e  77  6f  72  6c  64  0a
          h   e   l   l   o   \   n   w   o   r   l   d  \n

One should also remember that \r and \n are interpreted by the terminal, not by Bash. This means that mixing up their handling by Bash and by the terminal can produce different results depending on the order in which the operations are done.
