
I am having an issue with a bash script that is echoing output in an unexpected order. The script is as follows. The problem is with the output of lines 30-32.


1 IFS=$'\n'
2 i=1
3 bluered=""
4 blueyellow=""
5 redyellow=""
6 all=""
7 while [ $i -le `cat sorted.csv | wc -l` ]
8       do
9           for j in {0..2}
10               do
11                     # cat sorted.csv | head -$i | tail -1 | awk -F',' '{print $1}'
12                     declare "`cat sorted.csv | head -$i | tail -1 | awk -F',' '{print $1}'`=`cat sorted.csv | head -$i | tail -1 | awk -F','   '{print $5}'`"
13                     i=$((i+1))
14               done
15
16           if [[ ${blue} == ${red} ]]; then bluered=1; else bluered=0; fi
17           if [[ ${blue} == ${yellow} ]]; then blueyellow=1; else blueyellow=0; fi
18           if [[ ${red} == ${yellow} ]]; then redyellow=1; else redyellow=0; fi
19           if [[ ${blue} == ${red} ]] && [[ ${red} == ${yellow} ]]; then all=1; else all=0; fi
20
21           echo "`cat sorted.csv | head -$((i-3)) | tail -1`"
22           echo ",$all,$bluered,$blueyellow,$redyellow"
23           echo "`cat sorted.csv | head -$((i-2)) | tail -1`"
24           echo ",$all,$bluered,$blueyellow,$redyellow"
25           echo "`cat sorted.csv | head -$((i-1)) | tail -1`"
26           echo ",""$all"",""$bluered"",""$blueyellow"",""$redyellow"
27
28
29
30           echo  "`cat sorted.csv | head -$((i-3)) | tail -1`,$all,$bluered,$blueyellow,$redyellow"
31           echo  "`cat sorted.csv | head -$((i-2)) | tail -1`"",$all,$bluered,$blueyellow,$redyellow"
32           echo  "`cat sorted.csv | head -$((i-1)) | tail -1`"",""$all"",""$bluered"",""$blueyellow"",""$redyellow"
33       done

Lines 30-32 have slightly different double-quote formatting because I was trying different things to get them to work correctly. Lines 21-26 are nothing more than lines 30-32 decomposed into two parts (i.e. lines 21-22 together are the same as line 30).

Based on the input file, "sorted.csv," the correct output of lines 30-32 (for the first 3 lines of the input file) should be:

blue,1,WCC131035882,0,e89d89d7ca7c502ca8d3b2e0d7c4980dba346a63d57a437d8f1428065fb83e9f,0,0,0,1
red,1,Z292V5DB,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68,0,0,0,2
yellow,1,Z292V94K,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68,0,0,0,1

but the actual output is:

,0,0,0,1CC131035882,0,e89d89d7ca7c502ca8d3b2e0d7c4980dba346a63d57a437d8f1428065fb83e9f #(line 30 output)
,0,0,0,192V5DB,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68 #(line 31 output)
,0,0,0,1,Z292V94K,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68 #(line 32 output)

Lines 21 - 26 return the following output:

blue,1,WCC131035882,0,e89d89d7ca7c502ca8d3b2e0d7c4980dba346a63d57a437d8f1428065fb83e9f #(line 21 output)
,0,0,0,1 #(line 22 output)
red,1,Z292V5DB,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68 #(line 23 output)
,0,0,0,1 #(line 24 output)
yellow,1,Z292V94K,0,68a4917c878f1b26e370264097f476840aa995dc6b8d6d2e552a78a6bdd77c68 #(line 25 output)
,0,0,0,1 #(line 26 output)

In short, I want to concatenate the output from lines 21-22, 23-24, and 25-26 using three single-line commands like lines 30-32 (but with correct syntax). Note that lines 21-26 are only included in the script to demonstrate that the two parts of line 30 (or 31 or 32) work correctly when separated into two lines. Presently, line 30 effectively concatenates the output of line 22 and then line 21, instead of 21 and then 22. In doing this reverse concatenation, it also truncates the first 8 characters of the output of line 21 (note that the output of line 22 is exactly 8 characters).

How do I correctly write lines 30-32 so they create the desired output?

Thanks in advance for your assistance.

Brian

2 Answers


More insight than a laconic "use dos2unix".

Apparently sorted.csv uses CR+LF line endings while it should use LF only.

When you use `something`, line feeds (LF) at the very end of the output of something are stripped, but carriage returns (CR) are not. In your case text+CR+LF became text+CR. If that is the sole argument to echo, the tool appends a newline as usual and you again have CR+LF at the end. When printed to the console, this CR changes nothing.

But in case of echo "`foo`bar" the CR character returned by foo stays in the middle of the resulting string, so anything that follows gets printed from the left edge of the console, overwriting the previous part.
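The effect can be reproduced without the file. This is a minimal sketch, assuming bash; the variable names (captured, joined, visible) and the SUFFIX text are made up for the demonstration. It captures a simulated CRLF-terminated line and makes the surviving CR visible:

```bash
#!/bin/bash
cr=$'\r'

# Simulate one line of a CRLF file: "text" followed by CR and LF.
captured="$(printf 'text\r\n')"

# Command substitution removed the trailing LF but kept the CR,
# so the captured string is 5 characters long, not 4.
printf 'length: %s\n' "${#captured}"

# Concatenate as in echo "`foo`bar", then replace the hidden CR
# with a visible marker instead of letting the terminal act on it.
joined="${captured},SUFFIX"
visible="${joined//$cr/<CR>}"
printf '%s\n' "$visible"
```

Printing "$joined" directly to a terminal would show ,SUFFIX at the left edge, overwriting the start of text, which is the same symptom as in the question.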

The solution is to use dos2unix sorted.csv, as you have already noted.
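If converting the file is not an option, the CR can also be removed inside the script. The following is only a sketch, assuming bash; the sample line and the names line and clean are illustrative stand-ins, not part of the original script:

```bash
#!/bin/bash
cr=$'\r'

# Stand-in for one CRLF-terminated line read from sorted.csv.
line="$(printf 'blue,1,WCC131035882\r\n')"

# Remove a single trailing CR if there is one; lines without CR are unchanged.
clean="${line%$cr}"

# The suffix now lands at the end of the line instead of overwriting its start.
printf '%s,%s\n' "$clean" "0,0,0,1"
```

Another way, again without touching the file, is to filter the whole stream with tr -d '\r' <sorted.csv before the lines are processed.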


But there's more:

And maybe

  • What is wrong with echo $(stuff) or echo `stuff`?

    I admit this objection is questionable here. "`foo`bar" is useful to strip any trailing LF from the output of foo and concatenate it with bar; in your case it's important. Then syntax like echo "`foo`" was used to demonstrate the relevant difference in behavior, you explicitly stated this.


dos2unix certainly doesn't answer the question, which is [emphasis mine]:

How do I correctly write lines 30-32 so they create the desired output?

A correctly written line 32 would at least use printf and no cat. Backticks and excessive quotes may stay, although the code is more readable with $( … ) and reasonable quoting. You may also consider separating format from data; it's easy with printf:

printf '%s,%s,%s,%s,%s\n' "$( <sorted.csv head -$((i-1)) | tail -1 )" "$all" "$bluered" "$blueyellow" "$redyellow"

In addition there are poor solutions in other places:

  • while [ $i -le `cat sorted.csv | wc -l` ] (aside from cat and backticks). There's no need to obtain the number of lines with each iteration. wc -l sorted.csv should be run once before the loop, its result stored in a variable (unless you expect the number of lines to change during execution, but I think you don't; the logic of the script makes more sense if the number doesn't change).

  • You read the file anew again and again. The more lines there are, the more times you reopen, read from the beginning and pick a single line each time. The flow should be redesigned so the file is parsed line by line, possibly without the prior wc -l, possibly in a pipe-like manner (i.e. opened just once, read just once without skipping back). Since you use the declare builtin, while IFS= read -r … ; done <sorted.csv is probably a must (maybe with preliminary awk, e.g. <sorted.csv awk … | while IFS= read -r …). You can read three times to three different variables, then perform operations; then read the next three lines. read itself is inefficient, still it's more elegant than reopening the file many times. Note your script becomes less efficient with every additional line in the file; if the file is large enough for inefficiency of read to manifest itself, your approach will probably perform even worse.

    This whole change, if you go for it, won't be trivial though.

  • The name of the file shouldn't be hardcoded in so many places. Reading the file just once would naturally reduce this problem. Even then it's good to start with input_file="sorted.csv" and use "$input_file" wherever you need. If you ever decide to pass the file path as a command line argument, it will be as easy as typing input_file="$1".

  • Why is there no shebang?
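The first and third points can be sketched like this. It is an illustration only, assuming bash; count_groups is a hypothetical helper that just counts complete 3-line groups, so the loop body is a placeholder for the real work:

```bash
#!/bin/bash
# Hypothetical helper: count the complete 3-line groups in the file given as $1.
count_groups() {
    local input_file="$1"
    local total_lines i=1 groups=0
    # Run wc once, before the loop. Reading from stdin keeps the file name
    # out of the output; $(( )) normalizes any padding some wc versions add.
    total_lines=$(( $(wc -l <"$input_file") ))
    while [ $((i + 2)) -le "$total_lines" ]; do
        groups=$((groups + 1))   # placeholder for the per-group processing
        i=$((i + 3))
    done
    printf '%s\n' "$groups"
}

# The file name lives in one place; a command line argument overrides it.
input_file="${1:-sorted.csv}"
if [ -r "$input_file" ]; then
    count_groups "$input_file"
fi
```

This still leaves the repeated head | tail reads inside the loop, which the second point addresses.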

Considering the big picture, the more correct lines 30-32 would be like in this snippet:

#!/bin/bash
input_file="sorted.csv"
…
while IFS= read -r pre_previous_line && IFS= read -r previous_line && IFS= read -r current_line; do
   …
   # useful bashism instead of    echo "$previous_line" | some_command
   some_command <<< "$previous_line"
   …
   # printf will reuse the format if there are more arguments than the format needs
   # so this one line will be enough for your three
   # (split for readability, it's still one line for the shell)
   printf '%s,%s,%s,%s,%s\n' \
      "$pre_previous_line" "$all" "$bluered" "$blueyellow" "$redyellow" \
      "$previous_line"     "$all" "$bluered" "$blueyellow" "$redyellow" \
      "$current_line"      "$all" "$bluered" "$blueyellow" "$redyellow"
   …
done <"$input_file"
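For completeness, here is one way the elided parts could be filled in. Treat it as a sketch under assumptions rather than the definitive script: annotate_groups is a hypothetical function name, field 1 is taken to be the colour and field 5 the hash (as in the awk calls of the question), a case statement replaces declare, and a trailing CR is stripped so CRLF input also works.

```bash
#!/bin/bash
# Hypothetical function: annotate each 3-line group of the file given as $1.
annotate_groups() {
    local cr=$'\r' l1 l2 l3 line name hash blue red yellow
    local all bluered blueyellow redyellow
    while IFS= read -r l1 && IFS= read -r l2 && IFS= read -r l3; do
        # Drop a trailing CR so CRLF input behaves like LF input.
        l1=${l1%$cr}; l2=${l2%$cr}; l3=${l3%$cr}
        blue=""; red=""; yellow=""
        for line in "$l1" "$l2" "$l3"; do
            # Field 1 is the colour name, field 5 the hash; the rest is ignored.
            IFS=, read -r name _ _ _ hash _ <<<"$line"
            case $name in
                blue)   blue=$hash ;;
                red)    red=$hash ;;
                yellow) yellow=$hash ;;
            esac
        done
        bluered=0; blueyellow=0; redyellow=0; all=0
        [[ $blue == "$red" ]]    && bluered=1
        [[ $blue == "$yellow" ]] && blueyellow=1
        [[ $red == "$yellow" ]]  && redyellow=1
        (( bluered && blueyellow )) && all=1
        # One printf call; the format is reused for each group of five arguments.
        printf '%s,%s,%s,%s,%s\n' \
            "$l1" "$all" "$bluered" "$blueyellow" "$redyellow" \
            "$l2" "$all" "$bluered" "$blueyellow" "$redyellow" \
            "$l3" "$all" "$bluered" "$blueyellow" "$redyellow"
    done <"$1"
}
```

It would be called as annotate_groups "$input_file". The file is opened once and read sequentially, and a final group with fewer than three lines is silently skipped.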

Probably sole awk working as a filter could do this; the file would be piped to it. In awk you can store information in variables too, to use it later. Conditionals are also available. Something like this poorly tested example:

#!/usr/bin/awk -f
BEGIN { FS="," }
{
if ($1 == "blue") blue=$5
if ($1 == "red") red=$5
if ($1 == "yellow") yellow=$5
if (NR%3 == 1) prepre=$0
if (NR%3 == 2) pre=$0
if (NR%3 == 0)
   {
   bluered=(blue == red)
   blueyellow=(blue == yellow)
   redyellow=(red == yellow)
   all= bluered * blueyellow
   suffix=","all","bluered","blueyellow","redyellow
   print prepre suffix
   print pre    suffix
   print $0     suffix
   }
}

(Note: the awk I tested with is in fact mawk.) Save it, make it executable, and pipe your sorted.csv to it (e.g. <sorted.csv ./the_script).


The comment about carriage returns being in the file was dead on. After running the input file through dos2unix, the script ran as expected. Thanks, Gordon.

Brian