12

I know we can use dos2unix to convert between Windows and Unix line termination. I am wondering if there is any command that can tell me if a file has Windows or Unix line termination?

phuclv
Oliver

4 Answers

14
$ file f1 f2 f3
f1: ASCII text, with CRLF, LF line terminators
f2: ASCII text, with CRLF line terminators
f3: ASCII text

If you feel it necessary to check every line in the file, you can do this:

$ grep -c "^M" f1 f2
f1:0
f2:3

$ wc -l f1 f2
 3 f1
 3 f2
 6 total

The "^M" was entered using Ctrl+V Ctrl+M and is the ASCII carriage-return (CR) character.

Here we see that file f1 has three lines but no CRs, so all of its line endings must be Unix-style lone LFs.

File f2 has equal numbers of lines and CRs, so it is reasonable to guess that it uses the CR+LF line endings of MS-DOS and Windows.
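The comparison above can be wrapped in a small function. This is only a sketch in POSIX shell: the name `eol_kind` and its output labels are made up for illustration, and it counts lines that end in CR rather than every CR in the file.

```shell
# Hypothetical helper (not part of the answer above): classify a file's line
# endings by comparing the LF count with the number of lines that end in CR.
eol_kind() {
    cr=$(printf '\r')                          # a literal carriage return
    lines=$(( $(wc -l < "$1") ))               # number of LF characters
    crlf=$(grep -c "${cr}\$" < "$1" || true)   # lines whose last character is CR
    if [ "$lines" -eq 0 ]; then
        echo "no line endings"
    elif [ "$crlf" -eq 0 ]; then
        echo "LF"
    elif [ "$crlf" -eq "$lines" ]; then
        echo "CRLF"
    else
        echo "mixed"
    fi
}
```

Calling `eol_kind f2` would then print a single label instead of two counts to compare by eye.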

phuclv
2

On Windows, a quick way to tell is to open your file in Notepad. Older versions of Notepad (before Windows 10 version 1809) show line breaks only for Windows-style terminators (CR+LF), not Unix ones (LF). So your Unix text will look like this:

Line1Line2Line3Line4

whereas Windows text will look like this:

line1
line2
line3
line4

I'm not very familiar with Unix/Linux platforms, but I'm sure you can use similar hacks with programs like gedit or emacs.
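On a Unix-like system, a comparable visual check can be done without an editor by dumping the first few lines with `od -c`, which spells out CR as `\r` and LF as `\n`. The sketch below assumes a POSIX shell; the helper name `show_eol` and its default of three lines are invented for illustration.

```shell
# Hypothetical helper: print the first N lines (default 3) of a file with
# control characters written out, so CRLF endings appear as "\r  \n".
show_eol() {
    head -n "${2:-3}" "$1" | od -c
}
```

If `\r` appears before each `\n` in the dump, the file uses Windows-style endings; if only `\n` appears, it uses Unix-style endings.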

Prahlad Yeri
0

PowerShell is built into Windows and is available for all other major platforms, so you can use it to detect the format like this:

('LF', 'CRLF')[([regex]::Matches($(gc -Ra path\to\file.txt), "\r?\n") | group -P Length).Group[0].Value.Length - 1]

If you want it to work for files with mixed CRLF and LF endings, you need the more complete solution below:

$content = Get-Content -Raw path\to\file.txt
[regex]::Matches($content, "\r?\n") | Group-Object -Property Length `
    | Tee-Object -Variable newlines
if ($newlines.Length -eq 2) {
    echo "Mixed CRLF"
} else {
    if ($newlines[0].Group[0].Value.Length -eq 2) {
        echo "CRLF"
    } else {
        echo "LF"
    }
}

Also note that I'm assuming the file contains only CRLF and LF endings, as Git does. To make it work for CR-only files you'll need some small changes.

Another solution:

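# -Encoding Byte works in Windows PowerShell 5.1; PowerShell 6+ uses -AsByteStream instead
$content = Get-Content -Raw -Encoding Byte .\path\to\file.txt
$cr = 0; $lf = 0
# Byte 10 is LF, byte 13 is CR
foreach ($c in $content) { if ($c -eq 10) { $lf++ } elseif ($c -eq 13) { $cr++ } }
echo "CR = $cr, LF = $lf"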
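
For comparison, the same byte counting can be sketched in a POSIX shell with `tr` and `wc`. The function name `count_cr_lf` is hypothetical, and the output format simply mirrors the PowerShell snippet above.

```shell
# Hypothetical helper: count CR (byte 13) and LF (byte 10) characters in a file.
count_cr_lf() {
    # tr -cd deletes every byte except the one listed; wc -c counts what is left.
    cr_count=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf_count=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    echo "CR = $cr_count, LF = $lf_count"
}
```

Equal, nonzero counts suggest CRLF endings; a CR count of zero suggests plain LF.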
phuclv
0
# Reads the file on stdin; c[0] is the number of LFs, c[1] the number of CRs.
c=($(perl -0777ne 'print $_ =~ tr/\n//; print " ";
                    print $_ =~ tr/\r//;'))
if   ((!(c[0] +   c[1]))) ;then echo no line endings
elif ((  c[0] && !c[1] )) ;then echo LF
elif (( !c[0] &&  c[1] )) ;then echo CR
elif ((  c[0] ==  c[1] )) ;then echo CRLF
else echo "ambiguous LF ${c[0]} CR ${c[1]}"
fi

Note that, for speed's sake, only individual \r and \n characters are counted, but it would be a pretty wacky file that had an equal number of both and yet was not a Windows CRLF file...

Also note that the *nix tool file does not do a complete scan of the file, whereas this perl script does. You haven't mentioned which platform you want to run it on; I have used a bash script to test perl's output, but that can be changed to a Windows cmd script.

You can just pipe your file to it.

beatcracker
Peter.O