I copied a big folder of CSVs from Windows to Linux using scp. Once the copy completed, I ran du -sh . to make sure everything had been copied, and the disk usage was much smaller than expected. Many of the individual CSV files take up much less space than I expect, so I'll focus on a single file below. These are all regular text CSV files, and I don't expect any NUL characters in them. The files are stored on an NFS-mounted partition on the Linux system.
This is on CentOS 7.9.2009.
If I use sha256sum to compare the Windows and Linux files, I get the same checksum, so the files were not corrupted while being copied.
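For anyone who wants to repeat this comparison with the same tool on both machines, here is a minimal sketch, assuming Python 3.6+ is installed on both sides (the function name sha256_of is just illustrative). It reads the file in binary mode and in chunks, so it hashes exactly the bytes on disk and its digest should match what sha256sum prints for the same file:

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Binary mode: no newline translation, so Windows and Linux
        # hash exactly the same bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for p in sys.argv[1:]:
        # Same "<digest>  <name>" layout that sha256sum prints.
        print(f"{sha256_of(p)}  {p}")
```

Chunked reads keep memory use flat, which matters for multi-GB files like the one below.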
For a single CSV:
$ du *
1897800 big_csv_file.txt
$ du --apparent-size *
2792499 big_csv_file.txt
$ find . -type f -printf '%S:%p\n'
0.679606:./big_csv_file.txt
$ du --apparent-size -b *
2859518674 big_csv_file.txt
The byte count from du --apparent-size -b is the same as the size reported by ls -l, of course. To confirm that the file contains no NUL bytes, and therefore can't be sparse (any holes would read back as NULs):
$ grep -Pa '\x00' big_csv_file.txt
$ echo $?
1
The grep version is "grep (GNU grep) 2.20" in case it matters.
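For reference, the numbers above agree with each other, assuming du's default 1024-byte units and GNU find's definition of %S as 512 × st_blocks / st_size:

$$
\begin{aligned}
\text{allocated} &= 1\,897\,800 \times 1024 = 1\,943\,347\,200 \text{ bytes} \quad (\mathrm{st\_blocks} = 3\,795\,600) \\
\text{apparent} &= 2\,859\,518\,674 \text{ bytes}, \qquad \left\lceil \tfrac{2\,859\,518\,674}{1024} \right\rceil = 2\,792\,499 \\
\%S &= \frac{512 \times \mathrm{st\_blocks}}{\mathrm{st\_size}} = \frac{1\,943\,347\,200}{2\,859\,518\,674} \approx 0.679606
\end{aligned}
$$

So st_blocks accounts for 916,171,474 bytes (about 32%) less than the file's length.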
How can the file use only about two thirds of its apparent size on disk when it isn't sparse?
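If it helps anyone reproduce this without depending on grep -P, here is a rough sketch that repeats the same checks, assuming Python 3 (CentOS 7 ships Python 2.7 by default, so python3 may need installing). The name check_file is hypothetical. It compares st_blocks × 512 with st_size, asks the kernel for the first hole via lseek(SEEK_HOLE), and scans for NUL bytes in chunks. On NFS the SEEK_HOLE answer is probably only informative if the client and server support hole reporting; otherwise it will generally just return the file size.

```python
import os
import sys


def check_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compare allocation with size, ask for the first hole, and look for NULs."""
    st = os.stat(path)
    # On Linux st_blocks counts 512-byte units, the basis GNU find uses for %S.
    allocated = st.st_blocks * 512
    result = {
        "size": st.st_size,
        "allocated": allocated,
        "sparseness": allocated / st.st_size if st.st_size else 1.0,
        "first_hole": None,  # stays None if hole detection is unavailable
        "first_nul": -1,     # -1 means no NUL byte was found
    }

    seek_hole = getattr(os, "SEEK_HOLE", None)
    if seek_hole is not None and st.st_size > 0:
        fd = os.open(path, os.O_RDONLY)
        try:
            # Offset of the first hole at or after 0. A file with no holes
            # reports its own size (EOF counts as an implicit hole), and so,
            # generally, does a filesystem that cannot report holes.
            result["first_hole"] = os.lseek(fd, 0, seek_hole)
        except OSError:
            pass  # e.g. EINVAL where SEEK_HOLE is not supported
        finally:
            os.close(fd)

    with open(path, "rb") as f:
        offset = 0
        # Chunked reads keep memory flat even for multi-GB files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            i = chunk.find(b"\x00")
            if i != -1:
                result["first_nul"] = offset + i
                break
            offset += len(chunk)
    return result


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, check_file(p))
```

If st_blocks really is as low as du reports, the sparseness value should come out near the same 0.68 that find printed; a first_hole equal to the size and a first_nul of -1 would agree with the grep result.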