
I copied a big folder of CSVs from Windows to Linux using scp. Once the copy completed, I ran du -sh . to make sure everything had copied, and the disk usage is much smaller than expected. Many of the individual CSV files take up much less space than I expect, so I'll focus on a single file below. These are all regular text CSV files, and I don't expect there to be any NUL characters in them. The file is stored on an NFS-mounted partition on the Linux system.

This is on CentOS 7.9.2009.

If I use sha256sum to compare the Windows and Linux files, I get the same checksum, so the files were not corrupted while being copied.
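
For reference, the comparison was along these lines (a sketch; Get-FileHash is the stock PowerShell way to get a SHA-256 on the Windows side):

# on Linux
$ sha256sum big_csv_file.txt
# on Windows, in PowerShell
PS> Get-FileHash -Algorithm SHA256 .\big_csv_file.txt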

For one single CSV:

$ du *
1897800 big_csv_file.txt
$ du --apparent-size *
2792499 big_csv_file.txt
$ find . -type f -printf '%S:%p\n'
0.679606:./big_csv_file.txt
$ du --apparent-size -b *
2859518674      big_csv_file.txt
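
As an independent cross-check of du's two numbers, GNU stat can print the allocated block count and the apparent size in one go (a sketch; the expected values in the comments are derived from the du output above):

$ stat -c 'allocated=%b blocks of %B bytes, apparent=%s bytes' big_csv_file.txt
# expect roughly allocated=3795600 blocks of 512 bytes (= du's 1897800 KiB)
# and apparent=2859518674 bytes (= du --apparent-size -b)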

The du --apparent-size -b value is of course the same size as reported by ls -l. It is also consistent with find's sparseness ratio: 1897800 KiB × 1024 / 2859518674 B ≈ 0.6796, i.e. the file occupies only about 68% of its logical size on disk. Just to verify that there are no NUL characters (a sparse file's holes read back as NULs, so no NULs means the file can't be sparse):

$ grep -Pa '\x00' big_csv_file.txt
$ echo $?
1

The grep version is "grep (GNU grep) 2.20" in case it matters.
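
As a cross-check that doesn't depend on that old grep build, the NUL bytes can be counted directly (a minimal sketch; it should print 0 if the file really contains none):

$ tr -cd '\0' < big_csv_file.txt | wc -c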

How is this possible?

Eddie

1 Answer


Files can be transparently compressed at the filesystem level, on both Linux and Windows. For example, the ZFS filesystem is commonly used with compression enabled. (The compression would be happening on the NFS server, not on your CentOS client, which is why nothing on the client suggests it.)
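
If you have access to the NFS server and it does run ZFS, this is easy to confirm there (a sketch; the dataset name tank/export is hypothetical):

# run on the NFS server, not the CentOS client
$ zfs get compression,compressratio tank/export

Given the numbers in your question, compressratio should come out around 1/0.68 ≈ 1.47x.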

grawity