
I am trying to find an efficient way to rsync the contents of an ext4 file system as part of a regular backup, while also getting decent compression and using as little space as possible.

I could just use plain rsync and then tar/gzip the resulting directory, but the compression step would be orders of magnitude slower than the preceding rsync.

I cannot use squashfs and its likes because they are read-only.

I could create a dedicated partition for this backup with a file system that has built-in compression, such as btrfs or reiser4, but I would have to create it with a fixed size, so it would not scale.

Is there any kind of container with built-in compression that transparently and automatically adjusts its size to the volume of data rsynced into it?

By the way, I use Debian GNU/Linux.

Neurotransmitter

2 Answers


@Tetsujin pointed me in the right direction: OS X's sparse bundles/images do have an analog in Linux, namely sparse files.

Sparse files grow as the data in them grows. They can contain any Linux filesystem, including any modern variants with in-built compression, such as btrfs.
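
As a small illustration of that behaviour, the sketch below creates a sparse file and compares its apparent size with the space it really occupies, before and after some data is written. The file name and the 1 GiB size are arbitrary choices for the example, and it assumes GNU coreutils (truncate, stat, dd).

```shell
#!/bin/sh
set -eu

# Throwaway directory; removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
img="$workdir/demo.sparse"   # hypothetical file name

# Bytes actually allocated on disk: block count * block unit.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# Create a 1 GiB file without writing any data to it.
truncate -s 1G "$img"
apparent=$(stat -c %s "$img")        # size as seen by applications
allocated=$(allocated_bytes "$img")  # space really used
echo "empty:   apparent=$apparent allocated=$allocated"

# Write 1 MiB of data at the start; only now is space allocated for it.
dd if=/dev/urandom of="$img" bs=1M count=1 conv=notrunc 2>/dev/null
sync
allocated_after=$(allocated_bytes "$img")
echo "written: apparent=$(stat -c %s "$img") allocated=$allocated_after"
```

The apparent size stays at 1 GiB throughout, while the allocated figure should only reflect what has actually been written.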

The following shows how to create a sparse, compressed btrfs image. btrfs support in Debian and its derivatives (such as Ubuntu) can be enabled by installing the btrfs-tools package (sudo apt-get install btrfs-tools). I have added a sparse ext4 image as well to compare speed and size. All operations were performed on Debian 7.8 Wheezy (oldstable as of 30 April 2015).

  1. Create empty sparse files of any size, for example 5 terabytes:

     me@wheezy:~$ truncate -s 5T ext4.sparse btrfs.sparse
    
  2. Format them

to ext4:

    me@wheezy:~$ mkfs.ext4 ext4.sparse
    mke2fs 1.42.5 (29-Jul-2012)
    <...>
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done

to btrfs:

    me@wheezy:~$ mkfs.btrfs btrfs.sparse
    WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
    WARNING! - see http://btrfs.wiki.kernel.org before using

    fs created label (null) on btrfs.sparse
            nodesize 4096 leafsize 4096 sectorsize 4096 size 5.00TB
    Btrfs Btrfs v0.19

  3. Create mount points:

     me@wheezy:~$ mkdir ext4_mount btrfs_mount
    
  4. Mount them. Do not forget the loop option:

ext4:

    me@wheezy:~$ sudo mount -o loop -t ext4 ext4.sparse ext4_mount

btrfs (do not forget the compress option, which can be zlib or lzo):

    me@wheezy:~$ sudo mount -o loop,compress=lzo -t btrfs btrfs.sparse btrfs_mount

  5. That's it! The file systems are created and mounted. They appear as 5 TB to the OS but actually occupy very little space:

df:

    me@wheezy:~$ df -h | grep _mount
    /dev/loop0                         5.0T  189M  4.8T   1% /home/a/ext4_mount
    /dev/loop1                         5.0T  120K  5.0T   1% /home/a/btrfs_mount

du:

    me@wheezy:~$ du -h *.sparse
    4.3M    btrfs.sparse
    169M    ext4.sparse

  6. For testing purposes I created a 1.3 GB text file with a repetitive pattern and copied it with cp to both newly created file systems:

ext4:

    me@wheezy:~$ time sudo cp /store/share/bigtextfile ext4_mount/
    real    0m12.344s
    user    0m0.008s
    sys     0m1.708s

btrfs:

    me@wheezy:~$ time sudo cp /store/share/bigtextfile btrfs_mount/
    real    0m3.714s
    user    0m0.016s
    sys     0m1.204s

  7. As seen in the previous step, btrfs was roughly three times faster than good ol' ext4 when transferring highly compressible data (3.7 s versus 12.3 s). Let's check the file systems' sizes afterwards:

     me@wheezy:~$ df -h | grep _mount
     /dev/loop0                         5.0T  1.5G  4.8T   1% /home/a/ext4_mount
     /dev/loop1                         5.0T   46M  5.0T   1% /home/a/btrfs_mount
    
  8. btrfs proved to be a lot more space efficient (46M used versus 1.5G). Finally, let's check the sparse files' sizes as well:

     me@wheezy:~$ du -h *.sparse
     50M     btrfs.sparse
     1.4G    ext4.sparse
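
The test file used in the copy step is not shown. As a rough sketch, a similarly repetitive file could be generated as follows; the pattern text, the output path and the much smaller 10 MiB size are assumptions for illustration, not what was actually used in the test above. It assumes GNU head (for the M size suffix) and gzip.

```shell
#!/bin/sh
set -eu

# Hypothetical output path and size (the test above used a 1.3 GB file).
out="${TMPDIR:-/tmp}/bigtextfile.sample"
size=10M

# `yes` repeats the line forever; `head -c` cuts the stream at the requested size.
yes 'the quick brown fox jumps over the lazy dog' | head -c "$size" > "$out"

# Estimate how compressible the data is by piping it through gzip.
raw_bytes=$(wc -c < "$out")
gz_bytes=$(gzip -c "$out" | wc -c)
echo "raw: $raw_bytes bytes, gzip: $gz_bytes bytes"

# Clean up the sample file.
rm -f "$out"
```

Data this repetitive compresses extremely well, so it shows compression in the best possible light; real backup data will usually compress far less.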
    

That's it. If needed, the sparse files can be enlarged further later on, and btrfs can be resized online as well.
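
One possible way to do such an enlargement is sketched below. It assumes the image and mount point names from the steps above, that the image is currently loop-mounted, and a util-linux version that provides losetup -j and losetup -c. The 10T target size and the IMG, MNT, NEW_SIZE and DRY_RUN variables are made up for this example. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

img=${IMG:-btrfs.sparse}     # image file from the steps above
mnt=${MNT:-btrfs_mount}      # where it is loop-mounted
new_size=${NEW_SIZE:-10T}    # hypothetical new apparent size
dry_run=${DRY_RUN:-1}        # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$dry_run" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Enlarge the sparse file; no extra disk space is used until data is written.
run truncate -s "$new_size" "$img"

# 2. Find the loop device backing the image and make it re-read the file size.
loopdev=$(losetup -j "$img" 2>/dev/null | head -n 1 | cut -d: -f1)
loopdev=${loopdev:-/dev/loopN}   # placeholder if nothing is attached
run sudo losetup -c "$loopdev"

# 3. Grow the mounted btrfs so that it fills the enlarged device.
run sudo btrfs filesystem resize max "$mnt"
```

Running it with DRY_RUN=0 would execute the commands for real. For the ext4 image, the last step would presumably be resize2fs on the loop device instead.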

This makes a cool solution for regular rsync backups. But don't forget to back up these files more traditionally as well, since btrfs is still an experimental filesystem (see the mkfs.btrfs warning above).
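
When copying the image files themselves somewhere else, it is probably worth preserving their sparseness, otherwise a 5 TB image may be written out in full at the destination. The sketch below demonstrates the idea with cp --sparse=always on a throwaway file; rsync has a comparable --sparse (-S) option. The file names and the 256 MiB size are placeholders, and GNU coreutils is assumed.

```shell
#!/bin/sh
set -eu

# Throwaway directory; removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

src="$workdir/image.sparse"   # stands in for a real file system image
dst="$workdir/image.copy"

# A 256 MiB sparse file with no data written to it.
truncate -s 256M "$src"

# Copy while keeping the holes instead of writing out runs of zeros.
cp --sparse=always "$src" "$dst"
# Comparable rsync invocation (assumes rsync is installed):
#   rsync -a --sparse "$src" "$dst"

src_apparent=$(stat -c %s "$src")
dst_apparent=$(stat -c %s "$dst")
dst_allocated=$(( $(stat -c %b "$dst") * $(stat -c %B "$dst") ))
echo "copy: apparent=$dst_apparent allocated=$dst_allocated"
```

The copy should report the same apparent size as the source while occupying only a small fraction of it on disk.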

Further info on Arch Wiki: https://wiki.archlinux.org/index.php/Sparse_file and https://wiki.archlinux.org/index.php/Btrfs

Neurotransmitter

A .sparsebundle or .sparseimage is possibly what you need...

Sparse bundles defined

A sparse bundle is a disk image format introduced in Mac OS X 10.5 Leopard® (.sparsebundle). Like sparse images (.sparseimage), a sparse bundle is a Read/Write format where the disk image only occupies as much space as the data it contains, up to the limit defined when it was created. Sparse bundles compact more efficiently than sparse images, meaning that it is faster to reclaim the unused free space in a sparse bundle than in an equivalent sparse image.

While both sparse images and sparse bundles contain a file system, a sparse bundle is bundle-backed, meaning that it employs a specialized, hierarchical directory structure for grouping related resources. Within a sparse bundle, the bands subdirectory contains the actual data saved within the disk image.

Under Leopard, enabling FileVault® on a Home folder converts that Home folder into an encrypted sparse bundle. Under Mac OS X 10.4 Tiger® and earlier, FileVault employed encrypted sparse images.

Sparse bundles are also employed for network-based backup disks created by Time Machine®, such as on a Time Capsule®.

See "Can Linux mount a normal Time Machine sparse bundle disk image directory?" for far more than I know about *nix. I'm Mac-based, sorry.

Tetsujin