I was reading several guides on how to combine btrfs snapshots with rsync to build an efficient backup solution with history. However, it all depends on whether rsync --inplace modifies only those portions of files that actually changed, or overwrites the whole file sequentially. If it writes the whole file, then it seems that btrfs will always create a new copy of the file, which would make the idea much less efficient.
7 Answers
TL;DR: You need the --inplace option, and if copying between local filesystems you also need --no-whole-file.
If you pass rsync two local paths, it defaults to --whole-file rather than delta transfer. Rsync assumes it can write a completely new file and unlink the old one faster than it can read both and calculate the changed blocks. If it doesn't calculate the changes, there won't be block-level changes for btrfs to observe. So what you're looking for is --no-whole-file, in addition to --inplace. You also get delta transfer if you request -c.
Here's how you can verify:
$ mkdir a b
$ dd if=/dev/zero of=a/1 bs=1k count=64
$ dd if=/dev/zero of=a/2 bs=1k count=64
$ dd if=/dev/zero of=a/3 bs=1k count=64
$ rsync -av a/ b/
sending incremental file list
./
1
2
3
sent 196831 bytes received 72 bytes 393806.00 bytes/sec
total size is 196608 speedup is 1.00
Then touch a file and re-sync:
$ touch a/1
$ rsync -av --inplace a/ b/
sending incremental file list
1
sent 65662 bytes received 31 bytes 131386.00 bytes/sec
total size is 196608 speedup is 2.99
You can verify that it reused the inode with ls -li, but notice that it sent the whole 64 KiB file. Try again with --no-whole-file:
$ touch a/1
$ rsync -av --inplace --no-whole-file a/ b/
sending incremental file list
1
sent 494 bytes received 595 bytes 2178.00 bytes/sec
total size is 196608 speedup is 180.54
Now you've sent only 494 bytes. You could use strace to further verify whether any of the file was written, but this shows that it at least used delta transfer.
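Instead of comparing ls -li output by eye, the inode check can be scripted. This is a minimal sketch; the helper name inode_kept is hypothetical, and it assumes GNU stat (stat -c %i). On BSD or macOS the equivalent is stat -f %i.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and report whether FILE kept its inode
# ("same") or was replaced by a different file ("changed").
# Assumes GNU stat; use `stat -f %i` on BSD/macOS.
inode_kept() {  # args: FILE COMMAND [ARGS...]
  local file=$1
  shift
  local before after
  before=$(stat -c %i "$file") || return 1
  "$@" || return 1                      # run the command under test
  after=$(stat -c %i "$file") || return 1
  if [[ $before == "$after" ]]; then
    echo same
  else
    echo changed
  fi
}
```

For example, after touch a/1, running inode_kept b/1 rsync -a --inplace --no-whole-file a/ b/ should print same, while the same run without --inplace should print changed, because rsync then writes a temporary file and renames it over b/1. Leave out -v so rsync's file list doesn't mix with the result.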
Note (see comments) that for local filesystems, --whole-file is assumed (see the rsync man page). Across a network, on the other hand, --no-whole-file is assumed, so --inplace on its own behaves as --inplace --no-whole-file.
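Putting this together, the rsync-plus-snapshot backup from the question might look roughly like the sketch below. All paths, the function names backup and run, and the DRY_RUN switch are hypothetical. The destination must itself be a btrfs subvolume, and btrfs subvolume snapshot -r (which creates a read-only snapshot) needs root on a real system.

```shell
#!/usr/bin/env bash
# Hypothetical backup sketch: sync into a btrfs subvolume, then snapshot it.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

backup() {  # args: SRC DEST_SUBVOLUME SNAPSHOT_DIR
  local src=$1 dest=$2 snapdir=$3
  # --inplace: update existing destination files rather than replacing them.
  # --no-whole-file: force delta transfer even though both paths are local,
  #   so unchanged blocks are not rewritten and stay shared with snapshots.
  # --delete: mirror deletions from SRC (optional).
  run rsync -a --delete --inplace --no-whole-file "$src" "$dest/"
  # Read-only snapshot of the freshly synced state, named by timestamp.
  run btrfs subvolume snapshot -r "$dest" "$snapdir/$(date +%Y-%m-%d_%H%M%S)"
}
```

DRY_RUN=1 backup /data/ /backup/current /backup/snapshots prints the two commands without running them, which is a convenient way to check the flags before scheduling the job.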
Here is the definitive answer, I think, citing the relevant part of the manual:
--inplace
[...]
This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a *copy-on-write filesystem snapshot* from diverging the entire contents of a file that only has minor changes.
rsync's delta-transfer algorithm determines whether the entire file is transmitted or just the parts that differ. Delta transfer is the default when rsyncing a file between two machines, to save bandwidth. This can be overridden with --whole-file (-W), which forces rsync to transmit the entire file.
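The delta-transfer algorithm rests on a cheap "rolling" checksum: the receiver sends a weak and a strong checksum for each block of its copy, and the sender slides a window over its own file one byte at a time looking for matches. That search is only affordable because the weak checksum can be updated in constant time as the window moves. Below is a minimal bash sketch of that weak checksum as described in the rsync technical report by Tridgell and Mackerras, with modulus M = 2^16. The function names are hypothetical, and rsync's actual implementation differs in details and pairs this with a strong hash.

```shell
#!/usr/bin/env bash
# Sketch of rsync's weak rolling checksum (Tridgell & Mackerras).
# Hypothetical names; not rsync's actual code.
M=65536  # modulus, 2^16

# Print a file's bytes as unsigned decimals, one per line.
bytes_of() { od -An -v -tu1 "$1" | tr -s ' ' '\n' | sed '/^$/d'; }

# Checksum of one window computed from scratch. Args: N, then N byte values.
# a = sum of bytes; b = sum of (N - i) * byte_i; both mod M. Prints "a b".
weak_sum() {
  local n=$1 a=0 b=0 i=0 x
  shift
  for x in "$@"; do
    a=$(( (a + x) % M ))
    b=$(( (b + (n - i) * x) % M ))
    i=$(( i + 1 ))
  done
  echo "$a $b"
}

# Print "offset a b" for every N-byte window of FILE, sliding one byte at a
# time and updating a and b in O(1) instead of recomputing them.
rolling_sums() {  # args: N FILE
  local n=$1 file=$2 a b k len
  local -a X
  mapfile -t X < <(bytes_of "$file")
  len=${#X[@]}
  (( len >= n )) || return 0
  read -r a b < <(weak_sum "$n" "${X[@]:0:n}")
  echo "0 $a $b"
  for (( k = 0; k + n < len; k++ )); do
    # Drop the outgoing byte, add the incoming one.
    a=$(( ((a - X[k] + X[k + n]) % M + M) % M ))
    # Remove the outgoing byte's weight-N term; adding the new a raises every
    # remaining byte's weight by one and gives the incoming byte weight 1.
    b=$(( ((b - n * X[k] + a) % M + M) % M ))
    echo "$(( k + 1 )) $a $b"
  done
}
```

For example, weak_sum 4 97 98 99 100 (the bytes of "abcd") prints 394 980, and every line printed by rolling_sums 4 FILE should equal weak_sum run directly on the same window.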
--inplace determines whether rsync creates a temporary file during the transfer. The default is to create one. This gives a measure of safety: if the transfer is interrupted, the existing file on the destination machine remains intact. --inplace overrides this behavior and tells rsync to update the existing file directly. With this, you run the risk of ending up with an inconsistent file on the destination machine if the transfer is interrupted.
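--inplace overwrites only the regions that have changed, as long as delta transfer is in effect (for local copies, that means also passing --no-whole-file). Always use it when writing to Btrfs.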
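To see what "only the regions that have changed" means at block granularity, here is a simplified illustration. It is not rsync's algorithm (which can match blocks at any offset); it only compares two equal-length files at the same offsets and lists which fixed-size blocks differ. Those are, roughly, the blocks an in-place update would have to rewrite, and so the only ones that would stop being shared with a btrfs snapshot. The function name changed_blocks and the 4096-byte default (a common btrfs sector size) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the indices of fixed-size blocks that differ
# between two equal-length files, comparing bytes at the same offsets only.
changed_blocks() {  # args: OLD NEW [BLOCK_SIZE]
  local bs=${3:-4096}
  # cmp -l prints one line per differing byte: 1-based offset, then the
  # two byte values. Map each offset to its 0-based block index.
  cmp -l "$1" "$2" | awk -v bs="$bs" '{ print int(($1 - 1) / bs) }' | uniq
}
```

Changing a single byte anywhere in a 12 KiB file reports exactly one of the blocks 0, 1 or 2; the other two would remain shared with the snapshot.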
From the man page:
This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.
This leads me to believe that it writes over the file in its entirety; I imagine it would be nearly impossible for rsync to work any other way.
I believe btrfs-sync could be what you need; here's an article explaining it.
In short, it is a bash script to sync BTRFS snapshots, locally or through SSH.
The syntax is similar to that of scp:
Usage:
btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>
-k|--keep NUM keep only last <NUM> sync'ed snapshots
-d|--delete delete snapshots in <dst> that don't exist in <src>
-z|--xz use xz compression. Saves bandwidth, but uses one CPU
-Z|--pbzip2 use pbzip2 compression. Saves bandwidth, but uses all CPUs
-q|--quiet don't display progress
-v|--verbose display more information
-h|--help show usage
<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command
The theoretical work on in-place rsync is described in this paper.
Paper reference: D. Rasch and R. Burns. In-Place Rsync: File Synchronization for Mobile and Wireless Devices. USENIX Annual Technical Conference, FREENIX track, 91-100, USENIX, 2003.
From the link:
... We modified the existing rsync implementation to support in-place reconstruction.
Abstract: [...] We have modified rsync so that it operates on space constrained devices. Files on the target host are updated in the same storage the current version of the file occupies. Space-constrained devices cannot use traditional rsync because it requires memory or storage for both the old and new version of the file. Examples include synchronizing files on cellular phones and handheld PCs, which have small memories. The in-place rsync algorithm encodes the compressed representation of a file in a graph, which is then topologically sorted to achieve the in-place property. [...]
So this appears to be the technical details of what rsync --inplace is doing. According to the beginning of the paper:
We have modified rsync so that it performs file synchronization tasks with in-place reconstruction. [...] Instead of using temporary space, the changes to the target file take place in the space already occupied by the current version. This tool can be used to synchronize devices where space is limited.
As becomes clear from @dataless's answer, this means that --inplace reuses the same storage space, but it may still copy the whole file into that space. Specifically, when copying between local filesystems, rsync assumes the --whole-file option; across a network, on the other hand, it assumes --no-whole-file.
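The abstract's "graph, topologically sorted" step can be sketched in a few lines of shell. Assume, hypothetically, that the delta is a list of copy commands "FROM TO LEN" (byte offsets in the old and new version). If command i reads bytes that command j will overwrite, i must run before j, and tsort(1) produces an order that respects every such constraint. This covers only the ordering step: cycles, which in-place reconstruction typically breaks by turning some copies into literal adds, are merely reported by tsort here, and add commands (which can safely run last) are left out. The function name inplace_order is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: order copy commands so that in-place reconstruction
# never overwrites bytes that a later command still needs to read.
# Input on stdin: one copy command per line, "FROM TO LEN".
inplace_order() {
  local -a from to len
  local f t l i j n
  while read -r f t l; do
    from+=("$f"); to+=("$t"); len+=("$l")
  done
  n=${#from[@]}
  {
    for (( i = 0; i < n; i++ )); do
      echo "$i $i"  # identical pair: declares node i without adding an edge
      for (( j = 0; j < n; j++ )); do
        # A command overlapping itself only needs memmove-style copy
        # direction, not ordering, so it is skipped here.
        (( i == j )) && continue
        # Read range of i overlaps write range of j: i must run first.
        if (( from[i] < to[j] + len[j] && to[j] < from[i] + len[i] )); then
          echo "$i $j"
        fi
      done
    done
  } | tsort
}
```

For example, printf '4 0 4\n0 8 4\n' | inplace_order prints 1 and then 0: command 1 reads bytes 0 to 3, which command 0 overwrites, so it has to run first.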