Does rsync --inplace write to the entire file, or just to the parts that need to be updated? (for btrfs+rsync backups)

Question

I was reading several guides on how to combine btrfs snapshots with rsync to make an efficient backup solution with history. However, it all depends on whether rsync --inplace modifies only the portions of a file that actually changed, or overwrites the whole file sequentially. If it writes the whole file, then it seems that btrfs will always create a new copy of the file, which would make the idea much less efficient.

dataless · Answer 1 · 2022-09-16T21:54:25.117

TL;DR: You need the --inplace option, and if you are copying between local filesystems you also need --no-whole-file.

If you pass rsync two local paths, it will default to using "--whole-file", and not delta-transfer. Rsync assumes that it can write a completely new file and unlink the old one faster than reading both and calculating the changed blocks. If it doesn't calculate the changes, there won't be block-level changes for btrfs to observe. So, what you're looking for is --no-whole-file, in addition to --inplace. You also get delta-transfer if you requested '-c'.

Here's how you can verify:

$ mkdir a b
$ dd if=/dev/zero of=a/1 bs=1k count=64
$ dd if=/dev/zero of=a/2 bs=1k count=64
$ dd if=/dev/zero of=a/3 bs=1k count=64
$ rsync -av a/ b/
sending incremental file list
./
1
2
3
sent 196831 bytes  received 72 bytes  393806.00 bytes/sec
total size is 196608  speedup is 1.00

Then touch a file and re-sync

$ touch a/1
$ rsync -av --inplace a/ b/
sending incremental file list
1
sent 65662 bytes  received 31 bytes  131386.00 bytes/sec
total size is 196608  speedup is 2.99

You can verify that it reused the inode with "ls -li", but notice that it sent the whole 64K bytes. Try again with --no-whole-file:

$ touch a/1
$ rsync -av --inplace --no-whole-file a/ b/
sending incremental file list
1
sent 494 bytes  received 595 bytes  2178.00 bytes/sec
total size is 196608  speedup is 180.54

Now you've only sent 494 bytes. You could use strace to further verify if any of the file was written, but this shows it at least used delta-transfer.
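The inode check mentioned above can also be scripted. This is only a minimal sketch, assuming GNU stat (for "stat -c %i"), mktemp, and an rsync binary on PATH; the variable names and temporary paths are made up for illustration. It passes -I (--ignore-times) so that rsync does not skip the file when the script runs faster than the timestamp granularity.

```shell
#!/bin/sh
# Sketch: compare the destination inode before and after each kind of sync.
# Assumes GNU stat and rsync are installed; all paths are throwaway temp dirs.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
dd if=/dev/zero of="$work/a/1" bs=1k count=64 2>/dev/null

# Initial copy, then remember which inode the destination file uses.
rsync -a "$work/a/" "$work/b/"
inode_before=$(stat -c %i "$work/b/1")

# Overwrite one byte in the middle of the source, keeping its size.
printf 'X' | dd of="$work/a/1" bs=1 seek=32768 conv=notrunc 2>/dev/null

# In-place delta update: the existing destination file is modified directly.
rsync -a -I --inplace --no-whole-file "$work/a/" "$work/b/"
inode_inplace=$(stat -c %i "$work/b/1")

# Change the source again, then sync with the default behaviour:
# rsync builds a temporary copy and renames it over the destination.
printf 'Y' | dd of="$work/a/1" bs=1 seek=1024 conv=notrunc 2>/dev/null
rsync -a -I "$work/a/" "$work/b/"
inode_default=$(stat -c %i "$work/b/1")

echo "before:  $inode_before"
echo "inplace: $inode_inplace"
echo "default: $inode_default"
rm -rf "$work"
```

If it behaves as described in this answer, "before" and "inplace" should show the same inode number, while "default" should show a different one.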

Note (see comments) that for local filesystems, --whole-file is assumed (see the man page for rsync). On the other hand, across a network --no-whole-file is assumed, so --inplace on its own will behave as --inplace --no-whole-file.

score 19 · Answer 2 · answered Jan 11 '16 at 07:37

Here is the definitive answer, I think, citing the relevant part of the manual:

   --inplace

          [...]

          This option is useful for transferring large files
          with block-based changes or appended data, and
          also on systems that are disk bound, not network
          bound. It can also help keep a copy-on-write
                                         *************
          filesystem snapshot from diverging the entire
          *******************
          contents of a file that only has minor changes.

score 9 · Answer 3 · edited Nov 12 '15 at 03:24

rsync's delta-transfer algorithm determines whether the entire file is transmitted or just the parts that differ. Sending only the differences is the default behavior when rsyncing a file between two machines, to save on bandwidth. This can be overridden with --whole-file (or -W), which forces rsync to transmit the entire file.

--inplace deals with whether rsync, during the transfer, will create a temporary file or not. The default behavior is to create a temporary file. This gives a measure of safety in that if the transfer is interrupted, the existing file in the destination machine remains intact/untouched. --inplace overrides this behavior and tells rsync to update the existing file directly. With this, you run the risk of having an inconsistent file in the destination machine if the transfer is interrupted.
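To make the difference between the two update strategies concrete, here is a small sketch that uses only coreutils (assuming GNU stat for "stat -c %i"). It is not what rsync runs internally; it just imitates "write a temporary copy and rename it" versus "overwrite one block of the existing file". The file names and block numbers are arbitrary.

```shell
#!/bin/sh
# Sketch: two ways to apply a small change to an existing file.
# Illustration only; this imitates the effect, not rsync's own code.
work=$(mktemp -d)
dd if=/dev/zero of="$work/dst" bs=4k count=16 2>/dev/null
dd if=/dev/zero of="$work/new" bs=4k count=16 2>/dev/null

# "new" differs from "dst" in block 3 only.
printf 'changed' | dd of="$work/new" bs=4k seek=3 conv=notrunc 2>/dev/null
ino_start=$(stat -c %i "$work/dst")

# Strategy A (default style): write a full temporary copy, then rename it.
cp "$work/new" "$work/dst.tmp"
mv "$work/dst.tmp" "$work/dst"
ino_rename=$(stat -c %i "$work/dst")

# Change block 5 of "new", then apply strategy B (in-place style):
# copy just that one block over the existing file, without truncating it.
printf 'again!!' | dd of="$work/new" bs=4k seek=5 conv=notrunc 2>/dev/null
dd if="$work/new" of="$work/dst" bs=4k skip=5 seek=5 count=1 conv=notrunc 2>/dev/null
ino_inplace=$(stat -c %i "$work/dst")

# Both strategies should leave "dst" identical to "new".
if cmp -s "$work/new" "$work/dst"; then same=yes; else same=no; fi
echo "start=$ino_start rename=$ino_rename inplace=$ino_inplace same=$same"
rm -rf "$work"
```

Strategy A replaces the file with a new inode, so a copy-on-write snapshot of the old file would share nothing with it. Strategy B keeps the inode and rewrites a single 4 KiB block, which is the case the rsync man page has in mind when it mentions copy-on-write snapshots.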

score 4 · Answer 4 · answered Oct 25 '13 at 12:31

--inplace overwrites only regions that have changed. Always use it when writing to Btrfs.

Gabriel

score 3 · Answer 5 · edited Mar 02 '16 at 14:17

From the man page:

This  option  changes  how  rsync transfers a file when its data
needs to be updated: instead of the default method of creating a
new  copy  of  the file and moving it into place when it is com-
plete, rsync instead writes the updated  data  directly  to  the
destination file.

This leads me to believe that it writes over the file in its entirety; I imagine it would be near impossible for rsync to work any other way.

score 2 · Answer 6 · answered Jan 06 '21 at 01:13

I believe btrfs-sync could be what you need; here's an article explaining it.

In short, it is a bash script to sync BTRFS snapshots, locally or through SSH*.

The syntax is similar to that of scp

Usage:
  btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>

  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
  -d|--delete       delete snapshots in <dst> that don't exist in <src>
  -z|--xz           use xz     compression. Saves bandwidth, but uses one CPU
  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
  -q|--quiet        don't display progress
  -v|--verbose      display more information
  -h|--help         show usage

  <src> can either be a single snapshot, or a folder containing snapshots
  <user> requires privileged permissions at <host> for the 'btrfs' command

score 1 · Answer 7 · edited Dec 15 '17 at 01:14

The theoretical work on in-place rsync is described in this paper.

Paper reference: D. Rasch and R. Burns. In-Place Rsync: File Synchronization for Mobile and Wireless Devices. USENIX Annual Technical Conference, FREENIX track, 91-100, USENIX, 2003.

From the link:

... We modified the existing rsync implementation to support in-place reconstruction.

Abstract: [...] We have modified rsync so that it operates on space constrained devices. Files on the target host are updated in the same storage the current version of the file occupies. Space-constrained devices cannot use traditional rsync because it requires memory or storage for both the old and new version of the file. Examples include synchronizing files on cellular phones and handheld PCs, which have small memories. The in-place rsync algorithm encodes the compressed representation of a file in a graph, which is then topologically sorted to achieve the in-place property. [...]

So these appear to be the technical details of what rsync --inplace is doing. According to the beginning of the paper:

We have modified rsync so that it performs file synchronization tasks with in-place reconstruction. [...] Instead of using temporary space, the changes to the target file take place in the space already occupied by the current version. This tool can be used to synchronize devices where space is limited.

As becomes clear from @dataless's answer, this implies that --inplace reuses the same storage space, but may still copy the whole file into that space. Specifically, when copying between local filesystems, rsync assumes the --whole-file option; across a network, it assumes --no-whole-file.
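Applied to the btrfs backup case from the question, a local-to-local run would therefore spell out both options before taking a snapshot. The following is a rough sketch, not a tested backup tool: the function names, the paths, and the DRY_RUN switch are all hypothetical. With DRY_RUN=1 (the default here) it only prints the commands; running them for real needs root and an existing btrfs subvolume at the "current" path.

```shell
#!/bin/sh
# Sketch: rsync into a btrfs subvolume, then take a read-only snapshot.
# Function names, paths, and DRY_RUN are illustrative assumptions.

# Print the command when DRY_RUN is 1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_with_snapshot() {
    src=$1        # directory to back up
    current=$2    # btrfs subvolume holding the latest copy
    snapdir=$3    # directory that collects the snapshots
    stamp=$(date +%Y-%m-%d_%H%M%S)

    # Update changed regions in place so unchanged extents stay shared
    # with earlier snapshots; --no-whole-file matters for local paths.
    run rsync -a --delete --inplace --no-whole-file "$src/" "$current/" &&
    # Freeze the result as a read-only snapshot named after the timestamp.
    run btrfs subvolume snapshot -r "$current" "$snapdir/$stamp"
}

backup_with_snapshot /home/user /mnt/backup/current /mnt/backup/snapshots
```

When the destination is on another machine, --no-whole-file is already the default, so it is redundant there but harmless to keep.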
