5

Study case:

An automatic backup system for all family members, via OpenVPN.

A lot of files (especially photos) are common between family members.

So, with a script I replace identical files with hard links.

Then a problem arises: if a user changes their file, the file changes for all users. Deleting or renaming a file is not a problem; only changes to the file content are.

So when a user changes a file that is a hard link, I want the hard link to be eliminated and a new copy of the original file to be created for that user.

Is this possible with any filesystem, or with some hack or feature?

Alexis Wilke
  • 1,806
Chameleon
  • 225

4 Answers

9

You're looking for the reflink feature, which was introduced in 2009. It only works with certain filesystems – currently Btrfs, XFS, and the upcoming Bcachefs. (ZFS is still working on it.)

Use --reflink to create a CoW copy when possible (this is already the default as of coreutils 9.0), or --reflink=always if you want to make sure it'll never fall back to doing a full copy:

cp --reflink OLDFILE NEWFILE

The new file will have a different inode, but will initially share all data extents with the original (which can be compared using filefrag -v FILE or xfs_io -rc "fiemap -v" FILE).
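For example, on a filesystem that supports it (the filenames here are just placeholders):

# make a CoW copy; with --reflink=always, cp fails rather than falling back to a full copy
cp --reflink=always holiday.jpg holiday-copy.jpg

# compare the physical extents of the two files; matching "physical_offset" values mean shared data
filefrag -v holiday.jpg holiday-copy.jpg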


An alternative is filesystem deduplication, which is supported by Btrfs and ZFS among others, and allows merging identical blocks underneath existing files. In ZFS this happens synchronously ("online", i.e. as soon as the file is written), while in Btrfs it's done as a batch job (i.e. "offline", using tools such as Bees or duperemove). Unfortunately, online deduplication in ZFS has a significant impact on resource usage. If you use Btrfs, however, you can just run duperemove -rd against the folders once in a while.
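For example, an occasional batch run on a Btrfs backup volume might look like this (the paths are placeholders):

# scan recursively (-r) and submit identical extents to the kernel for deduplication (-d)
duperemove -rd /srv/backups

# optionally keep a hash database so later runs only re-hash new or changed files
duperemove -rd --hashfile=/var/cache/duperemove.db /srv/backups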

Finally, whether you use reflinks or dedupe, you'll also want to use backup tools that themselves perform deduplication (it is not enough to use a hardlink-aware backup tool, as reflinks don't look like hardlinks). For example, the archive formats used by Restic and Borg are content-addressed (much like Git), so identical blocks will automatically be stored only once per repository, even if they occur in separate files.
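For instance, with Restic (the repository and source paths are placeholders; it will prompt for a repository password):

# create a content-addressed repository once
restic init --repo /srv/restic-repo

# back up each family member's tree; blocks already present in the repository are not stored again
restic backup --repo /srv/restic-repo /home/alice/photos
restic backup --repo /srv/restic-repo /home/bob/photos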


The OCFS2 cluster filesystem on Linux also has "reflinks" at least in name, but doesn't support the standard reflink creation API, so they have to be created using an OCFS2-specific tool.

On Windows, ReFS supports reflinks under the name "block cloning" (though it doesn't seem to come with a built-in CLI tool); NTFS does not. Finally on macOS, cp -c will create reflinks (CoW copies) as long as you're using APFS.
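For example, on an APFS volume (filenames are placeholders):

# clonefile()-based copy: instant, and the two files share data blocks until one is modified
cp -c Photos/IMG_1234.heic Photos/IMG_1234-edit.heic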

grawity
  • 501,077
2

This is almost 5 years old and I'm sure you've either solved the problem or moved on. I stumbled on this through a link on an unrelated ZFS issue on github.

But I noticed that no one actually answered both of your implied questions. (The accepted answer does, as an aside, suggest a partial solution to both - duperemove -dr - but that tool and command doesn't and can't solve it in the way you asked or implied, i.e. at the file level. It's a block/extent-level deduper, which may be close enough.)

So as I'm sure you've figured out by now, there are necessarily two problems you are trying to solve. In reverse order:

  1. Deduplicate identical photos in a way that doesn't leave them permanently linked together for life. This is what you explicitly asked about.

  2. Determine which photos are actually identical in the first place. The same photos often have different filenames. So relying on filename - or even time + size - isn't necessarily enough.

The problem of identifying identical files is pretty well solved by now. The world has pretty much agreed that we only want to compare file contents, and not metadata like filename, directory, extended attributes, security ACLs, or even necessarily time/datestamps.

So (for the benefit of others stumbling on this), what most tools do is: first, make an internal list of files with identical sizes, above a minimum threshold (there's no space saving to be had deduping files below a certain size). Then they compute a cryptographic hash of just the file contents for each same-size file. (For the paranoid, many tools can also do full binary content compares, but optionally cache just the checksums for subsequent runs.)
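As a rough illustration of that approach (the /srv/backups path and the 1 MiB threshold are arbitrary placeholders, and filenames containing newlines or tabs aren't handled):

# 1. list every file above the size threshold as "<size><TAB><path>"
find /srv/backups -type f -size +1M -printf '%s\t%p\n' > sizes.tsv

# 2. keep only files whose size occurs more than once, and hash just those;
#    identical hashes (first 64 characters of each line) mean identical contents
awk -F'\t' '{ n[$1]++; p[$1] = p[$1] $2 "\n" }
    END { for (s in n) if (n[s] > 1) printf "%s", p[s] }' sizes.tsv \
  | xargs -r -d '\n' sha256sum \
  | sort \
  | uniq -w64 --all-repeated=separate > dupes.txt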

I had to solve essentially the same problem, around the same time.

By now there are many tools on GitHub that do both, in "offline" batch mode for CoW filesystems: they find duplicates, cache the results for subsequent runs, and perform an ioctl() with FICLONE (same as cp --reflink), so that the identical files share the same blocks or extents, but if one is edited, they diverge again.

(Unlike hardlinks, which can be unintentionally and permanently destructive, sometimes years later, and in ways you may not even become aware of until even more years later, long after backups and snapshots have rotated out. This is why I just never, ever use hardlinks for user-level files, and only in exceptionally narrow, highly technical use cases; otherwise they almost never make sense for regular user files.)

Btrfs was the first to support reflink copies over 15 years ago in development - which is exactly what you want.

cp --reflink became a part of coreutils the next year. Some GUI file managers even detect if they are copy/pasting files within the same CoW filesystem boundary, and will essentially cp --reflink=always if so. It's fun to see TBs of data copied instantly that way.

XFS added support about six years later.

BcacheFS, which someone mentioned five years ago, is still in development (going on ten years now), and the rather prickly main developer Kent Overstreet has been promising near production-readiness for about that long. I've been predicting it will be booted from the kernel development tree for years now, due to near-Muskian-level over-promising and under-delivering.

But in spite of recent head-butting with Torvalds himself, the passing of more time and any ongoing progress make my prediction less likely. (And I do hope I'm wrong. Regardless of personality issues and big talk, it's a promising FS.)

Meanwhile through all that drama, OpenZFS managed to pull off what everyone said was impossible: Getting cp --reflink support into ZFS. Working and stable.

I don't know what the best tool is nowadays to identify and dedup duplicate files at the file level. I wrote my own a while back, basically an elaborate script wrapper around rmlint.

Duperemove and Bees dedupe at the block or extent level, not the file level, which in the end may not make a practical difference in terms of space saving or the ability for files to later diverge. But to me personally, when scripting a custom solution, the distinction is at least semantically important.

As far as solutions I'm aware of that work at the file level: check out rdfind and jdupes.

rmlint deserves an honorable mention, but requires a lot of additional work - because it's a general-purpose tool that does much more than just dedup. (There is a feature request to add a flag for a single-run "find-dupes-and-also-dedupe-them", for CoW filesystems, that seems to have some developer interest but we'll see.)

It's also a surprisingly straightforward problem to tackle yourself with basic scripting and coreutils, and one that can be easily and fairly safely vibe-coded with a chatbot - using find, sha256sum, sort, and uniq, for example. The only mildly sticky point I encountered myself was the solvable problem of preemptive file-locking for additional safety. For increased performance over multiple passes, you can script a local cache via sqlite3 (to store checksums and the metadata needed to know whether a checksum must be regenerated), and for better persistence across moves and renames, store the hashes and metadata on the files themselves with setfattr (as rmlint optionally does). Though if all this is done in, say, bash, performance isn't going to be amazing.
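Continuing the pipeline sketched earlier in this answer, the dedup step itself can be a short loop over dupes.txt that replaces every duplicate with a reflink copy of the first file in its group. This is a hedged illustration, not a production tool: it takes no locks, keeps no cache, and doesn't preserve the duplicate's own metadata.

#!/bin/bash
# read blank-line-separated groups of identical files (sha256sum format:
# 64-character hash, two separator characters, then the path)
keep=""
while IFS= read -r line; do
    if [ -z "$line" ]; then
        keep=""                        # blank line: start of a new group
        continue
    fi
    path=${line:66}                    # strip the "<hash>  " prefix
    if [ -z "$keep" ]; then
        keep=$path                     # first file of the group is kept as-is
    else
        # replace the duplicate with a CoW copy of the kept file;
        # --reflink=always fails instead of silently doing a full copy
        cp --reflink=always "$keep" "$path.tmp.$$" && mv -f "$path.tmp.$$" "$path"
    fi
done < dupes.txt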

But since there are at least a couple of tools that do all that already, in optimized machine-code form, and surely a half dozen or more others on github, there's arguably no real need to roll your own in script - unless you just want more control over and visibility into the process.

Jim
  • 146
1

Another possibility is to set up a shared directory with the sticky bit set.

On a Linux system, the /tmp directory has the permissions drwxrwxrwt, or 1777 in numeric terms, which ensures that anybody can write anything in there, but once they do, those files belong to them and can't be modified or deleted by other users, so you maintain a concept of ownership of the files.

So, you can create a directory with these same permissions as a kind of group directory.
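For example (the path is a placeholder):

# world-writable directory with the sticky bit set (mode 1777), just like /tmp
mkdir -p /srv/family-shared
chmod 1777 /srv/family-shared
ls -ld /srv/family-shared    # shows drwxrwxrwt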

This doesn't work exactly in the way you describe above, but it does ensure that users can put whatever files they like in there, either into a new directory which they themselves create or just into the root, and no user can delete or modify files belonging to someone else. The sticky bit on the directory is what achieves the second part: normally, if you give a directory world-writable permissions, people can delete each other's files; with the sticky bit, they cannot delete or rename files that don't belong to them (and normal file permissions stop them from modifying the contents).

For any common files that you want anyone to be able to read, you just put them in there yourself and leave them owned by you.

Note: unlike in your question, other people won't be able to modify or delete these files; they would just need to be instructed that they can set up their own copy wherever they like and modify that instead.

The downside to this is that they can't transparently delete or modify whatever they like if it doesn't belong to them.

But the upside to this solution is that everything they put in there is shared with everyone else, who can all open it and read it.

thomasrutter
  • 2,053
0

With a bit of planning you can achieve what you want with overlayfs.

You would take all the files common to all instances and put them into the normally read-only lower directory.

Then, use overlayfs to mount a separate upper directory over the top of it for each differently modifiable copy you want. The upper directory can be empty to begin with, meaning that each person just sees what's in the lower directory, unmodified.
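A hedged sketch of such a mount, with placeholder paths, one mount per user (overlayfs needs an empty "workdir" on the same filesystem as the upper directory):

# shared, normally read-only baseline
mkdir -p /srv/photos-base

# per-user writable layer, its workdir, and the mount point
mkdir -p /srv/overlay/alice/upper /srv/overlay/alice/work /home/alice/photos

mount -t overlay overlay \
  -o lowerdir=/srv/photos-base,upperdir=/srv/overlay/alice/upper,workdir=/srv/overlay/alice/work \
  /home/alice/photos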

When an existing file is modified in any mounted overlay, the file is copied up to the upper directory and overlaid for that instance only. This is completely transparent. The same goes for deletions and newly created files, which only affect the upper directory currently in use. The user who made the changes will see them in their instance, but they won't affect the lower directory or other people's instances.

Over time, if different people add different things to their own instances, those instances will diverge more and more. But if you ever want to consolidate things, you can periodically go through, determine which files should be the same for all users, and move them into the normally read-only lower directory.

The only issue I can foresee with this setup (and it's the same for both solutions) is that if a user wants to share a file with the other users, anything they add won't be shared and will only be visible to them. If you want that, there's another possibility.

thomasrutter
  • 2,053