I have a NAS box running some version of Linux that I use for backing up anything and everything.
It is almost certain that some of the files are identical duplicates.
That being the case, what I want to do is:
- Identify duplicate files where "duplicate" = identical SHA256 checksums. (Identical SHA512 is also acceptable but might take much longer. Which do you suggest?)
- Keep one as the "master" copy, remove all other copies, and replace them with hard links to the one remaining copy. This should free up a considerable amount of space on the NAS volume.
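On the SHA256 versus SHA512 question, a quick timing run would show which is actually faster on a given machine. This is only a rough sketch, assuming Python 3 is available (for example on an Ubuntu laptop); the function names `time_hash` and `compare` and the 16 MiB buffer size are my own illustrative choices, and for large files on a NAS the disk may well be the bottleneck rather than the hash.

```python
import hashlib
import time


def time_hash(name, data, repeats=5):
    """Return the best wall-clock time (seconds) to hash data once with the named algorithm."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


def compare(size=16 * 1024 * 1024):
    """Time SHA-256 and SHA-512 over an in-memory buffer of zero bytes."""
    data = bytes(size)
    return {name: time_hash(name, data) for name in ("sha256", "sha512")}
```

Calling `compare()` returns a dictionary of timings keyed by algorithm name; the numbers depend entirely on the CPU, so none are shown here.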
Note that the first file found is a good choice for the "master" file, and all others can be removed and hard-linked to it. Permissions and ownership aren't a problem because there's only one user and (don't hate on me here) it's all wide open permission-wise anyway.
Also note that I want hard links so that if I delete a file (for whatever reason), all the others remain.
Note that I have console access to the NAS box through an SSH shell.
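To make the goal concrete, here is a rough sketch of the procedure I have in mind. It is not a tested solution. It assumes Python 3 is available (for example on an Ubuntu laptop) and that all files live on one filesystem, since hard links cannot cross filesystems. The function names `sha256_of` and `dedupe` and the `.dedupe-tmp` suffix are hypothetical names chosen for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe(root, dry_run=True):
    """Replace duplicate files under root with hard links to the first copy found.

    Returns a list of (duplicate, master) pairs. With dry_run=True the
    pairs are only reported and nothing on disk is changed.
    """
    masters = {}   # digest -> first path seen with that content
    replaced = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            master = masters.setdefault(sha256_of(path), path)
            if master == path:
                continue  # first file with this digest becomes the master
            info, master_info = os.stat(path), os.stat(master)
            # Skip files on another filesystem (cannot be hard-linked)
            # and files that already share the master's inode.
            if info.st_dev != master_info.st_dev or info.st_ino == master_info.st_ino:
                continue
            replaced.append((path, master))
            if not dry_run:
                # Link under a temporary name (hypothetical suffix), then
                # rename over the duplicate so the path never goes missing.
                tmp = path + ".dedupe-tmp"
                os.link(master, tmp)
                os.replace(tmp, path)
    return replaced
```

Running `dedupe("/some/path")` with the default `dry_run=True` only lists what would be linked, which seems like the sensible first step before letting it touch anything.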
Questions:
- Is it possible?
- How do I do it?
- If a file has X hard links to it, and I delete the original file that all the others are hard-linked to, do the remaining hard links still point to a real file? (I suspect the answer is "yes", such that the file remains until all hard links are removed.)
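The last question can be checked with a small experiment in a scratch directory. This is a minimal sketch, assuming Python 3 on a Linux filesystem that supports hard links; the function name `survives_original_delete` and the file names are made up for the test.

```python
import os
import tempfile


def survives_original_delete():
    """Create a file, hard-link it, delete the original name, and inspect the link."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        link = os.path.join(workdir, "link.txt")
        with open(original, "w") as handle:
            handle.write("payload")
        os.link(original, link)                 # second name for the same inode
        count_before = os.stat(link).st_nlink   # link count with both names present
        os.remove(original)                     # remove the "original" name only
        count_after = os.stat(link).st_nlink    # link count after the removal
        with open(link) as handle:
            content = handle.read()             # data read through the surviving name
        return count_before, count_after, content
```

If my suspicion is right, this returns `(2, 1, 'payload')`: the link count drops by one, and the data is still readable through the remaining name.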
Update to add additional context:
The NAS box has two drives, one of which is external, can be removed, and can be processed on my Ubuntu laptop.
The other one is internal and is essentially untouchable. Though I can remove it, it is set up in a very custom way and dinking with it is the fast boat to disaster.
Additionally, the internal OS is a "network appliance" version of Linux running BusyBox. It appears to implement the standard functions, including things like find, grep, sed, awk, etc.
Viz.: (as returned by busybox --help)
Currently defined functions:
adjtimex, ar, arp, arping, ash, awk, basename, cat, chgrp, chmod,
chown, chroot, clear, cmp, cp, crond, crontab, cut, date, dd, df,
dhcprelay, diff, dirname, dmesg, dnsdomainname, dos2unix, dpkg,
dumpleases, echo, egrep, env, expand, expr, false, fgrep, find, free,
fsck, getopt, getty, grep, halt, head, hostname, id, ifconfig,
ifenslave, init, ionice, ipcalc, kill, killall, ln, logger, logname,
logread, losetup, ls, lsof, lspci, lsusb, md5sum, mdev, mkdir, mkfifo,
mknod, mkswap, mktemp, modprobe, more, mount, mv, nice, nohup,
nslookup, pidof, ping, ping6, pivot_root, poweroff, printenv, printf,
ps, pwd, rdate, readlink, reboot, renice, rm, rmdir, route, sed, seq,
sh, sleep, sort, split, stat, swapoff, swapon, sync, sysctl, syslogd,
tac, tail, tar, tee, tftp, tftpd, top, touch, tr, traceroute,
traceroute6, true, tty, udhcpc, udhcpd, umount, uname, uniq, unix2dos,
uptime, usleep, vconfig, vi, watch, wc, which, xargs, zcip