
I create RAW image files plus a small selection of JPEG files derived from the RAW masters. The JPEGs, once created, are very rarely edited again, but when they are, the whole file changes because the image is recompressed. When editing the RAW images I use software that makes changes non-destructively: a preview file and a metadata file (an XMP sidecar, under 40 KB) are created, along with a catalog, and together these keep track of the changes.

I manage the preview and catalog file backups in a separate system, so for this question I’m only concerned with the RAW, XMP and JPEG files.

I want to back up all RAW, JPEG and XMP files offsite over a WAN connection, transferring only new and altered files from a filesystem that is scanned for changes once per day.
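
For concreteness, here is a minimal sketch of the kind of daily scan I have in mind. The state file name, watch directory and extension list are hypothetical placeholders, not part of any particular backup product:

```python
import json
import os

STATE_FILE = "scan_state.json"   # hypothetical: records size/mtime from the last run
WATCH_DIRS = ["/photos"]         # hypothetical photo root
EXTENSIONS = {".cr2", ".nef", ".dng", ".jpg", ".jpeg", ".xmp"}

def scan_once() -> list[str]:
    """Return files that are new or whose size/mtime changed since the last scan."""
    try:
        with open(STATE_FILE) as f:
            previous = json.load(f)
    except FileNotFoundError:
        previous = {}

    current, changed = {}, []
    for root in WATCH_DIRS:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if os.path.splitext(name)[1].lower() not in EXTENSIONS:
                    continue
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                current[path] = [st.st_size, st.st_mtime]
                if previous.get(path) != current[path]:
                    changed.append(path)          # new or altered since last scan

    with open(STATE_FILE, "w") as f:
        json.dump(current, f)
    return changed
```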

The de-duplication seems to work by reading files in portions (chunks) and computing a weak hash for each, which is compared against the hashes of all other chunks. If a weak hash matches another, a stronger hash is computed and the chunks are compared again. If they still match, the second chunk isn't uploaded; instead, the backup system points the duplicated portion of the file to its previously backed-up copy.
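
To make that comparison concrete, here is a minimal sketch of chunk-level de-duplication under simplifying assumptions: fixed-size chunks, Adler-32 as the weak hash and SHA-256 as the strong hash, and a placeholder store_chunk function standing in for the actual upload. Real products typically use variable-size (rolling-hash) chunking and their own hash choices:

```python
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024   # assumed fixed-size chunks for simplicity

# Index of chunks already stored remotely: weak hash -> [(strong hash, chunk id), ...]
weak_index: dict[int, list[tuple[str, int]]] = {}
next_chunk_id = 0

def store_chunk(data: bytes) -> int:
    """Placeholder for uploading a chunk; returns its id."""
    global next_chunk_id
    next_chunk_id += 1
    return next_chunk_id

def dedup_file(path: str) -> list[int]:
    """Return the chunk ids that make up the file, uploading only chunks not seen before."""
    refs = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            weak = zlib.adler32(chunk)                 # cheap, weak hash
            strong = hashlib.sha256(chunk).hexdigest() # computed only to confirm a match here
            for known_strong, chunk_id in weak_index.get(weak, []):
                if known_strong == strong:             # confirmed duplicate
                    refs.append(chunk_id)              # point to the previously stored copy
                    break
            else:
                chunk_id = store_chunk(chunk)          # new data: upload it
                weak_index.setdefault(weak, []).append((strong, chunk_id))
                refs.append(chunk_id)
    return refs
```

The point of the weak/strong split is the CPU trade-off: the weak hash is cheap enough to compute for every chunk, while the expensive strong comparison only happens when a weak collision suggests a possible duplicate.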

My question is…

  • If the RAW files don’t change and…
  • The JPEGs will rarely change and…
  • The XMP files may have portions of them changed and…
  • The CPU/RAM requirements for de-duplication are very high and…
  • Given that data de-duplication can reduce the amount of data transmitted…

…is it worth using de-duplication?
