My friends and I each have TBs of data on our systems. None of us has geographically distributed full backups, however, because at that volume of data, services such as Dropbox, S3, et al. are cost-prohibitive for us. Yet each of us has local storage to spare: TBs each, in fact, going unused.
We began thinking: If we could network our hosts into some form of Distributed File System, we could each gain geographically distributed backups of our complete data sets while achieving higher utilization of the storage capacity we have. The perfect solution... we think.
- There are at least 3 of us. Surely 6 or more if the project yields fruit.
- Each of us has 1-2TB of data, and at least that much to spare.
- We're all spread out over WAN.
- We'd need any host to be able to join or leave the cluster arbitrarily.
- Real(ish)-time synchronization (roughly the behaviour sketched just after this list). Otherwise we'd just meet up once a week over beers and trade around a pile of external HDDs.
- F/OSS is requisite, but we have plenty of elbow grease.
- If we can use/learn/leverage a distributed computing platform in the process, so much the better.
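To make "real(ish)-time" concrete, here's a minimal sketch of the behaviour we're after: watch the local data set and push changes to every peer as they happen. The peer hostnames and paths are invented, and watchdog + rsync are just stand-ins for whatever a real DFS would do internally.

```python
import subprocess
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

PEERS = ["alice.example.net", "bob.example.net"]   # hypothetical peer hosts
LOCAL_ROOT = "/srv/mydata/"                        # made-up local data root
REMOTE_ROOT = "/srv/backups/me/"                   # made-up remote target

class PushOnChange(FileSystemEventHandler):
    def on_any_event(self, event):
        if event.is_directory:
            return
        # Mirror the whole tree to each peer; rsync only transfers the deltas.
        for peer in PEERS:
            subprocess.run(["rsync", "-az", "--delete",
                            LOCAL_ROOT, f"{peer}:{REMOTE_ROOT}"])

observer = Observer()
observer.schedule(PushOnChange(), LOCAL_ROOT, recursive=True)
observer.start()
observer.join()
```

Obviously a pile of rsync pushes like this isn't a distributed file system, which is exactly why we're asking.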
We started out thinking about building a Dropbox-esque interface on top of OpenStack or Hadoop, but I'd like to hear whether there are other alternatives we're overlooking. Perhaps there's an even simpler solution for our case? Is something like this even feasible, given the low number of nodes in the cluster?
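For reference, the sort of thing we had in our heads for the Hadoop route is a thin upload shim over the WebHDFS REST API. The NameNode address, port, user, and paths below are made up for illustration; we haven't built or deployed any of this, it's just a sketch of the interface we're imagining.

```python
import requests

NAMENODE = "http://namenode.example:9870"   # hypothetical NameNode address

def put_file(local_path, hdfs_path, user="backup"):
    """Upload one local file into HDFS via WebHDFS (two-step create)."""
    # Step 1: ask the NameNode where to write; it answers with a 307 redirect
    # pointing at a DataNode.
    resp = requests.put(
        f"{NAMENODE}/webhdfs/v1{hdfs_path}",
        params={"op": "CREATE", "overwrite": "true", "user.name": user},
        allow_redirects=False,
    )
    datanode_url = resp.headers["Location"]

    # Step 2: stream the file body to the DataNode we were redirected to.
    with open(local_path, "rb") as f:
        requests.put(datanode_url, data=f)

put_file("/home/alice/photos/2012-01.tar", "/backups/alice/photos/2012-01.tar")
```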
NB: Naturally, the initial synchronization/balancing/transfer/etc. will take days at the very least, but that's acceptable.