TL;DR:
I'm looking for recommendations for a distributed key-value store with an average entry size of up to 50 KB, to be installed on Linux (dedicated servers).
A distributed file system would also do.
I found a few solutions: Ceph, Cassandra, Riak, and a few more.
Details
I'm looking for a storage solution for one of our components. It should be a key-value store with a flat namespace.
Scenario
The read/write patterns are very simple:
Once a key-value pair is written, it is read a few times within the next few hours.
After that, nothing touches that key-value pair. We'd like to keep the data for future use ("storage mode").
Other usage aspects
- OS: Linux
 - Python client/connector
 - Total size: up to 80TB (this value also represents future needs).
 - Avg Entry Size (for a single value in a k-v pair): 10 to 50 KB, uncompressed, mostly textual data
 - Compression: either built-in or external.
 - Encryption: not needed
 - Network bandwidth: 1 Gbit/s, single LAN
 - Servers: dedicated (not in the cloud)
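
To illustrate what I mean by "external" compression: a minimal sketch of a client-side wrapper that compresses values before they reach the store. `CompressingStore` and its `backend` argument are hypothetical names, and a plain `dict` stands in for the real cluster client. The sample text is synthetic and highly repetitive, so the ratio it prints says nothing about real data.

```python
import zlib


class CompressingStore:
    """Compresses values with zlib before handing them to a backend.

    `backend` is a stand-in: anything that supports item get/set
    (here a dict, in practice a thin adapter around the real client).
    """

    def __init__(self, backend, level=6):
        self.backend = backend
        self.level = level  # zlib level, 1 (fast) to 9 (smallest)

    def put(self, key, text):
        # Values are mostly textual, so encode to UTF-8 and compress.
        self.backend[key] = zlib.compress(text.encode("utf-8"), self.level)

    def get(self, key):
        # Reverse of put(): decompress, then decode back to str.
        return zlib.decompress(self.backend[key]).decode("utf-8")


store = CompressingStore(backend={})

# Synthetic 50,000-byte value (1000 identical 50-byte lines).
sample = "2014-01-01 12:00:00 INFO request handled in 12 ms\n" * 1000
store.put("entry-1", sample)

raw_size = len(sample.encode("utf-8"))
stored_size = len(store.backend["entry-1"])
print(raw_size, stored_size)
```

With this approach the store only ever sees opaque bytes, so it would work with any of the candidates, at the cost of CPU time on the client.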
 
Most important requirements
The "base" requirements are:
- OS: Linux
 - Python client/connector OR RESTful API via HTTP
 - Can easily store up to 80TB (this value also represents future needs).
 - Max read latency: a few seconds for the initial reads, up to 30 seconds for data in "storage mode" (see the Scenario section above)
 - Built-in replication (so that data is stored on more than a single node)
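
For a sense of scale, here is a rough back-of-the-envelope calculation for the numbers above. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and a replication factor of 3, which is only an illustrative value and not a decision we have made.

```python
TB = 10**12  # decimal terabyte, an assumption
KB = 10**3   # decimal kilobyte, an assumption

TOTAL_BYTES = 80 * TB


def entry_count(total_bytes, avg_entry_bytes):
    """Number of entries that fit in total_bytes at a given average size."""
    return total_bytes // avg_entry_bytes


def raw_capacity_tb(logical_tb, replication_factor):
    """Raw disk needed when every entry is stored replication_factor times."""
    return logical_tb * replication_factor


entries_at_50kb = entry_count(TOTAL_BYTES, 50 * KB)
entries_at_10kb = entry_count(TOTAL_BYTES, 10 * KB)

print(f"{entries_at_50kb:,}")   # 1,600,000,000
print(f"{entries_at_10kb:,}")   # 8,000,000,000
print(raw_capacity_tb(80, 3))   # 240
```

So the store has to cope with somewhere between 1.6 and 8 billion keys, and with 3 replicas the cluster would need about 240 TB of raw disk before compression.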
 
Nice to have
- RESTful gateway
 - Background data backup to another store (for data recovery in case of a disaster).
 - Easy to configure
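
On the backup point: a rough estimate of how long a full copy of the data would take over our 1 Gbit/s link. This is only a sketch; the `efficiency` parameter (the fraction of line rate actually achieved) is a made-up knob, and decimal terabytes are assumed.

```python
def transfer_time_days(num_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to move num_bytes over a link at a given efficiency."""
    seconds = num_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


full_copy_bytes = 80 * 10**12  # 80 TB, decimal units assumed
gigabit = 10**9                # 1 Gbit/s

print(f"{transfer_time_days(full_copy_bytes, gigabit):.1f} days")       # 7.4 days
print(f"{transfer_time_days(full_copy_bytes, gigabit, 0.7):.1f} days")  # 10.6 days
```

A full copy therefore takes more than a week even at line rate, which is why the backup would have to run continuously in the background rather than as a periodic full dump.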
 
What I've found so far
- Ceph
 - HDFS
 - HBase on top of HDFS
 - Lustre
 - GlusterFS
 - Mongo's GridFS - but can I trust Mongo's infrastructure?
 - Cassandra - not an option, since the merge (compaction) process can temporarily require up to double the disk space
 - Riak - appears to have the same issue as Cassandra; needs more research
 - Swift + OpenStack (actual storage can be on Amazon S3)
 - Voldemort
 - There are dozens of additional tools, but I won't list them here, since some have proprietary licenses and others seem immature.
 
I'd appreciate recommendations on any of the tools mentioned above (at a total capacity of more than 50 TB), or on any other tool you think is suitable.