
Hello intelligent people, noob here!

I am planning to build a multi-user application for photo/object storage using S3. I have the whole front-end planned out, but I have a question about the bucket system.

Should I have one bucket holding every user, or rather 4-5 buckets with the users distributed across them, or should I have 1 bucket for each user?

Each user will store about 35 GB on average, and I want this to run smoothly with as few as 3 users or as many as 300,000,000 in the future (so, as scalable as possible).

Which method should I choose, and what did Dropbox do during their S3 days?


You definitely do not need a bucket for each user. Setting aside the fact that AWS support is very unlikely to approve raising your account's default bucket limit from 100 to 300,000,000, bucket creation is not intended to be done aggressively or in real time.

The high-availability engineering of Amazon S3 is focused on get, put, list, and delete operations. Because bucket operations work against a centralized, global resource space, it is not appropriate to create or delete buckets on the high-availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often.

http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html

Design your application so that it doesn't matter whether you use one bucket or several. How? For each user, store the bucket_id where that user's data is stored. Then start with everybody in bucket_id 1 and then later you have the flexibility to put new users in new buckets if that becomes necessary... or if you decide to migrate some users to different buckets... or if you decide to situate users' storage in a bucket nearer to the user's typical location.
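A minimal sketch of that indirection, assuming a user record that carries a `bucket_id` field (the names `BUCKET_NAMES` and `storage_location` are illustrative, not part of any S3 SDK):

```python
# Map each stored bucket_id to an actual S3 bucket name. Starting with a
# single entry means everyone lives in one bucket; more entries can be
# added later without changing application code.
BUCKET_NAMES = {
    1: "myapp-user-data-1",
    2: "myapp-user-data-2",  # added later, only if it becomes necessary
}

def storage_location(user_record, filename):
    """Return (bucket, key) for a user's file, resolved through the
    user's stored bucket_id -- so migrating a user to another bucket
    only means copying the objects and updating that one field."""
    bucket = BUCKET_NAMES[user_record["bucket_id"]]
    key = f"{user_record['user_id']}/{filename}"
    return bucket, key
```

Every read and write then goes through `storage_location`, so the rest of the application never hard-codes a bucket name.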

S3 will automatically scale its capacity to meet the demands of your traffic. You can make that process easier by designing your object paths so that keys are assigned nonsequentially near the left-hand side of the key.

S3 scales its capacity by splitting index partitions, so, for example, giving each object a path that begins with the date of the upload would be a really bad idea: under heavy uploads, your bucket index would develop a hot spot in a small part of the keyspace.
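One common way to get that nonsequential left-hand side is to prepend a few characters of a hash of the key itself; a sketch (the function name is made up for illustration):

```python
import hashlib

def prefixed_key(user_id, filename):
    """Prepend a short hex hash so keys are spread evenly across the
    keyspace instead of clustering (e.g. by upload date) in one index
    partition. The prefix is derived from the key, so it is
    deterministic and needs no extra lookup to reconstruct."""
    raw = f"{user_id}/{filename}"
    prefix = hashlib.md5(raw.encode()).hexdigest()[:4]
    return f"{prefix}/{raw}"
```

Because the prefix is a pure function of the rest of the key, you can always recompute the full key from the user ID and filename alone.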

See http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

For the same reason, don't give your buckets lexically sequential names within a region.


What Dropbox may have been doing is probably not relevant.