2

The Amazon Glacier FAQ page mentions in several places how long it takes to retrieve data from Amazon Glacier. For example:

Standard retrievals allow you to access any of your archives within several hours. Standard retrievals typically complete within 3 – 5 hours ...

and

... Bulk retrievals typically complete within 5 – 12 hours.

Why does retrieving data from Amazon Glacier take so much longer than from other storage classes?

an0o0nym
  • 141

3 Answers

3

Why does it take so long? Because that's how it's designed.

Amazon Glacier is specifically designed to be a low-cost, low-access storage service for "data archiving and long-term backup." If you want regular, immediate access to your data, you need something like Amazon S3, which is a higher-cost, instant-access storage service.

Please also note that it's called "Glacier," and glaciers are not known for being fast.

I suspect they're using tape drives or something similar, but I can't comment on the specific technical aspects, nor can I find that info on Amazon's web pages.

hymie
  • 1,276
2

I found this on the Glacier wiki page:

ZDNet says that, according to a private e-mail, Glacier runs on "inexpensive commodity hardware components". In 2012, ZDNet quoted a former Amazon employee as saying that Glacier is based on custom low-RPM hard drives attached to custom logic boards where only a percentage of a rack's drives can be spun at full speed at any one time. (Similar technology is also used by Facebook.)

There is some belief amongst users that the underlying hardware used for Glacier storage is tape-based, owing to the fact that Amazon has positioned Glacier as a direct competitor to tape backup services (both on-premises and cloud-based). This confusion is exacerbated by the fact that Glacier has archive retrieval delays (3–5 hours before archives are available) similar to those of tape-based systems and a pricing model that discourages frequent data retrieval.

The Register claimed that Glacier runs on Spectra T-Finity tape libraries with LTO-6 tapes. Others have conjectured Amazon using off-line shingled magnetic recording hard drives, multi-layer Blu-ray optical discs, or an alternative proprietary storage technology.

an0o0nym
  • 141
1

Getting data out of Amazon Glacier takes two stages: the retrieval and the download. Glacier was created for long-term storage that does not require frequent retrieval, such as cloud backups. A standard retrieval request typically takes 3-5 hours, after which the data is placed in a staging area for the customer to download. Retrieved data is staged for 24 hours, so it's important to download it within that period. Download time depends on your bandwidth. The retrieval delay is the trade-off for price: Amazon prices Glacier lower than other storage options, which are intended for more frequent data access.
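To make the two stages concrete, here is a rough sketch of what they could look like in code. It assumes `client` is a boto3 Glacier client and uses its `initiate_job`, `describe_job` and `get_job_output` calls; the function name `retrieve_archive`, its parameters and the 15-minute polling interval are my own choices for illustration, not anything prescribed by AWS.

```python
import time


def retrieve_archive(client, vault_name, archive_id, dest_path,
                     tier="Standard", poll_seconds=900):
    """Hypothetical helper: request an archive, wait, then download it."""
    # Stage 1 (retrieval): ask Glacier to stage the archive for download.
    job = client.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    job_id = job["jobId"]

    # Poll until the job finishes; for a standard retrieval this can be hours.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    if status.get("StatusCode") != "Succeeded":
        raise RuntimeError(f"Retrieval job {job_id} did not succeed")

    # Stage 2 (download): stream the staged data to a local file.
    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    with open(dest_path, "wb") as f:
        for chunk in iter(lambda: output["body"].read(1024 * 1024), b""):
            f.write(chunk)
    return job_id
```

With real credentials you would create the client with `boto3.client("glacier")` and pass it in. In practice a long polling loop like this is wasteful; it is only here to show that the wait happens between the request and the download.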

However, Glacier offers several retrieval types. If needed, expedited retrieval makes data available much faster, typically within 1-5 minutes, but it is more expensive than standard retrieval. The AWS FAQ has additional details on the retrieval types: https://aws.amazon.com/glacier/faqs/.
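As a rough illustration of the trade-off between retrieval types, the sketch below picks the cheapest tier whose typical completion time still fits a deadline. The time figures are the "typical" ranges quoted in this thread, not guarantees. The cheapest-first ordering (Bulk, then Standard, then Expedited) is my assumption about relative pricing, and `cheapest_tier` is a made-up helper name.

```python
# Upper end of the typical completion time for each tier, in minutes.
# Ordered from cheapest to most expensive (assumed relative pricing).
TIERS = [
    ("Bulk", 12 * 60),      # typically 5-12 hours
    ("Standard", 5 * 60),   # typically 3-5 hours
    ("Expedited", 5),       # typically 1-5 minutes
]


def cheapest_tier(deadline_minutes):
    """Return the cheapest tier that typically finishes within the deadline.

    Returns None when even the fastest tier is not expected to make it.
    """
    for name, typical_max_minutes in TIERS:
        if typical_max_minutes <= deadline_minutes:
            return name
    return None
```

For example, `cheapest_tier(6 * 60)` picks `"Standard"`, because Bulk may take up to 12 hours, while a 10-minute deadline leaves only `"Expedited"`.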