
We have a large amount of data on an EBS volume. I am familiar with attaching the volume to a new EC2 cluster.

But how is this done for EMR? Here is the Add Storage dialog; notice there are no entries for specifying the EBS Snapshot ID:

[Screenshot: EMR Add Storage dialog]

1 Answer


The EMR console doesn't give you an option for this.

We had a similar requirement: we wanted to make 70 GB of data available via EBS volumes.
The solution is to attach the volume to the underlying EC2 instances and mount it there.

To do this:

Step 1: Select the EMR cluster and go to the Hardware tab.
Step 2: Open the instance group; in our case CORE, because we wanted the data to be available on the worker nodes.
Step 3: Copy the EC2 instance ID of the node where you want to mount the volume.
Step 4: Go to the EC2 console and select Volumes from the left navigation menu. Select the volume you want to mount, choose 'Attach Volume' from the Actions dropdown, and paste the EC2 instance ID. The volume must be in the same Availability Zone as the instance.
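
If you would rather script the console steps above, here is a minimal sketch using boto3. It is not something we ran as part of the original setup, so treat it as an assumption-laden starting point: the helper names (core_instance_ids, attach_volume, main) and the cluster and volume IDs are placeholders I made up, and it assumes your credentials and region are already configured.

```python
def core_instance_ids(emr_client, cluster_id):
    """Return the EC2 instance IDs of the running CORE nodes of an EMR cluster."""
    ids = []
    kwargs = {
        "ClusterId": cluster_id,
        "InstanceGroupTypes": ["CORE"],
        "InstanceStates": ["RUNNING"],
    }
    while True:
        page = emr_client.list_instances(**kwargs)
        ids.extend(inst["Ec2InstanceId"] for inst in page.get("Instances", []))
        marker = page.get("Marker")
        if not marker:
            return ids
        # More results are available; request the next page.
        kwargs["Marker"] = marker


def attach_volume(ec2_client, volume_id, instance_id, device="/dev/sdf"):
    """Attach an existing EBS volume to one instance (same Availability Zone)."""
    return ec2_client.attach_volume(
        VolumeId=volume_id, InstanceId=instance_id, Device=device
    )


def main():
    # boto3 is imported here so the helpers above can be reused without it.
    import boto3

    emr = boto3.client("emr")
    ec2 = boto3.client("ec2")
    nodes = core_instance_ids(emr, "j-XXXXXXXXXXXXX")  # placeholder cluster ID
    if nodes:
        attach_volume(ec2, "vol-xxxxxxxx", nodes[0])  # placeholder volume ID
```

Call main() after filling in your own IDs. The clients are passed in as arguments so the two helpers can be exercised with stand-in objects before touching a real cluster.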

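This EBS volume will most likely show up as /dev/sdf on the EC2 instance (on some instance types the kernel exposes it under a different name, such as /dev/xvdf or an NVMe device), and you can then mount it on any directory by SSHing into that instance.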
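
The mount step on the node could look roughly like the sketch below. The function names and the /mnt/data mount point are placeholders of mine, and it assumes the filesystem sits directly on the device (no partition table) and already holds your data, so nothing is formatted. By default it only prints the commands; pass dry_run=False on the node itself to actually run them.

```python
import shlex
import subprocess


def mount_commands(device="/dev/sdf", mount_point="/mnt/data"):
    """Build the commands to mount an already-formatted volume."""
    return [
        ["lsblk"],                               # check the actual device name first
        ["sudo", "mkdir", "-p", mount_point],    # create the mount point
        ["sudo", "mount", device, mount_point],  # mount the existing filesystem
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Check the lsblk output before mounting, since the device name on the instance may differ from the one you chose when attaching.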

One caveat:

If you want to share that data across all the participating core nodes, you need to create a separate volume for each node and attach each one, since a standard EBS volume attaches to only one instance at a time.
Alternatively, create an EFS (Elastic File System) file system and mount it on all the core nodes using the NFS utilities (https://docs.aws.amazon.com/efs/latest/ug/wt1-test.html)
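
For the EFS route, the mount command to run on each core node can be built as in this sketch. The function name, file system ID, region, and mount point are all placeholders; the NFS options are the ones the AWS docs recommend for mounting EFS with the plain NFS client. It assumes the NFS client is installed on the node and that the node's security group can reach the EFS mount target.

```python
def efs_mount_command(file_system_id, region, mount_point="/mnt/efs"):
    """Build the NFS mount command for an EFS file system."""
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    options = (
        "nfsvers=4.1,rsize=1048576,wsize=1048576,"
        "hard,timeo=600,retrans=2,noresvport"
    )
    return [
        "sudo", "mount", "-t", "nfs4", "-o", options,
        f"{dns_name}:/", mount_point,
    ]


# Example with placeholder values; run the printed command on every core node.
print(" ".join(efs_mount_command("fs-12345678", "us-east-1")))
```

Running this on every node by hand gets tedious for larger clusters, so you may want to put the same command into a bootstrap script instead.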

HIH

raevilman