Questions tagged [amazon-emr]

4 questions
4 votes · 0 answers

Reading data from Amazon Redshift in Spark 2.4

We used to read data in Spark 2.3 using Databricks, with the following code. Spark shell initialization: spark-shell --jars RedshiftJDBC42-1.2.10.1009.jar --packages…
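As a hedged alternative to the Databricks setup in the question (presumably a package passed with --packages), a Redshift table can also be read through Spark's built-in "jdbc" data source using the same RedshiftJDBC42 jar passed with --jars. The sketch is in PySpark; the same options work with spark.read in spark-shell. The helper names, the default port 5439, and the driver class com.amazon.redshift.jdbc42.Driver (the class used by the 1.2.x JDBC 4.2 driver) are assumptions to check against your driver version. Plain JDBC does not unload through S3, so it reads rows over a single connection unless partitioning options are set, which can be slow for large tables.

```python
from typing import Any, Dict


def redshift_jdbc_options(host: str, database: str, table: str,
                          user: str, password: str,
                          port: int = 5439) -> Dict[str, str]:
    """Build options for Spark's generic "jdbc" source pointed at Redshift.

    All arguments are placeholders you supply; 5439 is Redshift's default port.
    """
    return {
        "url": f"jdbc:redshift://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Assumed driver class shipped in the RedshiftJDBC42 1.2.x jar.
        "driver": "com.amazon.redshift.jdbc42.Driver",
    }


def read_redshift(spark: Any, options: Dict[str, str]) -> Any:
    """Load a Redshift table as a DataFrame via the built-in JDBC reader.

    `spark` is an existing SparkSession, e.g. the one pyspark creates.
    """
    return spark.read.format("jdbc").options(**options).load()
```

For example, read_redshift(spark, redshift_jdbc_options("my-cluster.example.com", "dev", "public.sales", "user", "secret")), where every argument is hypothetical and pyspark was started with --jars RedshiftJDBC42-1.2.10.1009.jar.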
1 vote · 1 answer

Cannot connect to an Amazon EMR cluster with PuTTY

I created an EMR cluster with the standard configuration. Then I allowed inbound SSH traffic on port 22 for the corresponding security group, adding the following rules: Then I followed the instructions: But I get the error: Server refused our…
Andrey
1 vote · 0 answers

How to read large zip files in PySpark

I have n .zip files on S3 that I want to process to extract some data. Each zip file contains a single JSON file. In Spark we can read .gz files, but I haven't found any way to read data inside .zip files. Can someone please…
Sandie
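Since Spark has no built-in zip codec, one approach, sketched here under assumptions, is to read each archive as raw bytes and unzip it in Python with the standard zipfile module. The hypothetical helper iter_json_lines_from_zip below yields the lines of the single JSON member; it assumes that file is UTF-8, newline-delimited JSON (one object per line).

```python
import io
import zipfile
from typing import Iterator


def iter_json_lines_from_zip(zip_bytes: bytes) -> Iterator[str]:
    """Yield each non-empty line of the single .json member of a zip archive.

    Assumes (hypothetically) UTF-8, newline-delimited JSON.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Skip macOS resource-fork entries, which can also end in ".json".
        names = [n for n in zf.namelist()
                 if n.endswith(".json") and not n.startswith("__MACOSX/")]
        if len(names) != 1:
            raise ValueError(f"expected exactly one .json member, found {names}")
        # Decode the member as text and stream it line by line.
        with io.TextIOWrapper(zf.open(names[0]), encoding="utf-8") as text:
            for raw in text:
                line = raw.strip()
                if line:
                    yield line
```

On the cluster, SparkContext.binaryFiles yields (path, bytes) pairs, and spark.read.json accepts an RDD of JSON strings, so something like spark.read.json(sc.binaryFiles("s3://your-bucket/prefix/*.zip").flatMap(lambda kv: iter_json_lines_from_zip(kv[1]))) would produce a DataFrame (bucket and prefix are placeholders). binaryFiles loads each archive whole into one task's memory, so very large zips may need more executor memory.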
1 vote · 1 answer

How to add an EBS volume by snapshot ID to Amazon EMR

We have a large amount of data on an EBS volume. I am familiar with attaching the volume to a new EC2 instance, but how is this done for EMR? Here is the Add Storage dialog; notice that there are no entries for specifying the EBS snapshot ID:
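Since the Add Storage dialog has no snapshot field, one possible workaround, sketched under assumptions, is to restore the snapshot into a new volume with the EC2 API and attach it to a running cluster node. The sketch uses the boto3 EC2 client calls create_volume, get_waiter("volume_available") and attach_volume. The function name, the /dev/sdf device name, and passing the node's Availability Zone explicitly are illustrative assumptions. The volume is attached to a single node, so this suits staging data (for example, copying it into HDFS or S3) rather than cluster-wide storage.

```python
from typing import Any


def attach_snapshot_volume(ec2: Any, snapshot_id: str, instance_id: str,
                           availability_zone: str,
                           device: str = "/dev/sdf") -> str:
    """Create an EBS volume from a snapshot and attach it to one EMR node.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2"). The volume must be
    created in the same Availability Zone as the target instance.
    Returns the new volume ID.
    """
    # Restore the snapshot into a new volume in the node's Availability Zone.
    volume = ec2.create_volume(SnapshotId=snapshot_id,
                               AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]

    # Block until EC2 reports the new volume as "available".
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it; it still has to be mounted on the node (e.g. over SSH).
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device=device)
    return volume_id
```

The node's instance ID and Availability Zone can be looked up in the EMR console or with the EMR ListInstances API. On newer instance types the attached device may appear under a different name, such as an NVMe device, when you mount it.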