Amazon's S3 Select allows a user to write SQL against S3 objects, but there's insufficient documentation around what standard SQL functionality is supported.
According to the documentation, Amazon S3 Select supports AVG, COUNT, MAX, MIN, and SUM. But when I run any aggregate query I get an error like
botocore.exceptions.ClientError: An error occurred (UnsupportedSqlOperation) when calling the SelectObjectContent operation: Unsupported SQL operation GROUP BY. Please check the service documentation for supported operations.
Here's some code to reproduce:
import boto3
client = boto3.client('s3')
response = client.select_object_content(Bucket='my-bucket', Key='object.csv',
ExpressionType='SQL', Expression="select ID, count(*) from s3object group by s.ID ",
InputSerialization = {'CSV':{"FileHeaderInfo": "Use"}, 'CompressionType': 'GZIP'},
OutputSerialization = {'CSV':{}})
event_stream = response['Payload']
with open('output', 'wb') as f:
for event in event_stream:
if 'Records' in event:
data = event['Records']['Payload']
f.write(data)