I’m attempting to run EMR ETL Runner for the first time, and it fails with “Input path does not exist” on Step 4 of enrichment (according to the diagram here).
The most common cause of this seems to be that there is no data in the raw in bucket, but I can confirm that I have data to process, and the folder structure there is getting replaced with XX_folder files.
I am using the Scala collector writing to Kinesis, and from there to S3 via an AWS Kinesis Firehose delivery stream.
It’s not clear from the setup guide whether, and how exactly, the AWS side of this is supposed to be set up, so maybe that’s where the issue is. I see some docs and references to LZO format for EMR, but the docs for setting up a Kinesis LZO S3 sink are in the alternative data stores setup step, which comes after setting up enrichment. Is that a required step for Kinesis to EMR?
The Kinesis Firehose delivery stream doesn’t have an option for LZO format. I’m currently using GZIP. Should compression be disabled instead?