The documentation does not have many details on how to do this; I would like a step-by-step guide. I'm just starting with Snowplow and I've had no success running EmrEtlRunner. Could someone help me?
@feliciosan, that could be quite a big topic. Your starting point is here: https://github.com/snowplow/snowplow/wiki/Setting-up-EmrEtlRunner. The best approach is to follow the guidance there and ask questions along the way whenever you get stuck.
Also, the following diagram should help you understand the workflow of the data processed by EmrEtlRunner: https://github.com/snowplow/snowplow/wiki/Batch-pipeline-steps.
I've got the config.yml.sample, and I also downloaded the snowplow-emr-etl-runner file, but I don't know how to set it up and run it on AWS. I already have an EC2 instance running on AWS. I got stuck at step 4, Configuration, on this page: https://github.com/snowplow/snowplow/wiki/Setting-up-EmrEtlRunner
If you need help preparing the config.yml, you can refer to Common configuration.
Additionally, you will need:
- an Iglu resolver configuration file
- a targets configuration file, which is optional if you only need the data in S3 buckets
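For the Iglu resolver file, a minimal sketch is below. It assumes you only need Iglu Central (no custom schema repositories) and emits a resolver config using the `resolver-config/jsonschema/1-0-1` format; the function name and the cache size of 500 are illustrative choices, not requirements.

```python
import json


# Schema URI of the Iglu resolver configuration format (version 1-0-1).
RESOLVER_SCHEMA = "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1"


def build_resolver_config(cache_size: int = 500) -> dict:
    """Return a resolver config pointing only at Iglu Central.

    Assumption: no custom schemas. If you host your own schemas, add another
    entry to "repositories" for your own Iglu repository.
    """
    return {
        "schema": RESOLVER_SCHEMA,
        "data": {
            "cacheSize": cache_size,
            "repositories": [
                {
                    "name": "Iglu Central",
                    "priority": 0,
                    "vendorPrefixes": ["com.snowplowanalytics"],
                    "connection": {"http": {"uri": "http://iglucentral.com"}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Print pretty-printed JSON; redirect it into the file you will pass
    # to EmrEtlRunner as the resolver, e.g. iglu_resolver.json.
    print(json.dumps(build_resolver_config(), indent=2))
```

Running `python make_resolver.py > iglu_resolver.json` (the script name is hypothetical) gives you a resolver file to start from; compare it against the sample resolver shipped with your Snowplow release before relying on it.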
Once the configuration files are complete, you can run EmrEtlRunner as described in https://github.com/snowplow/snowplow/wiki/2-Using-EmrEtlRunner.
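As a rough illustration of how the pieces fit together, the sketch below assembles an EmrEtlRunner command line from the config.yml, the resolver file and the optional targets directory. It assumes a release that takes a `run` subcommand followed by `--config`, `--resolver` and `--targets`; older releases omit `run`, so check the exact flags on the wiki page for your version. The helper name and file paths are hypothetical.

```python
import shlex
from typing import List, Optional


def build_emr_etl_runner_command(
    binary: str,
    config: str,
    resolver: str,
    targets: Optional[str] = None,
    use_run_subcommand: bool = True,
) -> List[str]:
    """Assemble the EmrEtlRunner argument list (hypothetical helper).

    Assumption: newer releases expect a `run` subcommand before the options;
    set use_run_subcommand=False for releases that do not.
    """
    cmd = [binary]
    if use_run_subcommand:
        cmd.append("run")
    cmd += ["--config", config, "--resolver", resolver]
    # Targets are optional: skip them if the data only needs to land in S3.
    if targets is not None:
        cmd += ["--targets", targets]
    return cmd


if __name__ == "__main__":
    cmd = build_emr_etl_runner_command(
        "./snowplow-emr-etl-runner",
        config="config/config.yml",
        resolver="config/iglu_resolver.json",
        targets="config/targets",
    )
    # Only prints the shell-quoted command; to execute it, pass `cmd`
    # to subprocess.run(cmd, check=True) on the machine with AWS access.
    print(shlex.join(cmd))
```

Printing the command first, rather than executing it straight away, makes it easy to compare against the example on the wiki before launching an EMR cluster from your EC2 instance.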