Problem running snowplow-emr-etl-runner for the first time


#1

Hi guys. I am exploring Snowplow and running it for the first time. I created the config.yaml configuration file and copied the default resolver.json for the first run.
For some reason, I get an error every time:

```
./snowplow-emr-etl-runner -d --config config/config.yml --resolver resolver.json
invalid option: -d

./snowplow-emr-etl-runner --config config/config.yml --resolver resolver.json
invalid option: --config

./snowplow-emr-etl-runner --resolver resolver.json
invalid option: --resolver
```

It looks like any command-line option I choose returns this error.

I am using snowplow_emr_r97_knossos.zip.

Can you please help me resolve this issue?
Thanks
Oleg.


#2

I wonder whether R97 predates the changes to the CLI. If so, you may have to dig up the older documentation.
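The thread doesn't say what the CLI change was, so the following only illustrates one common failure mode and is not EmrEtlRunner's own parser. It is a minimal Python `argparse` sketch, assuming a hypothetical CLI whose options live under a `run` subcommand. If you pass the same options without the subcommand, the parser rejects them, much as every flag in post #1 was rejected. `RaisingParser`, `build_parser` and `try_parse` are illustrative names.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises ValueError instead of exiting, so errors can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    # Hypothetical CLI: the options are only defined under a "run" subcommand,
    # not on the top-level parser.
    parser = RaisingParser(prog="runner")
    commands = parser.add_subparsers(dest="command", required=True)
    run = commands.add_parser("run")  # subparsers reuse RaisingParser by default
    run.add_argument("-d", "--debug", action="store_true")
    run.add_argument("--config", required=True)
    run.add_argument("--resolver", required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or an error string if parsing fails."""
    try:
        return build_parser().parse_args(argv)
    except ValueError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Options after the subcommand parse fine.
    print(try_parse(["run", "-d", "--config", "config/config.yml", "--resolver", "resolver.json"]))
    # The same options without the subcommand are rejected.
    print(try_parse(["-d", "--config", "config/config.yml", "--resolver", "resolver.json"]))
```

If the documentation for your exact release shows a subcommand before the options, that would explain errors like these. Check the docs for that release rather than relying on this sketch.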

I would suggest upgrading your pipeline to a later release. There are some worthwhile fixes for PII data, the Clojure collector and IP lookups that you will probably want to be using…


#3

OK, thank you for the quick answer. Can you please point me to the links I should use? I successfully completed the first two steps using these links:

  1. [Setup a Snowplow Collector]
  2. [Setup a Snowplow Tracker]

I then started step 3, the Enrich process (https://github.com/snowplow/snowplow/wiki/Setting-up-Enrich), followed the instructions (configured config.yaml and iglu.json) and downloaded snowplow-emr-etl-runner.
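The thread doesn't show the resolver file. The following is a minimal Python pre-flight check that catches a malformed resolver before you launch the runner. It assumes the usual Iglu resolver layout: a self-describing JSON object with a `schema` key and a `data.repositories` list. `check_resolver` and `EXAMPLE_RESOLVER` are hypothetical names, not part of Snowplow. The example is modelled on the commonly distributed Iglu Central default, so compare it against your own copy. The check does not validate config.yml.

```python
import json

# Prefix of the Iglu resolver-config schema key (assumed; the version suffix may differ).
RESOLVER_SCHEMA_PREFIX = "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/"

# Modelled on the usual Iglu Central default resolver; verify against your copy.
EXAMPLE_RESOLVER = """
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {"http": {"uri": "http://iglucentral.com"}}
      }
    ]
  }
}
"""


def _dig(obj, *keys):
    """Follow nested dict keys, returning None if any level is missing or not a dict."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def check_resolver(text):
    """Return a list of problems found in a resolver config (empty list if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not str(doc.get("schema", "")).startswith(RESOLVER_SCHEMA_PREFIX):
        problems.append("missing or unexpected 'schema' key")
    repos = _dig(doc, "data", "repositories")
    if not isinstance(repos, list) or not repos:
        problems.append("'data.repositories' must be a non-empty list")
        return problems
    for i, repo in enumerate(repos):
        if not _dig(repo, "connection", "http", "uri"):
            problems.append(f"repository {i} has no connection.http.uri")
    return problems


if __name__ == "__main__":
    print(check_resolver(EXAMPLE_RESOLVER) or "resolver looks OK")
```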

At this step I got stuck.
Questions:
1) Which older version of snowplow-emr-etl-runner should I use?
2) To upgrade to a later pipeline release, which documentation should I use? This is my first time installing Snowplow, so I am a bit confused about how to proceed :slight_smile:

Thanks
Oleg


#4

I use this all the time for upgrades:

We are on 109; it is stable and we are happy with it.

collector -> kinesis -> s3loader -> EMR ETL runner -> redshift

is the flow we are using.
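The post lists the stages but not how they are wired. The following is a minimal sketch in plain Python, not Snowplow code. It models that linear flow as an ordered tuple so you can see which stages sit downstream of a given step, for instance the ones you might need to re-run after that step fails. The stage names are copied from the flow above. `hops` and `downstream_of` are illustrative helpers.

```python
# Stage names copied from the flow described in the post above.
PIPELINE = ("collector", "kinesis", "s3loader", "EMR ETL runner", "redshift")


def hops(stages):
    """Return the consecutive (upstream, downstream) pairs of a linear pipeline."""
    return list(zip(stages, stages[1:]))


def downstream_of(stages, stage):
    """Return every stage that consumes output from `stage`, directly or indirectly."""
    index = stages.index(stage)  # raises ValueError for an unknown stage name
    return stages[index + 1:]


if __name__ == "__main__":
    for upstream, downstream in hops(PIPELINE):
        print(f"{upstream} -> {downstream}")
    print("downstream of the EMR ETL runner:", downstream_of(PIPELINE, "EMR ETL runner"))
```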


#5

Thanks guys. I’ve made some progress.

It looks like my problem was related to
[Shredding is failing with "File does not exist"](https://github.com/snowplow/snowplow/wiki/Troubleshooting#etl-failure)

I went back two or three versions of snowplow-emr-etl-runner and ran:

```
./snowplow-emr-etl-runner -d --config config/config.yml --resolver iglu_resolver.json --skip shred stage
```

and it works!
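To script the working invocation instead of retyping it, you could use a small Python wrapper like the one below. This is a minimal sketch. `build_runner_command` and `run_runner` are hypothetical helpers, and they use only the flags that appear in the command above (`-d`, `--config`, `--resolver`, `--skip`). The `skip` tokens are passed through exactly as typed. The sketch makes no claim about which skip values the runner accepts, so check the documentation for your runner version.

```python
import shlex
import subprocess


def build_runner_command(runner, config, resolver, debug=True, skip=None):
    """Assemble the EmrEtlRunner argv from the flags used in this thread.

    `skip` is a list of tokens appended verbatim after --skip.
    """
    cmd = [str(runner)]
    if debug:
        cmd.append("-d")
    cmd += ["--config", str(config), "--resolver", str(resolver)]
    if skip:
        cmd += ["--skip", *skip]
    return cmd


def run_runner(cmd):
    """Run the command and return its exit code (raises if the binary is missing)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    cmd = build_runner_command(
        "./snowplow-emr-etl-runner", "config/config.yml", "iglu_resolver.json",
        skip=["shred", "stage"],
    )
    print(shlex.join(cmd))  # shell-quoted form, handy for logging before calling run_runner(cmd)
```

Keeping the command as a list (rather than one string) avoids shell-quoting surprises when paths contain spaces.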

Now I am going to deal with storing the data in Redshift. :slight_smile:

