Emr-etl runner error

jpdc · August 16, 2017, 2:31pm

Hello

I've been trying to set up Snowplow and I'm having issues.
I followed the guide and got the Elastic Beanstalk setup done, and I was able to get EmrEtlRunner working with config.yml on version r90. The EMR cluster starts setting up, but the run errors out when trying to download the RDB Loader logs:

root@snowplow:~/emr-etl# ./snowplow-emr-etl-runner --skip staging,archive_raw --config config/config.yml --targets config/targets/ --resolver resolver.json --enrichments file:./enrichments
D, [2017-08-16T14:01:22.054000 #6510] DEBUG -- : Initializing EMR jobflow
D, [2017-08-16T14:01:27.468000 #6510] DEBUG -- : EMR jobflow j-1X4LYCUAUP5XQ started, waiting for jobflow to complete...
I, [2017-08-16T14:01:27.481000 #6510]  INFO -- : SnowplowTracker::Emitter initialized with endpoint http://collector.namastetech.me:80/i
I, [2017-08-16T14:01:27.826000 #6510]  INFO -- : Attempting to send 1 request
I, [2017-08-16T14:01:27.842000 #6510]  INFO -- : Sending GET request to http://collector.namastetech.me:80/i...
I, [2017-08-16T14:01:27.896000 #6510]  INFO -- : GET request to http://collector.namastetech.me:80/i finished with status code 200
I, [2017-08-16T14:11:29.152000 #6510]  INFO -- : RDB Loader logs
D, [2017-08-16T14:11:29.635000 #6510] DEBUG -- : Downloading s3://ntech-snoplow-data/snowplow-log/rdb-loader/2017-08-16-14-01-22/3465bd73-393e-40a4-a622-6e4106b658af to /root/emr-etl/rdbloader20170816-6510-1qjquf2
E, [2017-08-16T14:11:31.563000 #6510] ERROR -- : Error while downloading RDB log s3://ntech-snoplow-data/snowplow-log/rdb-loader/2017-08-16-14-01-22/3465bd73-393e-40a4-a622-6e4106b658af
E, [2017-08-16T14:11:31.595000 #6510] ERROR -- : undefined method `body' for nil:NilClass
I, [2017-08-16T14:11:31.899000 #6510]  INFO -- : Attempting to send 1 request
I, [2017-08-16T14:11:31.909000 #6510]  INFO -- : Sending GET request to http://collector.namastetech.me:80/i...
I, [2017-08-16T14:11:31.991000 #6510]  INFO -- : GET request to http://collector.namastetech.me:80/i finished with status code 200
F, [2017-08-16T14:11:32.347000 #6510] FATAL -- :

In config.yml I have log: s3://ntech-snoplow-data/snowplow-log

Any idea what could be wrong? Did I miss a step or something?

Thanks in advance
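
For anyone triaging a similar run, the failure is easier to spot once the ERROR and FATAL lines are separated from the tracker's INFO chatter. The following is a minimal sketch, not part of EmrEtlRunner: it assumes the output uses Ruby's default Logger line layout (which matches the lines pasted above), and the names `LOG_LINE` and `failures` are hypothetical.

```python
import re

# Assumed layout (Ruby's default Logger formatter):
#   "<S>, [<timestamp> #<pid>] <SEVERITY> -- <progname>: <message>"
LOG_LINE = re.compile(
    r"^(?P<code>[DIWEFA]), \[(?P<timestamp>\S+) #(?P<pid>\d+)\]\s+"
    r"(?P<severity>[A-Z]+) -- (?P<progname>[^:]*):\s?(?P<message>.*)$"
)


def failures(lines):
    """Return (timestamp, severity, message) for each ERROR or FATAL line."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("severity") in ("ERROR", "FATAL"):
            found.append(
                (
                    match.group("timestamp"),
                    match.group("severity"),
                    match.group("message"),
                )
            )
    return found


# Lines copied from the run above.
sample = [
    "I, [2017-08-16T14:11:29.152000 #6510]  INFO -- : RDB Loader logs",
    "E, [2017-08-16T14:11:31.563000 #6510] ERROR -- : Error while downloading RDB log s3://ntech-snoplow-data/snowplow-log/rdb-loader/2017-08-16-14-01-22/3465bd73-393e-40a4-a622-6e4106b658af",
    "E, [2017-08-16T14:11:31.595000 #6510] ERROR -- : undefined method `body' for nil:NilClass",
    "F, [2017-08-16T14:11:32.347000 #6510] FATAL -- :",
]

for timestamp, severity, message in failures(sample):
    print(timestamp, severity, message)
```

Note that the FATAL line in this run carries an empty message, so the two ERROR lines just before it are the ones that describe the failure.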

anton · August 16, 2017, 3:04pm

Hello @jpdc,

Is this your first EmrEtlRunner run? I'm puzzled about why you skipped the staging step, as it is usually skipped only when a previous run has failed. So far I have a feeling that the enrich and shred jobs were just implicitly skipped because there's no data.

If you're aware of the recovery process and this is not your first run, could you please share the following (with credentials removed):

Your config/config.yml
Log file at s3://ntech-snoplow-data/snowplow-log/rdb-loader/2017-08-16-14-01-22/3465bd73-393e-40a4-a622-6e4106b658af
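
To retrieve the log file requested above, most S3 clients want the bucket and the object key separately rather than the full URI. A minimal sketch of that split, using only the Python standard library (the helper name `split_s3_uri` is hypothetical, and the download itself is not shown):

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URI: " + uri)
    # The bucket is the host part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://ntech-snoplow-data/snowplow-log/rdb-loader/"
    "2017-08-16-14-01-22/3465bd73-393e-40a4-a622-6e4106b658af"
)
print(bucket)
print(key)
```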

jpdc · August 16, 2017, 4:41pm

@anton

Thanks for the reply

Yes, it was the first run. It was not working; I was just following the guide. I'm new to this, so I'm still trying to figure it out.

Removing the --skip option did not give an error:

root@snowplow:~/emr-etl# ./snowplow-emr-etl-runner --config config/config.yml --targets config/targets/ --resolver resolver.json --enrichments file:./enrichments
D, [2017-08-16T16:34:51.415000 #7057] DEBUG -- : Staging raw logs...
  moving files from s3://ntech-snoplow-data/old-data/ to s3://ntech-snoplow-data/processing/
  moving files from s3://ntech-snoplow-data/ to s3://ntech-snoplow-data/processing/

What does --skip staging do? Also, why did it not start the EMR cluster? According to the documentation:

Invoking EmrEtlRunner with just the --config option puts it into rolling mode, processing all the raw Snowplow event logs it can find in your In Bucket:

jpdc · August 16, 2017, 6:10pm

nvm, I figured it out.

Question though: I have had this set up since yesterday and there are barely any logs from the Elastic Beanstalk bucket, yet I see the cluster is stuck in:

Elasticity Spark Step: Enrich Raw Events Running 2017-08-16 12:08 (UTC-5) 57 minutes

Is it normal that it is taking so long? The master and core instances are m1.medium, which should be enough for testing. It worries me what would happen when this is live, lol.
