Elasticity Spark Step: Enrich Raw Events never ends

jpdc · August 16, 2017, 7:36pm

I stuck in there for a looong time

on the console I see

root@snowplow:~/emr-etl# ./snowplow-emr-etl-runner --config config/config.yml --targets config/targets/ --resolver resolver.json --enrichments file:./enrichments
D, [2017-08-16T19:19:06.674000 #7598] DEBUG -- : Staging raw logs...
  moving files from s3://elasticbeanstalk-us-east-1-768491813393/resources/environments/logs/publish/e-gepyppfmhg/ to s3://ntech-snoplow-data/processing/
(t0)    MOVE elasticbeanstalk-us-east-1-768491813393/resources/environments/logs/publish/e-gepyppfmhg/i-03c30b5f5f2a0d9e9/_var_log_tomcat8_rotated_localhost_access_log.txt1502910061.gz -> ntech-snoplow-data/processing/var_log_tomcat8_rotated_localhost_access_log.2017-08-16-19.us-east-1.i-03c30b5f5f2a0d9e9.txt.gz
      +-> ntech-snoplow-data/processing/var_log_tomcat8_rotated_localhost_access_log.2017-08-16-19.us-east-1.i-03c30b5f5f2a0d9e9.txt.gz
      x elasticbeanstalk-us-east-1-768491813393/resources/environments/logs/publish/e-gepyppfmhg/i-03c30b5f5f2a0d9e9/_var_log_tomcat8_rotated_localhost_access_log.txt1502910061.gz
D, [2017-08-16T19:19:15.482000 #7598] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2017-08-16T19:20:15.487000 #7598] DEBUG -- : Initializing EMR jobflow
D, [2017-08-16T19:20:17.956000 #7598] DEBUG -- : EMR jobflow j-3EYT62SEUUZQN started, waiting for jobflow to complete...
I, [2017-08-16T19:20:17.976000 #7598]  INFO -- : SnowplowTracker::Emitter initialized with endpoint http://collector.namastetech.me:80/i
I, [2017-08-16T19:20:18.342000 #7598]  INFO -- : Attempting to send 1 request
I, [2017-08-16T19:20:18.346000 #7598]  INFO -- : Sending GET request to http://collector.namastetech.me:80/i...
I, [2017-08-16T19:20:18.391000 #7598]  INFO -- : GET request to http://collector.namastetech.me:80/i finished with status code 200

stderr logs shows lines like

17/08/16 19:31:47 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:48 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:50 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:51 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:52 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:53 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:54 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:55 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:56 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:57 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:58 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:31:59 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)
17/08/16 19:32:00 INFO Client: Application report for application_1502911438395_0001 (state: RUNNING)

and it never ends

I barely have few tomcat logs to process so not sure what could be happening

any ideas?

thanks in advance

BenFradet · August 17, 2017, 8:06am

@jpdc

~~Which instance type are you using?~~

Saw in the other thread that you’re using m1.medium.

This instance type doesn’t seem to work for Spark most likely because of the number of virtual CPUs (1).

What’s puzzling is that YARN still accepts the request for resources.

ganesh · March 29, 2018, 9:38am

Core node instance must be m4.2Xlarge are greater than m1.medium atleast

alexc-sigfig · April 4, 2018, 8:02pm

I’m seeing the same symptom (neverending spark enrichment), and I’m using 1x c3.xlarge to process ~ 1.5GB collector logs. CPU and disk utilization on the EMR instances is nearly zero after a brief burst at the job start. I’m skeptical that it could be an issue with the instance type, but I’m at a loss for other explanations.

alexc-sigfig · April 4, 2018, 10:35pm

@alex / team-snowplow – any suggestions for the above? I’m running EmrEtlRunner r97, and I’ve tried both spark_enrich 1.11.0 and 1.12.0.

This is in furtherance of upgrading from r77, which stopped working a while back due to bitrot of at least one remote dependency jar required by the EMR bootstrap.

ihor · April 4, 2018, 11:44pm

@alexc-sigfig,

Not sure that it’s the same. In @jpdc example, I can see he ran EmrEtlRunner without run command which is an indication of the app prior to R91 when it was introduced.

We could also observe an ever running enrichment but it’s rather an exceptional case (happens very rear). Not much could be done but have the job terminated and resumed from the appropriate step.

For 1.5GB compressed logs, we use up to 1 x r4.8xlarge core instance; if that’s the size of uncompressed logs 1 x r4.xlarge would be sufficient.

In either case, a fine-tuned Spark settings are required for improved performance.

alexc-sigfig · April 5, 2018, 12:15am

I don’t believe what I’m seeing is strictly a performance issue; if c3.xlarge is running out of memory and the job is dying, disappointing that EMR can’t tell. htop shows all processes but itself as suspended (I’ve also strace'd looking for activity without success). I’ll try with a larger instance, though…

alexc-sigfig · April 5, 2018, 2:43am

r4.8xlarge progressed better, so appears that hang was due to under-resourcing. Unfortunately, it didn’t succeed, but that’s an issue for another thread. Thanks for the tip, @ihor.

Topic		Replies	Views
Emr etl runner fails without useful error on step "Elasticity Spark Step: Enrich Raw Events" Troubleshooting	3	3148	July 25, 2018
Enrich Raw Events fails due to "Not a file: hdfs" -- Clojure connector -- EMR ETL Runner Troubleshooting	11	1796	September 27, 2017
Elasticity Scalding Step: Enrich Raw Events fails Enrichment	2	1630	July 21, 2016
Emr-etl runner error Enrichment	3	1512	August 16, 2017
EMR job failed in Hadoop Enrich step AWS batch pipeline (Legacy)	2	1634	April 21, 2016

Elasticity Spark Step: Enrich Raw Events never ends

Related Topics