Enricher job fails

Hi All,

While running the enricher, my job is failing with the error below:

Exception in thread "main" cascading.flow.FlowException: step failed: (2/3) …d/run=2017-01-30-04-22-43, with job id: job_1485750568870_0003, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:221)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Can anyone help?

Deepak Bhatt

Hi @deepak, you’ll also need to post the cluster logs, which will contain more information about the root cause of the failure.
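If you have SSH access to the EMR master node, you can usually pull the full container logs with the standard YARN CLI (the application id matches the failed job id):

```
# On the EMR master node: dump all container logs for the failed application.
# job_1485750568870_0003 corresponds to application_1485750568870_0003.
yarn logs -applicationId application_1485750568870_0003 > app_0003.log

# Then search near the underlying exception:
grep -n -B 2 -A 20 "Caused by" app_0003.log | head -60
```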

Thanks @mike. Please find the cluster logs below:

2017-01-30 05:28:38,610 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] starting step: (2/3) …d/run=2017-01-30-04-22-43
2017-01-30 05:28:38,610 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] starting step: (3/3) …/snowplow/enriched-events
2017-01-30 05:28:38,675 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,723 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,772 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,886 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:45,088 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-3): Total input paths to process : 476
2017-01-30 05:28:45,834 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-2): Total input paths to process : 476
2017-01-30 05:28:45,890 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): number of splits:476
2017-01-30 05:28:46,342 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): Submitting tokens for job: job_1485750568870_0002
2017-01-30 05:28:46,409 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-3): Submitted application application_1485750568870_0002
2017-01-30 05:28:46,417 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-3): The url to track the job: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0002/
2017-01-30 05:28:46,417 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] submitted hadoop job: job_1485750568870_0002
2017-01-30 05:28:46,417 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] tracking url: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0002/
2017-01-30 05:28:46,505 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): number of splits:476
2017-01-30 05:28:46,812 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): Submitting tokens for job: job_1485750568870_0003
2017-01-30 05:28:46,856 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-2): Submitted application application_1485750568870_0003
2017-01-30 05:28:46,865 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-2): The url to track the job: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0003/
2017-01-30 05:28:46,866 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] submitted hadoop job: job_1485750568870_0003
2017-01-30 05:28:46,866 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] tracking url: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0003/
2017-01-30 05:47:08,534 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] hadoop job job_1485750568870_0003 state at FAILED
2017-01-30 05:47:08,536 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] failure info: Task failed task_1485750568870_0003_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0

2017-01-30 05:47:08,575 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] task completion events identify failed tasks
2017-01-30 05:47:08,575 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] task completion events count: 10
2017-01-30 05:47:08,576 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000001_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000004_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000005_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000006_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000010_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000012_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000011_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000003_0, Status : FAILED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000013_0, Status : SUCCEEDED
2017-01-30 05:47:08,578 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000007_0, Status : SUCCEEDED
2017-01-30 05:47:08,586 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping all jobs
2017-01-30 05:47:08,587 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (3/3) …/snowplow/enriched-events
2017-01-30 05:47:09,592 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:10,593 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:11,594 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:11,707 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-30 05:47:18,689 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (2/3) …d/run=2017-01-30-04-22-43
2017-01-30 05:47:19,696 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:20,697 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:21,698 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:21,809 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2017-01-30 05:47:23,093 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (1/3)
2017-01-30 05:47:24,095 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:25,096 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:26,097 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:26,204 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-30 05:47:26,996 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopped all jobs
2017-01-30 05:47:27,360 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path s3://udmd-a-storage/udmd-a-enriched/enrich/bad/run=2017-01-30-04-22-43/_temporary
2017-01-30 05:47:27,404 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path hdfs:/local/snowplow/enriched-events/_temporary

Deepak

This looks like your data nodes aren’t able to connect to the name node.

Is the cluster running out of disk space at any point in the job?
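You can check quickly from the cluster itself; both of the commands below are standard Hadoop/Linux tools:

```
# Local disk usage (run on each node, or check the EMR console metrics).
df -h

# Cluster-wide HDFS capacity, from the master node.
hdfs dfsadmin -report | grep -E 'DFS (Used|Remaining)'
```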

Thanks for your response, @mike. There is a Java heap space error on my nodes. Is there any way to increase the Java heap space?

-Regards
D B

If you’re running out of memory on your nodes, the easiest fix is to switch to an EC2 instance type with more memory per node.
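If you’re launching the cluster with EmrEtlRunner, the instance types live in your `config.yml` under `aws: emr: jobflow:`. As a sketch only (exact keys may differ by release, and `r3.xlarge` is just one memory-optimized example):

```yaml
emr:
  jobflow:
    master_instance_type: m1.medium
    core_instance_count: 2
    core_instance_type: r3.xlarge    # memory-optimized core nodes for the enrich step
    task_instance_count: 0
    task_instance_type: m1.medium
```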

I think there is a problem with my data set. With the same configuration I am able to process my production logs, but not the UAT logs. How can I debug this issue?

Thanks!
DB
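One way to narrow this down is to compare the UAT logs against production for anomalies such as unusually long lines, which can exhaust mapper heap during enrichment. A minimal local sketch (the directory and file names below are synthetic, for illustration only):

```shell
# Sketch: scan a directory of raw logs and report the longest line per file.
# A file whose longest line is far above production norms is a likely culprit.
mkdir -p uat-sample
printf 'a short, well-formed line\n' > uat-sample/normal.log
printf '%0.s-' $(seq 1 5000) > uat-sample/huge.log   # simulate one 5000-char line

# Print "<file> <max line length>", largest first.
for f in uat-sample/*.log; do
  awk -v file="$f" '{ if (length($0) > max) max = length($0) } END { print file, max+0 }' "$f"
done | sort -k2,2 -rn
```

The same `awk` one-liner can be run against the real UAT files once they are copied down from S3.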