Enricher job fails


#1

Hi All,

While running the enricher, my job is failing with the error below:

Exception in thread "main" cascading.flow.FlowException: step failed: (2/3) …d/run=2017-01-30-04-22-43, with job id: job_1485750568870_0003, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:221)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Can anyone help?

Deepak Bhatt


#2

Hi @deepak, you’ll also need to post the cluster logs, which will contain more information about the root cause of the failure.
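If YARN log aggregation is enabled on your cluster, one way to pull them is with the `yarn logs` CLI, using the application id that corresponds to the failed job id in your stack trace (the id mirrors the job id, with the `job_` prefix replaced by `application_`):

```shell
# Fetch the aggregated container logs for the failed application
yarn logs -applicationId application_1485750568870_0003 > app_0003.log

# Scan for the first real task-level error in the container logs
grep -n -m 5 -E "OutOfMemoryError|Exception|FATAL" app_0003.log
```

The task-level stack trace in those logs is usually far more informative than the driver-side `FlowException`.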


#3

Thanks @mike. Please find the cluster logs below:

2017-01-30 05:28:38,610 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] starting step: (2/3) …d/run=2017-01-30-04-22-43
2017-01-30 05:28:38,610 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] starting step: (3/3) …/snowplow/enriched-events
2017-01-30 05:28:38,675 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,723 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,772 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:38,886 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-30-28.eu-west-1.compute.internal/172.31.30.28:8032
2017-01-30 05:28:45,088 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-3): Total input paths to process : 476
2017-01-30 05:28:45,834 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-2): Total input paths to process : 476
2017-01-30 05:28:45,890 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): number of splits:476
2017-01-30 05:28:46,342 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): Submitting tokens for job: job_1485750568870_0002
2017-01-30 05:28:46,409 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-3): Submitted application application_1485750568870_0002
2017-01-30 05:28:46,417 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-3): The url to track the job: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0002/
2017-01-30 05:28:46,417 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] submitted hadoop job: job_1485750568870_0002
2017-01-30 05:28:46,417 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] tracking url: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0002/
2017-01-30 05:28:46,505 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): number of splits:476
2017-01-30 05:28:46,812 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): Submitting tokens for job: job_1485750568870_0003
2017-01-30 05:28:46,856 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-2): Submitted application application_1485750568870_0003
2017-01-30 05:28:46,865 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-2): The url to track the job: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0003/
2017-01-30 05:28:46,866 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] submitted hadoop job: job_1485750568870_0003
2017-01-30 05:28:46,866 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] tracking url: http://ip-172-31-30-28.eu-west-1.compute.internal:20888/proxy/application_1485750568870_0003/
2017-01-30 05:47:08,534 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] hadoop job job_1485750568870_0003 state at FAILED
2017-01-30 05:47:08,536 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] failure info: Task failed task_1485750568870_0003_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0

2017-01-30 05:47:08,575 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] task completion events identify failed tasks
2017-01-30 05:47:08,575 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] task completion events count: 10
2017-01-30 05:47:08,576 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000001_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000004_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000005_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000006_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000010_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000012_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000011_0, Status : SUCCEEDED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000003_0, Status : FAILED
2017-01-30 05:47:08,577 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000013_0, Status : SUCCEEDED
2017-01-30 05:47:08,578 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] event = Task Id : attempt_1485750568870_0003_m_000007_0, Status : SUCCEEDED
2017-01-30 05:47:08,586 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping all jobs
2017-01-30 05:47:08,587 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (3/3) …/snowplow/enriched-events
2017-01-30 05:47:09,592 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:10,593 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:11,594 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:53364. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:11,707 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-30 05:47:18,689 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (2/3) …d/run=2017-01-30-04-22-43
2017-01-30 05:47:19,696 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:20,697 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:21,698 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-20-96.eu-west-1.compute.internal/172.31.20.96:52029. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:21,809 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2017-01-30 05:47:23,093 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (1/3)
2017-01-30 05:47:24,095 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:25,096 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:26,097 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-18-187.eu-west-1.compute.internal/172.31.18.187:58695. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-01-30 05:47:26,204 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-30 05:47:26,996 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopped all jobs
2017-01-30 05:47:27,360 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path s3://udmd-a-storage/udmd-a-enriched/enrich/bad/run=2017-01-30-04-22-43/_temporary
2017-01-30 05:47:27,404 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path hdfs:/local/snowplow/enriched-events/_temporary

Deepak


#4

This looks like your data nodes aren’t able to connect to the name node.

Is the cluster running out of disk space at any point in the job?
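You can check for disk pressure directly, for example:

```shell
# Local disk usage on each node
df -h

# HDFS capacity as reported by the namenode
hdfs dfsadmin -report | grep -E "DFS (Used|Remaining)"
```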


#5

Thanks for your response, mike. There is a Java heap space error on my nodes. Is there any way to increase the Java heap space?

-Regards
D B


#6

If you’re running out of memory on nodes, the easiest fix is to switch to an EC2 instance type with more memory per node.
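Alternatively, you can raise the per-task JVM heap via an EMR configuration classification. The property names below are standard Hadoop 2.x; the values are only placeholders that you must size to your instance type (a common rule of thumb is to set `-Xmx` to roughly 80% of the corresponding `memory.mb`):

```shell
# Sketch: write an EMR "mapred-site" classification that raises
# the map/reduce container sizes and their JVM heaps.
cat > heap-config.json <<'EOF'
[
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapreduce.map.memory.mb": "2048",
      "mapreduce.map.java.opts": "-Xmx1638m",
      "mapreduce.reduce.memory.mb": "4096",
      "mapreduce.reduce.java.opts": "-Xmx3276m"
    }
  }
]
EOF
```

You would then pass this file when launching the cluster, e.g. `aws emr create-cluster … --configurations file://heap-config.json`.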


#7

I think there is some problem with my data set. With the same configuration, I am able to process my production logs but unable to process the UAT logs. How can I debug this issue?

Thanks!
DB