Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE]


#1

I am getting an error in the 4th stage of the process.
Below are the command I ran and the error details:
ubuntu@ip-172-31-38-39:~$ ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/

D, [2017-10-03T12:24:52.338000 #3942] DEBUG -- : Initializing EMR jobflow
D, [2017-10-03T12:24:58.797000 #3942] DEBUG -- : EMR jobflow j-3AHEY12TYQ0EB started, waiting for jobflow to complete...
F, [2017-10-03T12:37:01.353000 #3942] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-3AHEY12TYQ0EB failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE] ~ 00:05:47 [2017-10-03 12:30:57 +0000 - 2017-10-03 12:36:44 +0000]

    1. Elasticity S3DistCp Step: Raw s3://snowplowevents/ -> Raw Staging S3: COMPLETED ~ 00:01:18 [2017-10-03 12:30:59 +0000 - 2017-10-03 12:32:17 +0000]
    1. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:01:20 [2017-10-03 12:32:19 +0000 - 2017-10-03 12:33:39 +0000]
    1. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:06 [2017-10-03 12:33:41 +0000 - 2017-10-03 12:34:47 +0000]
    1. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-03 12:34:49 +0000 - 2017-10-03 12:34:55 +0000]
    1. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
      uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
      uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
      uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
      uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
      org/jruby/RubyKernel.java:979:in `load'
      uri:classloader:/META-INF/main.rb:1:in `<main>'
      org/jruby/RubyKernel.java:961:in `require'
      uri:classloader:/META-INF/main.rb:1:in `(root)'
      uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in
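A failure summary like the one above lists every step with its final status, and the first FAILED step is usually the one whose Hadoop logs are worth opening. As a hedged illustration (this is not part of EmrEtlRunner), the Python sketch below pulls that step out of pasted summary text. The helpers `parse_steps` and `first_failed` are hypothetical, and the regular expression is an assumption based only on the line format shown in this post.

```python
from __future__ import annotations

import re

# One step line of the failure summary, for example:
#   1. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [...]
# The pattern is inferred from the lines in this post, not from Snowplow's code.
STEP_RE = re.compile(
    r"Elasticity (?P<kind>.+?) Step: (?P<name>.+): "
    r"(?P<status>COMPLETED|FAILED|CANCELLED) ~"
)


def parse_steps(summary: str) -> list[tuple[str, str, str]]:
    """Return (kind, name, status) for each step line found in the summary."""
    return [m.group("kind", "name", "status") for m in STEP_RE.finditer(summary)]


def first_failed(summary: str) -> str | None:
    """Return the name of the first FAILED step, or None if there is none."""
    for _kind, name, status in parse_steps(summary):
        if status == "FAILED":
            return name
    return None


summary = """\
    1. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:06 [...]
    1. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [...]
    1. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
"""
print(first_failed(summary))  # Enriched HDFS -> S3
```

Here it points at the Enriched HDFS -> S3 copy, which ran right after Enrich Raw Events completed, so the enriched output on HDFS is the natural place to look next.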

Please help me resolve this error.


#2

Hi @sandesh - sorry to hear you are having trouble!

The most common reason for S3DistCp failing is that there's no data to move. Are you sure you are processing events and that those events are validating okay?
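To make that check concrete: before S3DistCp copies a directory, you can confirm it actually holds non-empty event files. The sketch below assumes you already have a listing of (key, size) pairs from S3 or HDFS; `has_data_to_move` and the example keys are hypothetical, and skipping Hadoop `_SUCCESS` markers is a design choice of this sketch.

```python
from __future__ import annotations


def has_data_to_move(objects: list[tuple[str, int]]) -> bool:
    """Return True if the listing contains at least one non-empty data file.

    `objects` is a list of (key, size_in_bytes) pairs from a listing of the
    directory S3DistCp would copy. Hadoop `_SUCCESS` markers are skipped
    because they are empty flags, not event data.
    """
    for key, size in objects:
        filename = key.rsplit("/", 1)[-1]
        if filename == "_SUCCESS":
            continue
        if size > 0:
            return True
    return False


# Hypothetical listing: the job wrote only a marker and an empty part file.
listing = [
    ("enriched/good/_SUCCESS", 0),
    ("enriched/good/part-00000", 0),
]
print(has_data_to_move(listing))  # False
```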


#3

Thank you so much for your valuable reply.
Hey Alex, we are sending events from kinesis-s3 to the S3 bucket in LZO format.
Attached is the events file in LZO format.
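A first sanity check on a file like this is its header. When the attached file is shown as text it starts with ‰LZO, which is how the lzop signature byte 0x89 followed by the letters LZO appears in the Windows-1252 encoding. The Python sketch below checks for the full 9-byte lzop signature. The helpers `looks_like_lzop` and `file_looks_like_lzop` are hypothetical, the sketch assumes the files use the lzop container format, and passing the check says nothing about whether the compressed records inside are valid Snowplow payloads.

```python
# The 9-byte signature at the start of a file in the lzop container format.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def looks_like_lzop(header: bytes) -> bool:
    """True if `header` begins with the lzop file signature."""
    return header[: len(LZOP_MAGIC)] == LZOP_MAGIC


def file_looks_like_lzop(path: str) -> bool:
    """Read just the first bytes of `path` and test them against the signature."""
    with open(path, "rb") as f:
        return looks_like_lzop(f.read(len(LZOP_MAGIC)))


print(looks_like_lzop(LZOP_MAGIC + b"rest of file"))  # True
print(looks_like_lzop(b"stm=1507101287282&e=pp"))     # False
```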

[Binary LZO file pasted as text (it begins with ‰LZO); not human-readable. Recognizable fragments include the collector version ssc-0.9.0-kinesis, a page ping (e=pp) sent by tracker js-2.8.0 from http://localhost:8085/ChangesHTML/SampleExampleTracker.html, and the schema iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0.]

Please check and let us know whether it is valid.
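The readable parts of the file can also be decoded. Inside the compressed data is the tracker's querystring, and Python's standard `urllib.parse.parse_qs` turns it into named fields. This sketch only decodes a fragment copied by hand from the attached file; it does not decompress LZO or deserialize the Thrift CollectorPayload, which a full validity check would need to do.

```python
from urllib.parse import parse_qs

# Readable fragment of the tracker querystring found in the attached file.
# It is cut off where compressed bytes start to interleave with the text.
payload = (
    "stm=1507101287282&e=pp"
    "&url=http%3A%2F%2Flocalhost%3A8085%2FChangesHTML%2FSampleExampleTracker.html"
    "&page=Fixed%20Width%202%20Blue&pp_mix=0&pp_max=0&pp_miy=0&pp_may=0"
    "&tv=js-2.8.0&tna=cf&aid=12&p=web&tz=Asia%2FKolkata&lang=en-US&cs=UTF-8"
)

# parse_qs maps each key to a list of values; each key appears once here.
params = {key: values[0] for key, values in parse_qs(payload).items()}

print(params["e"])     # pp  (page ping)
print(params["url"])   # http://localhost:8085/ChangesHTML/SampleExampleTracker.html
print(params["tv"])    # js-2.8.0
print(params["page"])  # Fixed Width 2 Blue
```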

Also, many unwanted files are being added to my S3 bucket.
Below is a sample of one of the unwanted files, and I have attached a screenshot of the list of unwanted files.

6bc7ed293c5937358736661139581633c1b396fc702984f48f9f508a363d9131 unilogregion1 [04/Oct/2017:06:47:49 +0000] 10.35.41.248 3272ee65a908a7677109fedda345db8d9554ba26398b2ca10581de88777e2b61 7E1BD247249C9134 REST.PUT.OBJECT 2017-10-04-06-47-49-EB8921A2C55E55A9 "PUT /unilogregion1/2017-10-04-06-47-49-EB8921A2C55E55A9 HTTP/1.1" 200 - - 363 62 44 "-" "aws-internal/3" -
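The sample line has the layout of an Amazon S3 server access log record: a REST.PUT.OBJECT by the user agent aws-internal/3 that writes an object named with a timestamp plus an ID into unilogregion1. That pattern suggests (the thread does not confirm it) that server access logging is enabled for the bucket and is delivering its log files into unilogregion1 itself. The sketch below parses such a record and flags keys with that naming pattern. The field names follow the first 18 fields of AWS's documented log format; the helper names and the key regex are assumptions.

```python
from __future__ import annotations

import re

# A field is a [bracketed] timestamp, a "quoted" string, or a bare token.
TOKEN_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Leading fields of an S3 server access log record, in order.
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time", "turn_around_time",
    "referer", "user_agent", "version_id",
]

# Heuristic (assumption): log objects are named YYYY-mm-DD-HH-MM-SS-<unique id>,
# optionally after a target prefix ending in "/".
LOG_KEY_RE = re.compile(r"(?:^|/)\d{4}(?:-\d{2}){5}-[0-9A-Za-z]+$")


def parse_access_log_line(line: str) -> dict[str, str]:
    """Split one access log record into named fields (extra fields are dropped)."""
    tokens = [t.strip('[]"') for t in TOKEN_RE.findall(line)]
    return dict(zip(FIELDS, tokens))


def is_access_log_key(key: str) -> bool:
    """True if the object key matches the assumed access-log naming pattern."""
    return LOG_KEY_RE.search(key) is not None


line = (
    '6bc7ed293c5937358736661139581633c1b396fc702984f48f9f508a363d9131 unilogregion1 '
    '[04/Oct/2017:06:47:49 +0000] 10.35.41.248 '
    '3272ee65a908a7677109fedda345db8d9554ba26398b2ca10581de88777e2b61 7E1BD247249C9134 '
    'REST.PUT.OBJECT 2017-10-04-06-47-49-EB8921A2C55E55A9 '
    '"PUT /unilogregion1/2017-10-04-06-47-49-EB8921A2C55E55A9 HTTP/1.1" 200 - - 363 62 44 '
    '"-" "aws-internal/3" -'
)
record = parse_access_log_line(line)
print(record["operation"], record["key"], record["user_agent"])
print(is_access_log_key(record["key"]))  # True
```

If that is the cause, pointing the access logs at a separate bucket (or a dedicated prefix) would keep them away from the event data.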

Please let me know if I need to change anything!


#4

Hey Alex, I resolved the error.

Below is the success message after running EmrEtlRunner.

ubuntu@ip-172-31-38-39:~$ ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/
D, [2017-10-04T07:21:17.069000 #9396] DEBUG -- : Initializing EMR jobflow
D, [2017-10-04T07:21:27.195000 #9396] DEBUG -- : EMR jobflow j-2AWULJ44UQI6P started, waiting for jobflow to complete...
I, [2017-10-04T07:43:31.461000 #9396]  INFO -- : No RDB Loader logs
D, [2017-10-04T07:43:31.461000 #9396] DEBUG -- : EMR jobflow j-2AWULJ44UQI6P completed successfully.
I, [2017-10-04T07:43:31.461000 #9396]  INFO -- : Completed successfully

I am following the architecture below:

JavaScript Tracker -> Scala Stream Collector -> Stream Enrich -> Kinesis S3 -> S3 -> EmrEtlRunner (shredding) -> PostgreSQL

Please tell me what the next step is in order to store the events in the PostgreSQL database.


#5

Hi @sandesh - glad you got it working! Please raise a new thread if you have a new question.


#6

Thanks for the quick reply @alex