Failing at the 4th step of the storage process (input file not found)


#1

Hi

  1. On the web page, I have added the page view tracking script.

  2. When I load the page and perform an action on it, this is the output I get in the Scala Stream Collector:

     06:24:25.672 [scala-stream-collector-akka.actor.default-dispatcher-8] DEBUG s.can.server.HttpServerConnection - Dispatching GET request to http://localhost:8082/i?stm=1511763945885&e=pv&url=http://localhost:8085/ChangesHTML/SampleExampleTracker.html&page=Fixed+Width+2+Blue&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia/Kolkata&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=0&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=f12f640c-ba08-48a8-96b0-f9ff57ddccdc&dtm=1511763945883&vp=1517x735&ds=1518x736&vid=1&sid=117cb3c3-a8a5-4716-921b-32d5b4dae585&duid=e727b978-3be6-4a78-aac5-42ba1f0c6e38&fp=4265106636 to handler Actor[akka://scala-stream-collector/system/IO-TCP/selectors/$a/1#1753195764]
     06:24:26.310 [scala-stream-collector-akka.actor.default-dispatcher-8] DEBUG s.can.server.HttpServerConnection - Dispatching GET request to http://localhost:8082/i?stm=1511763946625&e=pv&url=http://localhost:8085/ChangesHTML/SampleExampleTracker.html&page=Fixed+Width+2+Blue&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia/Kolkata&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=0&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=74795290-9a1c-4567-85b2-3e25389ccfd8&dtm=1511763946622&vp=1517x735&ds=1499x1028&vid=1&sid=6b95a78f-7ae8-4656-8538-0ba0c9d8ce8e&duid=6567aa9b-809f-472c-a6db-e73dab17ddc1&fp=4265106636 to handler Actor[akka://scala-stream-collector/system/IO-TCP/selectors/$a/1#1753195764]
    
  3. Then I created 2 Kinesis streams to pass the events to an S3 bucket.
    This is the command I run:
    ./snowplow-kinesis-s3-0.5.0 --config kinises.conf
    Below is the log output as the data is passed from the Kinesis stream to the S3 bucket:

     [RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Flushing buffer with 8 records.
     [RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully serialized 8 records out of 8
     [RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully emitted 8 records to S3 in s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo
     [RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully emitted 8 records to S3 in s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo.index
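Each collector log line in step 2 contains the full tracker request. The sketch below splits one of those request URLs into its tracker-protocol fields using only Python's standard `urllib.parse`. `tracker_fields` is a hypothetical helper name, and the URL is shortened from the first log line. In the Snowplow tracker protocol, `e=pv` marks a page view and `eid` is the event ID.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def tracker_fields(request_url: str) -> Dict[str, str]:
    """Return the querystring of a collector GET request as single-valued fields."""
    query = urlsplit(request_url).query
    # parse_qs decodes '+' to a space and returns a list per key; keep the first value
    return {key: values[0] for key, values in parse_qs(query).items()}


# Shortened from the first collector log line above
request = (
    "http://localhost:8082/i?stm=1511763945885&e=pv"
    "&url=http://localhost:8085/ChangesHTML/SampleExampleTracker.html"
    "&page=Fixed+Width+2+Blue&tv=js-2.8.0&aid=13"
    "&eid=f12f640c-ba08-48a8-96b0-f9ff57ddccdc"
)
fields = tracker_fields(request)
print(fields["e"], fields["aid"], fields["page"])  # pv 13 Fixed Width 2 Blue
```

This makes it easy to confirm that the events reaching the collector carry the expected `aid`, `url` and event type before you look further down the pipeline.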
    

Below is what I see when I open the .lzo file (note: I just opened it in Notepad++ and did not decompress anything):
(s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo)

		‰LZO 
	
	 €	@      ¤Ze±%     ,Ý€  j  ë   )ØÕXÍL)²¼W™!q½ÿV  byte[]É d   172.31.38.39
	 È  _ü+N) Ò   UTF-8 Ü   ssc-0.9.0-kinesis,   rMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.366   ;http://localhost:8085/ChangesHTML/SampleExampleTracker.html@   /iJ  stm=1511764297427&e=pp&url=http%3A%2F%2Flocalhost%3A8085%2FChangesHTML%2FSampleExampleTracker.html&page=Fixed%20Width%202%20Blue&pp_mix=0&pp_max=0&pp_miy=0&pp_may=0&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia%2FKolkata&lang=en-US&cs=9 &f_pdf=1&f_qt=0&f_realp‡wmaŸ dirž fl¿jav gearsˆ  Mag=0&res=1366x768&cd=24&cookie=1&eid=ac450cd0-8bd6-4b53-bf6d-1a7a0397105e&dtm=1511764297423&vp=1517x735&ds=1499x860&vid=1&sid=4fb9e7d0-5df4-4e60-9226-8457b36193bd&duid=e071f763-3e03-4e40-9472-6f4d56e444a0&fp=4265106636^      Host: localhost:8082   Connection: keep-alive   ~User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537. `    2Accept: image/webp,Ìapngì 	*, */*;q=0.8   DReferer: ht P
	   "´ -Encoding: gzip, deflate, br    Ô Language: en-US, en„9  %C„I  ’: rxVisitor=1495864829678I083H1MM3UVIPQREIQSNDG5V1FS6344V; _sp_id.1fff=a3310903-1094-4cb8-a179-e209cb4198f9.1500386692.22.1503474180.1503471200.6f1d47eb-6b94-408c-b0dd-6c91d546bfc0; loginMessage=logout; F47F4A0F30FB7A75=sandesh.p@unilogcorp.com; 750B2E0333A28C1D=test1234; F30FB33A2=true   	localhostš   $7ec37777-495b-4b2b-8a11-da7d24ef9a5dzi   Aiglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0       
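The `‰LZO` at the top of this dump looks like the lzop file signature, whose first byte, 0x89, is displayed as `‰` in Windows-1252. If so, the file is compressed binary. The schema string at the end of the dump also suggests the records are Thrift-serialized `CollectorPayload` objects, so Notepad++ can only show readable fragments. The sketch below checks a file for the lzop signature. `is_lzop_file` is a hypothetical helper name, and the sample files it writes are stand-ins, not real sink output.

```python
import tempfile
from pathlib import Path

# The 9-byte signature that starts every lzop file
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def is_lzop_file(path: Path) -> bool:
    """True if the file begins with the lzop signature."""
    with open(path, "rb") as handle:
        return handle.read(len(LZOP_MAGIC)) == LZOP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    compressed = Path(tmp, "sample.lzo")
    compressed.write_bytes(LZOP_MAGIC + b"\x10\x30")  # signature plus stand-in bytes
    plain = Path(tmp, "sample.txt")
    plain.write_text("stm=1511764297427&e=pp")
    results = (is_lzop_file(compressed), is_lzop_file(plain))

print(results)  # (True, False)
```

To see more of the contents, one option is to download the object and decompress it with the `lzop -d` command-line tool. The decompressed output is still binary Thrift, though, so it will not be fully human-readable either.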

Below is what I see when I open the .lzo.index file (note: I just opened it in Notepad++ and did not decompress anything):
(s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo.index)

       &
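This sketch assumes the `.lzo.index` file uses the hadoop-lzo index layout, which is a flat sequence of big-endian 64-bit offsets with one offset per compressed block. Under that assumption, the lone `&` above could be the byte 0x26, meaning an offset of 38, preceded by seven zero bytes that do not show up as visible characters. The code decodes such a buffer with the standard `struct` module. `read_lzo_index` is a hypothetical name.

```python
import struct
from typing import List


def read_lzo_index(data: bytes) -> List[int]:
    """Decode a buffer of big-endian signed 64-bit block offsets."""
    if len(data) % 8 != 0:
        raise ValueError("index size is not a multiple of 8 bytes")
    count = len(data) // 8
    return list(struct.unpack(f">{count}q", data))


# Seven zero bytes followed by '&' (0x26): a single block starting at byte 38
offsets = read_lzo_index(b"\x00" * 7 + b"&")
print(offsets)  # [38]
```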
  4. Then I start the enrichment process for the events using the following command:

     ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/
    

When I run this command, all 12 steps complete successfully, which I also confirmed in the EMR console.
This is the output I get after the process completes:

	D, [2017-11-27T06:39:50.913000 #19422] DEBUG -- : Initializing EMR jobflow
	D, [2017-11-27T06:39:55.161000 #19422] DEBUG -- : EMR jobflow j-1T9FRDP4EWWI8 started, waiting for jobflow to complete...
	I, [2017-11-27T07:05:59.671000 #19422]  INFO -- : No RDB Loader logs
	D, [2017-11-27T07:05:59.671000 #19422] DEBUG -- : EMR jobflow j-1T9FRDP4EWWI8 completed successfully.
	I, [2017-11-27T07:05:59.671000 #19422]  INFO -- : Completed successfully
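The EMR console lists steps and their logs per cluster, keyed by the jobflow ID that EmrEtlRunner prints. Here that ID is `j-1T9FRDP4EWWI8` for the enrichment run. The sketch below pulls those IDs out of saved runner output. It assumes the IDs match `j-` followed by uppercase letters and digits, as in the log above. `jobflow_ids` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed shape of an EMR jobflow ID, based on the runner output above
JOBFLOW_ID = re.compile(r"\bj-[0-9A-Z]+\b")


def jobflow_ids(log_text: str) -> List[str]:
    """Return the distinct jobflow IDs in the order they first appear."""
    seen: List[str] = []
    for match in JOBFLOW_ID.finditer(log_text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen


log = (
    "D, [2017-11-27T06:39:55.161000 #19422] DEBUG -- : EMR jobflow "
    "j-1T9FRDP4EWWI8 started, waiting for jobflow to complete...\n"
    "D, [2017-11-27T07:05:59.671000 #19422] DEBUG -- : EMR jobflow "
    "j-1T9FRDP4EWWI8 completed successfully.\n"
)
print(jobflow_ids(log))  # ['j-1T9FRDP4EWWI8']
```

Saving the output of both the enrichment run and the failing storage run this way makes it easier to open the matching cluster and compare the arguments of each step.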

After the process completes successfully:

  1. Raw section: 2 folders are created, i.e. archive and processing. Inside processing, a logs folder is created, and inside archive the Kinesis events are copied under a run folder.

  2. Enrich section: 3 folders are created, i.e. archive, good and bad. Inside archive, a run folder is created, and inside it a .csv file is generated. Below is the data in the .csv file:

    13	web	2017-11-27 06:39:50.934	2017-11-27 06:30:17.385	2017-11-27 06:31:37.423	page_ping	ac450cd0-8bd6-4b53-bf6d-1a7a0397105e		cf	js-2.8.0	ssc-0.9.0-kinesis	spark-1.9.0-common-0.25.0		172.31.38.x	4265106636	e071f763-3e03-4e40-9472-6f4d56e444a0	1	7ec37777-495b-4b2b-8a11-da7d24ef9a5d												http://localhost:8085/ChangesHTML/SampleExampleTracker.html	Fixed Width 2 Blue		http	localhost	8085	/ChangesHTML/SampleExampleTracker.html																																						0	0	0	0	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36	Chrome	Chrome	62.0.3202.94	Browser	WEBKIT	en-US	1	0	0	0	0	0	0	0	0	1	24	1517	735	Windows 10	Windows	Microsoft Corporation	Asia/Kolkata	Computer	0	1366	768	UTF-8	1499	860												2017-11-27 06:31:37.427			{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Chrome","useragentMajor":"62","useragentMinor":"0","useragentPatch":"3202","useragentVersion":"Chrome 62.0.3202","osFamily":"Windows","osMajor":null,"osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Windows","deviceFamily":"Other"}}]}	4fb9e7d0-5df4-4e60-9226-8457b36193bd	2017-11-27 06:30:17.381	com.snowplowanalytics.snowplow	page_ping	jsonschema	1-0-0	dcfc0cffb76b37e93a54d47d3b33ef1c	
    

     Inside good, a run folder is generated containing a 0 KB file with no data.
     Inside bad, a run folder is generated containing 2 files, i.e. success and part_0, both empty (0 KB).

  3. Shredded section: 3 folders are created, i.e. archive, good and bad. Inside archive, a run folder is created containing atomic-events and shredded-types.
     Inside good, a run folder is generated containing shredded_types.
     Inside bad, a run folder is generated containing many 0 KB files.
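The enriched row in the Enrich section above is tab-separated. As far as I can tell, it follows the column order of Snowplow's canonical event model. The sketch below labels the first 18 columns. The field list is my reading of that model, not something stated in this thread, so check it against the Snowplow documentation for the version you run. `leading_fields` is a hypothetical helper name.

```python
from typing import Dict, List

# First 18 columns of an enriched event, in the assumed canonical-model order
LEADING_FIELDS: List[str] = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id", "txn_id",
    "name_tracker", "v_tracker", "v_collector", "v_etl",
    "user_id", "user_ipaddress", "user_fingerprint",
    "domain_userid", "domain_sessionidx", "network_userid",
]


def leading_fields(tsv_line: str) -> Dict[str, str]:
    """Label the first columns of one enriched TSV line; later columns are ignored."""
    columns = tsv_line.rstrip("\n").split("\t")
    # zip stops at the shorter sequence, so only the first 18 columns are kept
    return dict(zip(LEADING_FIELDS, columns))


# The first 18 columns of the enriched row shown above
row = "\t".join([
    "13", "web", "2017-11-27 06:39:50.934", "2017-11-27 06:30:17.385",
    "2017-11-27 06:31:37.423", "page_ping", "ac450cd0-8bd6-4b53-bf6d-1a7a0397105e",
    "", "cf", "js-2.8.0", "ssc-0.9.0-kinesis", "spark-1.9.0-common-0.25.0",
    "", "172.31.38.x", "4265106636", "e071f763-3e03-4e40-9472-6f4d56e444a0",
    "1", "7ec37777-495b-4b2b-8a11-da7d24ef9a5d",
])
event = leading_fields(row)
print(event["event"], event["v_etl"])  # page_ping spark-1.9.0-common-0.25.0
```

Read this way, the row is the `e=pp` (page ping) request visible in the .lzo dump, which suggests that enrichment itself processed the raw event.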

  5. For the storage process, this is the command I run:

    ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze
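Several of the folders described earlier hold only 0 KB files. Before running the storage command, it may be worth checking which locations actually contain data. The sketch below assumes the bucket has been copied locally, for example with `aws s3 sync`. The folder and file names in the usage part are hypothetical stand-ins, not the exact layout EmrEtlRunner writes. `bytes_per_dir` is also a hypothetical name.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def bytes_per_dir(root: str) -> Dict[str, int]:
    """Total size of the files directly inside each directory under root that holds files."""
    totals: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if filenames:
            relative = Path(os.path.relpath(dirpath, root)).as_posix()
            totals[relative] = sum(
                os.path.getsize(os.path.join(dirpath, name)) for name in filenames
            )
    return totals


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout loosely mirroring the folders described above
    for folder, content in [
        ("enrich/good/run", ""),        # 0 KB output file
        ("enrich/bad/run", ""),         # 0 KB output file
        ("shredded/good/run", "data"),  # non-empty output
    ]:
        target = Path(tmp, folder)
        target.mkdir(parents=True)
        (target / "part-00000").write_text(content)
    totals = bytes_per_dir(tmp)

empty = sorted(folder for folder, size in totals.items() if size == 0)
print(empty)  # ['enrich/bad/run', 'enrich/good/run']
```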
    

The storage run fails at the 4th step with the following error:

	Exception in thread "main" java.lang.RuntimeException: Error running job
		at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
		at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
		at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
		at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
		at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
		at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
	Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-18-175.ec2.internal:8020/tmp/d28bb1f8-0bae-420d-97ef-45046305b36e/files
		at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
		at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
		at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
		at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
		at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
		at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
		at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
		at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
		at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
		at java.security.AccessController.doPrivileged(Native Method)
		at javax.security.auth.Subject.doAs(Subject.java:422)
		at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
		at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
		at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
		... 10 more
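The `Caused by` line shows that S3DistCp could not find an HDFS path under `/tmp/.../files`. That path does not appear in any of the configuration above. It looks like a temporary location that S3DistCp creates itself, so one plausible but unconfirmed reading is that the step had no source files to copy. The sketch below extracts the missing path from a saved trace, so it can be compared with the step's arguments in the EMR console. `missing_input_path` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the Hadoop message shown in the trace above
MISSING_INPUT = re.compile(r"Input path does not exist: (\S+)")


def missing_input_path(trace: str) -> Optional[str]:
    """Return the path Hadoop reported as missing, or None if there is none."""
    match = MISSING_INPUT.search(trace)
    return match.group(1) if match else None


trace = (
    "Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: "
    "Input path does not exist: hdfs://ip-172-31-18-175.ec2.internal:8020"
    "/tmp/d28bb1f8-0bae-420d-97ef-45046305b36e/files"
)
path = missing_input_path(trace)
print(path)
```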

I have explained everything in detail. If anything is missing from the steps above, please let me know.

Thanks,
Sandesh P


#2

Could anyone please help me out?
I have run out of other options.


#3

Hi Sandesh, a quick Google search for this error returns this page, which might be informative.

Also, have you tried searching the AWS forums for a similar error?

I’m not sure what the solution is myself, and am short on time to go digging into the forums, but hopefully doing that research will be helpful.

Best,
Colm