Hi
-
I have added the page view tracking script to my web page.
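For context, the tracker initialisation looks roughly like this — a sketch assuming the standard sp.js loader snippet, with the tracker name ("cf"), appId ("13") and collector address taken from the querystrings in the collector log below:

```javascript
// In the browser, the standard sp.js loader snippet defines window.snowplow
// as a command queue; a stub stands in here so this sketch is self-contained.
const window = { q: [], snowplow: function (...args) { window.q.push(args); } };

// Tracker name "cf" (tna=cf), appId "13" (aid=13) and the collector endpoint
// localhost:8082 all match the querystrings in the collector log below.
window.snowplow('newTracker', 'cf', 'localhost:8082', {
  appId: '13',
  platform: 'web'
});
window.snowplow('trackPageView');
```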
-
When I load the page and perform actions on it, below is the response I get in the Scala Stream Collector:
06:24:25.672 [scala-stream-collector-akka.actor.default-dispatcher-8] DEBUG s.can.server.HttpServerConnection - Dispatching GET request to http://localhost:8082/i?stm=1511763945885&e=pv&url=http://localhost:8085/ChangesHTML/SampleExampleTracker.html&page=Fixed+Width+2+Blue&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia/Kolkata&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=0&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=f12f640c-ba08-48a8-96b0-f9ff57ddccdc&dtm=1511763945883&vp=1517x735&ds=1518x736&vid=1&sid=117cb3c3-a8a5-4716-921b-32d5b4dae585&duid=e727b978-3be6-4a78-aac5-42ba1f0c6e38&fp=4265106636 to handler Actor[akka://scala-stream-collector/system/IO-TCP/selectors/$a/1#1753195764]
06:24:26.310 [scala-stream-collector-akka.actor.default-dispatcher-8] DEBUG s.can.server.HttpServerConnection - Dispatching GET request to http://localhost:8082/i?stm=1511763946625&e=pv&url=http://localhost:8085/ChangesHTML/SampleExampleTracker.html&page=Fixed+Width+2+Blue&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia/Kolkata&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=0&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=74795290-9a1c-4567-85b2-3e25389ccfd8&dtm=1511763946622&vp=1517x735&ds=1499x1028&vid=1&sid=6b95a78f-7ae8-4656-8538-0ba0c9d8ce8e&duid=6567aa9b-809f-472c-a6db-e73dab17ddc1&fp=4265106636 to handler Actor[akka://scala-stream-collector/system/IO-TCP/selectors/$a/1#1753195764]
-
I then created two Kinesis streams to pass the events to an S3 bucket. Below is the command to run the sink:
./snowplow-kinesis-s3-0.5.0 --config kinises.conf
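For reference, the sink's kinises.conf is HOCON roughly along these lines — an illustrative sketch only (key names recalled from the 0.5.0 sample config, so check them against the sample shipped with the sink; credentials and stream names elided):

```hocon
sink {
  aws {
    access-key: "env"            # or explicit credentials
    secret-key: "env"
  }
  kinesis {
    in { stream-name: "..." }    # the raw-events stream the collector writes to
    region: "us-east-1"
    app-name: "snowplow-s3-sink"
  }
  s3 {
    bucket: "databaseregionevents"
    format: "lzo"
  }
  buffer {
    byte-limit: 4096
    record-limit: 8              # a record-limit of 8 would explain the 8-record flushes below
    time-limit: 5000
  }
}
```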
Below is the log output when the data is passed from the Kinesis stream to the S3 bucket:
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Flushing buffer with 8 records.
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully serialized 8 records out of 8
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully emitted 8 records to S3 in s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.storage.kinesis.s3.S3Emitter - Successfully emitted 8 records to S3 in s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo.index
Below is what the .lzo file contains (note: I just opened the file in Notepad++ without extracting anything, so the compressed payload shows up as binary):
(s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo)
‰LZO
€ @ ¤Ze±% ,Ý€ j ë )ØÕXÍL)²¼W™!q½ÿV byte[]É d 172.31.38.39
È _ü+N) Ò UTF-8 Ü ssc-0.9.0-kinesis, rMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.366 ;http://localhost:8085/ChangesHTML/SampleExampleTracker.html@ /iJ stm=1511764297427&e=pp&url=http%3A%2F%2Flocalhost%3A8085%2FChangesHTML%2FSampleExampleTracker.html&page=Fixed%20Width%202%20Blue&pp_mix=0&pp_max=0&pp_miy=0&pp_may=0&tv=js-2.8.0&tna=cf&aid=13&p=web&tz=Asia%2FKolkata&lang=en-US&cs=9 &f_pdf=1&f_qt=0&f_realp‡wmaŸ dirž fl¿jav gearsˆ Mag=0&res=1366x768&cd=24&cookie=1&eid=ac450cd0-8bd6-4b53-bf6d-1a7a0397105e&dtm=1511764297423&vp=1517x735&ds=1499x860&vid=1&sid=4fb9e7d0-5df4-4e60-9226-8457b36193bd&duid=e071f763-3e03-4e40-9472-6f4d56e444a0&fp=4265106636^ Host: localhost:8082 Connection: keep-alive ~User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537. ` 2Accept: image/webp,Ìapngì *, */*;q=0.8 DReferer: ht P
"´ -Encoding: gzip, deflate, br Ô Language: en-US, en„9 %C„I ’: rxVisitor=1495864829678I083H1MM3UVIPQREIQSNDG5V1FS6344V; _sp_id.1fff=a3310903-1094-4cb8-a179-e209cb4198f9.1500386692.22.1503474180.1503471200.6f1d47eb-6b94-408c-b0dd-6c91d546bfc0; loginMessage=logout; F47F4A0F30FB7A75=sandesh.p@unilogcorp.com; 750B2E0333A28C1D=test1234; F30FB33A2=true localhostš $7ec37777-495b-4b2b-8a11-da7d24ef9a5dzi Aiglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0
Below is the content of the .lzo.index file (note: again, I just opened it in Notepad++ without extracting anything):
(s3://databaseregionevents/2017-11-27-49578891737724711875591370082515362848639313462685597698-49578891737724711875591370107949953167511509038834647042.lzo.index)
&
-
I then start the enrichment process for the events with the following command:
./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/
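The enrichments directory passed via --enrichments holds one self-describing JSON per enrichment. For example, the ua_parser enrichment must be enabled there, since a ua_parser_context shows up in the enriched output further down; its config file would look roughly like this (a sketch following the standard self-describing config shape, parameters left empty):

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "ua_parser_config",
    "enabled": true,
    "parameters": {}
  }
}
```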
Once I run this command, all 12 steps complete successfully; I also verified this in the EMR console.
Below is the message I got after the process completed:
D, [2017-11-27T06:39:50.913000 #19422] DEBUG -- : Initializing EMR jobflow
D, [2017-11-27T06:39:55.161000 #19422] DEBUG -- : EMR jobflow j-1T9FRDP4EWWI8 started, waiting for jobflow to complete...
I, [2017-11-27T07:05:59.671000 #19422] INFO -- : No RDB Loader logs
D, [2017-11-27T07:05:59.671000 #19422] DEBUG -- : EMR jobflow j-1T9FRDP4EWWI8 completed successfully.
I, [2017-11-27T07:05:59.671000 #19422] INFO -- : Completed successfully
After the process completes successfully:
-
1. Raw section: two folders were created, i.e. archive and processing. A logs folder was created inside processing, and the Kinesis events were copied into a run folder under archive.
-
2. Enrich section: three folders were created, i.e. archive, good and bad. Inside archive a run folder was created, and a .csv file was generated inside it. Below is the data present in the .csv file:
13 web 2017-11-27 06:39:50.934 2017-11-27 06:30:17.385 2017-11-27 06:31:37.423 page_ping ac450cd0-8bd6-4b53-bf6d-1a7a0397105e cf js-2.8.0 ssc-0.9.0-kinesis spark-1.9.0-common-0.25.0 172.31.38.x 4265106636 e071f763-3e03-4e40-9472-6f4d56e444a0 1 7ec37777-495b-4b2b-8a11-da7d24ef9a5d http://localhost:8085/ChangesHTML/SampleExampleTracker.html Fixed Width 2 Blue http localhost 8085 /ChangesHTML/SampleExampleTracker.html 0 0 0 0 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36 Chrome Chrome 62.0.3202.94 Browser WEBKIT en-US 1 0 0 0 0 0 0 0 0 1 24 1517 735 Windows 10 Windows Microsoft Corporation Asia/Kolkata Computer 0 1366 768 UTF-8 1499 860 2017-11-27 06:31:37.427 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Chrome","useragentMajor":"62","useragentMinor":"0","useragentPatch":"3202","useragentVersion":"Chrome 62.0.3202","osFamily":"Windows","osMajor":null,"osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Windows","deviceFamily":"Other"}}]} 4fb9e7d0-5df4-4e60-9226-8457b36193bd 2017-11-27 06:30:17.381 com.snowplowanalytics.snowplow page_ping jsonschema 1-0-0 dcfc0cffb76b37e93a54d47d3b33ef1c
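A side note on that file: despite the .csv extension, the enriched row above is tab-separated, and the contexts column is plain JSON, so it can be inspected programmatically. A sketch using a two-column stand-in row (the real row has far more columns):

```python
import json

# A two-column stand-in for one enriched row: just the event name and the
# derived contexts JSON (values copied from the row above), to show the
# split-then-parse pattern.
row = "page_ping\t" + json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1",
    "data": [{
        "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0",
        "data": {"useragentFamily": "Chrome", "osFamily": "Windows"},
    }],
})

fields = row.split("\t")          # enriched rows are tab-delimited
contexts = json.loads(fields[1])  # the contexts column is embedded JSON
print(contexts["data"][0]["data"]["useragentFamily"])  # prints "Chrome"
```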
Inside the good folder, a run folder was generated containing a single empty (0 KB) file.
Inside the bad folder, a run folder was generated containing two files, success and part_0, both empty (0 KB).
3. Shredded section: three folders were created, i.e. archive, good and bad. Inside archive, a run folder was created containing atomic-events and shredded-types.
Inside the good folder, a run folder was generated containing shredded_types.
Inside the bad folder, a run folder was generated containing many empty (0 KB) files.
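The folder layout above corresponds to the buckets section of my emretlrunner.yml, which roughly follows the shape of the EmrEtlRunner sample config (paths elided here; this is a sketch, not the full file):

```yaml
aws:
  s3:
    region: us-east-1
    buckets:
      log: s3://.../logs
      raw:
        in:
          - s3://databaseregionevents   # the bucket the Kinesis S3 sink writes to
        processing: s3://.../raw/processing
        archive: s3://.../raw/archive
      enriched:
        good: s3://.../enriched/good
        bad: s3://.../enriched/bad
        archive: s3://.../enriched/archive
      shredded:
        good: s3://.../shredded/good
        bad: s3://.../shredded/bad
        archive: s3://.../shredded/archive
```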
-
For the storage step, below is the command used:
./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze
It fails at the 4th step of the process. Below is the error:
Exception in thread "main" java.lang.RuntimeException: Error running job
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-18-175.ec2.internal:8020/tmp/d28bb1f8-0bae-420d-97ef-45046305b36e/files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
... 10 more
I have explained everything in detail. If anything is missing in the above steps, please let me know.
Thanks,
Sandesh P