Storage-loader issues, runtime errors


#1

Hi all, I get the following error when trying to run the StorageLoader:

```
$ ./snowplow-storage-loader -c config.yml
Unexpected error: undefined method `host=' for #<Fog::Storage::AWS::Real:0x5a5c2889>
uri:classloader:/storage-loader/lib/snowplow-storage-loader/s3_tasks.rb:41:in `download_events'
uri:classloader:/storage-loader/bin/snowplow-storage-loader:42:in `<main>'
org/jruby/RubyKernel.java:973:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:955:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
```

Looking at the source, the line that breaks seems to be in this file:

https://github.com/snowplow/snowplow/blob/master/4-storage/storage-loader/lib/snowplow-storage-loader/s3_tasks.rb

and my guess is it has something to do with regions?

My config.yml bucket section is as follows (mind the disclosure formatting and the removal of private info). The buckets are otherwise working, as EmrEtlRunner runs correctly and fills them with data:

```yaml
s3:
  region: ap-southeast-2
  buckets:
    assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
    jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
    log: s3://aws-logs-121486008730-ap-southeast-2
    raw:
      in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
        - s3://logs-snowplow.xxxxxx              # e.g. s3://my-old-collector-bucket
      #  - s3://raw-in-new-snowplow.xxxx         # e.g. s3://my-new-collector-bucket
      processing: s3://raw-processing-snowplow.xxxxx
      archive: s3://raw-snowplow.xxxxxx/archive         # e.g. s3://my-archive-bucket/raw
    enriched:
      good: s3://enriched-snowplow.xxxxx/good           # e.g. s3://my-out-bucket/enriched/good
      bad: s3://enriched-snowplow.xxxxxx/bad            # e.g. s3://my-out-bucket/enriched/bad
      errors: s3://enriched-snowplow.xxxxxxx/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
      archive: s3://enriched-snowplow.xxxxxx/archived   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
    shredded:
      good: s3://shredded-snowplow.xxxxxx/good          # e.g. s3://my-out-bucket/shredded/good
      bad: s3://shredded-snowplow.xxxxxxx/bad           # e.g. s3://my-out-bucket/shredded/bad
      errors: s3://shredded-snowplow.xxxxxx/errors      # Leave blank unless :continue_on_unexpected_error: set to true below
      archive: s3://shredded-snowplow.xxxxxxxx/archive  # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
```
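As a quick sanity check, the bucket section above can be parsed with Ruby's stdlib YAML to confirm the region and paths come through as expected. This is a minimal stand-alone sketch, with the real bucket names replaced by a placeholder:

```ruby
require 'yaml'

# Hypothetical sanity check: parse a trimmed-down copy of the loader
# config and confirm the region and processing bucket are present.
raw = <<~YAML
  s3:
    region: ap-southeast-2
    buckets:
      raw:
        processing: s3://raw-processing-snowplow.example
YAML

config = YAML.safe_load(raw)
region = config.dig('s3', 'region')
processing = config.dig('s3', 'buckets', 'raw', 'processing')

puts region       # ap-southeast-2
puts processing   # s3://raw-processing-snowplow.example
```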

#2

Hi @elijan - what version of StorageLoader are you running?


#3

Hey @alex

r87 (Chichen Itza).

I have to add that just now I've tried 88_angkor_rc1 and it works fine (albeit I had to pass storage and resolver configs).


#5

Actually @alex, even 88_angkor_rc1 doesn't work. Angkor requires the download_required option; once it is set to true, it triggers the same error (as it calls the same function that caused the error in the previous release):

```ruby
# Download files if required
unless config[:skip].include?('download')
  if config[:storage][:download_required]
    loader::S3Tasks.download_events(config)
  end
end
```
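The gate quoted above boils down to a simple predicate. Here is a minimal stand-alone sketch using plain hashes, with the key names taken from the snippet (this is not the actual StorageLoader code):

```ruby
# Sketch of the download gate: download only when 'download' is not in
# the skip list AND the storage section requires a download.
def download_events?(config)
  !config[:skip].include?('download') && !!config[:storage][:download_required]
end

puts download_events?(skip: [], storage: { download_required: true })            # true
puts download_events?(skip: ['download'], storage: { download_required: true })  # false
```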


#6

Hi @elijan, @alex,

It looks like the culprit is the s3.host assignment, which is probably not possible in Fog. The Postgres load is the only place where it happens, in both EmrEtlRunner and StorageLoader. I'll try to reproduce it and will let you know about my findings.


#7

@anton

> It looks like the culprit is the s3.host assignment, which is probably not possible in Fog

It seems like that, as I've found no `host` method exists in the Fog class (granted, I'm not a Ruby dev, so I'm not sure exactly how the API works). Also, yeah, it only happens when the storage download is added for Postgres.

However, I downloaded the source, removed the `s3.host = region_to_safe_host(config[:aws][:s3][:region])` line, did a new build, and it worked fine (files were downloaded to the assigned folder and the loader performed its tasks).
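For context, here is a hedged reconstruction of what a region-to-host mapping like `region_to_safe_host` plausibly does; the exact hostnames are my assumption, based on S3's legacy dash-style regional endpoints:

```ruby
# Hypothetical sketch: map an AWS region to a legacy S3 endpoint host.
# us-east-1 historically used the bare endpoint; other regions used a
# dash-style regional endpoint. Hostnames here are assumptions.
def region_to_safe_host(region)
  region == 'us-east-1' ? 's3.amazonaws.com' : "s3-#{region}.amazonaws.com"
end

puts region_to_safe_host('ap-southeast-2')  # s3-ap-southeast-2.amazonaws.com
```

If newer Fog versions dropped the `host=` setter, passing the region at connection time rather than assigning the host afterwards would be the natural fix, which matches the workaround of deleting that line.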

FYI, I also have to add that I went back several releases and StorageLoader had the same error.


#8

Thanks for raising this @elijan. I created a ticket to track this problem. Hope to release Angkor Wat RC2 with this fix today.


#9

@elijan, rc2 with this fix is available for test at usual place: https://snowplow.bintray.com/snowplow-generic/snowplow_emr_r88_angkor_wat_rc2.zip.

Please let us know if you have any issues.


#10

@anton

With snowplow.bintray.com/snowplow-generic/snowplow_emr_r88_angkor_wat_rc2.zip, or any other Angkor Wat release, I am facing the following issues:

```
./snowplow-storage-loader --config exe-snowplow/config/config.yml --targets /root/exe-snowplow/data/ --resolver exe-snowplow/config/resolver.json
```

It just archives events but does not download them to the target.

Initial response on running the storage loader:

```
Archiving Snowplow events...
  moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
    (t1)    MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00001 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00001
    (t3)    MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00002 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00002
    (t2)    MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00003 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00003
    (t0)    MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00000 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00000
      +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00001
      +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00002
      +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00003
      +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00000
      x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00000
      x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00002
      x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00003
      x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00001
  moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
    (t0)    MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/_SUCCESS -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/_SUCCESS
      +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/_SUCCESS
      x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/_SUCCESS
  moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
  moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
Completed successfully
```

Response on later runs:

```
Archiving Snowplow events...
  moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
  moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
  moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
  moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
Completed successfully
```

Download does not initiate.

Kindly help.
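If it helps with debugging, one thing worth checking is whether the parsed config actually enables the download step. A stdlib-only sketch, with the key names assumed from earlier posts in this thread:

```ruby
require 'yaml'

# Hypothetical check: the download step should only run when
# storage.download_required is true in the parsed config.
raw = <<~YAML
  storage:
    download_required: true
YAML

config = YAML.safe_load(raw)
download_required = config.dig('storage', 'download_required')
puts download_required  # true
```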


#11

I’m experiencing the same thing. Using storage loader from https://github.com/snowplow/snowplow/tree/release/r88-angkor-wat

Any help would be appreciated.


#12

@darren

Hey,

R88 is a release candidate: Snowplow RCs are designed for internal testing and often have significant changes relative to previous final releases. The config files in the R88 release have changed significantly, so I tweaked the master branch, recompiled, and ran it successfully.

Do the same, or else wait until they finalise the R88 release candidate.


#13

You mean this tweak?

I also compensated for the config files (target json, storage schemas in an iglu repo) but no joy.


#14

Hi @darren

If you are still facing the issue, do let me know and I will share my build.


#15

Hello,

I’m facing this issue too with snowplow_emr_r88_angkor_wat_rc4.zip (same as @v3nom).

Running the storage loader only archives data; no targets are run (Redshift in my case). There are no errors thrown.

Please help!


#16

R88 is still a release candidate - have you tried running in R87?


#17

snowplow_emr_r87_chichen_itza.zip works without issue! Thanks!

However, R87 does not allow for a custom resolver JSON. Does this mean that we cannot load data into our Redshift table in a custom format?

Also, https://github.com/snowplow/iglu-central/tree/master/sql/com.snowplowanalytics.snowplow has a lot of SQL, of which I’m using only duplicate_1.sql. How do we use the other SQL files? Are they used for the various kinds of trackers defined?