No data loaded in postgres, no errors either


#1

I ran the following command using the R88 release, but the database table is still empty. I have run it multiple times with the same result.

/home/ec2-user/snowplow/bin/snowplow-storage-loader --config /home/ec2-user/snowplow/enrich/runner.yaml --targets /home/ec2-user/snowplow/enrich/targets --resolver /home/ec2-user/snowplow/enrich/resolver.json

Here is the config file:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: xxxx
  secret_access_key: xxxxx
  s3:
    region: us-west-2
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://namespace-analytics/logs
      raw:
        in:
          - s3://elasticbeanstalk-us-west-2-377172143455/resources/environments/logs/publish/e-vcgkcxe3cz
        processing: s3n://namespace-analytics/raw/processing
        archive: s3://namespace-analytics-data/raw/archive    # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://namespace-analytics-data/enriched/good         # e.g. s3://my-out-bucket/enriched/good
        bad: s3://namespace-analytics-data/enriched/bad           # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://namespace-analytics-data/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://namespace-analytics-data/enriched/archive   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://namespace-analytics-data/shredded/good         # e.g. s3://my-out-bucket/shredded/good
        bad: s3://namespace-analytics-data/shredded/bad           # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://namespace-analytics-data/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://namespace-analytics-data/shredded/archive   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 4.5.0
    region: us-west-2        # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id: # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: namespace-production
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      master_instance_type: m1.small
      core_instance_count: 1
      core_instance_type: m1.small
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 20    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.small
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  job_name: Snowplow ETL # Give your job a name
  versions:
    hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
    hadoop_shred: 0.10.0 # Version of the Hadoop Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  continue_on_unexpected_error: false
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  download:
    folder: /home/ec2-user/postgres # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: snowplow-backend # e.g. snowplow
    collector: collector.namespace.co # e.g. d3rkrsqld9gmqf.cloudfront.net

How can I debug this, or what is a possible fix?


#2

Hi @ankur18 - R88 is pre-release, please see:

http://discourse.snowplowanalytics.com/t/snowplow-rc-release-not-working-as-expected/1100/2


#3

Thanks for the reply, @alex. I went through the diff in the repo for the command changes and got everything else working. It's just the import that is not working.

I have also tried the R87 version of the config, and even that does not import the data.


#4

Hey @ankur18,

I had the same issue. I tested all the releases of the storage loader, but none of them worked for me. In the end, I tweaked s3_tasks.rb from the master branch, recompiled, and it ran successfully.

You can do the same, or else wait until they finalise the R88 release candidate.
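
In the meantime, one quick way to narrow down where the data goes missing is to check whether the loader downloaded anything at all. This is only a rough sketch. It assumes the loader stages the enriched files in the storage download folder from the config above (/home/ec2-user/postgres) before loading them into Postgres, as the config comment suggests. The helper name count_nonempty_files is made up for illustration and is not part of Snowplow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Snowplow): count the non-empty regular
# files under a directory. Prints 0 if the directory is empty or missing.
count_nonempty_files() {
  find "$1" -type f -size +0c 2>/dev/null | wc -l | tr -d ' '
}

# Assumption: this matches storage.download.folder in the config above.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/home/ec2-user/postgres}"

echo "non-empty files in $DOWNLOAD_DIR: $(count_nonempty_files "$DOWNLOAD_DIR")"
```

If this prints 0 right after a loader run, the loader probably found nothing to download, so the problem is likely on the S3 side rather than in Postgres. Assuming the AWS CLI is installed and configured, you can list the enriched bucket from the config with: aws s3 ls s3://namespace-analytics-data/enriched/good/ --recursive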