Contract violation for argument


#1

Hello

When I try to parse logs from my bucket, I get the error below:

root@ip-172-31-32-36:/home/cloudfront/snowplow# ./snowplow-emr-etl-runner run -c config.yaml -r resolver.json -d
D, [2018-09-05T04:43:27.209000 #1807] DEBUG -- : Initializing EMR jobflow
ParamContractError: Contract violation for argument 2 of 5:
        Expected: String,
        Actual: nil
        Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_archive_step
        With Contract: String, String, String, String, String => Elasticity::S3DistCpStep
        At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:806 
         block in Contract at uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:48
          failure_callback at uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:154
  block in redefine_method at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:143
                    <main> at uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41
                      load at org/jruby/RubyKernel.java:979
                    <main> at uri:classloader:/META-INF/main.rb:1
                   require at org/jruby/RubyKernel.java:961
                    (root) at uri:classloader:/META-INF/main.rb:1
                    <main> at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
ERROR: org.jruby.embed.EvalFailedException: (ParamContractError) Contract violation for argument 2 of 5:
        Expected: String,
        Actual: nil
        Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_archive_step
        With Contract: String, String, String, String, String => Elasticity::S3DistCpStep
        At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:806 

I have searched Google but found nothing. Has anyone had an issue like this? Here is the log file in my bucket:

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
2018-09-05	04:34:02	NRT12-C1	565	112.197.14.10	GET	d1g8ya19nxyjyz.cloudfront.net	/	301	-	Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0	-	-	Redirect	xLCndSc8mClqXjUZIAtllkGogaGkY4Jk-fRVbU2-0AiZYb9sSr_V-A==	click-tracking.rcapp.co	http	554	0.001	-	-	-	Redirect	HTTP/1.1	-	-
2018-09-05	04:34:05	NRT12-C1	341	112.197.14.10	GET	d1g8ya19nxyjyz.cloudfront.net	/	200	-	Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0	-	-	Error	Tv9QZusbNVt6K1kDMp27Rix24SWU6TRm5qPO4rLD3V_q5Sznil7QNQ==	click-tracking.rcapp.co	https	359	2.080	-	TLSv1.2	ECDHE-RSA-AES128-GCM-SHA256	Error	HTTP/2.0	-	-
2018-09-05	04:34:08	NRT12-C1	348	112.197.14.10	GET	d1g8ya19nxyjyz.cloudfront.net	/abc	200	-	Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0	-	-	Error	DI55WryQ7hAUjhlewtCw3SfSxW_Q3227Yq_T_N6rSsXjxFmXm6v3Xw==	click-tracking.rcapp.co	https	24	0.342	-	TLSv1.2	ECDHE-RSA-AES128-GCM-SHA256	Error	HTTP/2.0	-

And here is my config.yaml:

  access_key_id: xxx
  secret_access_key: RHk/xxxxxx/xxx
  s3:
    region: eu-central-1
    buckets:
      assets: "s3://rcsnowplow-hosted-assets"
      log: "s3n://rcmy-snowplow-etl/logs/"
      raw:
        in: 
          - "s3://rc-eu-central-1-snowplow/tokyo_logs/"
        processing: "s3n://rcmy-snowplow-etl/processing/"
        archive: "s3://rcmy-archive-bucket/raw"
      enriched:
        good: "s3://rcmy-data-bucket/enriched/good"
        bad: "s3://rcmy-data-bucket/enriched/bad"
        errors: "s3://rcmy-data-bucket/enriched/errors"
        archive: "s3://rcmy-data-bucket/enriched/archive"
      shredded:
        good: "s3://rcmy-data-bucket/shredded/good"
        bad: s3://rcmy-data-bucket/shredded/bad
        errors: "s3://rcmy-data-bucket/shredded/errors"
  emr:
    ami_version: 5.9.0
    region: eu-central-1
    jobflow_role: EMR_EC2_DefaultRole
    service_role: EMR_DefaultRole
    placement:
    ec2_subnet_id: subnet-1edcfa54
    ec2_key_name: xxxx
    bootstrap: []
    software:
      hbase:
      lingual:
    jobflow:
      job_name: Snowplow ETL
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs:
        volume_size: 100
        volume_type: "gp2"
        volume_iops: 400
        ebs_optimized: false
      task_instance_count: 0
      task_instance_type: m1.medium
      task_instance_bid: 0.015
    bootstrap_failure_tries: 3
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:
collectors:
  format: cloudfront
enrich:
  versions:
    spark_enrich: 1.12.0
  continue_on_unexpected_error: false
  output_compression: NONE
storage:
  download:
    folder: 
  versions:
    rdb_loader: 0.14.0
    rdb_shredder: 0.13.0
    hadoop_elasticsearch: 0.1.0
  targets:
    name: "My PostgreSQL database"
    type: postgres
    host: [192.168.10.153] # Hostname of database server
    database: postgress # Name of database
    port: 5432 # Default Postgres port
    ssl_mode: disable # One of disable (default), require, verify-ca or verify-full
    table: atomic.events
    username: [ptma-log]
    password: [xxxxx]
    maxerror: # Not required for Postgres
    comprows: # Not required for Postgres
monitoring:
  tags: {}
  logging:
    level: DEBUG

Thanks


#2

@duythien, it’s not your data that causes this “Contract violation” but rather your configuration file:

Expected: String,
Actual: nil
Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_archive_step
With Contract: String, String, String, String, String => Elasticity::S3DistCpStep

As far as I can see, you are missing the shredded:archive bucket.
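To illustrate the point, here is a rough Python sketch of a pre-flight check that reports bucket settings that are absent or empty before the runner is started. It is not part of EmrEtlRunner: the helper names (`lookup`, `missing_paths`) and the `REQUIRED_PATHS` list are hypothetical and only cover the keys discussed in this thread. The dictionary mirrors the enriched and shredded sections posted in #1, and the archive path in the corrected version is the one that appears in the later log output in #3.

```python
# Sketch: report bucket settings that are absent or empty in a parsed config.
# REQUIRED_PATHS is an assumption based on this thread, not an official list.
REQUIRED_PATHS = [
    "enriched.good",
    "enriched.archive",
    "shredded.good",
    "shredded.archive",
]


def lookup(mapping, dotted_path):
    """Follow a dotted path through nested dicts; return None if any part is absent."""
    node = mapping
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_paths(buckets, required=REQUIRED_PATHS):
    """Return the required paths whose value is absent, None, or an empty string."""
    return [path for path in required if not lookup(buckets, path)]


# The enriched and shredded sections as posted in #1 (no shredded archive).
posted_buckets = {
    "enriched": {
        "good": "s3://rcmy-data-bucket/enriched/good",
        "bad": "s3://rcmy-data-bucket/enriched/bad",
        "errors": "s3://rcmy-data-bucket/enriched/errors",
        "archive": "s3://rcmy-data-bucket/enriched/archive",
    },
    "shredded": {
        "good": "s3://rcmy-data-bucket/shredded/good",
        "bad": "s3://rcmy-data-bucket/shredded/bad",
        "errors": "s3://rcmy-data-bucket/shredded/errors",
    },
}

# Same sections with the archive location added under shredded.
fixed_buckets = {
    **posted_buckets,
    "shredded": {
        **posted_buckets["shredded"],
        "archive": "s3://rcmy-data-bucket/shredded/archive",
    },
}

print(missing_paths(posted_buckets))
print(missing_paths(fixed_buckets))
```

A key that is missing from the YAML is read as nil, which is what the String contract on get_archive_step rejects, so a check along these lines would probably flag the problem before the jobflow is initialized.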


#3

Thanks @ihor, it works now. Now that I have data, I could run another command to insert it into the Postgres database, but my question is: can we insert the data into Postgres as part of the step above (./snowplow-emr-etl-runner run --config config.yaml --resolver reslover.json --targets targets/)?
Below is the other option I tried:

$ ./snowplow-storage-loader --config config.yaml --resolver reslover.json --targets targets/ --skip analyze
Loading Snowplow events into PostgreSQL enriched events storage (PostgreSQL database)...
Opening database connection ...
Archiving Snowplow events...
  moving files from s3://rcmy-data-bucket/enriched/good/ to s3://rcmy-data-bucket/enriched/archive/
  moving files from s3://rcmy-data-bucket/enriched/good/ to s3://rcmy-data-bucket/enriched/archive/
  moving files from s3://rcmy-data-bucket/shredded/good/ to s3://rcmy-data-bucket/shredded/archive/
  moving files from s3://rcmy-data-bucket/shredded/good/ to s3://rcmy-data-bucket/shredded/archive/
Completed successfully

But no data was saved into Postgres.

According to the GitHub wiki, StorageLoader is deprecated and has been replaced by RDB Loader: https://github.com/snowplow/snowplow/wiki/1-Installing-the-StorageLoader

Any suggestions?


#4

Update with more logs:

$ ./snowplow-emr-etl-runner run  --config config.yaml --resolver reslover.json --targets targets/
D, [2018-09-06T16:08:40.983000 #26052] DEBUG -- : Initializing EMR jobflow
D, [2018-09-06T16:09:00.286000 #26052] DEBUG -- : EMR jobflow j-3BTNXFGB9E7IP started, waiting for jobflow to complete...
I, [2018-09-06T16:53:46.710000 #26052]  INFO -- : RDB Loader logs
D, [2018-09-06T16:53:46.716000 #26052] DEBUG -- : Downloading s3://rcmy-snowplow-logs/rdb-loader/2018-09-06-16-08-41/87371d13-0b07-4345-a9ac-040385c69d30 to /tmp/rdbloader20180906-26052-sc8ks7
I, [2018-09-06T16:54:01.040000 #26052]  INFO -- : PostgreSQL enriched events storage
I, [2018-09-06T16:54:01.041000 #26052]  INFO -- : RDB Loader successfully completed following steps: [Discover, Analyze]
D, [2018-09-06T16:54:01.042000 #26052] DEBUG -- : EMR jobflow j-3BTNXFGB9E7IP completed successfully.
I, [2018-09-06T16:54:01.043000 #26052]  INFO -- : Completed successfully

snowplow-emr-etl-runner version:

./snowplow-emr-etl-runner -v
snowplow-emr-etl-runner 0.32.0

Postgres target in targets/postgress.json:

$ cat targets/postgress.json 
{
    "schema": "iglu:com.snowplowanalytics.snowplow.storage/postgresql_config/jsonschema/1-1-0",
    "data": {
        "name": "PostgreSQL enriched events storage",
        "host": "112.xx.xxx.10",
        "database": "xxx-log",
        "port": 5432,
        "sslMode": "DISABLE",
        "username": "xxx",
        "password": "xxxx",
        "schema": "atomic",
        "sshTunnel": null,
        "purpose": "ENRICHED_EVENTS"
    }
}
$ 

The Postgres database is still empty.


#5

@duythien, it all looks fine to me (I assume you also amended config.yaml, as the original format is not right; you now have targets as a separate entity).

The logs indicate that RDB Loader ran OK and that 2 of the expected 3 steps completed: “RDB Loader successfully completed following steps: [Discover, Analyze]”. The Load step is missing. Are you sure there was valid enriched data in the payload and that you did not skip that step with the --skip option to EmrEtlRunner?

The bucket in the logs is also different from the one in your original config.yaml.
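If it helps, the log check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the set of expected steps (Discover, Load, Analyze) is an assumption inferred from the log line quoted in this post, not taken from the RDB Loader source.

```python
import re

# Assumption: the three expected steps, inferred from the reply above
# ("2 of the expected 3 steps", "The Load step is missing").
EXPECTED_STEPS = {"Discover", "Load", "Analyze"}

# Matches the bracketed step list in the RDB Loader summary line.
STEPS_PATTERN = re.compile(r"completed following steps: \[([^\]]*)\]")


def completed_steps(log_text):
    """Return the set of step names listed in the summary line, if present."""
    match = STEPS_PATTERN.search(log_text)
    if match is None:
        return set()
    return {step.strip() for step in match.group(1).split(",") if step.strip()}


def missing_steps(log_text, expected=EXPECTED_STEPS):
    """Return the expected steps that the summary line does not mention."""
    return set(expected) - completed_steps(log_text)


# Summary line copied from the EmrEtlRunner output in #4.
log_line = (
    "I, [2018-09-06T16:54:01.041000 #26052]  INFO -- : RDB Loader successfully "
    "completed following steps: [Discover, Analyze]"
)

print(sorted(missing_steps(log_line)))
```

For the output posted in #4 this reports Load as the missing step, which matches the empty table in Postgres.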