Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: FAILED


#1

i am trying to push the events to redshift database.
below is the command i am running.

 ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze

below is error details in ubuntu CLI:

D, [2017-10-25T11:46:27.056000 #11917] DEBUG -- : Initializing EMR jobflow
D, [2017-10-25T11:46:33.307000 #11917] DEBUG -- : EMR jobflow j-1DF51MTOLK5EX started, waiting for jobflow to complete...
I, [2017-10-25T12:04:37.203000 #11917]  INFO -- : RDB Loader logs
D, [2017-10-25T12:04:37.210000 #11917] DEBUG -- : Downloading s3://datapipelinesnowplow/logs/rdb-loader/2017-10-25-11-46-27/36fc9d8e-c42a-41b4-b277-3b636ac60083 to /tmp/rdbloader20171025-11917-nwonjx
E, [2017-10-25T12:04:38.459000 #11917] ERROR -- : AWS Redshift enriched events storage
E, [2017-10-25T12:04:38.459000 #11917] ERROR -- : ERROR: Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details:
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     52261
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=10112]
  -----------------------------------------------;
Following steps completed: [Discover]
F, [2017-10-25T12:04:38.835000 #11917] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1DF51MTOLK5EX failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-25 11:51:51 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://datapipelinesnowplow/ -> Raw Staging S3: COMPLETED ~ 00:01:26 [2017-10-25 11:51:53 +0000 - 2017-10-25 11:53:19 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:01:20 [2017-10-25 11:53:21 +0000 - 2017-10-25 11:54:41 +0000]
 - 3. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:10 [2017-10-25 11:54:43 +0000 - 2017-10-25 11:55:53 +0000]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: COMPLETED ~ 00:00:44 [2017-10-25 11:55:55 +0000 - 2017-10-25 11:56:39 +0000]
 - 5. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: COMPLETED ~ 00:00:46 [2017-10-25 11:56:41 +0000 - 2017-10-25 11:57:27 +0000]
 - 6. Elasticity Custom Jar Step: Empty Raw HDFS: COMPLETED ~ 00:00:08 [2017-10-25 11:57:29 +0000 - 2017-10-25 11:57:37 +0000]
 - 7. Elasticity Spark Step: Shred Enriched Events: COMPLETED ~ 00:01:22 [2017-10-25 11:57:39 +0000 - 2017-10-25 11:59:01 +0000]
 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: COMPLETED ~ 00:01:08 [2017-10-25 11:59:03 +0000 - 2017-10-25 12:00:12 +0000]
 - 9. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: COMPLETED ~ 00:00:48 [2017-10-25 12:00:14 +0000 - 2017-10-25 12:01:02 +0000]
 - 10. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: COMPLETED ~ 00:01:22 [2017-10-25 12:01:04 +0000 - 2017-10-25 12:02:26 +0000]
 - 11. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: FAILED ~ 00:00:32 [2017-10-25 12:02:31 +0000 - 2017-10-25 12:03:03 +0000]
 - 12. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Below is error in stdout.gz

Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     52261
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=10112]
  -----------------------------------------------;
ERROR: Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     52261
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=10112]
  -----------------------------------------------;
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://datapipelinesnowplow/logs/rdb-loader/2017-10-25-11-46-27/36fc9d8e-c42a-41b4-b277-3b636ac60083]

I have followed the step suggested by @marien and below is the link for details:

Please help me out…


#2

Hi all,

Please help me to resolve this error…
I have been stuck from 3 days…

Thanks,
Sandesh P


#4

Please help me to resolve this error.


#5

Hi @sandesh - could you please provide:

  1. The IAM policy you have assigned to your Redshift Role
  2. The IAM Trust Relationship associated with your Redshift Role
  3. Your Redshift Target file which is being used by EMR to load your Redshift
  4. The IAM policy for the user with ARN: arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog

#6

Hey @josh thanks for the response

  1. The IAM policy you have assigned to your Redshift Role

    arn:aws:iam::302576851619:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift
    
  2. The IAM Trust Relationship associated with your Redshift Role

     {
    "Version": "2012-10-17",
    "Statement": [
     {
       "Effect": "Allow",
       "Principal": {
     	"Service": "redshift.amazonaws.com"
       },
       "Action": "sts:AssumeRole"
     }
      ]
    }
    

3.Your Redshift Target file which is being used by EMR to load your Redshift

{
	"schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-0-0",
	"data": {
		"name": "AWS Redshift enriched events storage",
		"host": "snowplow.cze0fyuagv4x.us-east-1.redshift.amazonaws.com",
		"database": "unilog",
		"port": 5439,
		"sslMode": "DISABLE",
		"username": "unilog",
		"password": "Unilog123",
		"roleArn": "arn:aws:iam::302576851619:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift",
		"schema": "atomic",
		"maxError": 1,
		"compRows": 20000,
		"purpose": "ENRICHED_EVENTS"
	}
}

4.The IAM policy for the user with ARN: arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog

AWSServiceRoleForRedshift

Please help me out


#7

Hi @sandesh,

It looks like you have not created your Redshift Role correctly. The ARN for the role should follow this pattern:

arn:aws:iam::302576851619:role/MyRedshiftRole

Please follow this guide http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-create-an-iam-role.html, paying particular attention to this step:

5. Under Select your use case, choose Redshift - Customizable then choose Next: Permissions.

This will let you select the correct IAM policy for your Role. In this it should be AmazonS3ReadOnlyAccess.


Once you have created this new Role you will need to ensure that:

  1. It is attached to your Redshift Cluster and;
  2. You update your JSON target file with the new ARN

Hope this helps,
Josh


#8

Thanks for the valuable input @josh.

Now the error got changed.

Data loading error [Amazon](500310) Invalid operation: relation "atomic.manifest" does not exist;
ERROR: Data loading error [Amazon](500310) Invalid operation: relation "atomic.manifest" does not exist;
Following steps completed: [Discover,Load]
INFO: Logs successfully dumped to S3 [s3://eventsnowplow/logs/rdb-loader/2017-10-31-06-40-03/496a3c58-9d16-4ace-a0a0-fb8d5338414b]

What changes i need to do?


#9

As per the Snowplow Upgrade guide here:

You will need to deploy the manifest table from Snowplow Release 87 onwards.


#10

Thank you so much @josh

i change the rdb_loader version from 0.12.0 to 0.13.0
now the error changed and as below

Data loading error [Amazon](500310) Invalid operation: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_link_click_1;
ERROR: Data loading error [Amazon](500310) Invalid operation: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_link_click_1;
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://eventsnowplow/logs/rdb-loader/2017-10-31-11-22-50/ffc0e42f-f5e6-4355-b131-7537a0d048d5]

Can you please suggest the changes i need to do?
do i need to change the version of rdb_shredder: 0.12.0 to 0.13.0


#11

Hi @sandesh,

The error message is fairly explicit:

Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_link_click_1; 

You need to ensure that all the tables for the events you are collecting have been deployed to your Redshift Cluster or the load process will fail.

The DDLs for all of the Snowplow Tables can be found here:

In this case you need to deploy this DDL:

Hope this helps,
Josh