Data loading error to redshift


#1

Hi all,

i am trying to push the events to redshift database.
I have been stuck in some steps and i overcome from the suggestion given by @anton.
Below is the link for details:

I am running storage targets using below command.

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze

Below is the error i am getting:

Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     44246
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=1561]
  -----------------------------------------------;
ERROR: Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     44246
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=1561]
  -----------------------------------------------;
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://snowplowevents/logs/rdb-loader/2017-10-23-06-56-26/80c5a3ec-ef7c-499a-b88f-3d1f56e4f9c6]

Below is the redshift.json file of 2-0-0 as suggested by @anton

{
	"schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-0-0",
	"data": {
		"name": "AWS Redshift enriched events storage",
		"host": "snowplow.cze0fyuagv4x.us-east-1.redshift.amazonaws.com",
		"database": "xxx",
		"port": 5439,
		"sslMode": "DISABLE",
		"username": "xxx",
		"password": "xxxxx",
		"roleArn": "arn:aws:iam::302576851619:role/myRedshiftRole",
		"schema": "atomic",
		"maxError": 1,
		"compRows": 20000,
		"purpose": "ENRICHED_EVENTS"
	}
}

My emrrtlrunner.yml file is below:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: xxxxx
  secret_access_key: xxxxx
  #keypair: Snowplowkeypair
  #key-pair-file: /home/ubuntu/snowplow/4-storage/config/Snowplowkeypair.pem
  region: us-east-1
  s3:
	region: us-east-1
	buckets:
	  assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
	  jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
	  log: s3://snowplowevents/logs
	  raw:
		in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
		  - s3://snowplowevents/      # e.g. s3://my-old-collector-bucket
		processing: s3://snowplowevents/raw/processing11
		archive: s3://snowplowevents/raw/archive11   # e.g. s3://my-archive-bucket/raw
	  enriched:
		good: s3://snowplowevents/enriched/good11        # e.g. s3://my-out-bucket/enriched/good
		bad: s3://snowplowevents/enriched/bad11       # e.g. s3://my-out-bucket/enriched/bad
		errors: s3://snowplowevents/enriched/errors11     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://snowplowevents/enriched/archive11    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
	  shredded:
		good: s3://snowplowevents/shredded/good11       # e.g. s3://my-out-bucket/shredded/good
		bad: s3://snowplowevents/shredded/bad11        # e.g. s3://my-out-bucket/shredded/bad
		errors: s3://snowplowevents/shredded/errors11     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://snowplowevents/shredded/archive11     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
	ami_version: 5.5.0
	region: us-east-1       # Always set this
	jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
	service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
	placement: us-east-1a      # Set this if not running in VPC. Leave blank otherwise
	ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
	ec2_key_name: Snowplowkeypair
	bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
	software:
	  hbase:              # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
	  lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
	# Adjust your Hadoop cluster below
	jobflow:
	  job_name: Snowplow ETL # Give your job a name
	  master_instance_type: m2.4xlarge
	  core_instance_count: 2
	  core_instance_type: m2.4xlarge
	  core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
		volume_size: 100    # Gigabytes
		volume_type: "gp2"
		volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
		ebs_optimized: false # Optional. Will default to true
	  task_instance_count: 0 # Increase to use spot instances
	  task_instance_type: m2.4xlarge
	  task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
	bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
	configuration:
	  yarn-site:
		yarn.resourcemanager.am.max-attempts: "1"
	  spark:
		maximizeResourceAllocation: "true"
	additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
	spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
	rdb_loader: 0.12.0
	rdb_shredder: 0.12.0        # Version of the Spark Shredding process
	hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
	level: DEBUG # You can optionally switch to INFO for production
  #snowplow:
	#method: get
	#app_id: unilog # e.g. snowplow
	#collector: 172.31.38.39:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

And iglu_resolver.json file is:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
	"cacheSize": 500,
	"repositories": [
	  {
		"name": "Iglu Central",
		"priority": 0,
		"vendorPrefixes": [ "com.snowplowanalytics" ],
		"connection": {
		  "http": {
			"uri": "http://iglucentral.com"
		  }
		}
	  }
	]
  }
}

I have created the role for redshift, below is the screenshot for details

Please help me to resolve this error


EmrEtlRunner::EmrExecutionError in the 3rd stage of the process
#2

From what I read from the error is that your database user has not sufficient rights.


#3

Thanks for the quick response @marien.

whats the permission i need to give.
i have just followed the below link to create the database.

Please revert back…


#4

I think your rights for the RedShift role is not correct, please verify your IAM configuration with this documentation: https://github.com/snowplow/snowplow/wiki/Setup-IAM-permissions-for-operating-Snowplow


Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: FAILED
#5

Hey @marien i followed the configuration and made all the changes, but still i am stuck in the same step.

below is error details:

Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/xxxxxx is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     52261
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=10112]
  -----------------------------------------------;
ERROR: Data loading error [Amazon](500310) Invalid operation: User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
Details: 
 -----------------------------------------------
  error:  User arn:aws:redshift:us-east-1:302576851619:dbuser:snowplow/unilog is not authorized to assume IAM Role arn:aws:iam::302576851619:role/myRedshiftRole
  code:      8001
  context:   IAM Role = arn:aws:iam::302576851619:role/myRedshiftRole
  query:     52261
  location:  xen_aws_credentials_mgr.cpp:230
  process:   padbmaster [pid=10112]
  -----------------------------------------------;
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://datapipelinesnowplow/logs/rdb-loader/2017-10-25-11-46-27/36fc9d8e-c42a-41b4-b277-3b636ac60083]

Please help me out.
tell me the root cause so that i can find the solution for that.