Snowplow Docker s3-loader not working correctly


#1

Hi, when I run the s3-loader Docker container with this config file:

# Default configuration for s3-loader

# Sources currently supported are:
# 'kinesis' for reading records from a Kinesis stream
# 'nsq' for reading records from a NSQ topic
source = "kinesis"

# Sink is used for sending events whose processing failed.
# Sinks currently supported are:
# 'kinesis' for writing records to a Kinesis stream
# 'nsq' for writing records to a NSQ topic
sink = "kinesis"

# The following are used to authenticate for the Amazon Kinesis sink.
# If both are set to 'default', the default provider chain is used
# (see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html)
# If both are set to 'iam', use AWS IAM Roles to provision credentials.
# If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
aws {
  accessKey = "iam"
  secretKey = "iam"
}

# Config for NSQ
nsq {
  # Channel name for NSQ source
# If more than one application is reading from the same NSQ topic at the same time,
# each must have a unique channel name in order to get all the data from the topic
  # channelName = "{{nsqSourceChannelName}}"
  channelName = ""

  # Host name for NSQ tools
  # host = "{{nsqHost}}"
  host = ""

  # HTTP port for nsqd
  # port = {{nsqdPort}}
  port = 0

  # HTTP port for nsqlookupd
  # lookupPort = {{nsqlookupdPort}}
  lookupPort = 0
}

kinesis {
  # LATEST: most recent data.
  # TRIM_HORIZON: oldest available data.
  # "AT_TIMESTAMP": Start from the record at or after the specified timestamp
  # Note: This only affects the first run of this application on a stream.
  initialPosition = "TRIM_HORIZON"

  # Must be specified when initialPosition is "AT_TIMESTAMP".
  # The timestamp format must be "yyyy-MM-ddTHH:mm:ssZ".
  # Ex: "2017-05-17T10:00:00Z"
  # Note: the time must be specified in UTC.
  # initialTimestamp = "{{timestamp}}"

  # Maximum number of records to read per GetRecords call
  maxRecords = 100

  region = "us-east-1"

  # "appName" is used for a DynamoDB table to maintain stream state.
  appName = "XXX"
}

streams {
  # Input stream name
  inStreamName = "XXX"

  # Stream for events for which the storage process fails
  outStreamName = "XXX"

  # Events are accumulated in a buffer before being sent to S3.
  # The buffer is emptied whenever:
  # - the combined size of the stored records exceeds byteLimit or
  # - the number of stored records exceeds recordLimit or
  # - the time in milliseconds since it was last emptied exceeds timeLimit
  buffer {
    byteLimit = 12000 # Not supported by NSQ; will be ignored
    recordLimit = 200
    timeLimit = 9000 # Not supported by NSQ; will be ignored
  }
}

s3 {
  region = "us-east-1"
  bucket = "XXXX"

  # Format is one of lzo or gzip
  # Note that gzip can only be used for the enriched data stream.
  format = "gzip"

  # Maximum timeout (in milliseconds) that the application is allowed to fail for
  maxTimeout = 5000
}

The container terminates with no error message. The full output looks like this:

[ec2-user@ip-10-0-10-209 ~]$ docker run -v ${PWD}/config:/snowplow/config -e 'SP_JAVA_OPTS=-Xms512m -Xmx512m' snowplow-docker-registry.bintray.io/snowplow/s3-loader:0.6.0 --config /snowplow/config/config.hocon


log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[main] INFO com.snowplowanalytics.s3.loader.sinks.KinesisSink - Stream XXXX exists and is active
[main] INFO com.snowplowanalytics.s3.loader.SinkApp$ - Initializing sink with KinesisConnectorConfiguration: {regionName=us-east-1, s3Endpoint=https://s3.amazonaws.com, kinesisInputStream=XXX, maxRecords=100, connectorDestination=s3, bufferMillisecondsLimit=9000, bufferRecordCountLimit=200, s3Bucket=XXX, kinesisEndpoint=https://kinesis.us-east-1.amazonaws.com, appName=XXX, bufferByteSizeLimit=12000, retryLimit=1, initialPositionInStream=TRIM_HORIZON}
[main] INFO com.snowplowanalytics.s3.loader.KinesisSourceExecutor - KinesisSourceExecutor worker created

Could you tell me how I can debug this?

Thanks.


#2

Hello @tiny,

You can get more KCL debugging information by using the -Dorg.slf4j.simpleLogger.defaultLogLevel=debug JVM flag.
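
For example, appending the flag to SP_JAVA_OPTS in the command from your first post (a sketch; same image and paths as yours, only the flag is new):

docker run -v ${PWD}/config:/snowplow/config \
  -e 'SP_JAVA_OPTS=-Xms512m -Xmx512m -Dorg.slf4j.simpleLogger.defaultLogLevel=debug' \
  snowplow-docker-registry.bintray.io/snowplow/s3-loader:0.6.0 \
  --config /snowplow/config/config.hocon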


#3

[ec2-user@ip-10-0-10-209 ~]$ docker run -v ${PWD}/config:/snowplow/config -e 'SP_JAVA_OPTS=-Xms512m -Xmx512m' XXXXXXXXX/s3-loader:0.0.0.1 --config /snowplow/config/config.hocon

log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[main] INFO com.snowplowanalytics.s3.loader.sinks.KinesisSink - Stream XXXX exists and is active
[main] INFO com.snowplowanalytics.s3.loader.SinkApp$ - Initializing sink with KinesisConnectorConfiguration: {regionName=us-east-1, s3Endpoint=https://s3.amazonaws.com, kinesisInputStream=XXX, maxRecords=100, connectorDestination=s3, bufferMillisecondsLimit=9000, bufferRecordCountLimit=200, s3Bucket=XXX, kinesisEndpoint=https://kinesis.us-east-1.amazonaws.com, appName=XXXr, bufferByteSizeLimit=12000, retryLimit=1, initialPositionInStream=TRIM_HORIZON}
[main] INFO com.snowplowanalytics.s3.loader.KinesisSourceExecutor - KinesisSourceExecutor worker created


#4

Thanks for your reply.

It seems there aren't any more logs.

Is there any other way to debug?

Thank you very much


#6

Hello @BenFradet,

When I gave the container instance full access to the resources, it ran correctly.
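
For anyone who hits the same silent exit: the loader touches Kinesis (input and bad streams), DynamoDB (the KCL state table named after appName), S3 (the output bucket), and CloudWatch (KCL metrics). Below is an illustrative policy sketch of a narrower alternative to full access — not an exhaustive or tested list; the account ID and resource names are placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/XXX"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/XXX"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::XXXX/*"
    },
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*"
    }
  ]
}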

Thanks for your help.