Error in setting up Collector Part


The following is my config file:

    # Copyright © 2013-2017 Snowplow Analytics Ltd. All rights reserved.

    # This program is licensed to you under the Apache License Version 2.0, and
    # you may not use this file except in compliance with the Apache License
    # Version 2.0.  You may obtain a copy of the Apache License Version 2.0 at
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the Apache License Version 2.0 is distributed on an "AS
    # IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
    # implied.  See the Apache License Version 2.0 for the specific language
    # governing permissions and limitations there under.

    # This file (application.conf.example) contains a template with
    # configuration options for the Scala Stream Collector.
    # To use, copy this to 'application.conf' and modify the configuration options.

    # 'collector' contains configuration options for the main Scala collector.
    collector {
      # The collector runs as a web service specified on the following
      # interface and port.
      interface = "123.123.123"
      port = 8080

      # Configure the P3P policy header.
      p3p {
            policyRef = "/w3c/p3p.xml"
      }

      # The collector returns a cookie to clients for user identification
      # with the following domain and expiration.
      cookie {
            enabled = true
            expiration = "365 days" # e.g. "365 days"
            # Network cookie name
            name = UnilogAnalytics
            # The domain is optional and will make the cookie accessible to other
            # applications on the domain. Comment out this line to tie cookies to
            # the collector's full domain
            domain = "collector.cookie.domain"
      }

      # When enabled and the cookie specified above is missing, performs a redirect to itself to check
      # if third-party cookies are blocked using the specified name. If they are indeed blocked,
      # fallbackNetworkId is used instead of generating a new random one.
      cookieBounce {
            enabled = false
            # The name of the request parameter which will be used on redirects checking that third-party
            # cookies work.
            name = "n3pc"
            # Network user id to fallback to when third-party cookies are blocked.
            fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000"
      }

      # When enabled, the redirect url passed via the `u` query parameter is scanned for a placeholder
      # token. All instances of that token are replaced with the network ID. If the placeholder isn't
      # specified, the default value is `${SP_NUID}`.
      redirectMacro {
            enabled = false
            # Optional custom placeholder token (defaults to the literal `${SP_NUID}`)
            placeholder = "${SP_NUID}"
      }

       streams {
            # Events which have successfully been collected will be stored in the good stream/topic
            good = GoodStream

            # Events that are too big (w.r.t Kinesis 1MB limit) will be stored in the bad stream/topic
            bad = BadStream

            # Whether to use the incoming event's ip as the partition key for the good stream/topic
            # Note: Nsq does not make use of partition key.
            useIpAddressAsPartitionKey = false

            # Enable the chosen sink by uncommenting the appropriate configuration
            sink {
              # Choose between kinesis, kafka, nsq, or stdout
              # To use stdout comment everything
              enabled = kinesis

              # Region where the streams are located
              region = ca-central-1

              # Thread pool size for Kinesis API requests
              threadPoolSize = 10

              # The following are used to authenticate for the Amazon Kinesis sink.
              # If both are set to 'default', the default provider chain is used
              # (see
              # If both are set to 'iam', use AWS IAM Roles to provision credentials.
              # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
              aws {
                    accessKey = ABC-XYZ
                    secretKey = ABC-XYZ
              }

              # Minimum and maximum backoff periods
              backoffPolicy {
                    minBackoff = 3000
                    maxBackoff = 600000
              }

              # Or Kafka
              #brokers = "{{kafkaBrokers}}"
              ## Number of retries to perform before giving up on sending a record
              #retries = 0

              # Or NSQ
              ## Host name for nsqd
              #host = "{{nsqd}}"
              ## TCP port for nsqd, 4150 by default
              #port = 4150
            }

            # Incoming events are stored in a buffer before being sent to Kinesis/Kafka.
            # Note: Buffering is not supported by NSQ.
            # The buffer is emptied whenever:
            # - the number of stored records reaches record-limit or
            # - the combined size of the stored records reaches byte-limit or
            # - the time in milliseconds since the buffer was last emptied reaches time-limit
            buffer {
              byteLimit = 4500000
              recordLimit = 500 # Not supported by Kafka; will be ignored
              timeLimit = 60000
            }
       }
    }

    # Akka has a variety of possible configuration options defined at
    akka {
      loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.
      loggers = ["akka.event.slf4j.Slf4jLogger"]

      # akka-http is the server the Stream collector uses and has configurable options defined at
      http.server {
            # To obtain the hostname in the collector, the 'remote-address' header
            # should be set. By default, this is disabled, and enabling it
            # adds the 'Remote-Address' header to every request automatically.
            remote-address-header = on

            raw-request-uri-header = on

            # Define the maximum request length (the default is 2048)
            parsing {
              max-uri-length = 32768
              uri-parsing-mode = relaxed
            }
      }
    }

And this is the error I am facing:

    java -Dcom.amazonaws.sdk.disableCbor -jar snowplow-stream-collector-0.12.0.jar --config collector.conf
    [] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
    [main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - Creating thread pool of size 10
    Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Kinesis stream GoodStream doesn't exist
        at scala.Predef$.require(Predef.scala:224)
        at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink.<init>(KinesisSink.scala:114)
        at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:48)
        at com.snowplowanalytics.snowplow.collectors.scalastream.Collector$.run(Collector.scala:80)
        at com.snowplowanalytics.snowplow.collectors.scalastream.Collector$.main(Collector.scala:63)
        at com.snowplowanalytics.snowplow.collectors.scalastream.Collector.main(Collector.scala)

I need assistance regarding this issue.


Do the GoodStream and BadStream Kinesis streams already exist in your account? Are they accessible using the access key and secret key you’ve specified in the configuration?
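If it helps, here is a minimal sketch for checking this from Python. It assumes boto3 is installed and is using the same credentials as the collector. `describe_stream` is the boto3 Kinesis client call; `stream_status` and `check_streams` are hypothetical helper names made up for this example. The region should match the `region` in your config (ca-central-1), since Kinesis streams are regional.

```python
def stream_status(client, name):
    """Return the stream's status string, or None if the stream is not found.

    `client` is expected to behave like a boto3 Kinesis client.
    """
    try:
        response = client.describe_stream(StreamName=name)
    except client.exceptions.ResourceNotFoundException:
        return None
    return response["StreamDescription"]["StreamStatus"]


def check_streams(client, names):
    """Map each stream name to its status (None means 'not found')."""
    return {name: stream_status(client, name) for name in names}


# Example usage (needs boto3 and network access, so it is left commented out):
#   import boto3
#   kinesis = boto3.client("kinesis", region_name="ca-central-1")
#   print(check_streams(kinesis, ["GoodStream", "BadStream"]))
```

A `None` for either stream would suggest it does not exist in that region or account. A permissions error raised by the call, rather than `None`, would instead point at the access key and secret key.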


Yes, I did set up the GoodStream and BadStream streams later on. Thank you so much for helping me out with that.
But now I have run into a new situation, and the following is its screenshot.

Am I also supposed to set up Kinesis Firehose and Data Analytics?
Can you please help me figure that out?


The screenshot doesn’t show any errors. If the collector is binding to an interface, that means you should now be able to send events to the collector, and they will sink into Kinesis.
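To confirm that, you could send a test request. This is a rough sketch using only the Python standard library. The host name is a placeholder, port 8080 comes from the config above, and I am assuming the collector exposes its usual `/health` and `/i` (pixel) endpoints, so adjust the paths if your version differs. `collector_url` and `http_status` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def collector_url(host, port, path, params=None):
    """Build a collector URL such as http://host:8080/i?e=pv."""
    url = f"http://{host}:{port}{path}"
    if params:
        url += "?" + urlencode(params)
    return url


def http_status(url, timeout=5):
    """Send a GET request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Example usage (placeholder host; commented out because it needs a running collector):
#   print(http_status(collector_url("collector.example.com", 8080, "/health")))
#   print(http_status(collector_url("collector.example.com", 8080, "/i", {"e": "pv", "p": "web"})))
```

If both requests come back with a 200, the collector is reachable, and the page view event sent to `/i` should then appear on GoodStream.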

Snowplow doesn’t use Kinesis Firehose or Kinesis Analytics as part of the AWS pipeline, so the next step would be to set up the stream enrichment process.

