Lack of enricher config examples

I am trying to use the Kafka streaming version of the enricher.
There are no example files for kafka streaming here: https://github.com/snowplow/enrich/tree/master/config

I want to do two basic things: change the log level and enable statsd metrics.

My config is pasted below. As you can see, it contains both a top-level “metrics” block and a “metrics” block nested inside the “monitoring” block. The wiki describes both forms, so I tried each one on its own and then both together, but none of these had any effect.
I am using the container image snowplow/stream-enrich-kafka:3.1.5.

Any help would be greatly appreciated.

enrich {
  streams {

    in {
      raw = snowplow-goodRawEvents
    }

    out {
      enriched = snowplow-goodEnrichedEvents
      bad = snowplow-badEnrichedEvents
      pii = snowplow-piiEnrichedEvents

      partitionKey = "event_id"
    }

    sourceSink {
      enabled = "kafka" 
      initialTimestamp = "2018-01-01T00:00:00Z"

      backoffPolicy {
        minBackoff = 5000
        maxBackoff = 600000
      }

      brokers = "host.docker.internal:9092"

      retries = 1

      producerConf {
        acks = all
        "key.serializer"     = "org.apache.kafka.common.serialization.StringSerializer"
        "value.serializer"   = "org.apache.kafka.common.serialization.StringSerializer"
      }

      consumerConf {
        "enable.auto.commit" = true
        "auto.commit.interval.ms" = 1000
        "auto.offset.reset"  = earliest
        "session.timeout.ms" = 30000
        "key.deserializer"   = "org.apache.kafka.common.serialization.StringDeserializer"
        "value.deserializer" = "org.apache.kafka.common.serialization.ByteArrayDeserializer"
      }
    }

    buffer {
      byteLimit = 4500000 
      recordLimit = 500
      timeLimit = 60000
    }

    appName = "enricher-app"
  }

  monitoring {
    snowplow {
      collectorUri = "host.docker.internal"
      collectorPort = 80
      appId = "enricher-app" 
      method = GET
    }

    metrics {
      statsd {
        enabled = true
        hostname = "host.docker.internal"
        port = 8125
        period = "10 seconds"
      }
    }
  }

  metrics {
    statsd {
      enabled = true
      hostname = "host.docker.internal"
      port = 8125
      period = "10 seconds"
    }
  }
}


Hi @fref,

Thanks for pointing out that we removed the Kafka example config files from the GitHub repo. This was a mistake on our part, and we should put them back.

Meanwhile, if you follow this direct link it will take you to an example config file from the GitHub history. The configuration format hasn’t changed since then, so that slightly old version will still help you configure version 3.1.5 of stream-enrich-kafka.

The old example config is a little hard to follow, but the Kafka-specific options are in the section starting at line 126.

To change the log level, you can pass JVM system properties on the command line, such as -Dorg.slf4j.simpleLogger.defaultLogLevel=info (or warn or error). This SimpleLogger documentation describes the other standard configuration options. Here is an example using the docker image:

docker run \
  -v ${path_to_config_dir}:/snowplow/config \
  snowplow/stream-enrich-kafka:3.1.5 \
  -Dorg.slf4j.simpleLogger.defaultLogLevel=info \
  --config /snowplow/config/config.hocon \
  --resolver file:/snowplow/config/iglu_resolver.json \
  --enrichments file:/snowplow/config/enrichments/
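
As a rough sketch of combining several SimpleLogger options, the variant below keeps the flags in one variable and wraps the same docker command in a function. The names LOG_FLAGS and run_enrich are made up for this illustration, and the choice of quietening the org.apache.kafka loggers is only an assumption about what you might find noisy. The three properties themselves (defaultLogLevel, showDateTime, and the per-logger log.<name> form) are standard SimpleLogger settings.

```shell
# Hypothetical helper: collects SimpleLogger settings in one place.
# - defaultLogLevel: level for every logger without a more specific setting
# - showDateTime: prefix each log line with a timestamp
# - log.org.apache.kafka: per-logger override (assumed example) for the Kafka client
LOG_FLAGS="-Dorg.slf4j.simpleLogger.defaultLogLevel=warn \
-Dorg.slf4j.simpleLogger.showDateTime=true \
-Dorg.slf4j.simpleLogger.log.org.apache.kafka=error"

# Hypothetical wrapper around the same docker command as above.
# $LOG_FLAGS is left unquoted on purpose so it splits into separate arguments.
run_enrich() {
  docker run \
    -v "${path_to_config_dir}:/snowplow/config" \
    snowplow/stream-enrich-kafka:3.1.5 \
    $LOG_FLAGS \
    --config /snowplow/config/config.hocon \
    --resolver file:/snowplow/config/iglu_resolver.json \
    --enrichments file:/snowplow/config/enrichments/
}
```

With path_to_config_dir set, calling run_enrich starts the container with those settings. Editing LOG_FLAGS is then the only change needed to try a different level.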

Unfortunately we don’t have statsd metrics implemented in stream-enrich-kafka, which is why neither “metrics” block in your config has any effect. But there is some good news! In the next couple of months we will be working on a completely new implementation of enrich for Kafka. It will use the same core as our recent enrich apps for Kinesis and Pub/Sub, so it will include all the nice new features like statsd metrics. We will announce it here on Discourse when it’s ready.
