Streaming to Kinesis only writes one record?


#1

Hi,

I’m running the Scala Stream Collector and Stream Enrich in local mode, with the collector’s stdout piped into enrich’s stdin.
If my enrich-sink is stdouterr, I successfully get the data printed out for every page view event.

When my sink is kinesis, I only get the first event written to kinesis. The others seem to be discarded…
This is the command I’m running:

$ java -jar snowplow-stream-collector-0.7.0 --config ./docker/config/collector.config.hocon | java -jar snowplow-stream-enrich-0.8.1 --config ./docker/config/enrich.config.hocon --resolver file:./docker/config/iglu_resolver.json

and that’s the log I’m getting:

[main] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Stream snowplow_enriched exists and is active
[main] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Stream snowplow_enriched_bad exists and is active
[main] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Writing 1 records to Kinesis stream snowplow_enriched
[main] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Successfully wrote 1 out of 1 records

Even when I trigger more events, I still don’t get any additional records written to Kinesis.

This is my out config:

    out: {
      enriched: "snowplow_enriched"
      bad: "snowplow_enriched_bad"

      # Minimum and maximum backoff periods
      # - Units: Milliseconds
      backoffPolicy: {
        minBackoff: 3000 # 3 seconds
        maxBackoff: 3000 # 3 seconds
      }
    }

    # "app-name" is used for a DynamoDB table to maintain stream state.
    # You can set it automatically using: "SnowplowKinesisEnrich-${enrich.streams.in.raw}"
    app-name: "SnowplowKinesisEnrich-${enrich.streams.in.raw}"

    # LATEST: most recent data.
    # TRIM_HORIZON: oldest available data.
    # Note: This only affects the first run of this application
    # on a stream.
    initial-position = "TRIM_HORIZON"

    region: "ap-southeast-2"

Thanks for the help!


#2

Hi @simplesteph, could you post your full enrich.config.hocon with the sensitive credentials removed? Thanks.


#3

Hi @simplesteph,

When playing with the Kinesis stack on a single machine, it is usually more convenient to run the components as separate processes in parallel rather than as a single command that pipes one stage's output into the next. In fact, your approach may make it impossible to debug, as all of the output/log data is sent to the following process.

An additional disadvantage is that you do not know whether one of the middle stages has failed. You most likely only see the output of the last (storage) process.

Recently I have found that in local mode it is more convenient to use named pipes rather than streaming output directly to input. This is similar to Snowplow Mini, but set up on a local machine. The biggest advantage is the ability to handle both the good and the bad stream at each stage of the process.
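
As a rough illustration of the named-pipe idea, here is a minimal sketch. The `producer` and `consumer` functions are hypothetical stand-ins for the collector and enrich commands, and the file names are made up for the example. In a real setup you would presumably replace them with the two `java -jar ...` commands from the first post, redirecting the collector's stdout into the pipe and reading enrich's stdin from it.

```shell
#!/bin/sh
# Sketch: connect two stages with a named pipe instead of an anonymous "|" pipe.
# producer/consumer are hypothetical stand-ins for the collector and enrich.
set -eu

workdir=$(mktemp -d)
pipe="$workdir/raw_events.pipe"
mkfifo "$pipe"

# Stand-in for the collector: events go to stdout, log lines go to stderr.
producer() {
  for i in 1 2 3; do
    echo "event-$i"
    echo "producer: emitted event-$i" >&2
  done
}

# Stand-in for enrich: reads events from stdin, writes results to stdout.
consumer() {
  while read -r line; do
    echo "enriched:$line"
  done
}

# Each stage is started separately and keeps its own log file.
producer > "$pipe" 2> "$workdir/producer.log" &
producer_pid=$!
consumer < "$pipe" > "$workdir/enriched.out" 2> "$workdir/consumer.log"

# Because the stages are separate processes, each exit status can be checked.
if wait "$producer_pid"; then
  echo "producer finished cleanly"
else
  echo "producer failed"
fi

cat "$workdir/enriched.out"
```

Since every stage has its own stderr file, a failure in the middle of the chain shows up in that stage's log rather than being swallowed by the next process.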


#4

Hi,

Thanks for the help!
When I set them up in parallel it’s working! Not sure what the issue was before.

Thanks again,
Stephane