Is the stream shredder still experimental?

Hi,

I’m trying to run the shredder in our environment, and I have a question about the stream shredder.

Is the stream shredder still experimental? Or is it suitable for use in a production environment?

I’m assuming it is experimental, as the README (snowplow-rdb-loader/README.md at master · snowplow/snowplow-rdb-loader · GitHub) says it is experimental and there is no documentation for the stream shredder.

Thanks,
shimpeko

Hi @shimpeko,

Apologies for the long silence on this one.

We’ve branded the stream shredder ‘experimental’ which was probably not the best use of the word. What we meant by it is that it has some limitations compared with the batch shredder. You can certainly use it in a production environment if you accept those limitations.

The first of these is that we’ve tested the stream shredder only in single-node deployments. In a distributed architecture, we expect there might be some race conditions between the different KCL workers. Performance is therefore capped by the resources of the single machine you run it on. This makes it appropriate for low-volume pipelines, where the overhead of using Spark on EMR is not justified.

Secondly, there is no deduplication in the stream shredder. If duplicates are not a concern, or you can deal with them after the data has been loaded into the data warehouse, then this point is irrelevant.

We are currently working on the documentation for the stream shredder, but in the meantime, here’s what you need to know:

  • You can get the jar file from the GitHub release page or an image from Docker Hub under snowplow/snowplow-rdb-stream-shredder:2.2.0.

  • It takes the same config.hocon and iglu_resolver.json config files as the batch shredder. The only difference in the HOCON file is that the source is no longer an S3 bucket but a Kinesis stream. You can find the reference config file here.

  • You don’t need Dataflow Runner or EMR to run it. It can be as simple as:

    $ docker run snowplow/snowplow-rdb-stream-shredder:2.2.0 \
    --iglu-config 'base64-resolver' \
    --config 'base64-config'
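
The 'base64-resolver' and 'base64-config' values above are placeholders. As a rough sketch, assuming both options expect the contents of the two config files encoded as single-line base64, and assuming the files are named config.hocon and iglu_resolver.json in the current directory, the encoding could be wrapped like this (encode_file and run_stream_shredder are hypothetical helper names, not part of the shredder):

```shell
#!/usr/bin/env bash
# Sketch only: file names and helper names are assumptions for illustration.

# Encode a file as single-line base64.
# tr strips the line wraps that some base64 implementations insert.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Run the stream shredder image with both configs passed as base64 strings.
# Assumes config.hocon and iglu_resolver.json exist in the current directory.
run_stream_shredder() {
  docker run snowplow/snowplow-rdb-stream-shredder:2.2.0 \
    --iglu-config "$(encode_file iglu_resolver.json)" \
    --config "$(encode_file config.hocon)"
}
```

Calling run_stream_shredder from the directory holding both files would then start the container. Stripping newlines keeps each value a single shell argument regardless of which base64 tool is installed.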
    

Do let us know any feedback if you give it a try.

Hi @dilyan

I much appreciate your response. I understand the limitations and will discuss with my team whether we’d like to adopt it.

Thanks,
Shimpeko