How to run RDB shredder?

pramod.niralakeri · December 16, 2021, 1:19am

I have a basic question.

How do I run the RDB Shredder?

I have this jar file from s3://snowplow-hosted-assets/4-storage/rdb-shredder/snowplow-rdb-shredder-2.2.0-rc1.jar

and a config file.

So running the RDB Shredder is just a matter of executing the jar file, right?

pramod.niralakeri · December 29, 2021, 5:16pm

Bump

enes_aldemir · December 31, 2021, 8:54am

Hey @pramod.niralakeri ,

We currently have two types of shredder.

The first is the RDB Batch Shredder. It is a Spark job and needs to run on AWS EMR. You can use Dataflow Runner to submit the job to EMR. Follow this guide for more information about running the RDB Shredder on EMR.
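To make the "it is a Spark job, not a plain jar" point concrete, here is a minimal Python sketch of how the arguments for such a job could be assembled. It is only an illustration under assumptions: the main class name and the `--config` / `--iglu-config` options taking base64-encoded file contents are my recollection of the shredder's interface and should be checked against the guide for your version, and `build_shredder_args` and `b64_file` are hypothetical helper names. Only `spark-submit` with `--class`, `--master` and `--deploy-mode` are standard Spark options.

```python
import base64
from pathlib import Path

# ASSUMPTION: main class of the batch shredder; verify against the docs
# for the version of the jar you are using.
DEFAULT_MAIN_CLASS = "com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main"


def b64_file(path):
    """Return the contents of a file as a standard base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_shredder_args(jar_uri, config_path, resolver_path,
                        main_class=DEFAULT_MAIN_CLASS):
    """Build a spark-submit argument list for the batch shredder.

    ASSUMPTION: the job accepts --iglu-config and --config as
    base64-encoded file contents. This only builds the list; it does
    not submit anything to a cluster.
    """
    return [
        "spark-submit",
        "--class", main_class,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        jar_uri,
        "--iglu-config", b64_file(resolver_path),
        "--config", b64_file(config_path),
    ]
```

The resulting list would be what an EMR step (for example one defined in a Dataflow Runner playbook) runs on the cluster, rather than something you execute with `java -jar` on your own machine.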

The second is the RDB Stream Shredder. It reads directly from a Kinesis stream and writes its output to S3. It is a plain Java application, so you don’t need a platform like EMR to run it. A reference config file for the Stream Shredder can be found here. However, keep in mind that the Stream Shredder is still in an experimental phase. We don’t recommend running it on high-volume pipelines.

Let us know if you have any further questions.

pramod.niralakeri · December 31, 2021, 9:33am

Thank you, @enes_aldemir, that clears up my doubt. Just out of curiosity: when will a production-ready Stream Shredder be available?

Also, to run the Stream Shredder in a staging environment, what output file format and compression should I use so that RDB Loader can load the output into Redshift?
