Raw events in gzip

Hey,

I have over 18 months of raw events from the real-time pipeline, compressed with gzip. How can I recompress them to LZO, or what else can I do to process them with the batch pipeline?

I tried decompressing the gz files and recompressing them with both GNU lzop and S3DistCp on EMR. Neither worked. The uncompressed files do not work either. Any ideas?

Unfortunately, I see that the difference is not only in the compression method but also in the serializer used.

Hi @grzegorzewald - see the thread here:

I think you’re going to have to write a translation job to perform the change. The code in the Snowplow S3 Loader should help:

https://github.com/snowplow/snowplow-s3-loader/tree/develop/src/main/scala/com.snowplowanalytics.s3
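As a rough starting point, the reading half of such a translation job might look something like the sketch below. It is not taken from the S3 Loader. It assumes the gzip files hold one record per line, which you should verify against the serializer code linked above: if the records are raw binary Thrift, a newline byte inside a payload would break this split. The names `read_gzip_records`, `translate` and `write_record` are hypothetical, and the LZO side (writing records in whatever framing the batch pipeline expects) is deliberately left as a callback you would have to implement yourself.

```python
import gzip
from typing import BinaryIO, Callable, Iterator


def read_gzip_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield records from a gzip-compressed stream, one per line.

    Assumption: records are newline-delimited inside the gzip file.
    Check this against the serializer that wrote your files.
    """
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        for line in gz:
            record = line.rstrip(b"\n")
            if record:  # skip empty lines
                yield record


def translate(stream: BinaryIO, write_record: Callable[[bytes], None]) -> int:
    """Pass every record to write_record and return how many were handled.

    write_record is a hypothetical callback standing in for the code
    that re-serializes a record into the format the batch pipeline reads.
    """
    count = 0
    for record in read_gzip_records(stream):
        write_record(record)
        count += 1
    return count
```

You would call `translate` once per input file, for example with a file opened in binary mode, and compare the returned count against what you expect before trusting the output.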