Raw events in gzip


#1

Hey,

I have over 18 months of raw events from the realtime pipeline, compressed with gzip. How can I recompress them to LZO, or what else can I do to process them with the batch pipeline?

I tried decompressing the gz files and recompressing them, first with GNU lzop and then with S3DistCp on EMR. Neither worked. The uncompressed files do not work either. Any ideas?

Unfortunately, I see the difference is not only in the compression method but also in the serializer used.


#2

Hi @grzegorzewald - see the thread here:

I think you’re going to have to write a translation job to perform the change. The code inside the Snowplow S3 Loader should help:
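
For what it's worth, the read side of such a translation job might look roughly like the sketch below. This is not taken from the S3 Loader. The names `iter_gzip_records`, `translate` and `write_record` are hypothetical, and the newline record delimiter is an assumption: if your gzip files hold binary Thrift payloads, splitting on newlines is not safe and you would need whatever framing the loader actually used. The LZO/serializer side is deliberately left as a callback you would supply yourself.

```python
import gzip
from pathlib import Path
from typing import Callable, Iterable, Iterator


def iter_gzip_records(path: Path, delimiter: bytes = b"\n") -> Iterator[bytes]:
    """Yield records from one gzip file.

    ASSUMPTION: records are separated by `delimiter`. Check how your
    files were actually written before relying on this.
    """
    with gzip.open(path, "rb") as fh:
        buffer = b""
        # Read in 1 MiB chunks so large files are not loaded whole.
        while chunk := fh.read(1 << 20):
            buffer += chunk
            # Everything before the last delimiter is complete;
            # the tail stays in the buffer for the next chunk.
            *records, buffer = buffer.split(delimiter)
            for record in records:
                if record:
                    yield record
        if buffer:
            yield buffer


def translate(paths: Iterable[Path], write_record: Callable[[bytes], None]) -> int:
    """Pass every record to `write_record` (hypothetical: your own code that
    re-serializes into the format the batch pipeline expects).
    Returns the number of records handed over."""
    count = 0
    for path in paths:
        for record in iter_gzip_records(path):
            write_record(record)
            count += 1
    return count
```

Counting the records handed over gives you something to compare against the source files before you delete or overwrite anything.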