Stream Enrich 1.1.2 released

We are glad to release Stream Enrich 1.1.2, shortly after Stream Enrich 1.1.1. This is the first release from its new home, snowplow/stream-enrich. Common Enrich, the underlying library used by Beam Enrich and Stream Enrich, has also moved to its new home, snowplow/common-enrich.

Version 1.1.1 adds Sentry integration: if an unhandled exception is thrown by Common Enrich (although this should never happen), it is caught and sent to Sentry, if configured.

Version 1.1.2 fixes a bug introduced in 1.1.0 where the user agent of the HTTP request was used instead of the `ua` query string parameter.
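Concretely, the intended precedence (which 1.1.0 inverted) is: if the tracker sent a `ua` query string parameter, use it; otherwise fall back to the User-Agent HTTP header. A minimal Python sketch of that logic, with illustrative names only (this is not Common Enrich's actual API):

```python
def resolve_user_agent(query_params, headers):
    """Prefer the tracker-supplied `ua` parameter over the HTTP header.

    Illustrative sketch only: Stream Enrich 1.1.0 accidentally used the
    header even when `ua` was present; 1.1.2 restores this precedence.
    """
    ua_param = query_params.get("ua")
    if ua_param:
        # The tracker explicitly overrode the user agent.
        return ua_param
    # Otherwise fall back to the user agent of the HTTP request itself.
    return headers.get("User-Agent")
```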


Hey @BenB!

Thanks for the quick bug fix release.

When I upgraded to R119, the stream-enrich version in the migration guide pointed to 1.1.0, which included the user agent bug.

I’ve now updated to stream-enrich 1.1.2, but I would like to reprocess the few days of data that ended up with a wrong user agent.

I was trying to use the latest version of spark-enrich (which I know is no longer maintained). I cannot run it on EMR (with a default Spark installation) because it is compiled for Scala 2.12 and gives the following error on startup:

java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V

I wanted to rebuild a version using Scala 2.11; however, the common-enrich artifact is only published for Scala 2.12.

I managed to run spark-enrich locally, but I still need to reconfigure a few enrichments to get the same result as my stream-enrich pipeline.

Is there an easier way than running everything locally to reprocess these events?

Thanks in advance for your answer.

Hi @AcidFlow,

I see 2 solutions:

  1. Rebuild common-enrich with Scala 2.11, publish it to a custom repo (e.g. a local one), and use it to compile Spark Enrich.

  2. Provided that you archive raw data on S3, write a Flink or Spark job to read the raw data from S3 and insert it back into the raw Kinesis stream.

Please be aware that both of these methods would create duplicates, as enriched events have already been emitted (but with the wrong user agent).

Is it something that you could do?
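For anyone weighing option 2, here is a rough Python sketch of the replay job, assuming boto3 and that each archived S3 object holds one raw payload per line. Both of those are assumptions: the actual framing depends on how your sink archives raw data (e.g. LZO-compressed Thrift records), so treat the decoding step as a placeholder. Bucket, prefix, and stream names are illustrative.

```python
KINESIS_BATCH_LIMIT = 500  # PutRecords accepts at most 500 records per call


def batches(records, size=KINESIS_BATCH_LIMIT):
    """Split a list of records into PutRecords-sized chunks."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def replay(bucket, prefix, stream_name, region="eu-west-1"):
    """Read archived raw payloads from S3 and put them back on the raw stream.

    Assumes one payload per line in each S3 object; adjust the decoding
    to match your sink's actual format (e.g. LZO-compressed Thrift).
    """
    import boto3  # imported here so the batching helper stays dependency-free

    s3 = boto3.client("s3", region_name=region)
    kinesis = boto3.client("kinesis", region_name=region)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            payloads = [line for line in body.split(b"\n") if line]
            for chunk in batches(payloads):
                kinesis.put_records(
                    StreamName=stream_name,
                    Records=[
                        {"Data": p, "PartitionKey": str(i)}
                        for i, p in enumerate(chunk)
                    ],
                )
```

As noted above, replaying will re-emit events that were already enriched once, so plan for deduplication downstream.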

Okay, actually I’m a bit lucky.

EMR 6.0.0, since its beta 2, supports Spark with Scala 2.12.

So I can run spark-enrich on this EMR release.

I guess I’ll still have to tweak one or two enrichments and maybe create some bootstrap scripts to copy the GeoIP database and referer-latest.

But without these enrichments I was able to reprocess my events :slight_smile:

Good to hear! :slight_smile:

I guess I’ll still have to tweak one or two enrichments and maybe create some bootstrap scripts to copy the GeoIP database and referer-latest.

But without these enrichments I was able to reprocess my events

As long as your data comes from the main trackers (which generate the event ID), and not e.g. a webhook (in which case the event ID is generated during the enrich process), you’d likely be able to work around this using SQL.

The event ID in that case would be the same for the same event, so you could write some logic to create a single row per event containing only the values that were correctly processed, then overwrite the two sets of half-correct events with it.
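To make that merge concrete, here is a small Python sketch of the logic (in practice you would express it as a SQL query; the field names here are illustrative): for each event ID, take the bug-affected fields from the reprocessed set and everything else from the original.

```python
def merge_events(original, reprocessed, fixed_fields=("useragent",)):
    """Combine two half-correct copies of the same events.

    `original` has correct enrichments but a wrong user agent;
    `reprocessed` has the correct user agent. Rows are dicts keyed
    by the tracker-generated event_id. Illustrative sketch only.
    """
    reprocessed_by_id = {e["event_id"]: e for e in reprocessed}
    merged = []
    for event in original:
        row = dict(event)  # keep the originally correct enrichments
        fixed = reprocessed_by_id.get(event["event_id"])
        if fixed:
            for field in fixed_fields:
                row[field] = fixed[field]  # take the corrected value
        merged.append(row)
    return merged
```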

Thanks @Colm!

I like the idea, and it would work in our case as these events are issued by a Java tracker.

I’ll keep you posted on the result !

:+1: It’s probably obvious, but since a destructive operation is involved it’s best to be explicit: I would definitely output the query to a separate table and double-check the results before any overwrite. :slight_smile:

Hello folks!

Good news everyone: the reprocessed data seems to be correct! :tada:

As @Colm suggested, I only applied useragent enrichments, and then updated existing records with the re-processed values in SQL.

Thanks for the assist and have a nice day :slight_smile:

Great to hear @AcidFlow! :tada: