Enricher high CPU utilisation issue

Hi @BenB,

We are running the below command for the enricher:

sudo docker run -d \
  -v /snowplow/config:/snowplow/config \
  snowplow-docker-registry.bintray.io/snowplow/stream-enrich-kinesis:0.21.0 \
  --config /snowplow/config/enrich.hocon \
  --resolver file:/snowplow/config/resolver.json \
  --enrichments file:/snowplow/config/enrichments/ \
  --force-cached-files-download

Regarding the stats, I will update you soon.

Hi @karan,

The latest enrich version is 1.3.1. It’s available directly on Docker Hub, and we recommend using this version. Please note that it comes with a new format for the bad rows emitted by enrich; you can read about it in this blog post.

The upgrade guides up to this version can be found here.

Thanks @BenB, I will definitely use 1.3.1 and let you know.

Hi @BenB,

I tried 1.3.1, and with it I am getting both the CPU utilisation issue and an SQL enrichment failure:

{"schemaKey":"iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0","identifier":"sql-query"},"message":{"error":"The placeholder map error. The map: Some(IntMap()), where count is: 1"}}]}

Hi @karan,

Sorry for the back and forth, but we recently realised that the SQL enrichment bug wasn’t actually fixed in 1.3.1. Please give the fresh 1.3.2 a try.

However, I’m a bit puzzled by the CPU utilisation bug, as in our experience it certainly went away in the 1.3.x branch.

Hi @anton,

Thanks for the SQL enrichment fix… 1.3.2 is working fine for SQL enrichments now.

But the CPU utilisation issue is still there. I also tried 1.3.2 with the SQL enrichment removed, but there is no improvement: CPU utilisation keeps going over 100%.
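(For context, we are reading that figure from docker stats on the instance; the container name below is a placeholder:)

sudo docker stats --no-stream <enrich-container>

Keep in mind that docker stats reports CPU relative to a single core, so values over 100% simply mean more than one core is busy.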

Hi @karan,

Great that SQL enrichment is now working!

Regarding the CPU going over 100%, it’s hard to guess what can be wrong. We managed to troubleshoot the same issue in one of our pipelines using profiling. To do that, you need to add the following to the JAVA options when running enrich (5555 being the port that you want to use):

-Dcom.sun.management.jmxremote.port=5555
-Dcom.sun.management.jmxremote.rmi.port=5555
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Djava.rmi.server.hostname=127.0.0.1

You can then inspect the JVM that runs enrich with a tool like VisualVM.
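With the Docker image, one way to pass these flags is through the JAVA_OPTS environment variable, which the sbt-native-packager launch script used by the image normally picks up (worth double-checking for your image), remembering to publish the JMX port. A sketch, reusing the command from earlier in the thread:

sudo docker run -d \
  -p 5555:5555 \
  -e JAVA_OPTS="-Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=127.0.0.1" \
  -v /snowplow/config:/snowplow/config \
  snowplow/stream-enrich-kinesis:1.3.2 \
  --config /snowplow/config/enrich.hocon \
  --resolver file:/snowplow/config/resolver.json \
  --enrichments file:/snowplow/config/enrichments/

You can then attach VisualVM to 127.0.0.1:5555 (tunnel the port over SSH first if enrich runs on a remote machine, since java.rmi.server.hostname is bound to localhost here).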

In our case we discovered that enrich was constantly doing garbage collection, and we found out that this was due to a memory leak (fixed by 48e4ce8be913).
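As a lighter-weight first check, and assuming the image ships a full JDK (jstat is not included in JRE-only images), you can watch GC activity directly inside the container, where the JVM is normally PID 1:

# sample GC utilisation every second; FGC/FGCT climbing fast means constant full GC
sudo docker exec <enrich-container> jstat -gcutil 1 1000

If those columns barely move, the CPU is going somewhere else and profiling is the way to find it.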

Using profiling you’ll be able to determine how the CPU is used and to find the culprit.

Hi @BenB,

Sorry for the late reply… I will definitely test this and share the outcome with you.

Hi @BenB,

I tested the fix and the SQL enrichment is working as expected. The only concern I have is that the enricher is lagging behind the collector, meaning the number of records pushed by the collector is higher than the number of records processed by the enricher. Due to this, a huge amount of lag gets introduced in the pipeline.

Any suggestions to overcome this issue?

I tested the fix and the SQL enrichment is working as expected.

Good to hear!

the enricher is lagging behind the collector

Have you checked in the Kinesis metrics for the collector payloads stream that you haven’t reached the read throughput quota?

Are there any errors in the enrich logs in CloudWatch?
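For the quota question, one quick check, assuming the AWS CLI is configured and with a placeholder stream name and time window:

# a non-zero Sum means consumers are being throttled on reads
aws cloudwatch get-metric-statistics \
  --namespace AWS/Kinesis \
  --metric-name ReadProvisionedThroughputExceeded \
  --dimensions Name=StreamName,Value=<collector-payloads-stream> \
  --statistics Sum \
  --period 300 \
  --start-time 2020-10-01T00:00:00Z \
  --end-time 2020-10-01T06:00:00Z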

@BenB

Thanks for the reply…

I will check this and get back to you.

sudo docker run -d \
  -v /snowplow/config:/snowplow/config \
  snowplow-docker-registry.bintray.io/snowplow/stream-enrich-kinesis: \
  --config /snowplow/config/enrich.hocon \
  --resolver file:/snowplow/config/resolver.json \
  --enrichments file:/snowplow/config/enrichments/ \
  --force-cached-files-download

Hi @karan ,

We’ve now released enrich-kinesis, which aims at replacing Stream Enrich.

You can find its installation guide here. The latest version is 3.1.3.
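For comparison with the Stream Enrich command above, a minimal sketch of running it with Docker, assuming the Docker Hub image snowplow/snowplow-enrich-kinesis and the 3.x flag names (see the installation guide for the authoritative invocation):

sudo docker run -d \
  -v /snowplow/config:/snowplow/config \
  snowplow/snowplow-enrich-kinesis:3.1.3 \
  --config /snowplow/config/config.hocon \
  --iglu-config /snowplow/config/resolver.json \
  --enrichments /snowplow/config/enrichments/

Note that the configuration format differs from Stream Enrich, so the old enrich.hocon cannot be reused as-is.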

Hi @BenB,

Thanks for the reply.

Actually, we have recently switched to the latest version of the enricher, 3.1.3, but this time we are using Kafka instead of Kinesis.

The latest enrich for Kafka is lagging way behind the collector by a huge margin. I have started a new topic for this.

If you get a chance, please look into it, as it has become a bottleneck for our migration to Kafka.
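For anyone hitting the same thing, the lag itself can be quantified with the standard Kafka tooling, assuming access to a broker and that <enrich-group> is the group.id configured for enrich:

# the LAG column shows how far enrich is behind the collector topic
kafka-consumer-groups.sh --bootstrap-server <broker:9092> \
  --describe --group <enrich-group>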