Enrich Pub Sub on App Engine Scaling Issue

Hey,

I created a custom Docker image for Enrich PubSub (although an official Docker image is provided). The Dockerfile is as follows:

FROM openjdk:12

COPY /enrichments /enrichments
COPY config.hocon config.hocon
COPY resolver.json resolver.json
COPY snowplow-enrich-pubsub-2.0.3.jar snowplow-enrich-pubsub-2.0.3.jar
COPY script.sh script.sh

RUN yum install jq -y

CMD sh script.sh

The contents of script.sh are:

jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json && mv tmp.json resolver.json
java -jar snowplow-enrich-pubsub-2.0.3.jar --enrichments enrichments --iglu-config resolver.json --config config.hocon

I was able to successfully deploy it on App Engine Flexible and sent a load of 12 million records to it. The enricher scaled to a maximum of 20 instances and still wasn’t able to clear the backlog, even after a few hours.

Instance Count: [image]

Backlog: [image]

Just wanted to know if I’m missing something.

Thank you!

Hi @siv,

Indeed, the numbers don’t look good. In our testing, on a JVM with only 1 GB of memory and 2 cores, we were able to process 3.3 million events per hour.
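
As a rough back-of-the-envelope check, here is a small sketch of how long a 12 million event backlog should take at that rate. It assumes throughput scales linearly with instance count and that each App Engine instance is comparable to our 1 GB / 2 core test JVM; neither assumption has been verified for this deployment, and the function name is just illustrative.

```shell
#!/bin/sh
# Rough estimate only. Assumes linear scaling across instances and
# per-instance throughput equal to the 3.3 million events/hour test figure.

BACKLOG_EVENTS=12000000          # load sent to the enricher
EVENTS_PER_HOUR=3300000          # per instance, from the test above
MAX_INSTANCES=20                 # maximum instance count reached

# estimate_minutes BACKLOG EVENTS_PER_HOUR_PER_INSTANCE INSTANCES
# Prints the time to drain the backlog in whole minutes, rounded up.
estimate_minutes() {
  total_per_hour=$(( $2 * $3 ))
  echo $(( ($1 * 60 + total_per_hour - 1) / total_per_hour ))
}

echo "1 instance:   $(estimate_minutes "$BACKLOG_EVENTS" "$EVENTS_PER_HOUR" 1) min"
echo "$MAX_INSTANCES instances: $(estimate_minutes "$BACKLOG_EVENTS" "$EVENTS_PER_HOUR" "$MAX_INSTANCES") min"
```

Under those assumptions, this gives roughly 219 minutes on a single instance and about 11 minutes on 20 instances, so a backlog that is still there after a few hours is far slower than expected.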

I don’t think that the issue is related to having your own Docker image.

  1. Your screenshot shows a constant decrease in the number of unacked messages. Does it stop decreasing at some point, or is it just too slow?
  2. Are there any errors in the logs?
  3. What is your scaling strategy?
  4. How big are your events?

Hi @BenB,

Thanks for the reply!

There were no errors, but I realized I wasn’t using the latest OpenJDK, so I switched to JDK 18. That seems to have fixed the issue: the enricher is now able to handle the load we want without delay.

Thank you!
