I have the bq stream loader set up on App Engine Flexible. The current scaling metric is CPU utilization with the target set to 10% (the bq loader doesn't scale enough with other values).
I performed a load test on the pipeline at 8k requests/second for 90 minutes. The bq loader scaled to 163 instances (max set to 200), but a huge backlog built up and kept increasing until the load test ended. Once the test was over, the backlog started decreasing.
The Dockerfile for the bq stream loader is:
```dockerfile
FROM openjdk:18-alpine
COPY snowplow-bigquery-streamloader-1.1.0.jar snowplow-bigquery-streamloader-1.1.0.jar
COPY config.hocon config.hocon
COPY resolver.json resolver.json
COPY script.sh script.sh
RUN apk add jq
CMD sh script.sh
```
The script.sh contents are:
```shell
jq '.data.repositories.connection.http.uri=env.SCHEMA_BUCKET' resolver.json >> tmp.json && mv tmp.json resolver.json
java -jar snowplow-bigquery-streamloader-1.1.0.jar --config $(cat config.hocon | base64 -w 0) --resolver $(cat resolver.json | base64 -w 0)
```
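For reference, the startup script's jq edit and base64 round-trip can be checked in isolation before blaming the loader itself. This is a minimal sketch, assuming a toy `resolver.json` skeleton and an example `SCHEMA_BUCKET` value (neither is from my real config):

```shell
# Example SCHEMA_BUCKET value and resolver skeleton -- placeholders, not real config
export SCHEMA_BUCKET="http://example-schema-bucket"
printf '{"data":{"repositories":{"connection":{"http":{"uri":"placeholder"}}}}}' > resolver.json

# Same jq edit as in script.sh: inject SCHEMA_BUCKET from the environment
jq '.data.repositories.connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json && mv tmp.json resolver.json

# The loader receives the file base64-encoded; confirm the round-trip is lossless
encoded=$(base64 -w 0 < resolver.json)
decoded_uri=$(echo "$encoded" | base64 -d | jq -r '.data.repositories.connection.http.uri')
echo "$decoded_uri"
```

If the last line prints the bucket URI, the config plumbing is fine and the backlog is a scaling/throughput question rather than a startup misconfiguration.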
AppEngine service config being:
```yaml
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
env_variables: ...
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 2
  max_num_instances: 200
  cpu_utilization:
    target_utilization: 0.1
network: ...
liveness_check:
  initial_delay_sec: 300
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
service_account: ...
```
There are also errors in the bq stream loader's logs:
Is there a way to have the bq loader handle the load more efficiently, so the backlog doesn't keep growing while under load? Could you help with this?