Illegal base64 character when running bigquery mutator with docker

When I try to run a Docker image of bigquery-mutator, I get an "Illegal base64 character 5f" error.

When running on a GCP instance, I execute the mutator like this:
./snowplow-bigquery-mutator-0.1.0/bin/snowplow-bigquery-mutator create --config $(cat bigquery_config.json | base64 -w 0) --resolver $(cat iglu_resolver.json | base64 -w 0)

Which works great.

But running it as a docker image fails:

user@MacBook-Pro-USER mutator % docker run 392cef3c1e9e create --config bigquery_config.txt --resolver iglu_resolver.txt
Illegal base64 character 5f

Usage: snowplow-bigquery-mutator create --resolver <string> --config <string>

Create empty table and exit

Options and flags:
    --help
        Display this help text.
    --resolver <string>
        Base64-encoded Iglu Resolver configuration
    --config <string>
        Base64-encoded BigQuery configuration

bigquery_config.txt file looks like this:

ewogICAgInNjaGVtYSI6ICJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy5zdG9yYWdlL2JpZ3F1ZXJ5X2NvbmZpZy9qc29uc2NoZW1hLzEtMC0wIiwKICAgICJkYXRhIjogewogICAgICAgICJuYW1lIjogIkFscGhhIEJpZ1F1ZXJ5IHRlc3QiLAogICAgICAg
ICJpZCI6ICIzMWIxNTU5ZC1kMzE5LTQwMjMtYWFhZS05NzY5ODIzOGQ4MDgiLAoKICAgICAgICAicHJvamVjdElkIjogImNvbS1hY21lIiwKICAgICAgICAiZGF0YXNldElkIjogInNub3dwbG93IiwKICAgICAgICAidGFibGVJZCI6ICJldmVudHMiLAoKICAgICAgICAiaW5wdXQiOiA
iZW5yaWNoZWQtZ29vZC1zdWIiLAogICAgICAgICJ0eXBlc1RvcGljIjogImJxLXRlc3QtdHlwZXMiLAogICAgICAgICJ0eXBlc1N1YnNjcmlwdGlvbiI6ICJicS10ZXN0LXR5cGVzLXN1YiIsCiAgICAgICAgImJhZFJvd3MiOiAiYnEtdGVzdC1iYWQtcm93cyIsCiAgICAgICAgImZhaW
xlZEluc2VydHMiOiAiYnEtdGVzdC1iYWQtaW5zZXJ0cyIsCgogICAgICAgICJsb2FkIjogewogICAgICAgICAgICAibW9kZSI6ICJTVFJFQU1JTkdfSU5TRVJUUyIsCiAgICAgICAgICAgICJyZXRyeSI6IGZhbHNlCiAgICAgICAgfSwKCiAgICAgICAgInB1cnBvc2UiOiAiRU5SSUNIR
URfRVZFTlRTIgogICAgfQp9

with iglu_resolver.txt looking like this:

ewogICAgInNjaGVtYSI6ICJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5pZ2x1L3Jlc29sdmVyLWNvbmZpZy9qc29uc2NoZW1hLzEtMC0wIiwKICAgICJkYXRhIjogewogICAgICAgICJjYWNoZVNpemUiOiA1MDAsCiAgICAgICAgInJlcG9zaXRvcmllcyI6IFsKICAgICAg
ICAgICAgewogICAgICAgICAgICAgICAgIm5hbWUiOiAiSWdsdSBDZW50cmFsIiwKICAgICAgICAgICAgICAgICJwcmlvcml0eSI6IDAsCiAgICAgICAgICAgICAgICAidmVuZG9yUHJlZml4ZXMiOiBbICJjb20uc25vd3Bsb3dhbmFseXRpY3MiIF0sCiAgICAgICAgICAgICAgICAiY29
ubmVjdGlvbiI6IHsKICAgICAgICAgICAgICAgICAgICAiaHR0cCI6IHsKICAgICAgICAgICAgICAgICAgICAgICAgInVyaSI6ICJodHRwOi8vaWdsdWNlbnRyYWwuY29tIgogICAgICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgfSwKICAgICAgIC
AgICAgewogICAgICAgICAgICAgICAgIm5hbWUiOiAiSWdsdSBDZW50cmFsIC0gR0NQIE1pcnJvciIsCiAgICAgICAgICAgICAgICAicHJpb3JpdHkiOiAxLAogICAgICAgICAgICAgICAgInZlbmRvclByZWZpeGVzIjogWyAiY29tLnNub3dwbG93YW5hbHl0aWNzIiBdLAogICAgICAgI
CAgICAgICAgImNvbm5lY3Rpb24iOiB7CiAgICAgICAgICAgICAgICAgICAgImh0dHAiOiB7CiAgICAgICAgICAgICAgICAgICAgICAgICJ1cmkiOiAiaHR0cDovL21pcnJvcjAxLmlnbHVjZW50cmFsLmNvbSIKICAgICAgICAgICAgICAgICAgICB9CiAgICAgICAgICAgICAgICB9CiAg
ICAgICAgICAgIH0KICAgICAgICBdCiAgICB9Cn0K

The same issue happens with the bigquery-loader.

I tested 0.1.0, 0.3.0 and 0.4.0; it fails on all three.

Hi @Arclight_Slavik,
~You likely found an edge case in our base64 decoding algorithm that does not allow for _ (this is your case) and -, which might result in the above behavior. As a workaround you can remove the character.~
We will investigate an alternative in https://github.com/snowplow-incubator/snowplow-bigquery-loader/issues/65 though.

Edit: I verified the above assumption and neither of these characters seems to be the offending one. I believe the base64 encoding process might be to blame here. You should be able to use java.util.Base64.getEncoder().encodeToString(YOUR_STRING.getBytes(java.nio.charset.StandardCharsets.UTF_8)).

I’ve tried that encoding and the error still persists.

Here’s what I did:

import java.util.Base64;

public class HelloWorld{

     public static void main(String []args){
        String my_string = "{\r\n    \"schema\": \"iglu:com.snowplowanalytics.snowplow.storage/bigquery-config/jsonschema/1-0-0\",\r\n    \"data\": {\r\n        \"name\": \"Alpha BigQuery test\",\r\n        \"id\": \"31b1559d-d319-4023-aaae-97698238d808\",\r\n\r\n        \"projectId\": \"com-acme\",\r\n        \"datasetId\": \"snowplow\",\r\n        \"tableId\": \"events\",\r\n\r\n        \"input\": \"enriched-good-sub\",\r\n        \"typesTopic\": \"bq-test-types\",\r\n        \"typesSubscription\": \"bq-test-types-sub\",\r\n        \"badRows\": \"bq-test-bad-rows\",\r\n        \"failedInserts\": \"bq-test-bad-inserts\",\r\n\r\n        \"load\": {\r\n            \"mode\": \"STREAMING_INSERTS\",\r\n            \"retry\": false\r\n        },\r\n\r\n        \"purpose\": \"ENRICHED_EVENTS\"\r\n    }\r\n}";
                    
        byte[] encodedBytes = Base64.getEncoder().encode(my_string.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        String encoded_bigquery = Base64.getEncoder().encodeToString(encodedBytes);
        
        System.out.println("bigquery");
        System.out.println(encoded_bigquery);
     }
}

This code results in this string

ZXcwS0lDQWdJQ0p6WTJobGJXRWlPaUFpYVdkc2RUcGpiMjB1YzI1dmQzQnNiM2RoYm1Gc2VYUnBZM011YzI1dmQzQnNiM2N1YzNSdmNtRm5aUzlpYVdkeGRXVnllUzFqYjI1bWFXY3Zhbk52Ym5OamFHVnRZUzh4TFRBdE1DSXNEUW9nSUNBZ0ltUmhkR0VpT2lCN0RRb2dJQ0FnSUNBZ0lDSnVZVzFsSWpvZ0lrRnNjR2hoSUVKcFoxRjFaWEo1SUhSbGMzUWlMQTBLSUNBZ0lDQWdJQ0FpYVdRaU9pQWlNekZpTVRVMU9XUXRaRE14T1MwME1ESXpMV0ZoWVdVdE9UYzJPVGd5TXpoa09EQTRJaXdOQ2cwS0lDQWdJQ0FnSUNBaWNISnZhbVZqZEVsa0lqb2dJbU52YlMxaFkyMWxJaXdOQ2lBZ0lDQWdJQ0FnSW1SaGRHRnpaWFJKWkNJNklDSnpibTkzY0d4dmR5SXNEUW9nSUNBZ0lDQWdJQ0owWVdKc1pVbGtJam9nSW1WMlpXNTBjeUlzRFFvTkNpQWdJQ0FnSUNBZ0ltbHVjSFYwSWpvZ0ltVnVjbWxqYUdWa0xXZHZiMlF0YzNWaUlpd05DaUFnSUNBZ0lDQWdJblI1Y0dWelZHOXdhV01pT2lBaVluRXRkR1Z6ZEMxMGVYQmxjeUlzRFFvZ0lDQWdJQ0FnSUNKMGVYQmxjMU4xWW5OamNtbHdkR2x2YmlJNklDSmljUzEwWlhOMExYUjVjR1Z6TFhOMVlpSXNEUW9nSUNBZ0lDQWdJQ0ppWVdSU2IzZHpJam9nSW1KeExYUmxjM1F0WW1Ga0xYSnZkM01pTEEwS0lDQWdJQ0FnSUNBaVptRnBiR1ZrU1c1elpYSjBjeUk2SUNKaWNTMTBaWE4wTFdKaFpDMXBibk5sY25Seklpd05DZzBLSUNBZ0lDQWdJQ0FpYkc5aFpDSTZJSHNOQ2lBZ0lDQWdJQ0FnSUNBZ0lDSnRiMlJsSWpvZ0lsTlVVa1ZCVFVsT1IxOUpUbE5GVWxSVElpd05DaUFnSUNBZ0lDQWdJQ0FnSUNKeVpYUnllU0k2SUdaaGJITmxEUW9nSUNBZ0lDQWdJSDBzRFFvTkNpQWdJQ0FnSUNBZ0luQjFjbkJ2YzJVaU9pQWlSVTVTU1VOSVJVUmZSVlpGVGxSVElnMEtJQ0FnSUgwTkNuMD0

Putting this into the config results in the same error.

I used this website to decode it: https://www.base64decode.org/

The result looks like this

eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy5zdG9yYWdlL2JpZ3F1ZXJ5X2NvbmZpZy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiQWxwaGEgQmlnUXVlcnkgdGVzdCIsImlkIjoiMzFiMTU1OWQtZDMxOS00MDIzLWFhYWUtOTc2OTgyMzhkODA4IiwicHJvamVjdElkIjoiY29tLWFjbWUiLCJkYXRhc2V0SWQiOiJzbm93cGxvdyIsInRhYmxlSWQiOiJldmVudHMiLCJpbnB1dCI6ImVucmljaGVkLWdvb2Qtc3ViIiwidHlwZXNUb3BpYyI6ImJxLXRlc3QtdHlwZXMiLCJ0eXBlc1N1YnNjcmlwdGlvbiI6ImJxLXRlc3QtdHlwZXMtc3ViIiwiYmFkUm93cyI6ImJxLXRlc3QtYmFkLXJvd3MiLCJmYWlsZWRJbnNlcnRzIjoiYnEtdGVzdC1iYWQtaW5zZXJ0cyIsImxvYWQiOnsibW9kZSI6IlNUUkVBTUlOR19JTlNFUlRTIiwicmV0cnkiOmZhbHNlfSwicHVycG9zZSI6IkVOUklDSEVEX0VWRU5UUyJ9fQ==

Putting this into the config also results in the same error.
(If you decode this string again on the website, it looks like valid JSON.)

I’ve also tried to use [B@6d06d69c, which is what the encodedBytes variable prints as, but I still get the same error.

Am I missing something?

This mostly looks fine. I’d minify your JSON before encoding it though just to remove any carriage returns and try again.
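As a rough sketch of that suggestion in shell: the helper below (`encode_json_file` is a made-up name, not part of any Snowplow tool) deletes carriage returns and newlines before encoding. It is not a full minifier and leaves spaces alone. It assumes GNU coreutils `base64`, where `-d` decodes.

```shell
#!/bin/sh
# encode_json_file FILE (hypothetical helper): print FILE as a single-line
# base64 string. Carriage returns and newlines are deleted first; in valid
# JSON they can only appear as whitespace between tokens, so this is safe.
encode_json_file() {
    tr -d '\r\n' < "$1" | base64 | tr -d '\n'
}

# Example with a throwaway file that has Windows (CRLF) line endings.
tmp=$(mktemp)
printf '{\r\n  "mode": "STREAMING_INSERTS"\r\n}\r\n' > "$tmp"
encoded=$(encode_json_file "$tmp")
printf '%s\n' "$encoded"

# Decode again to see what the tool would get after its own base64 decoding.
printf '%s' "$encoded" | base64 -d
printf '\n'
rm -f "$tmp"
```

The final `tr -d '\n'` does the same job as `base64 -w 0`, without relying on the `-w` flag.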

I minified it but still get the same error.

How the string looks:

String my_string = "{\"schema\":\"iglu:com.snowplowanalytics.snowplow.storage/bigquery-config/jsonschema/1-0-0\",\"data\":{\"name\":\"Alpha BigQuery test\",\"id\":\"31b1559d-d319-4023-aaae-97698238d808\",\"projectId\":\"com-acme\",\"datasetId\":\"snowplow\",\"tableId\":\"events\",\"input\":\"enriched-good-sub\",\"typesTopic\":\"bq-test-types\",\"typesSubscription\":\"bq-test-types-sub\",\"badRows\":\"bq-test-bad-rows\",\"failedInserts\":\"bq-test-bad-inserts\",\"load\":{\"mode\":\"STREAMING_INSERTS\",\"retry\":false},\"purpose\":\"ENRICHED_EVENTS\"}}";

Resulting encoding:

ZXlKelkyaGxiV0VpT2lKcFoyeDFPbU52YlM1emJtOTNjR3h2ZDJGdVlXeDVkR2xqY3k1emJtOTNjR3h2ZHk1emRHOXlZV2RsTDJKcFozRjFaWEo1TFdOdmJtWnBaeTlxYzI5dWMyTm9aVzFoTHpFdE1DMHdJaXdpWkdGMFlTSTZleUp1WVcxbElqb2lRV3h3YUdFZ1FtbG5VWFZsY25rZ2RHVnpkQ0lzSW1sa0lqb2lNekZpTVRVMU9XUXRaRE14T1MwME1ESXpMV0ZoWVdVdE9UYzJPVGd5TXpoa09EQTRJaXdpY0hKdmFtVmpkRWxrSWpvaVkyOXRMV0ZqYldVaUxDSmtZWFJoYzJWMFNXUWlPaUp6Ym05M2NHeHZkeUlzSW5SaFlteGxTV1FpT2lKbGRtVnVkSE1pTENKcGJuQjFkQ0k2SW1WdWNtbGphR1ZrTFdkdmIyUXRjM1ZpSWl3aWRIbHdaWE5VYjNCcFl5STZJbUp4TFhSbGMzUXRkSGx3WlhNaUxDSjBlWEJsYzFOMVluTmpjbWx3ZEdsdmJpSTZJbUp4TFhSbGMzUXRkSGx3WlhNdGMzVmlJaXdpWW1Ga1VtOTNjeUk2SW1KeExYUmxjM1F0WW1Ga0xYSnZkM01pTENKbVlXbHNaV1JKYm5ObGNuUnpJam9pWW5FdGRHVnpkQzFpWVdRdGFXNXpaWEowY3lJc0lteHZZV1FpT25zaWJXOWtaU0k2SWxOVVVrVkJUVWxPUjE5SlRsTkZVbFJUSWl3aWNtVjBjbmtpT21aaGJITmxmU3dpY0hWeWNHOXpaU0k2SWtWT1VrbERTRVZFWDBWV1JVNVVVeUo5ZlE9PQ==

The result there looks like it’s been base64 encoded twice, rather than just once.
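One way to check for double encoding is to count how many decoding rounds it takes before the text starts with `{`. The sketch below does that; `count_base64_layers` is a made-up helper name and it assumes GNU coreutils `base64`. As a quick visual hint, a JSON object encoded once usually starts with `ey` or `ew`, while the strings above start with `ZX`.

```shell
#!/bin/sh
# count_base64_layers STRING (hypothetical helper): print how many times
# STRING must be base64-decoded before the result starts with "{".
# Prints "unknown" after 5 rounds or when decoding fails.
# Note: plain sh has no local variables, so "value" and "layers" are global.
count_base64_layers() {
    value=$1
    layers=0
    while :; do
        case $value in
            "{"*) printf '%s\n' "$layers"; return 0 ;;
        esac
        [ "$layers" -ge 5 ] && break
        # Stop if the current value is not valid base64 at all.
        value=$(printf '%s' "$value" | base64 -d 2>/dev/null) || break
        layers=$((layers + 1))
    done
    printf 'unknown\n'
    return 1
}

json='{"retry":false}'
once=$(printf '%s' "$json" | base64 | tr -d '\n')
twice=$(printf '%s' "$once" | base64 | tr -d '\n')
count_base64_layers "$once"    # prints 1
count_base64_layers "$twice"   # prints 2
```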

Yes, but even if I encode it only once I get the same error.

Wouldn’t it be better to just take the .json file directly instead of requiring it to be encoded?

For example, in the Docker images for the enricher I provided the resolver as a .json file and it worked well.

And looking at the code, it seems the base64 config just gets decoded back into JSON anyway.

Can you provide the Java version that you’re running? Base64 handling has changed between JVM versions. This might also indicate that something is off with the character encoding of the original string: it could have been converted to your own locale and come out as different characters. Some characters look the same but are entirely different in terms of UTF-8. I might be wrong though.

The problem with supplying a JSON file is a bit different for jobs that rely heavily on clustering, and real-life JSON configs can get pretty beefy. The number of runtime instances might be in the thousands for some users. An alternative would be to provide a binary location, but that affects how fast autoscaling works. We might investigate an alternative approach in the future.

Java info:

java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

I’ve fixed the problem by adding an entrypoint in the docker-compose file

    entrypoint: ["sh", "/config/loader.sh"]

Then in the Dockerfile I do

FROM snowplow-docker-registry.bintray.io/snowplow/snowplow-bigquery-loader:0.4.0
COPY loader-config/loader.sh /config/loader.sh
RUN chmod +x /config/loader.sh

With my loader.sh file looking like this

#!/bin/bash
# Encode the JSON files when the container starts and pass the resulting
# base64 strings (not file paths) to the loader.
./bin/snowplow-bigquery-loader \
    --config="$(base64 -w 0 /config/bigquery_config.json)" \
    --resolver="$(base64 -w 0 /config/iglu_resolver.json)" \
    --runner=DirectRunner \
    --project=<project_id> \
    --region=<region> \
    --gcpTempLocation=gs://<temp_folder>

Seems to work now :slight_smile:

The problem is still present in 0.5.1. Arclight_Slavik is right: using a custom Docker image does the trick.

This happens because the command line expects the base64-encoded string itself as the argument value, not the path of a file that contains it. For instance, this will fail:

base64 /usr/config/bq_schema.json -w 0 > /usr/config/bq_schema_b64
base64 /usr/config/iglu_resolver.json -w 0 > /usr/config/iglu_resolver_b64

docker run \
  -v /usr/config:/snowplow/config \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
  snowplow/snowplow-bigquery-mutator:$bq_version \
  create \
  --config /snowplow/config/bq_schema_b64 \
  --resolver /snowplow/config/iglu_resolver_b64

The mutator expects the full base64 string, not the name of a file that holds it. This will work:

docker run \
  -v /usr/config:/snowplow/config \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
  snowplow/snowplow-bigquery-mutator:$bq_version \
  create \
  --config $(cat /usr/config/bq_schema_b64) \
  --resolver $(cat /usr/config/iglu_resolver_b64)

Note that the new command passes the full base64 string, read from the host filesystem, rather than a filename in the Docker filesystem. You are getting the “Illegal base64 character 5f” error because you are passing a filename, and the filename contains an underscore “_”, which is ASCII code 0x5f and not part of the standard base64 alphabet.
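To see where the 5f comes from, here is a small diagnostic sketch. `illegal_base64_bytes` is a made-up helper name. It lists, in hex, every byte of an argument that falls outside the standard base64 alphabet. The first byte it reports for the filenames used in this thread is 5f, which fits the error message if the decoder stops at the first illegal character (an assumption about the tool's internals).

```shell
#!/bin/sh
# illegal_base64_bytes STRING (hypothetical helper): print, as space-separated
# hex codes, every byte of STRING outside the standard base64 alphabet
# (A-Z, a-z, 0-9, "+", "/" and "=" padding). Prints nothing if all are legal.
illegal_base64_bytes() {
    printf '%s' "$1" \
        | LC_ALL=C tr -d 'A-Za-z0-9+/=' \
        | od -An -tx1 -v \
        | tr -d '\n' \
        | sed 's/^ *//'
}

# A filename, as passed in the failing command: "_" is 5f and "." is 2e.
illegal_base64_bytes 'bigquery_config.txt'; echo

# A genuine base64 string yields an empty line: no illegal bytes.
illegal_base64_bytes "$(printf '%s' '{"retry":false}' | base64 | tr -d '\n')"; echo
```

Running a check like this on the value given to --config or --resolver before starting the container shows at a glance whether a path was passed instead of the encoded contents.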
