Dataflow runner docker container

Hey all,
Strange issue here. I have installed the Snowplow collector, enrich, Redshift, s3-sink and rdb-loader with Terraform and Docker containers (AWS Fargate).

To my understanding, to run RDB Loader (post R35) you recommend either Dataflow Runner or a boto3 script (R35 Upgrade Guide - Snowplow Docs).

For now, due to external factors, I can’t use a boto3 script (or Lambda, for that matter).
So instead of using an EC2 server to run the script, I planned to run dataflow-runner in a simple Docker container.

Based on the documentation, if I use 64-bit Linux I should be able to run Dataflow Runner with no other dependencies.
I have no trouble getting the container up and running, and I use a bash script as the launch script (as I do for all my Docker images), but I can’t get dataflow-runner to run. The response I get is “not found”, even though I have verified that the file is executable and has the right permissions. Some suggestions on Stack Overflow are that this is caused by missing dependencies. Has anyone got Dataflow Runner to work in a Docker image? Any ideas would be welcome…
Thanks, F

Hi @fwahlqvist ,

That’s correct.

Could you share the script that runs Dataflow Runner, please? Have you tried doing all the steps manually in a Docker container to make sure that it’s working?

Hey Ben, thanks for getting back to me

Docker-compose.yml is

version: "3"
    container_name: snowplow-dataflow-runner
    #  - .:/snowplow
    #restart: "unless-stopped"
      context: ./
      dockerfile: Dockerfile

Dockerfile is

FROM  snowplow/base-alpine as builder

#RUN apk update && apk upgrade && apk add bash && apk add bash-completion
WORKDIR /snowplow
COPY /snowplow/ 
COPY playbook.json /snowplow/playbook.json
COPY cluster.json /snowplow/cluster.json
RUN wget
RUN unzip

FROM snowplow/base-alpine
RUN apk update && apk upgrade && apk add bash

WORKDIR /snowplow
COPY --from=builder /snowplow /snowplow 
RUN chmod +x
#RUN chown  snowplow:snowplow
#RUN echo ${PATH}
#RUN ls -la
ENTRYPOINT [ "./" ] is

echo "in script"
ls -la
./dataflow-runner help
#./dataflow-runner run-transient --emr-config=cluster.json --emr-playbook=playbook.json
#run-transient  Launches, runs and then terminates an EMR cluster

And finally the output from the script part is

snowplow-dataflow-runner | in script
snowplow-dataflow-runner | ./ line 5: ./dataflow-runner: not found
snowplow-dataflow-runner | total 28652
snowplow-dataflow-runner | drwxr-xr-x    1 snowplow snowplow      4096 Feb 20 15:17 .
snowplow-dataflow-runner | drwxr-xr-x    1 root     root          4096 Feb 20 15:17 ..
snowplow-dataflow-runner | drwxr-xr-x    1 snowplow snowplow      4096 Oct 29 15:47 bin
snowplow-dataflow-runner | -rw-r--r--    1 root     root          1987 Feb 17 17:43 cluster.json
snowplow-dataflow-runner | drwxr-xr-x    2 snowplow snowplow      4096 Oct 29 15:47 config
snowplow-dataflow-runner | -rwxr-xr-x    1 root     root      20789708 Feb 20 15:17 dataflow-runner
snowplow-dataflow-runner | -rw-r--r--    1 root     root       8518063 Aug 24 15:55
snowplow-dataflow-runner | -rwxr-xr-x    1 root     root           214 Feb 20 15:17
snowplow-dataflow-runner | -rw-r--r--    1 root     root          1483 Feb 17 18:23 playbook.json
snowplow-dataflow-runner | /snowplow
snowplow-dataflow-runner exited with code 127 

Hopefully it is something simple I am missing, but I have tried changing directories, permissions, the path, etc.
Any insight is welcome …


Hey @fwahlqvist ,

I suspect that the Docker image you are using does not contain all the system libraries required by the Dataflow Runner binary.

You could either use a bigger Linux image or troubleshoot which library is missing.

$ ldd dataflow-runner
         (0x00007fff7198c000)
         => /lib/x86_64-linux-gnu/ (0x00007fcec4640000)
         => /lib/x86_64-linux-gnu/ (0x00007fcec42a1000)
        /lib64/ (0x00007fcec485d000)

Maybe one of these libs is missing. Could you run strace -f -e open ./dataflow-runner and see what the output says, please?
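
If `ldd` is available in the container, its output can also be filtered for unresolved entries. What follows is only a minimal sketch, assuming the glibc-style `=> not found` wording; `missing_libs` is a hypothetical helper name, not something shipped with Dataflow Runner, and other `ldd` implementations (for example the one on Alpine) may phrase the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of dataflow-runner): reads `ldd` output on
# stdin and prints the name of every library the loader could not resolve.
missing_libs() {
    # Assumption: unresolved entries look like "libfoo.so.1 => not found".
    # The first whitespace-separated field is the library name.
    awk '/=> not found/ { print $1 }'
}

# Example usage inside the container:
#   ldd ./dataflow-runner | missing_libs
```

An empty result only means that this particular pattern did not match, so it is worth reading the raw `ldd` output as well.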

Hey Ben,
Many thanks for the help. It turns out that the “” is not included in base-alpine, so I updated my Dockerfile to use base-debian. When I run the code now, it prints out the help output…
Many thanks!
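
For anyone hitting the same symptom: a shell returns exit status 127 when it cannot find or start a command, and as far as I understand that can also happen for a file that exists and is executable if the loader it depends on is absent from the image. A launch script can flag that case with a hint. The sketch below is illustrative only; `run_checked` is a hypothetical name and the hint text is my own wording.

```shell
#!/bin/sh
# Hypothetical wrapper for a launch script: runs the given command and,
# if the shell reports exit status 127, prints a hint to stderr.
run_checked() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 127 ]; then
        # 127 is the shell's "command not found" status.
        echo "hint: '$1' was not found, or its loader/libraries are missing from this image" >&2
    fi
    return "$rc"
}

# Example usage in the launch script:
#   run_checked ./dataflow-runner help
```

The wrapper passes the original exit status through, so the container still exits with 127 and the hint is simply an extra line in the logs.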
