Snowplow-emr-etlrunner exit code 3 on empty input bucket

#1

When snowplow-emr-etlrunner finds no new files to process in aws.s3.buckets.raw.in, it returns exit code 3 and logs to debug: “No logs to process: No Snowplow logs to process since last run”.

I am running snowplow-emr-etlrunner as a step in a bash script in a cron job. The script starts with “set -e” so that if any step fails (exit code != 0), subsequent steps will be canceled. In the case of no new data, I would like for the wrapper script to continue and run the other steps.

I can wrap the call to snowplow-emr-etlrunner in a conditional so that it ignores exit code 3 and keeps running when there is no new data, but before I do that, I thought I would check here:

  • Is there a way to configure snowplow-emr-etlrunner so it will return 0 on empty input buckets?

  • Are there other conditions that lead to exit code 3 that are less innocuous?

#2

Hi @wleftwich,

According to https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/bin/snowplow-emr-etl-runner, the only cause of exit code 3 is that there is no data to process in the input S3 bucket.

Cheers,
GE

#3

Thanks @grzegorzewald.

In case it’s useful to anyone else, here is how I trap exit code 3.

#!/bin/bash

set -e
# Do various things, exit if any command returns != 0

set +e
./snowplow-emr-etl-runner run -c config.yml -r resolver.json -n enrichments -t targets -l lockfile
ret=$?
# Abort on any failure other than exit code 3 (no new data to process)
if [ "$ret" -ne 0 ] && [ "$ret" -ne 3 ]
then
	exit "$ret"
fi
set -e

#4

@wleftwich, we use Factotum internally. It provides a dedicated property, terminateJobWithSuccess, which takes the non-zero exit codes that should not raise an error. The idea is the same as in your shell script.

There are other exit codes you might consider:

  • 3: NoDataToProcessError
  • 4: DirectoryNotEmptyError
  • 17: LockHeldError

You can find them in the code here.
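
If you want to tolerate more than one of these codes in a plain bash wrapper, here is a minimal sketch of a reusable helper. Note that run_tolerating is a hypothetical name, not part of Snowplow, and which codes count as benign is an assumption you need to decide for your own pipeline. The sketch uses the "cmd || ret=$?" idiom, so there is no need to switch "set -e" off and on again.

```shell
#!/bin/bash
set -e

# Hypothetical helper (not part of Snowplow): run a command and return 0 if
# its exit code is 0 or appears in the space-separated list given as the
# first argument; otherwise return the command's original exit code.
run_tolerating() {
    local allowed="$1"; shift
    local ret=0
    "$@" || ret=$?      # '||' stops 'set -e' from aborting the script here
    local code
    for code in $allowed; do
        if [ "$ret" -eq "$code" ]; then
            echo "exit code $ret tolerated for: $1" >&2
            return 0
        fi
    done
    return "$ret"       # any other non-zero code still aborts under 'set -e'
}

# Usage (assumption: only 3, NoDataToProcessError, is benign; 4 and 17
# still stop the wrapper script):
# run_tolerating "3" ./snowplow-emr-etl-runner run -c config.yml -r resolver.json
```

Passing "3 17" instead of "3" would also let the script continue when another run holds the lock, but whether that is safe depends on what the later steps do.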