Enrich cannot connect to the source stream; UnknownHostException in Fargate

Hello,

We are deploying the enrich module on AWS Fargate, using an image pushed to ECR that contains the enrich configuration below:

enrich {

  streams {

    in {
      # Stream/topic where the raw events to be enriched are located
      raw = ${ENRICH_STREAMS_IN_RAW}
    }

    out {
      # Stream/topic where the events that were successfully enriched will end up
      enriched = ${ENRICH_STREAMS_OUT_ENRICHED}
      # Stream/topic where the event that failed enrichment will be stored
      bad = ${ENRICH_STREAMS_OUT_BAD}
      # Stream/topic where the pii transformation events will end up
      pii = ${ENRICH_STREAMS_OUT_PII}

      # How the output stream/topic will be partitioned.
      # Possible partition keys are: event_id, event_fingerprint, domain_userid, network_userid,
      # user_ipaddress, domain_sessionid, user_fingerprint.
      # Refer to https://github.com/snowplow/snowplow/wiki/canonical-event-model to know what the
      # possible partition keys correspond to.
      # Otherwise, the partition key will be a random UUID.
      # Note: Nsq does not make use of partition key.
      partitionKey = event_id
    }
    # Configuration shown is for Kafka, to use another uncomment the appropriate configuration
    # and comment out the other
    # To use stdin, comment or remove everything in the "enrich.streams.sourceSink" section except
    # "enabled" which should be set to "stdin".
    sourceSink {
      # Sources / sinks currently supported are:
      # 'kinesis' for reading Thrift-serialized records and writing enriched and bad events to a
      # Kinesis stream
      # 'kafka' for reading / writing to a Kafka topic
      # 'nsq' for reading / writing to a Nsq topic
      # 'stdin' for reading from stdin and writing to stdout and stderr
      enabled = kinesis

      # Region where the streams are located (AWS region, pertinent to kinesis sink/source type)
      # region = {{region}}
      region = ${ENRICH_STREAMS_SOURCE_SINK_REGION}

      ## Optional endpoint url configuration to override aws kinesis endpoints,
      ## this can be used to specify local endpoints when using localstack
      # customEndpoint = {{kinesisEndpoint}}
      #customEndpoint = "localstack:4566"

      ## Optional endpoint url configuration to override AWS DynamoDB endpoints for Kinesis checkpoints lease table,
      ## this can be used to specify local endpoints when using Localstack
      # dynamodbCustomEndpoint = "http://localhost:4569"
      #dynamodbCustomEndpoint = "localstack:4566"

      # Optional override to disable cloudwatch
      # disableCloudWatch = true
      # disableCloudWatch = ${ENRICH_DISABLE_CLOUDWATCH}

      # AWS credentials
      # If both are set to 'default', use the default AWS credentials provider chain.
      # If both are set to 'iam', use AWS IAM Roles to provision credentials.
      # If both are set to 'env', use env variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
      aws {
        accessKey = default
        secretKey = default
      }

      # GCP credentials
      # Either provide path to service account file or set environment variable GOOGLE_APPLICATION_CREDENTIALS
      # gcp {
      #   creds = {{googleApplicationCredentials}}
      #   creds = ${GOOGLE_APPLICATION_CREDENTIALS}
      # }

      # Maximum number of records to get from Kinesis per call to GetRecords
      maxRecords = 100

      # LATEST: most recent data.
      # TRIM_HORIZON: oldest available data.
      # "AT_TIMESTAMP": Start from the record at or after the specified timestamp
      # Note: This only affects the first run of this application on a stream.
      # (pertinent to kinesis source type)
      initialPosition = TRIM_HORIZON

The ECS tasks are terminated automatically right after provisioning, and we get the following error in CloudWatch:

[main] WARN com.networknt.schema.JsonMetaSchema - Unknown keyword exclusiveMinimum - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
[main] WARN com.amazonaws.http.AmazonHttpClient - SSL Certificate checking for endpoints has been explicitly disabled.
[main] WARN com.amazonaws.http.AmazonHttpClient - SSL Certificate checking for endpoints has been explicitly disabled.
[main] WARN com.amazonaws.http.AmazonHttpClient - SSL Certificate checking for endpoints has been explicitly disabled.
Exception in thread "main" java.net.UnknownHostException: d4f5b4e280c4: d4f5b4e280c4: Name or service not known
	at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
	at com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource.run(KinesisSource.scala:147)
	at com.snowplowanalytics.snowplow.enrich.stream.KinesisEnrich$.main(KinesisEnrich.scala:97)
	at com.snowplowanalytics.snowplow.enrich.stream.KinesisEnrich.main(KinesisEnrich.scala)
Caused by: java.net.UnknownHostException: d4f5b4e280c4: Name or service not known
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getLocalHost(InetAddress.java:1500)

Try running

echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

to add the hostname to /etc/hosts and then rerunning. I’m pretty sure this is a Fargate-specific problem: the Kinesis source cannot resolve the hostname of the local machine, which it uses as the KCL worker ID.
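
If the command may run more than once (for example on every container start), it can be worth appending the entry only when the hostname is not already mapped. The following is a minimal sketch, not something shipped with Stream Enrich: the helper name ensure_host_entry and its optional second argument (the hosts file path) are made up for illustration, and it assumes the script runs as a user allowed to write to /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: append "127.0.0.1 <hostname>" to a hosts file,
# but only if no non-comment line already lists that hostname.
ensure_host_entry() {
  host_name="$1"
  hosts_file="${2:-/etc/hosts}"   # assumption: default to the system hosts file
  # Fields 2..NF of each line are the names mapped to the address in field 1.
  if awk -v h="$host_name" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  echo "127.0.0.1 $host_name" >> "$hosts_file"
}
```

In a startup script this could be called as ensure_host_entry "$(hostname)" before launching the application.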

Thanks a lot for your response. Where do we add the command you mentioned? How can we override the default entrypoint of the module?

This is what our Dockerfile looks like at the moment:

FROM snowplow/stream-enrich-kinesis:1.1.0

COPY enrichment.conf . 
COPY resolver.json . 
COPY log4j.properties .

CMD ["--config", "enrichment.conf", \ 
 "--resolver", "file:resolver.json", \ 
 "-Dcom.amazonaws.sdk.disableCertChecking", "-Dcom.amazonaws.sdk.disableCbor", \
 "-debug", "-Dlog4j.debug", "-Dlog4j.configuration=file:log4j.properties"]

Hey @mike, maybe I can elaborate a little. We were trying to override the snowplow entrypoint with the help of a sh script. And now we’re facing two issues:

  1. we cannot add the host to /etc/hosts (permission denied)
  2. we do not know how to invoke the default entrypoint after executing the command.

entrypoint.sh

#!/usr/bin/dumb-init /bin/sh
set -e

/bin/sh -c "echo 127.0.0.1 $HOSTNAME >> /etc/hosts"

exec /usr/local/bin/docker-entrypoint.sh $*

We then call the entrypoint in the Dockerfile:
Dockerfile

FROM snowplow/stream-enrich-kinesis:1.1.0

COPY enrichment.conf . 
COPY resolver.json . 
COPY log4j.properties .

ENTRYPOINT ["sh", "/home/snowplow/entrypoint.sh"]

CMD ["--config", "enrichment.conf", \ 
 "--resolver", "file:resolver.json", \ 
 "-Dcom.amazonaws.sdk.disableCertChecking", "-Dcom.amazonaws.sdk.disableCbor", \
 "-debug", "-Dlog4j.debug", "-Dlog4j.configuration=file:log4j.properties"] 

Unfortunately, the echo command fails with a "/bin/sh: 1: cannot create /etc/hosts: Permission denied" error. Furthermore, we are not sure how to hand over to the default entrypoint after appending the host to /etc/hosts.

Is there maybe a way to include the entry via extraHosts in the container definition of our ECS service on Fargate:

[
  {
    "name": "sp_enrichment",
    "image": "${sp_enrichment_module_image}",
    "cpu": ${fargate_cpu},
    "memory": ${fargate_memory},
    "networkMode": "awsvpc",
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "${log_group}",
          "awslogs-region": "${aws_region}",
          "awslogs-stream-prefix": "ecs"
        }
    },
    "portMappings": [
      {
        "containerPort": ${container_port},
        "hostPort": ${container_port}
      }
    ],
    "extraHosts": [
      {
        "hostname": "???", # how can we retrieve the host name?
        "ipAddress": "127.0.0.1"
      }
    ]
  }
]

Any help would be greatly appreciated. @AcidFlow did you run into the same issue? :slight_smile:

Hello @mgloel,

I didn’t run into this issue.
What are you trying to achieve by overriding the host, and what do you mean by overriding the Snowplow entrypoint?

In our deployment we build a custom docker image to include our configuration files.

Then in the Fargate task definition, we simply override the command:

 "command": [
        "--config", "/snowplow/config/stream-enrich.conf",
        "--resolver", "file:/snowplow/config/resolver.json",
        "--enrichments", "file:/snowplow/config/enrichments/"
    ]

as shown in this AWS doc, and pass some environment variables to set properties in the configuration dynamically, such as the stream names.

I do not have extraHosts or portMappings, because in our current setup each component of the Snowplow pipeline runs in a separate Fargate task.
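
Since the configuration relies on environment variables being present at startup, it may help to fail fast with a clear message when one is missing. Below is a minimal sketch assuming a POSIX shell is available in the image; missing_env is a hypothetical helper name, not part of Snowplow.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any unset or empty variables
# and return non-zero if at least one is missing.
missing_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  return 0
}
```

With the variable names from the configuration posted at the top of the thread, a startup script could run missing_env ENRICH_STREAMS_IN_RAW ENRICH_STREAMS_OUT_ENRICHED ENRICH_STREAMS_OUT_BAD ENRICH_STREAMS_OUT_PII ENRICH_STREAMS_SOURCE_SINK_REGION || exit 1 before starting the application.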

Hi @AcidFlow,
thanks a lot for your quick reply. We ran into this issue that is also described here:


It seems to be a general issue with running the enrichment on Fargate; see Mike’s response.

Basically we want to run this command:
echo "127.0.0.1 $HOSTNAME" >> /etc/hosts
before we use the entrypoint of the snowplow image:

ENTRYPOINT ["/home/snowplow/bin/kinesis"]

CMD ["--config", "oneapp_enrichment.conf", \
 "--resolver", "file:resolver.json", \
 "-Dcom.amazonaws.sdk.disableCertChecking", "-Dcom.amazonaws.sdk.disableCbor", \
 "-debug", "-Dlog4j.debug", "-Dlog4j.configuration=file:log4j.properties"]

I’m not familiar enough with Fargate to know the answer but it looks like the entrypoint modification in the blog post you linked to should work.

It looks like this may have been a Fargate issue that may have been resolved in 1.4.0?

Hey @mgloel,

Okay, I never experienced this issue myself, and yes, it is strange that the problem does not appear for everyone.

Nevertheless I think I know why you get a permission denied :wink:

You are building an image from the Snowplow base image, which sets the user to snowplow at the end of its Dockerfile.

Therefore your entrypoint is executed as this user, which is not allowed to change the hosts file.

What you can do is something like the following:

Dockerfile

FROM snowplow/stream-enrich-kinesis:1.1.0

USER root
COPY entrypoint.sh .
RUN chmod +x entrypoint.sh

# Exec form, so that the container command is passed to the script as "$@"
ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh:

#!/bin/bash
set -e

echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

# exec replaces the shell, so the application receives signals directly
exec gosu snowplow /home/snowplow/bin/snowplow-stream "$@"

The first part of the script runs as root; gosu then switches back to the snowplow user to start the Stream Enrich application.
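
One detail worth noting in the script is the quoted "$@": it forwards every argument unchanged, whereas an unquoted $* (as in the earlier entrypoint attempt) re-splits arguments on whitespace. A small illustration, with hypothetical function names that exist only for this demo:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() {
  echo "$#"
}

# Forwards each argument as-is, even if it contains spaces.
forward_quoted() {
  count_args "$@"
}

# Joins and re-splits on whitespace, so "my enrich.conf" becomes two arguments.
forward_unquoted() {
  count_args $*
}
```

With the default IFS, forward_quoted --config "my enrich.conf" reports 2 arguments, while forward_unquoted --config "my enrich.conf" reports 3.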

I hope this helps and solves your problem!

Awesome! Thank you so much @AcidFlow and @mike.
Switching to root user and then back to the snowplow user was really the small step that we were missing. It works now.

Good to hear!

Happy Snowplowing!
