Is a local setup of Snowplow dependent on AWS?


#1

I have a local Snowplow setup with the Tracker, Collector and Enrich running. The Collector and Enrich are configured to use stdin and stdout as their input and output methods. I was hoping to get the pipeline running locally that way, but Enrich's logging tries to connect to AWS anyway. This is what I see in the console:

[WARN] [06/09/2016 13:48:56.365] [snowplow-scala-tracker-akka.actor.default-dispatcher-4] [akka://snowplow-scala-tracker/user/IO-HTTP/group-0/0] Configured connecting timeout of 10 seconds expired, stopping
[WARN] [06/09/2016 13:48:56.366] [snowplow-scala-tracker-akka.actor.default-dispatcher-5] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/0] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with 5 retries left, retrying...
[WARN] [06/09/2016 13:48:56.372] [snowplow-scala-tracker-akka.actor.default-dispatcher-4] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/1] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with 4 retries left, retrying...
[WARN] [06/09/2016 13:48:56.375] [snowplow-scala-tracker-akka.actor.default-dispatcher-4] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/0] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with 3 retries left, retrying...
[WARN] [06/09/2016 13:48:56.377] [snowplow-scala-tracker-akka.actor.default-dispatcher-9] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/1] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with 2 retries left, retrying...
[WARN] [06/09/2016 13:48:56.379] [snowplow-scala-tracker-akka.actor.default-dispatcher-9] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/0] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with 1 retries left, retrying...
[WARN] [06/09/2016 13:48:56.380] [snowplow-scala-tracker-akka.actor.default-dispatcher-9] [akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/1] Connection attempt to 169.254.169.254:80 failed in response to GET request to /latest/dynamic/instance-identity/document/ with no retries left, dispatching error...
[INFO] [06/09/2016 13:48:56.382] [snowplow-scala-tracker-akka.actor.default-dispatcher-2] [akka://snowplow-scala-tracker/deadLetters] Message [akka.actor.Status$Failure] from Actor[akka://snowplow-scala-tracker/user/IO-HTTP/host-connector-0/1#-2096861829] to Actor[akka://snowplow-scala-tracker/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

Here, as far as I understand, /latest/dynamic/instance-identity/document/ refers to an EC2 metadata path. Also, the IP mentioned in those warnings is found only in compiled EC2 classes:

$ grep -r 169.254.169.254 .
Binary file ./2-collectors/scala-stream-collector/target/streams/$global/assemblyOption/$global/assembly/f107d5880901222ed16966331467148a044e5163_6247112519c0acbe13d01b3ce90fa82813bc899f/com/amazonaws/internal/EC2MetadataClient.class matches
Binary file ./3-enrich/stream-enrich/target/streams/$global/assemblyOption/$global/assembly/32fd15d3c40bf10538cbcd607389ff1dbd0f79b7_cac14c3d6f91c8a37cf24f2a939ae015fdb89c84/com/amazonaws/internal/EC2MetadataClient.class matches
Binary file ./3-enrich/stream-enrich/target/streams/$global/assemblyOption/$global/assembly/f658dbe8ccb60ef2796e81a5da6ed2d7e231d251_d694565572e63ab3f46d48245f70fe8fe36bd402/com/snowplowanalytics/snowplow/scalatracker/Ec2Metadata$.class matches

The point is, I would like to run Snowplow locally without AWS at all. Is this possible? Or am I missing something here?

In addition, here is my run command, where you can find the versions used:

2-collectors/scala-stream-collector/target/scala-2.10/snowplow-stream-collector-0.7.0 --config 2-collectors/scala-stream-collector/examples/collector.conf | 3-enrich/stream-enrich/target/scala-2.10/snowplow-stream-enrich-0.8.0 --config 3-enrich/stream-enrich/examples/enrich.conf --resolver file:3-enrich/config/iglu_resolver.json


#2

Hi @paulisiirama,

That particular logging is output from the embedded scala-tracker, which is attempting to automatically add contextual information about an EC2 instance.

It is safe to ignore these warnings if you are running the applications locally.
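If it helps to see why the warnings are harmless, here is a rough Python sketch of what that metadata lookup amounts to (the endpoint and path are the real EC2 instance metadata URL from the log above; the function name and fallback behaviour are illustrative only - the actual tracker does this in Scala via spray/akka):

```python
import socket
import urllib.error
import urllib.request

# EC2 instance metadata endpoint: a link-local address that only
# answers from inside an EC2 instance. Anywhere else, requests to it
# time out or are refused - which is exactly what the warnings show.
METADATA_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"

def fetch_ec2_context(timeout_s=2.0):
    """Try to fetch the EC2 instance identity document.

    Returns the document text when running on EC2, or None when the
    request fails (i.e. when running locally).
    """
    try:
        with urllib.request.urlopen(METADATA_URL, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, socket.timeout, OSError):
        # Not on EC2 (or metadata service unreachable): carry on
        # without the EC2 context, just as the tracker does after
        # its retries are exhausted.
        return None

context = fetch_ec2_context(timeout_s=1.0)
print("EC2 context attached" if context else "no EC2 context, continuing anyway")
```

Either way the pipeline itself keeps running - the lookup failing only means the optional EC2 context is not attached.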

The other option is to remove the monitoring portion of your config, which should stop the Tracker from being instantiated - thus preventing these errors.
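Concretely, that's the monitoring block in your collector.conf / enrich.conf HOCON files. A sketch of what commenting it out looks like - the exact keys inside the block vary by version, so check your own config files rather than copying this verbatim:

```
# In 2-collectors/scala-stream-collector/examples/collector.conf
# (and similarly in enrich.conf), comment out the whole block:

# monitoring {
#   snowplow {
#     ...
#   }
# }
```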

Hope this helps,
Joshua


#3

Hey @josh,

Thanks for the reply! Commenting out the monitoring part made those warnings vanish.

I was wondering: is Snowplow intentionally dependent on AWS, or are there any plans to make it fully work in other environments? It might already be possible, but I keep seeing things like the hard-coded EC2 connection, a dummy Kinesis config being required even when stdin/stdout are used, etc.

-ps


#4

Hey @paulisiirama,

We are definitely planning on supporting non-AWS fabrics for Snowplow later this year. AWSisms creep in because that’s Snowplow’s primary platform today.

The hard-coded EC2 connection to grab box metadata isn’t part of Snowplow itself - it’s just a neat feature of the Snowplow Scala Tracker, which the Snowplow stream apps embed.

Any other AWSisms you spot - do please create tickets and we will get to those!