Errors getting the Enrich working - With and without the enrich flag

Good Morning

I’ve been trying to get the enrichment JAR to work for the last 12 hours or so, and I can’t figure out the two different errors I get when I run the JAR with and without the --enrichments flag.

The problem I’m trying to solve is getting rid of the HTTP headers in the Snowplow stream, because when the tags are sent to Kinesis I get the following in kinesis.records.data:

�p(Gra�UTF-8�ssc-1.0.0-kinesis,sMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.366http://px.system/@#/com.snowplowanalytics.snowplow/tp2T�
    {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        ...

So I’m trying to add an enrichment to remove the HTTP headers so I can get clean JSON data points to use in Lambda functions, but I can’t get the enrichment process to work.

So I’ve run into two different errors when I run the JAR file on my server:

The first error I can’t get around

When I run the command without the enrichments flag, it seems to run, but I get repeated errors around Caught exception while sync'ing Kinesis shards and leases. From what I’ve read this can be an AWS policy issue, but the role the Kinesis streams run under and the AWS key I created for the collector and enrich both have full access to Kinesis, Lambda, DynamoDB and CloudWatch. To help with the troubleshooting, the output of running this command is attached: java -jar snowplow-stream-enrich-kinesis-1.0.0.jar --config config.hocon --resolver file:iglu_resolver.json

The second error I can’t get around

The second problem, and the one whose output I’m having a hard time understanding, is that when I run it with the --enrichments flag it errors out entirely. Since I downloaded the JAR file, I don’t believe I have the ability to add a debug flag, but here is the output of running the following command: java -jar snowplow-stream-enrich-kinesis-1.0.0.jar --config config.hocon --resolver file:iglu_resolver.json --enrichments file:/enrich/
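For what it’s worth, the --enrichments flag points at a directory of enrichment JSON configs, one file per enrichment. As a sketch, a file like enrich/anon_ip.json would look roughly like this (anon_ip is a standard Snowplow enrichment; the exact values here are illustrative, not from my setup):

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-0",
  "data": {
    "name": "anon_ip",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "anonOctets": 2
    }
  }
}
```

If the directory contains anything that doesn’t validate against an enrichment schema in the resolver, Stream Enrich will refuse to start.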

If anyone has any ideas/thoughts that could help me get around this, it would be grand.

To also note, I’m running this on a Raspberry Pi 4B, as I need to run a demo for a client on how they can use Snowplow for real-time analytics with Kinesis/Lambda and push the data sets to our software for marketing automation.

The collector is working fine, no problems there. I did compile the collector myself, whereas for the enricher I downloaded and used the supplied JAR. I tried compiling the enricher too but ran into errors; I have to run out the door for work, but I’ll give it another go in a few hours and try other ways to resolve the issues noted above.

Thanks for the help, and I hope it’s not TL;DR. :slight_smile:

Do you have a DynamoDB stream enrich table containing lease information? If it hasn’t been created (generally a permission error) this can crop up.
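If permissions are the cause, the KCL lease table only needs a handful of DynamoDB actions. A scoped-down policy sketch (the table name here is a placeholder; it must match your appName):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/your-lease-table-name"
    }
  ]
}
```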

The error output here is pretty general but if it’s a Raspberry Pi you may be better off either compiling on the system or using the Docker container.

Hi Mike, thanks for the super quick reply mate!

I do have a DynamoDB table set up; I’ve called it demoConfigs and referenced it in the config file’s appName parameter. Is this the right place to hold the table name?
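For anyone following along, appName sits under enrich.streams in the Stream Enrich config. A sketch based on the 1.0.0 example config (placement from memory; values are placeholders):

```hocon
enrich {
  streams {
    # The KCL uses appName as the name of its DynamoDB lease table
    appName = demoConfigs
    # other settings (in, out, sourceSink, buffer) omitted
  }
}
```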

As for compiling on the Pi, yeah, I’ll try that again in a few hours and see if I can get it going.

The KCL should create this table for you - if you’ve manually created it, try deleting the table and letting Stream Enrich / KCL create it.


Wow, that did indeed work! Most excellent tip.

And just as a real quick last thing, I tried compiling the enrich again and this is the error I get:

Invalid maximum heap size: -Xmx6G
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.

The 4B caps out at 4G of RAM - you might be able to compile it by setting the maximum heap size to 4G instead of 6, but I’m not too sure as I haven’t tried this.
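If sbt is the thing launching the JVM, the heap can be lowered without touching the build itself, e.g. via a .sbtopts file in the project root (a sketch; 2G is an illustrative value, and on a 32-bit OS the representable maximum is well under 4G anyway):

```
# .sbtopts - one option per line; -J passes the flag through to the JVM
-J-Xmx2G
```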

Now a more of a functional question…

When I have the collector running and the enricher running, both are watching shardId-00000000000000

When I used the CLI to send a payload, the collector picks it up, but I didn’t see the enricher output say anything, just that it was sleeping…
Do I need to configure anything to have those two talk to each other?

Whoa… 20 minutes late… Gotta run, be back online in about an hour. Thanks for all the help Mike!

If the enricher has been set up to listen to the same raw stream, everything should just work - in theory. Both the collector and the enricher have configurable buffer settings; depending on what these are, it may take a few seconds / more bytes before you see output from Stream Enrich.
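The buffer settings in question look roughly like this in the Stream Enrich config (a sketch based on the example config; the values here are illustrative, and events flush when the first of the three limits is hit):

```hocon
enrich {
  streams {
    buffer {
      byteLimit = 4500    # flush after this many bytes are buffered
      recordLimit = 500   # ... or this many records
      timeLimit = 5000    # ... or this many milliseconds
    }
  }
}
```

Dropping these values low is a handy way to see events come through immediately while testing.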

Hey Mike - not sure if you’re still around… I tried compiling with a lower memory size and it just doesn’t like it.

If I were to bring up an EC2 instance and run the enricher there, there wouldn’t be any problems, because the enricher is listening to the Kinesis stream and would still pick it up?

Yes - if you’re running something that is designed to collect on the edge, then you could run the collector on the Raspberry Pi and the enricher within EC2. If you just want to send data from the device, you’re better off running all the infrastructure in AWS and just using a tracker (very lightweight) to send the data over HTTPS to a load balancer, rather than running any Snowplow components on the limited hardware.
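As a sketch of the lightweight-tracker route: a device only needs to POST a payload_data envelope (the same schema visible in the raw record earlier in the thread) to the collector’s tp2 endpoint. Field values below are illustrative, not from this setup:

```python
import json

# Minimal tp2 payload envelope; "e": "pv" is the tracker-protocol
# code for a page view, "p" is the platform, "tv" the tracker version
payload = {
    "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
    "data": [
        {"e": "pv", "url": "http://px.system/", "p": "web", "tv": "py-0.1.0"}
    ],
}
body = json.dumps(payload)

# POST body to https://<collector-host>/com.snowplowanalytics.snowplow/tp2
# e.g. requests.post(url, data=body,
#                    headers={"Content-Type": "application/json"})
print(body)
```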


Thanks Mike - I got it working on the Mac, so good to go.

Cheers for the help!


This was exactly my problem. Thanks!