Issue with Snowflake transformer/loader

I’m attempting to set up Snowplow for the first time at my company and load the data into our Snowflake warehouse. I’m doing everything on AWS, and I’ve got everything up to the Snowflake Loader working (I believe).

However, the loader is failing on the Transform step and sending the failed records to my “badOutputUrl”. The failure error is “FieldNumberMismatch”. I haven’t been able to find any information on what could be causing this in previous Discourse posts or anywhere else.

I’ve created this gist with all the relevant config files. Let me know if you need any other information than what I’ve provided.

I’m really stuck on this one. I’m not sure what the issue could be at this point, so any ideas would be wonderful!

Hi @ian-dribbble, the problem seems to be that the enriched event has more fields than expected: 391 vs the “canonical” 130. Looking at the example you provided, it looks like there are a lot of redundant tabs in the enriched event on S3.

Do you have a way to compare the enriched data on S3 with the events coming out of Enrich into Kinesis? What do you use to get them from Kinesis to S3? I wonder if that step is what’s adding all the extra tabs.
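If it helps to confirm what the loader is seeing, a quick field count over one of the enriched files on S3 should show the mismatch directly. A minimal sketch, assuming you’ve downloaded an enriched file locally (the filename is a placeholder):

```
# Count tab-separated fields per line of a downloaded enriched file.
# "enriched-sample.tsv" is a placeholder filename.
with open("enriched-sample.tsv", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        fields = line.rstrip("\n").split("\t")
        print(f"line {i}: {len(fields)} fields")
```

A canonical enriched event should come out at the expected field count; numbers like 391 point at extra tabs (or missing delimiters) being introduced somewhere between Enrich and S3.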


Thanks for the reply @dilyan. I’m just using Kinesis Firehose to dump it from Kinesis to S3. The configuration is pretty simple: I don’t have any conversions, compression, or encryption turned on.

Could it possibly be something with the Iglu schema I’m getting from Iteratively? I added iteratively-schema.json to my gist above, including the only event I’m attempting to trigger so far.

Also, could it be something to do with my Enrich setup? I’m not actually running any enrichments yet. I’m using the snowplow/stream-enrich-kinesis:latest Docker image, and here’s the command I’m using to run it (after copying the config files over):

["--config", "/snowplow/config/config.hocon", "--resolver", "file:/snowplow/config/resolver.json"]

Hi @ian-dribbble, I don’t think the Iteratively schema is the culprit. If you look at the example you shared, there are no extra tabs inside the JSON blob that contains the PageViewed event.

I can think of two places where these extra tabs might be getting introduced:

1. In Enrich. The only way I can imagine it happening here is if you have the JavaScript enrichment running and it is updating the event in place; a bug there could be padding fields with unnecessary tabs.

However you said you’re not running any enrichments, so that leads me to:

2. In the process that loads the enriched data from Kinesis to S3. We don’t usually use Firehose for this. Instead, there’s a tool we maintain that you can use: Load data to S3 - Snowplow Docs. Would you be able to give that one a go?

Previously I asked whether you can see what the enriched data looks like in Kinesis, as a way to check if scenario 2 above holds. You can use the AWS CLI tools to get records from Kinesis and inspect them, just to see if they already have the extra tabs. If they do, then we’re back on scenario 1; but from the evidence so far that appears to be the less likely scenario.
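If you prefer a scripted version of that check, here’s a minimal sketch using boto3 (the Python AWS SDK) instead of the raw CLI. The stream name and region are placeholders you’d swap for your own:

```
# Sketch: pull a few records from the enriched Kinesis stream and count
# the tab-separated fields in each one.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # placeholder region

stream = "enriched-good"  # placeholder: your enriched stream name
shards = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shards[0]["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
    line = record["Data"].decode("utf-8")
    # Extra tabs here would point at Enrich; clean records point at
    # the Kinesis-to-S3 step instead.
    print(len(line.split("\t")), "fields")
```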


Jumping in just to add:

I can’t find the threads where this has come up before, but I do remember an issue arising from using Firehose instead of the S3 Loader.

If memory serves, Kinesis Firehose just dumps everything it finds into S3 without any delimiter between events, whereas the S3 Loader delimits them with newlines. I suspect that’s the issue here: 391 fields is roughly three 130-field events’ worth run together on a single line.

I believe people have mentioned that they circumvented the problem by adding a custom Lambda function to their Firehose setup to append a newline delimiter to the end of each event.
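For reference, that workaround is typically a small Firehose data-transformation Lambda along these lines. This is a sketch, not something Snowplow ships; it just base64-decodes each record, appends a newline, and re-encodes it:

```
# Sketch of a Kinesis Firehose data-transformation Lambda that appends a
# newline delimiter to every record before Firehose writes it to S3.
import base64

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        # Append the newline delimiter Firehose omits between records.
        payload += b"\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```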

In my opinion Dilyan’s suggestion to use the S3 Loader is the safest option, since it’s what we maintain as compatible with our other components (so any changes to those components will stay compatible with this setup).

Hope that extra bit of context helps!


Thank you @dilyan and @Colm! I’ll give the S3 Loader a try. I didn’t realize that was a potentially necessary piece with the Snowflake Loader.

The S3 Loader was the piece I was missing. Everything is working great now. Thanks, guys!


That’s great news!
