Snowplow R109 Lambaesis released


#1

We are proud to announce a new release of Snowplow:

This release is one of the most community-driven releases we’ve ever done; a huge thanks to everyone involved:

If you’re looking to contribute, we’ve significantly revamped our contributing guide.

Apart from containing great community contributions, this release revolves around new capabilities for the real-time pipeline!


#2

Hi @BenFradet,

Thanks for releasing yet another great increment to the real-time pipeline.

However, I would like to challenge the decision to truncate the X-Forwarded-For header field. We rely heavily on the whole list of IP addresses or domain names present in that header field.

  • Corporate and private users can be proxied from different networks but still have the same internal IP (X-Forwarded-For: 192.168.123.45,cache.acme.com,gw.acme.com,cache.acme-s-isp.com vs X-Forwarded-For: 192.168.123.45,home-isp-cache.foo.com)
  • Traffic compression servers like Opera Mini can put different kinds of values into the header field. For example, you can see 127.0.0.1 as the first entry

I think it should be up to the Snowplow stack user to decide what to do with the header, and maybe use the header enrichment, which we contributed a few years back, to selectively read from the X-Forwarded-For header and truncate it to their liking?

What do you think?


#3

Hey @christoph-buente,

The only truncation we do surfaces in user_ipaddress; the headers are left intact. As such, you can still use the header enrichment if you want to keep the whole chain of IP addresses.
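
To illustrate the distinction, here is a minimal Python sketch. It is not the collector's actual code: the function names are hypothetical, and it assumes the first entry of the chain is the one that ends up in user_ipaddress. The point is only that a single address is derived while the header value itself stays untouched.

```python
def split_forwarded_for(header_value):
    """Split an X-Forwarded-For value into its comma-separated entries."""
    return [part.strip() for part in header_value.split(",") if part.strip()]


def derive_user_ipaddress(headers):
    """Return (user_ipaddress, headers) without modifying the headers.

    Assumption for illustration: the first entry of the chain is kept.
    """
    chain = split_forwarded_for(headers.get("X-Forwarded-For", ""))
    user_ip = chain[0] if chain else None
    return user_ip, headers


# Hypothetical request headers, using the chain from the example above.
headers = {
    "X-Forwarded-For": "192.168.123.45,cache.acme.com,gw.acme.com,cache.acme-s-isp.com"
}
user_ip, kept = derive_user_ipaddress(headers)

# The full chain remains available to anything that reads the headers later.
full_chain = split_forwarded_for(kept["X-Forwarded-For"])
```

Here `user_ip` holds only the single derived address, while `full_chain` still contains every hop, which is what a header-based enrichment would read from.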


#4

Thanks @BenFradet for clarifying. I was under the impression that the collector changes the header field and that there is no longer any way to access the original headers during the enrichment phase.

Then please ignore my concerns :wink:


#5

No worries, sorry if the explanation wasn’t clear enough.