How to properly configure the Scala Stream Collector v0.16.0

With the recent release of Snowplow R116 Madara Rider, we introduced a host of new features to the Scala Stream Collector. Users now have greater control over the attributes attached to the Set-Cookie header of the collector response. However, this means the configuration file has become even more complex. If you’re migrating from an earlier version, it is now easier to break things by simply copying the old hocon over. You also have to be more aware of the specific implications of the Set-Cookie attributes, both in general and in particular browsers.

In this thread, we’ll try to collect as many tips and hints about potential gotchas as we can.

0.15.0 cookie_domain =/= 0.16.0 cookie_domains

In earlier versions, users could specify a domain to be used for the cookie, overriding the default (which was to tie the cookie to the full collector domain).

From 0.16.0, there is no longer a cookie_domain variable in the configuration file. The closest equivalent (and the best option for a quick migration without any extra settings) is the new cookie_fallbackDomain variable. This value will be used in the cookie whenever the host from a request’s Origin header cannot be matched to a value in the new cookie_domains setting.

0.15.0

cookie {
  domain = "acme.com"
}
0.16.0

cookie {
  fallbackDomain = "acme.com"
}

To use multiple domains on the same collector, you’ll need multiple endpoints

From 0.16.0, you can specify a list of cookie_domains in the configuration file to be used in the Set-Cookie header of the response.

cookie {
  domains = [
    "acme.com"
    "acme.net"
  ]
}

When the collector receives a request from a tracker, it reads the Origin header and tries to match its host against the list from the configuration. This allows you to set a first-party, server-side cookie on both acme.com and acme.net. However, in order for this to work, you will also need to make the collector reachable on an endpoint under each of those domains (for example, collector.acme.com and collector.acme.net), because a browser will only accept a cookie whose Domain attribute covers the host the request was actually sent to. A hypothetical exchange is sketched below.
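
For illustration, assume the configuration above and a collector reachable at collector.acme.net (a hypothetical endpoint; the cookie name and value are also illustrative). A request from a page on www.acme.net would then play out roughly like this:

Request:
  POST https://collector.acme.net/com.snowplowanalytics.snowplow/tp2
  Origin: https://www.acme.net

Response:
  Set-Cookie: sp=<network_userid>; Domain=acme.net; ...

The Origin host www.acme.net matches acme.net in the domains list, so the cookie is scoped to acme.net rather than to the collector host.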

Multiple hosts in an Origin header will be ignored

If the collector receives a request that specifies multiple hosts in its Origin header, only the first match against the domains list will be used.
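
For example, with the domains list above, a request carrying several origins would be resolved against the first host that matches (an illustrative sketch of the behaviour described here):

Origin: https://embed.example.com https://shop.acme.net https://acme.com

→ embed.example.com matches nothing, shop.acme.net matches acme.net, so the cookie is set with Domain=acme.net and acme.com is never considered.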

You should probably always configure a fallbackDomain

The fallbackDomain value is meant to be used when no host from the request’s Origin header can be matched to a value in the domains list. This can happen under two scenarios:

  • there is a valid Origin host in the request, but it cannot be matched against any of the domains, either because no domains have been configured or because none of them match;
  • there is no valid Origin host in the request.

The second scenario might be more likely than you think, because GET and HEAD requests do not have an Origin header. So even if you set up the collector with multiple domains, a GET request from one of those valid domains will not be recognised, and the cookie will be set with an empty Domain attribute (effectively tying it to the collector’s own host) if you have not specified a fallbackDomain. A simplified sketch of the selection logic follows.
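
To make the behaviour concrete, here is a simplified Scala sketch of the selection logic described above. It is not the collector’s actual code: the function name and the subdomain-matching rule are assumptions for illustration only.

// A simplified sketch, not the collector's actual implementation.
// originHosts: hosts parsed from the request's Origin header (may be empty,
//              e.g. for GET/HEAD requests that carry no Origin header)
// domains:     the cookie.domains list from the hocon
// fallback:    cookie.fallbackDomain, if configured
def chooseCookieDomain(
  originHosts: List[String],
  domains: List[String],
  fallback: Option[String]
): Option[String] =
  originHosts
    .flatMap(host => domains.find(d => host == d || host.endsWith("." + d)))
    .headOption        // only the first match is used
    .orElse(fallback)  // no Origin, or no match: use fallbackDomain if configured

// e.g. chooseCookieDomain(List("www.acme.net"), List("acme.com", "acme.net"), Some("acme.com"))
//      returns Some("acme.net")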

No leading dots in domain names

When specifying domain names in the cookie section of the 0.16.0 hocon file, never prepend a leading dot.

correct

cookie {
  domains = [
    "acme.com"
    "acme.net"
  ]
  fallbackDomain = "acme.com"
}
incorrect

cookie {
  domains = [
    ".acme.com"
    ".acme.net"
  ]
  fallbackDomain = ".acme.com"
}

Take care if you’re migrating from an earlier version and simply copying the domain value over to fallbackDomain: if that value had a leading dot, drop it.

Issues with SameSite=None and older versions of iOS

From Chrome 80, a cookie without a SameSite attribute is treated as SameSite=Lax by default. So in newer versions of Chrome you must explicitly specify SameSite=None if you want to retain unrestricted (cross-site) use of the cookie, and a cookie with SameSite=None must also be marked Secure.

However, if you set SameSite=None for third party cookies, those might not work in older versions of iOS. According to this bug report, those cookies might be treated as SameSite=Strict. That might result in a loss of existing third-party cookies on iOS devices prior to iOS 13. (Hat tip to jankoulaga for this one.)

Third-party cookies have been disabled by default since Safari 6, so the above would only affect users on older Safari versions, as well as those who have explicitly enabled third-party cookies.
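
For reference, a collector cookie intended for cross-site use would carry both attributes. The resulting header looks roughly like this (cookie name, value, domain and expiry are illustrative):

Set-Cookie: sp=7c19a4a0-6d3e-4f7c-b478-0a1c7e2b9f11; Domain=acme.com; Expires=Wed, 09 Sep 2020 10:00:00 GMT; SameSite=None; Secure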

Custom request paths must comply with the vendor/version protocol

If you want to map your request paths to custom values, you are free to choose any path as long as it complies with the vendor/version protocol, i.e. it consists of exactly two segments, like /vendor/version. For example:

correct

paths {
  "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2"
  "/redirect/sp" = "/r/tp2"
  "/fhdjfsfjgneoirenjfnJDGSOJWEONFJD/KFADFLJFJDHFDG" = "/com.snowplowanalytics.iglu/v1"
}
incorrect

paths {
  "/mycustompath" = "/com.snowplowanalytics.snowplow/tp2"
  "/my/custom/path" = "/r/tp2"
}

Also note that you must specify full valid paths, including a leading slash.
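
As an illustration (hypothetical collector host, using the first mapping from the correct example above), a tracker request to the custom path is handled exactly as if it had been sent to the standard endpoint:

POST https://collector.acme.com/com.acme/track
→ handled as POST https://collector.acme.com/com.snowplowanalytics.snowplow/tp2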

We will be updating this thread if there is any additional advice. In the meantime, please share your own tips or questions in the comments.


Can anyone please help me with running the jar file for the Snowplow Scala Stream Collector 0.16.0?
I’m facing the below error in the logs folder:

/opt/snowplow-collector/bin/snowplow-stream-collector-kinesis-0.16.0.jar: 1: /opt/snowplow-collector/bin/snowplow-stream-collector-kinesis-0.16.0.jar: PK^C^D^T^H^H^H^W?-O^T^DMETA-INF/MANIFEST.MF…: not found
/opt/snowplow-collector/bin/snowplow-stream-collector-kinesis-0.16.0.jar: 122: /opt/snowplow-collector/bin/snowplow-stream-collector-kinesis-0.16.0.jar: ^H^X?-O^Zcom/amazonaws/http/client/PK^C^D: not found
[… many similar lines: binary jar contents echoed by the shell as “not found” errors …]

Thanks,
Tameem k

Hi @sp_user, would you mind sharing your configuration hocon? That might help pinpoint the reason for the errors you’re seeing.

Thanks for the help, dilyan. I resolved it.