Snowplow-google-analytics-plugin does not send app_id


#1

Hi.

I am running a test to collect events using the Snowplow GA plugin. The plugin does not set the app_id field, so after enrichment the event has a null app_id: { "app_id": null }.

However, there is a derived context, created by GoogleAnalyticsAdapter, that contains the Google trackingId field:

"contexts_com_google_analytics_measurement_protocol_general_1": [{
  "protocolVersion": "1",
  "trackingId": "UA-XXXXXX-XX",
  "cacheBuster": "XXXXXXXX"
}],

I was expecting to configure the app_id somewhere, maybe in the plugin configuration:

ga('require', 'spGaPlugin', { endpoint: 'https://events.acme.com', appId: 'XXXX' });

Or at least Google trackingId as Snowplow app_id.

Finally, my questions:

  1. Is there a way to set app_id using the GA plugin?
  2. Why is the Google trackingId not used as app_id?

Cheers,
Ander


#2

@aparra, the GA plugin simply captures the data sent to Google Analytics and pushes a copy of it down the Snowplow pipeline. The data is not amended in any way.


#5

@aparra, it appears I was wrong about the Snowplow app_id being linked to the GA trackingId. As I stated earlier, Snowplow and GA events are not the same. They use different trackers, and the events are not related to each other even though they might track the same activity.

I have removed my confusing statement from the previous comment.


#6

Hi @ihor,

While I tend to agree that the events emitted by Snowplow and Google Analytics are different, the concept of the app_id as an identifier of a “customer” is still valid in both cases. If we want to link Snowplow events and GA events for the same customer later down the pipeline, we need a way to set the app_id for the GA events. There are two ideas here:

  1. Use the GA ID (UA-XXXXXX-XX) as app_id
  2. Pass a dedicated app ID to the initializer of the GA plugin

What do you think?


#7

Thanks, @christoph-buente. Depending on what you mean by “customer”, item 1 could indeed be used. The app_id identifies the application as a whole and is not suitable for identifying a user (customer) in terms of sessions, etc. If “customer” is meant more broadly, as a web application that has its own dedicated ID in a multitenant environment, then app_id could certainly be set to UA-XXXXXX-XX.

Item 2 is not likely to be implemented, although I can imagine it becoming possible if there is demand for such a feature.


#8

Hi @ihor,

By customer we don’t mean the actual user browsing the web, but a tenant in a multitenant setup. We use the app_id to keep each tenant’s data separate. From the documentation of the GA plugin, I can see that a config object containing the Snowplow endpoint is passed in. This is used to set up the tracker internally. Would it be so complex to accept an optional key in the config object to set the app_id?

ga('require', 'spGaPlugin', { endpoint: 'https://events.acme.com', app_id: 'myAppId' });

The desired behaviour would be that the app_id is appended to every request sent to the collector, using the appropriate parameter, such as aid, according to the tracker protocol.
However, it looks like the collector handler ignores the aid parameter completely, and the GA adapter puts it into a separate custom context:

This is what we send in the payload:
v=1&aid=myAppId&_v=j68&a=1737411827&t=pageview&_s=1&dl=http%3A%2F%2Flocalhost%2Fpage-view.html&ul=en-gb&de=windows-1252&sd=24-bit&sr=1440x900&vp=1440x329&je=0&_u=QCCAAEAD~&jid=618920935&gjid=690343321&cid=882630223.1534330790&tid=UA-123394832-1&_gid=1179024832.1534496179&_r=1&z=905653280

And the resulting context looks like this:

"contexts_com_google_analytics_measurement_protocol_app_1": [
  {
    "id": "myAppId"
  }
]

@alex, @BenFradet Any thoughts here?


#9

Hey @christoph-buente,

This is because aid is part of the measurement protocol: https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#aid.

This is typically why we wouldn’t want to mix and match Google’s measurement protocol and the Snowplow tracker protocol.
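
To illustrate the clash, here is a minimal sketch that decodes a shortened version of the payload from the post above and labels each key with its Measurement Protocol meaning. The function name `describePayload` and the camelCase labels are my own choices for illustration, not names taken from the adapter's code; the key meanings follow Google's parameter reference.

```javascript
// Shortened copy of the payload posted above (only the relevant parameters).
const payload =
  'v=1&aid=myAppId&t=pageview' +
  '&dl=http%3A%2F%2Flocalhost%2Fpage-view.html' +
  '&cid=882630223.1534330790&tid=UA-123394832-1';

// Measurement Protocol v1 meaning of each key.
// The camelCase labels are illustrative, not the adapter's own field names.
const MEASUREMENT_PROTOCOL_NAMES = {
  v: 'protocolVersion',
  tid: 'trackingId',
  cid: 'clientId',
  t: 'hitType',
  dl: 'documentLocationUrl',
  aid: 'applicationId', // GA "Application ID", not the Snowplow app_id
};

// Decode a query string and relabel known Measurement Protocol keys.
// Unknown keys are kept under their original name.
function describePayload(queryString) {
  const described = {};
  for (const [key, value] of new URLSearchParams(queryString)) {
    const name = MEASUREMENT_PROTOCOL_NAMES[key] || key;
    described[name] = value;
  }
  return described;
}

console.log(describePayload(payload));
```

Since `aid` already has a meaning in a GA hit, an adapter that treats the payload as Measurement Protocol can only read it as the GA application ID, which matches the app context shown above.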


#10

OK, fair. But let’s ignore the exact parameter name for the sake of discussing how to pass an app ID with GA events, which could be either the UA-XXX ID or a dedicated app_id.


#11

I see where you are coming from - but it feels like yours is a pretty unusual use case @christoph-buente:

  • You want to run a single multi-tenanted Snowplow pipeline,
  • With a mix of customers using Snowplow tracking or GA tracking at their discretion

As @BenFradet says, I am really not a fan of adding Snowplow tracking concepts like app_id into the GA adapter - because as soon as we do that, we are actually crafting a hybrid protocol, Snowplow-GA, rather than simply supporting GA.


#12

@alex, gotcha.

The reason I thought you might be in favour of our suggestion is that the collector already fills some of the Snowplow canonical event model fields from the GA events, such as screen resolution, whether Java is enabled, viewport and other browser-related fields. So evidently you agreed at some point that GA and Snowplow events can have something in common.

I agree with you that the tracked events can and will be of a fundamentally different nature. But how do you solve the problem of merging GA and Snowplow traffic from the same page without a common ID? And how do you tell the traffic of different Snowplow/GA installations apart?


#13

That’s a good point: while we are not keen to create a hybrid Snowplow-GA protocol, we of course understand that both protocols model a similar web analytics space, which is why we map the GA protocol into the Snowplow event model downstream.

Given this, I think it would be coherent to support @aparra’s suggestion in the original post:

Or at least Google trackingId as Snowplow app_id

@BenFradet any thoughts on this?
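
Until something like that exists in the adapter, the same mapping could be approximated in a downstream consumer. The sketch below is only an illustration under assumptions: it assumes the enriched event is available as a JSON object shaped like the one in the original post, and `withFallbackAppId` is a hypothetical helper, not part of the GA plugin, the collector or the adapter.

```javascript
// Name of the derived context shown in the original post.
const GA_GENERAL_CONTEXT =
  'contexts_com_google_analytics_measurement_protocol_general_1';

// Hypothetical downstream helper: if app_id is missing, fall back to the
// GA trackingId found in the derived context. Returns a new object and
// leaves the input event untouched.
function withFallbackAppId(event) {
  if (event.app_id != null) {
    return event; // an explicit app_id always wins
  }
  const contexts = event[GA_GENERAL_CONTEXT];
  const trackingId =
    Array.isArray(contexts) && contexts.length > 0
      ? contexts[0].trackingId
      : undefined;
  if (!trackingId) {
    return event; // nothing to fall back to
  }
  return { ...event, app_id: trackingId };
}

// Example event, using the placeholder values from the original post.
const enriched = {
  app_id: null,
  [GA_GENERAL_CONTEXT]: [
    { protocolVersion: '1', trackingId: 'UA-XXXXXX-XX', cacheBuster: 'XXXXXXXX' },
  ],
};

console.log(withFallbackAppId(enriched).app_id);
```

This keeps the GA payload itself untouched, so it does not turn the adapter into a hybrid protocol; the trade-off is that every consumer has to apply the same fallback.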