appName unique ID


#1

From what I am gathering, the “app name” that connects all the pieces of the setup, from tracker -> collector -> enricher -> dynamodb -> s3loader is serving as a subscribed ID.

If I want to have multiple apps sending events to the same infrastructure, do I just have an app identifier in a custom field of an event? What is the best practice for having multiple apps/games using the same pipeline?


#2

@dbuscaglia, I assume you are enquiring about app_id. There’s no rule for what it should be; it depends on your business model. Normally, the app_id identifies a website or server where the tracker is enabled. That lets you distinguish between the various websites and applications sending events to the same collector. The same applies to separate mobile applications, back-end servers, etc.

If you want a separate ID for each game on the same website, for example, you certainly can do that. However, it might make more sense to attach the game ID to the custom event triggered when users activate or engage with the game. It could be implemented as a property of the event itself, or as a property of a context attached to the set of events related to the game (launch, pause, quit, etc.), depending on your data model.
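To make the second option concrete, here is a minimal sketch of a game ID carried in a context that is attached to every game-related event. It uses plain dicts in the self-describing JSON shape (`schema` plus `data`); the schema URIs, the `game_event` helper, and the field names are hypothetical, and this is not the tracker's actual wire format.

```python
import json

# Hypothetical Iglu schema URIs; replace with schemas you actually define.
GAME_CONTEXT_SCHEMA = "iglu:com.example/game_context/jsonschema/1-0-0"
GAME_ACTION_SCHEMA = "iglu:com.example/game_action/jsonschema/1-0-0"


def self_describing(schema, data):
    """Wrap a payload in the self-describing JSON shape: schema + data."""
    return {"schema": schema, "data": data}


def game_event(app_id, game_id, action):
    """Build one event: app_id names the site, the context names the game."""
    return {
        "app_id": app_id,
        "event": self_describing(GAME_ACTION_SCHEMA, {"action": action}),
        "contexts": [self_describing(GAME_CONTEXT_SCHEMA, {"game_id": game_id})],
    }


# The same context is reused across launch, pause and quit.
events = [game_event("my-website", "chess", a) for a in ("launch", "pause", "quit")]
print(json.dumps(events[0], indent=2))
```

With this shape, app_id still separates websites or applications, while the game can be filtered on independently through the context.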


#3

Thank you for the reply @ihor. I suppose what confused me was the app name config variable that permeates each step of the data pipeline. A collector has one, as does an enricher, and an s3loader has one too, which creates a DynamoDB table based on that app name.


#4

@dbuscaglia, those appNames are irrelevant to the events you track, the apps the tracker(s) are enabled on, or the objects users interact with. If you do want to track individual games, my earlier statement is still relevant.


#5

I see a bit of misunderstanding here. app_id, being an event parameter sent to the collector, allows you to push data from different applications into a single Snowplow processing stream. In my case I use lots of websites to feed data into a single Snowplow pipeline, and then I can report based on app_id. No magic behind that.

If we are talking about appName for Kinesis stream consumers, this is a bit stranger. It is not really the name of an app, but a unique name for a process consuming data from a Kinesis stream. The Kinesis Client Library (KCL, which Snowplow uses) uses DynamoDB to keep track of the position (like a pointer into the stream) where the process finished consuming data. This gives at-least-once processing (that is the KCL model, so you might get duplicates in particular situations). It also lets parts of the pipeline scale: you can run as many enrichers as you want (running more than the number of shards does not make sense, but you CAN).

The appName translates directly to the name of a DynamoDB table. As you can see, it is extremely important not to mix up the names: if you have more than one process consuming from a Kinesis stream, each should have a different name so that it “marks” its position in a different table. A good example is sending enriched events in parallel to S3 and Elasticsearch: if you use the same name for both storage targets, each will get about half of the data. Similarly, in case of failure, you can start the same process again and use the history in the stream to rebuild data in the real-time part (as Kinesis keeps data for up to 7 days).

I personally prefer to create the tables myself and pass the table names to the Kinesis consumers as the appName parameter, so I can limit each process to access only one table. Note that producers do not have an appName, as they do not mark a position in Kinesis streams.
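The “same name means half the data” effect can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a dict stands in for DynamoDB, shard leases are rebalanced round-robin, and `start_worker` and `shards_read_by` are hypothetical helpers, not KCL calls.

```python
from collections import defaultdict

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

# Toy stand-in for DynamoDB: one lease table per appName, mapping shard -> worker.
lease_tables = defaultdict(dict)


def start_worker(app_name, worker_id):
    """Register a worker and spread the shard leases evenly over all workers
    that share this appName (and therefore this lease table)."""
    table = lease_tables[app_name]
    workers = sorted(set(table.values()) | {worker_id})
    for i, shard in enumerate(SHARDS):
        table[shard] = workers[i % len(workers)]


def shards_read_by(app_name, worker_id):
    """List the shards currently leased to a given worker."""
    return [s for s, w in lease_tables[app_name].items() if w == worker_id]


# Mistake: both loaders share one appName, so they split the shards.
start_worker("shared-name", "s3-loader")
start_worker("shared-name", "es-loader")
print(shards_read_by("shared-name", "s3-loader"))
print(shards_read_by("shared-name", "es-loader"))

# Correct: distinct appNames, so each loader leases every shard.
start_worker("s3-loader-app", "s3-loader")
start_worker("es-loader-app", "es-loader")
print(shards_read_by("s3-loader-app", "s3-loader"))
print(shards_read_by("es-loader-app", "es-loader"))
```

In the shared case each loader ends up with two of the four shards; with separate names each one reads all four, which is what you want when S3 and Elasticsearch should both receive the full stream.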

Final note: your description of the data flow is incorrect. The enricher sends data to Kinesis, not to DynamoDB. DynamoDB is just a convenient store for the Kinesis stream position and/or various configs.


#6

My stream does go collector -> Kinesis -> enricher -> Kinesis -> s3loader.