I see a bit of misunderstanding here.
app_id, being a collector parameter, allows you to push data from different applications into a single Snowplow processing stream. In my case I use lots of websites to populate a single Snowplow pipeline, and then I can report based on app_id. No magic behind that.
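To illustrate, here is a minimal sketch of per-app_id reporting on one shared stream. The field names besides app_id are made up for the example, not the actual Snowplow enriched-event schema:

```python
from collections import Counter

# Toy enriched events from several sites feeding one pipeline;
# "app_id" is the only field borrowed from Snowplow, the rest is invented.
events = [
    {"app_id": "shop", "event": "page_view"},
    {"app_id": "blog", "event": "page_view"},
    {"app_id": "shop", "event": "transaction"},
]

# Split the single stream back out per application for reporting.
events_per_app = Counter(e["app_id"] for e in events)
print(events_per_app)  # Counter({'shop': 2, 'blog': 1})
```

In a real pipeline the same grouping happens in your warehouse or reporting layer, keyed on the app_id column of the enriched event.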
If we are talking about appName for Kinesis stream consumers, this is a bit more involved. It is not really a name for the app, but a unique name for the process consuming data from a Kinesis stream. The Kinesis Client Library (which Snowplow uses) keeps a checkpoint in DynamoDB (like a pointer into the stream) marking where each process has finished eating data. This gives at-least-once processing (that is the model KCL uses, so you might get duplicates in particular situations). It also lets parts of the pipeline scale: you can run as many Enrichers as you want (more than the number of shards does not make any sense, but you CAN).

The appName is translated directly to the name of a DynamoDB table. As you can see, it is extremely important not to mangle the names: if you have more than one process eating from a Kinesis stream, each should have a different name, so it checkpoints in a different table. A good example is sending enriched events in parallel to S3 and Elasticsearch: if you use the same name for both storages, each will get about half of the data. Similarly, in case of failure, you can restart the same process and use the history in the stream to rebuild the real-time part (Kinesis keeps data for up to 7 days). I personally prefer to create the tables myself and pass the table names to Kinesis consumers as the appName parameter, so I can limit each process to access only one table. Note that producers do not have an appName, as they do not checkpoint Kinesis streams.
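To make the shared-appName pitfall concrete, here is a toy simulation (my own simplification; real KCL lease assignment and lease stealing are more involved) of consumers competing for shard leases in one table versus each having its own table:

```python
# Simplified model of KCL lease tables: the DynamoDB table named after
# appName holds one lease per shard, and all consumers pointed at the
# same table split those leases between them.
shards = ["shard-0", "shard-1", "shard-2", "shard-3"]

def assign_leases(tables):
    """tables maps lease-table name -> list of consumers using it.
    Shards are handed out round-robin within each table."""
    leases = {c: [] for consumers in tables.values() for c in consumers}
    for consumers in tables.values():
        for i, shard in enumerate(shards):
            leases[consumers[i % len(consumers)]].append(shard)
    return leases

# Wrong: the S3 and Elasticsearch loaders share one appName, so they
# share one lease table and each ends up with about half of the stream.
shared = assign_leases({"snowplow-loader": ["s3-sink", "es-sink"]})

# Right: a distinct appName (and thus table) per consumer, so each
# loader leases every shard and sees the full stream.
separate = assign_leases({"s3-loader": ["s3-sink"], "es-loader": ["es-sink"]})
```

The consumer and table names here are invented for the example; the point is only that one lease table means one logical consumer group, however many processes you attach to it.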
Bottom note: your data flow is incorrect. The Enricher sends data to Kinesis, not to DynamoDB. DynamoDB is just a convenient store for the Kinesis stream position and/or various configs.