Unstructured events problem with StorageLoader and PostgreSQL


#1

Hello,

First, I’m really sorry if this is a dumb question; I’m still learning how to work with Snowplow.

I have a Snowplow setup that uses Redshift for final storage, and both structured and unstructured events load fine there. Now I need to use PostgreSQL for final storage instead, but the StorageLoader only loads structured events. Here are my configs:

resolver.json: https://transfer.sh/I9qMZ/resolver.json.s
snowplow-emr-etl-runner.yml: https://transfer.sh/mBqM4/snowplow-emr-etl-runner.yml.s

Commands:

RUN EMR-ETL-RUNNER

./bin/snowplow-emr-etl-runner -c ./config/snowplow-emr-etl-runner.yml -r ./config/resolver.json -n ./enrichments/

RUN STORAGELOADER

./bin/snowplow-storage-loader -c ./config/snowplow-emr-etl-runner.yml
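If you run these two steps together, it can help to make sure the StorageLoader only starts when the EMR ETL runner succeeded. Below is a minimal sketch that wraps the two commands above in a shell function. `run_snowplow_batch` and the `SNOWPLOW_DIR` variable are hypothetical names used only for illustration; the binaries, flags and paths are the ones from this post.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above.
# SNOWPLOW_DIR (hypothetical) points at the directory that holds bin/,
# config/ and enrichments/; it defaults to the current directory.
run_snowplow_batch() {
  dir="${SNOWPLOW_DIR:-.}"
  config="$dir/config/snowplow-emr-etl-runner.yml"

  # Step 1: EMR ETL runner; stop here if it exits with a non-zero status.
  "$dir/bin/snowplow-emr-etl-runner" -c "$config" \
    -r "$dir/config/resolver.json" -n "$dir/enrichments/" || return 1

  # Step 2: StorageLoader, only reached when step 1 succeeded.
  "$dir/bin/snowplow-storage-loader" -c "$config"
}
```

Usage from the Snowplow directory: `run_snowplow_batch`, or `SNOWPLOW_DIR=/path/to/snowplow run_snowplow_batch` from elsewhere.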

Note: the same approach works with Redshift.

Some other details:
All S3 buckets are created and have the right permissions
The StorageLoader downloads all enriched data to tempdir
No errors appear in the log.

○ → find tempdir/run=2016-07-26-20-19-17/atomic-events/|xargs wc -l
140908 total -> all of these events are loaded into atomic.events in PostgreSQL

○ → find tempdir/run=2016-07-26-20-19-17/|grep -v atomic|xargs wc -l
527505 total -> none of these are loaded
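To see which of the non-atomic files make up that 527505-line total, a per-directory breakdown can be more informative than a single count. The sketch below is a hypothetical helper (`count_non_atomic_lines` is not part of Snowplow); it makes no assumption about the folder layout beyond what the commands above show, and it excludes `atomic-events` by path instead of with `grep -v atomic`. Like `wc -l`, it only counts newline-terminated lines.

```shell
#!/bin/sh
# Hypothetical helper: print "<line count><TAB><directory>" for every
# directory under a downloaded run folder that directly contains data
# files, skipping anything inside atomic-events.
count_non_atomic_lines() {
  run_dir="$1"
  # Collect each directory that holds at least one regular file.
  find "$run_dir" -type f ! -path '*/atomic-events/*' -exec dirname {} \; |
    sort -u |
    while IFS= read -r dir; do
      # Sum the lines of the files sitting directly in this directory.
      lines=$(find "$dir" -maxdepth 1 -type f -exec cat {} + | wc -l | tr -d ' ')
      printf '%s\t%s\n' "$lines" "$dir"
    done
}
```

Usage: `count_non_atomic_lines tempdir/run=2016-07-26-20-19-17`. Each output line shows how many rows of one data type were downloaded but not loaded.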

P.S. Sorry for my English; it’s not my native language.


#2

Hi @lsferreira42 - right, the reason is that our load process for Postgres only supports loading atomic.events currently; it doesn’t perform shredding and thus loading of unstructured events and custom contexts into dedicated tables.

We plan on eventually adding this support into Postgres but it hasn’t been prioritised yet.


#3

Thank you for the reply. Is there a GitHub issue I could help with?

If you accept pull requests, let me know :)


#4

Pull requests are always very welcome @lsferreira42! The relevant ticket is here:

https://github.com/snowplow/snowplow/issues/657