Snowplow missing data in Elasticsearch


We use Snowplow for data analytics, with Postgres for storage and Metabase for visualization. This worked well until queries started to slow down because of the large amount of data.

As a result, we decided to test Elasticsearch. However, Elasticsearch ends up with about 5 times less data than Postgres, and we do not understand why. Both receive data from the same stream, but the difference is massive.

Also, there is a warning from the Elasticsearch stream loader: WARN - Returning 56 records as failed, but we could not find an explanation of what it means.

It also throws: [scala-execution-context-global-18] ERROR - Record

And: failed with message failed to parse

If anyone has faced a similar problem or works with Elasticsearch, could you explain how to deal with this difference in data? I understand that Postgres and Elasticsearch are different storage services, but they consume the same stream of data. Maybe there is a problem with the schemas?

Thank you in advance
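To put a number on the gap, the document count in the index can be read from Elasticsearch's `_count` API and compared with a row count from Postgres. The following is only a minimal sketch: the host, port, and index name are placeholders, plain HTTP is assumed, and the Postgres side (for example `SELECT count(*)` on whatever events table the Postgres loader writes to) is left out.

```python
import json
import urllib.request


def count_url(endpoint: str, port: int, index: str) -> str:
    """Build the URL of the Elasticsearch _count API for one index (plain HTTP assumed)."""
    return f"http://{endpoint}:{port}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(endpoint: str, port: int, index: str) -> int:
    """Query the cluster; requires network access to the Elasticsearch endpoint."""
    with urllib.request.urlopen(count_url(endpoint, port, index), timeout=10) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For example, `fetch_count("localhost", 9200, "snowplow-enriched-index")` would return the number of documents currently in that index, provided the cluster is reachable at that address.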

Are you able to share your config.hocon file for the ES loader?

source = "kinesis"
sink {
  good = "elasticsearch"
  bad = "kinesis"
}
enabled = "good"
aws {
  accessKey = iam
  secretKey = iam
}
queue {
  enabled = kinesis
  initialPosition = "TRIM_HORIZON"
  initialTimestamp = ""
  maxRecords = 10000
  region = "us-west-1"
  appName = "<server-app-name>"
  disableCloudWatch = true
}
streams {
  inStreamName = "wx-enriched-stream"
  outStreamName = "wx-bad-1-stream"
  buffer {
    byteLimit = 1000000
    recordLimit = 500
    timeLimit = 500
  }
}
elasticsearch {
  client {
    endpoint = "<elk-endpoint-ip>"
    port = "9200"
    maxTimeout = 10000
    maxRetries = 6
    ssl = false
  }
  aws {
    signing = false
    region = "us-west-1"
  }
  cluster {
    name = "<snowplow-cluster-name>"
    index = "snowplow-enriched-index"
    documentType = "good"
  }
}

Do you have some errors that are being emitted to Kinesis for the bad events that are not being inserted successfully? There should be some additional info in there.

There are some warnings and errors in the enriched events loader’s logs.
The first block is:

[RecordProcessor-0000] INFO - Emitted 97 records to Elasticseacrch
[RecordProcessor-0000] WARN - Returning 55 records as failed
[scala-execution-context-global-19] WARN - Cluster health is yellow

The second is a JSON block (or group of blocks) that starts with:

[scala-execution-context-global-19] ERROR - Record 

and finishes with:

failed with message failed to parse

with information about an event between them.
Also, I forgot to mention that we deployed the Snowplow components with Terraform. The loader version turned out to be 1.0.0, and we use Elasticsearch version 7.13. Maybe there is a discrepancy between the versions? Which combination of versions works best together?
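To see how many records the loader reports as failed over a longer run, the two log messages quoted above can be tallied from a saved log file. This is a rough sketch that assumes the log lines keep exactly the wording shown in this thread; the `tally` helper is hypothetical and not part of the loader.

```python
import re

# Patterns copied from the loader log lines quoted in this thread.
EMITTED = re.compile(r"INFO - Emitted (\d+) records")
FAILED = re.compile(r"WARN - Returning (\d+) records as failed")


def tally(lines):
    """Sum the 'Emitted N records' and 'Returning N records as failed' counts."""
    emitted = failed = 0
    for line in lines:
        match = EMITTED.search(line)
        if match:
            emitted += int(match.group(1))
            continue
        match = FAILED.search(line)
        if match:
            failed += int(match.group(1))
    return emitted, failed


sample = [
    "[RecordProcessor-0000] INFO - Emitted 97 records to Elasticseacrch",
    "[RecordProcessor-0000] WARN - Returning 55 records as failed",
    "[scala-execution-context-global-19] WARN - Cluster health is yellow",
]
print(tally(sample))  # (97, 55)
```

Running it over the full log (for example `tally(open("loader.log"))`, where the file name is a placeholder) gives totals that can be compared against the size of the gap between Postgres and Elasticsearch.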