Kinesis Elasticsearch Sink - create mapping problem on ES ver 2.3

Dear team,
I'm trying to load data into an existing Elasticsearch cluster (version 2.3). Following the guide (https://github.com/snowplow/snowplow/wiki/kinesis-elasticsearch-sink-setup), I created the mapping on ES, and per the note ("On a 2.x cluster you will need to remove the _timestamp key") I ran:

curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_ttl": {
              "enabled":true,
              "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'

Error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [enriched]: Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized","caused_by":{"type":"parse_exception","reason":"Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized"}},"status":400}

So I tried the request below instead, and it completed without error.
Is it OK?

curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp" : {
                "enabled" : "yes",
               },
            "_ttl": {
              "enabled":true,
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'

Hey DK9, it might not be the cause of the error, but I think the _ttl field has been deprecated in version 2 of Elasticsearch.

The best practice is apparently to use a tool like https://github.com/elastic/curator to create a new index every day/week/month/year, alias it as a "current_index" that you push data into, and then delete the older indexes.
I've implemented this with my own Snowplow/ES pipeline and it works very well.
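As a sketch of that rotation approach (the index names and the `snowplow_current` alias name here are hypothetical, not taken from my actual pipeline), the alias can be repointed atomically with the Elasticsearch aliases API:

```shell
# Hypothetical monthly indexes; "snowplow_current" is an assumed alias name.
# Create this month's index (mappings can come from an index template).
curl -XPUT 'http://localhost:9200/snowplow-2016.06'

# Atomically detach the old index and attach the new one, so writers
# can always target the "snowplow_current" alias.
curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions": [
        { "remove": { "index": "snowplow-2016.05", "alias": "snowplow_current" } },
        { "add":    { "index": "snowplow-2016.06", "alias": "snowplow_current" } }
    ]
}'

# Once the old index is no longer needed, delete it:
curl -XDELETE 'http://localhost:9200/snowplow-2016.05'
```

Because both alias actions happen in one `_aliases` request, there is no window where the alias points at zero or two indexes.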


Hi @DK9,

The issue is that defining the TTL in milliseconds is no longer supported. If you replace "604800000" with something like "7d" it will work as expected.
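For reference, the full index-creation request with the time-unit TTL would then look like this (same mapping as in the original post, with only the `_ttl` default changed to "7d"):

```shell
# Create the index with a one-week TTL expressed as a time unit,
# which ES 2.x requires ("7d" rather than a bare millisecond count).
curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": { "type": "keyword" }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_ttl": {
                "enabled": true,
                "default": "7d"
            },
            "properties": {
                "geo_location": { "type": "geo_point" }
            }
        }
    }
}'
```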

Will need to get the documentation updated for this!

Cheers,
Josh


@josh: The docs for ES 2.x could also be updated to use

    "properties": {
        "geo_location": {
            "type": "geo_point",
            "geohash": true
        }
    }

See: Geo Point Type | Elasticsearch Guide [1.3] | Elastic

This makes visualization of geo data more efficient. It's especially useful with Grafana :slight_smile:


I've got this updated now! Thanks for the suggestion @christoph-buente.

@DK9 is the mapping creation all functional for you now?

@josh The mapping and Elasticsearch are working, and data is coming in in real time perfectly.
Thank you very much :blush:


Great to hear!