Kinesis Elasticsearch Sink - mapping creation problem on ES ver 2.3


#1

Dear team,
I am trying to load data into an existing Elasticsearch cluster (version 2.3). Following the guide (https://github.com/snowplow/snowplow/wiki/kinesis-elasticsearch-sink-setup), I created the mapping on ES, and per the note ("On a 2.x cluster you will need to remove the _timestamp key"), I ran:

curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_ttl": {
              "enabled":true,
              "default": "604800000"
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'

Error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [enriched]: Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized","caused_by":{"type":"parse_exception","reason":"Failed to parse setting [XContentMapValues.nodeTimeValue] with value [604800000] as a time value: unit is missing or unrecognized"}},"status":400}

So I tried running the command below instead, and it completed without error.
Is that OK?

curl -XPUT 'http://localhost:9200/snowplow' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "keyword"
                }
            }
        }
    },
    "mappings": {
        "enriched": {
            "_timestamp": {
                "enabled": "yes"
            },
            "_ttl": {
                "enabled": true
            },
            "properties": {
                "geo_location": {
                    "type": "geo_point"
                }
            }
        }
    }
}'

#2

Hey DK9, it might not be the reason for the error, but I think the _ttl field has been deprecated in version 2 of Elasticsearch.

The best practice is apparently to use a tool like https://github.com/elastic/curator to create a new index every day/week/month/year, alias it as a "current" index that you push data into, and then delete the older indices.
I've implemented this with my own Snowplow/ES pipeline and it works very well.
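For reference, the rotation step can be sketched with the indices aliases API. This is a minimal sketch, assuming a local ES at localhost:9200; the index names (snowplow-2016.04.01, snowplow-2016.05.01) and alias name (snowplow-current) are hypothetical placeholders, not taken from the thread:

```shell
# Point the "snowplow-current" alias at the newest time-based index in one
# atomic call; the sink always writes to the alias, never a concrete index.
cat > aliases.json <<'EOF'
{
  "actions": [
    { "remove": { "index": "snowplow-2016.04.01", "alias": "snowplow-current" } },
    { "add":    { "index": "snowplow-2016.05.01", "alias": "snowplow-current" } }
  ]
}
EOF

# Sanity-check the payload is well-formed JSON locally (assumes python3):
python3 -m json.tool aliases.json > /dev/null && echo "aliases JSON ok"

# Then apply it against the cluster (commented out here):
# curl -XPOST 'http://localhost:9200/_aliases' -d @aliases.json
```

Curator automates exactly this kind of alias swap plus deletion of expired indices on a schedule.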


#3

Hi @DK9,

The issue is that they no longer support defining the TTL in milliseconds. If you exchange "604800000" for something like "7d", it will work as expected.
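Concretely, a sketch of the original command with only the _ttl default changed (604800000 ms is 7 days, hence "7d"; endpoint and index name as in the first post):

```shell
# Same mapping as before, but _ttl.default uses a time unit instead of raw ms.
cat > snowplow-mapping.json <<'EOF'
{
  "settings": {
    "analysis": { "analyzer": { "default": { "type": "keyword" } } }
  },
  "mappings": {
    "enriched": {
      "_ttl": { "enabled": true, "default": "7d" },
      "properties": {
        "geo_location": { "type": "geo_point" }
      }
    }
  }
}
EOF

# Check the JSON is well-formed locally (assumes python3), then create the index:
python3 -m json.tool snowplow-mapping.json > /dev/null && echo "mapping JSON ok"
# curl -XPUT 'http://localhost:9200/snowplow' -d @snowplow-mapping.json
```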

Will need to get the documentation updated for this!

Cheers,
Josh


#4

@josh: The docs for ES 2.x could also be updated to use

    "properties": {
        "geo_location": {
            "type": "geo_point",
            "geohash": true
        }
    }

See: https://www.elastic.co/guide/en/elasticsearch/reference/1.3/mapping-geo-point-type.html

This makes visualization of geo data more efficient. Especially useful with Grafana :slight_smile:


#5

Have got this updated now! Thanks for the suggestion @christoph-buente.

@DK9 has that got the mapping creation all functional now?


#6

@josh The mapping works and Elasticsearch is receiving the data in real time perfectly.
Thank you very much :blush:


#7

Great to hear!