Can the batch Elasticsearch target sign requests?


#1

I’m trying to enable the Elasticsearch sink on our batch pipeline to send the bad rows to our Elasticsearch cluster. Is it possible to enable AWS request signing?

There isn't anything obvious in the config file, but I can see that it has been enabled in the real-time pipeline.

Is it possible for the index to be parameterised?

At present we have an index per day, and it might make sense to add the bad rows to it.

Thanks
Gareth


#2

Hey @gareth,

Is it possible to enable AWS request signing?

The batch loading of bad_rows into Elasticsearch does not yet support this. Your best bet for the moment is to:

  1. Set up a private VPC subnet whose outbound traffic is routed through a NAT with an Elastic IP address
  2. Run your EMR cluster in this private subnet
  3. Whitelist the NAT's Elastic IP address on your Elasticsearch cluster

This means that all outbound network traffic from the EMR job appears to come from a single IP address, the NAT's Elastic IP. Whitelisting that one address then allows EMR to load into the Elasticsearch cluster.

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html
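For step 3, assuming the cluster is an Amazon Elasticsearch Service domain, the whitelist is typically expressed as an IP-based access policy using the `aws:SourceIp` condition key. The sketch below only builds that policy document; the region, account ID, domain name and IP address are hypothetical placeholders (203.0.113.0/24 is a documentation range), and you would substitute your NAT's Elastic IP before applying it through the console or, for example, `aws es update-elasticsearch-domain-config --access-policies`.

```python
import json


def build_ip_access_policy(region, account_id, domain_name, allowed_ips):
    """Build an access policy that allows all es:* actions, but only from allowed_ips.

    All argument values are hypothetical; pass your own region, account,
    domain name and the NAT's Elastic IP (as a CIDR, e.g. "x.x.x.x/32").
    """
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                "Resource": resource,
                # Only requests originating from these addresses are allowed,
                # i.e. traffic leaving the private subnet through the NAT.
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_ips)}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = build_ip_access_policy(
        "eu-west-1", "123456789012", "snowplow-bad-rows", ["203.0.113.10/32"]
    )
    print(json.dumps(policy, indent=2))
```

Because every EMR node egresses through the same NAT, a single /32 entry covers the whole cluster, regardless of how many nodes EMR spins up.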

Is it possible for the index to be parameterised?

If you are already creating an index per day, I would recommend setting up an index alias and putting the alias name in your config. You can then repoint this alias each time the daily index is created, and data will flow into the new index.

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
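A minimal sketch of the daily alias update, assuming you run it from a scheduled job after the new daily index exists. It uses the `_aliases` endpoint with a `remove` and an `add` action in one request, which Elasticsearch applies atomically, so the alias always points at exactly one index (needed for writes through the alias). The alias name, the `<prefix>-YYYY.MM.DD` index naming scheme and the cluster URL are hypothetical. The request is unsigned, so it assumes the caller is allowed in by IP, as in the NAT setup above.

```python
import json
import urllib.request
from datetime import date, timedelta


def daily_index_name(prefix, day):
    # Hypothetical naming scheme: <prefix>-YYYY.MM.DD
    return f"{prefix}-{day:%Y.%m.%d}"


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from old_index to new_index."""
    actions = []
    if old_index:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


def swap_alias(es_url, body):
    """Send the alias actions to the cluster (unsigned HTTP request)."""
    req = urllib.request.Request(
        f"{es_url.rstrip('/')}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    today = date.today()
    body = alias_swap_actions(
        "snowplow-bad-rows",  # hypothetical alias used in the pipeline config
        daily_index_name("bad-rows", today - timedelta(days=1)),
        daily_index_name("bad-rows", today),
    )
    print(json.dumps(body, indent=2))
    # swap_alias("http://your-cluster:9200", body)  # hypothetical URL
```

The pipeline config only ever references the alias, so nothing in Snowplow needs to change when the day rolls over.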


HTH,
Josh


#3

We'll implement the NAT but will probably give Snowplow its own index.

Thanks for your help, good advice!