I’m trying to enable the Elasticsearch sink on our batch pipeline to send the bad rows to our Elasticsearch cluster. Is it possible to enable AWS request signing?
There isn’t anything obvious in the config file; however, I can see that it has been enabled in the real-time pipeline.
Is it possible for the index to be parameterised?
At present we have an index per day, and it might make sense to add the rows there.
The batch loading of bad_rows into Elasticsearch does not yet support this. Your best bet for the moment is to:
Set up a private VPC subnet
Run EMR out of this private subnet
Whitelist the NAT Elastic IP Address on your Elasticsearch Cluster
This means that all outbound network traffic from the EMR process appears to come from one IP address, that of the NAT, which then allows you to load data into the Elasticsearch cluster.
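To illustrate the whitelisting step, here is a minimal sketch that builds an IP-based access policy document. It assumes the cluster is an Amazon Elasticsearch Service domain; the domain ARN and the NAT IP address are placeholders, and the function name is made up for this example. It only builds and prints the JSON, so you would still need to apply it to the domain yourself.

```python
import ipaddress
import json


def build_ip_access_policy(domain_arn, nat_ip):
    """Build an access policy that allows HTTP calls to the domain from one IPv4 address.

    domain_arn and nat_ip are placeholders supplied by the caller.
    """
    # Raises ValueError if nat_ip is not a valid IPv4 address.
    ip = ipaddress.IPv4Address(nat_ip)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                # Only requests whose source address is the NAT Elastic IP are allowed.
                "Condition": {"IpAddress": {"aws:SourceIp": [f"{ip}/32"]}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = build_ip_access_policy(
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",
        "203.0.113.10",
    )
    print(json.dumps(policy, indent=2))
```

Because the policy matches on source IP rather than on a signed identity, anything routed through the same NAT can reach the cluster, so it is worth keeping that subnet limited to the EMR jobs.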
If you are already creating an index per day I would recommend setting up an index alias to put in your config. You can then update this alias when the daily index is created and data will flow to the new index.
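As a rough sketch of the alias update, the snippet below builds the request body for Elasticsearch's `_aliases` endpoint, which applies the remove and add actions together. The alias name, the index prefix and the `prefix-YYYY.MM.DD` naming scheme are assumptions for illustration; substitute whatever your daily indices are actually called.

```python
import json
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Return a daily index name such as 'prefix-2017.03.02' (assumed naming scheme)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def build_alias_swap(alias, new_index, old_index=None):
    """Build an _aliases request body that points `alias` at `new_index`.

    If old_index is given, the alias is removed from it in the same request.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical alias and prefix; the alias is what would go in the config.
    today = date.today()
    body = build_alias_swap(
        alias="bad-rows",
        new_index=daily_index_name("snowplow", today),
        old_index=daily_index_name("snowplow", today - timedelta(days=1)),
    )
    # POST this JSON to <your-cluster>/_aliases once the new daily index exists.
    print(json.dumps(body, indent=2))
```

Running this from whatever job creates the daily index keeps the alias in step with the newest index, so the config never needs to change.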