I am attempting to write bad rows from Hadoop to Elasticsearch, as suggested in https://github.com/snowplow/snowplow/wiki/Common-configuration#elasticsearch. I am not using the AWS Elasticsearch service but a self-managed Elasticsearch cluster on EC2. I can write to it successfully via curl from the instance it runs on. However, I am again running into what looks like a proxy issue when trying to write to it from a step inside the EMR cluster. (I assume it’s a proxy issue because the step never completes and just hangs, and every other step of the Snowplow pipeline has had to go through a proxy in my setup.)
My question is: at what point exactly does the EMR runner make the HTTP request to Elasticsearch to index these bad rows? I’m trying to pinpoint it so I can test adding the proxy there, but it’s not clear from either the runner lib or the Scala source.