I am setting up the pipeline for the first time on AWS and would really appreciate any help filling in what is missing from the setup guide. I promise that once I have this running, I'll post my detailed setup here as a reference!
I am trying to get the Scala collector set up on Elastic Beanstalk. From my understanding this should be fairly straightforward. I used a few very helpful support topics and the official setup guide for the Clojure collector as a reference.
However, I am not able to get the collector running. Here are some steps I followed:
- I tested it on my local dev machine (127.0.0.1, port 8080) with the sink set to stdout. This worked.
- I created an Elastic Beanstalk (EB) environment of the web server type, with Java as the platform and ELB enabled.
- I uploaded a zip file containing the Procfile, the collector jar and the config file above. For testing, I kept the config settings the same as for local testing (i.e. interface: 127.0.0.1, port: 8080, sink: stdout). The Procfile contents were:
web: ./snowplow-stream-collector --config test.conf
- Note: I will eventually set this up with Kinesis, but I wanted to test it with stdout first. Could this be an issue as well?
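For anyone reproducing the steps above, the source bundle can be built with a short script instead of by hand. This is a minimal sketch using Python's standard `zipfile` module; the file names in `BUNDLE_FILES` are assumptions matching the Procfile above, so adjust them to your own jar and config. It writes every file at the root of the archive, because Elastic Beanstalk expects the Procfile at the top level of the source bundle rather than inside a folder.

```python
import zipfile
from pathlib import Path

# Hypothetical file names, taken from the Procfile above; adjust as needed.
BUNDLE_FILES = ["Procfile", "snowplow-stream-collector", "test.conf"]


def build_bundle(src_dir: str, out_zip: str, files=BUNDLE_FILES) -> list:
    """Zip the given files from src_dir into out_zip, all at the archive root.

    Returns the list of entry names written to the archive.
    Raises FileNotFoundError if any listed file is missing.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = src / name
            if not path.is_file():
                raise FileNotFoundError(f"missing bundle file: {path}")
            # arcname=name stores the file under its bare name (no folders),
            # so the Procfile ends up at the root of the bundle.
            zf.write(path, arcname=name)
    # Re-open the archive to report what was actually written.
    with zipfile.ZipFile(out_zip) as zf:
        return zf.namelist()
```

Usage: run `build_bundle(".", "collector-bundle.zip")` from the folder holding the three files, then upload `collector-bundle.zip` as the application version.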
The EB environment starts up successfully. Accessing the EB URL returns an nginx 502 error, but I assume that comes from the nginx server that EB starts automatically on port 80 as part of its web server configuration. However, connections to <eb_url>:8080 are refused outright. I cannot telnet to or stat this port on the server at all.
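To tell the common failure modes apart from a client machine, a small probe like the one below can stand in for telnet. It is a sketch using only Python's `socket` module, and the function name `probe` is made up for illustration. A "refused" result means the host answered but nothing was listening on that address and port; a "timeout" typically means packets were dropped silently, which is how an AWS security group without a matching inbound rule behaves.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection to host:port and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: reachable, but no listener there.
        return "refused"
    except socket.timeout:
        # No reply at all: often a firewall or security group dropping packets.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Usage: `probe("<eb_url>", 8080)` from your own machine, and `probe("127.0.0.1", 8080)` from the instance itself, narrow down whether the problem is the network path or the process.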
I logged into the EC2 instance being used and checked the running processes. I can see the line
java -jar ./snowplow-stream-collector --config ih.conf
so the process is running. But I'm not sure why it is inaccessible. Is it a firewall issue (the EB environment is not inside a VPC)?
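Besides a firewall, one other thing that may be worth ruling out (my assumption, not something confirmed in this thread): the config uploaded in the steps above still uses interface 127.0.0.1 from local testing, and a server bound to a loopback address only accepts connections that originate on the instance itself. The hypothetical helper below uses Python's `ipaddress` module to classify a listen address; on the instance, `netstat -tln` shows the local address each listening socket is actually bound to.

```python
import ipaddress


def bind_scope(interface: str) -> str:
    """Classify a listen address taken from a collector config file.

    - "local-only": loopback (127.0.0.1, ::1), same-machine clients only
    - "all-interfaces": unspecified (0.0.0.0, ::), every interface on the host
    - "single-address": any other address, that one interface only
    """
    addr = ipaddress.ip_address(interface)
    if addr.is_loopback:
        return "local-only"
    if addr.is_unspecified:
        return "all-interfaces"
    return "single-address"
```

For example, `bind_scope("127.0.0.1")` gives "local-only", while `bind_scope("0.0.0.0")` gives "all-interfaces".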
Would appreciate any help…thanks all!
UPDATE: While trying to figure this out, I tried the above with two Kinesis streams (good and bad) and got the same result, i.e. it worked when testing locally but not in EB. Thanks!
UPDATE 2: OK, after some more debugging, it seems the AWS setup was partly at fault. EB created a default security group for this application which, of course, did not allow inbound traffic on 8080! So I added an inbound rule for 8080, and I was able to access the server! Weirdly, though, I could not use the EB URL to do so; I had to use the EC2 IP address (or Amazon public DNS) instead. That doesn't sound right to me… Also, I would eventually like to run the collector on port 80. How can I do that in this application and override the nginx process that runs on it by default? Thanks a million!
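The inbound rule mentioned in this update can also be added from code. The sketch below only builds one entry of the `IpPermissions` list that boto3's EC2 `authorize_security_group_ingress` call accepts; `tcp_ingress_rule` is a hypothetical helper, the security group id in the comment is a placeholder, and `0.0.0.0/0` opens the port to everyone, so narrow the CIDR if the collector should only be reachable from known addresses.

```python
import ipaddress


def tcp_ingress_rule(port: int, cidr: str = "0.0.0.0/0") -> dict:
    """Build one IpPermissions entry allowing inbound TCP on a single port."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    # Raises ValueError for malformed CIDRs or ones with host bits set.
    ipaddress.ip_network(cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


# Example call (needs boto3 and AWS credentials; the group id is a placeholder):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(
#     GroupId="sg-xxxxxxxx",
#     IpPermissions=[tcp_ingress_rule(8080)],
# )
```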
UPDATE 2 (continued): After a few retries, I can now use the EB URL (e.g. *.us-east-1.elasticbeanstalk.com) to reach the box on port 8080. But as above, if I set the port in the Scala collector's config to 80, it doesn't work, since the box's default web server (nginx) has already taken that port. How can I disable nginx so I can run the Scala collector on the standard port 80?
Hope someone can help, as it really can't be all that difficult.
UPDATE 3: Sorry for using the forum as a rather verbose running log of my activities, but I am hoping the resolution will help others struggling at this stage as well. Leaving the previous EB environment as is, I tried the same approach with an ELB/ASG built into the EB environment (since the Snowplow team highly recommends this anyhow). Now I can no longer reach port 8080 from outside. I can still use the EC2 box's IP and get OK on port 80, but that defeats the purpose of the ELB. So it's a bit of a catch-22: I can work with 8080, but the ELB doesn't let me connect to it, and if I switch to port 80 (ideal) or possibly HTTPS (443), the default nginx takes over the requests.
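One possible way around the catch-22, assuming the environment uses a Classic Load Balancer, is to let the load balancer listen on port 80 and forward to port 8080 on the instances, so nginx on instance port 80 is never in the request path. Elastic Beanstalk exposes this through the `aws:elb:listener:<port>` option namespace (`ListenerProtocol`, `InstancePort`, `InstanceProtocol`). The helper below is a hypothetical sketch that only builds the `OptionSettings` list; the environment name in the comment is a placeholder, and the instances' security group still has to allow 8080 from the load balancer.

```python
def elb_listener_settings(listener_port: int = 80, instance_port: int = 8080) -> list:
    """Build Elastic Beanstalk OptionSettings for a Classic Load Balancer
    listener that accepts HTTP on listener_port and forwards it to
    instance_port on each instance."""
    for p in (listener_port, instance_port):
        if not 0 < p < 65536:
            raise ValueError(f"invalid port: {p}")
    namespace = f"aws:elb:listener:{listener_port}"
    # Option values are passed to the API as strings.
    return [
        {"Namespace": namespace, "OptionName": "ListenerProtocol", "Value": "HTTP"},
        {"Namespace": namespace, "OptionName": "InstancePort", "Value": str(instance_port)},
        {"Namespace": namespace, "OptionName": "InstanceProtocol", "Value": "HTTP"},
    ]


# Example call (needs boto3 and AWS credentials; the name is a placeholder):
# import boto3
# eb = boto3.client("elasticbeanstalk")
# eb.update_environment(
#     EnvironmentName="my-collector-env",
#     OptionSettings=elb_listener_settings(80, 8080),
# )
```

With a mapping like this, clients would talk to the EB URL on port 80 while the collector keeps listening on 8080, so, if it works for your setup, nginx would not need to be disabled at all.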
Would appreciate any help or guidance…thanks very much!