JS tracker and scala stream collector


#1

I am having trouble getting the JS tracker to communicate with the Scala Stream Collector.

window.snowplow('newTracker', 'ssc', 'ec2-instance-public-dns-here.com', { // Initialise a tracker
  appId: "snowplowPOC",
  cookieDomain: "snowplowpoc_aws.com"
});

For ec2-instance-public-dns-here, I have tried the following:

  • Collector’s EC2 instance public DNS
  • Collector’s EC2 instance public IPv4 IP
  • Collector’s EC2 instance public DNS with port mentioned in the config file
  • Collector’s EC2 instance public IPv4 IP with port mentioned in the config file
  • Placed the collector behind a load balancer and used the load balancer’s DNS name

When I placed the collector EC2 instance behind a load balancer, I opened the collector’s security group to allow All Traffic inbound from anywhere.

I can ping both the EC2 instance and the ELB successfully from my local machine, but when I try curl http://url-to-elb-or-instance-here.com:port, the request times out.
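
One thing worth keeping in mind here: ping uses ICMP, while curl and the JS tracker use TCP, so a successful ping does not show that the collector port is reachable. Below is a minimal Node.js sketch of a TCP-level check. The names parseEndpoint and canConnect are hypothetical helpers, the default port of 80 is an assumption, and IPv6 literals are not handled.

```javascript
const net = require('net');

// Split "host" or "host:port" into parts.
// Assumption: the port defaults to 80 (plain HTTP) when none is given.
function parseEndpoint(endpoint) {
  const [host, port] = endpoint.split(':');
  return { host, port: port ? Number(port) : 80 };
}

// Resolve true if a TCP connection opens within timeoutMs, false otherwise.
function canConnect(endpoint, timeoutMs = 5000) {
  const { host, port } = parseEndpoint(endpoint);
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => { socket.destroy(); resolve(true); });
    socket.once('timeout', () => { socket.destroy(); resolve(false); });
    socket.once('error', () => resolve(false));
  });
}
```

Calling canConnect with the collector's host:port from different networks (office, home, another EC2 instance) would show where the TCP connection stops getting through.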

I have tried quite a few things mentioned on this community as well as other blogs and communities, but nothing seems to be working.

I have two questions:

  1. Since I am behind my company’s firewall, could that be preventing me from reaching the collector from my machine?
  2. Is there another way to test that data is flowing from my collector through enrich to my S3 bucket?

Any help is appreciated.
Thanks.


#2

Hi @dsouzabash,

Does your collector have any issues binding to the interface and port you specified in the configuration when you launch it?

I would suggest trying curl http://dns:port/health to check it’s running / accessible.
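
The same check can be scripted if that is more convenient. Below is a minimal Node.js sketch, assuming the collector serves /health over plain HTTP as in the curl command above. The names explainFailure and probeHealth are hypothetical, and the failure hints are heuristics rather than a definitive diagnosis.

```javascript
const http = require('http');

// Map a Node.js network error code to a likely cause (heuristics, not proof).
function explainFailure(code) {
  switch (code) {
    case 'ETIMEDOUT':
      return 'no response: packets are probably dropped by a firewall or security group';
    case 'ECONNREFUSED':
      return 'host reachable, but nothing is listening on that port';
    case 'ENOTFOUND':
      return 'DNS name could not be resolved';
    default:
      return 'unclassified failure: ' + code;
  }
}

// Resolve with { ok, status } on an HTTP response,
// or { ok: false, reason } when the request fails or times out.
function probeHealth(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the response body
      resolve({ ok: res.statusCode === 200, status: res.statusCode });
    });
    // The 'timeout' event does not abort the request by itself.
    req.on('timeout', () => {
      const err = new Error('timed out');
      err.code = 'ETIMEDOUT';
      req.destroy(err);
    });
    req.on('error', (err) => resolve({ ok: false, reason: explainFailure(err.code) }));
  });
}
```

For example, probeHealth('http://dns:port/health').then(console.log), with dns and port replaced by your collector's values, prints either the HTTP status or a hint about why the request failed.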

If you have set up the collector, enrich, and an S3 loader, you should be able to see your enriched events landing in S3. Otherwise, you can check the backend the collector sends raw events to (a Kinesis stream, or a Kafka or NSQ topic).


#3

Hi @BenFradet,
When I run curl localhost:8080/health from within the collector’s EC2 instance, it returns a 200 OK.
But when I run curl dns:port/health from outside the EC2 instance, the request times out, which means the collector is not accessible from outside.
I was wondering if I had to associate any SSL certificate with my ELB to be accessible from outside. But if the EC2 instance isn’t accessible, that’s a possible problem, right?
Thanks.


#4

This sounds like it could be a security group issue. What do the security groups look like for

  1. Your EC2 instance
  2. Your load balancer

#5

These are the security group inbound rules for my EC2 instance:


These are the security group inbound rules for my ELB:

I am just wondering if it is an issue with an external resource trying to access my EC2 instance’s port or ELB over the internet? Would I need to pass some SSL certificate between my website and the ELB to confirm that the data coming in is from a legitimate source?
I am creating my own dummy website using Elastic Beanstalk, and that website times out every time the JS tracker tries to reach the ELB or my instance:port.
But when I curl ip:port from my Elastic Beanstalk web app’s instance to the collector’s instance, I get a 200 OK response.
I am assuming this is not a Snowplow issue but more of an AWS resource access issue.
Help is appreciated :slight_smile:

Thanks.


#6

The problem was my office firewall. Trying the same from home without VPN connected worked fine :slight_smile:
Thanks for your support.