Unstable Iglu server response times

medicinal-matt · October 11, 2021, 11:46am

Hi!

We tried to set up the Iglu Server based on the secure quick start example. However, we have noticed that API calls are sometimes extremely slow, and we suspect this might be the cause of the issue we’re seeing with the RDB Loader.

The health endpoint returns OK, but about 50% of API calls (including the health check itself) take a very long time to complete.

curl -kso /dev/null iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com/api/meta/health -w "==============\n\n
| dnslookup: %{time_namelookup}\n
| connect: %{time_connect}\n
| appconnect: %{time_appconnect}\n
| pretransfer: %{time_pretransfer}\n
| starttransfer: %{time_starttransfer}\n
| total: %{time_total}\n
| size: %{size_download}\n
| HTTPCode=%{http_code}\n\n"

| dnslookup: 0.001543
| connect: 75.232440
| appconnect: 0.000000
| pretransfer: 75.232495
| starttransfer: 75.275799
| total: 75.275937
| size: 2
| HTTPCode=200

time curl iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com/api/schemas/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0 -X GET -H "apikey: <READ KEY>"

0.01s user 0.01s system 0% cpu 1:15.38 total

A few moments earlier, the same call had a connect time of only 0.039454 seconds. Is this expected?

We have the server and database in two private subnets, and the load balancer in two public subnets of the same VPC. We tried upgrading the EC2 instance to t3.medium and the RDS instance to db.t3.medium, without success.
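To put a number on the "about 50%" figure, a small loop can repeat the same request and count how many were slow. This is a minimal sketch, not part of Iglu Server or the quick start: the script name `probe.sh`, the 20-run default, the 90-second cap and the 1-second "slow" threshold are all arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: repeat one request and count how many were "slow".
# The 90 s cap and 1 s threshold are illustrative assumptions.
set -euo pipefail

# Print "<connect> <total>" timings in seconds for a single request.
# curl normally still prints the -w line when a request fails,
# so "|| true" keeps the loop going instead of aborting.
probe_once() {
  curl -s -o /dev/null --max-time 90 \
    -w '%{time_connect} %{time_total}\n' "$1" || true
}

# Read "<connect> <total>" lines on stdin and report how many
# had a total time above THRESHOLD seconds.
summarise() {
  awk -v t="$1" '{ n++; if ($2 > t) slow++ }
    END { printf "%d/%d slow\n", slow, n }'
}

# Only touch the network when a URL is given: probe.sh URL [RUNS]
if [ "$#" -ge 1 ]; then
  for _ in $(seq "${2:-20}"); do
    probe_once "$1"
  done | tee /dev/stderr | summarise 1
fi
```

For example, `bash probe.sh "http://iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com/api/meta/health" 20` prints each run's timings to stderr and a summary such as `10/20 slow`. Printing the connect time next to the total makes it easy to see whether the slow runs are slow at the TCP connect stage, as in the health check output in this post.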

mike · October 12, 2021, 12:27am

That is an unusually long response time.

Is this for all endpoints or just a subset of endpoints?

I’d be tempted to work backwards along the connection path (load balancer, then EC2 instance, then RDS instance) to determine where the latency is being introduced, and to look through the CloudWatch metrics for these services.
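In the timings from the first post, almost all of the delay is in `time_connect`, the time until the TCP connection to the load balancer was established. One way to start working backwards is therefore to test each address the load balancer’s DNS name resolves to on its own, since a load balancer spread over several subnets usually resolves to more than one address. This is a hedged sketch: `probe_ips.sh` is a hypothetical name, and the script assumes plain HTTP on port 80 (as in the commands above) and that `dig` is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: time requests to each address the load balancer name
# resolves to, to see whether only some of them are slow.
# Assumes plain HTTP on port 80 and that `dig` is installed.
set -euo pipefail

# Keep only IPv4 addresses from `dig +short` output (drops any CNAME lines).
filter_ipv4() {
  grep -E '^[0-9]+(\.[0-9]+){3}$' || true
}

resolve_ips() {
  dig +short A "$1" | filter_ipv4
}

# Send the request to one specific address while keeping the original
# hostname, using curl's --resolve HOST:PORT:ADDRESS override.
probe_ip() {
  local host="$1" ip="$2" path="$3"
  curl -s -o /dev/null --max-time 90 --resolve "$host:80:$ip" \
    -w "$ip connect=%{time_connect} total=%{time_total} code=%{http_code}\n" \
    "http://$host$path" || true
}

# Usage: probe_ips.sh HOST [PATH]
if [ "$#" -ge 1 ]; then
  host="$1"
  path="${2:-/api/meta/health}"
  for ip in $(resolve_ips "$host"); do
    for _ in 1 2 3; do
      probe_ip "$host" "$ip" "$path"
    done
  done
fi
```

If one address is consistently slow while the others respond quickly, that would point at the network path to that particular load balancer node rather than at Iglu Server or RDS. If every address is intermittently slow, the next hops (EC2, then RDS) are the places to look.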

medicinal-matt · October 12, 2021, 2:49pm

It happens on all endpoints I’ve tried. What is strange is that it works every now and then; it doesn’t consistently take long.

I have now tried adding one subnet per availability zone, but there is no obvious improvement.

The CloudWatch metrics in the Monitoring tab for the load balancer, Iglu Server and Iglu database look calm.

Is there a VPC setup that is known to work? How many public and private subnets should there be, and across how many availability zones?

medicinal-matt · October 13, 2021, 7:11am

Or could the security groups be the problem?

iglu-server security group
Inbound

type: SSH, protocol: TCP, port: 22, source: 0.0.0.0/0
type: Custom TCP, protocol: TCP, port: 8080, source: sg-08ad553c562946650 / iglu-lb

Outbound

type: HTTPS, protocol: TCP, port: 443, destination: 0.0.0.0/0
type: HTTP, protocol: TCP, port: 80, destination: 0.0.0.0/0
type: PostgreSQL, protocol: TCP, port: 5432, destination: sg-0f0cc0f43ab50c185 / iglu-rds
type: Custom UDP, protocol: UDP, port: 123, destination: 0.0.0.0/0

iglu-lb security group
Inbound

type: HTTPS, protocol: TCP, port: 443, source: 0.0.0.0/0
type: HTTP, protocol: TCP, port: 80, source: 0.0.0.0/0

Outbound

type: Custom TCP, protocol: TCP, port: 8080, destination: sg-008224009d024b58a / iglu-server

mike · October 13, 2021, 7:31am

The public/private subnet split should be fine, as should any choice of availability zones. I’d dig further into CloudWatch: if those logs are being written out, there should be some indication of where the slow responses are coming from. The security groups shouldn’t have a material impact on response latency.

medicinal-matt · December 17, 2021, 3:33pm

I actually got a reply from AWS customer support on how to debug this, but it turns out the problem simply disappeared when we switched from the new VPC we had set up for this purpose to an older one someone else had set up.

I’m not entirely sure how the two VPCs and their surrounding settings differ, but at least this solves the issue for us.

mike · December 20, 2021, 12:03am

Yeah - that’s odd. Thanks for the update!

Related Topics

Topic	Category	Replies	Views	Activity
Snowplow-rdb-loader timing out	For engineers	2	620	March 5, 2020
Iglu Server responds with 502	For engineers	3	647	February 2, 2022
RDB Loader can hang for many hours	Troubleshooting	3	1343	September 22, 2017
S3 curl Error on rdb loader	For engineers	1	1854	April 20, 2021
Quick Start on GCP - Iglu Server instance group creation timeout	Troubleshooting	9	1292	June 17, 2023