We’re facing some really weird behaviour.
It doesn’t occur all the time, but every once in a while the parameter &u= isn’t respected and the redirect goes to another URL.
I’ve checked the source code for the Clojure collector. Is it possible that it has some sort of cache in there?
(defn- send-redirect
  "If our params map contains `u`, 302 redirect to that URI,
  else return a 400 for Bad Request (malformed syntax)"
  [cookies headers params]
  (let [{url "u"} params]
    (if (nil? url)
      {:status 400
       :headers headers
       :cookies cookies}
      {:status 302
       :headers (merge headers {"Location" url})
       :cookies cookies
       :body ""})))
Please check two of those hits.
The u parameter that was sent and the Location in the response don’t match!
The Location URL is valid and could have been used by another event that was active at that moment and sent to the same collector endpoint.
Also, the exact same u parameter points to two different redirect locations.
In this particular case, an AU event got redirected to a DK campaign. It doesn’t make sense.
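One way to check hits like these mechanically is to compare the decoded `u` parameter of each request with the `Location` header that came back. The following is a minimal Python sketch of that comparison, not part of the collector; the host name, the `/r/tp2` path and the target URLs are placeholders made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def expected_redirect(request_url):
    """Return the decoded value of the `u` query parameter, or None if absent."""
    values = parse_qs(urlsplit(request_url).query).get("u")
    return values[0] if values else None


def redirect_matches(request_url, location):
    """True when the Location header equals the `u` parameter that was sent."""
    expected = expected_redirect(request_url)
    return expected is not None and expected == location


# Hypothetical collector host and target URLs, for illustration only.
hit = "https://collector.example.com/r/tp2?u=https%3A%2F%2Fshop.example.com%2Fau%2Foffer"
print(redirect_matches(hit, "https://shop.example.com/au/offer"))     # True
print(redirect_matches(hit, "https://shop.example.com/dk/campaign"))  # False
```

Running this over a log of request URLs and response headers would give a count of mismatched redirects instead of relying on spot checks.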
We used to have only 1 EC2 (m4.large) instance running, but some spikes in CPU and latency a few days ago triggered additional instances, and now we have 3 instances running until things stabilize.
I cloned the Beanstalk environment with the latest platform (so a new ELB, a new instance with a fresh Clojure collector install, and a new volume) and swapped URLs, but the exact same problem persists.
And I think it’s getting worse: we used to get 1 wrong redirect in about 20 requests, and now 1 or 2 in every 10 requests point to the wrong URL.
Mike, I couldn’t reproduce this condition on demand even if I wanted to. It’s odd behaviour.
We have around 850k daily events being tracked. Not that big a deal, I think.
The thing now is, we’ve got two Beanstalk environments running (old env and new env, each a single m4.large Clojure collector sitting behind an ELB), and I’ve swapped the old Beanstalk URL over to the new env. Yesterday the problem was happening in the new env but couldn’t be seen in the old one.
Because of complaints, we had to turn off tracking for almost all future events, so the traffic volume on the collector dropped considerably. I tested today and can’t reproduce the behaviour in either env, which leads me to believe that it may be related to the volume of requests and the handling of some cache…
I can PM you both Beanstalk URLs, if you like, so that you can see it for yourself.
So, we’ve set up a second collector and tried to split the traffic between them. Both collectors are m4.large instances.
I think a pattern can be found. Right now we’re seeing about 10 reqs/sec on one of the collectors and the redirects are getting scrambled.
Take a look. In 39 requests only 28 went to the correct location.
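For reference, those figures work out to 11 wrong redirects, or roughly 28% of the sample. A small Python sketch of the arithmetic (the function name is just for illustration):

```python
def wrong_redirect_rate(total, correct):
    """Fraction of requests whose redirect went to the wrong location."""
    return (total - correct) / total


total, correct = 39, 28
rate = wrong_redirect_rate(total, correct)
print(f"{total - correct} of {total} redirects were wrong ({rate:.1%})")
# prints: 11 of 39 redirects were wrong (28.2%)
```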
The test posted above shows what happens in both cases when the number of requests increases. With the current volume of requests things work without problems, but we had to reduce the number of tracked events by about 50%.
This does not depend on the number of requests. The error appears on some instances. You need to recreate the affected instance, or all of them. Run tests: if everything is fine, leave it; if not, recreate the instance once more.
Our workaround was to significantly reduce the number of tracked events; we are currently at around 500k daily without noticing the issue.
We would love to get back to 1M daily but can’t take any more risks with messing up user navigation.
We were very uncomfortable with this error, but switching to another system seemed too complicated.
Three weeks ago we found a Clojure programmer and switched traffic back to the old instance, but we could not reproduce the error. It no longer occurs; everything works well, as it did before February.
Now we are testing every day, and there are no errors.
Try to reproduce the situation that you had, so we can tell whether the problem still exists today.
Perhaps you can do this at a later time, or on weekends, or by using tools to simulate load on the server.
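As a rough idea of what such a load simulation could look like, here is a minimal Python sketch. It sends concurrent requests, each with a unique `u` value, so any redirect that comes back with someone else’s URL is easy to spot. The host name, the `/r/tp2` path, the use of HTTPS and the probe URL format are all assumptions for illustration; adjust them to the collector under test.

```python
import http.client
import uuid
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def fetch_location(host, path):
    """Issue one GET without following redirects; return the Location header."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def probe(host, fetch=fetch_location):
    """Send one request with a unique `u` value; return (expected, actual)."""
    # Hypothetical target URL; the UUID makes every probe distinguishable.
    expected = f"https://example.com/probe/{uuid.uuid4()}"
    # Assumed redirect endpoint path; change it if your collector differs.
    path = "/r/tp2?" + urlencode({"u": expected})
    return expected, fetch(host, path)


def run_probes(host, count=200, workers=20, fetch=fetch_location):
    """Fire `count` probes concurrently and return the mismatching pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: probe(host, fetch), range(count)))
    return [(exp, act) for exp, act in results if exp != act]


# Usage (hypothetical host name):
#   mismatches = run_probes("collector.example.com", count=500, workers=50)
#   print(len(mismatches), "mismatched redirects")
```

The `fetch` argument is there so the probe logic can be tried against a stub before pointing it at a real collector. Raising `count` and `workers` approximates a higher request rate from a single machine, which may or may not be enough to trigger the behaviour described above.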