POST data from CloudFront Collector


#1

So I’m dealing with a large amount of raw CloudFront data on S3. I successfully ran it through EmrEtlRunner and then StorageLoader into Redshift. However, only a small portion of my data made it through, and I’m finding most of it in the “enrichment/bad” bucket with this error:

“Only GET operations supported for CloudFront Collector, not POST”

However, the line it’s referencing has Base64-encoded JSON data. I’m a bit confused: I’ve read that the CloudFront Collector shouldn’t accept POST requests, but the data is clearly there, so it must have worked somehow. Why won’t the EMR job accept it? Thanks.


#2

Hi @dyerw, can you share an example line with the Base64-encoded JSON data?


#3

Here’s a line from the “bad” folder: http://pastebin.com/raw/y2MiKE6C


#4

Hi @dyerw - the row you have shared is from the Clojure Collector - it contains this giveaway parameter:

&cv=clj-1.1.0-tom-0.2.0

So it seems like something has gone wrong somewhere in the configuration or setup of your pipeline…
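
If you want to check more rows from the bad bucket the same way, here is a rough Python sketch. It assumes you have already pulled the querystring out of the raw log line yourself; the function name `collector_from_querystring` is made up for illustration and is not part of any Snowplow tooling. It only looks for a `cv` value starting with `clj-`, as in the row above, and reports anything else as unknown.

```python
from urllib.parse import parse_qs


def collector_from_querystring(querystring: str) -> str:
    """Guess the collector from the `cv` parameter of a raw querystring.

    Hypothetical helper: it only recognises the `clj-` prefix seen in the
    example row and returns "unknown" for everything else.
    """
    # Drop any leading "?" or "&" so parse_qs sees clean key=value pairs.
    params = parse_qs(querystring.lstrip("?&"))
    # parse_qs returns a list of values per key; take the first, if any.
    cv = params.get("cv", [""])[0]
    if cv.startswith("clj-"):
        return "clojure"
    return "unknown"


if __name__ == "__main__":
    print(collector_from_querystring("&cv=clj-1.1.0-tom-0.2.0"))
```

A row that reports "clojure" here was not written by the CloudFront Collector, so the pipeline configuration needs to match.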


#5

Ah, it appears I had the collector format set to cloudfront in my config.yml. Changing it and trying again.


#6

So I’m getting a different error now for a bunch of the data:

Payload with vendor noonu and version tp2 not supported by this version of Scala Common Enrich"}],"failure_tstamp":"2016-05-16T16:37:36.873Z

So I guess my question now is: what is the vendor and how is it set?

I made a separate post for this because the initial issue was resolved:
http://discourse.snowplowanalytics.com/t/cant-enrich-custom-events-with-custom-schemas-repository/237