Processing apache or nginx logs from non-collectors


#1

Hi there,

We have set up the regular Snowplow batch pipeline with a collector on Elastic Beanstalk.

The business has a need to identify certain metrics from our webserver access logs (not the collector). Example: How many requests came from GoogleBot?

Does Snowplow support a way to process existing Apache or NGINX log files that didn’t come through the collector?

Thanks
Enrico


#2

Hi @estahn - it’s a nice idea but it’s not currently supported.

We support processing CloudFront access logs, but not Apache or Nginx logs.


#3

I imagine it’s possible to convert nginx/Apache logs to a pseudo CloudFront format and parse that? (with placeholder values for fields that nginx/Apache doesn’t have).
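
A rough sketch of what that conversion could look like, in Python. This is untested against Snowplow itself: it assumes the input is in the Apache/nginx "combined" log format, and it only emits what I understand to be the first twelve columns of the CloudFront web distribution access log (date, time, x-edge-location, sc-bytes, c-ip, cs-method, cs(Host), cs-uri-stem, sc-status, cs(Referer), cs(User-Agent), cs-uri-query), tab-separated, with `-` as the placeholder. The function name `to_pseudo_cloudfront` and the `host` parameter are made up for illustration. Real CloudFront logs carry more columns and header lines, so check the column list against the AWS documentation before relying on it.

```python
import re
from datetime import datetime, timezone
from urllib.parse import quote

# Apache/nginx "combined" format:
# ip ident user [timestamp] "METHOD target PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_pseudo_cloudfront(line, host="-"):
    """Map one combined-format line to a tab-separated, CloudFront-like line.

    Returns None when the line does not match the combined format.
    Fields the web server log does not have are filled with "-".
    """
    m = COMBINED.match(line)
    if m is None:
        return None

    # Normalise the timestamp to UTC before splitting it into date and time.
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)
    stem, _, query = m["target"].partition("?")

    fields = [
        ts.strftime("%Y-%m-%d"),                 # date
        ts.strftime("%H:%M:%S"),                 # time
        "-",                                     # x-edge-location (placeholder)
        m["size"] if m["size"] != "-" else "0",  # sc-bytes
        m["ip"],                                 # c-ip
        m["method"],                             # cs-method
        host,                                    # cs(Host) (not in combined logs)
        stem,                                    # cs-uri-stem
        m["status"],                             # sc-status
        m["referer"] or "-",                     # cs(Referer)
        quote(m["agent"], safe="") or "-",       # cs(User-Agent), percent-encoded
        query or "-",                            # cs-uri-query
    ]
    return "\t".join(fields)
```

You would run each access log line through this and write the non-`None` results to a file. Whether the batch pipeline accepts the result is something I haven't verified.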


#4

Apache or Nginx was just an example of a custom log format. In fact, we
collect CloudFront logs from all our domains. My understanding was that
those have to come through a special pixel. Is this not the case? Can I
process arbitrary CloudFront logs with Snowplow?

Enrico


#5

Hi @estahn - yes, it’s a bit hard to find in the documentation but you can process CloudFront access logs in the Snowplow batch pipeline by setting this input format in the config.yml: