Getting incorrect user_ipaddress in input data header of Scala Stream Collector

Hi Team,

In the Scala Stream Collector output headers we are getting a subnet IP (usually the Docker inet IP of the machine on which the collector is hosted). Because of this, our ip_enrichment is not working as expected, since we are not getting the user_ipaddress.

Below is the collector config we are using:

collector {
  interface = "0.0.0.0"
  port = 8080

  p3p {
    policyRef = "/w3c/p3p.xml"
    CP = "NOI DSP COR NID PSA OUR IND COM NAV STA"
  }

  crossDomain {
    enabled = false
    domains = ["*"]
    secure = true
  }

  cookie {
    enabled = true
    expiration = "365 days"
    name = collectorCookieName
    domain = cookieDomain
  }

  cookieBounce {
    enabled = true
    name = n3pc
    fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000"
    forwardedProtocolHeader = "X-Forwarded-Proto"
  }

  doNotTrackCookie {
    enabled = false
    name = dnt
    value = 1
  }

  rootResponse {
    enabled = false
    statusCode = 200
    body = ok
  }

  redirectMacro {
    enabled = false
    placeholder = "[TOKEN]"
  }

  cors {
    accessControlMaxAge = 10 seconds
  }

  prometheusMetrics.enabled = false

  streams {
    good = <good>
    bad = <bad>

    useIpAddressAsPartitionKey = false

    sink {
      enabled = kinesis
      port = 4150
      region = <region>

      threadPoolSize = 10

      aws {
        accessKey = <key>
        secretKey = <secret>
      }

      backoffPolicy {
        minBackoff = 1000
        maxBackoff = 600000
      }

    }

    buffer {
      byteLimit = 4500000
      recordLimit = 500
      timeLimit = 600000
    }
  }
}

paths {
  "/com.acme/track"    = "/com.snowplowanalytics.snowplow/tp2"
  "/com.acme/redirect" = "/r/tp2"
  "/com.acme/iglu"     = "/com.snowplowanalytics.iglu/v1"
}
akka {
  loglevel = ERROR
  loggers = ["akka.event.slf4j.Slf4jLogger"]

  http.server {
    remote-address-header = on
    raw-request-uri-header = on
    parsing {
      max-uri-length = 32768
      uri-parsing-mode = relaxed
    }
  }
}

Kindly help me with this, as one of our major use cases depends on this functionality.

Let me know if any other information is required from my end.

Hi @BenB,

Hope you are doing well!

Can anyone from the team help here?

Regards
Karan

Can someone help here?

Hi @karan,

Are you saying that whenever a tracker sends events to the collector, user_ipaddress is always the IP of the machine where the collector runs, and never the IP of the device running the tracker?

Your configuration looks correct, so the problem might come from the networking where your collector runs. How is your collector launched? Do you have load balancers?

@BenB Thanks for the reply

Yes, we are getting the IP of the collector machine as the user_ipaddress.

We are launching the collector using the official Docker images, and we have tested it both with and without a load balancer. In both cases we get an incorrect user_ipaddress.
With the load balancer we get the load balancer's subnet IPs, and without it we get the collector machine's inet IP.

Also below is the event we are getting from the tracker. I am not able to see the IP address in the tracker request. Does it look correct to you?

(Connection,close)
(User-Agent,okhttp/3.14.7)
(Host,127.0.0.1:8080)
(Accept-Encoding,gzip)
(Content-Length,4866)
(Content-Type,application/json; charset=UTF-8)
{"schema":"iglu:com.snowplowanalytics.snowplow\/payload_data\/jsonschema\/1-0-4","data":[{"eid":"cc-a65a-0ddc7becdfb8","tv":"andr-1.3.0","e":"ue","tna":"<ourAppName>","tz":"Asia\/Kolkata","stm":"1606568002029","p":"mob","uid":"<userId>","cx":"<ourData>","ue_px":"<ourData>","dtm":"1606567999138","lang":"English","aid":"student"},{"eid":"35-b86a-028548179f","tv":"andr-1.3.0","e":"se","tna":"<>","tz":"Asia\/Kolkata","se_ca":"<ourData>","se_ac":"<ourData>","stm":"1606568002029","p":"mob","uid":"<ourData>","cx":"<ourData>","dtm":"1606567998687","lang":"English","aid":"student"},{"eid":"a0af-a1e9-d58a13b","tv":"andr-1.3.0","e":"se","tna":"<ourAppName>","tz":"Asia\/Kolkata","se_ca":"<ourData>","se_ac":"<ourData>","stm":"1606568002029","p":"mob","uid":"<ourData>","cx":"ourData","se_va":"1.0","dtm":"1606567998602","lang":"English","aid":"student"}]}

Also below is the event we are getting from the tracker.

Where/how did you get this data?

I’m a bit surprised to see (Host,127.0.0.1:8080). Where is your tracker? Is it sending events from the same machine where the collector runs?

@BenB,

To get this data, we developed a simple web service and intercepted the requests coming from the tracker. Below are the attributes we are reading from the tracker request.

  post("/com.snowplowanalytics.snowplow/tp2"){
    //println("------> Cookies <----------")
    //request.getCookies.foreach(println)
    println("------> RemoteUser <----------")
    println(request.getRemoteUser)
    println("------> X-Forwarded-For <----------")
    println(request.getHeader("X-Forwarded-For"))
    println("------> All headers <----------")
    request.headers.foreach(println)
    println("------> body <----------")
    println(request.body)
    println("------> RemoteAddress <----------")
    println(request.getRemoteAddr)
    println("------> Proxy-Client-IP <----------")
    println(request.getHeader("Proxy-Client-IP"))
    println("------> WL-Proxy-Client-IP <----------")
    println(request.getHeader("WL-Proxy-Client-IP"))
    println("------> HTTP_CLIENT_IP <----------")
    println(request.getHeader("HTTP_CLIENT_IP"))
    println("------> HTTP_X_FORWARDED_FOR <----------")
    println(request.getHeader("HTTP_X_FORWARDED_FOR"))
  }

The tracker is running on the Android side and is hosted on an EC2 machine. Also, the tracker and the collector are running on different machines.
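
The header dump earlier in the thread lists no X-Forwarded-For entry, so anything that derives the client IP from that request can only fall back to the socket peer address. Below is a minimal sketch of that fallback, assuming the common convention of taking the left-most X-Forwarded-For entry. `ClientIp.resolve` and the sample addresses are hypothetical, and this is not the collector's actual code:

```scala
object ClientIp {
  // Hypothetical helper, not the collector's implementation.
  // `headers` are (name, value) pairs, as printed by request.headers.foreach(println);
  // `remoteAddr` is the socket peer address, as returned by request.getRemoteAddr.
  def resolve(headers: Seq[(String, String)], remoteAddr: String): String =
    headers
      .collectFirst { case (name, value) if name.equalsIgnoreCase("X-Forwarded-For") => value }
      // The header may hold a chain "client, proxy1, proxy2"; take the left-most non-empty entry.
      .flatMap(_.split(",").map(_.trim).find(_.nonEmpty))
      // No usable header: only the peer address (proxy, load balancer or Docker bridge) is known.
      .getOrElse(remoteAddr)

  def main(args: Array[String]): Unit = {
    // Subset of the headers captured in the thread; 172.17.0.1 is an assumed Docker bridge address.
    val captured = Seq("Connection" -> "close", "User-Agent" -> "okhttp/3.14.7", "Host" -> "127.0.0.1:8080")
    println(resolve(captured, "172.17.0.1")) // prints 172.17.0.1

    // Same request after passing through a proxy that sets X-Forwarded-For.
    val proxied = captured :+ ("X-Forwarded-For" -> "203.0.113.7, 10.0.0.5")
    println(resolve(proxied, "172.17.0.1")) // prints 203.0.113.7
  }
}
```

If the first case matches what you see, the client IP is being lost before the request reaches the service, which points at the network path rather than the collector config.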

Hi @karan ,

Sorry for taking some time to get back to you on this.

The issue might come from your collector setup. Where/how is the collector started?

Hi,

I’m seeing a similar problem where the reported IP address is the collector IP address, not the users’ IP address, and I’m looking for a solution.

Question for @karan: Are you running https or http?

Hi @timolin ,

Using http or https should not have any impact on the IP address that the collector receives.

What’s your collector setup? Where/how is it run?

Thanks @BenB - I think I got to that conclusion, too.

I’m running my collector on a Kubernetes cluster in AWS. What I find is that the reported IP addresses are just those of the internal VMs that make up the cluster; perhaps the VMs are forwarding the traffic to the pod that is running the collector.

Any suggestions? I’ve tried setting externalTrafficPolicy to Local, but that didn’t help.

Hi @timolin ,

I think that the first thing to determine is whether the user’s IP address makes it to the collector in the HTTP request or not.

For that, could you activate the HTTP header extractor enrichment with this config:


{
	"schema": "iglu:com.snowplowanalytics.snowplow.enrichments/http_header_extractor_config/jsonschema/1-0-0",
	"data": {
		"vendor": "com.snowplowanalytics.snowplow.enrichments",
		"name": "http_header_extractor_config",
		"enabled": true,
		"parameters": {
			"headersPattern": ".*"
		}
	}
}

This will add one context (with this schema) per HTTP header to the enriched event and it will be possible to see if the IP appears in one of them (e.g. X-Forwarded-For).
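
To keep the output smaller than with ".*", the pattern can be narrowed to the forwarding headers only. Below is a minimal sketch of how such a pattern selects headers, assuming it is applied as a regular expression that must match the whole header name. `HeaderPattern.select` is a hypothetical helper, not the enrichment's code:

```scala
object HeaderPattern {
  // Hypothetical helper: keeps the headers whose name fully matches the pattern.
  // Assumption: the pattern is tested against the header name only, not the value.
  def select(headers: Seq[(String, String)], headersPattern: String): Seq[(String, String)] =
    headers.filter { case (name, _) => name.matches(headersPattern) }

  def main(args: Array[String]): Unit = {
    // Sample headers; the addresses are documentation-range placeholders.
    val headers = Seq(
      "Host"            -> "collector.example.com",
      "X-Forwarded-For" -> "203.0.113.7",
      "X-Real-IP"       -> "203.0.113.7"
    )
    // ".*" keeps every header.
    println(select(headers, ".*").map(_._1))
    // A case-insensitive pattern limited to the two forwarding headers.
    println(select(headers, "(?i)(x-forwarded-for|x-real-ip)").map(_._1))
  }
}
```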

Hi @BenB - Sorry for the radio silence. I had to step away from this problem for a while.

I have a solution to this problem, but not through the classic load balancer. Instead, I eventually configured an Ingress service that starts up an ALB. This retains the client’s (browser’s) IP address.

-Tim

I was having the same issue, but I was running nginx on my host machine.

I had to add this config:

server {
  server_name example.com;

  location / {
    proxy_pass http://localhost:8080;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
  ...
}
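
For reference, nginx documents $proxy_add_x_forwarded_for as the incoming X-Forwarded-For value with $remote_addr appended after a comma, or just $remote_addr when the request carries no such header. Below is a small sketch of that rule; `ForwardedFor.append` and the sample addresses are hypothetical names used only for illustration:

```scala
object ForwardedFor {
  // Mirrors the documented behaviour of $proxy_add_x_forwarded_for:
  // append the peer address to an existing chain, or start a new chain with it.
  def append(existing: Option[String], remoteAddr: String): String =
    existing.map(_.trim).filter(_.nonEmpty) match {
      case Some(chain) => s"$chain, $remoteAddr"
      case None        => remoteAddr
    }

  def main(args: Array[String]): Unit = {
    // Client connects to nginx directly: the header is just the client address.
    println(append(None, "203.0.113.7")) // prints 203.0.113.7
    // Client came through another proxy first: nginx appends that proxy's address.
    println(append(Some("203.0.113.7"), "10.0.0.5")) // prints 203.0.113.7, 10.0.0.5
  }
}
```

Either way, the original client address stays as the left-most entry of the header that nginx passes on to the collector.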