Hi all,
Currently we have the “simple” version of the GCP open source setup running on instance templates. What we want to do now is create a Kubernetes setup on which all the Docker images run.
I have successfully run the scala-stream-collector as a Docker image on a Kubernetes cluster on GCP (GKE), but I am struggling with the Beam Enrich deployment.
This is my Deployment YAML for enrichment:
---
apiVersion: "v1"
kind: "Namespace"
metadata:
  name: "default"
---
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "beam-enrich-dm"
  namespace: "default"
  labels:
    app: "beam-enrich-dm"
spec:
  progressDeadlineSeconds: 1200
  replicas: 2
  selector:
    matchLabels:
      app: "beam-enrich-dm"
  template:
    metadata:
      labels:
        app: "beam-enrich-dm"
    spec:
      containers:
      - name: "beam-enrich-dm"
        image: "docker.io/snowplow/beam-enrich:1.3.1"
        args:
        - "--runner=DataflowRunner"
        - "--streaming=true"
        - "--project=XXX"
        - "--zone=europe-west1-d"
        - "--gcpTempLocation=gs://my-bucket/"
        - "--job-name=beam-enrich"
        - "--raw=projects/XXX/subscriptions/good-sub"
        - "--enriched=projects/XXX/topics/enriched-good"
        - "--bad=projects/XXX/topics/enriched-bad"
        - "--pii=projects/XXX/topics/pii-topic"
        - "--enrichments=/snowplow/enrichments/"
        - "--resolver=/snowplow/resolver/resolver.json"
        - "--workerMachineType=n1-standard-1"
        - "--diskSizeGb=30"
        - "--serviceAccount=myserviceaccount@..."
        env:
        volumeMounts:
        - name: enrichments
          mountPath: /snowplow/enrichments
        - name: resolver
          mountPath: /snowplow/resolver
      volumes:
      - name: resolver
        configMap:
          name: resolver
      - name: enrichments
        configMap:
          name: enrichments
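Apache Beam's `PipelineOptionsFactory` documents command-line options in the `--name=value` form, so each flag and its value should be a single container argument rather than two separate list items. As a minimal sketch, assuming you keep the options in a Python dict when templating the manifest, a hypothetical helper `to_beam_args` can render them in that form:

```python
from typing import List, Mapping


def to_beam_args(options: Mapping[str, object]) -> List[str]:
    """Render pipeline options as single '--name=value' arguments.

    Hypothetical helper for templating the container `args` list.
    Booleans are lower-cased so True is rendered as 'true'.
    """
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()
        args.append(f"--{name}={value}")
    return args


if __name__ == "__main__":
    # Illustrative subset of the options from the Deployment above.
    opts = {
        "runner": "DataflowRunner",
        "streaming": True,
        "project": "XXX",
        "raw": "projects/XXX/subscriptions/good-sub",
        "diskSizeGb": 30,
    }
    for arg in to_beam_args(opts):
        print(arg)
```

Keeping flag and value in one string also avoids the quoting mistakes that are easy to make in a long one-line flow-style YAML list.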
These are my ConfigMaps:
enrichments.yaml:
apiVersion: v1
data:
  anon_ip.json: |
    {
      "schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1",
      "data": {
        "name": "anon_ip",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
          "anonOctets": 2,
          "anonSegments": 1
        }
      }
    }
  pii_pseudo.json: |
    {
      "schema": "iglu:com.snowplowanalytics.snowplow.enrichments\/pii_enrichment_config\/jsonschema\/2-0-0",
      "data": {
        "vendor": "com.snowplowanalytics.snowplow.enrichments",
        "name": "pii_enrichment_config",
        "emitEvent": true,
        "enabled": true,
        "parameters": {
          "pii": [
            {
              "pojo": {
                "field": "user_id"
              }
            }
          ],
          "strategy": {
            "pseudonymize": {
              "hashFunction": "XXX",
              "salt": "XXXX"
            }
          }
        }
      }
    }
kind: ConfigMap
metadata:
  name: enrichments
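Because the enrichment configs are embedded as strings inside the ConfigMap, a JSON typo is easy to miss while editing YAML. A minimal sketch, assuming you already have the ConfigMap's `data` mapping as a Python dict (for example after loading the YAML with a parser of your choice); `check_enrichments` is a hypothetical name, and it only checks that each value parses and has top-level `schema` and `data` keys, not that it validates against its Iglu schema:

```python
import json
from typing import Dict, List


def check_enrichments(data: Dict[str, str]) -> List[str]:
    """Return a list of problems found in ConfigMap enrichment entries.

    Each value is expected to be a self-describing JSON document with
    top-level "schema" and "data" keys.
    """
    problems = []
    for filename, text in data.items():
        try:
            doc = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{filename}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(doc, dict):
            problems.append(f"{filename}: top level is not an object")
            continue
        for key in ("schema", "data"):
            if key not in doc:
                problems.append(f"{filename}: missing '{key}'")
    return problems


if __name__ == "__main__":
    sample = {
        "anon_ip.json": '{"schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1", "data": {}}',
        "broken.json": '{"schema": "x",}',  # trailing comma: invalid JSON
    }
    for problem in check_enrichments(sample):
        print(problem)
```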
resolver.yaml:
kind: ConfigMap
metadata:
  name: resolver
apiVersion: v1
data:
  resolver.json: |-
    {
      "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
      "data": {
        "cacheSize": 500,
        "repositories": [
          {
            "name": "Iglu Central",
            "priority": 0,
            "vendorPrefixes": [ "com.snowplowanalytics" ],
            "connection": {
              "http": {
                "uri": "http://iglucentral.com"
              }
            }
          },
          {
            "name": "Iglu Central - GCP Mirror",
            "priority": 1,
            "vendorPrefixes": [ "com.snowplowanalytics" ],
            "connection": {
              "http": {
                "uri": "http://mirror01.iglucentral.com"
              }
            }
          }
        ]
      }
    }
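When a ConfigMap is mounted as a volume without an `items` list, Kubernetes exposes each key under `data` as a file named after that key inside `mountPath`. The `resolver` ConfigMap's only key is `resolver.json`, so it appears in the container as `/snowplow/resolver/resolver.json`, and that is the path the `--resolver` option has to point at. A minimal sketch of that key-to-file mapping, with hypothetical helper names, for checking a path before deploying:

```python
import posixpath
from typing import Iterable, Set


def projected_files(mount_path: str, keys: Iterable[str]) -> Set[str]:
    """Paths a ConfigMap volume exposes when mounted without `items`.

    Each key under the ConfigMap's `data` becomes one file named after
    the key, directly inside the mount path.
    """
    return {posixpath.normpath(posixpath.join(mount_path, key)) for key in keys}


def path_is_mounted(path: str, mount_path: str, keys: Iterable[str]) -> bool:
    """True if `path` is one of the files projected from the ConfigMap."""
    return posixpath.normpath(path) in projected_files(mount_path, keys)


if __name__ == "__main__":
    keys = ["resolver.json"]  # the only key in the resolver ConfigMap
    for candidate in ("/snowplow/resolver/resolver.json",
                      "/snowplow/resolver/iglu_resolver.json"):
        print(candidate, path_is_mounted(candidate, "/snowplow/resolver", keys))
```

Alternatively, the ConfigMap key itself can be renamed so that the file name matches whatever path the container arguments expect.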
This is how it's deployed using Google Cloud Build:
- name: snowplow-cloudbuild-deploy-beam-enrich
  type: cloudbuild.py
  properties:
    steps:
    - name: 'gcr.io/cloud-builders/gcloud'
      args:
      - source
      - repos
      - clone
      - snowplow-sandbox
      - --project=XXX
    - name: "gcr.io/cloud-builders/gke-deploy"
      args:
      - run
      - --filename=snowplow-sandbox/iac-setup/k8s/beam-enrich/beam-enrich-dm.yaml
      - --location=europe-west1-d
      - --cluster=id_of_cluster
This is the error I get from the Cloud Build job:
Expanding configuration files.
Saving expanded configuration files to "output/expanded"
Finished preparing deployment.
Applying deployment.
Getting access to cluster "id_of_cluster" in "europe-west1-d".
Configuration files to be used: [{kind: Deployment, name: beam-enrich-dm} {kind: Namespace, name: default}]
Applying configuration files to cluster.
Waiting for deployed objects to be ready with timeout of 5m0s
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Still waiting on 1 object(s) to be ready: [{kind: Deployment, name: beam-enrich-dm}]
Finished applying deployment.
################################################################################
> Deployed Objects
NAMESPACE KIND NAME READY
default Deployment beam-enrich-dm No
################################################################################
> GKE
Workloads: https://console.cloud.google.com/kubernetes/workload?project=XXX
Services & Ingress: https://console.cloud.google.com/kubernetes/discovery?project=XXX
Applications: https://console.cloud.google.com/kubernetes/application?project=XXX
Configuration: https://console.cloud.google.com/kubernetes/config?project=XX
Storage: https://console.cloud.google.com/kubernetes/storage?project=XXX
Error: failed to apply deployment: timed out after 5m0s while waiting for deployed objects to be ready
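gke-deploy keeps printing "Still waiting" until the Deployment counts as ready, and reports `READY No` once the 5-minute timeout expires. A Deployment's rollout is generally considered complete once the controller has observed the latest spec and every desired replica is updated and available. The sketch below approximates that check on a Deployment object (for example the JSON from `kubectl get deployment beam-enrich-dm -o json`); it is a simplified approximation of the usual rollout-status conditions, not gke-deploy's actual implementation, and `rollout_complete` is a hypothetical name:

```python
from typing import Any, Dict


def rollout_complete(deployment: Dict[str, Any]) -> bool:
    """Approximate 'is this Deployment rolled out and ready?' check.

    Missing status counters are treated as 0; a missing spec.replicas
    defaults to 1, as Kubernetes does.
    """
    desired = deployment.get("spec", {}).get("replicas", 1)
    generation = deployment.get("metadata", {}).get("generation", 0)
    status = deployment.get("status", {})
    if status.get("observedGeneration", 0) < generation:
        return False  # controller has not processed the latest spec yet
    updated = status.get("updatedReplicas", 0)
    total = status.get("replicas", 0)
    available = status.get("availableReplicas", 0)
    if updated < desired:
        return False  # not all replicas run the new pod template
    if total > updated:
        return False  # old replicas are still pending termination
    if available < updated:
        return False  # updated pods exist but are not available
    return True


if __name__ == "__main__":
    # Hypothetical status: 2 desired replicas, none ever becomes available.
    stuck = {
        "metadata": {"generation": 1},
        "spec": {"replicas": 2},
        "status": {"observedGeneration": 1, "replicas": 2,
                   "updatedReplicas": 2, "unavailableReplicas": 2},
    }
    print(rollout_complete(stuck))  # False
```

If the check stays false, the pod status and container logs (`kubectl describe pod` and `kubectl logs` for the beam-enrich-dm pods) usually show why the containers never become available.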
Any k8s + Snowplow experts out there? Thanks!
Brian