BigQuery Loader - Time partitioned table

sdbeuf · April 22, 2020, 9:26am

I have Snowplow fully set up on GCP, using a Compute Engine (CE) instance for the stream collector and two Dataflow pipelines for the other steps:

Scala stream collector with PubSub (1.0.0)
Beam Enrich (1.1.0)
BigQuery mutator with BigQuery Loader (0.4.0)

From previous threads I understand that the 0.4.0 mutator does not create a time-partitioned table, and as far as I know BigQuery does not let you convert an existing table into a time-partitioned one.
I’ve also read the discussion (Google Cloud Platform data pipeline optimization) where Anton mentions: “Unfortunately Mutator cannot create partitioned tables yet - we’ll add this in next version. But right now you create partitioned table manually via BigQuery Console: Create table -> Schema edit as text -> Paste example atomic schema . Partitioning dropdown menu will automatically propose you to choose any datetime columns as partitioning key.”
When I create the table this way, every event ends up in my failed-streaming-inserts.

Can anybody help here?

mike · April 22, 2020, 10:32pm

If events are going into failed inserts, each one should have an error message attached that will help you debug further. This often happens when the table is missing columns.
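
A quick way to check for missing columns is to diff the schema you expected against the schema of the table you created by hand. The sketch below assumes both schemas are available as BigQuery schema JSON (for example, exported with `bq show --schema --format=prettyjson`); the helper names and the three-column sample schemas are made up for illustration and are not the real atomic schema.

```python
import json


def column_paths(schema, prefix=""):
    """Collect dotted column paths from a BigQuery schema JSON list,
    descending into nested RECORD fields."""
    paths = set()
    for field in schema:
        path = prefix + field["name"]
        paths.add(path)
        paths |= column_paths(field.get("fields", []), path + ".")
    return paths


def missing_columns(expected_schema, actual_schema):
    """Return columns present in the expected schema but absent from the table."""
    return sorted(column_paths(expected_schema) - column_paths(actual_schema))


# Hypothetical sample data: the manually created table lacks event_id.
expected = json.loads(
    '[{"name": "app_id", "type": "STRING"},'
    ' {"name": "collector_tstamp", "type": "TIMESTAMP"},'
    ' {"name": "event_id", "type": "STRING"}]'
)
actual = json.loads(
    '[{"name": "app_id", "type": "STRING"},'
    ' {"name": "collector_tstamp", "type": "TIMESTAMP"}]'
)

print(missing_columns(expected, actual))  # ['event_id']
```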

sdbeuf · April 23, 2020, 6:14am

I’ve found an approach which works (https://fivetran.com/docs/destinations/bigquery/partition-table).
What it does:

Copy the table in the same dataset
Delete the original table
Copy the copy back to the original name, this time as a time-partitioned table
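
For anyone who prefers SQL over copy jobs, the three steps above can be sketched roughly as follows. This is not the Fivetran procedure itself: it swaps the copy jobs for `CREATE TABLE ... AS SELECT` statements, and the dataset name `snowplow`, the table name `events`, the `_copy` suffix, and the choice of `collector_tstamp` as partitioning column are all assumptions for illustration. The script only prints the statements so you can review them before running anything.

```python
def partition_by_copy_statements(dataset, table, partition_column="collector_tstamp"):
    """Build the SQL for: copy the table, drop the original, then recreate
    the original name as a time-partitioned table from the copy.

    All names are placeholders; partition_column is assumed to be a
    TIMESTAMP column in the table.
    """
    original = f"`{dataset}.{table}`"
    backup = f"`{dataset}.{table}_copy`"
    return [
        # Step 1: copy the table within the same dataset.
        f"CREATE TABLE {backup} AS SELECT * FROM {original}",
        # Step 2: delete the original table.
        f"DROP TABLE {original}",
        # Step 3: recreate the original name, partitioned by day.
        f"CREATE TABLE {original} PARTITION BY DATE({partition_column}) "
        f"AS SELECT * FROM {backup}",
    ]


# Print the statements for review instead of executing them.
for statement in partition_by_copy_statements("snowplow", "events"):
    print(statement + ";")
```

Stop the loader while doing this, and check that the schema of the recreated table still matches the original before resuming; the copy can be dropped once you are happy with the result.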
