Geo Region has 3 symbols in enriched data and only 2 in redshift field. Why?

I faced a problem that sometimes raw data after enricher are coming with GEO_REGION consisting with 3 symbols, however in Redshift schema of events, geo_region is only accept 2 symbols. Why it happens? Outdated schema in redshift? Or it is expected that 3 symbols will be trimmed to 2 during load to Redshift?

1 Like

@sphinks, the events table has a 2-char restriction and RDB Loader ensures the value is adjusted during shredding accordingly. I guess this is a historical restriction and the geo value is actually taken from the 3rd party database, MaxMind, once IP lookups enrichment is enabled.

Bumped into this too while off-roading a bit and using Firehose to load Redshift. What’s the behavior or the RDB loader during shredding? Does it fail for the three-char geo_region codes or does it just truncate them?

(FWIW neither seems like particularly desirable behavior–IMO the column should just have a three-char limit.)

@pearsonhenri, in the previous message it was already explained. The geo_region is chopped during shredding. Thus, RDB loader already loads prepared/chopped values including some others as per the link. Shredding is not part of RDB loader. It is only required to prepare data to be loaded to Redshift. Other storages (Postgres, Snowflake) are not affected as they do not require shredding.

Chopping values at shredding is done for a historical reason to prevent breaking older pipelines.