I have an issue with my batch pipeline today.
The COPY to Redshift fails because some values of domain_sessionidx are overflowing. The column type in the database is int2, which has a maximum of 32767, and I have multiple entries with a domain_sessionidx above that limit (probably a bot or something?).
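For context, the offending rows should show up in Redshift’s stl_load_errors system table; something like this surfaces the column, the raw value and the reason for each recent COPY failure:

```sql
-- Inspect recent COPY failures (stl_load_errors is a standard
-- Redshift system table; TRIM strips its blank-padded char columns)
SELECT starttime,
       TRIM(filename)        AS filename,
       TRIM(colname)         AS colname,
       TRIM(raw_field_value) AS raw_value,
       TRIM(err_reason)      AS reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;
```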
I’d like to change the type of that column to something like BIGINT, but since Redshift doesn’t allow altering a column’s type, I’ll probably need to add a new column. I was wondering whether the column order matters for the data pipeline (the inserting part, actually), or whether recreating that column (it would end up as the last column) is enough. Otherwise I’ll have to completely duplicate the data into a new table in order to change the column type.
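To make the question concrete, this is roughly what I mean by recreating the column (a sketch, assuming the standard Snowplow atomic.events table; domain_sessionidx_big is just a temporary name):

```sql
-- Re-add domain_sessionidx as BIGINT; it ends up as the last column.
-- Run statement by statement: some Redshift ALTER TABLE operations
-- can't run inside a transaction block.
ALTER TABLE atomic.events ADD COLUMN domain_sessionidx_big BIGINT;

-- Backfill the existing values
UPDATE atomic.events
SET domain_sessionidx_big = domain_sessionidx;

-- Swap the columns
ALTER TABLE atomic.events DROP COLUMN domain_sessionidx;
ALTER TABLE atomic.events
  RENAME COLUMN domain_sessionidx_big TO domain_sessionidx;
```

My worry is that this leaves domain_sessionidx as the last column, while (as far as I understand) COPY without an explicit column list maps the fields in the files to columns by position.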
I guess another solution would be to parse the files and replace the overflowing values before the COPY.
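If the column were already wide enough to load, the in-database equivalent would be a simple capping UPDATE; a sketch, with an arbitrary cap at the int2 maximum:

```sql
-- Cap overflowing values instead of rewriting the files
-- (the cap value here is arbitrary)
UPDATE atomic.events
SET domain_sessionidx = 32767
WHERE domain_sessionidx > 32767;
```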