We are trying to set up dev and QA environments using data from a prod environment.
We are not using `CREATE DATABASE dev CLONE prod` because we want to avoid cloning database-specific objects like stages and pipes: we manage pipe-related objects with per-environment Terraform and want to avoid out-of-band changes to them.
On top of that, there are some tables that should not be cloned from prod to dev. I'm trying to design a cleaner solution than the cobbled mess that we have.
We have a scheduled script that does the following:

- Connect to the prod and dev databases and fetch the right `src` and `dst` schemas
- Run `SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<>' AND TABLE_TYPE = 'BASE TABLE'` to get the tables to clone
- Cloning tables across databases results in dangling references to constraints and sequences, so those have to be manually cloned or rebuilt (see https://docs.snowflake.com/en/user-guide/database-replication-considerations.html#references-to-objects-in-another-database)
- For each table:
  - If it shouldn't be cloned, skip it
  - Run `CREATE OR REPLACE TABLE <dev> CLONE <prod> COPY GRANTS;`
  - Run `GET_DDL(<dev>)` to see if the table has sequences/constraints to update
  - Run `CREATE OR REPLACE SEQUENCE <dev> CLONE <prod>` to update the `nextval` of the sequence, since the cloned table references the sequence from the source database (and it has the wrong value anyway)
  - Run `ALTER TABLE <dev> ALTER COLUMN <> SET DEFAULT <new seq>.nextval`
  - Check if there are constraints:
    - Run `ALTER TABLE <dev> DROP CONSTRAINT <>`, since the cloned constraints reference the source database
    - Run `ALTER TABLE <dev> ADD CONSTRAINT <>` to rebuild them against the destination database
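For concreteness, the per-table loop above can be sketched as a generator of SQL statements. This is a minimal sketch, not our actual Lambda code: the database/schema names, the `SKIP_TABLES` list, and the `clone_statements` helper are all illustrative, and the sequence/constraint metadata is assumed to have already been parsed out of `GET_DDL` elsewhere.

```python
# Illustrative names -- not from the real script.
PROD_DB = "PROD"
DEV_DB = "DEV"
SCHEMA = "PUBLIC"
SKIP_TABLES = {"AUDIT_LOG"}  # tables that must not be cloned prod -> dev


def clone_statements(table, sequences=(), constraints=()):
    """Yield the SQL needed to clone one table from prod to dev.

    `sequences` is a list of (sequence_name, column) pairs and
    `constraints` a list of (constraint_name, definition) pairs,
    both assumed to be discovered beforehand via GET_DDL.
    """
    if table in SKIP_TABLES:
        return  # skip tables that shouldn't leave prod

    src = f"{PROD_DB}.{SCHEMA}.{table}"
    dst = f"{DEV_DB}.{SCHEMA}.{table}"

    # Clone the table itself, preserving grants.
    yield f"CREATE OR REPLACE TABLE {dst} CLONE {src} COPY GRANTS"

    # Re-clone each sequence into dev and repoint the column default,
    # since the cloned table still references the prod sequence.
    for seq, column in sequences:
        dev_seq = f"{DEV_DB}.{SCHEMA}.{seq}"
        yield f"CREATE OR REPLACE SEQUENCE {dev_seq} CLONE {PROD_DB}.{SCHEMA}.{seq}"
        yield f"ALTER TABLE {dst} ALTER COLUMN {column} SET DEFAULT {dev_seq}.NEXTVAL"

    # Drop and re-add constraints so they reference dev, not prod.
    for name, definition in constraints:
        yield f"ALTER TABLE {dst} DROP CONSTRAINT {name}"
        yield f"ALTER TABLE {dst} ADD CONSTRAINT {name} {definition}"
```

Calling `list(clone_statements("ORDERS", sequences=[("ORDERS_ID_SEQ", "ID")]))` would produce the `CREATE OR REPLACE TABLE ... CLONE`, `CREATE OR REPLACE SEQUENCE ... CLONE`, and `ALTER COLUMN ... SET DEFAULT` statements in order, while an excluded table yields nothing.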
So... it works, but it's hacky, fragile, and needs constant updating because of custom rules. We currently run this on an AWS Lambda, but a first step would be to migrate it to pure Snowflake.
Does anyone have any suggestions to improve this process? Or at least recommendations on Snowflake tools that could help?