Dev/Prod workflow & Data tables

I’ve run into the exact same issues you have. I don’t have a solution for programmatically updating schemas between prod and dev, but I did write the following Python code that would identify differences between two versions of my app. I wrote this all into a Jupyter notebook, but I can’t attach it here, so I’ll include it as a python script.

Note: you’ll need to have PyYaml available, and will need to update the paths to your dev and prod versions of your app. I clone each app onto my machine as a separate repository (directory) and use an rsync command to copy all but specific files between the two. Then I use this script to identify any schema changes I need to make between the two apps. Not a great solution, but it’s the best I’ve come up with so far.

import yaml

with open("path-to-prod/anvil.yaml", "r") as f:
    prod_sp = yaml.safe_load(f)

len(prod_sp["db_schema"])

with open("path-to-dev/anvil.yaml", "r") as f:
    dev_sp = yaml.safe_load(f)

dev_table_names = [t['name'] for t in dev_sp["db_schema"]]
len(dev_sp["db_schema"])

# Find Missing Dev Tables

for t in prod_sp["db_schema"]:
    if t["name"] not in dev_table_names:
        print("New Table: ", t["name"])
        for k, v in t["columns"].items():
            print(f"\tCol: {v['name']} - Type: {v['type']}")
    
def get_table_schema(schema_dict, table_name):
    for t in schema_dict["db_schema"]:
        if t["name"] == table_name:
            return t
    return None

# Find Missing Dev Columns

# Assumption: schema modifications start from dev and are copied to 
# prod as needed. Primary is intended to be the "source" schema, 
# and secondary schema is the target that should be the same as the primary.

missing_tables = []
missing_columns = []
primary_schema = prod_sp
secondary_schema = dev_sp


for t in primary_schema["db_schema"]:
    secondary_table = get_table_schema(secondary_schema, t["name"])
    if secondary_table is None:
        missing_tables.append(f"Missing Table: {t['name']}")
        continue
  
    for k, v in t["columns"].items():
        secondary_columns = [dt["name"] for dev_key, dt in secondary_table["columns"].items()]
        if v["name"] not in secondary_columns:
            missing_columns.append(f"Table: {t['name']} - Missing {v['name']} - {v['type']}")
missing_tables, missing_columns
3 Likes