Data backup with GitHub Actions via Uplink

Here’s a simple automated workflow to back up your tables using GitHub Actions. The action pulls data tables from your Anvil app via Uplink and saves them as CSV artifacts. This setup ensures regular backups of critical data, and best of all… it’s free! At least for the first 2,000 minutes per month; after that you either make your GitHub repo public or pay to run the actions.

This is the .yml file that triggers the GitHub Action every hour:

name: Export Anvil Data
on:
  schedule:
    - cron: "0 * * * *"  # Runs every hour
  workflow_dispatch:  # Manual trigger
jobs:
  export-data:
    runs-on: ubuntu-latest

    steps:
      # Checkout repository
      - name: Checkout repository
        uses: actions/checkout@v4

      # Set up Python
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      # Install dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install anvil-uplink pandas

      # Run the Python script
      - name: Run backup_anvil.py
        env:
          ANVIL_UPLINK_KEY: ${{ secrets.ANVIL_UPLINK_KEY }}  # Uplink key from GitHub secrets
        run: python backup_anvil.py

      # Upload CSVs as artifacts
      - name: Upload CSVs
        uses: actions/upload-artifact@v4
        with:
          name: exported-data
          path: |
            table1.csv
            table2.csv

Here’s the Python script that connects to Anvil, fetches the data, and saves it as CSV files (store your Uplink key in GitHub Secrets first, under the same name the workflow expects, ANVIL_UPLINK_KEY):

import os
import anvil.server
from anvil.tables import app_tables
import pandas as pd

# Connect to the Anvil app using the Uplink key from the environment
anvil.server.connect(os.environ.get("ANVIL_UPLINK_KEY"))

# Pull each table into a DataFrame and write it out as a CSV file
table1_df = pd.DataFrame([dict(r) for r in app_tables.table1.search()])
table1_df.to_csv("table1.csv", index=False)

table2_df = pd.DataFrame([dict(r) for r in app_tables.table2.search()])
table2_df.to_csv("table2.csv", index=False)

Thanks to p.colbert for his starter code. Let me know if you have any questions or suggestions!

  • If I remember correctly, Anvil tables have a to_csv() method, but I’m not sure that it retrieves values from Media columns. If your technique correctly handles Media columns, then it’s a step ahead. (There’s a rough sketch of the to_csv() route after this list.)

  • I suggest that you sort tables, in a repeatable manner, before converting them. That way, after committing your CSV files, git can sensibly diff them for you, without a lot of spurious noise (from unchanged but reordered rows). (See the sorting sketch after this list.)

  • Can DataFrame take an iterable as its parameter? If so, then you don’t need to convert to a list (you could omit the []). Loading an entire table as a list can soak up a lot of memory, especially when there are columns of type Media. It might be nice to avoid that, if feasible.
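For reference, the built-in export route looks roughly like this. It’s a minimal sketch, assuming (as I recall from the Anvil docs) that calling to_csv() on a search result returns a Media object whose bytes you can write to disk; whether Media columns come through correctly is exactly the part worth testing:

import os
import anvil.server
from anvil.tables import app_tables

anvil.server.connect(os.environ.get("ANVIL_UPLINK_KEY"))

# Anvil's built-in CSV export: to_csv() on a search result returns a Media object
csv_media = app_tables.table1.search().to_csv()
with open("table1.csv", "wb") as f:
    f.write(csv_media.get_bytes())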
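And here’s a sketch of the repeatable-sort suggestion, done on the pandas side so the app itself doesn’t change. The "created" column name is a placeholder; any stable, unique column would do:

import os
import anvil.server
from anvil.tables import app_tables
import pandas as pd

anvil.server.connect(os.environ.get("ANVIL_UPLINK_KEY"))

# Sort on a stable key before writing, so committed CSVs diff cleanly in git
table1_df = pd.DataFrame([dict(r) for r in app_tables.table1.search()])
table1_df = table1_df.sort_values(by="created")  # placeholder column name
table1_df.to_csv("table1.csv", index=False)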


This should be possible, @pablo.diego.rosell, you might want to just change:

table1_df = pd.DataFrame([dict(r) for r in app_tables.table1.search()])

into:

table1_df = pd.DataFrame(dict(r) for r in app_tables.table1.search())

This passes a generator expression instead of a pre-built list, so the rows are fed to the DataFrame as it is constructed, rather than being collected into your own intermediate list first.
(Potentially less memory and possibly faster: the write to disk and the I/O pull from the Anvil tables run at roughly the same speed, and both are far slower than the CPU’s conversion of each row.)
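For what it’s worth, if you want to skip pandas entirely and stream rows straight to disk, something along these lines should also work. It’s a minimal sketch assuming every row has the same columns (Media columns would still need special handling), and export_table_streaming is just a made-up helper name:

import csv
import os
import anvil.server
from anvil.tables import app_tables

anvil.server.connect(os.environ.get("ANVIL_UPLINK_KEY"))

def export_table_streaming(table, filename):
    # Iterate over the search results and write each row as it arrives,
    # so the whole table never sits in memory as a list or DataFrame
    rows = iter(table.search())
    try:
        first = dict(next(rows))
    except StopIteration:
        return  # empty table, nothing to write
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(first.keys()))
        writer.writeheader()
        writer.writerow(first)
        for r in rows:
            writer.writerow(dict(r))

export_table_streaming(app_tables.table1, "table1.csv")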


Thanks @p.colbert, great and savvy advice. You truly are an Anvil legend!


Thanks @ianb, that worked great!