I’ve seen some other topics on the question of backing up data tables, but I haven’t seen anything that looks like a final solution to the problem. As I’m (distantly) viewing the prospect of deploying my large app and filling its data tables with production data, I’m thinking about a backup strategy.
The basic options seem to be:
- Rely on Anvil’s automatic backups (and restores if needed)
- Use an in-app backup and restore mechanism
Anvil’s backups
How often are these done?
How granular are restores? Is it all or none? Or can individual tables/rows be brought back?
How timely are restores?
This suggests something coming along soon-ish for backup/restore, but I haven’t found anything new about it: Database backup and export
In-App Backups
I have a significant number of linked tables, and am unsure how to handle those with an in-app backup system. Clearly row ids will change on a restore, so I’d need to track the original row ids and fix up the links afterwards.
Given server function timeouts, I probably wouldn’t be able to restore in a single server function if the backup file were of any size.
This seems like a Thing-Someone-Has-Done-Before…anyone have any samples they can share?
I wrote an Uplink program to back up my data to SQLite. Since the function is running locally, timeouts do not come into play. It isn’t specific to any one app. It reads the Anvil app’s DB schema from one of the app’s .yaml files. That schema is included in the backup. Perhaps that would be a start.
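For a rough idea of the shape of such a script, here’s a minimal sketch (not the actual program): it assumes a single hypothetical table named mydata, a placeholder Uplink key, and it skips the schema-reading step, storing each row as JSON keyed by its row id.

import json
import sqlite3
import anvil.server
from anvil.tables import app_tables

anvil.server.connect("YOUR_UPLINK_KEY")   # hypothetical key

def encode(value):
    # Linked rows can't be serialised directly; store their current
    # row id so links can be reconstructed on restore.
    if hasattr(value, "get_id"):
        return {"__row_id__": value.get_id()}
    return str(value)   # crude fallback for dates, media, etc.

conn = sqlite3.connect("backup.db")
conn.execute("CREATE TABLE IF NOT EXISTS mydata"
             " (row_id TEXT PRIMARY KEY, data TEXT)")

for row in app_tables.mydata.search():
    conn.execute("INSERT OR REPLACE INTO mydata (row_id, data) VALUES (?, ?)",
                 (row.get_id(), json.dumps(dict(row), default=encode)))

conn.commit()
conn.close()

Because this runs locally over Uplink, it can take as long as it needs without hitting server-function timeouts.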
Backup’s the easy part, though. For restore, a lot depends on what data are still left in the Anvil tables, at time of restore, and how those data should be treated. How selective must the restore be? Actual situations will vary greatly, and may need to be aware of an app’s specific needs. There’s no one-size-fits-all.
You are absolutely right about the row ids changing. Inserting an old row will give it a new id. If your rows are interlinked by row id, then restore gets trickier. You have to insert the row before you can get a usable id, for use elsewhere. A two-phase restore (do all the creates, then add all the references) is sometimes necessary.
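To illustrate the two-phase approach, here’s a minimal sketch. It assumes the backup records use the {"__row_id__": old_id} markers from the sketch above, that everything lives in one hypothetical table mydata, and that every referenced row is part of the backup.

import anvil.server
from anvil.tables import app_tables

anvil.server.connect("YOUR_UPLINK_KEY")   # hypothetical key

def restore(backup_records):
    # backup_records: list of dicts as saved earlier, where link
    # columns hold {"__row_id__": old_id} markers.
    id_map = {}      # old row id -> newly created row
    deferred = []    # (new row, link columns still to be filled in)

    # Phase 1: create every row, leaving link columns empty for now.
    for rec in backup_records:
        old_id = rec.pop("__row_id__")
        links = {k: v for k, v in rec.items()
                 if isinstance(v, dict) and "__row_id__" in v}
        plain = {k: v for k, v in rec.items() if k not in links}
        new_row = app_tables.mydata.add_row(**plain)
        id_map[old_id] = new_row
        if links:
            deferred.append((new_row, links))

    # Phase 2: every row now exists, so all the new ids are available.
    for new_row, links in deferred:
        for col, marker in links.items():
            new_row[col] = id_map[marker["__row_id__"]]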
The other issue is, how do you keep user activity from interfering with the restore process?
To make backup and restore simple, I don’t use linked rows and I don’t store files in tables.
Then every day, I download all tables to csv files.
Data can be moved between the hosted app and the open source server without any issues.
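A minimal version of that daily download might look like this (a sketch, assuming a hypothetical table mydata with no link or Media columns):

import csv
import anvil.server
from anvil.tables import app_tables

anvil.server.connect("YOUR_UPLINK_KEY")   # hypothetical key

def table_to_csv(table, filename):
    # Pull every row down over Uplink and write it out as plain CSV.
    rows = [dict(r) for r in table.search()]
    if not rows:
        return
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

table_to_csv(app_tables.mydata, "mydata.csv")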
Yes. The alternative to a row-id link is an app-generated unique id. These ids are long-term stable, and can be backed up and restored as @Tony.Nguyen describes.
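For example (a sketch; the customers/orders tables and their uid text columns are hypothetical):

import uuid
import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def add_customer(name):
    # Give every row a stable, app-generated id alongside Anvil's row id.
    return app_tables.customers.add_row(uid=str(uuid.uuid4()), name=name)

@anvil.server.callable
def add_order(customer, item):
    # Reference the customer by its stable uid rather than a row link;
    # the uid survives backup and restore unchanged.
    return app_tables.orders.add_row(uid=str(uuid.uuid4()),
                                     customer_uid=customer["uid"],
                                     item=item)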
@p.colbert Is your backup program something that can be shared? That would certainly be a good start, especially if it deals with linked rows.
I agree, restoring has other issues, and I need to work through the use cases there.
@Tony.Nguyen In retrospect, designing the schema to avoid linked rows would have made backing up very easy (and avoided some issues with linked row data leakage), but I’d rather avoid refactoring the current project where linked rows are used heavily unless it looks like there are no other good solutions.
I haven’t tested it with multi-link columns. Even so, it’s certainly better than starting from scratch. I’ll see about posting it to Show and Tell.
Edit: Posted to Backup to SQLite via Uplink
Regarding use of row-ids… It may help to add your own unique id (column) to each referenced row. Even if the row-ids can’t be consistent over time, your ids can.
Restoring is another issue to be addressed, but thanks to @p.colbert backups are handled.
Another option here is to use the open source app server. You could create an app to query your main app over uplink and update its local copy of the tables.
Hi Owen,
I find your idea quite interesting. This would allow people to migrate an Anvil-hosted app, including its data, to a standalone app server.
However, I think I’m missing something.
For example, in my local app
app_tables.mydata.search()
would get me the local data,
but as soon as I connect the Anvil Uplink,
anvil.server.connect("ABC")
and then run,
app_tables.mydata.search()
this would give me data from the remote app.
So how do you write this to your local mydata table?
Also, what about linked rows? Wouldn’t the row ids change when you write to your local copy?
You could create server functions on the main app and call those instead of querying its tables directly.
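For instance, a sketch of such a function in the main app’s server code (the table name is hypothetical):

import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def export_mydata():
    # Return plain dicts plus each row's id, rather than live Row
    # objects, so the caller can store the data wherever it likes.
    # Link columns still come back as Row objects and need the
    # id-mapping treatment described earlier.
    return [dict(r, __row_id__=r.get_id())
            for r in app_tables.mydata.search()]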
Yes, row ids will change. You’ll need to catalog each and every row-id reference in the incoming data, and translate it to the corresponding new row-id, once it becomes available, i.e., once the corresponding new row is created. In most cases, this will be a two-step process.
But after connecting the Uplink, wouldn’t all code reference the remote app’s tables?
So how do you access the local app’s tables while connected via Uplink?
import anvil.server
from anvil.tables import app_tables

# CRUD operations on app_tables.any_table happen locally
anvil.server.connect("YOUR_ANVIL_API_KEY_OR_TOKEN")
# After connecting the Uplink,
# CRUD operations on app_tables.any_table happen on the remote app

# Retrieve data from the Anvil-hosted app
@anvil.server.callable
def get_data():
    # Perform queries or data retrieval operations on the Anvil-hosted app tables
    # Example: retrieve data from a specific table
    return app_tables.mydata.search()

# Call the Anvil-hosted app function to retrieve the data
remote_data = anvil.server.call('get_data')

# This would write it back into the remote tables
for row in remote_data:
    app_tables.mydata.add_row(**row)
Am I missing anything here?
I actually download to csv files via Uplink and load them separately.
I do that so I can reload at will without having to download again, but I suppose it also avoids the problem you describe.