Replace Big Table from CSV data [Possible Bug?] [Fixed?]

What I’m trying to do:
I have a 20k rows table, lets call it production_data that needs to be periodically regenerated from scratch from a CSV.
This table is the data source for an HTTP API exposed to my users.
I’d like to regenerate this table avoiding service interruption to my users.
The data in the CSV has no key, so I can’t simply update the existing table, I need to throw it away and regenerate from scratch from the new CSV.
Truncating the table and adding the 20k rows is the only way that comes to my mind, but that requires some time. With a semaphore I can avoid serving partial data to my user, simply returning a “Updating data, retry later” error message.
In order to reduce this offline time at minimum, I was thinking at this:

  1. load the CSV data to an ad-hoc table, lets call it csv_load_data, that mirrors production_data.
  2. once loaded, is there a way to switch the two tables?

Other approaches/ideas are welcome.

Thanks and BR

You can use getattr to give you a dynamic table name.

e.g. instead of

rows = app_tables.my_table.search()

you can use:

table_name = "my_table"
rows = getattr(app_tables, table_name).search()

The logic of how you set the current table name is then whatever you want it to be.

2 Likes

You could add one version column and every time you add data, you increase the version number.

When the upload is finished, you update the the current version number on another table, then you start deleting all the rows of the old version.

When serving data to the users, you always filter by version. You could create a view filtered by version, so the view doesn’t even see the version column.

5 Likes

Thanks, very good idea.
Simple, straightforward and elegant.
I’m gonna try that!

Thank you very interesting information, I’m thinking of many other applications where I need to access tables whose name is not known at design time.

Hi @anvil
guess what? Different App, same error as here.

anvil.tables.TableError: Internal database error: ERROR: canceling statement due to user request
<running on the server>
called from /downlink/anvil/_server.py, line 42
called from update_module, line 63


Session ID:BUJKV4WFU7BA6CHUG7TALA5QTUCZJLZP

Code where the exception is raised, is the insert loop of the “new version” data: (translated from italian to english)

  print(f"{timestamp()} Begin insert new data in table")
  for d in df.to_dict(orient="records"):
    # d is now a dict of {columnname -> value} for this row
    # We use Python's **kwargs syntax to pass the whole dict as
    # keyword arguments
    d['data_version'] = new_data_version
    app_tables.comuni_anpr.add_row(**d)   -----> this throws the error
  print(f"{timestamp()} Begin insert new data in table")

Hopeless and frustrated.

Don’t start telling me use transactions, I tried in this App and had to revert back because DB service is raising transactional conflicts while I am adding rows in single-user.

What’s common in this 3 posts? Those are both background tasks running daily at 03.00 CEST.
@anvil is there any Database maintenance operation scheduled at that time that could conflict ?

I’ve run out of options.

I’m losing trust in the database service.

I think this should be raised as a bug report, but I don’t know if continuing this exact thread is the right way to do it, so I’m deferring to others who might know.

Maybe it can be split off? (I added [Possible Bug?] to the thread title.)

1 Like

I completely agree with you, it’s useless to keep discussing it in the forums.
This is just an update to say that starting from May the 5th I am not having errors anymore.
Unmodified code.

I guess someone from @anvil fixed something in their backend after the PyCon, but I have no evidence / confirmation of this.
It would be nice to know.

Case closed (for now) by … magic?
Thanks to all for your help and suggestions.