Auto-batch table operations

Accelerated Tables allow batching Data Tables row updates, deletes, and adds, which can make a big difference for performance when you’re doing many data tables operations. (Every round-trip to the database takes around 0.03s, and longer transaction execution times compound the problem by increasing the likelihood of conflicts that trigger one or more retries of the entire transaction.)
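To put hypothetical numbers on that (the ~0.03s figure is from above; the 500-row workload is invented for illustration):

```python
# Back-of-the-envelope illustration: at ~0.03s per database round-trip,
# per-row operations dominate quickly, while a batch costs one round-trip.
ROUND_TRIP_S = 0.03   # approximate latency quoted above
n_rows = 500          # hypothetical number of row updates

unbatched = n_rows * ROUND_TRIP_S   # one round-trip per row
batched = 1 * ROUND_TRIP_S          # one round-trip for the whole batch

print(f"unbatched: ~{unbatched:.1f}s, batched: ~{batched:.2f}s")
```

And that's before counting the retries that a 15-second transaction invites.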

But rewriting code to take advantage of these batching abilities can be detrimental to code quality (as well as time-consuming). It recently hit me that (at least for my use cases) it’s possible to instead create a wrapper for anvil.tables that batches these operations automatically inside transactions, increasing performance without any code modifications (other than importing auto_batch.tables in place of anvil.tables).

For example, instead of...
import anvil.tables as tables

@tables.in_transaction
def demo():
    rows = tables.app_tables.table_2.search()
    # decide up front, so that flushing the batched updates can't change
    # which rows the delete loop sees
    to_update = [row for row in rows if row['text'] == "1"]
    to_delete = [row for row in rows if row['text'] != "1"]
    with tables.batch_update:
        for row in to_update:
            row.update(text="2")
    with tables.batch_delete:
        for row in to_delete:
            row.delete()

you could get the same performance with:

from auto_batch import tables

@tables.in_transaction
def demo():
    rows = tables.app_tables.table_2.search()
    for row in rows:
        if row['text'] == "1":
            row.update(text="2")
        else:
            row.delete()

Would others be interested in something like this? I don’t have it set up to share quite yet, but expressions of interest would help me get across the finish line to doing that.


Honestly, this is something we would love to have first-party support for, and if I could wish upon a star I would wish your wrapper implementation could be evolved into a PR against the official implementation.

Something like:

with anvil.tables.auto_batch:
    rows = tables.app_tables.table_2.search()
    for row in rows:
        if row['text'] == "1":
            row.update(text="2")
        else:
            row.delete()

You could reuse a bunch of the machinery for the existing batching system - possibly it would be as simple as “automatically flush any pending batch operations before every operation that might read the DB, then flush again at the end of the context manager”.
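For the avoidance of doubt, here’s a minimal, self-contained sketch of that flush-before-reads idea, using an invented in-memory `AutoBatchStore` stand-in rather than the real anvil.tables machinery (all names here are hypothetical, for illustration only):

```python
# Hypothetical sketch (NOT the actual anvil.tables implementation): writes are
# queued, the pending batch is flushed before any read, and flushed again when
# the context manager exits.
class AutoBatchStore:
    def __init__(self):
        self._data = {}          # stand-in for the database
        self._pending = []       # queued (op, key, value) write operations
        self.flush_count = 0     # how many "round-trips" we actually made

    # -- write operations: queued, not applied immediately --
    def update(self, key, value):
        self._pending.append(("update", key, value))

    def delete(self, key):
        self._pending.append(("delete", key, None))

    # -- read operation: must see pending writes, so flush first --
    def get(self, key):
        self._flush()
        return self._data.get(key)

    def _flush(self):
        if not self._pending:
            return
        self.flush_count += 1    # one round-trip applies the whole batch
        for op, key, value in self._pending:
            if op == "update":
                self._data[key] = value
            else:
                self._data.pop(key, None)
        self._pending.clear()

    # -- context manager: flush any stragglers at the end --
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._flush()
        return False


with AutoBatchStore() as store:
    for i in range(100):
        store.update(i, i * 2)   # 100 queued writes...
    value = store.get(0)         # ...applied in one flush before this read
```

The real version would of course hook the existing batch_update/batch_delete plumbing instead of a dict, but the control flow would be the same.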


Your wish is my…shared wish.

Yes, that’s how I have mine working, though I have also had to reinvent various wheels, doing it as an add-on.

Here’s what I have at this point, only for use on the server side as yet: GitHub - hugetim/auto-batch: AutoBatch - Built with Anvil. I hope to take a crack at an anvil-runtime PR, but I would also welcome someone else taking it on, if you get to it before me.


Is there ever a reason to not auto_batch a transaction? I’ve been thinking not, since the database updates only take effect at the end of the transaction anyway, but my working model of how this stuff works could be way off.


Just as another option, if separating batch_update and batch_delete is an issue, have you thought about doing:


with batch_delete, batch_update:
    ...


and you could write a context manager to combine these

# only on the server
from contextlib import contextmanager

from anvil.tables import batch_delete, batch_update

@contextmanager
def auto_batch():
    with batch_delete, batch_update:
        yield


with auto_batch():
    ...
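To see why stacking them in one `with` statement works, here’s a stand-alone sketch (using a hypothetical tracker context manager in place of batch_delete and batch_update): `with a, b:` enters left to right, exits in reverse, and @contextmanager can wrap the pair as one unit:

```python
from contextlib import contextmanager

events = []  # records enter/exit order

@contextmanager
def tracker(name):
    # stand-in for batch_delete / batch_update (names invented here)
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")

@contextmanager
def combined():
    # bundles both contexts, like the auto_batch() helper above
    with tracker("delete"), tracker("update"):
        yield

with combined():
    events.append("body")

print(events)
# enters left to right, exits in reverse:
# ['enter delete', 'enter update', 'body', 'exit update', 'exit delete']
```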

Oh, ok, thanks! I wouldn’t have guessed that those could be combined, but I can imagine how that works now.


You can now try out auto-batch as a third-party dependency, with token PKF2MZRQMPCXFWNE. I’ve also added some initial docs at https://github.com/hugetim/auto-batch (and labeled the current code v0.1).
