Are data tables thread-safe?

stefano.menci · February 27, 2018, 8:21pm

This post says that I am in charge of making sure my functions are thread-safe.

Does that mean that I must lock every function that accesses the database?

Or that I must lock every function that modifies the database?

Or that I must always use with tables.Transaction()?

Or the tables module is thread-safe and I only need to make my own stuff thread-safe?

meredydd · February 28, 2018, 12:28pm

Aha - the post you link to refers to a difference between the uplink and server modules.

Uplink calls execute within a single program (a program you write and run, which calls anvil.server.connect()). Each server call is handled by a new thread within that program, so you should make sure they’re safe to call simultaneously.

Server module calls execute in their own process. Each server call is handled by a new process, which terminates once the call is complete. This is why, eg, global variables are not preserved from one server module call to another. Code within server modules therefore does not have to be thread safe.

However, any function can find itself editing data in data tables at the same time as someone else. For example, if you have two simultaneous users of an app, their server module calls might be running at the same time.

You can’t lock the database - instead, you have to use transactions. The correct way to do this is to decorate your function with @tables.in_transaction. Below is a long explanation of why:

An explanation of transactions:

A transaction says “when someone else looks at the database, they will either see all the effects of this piece of code, or none of them”. What’s more, a strong transaction (like Anvil’s) says that “as far as the database is concerned, it should be as if every transaction executed one after another, in sequence”. It does this by optimistically letting every piece of code execute, and then rolling back code that would produce an inconsistent result.

Here’s a classic example: transferring money (or points, or whatever) from one account to another. Let’s say each account is represented by a row in a table called accounts table. We could write the naive code like this:

def transfer_balance(from_email, to_email, amt):
  from_row = app_tables.accounts.get(email=from_email)
  to_row = app_tables.accounts.get(email=to_email)
  from_row['balance'] = from_row['balance'] - amt
  to_row['balance'] = to_row['balance'] + amt

Now, this has problems. Someone who looks at the table between the updates of from_row and to_row will see a curious thing - amt will have disappeared from one row, and not appeared anywhere else (yet). Not acceptable. So, we put this inside a transaction:

def transfer_balance(from_email, to_email, amt):
  with tables.Transaction():
    from_row = app_tables.accounts.get(email=from_email)
    to_row = app_tables.accounts.get(email=to_email)
    from_row['balance'] = from_row['balance'] - amt
    to_row['balance'] = to_row['balance'] + amt

Now, the updates arrive atomically - someone who looks at the data tables will either see the state before the transfer, or the state after it, but never the half-way state.

However, there’s still a problem that can occur if two sessions run this function at the same time. Imagine two calls to transfer_balance() happening simultaneously - one from row A into row B, and one from row A into row C:

FIRST CALL       SECOND CALL      BALANCES:     A    B     C
read balance                                   10    0     
                 read balance                  10          0     
                 update balance                 5          5     
update balance                                  5    5

We’ve just made two transfers from the same account, and yet its balance has only been deducted once - we have accidentally created money out of thin air! This is called a transaction conflict. It’s produced a result that couldn’t possibly happen if the two transactions had run one after another.

The good news is that the database detects transaction conflicts, and aborts one or more transactions so that this incorrect state never becomes visible to the rest of the world. In this example, the second call would succeed, but the first call would raise a tables.TransactionConflict exception.

You can catch the tables.TransactionConflict exception yourself, and retry it - something like this:

def transfer_balance(from_email, to_email, amt):
  while True:
    try:
      with tables.Transaction():
        from_row = app_tables.accounts.get(email=from_email)
        to_row = app_tables.accounts.get(email=to_email)
        from_row['balance'] = from_row['balance'] - amt
        to_row['balance'] = to_row['balance'] + amt
    except tables.TransactionConflict:
      continue
    # if we got this far, the transaction succeeded
    break

However, this is a bit of a mouthful! So instead, Anvil includes a built-in decorator that does this for you. (It also does useful things like random backoff, so that two conflicting transactions don’t immediately retry and conflict again.) We highly recommend using @anvil.tables.in_transaction for every transaction you do:

@anvil.server.callable
@tables.in_transaction
def transfer_balance(from_email, to_email, amt):
  from_row = app_tables.accounts.get(email=from_email)
  to_row = app_tables.accounts.get(email=to_email)
  from_row['balance'] = from_row['balance'] - amt
  to_row['balance'] = to_row['balance'] + amt

Much simpler!