Non obvious causes of a transaction conflict?

jshaffstall · May 24, 2021, 9:27pm

I have a server function running in a transaction, using @anvil.tables.in_transaction. That server function is the only place where a particular data table field is updated. I’m getting intermittent transaction conflicts on that line. Everything is still in alpha testing, so I’m pretty much the only person using the app at the moment, so an actual transaction conflict isn’t likely.

What I’m looking for are the non-obvious causes of transaction conflicts, the sort of gotchas that might crop up even when only a single person is using the app. Is there a way the client might be making parallel calls to the server function, for example?

The server function is being called from a form’s init method, as well as in response to a button click.

I thought at first that I was clicking the button multiple times (this was on a mobile device), but I have tried replicating that and it doesn’t cause the transaction conflict.

Are there other reasons you might get a transaction conflict with a server call initiated by button press?

Here’s the server function:

@anvil.server.callable
@anvil.tables.in_transaction
def get_progress(order_item_id):
  order_item = app_tables.orderitems.get(order_item_id=order_item_id)
  
  if not order_item:
    raise ValueError("Invalid order item")
  
  set_mystery_being_played(order_item)
  
  # The following line is the one triggering the transaction conflict
  order_item['last_accessed'] = datetime.datetime.now()
  
  rendered = process_bbcode(order_item['stage']['data']['message'], order_item)
  
  return (order_item['stage'], rendered, order_item['variables'], app_tables.stage_submissions.search(tables.order_by('datetime'), order_item=order_item))

owen.campbell · May 24, 2021, 9:51pm

In general, it’s best to keep the scope of a transaction as narrow as possible. Here, you are performing a search and calling another function before your update. Instead, try moving that update to a separate function and put the transaction decorator on there instead.

I’m guessing that the set_mysyery_being_played function also does an update, so you’ll also need to refractor things so that update is also in the decorated function.

So, get all the updates into one function, put the transaction decorator on it and don’t do anything else in that function. Just call it from elsewhere.

jshaffstall · May 26, 2021, 1:50am

Thanks, that’s good advice. I have other refactoring I need to do with the code, I’ll work that into it.

The original question still stands, though, with only a single user using the system there should have been no transaction conflict, yet one happened.

Are there things others have run into that might cause a single user conflict?

rickhurlbatt · May 26, 2021, 12:00pm

I had a similar issue (also single user) and found that I had nested the in_transaction decorator in a function that was called by another function which was also decorated. Both were updating the same row and hence raised the conflict. @owen.campbell’s solution is what worked for me - keep the scope as narrow as possible.

I’m sure you have already checked but are there any background tasks running which may cause the conflict?

jshaffstall · May 26, 2021, 12:35pm

No background tasks being used in this app yet, so not that.

For some reason I thought that nesting in_transaction decorators wasn’t allowed. When I refactor I’ll make sure this isn’t happening.

Thanks!

owen.campbell · May 26, 2021, 12:45pm

For me, transaction management is one of the dark arts. People have been known to get a
PhD for knowing their way around that stuff.

Trying to predict what a database might decide constitutes a conflict is a rabbit hole from which you might never emerge. For me, I prefer to narrow the scope and, most of the time, watch the conflict magically disappear as a result!

rickhurlbatt · May 26, 2021, 1:30pm

border line tears from that line

p.colbert · May 26, 2021, 2:50pm

I’m experiencing a similar problem, revolving around a “heartbeat” function on the server.

This function is the consolidation of several ideas.

Letting the Server know that a Client is definitely connected, and perhaps logged in. The goal is to know when everybody’s out, so that we know when it is safe to back up data and perform maintenance.
Letting the Client know (via return value) when the App is going down for maintenance, so that new log-ins can be prevented, and logged-in users can be logged out (e.g., when away from their browser).

Both of these imply calling the Server at regular intervals. Given the time-lag of each server call, it makes sense to do both of them in the same call.

In my case, I think a few of my “heartbeat” calls may be too close together in time. One occurs at App startup. Another occurs at startup of the Main Form. These are less than a second apart., and depending on database load, it’s conceivable that the database hasn’t finished the first transaction, even though the with statement has concluded, and its Server function has returned.

I’ve artificially added a one-second sleep before returning, and for now that seems to be doing the job. But I need to add diagnostic prints, and possibly re-think my mechanisms, to be sure.

jshaffstall · May 26, 2021, 3:16pm

That’s an interesting thought. Hopefully someone with knowledge of Anvil internals can comment on whether that’s possible or not. I’d always assumed that data table operations were synchronous, and the server code wouldn’t continue until the data table was in a stable state, including the end of a transaction.

p.colbert · May 26, 2021, 4:01pm

That’s what I would have expected, too. In fact, the only reason I have, for suspecting otherwise, is this particular message.

I wonder if this would be an effective workaround: Using a decorator. It will retry up to 5 times before giving up.

junderhill · May 30, 2021, 7:43am

Jeff, I think we’re running Postgres under the hood, here. Postgres uses Multi-Version Concurrency Control (MVCC). This is designed to provide the isolation necessary to support transactions, while allowing the database to scale to high levels of concurrency. In general, it works really well.

There are different levels of transaction isolation. The default level in Postgres is called Read Committed: Basically, the transaction sees the data that was committed before the transaction started. If another transaction changes the data you’re looking at before you finish, one of them will fail. The highest isolation level is Serializable, which guarantees certain other edge cases don’t occur, but increases the likelihood of conflicts. Per this link, Anvil provides Serializable isolation. You can read more about isolation in Postgres here.

Owen really is right–understanding how all this works can get ugly, and I couldn’t agree more that the best approach in the application is to keep it simple, and do one thing at a time. But in most cases it should be possible to reason through what the database is doing, provided you understand what the application is doing.

But what is the application really doing? Your front end is compiled into JavaScript, a non-blocking language. So it seems possible that the front end could trigger your back end routine two or more times in succession. Each of these may run as a separate request on the Anvil back end, resulting in concurrent attempts to update the database.

It also seems possible that one of your subroutines is opening a second transaction, and thereby causing a conflict with the caller, or a sibling. I don’t know for sure because I can’t see all the code.

Are either of these things actually happening? You can probably figure that out, but ultimately this brings us back to Owen: I think his solution (and mine) would be to ensure that they don’t. For example, I would not call this routine from an init function. I would call it from only one place–the button routine–and I would disable the button as soon as it’s pressed. On the back end, I would move all the database I/O into the main routine, where we can see it, and there’s no possibility a second transaction is opened by your own code. If you do those things, I’ll bet the problem goes away.

John

owen.campbell · May 30, 2021, 7:57am

What an excellent dark arts introductory tutorial!

Definitely. I often find this happening for reasons I never get to the bottom of. Try putting some print statements in and see if you get the number of them you expect (or use the logging/timing functions in Anvil Extras)

Again, I never bother trying to work out why this happens, I just accept that it can and ensure the server side handles it properly.

jshaffstall · May 30, 2021, 2:29pm

That’s definitely not happening in my case.

I thought that was the most likely cause of it, but I was unable to recreate that deliberately. I did put in code to disable the button after a click just in case.

That’s part of the problem with intermittent errors, you don’t know if you’ve fixed them because they happen irregularly anyway.

danbolinson · June 1, 2021, 9:44pm

We’ve seen our share of transaction conflicts. Owen’s suggestion (delay all db writes to a specific function that does the bare minimum and wrap only that function in the in_transaction decorator) is our basic approach.

These transaction conflicts appear to be very significantly increased when a search/get is made as part of a transaction. Most of the time for us, that search/get can safely be performed BEFORE the transaction lock is placed, because we don’t expect situations where the same rows would be handled by multiple users concurrently. So if you can do it, getting searches/gets done before starting a transaction is a really good place to start.

Feature request: This Read Committed thing @junderhill references seems like it would handle 99% of our use-cases… would be great if we could pass a parameter to the in_transaction lock for opt in to this!

jshaffstall · June 1, 2021, 10:48pm

That’d be a good thing to post in the Feature Requests forum: Feature Requests - Anvil Community Forum

junderhill · June 2, 2021, 4:15am

@danbolinson raises some great points. When running with Serializable isolation (the default in Anvil), if you do a search inside a transaction, and the data you’ve ‘observed’ changes before the transaction completes, it may fail. Postgres doesn’t know how you intend to use the results, so it assumes they’re all important. Running with Read Committed relaxes the constraints somewhat.

On the other hand, if you need those search results to be accurate and unchanged, doing the search outside the transaction might be unacceptable.

Narrowing the search may help. @jshaffstall seems to have already done this, at least in the code I can see.

I still think there might be glitch here, with a second transaction getting started somehow. @jshaffstall swears this isn’t happening, but I’m not convinced. Most web platforms run every request in a transaction, by default. What does Anvil do if you don’t include the decorator? Are you sure?

When not already in a transaction, Postgres itself runs every statement in a transaction. Wheels within wheels. Which is why I still like @owen.campbell’s advice best. If anything can go wrong, it will.

In Postgres you can specify the isolation level when you open a transaction. It might be nice to have that in Anvil.

junderhill · June 2, 2021, 4:19am

I put in a feature request. Links back here and to the Postgres doc.

jshaffstall · June 2, 2021, 4:53am

I can confirm that if you start another transaction (explicitly) when you already have one open, Anvil throws an exception that specifically states you’re trying to start a second transaction.

So it isn’t possible that a second transaction is being explicitly started and thus causing the transaction conflict. What might be happening under the hood is anyone’s guess.

junderhill · June 2, 2021, 6:46pm

Yes, but that doesn’t mean some other part of your program, not in a transaction, isn’t messing up the transaction. Particularly with Serializable isolation, it’s possible to step on yourself.

It’s difficult to say anything about this because we can’t see what your code is doing. We have no way of knowing whether this line

or this one

does any database I/O. If they do, they could affect the transaction. And it’s still possible the back end is being called two or more times in succession by the front end. Adding some debug statements might shed some light on that, and you may have already done that.

I do think there’s something counter-intuitive going on internally, particularly if you’re certain nobody else is using the app, nor any background task. This shouldn’t be this hard. We have built large, highly concurrent apps, such as mobile games, on (non-Anvil) Python/Postgres back ends, and don’t run into this problem. But we run under Read Committed isolation, and updates are wrapped in two layers of retries, including an application-level retry loop with a random back-off delay. These applications operate flawlessly at scale.

jshaffstall · June 2, 2021, 7:07pm

I can guarantee that I am the only person using this app right now, and that there are no background tasks. And under regular use, even with multiple users calling the same function, they would all be operating on different rows in the data tables.

I have taken steps to make sure the button can’t get pushed twice, which is the only way I can think of that the server function might have gotten called again before it had finished the first time.

That’s the one that’s doing the database work. I’ve since moved the transaction inside of it, to narrow down the scope.

I’m having a hard time reconciling what I know from SQL transactions with the behavior I’ve been seeing in Anvil. The fact that it’s such a black box makes it tough to even reason about what might be causing a conflict with only a single transaction active.

I’m probably also overusing transactions because of the convenience of the decorator. In this particular case, the database activity is limited to records for the current user. There really should be zero chance of conflict under normal usage.

Anyway, here’s the code in set_mystery_being_played that was actually executed when the transaction conflict happened:

  try:
    if order_item['mystery_being_played'] and order_item['mystery_being_played']['title']:
      return
  except:
    pass

There’s more past that that wasn’t being executed in that code path.

Maybe the conflict had to do with including the search in the scope of the transaction, and is part of some underlying Anvil issue. I plan on narrowing down the scope of the transaction more, following the advice to keep searches out of transactions.