Questions about Anvil server details for optimization

jshaffstall · July 15, 2024, 6:37pm

As I’ve been helping several people with optimizations of their apps, I realized that I’ve never confirmed some of my assumptions about how Anvil implements server calls and background tasks. I want to lay out what those assumptions are, and then hopefully those with more knowledge of the implementation details can say where I got it wrong.

Here’s what I currently think is true:

1. Without persistent server turned on, each Anvil server call starts up a new process.
2. With persistent server turned on, each Anvil server call is a separate thread in the same process.
3. Each Anvil background task starts a new process, regardless of persistent server.

p.colbert · July 15, 2024, 7:30pm

That’s my understanding for #1 and #3.

I was under the impression that for #2, each browser session got its own process. So different users could never see the same global variables.

But I’ve never had a reason to try it out.

In principle, the two can be distinguished via the use of global variables. If you’re correct, then all/many browser sessions share one server process, and will see the same value (e.g., a startup timestamp) of that global variable. Otherwise, each browser session will see a different value.
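A minimal sketch of that test, written as plain Python so it runs anywhere. The names `process_fingerprint` and `_PROCESS_STARTED_AT` are made up for illustration; in an Anvil server module the function would presumably be decorated with `@anvil.server.callable` and called from two different browser sessions.

```python
import os
import time

# Module-level global: set once, when this process first imports the module.
_PROCESS_STARTED_AT = time.time()


def process_fingerprint():
    """Return values that stay constant for the lifetime of one process.

    The pid alone could collide across machines, so the startup
    timestamp is returned alongside it.
    """
    return {"pid": os.getpid(), "started_at": _PROCESS_STARTED_AT}
```

If two sessions get back the same pair, they were very likely served by the same process. Different pairs do not prove the opposite assumption, since the calls may simply have landed on different processes or machines.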

In practice, Anvil could, for load-balancing purposes, start multiple server process instances, on different [virtual] machines, so it’s not guaranteed that global variables would be shared.

Edit: I was also under the impression that the Persistent Server was not multithreaded, but handled strictly one anvil.server.call at a time. This would certainly be easier on the programmer, as otherwise they’d have to be keenly aware of multithreading issues such as locks and race conditions throughout their design and programming (and debugging!) process. (Python’s Global Interpreter Lock exists for a similar reason, although it only protects the interpreter’s internals, not application data.)

stefano.menci · July 15, 2024, 9:11pm

I’ve lived with the same 3 assumptions and never had bad surprises, both on server and uplink.

The very few times I’ve worked with globals shared across calls, either they were loaded at startup then used as read only, so I didn’t care about locking, or if they were read and write, I have used locks.
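As a rough illustration of the read-and-write case, here is a sketch that guards a module-level counter with a `threading.Lock`. The names `record_hit` and `simulate_concurrent_calls` are hypothetical, and the threads only stand in for concurrent server calls landing in the same process.

```python
import threading

_hits = 0                       # global shared by every call in this process
_hits_lock = threading.Lock()   # guards every read-modify-write of _hits


def record_hit():
    """Increment the shared counter and return its new value."""
    global _hits
    with _hits_lock:
        _hits += 1
        return _hits


def simulate_concurrent_calls(n_threads=8, calls_per_thread=1000):
    """Run record_hit from several threads and return the final count."""
    def worker():
        for _ in range(calls_per_thread):
            record_hit()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with _hits_lock:
        return _hits
```

Globals that are loaded once at startup and only read afterwards do not need the lock.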

More often, the only locking mechanism I’ve used has been database transactions, with the (fourth) assumption that they are thread-safe, like any other service available in Anvil.

In an old app I had the transaction decorator wrapping a long and slow chunk of code. One day I got a burst of 120 HTTP endpoint calls, and those transactions kept failing and retrying. Anvil support noticed it and sent me an email telling me that the app had brought down the Postgres and Anvil servers. I consider this a confirmation that two calls can run concurrently, and my guess is that they run in the same process, in different threads.

p.colbert · July 15, 2024, 9:25pm

Good to have that feedback! Thanks, @stefano.menci .

Even without concurrency, some care is required with pseudo-globals, like local files, database state and anvil.server.session. For example, it’s entirely possible for a user to log out of a browser session – or have it time out – and not close the browser tab; and then a completely different user logs in in the same browser tab/session.

There are no Anvil-defined “user-changed” events we can use to detect such cases. We have to code those for ourselves, and make sure that we clear user-specific data out of any caches (anvil.server.session, anvil.server.cookies) we may have used.
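One way to code such a check, sketched with a plain dict standing in for `anvil.server.session` (assuming it behaves like a dict). The helper name `ensure_session_owner` and the `owner_id` key are invented for this example.

```python
def ensure_session_owner(session, current_user_id):
    """Drop cached per-user data if the session has changed hands.

    `session` is any dict-like store; `current_user_id` is None when
    nobody is logged in.
    """
    if session.get("owner_id") != current_user_id:
        session.clear()                        # forget the previous user's data
        session["owner_id"] = current_user_id  # remember who owns it now
    return session
```

Calling this at the top of each server function that touches cached data means a login, logout, or timeout in the same tab is noticed on the next call.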

Multithreaded calls are something Uplink servers have to account for, too. So the same kind of thinking can be reused for Persistent Servers.

jshaffstall · July 15, 2024, 10:32pm

I guess that adds another element, that HTTP endpoints execute in the same persistent server that server calls do.

stefano.menci · July 15, 2024, 11:57pm

Well, I have a dedicated server, so that’s always the case for me.

Even with a dedicated server, the process can be killed for any reason, but I have only one server.

In that case the solution was to isolate the transaction-sensitive code and get it to run quickly, by creating a smaller transaction-decorated function, rather than decorating the whole endpoint. And if I remember correctly, that code was running in an Uplink script. The solution was to have the Uplink script call a transaction-decorated function that runs in the server, so we don’t need to keep the transaction open while Uplink and server communicate back and forth about the transaction management and about the table queries.
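The shape of that fix can be sketched with the standard library's `sqlite3` standing in for Anvil's Data Tables, so this is an analogy rather than Anvil code. The table, the function names, and the `time.sleep` placeholder are all assumptions made for the example.

```python
import sqlite3
import time


def expensive_prepare(order_id):
    # Hypothetical slow step (parsing, remote calls, ...).
    # No transaction is open while this runs.
    time.sleep(0.01)
    return {"order_id": order_id, "qty": 2}


def reserve_stock(conn, item, qty):
    # Small transactional function: only the read-modify-write is inside.
    with conn:  # commits on success, rolls back if an exception escapes
        (stock,) = conn.execute(
            "SELECT stock FROM items WHERE name = ?", (item,)
        ).fetchone()
        if stock < qty:
            raise ValueError("not enough stock")
        conn.execute(
            "UPDATE items SET stock = ? WHERE name = ?", (stock - qty, item)
        )
    return stock - qty


def handle_request(conn, order_id):
    prepared = expensive_prepare(order_id)                  # slow, outside
    return reserve_stock(conn, "widget", prepared["qty"])   # fast, inside


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO items VALUES ('widget', 5)")
conn.commit()
```

The shorter the transactional function, the smaller the window in which concurrent calls can conflict and be retried.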

ryan · July 17, 2024, 1:13pm

Hi @jshaffstall,

This is a great idea for a thread.

As you assumed and @p.colbert has said, #1 and #3 are correct.

For #2 @stefano.menci, calls can run in different threads of the same process but this isn’t guaranteed on Anvil’s hosted infrastructure due to load balancing.

And to your point about HTTP endpoints executing in the same persistent server @jshaffstall, they can, but again it isn’t guaranteed because they could be routed to another server for load balancing.

I hope that helps.

jshaffstall · July 17, 2024, 1:57pm

Thanks, Ryan, that helps a lot!