How long do background tasks stay in the list?

stefano.menci · February 5, 2022, 5:48am

I am using this code to see if all the background tasks have finished doing their job:

import anvil.server

# done stays True only if no running task reports this job_id in its state
done = True
for task in anvil.server.list_background_tasks():
    if task.is_running() and task.get_state().get('job_id') == job_id:
        done = False
        break

I am waiting for about 30 tasks to finish their job, but len(anvil.server.list_background_tasks()) returns 700, and the loop takes a few seconds to run.

This function takes more than 10 seconds to print 0:

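import time
import anvil.server

@anvil.server.callable
def test():
    start = time.time()
    # Count running tasks (summing Task objects only works when none is running)
    print(sum(1 for task in anvil.server.list_background_tasks() if task.is_running()))
    print(time.time() - start)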

Is there a way to either cycle faster through the tasks or to get rid of the stale ones?

p.colbert · February 5, 2022, 3:31pm

Could get_termination_status() == "completed" be used to shorten (a copy of) the list? After all, once a job is done, it shouldn’t have to be revisited.
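A minimal sketch of that pruning step, assuming only that each task object has get_termination_status() (Anvil's Task objects do; it returns None while the task is still running). The names TaskLike and prune_finished are hypothetical, and this version drops any task whose status is not None, so failed and killed tasks are dropped along with completed ones:

```python
from typing import Iterable, List, Optional, Protocol


class TaskLike(Protocol):
    """The one method of an Anvil Task object this sketch relies on."""

    def get_termination_status(self) -> Optional[str]: ...


def prune_finished(tasks: Iterable[TaskLike]) -> List[TaskLike]:
    """Return a new list with only the tasks that have not terminated yet.

    get_termination_status() is None while a task is still running; any other
    value ("completed", "failed", ...) means the task can be dropped.
    """
    return [task for task in tasks if task.get_termination_status() is None]
```

Reassigning the result on each check (for example `watched = prune_finished(watched)`, with `watched` a hypothetical module-level list on a persistent server) keeps the list shrinking. This reduces how many status calls are made, not how long each one takes.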

stefano.menci · February 5, 2022, 4:30pm

The client calls the server, and the server needs to find the tasks that are still running and whose task.get_state().get('job_id') == job_id.

I could store the task_id of each task in a data table, then restart from there, but I thought that using anvil.server.list_background_tasks() would be faster.

So… how do I not revisit them?

Do I keep my own list of task ids on a data table instead of using anvil.server.list_background_tasks()?

p.colbert · February 5, 2022, 6:29pm

I might not have been thinking all that clearly, but I was thinking that a short list should take less time to iterate over. Of necessity, this means keeping a separate copy of that list, so that it can be shortened.

To avoid revisiting a completed task, either drop it from the list, or don’t add it in the first place. (Depending on when and where you build the list.) The next time you iterate over the list, that task just won’t be there, and shouldn’t waste any more time.

You bring up a good point. Where to keep the list depends on the circumstances and the lifetime needed. If it has to be dragged out of a database, instead of a session cache, or a memory-resident list, then that might negate any speed advantage. This suggests using another background task as a monitor, to minimize latency.

A lot depends on when and how these background tasks are created, too. If the monitoring task is the only one that creates them, then perhaps the list can be maintained entirely in memory. With no outside interference, the list ought to be accurate and fast.

Whether any of the above is practical will depend very strongly on the circumstances.

If task-creation is distributed, for example, then a transaction-guarded “task table” may be the only other reliable way to keep track. At least then you can use query conditions to exclude completed tasks – or remove them to a different table entirely, thus keeping the “active” list short.
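A rough sketch of the "active table plus archive table" idea. Anvil Data Tables only exist inside an Anvil app, so this uses Python's built-in sqlite3 as a stand-in; the table and function names are hypothetical. What it shows is that moving a finished task out of active_tasks happens in one transaction, so the insert into the archive and the delete from the active table either both happen or neither does:

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS active_tasks (task_id TEXT PRIMARY KEY, job_id TEXT);
        CREATE TABLE IF NOT EXISTS finished_tasks (task_id TEXT PRIMARY KEY, job_id TEXT, status TEXT);
        """
    )
    return conn


def register_task(conn: sqlite3.Connection, task_id: str, job_id: str) -> None:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO active_tasks VALUES (?, ?)", (task_id, job_id))


def archive_task(conn: sqlite3.Connection, task_id: str, status: str) -> None:
    """Move a task from the active table to the finished table."""
    with conn:  # the INSERT and DELETE below commit or roll back together
        row = conn.execute(
            "SELECT job_id FROM active_tasks WHERE task_id = ?", (task_id,)
        ).fetchone()
        if row is None:
            return  # already archived or never registered
        conn.execute(
            "INSERT INTO finished_tasks VALUES (?, ?, ?)", (task_id, row[0], status)
        )
        conn.execute("DELETE FROM active_tasks WHERE task_id = ?", (task_id,))


def active_task_ids(conn: sqlite3.Connection, job_id: str) -> List[str]:
    """The short list: only tasks of this job that have not been archived."""
    rows = conn.execute(
        "SELECT task_id FROM active_tasks WHERE job_id = ? ORDER BY task_id",
        (job_id,),
    )
    return [r[0] for r in rows]
```

In an Anvil server module the equivalent would presumably use app_tables with the anvil.tables.in_transaction decorator, and the slow is_running() check would only be needed for the few rows left in the active table.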

Will this be faster in practice? I don’t know, but 30 vs. 700 certainly gives this approach a numerical advantage. And it gives you a place to hang additional per-task data (and metadata), which might be helpful in some circumstances.

ianb · February 5, 2022, 6:33pm

I keep trying to think of a way other than this:

And I really can’t think of a better way.

Especially if you are doing this (which I just read and think is a really great solution):

I would have a table with literally just one column of running task IDs.

1. The IDs would be added when the task is launched by another function, before the task ID is passed back to the client.
2. To do what you wanted, I would call app_tables.running_btaskID_table.get(job_id) for your job ID; if it returns None, your job has completed.
3. If it is not None, do a .search() on the table and test the task behind each row with task.is_running(), deleting the rows of any that return False.
4. Finally, repeat step 2 and return whether your task is still running.
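A sketch of the check described above, with a Python set standing in for the one-column table and a get_task callable standing in for whatever fetches a task by ID (in an Anvil server module that could be anvil.server.get_background_task). The function name is_job_running is hypothetical:

```python
from typing import Callable, MutableSet, Protocol


class RunningCheck(Protocol):
    def is_running(self) -> bool: ...


def is_job_running(
    running_ids: MutableSet[str],
    job_id: str,
    get_task: Callable[[str], RunningCheck],
) -> bool:
    # No row for this ID: the job has already finished (or was never registered)
    if job_id not in running_ids:
        return False
    # Prune every row whose task is no longer running
    for task_id in list(running_ids):  # iterate over a copy while removing
        if not get_task(task_id).is_running():
            running_ids.discard(task_id)
    # Look up the ID again after pruning
    return job_id in running_ids
```

Registering a task is then just `running_ids.add(task.get_id())` right after launching it. The pruning loop still calls is_running() once per remaining row, which stays cheap only as long as the table stays short.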

stefano.menci · February 5, 2022, 9:18pm

I could create a singleton class that lives in the server and keeps the list of tasks in memory and the list of task ids in a data table.

Keeping the list of tasks in memory makes the job faster if you have a persistent server.
Keeping the list of task ids in a data table makes it possible to rebuild the list of tasks when the persistent server doesn’t persist or doesn’t exist.
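A sketch of that two-layer idea: task objects cached in memory, task IDs persisted elsewhere, and missing cache entries rebuilt from the IDs. A MutableSet stands in for the data table and load_task for whatever turns an ID back into a task object; both, and the class name TaskRegistry, are assumptions. In Python, a single module-level instance is the usual way to get singleton behaviour:

```python
from typing import Callable, Dict, Generic, List, MutableSet, TypeVar

T = TypeVar("T")


class TaskRegistry(Generic[T]):
    """In-memory cache of task objects backed by a persistent set of task IDs."""

    def __init__(self, persisted_ids: MutableSet[str], load_task: Callable[[str], T]):
        self._persisted_ids = persisted_ids  # stand-in for the data table
        self._load_task = load_task          # stand-in for "fetch task by ID"
        self._cache: Dict[str, T] = {}

    def add(self, task_id: str, task: T) -> None:
        self._persisted_ids.add(task_id)  # survives a server restart
        self._cache[task_id] = task       # fast path while the server persists

    def remove(self, task_id: str) -> None:
        self._persisted_ids.discard(task_id)
        self._cache.pop(task_id, None)

    def tasks(self) -> List[T]:
        ids = set(self._persisted_ids)
        # Drop cached tasks whose IDs were removed from the store elsewhere
        self._cache = {i: t for i, t in self._cache.items() if i in ids}
        # Reload tasks missing from the cache, e.g. after a server restart
        for task_id in ids:
            if task_id not in self._cache:
                self._cache[task_id] = self._load_task(task_id)
        return [self._cache[task_id] for task_id in sorted(ids)]
```

On a persistent server the same instance keeps serving cached tasks; after a restart, a fresh instance pays the reload cost once per ID and then serves from memory again.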

The problem is not that anvil.server.list_background_tasks() is too slow. The problem is that is_running() and get_state() are very slow, and calling them on a very long list makes the whole thing unusable. I can’t make them faster, but I can make the list shorter by only looking at the tasks listed in (and filtered from) the data table, or, even better, by not calling them at all.

Adding a state column to the task data table, and having each task update its state in that table instead of in anvil.server.task_state, makes it possible to quickly get a short list of task states and ids. Sometimes this alone is enough; sometimes the task must still be retrieved to check is_running() or get_termination_status().

For now (pending the creation of the singleton class mentioned above) my solution is similar to what @ianb describes:

generate a session_id
start the background task and pass the session_id argument to it
add a row to a table containing both session_id and task_id

Then, in the background task:

use the session_id to get its row
update state in that row instead of using anvil.server.task_state
let the table know, either in the state column or in another dedicated column, when the task has completed its job. This will not work when the task times out, is killed, etc., but it helps in most cases
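A sketch of this session_id workflow, using a plain dict as a stand-in for the task table (session_id mapped to a row). launch is a hypothetical callable that starts the background task with the session_id and returns its task ID; all names here are illustrative, not Anvil APIs:

```python
import uuid
from typing import Any, Callable, Dict, List

# Stand-in for a data table: session_id -> row (a plain dict)
Table = Dict[str, Dict[str, Any]]


def start_job(table: Table, launch: Callable[[str], str]) -> str:
    """Generate a session id, launch the task with it, record both ids."""
    session_id = uuid.uuid4().hex
    task_id = launch(session_id)  # e.g. launches a background task, returns its id
    table[session_id] = {"task_id": task_id, "state": {}, "done": False}
    return session_id


def report_state(table: Table, session_id: str, **state: Any) -> None:
    """Called inside the task instead of writing to anvil.server.task_state."""
    table[session_id]["state"].update(state)


def mark_done(table: Table, session_id: str) -> None:
    """Called by the task as its last action; never runs if the task is killed."""
    table[session_id]["done"] = True


def unfinished_task_ids(table: Table) -> List[str]:
    """Cheap check: only table rows are read, no Task object is touched."""
    return [row["task_id"] for row in table.values() if not row["done"]]
```

Because the row is written after the task is launched, the task's first lookup of its own row can race with that write, so the task may need to retry briefly. The done flag is only as reliable as the task's final line: a killed or timed-out task leaves it False, which is when fetching the Task object for get_termination_status() is still worth it.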

At this point, any server function can quickly get the tasks it’s interested in and, most of the time, their state, and it can fetch the task object for deeper investigation when needed.

I would still like an answer to the question: How long do background tasks stay in the list?