Running background tasks in uplinked server code

Hello
I am trying to run a background task in an uplink-connected server module.
When I try to set the task state, e.g.
anvil.server.task_state['progress'] = 42
I get the error
Cannot find reference 'task_state' in 'server.py'

I cloned Shaun’s Crawl tutorial and copy-pasted his server code into an uplink module, and I get this error (besides another one on the import anvil.http line, about anvil.http being inaccessible from the uplink module).
Where am I going wrong?
I can’t find information about this in the docs.

Thanks

Hi @aldo.ercolani,

The Uplink doesn’t form part of our managed infrastructure, meaning that if an Uplink connection is disconnected, we lose track of any background tasks.

Because we can’t guarantee to keep track of all background tasks launched via the Uplink, the get_state() method always returns None if a task is running on the Uplink. You will still have access to more general methods relating to the Task object, such as is_running(), is_completed(), get_task_name(), get_start_time().
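
To make that concrete, here’s a minimal sketch (the task name and URL are illustrative, not from your app) of the methods mentioned above:

import anvil.server

@anvil.server.background_task
def crawl(url):
  # In a managed Server Module this works: task_state is how a
  # background task reports progress back to its Task object.
  anvil.server.task_state['progress'] = 42

# Launch it (from server-side code) and inspect the returned Task object:
task = anvil.server.launch_background_task('crawl', 'https://example.com')

task.is_running()      # these general methods work wherever the task runs
task.is_completed()
task.get_task_name()
task.get_start_time()
task.get_state()       # but this returns None if the task runs on the Uplink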

I am happy to submit a feature request for turning the Uplink into a full background task executor if that’s something you’d find useful?

2 Likes

Could you both fire the background task and check its state by calling “wrapper” functions on the server from the uplink, rather than starting it from the uplink directly?

In my head that would push ownership of the task into the managed infrastructure, possibly using Data Tables to handle the hand-off. A bit convoluted, I agree, as you’d need to manage the task ending while the uplink is potentially disconnected.
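
Roughly what I have in mind, as a sketch only (the function and task names are made up, and you’d still need some bookkeeping, e.g. in Data Tables, for the disconnect case):

# In a managed Server Module, so the task is owned by Anvil's infrastructure:
import anvil.server

@anvil.server.background_task
def long_job(data):
  anvil.server.task_state['progress'] = 0
  # ... do the actual work here ...

@anvil.server.callable
def start_long_job(data):
  # Wrapper the uplink (or client) calls: it launches the task on the
  # managed server and hands back the Task object for polling get_state() etc.
  return anvil.server.launch_background_task('long_job', data)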

1 Like

Thanks to both bridget and david.wylie
Ok, now that the picture is clear, I’ll spend some more time trying to figure out a solution, maybe trying david’s suggestion first.
I think what bridget says should be put in the docs too.
Should I file a bug report for this @bridget?

Thanks and BR

2 Likes

Thanks @aldo.ercolani but no need, it’s already on my to-do list! :slight_smile:

1 Like

I’ve been thinking for a while about possible workarounds, but cannot find a clear one.

I thought about david’s idea, but I bet that if the task’s ownership goes back into the managed infrastructure, then the 30-second time limit will apply again (I am on the free plan), and avoiding that limit is the main reason I am trying to have the background task on an uplink.

Then I thought of using a global variable to communicate between the background task and the client, but the uplink code doesn’t see (and cannot import) the globals module.

Maybe I could use the Data Tables to pass data between the uplink and the client, but that seems really too slow to me.

I am sure there is a simpler, more straightforward way here, but I can’t seem to find it.

EDIT: Another question: is it possible, for a task, to know its own task id?

Upvote for the ability to execute background tasks on Uplink. I’m happy writing my own exception handling in case the Uplink disconnects, and it would be really nice to be able to take advantage of my desktop’s resources and have an elegant way to pass asynchronous data to the client.

Do you mean

  1. Invoke background tasks (running on Anvil’s servers) from an Uplink program
  2. Have an Uplink program run as a background task

?

#2, Uplink code running as a background task. I haven’t done any testing, so apologies if this is already implemented. I’ve been reading the docs and forum instead of actually testing it out.

The docs on Background Tasks are fairly short.

Before Background Tasks were implemented, I set up an in-house (Uplinked) equivalent. A Server Module function would receive a request, log it in a table, and call an Uplinked function to process that request. Because that function had to return right away, it would pass the work on to another process. Ultimately, a second Uplink program would file the result in the table row reserved by the Server Module.

You could think of that as a poor-man’s locally-running Background Task. Funny thing is, it still works.

Interesting, could you give code samples?
Thanks

With any luck, I’ll be getting back into that code in a day or two. Then I can track down the relevant bits across the three programs.

1 Like

I’m piecing this together from existing code. It was written to solve a problem, not as a general “how-to”. So I’ve had to edit extensively. Apologies if I have, in the process, introduced syntax or execution errors.

I’m deliberately omitting module setup (imports and anvil.server.connect() calls). They don’t illustrate any important points about this technique.

First, you need a db table to record each outstanding task. In this example, I name it Tasks (python_name: tasks), with the following columns:

  • request: string
  • when_started: datetime
  • when_completed: datetime

You may want to add other bookkeeping columns.

Second, a Server Module function to create and launch the task:

@anvil.server.callable
def launch_offsite_task(request):
  """Immediately launches a task, 
  with the string request as data.
  Returns a row id for use as a task id."""
  task = app_tables.tasks.add_row(
    request = request,
    when_started = datetime.utcnow(),
    when_completed = None
  )
  task_id = task.get_id()
  anvil.server.call('launch_task_nowait', request, task_id)
  # note: all sorts of things can go wrong with the above call, 
  # so you should really wrap it in a try...except block.
  return task_id

The caller might use the task_id, later, to look up the table row, and see whether the task has completed.
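
For example, a hypothetical helper for that check (it just re-reads the row created above) might be:

@anvil.server.callable
def task_is_complete(task_id):
  """Report whether the off-site task recorded in the Tasks table
  has finished, given the row id returned by launch_offsite_task."""
  task_row = app_tables.tasks.get_by_id(task_id)
  return task_row['when_completed'] is not None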

Third, an Uplink function, off-site, to launch the task locally.

job_inbox = '/tasks/in_box/'

@anvil.server.callable
def launch_task_nowait(request, task_id):
  """Immediately launches a local task, 
  with the string request as data.
  Returns immediately."""
  task_id_as_file_name = task_id.encode().hex()
  full_file_name = job_inbox + task_id_as_file_name + '.txt'
  with open(full_file_name, 'wt') as task_file:
    task_file.write(request)  

Because several users might launch tasks at essentially the same time, this function may still be running when Anvil calls it again. In that case, the Uplink will run the new call in its own thread.

So be careful with global variables! Avoid writing to global data, even indirectly (e.g., via some database wrapper), unless you can be absolutely certain that the write is thread-safe.
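
If you do need shared state across those threads, one conventional approach (standard-library threading; the counter here is purely illustrative) is to guard every write with a lock:

import threading

stats_lock = threading.Lock()
requests_received = 0

def count_request():
  global requests_received
  with stats_lock:
    # only one thread mutates the global at a time
    requests_received += 1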

Fourth: Now that the request has been written as a file, it’s up to some worker program to pick up the file and actually get something done with it. For simplicity, we assume that the worker writes to an outbox folder when it is done. Then a separate Python program, monitoring that folder, can notify Anvil that the work really is done:

job_outbox = '/tasks/out_box/'
folder_to_watch = Path(job_outbox)
pattern_to_wait_for = '*.txt'
cycle_time_seconds = 30

def record_task_completion(task_id):
  """Records completion of a local task."""
  task_row = app_tables.tasks.get_by_id(task_id)
  task_row.update(when_completed = datetime.utcnow())

# begin watching for completed jobs:
while True:
  for full_filename_found in folder_to_watch.glob(pattern_to_wait_for):
    file_name = full_filename_found.stem
    # convert file name back to a task id:
    task_id = bytes.fromhex(file_name).decode()
    record_task_completion(task_id)
    # remove flag file, so we don't process it again:
    full_filename_found.unlink()
  # Once we've exhausted the out-box, 
  # let other programs use the file system and CPUs:
  time.sleep(cycle_time_seconds)

Because this function handles just one file at a time, it does not need to use threads. It can safely use global state (e.g., a database connection).
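
For completeness, the worker itself can be anything you like. A toy, stand-in version that just moves requests from the in-box to the out-box (same folders and file-naming scheme as above; the real work is left as a placeholder) might look like:

from pathlib import Path
import time

job_inbox = Path('/tasks/in_box/')
job_outbox = Path('/tasks/out_box/')

while True:
  for request_file in job_inbox.glob('*.txt'):
    request = request_file.read_text()
    # ... do the real work with the request here ...
    # signal completion by dropping a same-named file in the out-box:
    (job_outbox / request_file.name).write_text('done')
    # remove the request so we don't process it again:
    request_file.unlink()
  time.sleep(5)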

I hope this gets the general idea across.

1 Like

Awesome!
Thanks a lot!!