Issues with data table lookups when running a Telegram bot from a Background Task

ejl · August 24, 2021, 3:35pm

What I’m trying to do:
I need to run a Telegram (TG) bot, for notifying my users of different events happening. I am using the python-telegram-bot (PTB) library, which is already included in the
As PTB works by registering different command handlers and then starting a polling task, I was hoping to run the bot as a Background Task, which I can control from an admin interface I have already built.

What I am trying to figure out: is this a reasonable use of Anvil's Background Tasks?
I understand that a dedicated server connected through Uplink, would work, but I’m trying to do this all inside Anvil if possible.

I am running into issues when trying to query the database from inside the background task that is handling PTB:

What I’ve tried and what’s not working:
I have defined my background task, and can start and stop it as I want.

This task is running the PTB, using the following principle:

tg_service_task:

import anvil.server
import anvil.secrets
from telegram.ext import Updater
from telegram.ext.commandhandler import CommandHandler

from .tg_bot_api import start_auth

updater = Updater(anvil.secrets.get_secret("VERY_SECRET_TOKEN"))
dispatcher = updater.dispatcher

@anvil.server.background_task('telegram_service')
def service_handler():
  
  anvil.server.task_state = 'Starting service'
  
  # Register commands
  start_auth_handler = CommandHandler("start", start_auth)
  dispatcher.add_handler(start_auth_handler)
  
  # Start telegram-bot
  updater.start_polling()
  print("Telegram bot started")

The callback function start_auth looks something like this:
tg_bot_api:

from anvil.tables import app_tables

def start_auth(update, context):
  # The auth token arrives as the first argument of the /start command
  token = context.args[0]
  print(f"User { update.effective_chat.id } is requesting authorization with token { token }")

  operator = app_tables.operators.get(tg_auth_token=token)

  if operator is not None:
    # Save TG ID in DB
    operator.update(telegram_id=int(update.effective_chat.id))

    # Let user know everything went well
    context.bot.send_message(
      chat_id=update.effective_chat.id,
      text=f"Welcome { operator['title'] }\nYou have been authorized successfully!",
    )

  else:
    # Authorization request denied
    context.bot.send_message(
      chat_id=update.effective_chat.id,
      text="Failed to authorize.",
    )

The issue:

When I initiate the command from Telegram, the start_auth function hangs on the operator = app_tables.operators.get(tg_auth_token=token), without throwing an error or anything else.

In the logs, I can see that the bot registers the event with the correct auth_token, which matches the one in the data table. However, the get() call does not return.

Am I missing something obvious?

I have tried implementing a simple greeting command, which responds to commands from a Telegram client, so the bot works.
I suspect, however, that PTB may be blocking the process that is trying to call the data table?

Any insights, suggestions or follow-up questions would be greatly appreciated.

I am working on creating a demo-app, to show the behaviour, but to use it, one would need to provide their own Telegram API key

stefano.menci · August 24, 2021, 4:29pm

Are you sure it’s hanging rather than throwing an exception?

Have you tried wrapping the get in a try - except block, and printing the exception?

try:
  operator = app_tables.operators.get(tg_auth_token=token)
except Exception as e:
  print(str(e))
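
A more general version of the same idea, as a minimal sketch: wrap each handler callback in a decorator so that any exception is printed with its type and traceback before it propagates. The names `report_exceptions` and `failing_callback` are hypothetical and are not part of PTB or Anvil.

```python
import functools
import traceback


def report_exceptions(callback):
    """Wrap a handler callback so any exception is printed before it propagates."""
    @functools.wraps(callback)
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception as e:
            # repr() shows the exception type even when the message is empty
            print(f"{callback.__name__} failed: {e!r}")
            traceback.print_exc()
            raise
    return wrapper


@report_exceptions
def failing_callback(update, context):
    # Stand-in for a handler whose data table lookup raises
    raise RuntimeError("Call from invalid context")
```

Decorating a callback like start_auth this way should make a silently failing handler show up in the logs instead of looking like a hang.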

ejl · August 24, 2021, 4:34pm

Wow, I thought I had tried catching any exceptions, but apparently not.

I now get the following exception:

Call from invalid context

Any suggestions on the reason for this message?

ejl · August 24, 2021, 5:35pm

Investigating further, the output of anvil.server.context.type is server_module, which should mean the function has the correct context?

Or am I misunderstanding the context?

stefano.menci · August 24, 2021, 6:16pm

I have never seen that error and I don’t know what a context is.
I can use app_tables from background tasks without any problems.
Sorry, we need someone else to help here.

Are you able to reproduce the problem in a smaller app, without Telegram or other dependencies?

I usually try to create a small app that isolates the problem so I can get help in the forum by posting the clone link, and I almost always end up finding the cause of the problem while creating the app.

ejl · August 24, 2021, 6:30pm

After looking a bit into the source code of PTB, it looks like Updater.start_polling() spawns a thread, which is responsible for polling updates from Telegram's server and therefore might be calling the callback functions as well.

I seem to remember reading on the Anvil forums that Anvil does not work well when using threads. This may explain why I’m experiencing the issues I am.
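
One way a library-spawned thread could lose its context is sketched below. This is only an illustration of per-thread state in plain Python, under the assumption that the call context is stored per thread; nothing in this thread confirms how Anvil actually tracks it. `_context`, `lookup`, `InvalidContextError` and `run_in_new_thread` are hypothetical names.

```python
import threading

# Hypothetical stand-in for per-thread call context; not Anvil's real internals.
_context = threading.local()


class InvalidContextError(Exception):
    pass


def lookup(token):
    """Pretend data table lookup that only works where a context was set."""
    if not hasattr(_context, "call_id"):
        raise InvalidContextError("Call from invalid context")
    return {"token": token, "call_id": _context.call_id}


def run_in_new_thread(fn, *args):
    """Run fn in a fresh thread and return (result, error)."""
    outcome = {}

    def target():
        try:
            outcome["result"] = fn(*args)
        except Exception as e:
            outcome["error"] = e

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return outcome.get("result"), outcome.get("error")


# The "task" thread has a context, so the lookup works here...
_context.call_id = "task-1"
main_result = lookup("abc")

# ...but a thread spawned by a library starts with empty thread-local state.
thread_result, thread_error = run_in_new_thread(lookup, "abc")
```

In this sketch the lookup succeeds in the thread that set the context and raises in the new thread, which would match the symptom if Anvil's context behaves similarly.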

Any thoughts?

I’ll build a demo-app tomorrow morning, to see if I can reproduce the issue.
I’ll post a clone-link once it’s ready.

In any case, thank you for taking the time to help out!

I hope Anvil Staff or another user might have seen the Call from invalid context error before.

stefano.menci · August 24, 2021, 6:35pm

Using threads on the server is not safe:

I’m sure you can’t use app_tables from an unofficial thread, and I would be afraid that Anvil would kill that thread whenever it likes.

I don’t know your library, but I always stay away from creating my own threads and try to use callbacks to http endpoints instead. The http endpoint can always use app_tables or start a background task.

ejl · August 25, 2021, 10:48am

I don’t have control over the use of threads, as the python-telegram-bot spawns it internally.

I guess using a separate Uplink server seems to be the most straightforward solution, although I was hoping to avoid having to manage my own VPS.

Could you elaborate a bit on what you mean by HTTP endpoints?

I have tried calling the callback function using anvil.server.call('start_auth'), but this results in an error about Anvil not being able to serialize the objects in my library. From the docs, I gather that only specific types of objects can be passed around by anvil.server.call. This inability to serialize complex objects would also prevent me from using HTTP endpoints, no?

stefano.menci · August 25, 2021, 12:16pm

This is the answer to an old post of mine. I was trying to create a background task when background tasks in Anvil did not exist yet. You may find some interesting info about threads in Anvil (assuming that info is still current).

By using http endpoints I meant getting whatever service you are using to call you back via an http end point rather than generating a thread that waits for that service to respond. I don’t know your library and it may be impossible, I just thought I would mention it.

As I said I stopped using threads before background tasks in Anvil even existed, but you could try to start a background task from your thread. The context (whatever that is) is wrong for working with the database in your own thread, but you could be lucky and you could be able to have your background task start a thread that starts a new background task with the correct context.

Or get your thread to call your http endpoint.

Whether you use a background task or http end point, I’m afraid you will need to serialize your data.
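
A minimal sketch of what serializing the data could look like: instead of passing the library's update and context objects, reduce them to plain values first. The helper name `to_payload` is hypothetical; in the callback it would be called as `to_payload(update.effective_chat.id, context.args)`.

```python
import json


def to_payload(chat_id, args):
    """Reduce a bot update to plain values that can be serialized."""
    return {
        "chat_id": int(chat_id),
        "token": str(args[0]) if args else None,
    }


payload = to_payload(chat_id=12345, args=["abc123"])
encoded = json.dumps(payload)  # succeeds because only int/str/None remain
```

A dict of ints and strings like this is the kind of value that can be passed to a server call or posted to an HTTP endpoint, whereas the original library objects cannot.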

ejl · August 26, 2021, 12:26pm

Marking this as the solution, as this brought me to the conclusion that I have to run python-telegram-bot on a self-hosted server using Uplink, due to Anvil's use of threads on the cloud servers.

This however seems to be working very well, so for future reference, this is the overall structure that made it work for me:

Anvil Form
- Provides a dashboard, from which I can see the status of my Uplink-connected server, as well as start/stop the Telegram Bot by calling the functions running on Anvil's server
Server Code on Anvil's servers
- Provides middleman functions, that are called from the Form dashboard, to connect to the functions running on the Uplink server
- These run on the Anvil server, as they need to be responsive, even when the Uplink connection is lost, but in theory I could have called the functions directly on the Uplink server
Server running on a VPS
- Connected with Uplink
- Anvil-accessible functions are basically just an interface (start/stop/status) for a Bot object which is handling the actual PTB code.
- In the Bot object I can call methods from app_tables, which now has the correct context.
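
The start/stop/status interface on the VPS could be sketched roughly as follows. This is an assumption-laden outline, not the actual code: `BotController` and `poll_once` are hypothetical names, and the polling loop is a plain thread standing in for the PTB updater.

```python
import threading


class BotController:
    """Start/stop/status wrapper around a long-running polling loop."""

    def __init__(self, poll_once, interval=0.01):
        self._poll_once = poll_once      # callable doing one unit of bot work
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        if self.status() == "running":
            return "already running"
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return "started"

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return "stopped"

    def status(self):
        alive = self._thread is not None and self._thread.is_alive()
        return "running" if alive else "stopped"

    def _loop(self):
        while not self._stop.is_set():
            self._poll_once()
            self._stop.wait(self._interval)


# On the Uplink machine, start/stop/status would each be exposed with
# @anvil.server.callable and the script kept alive with
# anvil.server.wait_forever(); that wiring is omitted so the sketch runs alone.
```

Keeping the controller separate from the Anvil-callable wrappers means the middleman functions on the Anvil server only ever exchange short status strings with the Uplink machine.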

I still don’t quite understand how I am able to make calls to my data table from the Bot object, as this is running in a new thread, but it works. I guess it’s because the original Uplink thread spawned it, and it carries over the context?

Anyways, thank you for offering your input @stefano.menci - much appreciated!

stefano.menci · August 26, 2021, 1:29pm

That’s weird, I have never noticed differences between Uplink and server code, with the exception of:

- Accessing the database and other services from uplink works the same way but it’s slower because it requires more round trips
- Being slower makes transactions more likely to fail. One day an app of mine received a burst of http endpoint calls which started a burst of uplink functions which started a burst of database accesses to the same row of the same table which caused a ton of database connections to stay open while retrying the transactions which caused the whole dedicated server to slow down for about 10 minutes. I have modified the app so that the first database access happens on the server before starting the uplink, and tested it with bigger bursts without problems.

I have 3 uplink Windows machines, it’s Windows because they need to run software only available in Windows. They are 3 because:

- When the server calls a function registered in multiple machines, it will execute randomly in one of them (it used to execute it on the last registered one)
- An uplink node is automatically unregistered as soon as it’s down (very useful with sneaky Windows updates or other problems)
- I can have 3 long running tasks running in parallel on different machines

In some cases I don’t like to use the random load balancing, so I append the machine name to the function name with:

@anvil.server.callable(f'start_uplink_process_{COMPUTER_NAME}')
def start_uplink_process():