Hi,
An installed package I’m trying to use raises an error when called from an Anvil server module. It needs to run on the main thread, and Anvil seems to run server code on some other thread. Can anyone suggest a solution or approach for getting this package working in Anvil?
For reference, I’m building an app which needs to fetch a URL and render the HTML that the page’s JavaScript generates. There doesn’t seem to be a straightforward way of accomplishing this without using this particular package.
What happened:
Using the requests_html Python package in a script on my local machine, I could render the JavaScript-generated HTML without much difficulty. That is a win given the complicated nature of what happens behind the scenes, i.e. Chromium is opened in the background and the HTML is rendered from there.
However, trying to run the same code on an Anvil server module called from the Client (and this was also tried on a separate uplink server) produced an error:
RuntimeError: There is no current event loop in thread 'Thread-1'.
The error is caused by a threading issue:
The issue seems to be that Anvil is using threading / has requirements around threading which interfere with the ‘html.render()’ method of ‘requests_html’, which apparently needs to be run from the main thread in order to work - this looks to be related to its use of ‘signals’.
Others who have hit this problem point to various possible solutions that involve controlling the threading with asyncio and similar packages. These are new to me and they look like long shots. I would think that whatever is causing requests_html to break on Anvil is happening outside what the code in the Anvil Server Module can control.
I tried setting up a new event loop using asyncio:
asyncio.set_event_loop(asyncio.new_event_loop())
This didn’t work either, but it produced a different error:
ValueError: signal only works in main thread
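Both errors can be reproduced without requests_html or Anvil, which suggests they come from running in a non-main thread in general. The following is a minimal sketch using only the standard library. The names probe and run_in_worker are made up for illustration, and it assumes (not confirmed from the Anvil side) that server code runs in an ordinary worker thread.

```python
import asyncio
import signal
import threading


def probe(results):
    """Runs inside a worker thread and records what works there."""
    # Step 1: a worker thread has no event loop by default.
    try:
        asyncio.get_event_loop()
        results["loop_before"] = "ok"
    except RuntimeError:
        results["loop_before"] = "RuntimeError"

    # Step 2: installing a loop by hand makes get_event_loop() work ...
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        same = asyncio.get_event_loop() is loop
        results["loop_after"] = "ok" if same else "other"

        # Step 3: ... but signal handlers still cannot be registered
        # from any thread other than the main one.
        try:
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            results["signal"] = "ok"
        except ValueError:
            results["signal"] = "ValueError"
    finally:
        # Clean up the loop created for this thread.
        asyncio.set_event_loop(None)
        loop.close()


def run_in_worker():
    """Runs probe() in a separate thread and returns its findings."""
    results = {}
    worker = threading.Thread(target=probe, args=(results,))
    worker.start()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_in_worker())
```

Run as a script, this should report a RuntimeError before the loop is set, success afterwards, and a ValueError for the signal call. That matches the sequence above: setting the event loop clears the first error, and the signal registration then fails because it is not on the main thread.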
To give some context, here is the code I’m trying to run in the Server Module, which runs successfully on my local machine:
from requests_html import HTMLSession
session = HTMLSession()
page = session.get(url)
page.html.render() # This is the line which triggers the error
Here is the traceback for the error:
RuntimeError: There is no current event loop in thread 'Thread-1'.
at /usr/local/lib/python3.7/asyncio/events.py, line 644
called from /usr/local/lib/python3.7/site-packages/requests_html.py, line 727
called from /usr/local/lib/python3.7/site-packages/requests_html.py, line 586
called from ServerModule1, line 53
called from Main_form, line 83
called from Main_form, line 101
My questions:
Is there anything which can be done so that the current code, called from a Server Module by the Client, runs on what the requests_html package sees as the main thread, thereby allowing it to work as it does on my local machine?
If not, my backup plan would be to code something in Selenium (a browser automation library) directly, but this is likely to be complex with no guarantee of success. I think requests_html uses Selenium anyway, so the hope would be that the threading problem is caused by the way requests_html uses Selenium rather than by the combination of Selenium and Anvil itself. I’m not sure. So here I’d ask: given what you know about how Anvil works, would you take the risk of going down this potentially time-consuming route with Selenium?
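On the first question, one direction that might be worth testing is handing the work to a child process, since code in a fresh Python process starts on that process's own main thread even when the parent call comes from a worker thread. This is a minimal sketch of that idea only, untested with requests_html and on Anvil's hosted servers. CHILD_CODE, run_on_child_main_thread and call_from_worker_thread are hypothetical names, and the child script here just reports which thread it is on rather than rendering anything.

```python
import subprocess
import sys
import threading

# Hypothetical child script. In a real attempt, the requests_html calls
# would go here in place of the thread check (an assumption, not tested).
CHILD_CODE = (
    "import threading\n"
    "print(threading.current_thread() is threading.main_thread())\n"
)


def run_on_child_main_thread():
    """Starts a new Python process and returns what it printed."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


def call_from_worker_thread():
    """Mimics a server call arriving on a non-main thread."""
    out = []
    worker = threading.Thread(
        target=lambda: out.append(run_on_child_main_thread())
    )
    worker.start()
    worker.join()
    return out[0]


if __name__ == "__main__":
    # Reports whether the child code ran on its own main thread.
    print(call_from_worker_thread())
```

Whether a hosted environment allows launching a child process, and a browser from within it, is a separate question this sketch does not answer.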
Thanks,
Richard