We are experiencing anvil.server.TimeoutError on all of our apps that we use for our workplace. This has been off and on for the past few days causing severe workplace disruptions. We would share an app ID, but all of the apps require a login. Is this related to your outage announcement, and is work being done to address this?
[Moved to new topic]
Sorry to hear that! We’re not aware of any current issues. Please let us know the app IDs of the apps where you’re seeing the issue so we can investigate. Do you get the error reliably for certain Server Functions, or all Server functions, or does it only happen sometimes? If it’s only certain functions, please let us know which ones. Thanks!
I’m having repeated response issues at 20:00-20:15 GMT on Friday.
(I saw the Uplink issues that occurred yesterday, but this seems to be a different problem, so I’m starting a new thread.)
On several different tries, the server commands are timing out before the task even seems to start, e.g. the first line of a simple `print("Test is starting")` isn’t triggering.
An example session ID is JODHSDDJQC73GHI5FCWQGNNQ5ALKK5YN
App ID is Y26N6JTYNQLGMR7Z
Full console log below:
Application loaded
`anvil.server.TimeoutError: Server code took too long`
* `at /downlink-sources/downlink-2024-01-08-11-08-41/anvil/_threaded_server.py:436`
* `called from /downlink-sources/downlink-2024-01-08-11-08-41/anvil/server.py:55`
* `called from <input>:1`
* `called from /downlink-sources/downlink-2024-01-08-11-08-41/anvil_downlink_worker/__init__.py:213`
* `called from /downlink-sources/downlink-2024-01-08-11-08-41/anvil_downlink_worker/__init__.py:244`
`anvil.server.TimeoutError: Server code took too long`
I’m in the US East Coast if that’s relevant.
Servers are responding now at 22:35 GMT.
This appeared to be resolved by ~03:00 GMT, but timeouts are recurring again at 14:00 GMT (09:00 my time).
This only seems to be affecting direct server calls, as background tasks appear to be running without issue.
`@anvil.server.callable` functions that normally complete without issue keep timing out.
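Since background tasks seem unaffected and only direct calls are timing out, one stopgap is to retry calls that hit a transient timeout. This is a generic, hedged sketch, not an official fix; the `anvil.server` names in the docstring are assumptions about how you would wire it into an app:

```python
import time

def call_with_retry(fn, *args, retries=3, delay=2.0, retry_on=(Exception,)):
    """Call fn(*args), retrying on the given exception types.

    Intended as a stopgap wrapper around a flaky server call, e.g.:
        call_with_retry(anvil.server.call, 'update_news', country,
                        retry_on=(anvil.server.TimeoutError,))
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return fn(*args)
        except retry_on as exc:
            last_exc = exc
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_exc
```

Note this only papers over intermittent failures; if every call times out for a sustained period, retries just delay the error.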
I’m also running into this issue. Yesterday and today (1/19/2024 and 1/20/2024).
Anyone found a solution yet?
Hi @andrew
Are there any particular functions that time out reliably when called from client-side code? We’re trying to work out what’s going on here, but haven’t been able to reproduce the problem so far. Thanks!
We do not have any specific functions that always time out. For us, it’s intermittent. However, it’s ALL server functions. Perhaps this is a hint: this timeout error has only occurred on the apps that we host on Anvil. We have some self-hosted Anvil apps that have never run into this error.
Hi Ian,
Client-side, I couldn’t log in as the authentication calls were timing out so I don’t have any examples of client-side events that failed earlier. Sorry.
I just tried a couple of server calls, and things are running fine from the command line. I can log in to the app and the commands are working there too. (This was at 19:51 GMT.)
The logs for the last couple of hours also seem OK; the last set of issues I logged was at 17:03 GMT / 12:03 ET, where several runs in a background task timed out.
This was session ID EC7QG6DYCXQSNTGHXEV3VLBBYWOTF7G
```
Error updating news for Syria: Server code took too long
Error updating news for Singapore: Server code took too long
Run complete for Switzerland
Error updating news for Brazil: Server code took too long
Run complete for Democratic Republic of the Congo
Error updating news for Egypt: Server code took too long
Error updating news for Algeria: Server code took too long
```
Not sure if that helps.
Is there anything else I can do to help track this down?
Update to this:
By 20:30 GMT I was getting timeout errors again on some longer server calls (all of which normally run OK).
FWIW, instead of a flat-out server error, it seems as though something is running slower than normal, meaning that server calls that normally finish under the server timeout cut-off are now taking too long to execute and are producing the timeout error.
So maybe not an Anvil issue as much as a server issue somewhere?
(But to be clear, I really have no idea what I’m talking about here…)
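One way to test the “running slower than normal” theory is to log each server function’s wall-clock time and watch for gradual creep rather than outright failure. A minimal plain-Python sketch; stacking it under `@anvil.server.callable` is an assumption about your setup, and the printed line would simply land in the app logs:

```python
import functools
import time

def timed(fn):
    """Log the wall-clock duration of each call, to spot slowdowns.

    Assumed usage in an Anvil server module:
        @anvil.server.callable
        @timed
        def update_news(country):
            ...
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__} took {elapsed:.2f}s")
    return wrapper
```

If the logged durations climb toward the timeout cut-off before the errors start, that would support the slowdown theory.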
Thanks, we’re continuing to investigate this, but so far we have no leads - everything looks fine in our server metrics, and we haven’t seen any timeouts ourselves.
@mastamatto I think I’m right in saying that sometimes everything works fine for you, but for periods of time every server call times out - is that correct? Do you have any idea how long those outage periods are? Please can you tell me an app ID for an app where this is happening, so I can correlate the TimeoutErrors in the logs with other events in our infrastructure?
Hi Ian,
That started on Thu 18th Jan 2024 at ~22:00 EET.
The worst day was Friday at ~20:00 EET.
Over the weekend there were periodically huge delays, but without timeout errors.
App ID: A2ARJWGGE5WE4BRR
I thought you had a problem with storage.
Hi @Vadim,
Thanks for providing a specific app and time! Unfortunately, some of the most detailed diagnostics have now aged out of our system (we’re currently working on retaining more information for longer).
In the meantime, it would be really, really helpful if people experiencing this problem could post:
- An app ID
- One or more session IDs where the error was occurring
- Dates and times when the error occurred
Without that, we’re looking for a needle in a haystack (the rate of “correct” timeout errors across Anvil is fairly high, and so far we don’t have much to distinguish the “bad” timeout errors from the “good” ones. We’re working on it!)
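For anyone wanting to gather this as it happens, here is a minimal, hypothetical sketch of a failure recorder you could keep alongside your own call sites. All names are made up for illustration, and it does not fetch the session ID itself (copy that from the app logs and pass it in):

```python
import time

class FailureLog:
    """Collect timestamped failures to report alongside an app ID."""

    def __init__(self):
        self.entries = []

    def record(self, exc, session_id=None):
        # session_id is whatever identifier you have to hand;
        # this sketch deliberately does not try to look it up.
        self.entries.append({
            'when': time.strftime('%Y-%m-%d %H:%M:%S GMT', time.gmtime()),
            'error': f"{type(exc).__name__}: {exc}",
            'session': session_id,
        })

    def report(self):
        return "\n".join(
            f"{e['when']}  {e['session'] or '-'}  {e['error']}"
            for e in self.entries
        )
```

Pasting the output of `report()` into the thread would give the staff exactly the app ID / session ID / timestamp triples they asked for.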
Hey there,
We are experiencing connection timeouts again in different apps. This started a couple of minutes ago.
AppOfflineError: Connection to server failed (1006)
We are getting reports that certain endpoints time out or do not work at all.
Server calls are also timing out.
Anyone else experiencing issues?
Update:
The server calls just spin forever and don’t even return a timeout exception.
Update II:
Looks like server calls from the dev environment work; it seems to happen only in the production environment.
Hi everyone,
It looks like one server node in our cluster has stopped responding. We’re in the process of bringing it back now, please stand by for further updates
That server node is now back, but it is suspicious that there was any disruption - the loss of a single node should not cause timeouts or infinite spinners such as those being reported, so there must be more going on here.
Is anyone able to provide instructions for us to be able to trigger one of these infinite spinners ourselves on your app?
Our infinite spinners have now stopped.
Will report if we see any more issues.
Well, it was taking place on all of our applications that are published and used daily.
APP ID: G6NB2XYFOWYGVE7P
SESSIONS WITH TIMEOUT ERRORS:
- HCLVZXRIHM35H2F5K7KJCQ6ADZT5SCA4
- Z3USCBU4BI5Q47F3ZD4U4FXY43CAVL6E
- HGVCXRMI7RRK3BOZTQMCRN4KNEY77ARW
- TGPGJFAQYHAK3JXQK4NGVHK7LFFUN7SU
- DWRUVKLWMBZ47267FMYAV4BTRGU6WU2U
- H7EDHNMQRW3K52W5XBYM7ODBEUBANYK6
SESSIONS WITH APPOFFLINE ERRORS:
- P5L5SJI2HBVN2ZKJLRAWBNTM6UDEKJGJ
- PRELSHHBM3PZMFF2N5TO3V6NIPL6NOLE
Oddly enough, we haven’t seen any errors at all today. So perhaps the node that was fixed covered us as well.
I spoke too soon. We are seeing this issue again. Latest sessions on the same app (timeout error):
- US7QNDYCSVWAG5UCXNHA33B4ZEB3HK3G
- CH5YZOO6LGW5RMOJWOGJZKXZMCFSJYWW