Session timeout when loading LLM model

I have, but the tutorial model is clearly smaller than mine—I get server timeout errors even though I have cut the size of the model in half.

My problem is two-fold:

  1. I would like to load the model once, but when I load it from a server module, the (e.g., module-global) variable holding it is apparently re-initialized between server calls. This makes my code try to reload the model every time it's needed (see the sketch after this list).
  2. When I do (re-)load the model, it is often in the context of an already long-running task (thanks, slow-responding LLMs!), and so I get timeouts.
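For context, the pattern I'm attempting looks roughly like this (`load_model()` and `predict()` are placeholders for my actual code); on every fresh server instance the module-level variable is `None` again, so the slow load repeats:

```python
import anvil.server

_model = None  # module-level cache, but a fresh server instance starts with None again

def _get_model():
    global _model
    if _model is None:
        _model = load_model()  # placeholder for my real (slow) model-loading code
    return _model

@anvil.server.callable
def run_inference(data):
    return _get_model().predict(data)  # predict() is a stand-in for whatever the model exposes
```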

I have played with background tasks, but I haven’t found a way to pass a model from the background task to the foreground, where I need it. I’ve tried task.get_return_value(), but it turns out the task got killed for trying to pass back a non-serializable SVD model :man_facepalming:

I haven’t tried storing the model directly in a data table, but that would also entail deserialization any time I wanted to use it, right?

My project is due in a few days, so I guess I’ll try explicit serialization/deserialization, but I’m not optimistic that it will help.

Any constructive ideas gratefully received

(and yes, I’ve been reading “Communicating with Background Tasks” :wink:)

Okay, I went “the long way 'round” and used a background task to

  1. load the ML model
  2. serialize/pickle it
  3. convert to anvil.BlobMedia
  4. store that in a data table

Then I have the foreground server code grab it from the data table, deserialize it, and save the result to a module-global variable, so I only have to do the retrieve-and-deserialize once for a long-running task, which saves me something like 10 very precious seconds. A sketch of the pattern is below.
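In case it helps anyone, here's the rough shape of it. This is a sketch, not my exact code: the `models` table (with a text `key` column and a media `pickled_model` column) and `build_model()` are stand-ins for whatever you actually use:

```python
import pickle

import anvil
import anvil.server
from anvil.tables import app_tables

@anvil.server.background_task
def build_and_store_model():
    model = build_model()  # placeholder for my slow SVD model build/load
    blob = anvil.BlobMedia("application/octet-stream",
                           pickle.dumps(model),
                           name="model.pkl")
    row = app_tables.models.get(key="svd") or app_tables.models.add_row(key="svd")
    row["pickled_model"] = blob  # media column on the 'models' data table

_model = None  # cached for the lifetime of this server instance

def get_model():
    global _model
    if _model is None:
        row = app_tables.models.get(key="svd")
        _model = pickle.loads(row["pickled_model"].get_bytes())
    return _model
```

I kick it off with `anvil.server.launch_background_task('build_and_store_model')`, and then my long-running server function just calls `get_model()`.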

Bottom line: This approach works (well enough) and now the app works (well enough)

If anyone has better ideas, I’d still love to hear them.

Welcome to the forum! Glad you found something that works well enough. I don’t have immediate thoughts but maybe others will. (I moved your comments to a new thread. It’s more difficult for others to respond when you reply to a long stale thread.)

Perhaps you could still hold the LLM in a variable in an Anvil Uplink session.
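Something like this, roughly; the Uplink key is a placeholder and `load_model()` stands in for your own loading code. Because the script keeps running, the model stays in memory between calls:

```python
# Runs on your own machine or a small server, outside Anvil's hosted environment
import anvil.server

anvil.server.connect("YOUR-UPLINK-KEY")  # placeholder: your app's Uplink key

model = load_model()  # placeholder loader; runs once, then the model stays resident

@anvil.server.callable
def predict(data):
    # Call this from the app with anvil.server.call('predict', data)
    return model.predict(data)

anvil.server.wait_forever()
```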


As a rule of thumb I generally avoid trying to host an LLM within an Anvil app - it’s just computationally very expensive beyond a school project.

But glad you figured out a way! That seems clever. Uplink would be my preferred way to do this - simpler and it’s “always” on. You can host an Uplink script on Render for cheap - I believe @Quoc does this. Alternatively, you can just run it on your machine and make sure it’s on while the project is being graded.

Oh, for sure this (and by “this,” I mean my whole project) can only reasonably be viewed through the lens of a class project/PoC. There are quite a number of issues that would need to be addressed before this would scale.

That said, for this version of this (class project) app, and for my limited number of user testers, once the model is deserialized, which only happens once per invocation of my long-running server function, the inference does not contribute noticeably to response time. Interacting with an LLM, on the other hand, is a whole 'nother story! :wink:

Thanks for the ideas!