Server quits with ExecutionTerminatedError on creating a pandas dataframe

simon.m.shapiro · May 23, 2023, 7:03am

What I’m trying to do:

I am getting data from a database server via a background task. The task never finishes and I get this error message:

anvil.server.ExecutionTerminatedError: Server code exited unexpectedly: 5107b8af69

What I’ve tried and what’s not working:

The background task works ok when the database is returning smaller datasets, but this is the biggest one yet.

Is there any way of understanding this error message, or should I implement limit and offset logic in the query?

Code snippet:
With additional debugging, I found the error to be in building the pandas dataframe. Here is the snippet where it crashes.

            res = self._post_query(query_string, accept="text/csv")
            print("slow dataframe sparql query finished")
            result = None  # stays None if parsing fails, so the return cannot hit an unbound name
            try:
              result = pd.read_csv(StringIO(res))
              print("dataframe ready", result.shape)
            except Exception as e:
              print("bad dataframe", e)
            return result

As an aside, the termination never triggers the except block above, so the process must be dying inside pd.read_csv.

I have established that the size of the in-memory CSV is over 125MB. Do you think this is too big?

p.colbert · May 23, 2023, 7:41pm

I don’t know whether it’s too big or not, but I can see that there might be some opportunities for saving space and/or time – if it’s feasible to read via numpy instead of pandas.

From what I can see, numpy should be able to read res directly, via genfromtxt. No need to wrap a StringIO object around it.

My rationale: If StringIO copies the string – I don’t know whether it does or not – then that would double the amount of memory used.

My other rationale: pandas is built on top of numpy. If numpy alone is sufficient (for your purposes), then there’s no need to load pandas into memory on top of it, giving you more memory to work with.

So, it might be worth trying, in your case.
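
A minimal sketch of the numpy route, under a few assumptions the thread does not confirm: that the CSV has a header row, that no field contains a quoted comma (genfromtxt does not parse CSV quoting), and that `sample` is a made-up stand-in for `res`. One caveat: genfromtxt treats a plain string argument as a file name, so the text is split into lines first. That list is itself a second copy of the data, so the memory saving is not guaranteed.

```python
import numpy as np


def csv_text_to_array(csv_text):
    """Parse CSV text into a structured NumPy array without loading pandas."""
    # A bare str would be interpreted as a file name, so pass a list of lines.
    lines = csv_text.splitlines()
    return np.genfromtxt(
        lines,
        delimiter=",",
        names=True,        # first line holds the column names
        dtype=None,        # infer a type per column
        encoding="utf-8",
        comments=None,     # keep '#' characters, which are common in URIs
    )


# Hypothetical sample standing in for the real query result.
sample = "s,p,o,n\nhttp://ex.org/a#x,b,c,1\nhttp://ex.org/d#y,e,f,2\n"
arr = csv_text_to_array(sample)
print(arr.shape, arr.dtype.names)
```

If the real data contains quoted fields, this approach is probably not suitable as written.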

simon.m.shapiro · May 24, 2023, 7:14am

Hmm. It's super weird, because I can use the same class and run the same query in a Jupyter notebook on my laptop, and it converts to a pandas dataframe without trouble. It yields a dataframe of 421,487 rows with 4 columns.

meredydd · July 24, 2023, 2:46pm

Hi @simon.m.shapiro,

Following up here by request: It looks like what’s happening here is that you’re exceeding the maximum memory usage permitted by your plan (1GB for you). @p.colbert is right – the first step is probably to try to minimise your memory usage. What I’d recommend is to save the downloaded data into a file on disk, and then read the CSV from there – that way, you avoid keeping multiple copies of it in memory at the same time, and expand the maximum size of data you can load!
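
One way the save-to-disk suggestion might look, as a sketch rather than a tested fix for this app. The helper name `save_text_to_temp_file` is hypothetical, and the short `res` string stands in for the real result of `self._post_query(...)`. The key step is dropping the reference to the in-memory string before pandas starts parsing.

```python
import os
import tempfile

import pandas as pd


def save_text_to_temp_file(text):
    """Write text to a temporary .csv file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
    ) as f:
        f.write(text)
        return f.name


# Stand-in for: res = self._post_query(query_string, accept="text/csv")
res = "a,b\n1,2\n3,4\n"

path = save_text_to_temp_file(res)
del res  # release the in-memory copy before parsing

try:
    df = pd.read_csv(path)  # pandas reads from disk, not from a second string
finally:
    os.remove(path)         # clean up the temporary file

print("dataframe ready", df.shape)
```

If the HTTP client can stream the response straight to a file, that would avoid holding the full CSV in memory at all, but whether `_post_query` supports that is not shown in the thread.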