Performance accessing data

Hello there :wave:

I am looking at Anvil to build an app that displays dashboards on the internet with restricted user access.

Those dashboards will show data tables with filtering options, so they will use many thousands of rows (up to hundreds of thousands) and 5 to 15 columns.

I usually build my dashboards with Dash (Plotly) from pandas DataFrames produced by different pipelines, all the way from database extractions.

What is the best way to do this with Anvil while keeping performance?

Should I feed Anvil data tables with those frames coming from the pipelines?
That would mean a lot of data being loaded and refreshed on a 10-minute basis.

Or should I link the pipeline outputs directly and handle them in Anvil with pandas and Plotly, on both the server side and the client side?

I have looked at the tutorials in Anvil Learn and watched different use cases, but I can’t be sure that Anvil is the right fit for this purpose, as the databases in the examples use very little data.
I see that most of the time the data is retrieved using Python list comprehensions to be sent to Plotly, Data Grids or data tables, but pandas is far better suited to looping over thousands of rows. Is it possible to work directly with pandas and numpy on the server side?

Thanks for helping.


I’m surprised some of the folks processing large amounts of data haven’t responded. I don’t work with data at that scale myself, but from reading answers to similar posts I understand the principles.

Pulling all the rows into a list comprehension is the least efficient way to handle large amounts of data, unless you know that you always need all the data.

A Data Grid is designed around the fact that users aren’t generally going to need all the data, so you don’t (generally; some scenarios are different) pull all the data into a list. Instead, you give the Data Grid an Anvil search iterator. Search iterators pull pages from the data table as needed, which spreads out the time required.
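
For instance, a minimal sketch of that pattern (the form, table and component names here are placeholders, not something from this thread):

```python
from ._anvil_designer import DashboardTemplate
from anvil.tables import app_tables


class Dashboard(DashboardTemplate):
    def __init__(self, **properties):
        self.init_components(**properties)
        # The search iterator fetches rows lazily, a page at a time, so the
        # Data Grid only loads the rows it actually displays.
        self.repeating_panel_1.items = app_tables.measurements.search()
```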

Every situation is different, though, so you’re better off putting a lot of test data into data tables and trying various techniques.


First step - enable accelerated tables!!

There are a few ways we have approached this problem, but I think the main way to keep performance up is to limit what you send to the client. The easiest way to do this is to use pandas on the server to handle the data manipulation and only send the Plotly trace data back to the client.
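
As an illustration (not from this reply verbatim), a rough sketch of such a server function; `load_dataframe` is a hypothetical helper (one way to write it is sketched after the options below), and the column names are placeholders:

```python
import anvil.server


@anvil.server.callable
def get_mean_trace(category=None):
    df = load_dataframe()  # hypothetical helper - see the sketch after the options below
    if category is not None:
        df = df[df["category"] == category]
    grouped = df.groupby("timestamp")["value"].mean()
    # Only these two lists travel back to the client, not the raw rows.
    return {"x": grouped.index.tolist(), "y": grouped.values.tolist()}
```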

The next challenge is then getting the data out of the data table and into a DataFrame. There are a few options:

  1. Export the data table to Excel and create a DataFrame from that (it seems to be insanely quick to export an entire data table to Excel) - see the sketch after this list
  2. Use SQL (Dedicated plan and above) to get dicts and then create the DataFrame from those dicts
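
Along the lines of option 1, a minimal sketch that turns a data table into a DataFrame via Anvil's built-in `to_csv()` export on a search iterator (the table name is a placeholder); this could play the role of the `load_dataframe` helper used in the sketch above:

```python
import io

import pandas as pd
from anvil.tables import app_tables


def load_dataframe():
    # Export the whole table in one call and read it straight into pandas.
    # search().to_csv() returns a Media object containing the CSV bytes.
    csv_media = app_tables.measurements.search().to_csv()
    return pd.read_csv(io.BytesIO(csv_media.get_bytes()))
```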

This means that all you are doing on the client is handling the sort and filter options, which you then pass to a server function to manipulate the DataFrame.
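
On the client, that could look something like this (component names are placeholders, and `get_mean_trace` is the hypothetical server function sketched above):

```python
import anvil.server
import plotly.graph_objects as go


# An event handler that would sit inside the Dashboard form class from the
# earlier sketch: pass the user's filter choice to the server and plot only
# the returned trace data.
def filter_button_click(self, **event_args):
    data = anvil.server.call("get_mean_trace",
                             category=self.category_dropdown.selected_value)
    self.plot_1.data = [go.Scatter(x=data["x"], y=data["y"])]
```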

One thing that keeps things fast is having the Persistent Server option enabled - this means you don’t have to set up the server environment every time you make a server call.

You could also look at storing the DataFrame in a data table between server calls.
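
One hedged way to do that, assuming a helper "cache" table with a text `key` column and a Media `payload` column (all hypothetical names), is to pickle the DataFrame into a Media object:

```python
import pickle

import anvil
from anvil.tables import app_tables


def cache_dataframe(df):
    # Pickle the DataFrame into a Media object and keep it in a single cache row.
    media = anvil.BlobMedia("application/octet-stream", pickle.dumps(df), name="df.pkl")
    row = app_tables.cache.get(key="dashboard_df") or app_tables.cache.add_row(key="dashboard_df")
    row["payload"] = media


def load_cached_dataframe():
    row = app_tables.cache.get(key="dashboard_df")
    return pickle.loads(row["payload"].get_bytes()) if row else None
```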

You should also check out the Anvil Extras serialisation module.


Welcome to the Forum!

Yes, absolutely: pandas and numpy are available in server-side code. See Anvil Docs | List of Packages
