Performance Issues Working with Anvil Data Tables and Row Iterators

I’ve been able to use some tricks to make all my app load times much better, but it often takes more work and more thought about data structures (eeek!) than just using the built-in Data Tables.

As @danbolinson mentioned, if you are dealing with a good number of rows, loading them into a DataFrame adds about 1 second, but future filtering and transforms are then lightning fast (because pandas leverages C on the back end). Pandas isn’t supported on the client side, though. You can always export to a list of dicts with:

my_list_of_dicts = df.to_dict(orient='records')
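
For instance, a server function can pay the DataFrame cost once and hand the client plain dicts. A minimal sketch, assuming the product_data table used later in this post plus a made-up price column:

import anvil.server
import pandas as pd
from anvil.tables import app_tables

@anvil.server.callable
def get_cheap_products(max_price):
    # One-time cost (about a second for a big table): rows -> DataFrame
    df = pd.DataFrame([dict(r) for r in app_tables.product_data.search()])
    # Filtering in pandas runs in C, so this part is lightning fast
    cheap = df[df['price'] <= max_price]
    # Plain dicts travel to the client; DataFrames don't
    return cheap.to_dict(orient='records')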

To answer your questions from my perspective:

  1. I don’t think so. It just depends on how users will interact with the database. If huge amounts of unique data will be needed constantly and you want lightning-fast response times, then running your own database might be a good idea. Then again, it’s a lot of work to optimize a database and get it working fast with your specific type of queries.

For example, I have an app that searches and filters an external database of about 2 million rows on every load (https://agaar.mongol.ai). It takes about 9-10 seconds. It’s not a commercial app, or I would definitely be using the dedicated plan (so the data would be available in the server). I use a Postgres database with indexes set up on the specific columns I filter on. But honestly, I hate that. It’s annoying, it takes time, and later you have to remember what the heck you did and figure out why performance is degrading because your host, your app, or you did something stupid.
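
Setting up those indexes is only a line of SQL per column. A minimal psycopg2 sketch, with a made-up connection string, table, and column:

import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/mydb")
with conn.cursor() as cur:
    # Index the column(s) your WHERE clauses actually filter on
    cur.execute("CREATE INDEX IF NOT EXISTS idx_readings_station"
                " ON readings (station_id)")
conn.commit()
conn.close()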

  2. What you are doing is returning the row iterator to the client, which forces the client to keep going back to the server and fetch batches in order to do the sorting. It’s like having a bad internet connection with lots of dropped packets: you have to keep calling back to the source for more packets. It’s quite inefficient. Instead, do something like:

Client Code:

import anvil.server

product_data = anvil.server.call('get_product_data')

Server Code:

import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def get_product_data():
    results = app_tables.product_data.search()
    # dict(row) gives plain dicts in one round trip, not lazy row references
    return [dict(r) for r in results]

If you want to filter using linked rows, then you have to use Anvil’s tools. But for your use case that probably isn’t efficient. You should add a unique key, like you would in a standard RDBMS, and then you can filter the list of dicts on that key, a la:

filtered_results = [x for x in product_data if x['category_id'] == 'shoes']

  3. For my large database I use the psycopg2 package to pull data, and typically SQLAlchemy to send data because the syntax is easier. I use pandas’ SQL functions with those libraries. But for your case I really don’t think you need an external database.
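
Roughly, that combination looks like this (the connection URL and table names are placeholders):

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

# Pull: a psycopg2 connection feeding pandas' SQL reader
conn = psycopg2.connect("postgresql://user:password@host:5432/mydb")
df = pd.read_sql("SELECT * FROM readings WHERE station_id = %s",
                 conn, params=('UB-01',))
conn.close()

# Push: an SQLAlchemy engine, where to_sql's syntax is easier
engine = create_engine("postgresql://user:password@host:5432/mydb")
df.to_sql('readings_processed', engine, if_exists='append', index=False)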

  4. For your case, if you want a lightning-fast experience, you are going to have to get creative. You could return only the first 20 rows as a list of dicts, then silently load the rest of the data (with anvil.server.call_s) into a client module. You can then do all the filtering in the client.

But you have a lot of edge cases to deal with there. For example, what if someone filters before your background data is done loading? You would need some logic to handle that, along the lines of the sketch below.
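A minimal sketch of that pattern, assuming a client module named data_cache and made-up server functions get_first_20_products, get_product_data, and filter_products:

# Client module "data_cache" (hypothetical name)
import anvil.server

rows = []             # first 20 rows arrive immediately; the rest later
fully_loaded = False

def load_remaining():
    global rows, fully_loaded
    # call_s shows no spinner, so this fetch is invisible to the user
    rows = anvil.server.call_s('get_product_data')
    fully_loaded = True

def filter_by_category(category_id):
    if fully_loaded:
        # Everything is client-side: filter instantly, no round trip
        return [x for x in rows if x['category_id'] == category_id]
    # Edge case: the user filtered before the background load finished,
    # so fall back to one ordinary server call
    return anvil.server.call('filter_products', category_id)

On the form, show the first 20 rows right away and kick off the background load from a Timer tick so the UI is never blocked:

# Form event handlers (the Timer component fires once, then stops)
def form_show(self, **event_args):
    self.repeating_panel_1.items = anvil.server.call('get_first_20_products')
    self.timer_1.interval = 0.1

def timer_1_tick(self, **event_args):
    self.timer_1.interval = 0   # run only once
    data_cache.load_remaining()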

As usual, the answer is mostly “it depends”. If you have a specific situation, we can give some more help with that.
