Performance Issues Working with Anvil Data Tables and Row Iterators

I will try my best to give a bit of insight here.

First, an app_tables search call returns a lazy search iterator. I think it loads on the order of 20 row objects at a time, and only loads objects as needed. This is usually good, but when you iterate through an entire search iterator you end up loading all the rows in batches, with (I think) a round trip to the back end every 20-ish rows. So performance rapidly degrades, and unpacking a search iterator is expensive, as you’re finding. That can be a work-around in some cases, but with any real number of rows you end up right where you are now. The good news is that if you don’t need to unpack the search iterator, you can pass it back to a DataGrid on the client side and it will only load rows as the client paginates through the grid. Fancy.
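For example (just a sketch — the `books` table and the component names are made up), a server function can hand the iterator straight back, and a DataGrid’s RepeatingPanel will page through it lazily:

```python
# --- Server module ---
import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def get_books():
    # Return the lazy search iterator itself - don't unpack it into a list.
    # The client pulls rows down in batches only as it needs them.
    return app_tables.books.search()

# --- Client form code (for illustration; lives in a Form, not the server) ---
# The RepeatingPanel inside a DataGrid only asks for the rows on the
# current page, so pagination stays cheap:
#   self.repeating_panel_1.items = anvil.server.call('get_books')
```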

It would be great if you could sort/filter by a linked column but right now I don’t think that’s possible (without unpacking the entire search iterator, which we want to avoid). Here’s a plug for everyone to +1 this feature request.

You could add a redundant field to your object and filter/sort on that, like your hypothetical category column, and keep a second data table that tracks the unique values in that column. Then you can return that (small) list to the client as the list of categories and avoid doing something like set([r['categories'] for r in search_results]), which, again, unpacks the entire search iterator.
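A sketch of that pattern, assuming hypothetical `articles` and `categories` tables (the names and columns are placeholders, not from your app):

```python
import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def add_article(title, category):
    # Write a flat, redundant 'category' text column on the main table so it
    # can be filtered/sorted directly, without touching linked rows.
    app_tables.articles.add_row(title=title, category=category)

    # Keep a small helper table of the distinct category values up to date.
    if app_tables.categories.get(name=category) is None:
        app_tables.categories.add_row(name=category)

@anvil.server.callable
def get_category_names():
    # This helper table is tiny, so unpacking it is cheap - unlike
    # building a set over a search of the big table.
    return [r['name'] for r in app_tables.categories.search()]
```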

There IS a way to hack around this by leveraging the ‘to_csv’ method of a search_iterator. It returns the entire result set of the search iterator, and does so shockingly quickly. You can then feed it to a dataframe via a StringIO object, or just deal with it in Python, at Python speed.

Here’s an example of returning a 10,000 row table as a list of dictionaries in 3 seconds, including conversion to and then from a dataframe:
https://anvil.works/build#clone:575YYURRGKKAGWHU=A7PKA5XWLYUID363T2YHCHLB
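Roughly, the server side of that trick looks something like this (a sketch of the idea, not the exact code in the clone; `big_table` is a placeholder name):

```python
import io
import anvil.server
import pandas as pd
from anvil.tables import app_tables

@anvil.server.callable
def get_all_rows():
    # to_csv() gives back a Media object holding the whole result set in one
    # go, which is far faster than walking the search iterator row by row.
    csv_media = app_tables.big_table.search().to_csv()

    # Wrap the CSV text in a StringIO and load it into a dataframe -
    # everything from here on happens at Python speed, no more round trips.
    df = pd.read_csv(io.StringIO(csv_media.get_bytes().decode('utf-8')))

    # Plain dicts are cheap to return to the client.
    return df.to_dict('records')
```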

So to answer your questions:

  1. Still useful, but you do run into some constraints (especially with filtering/sorting on linked tables)
  2. Remember they are lazy, and unpacking them naively will always suck if they are even a little bit big.
  3. External SQL would be fine but it’s probably not necessary.
  4. Lots of ways to skin that cat! But despite the datatable -> csv -> dataframe hack, I’d still try to keep things in search iterators so they stay lazy, use helper tables to track the known values you might be filtering on, and include those values in the datatable itself for filtering and sorting purposes. There are a lot of benefits to staying with search iterator/row objects (lazy loading, for example), and I’d resist giving that up at least a little longer!