Can I have more details about data table row caching?

stefano.menci · September 5, 2021, 5:00am

After banging my head against the wall for hours trying to understand why certain functions were much slower than others, I think I figured out what’s going on:

A server function returning the result of an database search returns an iterator (or iterable) and there is some smart lazy loading going on
After a row has been cached, it does contain the values of some columns, but not all of them. So for example reading string columns after the row has already been cached will be fast, but reading simple object columns will require a round trip per column
Reading those simple object columns again requires a round trip again

Please let me know if I understand correctly what’s going on. And if I do, please explain why the simple object values are not kept. I imagine they are not loaded immediately to avoid unnecessarily loading large amounts of data, but I don’t understand why, after they are loaded once, they are not kept in cache.

Running the little test below, runs a cycle 4 times on the same rows. The first 2 times reading the content of a text column, and it’s always fast. The second 2 times reading the content of a simple object column, and it’s always slow. I was expecting for the simple objects in the 4th execution to be cached and to be fast.

def test():
  rows = anvil.server.call('get_rows')
  
  for row in rows:
    print(len(row['text_column']))
  
  for row in rows:
    print(len(row['text_column']))
  
  for row in rows:
    print(len(row['simple_object_column']))
  
  for row in rows:
    print(len(row['simple_object_column']))

I tried with these two versions of the server code:

@anvil.server.callable
def get_rows_as_dicts():
  return [dict(row) for row in app_tables.my_table.search()[:10]]

@anvil.server.callable
def get_rows_as_rows():
  return list(app_tables.my_table.search()[:10])

The first version returning 10 dictionaries containing the simple object values is slightly slower, but the client code is much faster.

The second version returning 10 rows is slightly faster, but the client code is much slower.

Obviously the factors affecting these measurements are the number of rows in the list and the number and size of simple object columns on each row.

The advantage of using the first format is that I only need one round trip, slower, but just one. Then the client code is fast.
The disadvantage is that if I want to return hundreds of rows, the first round trip can be really slow.

The advantage of using the second format is that I can return hundreds of rows very quickly.
The disadvantage is that every access to a simple object requires a round trip.

I would love to be able to:

Understand what’s going on. Is my understanding correct?
Be able to fine tune the behavior and load some or all simple object values on the first round trip with something like app_table.search(lazy_simple_object_columns=['col_1', 'col_3'])
Create an iterator on the client side, so I can build the row objects one by one on the server side (faster with simple objects) and only on demand (fast with long lists of rows when the client ends up using only the first few rows on the first page).

rickhurlbatt · December 30, 2021, 11:46am

Agreed. Understanding this would definitely help optimise loading of large data sets.

Any info would be appreciated.