Optimizing search with multiple linked rows

What I’m trying to do:

I have four tables:

  • Customer
    • Has around 4500 rows
  • Status (Linked in Customer)
    • Has 10 rows
  • Location (Linked in Customer)
    • Has same amount of rows as Customer (4500)
  • Country (Linked in Location)
    • Has around 15 rows

Customer and Location are “1-to-1”, meaning a Customer have a linked Location and a Location has a link back to the Customer.
I’m sure this is a bad practice, but it has helped me speed up some other lookups I am doing elsewhere.

End goal is to populate a DataGrid (RepeatingPanel) with Locations, based on two Dropdowns: (Country and Status)

The user can select either Country, Status or both from the Dropdowns.
If only Country is selected, the query takes ~1 second.
If only Status is selected, the query takes ~0.5-5 seconds.
If both are selected, the query can take up to 10 seconds.

I am looking at ways to speed up the queries, and any pointers are welcome!
Preferably not any, that would need me to recreate the structure in my tables. Although that might be the best solution, it would take time, that I don’t have, at the moment… something something, technical debt, I know…

What I’ve tried and what’s not working:
I’ve tried different approaches, shown in the code below, but I think I might be missing something crucial.
I even tried the new and improved q.fetch_only(), but that resulted in errors. If you think it’s the road to take, I’ll have another look at it.

Code Sample:

def filter_by_country_or_status(filter_tuple):
  if filter_tuple[0] and filter_tuple[1]:
    # Country [0] and Status [1] chosen

    # Approach 1
    customers = app_tables.customer.search(status=filter_tuple[1])
    result = [c['location'] for c in customers if c['location']['country'] == filter_tuple[0]]

    # Approach 2
    locations = app_tables.location.search(country=filter_tuple[0])
    result = [l for l in locations if l['customer']['status'] == filter_tuple[1]]

  elif not filter_tuple[0] and filter_tuple[1]:
    # Status chosen
    result = [c['location'] for c in app_tables.customer.search(status=filter_tuple[1])]
  elif filter_tuple[0] and not filter_tuple[1]:
    # Country chosen
    result = app_tables.location.search(country=filter_tuple[0])
  else:
    result = app_tables.location.search()
  return result

My experiment with q.fetch_only:

locations = app_tables.customer.search(q.fetch_only('location'), status=filter_tuple[1])
    result = [l for l in locations if l['country'] == filter_tuple[0]]

This results in a: anvil.tables.NoSuchColumnError: No such column 'country'

Thank you in advance! I’ll be happy to further explain the setup.
As the tables contain customer data, I cannot provide a public clone.

EDIT:
Accelerated Tables are enabled.

Just checking, but do you have accelerated tables checked?

That’s a quick & easy change , if not implemented and a HUGE speed bump.

1 Like

Yes, Accelerated tables enabled :+1:

1 Like

You really should be allowing the database to filter, not doing so in a list comprehension. Avoid fetching everything and filtering yourself unless you absolutely cannot do the filtering using db features.

In your case, you can filter based on the linked field by looking up the linked rows, e.g. (untested code warning):

locations = app_tables.location.search(country=filter_tuple[0])
customers = app_tables.customer.search(status=filter_tuple[1], location=q.any_of(*locations))

Refer to the query operators blog post for more details on q.any_of: Querying Data Tables

3 Likes

Thank you - this makes sense, letting the database to the hard work.

However, I am still seing query-times of upwards to 3-4 seconds,

Do you have any other suggestions?

I’m having a hard time debugging exactly what it is, that is making the queries slow. I suppose a redesign of the table-structure, combining Customer and Location (as they already have a 1-to-1 relationship), would be a way to speed things up.

What’s your current version of the code look like?

With accelerated tables enabled, all the linked rows are fetched during the search, so searches themselves might take longer than without accelerated tables. You might time your current version of the search with and without accelerated tables to see if that’s affecting the search time.

If accelerated tables searches are longer than non-accelerated tables, you could (assuming there isn’t something else obvious with your current code):

  1. Use q.fetch_only to not fetch all the linked fields (assuming you don’t need those linked fields for that query)

  2. Change your db structure to avoid as many linked fields

Denormalizing your db structure to optimize queries isn’t specific to Anvil. Typically you might end up duplicating some data and working hard to make sure your code keeps all the duplicate values in sync. I’d put that off until you’ve exhausted all the other possibilities, though.

Also on the db structure front, you don’t say how many fields each data table has, but large numbers of fields can slow down searches (at least pre-accelerated tables, I don’t know about after). You can reduce the number of fields by putting some fields in a simple object column (typically ones you don’t need to search on). Again, I wouldn’t go that route until it’s absolutely necessary.

1 Like

My code currently looks like this:

Client:

  def dd_change(self, **event_args):
    tl = TimerLogger("Dropdown filter timer")
    tl.start('Client request send')
    self.rp_locations.items = anvil.server.call(
      'filter_by_country_or_status', 
      (
        self.dd_country.selected_value, 
        self.dd_status_filter.selected_value
      )
    )
    tl.end('Client request completed')

Server:

@anvil.server.callable('filter_by_country_or_status', require_user=True)
def filter_by_country_or_status(filter_tuple: tuple=None):
  result = None
  tl = TimerLogger('Server timer')
  tl.start('Server start')
  if not any(filter_tuple):
    try:
      filter_tuple = load_filters()
    except:
      return result
  else:
    store_filters(filter_tuple)
  tl.check('Server cookie check done')
    
  if filter_tuple[0] and filter_tuple[1]:
    customers = app_tables.customer.search(status=filter_tuple[1])
    tl.check('Server: Customers fetched.')
    result = app_tables.location.search(country=filter_tuple[0], customer=q.any_of(*customers))
    tl.check('Server: Locations fetched.')
    
  elif not filter_tuple[0] and filter_tuple[1]:
    # Status chosen
    result = [c['location'] for c in app_tables.customer.search(q.fetch_only('location'), status=filter_tuple[1])]
  elif filter_tuple[0] and not filter_tuple[1]:
    # Country chosen
    result = app_tables.location.search(country=filter_tuple[0])
  else:
    result = app_tables.location.search()
  tl.end('Server end')
  return result

Output from timers

Accelerated Tables enabled:

20:11:21 | Dropdown filter timer: ( 0.000 secs) | Client request send
18:11:22 | Server start: ( 0.000 secs) | start
18:11:22 | Server start: ( 0.029 secs) | Server cookie check done
18:11:22 | Server start: ( 0.332 secs) | Server: Customers fetched.
18:11:26 | Server start: ( 4.661 secs) | Server: Locations fetched.
18:11:26 | Server start: ( 4.661 secs) | Server end
20:11:27 | Dropdown filter timer: ( 6.265 secs) | Client request completed

Accelerated Tables disabled:

20:15:43 | Dropdown filter timer: ( 0.000 secs) | Client request send
18:15:44 | Server timer: ( 0.000 secs) | Server start
18:15:44 | Server timer: ( 0.045 secs) | Server cookie check done
18:15:45 | Server timer: ( 1.373 secs) | Server: Customers fetched.
18:16:02 | Server timer: (18.262 secs) | Server: Locations fetched.
18:16:02 | Server timer: (18.263 secs) | Server end
20:16:03 | Dropdown filter timer: (19.738 secs) | Client request completed

So Accelerated Tables are definitly doing wonders!

In regards to number of fields in each table, it looks like this:

  • Customer (2548 rows)
    • 6 x Text
    • 2 x Linked:
      • Location (1-to-1)
      • Status (1-to-many)
  • Location (2547 rows)
    • 1 x Text
    • 2 x Number
    • 2 x Linked
      • Customer (1-to-1)
      • Country (1-to-many)
  • Status (10 rows)
    • 3 x Text
    • 1 x Media (32x32 pixels)
  • Country (12 rows)
    • 1 x Text

When displaying the query results (from Location table) in the DataGrid, a “name” column from the Customer table is displayed, and a Status-widget is constructed from the “Status” table, linked from the Customer.
So I am displaying data from a “double-link”:
Location ← Customer ← Status

Does the result that is returned from the server-code, fetch the linked data at the time of the query, or does it do multiple roundtrips, after it reached the client?

I seem to have been ably to shave of about ~1 second from all queries, by only fetching the data I actually need with fetch_only():

@anvil.server.callable('filter_by_country_or_status', require_user=True)
def filter_by_country_or_status(filter_tuple: tuple=None):
  result = None
  tl = TimerLogger('Server timer')
  tl.start('Server start')
  if not any(filter_tuple):
    try:
      filter_tuple = load_filters()
    except:
      return result
  else:
    store_filters(filter_tuple)
  tl.check('Server cookie check done')
    
  if filter_tuple[0] and filter_tuple[1]:
    customers = app_tables.customer.search(
      q.fetch_only('name', 'status'),
      status=filter_tuple[1]
    )
    tl.check('Server: Customers fetched.')
    
    result = app_tables.location.search(
      q.fetch_only('customer', 'address', 'country'),
      country=filter_tuple[0], 
      customer=q.any_of(*customers)
    )
    tl.check('Server: Locations fetched.')
    
  elif not filter_tuple[0] and filter_tuple[1]:
    # Status chosen
    customers = app_tables.customer.search(
      q.fetch_only('name', 'status'),
      status=filter_tuple[1]
    )
    
    result = app_tables.location.search(customer=q.any_of(*customers))
  elif filter_tuple[0] and not filter_tuple[1]:
    # Country chosen
    result = app_tables.location.search(
      q.fetch_only('customer', 'address', 'country'),
      country=filter_tuple[0]
    )
  else:
    result = app_tables.location.search()
  tl.end('Server end')
  return result

With accelerated tables on, data is fetched at the time of the query. With it off, the client would fetch the linked data on request.

The number of columns in the tables isn’t enough to cause issues, and I don’t see anything in your code that seems like it’s not right.

Using q.fetch_only like you are might be as good as you’re going to get without denormalizing your database structure.

Edit: unless you can use q.fetch_only to tell it not to fetch the location column for the customer link inside the location row. The circular reference might be taking extra time?

Right, I have the query down to a stable ~ 3 seconds now, which is good enough for the time being.

In regards to the circular reference, that might be a thing, I’ll have a look.

Thank you very much, for taking the time, to have a look!
Much appreciated!

2 Likes