Help me understand the speed difference between two DataTable requests

EsbenLorenzen · September 18, 2022, 12:47pm

What I’m trying to do:
I have a function which fetches a list of customers, based on a few linked tables.

I am trying to optimize the speed of the query, and have at this point gotten it down to 5-6 seconds, which is okay for the use, as it is meant to be used approximately once few hours at most.

During my testing, and coming up with the correct query, I discovered, that doing one of the sub-queries as a list-comprehension saw dropped the execution time by more than double.
And “flipping” that comprehension reduced it even further.

I was hoping, someone could help me shed some light on why this is, so as to improve my query-writing skills for the future.

What I thought I knew about Anvil queries
After going through the task of optimizing queries several times, and getting some great help and learnings from the Anvil community, one thing I though was pretty important, when constructing multi-table queries, was:
Using the built-in query tools in the Anvil toolkit (q.any_of, q.all_of, etc.) was quicker, than doing the queries one by one, and using Python functions to filter, sort etc.
Adding to that, I was under the understanding, that passing a query directly as a parameter for another query, let the database handle everything internally, thus increasing speed, as there is no need to translate to and from Python objects that often

Therefore, I am not quite sure, why Method 2 and 3 in the below code, is so much quicker, than Method 1.

Am I missing something, or am I misunderstanding, what my own code is actually doing?

Thank you in advance.

Method 1

  # Execution-time:
  #  +30s (Server took too long to answer)

  result = []
  seen_ids = set()
  for e in app_tables.event.search(
    q.fetch_only("followup", "followup_done", "customer"),
    customer=q.any_of(
      *app_tables.customer.search(
        location=q.any_of(
          *app_tables.location.search(
            country=q.any_of(*country)
          )
        ),
      status=q.any_of(*status)
      )),
    followup=followup['has_followup'],
    followup_done=False
  ):
    seen_ids.add(e['customer'].get_id())
    result.append(e['customer'])
  return result

Method 2

  # Execution-times:
  #  16.24s
  #  15.17s
  #  15.08s

  result = []
  seen_ids = set()
  for e in app_tables.event.search(
    q.fetch_only("followup", "followup_done", "customer"),
    customer=q.any_of(*[l['customer'] for l in app_tables.location.search(country=q.any_of(*country)) if l['customer']['status'] in status]),
    followup=followup['has_followup'],
    followup_done=False
  ):
    seen_ids.add(e['customer'].get_id())
    result.append(e['customer'])
  return result

Method 3

  # Execution-times:
  #  5.90s
  #  6.13s
  #  5.56s

  result = []
  seen_ids = set()
  for e in app_tables.event.search(
    q.fetch_only("followup", "followup_done", "customer"),
    customer=q.any_of(*[c for c in app_tables.customer.search(status=q.any_of(*status)) if c['location']['country'] in country]),
    followup=followup['has_followup'],
    followup_done=False
  ):
    seen_ids.add(e['customer'].get_id())
    result.append(e['customer'])
  return result

jshaffstall · September 18, 2022, 1:01pm

I wouldn’t expect q.any_of to do any caching, so your first query:

EsbenLorenzen:

customer=q.any_of(
      *app_tables.customer.search(
        location=q.any_of(
          *app_tables.location.search(
            country=q.any_of(*country)
          )
        ),

Is doing that customer search for every row in the event table. q.any_of is on of those situations where you’re better off pulling the customer results into a Python list, and then expanding that in your query. Same with the location search.

I’m not surprised that version is taking so long. Query operators are just simple translation functions that convert their arguments into proper syntax for the database.

EsbenLorenzen · September 18, 2022, 1:05pm

That makes sense.

I thought that some special Anvil-magic was being done, so that the database was able to execute everything as single query, when using the query operators.

stefano.menci · September 18, 2022, 3:06pm

This doesn’t look right to me:

What column is the filter applied to?

EsbenLorenzen · September 18, 2022, 5:29pm

I was hoping to speed up the query, by letting the database know, I’m only interested in the followup, followup_done and customer (linked) columns of the event table.

I guessing by your response, that this is not what I am achieving, and am using the q._fetch_only() wrong?

Basically what the whole function does, is return every customer from a specified country and with a specified status, that does not have an event with a followup.

Simplified mockup of my DB structure:

jshaffstall · September 18, 2022, 8:18pm

That’s the right usage of q.fetch_only. It’s a table-wide specification, not a column filter, from every example I’ve seen.

stefano.menci · September 19, 2022, 3:03pm

Sorry, that’s correct. My eyes read q.fetch_only and my brain received q.any_of. I had a filter in my mind, so I was wondering on what column.

One thing to keep in mind with the accelerated tables is that, if you are not using q.fetch_only then, all the columns are cached, including all linked ones down to the full depth of their dependencies. If you use it and you include the name of a linked column, then all its columns, including the linked ones, are all included.

If you have many relations and want to only include some of them, then you can specify which relations and which columns of each relation. I’m saying this because I noticed that you included 'customer', which I think is a linked column, and it would trigger the caching of all the columns of each customer and all of its linked columns, down with their full tree of dependencies.

For example this would cache all the columns of each customer, including all their linked ones:

app_tables.event.search(
    q.fetch_only("followup", "followup_done", "customer")

While this would prevent the caching of all the columns of each customer:

app_tables.event.search(
    q.fetch_only("followup", "followup_done", customer=q.fetch_only("name"))

This can be done also at multiple levels:

app_tables.event.search(
    q.fetch_only(
        "followup",
        "followup_done", 
        customer=q.fetch_only(
            "name",
            status=q.fetch_only(
                "title")))

jshaffstall · September 19, 2022, 5:04pm

You might eke a little more speed out when you’re searching for linking rows by using an empty q.fetch_only() when doing the initial search for linking rows, e.g. the customer and location searches. That way you get the row id for testing for linking rows, but none of the other associated data for the row (which you don’t need in this case).