Query rows from table that are IN a column from another

So, let’s say I have the following tables:

TABLE A, which has x columns (not relevant)
TABLE B, which has y columns, one of them of type TABLE A Row

I want to get all TABLE A rows that are used in TABLE B (or all TABLE A rows that are NOT used in TABLE B ) relational column to A.

the problem is, on app_tables.tablea.search() there’s no way to filter by the whole row instead of each column. At least, not one that I could think of.

Basically I wanted to make:

SELECT *
FROM table_a
WHERE id NOT IN (SELECT a_id FROM table_b)

I know that in certain cases I could just search in table_b getting the column for table A, but this would give me multiple times the same row. Furthermore, a “NOT IN” select would not be possible this way.

What I want is to make a complex filter for table_a. The user would filter table A by it’s columns, but also for records that are in/not in multiple relational tables that use A as a column.

Any ideas of how to do that?

This seems like a Python question, as opposed to an Anvil-specific question. But one way to remove the multiples is to do a set comprehension like this, which you can then use to get the “not in” rows:

in_rows = {b_row['a_row'] for b_row in app_tables.table_b.search()}
not_in_rows = [a_row for a_row in app_tables.table_a.search() if a_row not in in_rows]

This is just kind of translating your SQL into Python.

Yes, that’s how I’m currently doing, but I’m doing this, I’m loading all the values in the row iterator coming from the search method.

I wanted to take advantage of the lazy part of the result of the search. Also, this approach makes me do lots of queries (at least one for each other table to get the in_rows). I wanted to see if there was a way to do only ONE search.

How about:

result = [
    a_row for a_row in app_tables.table_a.search()
    if a_row not in (
        b_row["a_row"] for b_row in app_tables.table_b.search()
    )
]
1 Like

To minimize database hits, I would suggest cacheing the Anvil equivalent of this as a Python set:

Then use the set to resolve the in and not in conditions.

1 Like

Perhaps (I haven’t tried) it would be faster if the inner query is executed only once:

excluded = {b_row["a_row"] for b_row in app_tables.table_b.search()}
result = [
    a_row for a_row in app_tables.table_a.search()
    if a_row not in excluded
]

And maybe (again, not tried) even better:

excluded = {b_row["a_row"] for b_row in app_tables.table_b.search()}
result = [a_row for a_row in app_tables.table_a.search(name=q.none_of(excluded))]

As a side note, since this is the first time you are posting, it may help you to know that you can test these queries from your command line after first connecting to your app with an uplink key. The queries will be orders of magnitude slower, but it could be faster to play with queries in the command line rather than modifying and executing the app a million times.

As a second side note, since this is the first time you are posting, welcome to the forum!

1 Like

Thank you all for your answers.

The general idea is to create a set from a prior search before the main search, which will use it in a “for in/if not in” statement.

The problem (at least… that I thought that I had) is that a search returns me a lazy iterator which will only be loaded if/when asked. When I convert it to a set, it has to load all data to do this.

I tried the following using Anvil Uplink (thanks for the tip @stefano.menci !):

excluded = app_tables.table_b.search() # It took as slow as 1.75s excluded = {b_row['a_row'] for b_row in app_tables.table_b.search()} # Took as slow as 3.4s(!!!), but sometimes as quick as 2.2s. One time it took only 1.8s and another one 10s+. excluded = [b_row['a_row'] for b_row in app_tables.table_b.search()] # Took around 2s~2.5s
The first test was just to compare how quicker would be to get the iterator for all results WITHOUT transforming the results.

The test for sets turned out a little unreliable in terms of time… I could understand why though. I tried passing a tables.order_by parameter to order by ‘a_row’ and it appears to be more quicker and reliable, staying between 2.4 and 3s, but I don’t know if I could take this for certain.

I tested a list since it doesn’t matter if I check for items in [a,b,c,d] that are not in [a,a,a,b,b,b] or that are not in [a,b]: both should give me [c,d].

In my tests, table A has 326 rows and table B has 317 rows.

After this, I tried using the set/list to get rows from table_a that are not in the excluded collection.

What i tried was:

result_list = [a_row for a_row in app_tables.table_a.search() if a_row not in excluded]

I also tried to do a for in loop with the result to compare the time it took from before the main query to after the result of the query and to after the end of the for/in (both counting from before the query).

The results were:
Using the set: ~2.5s to get the set + 1.2s to get the final result
Using the list: ~2.5s to get the list + 1.1s to get the final result

So I gues… it doesn’t really matter. I thought that doing that would be time-costly, since I was going to iterate in all results for both tables.

Thank you all for your help and kindness!

And thank you for the welcoming words, @stefano.menci ! While I’m not new to Anvil and have made a lot of apps here, I’m kind of new to the forum!

1 Like

Also, this:

excluded = {b_row[“a_row”] for b_row in app_tables.table_b.search()}
result = [a_row for a_row in app_tables.table_a.search(name=q.none_of(excluded))]

does not work, since excluded would be a set of AnvilLiveObject and “name” column (if exists) would not be of said type.

One of the approaches I was thinking before was to transform each value in the set of b_row[‘a_row’] into different params of
Name=b_row['a_row']['Name'], OtherColumn=b_row['a_row']['OtherColumn']...
and then put then on a q.None_of()… but I couldn’t think of a way to do it (simple).

Perhaps you could use a dictionary, something like:
excluded = {b_row[“a_row”][‘username’]: b_row[“a_row”] for b_row in app_tables.table_b.search()}

You could use the in operator to see if the user name exists in the dictionary keys, then use the value when inserting it in the row.

Using an uplink connection is great to figure out how to build a query, not to check the performance.

Queries from an uplink connection will add roundtrips, which are by far the slowest part.

Queries from the browser will add the slow Skulpt loops to the slow roundtrips.

Queries in a server module will always be much faster. It’s always a good idea to create a function in a server module that does all the queries you need and all the iterations on the queries on the server side. Then call that function either from the uplink script or from the browser.

1 Like