Searching simple object with a dictionary

campopianoa · September 26, 2019, 8:18pm

Hello,

I am trying to return a row from a data table where a simple object’s key/val pair matches a local dictionary.

The data table’s simple object looks like this:

{"municipality":["milton"],"school":["s1","s2"]}

My dictionary looks like this:

my_dict={"municipality":["milton"],"school":["s1"]}

When I run app_tables.my_table.get(object_col=my_dict) it returns the row above, even though the values in my_dict['school'] are not equivalent to the values my_row['school']. I would expect to get None back since the lists do not match.

The documentation says:

# This finds all rows where object_col is a
# dict and contains a key 'the_answer' with
# the value '42' - no matter what other keys
# are in the dict.
r = app_tables.my_table.search(object_col={'the_answer': 42})

I interpret this as saying that, in order to find a match, the simple object must contain a key that equals school (and it does), and that my_dict['school'] must equal my_row['school'] (which it does not).

The docs go on to say:

If you specify a dictionary, all specified keys must be present in the column, and all values must match the pattern specified in the dictionary.

I’m not sure what “match the pattern specified in the dictionary” refers to exactly. Based on the fact that I am able to return the row above with my_dict, I assume that Anvil considers things to be a match based on an intersection between the two lists.

Update: I no longer think that an intersection between the two lists leads to a match. If I change my_dict['school'] to ["s1", "s2", "s3"] the search returns None despite the fact that the lists have common values.

Here is an app to demonstrate:

Is this the intended behaviour when searching simple objects? If so, how would I test for equality between my dictionary and a key/value pair in a simple object (I do realize I can do this in multiple steps).

shaun · September 30, 2019, 2:13pm

Hi @alcampopiano

The Simple Object match is a subset match: as long as there are no extra values anywhere in the tree, a match is returned.

Let’s say we have a table called my_table with a Simple Object column called obj. It has a row that has this in the obj column:

{"foo":[1,2,3,4], "a": {"b": "c", "d": "e"}}

The following would return that row:

app_tables.my_table.search(obj={
    "foo": [1, 2, 3, 4],
    "a": {"b": "c","d": "e"}
  })

app_tables.my_table.search(obj={
    "foo": [1, 2, 3],
    "a": {"b": "c","d": "e"}
  })

app_tables.my_table.search(obj={
    "foo": [1, 2, 3],
    "a": {"b": "c"}
  })

app_tables.my_table.search(obj={
    "foo": [1, 2, 3, 4],
  })

app_tables.my_table.search(obj={})

The following would not:

app_tables.my_table.search(obj={
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "a": {"b": "c","d": "e"}
  })

app_tables.my_table.search(obj={
    "a": {"b": "c","d": "e", "f": "g"}
  })

app_tables.my_table.search(obj={
    "foo": [1, 2, 3, 4],
    "a": {"b": "c","d": "e"},
    "p": 1,
  })

app_tables.my_table.search(obj=[])

So I’m afraid you’ll have to take the multiple-step approach to get an exact match. Is this going to cause you performance issues?

I’ll update the docs to make the matching behaviour clearer.

campopianoa · September 30, 2019, 4:03pm

Hi Shaun,

I’m don’t think it will cause app performance issues for me, but rather developer speed/readability/surprise issues.

I’m sure there is a good reason it is designed to match on subsets.

In my particular case I am using simple objects to store attributes that handle role based access control (RBAC) functionality.

For example, I have two columns: role and attributes. Role is plain text, and attributes is a simple object.

My users have roles and attributes that are constantly changing, and so I have scripts that pull from our organization’s database and updates the Users and RBAC tables accordingly (i.e., updating/adding/removing roles and attributes).

For example, I often have to ask whether some user’s roles and attributes are different when comparing our organization’s database to the Anvil data tables. I would like to do the following, but I cannot:

# from internal database
actual_user_role='vice_principal' 
actual_user_attributes=`{'school': ['school_1', 'school_2']}`

if not app_tables.RBAC_table.get(role=actual_user_role, attributes=actual_user_attributes):
 # add a new row to the RBAC table

As you describe, get() will likely not work since my search will be a subset of many rows (so I’d switch to search()), and even if I get a single row back, there is no guarantee that row['attributes'] will match actual_user_attributes.

This means that I cannot use an if statement. I have to pull back whatever matches based on subsets, as you describe, cast values into lists, and then compare those lists for equality. Now that I think about it though, I don’t think this is even feasible since supersets will crash as you indicate above.

I think I have to take a different approach that maybe adds roles and attributes to the RBAC table regardless of what else is in there, and then find a way of removing stale rows from that table after the fact.

What I would love, is a way to use the query operator to say something like:

app_tables.RBAC_table.get(attributes=q.equals({'school': ['school_1', 'school_2']}`)

Either that or make it so that app_tables.my_table.get(attributes=my_dict) implies equivalence, since there is an equals sign, and have q.subset as part of the query operator options.

Perhaps all of this is very specific to my current situation and wouldn’t apply to most users. Anyway, thanks for explaining how things work. I’m sure I can find ways to handle my use case now that things are clear.

aldo.ercolani · October 2, 2019, 4:22pm

My 2 cents.
If your simple objects are small enough and if your focus is on keeping it simple in the source code, an idea could be to store not the simple object, but its string representation, and convert back/forth simple object/string accordingly with json package functions in the data-access layer, so to minimize impact on existing codebase.
That would allow the exact match search.
That would slow down your data-access for sure, maybe some indexes can help but a match on a string field is never source of happiness.
A little step forward would be to calc an unique hash of the object (or its string repr), store that as a new field just to use it for the queries.

campopianoa · October 2, 2019, 4:31pm

Thanks. There are always work arounds; however, testing for equality between dicts and simple objects seems like an operation that would be useful.

Having to know that = means “matches on subsets” in the case of simple objects seems like a slippery spot.

aldo.ercolani · October 2, 2019, 4:36pm

Yeah, I totally agree.

p.colbert · October 2, 2019, 5:29pm

This is one of those bi-directional things that is trivial to get backwards: is it matching when A is a subset of B? or when B is a subset of A?

Python already has sets, and operators for testing super/subsets. If we can maintain consistency with those, then it will be less confusing.

campopianoa · October 2, 2019, 5:33pm

I agree. If you folks want to upvote the related feature request, or comment on it, that may help get the discussion started.

aldo.ercolani · October 3, 2019, 7:00am

That makes me think of another possible workaround in the meanwhile:

if not app_tables.RBAC_table.get(q.all_of(role=actual_user_role, q.all_of(attributes=actual_user_attributes,actual_user_attributes=attributes)):

If A is a subset of B and B a subset of A then A==B, does it make sense?

Don’t know how slow that can be…