Change row.get(key, default=value) behavior

racersmith · January 31, 2025, 4:48pm

I would like to propose a change to the Row.get() to make its behavior more consistent with a python dict.

If we stop considering that every row is fully populated at creation with None and start taking advantage of the underlying null value for empty values, I think we can make the get() method more consistent and more useful.

Let me give some examples:

account_row = app_tables.my_accounts.add_row(id='me')
account_dict = dict(id='me')

# Alignment with expectation
print(account_row.get('id', "no id set"))
>> 'me'
print(account_dict.get('id', "no id set"))
>> 'me'

# Key in table but value is not set
# This behavior does not seem useful
print(account_row.get('last_update', 0))
>> None 
print(account_dict.get('last_update', 0))
>> 0

# Key not in table
# This seems... dangerous
print(account_row.get('non_existent_key', True))
>> True
print(account_dict.get('non_existent_key', True))
>> True

I think that the row object should feel like a dict that has defined key scope.

My proposal would be to change this to:

# Proposed behavior
# Key in table but not set
print(account_row.get('last_update', 0))
>> 0
print(account_dict.get('last_update', 0))
>> 0

# The dict consistent behavior for a missing key
print(account_row.get('non_existent_key', True))
>> True
print(account_dict.get('non_existent_key', True))
>> True

# Maybe better behavior for missing key:
#   auto create missing columns: False
print(account_row.get('non_existent_key', True))
>> NoSuchColumnError

#   auto create missing columns: True
print(account_row.get('non_existent_key', True))
>> True

print(account_dict.get('non_existent_key', True))
>> True

The or method works, it has some pitfalls with bool(table_value). I realize that I’m trying to trigger those pitfalls below and you may be rolling your eyes with a duh. But they are not always obvious and the proposed get() behavior would resolve these.

# bool(False) == bool(None)
account['update'] = False
print(account['updated'] or True)
>> True
print(account.get('updated', True)
>> False

# bool(0) == bool(None)
account['last_update'] = 0
print(account['last_update'] or now())
>> 123456
print(account.get('last_update', now))
>> 0

# bool(None) == bool({}) empty dict, list, set
account['messages'] = []
print(account['messages'] or 'never messaged')
>> 'never messaged'
print(account.get('messages', 'never messaged'))
>> []

Creating the functionality on our side means that None is a reserved value in tables. Here is what I’ve done in the past.

def default_get(row: Row, key: str, default: Any=None) -> Any:
    value = row[key]
    if value is None:
        return default
    else:
         return value

Here is what it could look like:

def get(self, key: str, default: Any=None) -> Any:
    try:
        value = self[key]
        if value == NULL_VALUE:
            return default 
        else:
            return value
    except NoSuchColumn as e:
         if _auto_create_columns:
            return default
        else:
            raise e

stefano.menci · January 31, 2025, 5:48pm

What behavior do you expect when row_dict['last_update'] = None? As I understand it, you’re treating that as if the key is unset, even though I’m explicitly assigning a None value.

A Row object does function like a dictionary, with each table column acting as a key and each table value acting as the corresponding dictionary value. If a column exists but its value is None, it’s reasonable for the dictionary getter to return None for that key.

racersmith · January 31, 2025, 5:52pm

This is a major difference. Right now we can’t distinguish between None because it was never populated or None because we set the value to None

With my proposal you would get this behavior:

row = app_tables.my_table.add_row(id='1234')

print(row.get('last_update', 0))
>> 0
print(row['last_update'])
>> None

row['last_update'] = None
print(row.get('last_update', 0))
>> None
print(row['last_update'])
>> None

stefano.menci · January 31, 2025, 5:57pm

As far as I know, Postgres (the underlying database server) can’t tell whether a value is NULL because you set it to NULL or because it was never populated. So Python doesn’t know what kind of None we are dealing with either.

If you use a Simple Object column to store a dictionary, then you are dealing with a dictionary and the concept of missing key now makes sense. But the concept of existing key with unpopulated value still doesn’t.

racersmith · January 31, 2025, 6:08pm

Seems like there could be a simple Empty object and None object that would resolve that between the postgres and python layers. I mean you already have to map a null value to something, right?

However, just saying that the null is protected this is still more useful:

row = app_tables.my_table.add_row(id='1234')

print(row.get('last_update', 0))
>> 0
print(row['last_update'])
>> None

row['last_update'] = None
print(row.get('last_update', 0))
>> 0
print(row['last_update'])
>> None

than the current get(),

row = app_tables.my_table.add_row(id='1234')

print(row.get('last_update', 0))
>> None
print(row['last_update'])
>> None

row['last_update'] = None
print(row.get('last_update', 0))
>> None
print(row['last_update'])
>> None

stefano.menci · January 31, 2025, 6:22pm

Yes, NULL is mapped to None, which is what you get.

The current behavior is correct: the column last_update does exist, and its value has been set to NULL (or None), so that’s what you get.

Adding the fallback value to the getter, allows you to get the value of a missing column, just like in a dictionary with a missing key, not the value of an existing column.

You could create your own getter like this:

function get_row_value(row, column_name, default_value=None):
    value = row[column_name]
    return value if value is None else default_value

racersmith · January 31, 2025, 6:30pm

Yup, that is what I currently do.

I understand that it is better thought of as a dict that has been fully populated with either the set value or None. It just seems like a waste of a perfectly good get method.

Maybe what I should be asking is: How are people using .get(key, default) now that is more helpful in providing a default value outside of the scope of the table?

stefano.menci · January 31, 2025, 9:05pm

I have never used the .get method with row objects. I always use the square bracket notation.

I handle the None value explicitly. If I need to differentiate between specific cases like undefined, empty, or not set yet, I don’t rely on None. Instead, I introduce a specific value that represents that case. For example, for a text column, I might use a string like <empty>.

I think it’s fairly standard in databases to store either a value of the column type or None, and you can assign any value, including None.

Introducing Empty would shift the pattern to storing a value of the column type or None or Empty. This would only kick the can a little further, because someone could come and ask for None, Empty or Undefined. And so on.

racersmith · January 31, 2025, 9:55pm

That’s sort of my point. The .get() method is pretty useless (pelase, someone prove me wrong). But, wanting to get a value that will not raise type errors in your code is quite helpful.

I guess I’ll just continue to use the value = row[key] or default method.

stefano.menci · January 31, 2025, 10:10pm

This is dangerous because in a numerical column it will return default when row[key] is 0, in a text column it will return default when row[key] is '', etc.

The correct way is to decide what None means and manage it. And when you manage it, the test should be done with row[key] is None, not with its truthy value. Using truthy can be OK in some cases, but often is not.

p.colbert · January 31, 2025, 10:12pm

In the relational database arena, it has long been recognized that Relational NULL (None) can be used to mean many different things:

does not apply (i.e., value would be meaningless)
known to have no value (e.g., has no spouse)
not yet entered (i.e., haven’t asked)
invalid value entered
value unavailable (i.e., the source didn’t provide it)
…

So, NULL/None by itself is often not enough to tell what it means.

@stefano.menci suggested one approach: store a value that is distinct from NULL, and from any valid value, to indicate what is known about the missing value, and/or how it should be resolved.

IIRC, Chris Date suggested putting such an indicator in an adjacent column. This would work even when the original column has no invalid values that can be used as “flags”.

racersmith · January 31, 2025, 10:23pm

@stefano.menci I’m aware, I outlined these pitfall conditions in the OP.

@p.colbert There is no shortage of ways to handle missing data. Of course there is no one-size-fits-all solution. But utilizing .get() in a way that could be one solution seems better than it’s current implementation.

Do you have a use case for .get(key, default) as it stands today?

But I guess in all reality, this would never be changed as it would change a core function behavior. So, I guess, as Joey from Friends would say, “The point is moo.”

p.colbert · January 31, 2025, 10:28pm

My goof. My comment was not about ways to handle the omission. It was about deciding what the omission means in the first place. I’ve edited my comment to make it more clear.