Security of Users table

I’m trying to understand the security implications of the users table with the client side call anvil.users.get_user(). I have the users table set to not allow any client side access, however, I can still make the get_user() call on the client side and retrieve the user’s row as well as any linked rows to other tables which are also marked as no client side access.

import anvil.users

from ._anvil_designer import TEST_PAGETemplate

class TEST_PAGE(TEST_PAGETemplate):
    def __init__(self, **properties):
        self.init_components(**properties)

        user = anvil.users.get_user()
        print(dict(user))  # I get all the data from the user's row
        print(dict(user['link_to_other_table']))  # I get all the data from the linked row

I can see the data in the server console in Anvil, but I can also see the user row data in the webpage developer console after the RPC to anvil.private.users.get_current_user.

So my questions:

  1. Why can a client side call get data from a table that is marked as no client side access? I’m guessing it is because of the callable ‘anvil.private.users.get_current_user’ function.

  2. Is it possible to keep the user from obtaining their user row and linked rows? Couldn’t any logged in user make the anvil.private.users.get_current_user call?

  3. If the user always has access to their user row, is there any way to reduce the retrievable scope?

  4. Should all data in or linked from the user’s row be considered client readable in reality?

For now, yes. It isn’t writable, but it is readable.

For this reason, for any app-specific per-user data, I create one or more “shadow” tables. These link to the Users table, instead of embedding any links in Users. This complicates new-user creation, and user deletion (shadows must be created and destroyed).

But it definitely limits user-level access. I give the logged-in user no access to shadow tables, or any others.

This limits the exposure to just the user’s own Users row, and while they can read it, they can’t write to it, so the risk is minimized.
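
A minimal sketch of the shadow-table idea, assuming a hypothetical shadow table with a link column named user that points at the Users row. The InMemoryTable class is only a stand-in so the snippet runs outside Anvil; in an app you would pass the real table (for example app_tables.user_profile, also a made-up name) and rely on its get() and add_row() methods instead.

```python
class InMemoryTable:
    """Stand-in for a Data Table, NOT the Anvil API; only get/add_row are mimicked."""

    def __init__(self):
        self.rows = []

    def get(self, **criteria):
        # Return the first row whose columns match every criterion, else None.
        matches = [r for r in self.rows
                   if all(r.get(k) == v for k, v in criteria.items())]
        return matches[0] if matches else None

    def add_row(self, **columns):
        row = dict(columns)
        self.rows.append(row)
        return row


def get_or_create_shadow(shadow_table, user_row, **defaults):
    """Return the shadow row linked to user_row, creating it on first use."""
    row = shadow_table.get(user=user_row)
    if row is None:
        # The link points from the shadow table to Users, never the reverse,
        # so nothing sensitive hangs off the user's own row.
        row = shadow_table.add_row(user=user_row, **defaults)
    return row
```

Calling this from server code at first login avoids a separate new-user creation step; user deletion still needs its own cleanup of the shadow rows.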

Thanks @p.colbert. I was hoping that wasn’t the case. I’ll have to rethink some of my tables now.

In the past I’ve used the user row id from user_row.get_id() as the unique key in other tables. Maybe I’ll make a sort of data_hub table that uses the row id for lookup and then has columns of my linked data in other tables.

linked_data = app_tables.data_hub.get(uuid=user_row.get_id())

Thank you for bringing up this security issue which I have to consider myself for relating data tables.

Having searched the forum I’ve found this thread: security of linked tables (Oct 2019) and a related feature request.
The following solution for choosing and giving access only to specific columns is a feature of the „accelerated tables beta“ brought by the Anvil team: choose specific columns in Acc Tables Beta (Mar 2023).

That feature prevents making extra „shadow tables“ as suggested, right?

Are there other „best practices“ for securing data tables etc. which are not obvious at first sight? I have found many hints here and there which I integrate into my anvil/python code. I am not a security expert, but I know enough to know that I rely on a secure infrastructure, and I am grateful for any recommendations to „harden“ code.

Unfortunately, that doesn’t affect the row returned by anvil.users.get_user. To use that feature you’d need to make a server call to get the user, create a client-readable view on the users table there, and return that to the client.

And that still wouldn’t prevent someone who knows Anvil apps from doing an anvil.users.get_user call on the client.

Best practice at the moment seems to be to not link from the Users table to anything you don’t want the client to see, and to never link to the Users table from somewhere else (so you aren’t sending other users’ rows to the client).
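
When the client still needs a few user fields from your own server functions, one option is to copy only whitelisted columns into a plain dict rather than returning a row. This is a minimal sketch, not the restricted-view feature discussed above: the column names are assumptions, and the user row is represented by a dict so the snippet runs outside Anvil. In an app it would be called inside your own server function on the row you looked up there. It does not stop a client from calling anvil.users.get_user() directly.

```python
SAFE_COLUMNS = ("email", "enabled")  # assumed whitelist; adjust to your app


def public_user_info(user_row, allowed=SAFE_COLUMNS):
    """Copy only whitelisted columns into a plain dict; no links are followed."""
    if user_row is None:
        return None
    # Reading by column name works the same way on a dict and on a table row.
    return {name: user_row[name] for name in allowed}
```

Because the result is a plain dict, no row object (and therefore no linked row) travels to the client with it.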

@tfm, @jshaffstall is right, essentially there is a default callable server function anvil.users.get_user() that is included with the User Management add-on feature. This is not something anvil users have access to modify so we are stuck with it returning the entirety of the user’s row.

Some best practices that I can think of:

  • Do not add any new columns to users that you do not want exposed to the client

  • Do not link out of or into the Users table (thanks @jshaffstall, I hadn’t considered how easy it would be to forget to restrict a view with q.only_cols() when linking into users)

  • Generally, make secure data require a table.get() or table.search() call to something other than the Users table.

import anvil.users
from anvil.tables import app_tables

user = anvil.users.get_user()  # obtain user row
# if you have sensitive data at this point, so does the client

user_id = user.get_id()  # or whatever you choose
# "user_private" is a placeholder name for any table other than Users
sensitive_row = app_tables.user_private.get(uuid=user_id)
sensitive_data = app_tables.user_private.search(uuid=user_id)
# data from these is not accessible on the client unless
# you write a callable server function and send it to the client.

thoughts on feature request?

I think there are a couple of tiers that could be added that would help reduce exposure.

Without accelerated tables:
It appears there is already a distinction being made when calling get_user() from server vs client. You could give us an app level flag to disable client side returns of get_user(). This would force users to write their own server function and allow them to handle what data is sent to the client.

With accelerated tables:
Default behavior of get_user() to restrict scope to just the “safe” columns or those that drive good design, i.e., don’t verify password on the client side. Then give the option to allow the unrestricted view with a disclaimer of client side access.

Suggested addition to User Management config panel:

get_user() behavior:
No client side access - most secure
Restrict client side access to columns: (Accelerated Tables Only)

  • email
  • confirmed_email
  • enabled
  • last_login
  • n_password_failures

No client side restriction - least secure (<< current behavior)

I would love to hear community thoughts and then move it over to feature requests.

Ok, I went overboard. I created a dependency app that provides either a clear text version of the row.get_id() or an encrypted version. My thought for the secure version was for some of my Stripe integrations where I want to send a reference id out that I can then use to reconcile accounts.

The other use is to provide links back and forth between tables that can’t be associated “over the shoulder”.

Gives two modules, common and secure, that have mirrored functions.

Get an identifier for the authenticated user:

>>> common.get_user_id()
'911779_2238396130'

>>> secure.get_user_id()
'YooTuJr%2Bm6SLWMwQwpez5Jb1exIl7OyapStY%2Bu6IfjpNHFnu2zKwy6uFXbZW'

There is a generic version for any row:

>>> row = app_tables.common.search()[0]
>>> common.get_row_id(row)
'911818_2238554191'
>>> secure.get_row_id(row)
'5lt45mQmv%2B63WHwlIe8Za%2B1esOs6%2BNeOfr4XEoPN2Uo%2FSdkOXGye1MBr%2BMvK'

Ok, who cares… But here is the interesting bit, from either the common or secure id you can obtain the row without knowledge of what table it is from. This is because we are encoding the row.get_id() which includes the table id already.

>>> row = common.get_row('911818_2238554191')
>>> dict(row)
{'info': 'likes the color red.', 'user': '911779_2238396130'}

>>> row = secure.get_row('5lt45mQmv%2B63WHwlIe8Za%2B1esOs6%2BNeOfr4XEoPN2Uo%2FSdkOXGye1MBr%2BMvK')
>>> dict(row)
{'info': 'likes the color red.', 'user': '911779_2238396130'}

This implementation only works with accelerated tables. This is due to the changes in how tables can be accessed as well as where the table id is stored. I also have not tested the edge cases here with table views and such.

Here is the clone if you are so inclined:
clone

Review the tests to see usages.

This looks like a good post for the Show and Tell category.

Thank you very much for this clarification! Keeping that in mind helps me to plan and build my User tables in a more secure way.

Many thanks for your best practices! I appreciate your detailed hints and recommendations to prevent me from doing „naive coding“ and exposing unwanted data.
I have to reconsider some of my table constructions and to put as little information as possible in the Users table.
I support your suggestions/additions to user management! I am looking forward to any further workarounds and extensions to this issue.

It should be noted that the row id isn’t absolutely guaranteed to remain stable over time. The id for a given row shouldn’t change after it’s created, but there was a recent bug that temporarily altered row ids and caused apps that depend on them to break. The bug was fixed very quickly, so you may be okay with minor downtime due to things like that.

I tend to use an alternate unique id for those situations where I need something opaque to pass around, like a UUID4. That needs another column, though.
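
A minimal sketch of that alternate-id approach, assuming a hypothetical text column named uuid that is empty (None) until an id is assigned. The row is represented by a dict so the snippet runs anywhere; table rows are read and written with the same row['column'] syntax.

```python
import uuid


def ensure_uuid(row, column="uuid"):
    """Assign a random UUID4 string to the row once, then always return it."""
    if not row[column]:
        # uuid4() is random, so it carries no table or row information.
        row[column] = str(uuid.uuid4())
    return row[column]
```

The value lives in an ordinary column, so it is copied along with the rest of the row's data wherever that data goes.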

This also helps when you have to restore a deleted row from a backup. That row will never get its old row_id back – which is a pain to program around – but these alternate ids can be restored with no extra work.

I was worried about the row ids changing during that recent next_val issue too. I did some looking and didn’t see any of my row ids change. I assumed that these would be the identifier for usages like linking between tables and would be robust.

If this is true and they can change, (trying not to shout) why is row.get_id() exposed to users? If over the long run we can expect it to be a random number generator, what is the point? Sorry, not mad at you.

I’m not sure I can start wrapping my head around planning for changes that shouldn’t happen. I’m having a fun enough time handling things I know will happen. I feel like I’m constantly battling things like this.

So you use UUID anytime you want a row identifier?
Should I just consider that best practice?

@p.colbert, that is a great reason not to use get_id. I hadn’t considered that.

Edit: Correction 2025-02-12! New info from the highest authority, @meredydd! See his Admin note below.

Row ids can change, but it’s extremely rare.

Admin note: This is not the case - row IDs are stable. See Meredydd’s clarification below.

Because they’re stored in clearly-identified reference columns, when Anvil has to change an id, it can change the references to match, so it’s not usually something you have to worry about. However, if you cache a row id somewhere else, e.g., as a string in a SimpleObject member, then Anvil has no idea you’ve done that, and it can’t patch up the reference behind the scenes for you.

In cases like that, it’s often best to use a UUID. Likewise, if you anticipate having to restore deleted rows from backups.

Edit: I suppose it’s also a good idea if you ever have to restructure a database. Sometimes, the best way to do that is to create an entirely new database, with new tables, and populate it from the old. In that case, NONE of the row ids can be preserved.

Just to add to the uncertainty fun:
From the Anvil Server REPL:

import uuid

>>> uuid.uuid1().is_safe
<SafeUUID.unsafe: -1>

>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'my_app').is_safe
<SafeUUID.unknown: None>

>>> uuid.uuid4().is_safe
<SafeUUID.unknown: None>
 
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'my_app').is_safe
<SafeUUID.unknown: None>

So maybe the answer is extra dumb:

row['uuid'] = row['uuid'] or row.get_id()

UUID4 values are always safe from a multiprocessing standpoint, since they don’t include any fingerprint information from the device.

In a given data table, row ids should (barring bugs) be stable once a row is created. The row id for a row shouldn’t change, and row ids won’t be reused. You can always safely store a row id temporarily (e.g. in a session value) and use it later (again, barring bugs).

If you clone an app, the row ids all change because the data tables are duplicated. If you back up and restore data, the row ids all change because you’re creating new rows that are duplicates of the backed up rows.

So there are situations where row ids are perfectly fine to use, and situations where another sort of id is better. I tend to never use row ids so that I don’t need to worry about the differences in those situations.

Would be great if some Anvil folks could chime in here too. If we should be regarding the Users table as always client-readable this could have far-reaching implications, so would definitely be something to mention in the docs etc if it’s not a bug.

I think the only way to make it not accessible by the client is to mark it as not server readable… But that is not particularly helpful.

I’ve done more looking at this here

It comes down to Anvil having a server callable function as part of the User Management feature. Since this is an authenticated call it has access and it returns the user row back to the client. We can’t do anything to stop clients from making that call if they are logged in.

There are a number of implicit server calls, anvil.users.get_user among them. The bottom line is that the current user’s row is always available on the client.

That does not mean the table is client readable. That would mean that the client could search the table, which isn’t true. Any table marked as not client readable can have rows returned to the client from the server. Otherwise no real app would be able to display results to the user while maintaining security.

We currently can’t stop them from making the call, even if they’re not logged in. Code running on the browser can be hacked to call whatever the hacker wants to call.

For a get_user() call, we have no control over the result. That’s entirely under Anvil’s control.

But for anvil.server.callables, we have full control over how to respond. See Require user authentication for one handy shortcut.
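
The explicit form of that control can be sketched as below. This is an illustration under stated assumptions, not Anvil's implementation: the owner column and the function name are made up, and both the user and the rows are plain Python stand-ins so the snippet runs outside Anvil. In an app the function body would sit inside your own server callable, and current_user would come from anvil.users.get_user() on the server, never from an argument supplied by the client.

```python
def rows_for_user(current_user, rows):
    """Return only the rows owned by current_user; refuse anonymous callers."""
    if current_user is None:
        raise PermissionError("You must be logged in")
    # Drop the hypothetical "owner" link so no user row is sent back.
    return [
        {name: value for name, value in row.items() if name != "owner"}
        for row in rows
        if row["owner"] == current_user
    ]
```

The point of the design is that the server decides both who is asking and which columns leave the server, so a tampered client cannot widen the result.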