Best way to find a user's entry in a data table?

What I’m trying to do:

I have a number of large data tables, with one entry per user.

Whenever I want to find a value for a user from a data table, I have to first find the row corresponding to that user. I end up doing this a lot and it significantly affects performance, so I would like to know how to do it best.

What I’ve tried and what’s not working:

Currently, each time I want a data value for a user, I search for a unique property of the user in the data table to find that user’s row e.g.:

rowForUser = app_tables.user_data.get(email_address=email_address)
dataValue = rowForUser['data_column']

I know I can keep hold of that row while I’m in the same function, but in practice I need to get data all over the code base, so I end up often repeating that search to find rowForUser.

It occurred to me before that I could add a column to the main Users data table which contained links to each user’s row in the user_data data table. And add additional columns with links to all the other relevant data tables.

However, it turns out that when you put a row link into a data table, whenever you load a row in that data table, it also loads the whole linked row. So, if I put links to all the user’s data table rows in as columns in the Users table, a simple call to anvil.users.get_user() would load all data for that user (which would probably fail as a result of needing too much memory, and certainly seems inefficient).

This must be a completely standard problem, so surely I’m missing something obvious. Thoughts and help greatly appreciated!

If you do this on the server side, it’s just bad practice, if you do this on the client side, it will slow down your app to a crawl.

If possible, you should try to get what you need once and either pass it around as function argument, or store it in a class instance and pass that as a function argument, or, often better, have that class getting it and remembering it, then you call its methods, and the class instance will already have a reference to the user’s row.

If you use accelerated tables with q.fetch_only you can control what’s immediately fetched and what’s left to lazy loading.

1 Like

Infrequently changing data can also be cached either on the client (in a module) or on the server (in the session dict).

That comes with its own challenge of keeping the cached data up to date, but depending on how often the data changes it might help.

1 Like

Thanks for your answer. I realise I didn’t ask my question very well, sorry - let me try again, stripping out a couple of red herrings.

Each time I make a call from the client I need to find the user’s entry in a data table called user_data (in practice because my code’s not perfect I maybe have to do it more than once; I’m working on that but it’s not the subject of this question, sorry for the red herring). I can’t cache the values from the data table because they change in unpredictable ways.

Because the app has a lot of total historical users (~100k) the data tables are large and finding the user’s row takes a while - 0.5s or so, which makes the user experience slower than I’d like.

It seems crazy to have to search a huge data table for the row corresponding to that user each time they take an action, so I was hoping there was a “best practice” way which is better.

My approaches have been:

Approach 1:

usersRow = anvil.users.get_user()
emailAddress = usersRow['email_address']
dataRow = app_tables.user_data.get(email_address=emailAddress)

(This is what I’m currently using, but requires a search each time)

Approach 2:
Include a link to the required data table row as a column in the Users data table.

usersRow = anvil.users.get_user()
dataRow = usersRow['user_data_row_link']

The difficulty here is that the whole data row is returned as part of usersRow. Since anvil.users.get_user() doesn’t take any useful arguments, I had assumed that it wasn’t possible to use q.fetch_only here. I just checked though and it doesn’t throw an error - so maybe it works?

I worry though that I am making up a complex answer to what must be a very common requirement?

Maybe if you explain more about why @stefano.menci’s fixes won’t work/how often the values change, we could understand better how to help. Passing the value you need to various functions and form classes ought to be pretty simple if you make the value a class attribute and then use open_form to change to new forms wit hit as an argument, but if it’s not for you, I’d be curious to see code samples or a stripped down clone to see what alternatives might be suggested.

Are the values in question changed by the user themselves or by other users?

If the former, then you just have to change the class attribute and then follow Stefano’s methods.

If by other users, then more context could help, maye a time could help things happen in the background.

1 Like

Thanks for your reply. I think, to paraphrase, that my question is “how can I quickly find a user’s row in a data table”, and stefano’s answer is essentially “you shouldn’t need to”.

One short answer as to why I need to at this stage is that the code base is now very large and it would probably take a substantial redesign to not get data from data tables for each request.

But perhaps I’m missing something. I don’t fully understand how I can store data in a class. Is this class on the client or the server? A lot of the data is not suitable to be held by the client (for example, how many paid credits they have left). At the server end, I’m not aware of how to have persistent data except via data tables - have I missed something major here?

On reflection, I guess the reason I am using data tables is that client requests often need to access and/or change data which shouldn’t be visible to the client. This data changes often and, because I can’t guess what order the user will do things in, in an unpredictable way.

Here shows how you can pass data between forms:

I think if there’s a performance bottleneck that you need to resolve and the issue is across your app, then unfortunately, I don’t think you’ll get around doing substantive work regardless.

Well, my performance issue is 100% due to the time taken to find the data table row for the current user.

Since this is always the same for that user, I guess the solution is probably to store the link to that row somewhere. However I don’t know how to do that (I guess because I don’t really understand the row objects).

Any ideas how I can store the row link in a neat way for the user so that the server code can rapidly retrieve it whenever a call comes in from the client?

The biggest time sink when making server calls for getting data table data is the server call itself – that is the time it takes to go from client to server and for the server to return the data back to the client.

app_tables.user_data.get(email_address=emailAddress) is just the basic way to get a single data table row. The only thing you could do to make this faster is to make a server callable that gives you only the single column you want from the row, but that probably won’t radically change the performance you’re seeing.

The way to fix performance issues where you are trying to get data – especially similar/the same data – is to make one or minimal number of round-trips to the server.

That’s what Stefano’s post was getting at.

Ah, sorry for the lack of clarity. I’ve already carefully avoided multiple server calls, so I only ever make one server call (if any) when a client takes an action. I agree that server calls are inherently time consuming, but in my case it’s not the most time consuming step.

In the case of my app, because the data tables are very large, that app_tables.user_data.get(email_address=emailAddress) really is the issue (I’ve profiled it very carefully).

It can take up to a second, and sometimes I need to get data from two or three different data tables, so the time for finding those rows can add up. I also suspect it’s quite CPU hungry, although I’ve got no way of confirming that.

Ok, that’s fair.

If you really can’t reduce the number of calls you are making, why don’t you split up the data table into 2 or 3 separate datatables?

Are you saying that when you print the timestamps before and after each of these three rows, you see 1 second used by the third line?

If yes, have you tried using q.fetch_only('email_address') to only get very little data? This will tell you if it’s slow because it’s getting too much data or is slow searching for the row.

I have tables with hundreds of thousands of rows, and I can get hundreds off rows per search very quickly. The whole round trip doesn’t take 0.5 seconds.

But I also have a table with about 40000 rows where getting one row was very slow, and I ended up splitting it in two, one for the weekly usage and one for historical record. I don’t know exactly why it was so slow, perhaps because of a simple object column that in some rows, a small percentage, can get very large.

I have seen posts like this complaining about performances of the database. In my 8 year experience, with many hundreds of tables, I only had one slow table. This means that it is possible, but it is very unlikely (in my experience).

1 Like

Thanks, this is really interesting to hear.

Yes, line 3 would take up to a second.

So I have two main data tables, game_state and user_data.

game_state was really bad, sometimes several seconds to find a row. Originally it consisted of lots of simple object columns which were long and frequently modified. After lots of being confused, Anvil suggested that the changing simple objects might be causing indexing mayhem, and I should change them to media columns (packing the objects as JSON into media objects). I did this and they kindly rebuilt the index, shaving 8GB off in the index alone. More recently they kindly did a “VACUUM FULL” operation (rebuilding the entire data table) which shrunk my total disk usage from 255 GB to 80 GB(!!), and now the search time seems to be <0.1s, which is fine.

The other one, user_data, has a typical search time of 0.5s. No idea why this one should be slow, it hasn’t got any complex data in it.

So, I guess maybe I’ve been unlucky with poorly performing data tables? It’s all frustratingly opaque.

I also have issues with weirdly high CPU usage, which I suspect is the same issue, as my app doesn’t have anything in it which looks like it would use much CPU - maybe it’s burning it all endlessly searching data tables?

Does it mean simple object columns are indexed??

I use them as blob containers for stuff I don’t need to search, and indexing them would be a big waste of resources.

My understanding is that yes, they are.

I’ve issues with simple object columns causing slowness for tables. I restructured the database for the app and made it more of a star schema. This works when you have a defined schema, but you might not.

I used simple objects because I added functionality to an existing app and basically got lazy with defining a schema.

I have tons of simple object columns, and my performances are just fine.

My tables have a few searchable columns plus one, sometimes a few, simple object columns where I store the bulk of the data. Most of the times this approach reduces the number of tables.

For example I have a Clients table with the list of clients. I never search by address, so the address is in a simple object column. Each client has multiple releases, so there is the Releases table with one column with the client id, a few searchable columns and a few simple object columns.

One simple object column contains the list of panels and accessories in the release. In a typical relational db this would include tables for panels, for accessories and a few more tables, all joined. Instead one simple object column does the job.

Another simple object column contains the list of crates, each crate has properties like size and weight, plus a list of layers. Each layer has a list of panels or accessories. Each entry in each part, layer or crate is versioned, and includes the timestamp and the name of the operator that saved it. In a typical relational db this would include 10+ tables, all joined. Instead a simple object does the job. The versioned history of a crate can contain tens of thousands of items organized in lists and dictionaries, and I get it in one single step, without any proliferation of tables, linked or not.

A third simple object column in the Clients table contains the list of trucks. Each truck can contain crates from different releases, is versioned, etc. Trucks and releases are linked via release id, truck id and create id, but there are no id columns. All these ids, thousands of ids, are stored in a handful of simple object columns.

I am always surprised to see complains about db performances here in the forum, perhaps because my approach to consolidate dozens of tables into a few simple objects keeps my tables nice and zippy.

For what it’s worth, my most problematic data table was based on simple objects. These simple objects were frequently changed, frequently extended, and contained a decent amount of text (thousands of words). Anvil reckoned this caused me problems because the indexing of these simple objects was very inefficient. The data table also seems to have wasted a lot of space, since apparently whenever a row is modified it’s written to a new bit of disk space unless it happens to fit in the space left by the previous version. Since my simple objects were frequently increasing in size on each interaction, it seems that each interaction caused more disk space to be used. I eventually redesigned the data table to use only media objects (essentially simple objects just converted into JSON in a media blob), which seems to have improved things.

My other problematic data table, by contrast, is as simple as simple could be. Just things like number of credits, email address, IP address, that kind of thing. No linked rows, no simple objects, only about 20 columns. It takes about 0.5 seconds to find a row among it’s ~100,000 rows. So I’m really not sure what I could have done differently here.

I’ve not used linked rows until now, so that’s not the issue. I’m just trying to figure out if they can help me.