Linking multiple tables & perfomance

KR1 · May 4, 2023, 5:17am

I was wondering how bad is the Idea of linking of multiple tables and making a chain of linked tables
It’s no so easy to test without setting it up, with enough data. Maybe someone could share his experience in that matter.

I need to create a master table that contain for e.g. Project.
Almost every other object will be linked with the project, because it’s a central point for everything.

Linked to that table will be another one containing users and their roles in project.
And another table containing products to produce for the project. Those products will be linked with a table that containing the production data and another data with production instruction.

So at the end I will get like Project → product → documentation from production → cerficates.

I’m having concerns about the loading times of the data.
I was thinking about assigning some custom unique project id (base36 number created via transaction) and instead of making so many chained tables use that unique id instead.

jshaffstall · May 4, 2023, 1:25pm

Every project is unique, so ultimately you’d need to test it with your data set. I tend to err on the side of allowing the database to do as much as possible in one go, as opposed to me making a lot of individual calls, so I use linked tables a lot.

I can say that accelerated tables speeds up loading times immensely, while giving you control over whether linked fields are preloaded.

rickhurlbatt · May 4, 2023, 1:41pm

I have spent ALOT of time testing load times and rebuilding apps one way or the other to get the best loading times. Honestly like we are talking days on end

If you were asking this question a year ago, I would say that using non linked rows is the fastest … but now with cache control in the accelerated tables, some of my load times are 10x faster (yes I have empirical data to back this up).

Accelerated performance with linked rows and careful cache control using q.fetch_only() you can get blistering speeds.

Granted you will never beat local storage like indexed db but you also have don’t have the development overhead of dealing with indexed db.

We have reverted back to linked rows because of the speed and flexibility and I cannot recommend them enough!

owen.campbell · May 4, 2023, 1:49pm

I can confirm this. I have a client where the app was originally designed with lots of linked rows. For a while, we regretted this but always had other stuff to do and never changed it. Now, we’re very happy we didn’t. It sped up 5 fold over night when we switched on the accelerated tables!

stefano.menci · May 4, 2023, 3:39pm

Two limitations of the linked tables in Anvil are that they are in one direction only and that they can’t be used in searches.

For example, you can create an SQL query that joins two or more tables, uses columns of one table in the filter and reads columns of another table. In such SQL query there is no “direction” in the link.

In Anvil instead, your project would know its products, but your products wouldn’t know their project. You could search for a project and find its products, but you can’t search for a product and find its projects. You would need to explicitly add a linked column in the other direction.

Because of these limitations, I have never used linked tables. I have used either tons of simple object columns or columns with ids. And then found the products of a project with a second query. A second query is usually only twice as slow (two queries rather than one), while a badly behaving query going wild on linked tables may be hundreds of times slower (one query that fetches thousands of useless rows).

Today there are accelerated tables. They are faster and they allow to specify what’s loaded immediately, which makes them even faster, so I may start using linked tables in the future.

But reducing the number of tables from 50 to 5 just using simple objects rather than linked tables will remain one of my main strategies. For example I have an app that has the Project → Trucks → Crates → Layers → Panels hierarchy. 5 panels sit on the same layer, 10 layers fill a crate, 5 crates fill a truck, 15 trucks make up a project. Using a well designed database I would have the following tables:

Projects
Trucks
Crates
Layers
Panels

Instead I have only 1 Projects table with 3 simple object columns:

Panels
Trucks
Crates

This is because I will never need to search a panel or a truck or a crate without knowing the project first. Every interaction with the app starts from the project. Once the project is selected, I may need to manage panels or crates or trucks. Each truck is a dictionary and has the list of crate ids (together with many other properties). Similarly, each crate is a dictionary and has the list of panel ids.

So my answer is yes, those links may work very well, and will do even better if you do good use of the q.fetch_only() option. But it could be much better not to link the tables, or not to even have them and work with simple object columns instead.

It all depends on what you will need to search and how many items each category will have.

KR1 · May 5, 2023, 4:36am

Thank you for the sharing with your experience. Every single post contains valuable information. I will use linked tables and everywhere where it is possible, I can use simple objects to reduce the data.

One last question. In one case it may be needed to use a huge multiple link column. It can contain even 10.000 linked rows with 6 columns. I can convert the data in 6 columns into single simple object, but how would it perform in loading the 10.000 links? Should I avoid using so many multiple links?

stefano.menci · May 5, 2023, 2:08pm

After 3 days from deployment I got complains that the app was getting slow. I checked and noticed that one of the simple object columns was growing above 2MB and it was slowing down the app.

I have not tried storing the thousands of dictionaries from that list stored in that simple object column as individual rows of a second table. I have not tried, but I’m (almost) sure it would have been slower. Reading 2MB from one row in one shot should be faster than reading 20,000 individual rows and building the 2MB object in memory (well, 2MB when stored as json object).

My solution was to be less lazy and organize the data in a smarter way. The list contains all the versions of the crates. Two consecutive versions always differ in one or two crates out of a list of 10-20 crates, so, when a crate doesn’t change, I replace its definition (containing the list of layers, each containing the list of panels) with a pointer to a crate defined in the previous version. This compresses the size of a large list of crates with a long history from 5MB to 300KB and the app gets snappy again.

I understand, my answer to save 3,000 rows instead of 50,000 may not be acceptable!

But the point remains the same:

If you always need all your 10,000 rows in memory and you never need to search any individual row, then the simple object is the fastest solution
If you need to search one individual row, then loading 10,000 rows in memory could be a waste of time, better having them in one table with the value you need to use for filtering in its own column (so it automatically has its own index)
If you need only the first few rows, for example you often need to iterate and stop the iteration before the end, then again, having its own table will help. It will be slower when you need to read them all, but it will be faster when you need to read only a few of them

In my case I do need the whole history only in a small number of cases. 95% of the interactions only need the latest version, so I could have stored one version per row. But I decided to stick with storing and reading always all the versions, because avoiding linked tables makes my life easier and the impact on performances wasn’t too bad.