Very slow database performance

Hi

With everyone’s help I am making good progress on my basic CRUD app.

However, the data table performance is excruciatingly slow.

I have a simple data table with ~500 rows, and I really expected that populating the data grid would be snappy. But it is not.

I tried returning a dictionary as suggested by @alcampopiano in another post, but that did not help.

I am not very savvy with timing functions etc. but it is very apparent that it is slow.

I am a paid user, and the clone link is below. It seems there were some issues with data table performance last year, but looking through the topics it appears they were never resolved.

Please let me know if there is something I could do to improve the speed of data loading.

You just need to create an account and sign in, and you will see that it takes a while to load. If you delete a record, it will refresh and you can once again see that it takes a while.

Thanks for your help.

https://anvil.works/build#clone:PL4RM3BSN5W4UFXR=RL3QTSDME2LFYARX5TQFSB57

I can’t say whether you’re seeing a legitimate issue or not, but you are returning the entire table’s worth of rows when the user is only viewing the first page of them.

Personally, I’d return the search iterator so that Anvil could retrieve only the rows needed and allow the data grid to retrieve more when the user asks for them.

So, instead of:

jobs_dict = [dict(r) for r in app_tables.jobhistorymaster.search(tables.order_by("jobDate",ascending=False))]

I’d use:

jobs_dict = app_tables.jobhistorymaster.search(tables.order_by("jobDate",ascending=False))
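
On the client side, that iterator can then be handed straight to the data grid's repeating panel, which will fetch further pages on demand. A minimal sketch (the form class and component names here are assumptions, not taken from your app):

from ._anvil_designer import JobListTemplate  # generated by the Anvil designer
import anvil.server

class JobList(JobListTemplate):
  def __init__(self, **properties):
    self.init_components(**properties)
    # Assigning the search iterator lets the data grid page through rows
    # lazily instead of downloading all ~500 of them up front.
    self.repeating_panel_1.items = anvil.server.call('get_joblist')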

Thank you @jshaffstall

I was returning the iterator this entire time and the performance was bad.

Looking through the forum, I found a topic where returning dictionaries was suggested, but it hasn't improved anything.

Below is a new clone with your suggested change. Is this performance acceptable for a small table with 500 rows? Would it be better if I retrieved only a few columns?

https://anvil.works/build#clone:PL4RM3BSN5W4UFXR=RL3QTSDME2LFYARX5TQFSB57

For what it’s worth, the app runs well enough for me. There’s a subjective delay of about 2 seconds when it retrieves the job list.

If you can identify where it's slow on your end, that would help. Maybe put some timing checks in so that you know how long the server call takes to complete, how long the client waits for the server call to complete, and how long it takes to populate the repeating panel.

To time something you can use the datetime module, e.g.:

import datetime

s = datetime.datetime.now()
# do something that needs timing
print("Elapsed time: " + str(datetime.datetime.now() - s))

From what I can tell, the bottleneck seems to be the serialization - i.e. actually sending the table rows to the client.

For a helper function for timing your client-side and server-side functions, this might be useful:
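
Something like this little decorator is a minimal sketch of the idea (not the code from the linked post), and it should work the same way in client and server modules:

import datetime

def timed(fn):
  # Wrap a function so that every call prints its elapsed time.
  def wrapper(*args, **kwargs):
    start = datetime.datetime.now()
    try:
      return fn(*args, **kwargs)
    finally:
      print(fn.__name__ + " took " + str(datetime.datetime.now() - start))
  return wrapper

# Usage (get_joblist is the callable from the clone):
# @anvil.server.callable
# @timed
# def get_joblist():
#   ...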

Hi @rthatha, please see the discussion here; it helps a lot.

Timing your clone app's fetch of the jobs table, I got these results:

  1. hh:mm:38.462080
  2. hh:mm:39.887506
  3. hh:mm:40.029495
  4. hh:mm:40.212050

Where:
data_access_client:

  print('0) '+str(datetime.datetime.now()))
  __joblist = anvil.server.call('get_joblist')
  print('3) '+str(datetime.datetime.now()))  

and

data_access_server

  print('1) '+str(datetime.datetime.now()))
  jobs = app_tables.jobhistorymaster.search(tables.order_by("jobDate",ascending=False))
  print('2) '+str(datetime.datetime.now()))

What looks strange is the ~1.5 s it takes to get from the client making the call to the server function starting (timestamps 1 -> 2 above).
Then it's 0.2 s and another 0.2 s to complete.

Would it help to have an index on this field?

@p.colbert maybe an index on jobDate could speed up the sorting, but the query itself is 0.2 seconds … not too bad against the 2+ seconds overall, and IMHO not the main “culprit”.

Ah. I see. It’s the server-code startup lag, not the database lag.

Server-code startup lag has been discussed elsewhere to good effect.

As more background: when you call a server function, all server modules get loaded, which imports all the modules those server modules import, even if those modules are not used for that particular server function.

You can time how long each import takes. In your case, the non-Anvil ones came up as:

Pandas 0:00:00.000009
Anvil PDF 0:00:00.000006
io 0:00:00.000004
plotly.graph 0:00:00.208733

So none of them are taking a long time, although plotly.graph is quite a bit longer than the others.

You can move time-consuming imports into the server functions that need them, so they are only imported when that particular server function is called, rather than on every server call.
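
For example, if plotly is only needed by one report function, the import can live inside that function rather than at the top of the server module. A sketch (the function name and body are illustrative, not from your app):

import anvil.server

@anvil.server.callable
def build_chart_data():
  # Deferred import: plotly is only loaded when this function is called,
  # not every time a server call loads this module.
  import plotly.graph_objects as go
  fig = go.Figure(data=[go.Scatter(y=[1, 2, 3])])
  return fig.to_dict()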

Wow, thanks a lot for all the responses. Truly thankful for all the support.

I have now tried to time the functions as suggested.

  print('client function in) '+str(datetime.datetime.now()))
  __joblist = anvil.server.call('get_joblist')
  print('client function out) '+str(datetime.datetime.now()))  

  print('Server function in) '+str(datetime.datetime.now()))
  
  jobs = app_tables.jobhistorymaster.search(tables.order_by("jobDate",ascending=False))
  
  print('Server function out) '+str(datetime.datetime.now()))  

First load at login gave these results:

client function in) 2020-09-18 21:26:49.582415
Server function in) 2020-09-18 17:26:51.356574
Server function out) 2020-09-18 17:26:51.620033
client function out) 2020-09-18 21:26:52.038360

Total: 2.5 seconds.

I then added a row, which refreshes the table.

client function in) 2020-09-18 21:27:24.983995
Server function in) 2020-09-18 17:27:26.671233
Server function out) 2020-09-18 17:27:26.776778
client function out) 2020-09-18 21:27:27.177015

Total: 2.2 seconds

client function in) 2020-09-18 21:27:33.282830
Server function in) 2020-09-18 17:27:34.937255
Server function out) 2020-09-18 17:27:35.038419
client function out) 2020-09-18 21:27:35.295425

Total: 2 seconds.

So, from @jshaffstall's reply I understand that this is all server lag because I am loading unnecessary modules. And it seems they get loaded every time the function is called, not just the first time?

So, I went ahead and removed four imports: io, pdf, plotly, and pandas. Results below…

client function in) 2020-09-18 21:36:36.501675
Server function in) 2020-09-18 17:36:37.967317
Server function out) 2020-09-18 17:36:38.101370
client function out) 2020-09-18 21:36:38.332245

Total time: 1.8 seconds.

Add a new row…

client function in) 2020-09-18 21:38:14.431935
Server function in) 2020-09-18 17:38:15.985631
Server function out) 2020-09-18 17:38:16.124561
client function out) 2020-09-18 21:38:16.343080

Total time: 1.9 seconds

The performance has not really improved. So, I guess the culprit is somewhere else?

The above two quotes are about things I can try, but I am afraid I don’t really understand what it means to use JSON columns!!

I really did expect that just using the built-in data grid and table search function would be the easiest and fastest approach.

Can you help me please with some simpler directions to improve the performance?

Where should you concentrate? It starts with your numbers.

By far, the largest part of the time lag seems to lie between “client function in” and “server function in”. Therefore, this is your biggest opportunity for time savings. You’d gain more by cutting this in half than by cutting all other lags by 100%. From what you’ve written, it looks like a good spot to concentrate on.

Because that lag is part of every single server call you make, you stand to gain on every call, not just the ones you’ve shown.

You’ve already shown that you can reduce that initial lag from around 1.8 seconds to around 1.5, or a saving of roughly 16%, with what sounds like fairly little effort. This is very promising, and suggests that more reduction may be available in this area.

Where this lag comes from: it's not so much how many modules you have, as how many distinct import statements they execute. Import results are cached, so even if there are 40 import X statements, only the first one executed does the real work.

Specifically, import statements that bring in code that is not part of your server-side code. All of your server-side modules will be loaded anyway. It’s the other import statements that might potentially be postponed – in some cases, indefinitely.

Other server-side time-eaters are certainly possible, for example setting up a global data structure to cache frequently-read data. In a desktop-based program, this is often a good idea, because such an object costs its construction time only once, and thereafter it gives fast data access throughout the life of the program. Thus, there's a net savings in time.

Folks may do this in Anvil server-side code, under the assumption that the same advantage applies. And it can, if you’re paying to keep the server-side program running between calls. If not, then that server-side “program” keeps being closed and re-started. That initial construction cost, then, is paid over and over again, at each restart.

If you've got such a thing going on, then that may be another opportunity for savings. Anvil does offer a per-connection data cache that does persist between calls. See Storing session data.
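
A rough sketch of that pattern with anvil.server.session (the key name and the cached query are assumptions; note the cached copy goes stale when rows change, so it suits data that is read often and written rarely):

import anvil.server
import anvil.tables as tables
from anvil.tables import app_tables

@anvil.server.callable
def get_joblist_cached():
  # anvil.server.session is a per-session, server-side store that survives
  # between calls, unlike module-level globals in a restarted server process.
  cached = anvil.server.session.get('joblist')
  if cached is None:
    cached = [dict(r) for r in app_tables.jobhistorymaster.search(
      tables.order_by("jobDate", ascending=False))]
    anvil.server.session['joblist'] = cached
  return cached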

Thanks @p.colbert

The approach makes a lot of sense. I have tried a few things, but the performance has not improved much. The current timings are below. Watching it in real time, it feels like the lag is between the server sending the data and the client receiving it.

calling server function) 2020-09-24 10:13:14.802595
Server function starts) 2020-09-24 06:13:16.809160
Server function ends) 2020-09-24 06:13:16.933067
received data) 2020-09-24 10:13:16.763920

On the server side, below is the only code. I am not sure if there are any more savings to be had by reducing the import statements.

import anvil.users
import anvil.tables as tables
import anvil.tables.query as q
from anvil.tables import app_tables
import anvil.server
import datetime

@anvil.server.callable 
def get_joblist():
  user = anvil.users.get_user()
  if not user:
    return []
  
  print('Server function in) '+str(datetime.datetime.now()))
  
  jobs = app_tables.jobhistorymaster.search(tables.order_by("jobDate",ascending=False))
  
  print('Server function out) '+str(datetime.datetime.now()))  
  return jobs

As others have suggested, I might have to look at returning a better data structure from the server? What would be the best way?

Regards
Rajesh

@rthatha, not a solution but a workaround suggestion here.
If server-side startup time cannot be reduced more, and if this lag is not acceptable on your side, you could try the same approach as Heidi did in her Vocab Victor Classroom App leveraging Google Firestore services.
As she says:

Serverless model: We determined that the Anvil Tables system, while handy, led to very slow performance. Therefore, we decided to use a serverless model and wrote a python + js library to read/write to Google’s Firestore. It is lightning fast, extremely reliable, and allowed us to write the entire application with very few (I think maybe three or four) server functions.

This may apply to your case too.

BR

I too have a case of relatively slow db performance, and found out that the search() method itself on data tables is not slow; what is slow is the actual use of the data (i.e. when you assign the search iterator to container items). Interestingly, I found practically no difference when using some_field=q.any_of() with lots of possibilities (>20) in comparison to a simple some_field=some_value.

Every anvil.server.call() from the client to the server takes about 1 s on average in my app. Since I have some complex forms where I combine data from several queries, a bad approach was to have not one but several (sometimes more than 10) anvil.server.call() statements to load all the data. After investigating, I changed my “architecture” and now use only one call per form, where the server function packs the search iterators into a dictionary.
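
A sketch of that one-call-per-form pattern (the second table and the dictionary keys are made up for illustration):

import anvil.server
import anvil.tables as tables
from anvil.tables import app_tables

@anvil.server.callable
def get_form_data():
  # One round trip instead of several: pack every iterator the form needs
  # into a single dictionary and unpack it on the client.
  return {
    'jobs': app_tables.jobhistorymaster.search(
      tables.order_by("jobDate", ascending=False)),
    'customers': app_tables.customers.search(),
  }

On the client, a single anvil.server.call('get_form_data') then feeds each repeating panel or dropdown from the returned dictionary.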

It would be nice if Anvil had some way to load only the server module that is actually used in a specific call.

I suggested that a while back. In practice, it's trickier than it sounds, because any server module can call:

  1. anything in server_modules/
  2. anything in modules/

So the default is to load them all. During the load process, @anvil.server.callable collects a list of all the remotely-callable functions. This list could be cached, but figuring out which supporting modules also need to be loaded is not trivial. And if the bulk of the startup time is in starting a new instance of Python itself, then such an optimization isn’t worth the effort.