Server code takes more than 15 seconds to iterate over 1000 rows?

Hi there!

I’m creating an app that allows a salesperson to take down orders. Since they are ridiculously fast typists, I need to have the articles cached on the client side.

When creating a dictionary server-side, which is then shipped to the client, I get a timeout error.
I know why the error occurs (server calls must return within 30 seconds) and I know that I could circumvent this with a background task.
However, since I believe 1000 rows should iterate faster than that, the error must be in my code or in my understanding of the search iterator.

I’ve tried several things (even launching two background tasks in parallel, which take the first and second half of the table and merge the results later :man_facepalming:), but none of them improved the speed significantly.

Here is an example app with my data table.
https://anvil.works/build#clone:IFYXW5PLY45QBNKL=NZ6WYD2SVKTCLQK6CBBC6W2N

Code

ret = {}
print('rows', len(app_tables.article.search()))
for i in app_tables.article.search():
    ret[i.get_id()] = {'name': i['name'], 'row_id': i.get_id(), 'id': i['id'], 'no_discount': i['no_discount'],
                       'price': i['price'], 'ust': i['ust_dropdown'], 'ks': i['in_kuehlraum']}

Exception
(screenshot of the timeout exception)

Any help or suggestions are appreciated.

Cheers, Mark


I might be wrong here, so take this with a pinch of salt -

Do the app tables search above the for loop, and iterate over the result:

res = app_tables.article.search()
for i in res:
  ...

Hi @david.wylie,

Thanks for the fast reply. I’ve just tried it - same result, unfortunately.

Hard to say exactly based on this info alone (and my understanding of it), but do you by any chance have a lot of data inside a text or simple object column in your data tables?

In other words, there may only be 1000 rows, but any given cell may contain a massive amount of data.

Just curious if that is the case.

Hi @alcampopiano,

In the columns which I need and store in the dict, definitely not (e.g. name, id, price, etc.).

However, about 60% of the articles contain a picture, which is scaled down to 500 px when uploaded:

self.item['produktfoto'] = anvil.image.generate_thumbnail(event_args['sender'].file, 500)

But I thought the pictures would not be loaded if I didn’t access them?

If I use the full Python 3 runtime, I can get it to finish successfully in around 15s. I have not gotten it to work in the basic Python 3 runtime.

You’re right, to be honest I just forgot to switch on the full Python 3 runtime in the example application.

Still, is it normal for it to take 15 seconds to get a dict that is about 250 KB in size from the database, without any processing of the data?

I’m sorry but I don’t know if it is normal. Perhaps it is just a lot of data, or perhaps there is something going on with Anvil’s systems (although I have personally not felt any issues today).

What I do not understand is how the resulting dictionary can take so long to create when there is no processing done (it just takes a couple of columns).

If I take the resulting dict and iterate over it, it’s lightning fast (Jupyter notebook).
So the data itself should not be the problem, I guess.

Would columns with more data in the row, yet not accessed in the code, affect the performance?

If so, would splitting the article table help: one with the data I need to cache (only a few chars and numbers) and another one with the picture?
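
A rough sketch of what I mean (untested; article_core and article_picture are hypothetical table names, with the picture table linking back to the core row):

import anvil.server
from anvil.tables import app_tables

@anvil.server.callable
def get_article_cache():
    # Only the lightweight table is touched here; the picture table
    # (with the heavy media column) is never loaded for the cache.
    ret = {}
    for r in app_tables.article_core.search():
        rid = r.get_id()
        ret[rid] = {'name': r['name'], 'row_id': rid, 'id': r['id'],
                    'no_discount': r['no_discount'], 'price': r['price'],
                    'ust': r['ust_dropdown'], 'ks': r['in_kuehlraum']}
    return ret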

It’s a good question and I’d need to investigate more. In my heavy-handed testing, I deleted a bunch of columns randomly and reduced the dict to one key/value and it took 3 seconds to iterate over all the rows.

So, I altered two factors, but perhaps this can tell us something.

Sorry I cannot be of more help at the moment.

You’ve already helped me quite a lot, thanks for your time!

If I delete all columns which I do not cache, the time is cut from 15 to 10 seconds. This is strange, since it would mean I’d be better off creating an article table with just the columns to cache and another one with a 1:1 relationship for the other columns.

Anyway, I’ll play around with it more tomorrow.

BTW: If anyone wants to serialize the whole table without having the linked columns as Live Objects, this post helped me: Dynamic serialisation of data table rows


I tried out the alternatives to get_id() on a clone and got a small improvement.
I found them in this post:


Also, just retrieving the row id once and storing it in a variable within the for loop saved another second.

  for i in app_tables.article.search():
    rid = i._spec['id']
    ret[rid] = {'name':i['name'],'row_id':rid, 'id': i['id'],'no_discount': i['no_discount'],
                'price':i['price'],'ust':i['ust_dropdown'],'ks':i['in_kuehlraum']}

I’m on my cell now, so I can’t try it, but the first thing I would try here is converting each row to a dict, then working with the dict. Something like this:

ret = {}
for i in app_tables.article.search():
    d = dict(i)          # plain dict of the row's column values
    d['id'] = i.get_id()
    ret[d['id']] = d

Hi there, thanks to your very helpful suggestions @alcampopiano @stefano.menci @tcarter @david.wylie I was able to cut the time by about 5 seconds :partying_face:.
Here’s a summary for people who encounter something similar:

Data
An article data table; each of its 1180 rows represents an article. 29 columns, never with much data, except for the article image. (It can be cloned with the link in the first post.)

Goal
I need a cache which can be shipped to the client and includes the information of 6 columns as well as the row_id.

Initial Approach ~15sec

ret = {}
for i in app_tables.article.search():
    ret[i.get_id()] = {'name': i['name'], 'row_id': i.get_id(), 'id': i['id'], 'no_discount': i['no_discount'],
                       'price': i['price'], 'ust': i['ust_dropdown'], 'ks': i['in_kuehlraum']}

Best I could do so far (without changing the data table) ~10sec

ret = {}
for r in app_tables.article.search():
    rid = r._spec['id']
    i = dict(r)

    ret[rid] = {'name': i['name'], 'row_id': rid, 'id': i['id'], 'no_discount': i['no_discount'],
                'price': i['price'], 'ust': i['ust_dropdown'], 'ks': i['in_kuehlraum']}

Some Interesting "Benchmarks"
Simply iterating - ~2.6sec

  for i in app_tables.article.search():
    pass

Creating Dictionary - ~9sec

  for i in app_tables.article.search():
    dict(i)

Access 6 Items - ~12.5sec

  for i in app_tables.article.search():
    i['article_tags']
    i['id']
    i['no_discount']
    i['name']
    i['price']
    i['text_1']

Getting ID slow - ~5sec

  for i in app_tables.article.search():
    i.get_id()

Getting ID fast - ~2.7sec

  for i in app_tables.article.search():
    i._spec['id']

* Disclaimer: times are simply datetime.now() averages from two tries.
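
For reference, each block above was timed roughly like this (a minimal sketch of the datetime.now() approach, not the exact code I ran):

from datetime import datetime
from anvil.tables import app_tables

def timeit(label, fn, tries=2):
    # Very rough wall-clock average over a couple of runs.
    total = 0.0
    for _ in range(tries):
        start = datetime.now()
        fn()
        total += (datetime.now() - start).total_seconds()
    print(label, round(total / tries, 2), 'sec')

def simple_iteration():
    for i in app_tables.article.search():
        pass

timeit('simple iteration', simple_iteration)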

In summary, it seems there are many ways to improve this specific problem. However, for someone with a limited understanding of the underlying systems it is not really straightforward :thinking:. Also, I haven’t explored splitting up the data table or caching the cache in a separate column much, but both would probably be worth a look.
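
If anyone tries the "cache the cache" idea, I imagine it would look roughly like this (untested; cache is a hypothetical table with a text column key and a simple object column data):

import anvil.server
from anvil.tables import app_tables

def rebuild_article_cache():
    # Call this whenever an article is added or edited.
    ret = {}
    for r in app_tables.article.search():
        rid = r.get_id()
        ret[rid] = {'name': r['name'], 'row_id': rid, 'id': r['id'],
                    'no_discount': r['no_discount'], 'price': r['price'],
                    'ust': r['ust_dropdown'], 'ks': r['in_kuehlraum']}
    row = app_tables.cache.get(key='articles') or app_tables.cache.add_row(key='articles')
    row['data'] = ret
    return ret

@anvil.server.callable
def get_article_cache():
    # The client gets the precomputed dict in a single read.
    row = app_tables.cache.get(key='articles')
    return row['data'] if row else rebuild_article_cache()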

Cheers, Mark


Thanks for taking the time to conduct this helpful experiment. I guess the issue is the nature of lists and dictionaries, which are quite slow.

My guess (still a guess, no science here) is that the bottleneck is the row object, which acts like a dictionary but does more than a dictionary and has some slow overhead.

Another thing that I often do to speed up database access is to limit the number of tables and columns and use simple object (json) columns.

I only use columns for data that needs to be searched or sorted and put the rest in a simple object column. I also avoid the proliferation of tables like type or status and use a string to represent type or status directly.

These two techniques help keep the tables simpler, the code shorter, and the app faster.
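
As a rough illustration (hypothetical orders table with one searchable text column status and a simple object column data):

from anvil.tables import app_tables

# status is a real column only because we search on it; the rest
# of the record lives in the simple object (JSON) column data.
app_tables.orders.add_row(
    status='open',   # a plain string instead of a link to a status table
    data={'customer': 'ACME', 'items': [{'id': 17, 'qty': 3}], 'notes': ''}
)

open_orders = app_tables.orders.search(status='open')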


Hi @mark.breuss, you could try the following code. It takes around 7s most of the time, rarely 11s:

return list(dict(r) for r in app_tables.test.search()[0:1000])

Hi @Tony.Nguyen,
That’s right, I tried that as well.
However, in my specific case I need the row_id, and once converted to a dict the row_id is lost, as far as I know. Sadly, the time difference is more or less what generating the row_ids costs.

Additionally, with this approach the dict includes the linked rows as live objects, which might result in unwanted roundtrips.
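
One way around both issues might be to flatten each row manually and replace anything row-like with its id (untested sketch, duck-typing on get_id() to spot linked rows; multi-link columns would need an extra list case):

from anvil.tables import app_tables

def flatten(row):
    d = {}
    for k, v in dict(row).items():
        # Linked rows come back as live objects - keep just their ids.
        d[k] = v.get_id() if hasattr(v, 'get_id') else v
    d['row_id'] = row.get_id()
    return d

ret = {}
for r in app_tables.article.search():
    d = flatten(r)
    ret[d['row_id']] = d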

Thanks anyway!

If you really need a row id, you should create your own, since the system row_id must not be used: Short, Unique ID Values
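
For example, something along these lines (just one possible approach, with a hypothetical uid column on the table, not taken from the linked post):

import uuid
from anvil.tables import app_tables

def new_article(**columns):
    # Store our own short, stable id instead of relying on the system row id.
    columns['uid'] = uuid.uuid4().hex[:10]
    return app_tables.article.add_row(**columns)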

Interesting, I didn’t know that.

Thanks for the heads up then :smile:
