I have over 20,000 rows in a data table. I need to be able to filter them based on user selected criteria, or display all of them if no criteria is selected. Using the to do list method of building a second form and displaying each row in a for loop takes 6 minutes to load. Any suggestions on how I can get instant display updates (or at least faster, like seconds rather than minutes)?
Chris, any possibility you can get your users to rethink their “need” to scroll through 20k rows on-screen? Been there, done that, which includes users who believed being able to view literally millions of rows was just as important as searching specifically for what they need. Sorry that my response has nothing to do with Anvil, but is rather a design decision.
That’s a good point. I’ve been playing with filtering the results in my app, but even filtering still requires processing all the rows into a list, based on the Anvil documentation. I’m down to 90 seconds to load, but that’s still 90 seconds too long. Should I be looking at database services instead?
I do all my queries in a server module and just return an object with the data back to the form for display. Regarding query performance in an application, my rule of thumb has always been that if a query takes more than 3 seconds, something is wrong (query logic, indexing, etc.) and the user is not going to be happy with the wait time.
@PeterJ if you’re using a server module does that mean you’re not using data tables at all? 3 seconds seems like a good rule of thumb.
I’m using data tables, which can be accessed from a server module. We know Anvil allows you to query a data table from a web form, but from a design standpoint, that’s not a best practice i.e. your UI is talking directly to the DB, thus bypassing the controller (Anvil server module) in the MVC (model-view-controller) world.
That answers my initial question. Follow up question: how do you handle manipulation of query results. Say I need to multiply all the values in a column by a coefficient. Would you also apply that in the controller/server module? Or do that in the database itself? If you do it in the view, then you would get bogged down again I would assume.
I handle further manipulation of query results in the server module (by iterating through the query result object) and then finally pass the query object back to the form. If it’s just a single row of data for a data entry form, then it’s simply a matter of parsing the query results onto the edit fields/combos/checkboxes/radiobuttons on the form. If it’s a set of rows (to be displayed in a grid i.e. rows and columns), then I just iterate thru the rows inside the form, which I believe is a lightweight operation.
I’m apparently still stuck at sorting. Now that I have the sort() function on the server module, it’s timing out the server. Am I using the wrong python function to find the top or bottom of a list of floats, if I’m using sort()? I couldn’t find any other way to get say, the top 10 of a list of 20,000 numbers other than sorted() or sort(). I looked at list comprehensions and I looked at using SQL in Anvil and didn’t get any clues that these were the answer.
Chris, here’s an example from my application of sorting. It’s generic method for getting lookup values from a lookup table for a specific customer:
def db_read_data_for_lookup (table): customer = list(app_tables.customer.search(ID = get_customer_id())) # Return single row rows = getattr(app_tables, table).search(Customer = customer) # Return iterator return sorted([r for r in rows], key = lambda r: r['Name']) # Return the sorted list of lookup values to the form
The code above makes use of Python’s
sorted() and a
lambda. If you’re not familiar with these, spend a few minutes reading the Python docs to familiarize yourself. My app has a bunch of lookup tables, all containing a customer id link and a column called
Name (which contains the actual lookup value). Hopefully the code is pretty self-explanatory. If not, I can elaborate.
Just found this in the Anvil docs: “Anvil makes no guarantee about the order in which rows are returned from a search(). If you want to sort the returned rows, or access them by index, you can use the list() function to turn a search object into a Python list.” Which is what I’m doing above in my code.
How long does that function take to create the list in your app? I’ve tried my own version of it and timed each piece and by far, the list() takes the longest - for my project, between 80 and 110 seconds to turn 20,000+ records into a list.
When I put that piece of code on the Server Module, I get “timeout: Server code took too long at Form1, line 24”
I only have one data table with 26 columns. Should I break it up into smaller data tables? Like one table per column and add the key so each table has 2 columns?
Hi Chris - this doesn’t answer your question directly, but have you considered using an external database (such as MySQL) instead of DataTables?
I suggest it because (a) it’s what I do, and (b) because you could push the sorting off to the database server. 20k rows wouldn’t normally worry MySQL and it will handle things like sorting for you. You could requery the DB every time they change the sort criteria and get back a pre-sorted result set that you can just render to your controls back to you in (quite likely) sub-second time.
Might not suit you, and there may be a solution for you baked in to Anvil, but I thought I’d mention it.
@david.wylie you make a good point. However the reason I’m using Anvil is because of its ease of use, having everything you need in one package and being able to set up a web app with mediocre Python knowledge. A 3rd party DB solution is probably my best bet and if I knew how to set up a 3rd party DB I would have already. Now that it looks like my best option, I’ll look into it and figure it out. Can I ask what service you or anyone else prefer(s)? And what works well with Anvil?
I quite understand. There is a learning curve to using MySQL for sure.
You would then need to ask the Anvil guys to add a mysql package (I use “pymysql”) to your account so you can import it in your code. You would then access it like this :
import pymysql.cursors ... connection = pymysql.connect(host='x.x.x.x', user='xxxxxxx', password='xxxxxxx', port=3306, db='mydatabasename', cursorclass=pymysql.cursors.DictCursor) try: with connection.cursor() as cursor: sql = "SELECT * from atable WHERE tablefield=%s" cursor.execute(sql, ('searchterm')) result = cursor.fetchone() print(result) finally: connection.close()
There’s a slightly better example here : http://pymysql.readthedocs.io/en/latest/user/examples.html
you could also use something like mongodb (a NoSQL database). You can use it for free here mlab.com. You can connect using mongo python drivers (assuming Anvil will load them for you) or a straight forward REST API. Pretty well documented.
Difference? NoSQL is not SQL, ie it doesn’t use the SQL language and has different ways of searching for data. Tomes written on this subject that google will help with, but it may or may not suit your use.
@david.wylie awesome! Lots of research to do. Thanks!
Chris, I’m surprised your list creation is taking that long. I asked Anvil some time ago what DB they’re actually using behind the Data Tables, and they said Postgres. Postgres is no slouch when it comes to query performance, and my experience with even large lists in Python is that Python is very fast as well. Regarding my code and tables, my lookup tables are tiny - less the 50 recs. The search, list creation and sort happen pretty much instantaneously. I would suggest that before you go too far off on a tangent, such as switching to MySQL and thus complicating your development, let’s see if we can get Anvil support to chime in on your performance issue. Would it be possible for me to get a hold of your 20k record set? I’d be happy to try to duplicate (and possibly fix) your performance problem.
Chris, got your file. I won’t be able to take a look at it until I get home this evening (from my regular day job, EST time-wise).
It’s great to see the collaboration going on here! I’m just going to add a couple of things. The short version is that I would endorse moving to an external SQL database for Chris’s application. You should be able to sort/filter 20,000 rows in memory in Anvil, but from personal conversation I know your data set is eventually going to be much bigger.
In particular, you’re doing a lot of geographical lookups, so I would suggest having your data in Postgres (rather than MySQL) so you can later expand to the excellent PostGIS tools. When you reach full scale, you’ll be needing specialised geographical data types and indexes that Anvil’s general-purpose database can’t provide you.
For ease of getting started, I’d recommend starting with ElephantSQL (elephantsql.com) - they provide one-click fully-hosted Postgres database. And they have a free plan with 20MB of storage, which should be OK for testing, as well as a fairly inexpensive 2GB plan ($19/mo). (Something like the DigitalOcean solution proposed by David Wylie would also work, but would be more work to set up.)
In terms of your slow experience filtering data in your server modules, we’re working hard behind the scenes to make this work faster. (We’ll be including the first steps of this work in a platform update tomorrow morning. You may not know it, but we ship little updates and improvements like this all the time - we should probably get better at showing off about it!) That said, loading your entire data table into memory and processing it there is never going to be the quickest operation. In the coming weeks and months, we’ll be adding better querying capabilities to data tables, so you can perform more filtering and sorting in the
search() method, rather than loading it all into Python and doing it there.
However, none of this changes my original recommendation - in both the short and the long term, I think you’ll be better off switching to an external SQL database.
Hope that helps!
Thanks for all the helpful input. It’s time for me to start doing some database homework!
That’s great news Meredydd!