Reducing DataGrid load time for PDF generation

stu · March 24, 2020, 3:59pm

Hi,

I have a table of roughly 5k entries of inspections that I want to output into Anvils form to PDF function.

I have a nested repeating panel with grouping as per the Anvil tutorials: https://anvil.works/learn/tutorials/data-grids/grouping

In essence, First Repeating_Panel populated by server call to the datatables for the equipment names (section titles).

def kit_info():
    return app_tables.kit_info.search()

Second Repeating_Panel (data_grid) populated by server calls to a different datatable for all items with that equipment name (row) to get the latest inspection record of that item of equipment to populate the DataGrid.

def inspection(item):
    data_grid_return = []
    for item_of_kit in app_tables.kit_item.search(Item=item):
        last_inspection = app_tables.inspection.search(tables.order_by("DateInspected",ascending=False),Kit_Item=item_of_kit)
        insp_date = last_inspection[0]['DateInspected'].strftime("%d/%m/%Y")
        insp_pass = last_inspection[0]['PassFail']
        insp_inspector = last_inspection[0]['Inspector']
        insp_manID = item_of_kit['Manufacturer_ID']
        insp_CID = item_of_kit['Manufacturer_ID']

        datagrid_row = { "ManID" : insp_manID,
                "CID" :insp_CID,
                "Inspector" : insp_inspector,
                "DateInspected" : insp_date,
                "PassFail" : insp_pass 
                }
        data_grid_return.append(datagrid_row)
        return data_grid_return

Due to the approx 5k row size it takes 100+ seconds to load.

This stops a PDF being generated as it times out at the Anvil server call limit of 30 seconds.
Error:

anvil.server.TimeoutError: Server code took too long at [Form1, line 29]

It will allow me to view the form on load because no server call from the form takes longer than 30 seconds.

How can I alter the server code to populate the DataGrid in less than 30 seconds to allow a PDF to be generated?

Clone link with sample data: https://anvil.works/build#clone:BAICHUI7DWVFQYTA=NGLT5MSSYOIMYK2D3IBH6BFS

Thanks everyone.

robert · March 24, 2020, 4:11pm

This is perhaps a workaround but you could move the work to a background task that would email you the PDF after it was generated.

This would make it so the user couldn’t do a spot check to make sure things looked ok though. You could get around that by perhaps showing the first 10 rows to the user (or a sample of them) and have the user double check before launching the background task.

Alternatively, launch the background task and have the task store the results somehow (as a blob in a separate data table perhaps). Then periodically check the task, and when completed, display the data. You could make a separate scheduled task to clean up old queries.

p.colbert · March 24, 2020, 4:25pm

Also see: https://anvil.works/docs/background-tasks/communicating-back

stu · March 24, 2020, 4:32pm

The background task times out in the same way:

Exception: PDF generation failed: Components did not load within allowed time.

I’ve tried storing the list of dictionaries to populate the DataGrid as a Simple Object within a DataTable. This also took too long to load and timed out.

robert · March 24, 2020, 4:49pm

Are you on the free plan? Based on what you are saying that’s the only reason why you would get a timeout on a background task like that.

stefano.menci · March 24, 2020, 4:55pm

Have you tried splitting the PDF in smaller chunks and then putting them together?

stu · March 24, 2020, 5:01pm

I’m on a paid plan but I assume its that the anvil.pdf.render_form() call being a server call

stu · March 24, 2020, 5:03pm

I know I could split it down into multiple documents but wanted a flowing PDF document rather than having gaps and white space where I have had to implement a split.

meredydd · March 24, 2020, 5:39pm

Hi @stu,

So, there are two problems here:

You’re doing a lot of nested searches, and they’re taking a long time
You’re doing it all from inside the PDF renderer (which is a headless Chrome browser that’s a bit of a resource hog, so we don’t let it run longer than 30s)

We can address those issues separately. #2 will be the fastest, so let’s start there. You can precalculate the data before you call anvil.pdf.render_form(), and then pass it as an argument to the form’s constructor - that way, you can calculate what you’re displaying on the server (which is faster). What’s more, you could calculate it in a Background Task (which gives you all the time in the world), and then call the PDF renderer once you’ve assembled your report.

This should get you up and running while you optimise Issue #1, which is a little more tricky, as it relates to your actual search logic. For starters, the search goes faster if you do it from the server (you’re closer to the database!), but also, this seems odd to me:

        last_inspection = app_tables.inspection.search(tables.order_by("Date",ascending=False,Kit_Item=item_of_kit))

Did you mean Kit_Item= to be an argument to tables.order_by() rather than search()? It’s possible you’re querying the whole inspection table, every time, for every item of kit, which could explain a bit of the slowness.

stu · March 24, 2020, 5:53pm

Thanks Meredydd,

I’ll start work on #2 and report back.

Explanation of #1. Each item of kit will have many inspections (these inspections are done every 6 months) so there are 3 DataTables to compile the PDF report.

Table 1: Kit_Info which details the make, model, shorthand identifier for that piece of equipment

Table 2: Kit_Item which details each item of that type of kit including serial numbers

Table 3: Inspection which details each time a specific item of kit is inspected

Therefore the reason for the nested search is to find the most recent inspection for a single piece of equipment.

I do not have a solution around that except for a search for that piece of equipment and order the date descending, then taking the first result at the most recent inspection of that piece of equipment.

Have I missed something in my methods or does that now make sense?

stu · March 24, 2020, 5:56pm

I’ve just re-read my original post realised that I pasted the wrong app_tables search into the original post. Correct version (as per the clone link):

last_inspection = app_tables.inspection.search(tables.order_by("DateInspected",ascending=False),Kit_Item=item_of_kit)

Correction now made in edit to original post.

stefano.menci · March 24, 2020, 6:09pm

This is a case where using SQL you get all you need in one shot and in the correct order. And it’s really fast.
It is only available with dedicated plans (like mine ).

stu · March 25, 2020, 11:27am

Meredydd,

I’ve tried the pre-calculate and pass through the data to the form method but the form still times out on load:

anvil.server.TimeoutError: Server code took too long at [Form2, line 17]

meredydd · March 25, 2020, 11:42am

Are you precalculating in an ordinary server call, or in a background task? You should be able to avoid timeouts by doing the precalc in a background task (which can run as long as you like), and then rendering it to PDF (which will then be quick!)

stu · March 25, 2020, 11:47am

I’m pre-calculating using a background task and saving the data in a DataTable.

Then separately calling the DataTable and passing it to the form.

    @anvil.server.callable
   def run_pdf():
      print("start")
     
      data = app_tables.reportstore.search()[0]
      
      pdf = anvil.pdf.render_form('Form1',data['Data'])
      create_inspection_pdf(pdf) # email pdf form
      print("email sent")

Clone link:
https://anvil.works/build#clone:BNQWSRTV3X6YD7U3=XCICHWIFRRHA76DVVAJCZQCT

stucork · March 25, 2020, 2:28pm

I had a quick look and tried to load form1 without rendering to pdf and it still takes around 100 seconds to load without any server calls … (pre copying the data as a list)… since the rendered form can’t take longer than 30 seconds I think you’ll need to ensure that the form itself takes less time to load in order to be rendered.

perhaps using the renderform isn’t the right approach here? The dataset seems too large… how many nested rows are loaded to the form?

perhaps you could use one of the pdf writers from the list of packages https://anvil.works/docs/server/packages. You can then keep all this in a background task… use a progress bar so that the user has an idea of how long it will take…

meredydd · March 25, 2020, 4:42pm

Does the data really still take that long to load?! That’s remarkable, and shouldn’t happen - let us take a look…

stu · March 25, 2020, 5:02pm

Thanks Meredydd.

For info: the total row count is 6219.

stefano.menci · March 25, 2020, 6:11pm

I wouldn’t be surprised if rendering 6000 template forms is the slow part here, even if the forms are very simple.

stu · March 30, 2020, 11:47am

Hi Meredydd, is this a limitation of Anvil loading repeating panels and I should attempt a different method or is this something that can be sped up to use the Anvil PDF function?