Ways to speed up CSV import to a data table?

Thanks very much for all the help! @stucork, I have a few newbie questions to add after trying the add_rows functionality from the Accelerated Tables beta:

What I’ve tried so far:

I moved my store_upload_data function into a hosted Server Module in Anvil (instead of calling it over the Uplink) just to see if that made any difference, and, after opting in to the Accelerated Tables beta, modified the code as shown:

import datetime

import anvil.media
import anvil.server
import pandas as pd
from anvil.tables import app_tables


@anvil.server.callable
def store_upload_data(file):
    # Write the uploaded Media object to a temp file so pandas can read it
    with anvil.media.TempFile(file) as file_name:
        if file.content_type == 'text/csv':
            df = pd.read_csv(file_name).dropna()

            # Time the single batched add_rows() call and print the elapsed time
            s1 = datetime.datetime.now()
            print("Initiating .add_rows function ...")
            app_tables.upload_data.add_rows(df.to_dict(orient="records"))
            s2 = datetime.datetime.now()
            func_time = s2 - s1

            print("Total time to execute .add_rows: {}".format(func_time))

What I’ve noticed testing this so far …

  1. My print statements don’t appear in either the ‘Running App Console’ or the ‘Server Console’ windows. Should I be looking elsewhere to see these?

  2. It still takes a really long time to upload the data, roughly the same amount of time as before.

(I don't doubt I'm missing something very basic here, as I'm pretty unfamiliar with app development.)

Also - both @stefano.menci and @jshaffstall suggested minimizing calls by 'sending one list of rows and doing the bulk of the database operations in the server'. I get this conceptually, but I'm having a hard time understanding how the original store_upload_data() function (shown below for reference) isn't already doing that:

def store_upload_data(file):
    with anvil.media.TempFile(file) as file_name:
        if file.content_type == 'text/csv':
            df = pd.read_csv(file_name).dropna()
            # One add_row() call per record in the dataframe
            for d in df.to_dict(orient="records"):
                app_tables.upload_data.add_row(**d)

Doesn't df.to_dict(orient="records") already convert the dataframe to a list of row dicts, which is then looped through? And isn't that loop running inside a single server call, namely anvil.server.call('store_upload_data', file)?
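For example, as I understand it, a small dataframe like the one below (the column names and values are just made up for illustration) already comes out of to_dict(orient="records") as a plain list of dicts, one per row:

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
records = df.to_dict(orient="records")
# records == [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]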

How might one refactor this code to avoid making a server call for every row of data?
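
(In case it helps clarify what I'm asking, the only batched alternative I can picture is something like the sketch below, where the whole list of records goes into add_rows() in chunks rather than one add_row() per record. The function name and the chunk size of 500 are just placeholders I made up, and I'm assuming add_rows() accepts a list of dicts, as in the beta code above.)

import anvil.media
import anvil.server
import pandas as pd
from anvil.tables import app_tables


def chunked(records, size):
    # Yield successive slices of the record list
    for i in range(0, len(records), size):
        yield records[i:i + size]


@anvil.server.callable
def store_upload_data_batched(file):
    with anvil.media.TempFile(file) as file_name:
        if file.content_type == 'text/csv':
            records = pd.read_csv(file_name).dropna().to_dict(orient="records")
            # One add_rows() call per chunk instead of one add_row() per row
            for batch in chunked(records, 500):
                app_tables.upload_data.add_rows(batch)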

(Thank you in advance for your patience with me on this - I’m learning as I go, and there’s lots to learn … )
