What is the best way to temporarily save large CSV data on the server

Hadri · May 23, 2023, 3:33am

What I’m trying to do:
I am doing a lot of operations on a pandas dataframe that is read from a CSV file (either uploaded or from Google Drive), based on client interactions. We therefore need to save the dataframe temporarily on the server and perform specific operations on that file each time, before the final save to Google Drive or download.

What I’ve tried and what’s not working:
I am temporarily saving the pandas dataframe as a CSV file on the server. This works well for small CSV files, but for large datasets I get "anvil.server.ExecutionTerminatedError: Server code exited unexpectedly: 2bfeef06ac". I am not sure whether I have to convert all my requests to background tasks. In general, is there a recommended way to save the data temporarily that is better than what I am doing now?
I am on the Business plan.

Code Sample:

@anvil.server.callable
def function_name(ds):
  df = pd.read_csv(f"/tmp/{ds}")
  df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
  # Some operation here on the df
  df.to_csv(f'/tmp/{ds}')
  return df.to_json(orient='records')

KR1 · May 23, 2023, 5:33am

I was getting the same error message. After moving the work to a background task, the error was gone.

For saving the data, I would recommend storing it in a data table. You can delete the row later when it's no longer needed.

Hadri · May 23, 2023, 8:28am

Thanks for the reply!
Can you share what your client-side code looks like when you call the background task? I am doing the following:

task = anvil.server.call('func_to_call_bg_task', args)
while not task.is_completed():
  time.sleep(15)

Is there a better way?
Also, could you share a sample of how you are doing this: “For saving the data I would recommend you to save it in the data table. You can delete row later when it’s not needed anymore.”
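
For reference, the polling loop above has no upper bound, so it would wait forever if the task never completes. Below is a minimal sketch of a bounded version. The names `wait_for_task` and `FakeTask` are hypothetical and not part of Anvil; the only thing assumed about the task object is the `is_completed()` method already used above.

```python
import time


def wait_for_task(task, poll_interval=1.0, timeout=60.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll task.is_completed() until it is true or `timeout` seconds pass.

    Returns True if the task completed, False if the wait timed out.
    """
    deadline = clock() + timeout
    while not task.is_completed():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True


class FakeTask:
    """Hypothetical stand-in for a background task object, for trying the helper."""

    def __init__(self, polls_until_done):
        self.polls_left = polls_until_done

    def is_completed(self):
        # Report "not completed" for the first `polls_until_done` calls.
        self.polls_left -= 1
        return self.polls_left < 0
```

With a real task object this would be called as `wait_for_task(task, poll_interval=15, timeout=600)`; a False result means the caller should stop waiting and report the problem to the user.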

KR1 · May 23, 2023, 11:01am

Add a Timer component to your form and set its default interval to 0, so it does not fire until a task has been started.
To start the background task, call a function on the server side that launches the background task function.

Client side, something like this:

    with anvil.server.no_loading_indicator:
      # Prepare the variables for the background task
      self.task = anvil.server.call('first_function_server', var1, var2)
      # Start the timer. You can set the interval to whatever you want; I always use 1 second.
      self.timer_1.interval = 1

Server side:

@anvil.server.callable
def first_function_server(var1, var2):
  task = anvil.server.launch_background_task('background_task', var1, var2)
  return task  # important: return the task object to the client

@anvil.server.background_task
def background_task(var1, var2):
  # Do the long-running work here and build the result as a Media object, `media`.

  # At the end, store the result in a data table with 2 columns:
  # 1 for the date and 1 for your object.
  row = app_tables.export_media_transfer.add_row(object=media, date=datetime.now())
  return row.get_id()  # the client uses this ID to download from the data table

Timer function on the client-side form:

def timer_1_tick(self, **event_args):
    """This method is called every [interval] seconds. Does not trigger if [interval] is 0."""
    with anvil.server.no_loading_indicator:
      if self.task.is_running():
        # This branch runs on every tick while the task has not finished.
        # You can read, for example, an int value that tracks the processing
        # progress and show it in a progress bar:
        # state = self.task.get_state()
        # progress = state.get('progress', 0)
        pass
      else:
        # Stop the timer so the result is only handled once.
        self.timer_1.interval = 0
        # Retrieve the data, or do whatever you want, after the task has finished,
        # e.g. var1 = self.task.get_return_value(), or a database call to get the data.
        # Example with a transferred file:
        row_id = self.task.get_return_value()
        row = app_tables.export_media_transfer.get_by_id(row_id)
        anvil.media.download(row['object'])

You can delete the row inside the timer function after retrieving the data, with just row.delete().
I'm using the date column with a scheduled background task to delete data older than 1 day.
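
The daily cleanup described above might look roughly like the sketch below. It is written against stand-in rows so it can run outside Anvil: `delete_old_rows` and `FakeRow` are hypothetical names, and the only things assumed about a row are `row['date']` and `row.delete()`. In an Anvil app the rows would presumably come from something like `app_tables.export_media_transfer.search()`, with the function called from a scheduled background task. It also assumes `now` and the stored dates are either both naive or both timezone-aware; if the table returns timezone-aware dates, pass an aware `now`.

```python
from datetime import datetime, timedelta


def delete_old_rows(rows, now=None, max_age=timedelta(days=1)):
    """Delete every row whose 'date' value is older than `max_age`.

    `rows` is any iterable of objects supporting row['date'] and row.delete().
    Returns the number of rows deleted.
    """
    now = now or datetime.now()
    cutoff = now - max_age
    deleted = 0
    for row in list(rows):  # copy first, since rows are deleted while looping
        if row['date'] < cutoff:
            row.delete()
            deleted += 1
    return deleted


class FakeRow:
    """Hypothetical stand-in for a data table row, for trying the helper."""

    def __init__(self, date):
        self._data = {'date': date}
        self.deleted = False

    def __getitem__(self, key):
        return self._data[key]

    def delete(self):
        self.deleted = True
```

Keeping the age check in a plain function like this makes it easy to try with fixed dates before wiring it to the real table.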