Storing pandas DataFrames as a single row entry in an Anvil data table

What I’m trying to do:

I am not a programmer, but have developed a desktop python GUI for my app and am trying to make it into a web-accessible system.

I need to load large CSV files (sometimes 1M rows, 100 MB) and store them as pandas DataFrames that can be manipulated on the server side across different user sessions.

I am looking for some confirmation that I am on a good track. I believe my approach should be:

a) use the upload function to get the CSV file data and read it into a pandas DataFrame using pandas’ `read_csv` function. This part is working fine.

b) after some manipulation of the data frame I would then like to store the resulting data frame so it can be re-loaded in a future session as the user may need to come back to the project several times over different sessions to add more data, etc.

My understanding is that I can store the DataFrame as a Python object in a data table row if I ‘serialize’ the data using the pandas `to_pickle` function, and then store that ‘pickled’ DataFrame as an object in a data table.

So then basically, when the user starts a new project and creates a DataFrame for that project, I would add that DataFrame to a table with rows organized by user, project, and DataFrame. In future sessions the user can select the project they want to work on, and the server can search the data table and retrieve all the DataFrames that were previously uploaded, without having to re-load them from the original CSV files.

Some of the raw csv files will be up to 100 MB in size.

If this makes sense, one question I have: my understanding is that when I use the `to_pickle` function, I have to give it a temporary filename on the server and then turn that file into a media object to store in the data table. Or is there a way to use `to_pickle` to turn the DataFrame directly into a media object, without writing a temporary local file on the server?
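If I understand the pandas API correctly, `to_pickle` also accepts a file-like object, so you should be able to serialize into an in-memory `io.BytesIO` buffer and pass those bytes straight to `anvil.BlobMedia`, with no temporary file. A minimal sketch (the Anvil lines are commented out since they only run inside an app, and the table/column names in them are hypothetical):

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Serialize the DataFrame to bytes entirely in memory -- no temp file needed.
buf = io.BytesIO()
df.to_pickle(buf)
pickled_bytes = buf.getvalue()

# In an Anvil server module you could then wrap the bytes directly:
# m = anvil.BlobMedia("application/octet-stream", pickled_bytes, name="df.pkl")
# app_tables.dataframes.add_row(user=user, project=project, media=m)

# Round-trip check: read the DataFrame back from the raw bytes.
df2 = pd.read_pickle(io.BytesIO(pickled_bytes))
assert df.equals(df2)
```

The same `BytesIO` trick works in reverse: `pd.read_pickle` accepts a file-like object, so the stored bytes can be deserialized without touching disk either.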

If anyone has any tips/links/comments or suggestions it would be greatly appreciated.

Thanks. A.

I was able to convert the DataFrame to a CSV file, turn that into a BlobMedia object, and store the BlobMedia in the table; it shows up as a row with the CSV file in the media column. And it worked! Very pleased with that process :grinning:. If anyone can suggest improvements let me know:

```python
import anvil.server
import anvil.media
import pandas as pd
from anvil.tables import app_tables


@anvil.server.callable
def load_data(file):
    # Write the uploaded media to a temp file so pandas can read it
    with anvil.media.TempFile(file) as file_name:
        if file.content_type == 'text/csv':
            d = pd.read_csv(file_name)
            d['datetime2'] = pd.to_datetime(d['DateTime'])
        else:
            d = pd.read_excel(file_name)

    a = d.head(10)
    b = d.tail(10)
    c = d.size

    # Serialize back to CSV text, wrap it in a BlobMedia, and store it
    tdf = d.to_csv()
    m = anvil.BlobMedia('text/plain', tdf.encode(), name='junk.csv')
    app_tables.table.add_row(media=m)
```
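For completeness, here is a sketch of the reload path, assuming the media column holds the CSV bytes stored above; `get_bytes()` is the Anvil Media method that returns a media object's raw content. The stand-in DataFrame below substitutes for the fetched row so the snippet is self-contained:

```python
import io
import pandas as pd

# In an Anvil server module you would fetch the stored media, e.g.:
# row = app_tables.table.get(...)          # row lookup criteria are up to you
# csv_bytes = row['media'].get_bytes()

# Stand-in for the fetched bytes, so this example runs on its own:
csv_bytes = pd.DataFrame({"x": [1, 2]}).to_csv(index=False).encode()

# Parse the CSV bytes straight back into a DataFrame, no temp file required.
df = pd.read_csv(io.BytesIO(csv_bytes))
assert list(df["x"]) == [1, 2]
```

One thing to watch: round-tripping through CSV loses dtypes (your `datetime2` column will come back as strings), which is a reason to prefer pickle for exact restores.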


If you ever need more capacity, consider compressing your CSV files before storing them. It’s not unusual to get 10:1 compression.
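A quick sketch of that idea using `gzip` from the standard library; the Anvil line is commented out since it only runs inside an app:

```python
import gzip
import io
import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.date_range("2024-01-01", periods=1000, freq="min"),
    "value": range(1000),
})

# Compress the CSV text with gzip before storing it.
csv_bytes = df.to_csv(index=False).encode()
compressed = gzip.compress(csv_bytes)
print(len(csv_bytes), len(compressed))  # compressed is considerably smaller

# In Anvil, store the compressed bytes directly:
# m = anvil.BlobMedia("application/gzip", compressed, name="data.csv.gz")

# pandas can decompress on read if you tell it the compression scheme:
df2 = pd.read_csv(io.BytesIO(compressed), compression="gzip")
assert len(df2) == len(df)
```

Repetitive tabular text like timestamps compresses especially well, so this is where the 10:1 ratios tend to show up.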

Thanks for that, that would make sense in my case since the files are quite large.