Storing general objects to a table (and then retrieving them) - also: do I need to pickle it?

I am trying to follow a very typical process: caching an object. Creating the object takes a long time (and costs money; it's a FAISS object).

Let’s call this object “obj1”

I want to save it in app_tables.files as a pickled object (is there another way?)

Then, I want it to be available (to any other user) to be loaded and un-pickled.

Is there a simple way to do this?

NOTE: The pickling is not required, I just want to save the object and retrieve it later.

UPDATE:
I am able to save to the files table and then retrieve it, but the types before and after are:

before saving: <class 'langchain.vectorstores.faiss.FAISS'>
after loading from table: <class 'anvil._server.LiveObjectProxy'>

How do I convert it back to its original type?

At the top of the server module:

import pickle

Then after you have your object:

anvil_media_object = anvil.BlobMedia(
    'text/plain',
    pickle.dumps(your_faiss_object_here),
)

Now you have a pickled version of your object, wrapped in an Anvil media object. Look up the tutorials on how to store this media object in a Media column of a data table (the column type used for files), or search the forum; this has been answered many, many times.

When you want to use your object again, from within the anvil server module, you retrieve this anvil media object again from a data table and use:

your_faiss_object_here   = pickle.loads(anvil_media_object.get_bytes())
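
For reference, here is a minimal sketch of that round trip that runs without Anvil. `FakeMedia` is a hypothetical stand-in that models only `get_bytes()`, and a plain dict stands in for the FAISS object; neither comes from the Anvil API. The sketch uses 'application/octet-stream' as the content type because pickle output is binary, but that is only a suggestion.

```python
import pickle


class FakeMedia:
    """Hypothetical stand-in for an Anvil media object; only get_bytes() is modelled."""

    def __init__(self, content_type, content):
        self.content_type = content_type
        self._content = content

    def get_bytes(self):
        return self._content


# Any picklable object works here; a dict stands in for the FAISS store.
original = {"vectors": [[0.1, 0.2], [0.3, 0.4]], "ids": ["a", "b"]}

# Serialize to bytes and wrap them, mirroring the BlobMedia step above.
media = FakeMedia("application/octet-stream", pickle.dumps(original))

# Later: unwrap the bytes and rebuild the object.
restored = pickle.loads(media.get_bytes())

print(type(restored).__name__, restored == original)
```

Keep in mind that `pickle.loads` can execute arbitrary code, so only unpickle data that your own app wrote.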

Here is a link to the docs on how to set up a data table:

and how to use it to store data like media objects in python:

Almost there… :slight_smile: (and thank you for your help!)

Here is what I have:

anvil_media_object = anvil.BlobMedia('text/plain', pickle.dumps(db) , )
app_tables.files.add_row(path="db", file=anvil_media_object)
retrieved_anvil_media_object = app_tables.files.get(path="db")
your_faiss_object_here = pickle.loads(retrieved_anvil_media_object.get_bytes())

but I get an error for the last line:

`AttributeError: get_bytes`
* `at /downlink-sources/downlink-2023-02-23-21-45-40/anvil/_server.py:47`

What am I doing wrong?

When I add some print statements:

anvil_media_object = anvil.BlobMedia('text/plain', pickle.dumps(db) , )
print(f'anvil_media_object: {type(anvil_media_object)}')
app_tables.files.add_row(path="db", file=anvil_media_object)
retrieved_anvil_media_object = app_tables.files.get(path="db")
print(f'retrieved_anvil_media_object: {type(retrieved_anvil_media_object)}')
db = pickle.loads(retrieved_anvil_media_object.get_bytes())

I get:

anvil_media_object: <class 'anvil.BlobMedia'>
retrieved_anvil_media_object: <class 'anvil._server.LiveObjectProxy'>
`AttributeError: get_bytes`

The line `app_tables.files.get(path="db")` returns the row object for that row in the data table, not the media object stored in it (you can think of the data table as a much more powerful spreadsheet).

Since you have a column named 'file', you can access the row like a Python dictionary, using the column name as the key:

row_object = app_tables.files.get(path="db")
retrieved_anvil_media_object = row_object['file']
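
The row-versus-column distinction can be sketched without Anvil as follows. A plain dict stands in for the row and `FakeMedia` for the stored media object; both are hypothetical stand-ins, as is the helper name `load_from_row`. The `None` check assumes, as far as I know, that `get` returns `None` when no row matches.

```python
import pickle


class FakeMedia:
    """Hypothetical stand-in for the media object stored in the 'file' column."""

    def __init__(self, content):
        self._content = content

    def get_bytes(self):
        return self._content


def load_from_row(row, column="file"):
    """Unpickle the media stored in `column`, or return None if no row was found."""
    if row is None:  # nothing matched the query
        return None
    # Index the row by column name first, then read the bytes from the media.
    return pickle.loads(row[column].get_bytes())


# A dict stands in for the row returned by app_tables.files.get(path="db").
row_object = {"path": "db", "file": FakeMedia(pickle.dumps([1, 2, 3]))}

print(load_from_row(row_object))
print(load_from_row(None))
```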

Also, did you create your own data table or are you using the one automatically created when you install the Data Files service?

You should really create your own. I have no idea what will happen if you use that data table; the service might delete your data if it isn't being used for a while.

I created my own table.

THAT DID THE TRICK! :slight_smile: thank you!

print(f'db: {type(db)}')
anvil_media_object = anvil.BlobMedia('text/plain', pickle.dumps(db) , )
print(f'anvil_media_object: {type(anvil_media_object)}')
app_tables.files.add_row(path="db", file=anvil_media_object)
# retrieved_anvil_media_object = app_tables.files.get(path="db")
row_object= app_tables.files.get(path="db")
retrieved_anvil_media_object = row_object['file']
print(f'retrieved_anvil_media_object: {type(retrieved_anvil_media_object)}')
db2 = pickle.loads(retrieved_anvil_media_object.get_bytes())
print(f'db2: {type(db2)}')

output:

db: <class 'langchain.vectorstores.faiss.FAISS'>
anvil_media_object: <class 'anvil.BlobMedia'>
retrieved_anvil_media_object: <class 'anvil._server.LazyMedia'>
db2: <class 'langchain.vectorstores.faiss.FAISS'>

Now I will go figure out why it works :wink:

I really think this is a very basic operation of caching an object (for later use).
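
One thing to watch with the code above: `add_row` runs on every execution, so a second run adds a second row with path="db". A load-or-build pattern avoids that. The sketch below is only an illustration under stated assumptions: a plain dict named `table` stands in for the data table, and `build_expensive_object` and `get_or_build` are hypothetical names, not Anvil functions.

```python
import pickle

# Hypothetical in-memory stand-in for a table keyed by the "path" column.
table = {}

build_calls = 0


def build_expensive_object():
    """Placeholder for the slow, costly construction step."""
    global build_calls
    build_calls += 1
    return {"index": list(range(5))}


def get_or_build(path):
    """Return the cached object for `path`, building and storing it on a miss."""
    blob = table.get(path)
    if blob is None:
        obj = build_expensive_object()
        table[path] = pickle.dumps(obj)  # one entry per path, so no duplicates
        return obj
    return pickle.loads(blob)


first = get_or_build("db")   # cache miss: builds and stores
second = get_or_build("db")  # cache hit: loads the stored bytes

print(build_calls, first == second)
```

In the real app, the lookup would be the `app_tables.files.get(path="db")` call and the store would be the `add_row` call, run only when the lookup finds nothing.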

You may also want to check out the alternative pickling method in this article:

Section is called “Serializing and De-Serializing to bytes”
