Subclass Chroma’s Configuration and override the cache_dir property:
from chroma import Configuration

class MyConfig(Configuration):
    @property
    def cache_dir(self):
        return '/tmp/.chroma'

chroma.config = MyConfig()
Set the location of the Chroma config file to use a custom config:
chroma.config_file = '/path/to/myconfig.ini'
Where myconfig.ini contains:
[chroma]
cache_dir = /tmp/.chroma
The key is to configure Chroma’s cache directory before it gets used for the first time. Let me know if any of those suggestions help or if you have any other questions!
Thanks so much for the help. Unfortunately, none of those things worked. I think you have the right idea about setting the directory. The LangChain AI chatbot gave erroneous suggestions, too.
I’m using the Google Drive integration to get around not being able to write to the Anvil system.
Chroma has to write a file (.chroma) to a directory, which I can set using a db_path parameter, but I can’t seem to pass it a Google Drive folder. Help, please! I have the Google Drive folder loaded as:
I’m speculating, but I doubt you’ll be able to pass it a Google Drive folder. I’m guessing you’ll have to use the file system. It may be easier to help you if you shared more details about the errors you’re seeing, what you’ve tried, etc., with the gold standard being a clone link to a simplified demo app demonstrating the issue. It may also be helpful to link to the relevant Chroma docs, etc.
instead of db_path=, but I am unsure if this really needs to be inside a with context block any longer.
…or really if this will even work the way that you want, since if more than one user is using your site at the same time they would be sharing the same temporary file, which could cause aberrant behaviour.
One way to solve that last problem would be to use
with anvil.media.TempFile() as temp_chroma_file:
    chroma = Chroma(persist_directory=str(temp_chroma_file))
but every single line of the rest of your code that uses chromadb will have to exist within this block.
Or you could refactor all the rest of your code to be inside a function that gets called from inside this temporary file with block, passing in the randomized name of the temporary file to the function.
This way the “persistent” database file is instead randomized and ephemeral.
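To make the refactor above concrete, here is a minimal sketch of the pattern. It is only an illustration under a few assumptions: it uses the standard library's tempfile.TemporaryDirectory as a stand-in for anvil.media.TempFile (so it runs outside Anvil and yields a directory), and run_with_store and handle_request are hypothetical names. The body of run_with_store is a placeholder for wherever your Chroma code would go.

```python
import os
import tempfile


def run_with_store(persist_directory):
    """Hypothetical stand-in for all the code that uses the database.

    In a real app this is where something like
    Chroma(persist_directory=persist_directory) would be created and used.
    Here we only write a marker file to show the directory is usable.
    """
    marker = os.path.join(persist_directory, "index.txt")
    with open(marker, "w") as f:
        f.write("ok")
    return os.path.exists(marker)


def handle_request():
    """Create a randomized temporary directory and do all the work inside it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Everything that touches the store must happen inside this block.
        result = run_with_store(tmp_dir)
    # The directory and its contents are deleted once the block exits.
    return result, tmp_dir
```

Calling handle_request() returns the result of the work plus the path that was used, and that path no longer exists after the call. Nothing persists between calls with this approach.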
… or you could create a tempfile name directly from the session ID of the user, so two different users could not collide. However, there is no guarantee that the same file will still exist between server calls, even in the same session.
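As a rough sketch of the session-based idea, something like the following could derive a stable per-session directory name. The function names are hypothetical, and session_id is assumed to be a string you obtain from your own session handling. The ID is hashed so the path never contains unexpected characters.

```python
import hashlib
import os
import tempfile


def session_store_path(session_id):
    """Map a session ID (assumed to be a string) to a per-session temp path."""
    # Hash the ID so the directory name is short and filesystem-safe.
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]
    return os.path.join(tempfile.gettempdir(), "chroma_" + digest)


def ensure_session_store(session_id):
    """Create the per-session directory if it is missing and return its path."""
    path = session_store_path(session_id)
    # Recreate on demand: the directory may have vanished between server calls.
    os.makedirs(path, exist_ok=True)
    return path
```

Because the directory may be gone on the next server call, any code using it should be prepared to rebuild the database from scratch.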
For now this is just a proof of concept, so multiple users won’t be a problem. Thanks for your help. I’ll give this a shot and mark it as a solution if this works. I appreciate the effort and thoroughness!