What I’m trying to do:
I’m trying to add a row to a data table but not if a matching row already exists (avoiding duplicate entries). Essentially, ‘upsert’.
What I’ve tried and what’s not working:
I have an if-statement that checks if there’s a matching row and then, if there isn’t, add a row to the data table.
This if-statement slows down my server function quite a bit.
Code Sample:
@anvil.server.callable
def upsert_urls_to_bot(urls, bot_name):
"""Upsert URLs to a bot as a document."""
user = anvil.users.get_user(allow_remembered=True)
bot = app_tables.chatbots.get(User=user, bot_name=bot_name)
if user and bot:
print('Adding URLs to chatbot sources')
for url in urls:
if app_tables.documents.get(bot=bot, url=url):
row = app_tables.documents.get(bot=bot, url=url)
row['url'] = url
else:
app_tables.documents.add_row(bot=bot, url=url)
return app_tables.documents.search(bot=bot)
else:
print('add_urls_to_bot: User or bot not defined')
return None
If there’s no upsert, what else can I do to speed this up?
Is user a linked column in bot? Or are you storing an entire copy of the “user” object/row, which could be larger than say, just the users email?
If so, a text search of just the email might be faster?
There are many more things that can make it faster.
Also, look at the app tables docs for app_tables.table_name.get(), it shows you how to short circuit the or operator to get and or create a row without the if statement (and the 2 instead of one calls to .get() ) .
You can then just go:
row['url'] = url
Also the same question with bot in the documents table, if you are just comparing the row, you might want to use accelerated tables and fetch_only to not load anything except the row id inside the row object for comparison.
Here is my take, (typed in a phone, I hope it’s not messy).
Your code seems to add rows, but doesn’t seem to update anything. This version has only one search rather than many get, it should be faster:
@anvil.server.callable
def upsert_urls_to_bot(urls, bot_name):
"""Upsert URLs to a bot as a document."""
user = anvil.users.get_user(allow_remembered=True)
if not user:
print('add_urls_to_bot: User not defined')
return None
bot = app_tables.chatbots.get(User=user, bot_name=bot_name)
if not bot:
print('add_urls_to_bot: bot not defined')
return None
print('Adding URLs to chatbot sources')
rows = list(app_tables.documents.search(bot=bot, url=q.any_of(urls)))
existing_urls = [row['url'] for row in rows]
for url in urls:
if url not in existing_urls:
rows.append(app_tables.documents.add_row(bot=bot, url=url))
return rows
q.fetch_only allows to decrease the number of precached values.
If you are returning row objects to the client and you have many columns that you are not going to use or some columns with large amounts of data, not transferring all that unused data from the server to the client, can save some time.
But when you are in un the server, it would only help if it skips the precaching of very large amounts of data. You are unlikely to see a difference if you prefetch less than a few megabytes.