I made a simple dependency that lets you easily find similar rows using NLP. It integrates with Anvil Data Tables and is optimized for Accelerated Tables.
It uses TF-IDF to find similar rows. Of course, the algorithm is pretty basic and has scope for further improvement (most of the logic for calculating similarity is not mine, so I would be happy to have an ML expert take a look at it).
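As a rough illustration of how TF-IDF similarity works, here is a minimal sketch in pure Python (standard library only). It is not the dependency's code; in an Anvil server you would more likely use scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The helper names (`tokenize`, `tfidf_vectors`, `cosine`, `tfidf_similarity`) are hypothetical. Each text becomes a vector of term weights (term frequency times inverse document frequency, so words that are rare across the table count for more), and two rows are considered similar when the cosine of the angle between their vectors is high.

```python
import math
import re
from collections import Counter

# Hypothetical, dependency-free sketch; not the Similar_Rows implementation.

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", (text or "").lower())

def tfidf_vectors(texts):
    """Return one L2-normalised TF-IDF vector (dict term -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts each term appears
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF, the same formula scikit-learn's TfidfVectorizer uses by default
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in docs:
        vec = {term: count * idf[term] for term, count in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Empty texts get an empty vector (similarity 0 to everything)
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine similarity
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def tfidf_similarity(texts):
    """n x n matrix (list of lists) of pairwise TF-IDF cosine similarities."""
    vecs = tfidf_vectors(texts)
    return [[cosine(a, b) for b in vecs] for a in vecs]

overviews = [
    "A crew travels through space to save humanity",
    "Astronauts travel through space to find a home for humanity",
    "A chef opens a small restaurant in Paris",
]
sim = tfidf_similarity(overviews)
for row in sim:
    print([round(x, 2) for x in row])
```

Real implementations usually also remove stop words such as "a" and "the"; in this sketch the shared word "a" gives the unrelated rows 0 and 2 a small non-zero score.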
Dependency - Anvil | Login
Usage
This dependency is intended to be run in batch (inside a background or scheduled task). It computes similarity scores between every pair of rows and stores the top similar rows for each row in a link column.
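To illustrate the "top similar rows for each row" step, here is a minimal sketch that assumes the pairwise similarity scores are already in an n x n matrix (a list of lists). `top_n_similar` is a hypothetical helper, not part of the dependency, and it returns row indices rather than Anvil rows.

```python
# Hypothetical helper for illustration; not part of the Similar_Rows dependency.
def top_n_similar(sim, n):
    """For each row i, return the indices of the n most similar *other* rows."""
    result = []
    for i, scores in enumerate(sim):
        # Candidate rows, excluding the row itself
        others = [j for j in range(len(scores)) if j != i]
        # Sort by descending similarity; the sort is stable, so ties keep the lower index first
        others.sort(key=lambda j: scores[j], reverse=True)
        result.append(others[:n])
    return result

sim = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.7],
    [0.4, 0.3, 0.7, 1.0],
]
print(top_n_similar(sim, 2))  # [[1, 3], [0, 3], [3, 1], [2, 0]]
```

In an Anvil server function, each list of indices could then be mapped back to table rows and assigned to the link-to-multiple-rows output column, e.g. `row[output_col] = [rows[j] for j in indices]`. Because every row is compared with every other row, the work grows quadratically with the number of rows, which fits running it in a background or scheduled task.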
Prerequisites:
- Create a column that links to multiple rows of the same table to store the similar rows in. This will be the output_col.
- Ensure that you have numpy and scikit-learn in your Server environment (choose Python 3.10 with Machine Learning; see Anvil Docs | Installing Custom Packages) or use Full Python.
- Enable Accelerated Tables.
If your existing app does not use the required Python version or Accelerated Tables, create a new app with these settings and add your existing tables to it.
Code Snippet
# In Server Module
Similar_Rows.batch_similar_rows(
    "table_name",                  # Name of the table
    {"col1": 0.8, "col2": 0.2},    # Weight given to each column when calculating similarity
    "Similar_Rows",                # Output column
    10,                            # Number of similar rows to store for each row
)
Parameters
- table_name - Name of the table.
- col_weights - Which columns to use for the similarity comparison and how much weight each one carries. A column with a higher weight has more influence on the result.
- output_col - Name of the output column where the similar rows are stored. Must be a link to multiple rows pointing to the same table.
- top_n - Number of similar rows to find for each row.
- queries - Additional queries used to filter the rows before computing similarities. Can also accept anvil.tables.query objects.
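The post does not say exactly how col_weights are combined. One common approach, shown here as a sketch under that assumption, is to compute a similarity matrix per column and take their weighted sum. `combine_similarities` is a hypothetical helper, and normalising the weights so they sum to 1 is an assumption of this sketch; the dependency may not do this.

```python
# Hypothetical helper; assumes a weighted sum of per-column similarity matrices.
def combine_similarities(per_column, col_weights):
    """Weighted sum of per-column n x n similarity matrices (lists of lists).

    per_column maps a column name to its similarity matrix; col_weights maps
    the same names to weights, like the col_weights parameter.
    """
    total = sum(col_weights.values())
    if total <= 0:
        raise ValueError("col_weights must have a positive sum")
    n = len(next(iter(per_column.values())))
    combined = [[0.0] * n for _ in range(n)]
    for col, weight in col_weights.items():
        matrix = per_column[col]
        share = weight / total  # assumption: normalise so the weights sum to 1
        for i in range(n):
            for j in range(n):
                combined[i][j] += share * matrix[i][j]
    return combined

genre = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
overview = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.1], [0.6, 0.1, 1.0]]
combined = combine_similarities(
    {"Genre": genre, "Overview": overview},
    {"Genre": 0.2, "Overview": 0.8},
)
print(round(combined[0][1], 2), round(combined[0][2], 2))  # 0.36 0.48
```

With Overview weighted at 0.8, row 2 (similar overview, different genre) ranks above row 1 (same genre, different overview) for row 0; swapping the weights reverses that order.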
Example -
Similar_Rows.batch_similar_rows(
    "movies",                            # table_name
    {"Genre": 0.2, "Overview": 0.8},     # col_weights
    "Similar_Movies",                    # output_col
    10,                                  # top_n
    Language="en",                       # Additional query: only find similar movies among English-language movies
)
Here is an example app that shows the similar movies for each movie - https://movies-db-similarity.anvil.app/
Clone link for the example app - Anvil | Login
Let me know your feedback on it.