Find Similar Rows using NLP

I made a simple dependency that lets you easily find similar rows in your tables using NLP. It works with Anvil Data Tables and is optimized for Accelerated Tables.

It uses TF-IDF to find similar rows. The algorithm is fairly basic and has room for improvement. Most of the logic for calculating similarity is not mine, so I'd be happy to have an ML expert take a look at it.
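The post does not show the dependency's internals. The following is therefore only a minimal sketch of the general TF-IDF technique using scikit-learn's `TfidfVectorizer` and `cosine_similarity`, applied to made-up texts. It is not necessarily how the dependency computes its scores.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample texts standing in for one text column of a table
texts = [
    "space adventure with aliens",
    "aliens invade earth in a space war",
    "romantic comedy in paris",
]

# Each text becomes a TF-IDF vector: terms that are frequent in one row
# but rare across all rows get the highest weights
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(texts)  # sparse matrix, shape (n_rows, n_terms)

# similarity[i, j] is the cosine similarity of rows i and j; it lies in 0..1
# here because TF-IDF weights are non-negative
similarity = cosine_similarity(tfidf)
print(similarity.round(2))
```

Rows 0 and 1 share the terms "space" and "aliens", so they score higher than rows 0 and 2, which share no terms once English stop words are removed.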

Dependency - Anvil | Login

Usage

This dependency is intended to be run in batch, inside a Background Task or Scheduled Task. It computes similarity scores between every pair of rows and stores the most similar rows for each row in a link column.
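As a minimal sketch of the "keep the top similar rows" step, assume the pairwise scores are already in a square NumPy matrix. For each row, the code drops the self-match and keeps the indices of the `top_n` highest scores. The helper name `top_n_similar` is hypothetical and not part of the dependency. In the dependency, the selected rows would then be written to the output column.

```python
import numpy as np


def top_n_similar(similarity: np.ndarray, top_n: int) -> list[list[int]]:
    """For each row i, return the indices of the top_n most similar other rows.

    `similarity` is a square (n_rows x n_rows) score matrix.
    """
    scores = similarity.astype(float).copy()
    # A row is always most similar to itself, so exclude the diagonal
    np.fill_diagonal(scores, -np.inf)
    # Sorting the negated scores ascending puts the highest score first
    order = np.argsort(-scores, axis=1, kind="stable")
    # A row can have at most n_rows - 1 other rows
    k = min(top_n, scores.shape[0] - 1)
    return order[:, :k].tolist()


sim = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
print(top_n_similar(sim, 1))  # [[1], [0], [1]]
```

A full pairwise matrix has n² entries, so memory and time grow quadratically with the number of rows. This is one reason to run this kind of job in a background task rather than during a user request.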

Prerequisites:

  • Create a column that links to multiple rows of the same table to store the similar rows. This will be the output_col.

  • Ensure numpy and scikit-learn are available in your Server environment: choose Python 3.10 with the Machine Learning base packages (Anvil Docs | Installing Custom Packages), or use Full Python.

  • Enable Accelerated Tables

If your existing app cannot use this Python version or Accelerated Tables, create a new app with these settings and add your existing tables to it.

Code Snippet

```python
# In a Server Module
Similar_Rows.batch_similar_rows(
    "table_name",                  # Name of the table
    {"col1": 0.8, "col2": 0.2},    # Weight given to each column when calculating similarity
    "Similar_Rows",                # Output column
    10,                            # Number of similar rows to find for each row
)
```

Parameters

  • table_name - Name of the table.
  • col_weights - A dict mapping column names to weights. It determines which columns are used for the similarity comparison and how much each one counts; a higher weight gives that column more influence.
  • output_col - Name of the output column where the similar rows are stored. Must be a link to multiple rows pointing to the same table.
  • top_n - Number of similar rows to find for each row.
  • queries - Additional queries that filter the rows before similarities are computed, e.g. column=value keyword arguments as in the example below. Can also accept anvil.tables.query objects.
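The post does not say how the per-column weights are combined. One plausible scheme, shown here purely as an assumption, is to fit a separate TF-IDF model per column and take a weighted average of the per-column cosine similarity matrices. In this sketch the rows are plain dicts standing in for data-table rows, and `weighted_similarity` is a hypothetical name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def weighted_similarity(rows: list[dict], col_weights: dict[str, float]) -> np.ndarray:
    """Combine per-column TF-IDF cosine similarities into one score matrix.

    Assumption (not the dependency's confirmed method): each column gets its
    own TF-IDF model, and the final score is the weighted average of the
    per-column similarity matrices.
    """
    total = sum(col_weights.values())
    combined = np.zeros((len(rows), len(rows)))
    for col, weight in col_weights.items():
        # Treat missing values as empty strings so every row gets a vector
        texts = [str(row.get(col) or "") for row in rows]
        tfidf = TfidfVectorizer().fit_transform(texts)
        combined += (weight / total) * cosine_similarity(tfidf)
    return combined


# Hypothetical rows mirroring the movies example below
movies = [
    {"Genre": "Sci-Fi", "Overview": "astronauts explore a distant planet"},
    {"Genre": "Sci-Fi", "Overview": "a crew travels to a distant galaxy"},
    {"Genre": "Romance", "Overview": "two strangers meet in paris"},
]
sim = weighted_similarity(movies, {"Genre": 0.2, "Overview": 0.8})
print(sim.round(2))
```

Under this assumption, dividing by the total weight means the weights do not need to sum to 1. Only their ratios matter.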

Example -


```python
Similar_Rows.batch_similar_rows(
    "movies",                          # table_name
    {"Genre": 0.2, "Overview": 0.8},   # col_weights
    "Similar_Movies",                  # output_col
    10,                                # top_n
    Language="en",                     # Additional query: only English-language movies are considered
)
```

Here is an example app that shows similar movies for each movie: https://movies-db-similarity.anvil.app/

Clone link for the example app - Anvil | Login

Let me know your feedback on it.
