I created an app that uses the whole value of a simple object column as a key for searches.
The app works as expected, and, as long as the number of rows in the table remains very small, let’s say 100 or less, it should work just fine.
But what happens if the number of rows increases to thousands or much more?
Is that column indexed?
Would I be better off using a text column, converting the object to a json string and using that for my queries, because that text column would be automatically indexed?
2 Likes
SimpleObject columns are indexed with Postgres’s JSONB GIN index (which is in fact what we use for everything in Data Tables; it’s why you can search by every column).
However, queries on SimpleObject columns use “contains” semantics (so, for example, if the value of the column is {"letters": ["a","b","c"]}
, then the query search(column={"letters":["b"]})
will match that row) - so if you’re looking for precise equality, something like a text column containing your JSON might be handy!
2 Likes
In that case, the JSON had better be “normalized”, i.e., in a specific, repeatable order, before converting to text. By definition, JSON objects (equivalent to dicts) are intrinsically and deliberately unordered.
1 Like
Thank you for your answer, I will make sure this app use a text column (and make sure the order of the values is consistent as @p.colbert suggests), but…
I did a little reading about GIN indexes, and, if all SimpleObject
columns are indexed with them, then my tables in other apps have huge useless indexes. I have tables with SimpleObject
columns that I use to dump anything that I will never use in queries. In a traditional normalized database I would have dozens of tables, each with 5-10 columns. Instead I have one single SimpleObject
column that contains a list of dictionaries containing lists of dictionaries all the way down. The SimpleObject
columns of the average project row contain tens of thousands of values, which I will never use in searches, and, if I understand how JSONB GIN indexes work, each one of those values appears in the index.
I once worked on a BigSomething data storage system in Google App Engine. The system automatically created indexes the first time a query was run. These indexes were optimized for the specific query that was run. I mostly did not have to do anything, other than sometimes deleting indexes that were created by queries that were only run once during development. I could also create indexes that required more manual attention than what the automatic index generation process could provide. Optimizing indexes was important, because, if I remember correctly, the number of indexes affected both performances and the monthly bill.
I don’t know what is the cost of those useless indexes in my apps, I have no way to measure it, so it makes little sense for me to ask for a fix for something that I don’t even know if it’s broken.
Still, my gut feeling is that a less brute force approach, maybe something similar to that BigIDontRememberWhat
data storage from Google, would work better.