Implementing Faceted search

My app displays data in a 10-column table. I’ve been trying to build in filters and sorting by taking user input and sending a sql query to my Postgres DB. But I’d like to step it up to the next level and implement full faceted search. Writing sql queries nested in others seems a bit overwhelming. Is this the best way to go about it? Does anyone know some better options (quicker, easier, less error-prone)?

When you say full faceted search are you talking about every column potentially having definable criteria (including options for equals, less than, contains, etc.) on the already populated data table? A bit like the Kendo style data tables?

Also, how much data are you talking about?

I seem to remember you are talking about potentially large result sets. The problem here is that in code you are going to need to reorder your data according to potentially 10 sets of criteria, and that is going to be, in my opinion, harder than writing the SQL to only fetch the required data in the first place.

Using a JS library - again, I really think the SQL will be easier to learn than integrating a 3rd party libraries.

Clearly I would recommend the SQL query route :slight_smile: and it’s something I could probably help you with. It is a little overwhelming to start with but it grows on you.

Just my opinion, of course.

Yeah my Drupal developer friend was the one who told me it was “easy”.

I have 25,000 rows and the faceted search probably wouldn’t go beyond that for a while. But you are correct in assuming I would need greater than, less than, between, not between, equals on each of the 10 columns, one filter building on the next. I could probably write the sql query for a specific set of filters, but the part that seems overwhelming is programming it so the 10!x4! Or whatever number of possible combinations based on user input all work.

Is it as simple as nesting queries within queries?

And that means I need to save the previous query and concatenation the next one. How do you avoid sql injection?

Yeah I could probably use some help.

It’s largely going to be WHERE clauses. And unless I’m missing something all the filters would be applied at once, it wouldn’t really be one building on the next.

Let me give you a very rough, not very optimised pseudo-code example of what I’m thinking of :

if you had 3 columns called “name”, “age”, “married” and you wanted to have “equal to, not equal to,less than, greater than” filter options for each, then you would do something like this -

# Pass each filter clause and filter values, eg : 
# name_filter_clause (values "equal to", "not equal to", etc)
# name_filter_value (eg "Dave")
if passed_where_clause == "is equal to":
    name_filter_clause = " = "
# repeat for each one 

sql = "SELECT name, age, married FROM mytable WHERE "
where_clause=None
params=[]
if name_filter_clause is not None:
    if where_clause is not None:
        # Only add AND if this is not the first parameter.
        where_clause +=" AND "

    # This should add  "name = %s" to the where_clause
    # (or "AND age > %s" if it's not the first parameter)
    where_clause += "name "+name_filter_clause+" %s"
    params.append(name_filter_value)

if age_filter is not None:
    # repeat for each filter.

sql += where_clause

What you end up with is something like this in your sql variable :

SELECT name,age,married FROM table WHERE name = %s AND age > %s ..etc

Then execute it :

cursor.execute(sql,params)

Now, to avoid SQL injection you would need to ensure the where clause operators are sanitised and contain only values you expressly set server side, and the execute with parameters will stop SQL injection by passed filter values.

Using this method, you would need a logic block for each column, but that’s not too hard. Someone better than me might have an easier way to do that, too.

There’s almost certainly a more “pythonic” way of doing this, but hopefully that’s a start in understanding at least one technique you could use.

EDIT -

I have updated the project on the end of the link below (link has changed) in accordance with Meredydd’s recommendations. The function is now server side.

Ok, here’s a project that shows off what I’m talking about :

https://anvil.works/ide#clone:QHRURFJYJ4MAVQEF=SZFRUHGK7J6OQJOCE5G7UACY

I’ve only done name & age columns, but you will see that if you put values in the name & age text boxes next to the drop downs (and set the drop downs to the required comparison) then clicking search will dynamically create the SQL.

SQL INJECTION ALERT - YOU MUST BE CAREFUL !!
Building SQL dynamically is a hazardous undertaking. Ensure you use parameters for values and that you never put raw, unchecked data originating from a client form straight into any SQL statement.

1 Like

Well that’s surprisingly simple!

I do worry about sql injection. Would it change very much to also protect from injection? If this were done on the server, obscured from the user, the only chance for injection would be in the user input, right? So if you put the user input into the second argument of the cur.execute(sql, [2nd argument]) and use %s appropriately, we’re good right?

My example was all client side to show the principle, but yes you would move that all server side and pass the values for the query in the second argument to execute, using %s for substitution.

Just be careful to ensure the “operator” variable is explicitly set by you and never defaults to something valid (but potentially troublesome), and definitely doesn’t use the values directly selected by the user (ie don’t be tempted to change “equals” to “=” symbols directly in the drop down and add it directly to “operator”).

This is all a good start! Although I would like to double and triple the warnings about SQL injection. Please, please be careful here, and use the proper substitution mechanisms in the Python database libraries. Not doing this is the #1 way to get your app hacked.

If you’re going to build up an SQL query as a string here’s how I would do it (based on David’s code):


  def do_query(name_op, name, age_op, age, married_op, married):
    sql = "SELECT name,age,married FROM mytable WHERE TRUE"
    sql_args = []

    # Check first column.
    if name_op is not None:
      operator = get_operator(name_op)
      sql += " AND name " + operator + " %s"
      sql_args.append(name)

    # check second column
    if age_op is not None:
      operator = get_operator(age_op)
      sql += " AND age " + operator + " %s"
      sql_args.append(age)

    cur = #...get a DBAPI cursor...

    # cur.execute() will substitute all those arguments in *safely*:
    cur.execute(sql, sql_args)

    return list(cur)

A few notes:

  • I’ve also recast this function as a server function, so you don’t get into the habit of trusting client code to construct SQL for you

  • It now expects you to pass age in as a number (so you’ll have to pass something like int(self.age_text_box.text) from the client), and to pass None for the operator parameters to signal “don’t filter on this column”.

  • Note that this code still trusts the get_operator() function absolutely, so this function must also be on the server, and be sure to only return a white-listed set of SQL operators.

@david.wylie, can I ask you a favour? Your code is clearly going to be used as a template by readers, however many warnings you put on it, so could I ask you to incorporate my changes (or something like them) to make it safe by default?


Roadmap note: We are going to be adding support for the necessary queries for faceted search (greater than/less than/between) to our built-in data table system soon! For Chris’s particular use case, he will probably still want Postgres, but for everyone else, we want to save them from the dangers of playing with fire manually building SQL queries.

You are, of course, correct. In my desire to get a speedy answer out there I have given out some rather dodgy code. I will ensure I give more thought to that from now on!

I have just disabled the app for the moment whilst I make those changes, and I’ll repost the link shortly.

(edit) - I have updated my post above with a new link. Hopefully that’s a little clearer now.

Will this also support full text search, or at least something like SQL LIKE (“WHERE CustomerName LIKE ‘%or%’”). When will this be available?

1 Like