Use LIMIT-OFFSET for search() slicing

I noticed that list(my_table.search()[10:20]) is fast, but list(my_table.search()[1010:1020]) is slow, and it is slower on the client than on the server.

My guess is that the slicing is slow because it is left to Python, and my other guess is that using the slice object passed to __getitem__ to set LIMIT-OFFSET in the SQL query would make a big difference, and would make the time on the client identical to the time on the server.

If my guesses are right, then please do use LIMIT-OFFSET (or any other trick) to speed up search slicing.

You’re right; that’s exactly what’s happening!

If you want to browse deep into a result set, I would suggest holding onto an iterator on the object returned by search() (ie x = iter(my_table.search())), keeping x around for a long time, and calling next() on it when you want the next row. You can safely keep that iterator around indefinitely! That does a proper lazy paginated query - not using LIMIT/OFFSET, but actually using row IDs to make sure you don’t get skipped or duplicate rows, even if someone else is adding or removing rows concurrently.
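To make the row-ID trick concrete, here is a minimal sketch of keyset pagination using sqlite3 as a stand-in for the real data-table backend. The table name, columns, and `paginated_rows` helper are all illustrative assumptions, not the actual implementation:

```python
import sqlite3

def paginated_rows(conn, page_size=10):
    """Yield rows lazily, one page at a time.

    Seeks by the last-seen row ID instead of OFFSET, so concurrent
    inserts or deletes never cause skipped or duplicated rows.
    """
    last_id = 0
    while True:
        page = conn.execute(
            "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        for row in page:
            yield row
        last_id = page[-1][0]  # remember where this page ended

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"row{i}",) for i in range(25)])

it = paginated_rows(conn, page_size=10)  # keep `it` around as long as you like
first = next(it)                         # fetches only the first page
```

Because the query re-anchors on `id > last_id` each time, the iterator stays correct even if rows are inserted or deleted between pages.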

Making slices use LIMIT/OFFSET would be tricky, because the values they return would need to be consistent with the sequence you get from the paginated iteration. I have a few ideas, though, so I’m going to leave this request here while they percolate…

Sometimes you want a cursor, sometimes LIMIT/OFFSET is good enough (or better), and since the Python expression table.search()[1000:1010] is equivalent to a LIMIT/OFFSET query, not to a cursor, I thought I would ask :slight_smile:
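The slice-to-query equivalence is mechanical. Here is a small sketch of how a __getitem__ slice could translate into LIMIT/OFFSET values; the helper name is hypothetical:

```python
def slice_to_limit_offset(s):
    """Map a slice like [1000:1010] onto (limit, offset) for SQL.

    Only simple forward slices with step 1 map cleanly onto
    LIMIT/OFFSET; anything else is rejected.
    """
    start = s.start or 0
    if s.step not in (None, 1) or start < 0 or (s.stop is not None and s.stop < 0):
        raise ValueError("only simple forward slices map onto LIMIT/OFFSET")
    limit = None if s.stop is None else max(s.stop - start, 0)
    return limit, start

# table.search()[1000:1010] would become LIMIT 10 OFFSET 1000:
limit, offset = slice_to_limit_offset(slice(1000, 1010))
```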

Yeah; the actual implementation is probably going to be along the lines of “if the slice is far enough away that we haven’t already got any of that range cached in the search result, go use LIMIT/OFFSET to seek to it”. But I’m sure that’s got all sorts of nice squirrelly edge cases :slight_smile:
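As a rough sketch of that “seek only when the slice is outside what we’ve already cached” idea - `SearchResult` and `fetch_page` here are hypothetical stand-ins for the real remote-table machinery, not its actual API:

```python
class SearchResult:
    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # fetch_page(offset, limit) -> list of rows
        self._cache = []               # rows realised so far, from offset 0

    def __getitem__(self, s):
        start, stop, step = s.indices(10**9)  # crude upper bound, sketch only
        if start <= len(self._cache):
            # Overlaps (or abuts) the cached prefix: extend the cache as needed,
            # so results stay consistent with paginated iteration.
            while len(self._cache) < stop:
                page = self._fetch_page(len(self._cache), stop - len(self._cache))
                if not page:
                    break
                self._cache.extend(page)
            return self._cache[start:stop:step]
        # Far away from anything cached: jump straight there with
        # LIMIT/OFFSET semantics, without disturbing the cache.
        return self._fetch_page(start, stop - start)[::step]

# Fake backend standing in for the database:
data = list(range(100))
r = SearchResult(lambda offset, limit: data[offset:offset + limit])
near = r[0:5]    # served via (and extending) the cache
far = r[50:55]   # seeks directly, cache untouched
```

This glosses over the consistency problems mentioned above (a concurrent insert could shift the offsets), which is exactly where the squirrelly edge cases live.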

(Also, we keep our remote-object system implementation-neutral, so we’ll need a robust, portable way of conveying that information, which means a protocol upgrade. Not impossible, just unfortunately not a 50-minute feature! The upside is that we will be able to support other things that work just like data tables, complete with client/server transparency and paginated iteration…)

Would someone be willing to show a working example of this as it might relate to a many-thousand-row table? I’m having a slow day, and I’m not sure I’d fully understand how that works anyway. Looks like fun if I can get a grip on it…

> Would someone be willing to show a working example of this…?

I’m about to hit the sack, but I can point you at my “Table” custom component, and how it does pagination: Building a rich table as a custom component

Specifically, when pagination is enabled, it sets up an iterator and stores it:

      self._data_iter = iter(self._data)

When it wants to display data, it then repeatedly pulls elements out of this iterator into its list of realised rows (aka self._data_seen):

      seen_len = len(self._data_seen)
      if seen_len < self._page_start + self._page_len and seen_len < self._data_len:
        # We need to realise more rows from the iterator
        try:
          while len(self._data_seen) < self._page_start + self._page_len:
            self._data_seen.append(next(self._data_iter))
        except StopIteration:
          self._data_len = len(self._data_seen)

But it only does this bit by bit, when you hit the “Next” button. So you can have a table displaying “Rows 1-10 of 100000”, and it’s only loaded those ten rows from the iterator. When you hit the “Next” button, it loads the next 10 rows from the iterator, so now self._data_seen has 20 rows in it.

(Apologies for that code, btw - I would write it a lot more clearly if I were doing it today.)
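Here is the same realise-on-demand pattern as a self-contained sketch, with a plain generator standing in for the table’s search iterator (the names are illustrative, not from the component):

```python
def search_result(n):
    # Stand-in for a lazy search() result over n rows.
    for i in range(n):
        yield i

data_iter = search_result(100000)
data_seen = []  # rows realised from the iterator so far

def realise_page(page_start, page_len):
    """Pull just enough rows from the iterator to cover one page."""
    try:
        while len(data_seen) < page_start + page_len:
            data_seen.append(next(data_iter))
    except StopIteration:
        pass  # ran off the end of the result set
    return data_seen[page_start:page_start + page_len]

page1 = realise_page(0, 10)   # realises rows 0-9 only
page2 = realise_page(10, 10)  # realises ten more
```

After displaying two pages, only 20 of the 100000 rows have actually been fetched, which is what keeps the “Rows 1-10 of 100000” display cheap.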
