Implement pagination on App Engine Datastore

6 Mar 2017

It's a bad idea to support jumping to an arbitrary page when using App Engine Datastore. In fact, it's a bad idea for any database.

For most databases, when we want to jump to, say, page 100 where each page contains 10 items. A database would compose a result of 1000 items and then jumps to the item 990. That's an O(n) operation.

For MySQL or Postgresql, the process happens in the database's process. For App Engine Datastore, the process happens on your machine. This is an intelligent guess about where the process occurs. I'm too lazy to verify it. Please let me know if I'm wrong. Therefore, it's worse for App Engine Datastore.

But if you decide that you want pagination anyway, read on…

As I mentioned previously, jumping to a specific offset requires Datastore to build the items until the offset. The process happens on our machines (not Datastore server).

This means that Datastore returns the first 10 items with the cursor to fetch the next 10 items. Our code sees that we haven't reached the offset yet; our code fetch more items using the given cursor. This mechanism happens until we reach the wanted offset.

With this understanding, there are two things you can do:

  1. Increase the batch_size, so that we go back and forth fewer times. I believe the maximum batch_size is 1000.
  2. Fetch with keys_only=True, so that we don't receive full entities on every request. After we get the keys, we can query for full entities. This tactic is proven to be 10 times faster than paginating with keys_only=False. (Warning: the page takes around 30 seconds to load)].

One note: paginating with the zig-zag merge join algorithm is faster than paginating with exact index. It's super weird. After asking around within Google, the zig-zag merge join algorithm seems to operate in the new kind of index store. So, it's faster.

Happy paginating your site!