Be careful with the zig-zag merge algorithm on Datastore

9 Feb 2017

The article is here: https://cloud.google.com/appengine/articles/indexselection

I thought the algorithm was a godsend that let me build flexible querying on a website.

Today I found that it makes ~10 RPCs to get the first 10 items when filtering on 3 columns.

My batch_size and limit are 10.

Our Appstats shows a bunch of Next() calls, which means the query keeps requesting the next page.

It took me a while, but now it makes sense. The zig-zag merge algorithm combines the results from 3 indexed columns and tries to find the first 10 valid items. In my case it scans about 10 times before it has them.
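To see why the scanning adds up, here is a rough simulation in Python. It is only a sketch of the idea, not Datastore's actual implementation: each index is modelled as a sorted list of integer entity keys, `zigzag_merge` is a name I made up, and a "seek" here is not the same thing as an RPC. It just shows that the merge has to jump between the indexes many times for every item that matches all three filters.

```python
from bisect import bisect_left


def zigzag_merge(indexes, limit):
    """Simplified zig-zag merge join over sorted lists of integer keys.

    Each list stands in for one single-property index (one equality filter).
    Returns (matches, seeks), where seeks counts how many index lookups
    were needed to collect up to `limit` keys present in every index.
    """
    matches = []
    seeks = 0
    if not indexes or any(not idx for idx in indexes):
        return matches, seeks

    candidate = max(idx[0] for idx in indexes)  # smallest key that could match
    agree = 0  # how many consecutive indexes contain the candidate
    i = 0
    while len(matches) < limit:
        idx = indexes[i]
        pos = bisect_left(idx, candidate)  # seek to first key >= candidate
        seeks += 1
        if pos == len(idx):
            break  # this index is exhausted, no more matches possible
        if idx[pos] == candidate:
            agree += 1
            if agree == len(indexes):
                matches.append(candidate)
                candidate += 1  # move past the match
                agree = 0
        else:
            # This index skipped ahead: its key becomes the new candidate.
            candidate = idx[pos]
            agree = 1
        i = (i + 1) % len(indexes)  # zig-zag to the next index
    return matches, seeks


if __name__ == "__main__":
    # Three made-up "columns": entities whose key is a multiple of 2, 3, 5.
    col_a = list(range(0, 3000, 2))
    col_b = list(range(0, 3000, 3))
    col_c = list(range(0, 3000, 5))
    found, seeks = zigzag_merge([col_a, col_b, col_c], limit=10)
    print("matches:", found)
    print("index seeks needed:", seeks)
```

The less the three filters overlap, the more seeks it takes per valid item, which fits the pile of Next() calls I saw.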

The only way to lessen the problem is to create a composite index on the columns that are often queried together.

If you query with only those columns, the zig-zag algorithm isn't used. If you query with more columns, it'll use this composite index instead of the 3 separate indexes in the zig-zag merge.

That's considerably better.
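
On the Python runtime, composite indexes are declared in index.yaml. A minimal sketch, assuming a hypothetical kind `Product` filtered on three hypothetical properties `color`, `size` and `brand` (swap in your own kind and property names):

```yaml
indexes:
# Hypothetical kind and property names, for illustration only.
- kind: Product
  properties:
  - name: color
  - name: size
  - name: brand
```

Keep in mind that each composite index adds to storage and write costs, so it is worth defining only for the filter combinations that are actually queried often.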
