MongoDB's explain() might lie to you when querying with $ne

1 Nov 2015

Please note that this is MongoDB 2.6.11.

I have been trying to optimise MongoDB, and I have figured out how indexing for $ne works. Here's one gotcha:

MongoDB will decide to indexes based on previous queries, and explain() might lie to you


Let's look at the example how MongoDB does that.

Say we have 10 records:

db.posts.insert({ group_id: 1, kind: "comment", created_date: 10 }) db.posts.insert({ group_id: 1, kind: "comment", created_date: 9 }) db.posts.insert({ group_id: 1, kind: "comment", created_date: 8 }) db.posts.insert({ group_id: 1, kind: "comment", created_date: 7 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 6 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 5 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 4 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 3 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 2 }) db.posts.insert({ group_id: 1, kind: "post", created_date: 1 })

And we have the index (group_id, kind, created_date) and the index (created_date). Let's assume that all indexes are of DESC. Here are the queries to create the indexes:

db.posts.ensureIndex({ group_id: -1, kind: -1, created_date: -1 }) db.posts.ensureIndex({ created_date: -1 })

Let's look at the query for group_id = 1:

> db.posts.find({ group_id: 1, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3) mongod> 2015-11-01T13:17:00.830-0700 [conn1] query tests.posts query: { query: { group_id: 1.0, kind: { $ne: "comment" } }, orderby: { created_date: -1.0 } } planSummary: IXSCAN { created_date: -1.0 } cursorid:313845427450 ntoreturn:3 ntoskip:0 nscanned:7 nscannedObjects:7 keyUpdates:0 numYields:0 locks(micros) r:971 nreturned:3 reslen:251 0ms

Please notice IXSCAN { created_date: -1.0 } means it uses the (created_date) index. And nscanned is 7.

Let's look at the query for group_id = 2:

> db.posts.find({ group_id: 2, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3) mongod> 2015-11-01T13:17:10.477-0700 [conn1] query tests.posts query: { query: { group_id: 2.0, kind: { $ne: "comment" } }, orderby: { created_date: -1.0 } } planSummary: IXSCAN { created_date: -1.0 } ntoreturn:3 ntoskip:0 nscanned:10 nscannedObjects:10 keyUpdates:0 numYields:0 locks(micros) r:291 nreturned:0 reslen:20 0ms

Please notice IXSCAN { created_date: -1.0 } and nscanned is 10.

If it used the index (group_id, kind, created_date), nscanned would be 0.

But if you run explain():

> db.posts.find({ group_id: 2, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3).explain()

It'll tell you that the index (group_id, kind, created_date) is being used. What the hell?


Now let's remove all the indexes, re-create them, then querying for group_id = 2 before querying for group_id = 1.

We first query for group_id = 2:

> db.posts.find({ group_id: 2, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3) mongod> 2015-11-01T13:19:59.058-0700 [conn1] query tests.posts query: { query: { group_id: 2.0, kind: { $ne: "comment" } }, orderby: { created_date: -1.0 } } planSummary: IXSCAN { group_id: -1.0, kind: -1.0, created_date: -1.0 }, IXSCAN { group_id: -1.0, kind: -1.0, created_date: -1.0 } ntoreturn:3 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) r:3577 nreturned:0 reslen:20 3ms

Please note that the index (group_id, kind, created_date) is used and nscanned is 0.

Then, we query for group_id = 1:

> db.posts.find({ group_id: 1, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3) mongod> 2015-11-01T13:20:38.444-0700 [conn1] query tests.posts query: { query: { group_id: 1.0, kind: { $ne: "comment" } }, orderby: { created_date: -1.0 } } planSummary: IXSCAN { created_date: -1.0 } cursorid:315208528520 ntoreturn:3 ntoskip:0 nscanned:7 nscannedObjects:7 keyUpdates:0 numYields:0 locks(micros) r:855 nreturned:3 reslen:251 0ms

Please note that the index (created_date) is used and nscanned is 7.

Then, we query for group_id = 2 again:

> db.posts.find({ group_id: 2, kind: { "$ne": "comment" } }).sort({ created_date: -1 }).limit(3) mongod> 2015-11-01T13:21:18.426-0700 [conn1] query tests.posts query: { query: { group_id: 2.0, kind: { $ne: "comment" } }, orderby: { created_date: -1.0 } } planSummary: IXSCAN { created_date: -1.0 } ntoreturn:3 ntoskip:0 nscanned:10 nscannedObjects:10 keyUpdates:0 numYields:0 locks(micros) r:250 nreturned:0 reslen:20 0ms

Please note that the index (created_date) is used and nscanned is 10. Please also note that explain() tell us that it will use the index (group_id, kind, created_date). That's a lie.

Now imagine if we have 50,000 records…

That's the gotcha for you!


The best way to handle this situation is to create the index (group_id, created_date). It'll still scan the kind field, but it's much less than scanning the whole database.

Give it a kudos