Why is a query that filters on a nested field slow only with reverse sorting?


There is a relatively small database of 2.5 million documents. The document structure is not very important here; it is fairly "dynamic" anyway. But there is a query that runs regularly and quite often: it picks the "freshest" documents, i.e. sorted in descending order, with a condition on a nested field:
 'checker.updated': { '$exists': false }

With the descending sort this query takes about 40 seconds... whereas the very same filter with an ascending sort:
 'checker.updated': { '$exists': false }

spits out the result in less than a second.
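A sketch of the two variants as shell queries (the collection name `docs` is hypothetical; `postedNum` is taken from the index experiments below):

```javascript
// Sketch only: `docs` is a hypothetical collection name,
// `postedNum` is the sort field from the index experiments.
db.docs.find({ 'checker.updated': { $exists: false } })
       .sort({ postedNum: -1 })  // descending: ~40 s
db.docs.find({ 'checker.updated': { $exists: false } })
       .sort({ postedNum: 1 })   // ascending: under a second
```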

I tried tinkering with indexes:
- separate single-field indexes postedNum:-1 and checker.updated:1
- a compound index postedNum:-1 + checker.updated:1
- an index on the whole checker subdocument

The result was nil. Sorting in ascending order returns immediately; as soon as I flip it to descending, everything grinds to a halt. I even had thoughts (don't judge) of something from the relational school, but there the blocker is the speed of active work with tables larger than 20-25 million records...

Update 1. I tried replacing the filter on the nested field with a filter on a plain top-level copy ('checker.updated' => 'checkerUpdated') - and it also "flew", with both forward and reverse sorting. I don't get it... As an alternative, I could of course abandon the nested structure entirely, but damn, that means reworking half the code and flattening a bunch of collapse/expand-able working structures into single-level lists.
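If the flattened copy ends up being the way out, note that MongoDB 4.2+ can materialize it server-side with a pipeline-style update; a minimal sketch, assuming the collection is called `docs`:

```javascript
// Sketch: copy the nested field into a top-level duplicate
// for every document (requires MongoDB 4.2+ pipeline updates).
db.docs.updateMany(
  {},
  [{ $set: { checkerUpdated: '$checker.updated' } }]
)
```

Documents where checker.updated is missing simply will not get the new field, since a reference to a missing field evaluates to "missing" in the pipeline.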

Update 2. Now I understand nothing at all. We shut down all the processes/nodes that run this same "reverse sort" query in parallel (the condition stayed, the sorting was removed - not good, but it will do as a temporary workaround), and the execution time of that same query dropped from 40+ seconds to about 2 seconds... It turns out that this wild delay comes from the number of parallel requests to the database with the same query, but only in the combination "sort + filter on a nested field"... What the hell???
March 23rd 20 at 19:28
2 answers
March 23rd 20 at 19:30
A good start would be to look at the output of explain().
Query performance depends on the index being used, but here you are querying $exists: false, which automatically leads to a full collection scan.
Indexing this field gives you nothing as it stands: for an index to help, the field must actually hold a value.
To solve the problem, add something like updated: false when you insert the document.
If you store a date there, use an additional updatedAt field. Do not mix different data types in a single field - Mongo may not be able to build a useful index over it.
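The "store a value instead of absence" advice could be sketched as a small helper run before insert (the helper name and the false-means-not-updated convention are my assumptions, not part of the answer):

```javascript
// Hypothetical helper: derive the value for a dedicated top-level
// flag so queries never need `$exists: false`.
// Assumed convention: `false` means "not yet updated"; otherwise the
// stored value is passed through unchanged.
function checkerUpdatedFlag(doc) {
  if (doc.checker && doc.checker.updated !== undefined) {
    return doc.checker.updated;
  }
  return false;
}
```

With this in place the slow query becomes an indexable equality match on `false` rather than an unindexable absence check.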
For a reverse sort you can create an additional index with the reverse order.
More on sorting can be read here: https://docs.mongodb.com/manual/tutorial/sort-resu...
You can also build a compound index with direct order on the filtered property and reverse order on the sort field; you just have to keep the field order right.
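A sketch of the suggested indexes (collection name `docs` and field names `updated`/`postedNum` are assumed from the thread):

```javascript
// Sketch: a reverse-order index for the descending sort...
db.docs.createIndex({ postedNum: -1 })
// ...and a compound index: filtered field first, sort field second,
// with the sort field in the reverse (descending) order:
db.docs.createIndex({ updated: 1, postedNum: -1 })
```

Worth noting: MongoDB can traverse a single-field index in either direction, so a dedicated reverse index matters mostly for compound indexes, where the combination of sort directions has to match the index (or its exact mirror).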
@bradley_Jacobson60 I followed your advice, though apparently not quite correctly. I used false for "no data" (instead of a missing field) and a string value when the field is filled in. The speed went up, the index builds... But then I ran into one strange situation:
1. find({'field': false}) - fast
2. find({'field': {$ne: false}}) - dreadfully slow
3. find({'field': {$gt: ''}}) - fast

I built both a forward and a reverse index on the field; the situation did not change. All three variants use the index.
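The three filters can be mimicked on toy data in plain JS to make the selected sets explicit (an in-memory illustration, not MongoDB internals; values follow the false-or-string convention from the comment):

```javascript
// Toy in-memory illustration of the three filters.
// `field` is either `false` ("no data") or a non-empty string.
const docs = [
  { _id: 1, field: false },
  { _id: 2, field: 'se032-dr-230-12/3442' },
  { _id: 3, field: false },
];
const ids = f => docs.filter(f).map(d => d._id);

const eqFalse = ids(d => d.field === false);            // find({field: false})
const neFalse = ids(d => d.field !== false);            // find({field: {$ne: false}})
// $gt: '' only matches values of string type in MongoDB, hence the typeof check:
const gtEmpty = ids(d => typeof d.field === 'string' && d.field > '');
```

On data shaped like this, `$ne: false` and `$gt: ''` select exactly the same documents, which is why the cheap string-range form can stand in for the negation.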

I just do not understand why the direct comparison === false is fast, while what is effectively !== false dies for over a minute (at which point I simply ran out of patience)... Any ideas? - raymond commented on March 23rd 20 at 19:33
@raymond, look at the explain output. My well-pumped telepathy skill suggests the index is not actually being used.
It is not used because JS is not a typed language, so for $ne the server has to perform the comparison (with type casting) for every entry and then invert the result.

Why do find({'field': {$ne: false}}) when you can simply do find({'field': true})? - bradley_Jacobson60 commented on March 23rd 20 at 19:36
@bradley_Jacobson60, comparing against "not false" is a very rare operation here anyway; the constant comparisons are just $gt and $lt, because the field acts as a kind of grouping/sorting key. That is, it holds something like "se032-dr-230-12/3442" - the field is populated according to a particular string pattern - and sorting and selections on it happen often.... - raymond commented on March 23rd 20 at 19:39
March 23rd 20 at 19:32
Because fashionable, recently-hyped NoSQL databases are not designed for complex and fast queries.
This will not be your last problem - the sooner you switch to a normal relational database, the fewer problems you will have down the road.
Yes, as if I don't know exactly what needs to be done... I am a long-time friend of MySQL, and honestly we had to cut it out, because I remember the experience: tables of 1-2 million records slowed down mercilessly, and here the content is expected to reach about 100 million over time...
Whether Postgres can pull tables like that, or whether I would have to think hard about partitioning, =((( - raymond commented on March 23rd 20 at 19:35
If the machine has enough memory (and ideally an SSD), and MySQL is configured accordingly, you will have no problems. Postgres is not faster than MySQL - it simply has more advanced SQL.
And partitioning in MySQL is not a problem either. - nora_Pfannerstill commented on March 23rd 20 at 19:38
I migrated the collection from Mongo to Postgres and tried the basic queries (with the proper indexes built) - somehow not rosy. Where Mongo returns data in a fraction of a second, Postgres runs for almost 2 seconds...
And there is nothing complicated in it, no joins either: everything in one table, direct comparisons - just a table of 10 million records...
It is somehow "insulting", you know... - raymond commented on March 23rd 20 at 19:41
@raymond, if you need to crunch big data, you are supposed to use an enterprise-level server - MSSQL, Oracle, etc.
In general, all database servers have a bunch of settings that let you optimize for speed: there are all sorts of caches and the like.
If the index does not fit in memory, there is little point in it. - nora_Pfannerstill commented on March 23rd 20 at 19:44
@nora_Pfannerstill, and who is going to give me MSSQL or Oracle for free? =))) And I doubt Postgres is no competitor to them... Besides, a table of 50 million records (eventually) hardly qualifies as "big data". - raymond commented on March 23rd 20 at 19:47
