How do I design the architecture of a search engine?

For starters, I would like to briefly describe how a search engine works.
1. A query arrives.
2. The query is stemmed.
3. The index is searched for documents that contain words from the query.
4. The documents are ranked by word frequency, by the position of each word's first occurrence, by the Pearson correlation coefficient and other citation-index factors, and by how often users choose a document from the search results (this last signal comes from a neural network trained with backpropagation).
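
The four steps above can be sketched roughly as follows. This is only a minimal illustration, not a production design: the `stem` function is a crude stand-in for a real stemmer, all names (`build_index`, `search`, `DOCS`) are hypothetical, and ranking uses only word frequency and first occurrence. The Pearson correlation and the user-choice signal are left out.

```python
from collections import Counter, defaultdict


def stem(word: str) -> str:
    """Crude suffix stripping; a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    return [stem(w) for w in text.split()]


def build_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Inverted index: stem -> {doc_id: [positions of that stem]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, query: str) -> list[int]:
    scores = Counter()   # total number of query-word hits per document
    first_pos = {}       # earliest position of any query word per document
    for term in tokenize(query):
        for doc_id, positions in index.get(term, {}).items():
            scores[doc_id] += len(positions)
            first_pos[doc_id] = min(first_pos.get(doc_id, positions[0]), positions[0])
    # Higher frequency first, then earlier first occurrence, then doc id.
    return sorted(scores, key=lambda d: (-scores[d], first_pos[d], d))


DOCS = {
    1: "search engines index documents",
    2: "indexing documents for fast searching of documents",
    3: "cooking recipes",
}
INDEX = build_index(DOCS)
print(search(INDEX, "searching documents"))
```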

But storing huge indexes in a single database kills performance. How can I scale the index storage horizontally without losing much speed?
And how can pagination be implemented? I could, of course, remember the last index position, but that does not account for a full-text search running across all the indexes. And with this approach there is not enough storage space. Should I create a separate cluster that stores the indexes, run multithreaded search servers on it, and then combine the results? But the results are ranked during the search itself.
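
One common way to approach this is to partition documents across shards, let each shard rank its own documents, and merge the partial top-k lists. The sketch below assumes a score that depends only on the document itself (plain term frequency), so local ranking stays consistent with global ranking; scores that need corpus-wide statistics would require sharing those statistics between shards. `Shard` and `ShardedIndex` are hypothetical names, and threads here only stand in for separate search servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Holds one slice of the collection: {doc_id: {term: frequency}}."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id: int, text: str) -> None:
        freqs = {}
        for term in text.lower().split():
            freqs[term] = freqs.get(term, 0) + 1
        self.docs[doc_id] = freqs

    def top_k(self, terms: list[str], k: int) -> list[tuple[int, int]]:
        """Return this shard's k best (score, doc_id) pairs."""
        scored = (
            (sum(freqs.get(t, 0) for t in terms), doc_id)
            for doc_id, freqs in self.docs.items()
        )
        return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


class ShardedIndex:
    def __init__(self, num_shards: int):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id: int, text: str) -> None:
        # Route each document to a shard by its id.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query: str, k: int) -> list[int]:
        terms = query.lower().split()
        # Scatter: every shard searches its own slice in parallel.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = pool.map(lambda shard: shard.top_k(terms, k), self.shards)
            candidates = [hit for part in partials for hit in part]
        # Gather: merge the partial lists into a global top k.
        return [doc_id for _score, doc_id in heapq.nlargest(k, candidates)]
```

Because every shard returns its own top k, the global top k is always contained in the merged candidates, so sharding does not change the final ranking under this scoring assumption.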

In general, I would like to hear advice from professionals.
June 3rd 19 at 19:22
2 answers
June 3rd 19 at 19:24
A solution was found: replicate the index, with daily updates.
June 3rd 19 at 19:26
Sphinx + MVA

There are no miracles with pagination: it is only post-processing after the results have been extracted from the engine.
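
As a rough illustration of pagination as post-processing, the sketch below merges already ranked result lists and slices out one page. It does not use Sphinx's API; `paginate` and the `(score, doc_id)` tuple layout are assumptions made for the example.

```python
import heapq
from itertools import islice


def paginate(ranked_lists, page: int, page_size: int) -> list:
    """Return the doc ids for one page.

    ranked_lists: lists of (score, doc_id), each sorted by descending score.
    page: 1-based page number.
    """
    offset = (page - 1) * page_size
    # Lazily merge the sorted lists into one descending-score stream.
    merged = heapq.merge(*ranked_lists, key=lambda hit: -hit[0])
    return [doc_id for _score, doc_id in islice(merged, offset, offset + page_size)]


results_a = [(9, "a"), (5, "b"), (1, "c")]
results_b = [(8, "d"), (4, "e")]
print(paginate([results_a, results_b], page=2, page_size=2))
```

Note that each source list has to supply at least `offset + page_size` hits for the page to be correct, which is why deep pages get progressively more expensive.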
I won't use Sphinx; I am building purely my own search engine.
I think I'll keep my own index and just build replication for it. - Arnold commented on June 3rd 19 at 19:29
