For starters, I would like to briefly describe how search engines work.
1. A query comes in.
2. The query is stemmed.
3. The system's index is searched for documents that contain the query words.
4. Documents are ranked by word frequency, the position of the first occurrence, the Pearson correlation coefficient and other citation-index signals, as well as the user's selection of results as click feedback (this is fed into a neural network trained with backpropagation).
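The steps above can be sketched in miniature. This is a toy illustration, not a real engine: the documents are invented, the "stemmer" is a crude suffix-stripper standing in for a real one (e.g. Porter), and the ranking formula (frequency plus a bonus for early occurrence) is a deliberately simplified stand-in for step 4:

```python
from collections import defaultdict

# Toy corpus; all content here is illustrative.
DOCS = {
    1: "cats are chasing mice",
    2: "a mouse chased by two cats",
    3: "dogs are chasing cats and mice",
}

def stem(word: str) -> str:
    # Crude suffix-stripping stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Step 3's data structure: inverted index, term -> {doc_id: [positions]}.
index: dict[str, dict[int, list[int]]] = defaultdict(dict)
for doc_id, text in DOCS.items():
    for pos, word in enumerate(text.split()):
        index[stem(word)].setdefault(doc_id, []).append(pos)

def search(query: str) -> list[int]:
    terms = [stem(w) for w in query.split()]       # step 2
    scores: dict[int, float] = defaultdict(float)
    for term in terms:
        for doc_id, positions in index.get(term, {}).items():
            # Step 4, simplified: term frequency plus a bonus
            # for occurring early in the document.
            scores[doc_id] += len(positions) + 1.0 / (1 + positions[0])
    return sorted(scores, key=scores.get, reverse=True)

print(search("chasing cat"))  # doc 1 ranks first: "cat" at position 0
```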
But storing huge indexes in a single database is murder for speed. How can I scale index storage out horizontally without losing much speed?
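One common approach (my suggestion, not something from the question itself) is document-partitioned sharding: hash each document ID to a shard, so every shard holds a small, independent inverted index. A minimal sketch, with an invented shard count and document IDs:

```python
import hashlib

N_SHARDS = 4  # illustrative; real clusters pick this for capacity/redundancy

def shard_for(doc_id: str) -> int:
    # Stable hash so the same document always lands on the same shard.
    digest = hashlib.md5(doc_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_SHARDS

# Route 20 hypothetical documents to their shards.
shards: list[list[str]] = [[] for _ in range(N_SHARDS)]
for doc_id in (f"doc-{i}" for i in range(20)):
    shards[shard_for(doc_id)].append(doc_id)

# Each shard then builds an inverted index over only its own documents;
# a query fans out to all shards in parallel, and each returns its
# local top-k candidates to a coordinator.
print([len(s) for s in shards])
```

The trade-off: every query touches every shard, but each shard's index stays small enough to fit in memory, and adding shards adds both storage and query throughput.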
And how can the pagination mechanism be implemented? You can, of course, remember the last index position, but that doesn't work for a full-text search across all the indexes, and with that approach there isn't enough storage space either. Should I create a separate cluster group for index storage with multithreaded search servers, and then combine the results? But then how do you keep the ranking correct while searching?
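The usual scatter-gather answer to this: each shard returns its own top (page + 1) * page_size hits already sorted by score, and the coordinator merges those sorted lists and slices out the requested page, so the global ranking stays correct. A sketch with made-up per-shard results:

```python
import heapq

# Hypothetical ranked results from three shards: (score, doc_id), best first.
shard_results = [
    [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")],
    [(0.8, "b1"), (0.4, "b2")],
    [(0.7, "c1"), (0.6, "c2"), (0.3, "c3")],
]

def page(results_per_shard, page_num: int, page_size: int):
    # Merge the per-shard sorted lists into one globally sorted stream
    # (descending score), then slice out the requested page.
    merged = heapq.merge(*results_per_shard, key=lambda hit: -hit[0])
    start = page_num * page_size
    return list(merged)[start : start + page_size]

print(page(shard_results, 0, 3))  # page 1: the three highest-scored hits
print(page(shard_results, 1, 3))  # page 2: the next three
```

Note the catch: for page N every shard must return at least (N + 1) * page_size hits, so deep pagination gets expensive. For deep paging, a cursor ("search after the last seen score/doc_id") is cheaper than offset-based pages.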
In general, I would like to hear advice from professionals.