How to quickly sort a large table by a frequently changing field?

It's well known that sorting is cheap on an indexed field.
However, it's also known that an index on frequently changing data causes significant performance degradation.
Here's the situation.
There are shards with game-account tables holding about 50 million records. Among other fields there is a field with the in-game currency balance, and this balance changes frequently.

There is a background task that regularly rebuilds the top (leaderboard) by balance every few hours. The query naturally includes "ORDER BY balance DESC". The interval is 2 hours because the rebuild itself takes more than an hour; I would like to run it more often if possible.

The problem: this creates a very serious load on the disk subsystem of the database server; the I/O load is around 90% and at times reaches almost 100%.
Possible but undesirable solutions:
1) Make a replica of the database on another server and hammer it with these queries. But that means setting extra hardware aside, and since that hardware is less powerful, the update frequency might even have to be reduced.
2) Put in a faster SSD array; expensive.
3) Use some NoSQL solution specifically for this task. That adds redundancy and probably inconsistency, and again requires setting hardware aside.

I'd like to solve the problem more elegantly at the software level, with the existing infrastructure.
Could a MATERIALIZED VIEW help here? Say, create a separate view with only the data needed to build the rating, put an index on the balance field, and refresh it, for example, every half hour?
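Roughly something like this is what I have in mind (the table and column names here are just for illustration):

CREATE MATERIALIZED VIEW account_rating AS
    SELECT user_id, balance FROM accounts;

CREATE UNIQUE INDEX account_rating_user_id_idx ON account_rating (user_id);
CREATE INDEX account_rating_balance_idx ON account_rating (balance DESC);

-- then from cron every half hour; CONCURRENTLY needs the unique index but doesn't block readers
REFRESH MATERIALIZED VIEW CONCURRENTLY account_rating;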
April 4th 20 at 12:57
5 answers
April 4th 20 at 12:59
Solution
What about storing the user IDs and balances in Redis, for example in a sorted set? On every balance change you update that user's value. Then pulling the top out of Redis shouldn't be a problem at all: you can pull it as often as you like, and the data is always current.
That's the idea.
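A rough sketch (the key name and IDs here are made up):

ZADD balance_top 1500 user:42            # set/overwrite the user's score on every balance change
ZINCRBY balance_top 250 user:42          # or increment it by the delta instead
ZREVRANGE balance_top 0 2999 WITHSCORES  # pull the top 3000 together with the balances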
There is a caveat: this solution isn't perfect. In fact there are several games plus several kinds of tops, so there will be a number of such "slices" with some redundancy. I'll have to estimate whether there's enough room in the store. But as a first approximation the solution works. Thank you. - angelica64 commented on April 4th 20 at 13:02
@angelica64, given that these are all just tops, in theory on any balance change you can look at the maximum and minimum element in Redis, and if the balance falls within that range, add the record and delete the one with the minimum value (and if it's outside the range, do nothing). That way you cap the number of items kept in Redis. In any case you don't need to keep all 80 million users there. - bethel_Lubowi commented on April 4th 20 at 13:05
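One simple way to cap it, as a sketch (same made-up key; after each update just trim everything below the current top 3000):

ZADD balance_top 1500 user:42
ZREMRANGEBYRANK balance_top 0 -3001   # keeps only the 3000 highest scores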
I had a similar experience. The volume was smaller, but it worked perfectly and very fast.

Every balance change went straight into Redis. There were also many different rating lists. There was an endpoint that issued ZREVRANGEBYSCORE against the needed keys. - yasmin_Farrell32 commented on April 4th 20 at 13:08
@bethel_Lubowi, yes, that's right, only the top 3000 is enough - angelica64 commented on April 4th 20 at 13:11
@angelica64, then there shouldn't be any problem at all - bethel_Lubowi commented on April 4th 20 at 13:14
April 4th 20 at 13:01
Solution
Show the explain (analyze, buffers) output with track_io_timing enabled.

Options:
- you're spending the whole hour not on the sort but somewhere else you chose not to mention. In that case the question is beside the point and the cause should be looked for where the time is actually lost, not where the light is better.
- your settings don't match the workload, if they're still at the defaults.
- the time really does go into a filesort, but your SSDs are, by mistake, read-optimised ones.
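In other words, run something like this (a sketch; table and column names are illustrative, and enabling track_io_timing needs superuser rights):

SET track_io_timing = on;  -- per-session; adds I/O Timings lines to the EXPLAIN output
EXPLAIN (ANALYZE, BUFFERS)
    SELECT user_id, balance FROM accounts ORDER BY balance DESC LIMIT 3000;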
If done head-on, it looks like this:
EXPLAIN (ANALYZE, BUFFERS) SELECT user_id, rating FROM accounts WHERE year = 123 ORDER BY rating DESC LIMIT 3000;
 QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2475021.96..2475029.46 rows=3000 width=12) (actual time=204038.913..204039.763 rows=3000 loops=1)
   Buffers: shared hit=79631 read=1385187 written=4601
   ->  Sort  (cost=2475021.96..2480821.74 rows=2319911 width=12) (actual time=204038.911..204039.447 rows=3000 loops=1)
         Sort Key: rating DESC
         Sort Method: top-N heapsort  Memory: 333kB
         Buffers: shared hit=79631 read=1385187 written=4601
         ->  Bitmap Heap Scan on accounts  (cost=549639.99..2329438.88 rows=2319911 width=12) (actual time=17485.007..202936.757 rows=2330230 loops=1)
               Recheck Cond: (year = 123)
               Rows Removed by Index Recheck: 1509171
               Heap Blocks: exact=363810 lossy=645363
               Buffers: shared hit=79631 read=1385187 written=4601
               ->  Bitmap Index Scan on uk_account_user_game  (cost=0.00..549060.02 rows=2319911 width=0) (actual time=17338.478..17338.478 rows=2685830 loops=1)
                     Index Cond: (year = 123)
                     Buffers: shared hit=51458 read=404187 written=4601
 Planning time: 0.170 ms
 Execution time: 204044.322 ms


For now I've put in a crutch in the form of a partial index on the game's last-visit field, which shrank the dataset and sped things up somewhat:
EXPLAIN (ANALYZE, BUFFERS) SELECT user_id, rating FROM accounts WHERE year = 123 AND last_played > '2019-01-01 00:00:00' ORDER BY rating DESC LIMIT 3000;
 QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..11604.33 rows=3000 width=12) (actual time=1.270..2399.434 rows=3000 loops=1)
   Buffers: shared hit=79253 read=8879
   ->  Index Scan Backward using account_last_played_idx on accounts  (cost=0.43..1797664.90 rows=464757 width=12) (actual time=1.269..2397.924 rows=3000 loops=1)
         Filter: ((last_played > '2019-01-01 00:00:00'::timestamp without time zone) AND (year = 123))
         Rows Removed by Filter: 11100
         Buffers: shared hit=79253 read=8879
 Planning time: 0.205 ms
 Execution time: 2400.207 ms
(8 rows)


But still, it doesn't help much. - angelica64 commented on April 4th 20 at 13:04
Where is track_io_timing? I mentioned it for a reason. - Tamara.Hackett commented on April 4th 20 at 13:07
a crutch in the form of a partial index on the game's last-visit field

However, it's also known that an index on frequently changing data causes significant performance degradation.

You've already done exactly that. So go ahead and make an index directly on the balance.
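For the query shown above it could look roughly like this (a sketch, the index name is made up; the filter column goes first so the LIMIT can be served straight from the index, and CONCURRENTLY avoids blocking writes while it builds):

CREATE INDEX CONCURRENTLY accounts_year_rating_idx ON accounts (year, rating DESC);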

Sort  (cost=2475021.96..2480821.74 rows=2319911 width=12) (actual time=204038.911..204039.447 rows=3000 loops=1)

As you can see, the sort itself costs you practically nothing.

written=4601

But the checkpointer and bgwriter aren't keeping up.
And the bitmap doesn't have enough work_mem. And, I suspect, neither does shared_buffers.

And touching almost 12 GB for a mere 2.3 million rows: if that's really the case, then that's exactly where the problem is. - Tamara.Hackett commented on April 4th 20 at 13:10
@Tamara.Hackett,
> Where is track_io_timing? I mentioned it for a reason.

There are some difficulties with that.

Thanks for the analysis; it looks like there's still work to be done within the current scheme. - angelica64 commented on April 4th 20 at 13:13
@Tamara.Hackett, we did the de-bloating and everything got much better.
One question left: how did you calculate the amount of data read from disk? - angelica64 commented on April 4th 20 at 13:16
@angelica64,
Buffers: shared hit=79631 read=1385187 written=4601

shared hit - blocks found and read in shared_buffers
read - blocks read from the filesystem (possibly from the OS page cache; the database can't tell)
written - dirty blocks the backend was forced to write out

All these numbers are counted in blocks of a fixed size, block_size bytes. It can only be changed when building the database from source, so with high probability we can assume it's the usual 8 KB.
(1 385 187 + 79 631) x 8 / 1024 ≈ 11 443 MB - Tamara.Hackett commented on April 4th 20 at 13:19
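For reference, the block size can be checked straight from psql:

SHOW block_size;  -- usually 8192 bytes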
@Tamara.Hackett, thank you - angelica64 commented on April 4th 20 at 13:22
April 4th 20 at 13:03
Solution
An auxiliary table with the fields "balance" and "user_id". A unique index on user_id and a clustered index (very important!) on the "balance" field.
A trigger on balance changes in the main table keeps the value in the auxiliary table up to date.

In cases where you are accessing single rows randomly within a table, the actual order of the data in the table is unimportant. However, if you tend to access some data more than others, and there is an index that groups them together, you will benefit from using CLUSTER. If you are requesting a range of indexed values from a table, or a single indexed value that has multiple rows that match, CLUSTER will help because once the index identifies the table page for the first row that matches, all other rows that match are probably already on the same table page, and so you save disk accesses and speed up the query.


https://www.postgresql.org/docs/9.1/sql-cluster.html
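A minimal sketch of this approach, assuming the main table is accounts(user_id, balance, ...); all names here are illustrative, and CLUSTER has to be re-run periodically to keep the physical order:

CREATE TABLE account_balances (
    user_id bigint PRIMARY KEY,
    balance bigint NOT NULL
);
CREATE INDEX account_balances_balance_idx ON account_balances (balance);
CLUSTER account_balances USING account_balances_balance_idx;

CREATE OR REPLACE FUNCTION sync_account_balance() RETURNS trigger AS $$
BEGIN
    -- upsert the new balance into the auxiliary table
    INSERT INTO account_balances (user_id, balance)
    VALUES (NEW.user_id, NEW.balance)
    ON CONFLICT (user_id) DO UPDATE SET balance = EXCLUDED.balance;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_balance_sync
    AFTER INSERT OR UPDATE OF balance ON accounts
    FOR EACH ROW EXECUTE PROCEDURE sync_account_balance();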

UPD

I came up with another solution.
1) check the size of the main table; most likely it has become bloated
https://www.youtube.com/watch?v=-GNHIHEHDmQ
2) create an index on the balance field with included columns (INCLUDE) and add user_id to it. Then for the query

select balance, user_id from your_table_name order by balance

all the data will be read from the index alone, which should significantly cut your load (a sketch of the index follows below).
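A sketch of such a covering index (the name is made up; INCLUDE needs PostgreSQL 11+, and index-only scans still depend on the visibility map being fresh):

CREATE INDEX accounts_balance_incl_user_idx ON accounts (balance DESC) INCLUDE (user_id);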
Basically this amounts to making a materialized view. - angelica64 commented on April 4th 20 at 13:06
@angelica64, what about the clustered index? - Gia_Mueller commented on April 4th 20 at 13:09
@Gia_Mueller, sorry, I've now read the full explanation at the link. At first I was thinking about something else. Worth trying. - angelica64 commented on April 4th 20 at 13:12
@angelica64, a clustered index determines the physical placement of the rows on disk, so it should reduce the load on the disk. - Gia_Mueller commented on April 4th 20 at 13:15
@angelica64, one more thought. Right now you have a long transaction for building the top. Isn't the main table getting bloated because of it? There is such a thing as table bloat, which happens on Postgres in some scenarios.

And touching almost 12 GB for a mere 2.3 million rows: if that's really the case, then that's exactly where the problem is.

What @Tamara.Hackett wrote may well be a consequence of exactly that - Gia_Mueller commented on April 4th 20 at 13:18
@Gia_Mueller, do I understand correctly that CLUSTER locks the table while it runs? Then that could be a problem.

Yes, it looks like the DB settings need to be dealt with first. - angelica64 commented on April 4th 20 at 13:21
@angelica64, yes, maybe. I've just checked the documentation: this command has to be re-run periodically.

I came up with another solution.
1) check the size of the main table; most likely it has become bloated
https://www.youtube.com/watch?v=-GNHIHEHDmQ
2) create an index on the balance field with included columns (INCLUDE) and add user_id to it. Then for the query

select balance, user_id from your_table_name order by balance

all the data will be read from the index alone, which should significantly cut your load - Gia_Mueller commented on April 4th 20 at 13:24
I'll add it to the answer - Gia_Mueller commented on April 4th 20 at 13:27
all the data will be read from the index alone, which should significantly cut your load

This balance changes frequently.

That's why I have serious doubts about how rational a covering index is here. Very likely the visibility map goes stale quickly, and then it makes no difference whether the INCLUDE is there or not. Only with INCLUDE the index is also bigger. - Tamara.Hackett commented on April 4th 20 at 13:30
@Tamara.Hackett, perhaps. I don't really know how Postgres will behave in this case. Will a stale visibility map force everything to be read from the table, or will only the rows with the stale flag be read from the table? - Gia_Mueller commented on April 4th 20 at 13:33
It will read those data pages for which the visibility map doesn't say they are definitely visible to everyone. That is, an index-only scan plan can always decide to fetch the table page itself and check tuple visibility. In the best case there are no accesses to the table blocks at all; in the worst case it's identical to a plain index scan. - Tamara.Hackett commented on April 4th 20 at 13:36
@angelica64, please report back if you try this option. Curious. - Gia_Mueller commented on April 4th 20 at 13:39
@Tamara.Hackett, thanks for the clarification - Gia_Mueller commented on April 4th 20 at 13:42
@Gia_Mueller, now I understand how to fix the table bloat. The video at your link was very informative; case No. 1 is definitely what happened. As for clustering, it would all be fine, but we'd need to estimate the running time: if the exclusive lock hangs for more than 2 seconds, that's unacceptable.
As for INCLUDE columns, I'll try that if the bloat treatment doesn't give the desired result, but I think everything should be fine. - angelica64 commented on April 4th 20 at 13:45
@angelica64, clustering doesn't seem like such a good idea after all. That was a quick answer that doesn't quite account for the specifics of Postgres, namely that it has no true clustered index. But the covering-index idea may well help. - Gia_Mueller commented on April 4th 20 at 13:48
April 4th 20 at 13:05
Look at how often players actually check the rating and try to solve this organizationally. For example, decouple the rating from in-game points and introduce a "magic" progress value (a rate multiplied by the points); if a user has no way to check whether they were actually counted in the rating, you get room to maneuver. Besides, it's safe to assume that those in places 1 through 1000 take their rating far more seriously than those in places 100,000 through 500,000. And then there's plenty of freedom to reorganize everything. For example, split users into groups (Gurus, Pros, Rookies, Microorganisms; or split by how often they look at the rating), give each group its own table (even in a database on a separate node), and add to the user table a pointer to the group the user belongs to. Points are then written to both tables (which is fast), but the index is kept only on the smaller table. On request you show the user their place in the rating within their group, while the "big sort" that redistributes users between groups runs once a day during low activity.

There can be many such organizational tweaks to reduce the load. Maybe I'm wrong, but it doesn't always make sense to solve a technical problem head-on.
Organizational measures have already been taken, thank you, but they weren't enough.
Changing the table schema is a very heavyweight solution; I can't lay out all the nuances of the project here. - angelica64 commented on April 4th 20 at 13:08
April 4th 20 at 13:07
Why do you need to sort all 50 million? The whole point of a top is to take, say, just the top 10.
Make a small side table and have a trigger drain into it, Pareto-style, only the upper slice, say everything above the 95th percentile,
i.e. where the balance is greater than X. Then it's not 50 million rows but 100 thousand.

And such a small table is easy to sort and export.
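As a sketch of the same idea without a separate table, a partial index on the main one (the threshold and the name are made up):

CREATE INDEX accounts_big_balance_idx ON accounts (balance DESC) WHERE balance > 100000;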
This method doesn't work here, because it's not a top 10 or top 100 but a top 3000, and there are about 2000 such tops right now across the games. And I try not to put logic into the database unless absolutely necessary: it turns into magic that is awkward to test. - angelica64 commented on April 4th 20 at 13:10
@angelica64, read up on the CQRS pattern. It isn't about the database; it's about the architecture of the system as a whole. Maybe you'll reconsider that statement. - Devin38 commented on April 4th 20 at 13:13
@Devin38, CQRS would mean serious refactoring. And we have legacy code in which balance-changing operations are performed through 2 different services. - angelica64 commented on April 4th 20 at 13:16
Whatever - Devin38 commented on April 4th 20 at 13:19
