There is a table with several hundred million rows, about 200 GB. It has a lot of columns, including text ones. Some columns have low selectivity, for example a city column. The task is to select all rows where city equals a given value (for example 'SPB', about 10 million rows) and export them all to a file, i.e. a query of the form COPY (SELECT several fields ...) TO 'file.txt'. Right now it runs for half an hour, and no index helps. Yet if instead of several fields I select only the id (SELECT id ... WHERE city = ...), it finishes in just a few seconds. And if I put the SPB rows into a separate materialized view, SELECT several fields ... takes half a minute instead of half an hour.
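For concreteness, here is a minimal sketch of the three query shapes I mean; the table, column names, and file path (big_table, id, city, col1..col3, /tmp/spb.txt) are placeholders, not the real schema:

-- the slow export: many columns for ~10M rows, runs for half an hour
COPY (
  SELECT col1, col2, col3
  FROM big_table
  WHERE city = 'SPB'
) TO '/tmp/spb.txt';

-- fast: returning only id takes a few seconds
-- (presumably served by an index-only scan, no heap fetches)
SELECT id FROM big_table WHERE city = 'SPB';

-- fast: the same rows pre-materialized, then exported in ~half a minute
CREATE MATERIALIZED VIEW spb_rows AS
  SELECT col1, col2, col3
  FROM big_table
  WHERE city = 'SPB';

COPY (SELECT * FROM spb_rows) TO '/tmp/spb.txt';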
- Do I need to create a separate table for each city?
- If I create a separate table for each city, what do I do when I need to filter on other columns, not on city? (See the LIST-partitioning sketch after this list for what I have in mind.)
- I have read a little about pgpool-II and its parallel query feature. If I partition by id and run parallel queries against all the partitions, is that an option? And can pgpool do this on a single machine? (See the HASH-partitioning sketch below.)
- How else can this be optimized?
- Can even a single machine cope with this problem? I have read somewhere that people keep a few billion rows in Postgres and even fairly complex queries fly on a single machine. How is that possible? (See the parallel-query sketch at the end.)
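To make the "table per city" question concrete: as I understand it, the idiomatic form of a separate table per city in PostgreSQL 10+ is declarative LIST partitioning (the DEFAULT partition needs 11+). A minimal sketch with placeholder names; a filter on city is pruned to one partition, but a filter on any other column still has to visit every partition:

CREATE TABLE big_table (
  id   bigint,
  city text,
  col1 text
) PARTITION BY LIST (city);

CREATE TABLE big_table_spb  PARTITION OF big_table FOR VALUES IN ('SPB');
CREATE TABLE big_table_msk  PARTITION OF big_table FOR VALUES IN ('MSK');
CREATE TABLE big_table_rest PARTITION OF big_table DEFAULT;

-- pruned to big_table_spb only:
SELECT col1 FROM big_table WHERE city = 'SPB';
-- scans all partitions, since col1 is not the partition key:
SELECT city FROM big_table WHERE col1 = 'x';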
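And on the partition-by-id idea from the pgpool question: PostgreSQL 11+ has built-in HASH partitioning, which splits a table into even chunks by id without any external middleware. A sketch, again with placeholder names:

-- an alternative layout to the LIST sketch above;
-- the same table name is reused here only for illustration
CREATE TABLE big_table (
  id   bigint,
  city text,
  col1 text
) PARTITION BY HASH (id);

CREATE TABLE big_table_p0 PARTITION OF big_table FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE big_table_p1 PARTITION OF big_table FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE big_table_p2 PARTITION OF big_table FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE big_table_p3 PARTITION OF big_table FOR VALUES WITH (MODULUS 4, REMAINDER 3);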
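On the single-machine question: since 9.6 PostgreSQL can parallelize a sequential scan across several worker processes on its own, without pgpool; the main knob is max_parallel_workers_per_gather. A sketch, with an illustrative value:

-- allow up to 8 parallel workers for one query (session-level setting)
SET max_parallel_workers_per_gather = 8;

-- check that the plan actually shows a Gather node with parallel workers
EXPLAIN (ANALYZE)
SELECT col1 FROM big_table WHERE city = 'SPB';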