There was a problem. Every month the web application went down with a 502 error. The reason: PostgreSQL stopped processing requests because the connection limit had been exceeded. The process list showed a pile of open connections waiting on some blocked query. Moreover, the query was perfectly ordinary, one that normally completes instantly. I wasn't able to figure out what caused the blocking, but there were a few "idle in transaction" processes, and after killing one of them everything recovered.
PostgreSQL 9.5 still has no option to set a timeout on a hung transaction, so it was decided to install PgBouncer.
In the end the setup is PostgreSQL with PgBouncer in front of it, pool_mode=transaction.
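For reference, a setup like that might look roughly as follows in pgbouncer.ini (the host, port, database name and auth settings here are illustrative, not taken from the real config):

[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
idle_transaction_timeout = 600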
For a couple of weeks everything worked great. But in the last two days, suddenly and for no apparent reason, a problem has appeared. The web application returns a 500 to the user, the user is surprised, tries again, and everything works normally again. A few minutes later, somewhere else, the situation repeats. The user is getting nervous.
In the application log (Python, Flask, psycopg2):
Exception on /any-url-of-the-application [POST]
...
...
Exception: idle transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Judging by the logs, it falls on arbitrary requests at arbitrary URLs, ones that normally complete instantly.
In pgbouncer.ini:
idle_transaction_timeout = 600
The server load is 10-20 connections per second. There is nothing suspicious in the system logs or in the PostgreSQL and PgBouncer logs. The process list shows no postgres processes with hanging transactions. What kind of system bursts cause the short-term resource exhaustion that could lead to this specific error, I have no idea.
Here is what's weird. If this error occurred in service scripts that regularly run long jobs from cron, it would be clear where to dig. But here it is just a user, working in the browser, and suddenly, after the next click on a button: error 500! idle_transaction_timeout = 600 is 10 minutes, yet the user gets the 500 instantly. Considering that every page load creates its own connection in the Flask application context, how can it suddenly fail because of an idle transaction timeout?
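For clarity, the per-request pattern described above is roughly the following. This is a minimal sketch, not the actual application code; the DSN, the route and the use of Flask's g object are assumptions for illustration, with the connection going through PgBouncer on port 6432:

import psycopg2
from flask import Flask, g

app = Flask(__name__)
# Illustrative DSN: the application talks to PgBouncer, not to Postgres directly.
DSN = "host=127.0.0.1 port=6432 dbname=appdb user=app password=secret"

def get_db():
    # One connection per request, kept in the request context.
    if "db" not in g:
        g.db = psycopg2.connect(DSN)
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # psycopg2 opens a transaction on the first query; if a handler never
    # commits or rolls back, the server side sits "idle in transaction"
    # until this teardown closes the connection.
    db = g.pop("db", None)
    if db is not None:
        db.close()

@app.route("/any-url-of-the-application", methods=["POST"])
def handler():
    cur = get_db().cursor()
    cur.execute("SELECT 1")   # normally instant
    get_db().commit()
    return "ok"

With pool_mode=transaction, PgBouncer assigns a server connection only for the duration of each transaction, so code like this never holds a Postgres backend between transactions, which is part of what makes the timing of the error puzzling.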
What could it be?
The bouncer is version 1.7. It looks like the fix you are pointing to is relevant, if of course this bug is indeed what is happening here. Thank you.
Generally, the bouncer, as I said, was installed with the sole purpose of killing long transactions that for some reason blocked other queries, which in turn pushed the number of connections over the limit. PostgreSQL 9.5 does not have such an option yet. At first I was inspired by that solution, but it failed: by the time Postgres stops accepting connections because of connection_limit, it is already too late to do anything.
How else can an idle transaction be killed easily (other than kill -9)? I have not found another way. - Agustin53 commented on April 19th 20 at 12:44
Kill the process by calling pg_terminate_backend (and do not run as a superuser in normal operation; then superuser_reserved_connections will leave you the ability to connect administratively). - Shad19 commented on April 19th 20 at 12:47
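For illustration, that suggestion could be turned into a small cron watchdog along these lines. It is a sketch, not tested against the setup in question; the DSN, the role name and the 10-minute threshold are assumptions, while pg_terminate_backend and the pg_stat_activity columns are standard:

import psycopg2

# Connect directly to PostgreSQL (not through PgBouncer), ideally as a role
# that can get in via the reserved superuser slots.
DSN = "host=127.0.0.1 port=5432 dbname=appdb user=watchdog password=secret"

KILL_IDLE_IN_TX = """
    SELECT pid, pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND now() - state_change > interval '10 minutes'
      AND pid <> pg_backend_pid()
"""

def main():
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            cur.execute(KILL_IDLE_IN_TX)
            for pid, ok in cur.fetchall():
                print("terminated backend", pid, "->", ok)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    main()

Connecting directly to Postgres rather than through the bouncer matters here: the reserved superuser slots are what let such a script get in even when ordinary connections are exhausted.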
In general, it is unclear how a user who has just opened a page in the browser can immediately get an error that the application log describes as idle transaction timeout, if that timeout is 10 minutes. The bouncer is supposed to hand the request a free connection from the pool, and that connection has had an unfinished transaction sitting on it for 10 minutes somewhere? How can that be??? Or do I not understand something in this life? )) - Agustin53 commented on April 19th 20 at 12:50
I do not remember the specifics of psycopg2, but I have seen connection pooling in applications that does not react properly to a connection being closed. - Shad19 commented on April 19th 20 at 12:53
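If application-side pooling of that kind is indeed the culprit, one way to guard against it is to validate a connection before reuse. A minimal sketch, assuming a psycopg2 connection kept around between requests; checkout and the dsn handling are illustrative names, not anything from psycopg2 or Flask:

import psycopg2

def checkout(conn, dsn):
    # Return a usable connection, reconnecting if the old one is dead.
    if conn is None or conn.closed:
        return psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")  # cheap probe; fails fast if the server side is gone
        conn.rollback()              # do not leave the probe transaction open
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        conn.close()
        return psycopg2.connect(dsn)
    return conn

Note that psycopg2's closed attribute only reflects closes the client already knows about, which is why the probe query is there at all.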
By the way, about your remark:
this method is what I initially used. The scheme was as follows: cron starts the processing from a script running in the application context, i.e. in the end the query was issued as if from the web application. Postgres, having already exceeded the connection limit, refused it.
I did not know about superuser_reserved_connections. If I understand correctly, you need to create a superuser in Postgres and run the processing under that account? - Agustin53 commented on April 19th 20 at 12:56
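If I read the suggestion correctly, the idea is roughly the following (the role name is illustrative, and superuser_reserved_connections defaults to 3 anyway):
In postgresql.conf, keep a few slots that only superusers can use:
superuser_reserved_connections = 3
In psql, as an existing superuser, create a dedicated role for the watchdog:
CREATE ROLE watchdog WITH LOGIN SUPERUSER PASSWORD 'change-me';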
And yes, one more thing:
it is not quite so. The real story: there are two "idle in transaction" processes and a lot of "SELECT waiting" processes blocked behind them. I kill one of the two - guessed wrong, the picture does not change, but the database does not go into a restart either. When I kill the second process, the one with the idle transaction, the other processes calmly finish their work and the situation returns to normal. Saw it with my own eyes ))
Thanks for the help. I have figured out superuser_reserved_connections and will dig in that direction. - Agustin53 commented on April 19th 20 at 12:59