We ran into a problem. About once a month the web application went down with a 502 error. The cause: PostgreSQL stopped accepting requests because the connection limit had been exceeded. The process list showed a pile of open connections stuck waiting, all blocked on a single query. That query was quite ordinary and usually completes instantly. I was not able to work out what caused the lock in the first place. But there were a few processes in the "idle in transaction" state, and after killing one of them everything recovered.
PostgreSQL 9.5 still has no option for setting a timeout on a hanging transaction, so we decided to install PgBouncer.
The resulting setup: PostgreSQL with PgBouncer in front of it, running with pool_mode=transaction.
For a couple of weeks everything worked great. But over the last two days a new problem appeared for no apparent reason. The web application returns a 500 to the user; the user is surprised, tries again, and everything works normally. A few minutes later the same thing happens somewhere else. The users are getting nervous.
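For anyone who wants to spot this state before the connection limit is hit, here is a minimal sketch that lists sessions stuck in "idle in transaction" through the pg_stat_activity view. The function name `find_idle_in_transaction` and the `min_idle_seconds` threshold are made up for illustration and are not part of the original setup; `cursor` is assumed to be any DB-API cursor, for example one from psycopg2.

```python
# Sessions that have been idle inside an open transaction for too long.
# state_change is the moment the session entered its current state.
IDLE_IN_TX_SQL = """
    SELECT pid, usename, state_change, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND state_change < now() - %s * interval '1 second'
    ORDER BY state_change
"""


def find_idle_in_transaction(cursor, min_idle_seconds=60):
    """Return one dict per session idle in a transaction for at least
    min_idle_seconds (hypothetical helper, threshold is an assumption)."""
    cursor.execute(IDLE_IN_TX_SQL, (min_idle_seconds,))
    return [
        {"pid": pid, "user": user, "since": since, "query": query}
        for pid, user, since, query in cursor.fetchall()
    ]
```

The `query` column shows the last statement the session ran, which can hint at which part of the application left the transaction open. A session found this way can be ended from SQL with `SELECT pg_terminate_backend(pid)`, which is roughly what killing the process by hand did.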
In the application log (Python, Flask, psycopg2):
Exception on /any-url-of-the-application [POST]
Exception: idle transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The failures hit arbitrary requests, including ones that normally complete instantly. The relevant PgBouncer setting:
idle_transaction_timeout = 600
The server load is 10-20 connections per second. There is nothing suspicious in the system logs or in the PostgreSQL and PgBouncer logs. The process list shows no postgres processes with hanging transactions. I cannot figure out what kind of spike in the system could cause a brief resource exhaustion that would lead to this particular error.
Here is what is strange. If this error occurred in the service scripts that run long tasks from cron, it would be clear where to dig. But here it is just a user working in the browser, and suddenly the next button click gives a 500! idle_transaction_timeout = 600 is 10 minutes, yet the user gets the 500 instantly. Given that every page load creates its own connection within the Flask application context, how can it suddenly fail with an idle transaction timeout?
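One thing that may be worth checking is whether every request really ends its transaction. psycopg2 opens a transaction implicitly on the first statement and keeps it open until commit() or rollback() is called, so a connection that outlives its request sits idle in transaction. The sketch below shows one way to scope a connection to a single unit of work. `request_connection` and the `connect` factory are hypothetical names, not the application's actual code.

```python
from contextlib import contextmanager


@contextmanager
def request_connection(connect):
    """Open a connection, end its transaction explicitly, always close it.

    `connect` is any zero-argument callable returning a DB-API connection
    (an assumption made so the sketch does not depend on a real database).
    """
    conn = connect()
    try:
        yield conn
        conn.commit()      # success: finish the implicit transaction
    except Exception:
        conn.rollback()    # failure: do not leave the transaction open
        raise
    finally:
        conn.close()       # hand the connection back in every case
```

With psycopg2 this would be called as `with request_connection(lambda: psycopg2.connect(dsn)) as conn:`, where `dsn` stands for the application's own connection string. In a Flask application the same commit-or-rollback-then-close logic would normally live in a teardown_appcontext handler.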
What could it be?