How do I find the cause of "Exception: idle transaction timeout" in a Flask + psycopg2 + PostgreSQL + PgBouncer application?

There was a problem: every month the web application would go down with a 502 error. The reason: PostgreSQL stopped processing queries because the connection limit had been exceeded. The process list showed a pile of open, waiting connections all blocked by some query, and that query was a perfectly ordinary one that normally completes instantly. I could not figure out the root cause of the blocking, but there were a few processes in the "idle in transaction" state, and after killing one of them everything recovered.

PostgreSQL 9.5 still has no option to set a timeout on a hung transaction, so it was decided to install PgBouncer.
In the end the setup is PostgreSQL with PgBouncer in front of it, pool_mode=transaction.
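
Roughly, the relevant part of pgbouncer.ini looks like this (a minimal sketch with placeholder names, addresses and limits, not the real config):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20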

For a couple of weeks everything worked great. But over the last two days, suddenly and for no apparent reason, a new problem appeared. The web application returns a 500 to the user; the user is surprised, tries again, and everything works normally again. A few minutes later the same thing happens somewhere else. The user is getting nervous.

In the application log (Python, Flask, psycopg2):

Exception on /any-url-of-the-application [POST]
...
...
Exception: idle transaction timeout
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.


The failures hit arbitrary requests, including ones that normally complete instantly.

In pgbouncer.ini:
idle_transaction_timeout = 600

The server load is 10-20 connections per second. There is nothing suspicious in the system logs or in the PostgreSQL and PgBouncer logs. The process list shows no postgres processes with hung transactions. What kind of system burst could cause the short-term resource exhaustion that leads to this particular error, I have no idea.

Here is what is strange. If this error occurred in service scripts that run long tasks regularly from cron, it would be clear where to dig. But here an ordinary user is working in the browser, and suddenly, after the next button click: error 500! idle_transaction_timeout = 600 is 10 minutes, yet the user gets the 500 instantly. Given that every page request creates its own connection in the Flask application context, how can it suddenly fail because of an idle transaction timeout?

What could it be?
April 19th 20 at 12:39
2 answers
April 19th 20 at 12:41
and after killing one of them everything recovered.

One hopes that was not kill -9.

Start by checking that your application really does reopen a connection that was closed at the initiative of the database or the bouncer, rather than keeping it in the pool forever and returning the stored error text.
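
A rough sketch of such a check for psycopg2 (the DSN, names, and retry logic are illustrative placeholders, not code from the application in question):

import psycopg2
from psycopg2 import InterfaceError, OperationalError

DSN = "host=127.0.0.1 port=6432 dbname=mydb user=app"  # via pgbouncer; placeholder values

_conn = None

def get_conn():
    # reopen the connection if the server or the bouncer has closed it
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(DSN)
    return _conn

def run_query(sql, params=None):
    # run a query, retrying once if postgres/pgbouncer dropped the connection
    global _conn
    for attempt in (1, 2):
        conn = get_conn()
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                rows = cur.fetchall()
            conn.commit()
            return rows
        except (OperationalError, InterfaceError):
            _conn = None  # discard the dead connection; retry once on a fresh one
            if attempt == 2:
                raise

The point is that a dead connection never gets handed back to application code: it is either reopened or replaced, instead of replaying the stored error on the next request.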

By the way, you did not mention the PgBouncer version. Perhaps the bouncer you installed is as old as the database. I recall a possibly relevant fix in 1.10.
Well, actually, it was kill -9. I have not found another way to fight it on a server where you have to act quickly.
The bouncer is version 1.7. Looks like the fix you are pointing at is relevant, assuming, of course, that this is really happening because of that bug. Thank you.
In general, as I said, the bouncer was installed with the sole purpose of killing long transactions that for some reason blocked other queries, which then pushed us over the connection limit. Postgres 9.5 does not have such an option yet. At first this solution looked promising, but it failed: by the time Postgres stops accepting connections because of the connection limit, it is already too late to do anything.
Other than simply removing (or kill -9-ing) the idle transaction, I have not found another way. - Agustin53 commented on April 19th 20 at 12:44
kill -9 on any database process drops the whole database into a restart.
Terminate the process by calling pg_terminate_backend instead (and do it under a superuser, so that superuser_reserved_connections gives you the ability to make an administrative connection even when the limit is reached). - Shad19 commented on April 19th 20 at 12:47
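
For illustration, a rough sketch of that approach with psycopg2 (the DSN and the 10-minute threshold are placeholders): connect directly to postgres rather than through the bouncer, find backends stuck in "idle in transaction", and end them with pg_terminate_backend instead of kill -9.

import psycopg2

ADMIN_DSN = "host=127.0.0.1 port=5432 dbname=mydb user=postgres"  # placeholder, direct to postgres

KILL_IDLE_IN_TX = """
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND now() - state_change > interval '10 minutes'
      AND pid <> pg_backend_pid();
"""

with psycopg2.connect(ADMIN_DSN) as conn, conn.cursor() as cur:
    cur.execute(KILL_IDLE_IN_TX)
    print(cur.rowcount, "idle-in-transaction backends terminated")
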
Installed bouncer version 1.12. It did not help.

In general, it is unclear how a user who has just opened a page in the browser can instantly get an error that the application log describes as idle transaction timeout, if the timeout is 10 minutes. The bouncer hands the request a free connection from the pool, and somewhere on it there is a transaction that has been sitting unfinished for 10 minutes? How can that be??? Or is there something I do not understand in this life? )) - Agustin53 commented on April 19th 20 at 12:50
Because of this:
Start by checking that your application really does reopen a connection that was closed at the initiative of the database or the bouncer, rather than keeping it in the pool forever and returning the stored error text.

I do not remember how psycopg2 behaves, but I have seen connection pooling in applications that did not react properly to the connection being closed. - Shad19 commented on April 19th 20 at 12:53
@Shad19, I do not know how to verify that ))
By the way, about your remark:
Terminate the process by calling pg_terminate_backend instead (and do it under a superuser, so that superuser_reserved_connections gives you the ability to make an administrative connection even when the limit is reached)

I used this method at first. The scheme was as follows: cron started a cleanup routine from a script running in the application context, so in the end the query was issued as if it came from the web application. Postgres, having already exceeded its connection limit, refused it.

I did not know about superuser_reserved_connections. If I understand correctly, you need to create a superuser in postgres and run the cleanup on its behalf? - Agustin53 commented on April 19th 20 at 12:56
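
For reference, the setting itself lives in postgresql.conf (the value below is only an illustration; 3 is also the default):

# postgresql.conf
superuser_reserved_connections = 3   # slots usable only by superusers, even when max_connections is exhausted

The cleanup script then connects directly to postgres as a superuser, so it can still get one of the reserved slots when ordinary connections are exhausted.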
@Shad19,
And yes, about this:
kill -9 on any database process drops the whole database into a restart.

that is not so. A real story: there were two idle-in-transaction processes and a lot of blocked "SELECT waiting" processes. I killed one of the two, guessed wrong, the picture did not change, but the database did not fall into a restart. When I killed the second one, the one whose idle transaction was actually the culprit, the other processes calmly finished their work and the situation returned to normal. I saw it with my own eyes ))
Thanks for the help; I have read up on superuser_reserved_connections and will dig in that direction. - Agustin53 commented on April 19th 20 at 12:59
Nope, kill -9 (SIGKILL) will cause the database to restart. SIGTERM asks the process to finish, and it can do so cleanly; that is what pg_terminate_backend uses. - Shad19 commented on April 19th 20 at 13:02
@Shad19, well, I will not argue; next time the occasion arises I will check again. Thank you! - Agustin53 commented on April 19th 20 at 13:05
April 19th 20 at 12:43
Have you studied everything here?

Or does that not apply to your Postgres version?

Obviously, it is a matter of how the database / pool is being used.
