Why does a bulk insert in Postgres produce duplicates?
An app periodically collects data, processes it, and inserts it into the database (about 2000+ rows at a time).
Before inserting an entity, it checks whether a row with the same field values already exists in the database; if it does, nothing is inserted.
After running for a while, a uniqueness check revealed duplicates (roughly 1 per 100 inserts): identical rows whose insertion timestamps differ by milliseconds.
At first the check was inside the SQL query itself (INSERT ... SELECT ... WHERE NOT EXISTS); later it was moved into the application (look up the id first, and insert only if it is not found). The duplicates remained either way.
What could be the problem?
A check followed by an insert is not atomic: in the window between the two operations, another query can insert the same row. Solutions: deduplicate the data before the bulk insert, or add a unique constraint and ignore the resulting errors, or use INSERT ... ON CONFLICT DO NOTHING.
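A minimal sketch of the ON CONFLICT approach, using Python's stdlib `sqlite3` for the sake of a self-contained example (SQLite 3.24+ supports the same `ON CONFLICT ... DO NOTHING` upsert syntax that Postgres introduced in 9.5). The `readings` table and its columns are made up for illustration:

```python
import sqlite3

# Hypothetical schema: each (sensor, ts) pair must be unique.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        sensor TEXT NOT NULL,
        ts     TEXT NOT NULL,
        value  REAL,
        UNIQUE (sensor, ts)
    )
""")

rows = [
    ("s1", "2019-06-26T14:18:00", 1.0),
    ("s1", "2019-06-26T14:18:00", 1.0),  # duplicate: silently skipped
    ("s2", "2019-06-26T14:18:00", 2.5),
]

# ON CONFLICT ... DO NOTHING: the database, not the application,
# resolves the race, so check and insert happen atomically per row.
conn.executemany(
    "INSERT INTO readings (sensor, ts, value) VALUES (?, ?, ?) "
    "ON CONFLICT (sensor, ts) DO NOTHING",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 2
```

In Postgres the INSERT statement is identical; only the connection library changes (e.g. psycopg2 instead of sqlite3).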
Markus_Langworth answered on June 26th 19 at 14:18
The problem is concurrent access to the data.
You can try to manage this in the application, but it is wiser to hand control over data integrity back to the PostgreSQL engine itself.
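If ON CONFLICT is not an option, the "unique constraint plus ignore the error" variant mentioned above looks roughly like this. Again a sketch with stdlib `sqlite3` and an invented `events` table; in Postgres with psycopg2 the exception to catch would be the unique-violation error rather than `sqlite3.IntegrityError`:

```python
import sqlite3

# Hypothetical table keyed by a unique event id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

def insert_ignoring_duplicates(conn, event_id, payload):
    """Insert one row; if another writer got there first, swallow the
    unique-violation error instead of failing the whole batch."""
    try:
        with conn:  # run each insert in its own transaction
            conn.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?)",
                (event_id, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # The constraint, enforced by the database, closes the race window.
        return False

print(insert_ignoring_duplicates(conn, 1, "a"))  # True
print(insert_ignoring_duplicates(conn, 1, "b"))  # False: duplicate id
```

Either way the key point is the same: the uniqueness guarantee lives in the database, not in an application-side check.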