There's a way to quickly compare one image with other images?


There is a form in which the user loads the picture. I need to make sure that the downloaded picture is definitely unique. Compare not by name, hegu or size, and with the help of this function, which peeked in the vastness:

def compareImage(file_to_compare, folder):

 # file_to_compare is the path to the picture, which was uploaded by a user
 # folder is the folder in which pictures with which to compare the image of the user

 photos_in_folder = os.listdir(folder)

 if len(photos_in_folder) > 0:

 for image in photos_in_folder:

 other_image = str(image)

 if file_to_compare != other_image:

 if os.path.isfile(file_to_compare) and os.path.isfile(other_image):

 h1 =
 h2 =

 result = math.sqrt(reduce(operator.add, map(lambda a,b: (a-b)**2, h1, h2))/len(h1))

 if result and result < 150:

 # Found a brace
 return True

else:'no photos for comparison. Skip')

 # Picture of the user unique
 return False

If the images in the folder 6K, then a comparison will last almost 2 minutes. It dawned on me that endlessly optimizing function is not possible and if you compare the image of the user with each existing image in a for loop, it is in any case will last a long time.

Question — is it possible to compare one image from many others that that it was faster than the comparison pairs, on the condition that:

  1. You need to compare precisely the contents of images, not their size, hash, address, weight etc
  2. Every time the user loads the picture, I need to update the information already available on the image server, so to build the index once a day is impossible, it is not relevant
  3. Can't afford to keep the RAM already present on the server the picture since a lot of them, and RAM is limited (if, for example, to exclude from a chain of slow file system)
  4. Already using a SSD drive and optimize the file system it seems to me there is nowhere else

I would be grateful for ideas or references on implementations where you can pry the principle of comparison.
April 3rd 20 at 17:47
2 answers
April 3rd 20 at 17:49
The most expensive operation here is the
For each downloadable pictures visitavi hash and save in database, so you do not count it twice
April 3rd 20 at 17:51
Gener hash and not worry. It is unique and I think you can find a quick generator, and search like - fast. In hash-function it is possible to put any standardized, normalized data.

Find more questions by tags Python