How do I compare two images for exact identity?

Good afternoon.
1. There is a set of original images (say, 1000x500 px).
2. There is a set of reduced (500x250) copies of the images from item 1.
For each original, I need to find its reduced copy.
Please point me in the right direction:
1. Should I reduce each image from item 1 to 500x250 and compare it with the set from item 2?
2. How is such a comparison done? Do I need to convert each image to binary data and compare the bytes?
3. To repeat: the images are the same; one folder contains the originals and the other contains their reduced versions.

Thank you.
March 23rd 20 at 19:30
March 23rd 20 at 19:32
Reducing the originals will not necessarily help: if the compression level differs by even one percent, the avalanche effect will change the file hash completely.
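To illustrate the avalanche effect, here is a minimal sketch using Python's standard `hashlib`. The byte strings are hypothetical stand-ins for two image files that differ in a single bit; they are not real image data.

```python
import hashlib

# Hypothetical stand-in for the bytes of an image file.
data_a = bytes(range(256)) * 4
# The same bytes with a single bit flipped in the first byte.
data_b = bytes([data_a[0] ^ 0x01]) + data_a[1:]

digest_a = hashlib.sha256(data_a).digest()
digest_b = hashlib.sha256(data_b).digest()

# Count how many of the 256 digest bits differ between the two hashes.
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

print(digest_a.hex())
print(digest_b.hex())
print(f"{differing_bits} of {len(digest_a) * 8} bits differ")
```

With a cryptographic hash, roughly half of the digest bits are expected to change, so a hash can only answer "byte-for-byte identical or not" and says nothing about how similar two files are.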

Look for an AI-based system for finding similar images; something like that must surely exist.
1. And is it impossible to work out the exact compression percentage by manually comparing an original with its reduced copy?

2. The thing is, the images being compared are document scans, so they will all look very similar; only the numbers inside the documents may differ. - clemens_OHara commented on March 23rd 20 at 19:35
@clemens_OHara, ah, here I can't advise, because I have never compared documents, but some image diff utility should highlight the difference. - jonathon_Reilly commented on March 23rd 20 at 19:38
March 23rd 20 at 19:34
As a mathematician, I must say that a solution to this problem exists. (This is a joke that mathematicians will understand.)

Only the word "identity" has to be dropped from the requirements. It makes no sense there and makes the task unsolvable. When an image was reduced, some interpolation algorithm ran first; there are a dozen or more such algorithms, and each produces a slightly different picture. Then a compression algorithm ran with unknown parameters. In other words, by varying the parameters you can make thousands (or even millions) of different thumbnails from one original. They are not exactly identical, although many are indistinguishable by eye. You would need a lot of luck (not to mention a great deal of effort and time) to guess all the parameters. So forget about "identity"; instead we will use rigorous mathematical methods to find the most similar picture. If the similarity does not reach some threshold (say, 97.1% or 99.5%), we say that no match was found at all.
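As a rough illustration of why the reduction algorithm matters, the sketch below reduces the same tiny grayscale image in two ways: nearest-neighbor sampling and block averaging. Both functions and the 4x4 pixel values are hypothetical, written in plain Python for clarity; real image libraries offer more filters than these two, and the sketch assumes the image dimensions are divisible by the factor.

```python
def downscale_nearest(img, factor):
    """Keep the top-left pixel of every factor x factor block."""
    return [row[::factor] for row in img[::factor]]


def downscale_box(img, factor):
    """Average every factor x factor block (integer division)."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(0, height, factor):
        out_row = []
        for x in range(0, width, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


# Hypothetical 4x4 grayscale "original" (values 0..255).
original = [
    [0, 100, 200, 200],
    [100, 200, 200, 100],
    [50, 50, 0, 0],
    [50, 250, 0, 255],
]

thumb_nearest = downscale_nearest(original, 2)
thumb_box = downscale_box(original, 2)

print(thumb_nearest)  # [[0, 200], [50, 0]]
print(thumb_box)      # [[100, 175], [100, 63]]
print(thumb_nearest == thumb_box)  # False
```

Both results are legitimate 2x2 reductions of the same original, yet not a single pixel matches, which is why an exact comparison is hopeless unless you know which algorithm produced the thumbnails.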

We have to compare images of the same resolution, so we either reduce the large one or enlarge the small one. It is not obvious which is more effective. Comparing the small versions loses some information in exchange for a faster comparison and a simpler algorithm. That deserves a separate study, but for simplicity we will reduce the originals and compare only thumbnails.

Now the comparison itself (in a loop over all images, of course). Subtract one image from the other pixel by pixel. In the case of "full identity" the difference would be a completely black image. We will not get that. Instead we will see a "dirty" difference image (strictly speaking, the "absolute difference"; I say "difference" only to keep the explanation simple). The fewer bright pixels in the difference, the more similar the images. The branch of mathematics that deals with this is called "optimization methods" (search for it, borrow a textbook from the library, and so on). You still have to choose the criterion for "blackness" yourself, but that is a technical detail. The task is not trivial, but it is solvable.
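The comparison loop could be sketched roughly like this. It is only one possible reading of the idea: the function names, the file names, the pixel values and the 97% threshold are all assumptions, and the "blackness" criterion chosen here is the mean absolute difference scaled to a percentage. Images are plain lists of grayscale rows, and the original is assumed to be already reduced to thumbnail size.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference of two same-sized grayscale images."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must have the same dimensions")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def similarity(a, b):
    """Map the mean absolute difference to 0..100 percent (100 = identical)."""
    return 100.0 * (1.0 - mean_abs_diff(a, b) / 255.0)


def best_match(reduced_original, thumbnails, threshold=97.0):
    """Return (name, score) of the most similar thumbnail, or None if below threshold."""
    best = max(
        ((name, similarity(reduced_original, img)) for name, img in thumbnails.items()),
        key=lambda pair: pair[1],
        default=None,
    )
    if best is None or best[1] < threshold:
        return None
    return best


# Hypothetical data: one original already reduced to 2x2, and two candidates.
reduced = [[100, 175], [100, 63]]
candidates = {
    "a.png": [[101, 174], [99, 65]],  # same picture with slight compression noise
    "b.png": [[10, 240], [200, 0]],   # a different picture
}

print(best_match(reduced, candidates))
```

Here `a.png` wins with a similarity of about 99.5%, while `b.png` alone would fall below the threshold and yield `None`. For scans that differ only in a few digits, an average over the whole image may be too forgiving, so the threshold, or the criterion itself, would probably need tuning on real data.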
