Distributed image repository

Hi, Habr!
The task is to build a distributed image store. Images need to be kept in several sizes, and the original image may be stored on any of the servers.

We arrived at the following scheme: there is one physical machine that acts as the entry point, plus N additional storage servers. The entry-point server runs nginx, listening on port 80, and Apache. On each request nginx looks for the processed image on the local file system; if it does not find it there, it tries to find the image on the additional servers. If the image is not there either, nginx passes the request through Apache to a PHP script, which looks for the original of the requested image on the machines (the original is stored directly on one of them), performs the necessary processing, and saves the processed image to the storage. If the original is not found, PHP returns 404 headers in the response, and nginx then serves a placeholder image.
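
To make the lookup order above easier to discuss, here is a minimal Python sketch of it. It only models the decision chain, not nginx or Apache themselves: the dictionaries stand in for file systems, `find_original` and `resize` are hypothetical callbacks, and saving the result on the entry point is an assumption, since the question does not say which storage receives it.

```python
from __future__ import annotations

from typing import Callable, Optional

PLACEHOLDER = b"placeholder-image"  # hypothetical stand-in for the placeholder picture


def serve_image(
    key: str,
    local_cache: dict[str, bytes],
    remote_caches: list[dict[str, bytes]],
    find_original: Callable[[str], Optional[bytes]],
    resize: Callable[[bytes], bytes],
) -> tuple[int, bytes]:
    """Return (HTTP status, body) following the lookup order from the question."""
    # 1. Processed image on the entry point's own file system.
    if key in local_cache:
        return 200, local_cache[key]
    # 2. Processed image on one of the additional storage servers.
    for cache in remote_caches:
        if key in cache:
            return 200, cache[key]
    # 3. Fall back to the script: locate the original and process it.
    original = find_original(key)
    if original is None:
        # 4. No original anywhere: 404 with the placeholder as the body.
        return 404, PLACEHOLDER
    processed = resize(original)
    # Assumption: the processed image is saved on the entry point.
    local_cache[key] = processed
    return 200, processed
```

Step 2 is the part that grows with N, since every miss walks all additional servers before the script is even called.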

Please advise on the scheme (perhaps there are alternative ways of organizing the interaction that would suit us?) and help with the nginx configuration (specifically: can the headers returned by Apache be handled in nginx using "error_page"?).

Thank you all in advance!
October 3rd 19 at 03:10
5 answers
October 3rd 19 at 03:12
It is not clear why "the original image may be stored on any server".
It would be more logical to design the architecture so that the image name uniquely identifies its location.
1. At upload time, determine the server where the image should be stored.
2. Name the image according to its storage server.
3. On retrieval you know exactly where to look.
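
A rough Python sketch of these three steps, under assumptions of my own: the server names in `SERVERS` are hypothetical, the server is chosen with CRC32 of the uploaded name, and its index is stored as a prefix of the file name. Any other rule for choosing the server would work the same way, as long as the result ends up in the name.

```python
import zlib

SERVERS = ["img1", "img2", "img3"]  # hypothetical storage server names


def assign_name(original_name: str) -> str:
    """Steps 1-2: pick a server at upload time and embed its index in the name."""
    index = zlib.crc32(original_name.encode("utf-8")) % len(SERVERS)
    return f"{index}_{original_name}"


def locate(stored_name: str) -> str:
    """Step 3: recover the server from the stored name alone, without searching."""
    index, _, _ = stored_name.partition("_")
    return SERVERS[int(index)]
```

With this naming, `locate(assign_name("cat.jpg"))` always returns the server that was chosen at upload time, so no other server has to be asked.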
October 3rd 19 at 03:14
IMHO this will not be fast at all. Checking for the file locally does not take long, but checking the other N servers will be slow. And, again, the channel to the load balancer (the entry point) may become a bottleneck.

I would advise considering a variant in which the location of the images is already known when the HTML content is generated. Each server gets its own subdomain, for example img1.domain.com, img2.domain.com. That also makes it easy to add DNS balancing.

And, accordingly, if no ready-made image exists, emit the path to the PHP script instead.

Storing the id of the storage server next to each version of the image will not be difficult.
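
As an illustration of this answer, a small Python sketch of the URL choice made while the HTML is generated. The script name `resize.php` and its `src` parameter are hypothetical, and `store_id` is assumed to be whatever id was saved next to that version of the image (or `None` if the version has not been generated yet).

```python
from typing import Optional
from urllib.parse import quote


def image_url(path: str, store_id: Optional[int], domain: str = "domain.com") -> str:
    """Build the URL to put into the generated HTML for one image version."""
    if store_id is None:
        # No ready-made version recorded yet: point at the generating script.
        return f"http://{domain}/resize.php?src={quote(path)}"
    # The version exists: link straight to the subdomain of its storage server.
    return f"http://img{store_id}.{domain}/{quote(path)}"
```

For example, `image_url("thumbs/100x100/cat.jpg", 2)` yields a link to `img2.domain.com`, so the entry point is not involved in serving that image at all.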
October 3rd 19 at 03:16
The N thumbnail stores are mounted on the main machine over NFS.
Over NFS it is easy to check which server already has the processed image and redirect the request to it.
Originally NFS was meant only for checking whether the file exists, but after many tests it turned out that it also copes perfectly well with writing thumbnails to the other servers. It goes down no more than once a year.
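
The check itself can be sketched in a few lines of Python, assuming each store is mounted under its own directory on the main machine (the mount point paths are whatever you configured, they are not given in this answer):

```python
from __future__ import annotations

import os
from typing import Optional


def find_on_mounts(relative_path: str, mount_points: list[str]) -> Optional[str]:
    """Return the first mount point that already holds the processed image."""
    for mount in mount_points:
        # Over NFS this is an ordinary file-system check on the mounted directory.
        if os.path.isfile(os.path.join(mount, relative_path)):
            return mount
    return None
```

The returned mount point tells you which server to redirect to; `None` means the image still has to be generated.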
October 3rd 19 at 03:18
Why not keep a simple database on the entry point that stores the path to each image (or at least the name of its server)? A lookup would still be needed, but with a properly designed database there would be far fewer lookups, and they would run much faster.
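
A minimal sketch of such a lookup table in Python. SQLite is used here only because it needs no setup; the table and column names are made up for the example, and a real deployment would pick its own database.

```python
import sqlite3
from typing import Optional


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table mapping an image path to its server."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_location ("
        "path TEXT PRIMARY KEY, server TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, path: str, server: str) -> None:
    """Record (or update) which server holds the image."""
    conn.execute(
        "INSERT OR REPLACE INTO image_location (path, server) VALUES (?, ?)",
        (path, server),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Return the server name for the path, or None if it is unknown."""
    row = conn.execute(
        "SELECT server FROM image_location WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

Because `path` is the primary key, each request costs one indexed lookup instead of a check on every storage server.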
The database load grows significantly. A database connection plus a scripting language... With a large number of requests this is deadly. And database optimization plays only a very small role. - geraldine91 commented on October 3rd 19 at 03:21
L3n1n, not quite. The queries can be made directly from nginx. - buck commented on October 3rd 19 at 03:24
October 3rd 19 at 03:20
I want to offer another option.

0. Optional: look the image up on the local FS.
1. Apply a hash function to the image path that returns an integer, hashVal.
2. Select the front server with number hashVal % serversCount. If it is alive, fetch the image from that server; if the image is not there, generate it.
3. If the server is not alive, take the next one and go to step 2.
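
Steps 1 to 3 can be sketched in Python roughly as follows. CRC32 is used as the hash only for the example (the answer does not name one), and `is_alive` is a hypothetical health-check callback.

```python
from __future__ import annotations

import zlib
from typing import Callable, Optional


def pick_server(
    path: str, servers: list[str], is_alive: Callable[[str], bool]
) -> Optional[str]:
    """Hash the path, start at hashVal % serversCount, skip dead servers."""
    hash_val = zlib.crc32(path.encode("utf-8"))  # step 1
    start = hash_val % len(servers)  # step 2
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_alive(candidate):
            return candidate
        # Step 3: this server is down, so try the next one in order.
    return None  # every server is down
```

Note that when one server is down, all of its paths go to the single server that follows it in the list.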

In practice this solution has a flaw: after a server fails, the server that follows it becomes overloaded.
In live projects we used the following.
There are 1000 records in memcache, initialized with the values from step 2. When a front server goes down, its indexes are reassigned to live servers at random. When it comes back up, they are restored.
(It is actually a little more complicated, because each image always lives on 3 servers and access to them is balanced round-robin, but that does not matter for this question.)
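
Here is how I would sketch the 1000-record table in Python. This is my reading of the description, not the production code: a plain list stands in for memcache, CRC32 is an assumed hash, and the replication on 3 servers is left out.

```python
from __future__ import annotations

import random
import zlib

SLOTS = 1000  # number of records, as in the answer


def build_table(servers: list[str]) -> list[str]:
    """Initial assignment as in step 2: slot i belongs to server i % serversCount."""
    return [servers[i % len(servers)] for i in range(SLOTS)]


def fail_over(
    table: list[str], dead: str, alive: list[str], rng: random.Random
) -> list[str]:
    """Hand each slot of the dead server to a randomly chosen live server."""
    return [rng.choice(alive) if owner == dead else owner for owner in table]


def server_for(path: str, table: list[str]) -> str:
    """Map a path to a slot, then read the slot's current owner."""
    return table[zlib.crc32(path.encode("utf-8")) % len(table)]
```

Since the dead server's slots are scattered at random, its load is spread over all live servers instead of landing on a single neighbour. Calling `build_table` again after the server comes back restores the original assignment.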
