A web server for photos: which storage architecture to choose with scaling in mind?

Hello!

I have a web project with a lot of pictures.
I need advice on how to store the pictures with scaling in mind.
Right now all the pictures land on the web server's main partition, and free space keeps shrinking...
The plan is to buy a dedicated machine for static files, with its own domain such as my-cdn.com.
In nginx, create a subdomain fs1.my-cdn.com and put the files there. When it fills up, add a server fs2.my-cdn.com, and so on. In the pictures table, store the server number (or fs1) to know which machine a file is on.
I'm not strong in Linux, so nothing else comes to mind. The web server runs Debian.

P.S. I will appreciate any answers. Credit to everyone! :D Just no flooding, please.
June 26th 19 at 14:33
3 answers
June 26th 19 at 14:35
Solution
Have you considered connecting a storage service like S3?
I think it will be much more expensive. - myron_Schmeler98 commented on June 26th 19 at 14:38
Agreed: I don't want third-party services, and these are not rosy times for the dollar. It's 67 today, 100 R tomorrow, and then there's a new headache finding somewhere to move. I've seen enough of other people's experience with foreign-currency mortgages)) This is certainly not a mortgage, but still...
If there were only 2-3 terabytes of pictures, I wouldn't think twice... - jonas_Kohler36 commented on June 26th 19 at 14:41
June 26th 19 at 14:37
Solution
It is better to do something a little different. Set up one node as the entry point for static files. This entry node needs very few resources, but it serves as the interface to your whole system; you will avoid many problems and be able to scale easily.

All requests for static files should go to that node, for example files.example.com. Whoever makes the request doesn't know which server a file is on, and doesn't need to.

This node keeps track of the files and knows which server each file is on. Your application (the website) sends a request to the node (files.example.com) for a particular file; the node looks up in its database where the file is located (which server and which path) and returns the address, for example srv01.files.example.com/f/405/502.jpg

Thus you have a single interface as the entry point, and your code interacts with the files through this interface (the API). If you later need to change the algorithm, you only change what is behind the interface, not what is in front of it (the code in front of the interface won't even notice the change).

Later, for reliability, you can add mirrors.
Great advice, thanks. I'll think it over, and most likely that is how I'll design it! - myron_Schmeler98 commented on June 26th 19 at 14:40
June 26th 19 at 14:39
Solution
If you don't expect to migrate photos, then your option is quite reliable. For a mass migration from one server to another, you only need to UPDATE the column with the server name.
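
As a rough sketch of that mass migration, assuming a `pictures` table with a `server` column (the table layout and the `repoint` helper are made up for the example, and SQLite stands in for the real database):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pictures ("
    " id INTEGER PRIMARY KEY,"
    " server TEXT NOT NULL,"
    " path TEXT NOT NULL)"
)
db.executemany(
    "INSERT INTO pictures (server, path) VALUES (?, ?)",
    [("fs1", "/a.jpg"), ("fs1", "/b.jpg"), ("fs2", "/c.jpg")],
)


def repoint(db: sqlite3.Connection, old_server: str, new_server: str) -> int:
    """Point every row at new_server and return the number of rows changed.

    Run this only after the files themselves have been copied.
    """
    with db:  # commits on success, rolls back if the UPDATE fails
        cur = db.execute(
            "UPDATE pictures SET server = ? WHERE server = ?",
            (new_server, old_server),
        )
    return cur.rowcount


moved = repoint(db, "fs1", "fs3")
print(moved)  # 2
```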

If you want to delete/move/add files between servers, you still need to write a handler that monitors the space on the servers and directs new pictures to the right server: one where the load is light and there is IO capacity and space. Something like a load balancer.
They will move only in an extreme case, and then once every 3-4 years and not in large quantities. But the balancer would be very helpful. Right now it is manual control: whichever server number is set in the config is the one that gets filled... It is implemented but not yet running, because the time hasn't come. I want to study this question to eliminate some bugs at the design phase. - myron_Schmeler98 commented on June 26th 19 at 14:42
With a balancer for UPLOADING files it is all simple. Choose the parameters by which to measure the balancing. Usually these indicators are chosen:
- Average disk load in IOPS on the server over 5-10 minutes (write where it is lower)
- Free disk space on the server (check whether writing is possible at all)
- Network bandwidth usage over 1-3-5-10 minutes (IOPS may be low and space plentiful, but the channel is 99% full)

A script collects these parameters into a table.
Every few minutes an analyzer runs and, based on this data, assigns each node a weight for file uploads.
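
One possible shape for such an analyzer is sketched below. The `NodeStats` fields, the limits, and the formula (IO headroom multiplied by network headroom) are assumptions chosen for the example, not a recommended tuning.

```python
import random
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    avg_iops: float        # average disk IOPS over the sampling window
    free_bytes: int        # free disk space
    bandwidth_used: float  # fraction of the channel in use, 0.0..1.0


# Hypothetical limits; real values depend on the hardware.
MAX_IOPS = 1000.0
MIN_FREE_BYTES = 10 * 1024**3
MAX_BANDWIDTH = 0.95


def weight(s: NodeStats) -> float:
    """Upload weight of a node; 0.0 means do not write here."""
    if s.free_bytes < MIN_FREE_BYTES or s.bandwidth_used >= MAX_BANDWIDTH:
        return 0.0
    io_headroom = max(0.0, 1.0 - s.avg_iops / MAX_IOPS)
    net_headroom = 1.0 - s.bandwidth_used
    return io_headroom * net_headroom


def pick_node(stats: list[NodeStats]) -> str:
    """Pick a node for the next upload, at random in proportion to its weight."""
    weights = [weight(s) for s in stats]
    if sum(weights) == 0:
        raise RuntimeError("no node can accept uploads")
    return random.choices(stats, weights=weights, k=1)[0].name


stats = [
    NodeStats("fs1", avg_iops=200, free_bytes=500 * 1024**3, bandwidth_used=0.30),
    NodeStats("fs2", avg_iops=100, free_bytes=5 * 1024**3, bandwidth_used=0.10),
    NodeStats("fs3", avg_iops=900, free_bytes=800 * 1024**3, bandwidth_used=0.99),
]
print(pick_node(stats))  # fs1: fs2 is short on space, fs3 has a full channel
```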

But if you need to balance data between nodes, you have to take into account read load, network bandwidth, and the volume of occupied space. Based on this data, designate hot nodes with SSDs for frequently accessed files, plus medium and cold nodes. And think about how to run the migration on your production.
This is usually done at night, but it all depends on the load pattern of your files.
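A minimal sketch of choosing which files to move between the hot, medium and cold zones, using read frequency alone. The thresholds and the `FileStats` record are hypothetical, and a real planner would also weigh bandwidth and occupied space as described above.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to the real load pattern.
HOT_READS_PER_DAY = 100
COLD_READS_PER_DAY = 1


@dataclass
class FileStats:
    file_id: int
    tier: str             # current tier: "hot", "medium" or "cold"
    reads_per_day: float  # recent average read rate


def target_tier(f: FileStats) -> str:
    """Tier a file belongs in, judged by read rate only."""
    if f.reads_per_day >= HOT_READS_PER_DAY:
        return "hot"
    if f.reads_per_day < COLD_READS_PER_DAY:
        return "cold"
    return "medium"


def migration_plan(files: list[FileStats]) -> list[tuple[int, str, str]]:
    """List (file_id, from_tier, to_tier) for files sitting in the wrong tier."""
    return [
        (f.file_id, f.tier, target_tier(f))
        for f in files
        if f.tier != target_tier(f)
    ]


files = [
    FileStats(1, "hot", 500),
    FileStats(2, "hot", 12),
    FileStats(3, "medium", 0.2),
]
print(migration_plan(files))  # [(2, 'hot', 'medium'), (3, 'medium', 'cold')]
```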
For photos in ads, for example, the load is typically high in the first days, and after that the files can be moved to the medium zone. - jonas_Kohler36 commented on June 26th 19 at 14:45
Got it, thanks! That more or less cleared everything up. - jonas_Kohler36 commented on June 26th 19 at 14:48
