How do you minimize single points of failure?

Hello.
I'm currently going through "basic training", or rather studying the engineering of high availability.
I'm only just getting to know this field, so these are probably simple questions.

Still, please advise: how do you reduce single points of failure to a minimum?
Take Mail.ru, for example. Does their e-mail keep working even when a data center goes down?

We were given a use case: 3 servers, and the service has to be configured so that uptime is close to 100%.
My first thought was: server 1 is the load balancer, servers 2 and 3 are web servers, and so on.
But then the load balancer is a single point of failure.
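To see why the balancer matters so much, here is a minimal sketch that computes steady-state availability, assuming components fail independently. Components in series must all work; redundant components in parallel need only one survivor. The 99.9% figure per server is a made-up assumption for illustration, not a measurement.

```python
def series(*availabilities: float) -> float:
    """Availability of components that must ALL be up (a chain)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Availability of redundant components where ANY ONE being up suffices."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Assumed (hypothetical) availability of each of the three servers.
A_SERVER = 0.999

web_pair = parallel(A_SERVER, A_SERVER)  # servers 2 and 3
whole = series(A_SERVER, web_pair)       # server 1 (balancer) in front of them

print(f"web pair:    {web_pair:.6f}")
print(f"whole setup: {whole:.6f}")
```

The web pair alone gets close to six nines, but the whole setup ends up slightly below the balancer's own 99.9%: a chain can never be more available than its single weakest link.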

So the question arose: how do "big" companies "fight" this? Or, put differently, how do you keep the load balancer from becoming a point of failure?

Thank you
March 12th 20 at 08:01
5 answers
March 12th 20 at 08:03
Two load balancers with the same addresses (a pair in a cluster, four in total), located in different regions of the country, with each balancer connected by two uplinks to different routers via eBGP. The balancers in each pair are kept in constant sync (heartbeat). Add redundancy at the virtualization level (vMotion + Fault Tolerance) and for power (two independent feeds + UPS + diesel generator). Then measure statistically the downtime, the recovery time, and the duration of preventive maintenance, and scale the scheme above wherever availability falls short of the desired level.

For some of our sites, where there is a risk of political unrest in transit countries, two satellite links are also connected.
March 12th 20 at 08:05
As for how "big" companies "fight" this:
it is usually done roughly like this.
In engineering in general, by duplication, sometimes several times over (aircraft hydraulic controls can have fourfold redundancy).
In large, mature companies the entry point is usually a high-end hardware network load balancer that spreads the load across clusters.
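The arithmetic behind duplicating "several times over" is simple if you assume the copies fail independently (a strong assumption: a shared power feed or a common software bug breaks it). The unavailability of n redundant copies is then the product of their individual unavailabilities. The 99% per-copy figure below is an arbitrary example.

```python
def redundant_availability(a: float, n: int) -> float:
    """Availability of n independent copies where any single working copy suffices."""
    return 1.0 - (1.0 - a) ** n


# Hypothetical per-copy availability of 99% (e.g. one hydraulic circuit or one balancer).
for n in range(1, 5):
    print(n, f"{redundant_availability(0.99, n):.8f}")
```

At 99% per copy, each extra copy multiplies the unavailability by 0.01, i.e. adds two nines, so four copies reach about 99.999999%.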
Well, by "big" companies I meant MRG (Mail.Ru Group), Yandex, Google.
After all, they must have six nines of uptime, and a single minute of search engine downtime is a huge loss.

Is that really how they do it?
"the entry point is usually a high-end hardware network load balancer that spreads the load across clusters"
- Ignacio commented on March 12th 20 at 08:08
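For scale, "six nines" translates into very little downtime. A quick calculation, assuming a 365-day year:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Maximum yearly downtime for an availability of `nines` nines (e.g. 3 -> 99.9%)."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for nines in (3, 4, 5, 6):
    print(f"{nines} nines: {allowed_downtime_seconds(nines):,.1f} s/year")
```

Six nines leaves about 31.5 seconds per year, so a single one-minute outage uses up the budget for roughly two years.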
@Ignacio, you can check it yourself; the information is public. At HighLoad++, Odnoklassniki talked about their architecture. - Xzavier98 commented on March 12th 20 at 08:11
March 12th 20 at 08:07
You can do the balancing at the DNS level.
And, as I understand it, you mean tunneling (proxying).
PS
Another option is a failover IP plus a backup balancer.
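DNS-level balancing usually means the authoritative server hands out several A records and rotates them, ideally dropping addresses whose health checks fail. Here is a minimal simulation of that idea; the class is hypothetical, the IP addresses are documentation-range placeholders, and health-check results are passed in rather than probed over the network.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy authoritative server: rotates A records and skips unhealthy backends."""

    def __init__(self, addresses: list[str]) -> None:
        self.addresses = addresses
        self.healthy = set(addresses)
        self._rotation = cycle(range(len(addresses)))

    def mark(self, address: str, up: bool) -> None:
        """Record the result of an (assumed, external) health check."""
        if up:
            self.healthy.add(address)
        else:
            self.healthy.discard(address)

    def answer(self) -> list[str]:
        """Return healthy records, rotated so each query starts at a different one."""
        start = next(self._rotation)
        rotated = self.addresses[start:] + self.addresses[:start]
        return [a for a in rotated if a in self.healthy]


dns = RoundRobinDNS(["192.0.2.10", "192.0.2.20", "192.0.2.30"])
print(dns.answer())  # ['192.0.2.10', '192.0.2.20', '192.0.2.30']
print(dns.answer())  # ['192.0.2.20', '192.0.2.30', '192.0.2.10']
dns.mark("192.0.2.20", up=False)
print(dns.answer())  # ['192.0.2.30', '192.0.2.10']
```

This only controls what the server answers. Resolvers and clients that already cached the dead address keep using it until the record expires, which is the objection raised in the comment that follows.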
DNS-based balancing only works on a local network.
On the Internet, caching of failed DNS answers gets in the way. And no, lowering the TTL does not fully solve the problem:
https://habr.com/company/ivi/blog/237349/
https://habr.com/company/ivi/blog/240237/
The DNS-based solution is, in principle, understandable and workable. But our operating experience exposed all of these downsides. In addition, taking a node out for maintenance is damn complicated: you have to wait until the caches expire on every device along the path to the user (by the way, it turns out that a huge number of home routers completely ignore the TTL in DNS records and keep the cache until they lose power). And what happens if a site suddenly has an outage is terrible even to think about! One more thing: when a subscriber has a problem, it is very hard to tell which node is serving them, because it depends on several factors: what region they are in and which DNS server they use. In general, a lot of ambiguity.
- margret_Fay99 commented on March 12th 20 at 08:10
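The caching problem described above can be illustrated with a toy resolver cache. This sketch assumes a well-behaved cache honours the TTL; the hypothetical `ignore_ttl` flag mimics the home routers mentioned above that keep entries until they lose power. Names, addresses, and times are made up.

```python
class CachingResolver:
    """Toy caching resolver sitting between a client and the authoritative DNS."""

    def __init__(self, ignore_ttl: bool = False) -> None:
        self.ignore_ttl = ignore_ttl
        self.cache: dict[str, tuple[str, float]] = {}  # name -> (address, expiry time)

    def resolve(self, name: str, now: float, zone: dict[str, str], ttl: float) -> str:
        cached = self.cache.get(name)
        if cached and (self.ignore_ttl or now < cached[1]):
            return cached[0]  # served from cache, possibly stale
        address = zone[name]  # ask the authoritative server
        self.cache[name] = (address, now + ttl)
        return address


zone = {"www.example.com": "192.0.2.10"}
good = CachingResolver()
bad = CachingResolver(ignore_ttl=True)
for r in (good, bad):
    r.resolve("www.example.com", now=0, zone=zone, ttl=60)

# 192.0.2.10 fails; the operator points the record at the healthy server.
zone["www.example.com"] = "192.0.2.20"

for t in (30, 61, 3600):
    print(t, good.resolve("www.example.com", t, zone, 60), bad.resolve("www.example.com", t, zone, 60))
```

With a 60 s TTL, clients behind the well-behaved resolver stay on the dead address for up to a minute; behind the one that ignores the TTL, indefinitely. Lowering the TTL shrinks the first window but does nothing about the second.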
@margret_Fay99, in any case, that's just a proxy. - jay_King17 commented on March 12th 20 at 08:13
@jay_King17, there are more options than DNS alone... - margret_Fay99 commented on March 12th 20 at 08:16
@margret_Fay99, there certainly are. - jay_King17 commented on March 12th 20 at 08:19
March 12th 20 at 08:09
Load balancers alone aren't enough; put DNS on top of them :)
March 12th 20 at 08:11
In other words, bring the service up, then multiply it by 2 (two servers, two switches, two server power supplies, two UPSes, etc.).
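"Multiply by 2" can be checked with the same independence assumption. This sketch compares a chain of single components (server, switch, server power supply, UPS) with the same chain where every component has a twin. The per-component availabilities are invented for illustration.

```python
from math import prod

# Hypothetical availabilities for each link in the chain.
COMPONENTS = {"server": 0.999, "switch": 0.9995, "power supply": 0.9999, "ups": 0.999}


def chain(availabilities: list[float]) -> float:
    """Everything in the chain must work."""
    return prod(availabilities)


def duplicated_chain(availabilities: list[float]) -> float:
    """Every component has an independent twin; a pair fails only if both fail."""
    return prod(1.0 - (1.0 - a) ** 2 for a in availabilities)


values = list(COMPONENTS.values())
print(f"single chain:     {chain(values):.6f}")
print(f"duplicated chain: {duplicated_chain(values):.6f}")
```

Duplication shrinks each link's unavailability from p to p², which is why it pays off so well, but only as long as the twins really are independent (separate power feeds, separate switches).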
And what if both load balancers crash? - Ignacio commented on March 12th 20 at 08:14
@Ignacio, put in 10. It's all about money. Put in 3 balancers. And the floor they stand on can give way; have you reinforced it? Is the rack clad in metal against fire? And if the world ends tomorrow, why bother doing anything at all?
Silly question, comrade. There is always some risk. - Bernardo58 commented on March 12th 20 at 08:17
