MySQL replica lags behind the master, or is the server itself just very slow?

Hello! A strange situation. There is a MySQL (MariaDB) master. Some time ago I set up a slave for dev, and everything works well there, no lag. We bought new hardware with the same specs as the master, I set up a second slave on it, and I get a replica lag of 1-3 hours on average. The configs are all the same, nothing new... The server is a Supermicro. At the start I ran into the processors running in power-saving mode; that is fixed. Maybe some other limits are still switched on, I can't tell.
But the interrupt load is strangely high, although the amount of data being written is quite small.
atop
PRC | sys 5.00 s | user 26.52 s | #proc 575 | #trun 6 | #tslpi 1071 | #tslpu 0 | #zombie 0 | clones 7 | #exit 0 |
CPU | sys 37% | user 218% | irq 21% | idle 3693% | wait 34% | steal 0% | guest 0% | curf 2.15 GHz | curscal 97% |
CPL | avg1 3.95 | avg5 3.72 | avg15 3.73 | | csw 360872 | | intr 364166 | | numcpu 40 |
MEM | tot 125.8 G | free 49.5 G | cache 61.5 G | buff 1.3 G | slab 1.5 G | shmem 99.5 M | vmbal 0.0 M | hptot 0.0 M | hpuse 0.0 M |
SWP | tot 8.0 G | free 7.7 G | | | | | | vmcom 55.1 G | vmlim 70.9 G |
DSK | sda | busy 96% | read 17 | write 4984 | KiB/r 20 | KiB/w 12 | MBr/s 0.0 | MBw/s 4.9 | avio 2.33 ms |
NFS | rpc 1 | cread 0 | cwrit 0 | MBcr/s 0.0 | MBcw/s 0.0 | nettcp 1 | netudp 0 | badaut 0 | badcln 0 |
NET | transport | tcpi 195613 | tcpo 252297 | udpi 53 | udpo 67 | tcpao 3723 | tcppo 3136 | tcprs 217 | udpie 0 |
NET | network | ipi 195667 | ipo 206104 | ipfrw 0 | deliv 195667 | | | icmpi 0 | icmpo 0 |
NET | eno1 11% | pcki 225328 | pcko 206800 | sp 1000 Mbps | si 119 Mbps | so 68 Mbps | erri 0 | erro 0 | drpo 0 |
NET | lo ---- | pcki 45782 | pcko 45782 | sp 0 Mbps | si 9524 Kbps | so 9524 Kbps | erri 0 | erro 0 | drpo 0 |

 TID PID RDDSK WRDSK WCANCL DSK CMD 1/6
 1640 - 352K 33076K 388K 100% mysqld
 692 - 24K 0K 0K 0% jbd2/sda2-8
20006 - 8K 0K 0K 0% apache2


iotop
Total DISK READ : 30.16 K/s | Total DISK WRITE : 3.75 M/s
Actual DISK READ: 30.16 K/s | Actual DISK WRITE: 5.84 M/s
 TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 
 1706 be/4 mysql 0.00 B/s 644.67 K/s 0.00 % 93.35 % mysqld
 1693 be/4 mysql 30.16 K/s 3.77 K/s 0.00 % 2.31 % mysqld
 692 be/3 root 0.00 B/s 0.00 B/s 0.00 % 1.54 % [jbd2/sda2-8]
 1697 be/4 mysql 0.00 B/s 3.06 M/s 0.00% 0.95 % mysqld
 1676 be/4 mysql 0.00 B/s 0.00 B/s 0.00% 0.80 % mysqld
 1705 be/4 mysql 0.00 B/s 52.78 K/s 0.00% 0.01 % mysqld
 1 be/4 root 0.00 B/s 0.00 B/s 0.00% 0.00 % init maybe-ubiquity
 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
 4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]


The server has 4 SSDs on a MegaRAID controller, in RAID 10.
March 25th 20 at 13:28
3 answers
March 25th 20 at 13:30
Solution
Solved it. The problem was slow disk writes. I wiped the server, rebuilt the RAID with Linux tools (software RAID), installed Debian 10 this time, and everything is fine :).

Thank you so much to everyone who helped!
March 25th 20 at 13:32
Look at Consul and Nomad. That is a great solution: master-master in real time.
I don't understand anything...
If this is about MySQL: master-master also comes with plenty of pitfalls. I'd rather switch over by hand in case of a failure and calmly restore the old master...

The trouble is disk speed: it refuses to write faster than 16 megabytes per second. That is why I suspect the Supermicro is still saving power somewhere and is not giving the SSD drives full performance - Naomi.Rodriguez commented on March 25th 20 at 13:35
And what do you need MySQL for? :-) That is a decent speed. A database usually doesn't need more than that, unless of course you are running highload - Lowell.Olson51 commented on March 25th 20 at 13:38
How would you run a cluster? :-) - Lowell.Olson51 commented on March 25th 20 at 13:41
Well, sure, but on the same hardware the master writes at 180 megabytes per second.
As for a cluster, what we have is enough: Apache runs in it spread across 3 servers, with cached data for Apache, and that causes no problems.... - Naomi.Rodriguez commented on March 25th 20 at 13:44
March 25th 20 at 13:34
Statements to check
1. The clock speed per core on your new server is no lower than on the old one (don't tell me about the core count).
2. Your master and slave are in the same datacenter, with a good link between them (a shared one or dedicated routing).
3. When connecting to the database you specify a domain name instead of localhost.
4. In iotop, I/O has not risen above 60% over 10 minutes.
5. You have NVMe; I hope that in 2019 you are not using anything else.
6. The size of your database matches the settings in my.cnf, and all tables fit in the cache, and so on?
7. You know that a socket connection is faster than a regular (TCP) one.
8. You know that localhost and 83.32.113.32 are handled by different mechanisms.
Yes, everything is on a local gigabit network. With innodb_flush_log_at_trx_commit=2 the slave caught up with the master very quickly. The problem is that the disk write speed is 36 MB/s... - Naomi.Rodriguez commented on March 25th 20 at 13:37
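For context on why that setting helps: with innodb_flush_log_at_trx_commit=1 (the default) InnoDB flushes its redo log to disk at every commit, while with 2 it writes the log at commit and flushes it to disk about once per second. Slow synchronous writes then hurt far less, at the cost of possibly losing about a second of transactions on an OS crash or power failure. A minimal sketch of checking and changing the value at runtime, assuming the mysql client can connect with sufficient privileges; the function names are made up for illustration:

```shell
#!/bin/sh
# Show the current redo log flush policy on this server.
show_flush_policy() {
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"
}

# Relax the flush policy until the next restart.
# This does not touch my.cnf, so the change is not persistent.
relax_flush_policy() {
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2"
}
```

Calling show_flush_policy before and after relax_flush_policy confirms the change took effect. This works around slow disk writes rather than fixing them.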
@Naomi.Rodriguez,
post the iotop output here.
And, if it is not a secret, what kind of drive gives that write speed?
And what about reads?
You didn't build the RAID out of floppy drives, did you? - meggi commented on March 25th 20 at 13:40
@meggi, the iotop output is in the original post. The rest of the tests are in the comments... - Naomi.Rodriguez commented on March 25th 20 at 13:43
Well, then it is a problem with the drives.
I/O is overloaded; don't look for the problem in MySQL - meggi commented on March 25th 20 at 13:46
Well, yes, I'm no longer looking there, I'm slowly coming around to that. Now I need to figure out where the catch is, because the drives look like perfectly ordinary drives... - Naomi.Rodriguez commented on March 25th 20 at 13:49
@Naomi.Rodriguez,
1. Post the SMART data.
2. Check the read/write speed of each drive and of the RAID, and post the results.
3. Show fstab.
4. Drive models. - meggi commented on March 25th 20 at 13:52
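Small synchronous writes are closer to what a database does on commit than one large sequential write. A minimal sketch of such a test against a scratch file, assuming GNU dd (needed for oflag=dsync). The function name and the 16k default block size (the default InnoDB page size) are illustrative choices, not something from the thread:

```shell
#!/bin/sh
# sync_write_test DIR [BS] [COUNT]
# Writes COUNT blocks of size BS into a scratch file under DIR, waiting for
# each block to reach stable storage (oflag=dsync), then prints the summary
# line from dd and removes the scratch file.
sync_write_test() {
    dir=$1
    bs=${2:-16k}
    count=${3:-1000}
    file="$dir/sync_write_test.$$"
    LC_ALL=C dd if=/dev/zero of="$file" bs="$bs" count="$count" oflag=dsync 2>&1 | tail -n 1
    rm -f "$file"
}

# Usage (the directory is a placeholder; pick one on the filesystem under test):
# sync_write_test /path/to/scratch/dir 16k 1000
```

Running it with a few block sizes on the same filesystem shows whether throughput collapses only for small blocks.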
@meggi, here, take a look. I was digging into this only last night...

dd if=/dev/zero of=/root/test bs=128k count=10k oflag=dsync ; rm -f /root/test
1342177280 bytes (1,3 GB, 1.2 GiB) copied, 107,936 s, 12,6 MB/s

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2,42738 s, 442 MB/s


So the problem appears when there are many small writes. One CPU core is loaded to the limit. Now I need to work out how to solve this... - Naomi.Rodriguez commented on March 25th 20 at 13:55
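The two dd runs can be turned into per-write figures. A minimal sketch that uses only the block counts and timings quoted above; the function name per_write_stats is made up for illustration, and awk is assumed to be available:

```shell
#!/bin/sh
# per_write_stats COUNT SECONDS
# Prints synchronous writes per second and the average time per write (ms)
# for a dd run that wrote COUNT blocks in SECONDS seconds with oflag=dsync.
per_write_stats() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN {
        printf "%.1f writes/s, %.2f ms per write\n", n / t, t / n * 1000
    }'
}

# bs=128k count=10k (10240 blocks) took 107.936 s
per_write_stats 10240 107.936
# bs=1G count=1 took 2.42738 s
per_write_stats 1 2.42738
```

The first call prints "94.9 writes/s, 10.54 ms per write". Roughly 10 ms per synchronous 128 KiB write fits the observation that the slowdown shows up on many small writes rather than on one large sequential write.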
@Naomi.Rodriguez, no, an SSD has no such problem, because there is no head physically writing to a platter and no waiting for the block to rotate around on the spindle to the head.
So writes, parallel reads, and small files do not affect speed on an SSD.
You have an obvious problem with the disk,
but you stubbornly refuse to answer some of the questions, so let's go for a second round:
drive models, what kind of RAID, how the RAID is set up, fstab - meggi commented on March 25th 20 at 13:58
I had only checked with a big file, and the speed was good. With a lot of small writes one CPU core is 100% loaded, so I thought the CPU was the bottleneck: the disk copes and the CPU does not...

As for the RAID, there are no errors, all is well... The listing is very long...
On all drives:
Media Error Count: 0
Other Error Count: 0


cat /etc/fstab 
UUID=9d056c79-deb8-4fae-9bac-b908d267a70b / ext4 defaults 0 0
/swap.img none swap sw 0 0


Device Model: INTEL SSDSC2KG480G8

RAID 10 of 4 drives

megacli -LDInfo -Lall -aAll


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :Volume0
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 893.25 GB
Sector Size : 512
Is VD emulated : Yes
Mirror Data : 893.25 GB
State : Optimal
Strip Size : 256 KB
Number Of Drives per span:2
Span Depth : 2
Default Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No


For the other questions, tell me how to check... - Naomi.Rodriguez commented on March 25th 20 at 14:01
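The cache-related fields are easy to miss in a long LDInfo listing. A small sketch that pulls them out, assuming the listing is piped in as text in the format pasted above. The helper name ld_cache_summary is hypothetical, and whether the write policy played any part in the slow small writes was not established in this thread:

```shell
#!/bin/sh
# ld_cache_summary: read `megacli -LDInfo` text on stdin and print the
# write policy (first comma-separated field of "Current Cache Policy")
# and the "Disk Cache Policy" value.
ld_cache_summary() {
    LC_ALL=C awk -F': *' '
        /^ *Current Cache Policy/ { split($2, p, ","); print "write policy: " p[1] }
        /^ *Disk Cache Policy/    { print "disk cache: " $2 }
    '
}

# Example with the two relevant lines from the listing above
ld_cache_summary <<'EOF'
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disk's Default
EOF
```

For the pasted listing this reports WriteThrough. In that mode the controller acknowledges a write only after the drives have it, so the controller cache does not absorb small synchronous writes.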
What is the write speed in a single thread? - meggi commented on March 25th 20 at 14:04
It was shown above: 442 MB/s

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2,42738 s, 442 MB/s
- Naomi.Rodriguez commented on March 25th 20 at 14:07
@Naomi.Rodriguez, did you bypass the cache? - meggi commented on March 25th 20 at 14:10
@meggi, yes, with the oflag=dsync option. - Naomi.Rodriguez commented on March 25th 20 at 14:13
That's a mess.
Give more details: is the server your own or rented,
what was done to it, what OS is on it, if any,
in general, more information:
why not software RAID, and so on - meggi commented on March 25th 20 at 14:16
It is our own server, new, Ubuntu 18.04. A Docker cluster is configured on it (I stopped it, no difference). I find hardware RAID more convenient, although software RAID works too. I would be happy to check each drive separately, but I don't know how to do that with this RAID. If nothing helps, I could go to the datacenter, pull the server, test each drive, and so on. It is a long trip, and everything would have to be taken down (...

Actually, I was wrong about the CPU load: something else was loading the CPU :).

In short, I don't understand why I/O eats up all the interrupts... - Naomi.Rodriguez commented on March 25th 20 at 14:19
Hardware RAID is in essence the same software, just running on a chip (it is not some analog solution, it is software too).
Several consequences follow:
1. If this controller burns out you will need exactly the same controller and no other, and it will burn out sooner than in 5 years.
2. Cheap or even mid-range controllers may be even less efficient than software RAID, because the board may carry a cheap processor; the gain is only in expensive controllers.
3. There is no way to reach the disks directly from the OS.
So I use software RAID everywhere; it is easier to add a couple of drives than a new controller, and then a whole stack of them.
Besides, NVMe drives hardly need that kind of acceleration, and their price is still reasonable.

I see this option:
1. Pull one of the disks out of the RAID and mount it as a normal block device
to test its speed.
2. What kind of RAID is it? - meggi commented on March 25th 20 at 14:22
Well, yes, there are pros and cons :). Maybe the next server will have to go on software RAID...
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
RAID 10.
All I found in the BIOS was how to set a drive offline. I did that, but the drive still stayed in the array, and it is not visible from Linux... - Naomi.Rodriguez commented on March 25th 20 at 14:25
@meggi, we'll go to the data center next week. Without pulling the server, we'll set up software RAID 10 and see whether one of the drives was misbehaving :) - Naomi.Rodriguez commented on March 25th 20 at 14:28
