Task #9956
closed
Please slow down HAProxy/Nagios checks on Resource Registry
100%
Description
Looking at logs and accounting, I found that the service receives 1.33 checks per second (one check every 0.75 seconds).
The normal rate should be 1 check every 3 seconds. Can you please investigate it?
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from New to In Progress
The count is correct. Each haproxy instance checks every 3 seconds, but there are two haproxy instances and each one checks both http and https, so the service receives 4 checks every 3 seconds.
Each protocol receives a check every 3 seconds from the active haproxy, and 4 failures are needed to declare a service dead. So, in the worst case, 12 seconds pass before a non-responsive server is disconnected.
To me, 12 seconds is already too long. But if you can accept longer outages, we can increase the check interval.
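The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not HAProxy code: it assumes that each haproxy instance checks each protocol on its own independent timer, and that a server is declared dead after a fixed number of failed checks. The parameter names inter and fall are borrowed from HAProxy's server options for illustration only; the ticket does not show the actual configuration, and the function names are hypothetical.

```python
# Back-of-the-envelope model of the health-check traffic described above.
# Assumption: each haproxy instance probes each protocol independently,
# once every `inter` seconds, and marks a server down after `fall` failures.
# Names are illustrative, not taken from the real haproxy configuration.


def checks_per_second(instances: int, protocols: int, inter: float) -> float:
    """Aggregate rate of checks arriving at one Resource Registry server."""
    return instances * protocols / inter


def worst_case_detection(inter: float, fall: int) -> float:
    """Seconds before one haproxy instance declares an unresponsive server dead."""
    return inter * fall


rate = checks_per_second(instances=2, protocols=2, inter=3.0)
print(f"{rate:.2f} checks/s, one every {1 / rate:.2f} s")
print(f"worst case: {worst_case_detection(inter=3.0, fall=4):.0f} s")
```

With two instances and two protocols this reproduces the 1.33 checks per second seen in the logs, while the worst case for a single instance stays at 3 × 4 = 12 seconds.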
Updated by Luca Frosini over 7 years ago
The load balancer is contacted using https. The instances have to respond on http too because of the whn-manager. Can you please remove http from the load balancer?
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
Done. The load balancer now balances only https access to the Resource Registry.
Updated by Luca Frosini over 7 years ago
The requests are now about 40 per minute on each instance (they were 80 per minute). This means that one request arrives every 1.5 seconds, not one every 3 seconds.
If I understand correctly, now 6 seconds are needed in the worst case before a non-responsive server is disconnected.
Can we slow down to 1 request every 2 seconds? That could be a good compromise (8 seconds would be needed in the worst case before a non-responsive server is disconnected).
Updated by Andrea Dell'Amico over 7 years ago
Luca Frosini wrote:
The requests are now about 40 per minute on each instance (they were 80 per minute). This means that one request arrives every 1.5 seconds, not one every 3 seconds.
If I understand correctly, now 6 seconds are needed in the worst case before a non-responsive server is disconnected.
No, nothing changed: the check interval is still 3 seconds, and the worst-case scenario is still 12 seconds. The two haproxy instances are independent of each other, and each backend inside the same haproxy instance is independent too.
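The same model, applied to the numbers reported after http was removed, shows why halving the check traffic does not change the detection time. It is a sketch under the assumptions stated in this thread (two independent haproxy instances, a 3-second interval, 4 failures to declare a server dead); the constant and function names are hypothetical.

```python
# Compare the check traffic seen by one server with the detection time,
# before and after http was dropped from the load balancer.
# Assumption: 2 independent haproxy instances, 3 s interval, 4 failures.

INSTANCES = 2
INTER = 3.0  # seconds between checks from ONE haproxy instance, per protocol
FALL = 4     # failures needed before that instance marks the server down


def checks_per_minute(protocols: int) -> float:
    """Checks arriving at one server per minute, summed over all instances."""
    return INSTANCES * protocols * 60 / INTER


def worst_case_detection() -> float:
    """Each instance decides on its own, so only its own interval matters."""
    return INTER * FALL


for label, protocols in (("http + https", 2), ("https only", 1)):
    per_min = checks_per_minute(protocols)
    print(f"{label}: {per_min:.0f} checks/min "
          f"(one every {60 / per_min:.2f} s on the wire), "
          f"worst case {worst_case_detection():.0f} s")
```

The https-only case gives 40 checks per minute, one every 1.50 s as seen by the server, which matches the observed rate; the worst case stays at 12 s because no single haproxy instance checks more often than every 3 seconds.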
Can we slow down to 1 request every 2 seconds? That could be a good compromise (8 seconds would be needed in the worst case before a non-responsive server is disconnected).
That would move the worst-case scenario to 16 seconds, not 8, because each haproxy instance would then check only every 4 seconds.
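Working backwards from the requested rate makes the 16-second figure explicit: the rate a server sees is the merged stream from both haproxy instances, so each instance's own interval must be proportionally longer. Again this is a sketch assuming https only, two independent haproxy instances and 4 failures; the names are hypothetical.

```python
# Translate a desired arrival interval at the server (what the Resource
# Registry logs show) into the per-instance check interval haproxy would
# need, and the resulting worst-case detection time.
# Assumption: https only, 2 independent haproxy instances, 4 failures.

INSTANCES = 2
PROTOCOLS = 1
FALL = 4


def per_instance_interval(observed_interval: float) -> float:
    """Each instance checks INSTANCES * PROTOCOLS times less often than the merged stream."""
    return observed_interval * INSTANCES * PROTOCOLS


def worst_case_detection(observed_interval: float) -> float:
    """Worst case depends on a single instance's interval times the failure count."""
    return per_instance_interval(observed_interval) * FALL


for observed in (1.5, 2.0):
    print(f"one check every {observed} s at the server -> "
          f"per-instance interval {per_instance_interval(observed):.0f} s, "
          f"worst case {worst_case_detection(observed):.0f} s")
```

The current 1.5 s spacing corresponds to a 3 s interval per instance (12 s worst case), while the proposed 2 s spacing would need a 4 s interval per instance (16 s worst case).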
Updated by Luca Frosini over 7 years ago
- Status changed from Feedback to Closed
OK, I understand. I'm going to close the ticket.