Task #9956

closed

Please slow down HAProxy/Nagios checks on Resource Registry

Added by Luca Frosini over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: System Application
Target version:
Start date: Oct 16, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

Looking at the logs and the accounting data, I found that the service receives 1.33 checks per second (one check every 0.75 seconds).
The normal rate should be 1 check every 3 seconds. Can you please investigate?

Actions #1

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from New to In Progress

The count is correct. haproxy checks every 3 seconds, but there are two haproxy instances, and each instance checks both http and https.

Each protocol receives a check every 3 seconds from the active haproxy, and 4 failures are needed to declare a service dead. So, at worst, 12 seconds pass before a non-responsive server is disconnected.
To me, 12 seconds is already way too long. But if you can accept longer service disruptions, we can increase the check interval.
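
The arithmetic behind these figures can be sketched as follows. This is only an illustration of the numbers quoted in this ticket, not the actual haproxy configuration; the function names are hypothetical, and it assumes every checker (each protocol on each haproxy instance) probes the server independently at the same interval.

```python
def aggregate_check_rate(instances, backends_per_instance, interval_s):
    """Checks per second seen by one server from all independent checkers."""
    return instances * backends_per_instance / interval_s


def worst_case_detection_s(interval_s, failures_needed):
    """Upper bound on the time before a dead server is disconnected."""
    return interval_s * failures_needed


# Figures from the ticket: 2 haproxy instances, http + https, 3 s interval.
rate = aggregate_check_rate(instances=2, backends_per_instance=2, interval_s=3)
print(f"{rate:.2f} checks/s, one every {1 / rate:.2f} s")  # 1.33 checks/s, one every 0.75 s

# 4 consecutive failures at a 3 s interval.
print(f"worst case: {worst_case_detection_s(3, 4)} s")  # worst case: 12 s
```

The observed rate grows with the number of checkers, while the worst-case detection time depends only on the per-checker interval and the failure threshold.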

Actions #2

Updated by Luca Frosini over 7 years ago

The load balancer is contacted via https. The instances have to respond on http too because of the whn-manager. Can you please remove http from the load balancer?

Actions #3

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 100

Done. The load balancer now balances the resource registry https accesses only.

Actions #4

Updated by Luca Frosini over 7 years ago

The requests are now about 40 per minute on each instance (they were 80 per minute). This means that one request arrives every 1.5 seconds, not every 3 seconds.
If I understand correctly, 6 seconds are now needed at worst before disconnecting a non-responsive server.

Can we slow down to 1 request every 2 seconds, which would be a good compromise (8 seconds needed at worst before disconnecting a non-responsive server)?

Actions #5

Updated by Andrea Dell'Amico over 7 years ago

Luca Frosini wrote:

The requests are now about 40 per minute on each instance (they were 80 per minute). This means that one request arrives every 1.5 seconds, not every 3 seconds.
If I understand correctly, 6 seconds are now needed at worst before disconnecting a non-responsive server.

No, nothing changed: the check interval is still 3 seconds, and the worst-case scenario is 12 seconds. The two haproxy instances are independent of one another, and each backend inside the same haproxy instance is independent too.

Can we slow down to 1 request every 2 seconds, which would be a good compromise (8 seconds needed at worst before disconnecting a non-responsive server)?

That would move the worst case scenario to 16 seconds, not 8.
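
One way to read the 16-second figure, sketched under the assumption that the requests seen by the server come from two independent checkers (one https backend on each haproxy instance) and that 4 failures are still required. The helper name and constants are hypothetical and only restate the numbers from this thread.

```python
def interval_for_observed_period(observed_period_s, checkers):
    """Per-checker interval needed so the server sees one check every observed_period_s."""
    return observed_period_s * checkers


CHECKERS = 2         # assumption: two haproxy instances, https backend only
FAILURES_NEEDED = 4  # failure threshold quoted earlier in the ticket

for observed in (1.5, 2.0):
    interval = interval_for_observed_period(observed, CHECKERS)
    worst_case = interval * FAILURES_NEEDED
    print(f"one request every {observed} s -> interval {interval} s, worst case {worst_case} s")
# one request every 1.5 s -> interval 3.0 s, worst case 12.0 s
# one request every 2.0 s -> interval 4.0 s, worst case 16.0 s
```

Under these assumptions, the request period observed on the server is half the per-instance check interval, so the worst case has to be computed from the interval and not from the observed period.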

Actions #6

Updated by Luca Frosini over 7 years ago

  • Status changed from Feedback to Closed

OK, understood. I'm going to close the ticket.
