Task #9956: Please slow down HAProxy/Nagios checks on Resource Registry - D4Science Infrastructure - D4science

Task #9956

closed

Please slow down HAProxy/Nagios checks on Resource Registry

Added by Luca Frosini about 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: System Application
Target version: Nagios monitoring
Start date: Oct 16, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

Looking at the logs and accounting data, I found that the service receives 1.33 checks per second (one check every 0.75 seconds).
The normal rate should be 1 check every 3 seconds. Can you please investigate?

Updated by Andrea Dell'Amico about 8 years ago

Status changed from New to In Progress

The count is correct. haproxy checks every 3 seconds, but there are two haproxy instances, and each instance checks both http and https.

Each protocol receives a check every 3 seconds from the active haproxy, and 4 failures are needed to declare a service dead. So, at worst, 12 seconds pass before a non-responsive server is disconnected.
To me, 12 seconds is already too long. But if you can accept longer outages, we can increase the check interval.
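
The arithmetic behind these figures can be checked with a short sketch. This is only an illustration of the numbers quoted in this ticket; the function and parameter names are hypothetical and are not taken from the haproxy configuration.

```python
from fractions import Fraction


def aggregate_check_rate(interval_s, checkers, protocols):
    """Checks per second seen by one backend server.

    Assumes every checker probes every protocol once per interval,
    independently of the others (hypothetical helper, for illustration).
    """
    return Fraction(checkers * protocols, interval_s)


def worst_case_detection_s(interval_s, failures_needed):
    """Seconds until a server is declared dead, as computed in the ticket."""
    return interval_s * failures_needed


# Figures from the ticket: 2 haproxy instances, http + https, 3 s interval.
rate = aggregate_check_rate(interval_s=3, checkers=2, protocols=2)
print(float(rate))       # checks per second
print(float(1 / rate))   # seconds between checks
print(worst_case_detection_s(interval_s=3, failures_needed=4))
```

With the ticket's values this gives 4/3 checks per second (one every 0.75 seconds) and a 12 second worst case, which matches both the observed rate and the figure given above.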

Updated by Luca Frosini about 8 years ago

The load balancer is contacted over https only. The instances also have to respond on http because of the whn-manager. Can you please remove http from the load balancer?

Updated by Andrea Dell'Amico about 8 years ago

Status changed from In Progress to Feedback
% Done changed from 0 to 100

Done. The load balancer now balances only https access to the resource registry.

Updated by Luca Frosini about 8 years ago

Each instance now receives about 40 requests per minute (down from 80 per minute). This means one request arrives every 1.5 seconds, not every 3 seconds.
If I understand correctly, at worst 6 seconds are now needed before a non-responsive server is disconnected.

Can we slow down to 1 request every 2 seconds? That could be a good compromise (at worst 8 seconds before a non-responsive server is disconnected).

Updated by Andrea Dell'Amico about 8 years ago

Luca Frosini wrote:

Each instance now receives about 40 requests per minute (down from 80 per minute). This means one request arrives every 1.5 seconds, not every 3 seconds.
If I understand correctly, at worst 6 seconds are now needed before a non-responsive server is disconnected.

No, nothing changed: the check interval is still 3 seconds, and the worst-case scenario is still 12 seconds. The two haproxy instances are independent of one another, and each backend inside the same haproxy instance is independent too.

Can we slow down to 1 request every 2 seconds? That could be a good compromise (at worst 8 seconds before a non-responsive server is disconnected).

That would move the worst case scenario to 16 seconds, not 8.
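
One way to read the 16 second figure, sketched below: since the two haproxy instances check independently, a server sees one check every 2 seconds only if each instance checks every 4 seconds, and 4 failures at that interval take 16 seconds. This reading is an assumption, as the reply does not spell out the calculation, and the helper names are hypothetical.

```python
def per_checker_interval_s(target_gap_s, checkers):
    """Interval each independent checker needs so that a server sees
    one check every target_gap_s seconds overall (hypothetical helper)."""
    return target_gap_s * checkers


def worst_case_detection_s(interval_s, failures_needed):
    """Seconds until a server is declared dead, as computed in the ticket."""
    return interval_s * failures_needed


CHECKERS = 2   # two haproxy instances, https only (from the ticket)
FAILURES = 4   # failed checks needed to declare a server dead (from the ticket)

# Current setup: each haproxy checks every 3 s.
current_per_minute = 60 * CHECKERS / 3
current_worst = worst_case_detection_s(3, FAILURES)

# Proposed: one request every 2 s as seen by the server.
proposed_interval = per_checker_interval_s(2, CHECKERS)
proposed_worst = worst_case_detection_s(proposed_interval, FAILURES)

print(current_per_minute, current_worst)
print(proposed_interval, proposed_worst)
```

Under this assumption the current setup yields 40 checks per minute with a 12 second worst case, and the proposed rate requires a 4 second interval per haproxy, giving the 16 seconds stated above.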

Updated by Luca Frosini about 8 years ago

Status changed from Feedback to Closed

OK, understood. I'm going to close the ticket.
