Task #12449: Add Nagios checks on couchbase cluster - D4Science Infrastructure - D4science

Added by Luca Frosini almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: Application
Target version: Data Publishing
Start date: Sep 10, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

It seems that the new buckets in the Couchbase cluster are not monitored by Nagios.

This is my mistake, because I never told you about them.

The old bucket, which is going to be deleted (see #12446), has the following checks:

accounting_service OPS
accounting_service VB total items
accounting_service disk creates per second
accounting_service items count
accounting_service used memory

Updated by Luca Frosini almost 7 years ago

Related to Task #12446: Remove accounting_service bucket added

Updated by Andrea Dell'Amico almost 7 years ago

Can you list the buckets that need monitoring?

Updated by Luca Frosini almost 7 years ago

The buckets to be monitored are:
accounting_storage_status
AccountingManager
JobUsageRecord
ServiceUsageRecord
StorageUsageRecord

Updated by Tommaso Piccioli almost 7 years ago

Status changed from New to In Progress
% Done changed from 0 to 80

Added new Nagios checks on the selected buckets, but we still have to customize the parameters with @luca.frosini@isti.cnr.it

Updated by Luca Frosini almost 7 years ago

I read the documentation of the couchbase nagios plugin at:
https://gcube.wiki.gcube-system.org/gcube/Monitoring_a_gCube_infrastructure_With_Nagios#Couchbase_plugin

which is more or less the documentation provided by the plugin.

I really don't know how to tune the metrics. Maybe we are not so interested in monitoring bucket metrics; instead, we are interested in monitoring the cluster sanity.

Looking at the alerts received tonight, they are just useless, and they could instead cause the important ones to be overlooked.

@tommaso.piccioli@isti.cnr.it @andrea.dellamico@isti.cnr.it @pasquale.pagano@isti.cnr.it what do you think?
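
As a rough illustration of what a "cluster sanity" check could look like, here is a minimal sketch in Python. It is not part of the existing plugin. It assumes the check has already fetched and parsed the JSON document served by the Couchbase REST endpoint /pools/default, and that each entry in its "nodes" list carries "hostname", "status" and "clusterMembership" fields. The function name cluster_health and the rule that any unhealthy node is CRITICAL are hypothetical choices made for the example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cluster_health(pool):
    """Map a parsed /pools/default document to (exit_code, message).

    Assumption: every node entry has "hostname", "status" and
    "clusterMembership" keys; missing keys count as unhealthy.
    """
    nodes = pool.get("nodes", [])
    if not nodes:
        return UNKNOWN, "UNKNOWN - no nodes reported by the cluster"

    # A node is considered sane only if it is healthy and an active member.
    bad = [
        node.get("hostname", "?")
        for node in nodes
        if node.get("status") != "healthy"
        or node.get("clusterMembership") != "active"
    ]
    if bad:
        return CRITICAL, "CRITICAL - unhealthy nodes: " + ", ".join(bad)
    return OK, "OK - %d nodes healthy" % len(nodes)
```

A single check of this kind per cluster would replace several per-bucket alerts with one signal, which may reduce the noise described above.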

Updated by Andrea Dell'Amico almost 7 years ago

We do not collect metrics in Nagios. The checks are failing because the service is so slow to answer that the timeout is triggered, and that seems independent of the specific check: they all fail.
I've checked the plugin options and there's no way to specify a longer timeout without changing the code. It uses the Python requests library, so it should be an easy change.

There are also a lot of parameters that we do not use, so I don't know whether we are checking the most significant aspects of the cluster.

(I didn't know that wiki page existed. FYI, most of the information it reports is obsolete.)

Updated by Andrea Dell'Amico almost 7 years ago

Status changed from In Progress to Feedback
% Done changed from 80 to 100

I just configured a timeout in the Couchbase check code:

r = requests.get(url, auth=(options.username, options.password), timeout=(10, 120))

If it works, we should create a proper fix and send a pull request to the author.
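
For reference, a "proper fix" might expose the timeout as command-line options instead of hard-coding it. The sketch below is only an illustration and is not the plugin's actual code: the option names --connect-timeout and --read-timeout, the helper names parse_args and fetch, and the choice to report a timeout as UNKNOWN are all assumptions. It relies on the fact that requests accepts a (connect, read) tuple for its timeout argument, which is what the one-line change above uses.

```python
import argparse

import requests

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Couchbase check (sketch)")
    parser.add_argument("--url", required=True)
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    # Hypothetical options; defaults mirror the hard-coded (10, 120).
    parser.add_argument("--connect-timeout", type=float, default=10.0)
    parser.add_argument("--read-timeout", type=float, default=120.0)
    return parser.parse_args(argv)


def fetch(options, get=requests.get):
    """Query the URL and return (exit_code, message).

    `get` is injectable so the function can be exercised without a server.
    """
    timeout = (options.connect_timeout, options.read_timeout)
    try:
        r = get(
            options.url,
            auth=(options.username, options.password),
            timeout=timeout,
        )
        r.raise_for_status()
    except requests.exceptions.Timeout:
        # Covers both connect and read timeouts.
        return UNKNOWN, "UNKNOWN - no answer within {} s".format(timeout)
    except requests.exceptions.RequestException as exc:
        return CRITICAL, "CRITICAL - request failed: {}".format(exc)
    return OK, "OK - HTTP {}".format(r.status_code)
```

A script built on this would print the message and pass the exit code to sys.exit, so that Nagios can tell a slow answer apart from a failed one.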

Updated by Andrea Dell'Amico almost 7 years ago

Status changed from Feedback to Closed

The change seems to have worked. I'm closing the ticket.
