Task #850

closed

Investigate a new way to check the smartgears container with nagios

Added by Roberto Cirillo over 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Category: System Application
Target version:
Start date: Apr 01, 2016
Due date:
% Done: 100%
Estimated time:
Infrastructure: Development, Pre-Production, Production

Description

At present we only have a simple nagios check on the tomcat port, but this check is often not enough.
If a container, for whatever reason, is no longer registered on the infrastructure for some time, the nagios check on the tomcat port does not detect the problem.
A possible way to enhance the nagios check could be the following:
Every Smartgears node has an enabling service: Whn-Manager. This service could be queried via HTTP to verify the container status.
For example, this URL (on node2-d-d4s.d4science.org): http://node2-d-d4s.d4science.org:8080/whn-manager/gcube/resource/ responds with a "resource is active" string.
What happens if this container, for whatever reason, is no longer registered to the Infrastructure? What answer will this URL return?
If the answer is "the resource is not active", we have found a more specific nagios check.
Is there a way to verify this behavior?
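
The check proposed above could be sketched roughly as follows. This is only an illustration, not the check that was eventually deployed: the function names (classify, check, main) are hypothetical, and it assumes that a healthy node answers with HTTP 200 and a body containing "resource is active", as quoted in the description. Any other answer, including "the resource is not active" or an HTTP error, is treated as CRITICAL.

```python
"""Sketch of a Nagios-style check for the whn-manager resource URL."""
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: a healthy body contains this text, as quoted in the description.
EXPECTED = "resource is active"


def classify(status, body):
    """Map an HTTP status and response body to (exit_code, message)."""
    if status != 200:
        return CRITICAL, "CRITICAL - HTTP status %s" % status
    if EXPECTED in body:
        return OK, "OK - %s" % EXPECTED
    # Reachable, but the body does not confirm an active resource.
    return CRITICAL, "CRITICAL - unexpected body: %r" % body[:80]


def check(url, timeout=10):
    """Fetch the URL and classify the answer; network errors are CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 500).
        return classify(exc.code, "")
    except (urllib.error.URLError, OSError) as exc:
        return CRITICAL, "CRITICAL - cannot reach %s: %s" % (url, exc)


def main(argv):
    """Entry point: argv[1] is the URL to check. Returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: %s URL" % argv[0])
        return UNKNOWN
    code, message = check(argv[1])
    print(message)
    return code
```

To use it as a plugin, a wrapper script would call sys.exit(main(sys.argv)) and nagios would pass the node URL, for example http://node2-d-d4s.d4science.org:8080/whn-manager/gcube/resource/, as the only argument.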


Related issues

Related to D4Science Infrastructure - Task #3140: Improve nagios checks for gCore container (Closed, Andrea Dell'Amico, Apr 05, 2016)

Related to D4Science Infrastructure - Task #3157: Improve the nagios check for the Smartgears (not the smart executor ones) nodes (Closed, Andrea Dell'Amico, Apr 07, 2016)
Actions #1

Updated by Roberto Cirillo over 9 years ago

  • Tracker changed from Support to Task
  • Start date changed from Oct 01, 2015 to Apr 01, 2016
Actions #2

Updated by Roberto Cirillo about 9 years ago

  • Target version changed from System Configuration to improve nagios checks
Actions #3

Updated by Roberto Cirillo about 9 years ago

  • Related to Task #3140: Improve nagios checks for gCore container added
Actions #4

Updated by Roberto Cirillo about 9 years ago

When the container is down, this URL returns an error. So I think this URL would be a better check than the standard one on the port.
In my opinion, we could replace the standard check with this one.
What do you think about this, @lucio.lelii@isti.cnr.it, @andrea.dellamico@isti.cnr.it?

Actions #5

Updated by Lucio Lelii about 9 years ago

I agree with you, this is the right way to check the smartgears container

Actions #6

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100
  • Infrastructure Pre-Production, Production added
Actions #7

Updated by Andrea Dell'Amico about 9 years ago

  • Related to Task #3157: Improve the nagios check for the Smartgears (not the smart executor ones) nodes added
Actions #8

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from Closed to In Progress

@lucio.lelii@isti.cnr.it Can you tell me when the /whn-manager/gcube/resource/ URL was added to the whn-manager? Some smartgears installations fail the nagios check. Most of them are Greek VMs that I cannot access. But on node31.p.d4science.research-infrastructures.eu and node13.p.d4science.research-infrastructures.eu, where the check is also failing, the whn-manager version is 1.0.0-3.1.0, installed in April 2014, so two years old.

Actions #9

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from In Progress to Closed

Never mind. The WAR installation name was the problem; see #3159.
