Task #3140

closed

Improve nagios checks for gCore container

Added by Roberto Cirillo about 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Category: System Application
Target version:
Start date: Apr 05, 2016
Due date:
% Done: 100%
Estimated time:
Infrastructure: Pre-Production, Production

Description

Currently we have a simple Nagios check on the container port, but this check is often not enough.
For example, if a container for some reason runs into an OutOfMemory condition, the check on the container port does not detect the problem.
We have successfully tested a new check based on the WSDL URL that is able to detect this kind of problem. The new check should use the following URL:

http://host:port/wsrf/services/gcube/common/vremanagement/GHNManager?WSDL

where "host" is the host of the container and "port" is the port on which the container is running. All gCore containers in the production environment should be configured this way.
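The ticket does not say how the WSDL check itself was implemented. As a rough sketch, assuming a Python plugin that follows the standard Nagios exit-code convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), a check could fetch the URL above and report CRITICAL unless it receives an HTTP 200 response that looks like a WSDL document. The script name `check_gcore_wsdl.py`, the 10-second timeout and the "definitions" content test are illustrative assumptions, not part of the original setup.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style WSDL check for a gCore container (a sketch).

Assumptions, not taken from the ticket: Python implementation, 10 s timeout,
and "healthy" meaning HTTP 200 with a body containing a WSDL "definitions" element.
"""
import sys
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

WSDL_PATH = "/wsrf/services/gcube/common/vremanagement/GHNManager?WSDL"


def wsdl_url(host: str, port: int) -> str:
    """Build the GHNManager WSDL URL for the given container host and port."""
    return f"http://{host}:{port}{WSDL_PATH}"


def evaluate(status: int, body: str) -> tuple[int, str]:
    """Map an HTTP status and response body to a Nagios (exit code, message) pair."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP {status} from GHNManager WSDL"
    # Loose assumption: a real WSDL document contains a (wsdl:)definitions element.
    if "definitions" not in body:
        return CRITICAL, "CRITICAL - response does not look like a WSDL document"
    return OK, "OK - GHNManager WSDL served"


def check(host: str, port: int, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch the WSDL URL and evaluate the result; network errors are CRITICAL."""
    url = wsdl_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (HTTPError must be caught before its parent URLError).
        return evaluate(err.code, "")
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, timeout, DNS failure, ...
        return CRITICAL, f"CRITICAL - cannot fetch {url}: {err}"


def main(argv: list[str]) -> int:
    """Parse HOST PORT from argv, print the Nagios status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_gcore_wsdl.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"UNKNOWN - invalid port: {argv[2]}")
        return UNKNOWN
    code, message = check(argv[1], port)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It would be invoked as, for example, `check_gcore_wsdl.py <host> <port>` from a Nagios command definition. Unlike a plain port check, it requires the container to actually serve a response, which is presumably why it catches containers that still hold the port open but are no longer working.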


Files

wsdlgCoreCheckList.txt (1.03 KB), Roberto Cirillo, Apr 05, 2016 04:38 PM

Related issues

Related to D4Science Infrastructure - Task #850: Investigate a new way for check smartgears container by nagios (Closed, Roberto Cirillo, Apr 01, 2016)

#1

Updated by Roberto Cirillo about 9 years ago

Attached is a list of containers on which the new check fails. The SmartGears hosts in this list should be equipped with a proper Nagios check for SmartGears containers.

#2

Updated by Roberto Cirillo about 9 years ago

  • Related to Task #850: Investigate a new way for check smartgears container by nagios added

#3

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 80

I switched all the nodes listed as SMART instances in the attachment to the new check.

#4

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from In Progress to Feedback

It seems that the following nodes aren't smartgears nodes:

dewn03.madgik.di.uoa.gr
dewn09.madgik.di.uoa.gr
dl23.di.uoa.gr
node41.d4science.org
node13.p.d4science.research-infrastructures.eu
node31.p.d4science.research-infrastructures.eu
node56.p.d4science.research-infrastructures.eu

#5

Updated by Andrea Dell'Amico about 9 years ago

Some more:

node51.p.d4science.research-infrastructures.eu
dewn08.madgik.di.uoa.gr
dl17.di.uoa.gr

#6

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from Feedback to In Progress

My fault: I had used the check intended for the gCore nodes.

#7

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from In Progress to Feedback

OK, there is one node that's failing the gCore check:

dewn10.madgik.di.uoa.gr

#8

Updated by Roberto Cirillo about 9 years ago

Andrea Dell'Amico wrote:

> OK, there is one node that's failing the gCore check:
>
> dewn10.madgik.di.uoa.gr

Yes, because it is a smartgears node.

#9

Updated by Andrea Dell'Amico about 9 years ago

Roberto Cirillo wrote:

> Andrea Dell'Amico wrote:
>
>> OK, there is one node that's failing the gCore check:
>>
>> dewn10.madgik.di.uoa.gr
>
> Yes, because it is a smartgears node.

I see that there's a Tomcat instance running, but http://dewn10.madgik.di.uoa.gr:8080/whn-manager/gcube/resource/ returns 404 (as it does on the hosts listed in #3157).

#10

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 80 to 100

The new check is active for all the gCore nodes.
