Task #3140
closed
Improve nagios checks for gCore container
100%
Description
At the moment we have a simple nagios check on the container port, but this check is often not enough.
For example, if a container runs into an OutOfMemory condition for some reason, the nagios check on the container port doesn't detect the problem.
We have successfully tested a new check, based on the WSDL URL, that is able to detect this kind of problem. The new check should use the following URL:
http://host:port/wsrf/services/gcube/common/vremanagement/GHNManager?WSDL
where "host" is the host of the container and "port" is the port on which the container is running. All gCore containers in the production environment should be configured this way.
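As a rough illustration of the difference from a plain port check, the WSDL probe could be sketched as a small Nagios-style plugin like the one below. This is only a sketch under assumptions: the function names and the script name are hypothetical, it is not the check actually deployed, and the test for the word "definitions" assumes that a healthy container answers with a WSDL 1.1 document.

```python
#!/usr/bin/env python3
"""Nagios-style probe for a gCore container's GHNManager WSDL (illustrative sketch)."""
import sys
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Path taken from the task description.
WSDL_PATH = "/wsrf/services/gcube/common/vremanagement/GHNManager?WSDL"


def build_wsdl_url(host, port):
    """Build the WSDL URL for the container running on host:port."""
    return f"http://{host}:{port}{WSDL_PATH}"


def evaluate(status, body):
    """Map an HTTP status and response body to (exit_code, message)."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP {status}"
    # Assumption: a healthy container returns a WSDL 1.1 document,
    # whose root element is named "definitions".
    if "definitions" not in body:
        return CRITICAL, "CRITICAL - response is not a WSDL document"
    return OK, "OK - WSDL served"


def check(host, port, timeout=10.0):
    """Fetch the WSDL URL and turn the outcome into a Nagios result."""
    url = build_wsdl_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 500).
        return evaluate(exc.code, "")
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout and similar.
        return CRITICAL, f"CRITICAL - cannot fetch {url}: {exc}"


def main(argv):
    """Entry point: expects argv = [script, host, port]; returns the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_gcore_wsdl.py HOST PORT")
        return UNKNOWN
    code, message = check(argv[1], argv[2])
    print(message)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main(sys.argv))`. Unlike a port check, which only needs the socket to accept connections, this probe requires the container to actually serve a request, which is why it can catch a container that is up but stuck.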
Updated by Roberto Cirillo about 9 years ago
- File wsdlgCoreCheckList.txt added
Attached is a list of containers where the new check is failing. The smartgears hosts in this list should be equipped with a proper nagios check for smartgears containers.
Updated by Roberto Cirillo about 9 years ago
- Related to Task #850: Investigate a new way for check smartgears container by nagios added
Updated by Andrea Dell'Amico about 9 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 80
I switched all the nodes described as SMART instances in the attachment to the new check.
Updated by Andrea Dell'Amico about 9 years ago
- Status changed from In Progress to Feedback
It seems that the following nodes aren't smartgears nodes:
dewn03.madgik.di.uoa.gr
dewn09.madgik.di.uoa.gr
dl23.di.uoa.gr
node41.d4science.org
node13.p.d4science.research-infrastructures.eu
node31.p.d4science.research-infrastructures.eu
node56.p.d4science.research-infrastructures.eu
Updated by Andrea Dell'Amico about 9 years ago
Some more:
node51.p.d4science.research-infrastructures.eu
dewn08.madgik.di.uoa.gr
dl17.di.uoa.gr
Updated by Andrea Dell'Amico about 9 years ago
- Status changed from Feedback to In Progress
My fault, I had used the check that is meant for testing the gCore nodes.
Updated by Andrea Dell'Amico about 9 years ago
- Status changed from In Progress to Feedback
OK, there is one node that's failing the gCore check:
dewn10.madgik.di.uoa.gr
Updated by Roberto Cirillo about 9 years ago
Andrea Dell'Amico wrote:
OK, there is one node that's failing the gCore check:
dewn10.madgik.di.uoa.gr
Yes, because it is a smartgears node.
Updated by Andrea Dell'Amico about 9 years ago
Roberto Cirillo wrote:
Andrea Dell'Amico wrote:
OK, there is one node that's failing the gCore check:
dewn10.madgik.di.uoa.gr
Yes, because it is a smartgears node.
I see that there's a Tomcat instance running, but http://dewn10.madgik.di.uoa.gr:8080/whn-manager/gcube/resource/ returns 404 (as it does on the hosts listed in #3157).
Updated by Andrea Dell'Amico about 9 years ago
- Status changed from Feedback to Closed
- % Done changed from 80 to 100
The new check is active for all the gCore nodes.