Task #3133

Install monitoring checks on access.d4science.org

Added by Roberto Cirillo about 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Low
Assignee: _InfraScience Systems Engineer
Category: System Application
Start date: Apr 05, 2016
Due date:
% Done: 100%
Estimated time:
Infrastructure: Development, Pre-Production

Description

This host needs a standard Nagios check on port 8080 and a check of its memory usage.
Which tool do you suggest for monitoring memory usage: Ganglia or Munin?

Actions #1

Updated by Andrea Dell'Amico about 9 years ago

Munin will be decommissioned in the not-so-distant future, I hope. I'll add the node to the gCore Ganglia cluster.

About the HTTP check: is there a 'standard' URL for checking the status of gCore nodes? I see that most of the gCore checks simply test that the port is open.
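
For reference, a port-only check of the kind described above could look roughly like the following. This is a minimal sketch, not the plugin actually deployed: the function name `check_port` and the default timeout are assumptions, and only the exit codes (0 for OK, 2 for CRITICAL) follow the standard Nagios plugin convention.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_port(host, port, timeout=5.0):
    """Return (exit_code, message) based on whether a TCP connection succeeds.

    Hypothetical helper: it only proves that something is listening on the
    port, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"


# Example call (host and port taken from the ticket description):
#   code, message = check_port("access.d4science.org", 8080)
```

A check like this stays green when the container is hung or out of memory but the listening socket is still open, which is the limitation discussed in the rest of the thread.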

Actions #2

Updated by Roberto Cirillo about 9 years ago

Andrea Dell'Amico wrote:

Munin will be decommissioned in the not-so-distant future, I hope. I'll add the node to the gCore Ganglia cluster.

ok

About the HTTP check: is there a 'standard' URL for checking the status of gCore nodes? I see that most of the gCore checks simply test that the port is open.

Yes, that is true, but I think we currently don't have a better check for this type of service. @gianpaolo.coro@isti.cnr.it do you have any ideas about this?

Actions #3

Updated by Gianpaolo Coro about 9 years ago

Unfortunately, I don't have any information about this, especially since this is an old gCube (WSDL-based) service.
Maybe a call to the IS could do the job. For example, if there were an Out-of-Memory error, the service would not be able to update its status on the IS, and this would show up when querying the IS. @lucio.lelii@isti.cnr.it is there such a REST-like check? Alternatively, is there a way to monitor the status of the service on the machine other than asking the service itself?

Actions #4

Updated by Andrea Dell'Amico about 9 years ago

If the WSDL lives at a fixed location, the Nagios check could try to fetch it.
Calling the IS to check a service that lives elsewhere seems too convoluted to me.

Actions #5

Updated by Lucio Lelii about 9 years ago

The simplest solution is to request the WSDL of one of the base services in gCore (e.g. http://node1.d.d4science.research-infrastructures.eu:8080/wsrf/services/gcube/common/vremanagement/GHNManager?WSDL)
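
A WSDL-based check along these lines might be sketched as follows. The names `fetch_text` and `check_wsdl` are hypothetical, and testing for the string `definitions` (the root element of a WSDL 1.1 document) is an assumed heuristic rather than anything the gCore services guarantee.

```python
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def fetch_text(url, timeout=10.0):
    """Download `url` and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_wsdl(url, fetch=fetch_text):
    """Return (exit_code, message) for a WSDL fetch.

    `fetch` is injectable so the logic can be exercised without a network.
    HTTP errors, refused connections and timeouts are all OSError subclasses.
    """
    try:
        body = fetch(url)
    except OSError as exc:
        return CRITICAL, f"WSDL CRITICAL - cannot fetch {url} ({exc})"
    # Assumed heuristic: a WSDL 1.1 document has a <definitions> root element.
    if "definitions" not in body:
        return CRITICAL, f"WSDL CRITICAL - {url} did not return a WSDL document"
    return OK, f"WSDL OK - {url} returned {len(body)} characters"
```

Unlike the port test, this fails when the container accepts connections but can no longer serve requests.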

Actions #6

Updated by Roberto Cirillo about 9 years ago

@lucio.lelii@isti.cnr.it could this solution be generalized to all GHN containers?

Actions #7

Updated by Lucio Lelii about 9 years ago

Yes, the GHNManager is installed on all gCore containers.
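
If the GHNManager is exposed at the same path on every container, the check URL can be derived from the host name alone. The sketch below assumes that the path from the example URL above and port 8080 apply to all containers, which should be verified per host; `ghn_manager_wsdl_url` is a hypothetical helper name.

```python
# Path taken from the example URL earlier in this thread; assumed to be the
# same on every gCore container.
GHN_MANAGER_PATH = "/wsrf/services/gcube/common/vremanagement/GHNManager"


def ghn_manager_wsdl_url(host, port=8080):
    """Build the GHNManager WSDL URL for a container (port 8080 is assumed)."""
    return f"http://{host}:{port}{GHN_MANAGER_PATH}?WSDL"


def wsdl_urls(hosts, port=8080):
    """Map each host name to the WSDL URL a monitoring check would fetch."""
    return {host: ghn_manager_wsdl_url(host, port) for host in hosts}
```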

Actions #8

Updated by Roberto Cirillo about 9 years ago

I think we could progressively migrate all "port-based" checks to "WSDL-based" checks for every gCore container in the production environment. If you agree, I'll open a new ticket with a list of all the containers involved in this task.

Actions #9

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 60

Roberto Cirillo wrote:

I think we could progressively migrate all "port-based" checks to "WSDL-based" checks for every gCore container in the production environment. If you agree, I'll open a new ticket with a list of all the containers involved in this task.

Yes please.

Ganglia is active on the node, btw.

Actions #10

Updated by Andrea Dell'Amico about 9 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 60 to 100

Nagios is also active.
