Task #8103

closed

a nagios script must be scheduled every 24 h to add a list of resources to all scopes

Added by Lucio Lelii about 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Immediate
Assignee: _InfraScience Systems Engineer
Category: Other
Target version:
Start date: Apr 12, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Related issues

Blocked by D4Science Infrastructure - VM Creation #8486: Provide a new SmartGears node for resource-checker plugin (Closed, Costantino Perciante, May 11, 2017)

Actions #1

Updated by Lucio Lelii about 8 years ago

  • Subject changed from a nagios script must be sheduled every 24 h to add a list of resources to all scopes to a nagios script must be scheduled every 24 h to add a list of resources to all scopes
Actions #2

Updated by Andrea Dell'Amico about 8 years ago

  • Assignee changed from Andrea Dell'Amico to _InfraScience Systems Engineer

Can you give us less information? With so many details we lose the joy of discovering new ways to mess up with our systems.

Actions #3

Updated by Pasquale Pagano about 8 years ago

  • Priority changed from Low to Urgent
Actions #5

Updated by Andrea Dell'Amico about 8 years ago

Why a nagios script anyway? Is it something that can be used as a check too?

Actions #6

Updated by Pasquale Pagano about 8 years ago

Nagios was my suggestion for the following reasons:

  • it notifies us when something happens
  • it keeps track of the history, so we can understand how frequently we are losing resources
  • it makes it easy to schedule a check

Of course, any other solution can be chosen.

Actions #7

Updated by Lucio Lelii about 8 years ago

But this script only adds all the scopes to the selected resources. The check for scope loss in a resource should be done with a smartexecutor plugin, as reported in ticket #7800.

Actions #8

Updated by Andrea Dell'Amico about 8 years ago

And is the plugin meant to run the script that restores the scopes too? Or do we run it as a side effect of the nagios check?

From what I read on the other related tasks, a nagios check could run on the host where the smart executor plugin lives. The plugin could write its results into a file (list of missing scopes, if any. 'none' if it's all OK). The nagios check will read that file, report the result and, if there are missing scopes, run the script that fixes them. So the scopes would be added again in a matter of minutes and we will have a nagios report.
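The flow proposed above can be sketched roughly as follows. This is only an illustration, not the check that was deployed: the function names, the message wording, and the assumption that the file holds whitespace-separated entries are all hypothetical, and the repair step is passed in as a callback so nothing is actually executed here.

```python
from typing import Callable, List


def parse_status(text: str) -> List[str]:
    """Return the missing entries listed in the file content.

    Assumption: entries are separated by whitespace or newlines, and the
    single word 'none' means that nothing is missing.
    """
    tokens = text.split()
    if tokens == ["none"]:
        return []
    return tokens


def check_and_repair(text: str, repair: Callable[[List[str]], None]) -> str:
    """Report the status and trigger the repair callback when needed."""
    if not text.strip():
        # An empty file says nothing either way, so it is reported separately.
        return "UNKNOWN - status file is empty"
    missing = parse_status(text)
    if not missing:
        return "OK - no missing scopes"
    # Hypothetical hook: this is where the script that re-adds scopes would run.
    repair(missing)
    return "CRITICAL - %d missing, repair triggered: %s" % (
        len(missing),
        ", ".join(missing),
    )
```

Keeping the repair step behind a callback would let the same logic be tried in dev with a dummy function before pointing it at the real script.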

Actions #9

Updated by Costantino Perciante about 8 years ago

@lucio.lelii@isti.cnr.it told me that his script just needs the list of missing ids (no matter where the resources were missing, because the script adds them everywhere again).

@andrea.dellamico@isti.cnr.it Please, just let me know the path in which nagios will look for that file

Actions #10

Updated by Andrea Dell'Amico about 8 years ago

We need a place writeable by the gcube user. Anything under /home/gcube is fine with me, better a subdirectory. /home/gcube/scopes_data/scopes_status maybe? where scopes_status is the filename?

Actions #11

Updated by Costantino Perciante about 8 years ago

Andrea Dell'Amico wrote:

We need a place writeable by the gcube user. Anything under /home/gcube is fine with me, better a subdirectory. /home/gcube/scopes_data/scopes_status maybe? where scopes_status is the filename?

Since we are dealing with identifiers of resources, I would say that /home/gcube/missing_resources/identifiers is ok too, isn't it? ("identifiers" is the file name)

Moreover, is there a place in dev in which we can test both the smart-executor plugin and the script?

Actions #12

Updated by Andrea Dell'Amico about 8 years ago

Costantino Perciante wrote:

Andrea Dell'Amico wrote:

We need a place writeable by the gcube user. Anything under /home/gcube is fine with me, better a subdirectory. /home/gcube/scopes_data/scopes_status maybe? where scopes_status is the filename?

Since we are dealing with identifiers of resources, I would say that /home/gcube/missing_resources/identifiers is ok too, isn't it? ("identifiers" is the file name)

Yes, no problem.

Moreover, is there a place in dev in which we can test both the smart-executor plugin and the script?

It's a question for @lucio.lelii@isti.cnr.it I guess.

Actions #13

Updated by Lucio Lelii about 8 years ago

  • File AddResourcesToAllScopes.java added

I have just attached the script to add all scopes to the selected resource ids.
Run it with the command:

java AddResourcesToAllScopes id1 id2 ... idn

with the smartgears classpath.
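For whoever wraps this in a check, a minimal sketch of building that command line from a list of ids could look like the following. The helper name and the classpath argument are assumptions (the actual smartgears classpath is not given in this ticket); only the class name and the argument order come from the command above.

```python
from typing import List, Sequence


def build_command(ids: Sequence[str], classpath: str) -> List[str]:
    """Build the argument list for: java AddResourcesToAllScopes id1 ... idn.

    `classpath` is a placeholder for the smartgears classpath, which is
    not specified here.
    """
    if not ids:
        raise ValueError("at least one resource id is required")
    # A list of arguments (rather than one shell string) avoids quoting issues.
    return ["java", "-cp", classpath, "AddResourcesToAllScopes", *ids]
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)` on the smartgears node.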

Actions #14

Updated by Pasquale Pagano about 8 years ago

  • Priority changed from Urgent to Immediate

We had another issue in production and another ticket from the user. It is fundamental to implement this workaround now.

Actions #18

Updated by Andrea Dell'Amico about 8 years ago

Lucio Lelii wrote:

I have just attached the script to add all scopes to the selected resource ids.
Run it with the command:

java AddResourcesToAllScopes id1 id2 ... idn

with the smartgears classpath.

So I need to run it on a smartgears node, the one that collects the missing IDs. I still don't know which node that is.
The java source should live on subversion, btw.

Actions #19

Updated by Costantino Perciante about 8 years ago

The smart executor plugin will run every hour, starting from now, on the node resource-checker-d-d4s.d4science.org. It will write a file with the missing resources' identifiers at /home/gcube/missing_resources/identifiers (it will contain "none" if nothing is wrong). Please carry out the remaining steps so that Lucio's code can work properly.
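As an illustration of the file contract only (the plugin itself is not shown in this ticket and is not written in Python), a writer could replace the file atomically so that a check never reads a half-written list. The function name, the one-id-per-line layout, and the file permissions are assumptions.

```python
import os
import tempfile
from typing import Iterable


def write_status(path: str, missing_ids: Iterable[str]) -> None:
    """Write the missing ids to `path`, or 'none' when the list is empty."""
    ids = list(missing_ids)
    # Assumption: one identifier per line.
    content = "\n".join(ids) if ids else "none"
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over
    # the target: os.replace is atomic on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".identifiers-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content + "\n")
        # mkstemp creates the file as 0600; assume the check runs as a
        # different user and needs read access.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```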

Actions #20

Updated by Andrea Dell'Amico about 8 years ago

@lucio.lelii@isti.cnr.it the Java source has a hard-coded list of production VOs, so I cannot use it to test in dev. Can you provide a binary, btw?

Actions #21

Updated by Lucio Lelii about 8 years ago

Yes, the list of VOs is hard-coded for production. I can change the script in one of two ways:

  • the script takes the list of VOs as an argument
  • the script has a fixed list of production VOs and a fixed list of development VOs, and you select the environment by passing a special argument

Tell me which one you prefer.

Actions #22

Updated by Costantino Perciante about 8 years ago

What if we use the smart-executor plugin for this task too?
I mean, no other external file/scripts/whatever, just the plugin.

Actions #23

Updated by Andrea Dell'Amico about 8 years ago

Costantino Perciante wrote:

What if we use the smart-executor plugin for this task too?
I mean, no other external file/scripts/whatever, just the plugin.

It sounds better to me. So the nagios check should report the status change only. Correct?

Actions #24

Updated by Costantino Perciante about 8 years ago

Andrea Dell'Amico wrote:

Costantino Perciante wrote:

What if we use the smart-executor plugin for this task too?
I mean, no other external file/scripts/whatever, just the plugin.

It sounds better to me. So the nagios check should report the status change only. Correct?

ok!

@lucio.lelii@isti.cnr.it the plugin already evaluates the list of VOs (taking into account the infrastructure in which it is running, of course).

I guess I can easily import your code in the resource-checker plugin

Actions #25

Updated by Costantino Perciante about 8 years ago

The updated script is running on that node. It also takes care of re-adding any missing resource to a context.

Actions #26

Updated by Andrea Dell'Amico almost 8 years ago

  • Blocked by VM Creation #8486: Provide a new SmartGears node for resource-checker plugin added
Actions #27

Updated by Andrea Dell'Amico almost 8 years ago

  • File deleted (AddResourcesToAllScopes.java)
Actions #28

Updated by Pasquale Pagano almost 8 years ago

This activity was urgent one month ago. Please try to complete it asap.

Actions #29

Updated by Andrea Dell'Amico almost 8 years ago

Pasquale Pagano wrote:

This activity was urgent one month ago. Please try to complete it asap.

As soon as the production service works.

Actions #30

Updated by Andrea Dell'Amico almost 8 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100
  • Infrastructure deleted (Development)

Done. The check goes to CRITICAL if the /home/gcube/missing_resources/identifiers file contains anything other than the string "none".
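A minimal sketch of a check with that behaviour is given below. It is not the check that was actually installed: the function name and messages are hypothetical, and returning UNKNOWN for an unreadable file is an added assumption. The exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

STATUS_FILE = "/home/gcube/missing_resources/identifiers"


def check(path: str = STATUS_FILE) -> Tuple[int, str]:
    """Return (exit_code, message) for the missing-resources status file."""
    try:
        with open(path) as handle:
            content = handle.read().strip()
    except OSError as exc:
        # Assumption: a missing or unreadable file is reported as UNKNOWN.
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (path, exc)
    if content == "none":
        return OK, "OK - no missing resources"
    # Anything other than 'none' is CRITICAL, including an empty file.
    listed = " ".join(content.split()) or "(file is empty)"
    return CRITICAL, "CRITICAL - missing resources: %s" % listed
```

A wrapper script would print the message and call `sys.exit` with the returned code, which is how Nagios picks up the state.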
