Support #11605: Set DataMiner queue length to 1 - D4Science Infrastructure - D4science

Support #11605

closed

Set DataMiner queue length to 1

Added by Gianpaolo Coro almost 8 years ago. Updated about 7 years ago.

Status:

Closed

Priority:

Urgent

Assignee:

Leonardo Candela

Category:

High-Throughput-Computing

Start date:

Nov 02, 2018

Due date:

% Done:

100%

Estimated time:

(Total: 0.00 h)

Infrastructure:

Production

Description

In the templates/setup.cfg file on the DataMiner, the maxcomputations parameter should be changed from 4 to 1. Ideally, this parameter should also be changed in the component itself.
However, the change is urgent for users currently using DataMiner as a cloud computing platform, so it needs to be applied via provisioning.

Subtasks 2 (0 open — 2 closed)

Updated by Andrea Dell'Amico almost 8 years ago

Where is it?

root@dataminer1-d-d4s:/home/gcube/tomcat/webapps/wps# find . -name setup.cfg
root@dataminer1-d-d4s:/home/gcube/tomcat/webapps/wps#

Updated by Gianpaolo Coro almost 8 years ago

@lucio.lelii@isti.cnr.it could you help us, please?

Updated by Lucio Lelii almost 8 years ago

It is in src/main/resources inside the dataminer jar, so it cannot be changed this way.
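
The empty `find` result above is what one would expect when a file only exists as an entry inside an archive. As a minimal sketch (not the procedure used in this ticket), the following searches every jar under a directory for an entry with a given file name. The function name `find_in_jars` is made up for illustration, and nothing here assumes where exactly the entry sits inside the dataminer jar.

```python
import zipfile
from pathlib import Path


def find_in_jars(root, entry_name):
    """Return (jar_path, entry) pairs for jar entries whose file name equals entry_name."""
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                for entry in archive.namelist():
                    # Compare only the last path component, similar to `find -name`.
                    if entry.rsplit("/", 1)[-1] == entry_name:
                        hits.append((str(jar), entry))
        except zipfile.BadZipFile:
            # Skip files that carry a .jar suffix but are not valid archives.
            continue
    return hits
```

Calling `find_in_jars(".", "setup.cfg")` from the webapp directory would list each jar that bundles such a file, together with the path of the entry inside the jar.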

Updated by Lucio Lelii almost 8 years ago

I'm building the dataminer jar with the required modification via etics. When it is ready, we will need to replace the jar on every dataminer.

Updated by Andrea Dell'Amico almost 8 years ago

Please let me know when it's ready and post the maven coordinates, so that I can put in place an ad hoc playbook.

Updated by Andrea Dell'Amico almost 8 years ago

Status changed from New to In Progress

We have the artifact. I'm going to install it on the dev dataminers first.

Updated by Andrea Dell'Amico almost 8 years ago

Can anybody check dataminer1-d-d4s.d4science.org? If I do not hear anything back in an hour, I'll start updating the production dataminers.

Updated by Andrea Dell'Amico almost 8 years ago

% Done changed from 0 to 30

The services in dev restarted correctly. I'm starting the rollout in production.

Updated by Andrea Dell'Amico almost 8 years ago

Status changed from In Progress to Feedback
% Done changed from 30 to 100

Done.

#10

Updated by Gianpaolo Coro almost 8 years ago

As I have discussed with Andrea, this patch should be rolled-back since there are algorithms that invoke themselves, possibly on the same machine. In the next weeks, Andrea will deploy new DataMiner machines as Generic Worker machines behind the Generic Worker proxy, and this should solve the issue.
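
One plausible reading of the problem with self-invoking algorithms is that the parent computation holds the only slot while its child asks the same machine for another one. The toy model below illustrates that reading only. `SlotQueue` and `self_invoking_job` are hypothetical names and do not mirror the DataMiner code, and the non-blocking acquire stands in for what would be an indefinite wait in a real queue.

```python
import threading


class SlotQueue:
    """Toy stand-in for a node that accepts at most `slots` concurrent computations."""

    def __init__(self, slots):
        self._slots = threading.BoundedSemaphore(slots)

    def try_run(self, job):
        # Non-blocking acquire: a real queue would wait here, which is where
        # a self-invoking job would hang when no slot is free.
        if not self._slots.acquire(blocking=False):
            return "blocked"
        try:
            return job(self)
        finally:
            self._slots.release()


def self_invoking_job(queue):
    # The parent still holds its slot while it submits a child to the same node.
    child = queue.try_run(lambda q: "done")
    return f"parent finished, child {child}"


if __name__ == "__main__":
    print(SlotQueue(1).try_run(self_invoking_job))  # child finds no free slot
    print(SlotQueue(4).try_run(self_invoking_job))  # child gets its own slot
```

With a single slot the child is reported as blocked, while with four slots both parent and child complete.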

#11

Updated by Andrea Dell'Amico almost 8 years ago

The rollback is running.

Gianpaolo Coro wrote:

As I have discussed with Andrea, this patch should be rolled-back since there are algorithms that invoke themselves, possibly on the same machine. In the next weeks, Andrea will deploy new DataMiner machines as Generic Worker machines behind the Generic Worker proxy, and this should solve the issue.

Would you open a dedicated ticket for the new generic workers? They replace the old ones, correct?
And when they are deployed, will the queue length parameter be fixed at 1 on both the regular dataminers and the generic workers?

#12

Updated by Andrea Dell'Amico almost 8 years ago

(rollback completed)

#13

Updated by Gianpaolo Coro almost 8 years ago

Thank you. Given the way people are going to use the services, I think we will need to have the standard DataMiners running with the default configuration and the Generic Workers running with 1.

#14

Updated by Andrea Dell'Amico almost 8 years ago

Gianpaolo Coro wrote:

Thank you. Given the way people are going to use the services, I think we will need to have the standard DataMiners running with the default configuration and the Generic Workers running with 1.

So that value must be converted into a property that is configurable at provisioning time, without swapping jar files.
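
A minimal sketch of what such a provisioning-time property could look like, assuming the override is passed through an environment variable. The variable name `DATAMINER_MAXCOMPUTATIONS` and the function `resolve_max_computations` are hypothetical, and the actual service may read its configuration in a different way.

```python
import os

# Default taken from the ticket description (maxcomputations is currently 4).
DEFAULT_MAX_COMPUTATIONS = 4


def resolve_max_computations(env=None, var_name="DATAMINER_MAXCOMPUTATIONS"):
    """Return the override found in `env`, or the default when none is set.

    Raises ValueError if the override is not a positive integer.
    """
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is None:
        return DEFAULT_MAX_COMPUTATIONS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError(f"{var_name} must be >= 1, got {value}")
    return value
```

Under this scheme a standard DataMiner would leave the variable unset and keep the default, while a Generic Worker would be provisioned with the variable set to 1.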

#15

Updated by Gianpaolo Coro almost 8 years ago

Status changed from Feedback to Closed

#16

Updated by Lars Valentin over 7 years ago

Tracker changed from Task to Support
Assignee changed from _InfraScience Systems Engineer to Leonardo Candela
Priority changed from High to Normal

Hello Leonardo,

I hope you don't mind that I reuse this specific ticket instead of opening a new one with the same context.

Unfortunately, I did not take the opportunity during yesterday's web meeting, so I would like to ask here about the current status of the queue length per dataminer server and the future possibilities.

My understanding in the past was that each dataminer has 16 cores and 4 slots to accept jobs, which means that each job is allowed to use at most 4 cores. Am I right on this point?
That is totally fine if we run workflows (models) which are not optimized for multicore use and therefore only use one core.

But what about workflows/models which are able to use all cores provided by the system? Usually, they should be programmed to leave one core for system processes, which would mean 3 cores are available if one job can be assigned at most 4 cores.

I would like to ask whether it is possible to execute workflows/models via REST in such a way that the dataminer knows it should run the job exclusively, so that all 16 cores of the dataminer can be used by that one job.

That would make 15 cores available to run the model instead of 3, which might reduce the running time to roughly 1/5 of the previous one.

Thank you in advance!
Lars

PS. Do you perhaps plan to have dataminers with more cores that could be used for such cases in the future?
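
The estimate in this comment can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning as stated by Lars: it assumes that cores are split evenly across slots, that one core per job is left to the system, and that running time scales perfectly linearly with the number of cores. None of these assumptions is confirmed in this ticket, and `usable_cores` is a made-up helper name.

```python
def usable_cores(total_cores, slots, reserved=1):
    """Cores a single job can use if cores are split evenly across slots
    and `reserved` cores per job are left to system processes."""
    per_job = total_cores // slots
    return max(per_job - reserved, 0)


if __name__ == "__main__":
    shared = usable_cores(16, 4)     # 4 slots on a 16-core machine
    exclusive = usable_cores(16, 1)  # one job owns the whole machine
    # Under ideal linear scaling, running time is inversely proportional to cores.
    ratio = shared / exclusive
    print(shared, exclusive, ratio)
```

Real models rarely scale perfectly, so 1/5 is best read as an optimistic bound under these assumptions.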

#17

Updated by Leonardo Candela over 7 years ago

Hi @lars.valentin@bfr.bund.de, I would suggest opening specific tickets to discuss any needs you or your use cases might have on aspects like that.

During our last PMB meeting (see slides here https://goo.gl/ZFYGJq) I tried to explain that:

right now there are two "clusters" configured to provide data miner facilities (proto and prod);
these clusters are not for exclusive use of AGINFRA+ cases;
we can have one cluster per VRE;

The business logic the service uses for allocating tasks to machines and consuming the available cores can be better explained by @gianpaolo.coro@isti.cnr.it. Regarding cores in particular, (a) it also depends on how the algorithm has been implemented and (b) it cannot be changed at algorithm invocation time.

Last but not least, if you have particular settings / behaviors you have to satisfy, we can try to configure a specific cluster, yet this has a cost we have to carefully evaluate.

These are overall comments from me. My suggestion is to be specific and report any concrete need you have (e.g. enact process x that needs 100 cores), and we will do our best to satisfy it with the available resources and technologies.

#18

Updated by Lars Valentin over 7 years ago

Due date set to Nov 02, 2018
Start date changed from Oct 06, 2018 to Nov 02, 2018

due to changes in a related task
