Task #1215

Install Dataminer in production environment

Added by Gianpaolo Coro over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Category: High-Throughput-Computing
Start date: Oct 22, 2015
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

An installation of DataMiner in the production environment is required.
The installation should follow the guidelines on this wiki page:

https://gcube.wiki.gcube-system.org/gcube/DataMiner_Installation

and the instructions in the tickets related to this one.

Furthermore, in addition to the machine hosting the single instance, another machine is required, with an Apache service installed, to act as a proxy for DataMiner. It will be used to balance requests across the several instances that will be running in D4Science. The name of this balancer should be "dataminer.d4science.org".

The Apache service should answer on port 80, and each DataMiner instance should also be available on that port.
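
For illustration, a minimal sketch of what such a balancer could look like, written as an Ansible playbook that installs an Apache vhost. The file path, the handler, and the member hostnames are assumptions drawn from the hosts named later in this ticket, not the project's actual provisioning code:

```yaml
# balancer.yml - hypothetical sketch, not the real provisioning code.
# Assumes mod_proxy, mod_proxy_http, mod_proxy_balancer and
# mod_lbmethod_byrequests are already enabled on the balancer host.
- hosts: dataminer-lb1-p-d4s.d4science.org
  become: true
  tasks:
    - name: Deploy the dataminer.d4science.org balancer vhost
      copy:
        dest: /etc/apache2/sites-enabled/dataminer.conf  # path is an assumption
        content: |
          <VirtualHost *:80>
            ServerName dataminer.d4science.org
            <Proxy "balancer://dataminer">
              # Each DataMiner instance also answers on port 80
              BalancerMember "http://dataminer1-p-d4s.d4science.org:80"
              BalancerMember "http://dataminer2-p-d4s.d4science.org:80"
            </Proxy>
            ProxyPass        "/" "balancer://dataminer/"
            ProxyPassReverse "/" "balancer://dataminer/"
          </VirtualHost>
      notify: Reload apache
  handlers:
    - name: Reload apache
      service:
        name: apache2
        state: reloaded
```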


Related issues

Related to D4Science Infrastructure - Task #422: Report libraries and configuration components for DataMiner (Closed, Andrea Dell'Amico, Jul 23, 2015)
Related to D4Science Infrastructure - Task #421: Create an automatic installation package for DataMiner (Closed, Andrea Dell'Amico, Jul 23, 2015)
Related to D4Science Infrastructure - Task #1502: Automate the dataminer installation as a loadbalanced service (Closed, _InfraScience Systems Engineer, Nov 23, 2015)
Related to D4Science Infrastructure - Support #1837: Add SmartGenericWorkers to the VREs (Closed, Roberto Cirillo, Dec 17, 2015)
#1

Updated by Gianpaolo Coro over 9 years ago

  • Related to Task #422: Report libraries and configuration components for DataMiner added
#2

Updated by Gianpaolo Coro over 9 years ago

  • Related to Task #421: Create an automatic installation package for DataMiner added
#3

Updated by Andrea Dell'Amico over 9 years ago

  • Related to Task #1502: Automate the dataminer installation as a loadbalanced service added
#4

Updated by Andrea Dell'Amico over 9 years ago

  • % Done changed from 0 to 20

Following the dev provisioning, I've created the ansible variable groups needed to install a dataminer cluster in production.
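
For reference, a sketch of the kind of group variables this involves (the file name and variable names are assumptions, not the real repository layout):

```yaml
# group_vars/dataminer_cluster.yml - hypothetical sketch; the actual
# provisioning repository may use different names.
dataminer_lb_hostname: dataminer.d4science.org
dataminer_http_port: 80
dataminer_cluster_hosts:
  - dataminer1-p-d4s.d4science.org
  - dataminer2-p-d4s.d4science.org
```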

#5

Updated by Gianpaolo Coro over 9 years ago

@andrea.dellamico@isti.cnr.it @tommaso.piccioli@isti.cnr.it Who is responsible for creating the machines for the production environment and running the installation scripts?

#6

Updated by Andrea Dell'Amico over 9 years ago

One load balancer and two dataminer servers in production as well; other dataminer servers can be added at a later time.

Required specs for the load balancer:

  • Hostname: dataminer-lb1-p-d4s.d4science.org (with dataminer.d4science.org as alias)
  • Ubuntu 14.04 LTS
  • 1 GB of RAM
  • 2 virtual CPUs
  • 10 GB disk

Required specs for the dataminer VMs, from the wiki page https://wiki.gcube-system.org/index.php/DataMiner_Installation:

  • Hostname: dataminer[1:2]-p-d4s.d4science.org
  • Ubuntu 12.04.5 LTS
  • 6 GB of RAM
  • 10 virtual CPUs
  • 10 GB of HD space
#7

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from New to In Progress
  • Assignee changed from Tommaso Piccioli to Andrea Dell'Amico
#8

Updated by Andrea Dell'Amico over 9 years ago

dataminer1-p-d4s.d4science.org IP: 146.48.122.251
dataminer2-p-d4s.d4science.org IP: 146.48.123.64
dataminer-lb1-p-d4s.d4science.org IP: 146.48.123.71
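
These hosts would map onto an Ansible inventory along these lines (a sketch; the group names are assumptions, the IPs are the ones listed above):

```yaml
# Hypothetical inventory snippet (YAML format); the group names are
# assumptions about the provisioning repository's layout.
all:
  children:
    dataminer:
      hosts:
        dataminer1-p-d4s.d4science.org:
          ansible_host: 146.48.122.251
        dataminer2-p-d4s.d4science.org:
          ansible_host: 146.48.123.64
    dataminer_lb:
      hosts:
        dataminer-lb1-p-d4s.d4science.org:
          ansible_host: 146.48.123.71
```
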
#9

Updated by Andrea Dell'Amico over 9 years ago

  • % Done changed from 20 to 60

The VMs are up and the provisioning of the three hosts is running.

#10

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 60 to 100

The dataminer production cluster is ready to be tested. The main URL is http://dataminer.d4science.org/wps/
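
For a quick smoke test of the endpoint through the balancer (a sketch; the expected status code is an assumption):

```yaml
# smoke-test.yml - hypothetical check that the WPS endpoint answers
# through the load balancer; run with: ansible-playbook smoke-test.yml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check the DataMiner WPS endpoint
      uri:
        url: http://dataminer.d4science.org/wps/
        status_code: 200  # assumption: the landing page returns 200
```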

#11

Updated by Gianpaolo Coro over 9 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 100 to 90

I added DataMiner to the BiodiversityLab, ScalableDataMining and BiOnym VREs. It works very well with "local" algorithms. However, no SmartGenericWorker is present in these VREs; they should be added. I'm going to open another ticket and make it a dependency of this one.

#12

Updated by Gianpaolo Coro over 9 years ago

  • Related to Support #1837: Add SmartGenericWorkers to the VREs added
#13

Updated by Gianpaolo Coro over 9 years ago

Tests on all the algorithms were successful, using 4 workers behind the scenes. I communicated the main dataminer link (the load balancer) to the FishBase people.
I have two questions on the installation, @andrea.dellamico@isti.cnr.it :
1 - is the DataMiner configuration downloaded on-the-fly during the installation? In other words, will it always be up to date?
2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

#14

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Tests on all the algorithms were successful, using 4 workers behind the scenes. I communicated the main dataminer link (the load balancer) to the FishBase people.
I have two questions on the installation, @andrea.dellamico@isti.cnr.it :
1 - is the DataMiner configuration downloaded on-the-fly during the installation? In other words, will it always be up to date?

Nope. It was not requested, so the download of the svn files is done once. If the point is to run an svn update at regular intervals, that can be done easily. Otherwise, please give details.

2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

Yes, I'll do that.

#15

Updated by Gianpaolo Coro over 9 years ago

Thank you Andrea,
in the next installations, it would be better to download the cfg and PARALLEL_PROCESSING folders from this SVN location:

https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/DataMinerConfiguration
#16

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Thank you Andrea,
in the next installations, it would be better to download the cfg and PARALLEL_PROCESSING folders from this SVN location:

Do we need to perform an svn update regularly or not?

https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/DataMinerConfiguration

I was using this (public path, to avoid authentication and problems with the untrusted TLS certificate):

http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/{cfg,PARALLEL_PROCESSING}

I don't think there's a difference between the private and the public link.
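
For the next installations, the one-shot fetch of those two folders could look like this with Ansible's subversion module (a sketch; the destination path and the host group are assumptions):

```yaml
# Hypothetical one-shot checkout of the DataMiner configuration from
# the public svn path; dest and the host group are assumptions.
# Requires the svn client on the target hosts.
- hosts: dataminer
  tasks:
    - name: Fetch the DataMiner configuration folders
      subversion:
        repo: "http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/{{ item }}"
        dest: "/home/gcube/dataminer/{{ item }}"
        update: false  # install once, no periodic svn update
      loop:
        - cfg
        - PARALLEL_PROCESSING
```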

#17

Updated by Gianpaolo Coro over 9 years ago

So I misunderstood: we do not need an automatic periodic update. Those folders need to be installed only at the beginning and should be updated only when a new release is available.
Furthermore, your link is good; it is the free-access, read-only version of mine. If your script downloads that folder at each installation, there is no problem.

#18

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

If your script downloads that folder at each installation, there is no problem.

Yes. I only changed the svn link, and you changed the wiki accordingly, so it now points to the public-access svn repository.

#19

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from In Progress to Feedback

Andrea Dell'Amico wrote:

2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

Yes, I'll do that.

The playbook is ready and I tried it on the dev instances. I see that you installed the production keys there. Do we need both the dev and the production keys, or are the production keys needed under 'PARALLEL_PROCESSING' even in the dev environment?

#20

Updated by Gianpaolo Coro over 9 years ago

Ideally, the production keys would be installed on the production servers and the development keys on the dev servers. From a technical point of view, there is no issue if you copy all the keys onto the servers, since the scopes are assigned at infrastructure level and the infrastructure a node belongs to depends on the GHN configuration.
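
A sketch of how the playbook might pick the keys per environment (the variable name, the keys/ source layout, and the destination path are all assumptions):

```yaml
# Hypothetical task: install the environment's own encryption keys
# into the PARALLEL_PROCESSING folder. 'infra_env', the keys/ source
# layout and the destination path are assumptions.
- hosts: dataminer
  vars:
    infra_env: production  # would be 'dev' on the development servers
  tasks:
    - name: Copy the {{ infra_env }} encryption keys
      copy:
        src: "keys/{{ infra_env }}/"  # trailing slash copies the contents
        dest: /home/gcube/dataminer/PARALLEL_PROCESSING/
```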

#21

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Ideally, the production keys would be installed on the production servers and the development keys on the dev servers.

That's what I'm doing (or rather, what the playbook is doing).

#22

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100