Task #1215

Install Dataminer in production environment

Added by Gianpaolo Coro over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Category: High-Throughput-Computing
Start date: Oct 22, 2015
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

An installation of DataMiner in the production environment is required.
The installation should follow the guidelines on this wiki page:

https://gcube.wiki.gcube-system.org/gcube/DataMiner_Installation

and the instructions in the tickets related to this one.

Furthermore, in addition to the machine hosting the single instance, another machine is required, with an Apache service installed, to act as a proxy for DataMiner. It will be used to balance requests across the several instances that will be running in D4Science. The name of this balancer should be "dataminer.d4science.org".

The Apache service should answer on port 80, and each DataMiner instance should also be available on that port.
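
For illustration, a minimal sketch of what such a balancer could look like, written as an Ansible playbook that installs an Apache vhost. The file path, the handler, and the member hostnames are assumptions drawn from the hosts named later in this ticket, not the project's actual provisioning code:

```yaml
# balancer.yml - hypothetical sketch, not the real provisioning code.
# Assumes mod_proxy, mod_proxy_http, mod_proxy_balancer and
# mod_lbmethod_byrequests are already enabled on the balancer host.
- hosts: dataminer-lb1-p-d4s.d4science.org
  become: true
  tasks:
    - name: Deploy the dataminer.d4science.org balancer vhost
      copy:
        dest: /etc/apache2/sites-enabled/dataminer.conf  # path is an assumption
        content: |
          <VirtualHost *:80>
            ServerName dataminer.d4science.org
            <Proxy "balancer://dataminer">
              # Each DataMiner instance also answers on port 80
              BalancerMember "http://dataminer1-p-d4s.d4science.org:80"
              BalancerMember "http://dataminer2-p-d4s.d4science.org:80"
            </Proxy>
            ProxyPass        "/" "balancer://dataminer/"
            ProxyPassReverse "/" "balancer://dataminer/"
          </VirtualHost>
      notify: Reload apache
  handlers:
    - name: Reload apache
      service:
        name: apache2
        state: reloaded
```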


Related issues

Related to D4Science Infrastructure - Task #422: Report libraries and configuration components for DataMiner (Closed, Andrea Dell'Amico, Jul 23, 2015)
Related to D4Science Infrastructure - Task #421: Create an automatic installation package for DataMiner (Closed, Andrea Dell'Amico, Jul 23, 2015)
Related to D4Science Infrastructure - Task #1502: Automate the dataminer installation as a loadbalanced service (Closed, _InfraScience Systems Engineer, Nov 23, 2015)
Related to D4Science Infrastructure - Support #1837: Add SmartGenericWorkers to the VREs (Closed, Roberto Cirillo, Dec 17, 2015)
#1

Updated by Gianpaolo Coro over 9 years ago

  • Related to Task #422: Report libraries and configuration components for DataMiner added
#2

Updated by Gianpaolo Coro over 9 years ago

  • Related to Task #421: Create an automatic installation package for DataMiner added
#3

Updated by Andrea Dell'Amico over 9 years ago

  • Related to Task #1502: Automate the dataminer installation as a loadbalanced service added
#4

Updated by Andrea Dell'Amico over 9 years ago

  • % Done changed from 0 to 20

Following the dev provisioning, I've created the ansible variable groups needed to install a dataminer cluster in production.
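
For reference, a sketch of the kind of group variables this involves (the file name and variable names are assumptions, not the real repository layout):

```yaml
# group_vars/dataminer_cluster.yml - hypothetical sketch; the actual
# provisioning repository may use different names.
dataminer_lb_hostname: dataminer.d4science.org
dataminer_http_port: 80
dataminer_cluster_hosts:
  - dataminer1-p-d4s.d4science.org
  - dataminer2-p-d4s.d4science.org
```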

#5

Updated by Gianpaolo Coro over 9 years ago

@andrea.dellamico@isti.cnr.it @tommaso.piccioli@isti.cnr.it Who is responsible for creating the machines for the production environment and running the installation scripts?

#6

Updated by Andrea Dell'Amico over 9 years ago

One load balancer and two dataminer servers in production as well; other dataminer servers can be added at a later time.

Required specs for the load balancer:

  • Hostname: dataminer-lb1-p-d4s.d4science.org (with dataminer.d4science.org as alias)
  • Ubuntu 14.04 LTS
  • 1 GB of RAM
  • 2 virtual CPUs
  • 10 GB disk

Required specs for the dataminer VMs, from the wiki page https://wiki.gcube-system.org/index.php/DataMiner_Installation:

  • Hostname: dataminer[1:2]-p-d4s.d4science.org
  • Ubuntu 12.04.5 LTS
  • 6 GB of RAM
  • 10 virtual CPUs
  • 10 GB of HD space
#7

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from New to In Progress
  • Assignee changed from Tommaso Piccioli to Andrea Dell'Amico
#8

Updated by Andrea Dell'Amico over 9 years ago

dataminer1-p-d4s.d4science.org IP: 146.48.122.251
dataminer2-p-d4s.d4science.org IP: 146.48.123.64
dataminer-lb1-p-d4s.d4science.org IP: 146.48.123.71
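
These hosts would map onto an Ansible inventory along these lines (a sketch; the group names are assumptions, the IPs are the ones listed above):

```yaml
# Hypothetical inventory snippet (YAML format); the group names are
# assumptions about the provisioning repository's layout.
all:
  children:
    dataminer:
      hosts:
        dataminer1-p-d4s.d4science.org:
          ansible_host: 146.48.122.251
        dataminer2-p-d4s.d4science.org:
          ansible_host: 146.48.123.64
    dataminer_lb:
      hosts:
        dataminer-lb1-p-d4s.d4science.org:
          ansible_host: 146.48.123.71
```
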
#9

Updated by Andrea Dell'Amico over 9 years ago

  • % Done changed from 20 to 60

The VMs are up and the provisioning of the three hosts is running.

#10

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 60 to 100

The dataminer production cluster is ready to be tested. The main URL is http://dataminer.d4science.org/wps/
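
For a quick smoke test of the endpoint through the balancer (a sketch; the expected status code is an assumption):

```yaml
# smoke-test.yml - hypothetical check that the WPS endpoint answers
# through the load balancer; run with: ansible-playbook smoke-test.yml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check the DataMiner WPS endpoint
      uri:
        url: http://dataminer.d4science.org/wps/
        status_code: 200  # assumption: the landing page returns 200
```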

#11

Updated by Gianpaolo Coro over 9 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 100 to 90

I added DataMiner to the BiodiversityLab, ScalableDataMining and BiOnym VREs. It works very well with "local" algorithms. However, no SmartGenericWorker is present in these VREs; they should be added. I'm going to open another ticket and make it a dependency of this one.

#12

Updated by Gianpaolo Coro over 9 years ago

  • Related to Support #1837: Add SmartGenericWorkers to the VREs added
#13

Updated by Gianpaolo Coro over 9 years ago

Tests on all the algorithms were successful, using 4 workers behind the scenes. I communicated the main dataminer link (the load balancer) to the FishBase people.
I have two questions on the installation, @andrea.dellamico@isti.cnr.it :
1 - is the DataMiner configuration downloaded on-the-fly during the installation? In other words, will it always be up to date?
2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

#14

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Tests on all the algorithms were successful, using 4 workers behind the scenes. I communicated the main dataminer link (the load balancer) to the FishBase people.
I have two questions on the installation, @andrea.dellamico@isti.cnr.it :
1 - is the DataMiner configuration downloaded on-the-fly during the installation? In other words, will it always be up to date?

Nope. It was not requested, so the download of the svn files is done once. If the point is to run an svn update at regular intervals, that can be done easily. Otherwise, please give details.

2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

Yes, I'll do that.

#15

Updated by Gianpaolo Coro over 9 years ago

Thank you Andrea,
in the next installations, it would be better to download the cfg and PARALLEL_PROCESSING folders from this SVN location:

https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/DataMinerConfiguration
#16

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Thank you Andrea,
in the next installations, it would be better to download the cfg and PARALLEL_PROCESSING folders from this SVN location:

Do we need to perform an svn update regularly or not?

https://svn.d4science.research-infrastructures.eu/gcube/trunk/data-analysis/DataMinerConfiguration

I was using this (public path, to avoid authentication and problems with the untrusted TLS certificate):

http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/{cfg,PARALLEL_PROCESSING}

I don't think there's a difference between the private and the public link.
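
For the next installations, the one-shot fetch of those two folders could look like this with Ansible's subversion module (a sketch; the destination path and the host group are assumptions):

```yaml
# Hypothetical one-shot checkout of the DataMiner configuration from
# the public svn path; dest and the host group are assumptions.
# Requires the svn client on the target hosts.
- hosts: dataminer
  tasks:
    - name: Fetch the DataMiner configuration folders
      subversion:
        repo: "http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/{{ item }}"
        dest: "/home/gcube/dataminer/{{ item }}"
        update: false  # install once, no periodic svn update
      loop:
        - cfg
        - PARALLEL_PROCESSING
```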

#17

Updated by Gianpaolo Coro over 9 years ago

So I misunderstood: we do not need an automatic periodic update. Those folders need to be installed only at the beginning and should be updated only when a new release is available.
Furthermore, your link is good; it is the free-access, read-only version of mine. If your script downloads that folder at each installation, there is no problem.

#18

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

If your script downloads that folder at each installation, there is no problem.

Yes. I only changed the svn link, and you changed the wiki accordingly, so it now points to the public-access svn repository.

#19

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from In Progress to Feedback

Andrea Dell'Amico wrote:

2 - is it possible to copy the encryption keys (https://goo.gl/cSXOBq) into the PARALLEL_PROCESSING folder as well?

Yes, I'll do that.

The playbook is ready and I tried it on the dev instances. I see that you installed the production keys there. Do we need both the dev and the production keys, or are the production keys needed under 'PARALLEL_PROCESSING' even in the dev environment?

#20

Updated by Gianpaolo Coro over 9 years ago

Ideally, the production keys would be installed on the production servers and the development keys on the dev servers. From a technical point of view, there is no issue if you copy all the keys onto the servers, since the scopes are assigned at infrastructure level and the infrastructure a node belongs to depends on the GHN configuration.
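
A sketch of how the playbook might pick the keys per environment (the variable name, the keys/ source layout, and the destination path are all assumptions):

```yaml
# Hypothetical task: install the environment's own encryption keys
# into the PARALLEL_PROCESSING folder. 'infra_env', the keys/ source
# layout and the destination path are assumptions.
- hosts: dataminer
  vars:
    infra_env: production  # would be 'dev' on the development servers
  tasks:
    - name: Copy the {{ infra_env }} encryption keys
      copy:
        src: "keys/{{ infra_env }}/"  # trailing slash copies the contents
        dest: /home/gcube/dataminer/PARALLEL_PROCESSING/
```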

#21

Updated by Andrea Dell'Amico over 9 years ago

Gianpaolo Coro wrote:

Ideally, the production keys would be installed on the production servers and the development keys on the dev servers.

That's what I'm doing (or rather, what the playbook is doing).

#22

Updated by Andrea Dell'Amico over 9 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100