Task #10479 (closed): Cloud Provisioning Requests

Added by Gianpaolo Coro over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: High-Throughput-Computing
Target version:
Start date: Nov 29, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

A number of requests for the Cloud computing provisioning service:

1 - make the dataminer.d4science.org:8880 page accessible with a password also from outside CNR

2 - add private-IP machines to the cloud1 system

3 - retain the DM logs for up to 3 months back

Actions #2

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from New to In Progress

Gianpaolo Coro wrote:

3 - retain the DM logs for up to 3 months back

I'm running the log reconfiguration right now. It should also fix the logging/rolling problems where they exist.

Actions #4

Updated by Andrea Dell'Amico over 7 years ago

  • % Done changed from 0 to 60
Actions #5

Updated by Andrea Dell'Amico over 7 years ago

I just applied a configuration, on both the CNR haproxy instance and the dataminer.garr load balancer, that makes all the GARR instances available, but it's not satisfactory because the checks on the private-IP instances are not reliable.

If there are no objections, I'm going to move dataminer-cloud1.d4science.org to the dataminer.garr load balancer to fix the configuration properly. That will also eliminate an unnecessary hop.

Actions #6

Updated by Andrea Dell'Amico over 7 years ago

I just reconfigured the dataminer haproxy and moved the host dataminer-cloud1.d4science.org. It is now a CNAME of dataminer-lb.garr.d4science.org.
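
A quick way to verify the new record, sketched here in Python with the dnspython package (an assumption; any DNS lookup tool such as dig or host would do just as well):

  # Minimal sketch: check that dataminer-cloud1.d4science.org is a CNAME
  # pointing at the GARR load balancer. Assumes dnspython >= 2.0 is installed.
  import dns.resolver

  answer = dns.resolver.resolve("dataminer-cloud1.d4science.org", "CNAME")
  for record in answer:
      print(record.target)  # expected: dataminer-lb.garr.d4science.org.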

Actions #7

Updated by Andrea Dell'Amico over 7 years ago

The haproxy stats on dataminer-lb.garr.d4science.org:8880 are accessible with the same credentials that are valid for dataminer.d4science.org.
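
For reference, a minimal client-side check of the stats page; the username/password values below are hypothetical placeholders for the credentials mentioned above, and the snippet assumes the Python requests package:

  # Hedged sketch: fetch the haproxy stats page using HTTP basic auth.
  import requests

  resp = requests.get(
      "http://dataminer-lb.garr.d4science.org:8880/",
      auth=("USERNAME", "PASSWORD"),  # hypothetical placeholders
      timeout=30,
  )
  print(resp.status_code)  # 200 means the page is reachable and the credentials work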

Actions #8

Updated by Andrea Dell'Amico over 7 years ago

The new configuration is operational. I removed the cloud1 cluster configuration from the CNR haproxy instance.

Actions #9

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 60 to 100
Actions #10

Updated by Gianpaolo Coro over 7 years ago

What about point 3 and https://support.d4science.org/issues/10479#note-1?
The logs are still empty.

Actions #11

Updated by Andrea Dell'Amico over 7 years ago

Are you sure that those instances received any job requests after Nov 29th? The log configuration was updated that day and the container was restarted. Here is the logback configuration that matters:

  <appender name="ANALYSIS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/home/gcube/tomcat/logs/analysis.log</file>
    <append>true</append>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{0}: %msg%n</pattern>
    </encoder>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
      <fileNamePattern>/home/gcube/tomcat/logs/analysis.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
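      <!-- keep 90 days (roughly the 3 months requested in point 3) of rolled log files -->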
      <maxHistory>90</maxHistory>
      <maxFileSize>10MB</maxFileSize>
      <totalSizeCap>2GB</totalSizeCap>
    </rollingPolicy>
  </appender>

  <logger name="org.gcube.dataanalysis" level="DEBUG">
    <appender-ref ref="ANALYSIS" />
  </logger>
  <logger name="AnalysisLogger" level="DEBUG">
    <appender-ref ref="ANALYSIS" />
  </logger>

I just tested ip-90-147-167-183.ct1.garrservices.it and it didn't receive any jobs.

Actions #12

Updated by Gianpaolo Coro over 7 years ago

On Friday I ran 200 executions against Cloud1 by repeatedly invoking this HTTP GET request:

http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CMSY_FOR_DLM_TOOL&DataInputs=catch_file=http%3A%2F%2Fdata.d4science.org%2FMlVxMFYwdFVxTjlHV05FbExrRDJjODVwajFzVzNJcnVHbWJQNStIS0N6Yz0;Region=Mediterranean;Subregion=Adriatic+Sea;Stock=Athe_boy_AD;Group=Plankton+feeders;Name=Sand+smelt+in+Adriatic+Sea;EnglishName=Big+scale+sand+smelt;ScientificName=Atherina+boyeri;Source=-;MinOfYear=1970;MaxOfYear=2014;StartYear=1970;EndYear=2014;Flim=NA;Fpa=NA;Blim=NA;Bpa=NA;Bmsy=NA;FMSY=NA;MSY=NA;MSYBtrigger=NA;B40=NA;M=NA;Fofl=NA;last_F=NA;Resilience=Medium;r.low=NA;r.hi=NA;stb.low=0.2;stb.hi=0.6;int.yr=NA;intb.low=NA;intb.hi=NA;endb.low=0.01;endb.hi=0.4;q.start=NA;q.end=NA;btype=None;force.cmsy=false;Comment=landings;

The requests were supposed to be balanced over the machines behind the cloud1 cluster. Why did this not happen, then?

Actions #13

Updated by Andrea Dell'Amico over 7 years ago

I don't know what you did, but the requests are always perfectly balanced across all the servers. You can see for yourself on the stats page.

Actions #14

Updated by Gianpaolo Coro over 7 years ago

I simply invoked the HTTP GET request above repeatedly, but in the end the calls did not reach all of the machines. That is evidence to the contrary, in spite of the proxy statistics.
I simply used R's httr::GET call iteratively.
I don't know where this discrepancy comes from, but we should understand it; otherwise I cannot be sure that clients' requests are really parallelised by our implementations.

Would it be possible to run a Linux script that invokes the cloud1 proxy and check whether the calls are balanced?
One testing URL could be:

http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.MOST_OBSERVED_SPECIES&DataInputs=Species_number=10;Start_year=1800;End_year=2020;

and the logs should reveal whether the calls were distributed. I have difficulty running the test myself since I'm at a meeting this week.
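
A rough sketch of the kind of script requested here, written in Python under the assumption that the requests package is available (an equivalent loop with curl or R's httr::GET would work just as well); the URL and token are the ones quoted above, and the call count is a placeholder:

  # Hedged sketch: call the cloud1 endpoint repeatedly so that the per-host
  # analysis.log files can then be inspected to see how the calls were spread.
  import requests

  TEST_URL = (
      "http://dataminer-cloud1.d4science.org/wps/WebProcessingService"
      "?request=Execute&service=WPS&Version=1.0.0"
      "&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462"
      "&lang=en-US"
      "&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver"
      ".mappedclasses.transducerers.MOST_OBSERVED_SPECIES"
      "&DataInputs=Species_number=10;Start_year=1800;End_year=2020;"
  )
  N_CALLS = 40  # placeholder; adjust to the desired number of test invocations

  for i in range(N_CALLS):
      resp = requests.get(TEST_URL, timeout=600)
      print(i, resp.status_code)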

Actions #15

Updated by Andrea Dell'Amico over 7 years ago

No problem doing it. But the haproxy statistics don't lie.

Actions #16

Updated by Andrea Dell'Amico over 7 years ago

The script is running. But I just realized that the hosts you reported in your first comment, except http://ip-90-147-167-222.ct1.garrservices.it/gcube-logs/, are generic workers and not standard dataminers, so they are not called by your job. ip-90-147-167-222.ct1.garrservices.it was recently reinstalled. If you check that one now, you'll see the logs of your computations (a couple of them; I called it 40 times).

Actions #17

Updated by Gianpaolo Coro over 7 years ago

OK, that explains everything. Would it perhaps be possible to use a -genericworkers prefix in the names of the generic worker machines?
The cloud1 machines should all be "large" DM machines, right?

Thanks
