Task #10479 (closed): Cloud Provisioning Requests

Added by Gianpaolo Coro over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: High-Throughput-Computing
Target version:
Start date: Nov 29, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

A number of requests for the Cloud computing provisioning service:

1 - make the dataminer.d4science.org:8880 page accessible with a password also from outside CNR

2 - add private-IP machines to the cloud1 system

3 - retain the DM logs for up to 3 months back

Actions #2

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from New to In Progress

Gianpaolo Coro wrote:

3 - retain the DM logs for up to 3 months back

I'm running the log reconfiguration right now. It should also fix the logging/rolling problems where they exist.

Actions #4

Updated by Andrea Dell'Amico over 7 years ago

  • % Done changed from 0 to 60
Actions #5

Updated by Andrea Dell'Amico over 7 years ago

I just applied a configuration, on both the CNR haproxy instance and the dataminer.garr load balancer, that makes all the GARR instances available, but it's not satisfactory because the checks on the private-IP instances are not reliable.

If there are no objections, I'm going to move dataminer-cloud1.d4science.org to the dataminer.garr load balancer to fix the configuration properly. That will also eliminate an unnecessary hop.

Actions #6

Updated by Andrea Dell'Amico over 7 years ago

I just reconfigured the dataminer haproxy and moved the host dataminer-cloud1.d4science.org. It is now a CNAME of dataminer-lb.garr.d4science.org.
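
A quick way to verify the new record, sketched here in Python with the dnspython package (an assumption; any DNS lookup tool such as dig or host would do just as well):

  # Minimal sketch: check that dataminer-cloud1.d4science.org is a CNAME
  # pointing at the GARR load balancer. Assumes dnspython >= 2.0 is installed.
  import dns.resolver

  answer = dns.resolver.resolve("dataminer-cloud1.d4science.org", "CNAME")
  for record in answer:
      print(record.target)  # expected: dataminer-lb.garr.d4science.org.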

Actions #7

Updated by Andrea Dell'Amico over 7 years ago

The haproxy stats on dataminer-lb.garr.d4science.org:8880 are accessible with the same credentials that are valid for dataminer.d4science.org.
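
For reference, a minimal client-side check of the stats page; the username/password values below are hypothetical placeholders for the credentials mentioned above, and the snippet assumes the Python requests package:

  # Hedged sketch: fetch the haproxy stats page using HTTP basic auth.
  import requests

  resp = requests.get(
      "http://dataminer-lb.garr.d4science.org:8880/",
      auth=("USERNAME", "PASSWORD"),  # hypothetical placeholders
      timeout=30,
  )
  print(resp.status_code)  # 200 means the page is reachable and the credentials work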

Actions #8

Updated by Andrea Dell'Amico over 7 years ago

The new configuration is operational. I removed the cloud1 cluster configuration from the CNR haproxy instance.

Actions #9

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 60 to 100
Actions #10

Updated by Gianpaolo Coro over 7 years ago

What about point 3 and https://support.d4science.org/issues/10479#note-1?
The logs are still empty.

Actions #11

Updated by Andrea Dell'Amico over 7 years ago

Are you sure that those instances received any job requests after Nov 29th? The log configuration was updated that day and the container was restarted. Here is the logback configuration that matters:

  <appender name="ANALYSIS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/home/gcube/tomcat/logs/analysis.log</file>
    <append>true</append>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{0}: %msg%n</pattern>
    </encoder>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
      <fileNamePattern>/home/gcube/tomcat/logs/analysis.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
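      <!-- keep 90 days (roughly the 3 months requested in point 3) of rolled log files -->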
      <maxHistory>90</maxHistory>
      <maxFileSize>10MB</maxFileSize>
      <totalSizeCap>2GB</totalSizeCap>
    </rollingPolicy>
  </appender>

  <logger name="org.gcube.dataanalysis" level="DEBUG">
    <appender-ref ref="ANALYSIS" />
  </logger>
  <logger name="AnalysisLogger" level="DEBUG">
    <appender-ref ref="ANALYSIS" />
  </logger>

I just tested ip-90-147-167-183.ct1.garrservices.it and it didn't receive any jobs.

Actions #12

Updated by Gianpaolo Coro over 7 years ago

On Friday I ran 200 executions against Cloud1 by repeatedly invoking this HTTP GET request:

http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CMSY_FOR_DLM_TOOL&DataInputs=catch_file=http%3A%2F%2Fdata.d4science.org%2FMlVxMFYwdFVxTjlHV05FbExrRDJjODVwajFzVzNJcnVHbWJQNStIS0N6Yz0;Region=Mediterranean;Subregion=Adriatic+Sea;Stock=Athe_boy_AD;Group=Plankton+feeders;Name=Sand+smelt+in+Adriatic+Sea;EnglishName=Big+scale+sand+smelt;ScientificName=Atherina+boyeri;Source=-;MinOfYear=1970;MaxOfYear=2014;StartYear=1970;EndYear=2014;Flim=NA;Fpa=NA;Blim=NA;Bpa=NA;Bmsy=NA;FMSY=NA;MSY=NA;MSYBtrigger=NA;B40=NA;M=NA;Fofl=NA;last_F=NA;Resilience=Medium;r.low=NA;r.hi=NA;stb.low=0.2;stb.hi=0.6;int.yr=NA;intb.low=NA;intb.hi=NA;endb.low=0.01;endb.hi=0.4;q.start=NA;q.end=NA;btype=None;force.cmsy=false;Comment=landings;

The requests were supposed to be balanced over the machines behind the cloud1 cluster. Why did this not happen, then?

Actions #13

Updated by Andrea Dell'Amico over 7 years ago

I don't know what you did, but the requests are always perfectly balanced across all the servers. You can see for yourself on the stats page.

Actions #14

Updated by Gianpaolo Coro over 7 years ago

I simply invoked the HTTP GET request above repeatedly, but in the end the calls did not reach all of the machines. That is evidence to the contrary, in spite of the proxy statistics.
I simply used R's httr::GET call iteratively.
I don't know where this discrepancy comes from, but we should understand it; otherwise I cannot be sure that clients' requests are really parallelised by our implementations.

Would it be possible to run a Linux script that invokes the cloud1 proxy and check whether the calls are balanced?
One testing URL could be:

http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.MOST_OBSERVED_SPECIES&DataInputs=Species_number=10;Start_year=1800;End_year=2020;

and the logs should reveal whether the calls were distributed. I have difficulty running the test myself since I'm at a meeting this week.
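
A rough sketch of the kind of script requested here, written in Python under the assumption that the requests package is available (an equivalent loop with curl or R's httr::GET would work just as well); the URL and token are the ones quoted above, and the call count is a placeholder:

  # Hedged sketch: call the cloud1 endpoint repeatedly so that the per-host
  # analysis.log files can then be inspected to see how the calls were spread.
  import requests

  TEST_URL = (
      "http://dataminer-cloud1.d4science.org/wps/WebProcessingService"
      "?request=Execute&service=WPS&Version=1.0.0"
      "&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462"
      "&lang=en-US"
      "&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver"
      ".mappedclasses.transducerers.MOST_OBSERVED_SPECIES"
      "&DataInputs=Species_number=10;Start_year=1800;End_year=2020;"
  )
  N_CALLS = 40  # placeholder; adjust to the desired number of test invocations

  for i in range(N_CALLS):
      resp = requests.get(TEST_URL, timeout=600)
      print(i, resp.status_code)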

Actions #15

Updated by Andrea Dell'Amico over 7 years ago

No problem doing it. But the haproxy statistics don't lie.

Actions #16

Updated by Andrea Dell'Amico over 7 years ago

The script is running. But I just realized that the hosts you reported in your first comment, except http://ip-90-147-167-222.ct1.garrservices.it/gcube-logs/, are generic workers and not standard dataminers, so they are not called by your job. ip-90-147-167-222.ct1.garrservices.it was recently reinstalled. If you check that one now, you'll see the logs of your computations (a couple of them; I called it 40 times).

Actions #17

Updated by Gianpaolo Coro over 7 years ago

OK, that explains everything. Would it perhaps be possible to use a -genericworkers prefix in the names of the generic worker machines?
The cloud1 machines should all be "large" DM machines, right?

Thanks
