Task #10479
closed
Cloud Provisioning Requests
100%
Description
A number of requests for the Cloud computing provisioning service:
1 - make the dataminer.d4science.org:8880 page accessible, with a password, from outside CNR as well
2 - add private-IP machines to the cloud1 system
3 - retain the DM logs for up to 3 months
Updated by Gianpaolo Coro over 7 years ago
There are GARR machines whose logging needs to be double-checked:
Machines that are not logging at all:
http://ip-90-147-167-183.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-179.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-180.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-176.ct1.garrservices.it/gcube-logs/
Machines with no or poor file rolling:
http://ip-90-147-167-222.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-175.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-177.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-178.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-181.ct1.garrservices.it/gcube-logs/
http://ip-90-147-167-182.ct1.garrservices.it/gcube-logs/
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from New to In Progress
Gianpaolo Coro wrote:
3 - retain the DM logs for up to 3 months
I'm running the log reconfiguration right now. It should also fix the logging/rolling problems where they exist.
Updated by Andrea Dell'Amico over 7 years ago
I just adopted a configuration on both the CNR haproxy instance and the dataminer.garr load balancer that makes all the GARR instances available, but it's not satisfactory because the checks on the private-IP instances are not reliable.
If there are no objections, I'm going to move dataminer-cloud1.d4science.org to the dataminer.garr load balancer to fix the configuration properly. We will also eliminate an unnecessary hop.
Updated by Andrea Dell'Amico over 7 years ago
I just reconfigured the dataminer haproxy and moved the host dataminer-cloud1.d4science.org. It is now a CNAME of dataminer-lb.garr.d4science.org.
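A quick way to verify the new DNS record from any client, as a minimal sketch in Python assuming the third-party dnspython package (2.x) is installed:

import dns.resolver  # from the dnspython package

# Look up the CNAME of the moved host; the expected target after the change
# is dataminer-lb.garr.d4science.org.
answer = dns.resolver.resolve("dataminer-cloud1.d4science.org", "CNAME")
for rdata in answer:
    print(rdata.target)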
Updated by Andrea Dell'Amico over 7 years ago
The haproxy stats on dataminer-lb.garr.d4science.org:8880 are accessible with the same credentials that are valid for dataminer.d4science.org.
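As a minimal sketch of checking that access from outside CNR (point 1), assuming the stats page is protected by HTTP basic auth and that the requests library is installed; USER and PASSWORD are placeholders, not the real credentials:

import requests
from requests.auth import HTTPBasicAuth

# Placeholders: substitute the credentials valid for dataminer.d4science.org.
USER = "changeme"
PASSWORD = "changeme"

resp = requests.get(
    "http://dataminer-lb.garr.d4science.org:8880",
    auth=HTTPBasicAuth(USER, PASSWORD),
    timeout=30,
)
# 200 means the stats page answered with the supplied credentials, 401 that it refused them.
print(resp.status_code)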
Updated by Andrea Dell'Amico over 7 years ago
The new configuration is operational. I removed the cloud1 cluster configuration from the CNR haproxy instance.
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from In Progress to Closed
- % Done changed from 60 to 100
Updated by Gianpaolo Coro over 7 years ago
What about point 3 and https://support.d4science.org/issues/10479#note-1 ?
Logs are still empty.
Updated by Andrea Dell'Amico over 7 years ago
Are you sure that those instances received any job requests after Nov 29th? The log configuration was updated that day and the container was restarted. Here is the logback configuration that matters:
<appender name="ANALYSIS" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>/home/gcube/tomcat/logs/analysis.log</file>
  <append>true</append>
  <encoder>
    <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{0}: %msg%n</pattern>
  </encoder>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <fileNamePattern>/home/gcube/tomcat/logs/analysis.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
    <maxHistory>90</maxHistory>
    <maxFileSize>10MB</maxFileSize>
    <totalSizeCap>2GB</totalSizeCap>
  </rollingPolicy>
</appender>

<logger name="org.gcube.dataanalysis" level="DEBUG">
  <appender-ref ref="ANALYSIS" />
</logger>

<logger name="AnalysisLogger" level="DEBUG">
  <appender-ref ref="ANALYSIS" />
</logger>
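With this policy, analysis.log rolls daily (and whenever a file reaches 10MB), and up to 90 days of history are kept within a 2GB total cap, so the three months of logs asked for in point 3 should be retained.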
I just tested ip-90-147-167-183.ct1.garrservices.it and it didn't receive any jobs.
Updated by Gianpaolo Coro over 7 years ago
On Friday I ran 200 executions against cloud1 by repeatedly invoking this HTTP GET request:
http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CMSY_FOR_DLM_TOOL&DataInputs=catch_file=http%3A%2F%2Fdata.d4science.org%2FMlVxMFYwdFVxTjlHV05FbExrRDJjODVwajFzVzNJcnVHbWJQNStIS0N6Yz0;Region=Mediterranean;Subregion=Adriatic+Sea;Stock=Athe_boy_AD;Group=Plankton+feeders;Name=Sand+smelt+in+Adriatic+Sea;EnglishName=Big+scale+sand+smelt;ScientificName=Atherina+boyeri;Source=-;MinOfYear=1970;MaxOfYear=2014;StartYear=1970;EndYear=2014;Flim=NA;Fpa=NA;Blim=NA;Bpa=NA;Bmsy=NA;FMSY=NA;MSY=NA;MSYBtrigger=NA;B40=NA;M=NA;Fofl=NA;last_F=NA;Resilience=Medium;r.low=NA;r.hi=NA;stb.low=0.2;stb.hi=0.6;int.yr=NA;intb.low=NA;intb.hi=NA;endb.low=0.01;endb.hi=0.4;q.start=NA;q.end=NA;btype=None;force.cmsy=false;Comment=landings;
The requests were supposed to be balanced over the machines behind the cloud1 cluster. Why did this not happen, then?
Updated by Andrea Dell'Amico over 7 years ago
I don't know what you did, but the requests are always perfectly balanced across all the servers. You can see for yourself on the stats page.
Updated by Gianpaolo Coro over 7 years ago
I simply invoked the HTTP GET request above repeatedly, but in the end the calls did not reach all the machines. This is evidence to the contrary, in spite of the proxy statistics.
I simply used R's httr::GET call iteratively.
I don't know where this discrepancy comes from, but we should understand it; otherwise I cannot be sure that clients' requests are really parallelised with our implementations.
Is it possible to run a Linux script that invokes the cloud1 proxy and check whether the calls are balanced? (A sketch of such a script is below.)
One testing URL could be:
http://dataminer-cloud1.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.MOST_OBSERVED_SPECIES&DataInputs=Species_number=10;Start_year=1800;End_year=2020;
and the logs should reveal whether the calls were distributed. I have difficulty doing it myself since I'm at a meeting this week.
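A minimal sketch of such a test, written in Python instead of shell and assuming the requests library is installed; it only fires repeated identical calls, so whether they were actually spread over the backends still has to be checked in the per-machine gcube-logs or on the haproxy stats page:

import requests

# Testing URL taken verbatim from the comment above.
URL = ("http://dataminer-cloud1.d4science.org/wps/WebProcessingService"
       "?request=Execute&service=WPS&Version=1.0.0"
       "&gcube-token=3a8e6a79-1ae0-413f-9121-0d59e5f2cea2-843339462"
       "&lang=en-US"
       "&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver."
       "mappedclasses.transducerers.MOST_OBSERVED_SPECIES"
       "&DataInputs=Species_number=10;Start_year=1800;End_year=2020;")

N_CALLS = 40  # number of repeated invocations

for i in range(N_CALLS):
    resp = requests.get(URL, timeout=600)
    # Each answer is WPS XML; only the HTTP status is recorded here.
    print(f"call {i + 1}: HTTP {resp.status_code}")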
Updated by Andrea Dell'Amico over 7 years ago
No problem doing it. But the haproxy statistics don't lie.
Updated by Andrea Dell'Amico over 7 years ago
The script is running. But I just realized that the hosts you reported in your first comment, except http://ip-90-147-167-222.ct1.garrservices.it/gcube-logs/, are generic workers and not standard dataminers, so they are not called by your job. ip-90-147-167-222.ct1.garrservices.it was recently reinstalled. If you check that one now, you'll see the logs of your computations (a couple of them; I called it 40 times).
Updated by Gianpaolo Coro over 7 years ago
OK, that explains everything. Would it perhaps be possible to use the -genericworkers prefix in the generic workers' machine names?
The cloud1 machines should all be "large" DM machines, right?
Thanks