Task #12788


DMPoolManager - The dm-pool-manager-pre.d4science.org has problems when an algorithm is republished

Added by Giancarlo Panichi over 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: High
Assignee:
Category: Application
Start date: Oct 26, 2018
Due date: May 02, 2019
% Done: 100%
Estimated time:
Infrastructure: Pre-Production

Description

The dm-pool-manager-pre.d4science.org has problems when an algorithm is republished.
It might even be necessary to upgrade the machine.
Please, can you check?
Thanks

An error occurred while deploying your algorithm

Here are the error details:

Installation failed. Return code=2



Algorithm details:

User: Giancarlo Panichi
Algorithm name: PARAMETERSCHECKER
Staging DataMiner Host: dataminer-ghost-t.pre.d4science.org
Caller VRE: /gcube/preprod/preVRE
Target VRE: /gcube/preprod/preVRE




16:32:44.444 [catalina-exec-1] INFO  RequestAccounting: REQUEST START ON dataminer-pool-manager:DataAnalysis(/api/monitor) CALLED FROM giancarlo.panichi@146.48.122.240 IN SCOPE /gcube/preprod/preVRE 
16:32:44.446 [catalina-exec-1] INFO  RequestAccounting: REQUEST SERVED ON dataminer-pool-manager:DataAnalysis(/api/monitor) CALLED FROM giancarlo.panichi@146.48.122.240 IN SCOPE /gcube/preprod/preVRE FINISHED IN 2 millis
16:32:46.026 [Thread-23] ERROR DMPMJob: Operation failed: Ansible work failed
16:32:46.026 [Thread-23] ERROR DMPMJob: Exception: 
org.gcube.dataanalysis.dataminer.poolmanager.service.exceptions.AnsibleException: Ansible work failed
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob.installation(DMPMJob.java:172)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob.execute(DMPMJob.java:207)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.StagingJob.execute(StagingJob.java:32)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob$1.run(DMPMJob.java:88)
    at java.lang.Thread.run(Thread.java:748)
16:32:46.410 [Thread-23] ERROR SendMail: Error in the IO process
java.io.IOException: Server returned HTTP response code: 400 for URL: https://socialnetworking-t.pre.d4science.org/social-networking-library-ws/rest/messages/writeMessageToUsers?gcube-token=04269c7d-dab7-498a-841d-8d38ae2d482b-98187548
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
    at org.gcube.dataanalysis.dataminer.poolmanager.util.SendMail.sendPostRequest(SendMail.java:165)
    at org.gcube.dataanalysis.dataminer.poolmanager.util.SendMail.sendNotification(SendMail.java:94)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob.execute(DMPMJob.java:223)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.StagingJob.execute(StagingJob.java:32)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob$1.run(DMPMJob.java:88)
    at java.lang.Thread.run(Thread.java:748)
16:32:46.410 [Thread-23] ERROR DMPMJob: Unable to send notification email
org.gcube.dataanalysis.dataminer.poolmanager.util.exception.EMailException: Unable to send email notification
    at org.gcube.dataanalysis.dataminer.poolmanager.util.SendMail.sendNotification(SendMail.java:97)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob.execute(DMPMJob.java:223)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.StagingJob.execute(StagingJob.java:32)
    at org.gcube.dataanalysis.dataminer.poolmanager.service.DMPMJob$1.run(DMPMJob.java:88)
    at java.lang.Thread.run(Thread.java:748)
16:32:46.503 [catalina-exec-2] INFO  RequestContextRetriever: retrieving context using token 04269c7d-dab7-498a-841d-8d38ae2d482b-98187548 
16:32:46.504 [catalina-exec-2] INFO  RequestContextRetriever: retrieved request authorization info org.gcube.common.authorization.library.utils.Caller@684bdaf0 in scope /gcube/preprod/preVRE 


Related issues

Related to D4Science Infrastructure - Upgrade #12739: /gcube/preprod upgrade to gCube 4.13 (Completed, Roberto Cirillo, Oct 19, 2018 - Oct 29, 2018)

Related to D4Science Infrastructure - Task #12794: Workspace - Download problem with some file extension (Closed, Lucio Lelii, Oct 29, 2018)
Actions #1

Updated by Giancarlo Panichi over 6 years ago

  • Assignee changed from Ciro Formisano to Lucio Lelii

I tried to run ansible from the command line and I noticed these logs:

gcube@dm-pool-manager-pre:~/dataminer-pool-manager/work/62b21472-63b8-4c72-874e-504aad83d55d$ ansible-playbook -v -i inventory.yaml playbook.yaml 
Using /etc/ansible/ansible.cfg as config file
/home/gcube/dataminer-pool-manager/work/62b21472-63b8-4c72-874e-504aad83d55d/inventory.yaml did not meet host_list requirements, check plugin documentation if this is unexpected
/home/gcube/dataminer-pool-manager/work/62b21472-63b8-4c72-874e-504aad83d55d/inventory.yaml did not meet script requirements, check plugin documentation if this is unexpected

PLAY [universe] ************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [dataminer-ghost-t.pre.d4science.org]

TASK [gcube-algorithm-DMPOOLMANAGERCHECK : Install algorithm DMPOOLMANAGERCHECK] *******************************
fatal: [dataminer-ghost-t.pre.d4science.org]: FAILED! => {"changed": true, "cmd": "/home/gcube/algorithmInstaller/addAlgorithm.sh DMPOOLMANAGERCHECK BLACK_BOX org.gcube.dataanalysis.executor.rscripts.DMPoolManagerCheck /gcube/preprod/preVRE transducerers N https://data1-d.d4science.net/shub/bc867d01-ba3c-427a-9abf-3a542a7916c6 \"DM Pool Manager Check {Published by Giancarlo Panichi (giancarlo.panichi) on 2018/10/26 12:21 GMT}\"", "delta": "0:00:01.509639", "end": "2018-10-26 15:06:12.546733", "msg": "non-zero return code", "rc": 1, "start": "2018-10-26 15:06:11.037094", "stderr": "SLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/slf4j-nop-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]\nException in thread \"main\" java.lang.ClassNotFoundException: org.gcube.dataanalysis.executor.rscripts.DMPoolManagerCheck\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:381)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:424)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:357)\n\tat java.lang.Class.forName0(Native Method)\n\tat java.lang.Class.forName(Class.java:264)\n\tat org.gcube.dataanalysis.wps.mapper.ClassGenerator.generateEcologicalEngineClasses(ClassGenerator.java:67)\n\tat org.gcube.dataanalysis.wps.mapper.ClassGenerator.<init>(ClassGenerator.java:29)\n\tat org.gcube.dataanalysis.wps.mapper.DataMinerUpdater.Update(DataMinerUpdater.java:283)\n\tat org.gcube.dataanalysis.wps.mapper.DataMinerUpdater.main(DataMinerUpdater.java:130)", "stderr_lines": ["SLF4J: Class path contains multiple SLF4J bindings.", "SLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]", "SLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/slf4j-nop-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]", "SLF4J: Found binding in [jar:file:/home/gcube/tomcat/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]", "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.", "SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]", "Exception in thread \"main\" java.lang.ClassNotFoundException: org.gcube.dataanalysis.executor.rscripts.DMPoolManagerCheck", ...}
    to retry, use: --limit @/home/gcube/.ansible_retry/playbook.retry

PLAY RECAP *****************************************************************************************************
dataminer-ghost-t.pre.d4science.org : ok=1    changed=0    unreachable=0    failed=1   

Playbook run took 0 days, 0 hours, 0 minutes, 3 seconds

@roberto.cirillo@isti.cnr.it and I have seen that the .jar file is downloaded with an underscore added at the beginning and at the end of the file name:

https://data1-d.d4science.net/shub/bc867d01-ba3c-427a-9abf-3a542a7916c6

_DMPoolManagerCheck.jar_

@lucio.lelii@isti.cnr.it, please can you check what happens to the .jar files?
Thanks
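
For what it's worth, a quick way to see which file name the resolver link advertises is to look at the Content-Disposition header of the HTTP response. Below is a minimal diagnostic sketch in plain Java (not part of the installer; it assumes the endpoint answers HEAD requests, otherwise a plain GET works the same way), using the URL reported above:

import java.net.HttpURLConnection;
import java.net.URL;

// Diagnostic sketch: print the file name suggested by the resolver link
// (Content-Disposition header), to compare it with the name written on disk.
public class CheckResolverFileName {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://data1-d.d4science.net/shub/bc867d01-ba3c-427a-9abf-3a542a7916c6");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD"); // assumption: the resolver accepts HEAD requests
        System.out.println("HTTP " + conn.getResponseCode());
        System.out.println("Content-Disposition: " + conn.getHeaderField("Content-Disposition"));
        conn.disconnect();
    }
}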

Actions #2

Updated by Ciro Formisano over 6 years ago

I am not an expert on Ansible. However, I don't remember whether re-publication is supposed to be supported. Are we sure that Ansible does not prevent the re-installation of the same algorithm?

Actions #3

Updated by Giancarlo Panichi over 6 years ago

Hi @ciro.formisano@eng.it, yes, republishing and updating algorithms is a requirement, and it has always worked. We have now realized that the problem is probably due to the file that is downloaded through the UriResolver and StorageHub, so @lucio.lelii@isti.cnr.it will investigate the matter.

Actions #4

Updated by Roberto Cirillo over 6 years ago

  • Priority changed from Normal to High
Actions #5

Updated by Roberto Cirillo over 6 years ago

  • Related to Upgrade #12739: /gcube/preprod upgrade to gCube 4.13 added
Actions #6

Updated by Giancarlo Panichi over 6 years ago

  • Related to Task #12794: Workspace - Download problem with some file extension added
Actions #7

Updated by Giancarlo Panichi over 6 years ago

After the last corrections, I ran another test.
Now, when the algorithm is updated, the file is written with the right name, but its content seems wrong.

Here are the logs:

gcube@dm-pool-manager-pre:~/dataminer-pool-manager/work/db4e55da-1519-4b26-a5ec-ec92ca0809b5$ ansible-playbook -v -i inventory.yaml playbook.yaml
Using /etc/ansible/ansible.cfg as config file
/home/gcube/dataminer-pool-manager/work/db4e55da-1519-4b26-a5ec-ec92ca0809b5/inventory.yaml did not meet host_list requirements, check plugin documentation if this is unexpected
/home/gcube/dataminer-pool-manager/work/db4e55da-1519-4b26-a5ec-ec92ca0809b5/inventory.yaml did not meet script requirements, check plugin documentation if this is unexpected

PLAY [universe] **************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************
ok: [dataminer-ghost-t.pre.d4science.org]

TASK [gcube-algorithm-DMPM_UPDATE_CHECKER : Install algorithm DMPM_UPDATE_CHECKER] *******************************************
fatal: [dataminer-ghost-t.pre.d4science.org]: FAILED! => {"changed": true, "cmd": "/home/gcube/algorithmInstaller/addAlgorithm.sh DMPM_UPDATE_CHECKER BLACK_BOX org.gcube.dataanalysis.executor.rscripts.DMPMUpdateChecker /gcube/preprod/preVRE transducerers N https://data-d.d4science.org/shub/da558346-7d1a-4159-8780-a22e63f3c7dc \"DMPM Update Checker {Published by Giancarlo Panichi (giancarlo.panichi) on 2018/10/31 15:04 GMT}\"", "delta": "0:00:01.208403", "end": "2018-10-31 16:15:44.739404", "msg": "non-zero return code", "rc": 1, "start": "2018-10-31 16:15:43.531001", "stderr": "SLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/slf4j-nop-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/webapps/wps/WEB-INF/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/home/gcube/tomcat/lib/logback-classic-1.1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]\nException in thread \"main\" java.lang.ClassNotFoundException: org.gcube.dataanalysis.executor.rscripts.DMPMUpdateChecker\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:381)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:424)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:357)\n\tat java.lang.Class.forName0(Native Method)\n\tat java.lang.Class.forName(Class.java:264)

....

    to retry, use: --limit @/home/gcube/.ansible_retry/playbook.retry

PLAY RECAP *******************************************************************************************************************
dataminer-ghost-t.pre.d4science.org : ok=1    changed=0    unreachable=0    failed=1   

Playbook run took 0 days, 0 hours, 0 minutes, 3 seconds

Here is the wrong file:

https://data-d.d4science.org/shub/da558346-7d1a-4159-8780-a22e63f3c7dc 

This file corresponds to the following StorageHub item:

"@class": "org.gcube.common.storagehub.model.items.GenericFileItem",
    "id": "da558346-7d1a-4159-8780-a22e63f3c7dc",
"name": "DMPMUpdateChecker.jar",
    "path": "/Home/giancarlo.panichi/Workspace/TestSAI/BlackBox/DMPMUpdateChecker/Target/Deploy/DMPMUpdateChecker.jar",
    "parentId": "af27bbba-49d5-4192-abd1-7571fc4902c2",
    "parentPath": "/Home/giancarlo.panichi/Workspace/TestSAI/BlackBox/DMPMUpdateChecker/Target/Deploy",

Downloading this file from the Workspace gives the same error.
Therefore, when the SAI updates the algorithm (DMPMUpdateChecker.jar), some error occurs while communicating with StorageHub.
So the operations during this phase must be monitored to catch the error:

FileContainer.copy(folderContainer, "DMPMUpdateChecker.jar");
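
In the meantime, a quick way to verify the content of the published file is to download the jar locally and check whether it actually contains the class that the installer fails to load (the ClassNotFoundException above). A minimal sketch; the local file name is an assumption, the class name is taken from the log:

import java.util.jar.JarFile;

// Minimal check: does the downloaded jar contain the class the installer tries to load?
// The local file name "DMPMUpdateChecker.jar" is an assumption for illustration.
public class CheckJarEntry {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile("DMPMUpdateChecker.jar")) {
            String entry = "org/gcube/dataanalysis/executor/rscripts/DMPMUpdateChecker.class";
            System.out.println(jar.getJarEntry(entry) != null
                    ? "class found in jar"
                    : "class NOT found in jar (wrong or corrupted content)");
        }
    }
}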
Actions #9

Updated by Lucio Lelii about 6 years ago

Can this ticket be closed?

Actions #10

Updated by Giancarlo Panichi almost 6 years ago

  • Due date set to May 02, 2019
  • Status changed from New to Closed
  • % Done changed from 0 to 100

Yes, the ticket is 6 months old, and there is now a new pre-production infrastructure.
