Incident #12214

closed

DataMiners cannot interact with the Storage System

Added by Gianpaolo Coro almost 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Immediate
Assignee:
_InfraScience Systems Engineer
Category:
High-Throughput-Computing
Target version:
Start date:
Jul 20, 2018
Due date:
Jul 20, 2018
% Done:

100%

Estimated time:
Infrastructure:
Production

Description

No DataMiner (DM) is currently working in any VRE: every DM gets an error when it interacts with the Storage system. The failure occurs when a DM tries to write the output of a computation directly to the (volatile) storage system. Here is the stack trace:

com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=mongo-p-vol.d4science.org:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
        at com.mongodb.connection.BaseCluster.getDescription(BaseCluster.java:167)
        at com.mongodb.Mongo.getConnectedClusterDescription(Mongo.java:881)
        at com.mongodb.Mongo.createClientSession(Mongo.java:873)
        at com.mongodb.Mongo$3.getClientSession(Mongo.java:862)
        at com.mongodb.Mongo$3.execute(Mongo.java:830)
        at com.mongodb.Mongo$3.execute(Mongo.java:814)
        at com.mongodb.DBCollection.createIndex(DBCollection.java:1623)
        at com.mongodb.DBCollection.createIndex(DBCollection.java:1608)
        at org.gcube.contentmanagement.blobstorage.transport.backend.MongoOperationManager.getMongoInstance(MongoOperationManager.java:93)
        at org.gcube.contentmanagement.blobstorage.transport.backend.MongoOperationManager.initBackend(MongoOperationManager.java:78)
        at org.gcube.contentmanagement.blobstorage.transport.backend.MongoOperationManager.<init>(MongoOperationManager.java:51)
        at org.gcube.contentmanagement.blobstorage.transport.TransportManagerFactory.load(TransportManagerFactory.java:55)
        at org.gcube.contentmanagement.blobstorage.transport.TransportManagerFactory.getTransport(TransportManagerFactory.java:42)
        at org.gcube.contentmanagement.blobstorage.service.operation.Operation.put(Operation.java:164)
        at org.gcube.contentmanagement.blobstorage.service.operation.Upload.doIt(Upload.java:52)
        at org.gcube.contentmanagement.blobstorage.service.operation.Upload.doIt(Upload.java:26)
        at org.gcube.contentmanagement.blobstorage.service.operation.OperationManager.startOperation(OperationManager.java:71)
        at org.gcube.contentmanagement.blobstorage.service.impl.Resource.retrieveRemoteObject(Resource.java:120)
        at org.gcube.contentmanagement.blobstorage.service.impl.Resource.getRemoteObject(Resource.java:110)
        at org.gcube.contentmanagement.blobstorage.service.impl.RemoteResource.RFile(RemoteResource.java:61)
        at org.gcube.contentmanagement.blobstorage.service.impl.RemoteResource.RFile(RemoteResource.java:43)
        at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mapping.OutputsManager.uploadFileOnStorage(OutputsManager.java:183)
        at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mapping.OutputsManager.createOutput(OutputsManager.java:105)
        at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mapping.AbstractEcologicalEngineMapper.run(AbstractEcologicalEngineMapper.java:432)
        at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.ITALIANLP_NER.run(ITALIANLP_NER.java:24)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.n52.wps.algorithm.annotation.AnnotationBinding$ExecuteMethodBinding.execute(AnnotationBinding.java:89)
        at org.n52.wps.server.AbstractAnnotatedAlgorithm.run(AbstractAnnotatedAlgorithm.java:54)
        at org.gcube.data.analysis.wps.ExecuteRequest.call(ExecuteRequest.java:608)
        at org.gcube.data.analysis.wps.ExecuteRequest.call(ExecuteRequest.java:67)
        at org.gcube.common.authorization.library.AuthorizedTasks$1.call(AuthorizedTasks.java:41)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
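
The trace above bottoms out in `java.net.ConnectException: Connection refused` for `mongo-p-vol.d4science.org:27017`, which suggests nothing was listening on that port rather than a problem in the DataMiner code. A minimal sketch of a plain TCP probe that can confirm this independently of the MongoDB driver is shown here. The class name `PortProbe`, the method `isReachable`, and the 3000 ms timeout are illustrative assumptions, not part of the gCube code base.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMs milliseconds, false on refusal, timeout, or unknown host.
     */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers ConnectException (refused), SocketTimeoutException,
            // and UnknownHostException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the stack trace; override via arguments.
        String host = args.length > 0 ? args[0] : "mongo-p-vol.d4science.org";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 27017;
        boolean ok = isReachable(host, port, 3000); // timeout is an assumed value
        System.out.println(host + ":" + port + " reachable: " + ok);
    }
}
```

If the probe reports the port as unreachable from a DataMiner host, the problem is most likely on the server or network side, not in the storage client library.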

Here is a test URL:

http://dataminer0-proto.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=<RprotoToken>&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.LANGUAGE_RECOGNIZER&DataInputs=sentence=North+Korea+has+agreed+to+send+a+delegation+to+next+month+Winter+Olympics+in+South+Korea%2C+the+first+notable+breakthrough+to+come+out+of+a+face-to-face+meeting+Tuesday+between+the+neighboring+nations.;

Related issues

Blocked by D4Science Infrastructure - Task #12215: Increase disk space on mongo-p-vol.d4science.org (Closed, _InfraScience Systems Engineer, Jul 20, 2018)

Actions #2

Updated by Andrea Dell'Amico almost 7 years ago

I see that the mongo instance on that server is down.

So it seems we have no monitoring of those instances. @tommaso.piccioli@isti.cnr.it they were upgraded along with the other mongo instances. I see that mongo shut itself down on Jul 19th at 07:23 because of a full disk.
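
Since the root cause was a full disk that no monitoring caught, a check along the following lines could raise an alert before mongo stops. This is only a sketch under stated assumptions: the class `DiskSpaceCheck`, the data directory `/var/lib/mongodb`, and the 90% threshold are hypothetical and would have to be adapted to the actual server layout and monitoring system.

```java
import java.io.File;

public class DiskSpaceCheck {

    /** Fraction of the partition in use, between 0.0 and 1.0. */
    public static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 1.0 - (double) usableBytes / totalBytes;
    }

    /** True when the used fraction has reached or passed the threshold. */
    public static boolean isAboveThreshold(long totalBytes, long usableBytes, double threshold) {
        return usedFraction(totalBytes, usableBytes) >= threshold;
    }

    public static void main(String[] args) {
        // Assumed data directory; pass the real one as the first argument.
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/mongodb");
        double threshold = 0.90; // assumed alert level

        long total = dataDir.getTotalSpace();   // returns 0 if the path does not exist
        long usable = dataDir.getUsableSpace();
        if (total == 0) {
            System.out.println("UNKNOWN: cannot read partition for " + dataDir);
            return;
        }
        String state = isAboveThreshold(total, usable, threshold) ? "CRITICAL" : "OK";
        System.out.printf("%s: %.1f%% of disk used on %s%n",
                state, usedFraction(total, usable) * 100, dataDir);
    }
}
```

Run periodically, for example from cron or as a plugin of whatever monitoring is in place, this would flag the partition before it fills up.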

Actions #3

Updated by Andrea Dell'Amico almost 7 years ago

  • Blocked by Task #12215: Increase disk space on mongo-p-vol.d4science.org added
Actions #4

Updated by Andrea Dell'Amico almost 7 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Andrea Dell'Amico almost 7 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 100

The disk space on the volatile mongo instance has been increased and the server restarted. Does this solve the DataMiners' problem?

Actions #6

Updated by Giancarlo Panichi almost 7 years ago

  • Due date set to Jul 20, 2018
  • Status changed from Feedback to Closed

Yes, everything seems to work correctly now. I am closing this ticket.
