Incident #4805

closed

SmartGears on dataminer4 fails to save status

Added by Gianpaolo Coro almost 9 years ago. Updated almost 9 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: High-Throughput-Computing
Start date: Jul 26, 2016
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

Each time the SmartGears service on dataminer4-p-d4s.d4science.org is shut down, its saved status gets corrupted. This machine is hosted on hardware that is a bit slower than that of the other dataminer machines.

I suspect this is a general SmartGears (SG) issue, and this machine can help investigate it.

Here is the Exception:

java.lang.RuntimeException: cannot load profile for DataMiner @ /home/gcube/SmartGears/state/DataMiner/endpoint.xml
        at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.load(ProfileManager.java:258) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.loadOrCreateProfile(ProfileManager.java:214) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.onStart(ProfileManager.java:66) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.handlers.application.ApplicationLifecycleHandler.onEvent(ApplicationLifecycleHandler.java:42) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.handlers.application.ApplicationLifecycleHandler.onEvent(ApplicationLifecycleHandler.java:18) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.handlers.Pipeline.forward(Pipeline.java:69) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.managers.ApplicationManager.start(ApplicationManager.java:205) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.managers.ApplicationManager.start(ApplicationManager.java:88) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.gcube.smartgears.Bootstrap.onStartup(Bootstrap.java:63) [common-smartgears-1.2.7-4.0.0-128702.jar:na]
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5493) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:632) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1229) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1875) [tomcat-catalina-7.0.52.jar:7.0.52]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.lang.RuntimeException: deserialisation error
        at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:212) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
        at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:196) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
        at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.load(ProfileManager.java:248) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
        ... 20 common frames omitted
Caused by: javax.xml.bind.UnmarshalException: null
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335) ~[na:1.7.0_80]
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:514) ~[na:na]
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:215) ~[na:na]
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:184) ~[na:na]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157) ~[na:1.7.0_80]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:125) ~[na:1.7.0_80]
        at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:209) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
        ... 22 common frames omitted
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:211) ~[na:na]
        ... 26 common frames omitted
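
The innermost SAXParseException above is the kind of error an XML parser reports when a document ends before its open elements are closed, which would be consistent with endpoint.xml being only partially written at shutdown. The following is a minimal sketch of that failure mode, not SmartGears code: the class name TruncatedXmlCheck, the method isWellFormed, and the element names Resource and Profile are hypothetical placeholders, and the actual layout of endpoint.xml is not shown in this ticket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether a string parses as well-formed XML.
public class TruncatedXmlCheck {

    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler rethrows fatal parse errors as SAXParseException.
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Raised, for example, when the input stops inside an open element.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder documents; the element names are assumptions.
        String complete = "<Resource><Profile>ok</Profile></Resource>";
        String truncated = "<Resource><Profile>ok";

        System.out.println("complete:  " + isWellFormed(complete));
        System.out.println("truncated: " + isWellFormed(truncated));
    }
}
```

A check of this kind could be run against the state file after a shutdown to confirm whether it was left incomplete.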
Actions #2

Updated by Tommaso Piccioli almost 9 years ago

The slowest machine is dataminer5-p-d4science, which is slower than dataminer4-p-d4s in both CPU and disk.
In any case, the speed at a given moment depends on the current load on the whole Xen server.

Actions #3

Updated by Andrea Dell'Amico almost 9 years ago

I don't think VM performance has a big impact on the corrupted-state problem. Yesterday a dozen smartexecutor instances corrupted their state after the upgrade, and they were running on different hypervisors.

Actions #4

Updated by Pasquale Pagano almost 9 years ago

Is this incident solved?

Actions #5

Updated by Gianpaolo Coro almost 9 years ago

Andrea and Tommaso have added useful information, but the solution should be provided by Lucio. This is a long-known issue indeed.

Actions #7

Updated by Andrea Dell'Amico almost 9 years ago

There's an open ticket for the bug: #2176.

Actions #8

Updated by Gianpaolo Coro almost 9 years ago

Sorry, let me clarify: I opened this ticket because the issue was frequent but random on the development machines, which made it difficult to catch. On dataminer4, instead, it seems to happen every time.

Actions #9

Updated by Pasquale Pagano almost 9 years ago

Any news?

Actions #10

Updated by Lucio Lelii almost 9 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

This issue will be solved with release 4.1.
