Incident #4805
Closed
SmartGears on dataminer4 fails to save status
100%
Description
Each time the SmartGears service on dataminer4-p-d4s.d4science.org is shut down, the status gets corrupted. This machine is hosted on hardware that is a bit slower than that of the other dataminer machines.
I guess this is a general SG issue that this machine can help to investigate.
Here is the Exception:
java.lang.RuntimeException: cannot load profile for DataMiner @ /home/gcube/SmartGears/state/DataMiner/endpoint.xml
    at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.load(ProfileManager.java:258) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.loadOrCreateProfile(ProfileManager.java:214) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.onStart(ProfileManager.java:66) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.handlers.application.ApplicationLifecycleHandler.onEvent(ApplicationLifecycleHandler.java:42) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.handlers.application.ApplicationLifecycleHandler.onEvent(ApplicationLifecycleHandler.java:18) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.handlers.Pipeline.forward(Pipeline.java:69) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.managers.ApplicationManager.start(ApplicationManager.java:205) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.managers.ApplicationManager.start(ApplicationManager.java:88) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.gcube.smartgears.Bootstrap.onStartup(Bootstrap.java:63) [common-smartgears-1.2.7-4.0.0-128702.jar:na]
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5493) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:632) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1229) [tomcat-catalina-7.0.52.jar:7.0.52]
    at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1875) [tomcat-catalina-7.0.52.jar:7.0.52]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.lang.RuntimeException: deserialisation error
    at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:212) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
    at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:196) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
    at org.gcube.smartgears.handlers.application.lifecycle.ProfileManager.load(ProfileManager.java:248) ~[common-smartgears-1.2.7-4.0.0-128702.jar:na]
    ... 20 common frames omitted
Caused by: javax.xml.bind.UnmarshalException: null
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335) ~[na:1.7.0_80]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:514) ~[na:na]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:215) ~[na:na]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:184) ~[na:na]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157) ~[na:1.7.0_80]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:125) ~[na:1.7.0_80]
    at org.gcube.common.resources.gcore.Resources.unmarshal(Resources.java:209) ~[common-gcore-resources-1.3.1-4.0.0-125225.jar:na]
    ... 22 common frames omitted
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.7.1.jar:na]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:211) ~[na:na]
    ... 26 common frames omitted
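The innermost cause in the trace is a parse failure at the end of the input, which suggests (but does not prove) that endpoint.xml was only partially written when the service stopped. A minimal sketch for checking whether a saved profile is still well-formed XML, using only the JDK parser, could look like the following. The class name ProfileCheck and the sample documents are hypothetical and are not part of SmartGears; the real endpoint.xml has a different structure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of SmartGears: reports whether an XML
// document is well-formed, i.e. whether a parser can read it to the end.
public class ProfileCheck {

    /** Returns true if the stream holds a well-formed XML document. */
    public static boolean isWellFormed(InputStream in)
            throws IOException, ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps the parser from
        // printing its own message to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            // A truncated file ends up here, like the SAXParseException above.
            return false;
        }
    }

    /** Convenience overload for in-memory documents. */
    public static boolean isWellFormed(String xml) {
        try {
            return isWellFormed(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample content, used only to show the two outcomes.
        String complete = "<profile><id>1</id></profile>";
        String truncated = "<profile><id>1</id>"; // root element never closed

        System.out.println(isWellFormed(complete));  // prints "true"
        System.out.println(isWellFormed(truncated)); // prints "false"
    }
}
```

To check a real state file, the same method can be called with java.nio.file.Files.newInputStream(...) on the path shown in the first line of the exception.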
Updated by Tommaso Piccioli almost 9 years ago
The slowest machine is dataminer5-p-d4science, which is slower than dataminer4-p-d4s in both CPU and disk.
Anyway, the speed at any given time depends on the momentary load on the whole Xen server.
Updated by Andrea Dell'Amico almost 9 years ago
I don't think that VM performance has a big impact on the corrupted state problem. Yesterday a dozen smartexecutor instances corrupted their state after the upgrade, and they were running on different hypervisors.
Updated by Gianpaolo Coro almost 9 years ago
Andrea and Tommaso have added useful information, but the solution should be provided by Lucio. This is a long-known issue indeed.
Updated by Andrea Dell'Amico almost 9 years ago
There's an open ticket for the bug: #2176.
Updated by Gianpaolo Coro almost 9 years ago
Sorry, let me clarify: I opened this ticket because the issue was frequent but random on the development machines, which made it difficult to catch. On dataminer4, instead, it seems to happen every time.
Updated by Lucio Lelii almost 9 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
This issue will be solved with release 4.1.