Incident #13029
Private algorithm disappeared in SoBigDataLab VRE
Status: Closed
% Done: 100%
Description
On Monday 10 Dec I could see my method named SortWords (Python Examples category), and I took the attached screenshot. Today the algorithm is no longer present. I haven't touched anything.
This is not the first time this has happened.
Updated by Massimiliano Assante over 6 years ago
- Priority changed from Normal to High
Updated by Giancarlo Panichi over 6 years ago
Looking at svn, it seems that SAI and DMPoolManager have correctly published the algorithm there, and from what I can see the algorithm is also present on the DataMiner server.
Looking at the IS, it seems that the algorithm has been published as private in RPrototypingLab; in fact, this resource is there:
<Resource version="0.4.x"> <ID>51cda6dd-1a4c-43f2-abbd-ba4675539446</ID> <Type>GenericResource</Type> <Scopes> <Scope>/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab</Scope> </Scopes> <Profile> <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType> <Name>SORTWORDS</Name> <Description>Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/04 17:05 GMT}
This resource seems to carry the latest publication date, and it is not present on the IS in the SoBigDataLab VRE.
If you go to RPrototypingLab you will see the SortWords algorithm.
Is it possible that you forgot to have it republished?
Updated by Massimiliano Assante over 6 years ago
Look, I definitely did not republish in RPrototypingLab this morning (and it says .... Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/10 10:44 GMT}).
Also, even if I had, why would it disappear from the SoBigDataLab VRE?
Updated by Giancarlo Panichi over 6 years ago
The problem is that there is no resource on the IS and it is also missing the SecondaryType:
<SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
Is it possible that cleaning was done in anticipation of the SoBigData meeting?
Updated by Massimiliano Assante over 6 years ago
Giancarlo Panichi wrote:
The problem is that there is no resource on the IS and it is also missing the SecondaryType:
<SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
Is it possible that cleaning was done in anticipation of the SoBigData meeting?
No cleaning was performed. Cleaning by whom?
Could you check that the deployment of the GATE methods didn't create this issue? I mean, I did not publish anything this morning, neither here nor in RPrototypingLab.
Updated by Massimiliano Assante over 6 years ago
Just to add that last Thursday (13/12) I republished the algorithm sortWords again in SoBigDataLab (see the email I received posted below) and today I realised it's not present anymore...
On 13 Dec 2018, at 12:13, DataMiner on D4Science.org Gateway <"services+f95b7f39-4f56-450d-b17e-e0b63cc84d19_MSG@d4science.org"@socialnetworking1.d4science.org> wrote:
- Write ABOVE THIS LINE to reply via email
DataMiner sent you a message:
The installation of the algorithm is completed successfully.
You can retrieve experiment results under the '/DataMiner' e-Infrastructure Workspace folder or from the DataMiner interface.
Algorithm details:
User: Massimiliano Assante
Algorithm name: SORTWORDS
Staging DataMiner Host: dataminer-proto-ghost.d4science.org
Caller VRE: /d4science.research-infrastructures.eu/SoBigData/SoBigDataLab
Target VRE: /d4science.research-infrastructures.eu/SoBigData/SoBigDataLab
- This message was also sent to:
• Gianpaolo Coro
• Giancarlo Panichi
• Statistical Manager
• Roberto Cirillo
• Lucio Lelii
If you reply, your message will be also delivered to them.
Go to Message
Please note that email replies do not support attachments.
Updated by Pasquale Pagano over 6 years ago
- Priority changed from High to Urgent
Updated by Massimiliano Assante over 6 years ago
So I was told to republish the SortWords algorithm in the SoBigDataLab VRE once more and to paste the resource UID in this ticket.
I am also attaching a screenshot of the DataMiner portlet showing the successful algorithm deployment and execution.
Here is the Generic Resource correctly published in the scope, which may disappear in the next few days:
<Resource version="0.4.x">
  <ID>d5406f87-fd13-4780-901a-71d541a34c48</ID>
  <Type>GenericResource</Type>
  <Scopes>
    <Scope>/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab</Scope>
  </Scopes>
  <Profile>
    <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
    <Name>SORTWORDS</Name>
    <Description>Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/18 15:00 GMT}</Description>
    <Body>
      <category>TRANSDUCERS</category>
      <inputs>
        <input>
          <name>Empty</name>
          <description>Empty</description>
          <defaultValue />
          <type>string</type>
        </input>
      </inputs>
      <privateusers>massimiliano.assante</privateusers>
    </Body>
  </Profile>
</Resource>
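For anyone who wants to check a resource like the one above without reading the XML by eye, here is a minimal sketch that pulls out the fields discussed in this ticket (ID, scope, SecondaryType, private users). It only uses the Python standard library; the function names and the trimmed sample are illustrative assumptions, not part of any gCube client.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Generic Resource pasted in this ticket (inputs omitted).
RESOURCE_XML = """<Resource version="0.4.x">
  <ID>d5406f87-fd13-4780-901a-71d541a34c48</ID>
  <Type>GenericResource</Type>
  <Scopes>
    <Scope>/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab</Scope>
  </Scopes>
  <Profile>
    <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
    <Name>SORTWORDS</Name>
    <Body>
      <privateusers>massimiliano.assante</privateusers>
    </Body>
  </Profile>
</Resource>"""


def summarize_resource(xml_text):
    """Return the fields of a Generic Resource that matter for this incident."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("ID"),
        "scopes": [scope.text for scope in root.findall("Scopes/Scope")],
        "secondary_type": root.findtext("Profile/SecondaryType"),
        "name": root.findtext("Profile/Name"),
        "private_users": root.findtext("Profile/Body/privateusers"),
    }


def is_private_algorithm_in(summary, scope):
    """True if the resource is a private algorithm published in the given scope."""
    return (
        summary["secondary_type"] == "StatisticalManagerAlgorithmPrivate"
        and scope in summary["scopes"]
    )


summary = summarize_resource(RESOURCE_XML)
```

Running the same check against the resource found in RPrototypingLab would show a different scope, which is the mismatch discussed earlier in the thread.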
Updated by Massimiliano Assante over 6 years ago
- Status changed from New to In Progress
Updated by Massimiliano Assante over 6 years ago
So, here's the thing: it disappeared again 5 minutes ago, when I triggered an algorithm deployment (Universal Dependencies Pos Tagger For Slovenian) via the DM Deployer portlet.
Updated by Giancarlo Panichi over 6 years ago
After investigating in depth, I think there are two problems:
1. The addAlgorithm code that runs on the DataMiner, that is, the Java code executed by the Ansible scripts. Looking at the source code, I noticed that the publication on the IS is done regardless of the parameters passed to it.
2. There also seems to be a problem with the IS in the SoBigData VO, because even the resources of algorithms that were not published today have been duplicated, and they were not duplicated this morning.
For the first problem, I will try to fix the issue with Lucio.
For the second problem, @roberto.cirillo@isti.cnr.it or @andrea.dellamico@isti.cnr.it, could you please check the registry and possibly restart it? I do not have access rights.
Updated by Roberto Cirillo over 6 years ago
Giancarlo Panichi wrote:
After investigating in depth, I think there are two problems:
1. The addAlgorithm code that runs on the DataMiner, that is, the Java code executed by the Ansible scripts. Looking at the source code, I noticed that the publication on the IS is done regardless of the parameters passed to it.
2. There also seems to be a problem with the IS in the SoBigData VO, because even the resources of algorithms that were not published today have been duplicated, and they were not duplicated this morning.
For the first problem, I will try to fix the issue with Lucio.
For the second problem, @roberto.cirillo@isti.cnr.it or @andrea.dellamico@isti.cnr.it, could you please check the registry and possibly restart it? I do not have access rights.
I don't understand your second point. Which resources are you referring to? FYI, the registry service is restarted every night.
I think we should first understand who deletes the resource, because I believe the resource does not simply disappear: someone deletes it. I'm going to check the registry logs.
Updated by Roberto Cirillo over 6 years ago
As shown in the registry logs, the resource has been deleted several times by someone:
2018-12-18 18:28:42,186 INFO porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,186 INFO publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:42,388 INFO porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,389 INFO publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:42,795 INFO porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,795 INFO publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,195 INFO resources.ISRegistryServiceUpdaterHandler [GHNConsumer$<anon>,info:78] ISRegistryServiceUpdaterHandler: using cached instance of ISRegistryServiceHandler@ http://node2.p.d4science.research-infrastructures.eu:8080/wsrf/services/gcube/informationsystem/registry/ResourceRegistration
2018-12-18 18:28:43,219 INFO porttypes.ResourceRegistration [Thread-3585,info:78] ResourceRegistration: CreateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:43,221 INFO node.KGCUBEHostingNode [Thread-3585,info:78] GHN: Added scope(s) [/d4science.research-infrastructures.eu, /d4science.research-infrastructures.eu/SoBigData]
2018-12-18 18:28:43,222 INFO publisher.ISResourcePublisher [Thread-3585,info:78] ISResourcePublisher: ISPublisher is going to publish the GCUBEResource bb9ab1c0-1206-11e6-b6b2-c2bb453e9301
2018-12-18 18:28:43,421 INFO porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,421 INFO publisher.ISResourcePublisher [ServiceThread-3569,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,926 INFO porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,926 INFO publisher.ISResourcePublisher [ServiceThread-3569,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,982 ERROR generic.GCUBEGenericBulkPublisher [BulkPublisher,error:72] GCUBEGenericBulkPublisher: Unable to publish resources for Profiles/RunningInstance in scope /d4science.research-infrastructures.eu/SoBigData java.lang.ArrayIndexOutOfBoundsException
2018-12-18 18:28:44,279 INFO porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: UpdateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:44,281 INFO runninginstance.KGCUBERunningInstance [ServiceThread-3562,info:78] RunningInstance: Added scope(s) [/d4science.research-infrastructures.eu, /d4science.research-infrastructures.eu/ParthenosVO, /d4science.research-infrastructures.eu/D4OS, /d4science.research-infrastructures.eu/SoBigData, /d4science.research-infrastructures.eu/OpenAIRE, /d4science.research-infrastructures.eu/gCubeApps, /d4science.research-infrastructures.eu/FARM, /d4science.research-infrastructures.eu/D4Research, /d4science.research-infrastructures.eu/SmartArea]
2018-12-18 18:28:44,282 INFO publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to publish the GCUBEResource 1d8760f5-201f-4696-8c36-855c7f558ba2
2018-12-18 18:28:44,482 ERROR generic.GCUBEGenericBulkPublisher [BulkPublisher,error:72] GCUBEGenericBulkPublisher: Unable to publish resources for Profiles/GHN in scope /d4science.research-infrastructures.eu/SoBigData java.lang.ArrayIndexOutOfBoundsException
2018-12-18 18:28:45,005 INFO porttypes.ResourceRegistration [ServiceThread-3570,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:45,005 INFO publisher.ISResourcePublisher [ServiceThread-3570,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
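To make the repeated removals in the registry log easier to quantify, the following sketch counts RemoveResource operations per resource ID and service thread. The regular expression is an assumption derived only from the log lines pasted in this ticket, and the sample is a three-entry excerpt of them; it is not an official log format definition.

```python
import re
from collections import Counter

# Pattern inferred from the ResourceRegistration lines in this ticket.
REMOVE_OP = re.compile(
    r"\[(?P<thread>[^,\]]+),info:\d+\] ResourceRegistration: RemoveResource operation "
    r"invoked on resource ID=(?P<resource_id>[0-9a-f-]+), type=(?P<type>\w+)"
)

# Short excerpt of the registry log above.
SAMPLE_LOG = (
    "2018-12-18 18:28:42,186 INFO porttypes.ResourceRegistration [ServiceThread-3562,info:78] "
    "ResourceRegistration: RemoveResource operation invoked on resource "
    "ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource\n"
    "2018-12-18 18:28:42,186 INFO publisher.ISResourcePublisher [ServiceThread-3562,info:78] "
    "ISResourcePublisher: ISPublisher is going to remove the GCUBEResource "
    "d5406f87-fd13-4780-901a-71d541a34c48\n"
    "2018-12-18 18:28:43,421 INFO porttypes.ResourceRegistration [ServiceThread-3569,info:78] "
    "ResourceRegistration: RemoveResource operation invoked on resource "
    "ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource\n"
)


def count_removals(log_text):
    """Count RemoveResource invocations, keyed by (resource ID, thread name)."""
    counts = Counter()
    # finditer scans the whole text, so it works whether or not the
    # entries are separated by line breaks.
    for match in REMOVE_OP.finditer(log_text):
        counts[(match.group("resource_id"), match.group("thread"))] += 1
    return counts


removals = count_removals(SAMPLE_LOG)
total_removals = sum(removals.values())
```

On the full log this gives one count per thread for the same resource ID, which is what suggests several callers issuing the same removal.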
Double-checking the access log clearly shows that the resource has been removed by more than one service:
2018-12-18 18:28:42,186 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,224 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:42,388 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,425 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:42,795 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.172) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,833 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.172) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:43,420 INFO handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.167) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:43,459 INFO handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: END CALL FROM (146.48.123.167) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main],[0.0]
2018-12-18 18:28:43,925 INFO handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.171) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:43,987 INFO handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: END CALL FROM (146.48.123.171) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main],[0.0]
2018-12-18 18:28:45,004 INFO handlers.GCUBEHandler [ServiceThread-3570,info:78] GCUBEHandler: START CALL FROM (146.48.123.170) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3570,5,main]
2018-12-18 18:28:45,566 INFO handlers.GCUBEHandler [ServiceThread-3570,info:78] GCUBEHandler: END CALL FROM (146.48.123.170) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3570,5,main],[0.0]
The hosts above are respectively:
- dataminer2-gw-proto.d4science.org
- dataminer1-gw-proto.d4science.org
- dataminer5-gw-proto.d4science.org
- dataminer0-gw-proto.d4science.org
- dataminer4-gw-proto.d4science.org
- dataminer3-gw-proto.d4science.org
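The mapping from caller IP to host can be applied to the access log mechanically. The sketch below is only an illustration: the IP-to-host table pairs the addresses and host names in the order given in this ticket, the regular expression is inferred from the pasted log lines, and the function name is hypothetical.

```python
import re
from collections import Counter

# IP-to-host pairing, following the "respectively" ordering in this ticket.
HOSTS = {
    "146.48.123.169": "dataminer2-gw-proto.d4science.org",
    "146.48.123.168": "dataminer1-gw-proto.d4science.org",
    "146.48.123.172": "dataminer5-gw-proto.d4science.org",
    "146.48.123.167": "dataminer0-gw-proto.d4science.org",
    "146.48.123.171": "dataminer4-gw-proto.d4science.org",
    "146.48.123.170": "dataminer3-gw-proto.d4science.org",
}

# Only START lines are matched, so each call is counted once.
START_CALL = re.compile(
    r"START CALL FROM \((?P<ip>[\d.]+)\) TO \((?P<operation>[^)]+)\),(?P<scope>[^,\s]+)"
)

# Short excerpt of the access log above.
SAMPLE_LOG = (
    "2018-12-18 18:28:42,186 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] "
    "GCUBEHandler: START CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),"
    "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]\n"
    "2018-12-18 18:28:42,224 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] "
    "GCUBEHandler: END CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),"
    "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]\n"
    "2018-12-18 18:28:42,388 INFO handlers.GCUBEHandler [ServiceThread-3562,info:78] "
    "GCUBEHandler: START CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),"
    "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]\n"
)


def remove_calls_by_host(log_text, hosts=HOSTS):
    """Count IS-Registry remove calls per calling host (falls back to the raw IP)."""
    counts = Counter()
    for match in START_CALL.finditer(log_text):
        if match.group("operation").endswith(":remove"):
            ip = match.group("ip")
            counts[hosts.get(ip, ip)] += 1
    return counts


calls = remove_calls_by_host(SAMPLE_LOG)
```

Applied to the full excerpt, each of the six gw hosts would show up once, all within about three seconds of each other.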
Now the question is: why have all the gw instances removed the same GenericResource from the IS several times?
Updated by Giancarlo Panichi over 6 years ago
- % Done changed from 0 to 100
The answer is that the dataminer*-gw-proto hosts are also DataMiners in the proto cluster, so they also execute the Java code that installs the algorithms, which is what I indicated in point 1 above, namely:
dataminer-algorithms-importer-1.2.1-SNAPSHOT.jar
The problem is that this component is faulty: regardless of the parameters it receives, it publishes on the IS, which it should not do.
So I am opening a task to fix this component.
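As a rough illustration of the behaviour described here (publication should depend on the parameters instead of happening unconditionally), consider the sketch below. It is not the dataminer-algorithms-importer code: `add_algorithm`, `InMemoryRegistry` and the `publish_to_is` flag are hypothetical names used only to show the guard.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for the IS, recording what gets published."""

    published: list = field(default_factory=list)

    def publish(self, name, scope):
        self.published.append((name, scope))


def add_algorithm(name, scope, registry, publish_to_is=False):
    """Install an algorithm locally and publish it on the IS only when asked.

    The faulty behaviour described in this ticket corresponds to calling
    registry.publish() unconditionally, on every node that runs the installer.
    """
    installed = f"{name}@{scope}"  # placeholder for the local installation step
    if publish_to_is:
        registry.publish(name, scope)
    return installed


registry = InMemoryRegistry()
scope = "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab"

# Worker nodes install without touching the IS ...
add_algorithm("SORTWORDS", scope, registry)
# ... and only one designated caller publishes the resource.
add_algorithm("SORTWORDS", scope, registry, publish_to_is=True)
```

With a guard of this kind, running the installer on every node of the cluster would leave a single publication on the IS instead of one attempt per node.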
Updated by Massimiliano Assante over 6 years ago
Giancarlo Panichi wrote:
The answer is that the dataminer*-gw-proto hosts are also DataMiners in the proto cluster, so they also execute the Java code that installs the algorithms, which is what I indicated in point 1 above, namely:
dataminer-algorithms-importer-1.2.1-SNAPSHOT.jar
The problem is that this component is faulty: regardless of the parameters it receives, it publishes on the IS, which it should not do.
So I am opening a task to fix this component.
Fine, dataminer-algorithms-importer-1.2.1 has a bug, it must have. The question is: why does this component at some point decide to delete the generic resources of "private" algorithms?
Updated by Giancarlo Panichi over 6 years ago
- Due date set to Dec 19, 2018
- Status changed from In Progress to Closed
I am closing this ticket.
Updated by Giancarlo Panichi over 6 years ago
Hi @massimiliano.assante@isti.cnr.it , it is part of the code that was written by Gianpaolo and Lucio to install the algorithms on the servers.
Updated by Roberto Cirillo over 6 years ago
Just two comments:
If the problem is in the dataminer-algorithms-importer, why have only the gw proto instances removed the GenericResource from the IS?
Is the same component also present in all the dataminer*-proto instances?
In addition, is it normal that each instance of the gw cluster tries to delete or update the same resource at the same time on the same service?
Updated by Giancarlo Panichi over 6 years ago
@roberto.cirillo@isti.cnr.it I have already answered above: this depends on the fact that the code does not behave correctly with respect to the parameters passed to it.
Updated by Giancarlo Panichi over 6 years ago
@roberto.cirillo@isti.cnr.it, yes, the same component is also present in all the dataminer*-proto instances.
Updated by Massimiliano Assante over 6 years ago
- Related to Incident #13066: DataMiner algorithm on FARM/PerformFISH-KPIs not working added
Updated by Lucio Lelii over 6 years ago
My fear is that what Giancarlo is now modifying is not a bug but a choice, because it is impossible that we did not notice it before: this change was made in March 2017.
Updated by Massimiliano Assante over 6 years ago
Lucio Lelii wrote:
My fear is that what Giancarlo is now modifying is not a bug but a choice, because it is impossible that we did not notice it before: this change was made in March 2017.
It cannot be a choice. You couldn't have chosen that new algorithm deployments would randomly remove the others. What you're saying (no changes for 1.5 years) cannot be true, because private algorithms are a new feature, introduced 6 months or so ago, not 1.5 years ago.
Updated by Lucio Lelii over 6 years ago
I was talking about the publication on the IS done at every execution of addAlgorithm; this is what Giancarlo is going to change.
Updated by Gianpaolo Coro over 6 years ago
I think the issue is that the Generic Workers do not find the algorithm in the private category, so they try to republish it.
Please consider that the add algorithm operation is NOT used only by the DMPool Manager. The only way I have to publish an algorithm from a prototype VRE to a public VRE is to use a process on the DataMiner that performs this move. This process must be able to invoke the algorithm installer and publish the resource on the IS. So please allow it to keep using the algorithm installer.
Otherwise, I would have to open a ticket for every algorithm to be moved, which goes against innovation.
Updated by Roberto Cirillo over 6 years ago
The problem here was due to an incompatibility between jdk 8 build 161 and the registry service. To fix this incident we need to downgrade the jdk to version 8_151 on every generic-worker. Currently the genericworker instances have jdk8_171, while the dm-proto instances have jdk8_151. For this reason only the gw instances suffer from this problem. I'm going to open a dedicated ticket for downgrading the jdk on the gw instances.
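A quick way to list which instance groups still need the downgrade is to compare their JDK update number with the target one. The sketch below assumes version strings in the `jdk8_NNN` form used in this comment; the inventory dictionary and function names are hypothetical, and the values are the two reported above.

```python
import re

# Matches strings such as "jdk8_171" or "jdk 8_151" (format assumed from this ticket).
JDK_UPDATE = re.compile(r"jdk\s*8_(\d+)", re.IGNORECASE)


def jdk_update(version):
    """Extract the update number from a JDK 8 version string."""
    match = JDK_UPDATE.search(version)
    if match is None:
        raise ValueError(f"unrecognized JDK version string: {version!r}")
    return int(match.group(1))


def needs_downgrade(inventory, target_update=151):
    """Return the instance groups whose JDK update differs from the target."""
    return sorted(
        group
        for group, version in inventory.items()
        if jdk_update(version) != target_update
    )


# Versions as reported in this comment.
inventory = {"genericworker": "jdk8_171", "dm-proto": "jdk8_151"}
to_fix = needs_downgrade(inventory)
```

Here `to_fix` contains only the genericworker group, matching the statement that only the gw instances are affected.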
Updated by Giancarlo Panichi over 6 years ago
@roberto.cirillo@isti.cnr.it this ticket has been closed; please follow the discussion in the associated Task. The problem is that DataMiner should not publish on the IS; only DMGhost should do it. Now we have to support this, taking the GP algorithm into account, to resolve the problem in the short term.