Incident #13029

closed

Private algorithm disappeared in SoBigDataLab VRE

Added by Massimiliano Assante over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Urgent
Category:
Application
Target version:
Start date:
Dec 12, 2018
Due date:
Dec 19, 2018
% Done:
100%

Estimated time:
Infrastructure:
Production

Description

On Monday 10 Dec I could see my method SortWords (Python Examples category), and I also took the attached screenshot. Today the algorithm is no longer present. I haven't touched anything.

This is not the first time it happens.


Files

Screenshot 2018-12-12 at 14.19.23.png (285 KB), Massimiliano Assante, Dec 12, 2018 03:24 PM
Screenshot 2018-12-18 at 16.08.28.png (322 KB), Massimiliano Assante, Dec 18, 2018 04:10 PM

Related issues

Related to D4Science Infrastructure - Incident #13066: DataMiner algorithm on FARM/PerformFISH-KPIs not working (Closed, Gianpaolo Coro, Dec 19, 2018)

Actions
Actions #1

Updated by Massimiliano Assante over 6 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Giancarlo Panichi over 6 years ago

Looking at SVN, it seems that SAI and DMPoolManager correctly published the algorithm there, and as far as I can see the algorithm is also present on the DataMiner server.

Looking at the IS, it seems that the algorithm was published as private in RPrototypingLab; in fact, the following resource exists:

<Resource version="0.4.x">
   <ID>51cda6dd-1a4c-43f2-abbd-ba4675539446</ID>
   <Type>GenericResource</Type>
   <Scopes>
      <Scope>/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab</Scope>
   </Scopes>
   <Profile>
      <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
      <Name>SORTWORDS</Name>
      <Description>Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/04 17:05 GMT}</Description>
      <!-- remainder of the profile not included in this comment -->
   </Profile>
</Resource>

This resource seems to carry the latest publication date, and it is not present on the IS in the SoBigDataLab VRE.
If you go to RPrototypingLab, you can see the SortWords algorithm.
Is it possible that you republished it there and forgot about it?
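A quick way to confirm which VRE a GenericResource is scoped to, and when it was last published, is to read its Scope element and the `{Published by ... on YYYY/MM/DD HH:MM GMT}` stamp in its Description. The Python sketch below does this with the standard library only. The trimmed profile string is copied from the resource above; the stamp format is an assumption inferred from the descriptions quoted in this ticket, not a documented gCube convention.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the RPrototypingLab resource quoted above (Body omitted).
PROFILE = """<Resource version="0.4.x">
  <ID>51cda6dd-1a4c-43f2-abbd-ba4675539446</ID>
  <Type>GenericResource</Type>
  <Scopes>
    <Scope>/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab</Scope>
  </Scopes>
  <Profile>
    <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
    <Name>SORTWORDS</Name>
    <Description>Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/04 17:05 GMT}</Description>
  </Profile>
</Resource>"""

# Assumed stamp format, inferred from the descriptions seen in this ticket.
STAMP = re.compile(r"\{Published by .+? on (\d{4}/\d{2}/\d{2} \d{2}:\d{2}) GMT\}")


def summarize(profile_xml: str) -> dict:
    """Return ID, scopes, secondary type, name and publication time of a resource."""
    root = ET.fromstring(profile_xml)
    description = root.findtext("Profile/Description", default="")
    match = STAMP.search(description)
    published = (
        datetime.strptime(match.group(1), "%Y/%m/%d %H:%M") if match else None
    )
    return {
        "id": root.findtext("ID"),
        "scopes": [s.text for s in root.findall("Scopes/Scope")],
        "secondary_type": root.findtext("Profile/SecondaryType"),
        "name": root.findtext("Profile/Name"),
        "published_gmt": published,
    }


if __name__ == "__main__":
    info = summarize(PROFILE)
    print(info["scopes"], info["secondary_type"], info["published_gmt"])
```

Comparing `published_gmt` across the copies of a resource found in different scopes tells which publication is the most recent one.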

Actions #3

Updated by Massimiliano Assante over 6 years ago

Look, I certainly did not republish in RPrototypingLab this morning (and the description says "... Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/10 10:44 GMT}").

Also, even if I had, why would it disappear from the SbDLAB VRE?

Actions #4

Updated by Giancarlo Panichi over 6 years ago

The problem is that the resource is not on the IS, and there is no resource at all with the SecondaryType:

 <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>

Is it possible that a cleanup was done ahead of the SoBigData meeting?

Actions #5

Updated by Massimiliano Assante over 6 years ago

Giancarlo Panichi wrote:

The problem is that there is no resource on the IS and it is also missing the SecondaryType:

 <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>

Is it possible that cleaning was done in anticipation of the SoBigData meeting?

No cleanup was performed. A cleanup by whom?
Could you check that the deployment of the GATE methods didn't cause this issue? I did not publish anything this morning, neither here nor in RProto.

Actions #6

Updated by Massimiliano Assante over 6 years ago

Just to add: last Thursday (13/12) I republished the SortWords algorithm in SoBigDataLab (see the email I received, pasted below), and today I realised it is no longer present...

On 13 Dec 2018, at 12:13, DataMiner on D4Science.org Gateway <"services+f95b7f39-4f56-450d-b17e-e0b63cc84d19_MSG@d4science.org"@socialnetworking1.d4science.org> wrote:


DataMiner sent you a message: 

The installation of the algorithm is completed successfully. 

You can retrieve experiment results under the '/DataMiner' e-Infrastructure Workspace folder or from the DataMiner interface. 




Algorithm details: 

User: Massimiliano Assante 
Algorithm name: SORTWORDS 
Staging DataMiner Host: dataminer-proto-ghost.d4science.org 
Caller VRE: /d4science.research-infrastructures.eu/SoBigData/SoBigDataLab 
Target VRE: /d4science.research-infrastructures.eu/SoBigData/SoBigDataLab 

- This message was also sent to:
    • Gianpaolo Coro
    • Giancarlo Panichi
    • Statistical Manager
    • Roberto Cirillo
    • Lucio Lelii

Actions #7

Updated by Pasquale Pagano over 6 years ago

  • Priority changed from High to Urgent
Actions #8

Updated by Massimiliano Assante over 6 years ago

So, I was told to republish the SortWords algorithm in the SoBigDataLab VRE and to paste the resource UID in this ticket.

I am also attaching a screenshot of the DataMiner portlet showing the successful algorithm deployment and execution.

Here is the Generic Resource, correctly published in the scope (it may disappear in the next few days):

<Resource version="0.4.x">
   <ID>d5406f87-fd13-4780-901a-71d541a34c48</ID>
   <Type>GenericResource</Type>
   <Scopes>
      <Scope>/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab</Scope>
   </Scopes>
   <Profile>
      <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
      <Name>SORTWORDS</Name>
      <Description>Simple method that sort words in a file {Published by Massimiliano Assante (massimiliano.assante) on 2018/12/18 15:00 GMT}</Description>
      <Body>
         <category>TRANSDUCERS</category>
         <inputs>
            <input>
               <name>Empty</name>
               <description>Empty</description>
               <defaultValue />
               <type>string</type>
            </input>
         </inputs>
         <privateusers>massimiliano.assante</privateusers>
      </Body>
   </Profile>
</Resource>
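Whether a private algorithm can appear for a user in a VRE depends, at least, on three things visible in this profile: the Scope, the `StatisticalManagerAlgorithmPrivate` SecondaryType and the `privateusers` entry in the Body. The sketch below checks those three conditions on a trimmed copy of the resource above. It is a diagnostic aid, not the logic DataMiner actually uses to build its algorithm list, and it assumes that `privateusers` holds either a single username or a comma-separated list.

```python
import xml.etree.ElementTree as ET

PRIVATE_TYPE = "StatisticalManagerAlgorithmPrivate"

# Trimmed copy of the SoBigDataLab resource above (Description and inputs omitted).
PROFILE = """<Resource version="0.4.x">
  <ID>d5406f87-fd13-4780-901a-71d541a34c48</ID>
  <Type>GenericResource</Type>
  <Scopes>
    <Scope>/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab</Scope>
  </Scopes>
  <Profile>
    <SecondaryType>StatisticalManagerAlgorithmPrivate</SecondaryType>
    <Name>SORTWORDS</Name>
    <Body>
      <category>TRANSDUCERS</category>
      <privateusers>massimiliano.assante</privateusers>
    </Body>
  </Profile>
</Resource>"""


def visible_to(profile_xml: str, vre_scope: str, username: str) -> bool:
    """True if the resource is a private algorithm scoped to vre_scope for username."""
    root = ET.fromstring(profile_xml)
    scopes = {s.text.strip() for s in root.findall("Scopes/Scope") if s.text}
    if vre_scope not in scopes:
        return False
    if root.findtext("Profile/SecondaryType") != PRIVATE_TYPE:
        return False
    # Assumption: privateusers is a single name or a comma-separated list of names.
    raw = root.findtext("Profile/Body/privateusers", default="")
    users = {u.strip() for u in raw.split(",") if u.strip()}
    return username in users


if __name__ == "__main__":
    sbd = "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab"
    print(visible_to(PROFILE, sbd, "massimiliano.assante"))
```

When the resource itself is removed from the IS, there is nothing left to check, which matches the symptom reported in this ticket.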
Actions #9

Updated by Massimiliano Assante over 6 years ago

  • Status changed from New to In Progress
Actions #10

Updated by Massimiliano Assante over 6 years ago

So, here's the thing: it disappeared again 5 minutes ago, when I triggered an algorithm deployment (Universal Dependencies Pos Tagger For Slovenian) via the DM Deployer portlet.

Actions #11

Updated by Giancarlo Panichi over 6 years ago

After an in-depth investigation, I think there are two problems:

  1. The addAlgorithm code that runs on the DataMiner, i.e. the Java code executed by the Ansible scripts. Looking at the source code, I noticed that it publishes on the IS regardless of the parameters passed to it.

  2. There also seems to be a problem with the IS in the SoBigData VO: even the resources of algorithms that were not published today have been duplicated, and they were not duplicated this morning.

I will try to fix the first problem with Lucio.
For the second problem, @roberto.cirillo@isti.cnr.it or @andrea.dellamico@isti.cnr.it, could you please check the registry and possibly restart it? I do not have access rights.
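Point 1 can be illustrated with a small sketch. The names below (`install_algorithm`, `publish_on_is`, `FakeRegistry`) are hypothetical and do not come from the addAlgorithm code or from dataminer-algorithms-importer; the sketch only contrasts an installer that publishes unconditionally with one that touches the IS only when its parameters ask for it.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRegistry:
    """Hypothetical in-memory stand-in for the IS registry; records every write."""
    calls: list = field(default_factory=list)

    def publish(self, algorithm: str, scope: str) -> None:
        self.calls.append(("publish", algorithm, scope))


def install_algorithm_buggy(registry, algorithm, scope, publish_on_is=False):
    # Local installation omitted. The IS write happens whatever publish_on_is
    # says, mirroring the behaviour described in point 1.
    registry.publish(algorithm, scope)


def install_algorithm(registry, algorithm, scope, publish_on_is=False):
    # Local installation omitted. The IS is written only when explicitly requested.
    if publish_on_is:
        registry.publish(algorithm, scope)


if __name__ == "__main__":
    scope = "/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab"
    buggy, fixed = FakeRegistry(), FakeRegistry()
    install_algorithm_buggy(buggy, "SORTWORDS", scope, publish_on_is=False)
    install_algorithm(fixed, "SORTWORDS", scope, publish_on_is=False)
    print(len(buggy.calls), len(fixed.calls))
```

With the unconditional version, every node that runs the installer also writes to the IS, so a single deployment fans out into one IS operation per node.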

Actions #12

Updated by Roberto Cirillo over 6 years ago

Giancarlo Panichi wrote:

After investigating in depth I think there are two problems:

  1. The code of addAlgorithm that run on the DataMiner, I'm talking about the java code that is executed by the ansible scripts. Seeing the source code I noticed that the publication on the IS is done regardless of the parameters passed to it.

  2. There also seems to be a problem with the IS in the SoBigData VO because even the resources of the algorithms that have not been published today have been duplicated, and these were not duplicated this morning.

So for the first problem I try to fix the issue with Lucio.
For the second problem, please @roberto.cirillo@isti.cnr.it or @andrea.dellamico@isti.cnr.it , can you check the registry and possibly restart it because I do not have access rights.

I don't understand your second point. Which resources are you referring to? FYI, the registry service is restarted every night.
I think we should first understand who deletes the resource, because I don't think it simply disappears: someone deletes it. I'm going to check the registry logs.

Actions #13

Updated by Roberto Cirillo over 6 years ago

As shown in the registry logs, the resource has been deleted several times by someone:

2018-12-18 18:28:42,186 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,186 INFO  publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:42,388 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,389 INFO  publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:42,795 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,795 INFO  publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,195 INFO  resources.ISRegistryServiceUpdaterHandler [GHNConsumer$<anon>,info:78] ISRegistryServiceUpdaterHandler: using cached instance of ISRegistryServiceHandler@ http://node2.p.d4science.research-infrastructures.eu:8080/wsrf/services/gcube/informationsystem/registry/ResourceRegistration
2018-12-18 18:28:43,219 INFO  porttypes.ResourceRegistration [Thread-3585,info:78] ResourceRegistration: CreateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:43,221 INFO  node.KGCUBEHostingNode [Thread-3585,info:78] GHN: Added scope(s) [/d4science.research-infrastructures.eu, /d4science.research-infrastructures.eu/SoBigData]
2018-12-18 18:28:43,222 INFO  publisher.ISResourcePublisher [Thread-3585,info:78] ISResourcePublisher: ISPublisher is going to publish the GCUBEResource bb9ab1c0-1206-11e6-b6b2-c2bb453e9301
2018-12-18 18:28:43,421 INFO  porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,421 INFO  publisher.ISResourcePublisher [ServiceThread-3569,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,926 INFO  porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,926 INFO  publisher.ISResourcePublisher [ServiceThread-3569,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
2018-12-18 18:28:43,982 ERROR generic.GCUBEGenericBulkPublisher [BulkPublisher,error:72] GCUBEGenericBulkPublisher: Unable to publish resources for Profiles/RunningInstance in scope /d4science.research-infrastructures.eu/SoBigData
java.lang.ArrayIndexOutOfBoundsException
2018-12-18 18:28:44,279 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: UpdateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:44,281 INFO  runninginstance.KGCUBERunningInstance [ServiceThread-3562,info:78] RunningInstance: Added scope(s) [/d4science.research-infrastructures.eu, /d4science.research-infrastructures.eu/ParthenosVO, /d4science.research-infrastructures.eu/D4OS, /d4science.research-infrastructures.eu/SoBigData, /d4science.research-infrastructures.eu/OpenAIRE, /d4science.research-infrastructures.eu/gCubeApps, /d4science.research-infrastructures.eu/FARM, /d4science.research-infrastructures.eu/D4Research, /d4science.research-infrastructures.eu/SmartArea]
2018-12-18 18:28:44,282 INFO  publisher.ISResourcePublisher [ServiceThread-3562,info:78] ISResourcePublisher: ISPublisher is going to publish the GCUBEResource 1d8760f5-201f-4696-8c36-855c7f558ba2
2018-12-18 18:28:44,482 ERROR generic.GCUBEGenericBulkPublisher [BulkPublisher,error:72] GCUBEGenericBulkPublisher: Unable to publish resources for Profiles/GHN in scope /d4science.research-infrastructures.eu/SoBigData
java.lang.ArrayIndexOutOfBoundsException
2018-12-18 18:28:45,005 INFO  porttypes.ResourceRegistration [ServiceThread-3570,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:45,005 INFO  publisher.ISResourcePublisher [ServiceThread-3570,info:78] ISResourcePublisher: ISPublisher is going to remove the GCUBEResource d5406f87-fd13-4780-901a-71d541a34c48
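Counting the `RemoveResource` invocations per resource ID makes the pattern in this excerpt explicit. A minimal sketch, assuming the registry log line format shown above; it reads a subset of those lines from a string rather than from the real log file.

```python
import re
from collections import Counter

# Subset of the registry log excerpt above (remove, create and update lines).
LOG = """\
2018-12-18 18:28:42,186 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,388 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:42,795 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,219 INFO  porttypes.ResourceRegistration [Thread-3585,info:78] ResourceRegistration: CreateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:43,421 INFO  porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:43,926 INFO  porttypes.ResourceRegistration [ServiceThread-3569,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
2018-12-18 18:28:44,279 INFO  porttypes.ResourceRegistration [ServiceThread-3562,info:78] ResourceRegistration: UpdateResource operation invoked in scope /d4science.research-infrastructures.eu/SoBigData
2018-12-18 18:28:45,005 INFO  porttypes.ResourceRegistration [ServiceThread-3570,info:78] ResourceRegistration: RemoveResource operation invoked on resource ID=d5406f87-fd13-4780-901a-71d541a34c48, type=GenericResource
"""

# Assumed format: "RemoveResource operation invoked on resource ID=<uuid>, type=<Type>".
REMOVE = re.compile(r"RemoveResource operation invoked on resource ID=([0-9a-f-]+), type=\w+")


def removals(log_text: str) -> Counter:
    """Count RemoveResource invocations per resource ID."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REMOVE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for resource_id, n in removals(LOG).items():
        print(resource_id, n)
```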

Double-checking the access log clearly shows that the resource was removed by more than one service:

2018-12-18 18:28:42,186 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,224 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:42,388 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,425 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:42,795 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.172) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,833 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.172) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:43,420 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.167) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:43,459 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: END CALL FROM (146.48.123.167) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main],[0.0]
2018-12-18 18:28:43,925 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.171) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:43,987 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: END CALL FROM (146.48.123.171) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main],[0.0]
2018-12-18 18:28:45,004 INFO  handlers.GCUBEHandler [ServiceThread-3570,info:78] GCUBEHandler: START CALL FROM (146.48.123.170) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3570,5,main]
2018-12-18 18:28:45,566 INFO  handlers.GCUBEHandler [ServiceThread-3570,info:78] GCUBEHandler: END CALL FROM (146.48.123.170) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3570,5,main],[0.0]

The hosts above are respectively:

  • dataminer2-gw-proto.d4science.org
  • dataminer1-gw-proto.d4science.org
  • dataminer5-gw-proto.d4science.org
  • dataminer0-gw-proto.d4science.org
  • dataminer4-gw-proto.d4science.org
  • dataminer3-gw-proto.d4science.org
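The same kind of check on the access log, joined with the host list above, shows one remove call per gw instance. The IP-to-host table below is transcribed from this comment (hosts are listed in the order their IPs appear in the log); the parsing assumes the `GCUBEHandler: START CALL FROM (ip) TO (InformationSystem:IS-Registry:remove)` format shown above.

```python
import re
from collections import Counter

# Caller IP -> host, as listed "respectively" in this comment.
HOSTS = {
    "146.48.123.169": "dataminer2-gw-proto.d4science.org",
    "146.48.123.168": "dataminer1-gw-proto.d4science.org",
    "146.48.123.172": "dataminer5-gw-proto.d4science.org",
    "146.48.123.167": "dataminer0-gw-proto.d4science.org",
    "146.48.123.171": "dataminer4-gw-proto.d4science.org",
    "146.48.123.170": "dataminer3-gw-proto.d4science.org",
}

# START lines from the access log above, plus one END line that must be ignored.
LOG = """\
2018-12-18 18:28:42,186 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,224 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: END CALL FROM (146.48.123.169) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main],[0.0]
2018-12-18 18:28:42,388 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.168) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:42,795 INFO  handlers.GCUBEHandler [ServiceThread-3562,info:78] GCUBEHandler: START CALL FROM (146.48.123.172) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3562,5,main]
2018-12-18 18:28:43,420 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.167) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:43,925 INFO  handlers.GCUBEHandler [ServiceThread-3569,info:78] GCUBEHandler: START CALL FROM (146.48.123.171) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3569,5,main]
2018-12-18 18:28:45,004 INFO  handlers.GCUBEHandler [ServiceThread-3570,info:78] GCUBEHandler: START CALL FROM (146.48.123.170) TO (InformationSystem:IS-Registry:remove),/d4science.research-infrastructures.eu/SoBigData/SoBigDataLab,Thread[ServiceThread-3570,5,main]
"""

START_REMOVE = re.compile(
    r"START CALL FROM \(([\d.]+)\) TO \(InformationSystem:IS-Registry:remove\)"
)


def callers(log_text: str) -> Counter:
    """Count IS-Registry remove calls per calling host (unknown IPs kept as-is)."""
    counts = Counter()
    for ip in START_REMOVE.findall(log_text):
        counts[HOSTS.get(ip, ip)] += 1
    return counts


if __name__ == "__main__":
    for host, n in sorted(callers(LOG).items()):
        print(host, n)
```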

Now the question is: why have all the gw instances removed the same GenericResource from the IS several times?

Actions #14

Updated by Giancarlo Panichi over 6 years ago

  • % Done changed from 0 to 100

The answer is that the dataminer*-gw-proto hosts are also DataMiners in the proto cluster, so they also execute the Java code that installs the algorithms, i.e. the code I pointed to in point 1 above, namely:

dataminer-algorithms-importer-1.2.1-SNAPSHOT.jar

The problem is that this component is buggy: regardless of the parameters it receives, it always publishes on the IS, which it should not do.

So, I am opening a task to fix this component.

Actions #15

Updated by Massimiliano Assante over 6 years ago

Giancarlo Panichi wrote:

The answer is that the dataniner*-gw-proto are also dataminer in the proto cluster, so they also execute the java code for the installation of the algorithms, what I have indicated in point 1 above and exactly:

dataminer-algorithms-importer-1.2.1-SNAPSHOT.jar

The problem is that this component is wrong, and regardless of the parameters it receives, however, it publishes on the IS, which it should not do.

So, I open a task to fix this component.

Fine, dataminer-algorithms-importer-1.2.1 has a bug and it has to be fixed. The question is: why does this component at some point decide to delete the "private" algorithm generic resources?

Actions #17

Updated by Giancarlo Panichi over 6 years ago

  • Due date set to Dec 19, 2018
  • Status changed from In Progress to Closed

I am closing this ticket.

Actions #18

Updated by Giancarlo Panichi over 6 years ago

Hi @massimiliano.assante@isti.cnr.it , it is part of the code that was written by Gianpaolo and Lucio to install the algorithms on the servers.

Actions #19

Updated by Roberto Cirillo over 6 years ago

Just two comments:
If the problem is in the dataminer-algorithms-importer, why have only the gw proto instances removed the GenericResource from the IS?
Is the same component also present in all the dataminer*-proto instances?

In addition, is it normal that every instance of the gw cluster tries to delete or update the same resource at the same time on the same service?

Actions #20

Updated by Giancarlo Panichi over 6 years ago

@roberto.cirillo@isti.cnr.it I have already answered above: this happens because that code does not behave correctly with respect to the parameters it is passed.

Actions #21

Updated by Giancarlo Panichi over 6 years ago

@roberto.cirillo@isti.cnr.it, yes, the same component is also present in all the dataminer*-proto instances.

Actions #22

Updated by Massimiliano Assante over 6 years ago

  • Related to Incident #13066: DataMiner algorithm on FARM/PerformFISH-KPIs not working added
Actions #23

Updated by Lucio Lelii over 6 years ago

My fear is that what Giancarlo is now modifying is not a bug but a design choice: it's impossible that we didn't notice it before, since this change was made in March 2017.

Actions #24

Updated by Massimiliano Assante over 6 years ago

Lucio Lelii wrote:

My fear is that what is now modifying Giancarlo is not a bug but a choice, because it's impossible we didn't notice it before, this change was done in march 2017.

It cannot be a choice. You couldn't have chosen that new algorithm deployments would randomly remove the others. What you're saying (no changes in 1.5 years) cannot be true, because private algorithms are a new feature introduced about 6 months ago, not 1.5 years ago.

Actions #25

Updated by Lucio Lelii over 6 years ago

I was talking about the publication on the IS performed at every execution of addAlgorithm; this is what Giancarlo is going to change.

Actions #26

Updated by Gianpaolo Coro over 6 years ago

I think the issue is that the Generic Workers do not find the algorithm in the private category, so they try to republish it.

Please consider that the add algorithm operation is NOT used only by the DMPool Manager. The only way I have to publish an algorithm from a prototype VRE to a public VRE is a process on the DataMiner that performs this move. This process must be able to invoke the algorithm installer and publish the resource on the IS. Thus, please make sure this process can still use the algorithm installer.

Otherwise, I would have to open a ticket for every algorithm to be moved, which goes against innovation.

Actions #27

Updated by Roberto Cirillo over 6 years ago

The problem here was due to an incompatibility between JDK 8 build 161 and the registry service. To fix this incident we need to downgrade the JDK to version 8_151 on every generic worker. Currently the genericworker instances run jdk8_171, while the dm-proto instances run jdk8_151. This is why only the gw instances suffer from this problem. I'm going to open a dedicated ticket for downgrading the JDK on the gw instances.
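The explanation above reduces to a version check across instance groups. A minimal sketch, assuming Java 8 version strings of the form `1.8.0_NNN` and assuming (as the comment implies but does not state outright) that every build from 161 upward is affected; the two-entry inventory only reuses the versions quoted in the comment.

```python
import re

FIRST_AFFECTED_BUILD = 161  # assumption: builds >= 161 are incompatible with the registry
TARGET = "1.8.0_151"        # version proposed for the downgrade


def jdk8_build(version: str) -> int:
    """Extract the update number from a Java 8 version string such as '1.8.0_171'."""
    match = re.fullmatch(r"1\.8\.0_(\d+)", version.strip())
    if not match:
        raise ValueError(f"not a Java 8 version string: {version!r}")
    return int(match.group(1))


def needs_downgrade(version: str) -> bool:
    """True if the build falls in the (assumed) affected range."""
    return jdk8_build(version) >= FIRST_AFFECTED_BUILD


if __name__ == "__main__":
    # Instance groups and JDK versions as reported in the comment above.
    inventory = {"generic-worker": "1.8.0_171", "dm-proto": "1.8.0_151"}
    for group, version in inventory.items():
        action = f"downgrade to {TARGET}" if needs_downgrade(version) else "ok"
        print(group, version, action)
```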

Actions #28

Updated by Giancarlo Panichi over 6 years ago

@roberto.cirillo@isti.cnr.it this ticket has been closed; please follow the discussion in the associated Task. The problem is that DataMiner should not publish on the IS: only DMGhost should do it. Now we have to support this while taking the GP algorithm into account, to solve the problem in the short term.
