Task #8040

Task #7900: Generating Darwin Core Archives via SPD

Impossible to produce DWCA for a number of Families

Added by Gianpaolo Coro about 8 years ago. Updated over 6 years ago.

Status: Closed
Priority: High
Category: -
Target version:
Start date: Apr 07, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

We cannot produce the DWCA for a number of Families because of an error from SPD. The list of Families and the error are attached. We should understand where this issue comes from and whether those Families should really be discarded.


Files

Incident.txt (10.6 KB) Gianpaolo Coro, Apr 07, 2017 04:34 PM
families_list_missing.txt (1.44 KB) Gianpaolo Coro, Apr 07, 2017 04:34 PM
errors.txt (18.4 KB) Valentina Marioli, Apr 11, 2017 05:55 PM
species.txt (14.8 KB) Valentina Marioli, Apr 11, 2017 05:55 PM
Actions #1

Updated by Valentina Marioli about 8 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Valentina Marioli about 8 years ago

I've created a script that retrieves all species for a list of scientific names through the WoRMS REST web service (http://www.marinespecies.org/rest/).
Using the families listed in families_list_missing.txt, I get 783 species.
The complete list is in species.txt.
I also get 214 errors (see errors.txt) caused by problems processing the URIs.
Sometimes the URI returns no data; other times the ID or scientific name appears not to exist, even though the WoRMS REST web service itself supplied it.
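
The script itself is not attached to this ticket. As a minimal sketch of how such a lookup could separate "no data" from real failures, the Python below builds a lookup URL per scientific name and classifies each response. The base URL is the one quoted above; the `AphiaRecordsByName` path, its `like` and `marine_only` parameters, and the use of HTTP 204 for "nothing found" are assumptions to be checked against the WoRMS REST documentation, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://www.marinespecies.org/rest"  # base URL quoted in the comment above


def records_by_name_url(name, like=False, marine_only=False):
    """Build the lookup URL for one scientific name (endpoint path is an assumption)."""
    path = urllib.parse.quote(name.strip(), safe="")
    query = urllib.parse.urlencode(
        {"like": str(like).lower(), "marine_only": str(marine_only).lower()}
    )
    return f"{BASE_URL}/AphiaRecordsByName/{path}?{query}"


def classify(status_code, body):
    """Map an HTTP outcome to 'ok', 'empty' (no data) or 'error'."""
    if status_code == 204 or (status_code == 200 and not body.strip()):
        return "empty"
    if status_code != 200:
        return "error"
    try:
        records = json.loads(body)
    except ValueError:  # body is not valid JSON
        return "error"
    return "ok" if records else "empty"


def fetch(url, timeout=30):
    """Return (status_code, body); network failures are reported as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        return exc.code, ""
    except OSError:  # covers URLError, timeouts and connection resets
        return 0, ""
```

Logging the `empty` and `error` outcomes to separate files would make it easier to tell which of the 214 errors are simply empty answers.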

Regarding the DWCAs:
Arthropoda: 1066 (AphiaID) is missing;
Mollusca: 105, 849233, 18826, 14503, 871021, 385738, 234066 are missing;
Platyhelminthes: completed.

Actions #3

Updated by Gianpaolo Coro about 8 years ago

  • % Done changed from 50 to 80

I will contact the WoRMS team to have this issue fixed. Meanwhile, I have used BiOnym to check whether the ASFIS species are contained in the WoRMS DWCAs (also matching by minimum edit distance), and it seems that some species are present on the WoRMS site but are not reached by the DWCA generation process. Strangely, some of them do not appear among the errors.
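
For readers unfamiliar with edit-distance matching, here is a minimal sketch of the idea in Python. It is a plain Levenshtein distance with a hypothetical threshold, not BiOnym's actual matching pipeline, and the helper names are illustrative only.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def best_match(name, candidates, max_distance=2):
    """Return (candidate, distance) for the closest name, or None if none is close enough."""
    key = name.strip().lower()
    scored = [(levenshtein(key, c.strip().lower()), c) for c in candidates]
    if not scored:
        return None
    distance, match = min(scored)
    return (match, distance) if distance <= max_distance else None
```

A species whose best match in the DWCA name list is `None` would be a candidate for the "present on the site but not in the archive" list.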

Actions #4

Updated by Gianpaolo Coro about 8 years ago

As usual, it was nice to talk with the VLIZ technicians. They said that empty families and genera are normal in WoRMS because, for example, they do not harvest FishBase completely. They are not going to look into the issues we have highlighted, so we should solve them ourselves. I think the DWCA process on SPD should be revised so that it does not fail on empty taxonomic branches. Perhaps skipping the consistency check for the DWCA would work for the issues listed in the errors file.
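
To illustrate what "not failing on empty taxonomic branches" could look like, here is a small Python sketch of a tree walk that records empty or unreachable branches instead of aborting. It is not the SPD implementation (which is Java); `get_children` and `is_species` are hypothetical callables standing in for whatever lookup the process uses.

```python
def collect_taxa(root_id, get_children, is_species):
    """Walk a taxonomic tree without aborting on empty or failing branches.

    get_children(taxon_id) returns a list of child IDs (or None/[] when empty)
    and may raise; is_species(taxon_id) tells whether the taxon is a leaf.
    Returns (collected, empty, failed).
    """
    collected, empty, failed = [], [], []
    seen = set()
    stack = [root_id]
    while stack:
        taxon_id = stack.pop()
        if taxon_id in seen:
            continue
        seen.add(taxon_id)
        if is_species(taxon_id):
            collected.append(taxon_id)
            continue
        try:
            children = get_children(taxon_id) or []
        except Exception as exc:  # lookup failed: record it and keep going
            failed.append((taxon_id, repr(exc)))
            continue
        if not children:
            empty.append(taxon_id)  # empty family or genus: skip, do not fail
            continue
        collected.append(taxon_id)
        stack.extend(children)
    return collected, empty, failed
```

Whether empty families should appear in the archive at all is a separate decision; the sketch only keeps them in their own list so they can be reviewed.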

Actions #5

Updated by Gianpaolo Coro about 8 years ago

We are struggling to produce the DWCA for some crucial species (ca. 14400) but there are some issues:

1 - some species (e.g. WoRMS:183256) are no longer in the WoRMS system; although the SPD service reports an exception internally, the corresponding job stays in the "running" state indefinitely;
2 - after a number of requests, the SPD service leaves all the requests in a "pending" state. I guess this is somehow related to point 1.

I can look for a client-side workaround, but I suspect this issue affects the SPD service as a whole.
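
One possible client-side workaround is to stop waiting for a job after a deadline rather than polling forever. The Python sketch below assumes a hypothetical `get_status(job_id)` callable; "running" and "pending" are the states mentioned above, while "completed" and "failed" are assumed names for the terminal states.

```python
import time

TERMINAL_STATES = {"completed", "failed"}  # assumed names for terminal job states


def wait_for_job(get_status, job_id, timeout_s=600.0, poll_s=5.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it finishes or the deadline passes.

    Returns the terminal status, or "timed_out" if the job is still
    "running" or "pending" when timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return "timed_out"
        sleep(poll_s)
```

IDs whose jobs time out can be set aside and resubmitted later, so one stuck species does not block the remaining requests on the client side. This does not free the stuck worker thread inside the service.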

Actions #6

Updated by Pasquale Pagano about 8 years ago

  • Priority changed from Normal to High

@valentina.marioli@isti.cnr.it, please analyse this issue; if changes to the code are needed, they must be tracked as a task. This incident has been open for several days and we should find a solution so that it can be closed.

Actions #7

Updated by Gianpaolo Coro about 8 years ago

Something seems to go wrong when the service meets a taxonomic name that is either "quarantined" or deleted. Furthermore, at a certain point the service stops accepting jobs, and I see this exception in ghn.log:

16:08:42.968 [spd-job-thread-9] WARN  AbstractLocalReader: the queue is empty
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) ~[na:1.7.0_80]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095) ~[na:1.7.0_80]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389) ~[na:1.7.0_80]
        at org.gcube.data.spd.plugin.fwk.readers.LocalReader.hasNext(LocalReader.java:21) ~[spd-plugin-framework-3.1.0-4.3.0-144690.jar:na]
        at org.gcube.data.spd.executor.jobs.dwca.MapDwCA.createTaxaTxt(MapDwCA.java:135) [MapDwCA.class:na]
        at org.gcube.data.spd.executor.jobs.dwca.MapDwCA.createDwCA(MapDwCA.java:45) [MapDwCA.class:na]
        at org.gcube.data.spd.executor.jobs.dwca.DWCAJobByIds.execute(DWCAJobByIds.java:93) [DWCAJobByIds.class:na]
        at org.gcube.data.spd.executor.jobs.SpeciesJob.run(SpeciesJob.java:43) [SpeciesJob.class:na]
        at org.gcube.common.authorization.library.AuthorizedTasks$2.run(AuthorizedTasks.java:75) [common-authorization-2.0.2-4.3.0-144378.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

I have run a web crawler that excludes deleted and quarantined names from the submitted IDs, but jobs still get stuck in the pending state.
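
The crawler is not attached here. A minimal Python sketch of the filtering step might look like the following, assuming each record is a dictionary with `AphiaID` and `status` keys as in WoRMS JSON responses; the exact status strings are an assumption, and the IDs in the usage example are made up.

```python
EXCLUDED_STATUSES = {"deleted", "quarantined"}  # assumed status strings


def split_submittable(records):
    """Separate IDs that are safe to submit from records that should be skipped."""
    keep, skipped = [], []
    for record in records:
        status = (record.get("status") or "").strip().lower()
        if record.get("AphiaID") is None or status in EXCLUDED_STATUSES:
            skipped.append(record)
        else:
            keep.append(record["AphiaID"])
    return keep, skipped


# Usage with made-up records:
records = [
    {"AphiaID": 1, "status": "accepted"},
    {"AphiaID": 2, "status": "deleted"},
    {"AphiaID": 3, "status": "Quarantined"},
    {"AphiaID": None, "status": "accepted"},
]
ids, skipped = split_submittable(records)
```

Keeping the skipped records, rather than discarding them, makes it possible to check later whether any stuck job corresponds to a name that slipped through the filter.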

Actions #8

Updated by Gianpaolo Coro about 8 years ago

I also see many accounting issues in some threads:

java.lang.NullPointerException: null
        at org.gcube.data.spd.executor.jobs.SpeciesJob.generateAccounting(SpeciesJob.java:57) [SpeciesJob.class:na]
        at org.gcube.data.spd.executor.jobs.SpeciesJob.run(SpeciesJob.java:48) [SpeciesJob.class:na]
        at org.gcube.common.authorization.library.AuthorizedTasks$2.run(AuthorizedTasks.java:75) [common-authorization-2.0.2-4.3.0-144378.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]

Actions #9

Updated by Gianpaolo Coro almost 8 years ago

Shall we continue to investigate this issue? We will need to periodically update our resources and we cannot lose weeks each time. Valentina, could you please make a plan (perhaps together with Lucio) to investigate the issue?

Actions #10

Updated by Valentina Marioli almost 8 years ago

  • Tracker changed from Incident to Task
  • Status changed from In Progress to Paused
Actions #11

Updated by Pasquale Pagano over 6 years ago

  • Status changed from Paused to Closed