19th May 2020 - Discussing new developments with FORTH
Meeting Notes
Participants:
FORTH (Yannis Marketakis)
FAO (Bracken van Niekerk, Anton Ellenbroek, Aureliano Gentile)
Main topics
FORTH has made progress on updating the GRSF KB with fresh content retrieved from the original data sources.
A second version of the GRSF KB was built from those sources, while preserving the manually added information from the current version of the GRSF KB (i.e. manual merges, URLs, IDs, annotations).
This meeting discussed:
the progress made
- Harvesting of new data
- Construction of a new version of GRSF KB, while preserving information from the current version
the adopted workflow
- The workflow aims to automate several steps, so that GRSF can be refreshed periodically in the future with as little human intervention as possible
Obsolete records. These are records that are already published in the GRSF catalogs but are not included in the latest harvest (i.e. they no longer exist in the data sources)
- Obsolete legacy records: Remove them from the GRSF Admin VRE
- Obsolete GRSF records: If their status is pending, remove them from the GRSF Admin catalog. Otherwise, set their status to archived and add an annotation so that the GRSF administrators are aware of the change.
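The rules for obsolete records can be sketched in code. The following is a minimal illustration only, not the workflow actually implemented by FORTH: the `Record` class, its field names, the `"approved"` default status and the annotation text are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Record:
    """Hypothetical, simplified view of a published record."""
    record_id: str                 # assumed key shared with the harvested data
    kind: str                      # "legacy" or "grsf"
    status: str = "approved"       # assumed default status
    annotation: Optional[str] = None


def handle_obsolete(
    published: Iterable[Record], harvested_ids: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Split published records into (kept, removed) after a new harvest."""
    kept: List[Record] = []
    removed: List[Record] = []
    for rec in published:
        if rec.record_id in harvested_ids:
            # Still present in the data sources: not obsolete.
            kept.append(rec)
        elif rec.kind == "legacy" or rec.status == "pending":
            # Obsolete legacy records and pending GRSF records are removed.
            removed.append(rec)
        else:
            # Other obsolete GRSF records are archived and annotated.
            rec.status = "archived"
            rec.annotation = "No longer present in the original data sources"
            kept.append(rec)
    return kept, removed
```

Returning the removed records separately, rather than dropping them silently, would let an administrator review what a refresh is about to delete.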
Matching algorithms
- Legacy records and GRSF Stock records are refreshed with respect to the URLs or IDs as they appear in the original data sources (e.g. the source URL for http://data.d4science.org/ctlg/GRSF_Admin/84029760-3c22-38d1-9886-6e7ddb800e08 is the FIRMS record with URL http://firms.fao.org/firms/resource/10412/en)
- GRSF Fishery records are refreshed with respect to their URLs as they appear in the original data sources and their semantic IDs. If the semantic ID is exactly the same, the match is perfect and all information from the previous version of the GRSF KB is kept. If the match is partial, for example because the newly harvested record includes information that was missing in the previous version (e.g. the previous version did not contain the fishing gear part), the records are still treated as a match, but the status is set to pending and an annotation is added so that the GRSF administrators can double-check the record after GRSF is refreshed.
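The perfect/partial distinction for fishery records can be illustrated as follows. This is a sketch under stated assumptions, not the matching algorithm as implemented: it models a semantic ID as a dictionary of named parts (a hypothetical representation), treats a partial match as "every previously known part is unchanged and the new record adds parts", and ignores the URL-based matching entirely. The names `FisheryRecord`, `classify_match` and `refresh` are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass
class FisheryRecord:
    """Hypothetical record from the previous GRSF KB version."""
    semantic_id: Dict[str, str]    # assumed form: part name -> value
    status: str = "approved"       # assumed default status
    annotation: Optional[str] = None


def classify_match(old: Dict[str, str], new: Dict[str, str]) -> str:
    """Return "perfect", "partial" or "none" for two semantic IDs."""
    if old == new:
        return "perfect"
    # Partial: all previously known parts are unchanged, and the new
    # record carries parts the previous version lacked.
    if old and all(new.get(part) == value for part, value in old.items()):
        return "partial"
    return "none"


def refresh(previous: FisheryRecord,
            harvested: Dict[str, str]) -> Optional[FisheryRecord]:
    """Return the refreshed record, or None when there is no match."""
    kind = classify_match(previous.semantic_id, harvested)
    if kind == "perfect":
        # Keep all information from the previous version.
        return previous
    if kind == "partial":
        # Accept the match, but flag it for review by the administrators.
        return replace(
            previous,
            semantic_id=dict(harvested),
            status="pending",
            annotation="Partial semantic ID match; please double-check",
        )
    return None
```

For example, a previous record with parts `{"species": "COD", "area": "27"}` and a harvested record that additionally carries a `"gear"` part would be classified as a partial match and set to pending.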
Other topics
- GRSF API improvements and GRSF geographical information will be addressed after the discussion on refreshing GRSF has been finalized
- CNR has developed new services for publishing resources. A discussion with CNR is needed to estimate how much effort is required to migrate the existing clients for publishing/updating GRSF records to the new services.
Resources
- GRSF Pilot release (public) https://i-marine.d4science.org/web/grsf/data-catalogue
- GRSF API
- GRSF Competency queries: https://i-marine.d4science.org/group/grsf_admin/grsf-competency-queries
- "GRSF Admin": https://i-marine.d4science.org/group/grsf_admin
- "GRSF VRE": https://i-marine.d4science.org/group/grsf/data-catalogue