Task #20602
Closed
Approve-Merge GRSF records 29/01/2021
100%
Description
Dear Yannis
Please find the next iteration of 27 records for your kind attention, for MERGING and/or APPROVAL, and in some cases removal with a new harvest of the correct record.
Please find them in the attached file within the tab "FORTH_ACTION_29-01-2021", with actions in columns P-R.
For the MERGES, the UUIDs have been provided for all records requiring merging in column P, the dominant records are stipulated in column Q, and any annotations in column R.
Please disregard the other worksheets for the meantime.
Many thanks in advance!
@aureliano.gentile@fao.org
Files
Updated by Yannis Marketakis about 4 years ago
- Status changed from New to In Progress
Updated by Yannis Marketakis about 4 years ago
- File GRSF_Records-export-2021-01-25_UpdateActions-FORTH_2021-02-03.xlsx added
- Status changed from In Progress to Feedback
- % Done changed from 0 to 90
Dear @bracken.vanniekerk@fao.org and @aureliano.gentile@fao.org,
Find attached the appended XLS file with results. More specifically, you will find two more columns (S, T) describing what we did.
In a nutshell, you will find that for 7 cases no actions were carried out. These cases required harvesting new data from FishSource. Although we've executed the GRSF harvester again, the latest version of the FishSource API (v.5) does not include the records that are mentioned.
Updated by Aureliano Gentile about 4 years ago
Thanks a lot Yannis, appreciated.
Indeed, we suspected something related to the FishSource API.
Dear @merul.patel@sustainablefish.org, are these stocks not retrieved because of the issues you mentioned recently for the FishSource API? Please see the 7 records mentioned in Yannis's attached Excel file.
With thanks in advance
Aureliano
Updated by Patrícia Amorim about 4 years ago
- File GRSF_Records-export-2021-01-25_UpdateActions_FS_review_18Fev2021.xlsx added
- Assignee changed from Yannis Marketakis to Bracken van Niekerk
Dear Bracken and Aureliano,
We are submitting some records to be approved/merged based on the list that you sent earlier (GRSF_Records-export-2021-01-25_UpdateActions.xlsx). I am attaching the Excel file with the ones to APPROVE. For some, we made edits in FS, so I think a new harvest will be needed first. Please review them.
I am planning to submit more records in the coming days.
Many thanks,
Patricia
Updated by Patrícia Amorim about 4 years ago
- File deleted (GRSF_Records-export-2021-01-25_UpdateActions_FS_review_18Fev2021.xlsx)
Updated by Patrícia Amorim about 4 years ago
- File GRSF_Records-export-2021-01-25_UpdateActions_FS_review_18Fev2021.xlsx added
Adding a "notes" column to the file.
Updated by Aureliano Gentile about 4 years ago
- File GRSF_StockstoApprove_24March2021.xlsx added
- Status changed from Feedback to In Progress
- Assignee changed from Bracken van Niekerk to Yannis Marketakis
Dear Yannis, @marketak@ics.forth.gr
Following the submission of Patricia and the review by Bracken, I then further worked on the attached file.
For your convenience, please refer to worksheet "StocksRev24March-compiled26Feb" where you will find 94 records to approve, archive, merge etc.
The actions for FORTH are indicated in column P "FORTH ACTION"; where needed, dominant records are indicated in column Q "FORTH dominant record".
Some of the actions are highlighted in orange (in place of yellow) because they imply a re-harvest; I understand you recently did one with the fixed FishSource API.
Updated by Yannis Marketakis about 4 years ago
- File GRSF_StockstoApprove_24March2021-FORTH-updates.xlsx added
Most of the aforementioned actions have been carried out.
Attached you will find the Excel file expanded with an extra column (column T) indicating what we have done.
You'll notice that some of the cells (in column T) are in yellow (indicating that they require your attention).
Those cells fall into the following categories:
- records that need to be harvested (e.g. FishSource 889, 750, etc.) but are not currently available through the GRSF harvesters
- records that need re-harvesting. I'm not sure why we should do it manually for a few records instead of refreshing the entire GRSF KB.
- a request to unmerge and re-merge (rows 57-58) that created 3 new GRSF Stock records.
Updated by Aureliano Gentile about 4 years ago
Dear Yannis, thanks for the usual careful follow-up.
1) regarding rows 57-58
- 731f0ef3-fc8a-3a5b-84c3-44363df04090 APPROVE but please ensure RAM is the dominant record (after more checks later on we may merge with FIRMS record 10334)
- f8a9b66e-53b4-3d31-90fc-5064ac526dc1 APPROVE (this is FIRMS 10334)
- fbd9ed88-4806-3bfb-8a0c-1fa3a8ba2e1a APPROVE (This is FishSource 1216)
2) regarding the other two categories of comments
- Why reharvest manually. I propose refreshing the entire GRSF KB with the new versions of all sources.
- The record with FishSource ID 750 is not retrieved using grsf harvester from FishSource API V5 (tried on 2021-03-30) - and similar comments
Yes, absolutely fine to refresh the GRSF KB. Just bear in mind the seminar on the 12th of May: either we do it as soon as possible or we wait until after the seminar. Your advice is welcome; I would be happy to do it now (unless this creates problems for FORTH or there are reasons to expect problems or uncertain results).
In case of refreshing, if possible, we should also take into account the updated polygon coordinates provided by RAM, the work we did on species matching based on 3AlphaCode and no longer the scientific name, the updated ASFIS and ISSCFG classifications (species and gears). If these tasks are too demanding, then we will do them in a following iteration.
@bracken.vanniekerk@fao.org
@marketak@ics.forth.gr
Updated by Yannis Marketakis about 4 years ago
Aureliano Gentile wrote:
- 731f0ef3-fc8a-3a5b-84c3-44363df04090 APPROVE but please ensure RAM is the dominant record (after more checks later on we may merge with FIRMS record 10334)
- f8a9b66e-53b4-3d31-90fc-5064ac526dc1 APPROVE (this is FIRMS 10334)
- fbd9ed88-4806-3bfb-8a0c-1fa3a8ba2e1a APPROVE (This is FishSource 1216)
Done.
Aureliano Gentile wrote:
In case of refreshing, if possible, we should also take into account the updated polygon coordinates provided by RAM,
Where are these updated polygons? The latest resource I can find (https://data.d4science.net/qS3A) is dated 24-Oct-2019
Aureliano Gentile wrote:
the work we did on species matching based on 3AlphaCode and no longer the scientific name, the updated ASFIS and ISSCFG classifications (species and gears).
I assume that the work you are referring to has been carried out on the FIRMS database. As a result, it will be made available in the GRSF KB after refreshing (refreshing the GRSF KB requires harvesting new data, transforming it, and the rest of the activities).
Aureliano Gentile wrote:
Yes, absolutely fine to refresh the GRSF KB. Just bear in mind the seminar on the 12th of May: either we do it as soon as possible or we wait until after the seminar. Your advice is welcome; I would be happy to do it now (unless this creates problems for FORTH or there are reasons to expect problems or uncertain results).
Because some resources are missing when harvesting new data (e.g. the records from FishSource that are not retrieved from the API), we cannot do it now. As soon as we have those, and the information described above, then yes, we can do it.
Updated by Aureliano Gentile about 4 years ago
Thanks again,
The updated polygons are available in the folder you mentioned (https://data.d4science.net/qS3A) and the file is named RAMLDB_4-494_bbox_compilation_2021-03-16.xlsx (my apologies, it had not been uploaded).
The discussion on fao3alpha vs. scientific name stems from the Google file https://docs.google.com/spreadsheets/d/1mDw3PjS5QKOyNWFeBMu_QWGXqY7Rp6CcE8kXx-s9pNo/edit?usp=sharing that you prepared and that we reviewed and discussed. In the end we agreed that the viable solution was to match against the FAO 3-alpha code rather than scientific names for the RAM/FishSource database (when the 3-alpha code is available).
For the missing resources from FishSource, I am going to write to Merul and colleagues to find out why those are not in the API; you will be in copy.
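The species matching rule mentioned above can be sketched as follows. This is only an illustration, not the GRSF implementation: the function name and the key format are hypothetical, and falling back to the scientific name when no 3-alpha code is present is an assumption read from "when 3alpha is available".

```python
from typing import Optional, Tuple


def species_match_key(
    fao_3alpha: Optional[str], scientific_name: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return a (kind, value) key for matching species across sources.

    Hypothetical helper: prefers the FAO 3-alpha code and only falls back
    to the scientific name when no code is available (assumed fallback).
    """
    code = (fao_3alpha or "").strip().upper()
    if code:
        return ("3alpha", code)
    # Normalise whitespace and case so trivially different spellings match.
    name = " ".join((scientific_name or "").split()).lower()
    if name:
        return ("scientific_name", name)
    return None  # nothing usable to match on


# Two records match when their keys are equal.
print(species_match_key("cod", "Gadus morhua"))
print(species_match_key(None, " Gadus  morhua "))
```

Records keyed by scientific name never match records keyed by 3-alpha code in this sketch; whether that is desirable depends on how the sources are populated.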
Updated by Aureliano Gentile about 4 years ago
Following exchanges with Susana, Merul and Yannis it was found that the FishSource API does retrieve stocks and their nested assessment units (e.g. ID 750 contains ID 751).
Yannis further explained that:
I've inspected the mappings that are used for transforming the data and I noticed that the URL of the record in its original source (e.g. https://www.fishsource.org/stock_page/750) was constructed during the transformation using the following rule:
FISHSOURCE_PREFIX + RESOURCE_TYPE + RESOURCE_ID
As a result, for the stock record with ID 751, it constructed a URL of the form https://www.fishsource.org/stock_page/751
We'll remove the URL construction for FishSource (we still need it for other sources) and use the URL provided under source_of_information.
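A minimal sketch of the change described above, under stated assumptions: the record is treated as a plain dictionary, the field names `source`, `resource_type` and `id` and the `SOURCE_PREFIXES` table are hypothetical, and only `source_of_information` comes from the ticket. The actual GRSF mappings are not shown here.

```python
# Hypothetical prefix table for sources whose URLs are still constructed.
SOURCE_PREFIXES = {
    "other_source": "https://example.org/",  # placeholder, not a real GRSF source
}


def constructed_url(prefix: str, resource_type: str, resource_id: str) -> str:
    """Old rule: PREFIX + RESOURCE_TYPE + RESOURCE_ID."""
    return f"{prefix}{resource_type}/{resource_id}"


def resolve_source_url(record: dict) -> str:
    """Pick the URL of the record in its original source."""
    if record["source"] == "fishsource":
        # New behaviour: trust the URL delivered by the FishSource API.
        return record["source_of_information"]
    prefix = SOURCE_PREFIXES[record["source"]]
    return constructed_url(prefix, record["resource_type"], record["id"])


# The old rule applied to the nested record 751 gives the URL quoted in the ticket.
print(constructed_url("https://www.fishsource.org/", "stock_page", "751"))

nested = {
    "source": "fishsource",
    "resource_type": "stock_page",
    "id": "751",
    # Placeholder value; the real URL is whatever the API provides.
    "source_of_information": "https://example.org/url-provided-by-api",
}
print(resolve_source_url(nested))
```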
Updated by Yannis Marketakis about 4 years ago
Thanks for the clarifications, @aureliano.gentile@fao.org.
Before moving on with the GRSF refresh we have to resolve the following:
- As regards the updated polygons from RAM V4.494, I noticed that the new file (filename: RAMLDB_4-494_bbox_compilation_2021-03-16(1).xlsx) does not contain the polygons. It only contains the coordinates of the bounding boxes (in columns H, I, J, K). Moreover, compared to the previous version of the file (filename: RAM_AREA-IDs_mapping.xlsx, sheet: area_24Oct2019), we are missing the mapping of RAM areas to GRSF Standard areas (see columns I, J). As regards the latter, if you think we are not going to use them then we're OK.
- As regards FIRMS resources, do you have an updated list of the IDs and OIDs to be used? The latest one can be found at https://data.d4science.net/m7A1 and is dated September 2020. I think that among the requests for harvesting new records was a stock record from FIRMS with ID 13747. Can you please update this list so that it can be used for harvesting new records from FIRMS?
Updated by Aureliano Gentile about 4 years ago
Dear Yannis, thanks
The RAM new submission is an update of the bbox coordinates, I understand no actual polygons for the time being. Nonetheless, I understand the coordinates should be more accurate. In the discussion on area standards, we said we will add other standards later on, for example New Zealand areas. Regarding the mapping to FAO areas, I think we said this is to be computed by the intersection engine and we no longer maintain such a manual mapping. For search interfaces based on FAO areas, you should then get the intersection values from @emmanuel.blondel@fao.org, I guess.
For the FIRMS IDs/OIDs, I just made the request; I will write to you as soon as I upload the list.
Updated by Yannis Marketakis about 4 years ago
Aureliano Gentile wrote:
The RAM new submission is an update of the bbox coordinates, I understand no actual polygons for the time being.
I understand that we will not use polygons for RAM for now. The side effect is that their areas will not be visualized in the VRE catalog and the GRSF viewer. Can you confirm?
Aureliano Gentile wrote:
For the FIRMS IDs/OIDs, I just made the request; I will write to you as soon as I upload the list.
Thanks a lot
Updated by Aureliano Gentile about 4 years ago
The new list of FIRMS IDs/OIDs is available at https://data.d4science.net/nnXm (FIRMS-MR-F_ID-OID-lang-16Apr2021.xlsx)
(stored in Workspace >VRE Folders >StocksAndFisheriesKB >Requirements >FIRMS-to-GRSF)
Regarding RAM bbox, I understand these are updated coordinates (i.e. more accurate) and we should consider them as polygons even if they are rectangles or squares. What is the issue with that? We do need to show the area, even if it is a bbox, and we treat it as a polygon. Is there any technical constraint on that?
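To illustrate treating a bounding box as a polygon, here is a minimal sketch that turns four bbox coordinates into a rectangular WKT polygon. The function name, the argument order and the choice of WKT are assumptions; the ticket does not say which of columns H, I, J, K holds which coordinate, nor which geometry format the GRSF viewer expects.

```python
def bbox_to_wkt_polygon(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float
) -> str:
    """Return a closed rectangular WKT polygon for a bounding box.

    Hypothetical helper. Boxes that cross the antimeridian
    (min_lon > max_lon) are not handled and raise an error.
    """
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("expected min_lon <= max_lon and min_lat <= max_lat")
    # Counter-clockwise ring; the first corner is repeated to close it.
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    ring = ", ".join(f"{lon:g} {lat:g}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


# Made-up coordinates, not taken from the RAM file.
print(bbox_to_wkt_polygon(-10, 35, 5, 45))
```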
Updated by Yannis Marketakis about 4 years ago
- Status changed from In Progress to Closed
- % Done changed from 90 to 100
Aureliano Gentile wrote:
The new list of FIRMS IDs/OIDs is available at https://data.d4science.net/nnXm (FIRMS-MR-F_ID-OID-lang-16Apr2021.xlsx)
(stored in Workspace >VRE Folders >StocksAndFisheriesKB >Requirements >FIRMS-to-GRSF)
Thanks
Aureliano Gentile wrote:
Regarding RAM bbox, I understand these are updated coordinates (i.e. more accurate) and we should consider them as polygons even if they are rectangles or squares.
OK, clear.
I'll start the GRSF refresh workflow and update GRSF PRE first, as usual. After you have inspected it, I'll update the GRSF Admin and GRSF Public VREs. I'll open a new ticket for that.