Semantic ID

Description

GRSF records are meant to be unique with respect to the information they convey; this is achieved, among other means, through the merging and dissection activities that are carried out. Although uniqueness follows from the adopted processes and from the assignment of Universally Unique Identifiers (UUIDs) to the records, each record also carries another form of identifier, the Semantic Identifier (semantic ID for short). Semantic IDs are built from particular concepts of a record and are used for the identification and traceability of GRSF records. For this reason they are constructed in a way that is both human- and machine-interpretable.

Structure and Basic Principles

The basic principle for the construction of Semantic Identifiers is to re-use standards as much as possible, since this reduces the overall length of the identifier and avoids interpretation issues about its contents. In general, the semantic ID is composed of components that identify particular aspects of a GRSF record. Each component has two parts: the first is the standard scheme used for the identification, and the second is the identifier with respect to that scheme. As an example, consider ASFIS:YFT, which identifies the marine species YFT (i.e. the 3-alpha code for the species with scientific name Thunnus albacares) with respect to the ASFIS List of Species for Fishery Statistics Purposes.
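The two-part structure of a component can be illustrated with a minimal Python sketch. The function name split_component is hypothetical, and splitting on the first colon only is an assumption, made so that identifiers that themselves contain a colon (such as authority:NAT:NOR) stay intact.

```python
from typing import Tuple


def split_component(component: str) -> Tuple[str, str]:
    """Split 'scheme:identifier' into its two parts (hypothetical helper).

    Only the first ':' is treated as the separator, so the identifier
    itself may contain further colons.
    """
    scheme, sep, identifier = component.partition(":")
    if not sep or not scheme or not identifier:
        raise ValueError(f"not a scheme:identifier pair: {component!r}")
    return scheme, identifier


print(split_component("ASFIS:YFT"))          # ('ASFIS', 'YFT')
print(split_component("authority:NAT:NOR"))  # ('authority', 'NAT:NOR')
```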

Note that the semantic ID is not assigned to a GRSF record in the way the UUID is. Instead, it is constructed from the information in the record. The properties used for constructing it are fixed and, most importantly, the order in which they appear in the semantic ID is strictly defined. In addition, the fields differ between GRSF stock and GRSF fishery records. The fields used for each resource type are shown in the table below:

| Resource Type | Field 1 | Field 2 | Field 3 | Field 4 | Field 5 |
|---|---|---|---|---|---|
| GRSF Stock | Marine Species | Assessment/Distribution Area | | | |
| GRSF Fishery | Marine Species | Fishing/Management Area | Management Authority | Flag State | Fishing Gear Type |

As described above, the order of appearance of these fields in the semantic identifier is predefined, so the marine species of a GRSF record (either stock or fishery) is always the first part of the semantic ID and appears before the areas part. Semantic IDs also allow multiple values for a field (e.g. multiple assessment areas for a particular GRSF stock record), as well as empty values, which indicate that the corresponding information is absent.

As regards the codification of the semantic IDs we adopt the following rules:

  • Different parts of the semantic ID are concatenated using '+'
  • If a field has multiple values they are concatenated using ';'
  • If a field does not have any values then nothing is added for that field
  • The semantic ID of a GRSF stock record should always have 2 parts and that of a GRSF fishery record should always have 5 parts (even if some of them are empty). As a consequence, the semantic ID of a GRSF stock record always has exactly one '+' separator, while that of a GRSF fishery record always has exactly 4.
  • If there is a field that cannot be properly identified using a standard coding scheme, then it is allowed to use a proprietary identifier using UNK as the coding scheme.
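The concatenation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function build_semantic_id and the PARTS_PER_TYPE table are hypothetical names, and each field is assumed to arrive as a list of ready-made scheme:identifier values (an empty list for a field with no values).

```python
from typing import Sequence

# Number of parts required per resource type, as stated in the rules above.
PARTS_PER_TYPE = {"stock": 2, "fishery": 5}


def build_semantic_id(resource_type: str, fields: Sequence[Sequence[str]]) -> str:
    """Join the values of each field with ';' and the fields with '+'.

    A field without values contributes an empty string, so the number of
    '+' separators stays fixed for each resource type.
    """
    expected = PARTS_PER_TYPE[resource_type]
    if len(fields) != expected:
        raise ValueError(f"{resource_type} needs {expected} fields, got {len(fields)}")
    return "+".join(";".join(values) for values in fields)


print(build_semantic_id("stock", [["asfis:RJH"], ["fao:27.4.a", "fao:27.6"]]))
# asfis:RJH+fao:27.4.a;fao:27.6

print(build_semantic_id(
    "fishery",
    [["asfis:SPR"], ["fao:27.3.a"], ["authority:NAT:NOR"], [], ["isscfg:03.29"]],
))
# asfis:SPR+fao:27.3.a+authority:NAT:NOR++isscfg:03.29
```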

Some examples of valid and invalid semantic IDs are shown below:

| Semantic ID | Comment |
|---|---|
| asfis:TGS+FAO:27 | valid |
| asfis:RJH+fao:27.4.a;fao:27.6 | valid, multiple values for assessment area |
| asfis:BLM+rfb:IOTC | valid, with unknown identifier for assessment area |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR+iso3:SWE+isscfg:03.29 | valid |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR++isscfg:03.29 | valid, empty value for flag state |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR+iso:SWE+ | valid, empty value for fishing gear |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR++ | valid, empty values for flag state and fishing gear |
| asfis:RJH+fao:27.4.a,fao:27.6 | invalid, multiple values for assessment area not concatenated using ';' |
| fao:27.4.a+asfis:RJH | invalid, wrong ordering of fields |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR+isscfg:03.29 | invalid, flag state is missing |
| asfis:SPR+fao:27.3.a+authority:NAT:NOR | invalid, empty values for flag state and fishing gear are not declared properly |
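A structural check that reproduces the verdicts of these examples could look like the sketch below. It is not the validation used by GRSF: the function name is hypothetical, scheme names are compared case-insensitively, and the check assumes that the species part must be present and must use one of the species schemes (asfis or aphiaid). It does not verify the schemes of the remaining parts.

```python
SPECIES_SCHEMES = {"asfis", "aphiaid"}  # assumed species schemes
PARTS_PER_TYPE = {"stock": 2, "fishery": 5}


def is_valid_semantic_id(semantic_id: str, resource_type: str) -> bool:
    """Check part count, value syntax and that the species part comes first."""
    parts = semantic_id.split("+")
    if len(parts) != PARTS_PER_TYPE[resource_type]:
        return False
    for part in parts:
        if part == "":
            continue  # an empty field is allowed
        for value in part.split(";"):
            scheme, sep, identifier = value.partition(":")
            # every value must be scheme:identifier; ',' is not a separator
            if not sep or not scheme or not identifier or "," in value:
                return False
    first = {v.partition(":")[0].lower() for v in parts[0].split(";") if v}
    return bool(first) and first <= SPECIES_SCHEMES


print(is_valid_semantic_id("asfis:RJH+fao:27.4.a;fao:27.6", "stock"))  # True
print(is_valid_semantic_id("asfis:RJH+fao:27.4.a,fao:27.6", "stock"))  # False
print(is_valid_semantic_id("fao:27.4.a+asfis:RJH", "stock"))           # False
print(is_valid_semantic_id("asfis:SPR+fao:27.3.a+authority:NAT:NOR++", "fishery"))  # True
print(is_valid_semantic_id("asfis:SPR+fao:27.3.a+authority:NAT:NOR", "fishery"))    # False
```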

Below we provide an indicative list of standard coding schemes that are used for building the corresponding parts of the semantic ID.

  • ASFIS List of Species for Fishery Statistics Purposes, used for identifying marine species. In this case the value of the code is the 3-alpha code of the species (e.g. asfis:SOL)
  • AphiaID (provided by WoRMS), used for identifying marine species (e.g. aphiaid:341983)
  • FAO areas, used for identifying assessment or fishing areas (e.g. fao:34.1.11)
  • RFB areas, used for identifying assessment or fishing areas (e.g. rfb:IOTC)
  • EEZ areas, used for identifying assessment or fishing areas (e.g. EEZ:PAN)
  • ISO3 country codes, used for identifying the flag state or the country of a national management authority, i.e. not an authority with international scope (e.g. iso3:MEX, authority:NAT:MEX)
  • International Standard Statistical Classification of Fishing Gear (ISSCFG) codes, used for identifying fishing gears (e.g. isscfg:08.5)

Limitations

The current structure of the semantic IDs is informative enough to help users identify the key information of a GRSF record properly and quickly. However, its detailed and "strict" structure can also be a limitation in some cases. Most importantly, it does not allow adding any type of information apart from the fields already defined, which may limit the proper traceability of records. For example, the SDG flag of a GRSF record is an important indicator that could be part of the semantic ID, since it provides valuable information about the record (though not necessarily identification information).

In addition, the mandatory order of appearance can be cumbersome, especially when one field has a large number of values. The result is a lengthy semantic ID whose largest part consists of the multiple values of a single field. Consider the case shown below, where there are many different values for the area, which is the second part of the semantic ID. The single values for flag state and fishing gear come last in the semantic ID and are effectively hidden.

asfis:MEG+fao:27.7.b;fao:27.7.c;fao:27.7.d;fao:27.7.e;fao:27.7.f;fao:27.7.g;fao:27.7.h;fao:27.7.j;fao:27.7.k;fao:27.8.a;fao:27.8.b;fao:27.8.d+authority:INT:EC+iso3:IRL+isscfg:03.12

Expansion

In this section, we present the proposal for the expansion of the semantic ID. The expansion enables the addition of more information, as fields of the semantic ID, which in turn enhances both the traceability of GRSF records and their identification by users. The basic principle that we adopt for the expansion is to preserve the backward compatibility of semantic IDs while avoiding the limitations of the current version. In simple terms, we propose an expansion that works properly and can co-exist (if needed) with the current version of semantic IDs. To achieve this we have to: (a) be able to distinguish between semantic IDs of the two versions, and (b) provide a set of services that enable the transition between the two versions.

The main expansions in the implementation of the new semantic ID are:

  • the semantic ID will be able to contain more fields than the ones already included
  • the semantic ID will contain mandatory and optional fields
  • the ordering of the fields in the semantic ID can be arbitrary
  • the serialization of a semantic ID into text will use square brackets (i.e. []) to distinguish the expanded version from the current one.

The first update to semantic IDs is the ability to add more fields, apart from the ones already included. The new fields need a code (or must be codified in some way) so that they can be included in the semantic ID while keeping it as short as possible. Below we provide a list of the new fields that can be supported in the expanded implementation.

  • GRSF type (type:assessmentUnit, type:marineResource, type:fishingUnit, type:otherFishery)
  • Database Source (source:firms, source:ram, source:fishsource)
  • Database Source ID (source:firms:13765, source:ram:NEPHFU31, source:fishsource:1048)
  • SDG flag (sdg:true, sdg:false)
  • GRSF status (status:pending, status:approved, status:archived)
  • UUID (uuid:3728c5c4-170e-4c69-b3d5-5089241cd26a)
  • Last updated (updated:2020-08-15)

The rationale for the selection of the aforementioned fields is that (a) their values can be codified, and (b) the coding schemes that they use are unique. The latter is really important because it allows the semantics of each part to be identified without extra information (e.g. an extra label denoting that the information that follows is the marine species). In the current implementation of the semantic IDs, the position of each field defines its semantics. For example, for stock records, the first field is the marine species and the second one is the assessment areas. Since the order of appearance of the fields becomes arbitrary in the new implementation, position can no longer be relied upon, and the fields must be identified in another way. The distinct coding schemes used by each field are enough to identify them.
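To illustrate how the coding scheme alone can identify a field regardless of its position, consider the following Python sketch. The SCHEME_TO_FIELD table and the field names in it are assumptions assembled from the schemes listed in this page; they are not an official vocabulary.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from coding scheme to the field it identifies.
SCHEME_TO_FIELD = {
    "asfis": "species", "aphiaid": "species",
    "fao": "area", "rfb": "area", "eez": "area",
    "authority": "management_authority",
    "iso3": "flag_state",
    "isscfg": "fishing_gear",
    "type": "type", "sdg": "sdg", "status": "status", "uuid": "uuid",
}


def field_of(component: str) -> str:
    """Return the field a 'scheme:identifier' component belongs to."""
    scheme = component.partition(":")[0].lower()
    return SCHEME_TO_FIELD.get(scheme, "unknown")


def group_by_field(components: Iterable[str]) -> Dict[str, List[str]]:
    """Group components by field, independently of their order."""
    grouped: Dict[str, List[str]] = {}
    for component in components:
        grouped.setdefault(field_of(component), []).append(component)
    return grouped


print(group_by_field(["iso3:IRL", "fao:27.7.b", "asfis:MEG", "fao:27.7.c"]))
# {'flag_state': ['iso3:IRL'], 'area': ['fao:27.7.b', 'fao:27.7.c'], 'species': ['asfis:MEG']}
```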

The addition of new fields to the semantic ID comes with guidelines about which of them are mandatory and which of them can optionally be included. The proposed guidelines stem from the merging and dissection activities that are followed when constructing GRSF records, which, among other things, identify the attributes that define the uniqueness of a record. To preserve backward compatibility and the proper interpretation of older semantic IDs, the fields that exist in the current implementation of the semantic ID are specified as mandatory.

| Resource Type | Marine Species | Assessment/Fishing Areas | Management Authorities | Flag States | Fishing Gears |
|---|---|---|---|---|---|
| Stock | Yes | Yes | n/a | n/a | n/a |
| Fishery | Yes | Yes | Yes (if available) | Yes (if available) | Yes (if available) |

As described in the previous section, it is sometimes useful to "promote" some fields in the semantic identifier over others, in the sense that they should appear at the beginning of the semantic ID. For this reason, we propose dropping the fixed order of appearance of the parts of the semantic ID. This can only be achieved if it is guaranteed that all the mandatory parts of the semantic ID have been provided. In practical terms, this means that empty values will no longer be supported; in the expanded version they will be replaced with null values. In addition, to support the smooth co-existence of semantic IDs of the original and the expanded version, a set of services will be provided that transform a semantic ID from its expanded version to the original one and vice versa (more information about these services is provided in the next section).

Finally, we propose enclosing the different parts of a semantic ID in square brackets (i.e. []), for two reasons: (a) this serialization makes it possible to distinguish the original version of the ID from its expanded one, and (b) it improves the readability of the individual parts of the semantic ID by presenting them as separate entities. Below, we show a semantic ID in its original form and in its expanded version that uses square brackets.

  • GRSF Stock
    • Original: asfis:SRG+fao:34.3.2
    • Expanded: [asfis:SRG]+[fao:34.3.2]
  • GRSF Fishery
    • Original: asfis:SFA+fao:71+authority:INT:FFA+iso3:TON+isscfg:10.9
    • Expanded: [asfis:SFA]+[fao:71]+[authority:INT:FFA]+[iso3:TON]+[isscfg:10.9]
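The bracketed serialization shown in these examples can be sketched as follows. The function names are hypothetical. Because this page does not specify how a null value is written in the expanded form, the sketch only handles original IDs without empty parts and rejects the others.

```python
def is_expanded(semantic_id: str) -> bool:
    """An expanded semantic ID is recognised by its enclosing brackets."""
    return semantic_id.startswith("[") and semantic_id.endswith("]")


def to_expanded(original: str) -> str:
    """Wrap every '+'-separated part of an original ID in brackets."""
    parts = original.split("+")
    if any(part == "" for part in parts):
        # The serialization of null values is not defined here (assumption:
        # leave such IDs to the real service rather than guess a format).
        raise ValueError("empty parts are not supported by this sketch")
    return "+".join(f"[{part}]" for part in parts)


print(to_expanded("asfis:SRG+fao:34.3.2"))
# [asfis:SRG]+[fao:34.3.2]
print(is_expanded("asfis:SRG+fao:34.3.2"))      # False
print(is_expanded("[asfis:SRG]+[fao:34.3.2]"))  # True
```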

Services

In this section, we describe the services that are needed to support the co-existence (or the migration, if needed) of both versions of the GRSF Semantic ID. More details are given below.

getExpandedSemanticID

Input: a valid GRSF record
Output: the expanded semantic ID of the given GRSF record
Additional parameters: a list indicating which of the optional fields should be included in the generated expanded semantic ID. The accepted values for the list are the names of the fields that can be used for the construction of the semantic ID. The service getFieldsVocabulary can be used for retrieving the accepted fields in each case.
Description: The service generates the expanded version of the semantic ID of the given record, following the guidelines described above. The optional parameter list indicates which of the optional fields will be included in the generated semantic ID. If the list is empty, the generated semantic ID contains only the mandatory fields.
Input example: a valid GRSF record (identified through its UUID), fields:[type,source,source_id,uuid,sdg,status,last_modified] (alternatively, fields:*)
Output example: [status:approved]+[uuid:82cc7b13-d20c-30c6-8b43-c6adccf7cfc6]+[asfis:ORY]+[fao:81]+[type:assessment_unit]+[source:firms]+[source_id:13886]+[sdg:true]+[last_modified:2020-11-19]
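The behaviour described for getExpandedSemanticID can be approximated with the sketch below. It is not the service implementation: the record is modelled as a plain dictionary (a hypothetical structure), only stock records are covered, mandatory values are assumed to be stored already as scheme:identifier strings, and mandatory fields are emitted first even though the expanded form permits any order.

```python
from typing import Dict, Iterable

MANDATORY_STOCK_FIELDS = ["species", "area"]  # mandatory fields of a stock record


def get_expanded_semantic_id(record: Dict[str, str], fields: Iterable[str] = ()) -> str:
    """Build a bracketed semantic ID from mandatory plus requested optional fields."""
    # Mandatory values are assumed to be stored as 'scheme:identifier' already.
    parts = [record[name] for name in MANDATORY_STOCK_FIELDS]
    for name in fields:
        if name in record:
            # Optional fields use their own name as the coding scheme.
            parts.append(f"{name}:{record[name]}")
    return "+".join(f"[{part}]" for part in parts)


record = {  # hypothetical record content
    "species": "asfis:ORY",
    "area": "fao:81",
    "status": "approved",
    "sdg": "true",
}
print(get_expanded_semantic_id(record))
# [asfis:ORY]+[fao:81]
print(get_expanded_semantic_id(record, ["status", "sdg"]))
# [asfis:ORY]+[fao:81]+[status:approved]+[sdg:true]
```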

getShortenedSemanticID

Input: a valid GRSF record or an expanded semantic ID
Output: the (short) semantic ID of the given GRSF record or semantic ID
Additional parameters: none
Description: The service generates the short version of the semantic ID. It accepts as input either an entire GRSF record or the expanded semantic ID of a GRSF record. In the former case, it generates the semantic ID using the mandatory fields, while in the latter case it removes the optional fields from the given expanded semantic ID. In both cases the mandatory fields in the returned semantic ID are properly ordered (recall that in the shortened version of semantic IDs, ordering is important).
Input example: [status:approved]+[uuid:82cc7b13-d20c-30c6-8b43-c6adccf7cfc6]+[asfis:ORY]+[fao:81]+[type:assessment_unit]+[source:firms]+[source_id:13886]+[sdg:true]+[last_modified:2020-11-19]
Output example: asfis:ORY+fao:81
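The shortening of an expanded semantic ID can be sketched as follows. The function name, the scheme-to-field table and the explicit resource_type argument are assumptions made for illustration; the actual service may determine the resource type differently. Mandatory fields that are absent from the input are emitted as empty parts, following the rules of the original format.

```python
import re

# Hypothetical mapping from coding scheme to mandatory field.
SCHEME_TO_FIELD = {
    "asfis": "species", "aphiaid": "species",
    "fao": "area", "rfb": "area", "eez": "area",
    "authority": "management_authority",
    "iso3": "flag_state",
    "isscfg": "fishing_gear",
}
FIELD_ORDER = {
    "stock": ["species", "area"],
    "fishery": ["species", "area", "management_authority", "flag_state", "fishing_gear"],
}


def get_shortened_semantic_id(expanded: str, resource_type: str) -> str:
    """Drop optional fields and restore the fixed order of the original format."""
    by_field = {}
    for part in re.findall(r"\[([^\]]*)\]", expanded):
        scheme = part.partition(":")[0].lower()
        field = SCHEME_TO_FIELD.get(scheme)
        if field is not None:  # parts with other schemes are optional fields
            by_field[field] = part
    return "+".join(by_field.get(field, "") for field in FIELD_ORDER[resource_type])


print(get_shortened_semantic_id("[status:approved]+[asfis:ORY]+[fao:81]+[sdg:true]", "stock"))
# asfis:ORY+fao:81
print(get_shortened_semantic_id(
    "[iso3:TON]+[asfis:SFA]+[isscfg:10.9]+[fao:71]+[authority:INT:FFA]", "fishery"))
# asfis:SFA+fao:71+authority:INT:FFA+iso3:TON+isscfg:10.9
```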

getFieldsVocabulary

Input: one of the following terms: [stock, fishery]
Output: the list of fields that can be used for constructing the semantic ID
Additional parameters: none
Description: Given the proper input (i.e. stock or fishery), the service returns the list of fields that can be used for constructing the semantic ID. The returned field names are the values accepted by the semantic ID generation services (e.g. getExpandedSemanticID), and the result also indicates which fields are mandatory and which are optional.
Input example: stock
Output example: {"mandatory":["species","area"], "optional":["type","source","source_id","sdg","status","uuid","last_modified"]}
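A minimal sketch of getFieldsVocabulary is given below. The stock vocabulary follows the output example; the names of the mandatory fishery fields (management_authority, flag_state, fishing_gear) are assumptions derived from the mandatory-fields table and may differ in the actual service.

```python
import json
from typing import Dict, List

OPTIONAL_FIELDS = ["type", "source", "source_id", "sdg", "status", "uuid", "last_modified"]
MANDATORY_FIELDS = {
    "stock": ["species", "area"],
    # Fishery field names are assumed, not taken from the service.
    "fishery": ["species", "area", "management_authority", "flag_state", "fishing_gear"],
}


def get_fields_vocabulary(resource_type: str) -> Dict[str, List[str]]:
    """Return the mandatory and optional field names for 'stock' or 'fishery'."""
    key = resource_type.strip().lower()
    if key not in MANDATORY_FIELDS:
        raise ValueError(f"expected 'stock' or 'fishery', got {resource_type!r}")
    return {"mandatory": list(MANDATORY_FIELDS[key]), "optional": list(OPTIONAL_FIELDS)}


print(json.dumps(get_fields_vocabulary("stock")))
```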
