Title: DiSSCo Prepare Deliverable D5.4 - A best practice guide for semantic enhancement and improvement of semantic interoperability
Authors: Dillen, Mathias
Groom, Quentin
Cubey, Rob
von Mering, Sabine
Hardisty, Alex
Contributors: Humphries, Josh
Butcher, Ginger
Robertson, Tim
Ernst, Marcus
Islam, Sharif
Keywords: Data, including standards and other common resources; technical; Persistent identifier (PID); disambiguation; knowledge graph; Darwin Core; Wikidata
Publication Date: 2021
Publisher: DiSSCo Prepare
Citation: Dillen, M., Groom, Q., Cubey, R., von Mering, S. & Hardisty, A. (2021). DiSSCo Prepare Deliverable D5.4 - A best practice guide for semantic enhancement and improvement of semantic interoperability. DiSSCo Prepare.
Abstract: Textual data on natural history specimens regularly suffer from ambiguity and interoperability problems, which impair their common understanding alongside related specimens and the connection of specimen properties to other sources of information. Semantic enhancement of these data is one approach to address these problems, where subjects and concepts are identified through common standards or links to authority resources rather than textual strings. Through enrichment, key properties of specimens such as (1) the location they were gathered from, (2) the agents who acted upon them and (3) their taxonomic determinations can be unambiguously identified and processed for further scientific research. In this report, we take a closer look at the current state of natural history specimen data enrichment. Building on a range of pilot projects, we break down the general workflow of enrichment, describing potential approaches and tools that may be used, key considerations that need to be made and obstacles that may be encountered. Workflows to enrich specimen data tend to be diverse, in particular because the context in which the enrichment takes place can vary widely. Not all institutions will have the same resources to undertake the enrichment process, nor are all collections managed and digitized in the same manner, nor can different types of collections always be compared to each other. Despite this lack of homogeneity, general lessons can be drawn and some basic recommendations made. Enrichment should ideally take place in close accord with digitization. Otherwise, in general, enrichment will be easier to implement at larger scales above the collection level. Yet the key role that local knowledge about the collection plays in comprehensive enrichment, and the relative ease with which low-hanging fruit can be processed (semi-)manually, mean that even simple local approaches to enrichment still promise considerable added value.
Technical obstacles, again, may be more easily tackled at big data levels, but this may lead to synchronization problems with local systems. Data standards have adapted to support enriched properties in various ways. An extension for Darwin Core to accommodate agent attribution is under development and has been tested in this report. Some problems remain, in particular the strain enrichment places on the simple data model of the popular Darwin Core archive. Alternative representations are gaining traction, including the openDS standard currently under development by DiSSCo.
Appears in the Folders: DPP Work Package 5 - Common Resources and Standards

Files in This Item:
File: DiSSCo Prepare D5.4 Semantic Enhancement.pdf
Size: 1.16 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License