Please use this identifier to cite or link to this item:
http://dx.doi.org/10.34960/27
Title: | D4.2 Report on New Methods for Data Quality Assurance, Verification and Enrichment |
Authors: | Phillips, Sarah; Dillen, Mathias; Groom, Quentin; Green, Laura; Weech, Marie-Hélène; Wijkamp, Noortje |
Keywords: | Data, including standards and other common resources |
Publication Date: | 2019 |
Publisher: | ICEDIG |
Citation: | Phillips Sarah, Dillen Mathias, Groom Quentin, Green Laura, Weech Marie-Hélène, & Wijkamp Noortje. (2019). Report on New Methods for Data Quality Assurance, Verification and Enrichment. Zenodo. |
Abstract: | The Distributed System of Scientific Collections (DiSSCo) will facilitate the production of tens of millions of images of natural history specimens, along with their labels, each year. The labels of these specimens contain valuable information for research, but transcribing them can be very difficult and time consuming, particularly when they are handwritten and hard to read. Whilst accurate label transcription is only one step towards creating a specimen record fit for different research uses, it is an extremely important one: it would be very time consuming to return and recheck label information for even a very small proportion of specimens. Once a specimen is transcribed correctly, it becomes much easier to enhance the record with additional information from other sources (e.g. literature or collector itineraries), to determine the point of collection from the textual information on the label by a process known as georeferencing, or even to find inaccuracies within the label itself. This document discusses and compares different approaches for the efficient and accurate transcription of these labels. Using herbarium specimens as an example, the quality of data transcribed by in-house trained institute staff, by an outsourced commercial company, and by the general public through online crowdsourcing platforms was compared. Key transcription data were assessed and common errors in label transcription identified. Reasons for these errors are discussed, along with possible mechanisms to improve the accuracy of the transcriptions. The need for transcription standards was identified and recommendations were made. |
URI: | https://know.dissco.eu/handle/item/92 |
DOI: | https://doi.org/10.5281/zenodo.3364509 |
Appears in the Folders: | ICEDIG Work Package 4 - Business Framework |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Deliverable D4.2 ICEDIG - Data quality in transcription.pdf | | 1.81 MB | Adobe PDF |
This item is licensed under a Creative Commons License