Datasets are usually accompanied by some documentation that functions like an owner's manual. This documentation allows a user to understand the raw data, its context, how to read it, and how to use it appropriately. Structured data in the documentation also makes the dataset easier to find.
Good data documentation will include the context, methods, protocols, scale, software and instruments used in the data collection. It will include a detailed outline of the data structure and file types used. Data validation and cleaning methods should be included as well as data sources used. Modifications to the data over time should be noted as well as information on data confidentiality, access and use conditions.
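One way to start the "detailed outline of the data structure" is a simple data dictionary with one entry per variable. The following is a minimal sketch only: the sample data, the variable names, the descriptions, and the `build_data_dictionary` helper are all hypothetical and would be replaced by a project's own files and conventions.

```python
import csv
import io

# Hypothetical sample data; in practice this would be read from a data file.
SAMPLE_CSV = """participant_id,age_years,systolic_bp_mmhg
P001,34,118
P002,51,132
P003,,125
"""

# Hypothetical human-written descriptions; these cannot be derived from the data.
DESCRIPTIONS = {
    "participant_id": "Study-assigned participant identifier",
    "age_years": "Age at enrollment, in years",
    "systolic_bp_mmhg": "Systolic blood pressure, in mmHg",
}


def build_data_dictionary(csv_text, descriptions):
    """Return one entry per column: name, description, record and missing counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "description": descriptions.get(name, "UNDOCUMENTED"),
            "n_records": len(values),
            "n_missing": sum(1 for v in values if v is None or v.strip() == ""),
        })
    return entries


data_dictionary = build_data_dictionary(SAMPLE_CSV, DESCRIPTIONS)
for entry in data_dictionary:
    print(entry)
```

Flagging undocumented variables and counting missing values are illustrative choices; a fuller data dictionary would also record units, allowed values, and the cleaning steps applied.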
Documentation is most useful when it is standardized to a general norm or to a discipline-specific convention. Two levels of terminology are used in data documentation: metadata and ontologies.
Metadata is information about data. It is often information describing a unit of publication such as a book, a website, or a dataset.
NISO (National Information Standards Organization) describes three types of metadata: descriptive, structural, and administrative.
Example of a general standard: the Dublin Core Metadata Initiative (DCMI), a simple standard widely used to describe web resources.
Example of a discipline-specific standard: the Data Documentation Initiative (DDI), an XML-based standard for documenting datasets in the social and behavioral sciences.
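To make the idea of a general standard concrete, the sketch below describes a dataset with a handful of Dublin Core elements and serializes them as XML. The element names and namespace come from the Dublin Core element set; the dataset values, the `metadata` wrapper element, and the `to_dublin_core_xml` helper are assumptions made for illustration, not a prescribed encoding.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical dataset description; every value here is a placeholder.
record = {
    "title": "Example Blood Pressure Survey",
    "creator": "Example Research Group",
    "subject": "blood pressure",
    "description": "Illustrative dataset used to show a Dublin Core record.",
    "date": "2020-01-01",
    "type": "Dataset",
    "format": "text/csv",
    "rights": "Placeholder access and use statement",
}


def to_dublin_core_xml(fields):
    """Serialize a dict of Dublin Core element names and values to an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


dc_xml = to_dublin_core_xml(record)
print(dc_xml)
```

A repository or catalog will usually define its own wrapper and required elements, so a record like this should be checked against the target system's guidelines.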
Another type of metadata to consider is Provenance Metadata. This kind of metadata provides information on the individuals, groups, and activities involved in producing a dataset.
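A provenance record can be sketched as a small structure linking a dataset to the individuals, groups, and activities behind it. The class names, field names, and sample values below are hypothetical and are not drawn from any particular provenance standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Agent:
    name: str
    kind: str  # "individual" or "group"


@dataclass
class Activity:
    description: str
    performed_by: list  # names of the agents involved
    date: str


@dataclass
class ProvenanceRecord:
    dataset: str
    agents: list = field(default_factory=list)
    activities: list = field(default_factory=list)


# Hypothetical provenance for an illustrative dataset.
provenance = ProvenanceRecord(
    dataset="example_survey.csv",
    agents=[Agent("A. Researcher", "individual"), Agent("Example Lab", "group")],
    activities=[
        Activity("Collected survey responses", ["A. Researcher"], "2020-01-15"),
        Activity("Removed duplicate records", ["Example Lab"], "2020-02-01"),
    ],
)

# asdict() converts nested dataclasses too, so the record serializes directly.
provenance_json = json.dumps(asdict(provenance), indent=2)
print(provenance_json)
```

Storing a file like this alongside the data keeps the record of who did what to the dataset, and when, attached to the dataset itself.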
NIH Common Data Elements (CDE) - http://www.nlm.nih.gov/cde/
An NIH portal of data elements common to multiple datasets across different studies, intended to improve data quality and promote data sharing. Collections of CDEs can be viewed by specific project or subject area.
Ontologies provide standardized terminology for the concepts within a given domain and the relationships among them. Ontologies are used at the content level of a dataset.
An ontology, on its own, allows researchers to be specific about describing the phenomena they work with, such as in annotations of datasets and indexing of journal articles, in order to facilitate identification and retrieval of relevant information.
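The sketch below illustrates how annotating records with ontology terms can support retrieval. It uses a toy ontology: every identifier (`EX:0001` and so on), label, synonym, and helper function is invented for illustration and does not correspond to any real ontology.

```python
# Toy ontology: all identifiers and labels are invented for illustration.
ONTOLOGY = {
    "EX:0001": {"label": "cardiovascular disease", "synonyms": [], "parent": None},
    "EX:0002": {"label": "hypertension", "synonyms": ["high blood pressure"],
                "parent": "EX:0001"},
    "EX:0003": {"label": "myocardial infarction", "synonyms": ["heart attack"],
                "parent": "EX:0001"},
}


def build_lookup(ontology):
    """Map every label and synonym (lowercased) to its term identifier."""
    lookup = {}
    for term_id, term in ontology.items():
        for name in [term["label"], *term["synonyms"]]:
            lookup[name.lower()] = term_id
    return lookup


def annotate(text_label, lookup):
    """Return the term identifier for a free-text label, or None if unknown."""
    return lookup.get(text_label.strip().lower())


def is_a(term_id, ancestor_id, ontology):
    """Walk up the parent links to test whether term_id falls under ancestor_id."""
    while term_id is not None:
        if term_id == ancestor_id:
            return True
        term_id = ontology[term_id]["parent"]
    return False


LOOKUP = build_lookup(ONTOLOGY)

# Hypothetical records whose diagnoses were entered as free text.
records = [
    {"id": 1, "diagnosis": "High blood pressure"},
    {"id": 2, "diagnosis": "heart attack"},
    {"id": 3, "diagnosis": "asthma"},
]
for record in records:
    record["term_id"] = annotate(record["diagnosis"], LOOKUP)

# Retrieve every record annotated with a term under "cardiovascular disease".
cardio_ids = [
    r["id"] for r in records
    if r["term_id"] is not None and is_a(r["term_id"], "EX:0001", ONTOLOGY)
]
print(cardio_ids)  # [1, 2]
```

Because the annotations carry the ontology's relationships, a search for the broader concept finds both records even though neither free-text entry mentions it.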
Used in concert, ontologies allow concepts to be mapped between domains (interoperability), thus allowing for use in:
The following are links to directories of ontologies serving the health sciences.