Data Resources in the Health Sciences

An inventory of resources to support research data needs in the health sciences field. This guide is constantly being updated; please send feedback via the Comments links.

Storage Services :: Local

UW Data Hosting and Storage Services

  • ResearchWorks Archive (UW Libraries) - deposit datasets up to 10 GB; contact them for larger datasets.
  • UW IT Service Catalog - view a comprehensive list of self-sustaining services including data hosting and archiving.
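Before depositing, it can help to check whether a dataset folder fits under the 10 GB figure quoted for ResearchWorks. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the limit means 10 × 10^9 bytes, which the archive may count differently.

```python
from pathlib import Path

# Assumption: "10 GB" is read as 10 * 10**9 bytes; the archive may use another definition.
DEPOSIT_LIMIT_BYTES = 10 * 10**9


def dataset_size_bytes(root):
    """Sum the sizes of all regular files under root (a hypothetical dataset folder)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def fits_deposit_limit(root, limit=DEPOSIT_LIMIT_BYTES):
    """Return True if the folder's total size is at or below the limit."""
    return dataset_size_bytes(root) <= limit
```

Calling `fits_deposit_limit("my_dataset")` on a local folder (the folder name is illustrative) returns False when the total exceeds the assumed limit, in which case the guide's advice is to contact the archive.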

Storage Services :: General

NCBI provides several repository services for genomic and genetic data.

http://www.ncbi.nlm.nih.gov/guide/howto/submit-data/

  • Sequence Data
  • Microarray Data
  • Bioassay Data, Substance or Sequence-Based Reagents
  • Human Clinical Data and Genetic Tests

NIH Data Sharing Repositories

  • Table of NIH-supported data repositories that accept submissions of appropriate data from NIH-funded investigators (and others).
  • Includes instructions for submitting and accessing data.

Dryad - international, nonprofit organization that accepts data for storage and reuse. Free for UW researchers.

ICPSR (Inter-university Consortium for Political and Social Research)

UK Data Service - acquires data from academic, public sector and commercial sources within the United Kingdom and abroad.

figshare - allows researchers to publish all of their research outputs in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models.

Open Science Framework - open source scholarly commons for the entire research cycle including data deposit; from the Center for Open Science

Zenodo - CERN initiative; free deposit for the long tail of science

Common Data Formats

The file format to use depends on many factors, including:

  • the data itself
  • how it was collected
  • repository capabilities and preferences
  • preservation considerations
  • compatibility with re-use

The UK Data Archive currently recommends the file formats listed below for long-term preservation of research data. Other data centres or digital archives may recommend different formats.

http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

Type of Data and Recommended File Formats for Sharing, Re-Use and Preservation

Quantitative tabular data with extensive metadata (a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data)

  • SPSS portable format (.por)
  • delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information
  • some structured text or mark-up file containing metadata information, e.g. DDI XML file

Quantitative tabular data with minimal metadata (a matrix of data with or without column headings or variable names, but no other metadata or labelling)

  • comma-separated values file (.csv)
  • tab-delimited file (.tab)

Geospatial data (vector and raster data)

  • ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
  • geo-referenced TIFF (.tif, .tfw)
  • CAD data (.dwg)
  • tabular GIS attribute data

Qualitative data (textual)

  • eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
  • Rich Text Format (.rtf)
  • plain text data, ASCII (.txt)

Digital image data

  • TIFF version 6 uncompressed (.tif)

Digital audio data

  • Free Lossless Audio Codec (FLAC) (.flac)

Digital video data

  • MPEG-4 (.mp4)
  • motion JPEG 2000 (.jp2)

Documentation

  • Rich Text Format (.rtf)
  • PDF/A or PDF (.pdf)
  • OpenDocument Text (.odt)
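
The recommendations above can be turned into a quick pre-deposit check. The following is a minimal sketch under stated assumptions: the category labels and the function name are hypothetical shorthand, and only the extensions named in the list are included. An extension match is only a hint; it cannot confirm, for example, that a .tif file is uncompressed TIFF version 6 or that a .pdf file is PDF/A.

```python
from pathlib import Path

# Extensions copied from the recommendations above; category names are shortened labels.
# Formats listed without an extension (e.g. setup files, DDI XML) are not covered here.
RECOMMENDED = {
    "quantitative tabular (extensive metadata)": {".por"},
    "quantitative tabular (minimal metadata)": {".csv", ".tab"},
    "geospatial": {".shp", ".shx", ".dbf", ".prj", ".sbx", ".sbn",
                   ".tif", ".tfw", ".dwg"},
    "qualitative textual": {".xml", ".rtf", ".txt"},
    "digital image": {".tif"},
    "digital audio": {".flac"},
    "digital video": {".mp4", ".jp2"},
    "documentation": {".rtf", ".pdf", ".odt"},
}


def recommended_categories(filename):
    """Return the sorted categories whose recommended extensions include this file's."""
    ext = Path(filename).suffix.lower()
    return sorted(cat for cat, exts in RECOMMENDED.items() if ext in exts)
```

An empty result, as for a hypothetical `notes.docx`, suggests converting the file to one of the listed formats or checking what the target repository prefers.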