Research Data Management: Organization & Format

A guide to resources on the many aspects of research data management. Data management encompasses the processes surrounding collecting, organizing, describing, sharing, and preserving data.

Organization

Why should you organize your data?

The organizational structure of your data can help you easily locate files when revisiting a past project and can help secondary users find, identify, select, and obtain the data they require.

How do you organize your data?

For best results, model your data structure fully, from top to bottom and from beginning to end, during the planning phase of a project.

You'll want to devise ways to express the following:

The context of data collection: project history, aim, objectives, and hypothesis
Data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage, and secondary data sources used
Dataset structure: data files, study cases, and relationships between files
Data validation, checking, proofing, cleaning, and quality assurance procedures carried out
Changes made to data over time since their original creation and identification of different versions of data files
Information on access and use conditions or data confidentiality

(adapted from UKDA)
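
One possible way to express these elements is a small structured record stored alongside the data. The Python sketch below is only an illustration: the class name `DatasetDocumentation`, its field names, and the choice of JSON as the output format are assumptions, not something UKDA or this guide prescribes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetDocumentation:
    """Hypothetical record covering the documentation elements listed above."""
    context: str                 # project history, aim, objectives, hypothesis
    collection_methods: str      # sampling, instruments, coverage, secondary sources
    structure: str               # data files, study cases, relationships between files
    quality_assurance: str       # validation, checking, cleaning procedures
    version_history: list[str] = field(default_factory=list)  # changes over time
    access_conditions: str = ""  # access, use conditions, confidentiality


def to_json(doc: DatasetDocumentation) -> str:
    """Serialize the record so it can be saved next to the data files."""
    return json.dumps(asdict(doc), indent=2)
```

A record like this could be filled in during planning and updated each time a new version of a data file is created.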

File Naming & Structure

Why is file naming important?

Think of a file name as a unique identifier for each of your files. Following a naming convention allows you to simplify the organization of your files and locate your files with ease, as well as making it easier for others to understand and reuse your data. This is particularly important when you are working on a collaborative project.

How should you name your file?

Here are some recommended best practices for naming your files:

Use names that are brief but descriptive
Avoid spaces and special characters (e.g., *, #, %)
Agree on a naming convention that everyone using the files will follow
Identify versions of files using dates and version numbering in the file name
Use three-letter file extensions to ensure backwards compatibility (e.g., .doc, .tif, .txt)
Do not use letter case to identify different files (e.g., datasetA.txt vs. dataseta.txt)
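
Several of these practices can be checked automatically. The Python sketch below is one rough way to do so and is not part of the guide: the function names `check_file_name` and `find_case_collisions`, the 40-character length limit, and the set of allowed characters are all hypothetical choices made for illustration.

```python
import re

# Illustrative assumptions; the guide does not specify these values.
MAX_STEM_LENGTH = 40
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")
THREE_LETTER_EXTENSION = re.compile(r"[A-Za-z]{3}")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a single file name."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the stem and there is no extension.
        stem, extension = name, ""
    if not ALLOWED_STEM.fullmatch(stem):
        problems.append("contains spaces or special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("name is longer than the illustrative limit")
    if not THREE_LETTER_EXTENSION.fullmatch(extension):
        problems.append("extension is not three letters")
    return problems


def find_case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that differ only by letter case."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Running `check_file_name` over a folder listing before sharing data gives collaborators a quick report of names that break the agreed convention. The checks for descriptiveness and for dates or version numbers are left out because they depend on the convention a team chooses.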

How should files be structured?

Folder structure for your files can assist in the unique identification of the files contained within them. Consider the structure of the folders containing your data files before you begin to collect your data. Ideas for how to organize your folders include:

Data type (text, images, models, etc.)
Time (year, month, session, etc.)
Subject characteristic (species, age grouping, etc.)
Research activity (interview, survey, experiment, etc.)

Consider these examples:

File naming: File001.txt vs. 201206blood_ID0234.txt
Folder structure: MyDocuments\Research\Sample12.jpg vs. C:\NEHGrant01234\WWI\Images\London_001.jpg
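
Names and paths like the more descriptive ones above can be generated from metadata rather than typed by hand, which helps keep them consistent. The Python sketch below assumes a pattern of year and month, sample type, and a zero-padded ID, inferred from the example name; the function names `build_file_name` and `build_folder` and their parameters are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def build_file_name(collected: date, sample_type: str, subject_id: int, extension: str) -> str:
    """Compose a name such as 201206blood_ID0234.txt from its parts."""
    return f"{collected:%Y%m}{sample_type}_ID{subject_id:04d}.{extension}"


def build_folder(*levels: str) -> PurePosixPath:
    """Join folder levels (for example grant, topic, data type) into one path."""
    return PurePosixPath(*levels)


name = build_file_name(date(2012, 6, 1), "blood", 234, "txt")
folder = build_folder("NEHGrant01234", "WWI", "Images")
print(name)
print(folder / "London_001.jpg")
```

`PurePosixPath` is used here only so the output looks the same on every operating system; `pathlib.Path` would follow the conventions of the machine the script runs on.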

File Naming Resources

File and Directory Naming Conventions
Guidelines provided by Purdue University.
Filenames as Strategy to Managing Your Image Assets
Guidelines for using a file naming system to help manage image files.

Format

Sustainability of Digital Formats
Recommendations for file formats when preparing digital content for Library of Congress collections.
ICPSR Guide to Social Science Data Preparation and Archiving
Section of ICPSR guide relating to data formats, documentation, and file structure.
NARA FAQs About Selecting Sustainable Formats for Electronic Records
U.S. National Archives and Records Administration frequently asked questions about format choices.
NARA Transfer Guidance
U.S. National Archives and Records Administration frequently asked questions and guidance for file transfers and records management. (Formerly FAQ About Digital Audio and Video Records)

Questions?

If you have questions about data organization and format or would like to request a consultation with a member of the Scholarly Communications and Publishing Team, please email uwlib-scp@uw.edu or click the "Ask Us" link on the top right side of this site.