Data Management Plans

“By failing to prepare, you are preparing to fail.” Benjamin Franklin



In the last post I provided an overview of the final stage in the data pipeline: Metadata and Archiving. The first stage in the data pipeline is planning, and I provided an overview of this stage in my third post, “The Data Pipeline: Planning.” In this post I’ll cover the planning stage in more detail, focusing on Data Management Plans (DMPs). Because a surprising amount of information about DMPs is available online, I will summarize the common themes across several sources, including government agencies and universities.

I think you should always bear in mind that entropy is not on your side.

Elon Musk

Recall from my third post that a data management plan provides the foundation for the data pipeline and lays out the framework for how the data management team will complete each step. This is important because, as Elon Musk, CEO of Tesla and SpaceX, once said, “I think you should always bear in mind that entropy is not on your side.” In my experience, this is true in all aspects of life, and no less so in data management. On a more practical note, some project sponsors require that a data management plan be prepared at the proposal stage (e.g., the U.S. National Science Foundation [NSF]) or as one of the first deliverables due after a project begins.

Writing a Data Management Plan

I reviewed information on DMPs from the following university and government sources and from the literature: U.S. Fish and Wildlife Service, U.S. Geological Survey, Cornell University, Massachusetts Institute of Technology, NSF, Data Observation Network for Earth (DataONE), and Michener (2015). I then compiled a list of the elements common to these sources, which is provided below with annotations.

Common Themes of DMPs

Basic project information and purpose

Every DMP should include a basic project description, including the purpose, specific project objectives, project team, contact information, and timeline.

Roles and Responsibilities

The DMP should include a description of the roles and responsibilities of the project team, including the assignment of the data steward(s) and managers.


Budget

The DMP should include a data management budget estimate. When preparing the estimate, be sure to budget for each stage in the data pipeline.

Data Description

The DMP should include a listing and brief description of the data to be collected (e.g., tabular, photographs, video, audio, physical samples, geospatial), data sources (e.g., new data collection vs. existing), the data types (e.g., character varying), and the estimated volume (e.g., gigabytes).
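One way to keep the data description concrete is to hold it as a small, structured inventory rather than as free text. The sketch below is only an illustration: the `DatasetEntry` class, its field names, and the example entries and sizes are all hypothetical, not drawn from any of the sources reviewed here.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """One row of a hypothetical DMP data inventory."""
    name: str
    data_format: str     # e.g., tabular, photographs, geospatial
    source: str          # "new" data collection or "existing" data
    estimated_gb: float  # estimated volume in gigabytes


def total_volume_gb(inventory):
    """Sum the estimated volume across all inventory entries."""
    return sum(entry.estimated_gb for entry in inventory)


# Illustrative entries only; the names and sizes are made up.
inventory = [
    DatasetEntry("vegetation_plots", "tabular", "new", 0.5),
    DatasetEntry("plot_photos", "photographs", "new", 40.0),
    DatasetEntry("landcover_map", "geospatial", "existing", 12.0),
]

print(f"Estimated total volume: {total_volume_gb(inventory):.1f} GB")
```

A table like this is also a convenient starting point for the storage and budget estimates, since both depend on the total volume.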

Data Organization and Storage

The DMP should include a description of how the data will be organized and stored. This may include database schemas, file structures, code repositories, and, for physical samples that will be maintained over time, a curation plan. This section should also include information on how the data will be backed up periodically to ensure it is not lost.
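As a rough illustration of the backup portion of this section, the sketch below copies a file to a backup folder and confirms that the copy matches the original by comparing checksums. The function names `sha256_of` and `backup_file` are hypothetical, and a real plan would likely rely on scheduled, off-site backups rather than a single local copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} failed checksum verification")
    return target
```

Verifying the checksum after each copy is a simple guard against silent corruption, which is easy to overlook until the backup is actually needed.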

Data Quality Assurance

The DMP should include a description of the data quality assurance and control (QA/QC) process. This may include standard operating procedures, flow charts, and a description of database control structures.
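To make “database control structures” a little more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `site` and `observation` tables, their columns, and the helper `try_insert` are hypothetical; the point is only that constraints can reject bad values at data entry rather than during later review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE site (
    site_id TEXT PRIMARY KEY
);
CREATE TABLE observation (
    obs_id    INTEGER PRIMARY KEY,
    site_id   TEXT NOT NULL REFERENCES site(site_id),
    obs_date  TEXT NOT NULL,
    cover_pct REAL NOT NULL CHECK (cover_pct BETWEEN 0 AND 100)
);
""")
conn.execute("INSERT INTO site VALUES ('S01')")


def try_insert(site_id, cover_pct):
    """Attempt an insert; return False if a constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO observation (site_id, obs_date, cover_pct) "
            "VALUES (?, '2020-07-20', ?)",
            (site_id, cover_pct),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("S01", 35.0))   # valid row
print(try_insert("S01", 150.0))  # violates the CHECK constraint on cover_pct
print(try_insert("S99", 35.0))   # violates the foreign key to site
```

Constraints like these do not replace a QA/QC review, but they narrow the class of errors a reviewer has to look for.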

Data Processing and Workflows

At the core of this element of DMPs are the concepts of data lineage and provenance. Wikipedia (2020) defines data lineage as the data's origin, what happens to it, and where it moves over time, and data provenance as records of the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins. In essence, the DMP should include information about the data from the time it is first obtained, through QA/QC review, analysis, and reporting, to archiving. Topics in this section of the DMP may include:

  • Flow charts illustrating the flow of data through the stages in the data pipeline
  • Standard operating procedures
  • Code repositories
  • Descriptions of anticipated data transformations
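The idea of recording provenance for each data transformation can be sketched in a few lines. In the example below, every step appends an entry to a log with a fingerprint of its input and output, so any version of the data can be traced back through the steps that produced it. The function names `fingerprint` and `apply_step`, the log fields, and the sample records are all hypothetical; this is one possible approach, not a standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(records):
    """Hash a list of records so each version of the data can be identified."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_step(records, step_name, transform, log):
    """Apply one transformation and append a provenance entry to the log."""
    result = transform(records)
    log.append({
        "step": step_name,
        "input_sha256": fingerprint(records),
        "output_sha256": fingerprint(result),
        "rows_in": len(records),
        "rows_out": len(result),
        "run_at": datetime.now(timezone.utc).isoformat(),
    })
    return result


# Made-up records for illustration.
raw = [{"site": "S01", "cover_pct": 35.0}, {"site": "S02", "cover_pct": None}]
log = []
clean = apply_step(
    raw,
    "drop_missing_cover",
    lambda rows: [r for r in rows if r["cover_pct"] is not None],
    log,
)
print(json.dumps(log, indent=2))
```

Because the output fingerprint of one step becomes the input fingerprint of the next, the log entries chain together into a simple lineage record.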


Metadata

The DMP should include a description of the process of preparing metadata (i.e., data about the data), including the minimum standards for complete metadata. See “The Data Pipeline: Metadata & Archiving” for more details.
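A minimum metadata standard is easier to enforce when it can be checked automatically. The sketch below assumes a hypothetical list of required fields in `REQUIRED_FIELDS`; a real project would take its required fields from whichever metadata standard the DMP adopts.

```python
# Hypothetical minimum standard; replace with the fields your DMP requires.
REQUIRED_FIELDS = ("title", "creator", "date_collected", "description", "license")


def missing_metadata(record):
    """Return the required fields that are absent or blank in a metadata record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Made-up record for illustration.
record = {
    "title": "Vegetation plot survey",
    "creator": "Field team",
    "date_collected": "2020-07-20",
    "description": "   ",
}
print(missing_metadata(record))
```

Running a check like this before data are archived gives the team a clear, repeatable definition of “complete.”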

Data Access and Sharing

The DMP should include a section describing data access while the project is active and after project completion. Will the data be submitted to a public data repository? If so, which repository will be used? This section may also include information regarding data ownership, intellectual property rights, attribution, and licensing (e.g., Creative Commons).

Data Archiving

Lastly, the DMP should include a description of how the data will be archived and maintained over time after the project is complete.

Tools for Preparing DMPs

There are a variety of tools available for preparing DMPs. Below is a listing of said resources.

Next Time on Elfinwood Data Science Blog

In this post I provided a detailed review of data management plans. In the next post I’ll provide an overview of data management software. If you liked this post, please consider subscribing to this blog or following me on social media.

Literature Cited

Michener, W. K. 2015. Ten Simple Rules for Creating a Good Data Management Plan. PLoS Comput Biol 11(10): e1004525. doi:10.1371/journal.pcbi.1004525

Wikipedia contributors. (2020, July 18). Data lineage. In Wikipedia, The Free Encyclopedia. Retrieved 18:56, July 20, 2020, from


Copyright © 2020, Aaron Wells
