Broadly speaking, research data is the information needed to produce and support your research findings. Research data takes many forms, both physical and digital, and there is no single shared definition. What falls within the scope of “research data” may depend on your discipline, whether you work in academia, and whether you are subject to funding agency guidelines or university policy.
If you’re funded by a federal funding agency, the following definition is often used:
Research data is “…the recorded factual information commonly accepted in the scientific community as necessary to validate research findings.”
INCLUDES: code, figures, statistics, interviews, transcripts
EXCLUDES: preliminary analyses, drafts of papers, plans for further research, communication + peer reviews, physical samples
– White House, OMB Circular A-110, 2013
If you work at the University of Wisconsin-Madison, you are subject to the following definition from the Policy on Data Stewardship, Access, and Retention, which closely resembles the federal one above:
“Data means recorded factual material, regardless of the form or media on which it may be recorded, that is commonly accepted in the research community as necessary to validate research findings. For example, data may include writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, statistical records, and other research data.
…This definition of data excludes research results based on data such as preliminary analyses, drafts of research papers, published papers, plans for future research, peer reviews, or communications with colleagues.”
This module will focus on best practices for managing digital research data. However, physical data such as paper lab notebooks or physical samples, as well as non-research data files, are also important to manage well.
Practices for managing physical data should follow standards in your discipline or research group if they exist, and questions can be directed to local resources. Here at UW-Madison, Research Data Services is able to answer your questions.
Questions about managing non-data files can usually be directed to your local records manager. Here at UW-Madison, our University Records Officer can assist you.
While definitions of research data may be broad, you can categorize your data by common forms, general types, and the stages it moves through during the research process. Understanding your research data at this level will help you make more informed decisions as you begin to manage it. Certain data types, or data at certain stages of the research cycle, may be harder to recreate or re-collect once lost, so you may choose strategies that provide extra protection for data at greater risk.
The most common data forms can vary by discipline. Below we’ve included examples of some commonly used data across a few broad domains.
Examples of commonly used research data forms across general domains:
Domain | Research Data Forms |
---|---|
Hard Sciences | |
Social Sciences | |
Arts and Humanities | |
Research data can fall into a few general categories, based on method of collection, that can be used when talking about types of research data.
As we noted above, research data will also move through the following stages during your project:
These stages of research data are often represented in a more formal model called a data life cycle. In reality, research rarely moves in such an orderly fashion, and many of these steps often happen simultaneously.
However, the data life cycle can be a helpful mental model because each stage has its own best practices for managing data. Visualizing how your data moves through your project can remind you of key practices to incorporate. While this module won’t cover every stage in the life cycle, it will provide a primer on some essential practices for getting started with managing your data.
While data management can sound like a lot of work for little payoff, managing your research data well actually provides a lot of personal and practical benefits. Well-managed and well-described data is easier to sort through, access, and understand, making your research project more efficient. Having a good system also prevents the frustration of data loss in the case of hardware failure or other accidents, as you will spend less time recovering the lost data or redoing your work. Another personal benefit to researchers is that well-managed data can help prevent publication retractions. Retractions can be an unintended consequence of poor data management when it leads to errors in data or the loss of data that supports published material.
Larger changes in the research community have also led to increased attention to research data management. First, research is increasingly computational, data-driven, and collaborative. As methods, instruments, and processes continue to advance, so too does the amount of data we are able to create and capture. The growing size of data, and the corresponding infrastructure needed for storage and computing, requires us to be more responsible, proactive data managers.
Second, funding agencies, especially federal agencies that provide funding through tax dollars, are increasingly interested in ensuring that publications and data from funded research are openly available to funders. They’ve put policies in place that require data to be managed and shared, something we’ll talk about in another course.
Third, emphasis is increasingly being placed on the reproducibility and reusability of research. Reproducibility means that a researcher can understand another researcher’s methods well enough to start from the same raw data, or the same starting point, and reproduce the results of the work.
Another important reason to manage data is that data is often a valuable asset as well as a very delicate one. Depending on the type of work, data can be expensive, both in the money spent on the instruments or infrastructure needed to collect it and in the time spent working with it. The investment you make in your data can be maximized by describing and sharing it so that others can reuse or build upon it. For example, if a researcher has access to a prohibitively expensive instrument, sharing the data from their project makes it available to other researchers who may not have the same resources.
Data is also more fragile than you may imagine. It can be easy to think that digital data is somehow safer than the physical samples or notebooks we keep in a lab, but digital data relies on hardware that physically exists somewhere in the world. Digital data lives on computers in our offices, servers in the basement, instruments in our labs, and flash drives in our backpacks. That physical hardware can be damaged by natural causes or accidents, files can be corrupted, and data formats can become inaccessible as technology changes quickly and constantly. Managing your data well can help prevent these losses.
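One way to notice silent file corruption is to record a checksum for each data file and compare it again later. The following is a minimal sketch in Python, not a method prescribed by this module: the function names `sha256_of` and `find_changed_files` are hypothetical, and it assumes you keep the recorded checksums somewhere safe yourself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(recorded):
    """Return paths that are missing or whose contents no longer match.

    `recorded` is a dict mapping file paths to the digest saved earlier
    (a hypothetical structure chosen for this illustration).
    """
    changed = []
    for path, old_digest in recorded.items():
        if not Path(path).is_file() or sha256_of(path) != old_digest:
            changed.append(path)
    return changed
```

In use, you might build the `recorded` dictionary when the data is first collected, then rerun `find_changed_files` after copying files to new storage. A checksum only detects a change; recovering the file still depends on having a backup.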
Research data management also:
Of the following, what is considered research data? Select all that apply.
According to the definitions of research data covered in this section, drafts for publication would not be considered research data. The satellite images, the code to visualize the data, and the audio recording would all be important for another researcher to have in order to understand, interpret, and reproduce your work. However, another researcher would likely have little use for your drafts for publication.
Choose one of the following questions to answer:
[1] New England Collaborative Data Management Curriculum, “Module 2: Types, Formats, and Stages of Data” by Lamar Soutter Library, University of Massachusetts Medical School licensed under CC BY SA 4.0 at https://library.umassmed.edu/resources/necdmc/index