Broadly speaking, research data is the information needed to produce and support your research findings.
Research data takes many forms, including both physical and digital formats. The definition, scope, and most common formats of ‘research data’ will vary depending on your discipline and whether or not you’re subject to broader policy from funding agencies, publishers, or the University.
The following definition is commonly used. Although it describes ‘scientific data’, it can be applied broadly to all types of research:
Scientific data “include the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings.”
EXCLUDES: laboratory notebooks, preliminary analyses, case report forms, drafts of scientific papers, plans for future research, peer-reviews, communications with colleagues, or physical objects and materials, such as laboratory specimens, artifacts, or field notes.
-White House, OSTP Memo, 2022
If you’re working here at the University of Wisconsin-Madison, you’ll be subject to the following definition from the Policy on Data Stewardship, Access, and Retention, which shares similarities with the federal one above but expands on what items fall within the concept:
“Data means recorded factual material, regardless of the form or media on which it may be recorded, that is commonly accepted in the research community as necessary to validate research findings. For example, data may include writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, statistical records, and other research data. This definition pertains to both primary and secondary data.”
While this module focuses on best practices for managing digital research data, remember that physical data, such as paper lab notebooks and physical samples, and your other files also need to be managed well.
Questions about managing other types of data—such as physical data, administrative or business data, or other non-data records and files—should be directed to the most appropriate part of your organization. Here at UW-Madison, the following offices can answer your questions regarding:
While definitions for research data may be broad, there are ways to categorize your data based on common forms, general types, and the stages that it moves through during the research process. Understanding your research data at this level will help you make more informed decisions in the storage and organization of your data. Certain data types or data at certain stages of the research cycle may be harder to recreate or recollect once lost, so you may choose different strategies that provide extra protections for data at a greater risk.
The most common data forms can vary by discipline. Below we’ve included examples of some commonly used data across a few broad domains.
Examples of commonly used research data forms across general domains:
Domain | Research Data Forms |
---|---|
Formal and Natural Sciences | |
Social Sciences | |
Arts and Humanities | |
Research data can fall into a few general categories, based on method of collection, that can be used when talking about types of research data.
As we noted above, research data may also move through the following stages during your project:
These different stages of research data are often represented in a more formal model called the data life cycle. In reality, research doesn’t move in quite such an orderly fashion and often, many of these steps happen simultaneously.
However, the data life cycle can be a helpful mental model to use because at each stage in the life cycle, there are best practices for managing data. Visualizing how your data is moving through your project can help remind you of key practices to incorporate. While this module won’t cover every single stage in the life cycle, we will provide a primer to some essential practices for getting started with managing your data.
While data management can sound like a lot of work for little payoff, managing your research data well provides many personal and practical benefits. Well-managed and well-described data is easier to sort through, access, and understand, making your research project more efficient. A good system also limits the damage from hardware failure or other accidents, because you will spend less time recovering lost data or redoing your work. Well-managed data can also help prevent publication retractions, which can be an unintended consequence of poor data management when it leads to errors in data or the loss of data that supports published material.
There are larger changes happening in the research community that have led to increased attention to research data management. First, research is increasingly computational, data-driven, and collaborative. As methods, instruments, and processes continue to advance, so too does the amount of data we are able to create and capture. The increasing size of data and corresponding infrastructure needed for storage and computing requires us to be more responsible, proactive data managers.
Second, funding agencies, especially federal agencies that provide funding through tax dollars, are interested in ensuring that publications and data from funded research are openly available to the public. As a result, they’ve put policies in place that require data to be managed and shared, something we’ll talk about in another course.
Third, emphasis is increasingly being placed on the reproducibility and reusability of research. Reproducibility refers to a researcher being able to understand another researcher’s methods well enough to start from the same raw data or starting point and reproduce the results of the original work.
Another important reason to manage data is the fact that data is often a valuable asset as well as a very delicate one. Depending on the type of work, data can be expensive. The instruments, infrastructure, time spent, and staff needed to collect data can have a high monetary and resource cost. You can maximize this investment in your data by describing and sharing it so that others can reuse or build upon it. For example, if a researcher has access to a prohibitively expensive instrument, sharing the data from their project makes it available to other researchers who may not have the same resources.
Data is also more fragile than you may imagine. It can be easy to think that digital data is somehow safer than the physical samples or notebooks we keep in a lab, but digital data relies on hardware that physically exists somewhere in the world. Digital data lives on computers in our offices, servers in data centers, instruments in our labs, and flash drives in our backpacks. That hardware can be damaged by natural causes or accidents, files can be corrupted, and data formats can become inaccessible as technology rapidly changes. Managing your data well can help prevent these losses.
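One common way to notice file corruption is to record a checksum when a file is stored and compare it again later. The sketch below is illustrative only and is not part of the original module: the function names `sha256_of_file` and `file_is_unchanged` are hypothetical, and it assumes SHA-256 from Python's standard `hashlib` module is an acceptable checksum for your project.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use low for large data files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_is_unchanged(path, recorded_digest):
    """Compare a file's current digest with one recorded earlier (hypothetical helper)."""
    return sha256_of_file(path) == recorded_digest
```

In practice, you might record the digest alongside the file when it is first saved, then rerun the comparison after copying the file or restoring it from a backup. A mismatch means the contents have changed, though it does not say how or why.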
Research data management also:
Of the following, what is considered research data? Select all that apply.
According to the definitions of research data covered in this section, drafts for publication would not be considered research data. The satellite images, the code to visualize the data, and the audio recording would all be important for another researcher to be able to understand, interpret, and reproduce your work. However, another researcher would likely have little use for your drafts for publication.
Choose one of the following questions to answer:
[1] New England Collaborative Data Management Curriculum, “Module 2: Types, Formats, and Stages of Data” by Lamar Soutter Library, University of Massachusetts Medical School licensed under CC BY SA 4.0 at https://library.umassmed.edu/resources/necdmc/index