Complying with the policies and regulations your data may be subject to is an important part of properly caring for your data. However, there are also a few other considerations for working with sensitive data, as well as properly crediting and reusing others’ data, that can help ensure you’re working with data ethically throughout the life cycle.
When working with data, it is important to be aware that some data carries additional risk and could harm individuals, communities, nations, animals, or other entities if made publicly available. As the researcher, it is your responsibility to assess your data for risk, avoid collecting sensitive data that is not necessary to the work, and protect the data through effective storage and security practices.
Data that could potentially identify an individual should be considered sensitive. It should receive extra care in handling, be held under greater security restrictions than other data types, and be de-identified prior to sharing, or potentially not shared at all. Personally identifiable information falls in this category, and includes details like health information, name, age, address, occupation, race/ethnicity, and more.
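As a minimal sketch of what de-identifying records before sharing might look like, the Python below drops fields treated as identifying and substitutes a random study ID. The field names, the `IDENTIFYING_FIELDS` set, and the `deidentify` helper are hypothetical and chosen only for illustration. Which fields count as identifying depends on your data and your approved protocol, and removing them does not by itself guarantee that nobody can be re-identified.

```python
import secrets

# Hypothetical set of fields treated as identifying for this example.
IDENTIFYING_FIELDS = {"name", "address"}


def deidentify(records):
    """Return (shared_records, key) for a list of dict records.

    shared_records omit the identifying fields and carry a random study ID.
    key maps each study ID back to the removed values; it should be stored
    separately under stricter access controls, or destroyed.
    """
    shared, key = [], {}
    for record in records:
        study_id = secrets.token_hex(4)  # random 8-character hex ID
        while study_id in key:  # regenerate on the rare collision
            study_id = secrets.token_hex(4)
        key[study_id] = {f: record[f] for f in IDENTIFYING_FIELDS if f in record}
        cleaned = {f: v for f, v in record.items() if f not in IDENTIFYING_FIELDS}
        cleaned["study_id"] = study_id
        shared.append(cleaned)
    return shared, key


# Made-up example data.
participants = [
    {"name": "Ada Example", "address": "1 Sample St", "age": 34, "score": 7},
    {"name": "Bo Example", "address": "2 Sample St", "age": 41, "score": 9},
]
shared, key = deidentify(participants)
```

Only `shared` would be a candidate for sharing in this sketch. The remaining fields, such as age, may still need review before release.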
There are two different types of identifiable information: direct identifiers and indirect identifiers.
Direct identifiers do exactly what the name says: they enable direct identification of someone, or provide enough detail to make it easy to distinguish that person from others.
Direct identifiers include information like:
Indirect identifiers are data that may not identify someone on their own but can be combined to enable identification.
Indirect identifiers include information like:
For example, here at UW–Madison, knowing someone’s ethnicity, gender, and area of study may enable you to identify who that person is.
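The way indirect identifiers combine can be checked with a short script. The sketch below counts how many records share each combination of ethnicity, gender, and area of study, and flags combinations held by fewer than `k` records. It is a simplified version of the kind of check often described as k-anonymity. The field names, the `small_groups` helper, the threshold, and the records are all made up for illustration.

```python
from collections import Counter

# Hypothetical indirect identifiers, matching the example in the text.
QUASI_IDENTIFIERS = ("ethnicity", "gender", "area_of_study")


def small_groups(records, fields=QUASI_IDENTIFIERS, k=2):
    """Return combinations of `fields` shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records that contain no direct identifiers.
records = [
    {"ethnicity": "A", "gender": "F", "area_of_study": "Physics"},
    {"ethnicity": "A", "gender": "F", "area_of_study": "Physics"},
    {"ethnicity": "B", "gender": "M", "area_of_study": "History"},
]
risky = small_groups(records)
print(risky)  # {('B', 'M', 'History'): 1}
```

A combination that appears only once describes a single person in the dataset, even though no name is attached. Such records may need to be generalized, suppressed, or withheld before sharing.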
There are also some other human subject data types that are legally regulated, such as in the case of HIPAA and FERPA data, and require even more restriction. We’ll go into more details about HIPAA and FERPA in the next section, where we discuss legal implications for data.
Any research you do here at UW–Madison that may involve human subject data should go through the Institutional Review Board, either for Education, Behavioral, and Social Sciences or for Medical Sciences. This must happen before data collection begins, and ideally during your planning phase.
For each type of data listed below, identify whether it is a direct or indirect identifier.
Social Security Number
If you work with human subject data, you will at some point have to complete training on the responsible conduct of research. This training may cover topics like human subjects, data management, and research misconduct (such as plagiarism and falsification).
Human subjects training will include information on the rights and autonomy of human subjects, discuss risks and harm, and emphasize the importance of consent in research and data sharing. However, another important ethical consideration for research data is the impact that collecting and sharing data may have on the communities you're working with.
In the past, research teams have repeatedly caused harm to Indigenous communities across North America by publishing, filming and recording, or otherwise giving access to information that was not supposed to be shared broadly. This has happened in different ways: broadcasting restricted community knowledge that had been shared specially with a researcher, using misleading consent forms or misrepresenting research purposes, and conducting invasive research that does not benefit the community. This has also likely happened to many other communities around the globe that have worked with researchers.
It is incredibly important to question the methods and purposes of your research: Who is benefitting from the research? What have you promised to not share and what have you explicitly discussed with the community that you can share? Are you letting the culture or community inform your data rather than imposing your own ideas?
Respect different knowledge systems and let the language and definitions used by a community inform your work. When conducting research with and about underrepresented communities as someone not from that community, consider the way your collection tools like forms and surveys or your research variables ask people to define themselves or their needs. Do the tools provide options for the way the community would define those things themselves? If not, will the data really be able to answer your question or help that community?
There are resources available written by different communities about working responsibly with data. Contact your subject librarian or Research Data Services for assistance locating appropriate resources.
When working with communities, be responsible in how you conduct your research, how you collect and represent data, and whether and how you share the research. If you share your research, be sure the consent form communicates how, where, and exactly what data will be shared.
Data outside of the human subjects category can also be considered sensitive. You may often know outright if you’re working with sensitive data as it will be subject to laws, contracts, or policies that you have to comply with. However, it is always good to think through your data and any other products of your research to understand the impact they could have if shared.
Storing and using sensitive data requires extra protections to ensure that we are not only complying with important legal obligations, but also appropriately protecting human subjects' information.
Here at UW–Madison, if you have sensitive or restricted data, you should use only tools approved for that type of data. Follow campus guidance on handling sensitive data, and reach out to your departmental IT or DoIT to find out which approved tools are available for your data type.
Much like scholarly publications, research data is a scholarly output of a researcher's work. For this reason, it is important to understand when research data is considered intellectual property, as well as how to cite it correctly so that it can contribute to the scholarly discourse.
In this section we’ll provide a brief introduction to copyright and licensing, data citation, and Digital Object Identifiers (DOIs).
In the United States, research data that is considered factual cannot be copyrighted. However, the associated metadata, databases, figures, software, or other work that could be considered a "creative" output can sometimes be treated as an asset whose reuse or redistribution you want to control by applying an appropriate license. Licenses define how others may interact with, reuse, modify, or redistribute your work. Choosing a license for your data makes clear to other researchers how it may be used.
For scientific or factual data, many researchers choose to apply a Creative Commons Zero (CC0) license, so that it is clear the data is being distributed freely. At most, a CC BY (Creative Commons Attribution) license should be applied.
For creative works, there are varying levels of Creative Commons licenses that you can choose to apply. The license levels build on one another and range from unrestricted to fairly restrictive terms. The Creative Commons website has a good guide that can help you decide what restrictions you may want to apply, including request for attribution or non-commercial use.
To learn more about Creative Commons licenses and how they differ from copyright, view Lesson 2 of the Copyright and Fair Use micro-course.
For software or code, there are multiple choices. You will want to select a license you are comfortable with, such as one of the GNU licenses, the MIT license, or the Apache license.
For databases and their contents, the Open Data Commons licenses can be used. Databases are a more complicated situation: the database itself may have copyright protection while the data in it does not, or the data may be covered by a separate copyright. We recommend reviewing Cornell's guidance on this licensing.
You should cite datasets for the same reasons you cite books and journal articles: for dataset creators to receive appropriate credit for their work, and to make clear the antecedents to your research.
Data citation standards may vary between disciplines, and some professional organizations, academic journals, and repositories may also have guidance on preferred data citation formats. However, in general, the information you capture in a data citation is similar to the information included in a citation for any other work. The Inter-university Consortium for Political and Social Research (ICPSR) suggests the minimum elements of data citation as:
As mentioned above, a persistent identifier is a useful piece of information for data citation. A commonly used persistent identifier for research data is a DOI, or a digital object identifier.
A DOI is a series of alphanumeric characters that serves as a unique identifier for a specific publication, dataset, or other digital object. The DOI for that object won't change over time, as a URL might if the web page is moved or the website is changed. Instead, the URL is stored as information behind the DOI, and it can be updated over time so that the DOI keeps pointing to the object's current location.
This allows researchers to make their datasets easier for others to locate, access, and cite. Reliable location, identification, and citation are critical for enabling reuse of data, replication of work, and tracking of research impact.
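As a rough illustration of using a DOI as the persistent link in a citation, the Python sketch below normalizes a DOI to a link on the public `https://doi.org/` resolver and places it at the end of a citation string. The `doi_to_url` and `format_citation` helpers, the citation fields and layout, and the example DOI are assumptions made for this example. They do not represent the ICPSR elements or any required style, and the pattern check is only a loose sanity test of DOI syntax.

```python
import re

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi):
    """Normalize a DOI written in common forms to a resolver URL."""
    doi = doi.strip()
    # Remove a leading resolver URL or "doi:" label if one is present.
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi,
                 flags=re.IGNORECASE)
    # Loose check: a "10." prefix, a slash, then a non-empty suffix.
    if not re.match(r"^10\.\d{4,9}/\S+$", doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return DOI_RESOLVER + doi


def format_citation(author, year, title, distributor, doi, version=None):
    """Build a simple dataset citation string ending in the DOI link."""
    version_part = f" (Version {version})" if version else ""
    return (f"{author} ({year}). {title}{version_part} [Data set]. "
            f"{distributor}. {doi_to_url(doi)}")


# Placeholder values, not a real dataset or DOI.
citation = format_citation(
    author="Example, A.",
    year=2020,
    title="Sample Survey Data",
    distributor="Example Repository",
    doi="doi:10.1234/example.5678",
    version="2",
)
print(citation)
```

Because the citation records the DOI rather than the landing page address, the link keeps working even if the repository later moves the dataset to a different URL.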
A DOI has to be provided by a DOI Registration Agency. However, many publishers, repositories, and institutions work with such agencies in order to provide DOIs for their communities. When sharing your data, we recommend checking with your publisher or repository to see if a DOI is provided for you.
Briney, K. (2015). Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Exeter, UK: Pelagic Publishing.