Section 3: Data Policies

Introduction

Learning to work responsibly with data will be an important part of your program. As a Data Scientist, your work will often be subject to larger legal and institutional policies that impact the ways in which you can collect, share, and use data.

Your instructors in the Data Science and Human Behavior program will walk you through these policies and processes when applicable and appropriate. The policies that are reviewed below may not come into play during your time at UW–Madison, but they are critical for any Data Science and Human Behavior practitioner to be aware of.

An important component of learning to manage research data is understanding and complying with the laws, regulations, and policies to which your data may be subject. Knowing about these will help you manage your data better from the beginning, ensure you’re making appropriate storage and security decisions for your data, and ensure compliance to avoid legal risk for yourself and your organization.

Topics in this Section:

  1. Sensitive/Restricted Data and Policies
  2. Introduction to Ethical Considerations for Working with Data

1. Sensitive/Restricted Data and Policies

This section highlights some of the policies and regulations that may affect you as a student and researcher at UW-Madison. While at UW-Madison, you might work with data from industry partners or with institutional data, which means you could be working with sensitive/restricted data that is governed by campus policies, federal regulations, and other policies. This section introduces the policies you need to know regarding sensitive/restricted data and human subjects data, along with your responsibilities when working with this data.

UW-Madison has institution-specific policies that guide members of its community in the use, security, and management of data. These may come from different offices or departments, so it can sometimes be difficult to identify all the policies to which you may be subject. However, investing some time to locate relevant policies will help inform your data management plan and ensure that you are being a thoughtful and responsible data steward. You can search for university-wide policies in UW-Madison’s Policy Library. This section gives a brief introduction to a few UW-Madison campus policies.

UW-Madison’s IT has defined four major classifications for campus data that can help us understand the risk associated with our data and select the most appropriate storage and sharing methods for it. The four classifications, their definitions, and brief examples have been pulled from the campus IT website and are included below. You can find further detail on the IT website and the Office of the Vice Chancellor for Research and Graduate Education website. If you are unsure how to classify your data, reach out to your departmental IT staff or the Office of Cybersecurity.

Classifications and definitions:

  • Restricted: Data should be classified as Restricted when the unauthorized disclosure, alteration, loss or destruction of that data could cause a significant level of risk to the University, affiliates or research projects. Data should be classified as Restricted if protection of the data is required by law or regulation or if UW-Madison is required to self-report to the government and/or provide notice to the individual if the data is inappropriately accessed. Examples might include social security numbers, PHI, and other personally identifiable information.
  • Sensitive: Data should be classified as Sensitive when the unauthorized disclosure, alteration, loss or destruction of that data could cause a moderate level of risk to the University, affiliates or research projects. Data should be classified as Sensitive if the loss of confidentiality, integrity or availability of the data could have a serious adverse effect on university operations, assets or individuals.
  • Internal: Data should be classified as Internal when the unauthorized disclosure, alteration, loss or destruction of that data could result in some risk to the University, affiliates, or research projects. By default, all Institutional Data that is not explicitly classified as Restricted, Sensitive or Public should be treated as Internal.
  • Public: Data should be classified as Public prior to display on websites or once published without access restrictions, and when the unauthorized disclosure, alteration, or destruction of that data would result in little or no risk to the University and its affiliates.
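
The default rule in the definitions above (unlabeled institutional data is treated as Internal) can be illustrated with a small sketch. This is only an illustration, not a campus tool: the `DataClass` enum, the `classify` function, and the numeric ordering by risk level are hypothetical names and choices made for this example.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """The four campus classifications, ordered from lowest to highest risk.

    The numeric values are an assumption of this sketch; they only encode
    the ordering "little or no" < "some" < "moderate" < "significant" risk.
    """
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    RESTRICTED = 4


def classify(explicit_label=None):
    """Return the classification for a dataset (hypothetical helper).

    Institutional data with no explicit label is treated as Internal.
    """
    if explicit_label is None:
        return DataClass.INTERNAL
    return DataClass[explicit_label.upper()]


unlabeled = classify()
labeled = classify("restricted")
print(unlabeled.name, labeled.name)
```

A sketch like this does not replace a real classification decision; if you are unsure how to classify your data, the campus contacts named above remain the right place to ask.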

Data Transfer and Use Agreements

A Data Transfer and Use Agreement (DTUA) is a contract that governs the sharing and use of data when it is exchanged between UW-Madison and another institution, collaborator, or other external source (whether acquiring from or providing data to). Per guidelines from Research and Sponsored Programs, “whenever data is being transferred off campus to another person, an agreement on the sharing of data should be used.” However, there are also cases in which a DTUA is required, such as the sharing of Protected Health Information (PHI), where the university is subject to legal obligations. Research and Sponsored Programs provides further information about DTUAs and links to templates for different types of data, including sensitive and general data.

Our Institutional Review Board (IRB) provides further guidance for these agreements, which are also sometimes called memoranda of understanding, data sharing agreements, or data release agreements. This guidance provides further detail on when these agreements are required and how they relate to FERPA and the IRB process.

Other Campus Level Requirements

IRBs are campus bodies that work with campus researchers to review human subjects research and ensure that the rights and interests of those participating are protected. For research that involves the use of human subjects, it is expected that you will submit your project plan and materials to the correct IRB for review prior to the beginning of your project.

The IRB will review your plans, examine the risk to the subjects, help ensure you are meeting your ethical and legal responsibilities, and help you understand whether you may share your data. UW-Madison has multiple IRBs:

  • The Health Sciences IRB: Reviews research protocols involving medical interventions or procedures where medical expertise is required for evaluation.
  • Minimal Risk IRB: Reviews research protocols that present minimal risk to subjects and that involve medical interventions or procedures requiring medical expertise or that require knowledge of the health care setting. [1]

The UW-Madison Office of the Vice Chancellor for Research and Graduate Education has a policy detailing the data stewardship roles and responsibilities of the University, Principal Investigators (PIs), and researchers on the campus. This policy focuses on the guidance for the management, retention, and access to research data. It is important to read the full policy to ensure your complete understanding of your responsibilities.

While at UW-Madison, you are subject to the campus invention disclosure policy. You can find the full policy and guidance on the VCRGE Intellectual Property page. You can also visit the Wisconsin Alumni Research Foundation (WARF) website to learn more about the disclosure process. WARF also provides an FAQ page that answers common disclosure questions.

Funding Agency Requirements

In 2013, a memo from the White House Office of Science and Technology Policy (OSTP) directed federal agencies with over $100 million in annual R&D (research and development) expenditures to create plans that would increase public access to the articles and the underlying research data that result from grant funding.

This memo affected many of the large funders that we frequently encounter at UW-Madison, such as the National Institutes of Health (NIH), National Science Foundation (NSF), Department of Energy (DOE), and Department of Defense (DOD). These requirements affect both publications and data from federally funded research, typically requiring that articles be made publicly available and associated research data be made publicly available no later than 12 months after the article’s publication date. This requirement is often referred to as “public access.” Agencies also now typically ask for a data management plan (DMP) to be submitted as part of the proposal process. The DMP should detail the management of the data during the research project and should identify where and when the data and research outputs will be made publicly available.

Funding agency guidelines have provided some of the greatest incentives for researchers and universities to think more carefully about data management and data sharing, especially as funders become more stringent in reviewing written DMPs and enforcing compliance with them. For more information about federal funding requirements, view the Research Data Services informational table.

Federal Regulations and Policies

There are legal policies that impact and regulate the security and protection of certain types of data. Three common policies that affect researchers are:

  • the Health Insurance Portability and Accountability Act (HIPAA),
  • the Federal Information Security Modernization Act of 2014 (FISMA),
  • and the Family Educational Rights and Privacy Act (FERPA)

While you may not work with data that falls under these guidelines, it is important to be aware that they govern requirements for handling and storing sensitive and restricted data as defined by these policies at the federal level.


2. Introduction to Ethical Considerations for Working with Data

Complying with the policies and regulations your data may be subject to is an important part of properly caring for your data. However, there are other considerations, such as working with sensitive and human subjects data and properly crediting and reusing others’ data, that can help ensure you are working with data ethically throughout its lifecycle.

When working with data, it is important to be aware that some data has further risk associated with it and could potentially cause harm to individuals, communities, nations, animals, or other entities if made publicly available. As the researcher, it is your responsibility to ensure that you assess your data for risk, avoid collecting sensitive data if it’s not necessary to the work, and properly protect the data through effective storage and security practices.[2]

Research + Human Subject = Oversight

Human subject research requires oversight by a UW-Madison IRB. The following definitions can help you understand what constitutes research and who counts as a human subject.

  • Research: Systematic investigation including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.
  • Human Subject: A living individual about whom an investigator (whether professional or student) conducting research obtains data through intervention or interaction with the individual OR identifiable private information.[3]

When you work with data about human subjects that could potentially identify an individual, you should consider that data sensitive. Such data should receive extra care in handling and greater security restrictions than other data types, and should be de-identified prior to sharing or potentially not shared at all. Follow all campus governance and policies that protect these data types, and take advantage of campus resources such as the Office of Compliance, the Office of Cybersecurity, department and campus IT, and our campus IRBs.

In general, in research data there are two different types of identifiable information – direct identifiers and indirect identifiers (or limited identifiers under HIPAA). Our campus IRBs provide guidance around identifiability. This guidance covers the specific identifiers to be aware of for each legal obligation, guidance as to sample size in relation to indirect identifiers, guidance on coding datasets, and guidance on de-identifying datasets.

Direct identifiers

These do exactly what the name suggests: they enable direct identification of someone, or provide enough detail to make it easy to distinguish that person from others. They include:

  • name
  • address
  • social security number
  • other unique numbers related to the individual such as driver’s license number, insurance accounts, medical records number, etc.
  • email address
  • full face photos
  • vehicle information and license plates
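
As a rough illustration of handling the direct identifiers listed above, the following sketch drops those fields from a record before it is shared. It is a minimal example under stated assumptions: the field names in `DIRECT_IDENTIFIERS`, the `drop_direct_identifiers` helper, and the sample record are all hypothetical, and your own data will use different names. Removing direct identifiers alone does not make a dataset de-identified, since indirect identifiers can still single someone out.

```python
# Hypothetical field names standing in for the direct identifiers listed above.
DIRECT_IDENTIFIERS = {
    "name",
    "address",
    "ssn",
    "drivers_license",
    "medical_record_number",
    "email",
    "photo",
    "license_plate",
}


def drop_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of `record` without any direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Made-up sample record for illustration only.
record = {
    "name": "Jane Doe",
    "email": "jane@example.edu",
    "age_group": "25-34",
    "score": 87,
}

cleaned = drop_direct_identifiers(record)
print(cleaned)  # {'age_group': '25-34', 'score': 87}
```

The helper returns a new dictionary rather than modifying the original, so the identified source record stays intact in its secured location while only the reduced copy is passed along.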

Indirect identifiers

These are data that can be used in combination to enable identification of someone, and include:

  • birthdate
  • ethnicity, race, or indigenous status
  • gender
  • detailed geographic information (e.g., state, county, province, or census tract of residence)
  • profession, detailed title, or organizations the person belongs to
  • rare diseases or health information

For example, here at UW-Madison, knowing someone’s ethnicity, gender, and area of study may enable you to identify who that person is.
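
One way to see how indirect identifiers combine is to count how many records share each combination of values; combinations shared by very few people are the ones most likely to identify someone. The sketch below is illustrative only: the `small_cells` helper, the field names, the sample records, and the `min_cell_size` threshold are assumptions made for this example and are not campus or IRB guidance. Consult the IRB guidance for the thresholds that apply to your data.

```python
from collections import Counter


def small_cells(records, indirect_identifiers, min_cell_size=5):
    """Return each combination of indirect-identifier values that is shared
    by fewer than `min_cell_size` records, with its count.

    `min_cell_size` is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_cell_size}


# Made-up records using placeholder category values.
records = [
    {"ethnicity": "A", "gender": "F", "area_of_study": "Statistics"},
    {"ethnicity": "A", "gender": "F", "area_of_study": "Statistics"},
    {"ethnicity": "B", "gender": "M", "area_of_study": "History"},
]

risky = small_cells(
    records, ["ethnicity", "gender", "area_of_study"], min_cell_size=2
)
print(risky)  # {('B', 'M', 'History'): 1}
```

Here the single record with the combination `('B', 'M', 'History')` is flagged because no other record shares it, which mirrors the UW-Madison example above.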

Researchers are responsible for following the relevant guidelines for sensitive/restricted and human subject data. These include but are not limited to guidelines affecting storage, data use agreements (DUAs), informed consent, and identifiable/de-identified data. You can always reach out to your department with questions or consult the resources and data professionals available on campus to help you navigate these policies.

“Data Release Agreement” is an umbrella term used on campus that includes “Data Use Agreements (DUA), Data Transfer Agreements (DTUA), Memorandums of Understanding (MOU), [and] Business Associate Agreement (BAA).” These are contractual agreements that govern the sharing and use of data when it is exchanged between UW-Madison and another institution, collaborator, or other external source (whether acquiring from or providing data to).

While these agreements are not always required, it is likely that you will be subject to them when receiving “records from governmental agencies or corporations; Student record information; Existing, identifiable human subject data; [or] A limited or restricted use data set.” [4]

As a campus researcher, if you are working with this type of data, it is important that you review the applicable campus guidance and that you do not sign these agreements on behalf of campus, but instead seek assistance from the appropriate campus signatories. To learn more, read the IRB guidance on Data Release Agreements.

As described above, research that involves human subjects should be submitted to the correct IRB for review before the project begins. If you will be conducting research and collecting data from learners or other human subjects for your project, it is your responsibility to follow the IRB process. Certain educational situations, tests, and secondary research may be exempt from this requirement, but it is your responsibility to understand the requirements and to perform the due diligence of seeking an exemption from the IRB when required.

Informed consent is a federal requirement intended to provide clear and concise information that a reasonable person would want in order to decide whether or not to participate in the research. Your language and methods for informed consent are laid out during the IRB application process, and the IRB has guidance available. [5]

Requirements for secondary datasets may differ depending on whether the data is identifiable or has been de-identified to FERPA standards, which require “that all direct and indirect identifiers that could be used in combination to identify an individual be removed; for example, demographic information that creates small cells of individuals must be removed from a data set.” [6] In certain circumstances, data use agreements may apply in lieu of consent (see the guidance linked in the data release agreements section above).


References

[1] University of Wisconsin-Madison. (n.d.) Human Research Protection Program. Retrieved from: https://irb.wisc.edu/

[2] Briney, K. (2015). Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Exeter, UK: Pelagic Publishing.

[3] University of Wisconsin-Madison. (n.d.) Human Research Protection Program: Investigator Manual. Retrieved from: https://irb.wisc.edu/manual/investigator-manual/

[4] University of Wisconsin-Madison. (June 24, 2024) Research and Sponsored Programs. Data Transfer and Use Agreements. Retrieved from: https://rsp.wisc.edu/contracts/dtua.cfm

[5] University of Wisconsin-Madison. (n.d.) Human Research Protection Program: Consent Processes and Documentation. Informed Consent Document. Retrieved from: https://irb.wisc.edu/manual/investigator-manual/conducting-human-participant-research/consent-processes-and-documentation/

[6] University of Wisconsin-Madison. (n.d.) Human Research Protection Program: FERPA. Retrieved from: https://irb.wisc.edu/regulatory-information/ferpa/