Lesson 4: Storage and Backup

Reflection

How do you currently store and back up your important files?

It’s okay if you don’t have a system in place yet! In this section, we’ll talk about ways to improve your storage and backup.

4.1 Data Management Storage and Backup

Let’s define storage and backup, both essential practices for ensuring the care and keeping of your data.

Storage is the act of keeping your data in a secure location that you can access readily. Files in storage should be the working copies of your files that you access and change regularly.

Backup is the practice of keeping additional copies of your data in separate physical or cloud locations from your files in storage. Backup copies are the copies you would turn to after data loss or when you need to access previous versions of your work.

Good storage and backup practices help protect your data and research from losses due to hardware failure, natural disaster, or file corruption. You spend a lot of time collecting your data, so ensuring you have a good system for backing up your data will prevent you from having to spend time trying to recover your files, recollect data, or redo any cleaning or analysis.

Other benefits:

  • A granting agency may require that you retain data for a given period and may ask you to explain in a data plan how you will store and back it up.
  • Storing and backing up your data ensures that it will be there when you need to use it for publications, theses, or grant proposals.
  • Good preservation practices help make your data available to researchers in your lab/research group, department, or discipline in the future.

We recommend using managed storage and backup services made available to you through your institution or department, but some general best practices for managing your own are described below.

How many copies of my data should I have?

A good rule of thumb to remember is LOCKSS, or Lots of Copies Keep Stuff Safe. However, you don’t need to go overboard with the number of copies you have. Typically, the rule to follow is the rule of three:

Two to three copies, in at least two physically separate locations, on more than one type of storage hardware.

This might look like:

  • A copy in active storage; that means a copy you are regularly accessing and working on during your research. This will likely be on your computer or a lab’s shared network drive, or a service like Google Drive.
  • A second copy on a different device on- or off-site, such as an external hard drive in your office or a backup server provided by your IT department.
  • A third copy, preferably off-site. Many common storage solutions build this into their services, such as UW-Madison’s ResearchDrive, or appropriate cloud solutions like Google Drive.

The goal here is to get your backups and storage as physically far apart as possible to prevent any loss due to a disaster, such as a fire or flood, occurring in the lab where you’re doing research. If your primary copy and your backups are all housed in the same building, a single event could destroy all of them. Having at least one off-site backup increases the chances that you can restore your data if such a disaster happens.
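The rule of three above can be written down as a short checklist. The sketch below is only an illustration: the `Copy` record, the `rule_of_three_gaps` function, and the example location and hardware labels are all hypothetical names chosen for this example, not part of any campus tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical record for this sketch)."""
    location: str  # where the copy physically lives, e.g. "lab"
    hardware: str  # type of storage, e.g. "external hard drive"


def rule_of_three_gaps(copies: list[Copy]) -> list[str]:
    """Return the parts of the rule of three that the copies do not meet."""
    gaps = []
    if len(copies) < 2:
        gaps.append("fewer than two copies")
    if len({c.location for c in copies}) < 2:
        gaps.append("all copies are in one physical location")
    if len({c.hardware for c in copies}) < 2:
        gaps.append("all copies are on one type of storage hardware")
    return gaps
```

An empty list means the plan meets all three parts of the rule. A laptop and an external hard drive that both sit in the same lab would be flagged for sharing one physical location.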

[Image: Black woman holding a laptop standing in front of a bay of computer servers. Caption: Backing up data in a secure location.]

Setting a Schedule

Backing up your data can be done automatically or manually, depending on your level of comfort with those types of systems.

If you back up your data manually, decide how often to do so by weighing the benefit of having up-to-date backup copies against the work involved in frequent backups. Once you’ve determined how often you should back up your data, set a schedule for regularly doing so.

It’s important to remember that backing up your data doesn’t require backing up every bit of data every time. You can also choose to back up only the files that have been changed or added since the last backup. This is called an incremental backup, which requires less time and storage space than a full backup.
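As a minimal sketch of the idea, the Python function below copies only files that are new or changed since the last backup. It assumes a file has changed when it is missing from the backup, differs in size, or has a newer modification time. The function name `incremental_backup` and its parameters are hypothetical, and dedicated backup software handles many cases that this sketch does not, such as deleted files and keeping older versions.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, backup: Path) -> list[Path]:
    """Copy files that are new or changed since the last backup.

    Returns the paths (relative to `source`) of the files that were copied.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dest_file = backup / rel
        if dest_file.exists():
            src_stat = src_file.stat()
            dest_stat = dest_file.stat()
            # Assumption: same size and a source that is not newer
            # means the file is unchanged and can be skipped.
            unchanged = (
                src_stat.st_size == dest_stat.st_size
                and int(src_stat.st_mtime) <= int(dest_stat.st_mtime)
            )
            if unchanged:
                continue
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also copies the modification time, so the next run
        # can tell that this file is already up to date.
        shutil.copy2(src_file, dest_file)
        copied.append(rel)
    return copied
```

Running the function a second time without touching any files copies nothing, which is where the savings in time come from compared with a full backup.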

There are also a number of automatic options depending on the hardware or cloud systems you are using. Some cloud tools, like ResearchDrive, are automatically backed up daily and replicated offsite to an encrypted storage cluster. The IT contacts in your department may also have other automatic solutions for you.

Remember: Verifying your backups

Backing up your data doesn’t always go according to plan. It’s important to check your backups periodically so that you know you can restore important data from one of your backup copies if necessary.

Set a schedule for checking your backup data integrity. Make sure to check that the correct files were backed up, that they do not contain errors, and that they are the most up-to-date versions of the files.
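One way to carry out such a check is to compare checksums of your working files against their backup copies. The sketch below uses SHA-256 from Python's standard library. The function names `sha256_of` and `verify_backup` are hypothetical, and the approach assumes the backup is a plain folder that mirrors the layout of the source folder.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> dict[str, list[Path]]:
    """Report source files that are missing from, or differ in, the backup."""
    report = {"missing": [], "mismatched": []}
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dest_file = backup / rel
        if not dest_file.is_file():
            report["missing"].append(rel)
        elif sha256_of(src_file) != sha256_of(dest_file):
            # A mismatch means the backup copy is either out of date
            # or damaged; this check cannot tell which.
            report["mismatched"].append(rel)
    return report
```

Two empty lists in the report mean every working file has an identical copy in the backup. Anything listed under "missing" or "mismatched" is a file to back up again or investigate.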

Other Important Notes

Terms of Use: When you are deciding on cloud applications to use for your data, always read the terms of service so you know what permissions you are granting the company that supports the application and how any data might be potentially shared. Part of protecting your data is understanding the risks to your data, and that includes knowing what risks could come through your storage and backup tools.

For those going to school or working at UW–Madison, we recommend that you always use your institutionally provided storage and Google Drive accounts over your personal accounts. UW–Madison has agreements with service providers to provide more intellectual property protections than your personal accounts would provide.

For those not part of the UW–Madison community, just be sure to always read the terms of service and understand what you’re agreeing to. It’s unlikely that those terms of service would ever be exploited and harm your data, but it may help you make decisions about what tool is right for you and your data.

Sensitive Data: Remember, the storage and backup solutions you choose must have appropriate protections for the type of data you are working with. At UW-Madison, if you have sensitive or restricted data, you should only use approved tools for that type of data. You should follow campus guidance regarding handling sensitive data and reach out to your departmental IT or the Office of Cybersecurity for guidance on what approved tools are available to you for your data type.

USBs: Be cautious about using USB flash drives to back up your data. They have some advantages that can make them an appealing option: they’re affordable, they’re convenient, and you probably own at least one already. However, flash drives’ portability makes them easy to misplace, have stolen, or accidentally break.

General Security

Seek help for securing your data from security experts. Your departmental IT and DoIT can help identify the most appropriate security solutions for your research data.

There are also some day-to-day basic security measures you can take to protect your non-sensitive research data.

  • Keep your computer software and applications up to date.
  • Use strong passwords, never reuse a password across accounts, and store passwords securely with a password manager. UW-Madison uses 1Password.
  • Limit access: Regardless of the storage and backup solutions you choose, limiting access to your data is an easy way to provide an extra layer of security. Ways that you can do this include:
    • Limit physical access to data and storage solutions by keeping offices locked or restricted as appropriate. Remove old collaborators who no longer need access from shared tools, and don’t travel with your data on a physical device if you can avoid it.
    • Refrain from entering sensitive, restricted, or otherwise protected data, including hard-coded passwords, into any generative AI tool or service. Providing any data to a generative AI tool as part of a query is the equivalent of posting that information on a public website because the tool “learns” by collecting user-generated data and may later provide it as an output to others. You can find more examples of security issues related to the use of generative AI on UW-Madison Information Technology’s webpage.
  • DoIT provides some other security tips on their website. Highlights include using anti-virus software, using a VPN, using a firewall, preventing device theft, and avoiding phishing attempts and suspicious links.

You can find more information about security and tools available to you on the DoIT website.


4.2 Where to Store Your Data at UW–Madison

There are a variety of options for storage and backup solutions available through UW-Madison depending on your needs. The UW-Madison Data Storage Finder is a tool from Research Data Services and DoIT that helps you narrow down the available services for storage and backup by asking you questions about your particular use case. The tool will help you evaluate and compare your options based on your responses to questions about the classification of your data, whether you need to meet any regulatory compliance standards, the volume of your data, and more. 

In addition to services available at the campus level, we recommend reaching out to your departmental IT staff. Your department or unit may have specialized resources available to you, such as departmental servers or specialized storage solutions for sensitive and restricted data.

Check your understanding

At the beginning of this section, we asked you to reflect on your current backup and storage practices. From what you’ve learned in this section, which of the strategies below could you adopt to improve your practices?

  • Keep backups of important data on an external storage device.
  • Move my external storage device into a different location than my computer or other storage.
  • Keep backups of important data on a cloud system like ResearchDrive, Google Drive, or other.
  • Set a schedule for/begin an automatic system for backing up my important data.
  • Move my data off of a USB (thumb drive) to a more stable backup option.