Lesson 4: Storage and Backup

Reflection

How do you currently store and back up your important files?

It’s okay if you don’t have a system in place yet! In this section, we’ll talk about ways to improve your storage and backup.

4.1 Data Management Storage and Backup

Let’s define storage and backup, both essential practices for ensuring the care and keeping of your data.

Storage is the act of keeping your data in a secure location that you can access readily. Files in storage should be the working copies of your files that you access and change regularly.

Backup is the practice of keeping additional copies of your data in physical or cloud locations separate from your files in storage. Backup copies are the ones you would turn to after data loss or when you need a previous version of your work.

Storage systems often provide mirroring, in which data is written simultaneously to two drives. This is not the same thing as backup, since any change to the primary files, including an accidental deletion or corruption, is immediately mirrored in the second copy.

Good storage and backup practices help protect your data and research from losses due to hardware failure, natural disaster, or file corruption. You spend a lot of time collecting your data, so ensuring you have a good system for backing up your data will prevent you from having to spend time trying to recover your files, recollect data, or redo any cleaning or analysis.

Other benefits:

  • A granting agency may require that you retain data for a given period and may ask you to explain in a data plan how you will store and back it up.
  • Storing and backing up your data ensures that it will be there when you need to use it for publications, theses, or grant proposals.
  • Good preservation practices help make your data available to researchers in your lab/research group, department, or discipline in the future.

How many copies of my data should I have?

A good rule of thumb to remember is LOCKSS, or Lots Of Copies Keep Stuff Safe. However, you don’t need to go overboard with the number of copies you have. Typically, the rule to follow is the rule of three:

Three copies, in at least two physically separate locations, on more than one type of storage hardware.

This might look like:

  • A copy in active storage; that means a copy you are regularly accessing and working on during your research. This will likely be on your computer or a shared network drive in a lab.
  • A second copy on a different device on- or off-site, such as an external hard drive in your office or a backup server provided by your IT department.
  • A third copy, preferably off-site. This might be on a cloud application like Box, Google Drive, or other appropriate cloud solution.

The goal here is to get your backups and storage as physically far apart as possible to prevent any loss due to a disaster, such as a fire or flood in the lab where you’re doing research. If your backups are all housed together, a single event could destroy both the primary copy of your data and every backup kept in the same building. Having at least one off-site backup increases the chances that you can restore your data if such a disaster happens.
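The rule of three can also be written down as a simple check. The sketch below is only an illustration: the `Copy` class, the `meets_rule_of_three` function, and the example locations and media names are all hypothetical, and how you label a "location" or "type of hardware" is a judgment call for your own setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of your data (hypothetical structure for illustration)."""
    label: str     # human-readable name for this copy
    location: str  # physical site, e.g. "lab", "office", "cloud"
    media: str     # type of storage hardware or service


def meets_rule_of_three(copies):
    """Return True if there are at least three copies, in at least two
    locations, on more than one type of storage hardware."""
    locations = {c.location for c in copies}
    media = {c.media for c in copies}
    return len(copies) >= 3 and len(locations) >= 2 and len(media) > 1


# Example plan: working copy, external drive, and a cloud copy.
plan = [
    Copy("working copy", "lab", "laptop drive"),
    Copy("external drive", "office", "external hard drive"),
    Copy("cloud copy", "cloud", "cloud storage"),
]
```

Calling `meets_rule_of_three(plan)` on this example returns `True`; a plan with all copies in the same room would not pass.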

[Figure: Backing up data in a secure location. A Black woman holding a laptop stands in front of a bay of computer servers.]

Setting a Schedule

Backing up your data can be done automatically or manually, depending on your level of comfort with those types of systems.

If you back up your data manually, decide how often to do so by weighing the benefit of having up-to-date backup copies against the work that frequent backups involve. Once you’ve determined how often you should back up your data, set a schedule for regularly doing so.

It’s important to remember that backing up your data doesn’t require backing up every bit of data every time. You can also choose to back up only the files that have been changed or added since the last backup. This is called an incremental backup, which requires less time and storage space than a full backup.
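To make the idea of an incremental backup concrete, here is a minimal sketch in Python. It assumes that "changed since the last backup" can be judged by each file's modification time; the function name `incremental_backup` and its parameters are hypothetical, and dedicated backup tools handle many cases (deleted files, renamed files, interrupted copies) that this sketch does not.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest, last_backup_time):
    """Copy only files under `source` that were modified after
    `last_backup_time` (seconds since the epoch) into `dest`,
    preserving the folder layout. Returns the relative paths copied."""
    source, dest = Path(source), Path(dest)
    copied = []
    for path in sorted(source.rglob("*")):
        # Skip folders and any file that has not changed since the last backup.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            relative = path.relative_to(source)
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(relative)
    return copied
```

In practice you would record the time of each backup run and pass it in as `last_backup_time` on the next run. Note that this sketch never removes files from the backup, so a file deleted from the source stays in the backup folder.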

There are also a number of automatic options, depending on the hardware or cloud systems you are using. Some cloud tools, like UW–Madison Box, have a sync option that will automatically sync certain files and folders according to the settings you provide. The IT contacts in your department may also have automatic solutions for you.

Other Important Notes

Terms of Use: When you are deciding on cloud applications to use for your data, always read the terms of service so you know what permissions you are granting the company that supports the application and how any data might be potentially shared. Part of protecting your data is understanding the risks to your data, and that includes knowing what risks could come through your storage and backup tools.

For those going to school or working at UW–Madison, we recommend that you always use your institutionally provided Box and Google Drive accounts over your personal accounts. UW–Madison has an agreement with Box and Google to provide more intellectual property protections than your personal accounts would provide.

For those not part of the UW–Madison community, just be sure to always read the terms of service and understand what you’re agreeing to. It’s unlikely that those terms of service would ever be exploited and harm your data, but it may help you make decisions about what tool is right for you and your data.

Sensitive Data: Remember, if you have sensitive data, be sure that any applications you choose to use are approved for that type of data. At UW–Madison, if you have sensitive or restricted data, you should only be using approved tools for that type of data. You should follow campus guidance regarding handling sensitive data and reach out to your departmental IT or DoIT for guidance on what approved tools are available to you for your data type.

USBs: Be cautious about using USB flash drives to back up your data. They have some advantages that can make them an appealing option: they’re affordable, they’re convenient, and you probably own at least one already. However, flash drives’ portability makes them easy to misplace, have stolen, or accidentally break.

General Security

Seek help for securing your data from security experts. Your departmental IT and DoIT can help identify the most appropriate security solutions for your research data.

There are also some day-to-day basic security measures you can take to protect your non-sensitive research data.

  • Keep your computer software and applications up to date.
  • Use strong passwords, never use the same password twice, and store passwords securely.
  • Limit access: Regardless of the storage and backup solutions you choose, limiting access to your data is an easy way to provide an extra layer of security.
    • Ways that you can do this include limiting physical access to data and storage solutions by keeping offices locked or restricted as appropriate, removing old collaborators who no longer need access from shared solutions, and not traveling with your data on a physical device if you can avoid it.
  • DoIT provides some other security tips on their website for securing your computer and securing your mobile device, including using anti-virus software, using a VPN, using a firewall, preventing device theft, and avoiding phishing attempts and suspicious links.

You can find more information about security and tools available to you on the DoIT website.


4.2 Where to Store Your Data at UW–Madison

ResearchDrive
  • Description: ResearchDrive is the UW–Madison campus-wide file storage service that provides secure, shareable storage space on the UW–Madison network.
  • Capacity: 5TB of storage is available to each PI at no cost.
  • Security: ResearchDrive is available to UW–Madison PIs and their collaborators and is suited for a variety of research purposes, including backup, archive, storage for research computing, and others.
  • Best Use:
    • Hosted on campus and supported by DoIT.
    • Automatic offsite backups included.
    • Daily replication to second off-campus location.
    • Automatic de-duplication and performance tuning.
    • No file or folder limits.

Campus Computing Infrastructure (CCI)
  • Description: Shared, scalable, secure options for a variety of needs, from home/group directories to long-term archiving.
  • Capacity: See CCI site for current pricing.
  • Security: NetID restricted. Can add permissions for campus and external users via Manifest, a tool that allows departments to authorize users to log in to their resources using groups of NetIDs, and allows for the creation of new NetIDs for UW affiliates and collaborators.
  • Best Use: Shared Storage can scale up to hundreds of TBs. Contact CCI to schedule a meeting with the CCI Engagement team to discuss your needs.

UW–Madison Box
  • Description: UW–Madison Box is a cloud-based storage service that provides a secure place to store data, share, and collaborate with others both within and outside of UW–Madison.
  • Capacity:
    • Allows for 50GB of data storage.
    • 15GB maximum per file.
  • Security:
    • NetID restricted; can add permissions for campus or external users.
    • Only secure Box folders configured with the involvement of the HIPAA Security Officer (or designee) may be used when working with PHI. If you are part of the UW–Madison Health Care Component, contact your HIPAA Privacy or Security Coordinator with questions about the applications best suited for creating, storing, and sharing PHI.
  • Best Use:
    • Versions your files automatically.
    • Encryption of data in transit and at rest.
    • Box provides useful fine-grained controls for sharing files and folders outside of Box.
    • The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff, and students (unless shared with others outside the university).

UW G Suite
  • Description: UW–Madison G Suite is a collection of cloud-based productivity and collaborative tools, including Google Drive, Google Docs, Google Sheets, Google Slides, Google Sites, Google Keep, Hangouts Meet, Hangouts Chat, and more. It is NOT considered a secure and permanent place for keeping data.
  • Capacity:
    • Unlimited storage, free to UW–Madison faculty, staff, and students.
    • 5TB maximum per file.
  • Security:
    • NetID restricted; can add permissions for other campus users using their NetIDs.
    • Not appropriate for sensitive or restricted data.
  • Best Use:
    • The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff, and students (unless shared with others outside the university).
    • Google Drive can be more useful for real-time collaborating than Box.
    • Encryption of data in transit and at rest.

LabArchives
  • Description: LabArchives is the UW–Madison campus-wide Electronic Lab Notebook (ELN) service, which can be used to store data, observations, notes, and other digital materials generated during the research process.
  • Capacity:
    • Unlimited data storage.
    • 4GB maximum per file.
  • Security:
    • Available to the faculty, researchers, staff, graduate students, and undergraduate students performing their research activities.
    • Accounts must be created at request of the PI by the ELN team.
    • The UW–Madison LabArchives instance provides extra data security such as encryption and firewalls.
    • May not be appropriate for sensitive data, human subjects data, or other restricted data types. Consult with the ELN team, IRB, or your local security officers.
  • Best Use:
    • Accepts many file types, allows versioning, securely stores files.
    • Supports multiple user roles, permissions.
    • The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff, and students.
    • LabArchives is currently on a multi-year license and the ELN team suggests keeping an exported archival copy of your notebook at the end of a project.

Departmental server or storage network
  • Description: Your department’s IT unit may offer storage on their server or network.
  • Capacity: Varies
  • Security: Protected by user accounts and passwords.
  • Best Use: Contact your department’s IT unit for information.

External hard drives
  • Description: Flash drives, CDs, and DVDs.
  • Capacity: Varies
  • Security: Not secure unless kept in a secure location and sensitive data are encrypted.
  • Best Use: Best for short-term storage (approximately 1-5 years) since media formatting can fail.

Portable media
  • Description: Flash drives, CDs, and DVDs.
  • Capacity: Varies
  • Security: Not secure unless kept in a secure location and sensitive data are encrypted.
  • Best Use: Best for short-term storage (approximately 1-5 years) since media formatting can fail.

Third Party Cloud Storage
  • Description: Dropbox and others.
  • Capacity: Varies
  • Security: Varies
  • Best Use: UW–Madison has no negotiated terms of service with these providers. See Guidelines for use of non-UW–Madison applications for research for help evaluating your risks and rights.

4.3 Data Backup Options at UW–Madison

Bucky Backup
  • Description: A managed data backup and recovery service that uses IBM’s Tivoli Storage. Three levels are available: Lite, Enterprise, and Archive.
  • Size & Cost: Varies
  • Notes:
    • Allows you to schedule automatic backups for critical data.
    • The Archive service should not be used for backup or for files you will need to overwrite. Archiving is for preserving files as they are.
    • Compare the service levels to choose the one that fits your needs.

Departmental Server
  • Description: Your department’s IT staff may offer backup.
  • Size & Cost: Varies
  • Notes: Contact your departmental IT staff for details.

External hard drives
  • Description: External hard drives
  • Size & Cost: Varies
  • Notes: Remember to have another backup copy available as hardware can fail. Flash drives are also available, but remember that they are easily lost and easily corrupted.

Remember: Verifying your backups

Backing up your data doesn’t always go according to plan. It’s important to check your backups periodically so that you know you can restore important data from one of your backup copies if necessary.

Set a schedule for checking your backup data integrity. Make sure to check that the correct files were backed up, that they do not contain errors, and that they are the most up-to-date versions of the files.

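One way to do this is with a checksum, a short fingerprint computed from a file’s contents by a checksum algorithm. If the checksum of a backup file matches the checksum of the original, the two files are identical; if the checksums differ, the backup is not an accurate copy.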
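A checksum check of this kind might look like the following sketch. It uses SHA-256 from Python's standard `hashlib` module as the checksum algorithm, which is one common choice and an assumption here, not a campus recommendation. The helper names `sha256_of` and `verify_backup` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Return the relative paths of files that are missing from the
    backup or whose checksum differs from the original."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for path in sorted(original_dir.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(path) != sha256_of(copy):
            problems.append(relative)
    return problems
```

An empty list from `verify_backup` means every original file has an identical copy in the backup. Keep in mind that a file you edited after the last backup will also show up as a mismatch, so run the check right after backing up or compare against checksums you saved at backup time.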

Check your understanding

At the beginning of this section, we asked you to reflect on your current backup and storage practices. From what you’ve learned in this section, which of the strategies below could you adopt to improve your practices?

  • Keep backups of important data on an external storage device.
  • Move my external storage device into a different location than my computer or other storage.
  • Set a schedule for/begin an automatic system for backing up my important data.
  • Move my data off of a USB (thumb drive) to a more stable backup option.