How do you currently store and back up your important files?
It’s okay if you don’t have a system in place yet! In this section, we’ll talk about ways to improve your storage and backup.
Let’s define storage and backup, both essential practices for ensuring the care and keeping of your data.
Storage is the act of keeping your data in a secure location that you can access readily. Files in storage should be the working copies of your files that you access and change regularly.
Backup is the practice of keeping additional copies of your data in physical or cloud locations separate from your files in storage. Backup copies are the copies you would turn to after data loss or when you need to access previous versions of your work.
Good storage and backup practices help protect your data and research from losses due to hardware failure, natural disaster, or file corruption. You spend a lot of time collecting your data, so ensuring you have a good system for backing up your data will prevent you from having to spend time trying to recover your files, recollect data, or redo any cleaning or analysis.
Other benefits:
A good rule of thumb to remember is LOCKSS, or Lots of Copies Keep Stuff Safe. However, you don’t need to go overboard with the number of copies you have. Typically, the rule to follow is the rule of three:
Three copies, in at least two physically separate locations, on more than one type of storage hardware.
This might look like:
The goal here is to get your backups and storage as physically far apart as possible to prevent loss due to a disaster, such as a fire or flood, in the lab where you’re doing research. If your copies are all housed together, a single event could ruin both the primary copy of your data and any backups that you keep in the same building. Having at least one off-site backup increases the chances that you can restore your data if such a disaster happens.
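The rule of three can also be written down as a simple check. The sketch below is only an illustration: the `Copy` class, the `meets_rule_of_three` function, and the example locations and media types are hypothetical names chosen for this example, not part of any campus tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical structure for illustration)."""
    label: str
    location: str  # physical site where the copy lives
    media: str     # type of storage hardware


def meets_rule_of_three(copies):
    """Return True if there are at least three copies, in at least two
    physical locations, on more than one type of storage hardware."""
    locations = {c.location for c in copies}
    media = {c.media for c in copies}
    return len(copies) >= 3 and len(locations) >= 2 and len(media) >= 2


# Example plan (assumed values, not a recommendation for any specific lab).
plan = [
    Copy("working copy", "lab", "laptop drive"),
    Copy("local backup", "lab", "external hard drive"),
    Copy("off-site backup", "campus data center", "network storage"),
]

print(meets_rule_of_three(plan))
```

A plan with three copies that all sit in the same building, or all on the same kind of hardware, would fail this check.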
Backing up your data can be done automatically or manually, depending on your level of comfort with those types of systems.
If backing up your data manually, you’ll want to determine how often you should back up your research data and will need to weigh the benefits of having up-to-date backup copies against the work involved with frequent backups. Once you’ve determined how often you should back up your data, set a schedule for regularly doing so.
It’s important to remember that backing up your data doesn’t require backing up every bit of data every time. You can also choose to back up only the files that have been changed or added since the last backup. This is called an incremental backup, which requires less time and storage space than a full backup.
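As a rough illustration of an incremental backup, the sketch below copies only files whose modification time is later than the previous backup. The function name `incremental_backup` and its parameters are hypothetical, and it assumes the destination folder sits outside the source folder. Dedicated backup software handles many cases this sketch ignores, such as deleted files and files in use.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, last_backup_time: float) -> list[Path]:
    """Copy files under `source` that were modified after `last_backup_time`
    (seconds since the epoch) into `dest`, preserving the folder layout.
    Returns the list of files written to the backup."""
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            target = dest / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

A call might look like `incremental_backup(Path("project_data"), Path("/Volumes/backup/project_data"), last_run)`, where both paths and `last_run` (the recorded time of the previous backup) are placeholders.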
There are also a number of automatic options depending on the hardware or cloud systems you are using. Some cloud tools, like UW Box, have a sync option that will automatically sync certain files and folders depending on the settings you provide. The IT contacts in your departments may also have automatic solutions for you.
Terms of Use: When you are deciding on cloud applications to use for your data, always read the terms of service so you know what permissions you are granting the company that supports the application and how any data might be potentially shared. Part of protecting your data is understanding the risks to your data, and that includes knowing what risks could come through your storage and backup tools.
For those going to school or working at UW–Madison, we recommend that you always use your institutionally provided Box and Google Drive accounts over your personal accounts. UW–Madison has an agreement with Box and Google to provide more intellectual property protections than your personal accounts would provide.
For those not part of the UW–Madison community, just be sure to always read the terms of service and understand what you’re agreeing to. It’s unlikely that those terms of service would ever be exploited and harm your data, but it may help you make decisions about what tool is right for you and your data.
Sensitive Data: Remember, if you have sensitive data, be sure that any applications you choose to use are approved for that type of data. At UW–Madison, if you have sensitive or restricted data, you should only be using approved tools for that type of data. You should follow campus guidance regarding handling sensitive data and reach out to your departmental IT or DoIT for guidance on what approved tools are available to you for your data type.
USBs: Be cautious about using USB flash drives to back up your data. They have some advantages that can make them an appealing option: they’re affordable, they’re convenient, and you probably own at least one already. However, flash drives’ portability makes them easy to misplace, have stolen, or accidentally break.
Seek help for securing your data from security experts. Your departmental IT and DoIT can help identify the most appropriate security solutions for your research data.
There are also some day-to-day basic security measures you can take to protect your non-sensitive research data.
You can find more information about security and tools available to you on the DoIT website.
Option | Description | Capacity | Security | Best Use |
---|---|---|---|---|
ResearchDrive | ResearchDrive is the UW–Madison campus-wide file storage service that provides secure, shareable storage space on the UW–Madison network. | 5TB of storage is available to each PI at no cost. | ResearchDrive is available to UW–Madison PIs and their collaborators and is suited for a variety of research purposes, including backup, archive, storage for research computing, and others. | Hosted on campus and supported by DoIT. Automatic offsite backups included. Daily replication to second off-campus location. Automatic de-duplication and performance tuning. No file or folder limits. |
Campus Computing Infrastructure (CCI) | Shared, scalable, secure options for a variety of needs, from home/group directories to long-term archiving. | See CCI site for current pricing. | NetID restricted. Can add permissions for campus and external users via Manifest, a tool that allows departments to authorize users to log in to their resources using groups of NetIDs, and allows for the creation of new NetIDs for UW affiliates and collaborators. | Shared Storage can scale up to hundreds of TBs. Contact CCI to schedule a meeting with the CCI Engagement team to discuss your needs. |
UW–Madison Box | UW–Madison Box is a cloud-based storage service that provides a secure place to store data, share, and collaborate with others both within and outside of UW–Madison. | Allows for 50GB of data storage. 15GB maximum per file. | NetID restricted; can add permissions for campus or external users. Only secure Box folders configured with the involvement of the HIPAA Security Officer (or designee) may be used when working with PHI. If you are part of the UW–Madison Health Care Component, contact your HIPAA Privacy or Security Coordinator with questions about the applications best suited for creating, storing, and sharing PHI. | Versions your files automatically. Encryption of data in transit and at rest. Box provides useful fine-grained controls for sharing files and folders outside of Box. The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff and students (unless shared with others outside the university). |
UW G Suite | UW–Madison G Suite is a collection of cloud-based productivity and collaborative tools, including Google Drive, Google Docs, Google Sheets, Google Slides, Google Sites, Google Keep, Hangouts Meet, Hangouts Chat, and more. It is NOT considered a secure and permanent place for keeping data. | Unlimited storage, free to UW–Madison faculty, staff, and students. 5TB maximum per file. | NetID restricted; can add permissions for other campus users using their NetIDs. Not appropriate for sensitive or restricted data. | The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff, and students (unless shared with others outside the university). Google Drive can be more useful for real-time collaborating than Box. Encryption of data in transit and at rest. |
LabArchives | LabArchives is the UW–Madison campus-wide Electronic Lab Notebook (ELN) service which can be used to store data, observations, notes, and other digital materials generated during the research process. | Unlimited data storage. 4GB maximum per file. | Available to the faculty, researchers, staff, graduate students, and undergraduate students performing their research activities. Accounts must be created at the request of the PI by the ELN team. The UW–Madison LabArchives instance provides extra data security such as encryption and firewalls. May not be appropriate for sensitive data, human subjects data, or other restricted data types. Consult with the ELN team, IRB, or your local security officers. | Accepts many file types, allows versioning, securely stores files. Supports multiple user roles and permissions. The UW–Madison enterprise agreement protects the intellectual property rights of UW–Madison faculty, staff, and students. LabArchives is currently on a multi-year license and the ELN team suggests keeping an exported archival copy of your notebook at the end of a project. |
Departmental server or storage network | Your department’s IT unit may offer storage on their server or network. | Varies | Protected by user accounts and passwords. | Contact your department’s IT unit for information. |
External hard drives | Portable drives that connect to your computer. | Varies | Not secure unless kept in a secure location and sensitive data are encrypted. | Best for short-term storage (approximately 1-5 years) since media formatting can fail. |
Portable media | Flash drives, CDs, and DVDs. | Varies | Not secure unless kept in a secure location and sensitive data are encrypted. | Best for short-term storage (approximately 1-5 years) since media formatting can fail. |
Third Party Cloud Storage | Dropbox and others. | Varies | Varies | UW–Madison has no negotiated terms of services with these providers. See Guidelines for use of non-UW–Madison applications for research for help evaluating your risks and rights. |
Option | Description | Size & Cost | Notes |
---|---|---|---|
Bucky Backup | A managed data backup and recovery service that uses IBM’s Tivoli Storage. Three levels are available: Lite, Enterprise, and Archive. | Varies | Allows you to schedule automatic backups for critical data. The archive service should not be used for backups or for files you will need to overwrite. Archiving is for preserving files as they are. Compare service levels |
Departmental Server | Your department’s IT staff may offer backup. | Varies | Contact your departmental IT staff for details. |
External hard drives | External hard drives | Varies | Remember to have another backup copy available as hardware can fail. Flash drives are also available, but remember that they are easily lost and easily corrupted. |
Backing up your data doesn’t always go according to plan. It’s important to check your backups periodically so that you know you can restore important data from one of your backup copies if necessary.
Set a schedule for checking your backup data integrity. Make sure to check that the correct files were backed up, that they do not contain errors, and that they are the most up-to-date versions of the files.
One way to do this is with checksums. A checksum algorithm computes a short fingerprint from a file’s contents, so comparing the checksum of a backup file with the checksum of the original shows whether the two are identical.
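A minimal sketch of a checksum-based check is shown below, using SHA-256 from Python’s standard `hashlib` module. The function names `sha256_of` and `verify_backup` are hypothetical. The sketch assumes the backup folder mirrors the folder layout of the originals, and it is not a substitute for the verification features built into backup software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> dict[str, list[str]]:
    """Compare every file under `original_dir` with its counterpart in
    `backup_dir`. Reports files missing from the backup and files whose
    checksums differ."""
    report = {"missing": [], "mismatched": []}
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        rel = original.relative_to(original_dir)
        backup = backup_dir / rel
        if not backup.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(original) != sha256_of(backup):
            report["mismatched"].append(rel.as_posix())
    return report
```

An empty report means every original file has a byte-identical copy in the backup. A file listed as mismatched may be corrupted, or the working copy may simply have changed since the last backup, so check the backup date before drawing conclusions.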
At the beginning of this section, we asked you to reflect on your current backup and storage practices. From what you’ve learned in this section, which of the strategies below could you adopt to improve your practices?