Backup Tips

From Archiveteam
Revision as of 22:13, 2 April 2012 by Shaqfu (talk | contribs) (Very large update; formatting's a bit sloppy)

Personal digital archiving is all the rage nowadays. This article will give you a basic overview of why you should do it and how.

How Is Data Lost?

Here's a short list of the ways to lose data:

  • Disk failure
  • Software failure
  • Malicious software
  • Natural disaster
  • Clumsy user
  • Accidental deletion
  • Accidental overwriting
  • Cat hair
  • Refrigerator magnets
  • Solar radiation
  • You forgot where you put it
  • Your parents/roommate/spouse moved it and didn't tell you
  • The feds paid a surprise visit to you/your storage provider
  • Your storage provider went under/got bought/got bored

Any one of these can erase decades of data in a second. The goal of good backups is to contain the damage any one of these can cause, ideally to zero. The ways to lose data can be summarized into these categories:

  1. Operational - drives wear down, software writes garbage, user error
  2. Environmental - building catches fire, hurricane knocks your house over
  3. Access - you lose track of data, or lose ability to get to it

Thus, a good backup plan is resilient against each of these types of failure. We'll use Michael Ashenfelder's four-step process as a model.

Identify/Decide

Before starting any sort of backup plan, it helps to identify what you're saving. Backing up terabytes takes different hardware and more time than backing up megabytes; knowing which one you're dealing with can save money and grief.
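To find out which end of that scale you're on, you can total up the folders you're considering. The following is a minimal sketch, not a definitive tool: the function names `tree_size_bytes` and `human_size` are made up for illustration, and the folder in the usage comment is hypothetical.

```python
import os
from pathlib import Path


def tree_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Skip symlinks so linked files are not counted twice.
                if not path.is_symlink():
                    total += path.stat().st_size
            except OSError:
                # Unreadable or vanished file: skip it rather than abort.
                continue
    return total


def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


# Hypothetical usage: point it at whatever folder you plan to save.
# print(human_size(tree_size_bytes(Path.home() / "Documents")))
```

Run it once per candidate folder and add up the results; the total tells you roughly how big a backup drive needs to be.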

One way to get started is to envision the following scenarios (which also serve as excellent fire drills):

  • You get a call from your lawyer, telling you that someone opened a suit against you. Your lawyer says that it's easy to take care of, but it requires as much documentation as possible to build your case. This encompasses things like financial data (Quicken, Excel, etc.), legal documentation, and the like.
  • You get a call from a client/boss, saying that the Big Project needs some crucial information from some old work of yours to save it. They don't have a specific date, but they'll sift through everything more than one year old to find it. This encompasses anything you do for work, be it media projects, code, reports, etc. Essentially, stuff your livelihood is based on.
  • You learn the hard way that nobody keeps backups of people. Your next of kin go through your effects, and come across your digital data. Here, they'll find personal things - photographs/videos, special emails, etc. - essentially, stuff that's meaningful to you.
  • You have to move to a developing country for a short time - not long enough to think long-term, but long enough that you'll want some amenities. Due to arcane customs laws, you can only bring one small hard drive into the country. These are things that you really like and would rather not lose, but don't fall into the above. Things like contact information, game saves, hard-to-replace data, favorite porn, etc. Think of this as a catch-all.

Here are things you probably shouldn't save:

  • Program and system files. Unless you run a high-reliability business server, there's little need to have a ready copy of explorer.exe. If you have the install discs handy, then there's no real reason to back these up.

That being said, remember that storage is cheap, but your data is priceless. When in doubt, save it - the cost of doing so is nearly zero, and the cost of losing it is not.

Organize

Make sure you assess all possible data sources when deciding what to back up. There's nothing more embarrassing than losing your vacation photos because you didn't copy them off your phone before pitching it. If you have anything in the cloud, copy it locally!

When backing up, it helps to keep everything together in one large archive. This solves a number of problems:

  • You won't forget where you put that Really Important Data - don't be like Jordan Mechner! - because it's all in one place.
  • It's easier to keep one big archive reliable than many small ones (think economy of scale)
  • Buying a few big hard drives is cheaper than buying many small ones (and they tend to be more reliable)

Save Copies

The goal here is to mitigate the damage caused by sudden catastrophic data loss, so that your valuable data (from above) is kept safe.

Scheme

First, buy some hard drives. Mechanical (traditional) drives are the cheapest for the storage, and their longevity and flaws are well-documented. For personal archiving, consumer drives are sufficient, so long as you avoid a disastrously bad product line. 1-1.5TB drives are roomy and cheap, and are recommended.

A basic backup scheme may look like this:

  1. Primary storage (your PC/phone/tablet/etc) - changes constantly as you use it
  2. Secondary local storage (a hard drive in a closet) - changes once every 2-3 months
  3. Secondary offsite storage (a hard drive in a safety deposit box) - changes once or twice a year [optional but highly recommended]

This scheme provides resilience against most common failures: if one drive dies, there are two backups; if you delete something, you have two copies to restore from; if your house floods, you have one. So long as you are vigilant, the chance of total data loss is negligible, even in case of total disaster. You may wish to add more drives to each area in accordance with your paranoia.

Keeping backup cycles up is important for both the longevity of the hardware and security of your data. Not only does it allow you to keep your data current, but it can show early signs of hardware failure as you read/write to the disks.
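One way to make each cycle double as a health check is to record a checksum for every file and compare on the next visit. This is a sketch under stated assumptions rather than an established procedure: the helper names (`sha256_of`, `build_manifest`, `check_against_manifest`) are invented here, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir):
    """Map each file's path (relative to backup_dir) to its SHA-256 hash."""
    backup_dir = Path(backup_dir)
    return {
        path.relative_to(backup_dir).as_posix(): sha256_of(path)
        for path in sorted(backup_dir.rglob("*"))
        if path.is_file()
    }


def check_against_manifest(backup_dir, manifest):
    """Return (missing, changed) file lists compared to an older manifest."""
    current = build_manifest(backup_dir)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(
        name
        for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, changed
```

Files in an old backup should never change, so anything reported as missing or changed deserves a closer look at the drive. Keep the manifest somewhere other than the backup folder itself (for example, saved as a small text file on your PC).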

Why Not the Cloud?

You may be thinking "why not use cloud backups as offsite storage?" The answer is: you can, but it's risky, and you should only use it to supplement an already solid scheme.

The cloud offers many seductive features, such as high disk reliability, easy access, and cheap storage. However, there's a hidden cost: by using cloud storage, you lose control of your data. By trusting a storage provider with your data, you trust them to be there tomorrow. This has proven to be a very risky agreement, as cloud storage providers tend not to be long-lived, and those that fall don't give a damn about your data loss. The provider may lose interest (MobileMe), close shop (see Deathwatch, "Dead as a doornail"), or have a surprise party thrown by the Department of Justice (MegaUpload).

In short, cloud storage is a lot like real clouds - insubstantial, fleeting, and really bad to build on. This isn't to say it's useless - cloud storage can play a useful role in a storage scheme due to easy access - but it's not something you should trust your backups to. Put another way, Archive Team wouldn't be working overtime to save dying clouds if they were reliable long-term.

Doing It

Here's a quick, dirty, and platform-agnostic backup method:

  1. Connect the external drive to your computer.
  2. In the root of the external hard drive, create a folder called 'backups'.
  3. In the 'backups' folder, create another folder named after your computer's name (e.g. "POSEIDON").
  4. In this new folder, create another folder named for today's date in the format YYYY-MM-DD (e.g. 2021-06-23). This format ensures that sorting the folders by name, as Windows and most other file managers do, also sorts them in date order.
  5. Copy everything you selected (if you're extra-thorough, everything from your C: drive) into this date folder. A program like TeraCopy is exceedingly useful, as it supports copy verification, pause/resume and most importantly, won't randomly die if it runs into any problems.
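For those who would rather script the steps above than click through them, here is a minimal Python sketch. It is an illustration, not a replacement for a tool like TeraCopy: the function names `run_backup` and `sha256_of` are hypothetical, the computer name defaults to whatever `socket.gethostname()` reports, and verification is done by comparing SHA-256 hashes of source and copy.

```python
import hashlib
import shutil
import socket
from datetime import date
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_backup(sources, drive_root, computer_name=None, today=None):
    """Copy sources into <drive_root>/backups/<computer>/<YYYY-MM-DD>/.

    Returns the destination folder and a list of files that failed to
    copy or did not match after copying.
    """
    computer_name = computer_name or socket.gethostname()
    today = today or date.today()
    dest = Path(drive_root) / "backups" / computer_name / today.isoformat()
    dest.mkdir(parents=True, exist_ok=True)

    failures = []
    for source in map(Path, sources):
        if source.is_file():
            files = [source]
        else:
            files = [p for p in sorted(source.rglob("*")) if p.is_file()]
        for src_file in files:
            # Keep the top-level folder name, e.g. Documents/taxes/2011.txt.
            target = dest / src_file.relative_to(source.parent)
            try:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src_file, target)
                if sha256_of(src_file) != sha256_of(target):
                    failures.append(src_file)
            except OSError:
                # Record the problem and keep going instead of dying.
                failures.append(src_file)
    return dest, failures
```

A call such as `run_backup(["C:/Users/me/Documents"], "E:/")` (both paths are placeholders) would fill a folder like `E:/backups/POSEIDON/2021-06-23`. Note one limitation of the sketch: two sources with the same folder name would land in the same place, so give them distinct names or back them up separately.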

You'll probably have space left over (unless you do a lot of media editing), so you can repeat this once per backup cycle on the same drive. This gives you some extra peace of mind in case one cycle's backup is corrupted, as you can use the next most recent one.
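Picking the fallback copy can be automated once the folders are named by date. The sketch below assumes one machine folder containing YYYY-MM-DD subfolders, as in the method above; the function names `dated_backups` and `fallback_order` are made up for illustration.

```python
import re
from datetime import date
from pathlib import Path

# Only folder names that look exactly like YYYY-MM-DD are considered.
DATE_NAME = re.compile(r"\d{4}-\d{2}-\d{2}")


def dated_backups(machine_dir):
    """Return (date, path) pairs for dated backup folders, oldest first."""
    found = []
    for entry in Path(machine_dir).iterdir():
        if not entry.is_dir() or not DATE_NAME.fullmatch(entry.name):
            continue
        try:
            found.append((date.fromisoformat(entry.name), entry))
        except ValueError:
            # Right shape but not a real date (e.g. 2021-13-40): ignore it.
            continue
    return sorted(found)


def fallback_order(machine_dir):
    """Return backup folders newest first: try the first, then the next."""
    return [path for _day, path in reversed(dated_backups(machine_dir))]
```

If the newest folder turns out to be corrupted, the second entry in the list is the next most recent cycle to restore from.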

Conclusion

Congrats! You're now resistant to catastrophic data loss. This is only the beginning of good archiving - vigilance is the watchword of digital archiving. Keep an eye on disk health, run through fire drills (real or hypothetical), and stay consistent with backups.

As the old joke goes, there are two kinds of people in the world: those that keep backups, and those that haven't lost data yet. Don't let it happen to you.