More backups

I previously discussed backing up this site and how it saved me recently. But it’s my personal backup scheme that has saved me many times over the years, mostly from self-inflicted losses. So I thought I’d outline that, not so much as advice, but to illustrate my thinking.

The environment

My home network is more complex than most, but possibly simpler than the ones some readers of this blog may run.

At the core I have a fairly straightforward SOHO Wi-Fi router. I keep thinking that I should upgrade this to a UniFi setup, and while it would be fun, I haven’t been able to justify the expense for my use case.

There are a number of devices on the network, in addition to my various PCs and mobile devices:

[Photo: a previous incarnation of my Pi cluster. It’s since been upgraded to Pi 4s.]
  • A QNAP 4-bay NAS
    • 2 internal 250GB SSDs, which are mirrored and contain only system information/applications
    • 4 10TB drives in a RAID 5 configuration, yielding a total of 30TB usable (or about 27TB, if you’re thinking in binary)
    • 2 external 4TB USB drives
  • 2 Raspberry Pi 3s, both running Pi-hole, serving as my primary and secondary DNS relays.
    Each uses different upstream DNS servers.
  • My “mini-cluster” of 7 Raspberry Pi 4s
  • Power for all of these runs through a UPS, which is connected to the NAS and will shut it down cleanly before the battery runs down.

Everything else connects over Wi-Fi.
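
For anyone who wants to check the arithmetic on a RAID 5 array like the one in the NAS: one drive’s worth of space goes to parity, and drive vendors count in decimal terabytes while many tools report binary tebibytes. Here’s a minimal Python sketch of that calculation. The function names are just illustrative.

```python
def raid5_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable decimal TB of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# Four 10TB drives, as in the NAS described above.
usable_tb = raid5_usable_tb(4, 10)
print(f"{usable_tb:.0f} TB decimal, {tb_to_tib(usable_tb):.1f} TiB binary")
```

The real usable space will be a little lower still once the filesystem and the NAS’s own overhead are accounted for.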

My Windows workstation has 2TB of local storage. My Linux workstation has 1TB. Mostly, these are used as working storage, with large, longer-term storage on the NAS. On the Windows workstation, what’s there is mostly copies of recent photos/videos for editing. There are limited amounts of code and other techie stuff on the Linux machine. Since both contain work in process, having regular backups of my files is important to me. The Windows machine also hosts a variety of administrative and personal records, though most of my documents/spreadsheets/presentations are in Google Apps.

I’m not concerned with snapshots of the system and/or applications. It would take only a few hours to rebuild my Windows system from scratch, and even less for the Linux environment. I do a clean Windows install from time to time anyway, just to get rid of all the cruft that seems to build up. I don’t have anything on the Pi cluster that needs to be backed up. I also want to back up my notebook computer when I get it home and connect it to the network. There isn’t much of importance on my mobile devices, though I do periodically copy photos/videos to the NAS.

The NAS has a number of things on it, with different types of material in different logical volumes.

  1. Local backups from my workstations and laptop.
  2. Archived personal photos that are not currently work in process
  3. Archived personal video clips that are not currently work in process
  4. Saved music files, some of these from old CDs, some from a variety of music vendors
  5. Purchased/downloaded video files

Requirements

  1. My saved videos, mostly old movies, don’t need to be backed up. My decision is driven by a few factors:
    • All of it is replaceable, and most of it is streamable. It would not be free, but it wouldn’t be a disaster if I lost it.
    • I haven’t watched most of these in years. In fact, I often wonder why I’m keeping them at all.
    • This is a lot of TB of files, more than anything else.
    • RAID is not a backup, but I have yet to lose anything that was stored on a well-maintained RAID array. This one is now on its third set of drives and second physical device and it’s doing fine. I’m probably rationalizing, but I’m feeling OK about not backing this up.
  2. Personal video clips need to be backed up from their primary storage (the NAS) at least for a while.
    • These are mostly from the GoPros and can’t be reconstructed
    • Once they are no longer used in a project, it’s not clear that I will ever need the raw files again. I’m new to this and I suppose I might revisit some things. For now I prefer to keep these, but am not willing to spend much to do it.
    • The finished, edited videos are stored on my workstation and sometimes uploaded to YouTube, so they reside in the same location as most personal files and will be backed up with them.
    • Given how easy it is to create terabytes of 4K video clips, I expect that over time I’ll end up deleting older raw files, or at least limiting how much effort I put into backing them up. (For now, there’s plenty of NAS storage.)
  3. Personal files (workstation and laptop backups), music and photos should be doubly backed up indefinitely.
    • This is the stuff that I’ll really miss if it’s gone, and some of it, even the music, is not easily replaced or reconstructed.
    • Some of this is business or financial information I need to keep for my records for several years.
    • Photography has always been my “serious” hobby and I’m loath to lose any of my photos. I probably should go through and toss all the ones that clearly don’t matter, but I haven’t. Edited, complete projects are stored on my workstation, so, like the finished videos, they are backed up with my important personal files.
    • These files all together, even the photos, are not a huge amount to store. It’s about 2TB total.

Harsh Reality

Every file loss I’ve ever suffered has been due to operator error. My primary concern is that I inadvertently delete something and want it back within weeks, if not minutes. Hardware failure on one of the workstations is my next concern. Disaster (likely fire) that takes out both my workstation and NAS is the third concern. Hardware failure on the NAS is the final concern. I’d need to suffer two near-simultaneous disk failures in order to lose data, which is why I consider it less of a concern than a fire or other disaster.

Options

For things on the workstations, the NAS can be a reasonable local backup, but is vulnerable to a local disaster like a fire, flood, earthquake, etc.

For things on the NAS (other than workstation backups) the USB drives can be reasonable local backups with the same caveats as above. USB storage is relatively cheap, with a 20TB drive from either WD or Seagate running about $270.

AWS S3 Glacier Flexible Retrieval is currently under $4/month per TB of storage with reasonable numbers of updates/additions. A number of factors go into the total cost, though. That figure doesn’t count the cost of API requests to S3 and doesn’t include data egress should I ever need it. Requests can add a fair amount for the initial load of a large number of small files, and egress would add even more if I ever needed to retrieve everything, but those are (hopefully) one-time costs.
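
To get a rough feel for these numbers, here’s a back-of-the-envelope sketch in Python. It treats $4/TB/month as an upper bound and deliberately ignores request and egress charges. The 2TB volume is only an example input, and the names are illustrative.

```python
GLACIER_USD_PER_TB_MONTH = 4.00  # upper bound quoted above; check current AWS pricing
USB_DRIVE_USD = 270.00           # approximate price of a 20TB USB drive quoted above


def glacier_storage_cost(tb_stored: float, months: int) -> float:
    """Storage-only cost in USD; excludes API request and egress charges."""
    return tb_stored * GLACIER_USD_PER_TB_MONTH * months


# Example: keeping 2TB in Glacier for a year.
yearly = glacier_storage_cost(2, 12)

# How many months of 2TB storage fees add up to the price of one USB drive.
months_to_match_usb = USB_DRIVE_USD / glacier_storage_cost(2, 1)

print(f"2TB for a year: at most ${yearly:.2f}")
print(f"Months until storage fees equal one USB drive: {months_to_match_usb:.1f}")
```

This isn’t an either/or comparison, since the USB drive sits next to the NAS and shares its fate in a fire, but it does show how cheap the off-site copy is for a small volume.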

Decisions

  1. My Windows and Linux workstations and the notebook are backed up to the NAS nightly (or, in the case of the notebook, whenever it connects to the local network). I’ve been using GoodSync software to replicate the files. The backup volume on the NAS is in turn backed up to S3 weekly using the QNAP backup software.
  2. Archived photos and music are on the NAS. They are backed up nightly to a local USB drive, and weekly to S3 using the QNAP software.
  3. Archived personal video clips are backed up nightly to USB. They are not backed up to S3.
  4. My various movies, videos and other miscellaneous video files are not backed up.
  5. As noted above, I don’t save my system configuration, operating system, or other applications. In my case, all these are easy to re-download or recreate.
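
The decisions above boil down to a small policy table. Here it is sketched as Python data, purely as a way to summarize the scheme. GoodSync and the QNAP software are configured through their own interfaces, not through anything like this, and every name below is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupPolicy:
    """One row of the scheme: where a category lives and where copies go."""
    category: str
    primary: str
    nas_copy: Optional[str] = None  # schedule for the copy onto the NAS
    usb_copy: Optional[str] = None  # schedule for the copy onto a USB drive
    s3_copy: Optional[str] = None   # schedule for the copy to S3

    def copies(self) -> int:
        """Number of backup copies beyond the primary."""
        schedules = (self.nas_copy, self.usb_copy, self.s3_copy)
        return sum(s is not None for s in schedules)


POLICIES = [
    # The notebook syncs whenever it joins the network rather than strictly nightly.
    BackupPolicy("workstation and notebook files", "workstation",
                 nas_copy="nightly", s3_copy="weekly"),
    BackupPolicy("archived photos and music", "NAS",
                 usb_copy="nightly", s3_copy="weekly"),
    BackupPolicy("personal video clips", "NAS", usb_copy="nightly"),
    BackupPolicy("movies and other video files", "NAS"),
]

for policy in POLICIES:
    print(f"{policy.category}: {policy.copies()} backup copies")
```

Laid out this way, it’s easy to see that the things I’d really miss have two backup copies, one of them off-site, while the replaceable material has none.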