To RAID or not to RAID | Karl Kaufmann Online

Posted on 2019-02-03

With the expanding options available for cloud data storage, it’s tempting to avoid local options altogether. While this may be advisable in some cases, there are other factors to consider. These include hosting costs, data access/pricing, security, and data integrity. When all the factors are considered, a Redundant Array of Inexpensive Disks¹ (RAID) may be a viable solution for your use case.

I will preface this by stating that although I have some networking and server configuration experience, I make no claims to being an expert. I wanted a straightforward, reliable solution that would be easy to maintain and service. First and foremost, I'm a designer, and would rather spend my time building creative solutions to design challenges than maintaining and troubleshooting network equipment. Unless you particularly enjoy networking and device protocols, I would advise taking into account your time investment, and potential frustration, when going the locally maintained route.

There’s also the existential aspect: part of what spurred me to look at more robust solutions was the complete failure of a local storage volume. This opened 1TB drive shows telltale scoring on the drive platter (the rings seen below the read/write heads). No practical method would retrieve some of the data, since the magnetic media was scored off the disk surface. While this was quite annoying and time-consuming (hours searching through offline archival backups), it wasn’t catastrophic. It was, however, a stark reminder of what can easily happen when data exists without a solid backup contingency. Device recovery services are available, at a potentially steep cost, but they won’t work in all situations, like the drive shown here.

This led me to examine my options. Since I do design work spanning multiple media, and need quick, regular access to large project files (sometimes 1GB+), as well as a file structure that allows fast scanning and retrieval, I knew that most basic-tier services wouldn’t fit the bill. A rough cost estimate led me to compare the pricing of online data storage vs. local. To further narrow down the comparison, I estimated 4TB as a reasonable size to see me through the foreseeable future.

Below are cost structures from major cloud storage providers, as well as some local NAS (Network Attached Storage) options. Granted, determining pricing is a convoluted process, and it’s possible that better options from the listed providers exist. Your use case may warrant a different path than mine, which called for decently sized, locally available storage. In the interest of simplicity, I looked for straightforward comparisons, and put this together (prices listed at date of article posting):

Cloud Storage—4TB Annual Pricing

Provider	Costs
Amazon S3	$1130
Box Drive	*$540
Dropbox	$720
Google Cloud Storage	*$983
Microsoft Azure Files	$2950
*Data transfer charges may apply

NAS Devices—Purchase Costs

Device	*Cost
Drobo 5N2 5-Bay Standard Edition	$940
QNAP TS-453Be-2G-US 4-Bay	$910
Synology DS418+ 4-Bay	$820
Synology DS918+ 4-Bay	$1,000
*Cost is device + 3 drives

Note: Pricing is based on three 4TB Seagate ST4000NM0035 Enterprise HDDs (Hard Disk Drives). Three drives were chosen since this is the minimum for RAID 5 (and comparable hybrid/custom solutions), where failure of one drive still allows data access, and the array can be rebuilt when a replacement is added, without service interruption. All prices are from Amazon.com as of publication date. As with cloud storage, there are myriad options. Based on recommendations, I used decently reviewed enterprise-class drives for reliability and integrity. Some setups will allow (or ship with) desktop-class drives, which aren’t rated for the continuous uptime of a NAS (and aren’t recommended by most hardware vendors). This may work in your situation, but it’s wise to consider the tradeoffs in reliability. Also, per online discussions on Reddit and elsewhere, there is the roll-your-own route using open source software and custom-sourced hardware, which allows customizable solutions, but offers no dedicated support. Not included in the costs are hubs/routers. For my use case, this was a nominal addition, described below.
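For readers curious why a three-drive RAID 5 array survives the loss of one drive, here is a minimal Python sketch of the underlying idea: parity is the XOR of the data blocks, so any single missing block can be recomputed from the others. The block contents and the helper name `xor_blocks` are made up for illustration, and this covers a single stripe only. A real array rotates parity across all drives and handles this in the controller or operating system, not in user code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three drives: two data blocks plus one parity block.
# The contents are hypothetical placeholders of equal length (16 bytes).
drive_a = b"design-assets-01"
drive_b = b"project-files-02"
parity = xor_blocks([drive_a, drive_b])

# Simulate losing drive_b: rebuild it from the surviving data and the parity.
rebuilt_b = xor_blocks([drive_a, parity])
print(rebuilt_b == drive_b)
```

The same XOR step works no matter which of the three blocks is lost, which is why one failure is tolerated but two are not.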

Online vs. local storage—some factors to consider:

Size of your dataset
How often you need access to data
Remote availability of data
Transfer speeds (especially for large items—and remote connections)
Costs of storage services vs. devices
Costs of traffic bandwidth (for large items/frequent access). It’s also imperative to note that most broadband packages, unless business-class, are asymmetric (slower upload vs. download speeds), and may have data caps, hidden fees, or throttling above a certain amount.
Data protection/integrity. This can even be a factor with major players, such as these events with Azure and AWS
Scalability when datasets increase
Uptime/Reliability (even factoring in hardware failures)
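To put the transfer-speed and asymmetric-broadband factors above into numbers, here is a rough Python sketch. The 10 Mbps upload figure is a hypothetical example, not a measurement from this article, and 1000 Mbps is the theoretical Gigabit Ethernet rate. Real throughput will be lower once protocol overhead and drive speed are factored in.

```python
def transfer_seconds(size_gb, link_mbps):
    """Seconds to move size_gb (decimal gigabytes) over a link rated in megabits/s."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / link_mbps


# A 1GB project file, as mentioned earlier in the article.
upload_seconds = transfer_seconds(1, 10)    # hypothetical 10 Mbps broadband upload
lan_seconds = transfer_seconds(1, 1000)     # theoretical Gigabit Ethernet

print(f"Broadband upload: {upload_seconds / 60:.1f} minutes")
print(f"Gigabit LAN: {lan_seconds:.0f} seconds")
```

Under these assumptions the same file takes minutes to push to a cloud provider and seconds to save locally, which matters if you open and save large files many times a day.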

Based on my use case, costs were pointing more in the direction of local storage. Since most of the large files—base design assets—aren’t shared (and final output is much smaller), third-party transfer wasn’t a major issue. Also, I was looking for a solution that was easy to set up and manage, and offered future upgrades that were reasonable from a cost and time management perspective.

My final setup:

Component	Cost
Synology DS918+ 4-Bay NAS	$550
4TB Seagate ST4000NM0035 Enterprise HDDs X 3	$450
NETGEAR GS105NA 5-Port Gigabit Ethernet Switch	$35
Total	$1,035

Note: Gigabit Ethernet was chosen for its ubiquity, longevity, and compatibility. In my case, I can access the NAS from either 5 GHz Wi-Fi or wired Ethernet. Although most of my hardware is macOS-based, Windows and Linux machines could be added with little issue. Legacy devices can also be used, whereas a Thunderbolt 3 connection might have been a challenge for them. I was able to connect to my existing legacy networking hardware without issue.

Image of Synology DS918+ with hard disk drive.

The proprietary Synology Hybrid RAID (SHR) management was chosen for quick setup, and tracks most closely to RAID 5 (which offers redundancy + seamless drive replacement after failure). In addition, the capacity can be increased when needed, and another drive can be added at a later date to provide a hot spare. A hot spare would allow for seamless integration when an active drive fails, without interruption or user intervention. Since all drives in the NAS are hot-swappable, a failed drive can be replaced without taking the system offline. This image shows the DS918+ with a hard drive and its mounting sled.
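As a rough guide to what the drive configurations above yield, here is a small Python sketch using the standard RAID 5 rule of thumb: one drive's worth of space goes to parity, and a hot spare sits idle until needed. It assumes equal-sized drives and treats SHR as equivalent to RAID 5, which is a simplification. The function name is hypothetical, and the space the NAS actually reports will be somewhat lower due to filesystem overhead and unit conventions.

```python
def raid5_usable_tb(drive_count, drive_tb, hot_spares=0):
    """Approximate usable capacity (TB) of a RAID 5 array of equal-sized drives."""
    active = drive_count - hot_spares  # hot spares hold no data until a failure
    if active < 3:
        raise ValueError("RAID 5 needs at least three active drives")
    return (active - 1) * drive_tb  # one drive's worth of space holds parity


current = raid5_usable_tb(3, 4)                   # three 4TB drives
with_spare = raid5_usable_tb(4, 4, hot_spares=1)  # fourth drive as hot spare
expanded = raid5_usable_tb(4, 4)                  # fourth drive added as capacity

print(current, with_spare, expanded)
```

The fourth bay therefore buys either extra capacity or faster automatic recovery, not both.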

Setup time, after delivery, was about one hour. Instructions were easy to follow, and required no tools to mount the hard drives to their sleds and insert them. After this, drives were formatted and ready for use in about twelve hours. This part of the process requires no user input. Since, at this time, I only intend to use the NAS as local storage, I didn’t enable Web sharing (allows remote access), Web server functionality, or media streaming to the local network.

In two months of use, I have only had the NAS offline for about 15 minutes, and this was to update the proprietary Synology system software, which ran without issue. Aside from that, performance has met my needs and expectations, alleviating concerns of another drive failure (and potential data loss). I can access and search for files the same way I would on any other attached drive.

Based on my cost analysis and estimates, when compared to online services, my investment will have paid for itself in about a year. Most importantly, it won’t incur ongoing service fees or potential broadband costs.
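The payback estimate can be reproduced from the figures in the tables above. This Python sketch divides the one-time cost of my final setup by each provider's annual price. It is a simplification: it ignores electricity, eventual drive replacements, and any cloud data transfer charges, and the function name is just for illustration.

```python
LOCAL_COST = 1035  # NAS + three drives + switch, from "My final setup"

# Annual 4TB pricing from the cloud storage table above (USD).
CLOUD_ANNUAL = {
    "Amazon S3": 1130,
    "Box Drive": 540,
    "Dropbox": 720,
    "Google Cloud Storage": 983,
    "Microsoft Azure Files": 2950,
}


def payback_months(upfront, annual_fee):
    """Months of cloud fees needed to equal a one-time hardware purchase."""
    return 12 * upfront / annual_fee


for provider, fee in CLOUD_ANNUAL.items():
    print(f"{provider}: {payback_months(LOCAL_COST, fee):.1f} months")
```

By this measure the hardware pays for itself in roughly 11 to 13 months against Amazon S3 and Google Cloud Storage, and takes closer to two years against the cheapest option listed, Box Drive.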

Conclusion

It’s important to factor in the many facets of your particular use case, as well as reasonably expected future needs. In my case, local storage via RAID offers an easily maintainable solution with room for future expansion, while keeping costs under control.

Final caveat: whether cloud or local, most data storage is meant to keep you up and running, not to be an archival backup. For critical data safekeeping, consider offline backups such as DVD-R or Blu-ray (BD-R) for large datasets that can’t be replaced.

I hope that my time (and frustration) helps inform your considerations when looking at reliable and cost-effective data storage solutions.

Author note: As of the time of purchase of this equipment and writing this article, I have no financial or promotional connection to any manufacturer or service provider referenced in this article.

Notes

¹ This version of the acronym can be a point of contention. It was the original definition, as coined by David A. Patterson, Garth Gibson, and Randy H. Katz in their 1988 research paper A Case for Redundant Arrays of Inexpensive Disks. Due to some marketing concerns, others later referred to RAID as a Redundant Array of Independent Disks, presumably as a way to add value, and hopefully charge a premium to enterprise clients. Although RAID was an expensive investment at one time, this no longer has to be the case.
