This section of the wiki is all about how NOT to lose your data.
All disks fail eventually. Some disks fail quicker than others.
If a small stand-alone disk holding a limited amount of non-vital data fails, it will be a little annoying.
However, if a single disk failure within an integrated group of several (or perhaps many) large drives causes you to lose every single piece of data held across all of those drives - including all the critical data you have ever accumulated - the result can be quite literally life changing. And this does happen.
History is littered with examples of organisations that lost their vital data - see Ma.gnolia (2009), Samsung (2014) and even the Library of Alexandria (300BC) - and Uncle Fester claims that he still remembers the Library of Alexandria fire, though even by his standards this sounds unlikely.
ZFS is designed to group drives together to create one large storage pool - indeed it is designed to potentially group hundreds or even thousands of such drives together - but equally it is designed to ensure that these drives can be configured in a way that you don't lose any data even if one, two or even three drives in any sub-group fail simultaneously. Get this design right and your data can survive almost any isolated disk failure (though of course something like a fire that destroys all the disks at once will still require separate backups); get the design wrong and you may find that the loss of a single disk causes you to lose all your data.
This section describes the types of redundancy that you can create to try to ensure that your data survives disk failures, talks about the performance characteristics of each of these, and gives some rules of thumb about which of these you should use in different circumstances.
It also discusses which types of Disk Drive are suitable for use with ZFS and which types must be avoided - and (without going into technicalities) why this is the case.
But first a brief description (in very simple terms) about how ZFS storage is organised…
The largest unit of storage in ZFS is called a Storage Pool - which in the simplest terms is a collection of disk blocks spread out over several (or many) disk drives. These can be Hard Disk Drives (HDDs) attached via SAS or SATA, or Solid State Drives (SSDs) attached via SATA or NVMe.
Storage Pools consist of one or more Virtual Devices (vDevs), each of which consists of one or more disks. Data vDevs can be of several types:
* a single-disk (stripe) vDev with no redundancy;
* a mirror vDev, where 2 or more drives each hold an identical copy of the data;
* a RAIDZ1, RAIDZ2 or RAIDZ3 vDev, which can survive the loss of 1, 2 or 3 drives respectively;
* a dRAID vDev (a specialised variant touched on briefly later in this page);
plus zero or one of each of several types of very specialised vDev:
* SLOG (separate intent log) vDevs - to speed up synchronous writes;
* L2ARC (cache) vDevs - to extend the read cache;
* Special allocation (metadata) vDevs - to hold metadata (and optionally small blocks) on faster storage;
* Dedup vDevs - to hold deduplication tables.
These special vDevs will be covered in more detail on the very next wiki page - but unless you have VERY special performance needs, most or all of them will NOT be needed and you can skip that page.
As an aside, each storage pool has a ZFS Dataset at its root, and within this initial dataset can be a hierarchy of nested Datasets, each of which can contain normal folders and files and also Zvols (which are virtual disks for want of a better description).
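As a hedged illustration (the pool and dataset names below are invented, and on TrueNAS you would normally do this through the web UI rather than at the command line), datasets and Zvols are created with the zfs command:

```
# Create a nested dataset for ordinary folders and files
zfs create tank/media

# Create a 50GiB Zvol - a block device that can be handed to a VM as a virtual disk
zfs create -V 50G tank/vm-disk1
```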
However what we are concerned about on this page is how you can make an individual vDev resilient by adding extra, redundant drives.
First off, a vDev does not have to have redundancy. If we look at special vDevs, some are so vital to the operation of a storage pool that they pretty much have to be redundant, whilst other types only hold copies or temporary data and can safely be defined without redundancy.
Data vDevs can be non-redundant - and there are several examples of single-disk storage pools where this is a very valid configuration:
But as a general rule of thumb, for most storage pools holding your network shares or VM data, resiliency should pretty much be considered mandatory.
This means you should not:
Resiliency does, of course, mean dedicating expensive drives to redundancy rather than to usable storage, and newcomers often feel that this is an expensive luxury rather than a necessity - particularly once you start using 2 or 3 redundant disks rather than just one. However, this is something you need to fully internalise and accept, because it really is a necessity.
In essence there are two types of redundancy that you can create (hedged command-line sketches of each are shown below):
* Mirrors - where every block of data is written in full to each of the 2 (or more) drives in the vDev, so the data survives for as long as at least one drive in the vDev survives;
* RAIDZ1 / RAIDZ2 / RAIDZ3 - where each block of data is striped across the drives in the vDev together with 1, 2 or 3 parity blocks, so the vDev can survive the simultaneous loss of 1, 2 or 3 drives respectively.
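As a hedged sketch (the pool names and device paths below are invented for illustration - on TrueNAS you would normally create pools through the web UI rather than at the command line), these two types are created like this:

```
# A pool consisting of a single 2-way mirror vDev
# (survives the loss of 1 of its 2 drives)
zpool create fastpool mirror /dev/sda /dev/sdb

# A pool consisting of a single 6-wide RAIDZ2 vDev
# (survives the loss of any 2 of its 6 drives)
zpool create bulkpool raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
```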
In all of these types of redundant data vDev, you should:
Another general rule of thumb is that all the data-vDevs in a storage pool should have a similar type and level of redundancy i.e.:
This is not a technical limitation, but rather a realisation that the reasons for a particular type of resiliency apply to the whole Storage Pool and not to individual vDevs.
This means that you will end up with different pools for different disk technologies (HDD, SATA SSD and NVMe), and with separate pools for high write-performance uses (mirrors) and for general uses (RAIDZ).
The penultimate rule of thumb is that, within a single disk type and redundancy type, it is almost always best to create a single Storage Pool of that type with multiple data-vDevs, rather than several Storage Pools each with a single vDev. A single pool allows ZFS to manage the space and performance across the vDevs for you, whereas with several smaller storage pools of the same type you will have to balance the data across them yourself.
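For example (a hedged sketch - the pool name and device paths are invented, and TrueNAS would normally do this for you through the web UI), a second RAIDZ2 data-vDev can be added to an existing pool rather than creating a second pool, after which ZFS spreads new writes across both vDevs automatically:

```
# Add a second 6-wide RAIDZ2 data vDev to the existing pool 'bulkpool'
zpool add bulkpool raidz2 /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn
```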
One final rule of thumb is that you should not expect to be able to significantly change the size of a vDev:
Normally when a drive fails, you need to physically replace it before ZFS can begin the process of resilvering to bring the degraded vDev back to full redundancy.
However it is possible within a Storage Pool to provide extra drives which are left empty at the start and which can be switched in automatically when a drive fails, so that the resilvering process begins immediately. When the failed drive is eventually replaced, the replacement can then become the new hot-spare drive, ready for the next time a drive fails.
A hot-spare drive can be used by any of the RAIDZx data-vDevs in the storage pool, with the proviso that it cannot be used for any RAIDZ vDev whose drives are bigger than the hot-spare.
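As a hedged sketch of both approaches (pool name and device paths invented; TrueNAS normally handles these operations through the web UI):

```
# Manually replace a failed drive with a new one and start resilvering
zpool replace bulkpool /dev/sdd /dev/sdo

# Or add a hot-spare drive to the pool so that it can be switched in
# automatically as soon as a drive fails
zpool add bulkpool spare /dev/sdp
```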
As an aside, in the most recent versions of TrueNAS SCALE, there is an alternative type of vDev (called Distributed RAID or dRAID) which allows you to use hot spares in a different (and complex) way that reduces the resilvering time.
The author's interpretation of how dRAID works is that it effectively distributes the hot-spare's empty blocks across all the disks in the vDev. When a drive fails, ZFS can resilver into these empty blocks, and because they are spread across all the disks rather than sitting on a single device, the resilvering completes in a much shorter time. This in turn allows each redundancy group within the dRAID vDev to be wider while achieving the same resilvering time, and thus reduces the number of drives you need to dedicate to redundancy across the Storage Pool.
iXsystems recommend that you only use this on Storage Pools with ≥ 100 drives, which puts it outside the scope of this “Beginners Guide”. We really only mention it here in order to save you investigating this unnecessarily.
If your hardware has a specific number of 3.5" hardware slots and (now or eventually) you want to be able to use all these slots, here is a proposed approach for deciding how to group them into vDevs:
As an example, suppose you have 20 slots, and you need 2 of them for a mirror vDev and want RAIDZ2 for the bulk of your storage. You can divide the remaining 18 slots into either:
* 2 x 9-wide RAIDZ2 vDevs (14 slots of data, 4 of redundancy), or
* 3 x 6-wide RAIDZ2 vDevs (12 slots of data, 6 of redundancy, giving greater overall resiliency).
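A hedged sketch of the second of these layouts (pool names and device paths invented; on TrueNAS you would build this through the web UI) might look like this:

```
# Pool for high write-performance uses: a single 2-way mirror vDev (2 slots)
zpool create fastpool mirror /dev/sda /dev/sdb

# Bulk storage pool: three 6-wide RAIDZ2 data vDevs
# (18 slots, of which 6 are used for parity)
zpool create bulkpool \
  raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh \
  raidz2 /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn \
  raidz2 /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt
```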
When discussing use of redundant disks, it is important to warn against a particular hard drive technology called SMR (Shingled Magnetic Recording - which is slightly cheaper to produce than the alternative CMR technology).
The problem with SMR drives is that their bulk random write throughput[1] is very poor - around 10% of an equivalent CMR drive - and whilst this might be completely acceptable in normal usage where writes are sporadic, resilvering requires massive sustained random write throughput, for which the performance of SMR disks is completely inadequate.
This makes SMR drives completely and utterly unsuitable for use with ZFS redundant vDevs[2].
When purchasing HDDs, you MUST ensure that the drives you buy are NOT SMR drives. To do this you will likely need to check the detailed, formal drive specifications issued by the manufacturer, because retail marketing materials often fail to disclose this, and the drives themselves and the packaging they are delivered in often do not state it either.
Sadly, for some manufacturers this is even true of drives in product ranges which are specifically marketed for NAS usage. For example, Western Digital Red drives are sold as suitable for small NAS systems of up to 8 bays, yet the detailed specification PDF fails to mention anywhere that they are unsuitable for ZFS.
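One hedged way to check an existing drive (the device path below is just an example) is to use smartctl to read the drive's exact model number, and then look that model up in the manufacturer's detailed specification sheet to confirm whether it is CMR or SMR:

```
# Print the drive's identity information, including its exact model number,
# then check that model against the manufacturer's specification sheet
smartctl -i /dev/sda
```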
[1] SMR drives typically have a "persistent cache" area of c. 70GB-100GB which is CMR, where writes are staged and later, when the drive is idle, written out asynchronously to the correct (shingled) areas. But if you are doing bulk random writes, allowing no time for this background de-staging, the cache fills up and everything slows to a crawl whilst previous writes are flushed out to make space for new ones. If you want a detailed explanation, here is a Youtube video which explains it in detail.
[2] Here is another Youtube video showing a test of SMR drives in ZFS arrays and just how disastrous they can be (and damning Western Digital for introducing SMR drives into their WD Red line of NAS drives without properly disclosing that they had done so).