On the previous wiki page we discussed the various types of redundant vDev - mirrors and RAIDZ1/2/3.
This page discusses some other, special types of vDev that ZFS provides for improving performance in specific circumstances:
To save you some time and effort in reading and digesting the remainder of this page, here are some general rules of thumb as to why most people don't need specialist vDevs at all:
In the vast majority of cases, none of these special vDevs is needed or a good idea, and the use cases that do need them are usually pretty specialised. However, it is possible that you have one of these use cases, in which case one or more of these vDevs may be of use to you.
If you really feel that you need any of these specialist devices, you should be considering NVMe disks (which are substantially faster than SATA SSDs), in which case you may need a motherboard with multiple native NVMe slots (i.e. slots each with a direct connection to dedicated CPU PCIe lanes). If you decide to use e.g. a PCIe card carrying multiple NVMe slots, then do make sure that the card has enough PCIe lanes to support the performance needs of those slots, that both the motherboard and the card support bifurcation, and that the M.2 slots are not shared through a multiplexing chipset.
As previously stated, ZFS comes with an Adaptive Replacement Cache (ARC) as standard, which uses all the otherwise unused memory in your system as a cache for both metadata (data about where your actual files are stored) and the most recently read file data, so that if the same data is requested again the request can be satisfied from memory rather than having to go to disk, which is several orders of magnitude slower.
The most important data to hold in ARC is the metadata, because this relatively small amount of data is needed to locate where the actual data is stored on disk and is therefore used relatively frequently. Holding actual file data in ARC is less important in most home/small business servers, partly because once the first read request for a file has been received, ZFS keeps pre-fetching the rest of the file and storing it in cache in anticipation that the remainder of the file will be requested next - and thus all subsequent requests end up being satisfied from ARC.
The consequence is that even a relatively moderate amount of ARC can often provide very high hit ratios in a typical home / small business environment - the author's very modest 5GB ARC delivers a c. 99.5% hit rate.
If your ARC hit ratio is not as high as you would like, then adding more memory to your system can easily be the cheapest and most effective way to improve your overall I/O performance.
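If you want to check your own hit ratio, OpenZFS ships a couple of reporting tools that can be run from the TrueNAS shell (the tool names below are as found on TrueNAS SCALE and may differ slightly on other platforms):

```bash
# Summarise ARC size, hit ratio and prefetch statistics
arc_summary

# Or watch ARC hits and misses update in real time, one line per second
arcstat 1
```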
Because it is held in RAM, standard ARC is not persistent - in other words it does NOT survive a reboot or power-off, and needs to be repopulated from the original disks every time either of these happens.
A Level 2 Adaptive Replacement Cache (L2ARC) is an SSD (NVMe or SATA) which stores copies of the most frequently accessed metadata and data blocks so that they can be used to repopulate the ARC faster than reading from slower HDDs (or SSDs), and - unlike the ARC - the L2ARC is persistent across reboots and power downs.
The downside of an L2ARC is that it reduces the amount of memory available for standard ARC because the L2ARC index needs to be held in memory.
Don't consider an L2ARC unless your server has at least 60GB and preferably 120GB of free memory after O/S, services, apps and VMs are loaded.
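If you do decide to experiment, an L2ARC is one of the lower-risk vDevs to try because a cache device can be added to and removed from a pool at any time without affecting your data. On TrueNAS the supported route is the web UI, but as a rough sketch the underlying ZFS commands look like this (the pool name tank and the device path are purely illustrative):

```bash
# Add a single NVMe device to the pool "tank" as an L2ARC (cache) vDev
zpool add tank cache /dev/nvme0n1

# See how full the cache device is and how it is performing
zpool iostat -v tank

# Remove it again if it is not earning its keep
zpool remove tank /dev/nvme0n1
```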
So, for some specific workloads which have occasional but repeating access to the same data, an L2ARC can potentially be quite effective. Potential examples:
There is no point in adding an L2ARC unless:
A Separate Log (SLOG) device can provide performance benefits for response-time-critical or I/O-intensive synchronous writes.
In an asynchronous write, ZFS holds the data in memory, sends a success response to the client immediately, and writes the data to disk every 5 seconds in an atomic transaction group (which preserves ZFS consistency even if the system crashes part way through this bulk write). So many application writes are combined into a single, larger, more efficient disk write every few seconds.
In a synchronous write, ZFS writes the data to disk (in a special area called the ZFS Intent Log or ZIL), and only when this is complete does it send a success response to the client; then, as with asynchronous writes, the data is written out in transaction groups every five seconds. So for synchronous writes, every application write results in an actual ZIL write to disk, plus the same larger bulk write every few seconds as for asynchronous writes.
Because of this, asynchronous writes are 10x to 100x faster and less resource-intensive than synchronous writes.
There are many situations where writes are sent as synchronous writes by default even though this guarantee is not actually needed. In these situations it is possible to configure e.g. the Zvol or dataset as asynchronous, and/or the NFS share as asynchronous, and/or the client mount of that share as asynchronous.
SMB access from Windows is also always asynchronous.
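Whether ZFS honours a synchronous write request is controlled per dataset or Zvol by the sync property. As a sketch (the dataset name tank/mydata is illustrative, and only relax this setting if you are certain you can tolerate losing the last few seconds of writes after a crash or power cut):

```bash
# Show the current setting: standard (honour sync requests), always or disabled
zfs get sync tank/mydata

# Treat every write to this dataset as asynchronous
zfs set sync=disabled tank/mydata

# Or the opposite: force every write to be synchronous (useful when testing a SLOG)
zfs set sync=always tank/mydata
```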
The ZIL is never normally read - it is read only at system start-up (e.g. after a crash or power loss) to replay any acknowledged writes that had not yet been written out in a transaction group.
Normally the ZIL is on the same disks as the data. With an SLOG, the ZIL is on a separate SLOG device, normally substantially faster than an HDD. This provides two benefits:
An SLOG will not benefit:
An SLOG will potentially benefit:
In practice, there is no point in adding a SLOG unless:
Additionally:
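For reference, attaching a dedicated log device to an existing pool looks roughly like this at the command line (pool and device names are illustrative, and on TrueNAS itself the web UI is the supported way to do this):

```bash
# Add a mirrored pair of NVMe devices as the SLOG for pool "tank"
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# A log vDev can be removed again later if it turns out not to help
# (use the vDev name shown by "zpool status", e.g. mirror-1)
zpool remove tank mirror-1
```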
ZFS normally stores the metadata for your storage area - which describes which files are where - as part of the storage area itself. However, you can use a Special vDev (more accurately a Special Allocation vDev) to store this metadata for HDD storage areas separately on a faster SSD (NVMe or possibly SATA), which speeds up how quickly TrueNAS can locate where on the HDD your data is stored.
This type of vDev works by diverting requests to allocate a new metadata block so that they are satisfied from this SSD vDev rather than from the main data vDev(s). If you set the dataset to do so, it can also divert requests to allocate blocks for a small file (≤ a defined size) to the SSD vDev as well.
This is very different from an L2ARC, which holds only a copy of the metadata or file blocks. A Special (Allocation) vDev holds the primary and ONLY copy of the pool's metadata, and if this vDev is lost (e.g. because it is non-redundant and the only device it is held on fails) then the entirety of the pool's data will be irrevocably lost too.
For this reason it is essential that a Special (Metadata) vDev is 2x (or even 3x) mirrored - and the simple starting point is to use the same redundancy as the data vDev(s).
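For illustration, a special vDev is added much like any other vDev, and small-file diversion is then enabled per dataset via the special_small_blocks property. The pool, device and dataset names below are examples only, and zpool may require -f if the special vDev's redundancy does not match that of your data vDevs:

```bash
# Add a mirrored pair of SSDs as a special (metadata) vDev for pool "tank"
zpool add tank special mirror /dev/sdx /dev/sdy

# Optionally also store small files (here, blocks of 32KiB or less) on the SSDs
zfs set special_small_blocks=32K tank/mydata

# Note: unlike cache and log vDevs, a special vDev generally cannot be removed
# again once added to a pool that contains RAIDZ data vDevs
```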
In practice, there is no point in adding metadata SSDs unless:
Additionally:
Deduplication works by finding ZFS blocks which have exactly the same contents, and then adjusting the metadata of the files that use these identical blocks so that they all point to a single shared block, allowing the duplicate block(s) to be returned to the free space list.
However, this means that a great deal more care has to be taken when these files are rewritten, in order to ensure that a change to a block in one file does not inadvertently affect the matching blocks in other files.
Additionally, once you have added deduplication special devices to a pool, they cannot later be removed from the pool.
Deduplication is a VERY intensive operation, requiring significant memory, fast SSDs and quite a lot of CPU. Consequently, ZFS Deduplication is a highly specialised solution to what is typically a large enterprise requirement.
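If you are still tempted, you can at least estimate what deduplication would actually save on an existing pool before switching it on, using zdb's dedup simulation - though be warned that the simulation itself can take a long time and a lot of RAM on a large pool (and on TrueNAS you may need to point zdb at the system's zpool.cache file with -U):

```bash
# Simulate deduplication on pool "tank" and print a histogram plus the
# estimated dedup ratio it would achieve - without changing any data
zdb -S tank

# If dedup is already enabled somewhere in the pool, show the ratio achieved
zpool get dedupratio tank
```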
Uncle Fester therefore recommends that you avoid deduplication like the plague - and he can remember full well just how bad the plague was when it killed half his family in the 17th century (and he sneers at Covid as being a pale imitation!!).
If you are thinking about adding any type of special vDev, then think about it again: in most typical home / small business environments they are not only unnecessary but also add to the risk and complexity of the solution - and thus to the difficulty of recovering your pools when things go wrong.
But if you genuinely believe that they would be beneficial to your own use case, then before you go ahead please do seek advice in the TrueNAS forums on both 1) whether they are indeed necessary or a good idea; and 2) the best way to configure them.
Prev - 2.2.4 Redundant disks | Index | Next - 2.3 Choosing your Server Hardware |