This section of the wiki is NOT here to help you calculate how much data you want to store on your NAS - only you can estimate how long that piece of string is.
However, the Thing writes that you shouldn't worry too much about the detail, because disk space only comes in big chunks anyway.
Rather, the purpose of this section is to help you translate whatever estimate you come up with into a decision on how much disk space you need to install - because you do NOT want to run out of space through being too conservative, but OTOH you don't want to spend a lot of extra dosh on disk space you will never use. There are several factors that come into this:
Whatever estimate you make, it is likely to be wrong, and history shows that it is almost always an underestimate rather than an overestimate.
Wednesday Addams' rule of thumb is that, whatever estimate you come up with, you should double or triple it to account for both underestimates and future growth.
Sometimes you want to keep multiple copies of data: multiple backup generations, multiple old versions of files so you can go back to them etc.
You can do this in two ways:
Note: If you just want to have identical copies of a file in two datasets in the same pool then (if you are running Dragonfish or later) when you copy the file, ZFS will use a technique called "block cloning" to create the new file by simply pointing it at the same data blocks as the old file. The consequences of this are
Snapshots pretty much do what they say on the tin, which is to take a snapshot of the data in a dataset at a point in time (say every night). However, ZFS does this in a very clever way which does NOT use any disk space (at least not immediately). Extra disk space only gets used when you later delete or replace a file, because the space used by any file still held in one or more Snapshots is NOT released, and so the old file can be recovered at a later point in time.
Snapshots are also a GREAT defence against ransomware attacks. Ransomware encrypts all your files, but so long as you still have a Snapshot that pre-dates the encryption, you can recover your files through the GUI with a click of a button.
Of course, if you kept every snapshot ever made, then nothing would ever really be deleted from the file system and eventually it would run out of space - so when you define a Snapshot you say not only how often it is taken, but also how long it is kept for. And when a Snapshot is deleted, any disk space that was used only for files deleted or changed after that snapshot (i.e. space that is no longer referenced either by the current files or by any other snapshot) is freed up and available for reuse.
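To make the space accounting a bit more concrete, here is a tiny Python sketch of the idea (this is only a toy model of the reference-counting behaviour described above, not how ZFS is actually implemented): a block is only freed once neither the live files nor any remaining snapshot refers to it.

```python
# Toy model of copy-on-write snapshot space accounting (illustrative only,
# not ZFS internals). Each "block" is just a label; a snapshot is a frozen
# record of the block references that existed when it was taken.

live = {"fileA": {"b1", "b2"}, "fileB": {"b3"}}   # current files -> blocks
snapshots = {}                                     # snapshot name -> set of blocks

def take_snapshot(name):
    """Record which blocks the live files reference right now (costs ~no space)."""
    snapshots[name] = set().union(*live.values()) if live else set()

def used_blocks():
    """Blocks that cannot be freed: referenced by live files or any snapshot."""
    refs = set().union(*live.values()) if live else set()
    for snap in snapshots.values():
        refs |= snap
    return refs

take_snapshot("monday")
del live["fileA"]                 # delete a file after the snapshot...
print(sorted(used_blocks()))      # ['b1', 'b2', 'b3'] - b1/b2 still held by 'monday'

del snapshots["monday"]           # expire the snapshot...
print(sorted(used_blocks()))      # ['b3'] - now the space is finally freed
```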
So when it comes to defining how the data will be stored and accessed, you will need to structure the "datasets" so that different classes of data are held separately and can then be subject to different snapshot definitions, which will retain copies for a specific length of time.
Gomez Addams, who is a bit of a wide boy, loves Snapshots because:
The Thing adds the following rules of thumb for Snapshots:
Different types of data (with differing sizes, differing frequencies of change and differing importance) will have different snapshot requirements (frequency, retention period), and since snapshots are taken per dataset, you need to define datasets and network shares in a way that supports these differing snapshot requirements (see the sketch after these rules of thumb).
If you already use file naming to keep multiple backups (by e.g. using versions or dates as part of the filename), you might want to think about whether to use Snapshots instead and avoid the complexities of pruning them yourself.
And, of course, there may still be use-cases (which is the Addams' Family IT support bloke's way of saying "some situations") where you still want to use filename versioning.
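By way of illustration, here is a hedged Python sketch of what a per-dataset snapshot plan might look like. The dataset names, frequencies and retention periods are entirely made up - the point is simply that each class of data gets its own dataset and therefore its own schedule.

```python
# Hypothetical per-dataset snapshot plan (all names and numbers are examples -
# pick values that suit your own data, and set them up as snapshot tasks).

snapshot_plan = {
    # dataset            how often      keep for
    "tank/documents": ("every hour",  "30 days"),   # small, changes often, important
    "tank/photos":    ("every day",   "90 days"),   # large, rarely changes
    "tank/backups":   ("every week",  "8 weeks"),   # already versioned elsewhere
}

for dataset, (frequency, retention) in snapshot_plan.items():
    print(f"{dataset}: snapshot {frequency}, retain for {retention}")
```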
ZFS has built-in compression, with various algorithms that trade off the level of compression against speed & CPU load.
The default level of compression is fast and low impact and typically saves you 25%-33% of disk space, and Wednesday Addams (who is of course a minimalist goth) recommends:
Like all file systems, ZFS allocates disk space in chunks, and the size of the chunks depends on how many non-redundant disks you have in your storage pool (i.e. the stripe size) - and on average each file will only use half of the last stripe that it needs.
That said, stripes are almost always a multiple of 4KB, so for a (say) 3-wide stripe the stripe size is 12KB and you will use on average c. 6KB more storage per file than the file's actual size. However, most files are at least two orders of magnitude larger than this (and often a lot more), so as a % of the file size this overhead is still relatively small.
Similarly, when you store a file, you also need to store details of the file's name, security properties and where the file is stored on the disk. Them anally-retentive IT folks call this Metadata, and the point is that some small amount of disk space is needed to store this stuff.
Uncle Fester's Guide gives some recommendations in the implementation sections on this, but at this stage for the purposes of calculating the amount of disk space you need, The Thing recommends that you allow an extra 10% for this overhead.
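Here is a rough back-of-the-envelope sketch in Python pulling these two overheads together. The 3-wide stripe comes from the example above and the 10% allowance is The Thing's; the 2MB file size and the 4TB data estimate are made-up illustrations.

```python
# Rough allocation-overhead estimate (illustrative numbers only).
KB = 1024

stripe_width = 3                        # non-redundant disks in the example above
stripe_size = stripe_width * 4 * KB     # 12KB stripes
avg_padding = stripe_size / 2           # on average, half of the last stripe is wasted

file_size = 2 * KB * KB                 # a hypothetical 2MB file
padding_pct = avg_padding / file_size * 100
print(f"~{avg_padding / KB:.0f}KB padding on a 2MB file = {padding_pct:.1f}%")

data_estimate_tb = 4                    # whatever you think your data adds up to
with_metadata = data_estimate_tb * 1.10 # The Thing's 10% allowance
print(f"{data_estimate_tb}TB of data -> allow {with_metadata:.1f}TB including metadata")
```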
ZFS does not support defragmentation - because, with the amount of disk space managed by a NAS, this might take years to complete - and has instead been designed NOT to need disks defragmented to consolidate free space, nor to suffer a performance hit if large files are fragmented. The flip side, however, is that if the "pool" utilisation goes over 85% then allocating space for new files slows down considerably.
To take this into account, you need to divide the disk space you think you need by 0.85 (which works out as multiplying by roughly 1.18 - or adding 20% if you prefer round numbers).
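And a quick Python sketch of that arithmetic, carrying on from the illustrative 4.4TB figure above:

```python
# Keeping pool utilisation at or below 85% (illustrative figures).
usable_needed_tb = 4.4           # data estimate including the 10% metadata allowance
pool_size_needed = usable_needed_tb / 0.85
print(f"Need a pool of at least {pool_size_needed:.1f}TB "
      f"(i.e. about {(1 / 0.85 - 1) * 100:.0f}% more than the data itself)")
```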
Gomez Addams has told me that the answer is "A byte, a byte, a byte & a half, half a byte and a byte" but I think this is a misquote of an old joke he heard at school a century or so ago, or perhaps just one of his Dad jokes. Wednesday thinks it's lame either way.
But he does have a point. There are two ways that you can count bytes - either in multiples of 1,000 or in multiples of 1,024 - and whilst this doesn't make much difference in small amounts, when you are talking of trillions of bytes (or, as Thing likes to type, TB for terabytes) these differences mount up. Anyway, some techies with tight sphincters differentiate between these by sticking an eye in the abbreviation - no, not another disembodied body-part friend of Thing's, but the letter "i". Here is an illustration of why this matters:
| Multiples of 1,000 | Multiples of 1,024 | % difference |
| --- | --- | --- |
| 1KB = 1,000B | 1KiB = 1,024B | 2.4% |
| 1MB = 1,000,000B | 1MiB = 1,048,576B | 4.9% |
| 1GB = 1,000,000,000B | 1GiB = 1,073,741,824B | 7.4% |
| 1TB = 1,000,000,000,000B | 1TiB = 1,099,511,627,776B | 10.0% |
So when we start to talk about NAS capacities these differences can be 10%, which is enough to be significant.
Different operating systems report the size of your data using either 1,000s or 1,024s - TrueNAS uses 1,024s, i.e. GiB or TiB. But almost all disk manufacturers quote the size of their disks in 1,000s, i.e. GB or TB.
This means that when you are deciding what disks to buy, you need to add c. 10% to convert your TiB figure (1,024s) into the manufacturers' TB figure (1,000s).
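A small Python helper makes the conversion explicit (the 8TiB figure is just an example):

```python
# Converting the TiB figure TrueNAS reports into the TB figure on a disk's box.
def tib_to_tb(tib):
    """1 TiB = 1024**4 bytes; 1 TB = 1000**4 bytes."""
    return tib * (1024 ** 4) / (1000 ** 4)

pool_needed_tib = 8              # illustrative figure only
print(f"{pool_needed_tib}TiB is about {tib_to_tb(pool_needed_tib):.1f}TB "
      f"of manufacturer-labelled disk space")
```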
Finally, Pugsley Addams (who delights in destroying disks just for fun) wants to briefly mention the single most important factor when determining the physical disk space you need, which is Redundant Disks.
In a normal PC, your data is stored on a single disk, and if that disk fails, all the data on that disk is lost - but because each disk is separate, the data on all the other disks is just fine. However, ZFS can group disks together into vDevs and group (stripe) several vDevs into a Pool - and if you don't have sufficient redundancy, then losing just one disk loses all the data on all the disks in the pool.
The other thing to realise about redundancy is that if you lose a disk and replace it, rebuilding the vDev's redundancy (so-called "resilvering") requires a huge amount of time and effort and puts stress on all the remaining disks, increasing the likelihood that another disk will fail during the resilver - and so for big vDevs (large disks and / or large numbers of disks) you really should consider having redundancy of 2 or 3 disks.
With ZFS there are various ways that you can use additional disks so that if one disk fails (or, if you provide more redundant disks, even if two or three disks fail simultaneously) not only is your data not lost, but your NAS can continue working. And when you replace the failed disk(s), the NAS will automatically recover its redundant state over the next several hours.
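As a very rough sketch of what redundancy costs you in usable capacity, here is a hedged Python example - the disk counts, disk sizes and parity levels below are made-up illustrations, and the real calculation (which belongs in the section on redundant disks) has further ZFS overheads.

```python
# Very rough usable-capacity sketch for a single vDev with parity redundancy.
def approx_usable_tb(disks, disk_size_tb, parity):
    """Roughly (disks - parity) * disk size; ignores ZFS overheads and padding."""
    return (disks - parity) * disk_size_tb

print(approx_usable_tb(disks=5, disk_size_tb=4, parity=1))   # 1 redundant disk: ~16TB
print(approx_usable_tb(disks=5, disk_size_tb=4, parity=2))   # 2 redundant disks: ~12TB
```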
Here are a few rules of thumb:
Our family servant, Lurch, is responsible for keeping our place tidy and for neatly storing all our personal stuff, and since all of the above personal data is going to be accessed over the network, and each individual part of it is going to be accessed only rarely, Lurch normally chooses to store it on spinning Hard Disk Drives (HDDs).
Unfortunately Gomez has been reminiscing again, and seems to have been thinking about one of those old-time guess-the-number parlour tricks that were common family entertainment when he was a kid, before valve radios came along and spoiled things. So, by way of summarising, he says:
Or, combining 4.-7. together, multiply by 1.134375 - or, more sensibly, by 1.14.
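Putting Gomez's summary together with Wednesday's doubling rule gives something like the following sketch; the 2TB starting estimate is made up, while the x2 and x1.14 factors come straight from the sections above.

```python
# Putting the rules of thumb together (illustrative starting figure only).
raw_estimate_tb = 2              # your own guess at how much data you have today

grown_estimate = raw_estimate_tb * 2    # Wednesday's "double it" rule
overheads = grown_estimate * 1.14       # Gomez's combined multiplier from 4.-7.

print(f"Plan for roughly {overheads:.1f}TB of usable space "
      f"(before adding redundant disks)")
```

Swap in your own starting estimate (and triple rather than double it if your data is growing fast) to get the figure you should take into the discussion of redundant disks.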