[pve-devel] [PATCH docs 09/10] Rewrite ZFS - admin section
Aaron Lauterer
a.lauterer at proxmox.com
Mon Jun 17 15:05:49 CEST 2019
Polished phrasing, restructured `advantages` list and removed double
entries, extended RAID level descriptions, aligned CLI command style
Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
---
local-zfs.adoc | 295 ++++++++++++++++++++++++++++---------------------
1 file changed, 170 insertions(+), 125 deletions(-)
diff --git a/local-zfs.adoc b/local-zfs.adoc
index 13f6050..b0eb3dd 100644
--- a/local-zfs.adoc
+++ b/local-zfs.adoc
@@ -5,106 +5,122 @@ ifdef::wiki[]
:pve-toplevel:
endif::wiki[]
-ZFS is a combined file system and logical volume manager designed by
-Sun Microsystems. Starting with {pve} 3.4, the native Linux
-kernel port of the ZFS file system is introduced as optional
-file system and also as an additional selection for the root
-file system. There is no need for manually compile ZFS modules - all
-packages are included.
-
-By using ZFS, its possible to achieve maximum enterprise features with
-low budget hardware, but also high performance systems by leveraging
-SSD caching or even SSD only setups. ZFS can replace cost intense
-hardware raid cards by moderate CPU and memory load combined with easy
-management.
+General
+~~~~~~~
+
+ZFS is a combined file system and logical volume manager with RAID
+functionality. It was initially developed by Sun Microsystems. The
+Linux port of ZFS is based on the http://open-zfs.org[OpenZFS]
+project. Starting with {pve} 3.4, ZFS was introduced as an optional
+file system and as an additional choice for the root file system. The
+ZFS packages are shipped with {pve} - there is no need to manually
+compile ZFS.
+
+With ZFS it is possible to get enterprise-level storage systems
+without expensive hardware RAID cards, at the cost of only moderate
+CPU and memory load. SSDs can be used to increase performance, either
+as caching devices or by running an SSD-only setup.
.General ZFS advantages
+// In order for the small headline to be formatted correctly there
+// needs to be some normal text before the list.
+The following list is not exhaustive.
+
* Easy configuration and management with {pve} GUI and CLI.
* Reliable
* Protection against data corruption
-* Data compression on file system level
+* Self healing
-* Snapshots
+* Continuous integrity checking
-* Copy-on-write clone
+* Data compression on file system level
-* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
+* Snapshots
-* Can use SSD for cache
+* Copy-on-write clones
-* Self healing
+* Various RAID levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and
+ RAIDZ-3
-* Continuous integrity checking
+* Can use SSDs for caching
* Designed for high storage capacities
-* Protection against data corruption
-
* Asynchronous replication over network
* Open Source
* Encryption
-* ...
-
Hardware
~~~~~~~~
-ZFS depends heavily on memory, so you need at least 8GB to start. In
-practice, use as much you can get for your hardware/budget. To prevent
-data corruption, we recommend the use of high quality ECC RAM.
+The performance of ZFS depends heavily on the available memory. At
+least 8GiB are needed. In practice, get as much RAM as the hardware
+and budget allow. The use of ECC RAM is recommended to reduce the
+chance of data corruption.
-If you use a dedicated cache and/or log disk, you should use an
-enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
-increase the overall performance significantly.
+If a dedicated cache and/or log disk is to be used, it is recommended
+to use enterprise class SSDs (e.g. Intel SSD DC S3700 Series). This
+can increase the overall performance significantly.
-IMPORTANT: Do not use ZFS on top of hardware controller which has its
-own cache management. ZFS needs to directly communicate with disks. An
-HBA adapter is the way to go, or something like LSI controller flashed
-in ``IT'' mode.
+IMPORTANT: It is not recommended to use ZFS on top of a hardware
+controller that has its own cache management. ZFS needs to communicate
+directly with the disks. A Host-Bus-Adapter (HBA) card instead of a
+RAID card is the recommended setup. Some LSI cards can be flashed to
+``IT'' mode.
-If you are experimenting with an installation of {pve} inside a VM
-(Nested Virtualization), don't use `virtio` for disks of that VM,
-since they are not supported by ZFS. Use IDE or SCSI instead (works
-also with `virtio` SCSI controller type).
+When evaluating {pve} inside a virtual machine (nested
+virtualization), do not use `virtio` for the disks of that VM, as it
+is not supported by ZFS. Use `IDE` or `SCSI` instead (this also works
+with the `virtio` SCSI controller type).
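+
+For example, on the outer {pve} host, a disk for such a test VM could
+be attached via SCSI roughly like this (the VM ID, the storage name
+and the disk size of 32GB are only placeholders):
+
+----
+# qm set <vmid> --scsihw virtio-scsi-pci --scsi0 <storage>:32
+----
+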
Installation as Root File System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When you install using the {pve} installer, you can choose ZFS for the
-root file system. You need to select the RAID type at installation
-time:
+ZFS can be selected as the root file system during the installation
+of {pve}. One of the following RAID types needs to be selected:
[horizontal]
-RAID0:: Also called ``striping''. The capacity of such volume is the sum
-of the capacities of all disks. But RAID0 does not add any redundancy,
-so the failure of a single drive makes the volume unusable.
+RAID0:: Also called ``striping''. Stripes the volume across all
+selected disks. The total capacity is the sum of all disks. Does
+**not** have redundancy! The failure of a single drive makes the
+pool unusable.
RAID1:: Also called ``mirroring''. Data is written identically to all
disks. This mode requires at least 2 disks with the same size. The
-resulting capacity is that of a single disk.
+total capacity is that of a single disk.
RAID10:: A combination of RAID0 and RAID1. Requires at least 4 disks.
-RAIDZ-1:: A variation on RAID-5, single parity. Requires at least 3 disks.
+RAIDZ-1:: Comparable to RAID-5. Distributes single parity across
+all disks. Can survive the failure of one disk. The total capacity is
+the sum of all disks minus one disk. Requires at least three disks.
+
+RAIDZ-2:: Like RAIDZ-1 but with double parity. Can survive the failure
+of two disks. The total capacity is the sum of all disks minus two
+disks. Requires at least four disks.
-RAIDZ-2:: A variation on RAID-5, double parity. Requires at least 4 disks.
-RAIDZ-3:: A variation on RAID-5, triple parity. Requires at least 5 disks.
+RAIDZ-3:: Like RAIDZ-1 but with triple parity. Can survive the failure
+of three disks. The total capacity is the sum of all disks minus three
+disks. Requires at least five disks.
The installer automatically partitions the disks, creates a ZFS pool
called `rpool`, and installs the root file system on the ZFS subvolume
`rpool/ROOT/pve-1`.
-Another subvolume called `rpool/data` is created to store VM
-images. In order to use that with the {pve} tools, the installer
+Another subvolume called `rpool/data` is created to store virtual
+machine images. In order to use it with the {pve} tools, the installer
creates the following configuration entry in `/etc/pve/storage.cfg`:
----
@@ -114,7 +130,7 @@ zfspool: local-zfs
content images,rootdir
----
-After installation, you can view your ZFS pool status using the
+After installation, the status of the ZFS pool can be viewed with the
`zpool` command:
----
@@ -136,9 +152,8 @@ config:
errors: No known data errors
----
-The `zfs` command is used configure and manage your ZFS file
-systems. The following command lists all file systems after
-installation:
+The `zfs` command is used to configure and manage ZFS file systems.
+The following command lists all file systems after installation:
----
# zfs list
@@ -154,12 +169,12 @@ rpool/swap 4.25G 7.69T 64K -
Bootloader
~~~~~~~~~~
-The default ZFS disk partitioning scheme does not use the first 2048
-sectors. This gives enough room to install a GRUB boot partition. The
-{pve} installer automatically allocates that space, and installs the
-GRUB boot loader there. If you use a redundant RAID setup, it installs
-the boot loader on all disk required for booting. So you can boot
-even if some disks fail.
+The {pve} installer uses the default ZFS disk partitioning scheme.
+It does not use the first 2048 sectors on the disk. This space is used
+to install the GRUB boot loader. If a redundant disk setup (RAID) is
+used for the root file system, the boot loader will be installed on
+each disk of the RAID. This enables the server to be booted even if
+one of the disks fails.
NOTE: It is not possible to use ZFS as root file system with UEFI
boot.
@@ -168,10 +183,11 @@ boot.
ZFS Administration
~~~~~~~~~~~~~~~~~~
-This section gives you some usage examples for common tasks. ZFS
-itself is really powerful and provides many options. The main commands
-to manage ZFS are `zfs` and `zpool`. Both commands come with great
-manual pages, which can be read with:
+This section provides examples of common tasks. ZFS is very powerful
+and provides many options. The main commands to manage ZFS are
+`zfs` for everything within a pool (file systems, block volumes) and
+`zpool` to manage the pools. Both commands have good and exhaustive
+manual pages. They can be read by running the following commands:
----
# man zpool
@@ -180,105 +196,130 @@ manual pages, which can be read with:
.Create a new zpool
-To create a new pool, at least one disk is needed. The `ashift` should
-have the same sector-size (2 power of `ashift`) or larger as the
-underlying disk.
+To create a new pool, at least one disk is needed. The `ashift` tells
+ZFS which sector size to use: 2 to the power of `ashift` is the sector
+size (e.g. for modern 4k disks: 2 to the power of 12 = 4096).
+The `ashift` value should be selected so that the resulting sector
+size is at least as large as the sector size of the underlying disks.
- zpool create -f -o ashift=12 <pool> <device>
+----
+# zpool create -f -o ashift=12 <pool> <device>
+----
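+
+To check which physical and logical sector sizes the underlying disks
+report before choosing the `ashift` value, `lsblk` can be used, for
+example:
+
+----
+# lsblk -o NAME,PHY-SEC,LOG-SEC
+----
+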
-To activate compression
+To activate compression:
- zfs set compression=lz4 <pool>
+----
+# zfs set compression=lz4 <pool>
+----
.Create a new pool with RAID-0
-Minimum 1 Disk
-
- zpool create -f -o ashift=12 <pool> <device1> <device2>
+Needs at least 1 disk.
+----
+# zpool create -f -o ashift=12 <pool> <device1> <device2>
+----
.Create a new pool with RAID-1
-Minimum 2 Disks
+Needs at least 2 disks.
- zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
+----
+# zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
+----
.Create a new pool with RAID-10
-Minimum 4 Disks
+Needs at least 4 disks.
- zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
+----
+# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
+----
.Create a new pool with RAIDZ-1
-Minimum 3 Disks
+Needs at least 3 disks.
- zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
+----
+# zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
+----
.Create a new pool with RAIDZ-2
-Minimum 4 Disks
+Needs at least 4 disks.
- zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
+----
+# zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
+----
.Create a new pool with cache (L2ARC)
It is possible to use a dedicated cache drive partition to increase
-the performance (use SSD).
+the performance (only recommended with an SSD).
-As `<device>` it is possible to use more devices, like it's shown in
-"Create a new pool with RAID*".
+As shown in ``Create a new pool with RAID'', multiple devices can be
+used in place of the `<device>` placeholder.
- zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
+----
+# zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
+----
.Create a new pool with log (ZIL)
-It is possible to use a dedicated cache drive partition to increase
-the performance(SSD).
+It is possible to use a dedicated log drive partition to increase
+the performance (only recommended with an SSD).
-As `<device>` it is possible to use more devices, like it's shown in
-"Create a new pool with RAID*".
+As shown in ``Create a new pool with RAID'', multiple devices can be
+used in place of the `<device>` placeholder.
- zpool create -f -o ashift=12 <pool> <device> log <log_device>
+----
+# zpool create -f -o ashift=12 <pool> <device> log <log_device>
+----
.Add cache and log to an existing pool
-If you have an pool without cache and log. First partition the SSD in
-2 partition with `parted` or `gdisk`
+It is possible to add a cache and a log device after the pool has
+been created. The size of the log device should be about half the
+size of the available physical memory. Since this usually results in
+a rather small log device, the rest of an SSD can be partitioned and
+used as cache.
-IMPORTANT: Always use GPT partition tables.
+First partition the SSD with `parted` or `gdisk`.
-The maximum size of a log device should be about half the size of
-physical memory, so this is usually quite small. The rest of the SSD
-can be used as cache.
+IMPORTANT: Always use GPT partition tables.
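+
+A minimal sketch of such a partitioning, assuming the SSD shows up as
+the hypothetical device `/dev/sdf` and the host has 16GiB of RAM (so
+roughly an 8GiB log partition), could look like this:
+
+----
+# parted -s /dev/sdf mklabel gpt
+# parted -s /dev/sdf mkpart log 1MiB 8GiB
+# parted -s /dev/sdf mkpart cache 8GiB 100%
+----
+
+The resulting partitions (here `/dev/sdf1` and `/dev/sdf2`) can then
+be used as `<device-part1>` and `<device-part2>` in the next command.
+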
- zpool add -f <pool> log <device-part1> cache <device-part2>
+----
+# zpool add -f <pool> log <device-part1> cache <device-part2>
+----
.Changing a failed device
- zpool replace -f <pool> <old device> <new-device>
+----
+# zpool replace -f <pool> <old device> <new-device>
+----
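+
+After the replacement, ZFS starts to resilver the new device. The
+progress can be watched with, for example:
+
+----
+# zpool status -v <pool>
+----
+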
Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-ZFS comes with an event daemon, which monitors events generated by the
-ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors. Newer ZFS packages ships the daemon in a separate package,
-and you can install it using `apt-get`:
+ZFS comes with an event daemon which monitors events generated by the
+ZFS kernel module. The daemon can send emails on ZFS events like pool
+errors. Newer ZFS packages ship the daemon in a separate package.
+It can be installed with `apt-get`:
----
# apt-get install zfs-zed
----
-To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
-favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
+To activate the daemon it is necessary to edit
+`/etc/zfs/zed.d/zed.rc`. Uncomment the `ZED_EMAIL_ADDR` setting:
--------
ZED_EMAIL_ADDR="root"
--------
-Please note {pve} forwards mails to `root` to the email address
-configured for the root user.
+Please note that {pve} forwards mails to the `root@pam` user and
+therefore to the email address that has been defined during
+installation.
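+
+Depending on the setup, the event daemon may need to be restarted to
+pick up the changed configuration, for example:
+
+----
+# systemctl restart zfs-zed
+----
+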
IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.
@@ -288,43 +329,47 @@ Limit ZFS Memory Usage
~~~~~~~~~~~~~~~~~~~~~~
It is good to use at most 50 percent (which is the default) of the
-system memory for ZFS ARC to prevent performance shortage of the
-host. Use your preferred editor to change the configuration in
-`/etc/modprobe.d/zfs.conf` and insert:
+system memory for the ZFS ARC (cache in RAM) to prevent performance
+shortages on the host. Use your preferred editor to change the
+configuration in `/etc/modprobe.d/zfs.conf` and insert:
--------
options zfs zfs_arc_max=8589934592
--------
-This example setting limits the usage to 8GB.
+This example limits the memory usage of ZFS to 8GiB
+(8 * 1024 * 1024 * 1024 = 8589934592 bytes).
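+
+The current total size of the ARC can be checked at runtime via the
+kernel statistics, for example:
+
+----
+# grep ^size /proc/spl/kstat/zfs/arcstats
+----
+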
[IMPORTANT]
====
-If your root file system is ZFS you must update your initramfs every
-time this value changes:
-
- update-initramfs -u
+If ZFS is used as root file system the 'initramfs' must be updated
+every time this value is changed:
+----
+# update-initramfs -u
+----
====
[[zfs_swap]]
.SWAP on ZFS
-Swap-space created on a zvol may generate some troubles, like blocking the
-server or generating a high IO load, often seen when starting a Backup
-to an external Storage.
+Using a `zvol` as swap space can cause problems, like blocking the
+server or generating a high IO load. This is often seen when starting
+a backup job to an external storage.
-We strongly recommend to use enough memory, so that you normally do not
-run into low memory situations. Should you need or want to add swap, it is
-preferred to create a partition on a physical disk and use it as swapdevice.
-You can leave some space free for this purpose in the advanced options of the
-installer. Additionally, you can lower the
-``swappiness'' value. A good value for servers is 10:
+It is strongly recommended to have enough memory installed in order
+to avoid low memory situations. If swap space cannot be avoided, it
+is recommended to create and use a partition on a physical disk as
+swap space. For this, the installer can be told to leave some space
+on the disks unused, via the 'advanced options' during disk
+selection. Additionally, the ``swappiness'' value can be lowered. A
+good value for servers is 10:
- sysctl -w vm.swappiness=10
+----
+# sysctl -w vm.swappiness=10
+----
-To make the swappiness persistent, open `/etc/sysctl.conf` with
-an editor of your choice and add the following line:
+To make the swappiness persistent, add the following line to
+`/etc/sysctl.conf`:
--------
vm.swappiness = 10
--
2.20.1