[pve-devel] [PATCH multiple] btrfs, file system for the brave

Wolfgang Bumiller w.bumiller at proxmox.com
Wed Jun 9 15:18:44 CEST 2021

This is another take at btrfs storage support.
I wouldn't exactly call it great, but I guess it works (although I did
manage to break a few... Then again I also managed to do that with ZFS
(it just took a few years longer there...)).

This one's spread over quite a few repositories, so let's go through
them in apply-order:

* pve-common:

  One nice improvement since the last go around is that by now btrfs
  supports renameat2's `RENAME_EXCHANGE` flag.

  * PATCH 1/1: Syscalls/Tools: add renameat2

    The idea here is to have a more robust "rollback" implementation,
    since "snapshots" in btrfs are really just loosely connected
    subvolumes, and there is no direct rollback functionality.  Instead,
    we simply clone the snapshot we want to roll back to (by making a
    writable snapshot), and then rotate the clone into place before
    cleaning up the now-old version.
    Without `RENAME_EXCHANGE` this rotation required 2 syscalls,
    creating a small window where, if the process is stopped/killed,
    the volume we're working on would not live in its designated place,
    making it somewhat nasty to deal with. Now, the worst that happens
    is an extra left-over snapshot lying around.

* pve-storage:

  * PATCH 1/4: fix find_free_disk_name invocations

    Just a non-issue I ran into (the parameter isn't actually used by
    our implementors currently, but it confused me ;-) ).

  * PATCH 2/4: add BTRFS storage plugin

    The main implementation, with btrfs send/recv saved up for patch 4.
    (There's a note about `mkdir` vs `populate` etc.; I intend to clean
    this up later, we had some off-list discussion about this.)

    Currently, container subvolumes are only allowed to be unsized
    (size zero, like with our plain directory storage subvols). We
    *could* enable quota support with little effort, but quota
    information is lost in send/recv operations, so we would need to
    cover this in our import/export format separately, if we want to.
    (Although I have a feeling it wouldn't be nice for performance.)

  * PATCH 3/4: update import/export storage API

    _Technically_ I *could* do without, but it would be quite
    inconvenient, and the information it adds to the methods is usually
    readily available, so I think this makes sense.

  * PATCH 4/4: btrfs: add 'btrfs' import/export format

    This requires a bit more elbow grease than ZFS, though, so I split
    this out into a separate patch.

* pve-container:

  * PATCH 1/2: migration: fix snapshots boolean accounting

    (The `with_snapshots` parameter is otherwise not set correctly,
    since we handle the base volume last.)

  * PATCH 2/2: enable btrfs support via subvolumes

    Some of this stuff should probably become a storage property...
    For container volumes which aren't _unsized_ this still allocates
    an ext4 formatted raw image. For size=0 volumes we'll have an
    actual btrfs subvolume.

* qemu-server:

  * PATCH 1/1: allow migrating raw btrfs volumes

    Like in pve-container, some of this stuff should probably become a
    storage property...
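
For illustration, the rollback rotation described under pve-common can be
sketched roughly as follows. This is not the code from the patch (which is
Perl); it is a minimal Python sketch that assumes Linux with glibc >= 2.28
(which exports a `renameat2` wrapper). The helper names `rename_exchange` and
`rollback` are hypothetical, and plain directories stand in for btrfs
subvolumes, so the clone step and the final cleanup only approximate
`btrfs subvolume snapshot` and `btrfs subvolume delete`.

```python
import ctypes
import os
import shutil

AT_FDCWD = -100            # resolve relative paths against the cwd (linux/fcntl.h)
RENAME_EXCHANGE = 1 << 1   # atomically swap both paths (linux/fs.h)

# Assumption: the C library exports renameat2 (glibc >= 2.28).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.renameat2.argtypes = [
    ctypes.c_int, ctypes.c_char_p,
    ctypes.c_int, ctypes.c_char_p,
    ctypes.c_uint,
]
_libc.renameat2.restype = ctypes.c_int


def rename_exchange(path_a, path_b):
    """Atomically swap path_a and path_b; both must already exist."""
    ret = _libc.renameat2(
        AT_FDCWD, os.fsencode(path_a),
        AT_FDCWD, os.fsencode(path_b),
        RENAME_EXCHANGE,
    )
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path_a)


def rollback(volume, clone):
    """Rotate `clone` (a writable copy of the snapshot) into `volume`'s place.

    Hypothetical helper: on real btrfs, `clone` would come from a writable
    snapshot, and the cleanup would delete a subvolume rather than a directory.
    """
    # One syscall: the volume path always refers to either the old or the
    # new contents, never to nothing.
    rename_exchange(volume, clone)
    # `clone` now holds the old version. If we get killed before this line,
    # the worst case is a left-over copy lying around.
    shutil.rmtree(clone)
```

Calling `rollback("/path/to/volume", "/path/to/clone")` with two existing
directories on the same file system leaves the clone's contents at the volume
path and removes the previous contents.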

Big Terrifying Risky File System
