[pve-devel] [PATCH pve-cluster 15/15] pmxcfs-rs: add project documentation

Kefu Chai k.chai at proxmox.com
Tue Jan 6 15:24:39 CET 2026


From: Kefu Chai <tchaikov at gmail.com>

---
 src/pmxcfs-rs/ARCHITECTURE.txt | 350 +++++++++++++++++++++++++++++++++
 src/pmxcfs-rs/README.md        | 235 ++++++++++++++++++++++
 2 files changed, 585 insertions(+)
 create mode 100644 src/pmxcfs-rs/ARCHITECTURE.txt
 create mode 100644 src/pmxcfs-rs/README.md
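
Note on the README's logging comparison: it states that the eight qb_log severities map onto the five `tracing` levels, but does not spell the mapping out. The sketch below shows one plausible many-to-one mapping. It is an assumption made for illustration, not the mapping used by the patch; `SyslogSeverity`, `TracingLevel`, and `map_severity` are hypothetical names, and the real code would use `tracing::Level` rather than a local enum.

```rust
/// The eight syslog-style severities the README lists for the C daemon.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyslogSeverity {
    Emerg,
    Alert,
    Crit,
    Err,
    Warning,
    Notice,
    Info,
    Debug,
}

/// Local stand-in for the five levels of the `tracing` ecosystem.
#[allow(dead_code)] // `Trace` has no syslog counterpart in this sketch
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TracingLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Assumed mapping: everything at error severity or worse collapses to
/// `Error`, and `Notice` is folded into `Info`.
fn map_severity(sev: SyslogSeverity) -> TracingLevel {
    match sev {
        SyslogSeverity::Emerg
        | SyslogSeverity::Alert
        | SyslogSeverity::Crit
        | SyslogSeverity::Err => TracingLevel::Error,
        SyslogSeverity::Warning => TracingLevel::Warn,
        SyslogSeverity::Notice | SyslogSeverity::Info => TracingLevel::Info,
        SyslogSeverity::Debug => TracingLevel::Debug,
    }
}

fn main() {
    let all = [
        SyslogSeverity::Emerg,
        SyslogSeverity::Alert,
        SyslogSeverity::Crit,
        SyslogSeverity::Err,
        SyslogSeverity::Warning,
        SyslogSeverity::Notice,
        SyslogSeverity::Info,
        SyslogSeverity::Debug,
    ];
    for sev in all {
        println!("{:?} -> {:?}", sev, map_severity(sev));
    }
}
```

Under this assumed mapping, a log monitor that previously distinguished `CRIT` from `ERR` would see both as `ERROR`, which is one way the "log parsers need update" impact noted in the README could show up.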

diff --git a/src/pmxcfs-rs/ARCHITECTURE.txt b/src/pmxcfs-rs/ARCHITECTURE.txt
new file mode 100644
index 00000000..2854520b
--- /dev/null
+++ b/src/pmxcfs-rs/ARCHITECTURE.txt
@@ -0,0 +1,350 @@
+================================================================================
+                     pmxcfs-rs Architecture Overview
+================================================================================
+
+                         Crate Dependency Graph
+================================================================================
+
+                        +-------------------+
+                        | pmxcfs-api-types  |
+                        | (Shared Types)    |
+                        +-------------------+
+                                 ^
+                                 |
+          +----------------------+----------------------+
+          |                      |                      |
+          |                      |                      |
++---------+---------+  +---------+---------+  +---------+---------+
+| pmxcfs-config     |  | pmxcfs-memdb      |  | pmxcfs-rrd        |
+| (Configuration)   |  | (SQLite DB)       |  | (RRD Files)       |
++-------------------+  +-------------------+  +-------------------+
+          ^                      ^                      ^
+          |                      |                      |
+          |         +------------+------------+         |
+          |         |                         |         |
++---------+---------+              +---------+---------+
+| pmxcfs-ipc        |              | pmxcfs-status     |
+| (libqb Server)    |              | (VM/Node Status)  |
++-------------------+              +-------------------+
+          ^                                  ^
+          |                                  |
+          |         +------------------------+
+          |         |
++---------+---------+
+| pmxcfs-logger     |
+| (Cluster Log)     |
++-------------------+
+          ^
+          |
++---------+---------+              +-------------------+
+| pmxcfs-dfsm       |              | pmxcfs-services   |
+| (State Machine)   |              | (Service Mgmt)    |
++-------------------+              +-------------------+
+          ^                                  ^
+          |                                  |
+          +------------------+---------------+
+                             |
+                   +---------+---------+
+                   |     pmxcfs        |
+                   |   (Main Daemon)   |
+                   +-------------------+
+
+
+================================================================================
+                          Component Descriptions
+================================================================================
+
+pmxcfs-api-types
+    Shared types, errors, and constants used across all crates
+    - Error types (PmxcfsError)
+    - Common data structures
+    - VmType enum (Qemu, Lxc)
+
+pmxcfs-config
+    Corosync configuration parsing and management
+    - Reads /etc/corosync/corosync.conf
+    - Extracts cluster configuration (nodes, quorum, etc.)
+    - Provides Config struct
+
+pmxcfs-memdb
+    In-memory database with SQLite persistence
+    - SQLite schema version 5 (C-compatible)
+    - FUSE plugin system (6 functional + 4 link plugins)
+    - Key-value storage
+    - Version tracking
+
+pmxcfs-rrd
+    Round-Robin Database file management
+    - RRD file creation and updates
+    - Schema definitions (CPU, memory, network, etc.)
+    - Format migration (v1/v2/v3)
+    - rrdcached integration
+
+pmxcfs-status
+    Cluster status tracking
+    - VM/CT registration and tracking
+    - Node online/offline status
+    - RRD data collection
+    - Cluster log storage
+
+pmxcfs-ipc
+    libqb-compatible IPC server
+    - Unix socket server (@pve2)
+    - Wire protocol compatibility with libqb clients
+    - QB_IPC_SOCKET implementation
+    - 13 IPC operations (version, get, set, mkdir, etc.)
+
+pmxcfs-logger
+    Cluster log with distributed synchronization
+    - Ring buffer storage (50,000 entries)
+    - Deduplication
+    - Binary message format (32-byte aligned)
+    - Multi-node synchronization
+
+pmxcfs-dfsm
+    Distributed Finite State Machine
+    - State synchronization via Corosync CPG
+    - Message ordering and queuing
+    - Leader-based updates
+    - Membership change handling
+    - Services:
+      * ClusterDatabaseService (MemDB sync)
+      * StatusSyncService (Status sync)
+
+pmxcfs-services
+    Service lifecycle management framework
+    - Automatic retry logic
+    - Service dependencies
+    - Graceful shutdown
+
+pmxcfs (main daemon)
+    Main binary that integrates all components
+    - FUSE filesystem operations
+    - Corosync/CPG integration
+    - IPC server lifecycle
+    - Plugin system
+    - Daemon process management
+
+
+================================================================================
+                          Data Flow: Write Operation
+================================================================================
+
+User/API
+   |
+   | write to /etc/pve/nodes/node1/qemu-server/100.conf
+   |
+   v
+FUSE Layer (pmxcfs::fuse::filesystem)
+   |
+   | filesystem::write()
+   |
+   v
+MemDB (pmxcfs-memdb)
+   |
+   | memdb.set(path, data)
+   | Update SQLite database
+   |
+   v
+DFSM (pmxcfs-dfsm)
+   |
+   | dfsm.broadcast_update(FuseMessage::Write)
+   |
+   v
+Corosync CPG
+   |
+   | CPG multicast to all nodes
+   |
+   v
+All Cluster Nodes
+   |
+   | Receive CPG message
+   | Apply update to local MemDB
+   | Update FUSE filesystem
+
+
+================================================================================
+                       Data Flow: Cluster Log Entry
+================================================================================
+
+Local Log Event
+   |
+   | cluster log write
+   |
+   v
+Logger (pmxcfs-logger)
+   |
+   | Add to ring buffer
+   | Check for duplicates
+   |
+   v
+Status (pmxcfs-status)
+   |
+   | Store in status subsystem
+   |
+   v
+DFSM (pmxcfs-dfsm)
+   |
+   | Broadcast via StatusSyncService
+   |
+   v
+Corosync CPG
+   |
+   | Multicast to cluster
+   |
+   v
+All Nodes
+   |
+   | Receive and merge log entries
+
+
+================================================================================
+                         Data Flow: IPC Request
+================================================================================
+
+Perl Client (PVE::IPCC)
+   |
+   | libqb IPC request (e.g., get("/nodes/localhost/qemu-server/100.conf"))
+   |
+   v
+IPC Server (pmxcfs-ipc)
+   |
+   | Parse libqb wire protocol
+   | Route to appropriate handler
+   |
+   v
+MemDB (pmxcfs-memdb)
+   |
+   | memdb.get(path)
+   | Query SQLite or plugin
+   |
+   v
+IPC Server
+   |
+   | Format libqb response
+   |
+   v
+Perl Client
+   |
+   | Receive data
+
+
+================================================================================
+                      Initialization Sequence
+================================================================================
+
+1. Parse command line arguments
+   - Debug mode, local mode, paths, etc.
+
+2. Set up logging (tracing)
+   - journald integration
+   - Environment filter
+   - .debug file toggle support
+
+3. Initialize MemDB
+   - Open/create SQLite database
+   - Initialize schema (version 5)
+   - Register plugins
+
+4. Load Corosync configuration
+   - Parse corosync.conf
+   - Extract node info, quorum settings
+
+5. Initialize Status subsystem
+   - Set up VM/CT tracking
+   - Initialize RRD storage
+   - Set up cluster log
+
+6. Create DFSM
+   - Initialize state machine
+   - Set up CPG handler
+   - Register callbacks (MemDbCallbacks, StatusCallbacks)
+
+7. Start Services
+   - ClusterDatabaseService (MemDB sync)
+   - StatusSyncService (Status sync)
+   - QuorumService (quorum monitoring)
+   - ClusterConfigService (config sync)
+
+8. Initialize IPC Server
+   - Create Unix socket (@pve2)
+   - Set up request handlers
+   - Start listening
+
+9. Mount FUSE Filesystem
+   - Create mount point (/etc/pve)
+   - Initialize FUSE operations
+   - Start FUSE event loop
+
+10. Enter main event loop
+    - Handle DFSM messages
+    - Process IPC requests
+    - Service FUSE operations
+    - Monitor quorum
+
+
+================================================================================
+                        Key Design Patterns
+================================================================================
+
+Trait-Based Abstraction
+    - DFSM uses Callbacks trait for MemDB/Status updates
+    - Enables testing with mock implementations
+    - Clean separation of concerns
+
+Service Framework
+    - pmxcfs-services provides retry logic
+    - Services can be started/stopped independently
+    - Automatic error recovery
+
+Plugin System
+    - MemDB supports dynamic plugins
+    - Functional plugins: Generate content on-the-fly
+    - Link plugins: Symlinks to other paths
+    - Examples: .version, .members, .vmlist, etc.
+
+Wire Protocol Compatibility
+    - IPC server implements libqb wire protocol
+    - Binary compatibility with C libqb clients
+    - Enables Perl tools (PVE::IPCC) to work unchanged
+
+Async Runtime
+    - tokio for async I/O
+    - Non-blocking operations
+    - Efficient resource usage
+
+
+================================================================================
+                          Thread Model
+================================================================================
+
+Main Thread
+    - FUSE event loop (blocking)
+    - Handles filesystem operations
+
+Tokio Runtime
+    - IPC server (async)
+    - DFSM message handling (async)
+    - Service tasks (async)
+    - CPG message processing
+
+Background Threads
+    - SQLite I/O (blocking, offloaded)
+    - RRD file writes (blocking)
+
+
+================================================================================
+                          Testing
+================================================================================
+
+Unit Tests
+    - Per-crate unit tests with mock implementations
+    - Run with: cargo test --workspace
+
+Integration Tests
+    - Comprehensive test suite in integration-tests/ directory
+    - Single-node, multi-node, and mixed C/Rust cluster tests
+    - See integration-tests/README.md for full documentation
+
+
+================================================================================
diff --git a/src/pmxcfs-rs/README.md b/src/pmxcfs-rs/README.md
new file mode 100644
index 00000000..4ad846f3
--- /dev/null
+++ b/src/pmxcfs-rs/README.md
@@ -0,0 +1,235 @@
+# pmxcfs-rs
+
+## Executive Summary
+
+pmxcfs-rs is a complete rewrite of the Proxmox Cluster File System from C to Rust, achieving full functional parity while maintaining wire-format compatibility with the C implementation. The implementation has passed comprehensive single-node and multi-node integration testing.
+
+**Overall Completion**: All subsystems implemented
+- All core subsystems implemented and tested
+- Wire protocol compatibility verified
+- Comprehensive test coverage (24 integration tests + extensive unit tests)
+- Production client compatibility confirmed
+- Multi-node cluster functionality validated
+
+---
+
+## Component Status
+
+### Workspace Structure
+
+pmxcfs-rs is organized as a Rust workspace with 10 crates:
+
+| Crate | Purpose |
+|-------|---------|
+| `pmxcfs` | Main daemon binary |
+| `pmxcfs-config` | Configuration management |
+| `pmxcfs-api-types` | Shared types and errors |
+| `pmxcfs-memdb` | Database with SQLite backend |
+| `pmxcfs-dfsm` | Distributed state machine + CPG |
+| `pmxcfs-rrd` | RRD file persistence |
+| `pmxcfs-status` | Status monitoring + RRD |
+| `pmxcfs-ipc` | libqb-compatible IPC server |
+| `pmxcfs-services` | Service lifecycle framework |
+| `pmxcfs-logger` | Cluster log + ring buffer |
+
+### Compatibility Matrix
+
+| Component | Notes |
+|-----------|-------|
+| **FUSE Filesystem** | All operations implemented |
+| **Database (MemDB)** | SQLite schema compatible |
+| **Cluster Communication** | CPG/Quorum via Corosync |
+| **DFSM State Machine** | Binary message format compatible |
+| **IPC Server** | Wire protocol verified with libqb clients |
+| **Plugin System** | All 10 plugins (6 func + 4 link) with write support |
+| **RRD Integration** | Format migration implemented |
+| **Status Subsystem** | VM list, config tracking, cluster log |
+
+---
+
+## Design Decisions and Notable Differences
+
+### 1. IPC Protocol: Partial libqb Implementation
+
+**Decision**: Implement a libqb-compatible wire protocol without using the libqb library directly.
+
+**C Implementation**:
+- Uses libqb library directly (`libqb0`, `libqb-dev`)
+- Full libqb feature set (SHM ring buffers, POSIX message queues, etc.)
+- IPC types: `QB_IPC_SOCKET`, `QB_IPC_SHM`, `QB_IPC_POSIX_MQ`
+
+**Rust Implementation**:
+- Custom implementation of libqb wire protocol
+- Only implements `QB_IPC_SOCKET` type (Unix datagram sockets + shared memory control files)
+- Compatible handshake, request/response structures
+- Verified with both libqb C clients and production Perl clients (PVE::IPCC)
+
+**Rationale**:
+- libqb has no Rust bindings and FFI would be complex
+- pmxcfs only uses `QB_IPC_SOCKET` type in production
+- Wire protocol compatibility is what matters for clients
+- Simpler implementation, easier to maintain
+
+**Compatibility Impact**: **None** - All production clients work identically
+
+**Reference**:
+- C: `src/pmxcfs/server.c` (uses libqb API)
+- Rust: `src/pmxcfs-rs/pmxcfs-ipc/src/server.rs` (custom implementation)
+- Verification: `pmxcfs-ipc/tests/qb_wire_compat.rs` (all tests passing)
+
+---
+
+### 2. Logging System: tracing vs qb_log
+
+**Decision**: Use Rust `tracing` ecosystem instead of libqb's `qb_log`.
+
+**C Implementation**:
+- Uses `qb_log` from libqb for all logging
+- Log levels: `QB_LOG_EMERG`, `QB_LOG_ALERT`, `QB_LOG_CRIT`, `QB_LOG_ERR`, `QB_LOG_WARNING`, `QB_LOG_NOTICE`, `QB_LOG_INFO`, `QB_LOG_DEBUG`
+- Output: syslog + stderr
+- Runtime control: Write to `/etc/pve/.debug` file (0 = info, 1 = debug)
+- Format: `[domain] LEVEL: message (file.c:line:function)`
+
+**Rust Implementation**:
+- Uses `tracing` crate with `tracing-subscriber`
+- Log levels: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
+- Output: journald (via `tracing-journald`) + stdout
+- Runtime control: Same mechanism - `.debug` plugin file (0 = info, 1 = debug)
+- Format: `[timestamp] LEVEL module::path: message`
+
+**Key Differences**:
+
+| Aspect | C (qb_log) | Rust (tracing) | Impact |
+|--------|-----------|----------------|--------|
+| **Log format** | `[domain] INFO: msg (file.c:123)` | `2025-11-14T10:30:45 INFO pmxcfs::module: msg` | Log parsers need update |
+| **Severity levels** | 8 levels (syslog standard) | 5 levels (standard Rust) | Mapping works fine |
+| **Destination** | syslog | journald (systemd) | Both queryable with `journalctl` |
+| **Runtime toggle** | `/etc/pve/.debug` | Same | **No change** |
+| **CLI flag** | `-d` or `--debug` | Same | **No change** |
+
+**Rationale**:
+- `tracing` is the Rust ecosystem standard
+- Better async/structured logging support
+- No FFI to libqb needed
+- Integrates with systemd/journald natively
+- Same user-facing behavior (`.debug` file toggle)
+
+**Compatibility Impact**: **Minor** - Log monitoring scripts may need format updates
+
+**Migration**:
+```bash
+# Old C logs (syslog)
+journalctl -u pve-cluster | grep pmxcfs
+
+# New Rust logs (journald, same command works)
+journalctl -u pve-cluster | grep pmxcfs
+```
+
+**Reference**:
+- C: `src/pmxcfs/pmxcfs.c` (qb_log initialization)
+- Rust: `src/pmxcfs-rs/pmxcfs/src/main.rs` (tracing-subscriber setup)
+
+---
+
+### 3. OpenVZ Container Support: Intentionally Excluded
+
+**Decision**: No functional support for OpenVZ containers.
+
+**C Implementation**:
+- Includes OpenVZ VM type (`VMTYPE_OPENVZ = 2`)
+- Detects OpenVZ action scripts (`vps*.mount`, `*.start`, `*.stop`, etc.)
+- Sets executable permissions on OpenVZ scripts
+- Scans `nodes/*/openvz/` directories for containers
+- **All code marked**: `// FIXME: remove openvz stuff for 7.x`
+
+**Rust Implementation**:
+- VM types: `VmType::Qemu = 1`, `VmType::Lxc = 3` (no `VMTYPE_OPENVZ = 2`)
+- `/openvz` symlink exists (for backward compatibility) but no functional support
+- No OpenVZ script detection or VM scanning
+
+**Rationale**:
+- OpenVZ deprecated in Proxmox VE 4.0 (2015)
+- OpenVZ removed completely in Proxmox VE 7.0 (2021)
+- pmxcfs-rs ships with Proxmox VE 9.x (2 major versions after removal)
+- Last OpenVZ code change: October 2011 (14 years ago)
+- Mandatory LXC migration completed years ago
+
+**Compatibility Impact**: **None** - No PVE 9.x systems have OpenVZ containers
+
+**Reference**:
+- C: `src/pmxcfs/status.h:31-32`, `cfs-plug-memdb.c:46-93`, `memdb.c:455-460`
+- Rust: `pmxcfs-api-types/src/lib.rs:99-102` (VmType enum)
+
+---
+
+## Testing
+
+pmxcfs-rs has a test suite of more than 100 tests, split into inline unit tests and separate integration tests.
+
+### Quick Start
+
+```bash
+# Run all tests
+cargo test --workspace
+
+# Run unit tests only (fast, inline tests)
+cargo test --lib
+
+# Run integration tests only
+cargo test --test '*'
+
+# Run specific package tests
+cargo test -p pmxcfs-memdb
+```
+
+### Multi-Node Integration Tests
+
+The integration test suite covers single-node operation, multi-node clusters, and C/Rust interoperability.
+
+```bash
+cd integration-tests
+./test --build          # Build and run all tests
+./test --no-build       # Quick iteration
+./test --list           # Show available tests
+```
+
+See [integration-tests/README.md](integration-tests/README.md) for detailed documentation.
+
+---
+
+## Compatibility Summary
+
+### Wire-Compatible
+- IPC protocol (verified with libqb clients)
+- DFSM message format (binary compatible)
+- Database schema (SQLite version 5)
+- RRD file formats (all versions)
+- FUSE operations (all 12 ops)
+
+### Different but Compatible
+- Logging system (tracing vs qb_log) - format differs, functionality same
+- IPC implementation (custom vs libqb) - protocol identical, implementation differs
+- Event loop (tokio vs qb_loop) - both provide event-driven concurrency
+
+### Intentionally Different
+- OpenVZ support (removed, not needed)
+- Service priority levels (all run concurrently in Rust)
+
+---
+
+## References
+
+- **C Implementation**: `src/pmxcfs/`
+- **Rust Implementation**: `src/pmxcfs-rs/`
+  - `pmxcfs` - Main daemon binary
+  - `pmxcfs-config` - Configuration management
+  - `pmxcfs-api-types` - Shared types and error definitions
+  - `pmxcfs-memdb` - In-memory database with SQLite persistence
+  - `pmxcfs-dfsm` - Distributed Finite State Machine (CPG integration)
+  - `pmxcfs-rrd` - RRD persistence
+  - `pmxcfs-status` - Status monitoring and RRD data management
+  - `pmxcfs-ipc` - libqb-compatible IPC server
+  - `pmxcfs-services` - Service framework for lifecycle management
+  - `pmxcfs-logger` - Cluster log with ring buffer and deduplication
+- **Testing Guide**: `integration-tests/README.md`
+- **Test Runner**: `integration-tests/test` (unified test interface)
-- 
2.47.3
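
Note on README section 3 (OpenVZ): the README states that the Rust side keeps `VmType::Qemu = 1` and `VmType::Lxc = 3` and has no counterpart to `VMTYPE_OPENVZ = 2`. A minimal sketch of what that implies when decoding a numeric VM type is given below. Only the two discriminant values come from the README; the `TryFrom<u32>` conversion and the `UnknownVmType` error are hypothetical and may not match the actual `pmxcfs-api-types` code.

```rust
use std::convert::TryFrom;

/// VM types with the numeric values quoted in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum VmType {
    Qemu = 1,
    // 2 was VMTYPE_OPENVZ in the C implementation; intentionally absent.
    Lxc = 3,
}

/// Hypothetical error carrying the raw value that could not be decoded.
#[derive(Debug, PartialEq, Eq)]
struct UnknownVmType(u32);

impl TryFrom<u32> for VmType {
    type Error = UnknownVmType;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(VmType::Qemu),
            3 => Ok(VmType::Lxc),
            // Includes 2 (the former OpenVZ value) and anything else.
            other => Err(UnknownVmType(other)),
        }
    }
}

fn main() {
    for raw in [1u32, 2, 3] {
        match VmType::try_from(raw) {
            Ok(vm_type) => println!("{} -> {:?}", raw, vm_type),
            Err(err) => println!("{} rejected: {:?}", raw, err),
        }
    }
}
```

Keeping the gap at 2 instead of renumbering means values written by the C implementation for QEMU and LXC guests decode unchanged, while a stray OpenVZ entry surfaces as an explicit error rather than being misread as another type.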




