[pve-devel] [RFC-NOT-TO-BE-APPLIED qemu] log on writes to sector zero

Fiona Ebner f.ebner at proxmox.com
Thu Feb 9 15:17:49 CET 2023


The idea is to make a QEMU build with this patch available to users
affected by bug #2874, in the hope of catching an actual buggy write
should it happen again, provided it actually comes from QEMU.

The logging in block-backend covers writes coming via the virtual disk
protocol (e.g. SATA), while the logging in block/io should cover most
writes coming from both the guest and block jobs (AFAICT,
bdrv_co_pwritev_part() is on most paths leading to writes on the
device; e.g. bdrv_co_pwrite_zeroes() calls it too).

Note that there are false positives we can't filter, because they are
valid writes to sector 0, for example:
* A drive with just an ext4 filesystem (no partitions) seems to get a
  write to the first sector on every Linux boot and shutdown.
* Any other guest write that should go to sector 0.
* Live move disk/mirror operations.
* Live restore from PBS.

In bdrv_co_pwritev_part():

False positives from qemu-img, pbs-restore, etc. are avoided by
checking the program's path, read via the /proc/self/exe link.
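A minimal standalone sketch of that check, in plain C so it builds outside the QEMU tree. It swaps the patch's g_strrstr() for strstr() (only the presence of the substring matters), and the helper names read_self_exe() and exe_path_is_vm_process() are hypothetical. One detail worth noting: readlink() does not NUL-terminate its output, so the sketch reserves one byte and terminates the string explicitly.

```c
#define _GNU_SOURCE
#include <limits.h>   /* PATH_MAX */
#include <stdbool.h>
#include <string.h>   /* strstr() */
#include <unistd.h>   /* readlink(), ssize_t */

/*
 * Hypothetical helper: read the path of the running executable into buf
 * (size must be at least 1). readlink() does NOT append a terminating NUL,
 * so one byte is reserved and the string is terminated here. On error,
 * buf is set to the empty string and false is returned.
 */
static bool read_self_exe(char *buf, size_t size)
{
    ssize_t len = readlink("/proc/self/exe", buf, size - 1);

    if (len < 0) {
        buf[0] = '\0';
        return false;
    }
    buf[len] = '\0';
    return true;
}

/*
 * Hypothetical helper mirroring the patch's test: treat the process as a
 * VM process if the binary path contains "/qemu-system" or "/kvm".
 * Paths such as /usr/bin/qemu-img or /usr/bin/pbs-restore do not match.
 */
static bool exe_path_is_vm_process(const char *path)
{
    return strstr(path, "/qemu-system") != NULL ||
           strstr(path, "/kvm") != NULL;
}
```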

If there is no filename for the block driver state, nothing is printed,
to avoid false positives for the backup target (and other such special
devices). Drives on LVM(-Thin), ZFS, RBD (with and without krbd),
file-based storages and even iSCSI all seem to have the filename
property set.
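Taken together, the filters applied in bdrv_co_pwritev_part() amount to a simple predicate. The sketch below expresses them as a pure function so they can be checked in isolation; should_log_first_sector_write() and FIRST_SECTOR_BYTES are hypothetical names, and the device paths used when exercising it are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>   /* int64_t */
#include <string.h>   /* strstr() */

/* Hypothetical name for the 512-byte threshold used by the patch. */
#define FIRST_SECTOR_BYTES 512

/*
 * Hypothetical predicate summarizing the filters described above:
 * log only if
 *   - the write starts inside the first 512 bytes,
 *   - the process is a VM process (not qemu-img, pbs-restore, ...), and
 *   - the block node has a non-empty filename (skips e.g. the backup target).
 */
static bool should_log_first_sector_write(int64_t offset,
                                          const char *exe_path,
                                          const char *filename)
{
    if (offset >= FIRST_SECTOR_BYTES) {
        return false;
    }

    bool vm_process = strstr(exe_path, "/qemu-system") != NULL ||
                      strstr(exe_path, "/kvm") != NULL;

    return vm_process && filename[0] != '\0';
}
```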

Sometimes the filename will be a bit lengthy, e.g. for a Ceph storage
without krbd, but it's still better to catch these:
json:{"pool": "rbdkvm", "image": "vm-168-disk-0", "conf":
"/etc/pve/ceph.conf", "driver": "rbd", "namespace": "", "user":
"admin"}

backtrace_symbols() is used to get the offset relative to the binary
rather than the full address, making it easy to use addr2line
afterwards to get the file name and line number. Unfortunately, the
line numbers seem to be slightly off, but they should be enough to
figure out the call path. (Compiling with -rdynamic would allow
resolving the symbols themselves, but would not provide line numbers
either, and would print only offsets relative to the resolved symbol,
making the output harder to use with addr2line.)
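For readers unfamiliar with the glibc interface, a minimal sketch of the same capture-and-print pattern follows. log_backtrace() and MAX_FRAMES are hypothetical names; backtrace() and backtrace_symbols() are the glibc functions from <execinfo.h> that the patch uses. backtrace_symbols() returns a single malloc()ed block, so freeing the array also frees the strings.

```c
#include <execinfo.h>  /* backtrace(), backtrace_symbols() (glibc) */
#include <stdio.h>
#include <stdlib.h>    /* free() */

/* Hypothetical cap, matching the patch's trace[20] buffer. */
#define MAX_FRAMES 20

/*
 * Print the current call stack, one frame per line. This mirrors the
 * patch's loop, but prints with fprintf() instead of QEMU's warn_report().
 * Returns the number of frames captured.
 */
static int log_backtrace(FILE *out)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    char **lines = backtrace_symbols(frames, count);

    if (lines == NULL) {
        return count;  /* allocation failed; nothing to print */
    }
    for (int i = 0; i < count; i++) {
        fprintf(out, "%s\n", lines[i]);
    }
    free(lines);       /* one allocation holds the array and the strings */
    return count;
}
```

Without -rdynamic, each printed line of a position-independent executable typically looks like binary(+0xOFFSET) [0xADDRESS]; passing the +0xOFFSET value to addr2line -e <binary> -f should yield the function and, if the binary carries debug info, the file and line. glibc also provides backtrace_symbols_fd(), which writes the lines to a file descriptor without allocating memory.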

Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
 block/block-backend.c | 18 ++++++++++++++++++
 block/io.c            | 20 ++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1b563e628b..3233403d2e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -28,6 +28,8 @@
 #include "trace.h"
 #include "migration/misc.h"
 
+#include <execinfo.h>
+
 /* Number of coroutines to reserve per attached device model */
 #define COROUTINE_POOL_RESERVATION 64
 
@@ -1625,6 +1627,22 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
 {
     IO_CODE();
     assert((uint64_t)qiov->size <= INT64_MAX);
+
+    if (offset < 512) {
+        void *trace[20];
+        const char *name = blk_name(blk) ?: "unnamed";
+        int line_count = backtrace(trace, 20);
+        char **trace_lines = backtrace_symbols(trace, line_count);
+
+        warn_report("write to first sector on device '%s':", name);
+        if (trace_lines != NULL) {
+            for (int i = 0; i < line_count; i++) {
+                warn_report("%s", trace_lines[i]);
+            }
+            free(trace_lines);
+        }
+    }
+
     return blk_aio_prwv(blk, offset, qiov->size, qiov,
                         blk_aio_write_entry, flags, cb, opaque);
 }
diff --git a/block/io.c b/block/io.c
index 531b3b7a2d..2b552c4d12 100644
--- a/block/io.c
+++ b/block/io.c
@@ -38,6 +38,8 @@
 #include "qemu/main-loop.h"
 #include "sysemu/replay.h"
 
+#include <execinfo.h>
+
 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
 #define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
 
@@ -2222,6 +2224,24 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
     bool padded = false;
     IO_CODE();
 
+    if (offset < 512) {
+        char path[PATH_MAX];
+        path[MAX(readlink("/proc/self/exe", path, PATH_MAX - 1), 0)] = '\0';
+        if ((g_strrstr(path, "/qemu-system") || g_strrstr(path, "/kvm")) && bs->filename[0]) {
+            void *trace[20];
+            int line_count = backtrace(trace, 20);
+            char **trace_lines = backtrace_symbols(trace, line_count);
+            warn_report("write to first sector on device '%s':", bs->filename);
+
+            if (trace_lines != NULL) {
+                for (int i = 0; i < line_count; i++) {
+                    warn_report("%s", trace_lines[i]);
+                }
+                free(trace_lines);
+            }
+        }
+    }
+
     trace_bdrv_co_pwritev_part(child->bs, offset, bytes, flags);
 
     if (!bdrv_is_inserted(bs)) {
-- 
2.30.2