[pbs-devel] [PATCH proxmox-backup 2/2] tape: use datastores 'read-thread' for tape backup
Dominik Csapak
d.csapak at proxmox.com
Tue Apr 30 11:39:39 CEST 2024
Using a single thread for reading is not optimal in some cases, e.g.
when the underlying storage can handle more reads in parallel than a
single thread issues.
How much this helps depends largely on the storage and CPU.
We use the ParallelHandler to handle the actual reads.
Make the sync_channel buffer size depend on the number of threads,
so we have space for two chunks per thread.
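Below is a minimal std-only sketch of the data flow. It is not the actual code from the patch. A small pool of `read_threads` workers stands in for ParallelHandler: the diff shows its `new`/`send`/`complete` calls, but its implementation is not reproduced here. The workers pull digests from a shared queue, load each chunk and push it into a `sync_channel` sized `read_threads * 2`. The final `None` is sent only after every worker has finished. `Digest`, `Blob`, `load_chunk`, `spawn_reader` and `collect_digests` are hypothetical names, and the types and error handling are simplified.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for the real chunk digest and DataBlob types.
type Digest = u64;
type Blob = Vec<u8>;
// What the consumer receives: a chunk, the end marker (None), or an error.
type Item = Result<Option<(Digest, Blob)>, String>;

/// Hypothetical stand-in for `datastore.load_chunk(&digest)`.
fn load_chunk(digest: Digest) -> Result<Blob, String> {
    Ok(digest.to_le_bytes().to_vec())
}

/// Reads each unique digest with `read_threads` workers; the returned
/// receiver yields `Ok(Some(..))` per chunk and finally `Ok(None)`.
fn spawn_reader(digests: Vec<Digest>, read_threads: usize) -> Receiver<Item> {
    // Room for two finished chunks per reader thread before workers block.
    let (tx, rx) = sync_channel::<Item>(read_threads * 2);

    thread::spawn(move || {
        // Shared job queue: every worker takes the next digest from it.
        let (job_tx, job_rx) = channel::<Digest>();
        let job_rx = Arc::new(Mutex::new(job_rx));

        let workers: Vec<_> = (0..read_threads)
            .map(|_| {
                let job_rx = Arc::clone(&job_rx);
                let tx = tx.clone();
                thread::spawn(move || -> Result<(), String> {
                    loop {
                        // The lock guard is dropped at the end of this statement.
                        let next = job_rx.lock().unwrap().recv();
                        let Ok(digest) = next else {
                            return Ok(()); // queue closed and drained
                        };
                        let blob = load_chunk(digest)?;
                        tx.send(Ok(Some((digest, blob))))
                            .map_err(|err| format!("error sending result: {err}"))?;
                    }
                })
            })
            .collect();
        // From here on only the workers keep the job receiver alive.
        drop(job_rx);

        // Skip duplicate digests, like `chunk_index` in the patch.
        let mut seen = HashSet::new();
        for digest in digests {
            if seen.insert(digest) && job_tx.send(digest).is_err() {
                break; // all workers have exited, most likely on an error
            }
        }
        drop(job_tx); // lets idle workers leave their loop

        // The end marker may only be sent once every worker is done.
        let mut first_err = None;
        for worker in workers {
            if let Err(err) = worker.join().expect("worker panicked") {
                first_err.get_or_insert(err);
            }
        }
        let _ = tx.send(match first_err {
            Some(err) => Err(err),
            None => Ok(None),
        });
    });

    rx
}

/// Drains the receiver until the end marker and returns the digests read.
fn collect_digests(rx: Receiver<Item>) -> Result<Vec<Digest>, String> {
    let mut out = Vec::new();
    loop {
        match rx.recv().map_err(|err| err.to_string())?? {
            Some((digest, _blob)) => out.push(digest),
            None => return Ok(out),
        }
    }
}
```

With more than one worker, this sketch delivers chunks in completion order rather than index order. Because the result channel is bounded, a slow consumer (here, the tape drive) makes the readers block instead of buffering an unbounded amount of chunk data in memory.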
I did some benchmarks on my (virtual) PBS with a real tape drive (LTO-8
tape in an LTO-9 drive):
For my NVMe datastore it did not matter much how many threads were used,
so I guess the bottleneck was in the HBA, the drive or the cable rather
than in the disks/CPU (I always got around 300MB/s from the task log).
For a datastore on a single HDD, the results are much more interesting:
1 Thread: ~55MB/s
2 Threads: ~70MB/s
4 Threads: ~80MB/s
8 Threads: ~95MB/s
So issuing multiple IO requests in parallel does speed up the tape
backup, at least when the storage is the bottleneck.
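For a sense of scale, the approximate HDD figures above work out to the following speedups over the single-threaded run. This is plain arithmetic on the reported numbers and adds no new measurements. `HDD_RESULTS` and `speedups` are illustrative names.

```rust
/// Approximate throughput (MB/s) reported above for the single-HDD datastore.
const HDD_RESULTS: [(usize, f64); 4] = [(1, 55.0), (2, 70.0), (4, 80.0), (8, 95.0)];

/// Speedup of each thread count relative to the first (single-threaded) entry.
fn speedups(results: &[(usize, f64)]) -> Vec<(usize, f64)> {
    let base = results[0].1;
    results
        .iter()
        .map(|&(threads, mbps)| (threads, mbps / base))
        .collect()
}
```

This gives roughly 1.27x for 2 threads, 1.45x for 4 and 1.73x for 8. That is a clear gain, but far from linear in the thread count.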
Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
---
Although I did benchmark this, I would be very grateful if other people
could test this (and the previous) change on their various disk setups,
so we can verify that it really makes a difference and is worth making
configurable.
pbs-api-types/src/datastore.rs | 2 +-
src/tape/pool_writer/new_chunks_iterator.rs | 42 +++++++++++++--------
2 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 2ad2ae063..243c4759f 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -210,7 +210,7 @@ pub enum DatastoreFSyncLevel {
optional: true,
},
"read-threads": {
- description: "Controls how many threads are used for reading from the datastore for verification.",
+ description: "Controls how many threads are used for reading from the datastore for verify and tape backup.",
type: usize,
optional: true,
minimum: 1,
diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
index 1454b33d2..63b10c9f8 100644
--- a/src/tape/pool_writer/new_chunks_iterator.rs
+++ b/src/tape/pool_writer/new_chunks_iterator.rs
@@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
use crate::tape::CatalogSet;
+use crate::tools::parallel_handler::ParallelHandler;
-/// Chunk iterator which use a separate thread to read chunks
+/// Chunk iterator which uses separate threads to read chunks
///
/// The iterator skips duplicate chunks and chunks already in the
/// catalog.
@@ -25,7 +26,8 @@ impl NewChunksIterator {
snapshot_reader: Arc<Mutex<SnapshotReader>>,
catalog_set: Arc<Mutex<CatalogSet>>,
) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
- let (tx, rx) = std::sync::mpsc::sync_channel(3);
+ let read_threads = datastore.get_read_threads();
+ let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
let reader_thread = std::thread::spawn(move || {
let snapshot_reader = snapshot_reader.lock().unwrap();
@@ -35,36 +37,44 @@ impl NewChunksIterator {
let datastore_name = snapshot_reader.datastore_name().to_string();
let result: Result<(), Error> = proxmox_lang::try_block!({
- let mut chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
+ let chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
catalog_set
.lock()
.unwrap()
.contains_chunk(&datastore_name, digest)
})?;
- loop {
- let digest = match chunk_iter.next() {
- None => {
- let _ = tx.send(Ok(None)); // ignore send error
- break;
+ let reader_pool =
+ ParallelHandler::new("tape backup chunk reader pool", read_threads, {
+ let tx = tx.clone();
+ move |digest| {
+ let blob = datastore.load_chunk(&digest)?;
+ //println!("LOAD CHUNK {}", hex::encode(&digest));
+
+ tx.send(Ok(Some((digest, blob)))).map_err(|err| {
+ format_err!("error sending result from reader thread: {err}")
+ })?;
+
+ Ok(())
}
- Some(digest) => digest?,
- };
+ });
+
+ for digest in chunk_iter {
+ let digest = digest?;
if chunk_index.contains(&digest) {
continue;
}
- let blob = datastore.load_chunk(&digest)?;
- //println!("LOAD CHUNK {}", hex::encode(&digest));
- if let Err(err) = tx.send(Ok(Some((digest, blob)))) {
- eprintln!("could not send chunk to reader thread: {err}");
- break;
- }
+ reader_pool.send(digest)?;
chunk_index.insert(digest);
}
+ reader_pool.complete()?;
+
+ let _ = tx.send(Ok(None)); // ignore send error
+
Ok(())
});
if let Err(err) = result {
--
2.39.2