[pbs-devel] [PATCH proxmox 1/2] add tools/zero: add fast zero comparison code

Wolfgang Bumiller w.bumiller at proxmox.com
Mon Dec 14 13:52:52 CET 2020


Some testing & internal talk led to the decision to exclude this patch.

Apart from the patch being incomplete (some alignment issues aren't
handled), rustc itself is very capable of producing fast SSE code for
this, if you know *how*:

Assuming an `fn is_zero(buf: &[u8]) -> bool`:

a) `buf.contains(&0)`

    compiles to a naive loop, slow

b) `buf.iter().fold(0, |a, b| a | b) == 0`

    produces fast SSE code loading 128 bytes at a time (sort of) into
    xmm registers (pretty much the code from this commit, but better).

    However, this doesn't stop at the first non-zero byte, so it always
    scans the whole buffer.

c) ```
   !buf
       .chunks(128)
       .map(|aa| aa.iter().fold(0, |a, b| a|b) != 0)
       .any(|a| a)
   ```

    A compromise suggested by Fabian G.
    Much like case (b), the inner loop loads 128 bytes directly via SSE
    instructions, but the outer loop over the chunks lets us stop early
    at the first non-zero chunk.
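
For reference, a minimal sketch of the three variants as standalone
functions, written so that each returns `true` only for an all-zero
buffer. The function names are made up for illustration and are not part
of the patch; the naive variant is a plain early-exit scan standing in
for (a). Whether the compiler actually vectorizes (b) and (c) depends on
the rustc version and optimization level, so treat this as something to
benchmark, not as a guarantee.

```rust
/// Stand-in for variant (a): plain scan that stops at the first
/// non-zero byte. (Hypothetical name, not from the patch.)
pub fn is_zero_naive(buf: &[u8]) -> bool {
    buf.iter().all(|&b| b == 0)
}

/// Variant (b): OR all bytes together and compare once at the end.
/// There is no early exit, so the whole buffer is always read.
pub fn is_zero_fold(buf: &[u8]) -> bool {
    buf.iter().fold(0u8, |acc, &b| acc | b) == 0
}

/// Variant (c): OR the bytes within each 128-byte chunk and stop at
/// the first chunk whose result is non-zero.
pub fn is_zero_chunked(buf: &[u8]) -> bool {
    !buf
        .chunks(128)
        .any(|chunk| chunk.iter().fold(0u8, |acc, &b| acc | b) != 0)
}
```

All three agree on their result, including for an empty buffer (which
counts as zero); they only differ in how much of the buffer they read
and in the code the compiler generates for the loop.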

On Mon, Dec 14, 2020 at 09:38:49AM +0100, Thomas Lamprecht wrote:
> On 11.12.20 13:08, Dominik Csapak wrote:
> > that can make use of sse/avx instructions where available
> > 
> 
> maybe some performance numbers can help to argue why we should add
> that, maybe directly as small benchmark binary so different CPUs
> could be compared?
> 
> > this is mostly a direct translation of qemu's util/bufferiszero.c
> > 
> > this is originally from Wolfgang Bumiller
> 
> FYI, you could use the
> 
> Originally-by: Wolfgang Bumiller <w.bumiller at proxmox.com>
> 
> git trailer for that, I saw it a few times used in other projects (e.g.,
> kernel)



