[pve-devel] ZFS Storage Patches

Michael Rasmussen mir at datanom.net
Fri Mar 14 20:50:16 CET 2014


On Fri, 14 Mar 2014 12:21:43 -0700
Chris Allen <ca.allen at gmail.com> wrote:

> 8k for me too is much better than 4k.  With 4k I tend to hit my IOPS limit
> easily, with not much throughput, and I get a lot of IO delay on VMs when
> the SAN is fairly busy.  Currently I'm leaning towards 16k, sparse, with
> lz4 compression.  If you go the sparse route then compression is a
> no-brainer, as it accelerates performance on the underlying storage
> considerably.  Compression lowers both your IOPS and your data usage, and
> both are good things for performance.  ZFS performance drops as usage rises and gets
> really ugly at around 90% capacity.  Some people say it starts to drop with
> as little as 10% used, but I have not tested this.  With 16k block sizes
> I'm getting good compression ratios - my best volume is 2.21x, my worst
> 1.33x, and the average is 1.63x.  So as you can see a lot of the time my
> real block size on disk is going to be effectively smaller than 16k.  The
> tradeoff here is that compression ratios will go up with a larger block
> size, but you'll have to do larger operations and thus more waste will
> occur when the VM is doing small I/O.  With a large block size on a busy
> SAN your I/O is going to get fragmented before it hits the disk anyway, so
> I think 16k is a good balance.  I only have 7200 RPM drives in my array, but
> a ton of RAM and a big ZFS cache device, which is another reason I went
> with 16k, to maximize what I get when I can get it.  I think with 15k RPM
> drives an 8k block size might be better, as your IOPS limit will be roughly
> double that of 7200 RPM.
> 
Nice catch. I hadn't thought of this. I will experiment some more with
sparse volumes. I already have lz4 activated on all my pools and can
prove that performance actually increases with compression.
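A minimal Python sketch of the setup discussed above, assuming a hypothetical dataset name `tank/vm-100-disk-1` and a 32G size. It only builds the `zfs create` command line for a sparse (`-s`) zvol with `volblocksize=16K` and `compression=lz4`, and works through two bits of arithmetic from the quoted text: the effective on-disk size per block is block size divided by compression ratio (ignoring sector rounding and metadata), and the "roughly double" IOPS estimate for 15k vs 7200 RPM is checked against rotational latency alone, which is a simplification since seek time matters too.

```python
import shlex
from typing import List


def sparse_zvol_cmd(dataset: str, size: str, volblocksize: str = "16K") -> List[str]:
    """Build a `zfs create` argv for a sparse (thin-provisioned) zvol with lz4."""
    return [
        "zfs", "create",
        "-s",                                  # sparse: no refreservation
        "-V", size,                            # logical volume size
        "-o", f"volblocksize={volblocksize}",  # can only be set at creation time
        "-o", "compression=lz4",
        dataset,
    ]


def effective_block_kib(block_kib: float, compressratio: float) -> float:
    """Average space one logical block occupies after compression.

    Simplification: ignores rounding up to the pool's sector size and metadata.
    """
    return block_kib / compressratio


def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency (half a revolution) in milliseconds."""
    return 60_000 / rpm / 2


if __name__ == "__main__":
    # Hypothetical dataset name and size; nothing is executed, only printed.
    print(shlex.join(sparse_zvol_cmd("tank/vm-100-disk-1", "32G")))

    # Compression ratios quoted above: best, average, worst.
    for ratio in (2.21, 1.63, 1.33):
        print(f"{ratio}x -> {effective_block_kib(16, ratio):.1f} KiB per 16 KiB block")

    # Rotational latency only; real random IOPS also depend on seek time.
    speedup = rotational_latency_ms(7200) / rotational_latency_ms(15000)
    print(f"15k vs 7200 RPM rotational latency ratio: {speedup:.2f}")
```

With the quoted ratios this gives about 7.2, 9.8 and 12.0 KiB per 16 KiB block, and a rotational latency ratio of about 2.08, consistent with the "roughly double" estimate.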

Hint: If you haven't already tested this, you should try running some
performance tests using RAID10. 12 disks (4 vdevs, each a 3-disk mirror)
give excellent speed/IO with reasonable security (you can lose 2
disks in any vdev and still not lose any data from the pool).
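The RAID10 layout from the hint (four vdevs, each a 3-disk mirror) can be sketched in Python as below. The disk names (`disk0` to `disk11`) and the 4 TB disk size are hypothetical placeholders; the code only builds the `zpool create` argument list and computes capacity and fault tolerance, ignoring ZFS metadata overhead.

```python
from typing import Dict, List


def striped_mirrors_cmd(pool: str, disks: List[str], width: int = 3) -> List[str]:
    """Build a `zpool create` argv that stripes data across mirror vdevs of `width` disks."""
    if width < 2 or len(disks) % width != 0:
        raise ValueError("disk count must be a multiple of the mirror width (>= 2)")
    argv = ["zpool", "create", pool]
    for i in range(0, len(disks), width):
        argv += ["mirror", *disks[i:i + width]]
    return argv


def layout_summary(n_disks: int, width: int, disk_tb: float) -> Dict[str, float]:
    """Capacity and failure tolerance of a pool built from equal-width mirror vdevs."""
    vdevs = n_disks // width
    return {
        "vdevs": vdevs,
        "usable_tb": vdevs * disk_tb,          # each mirror holds one disk's worth of data
        "per_vdev_tolerance": width - 1,       # failures a single vdev survives
        "worst_case_fatal": width,             # all disks of one vdev lost -> pool lost
        "best_case_survivable": vdevs * (width - 1),
    }


if __name__ == "__main__":
    disks = [f"disk{i}" for i in range(12)]    # hypothetical device names
    print(" ".join(striped_mirrors_cmd("tank", disks)))
    print(layout_summary(12, 3, 4.0))          # hypothetical 4 TB disks
```

For 12 disks this yields 4 vdevs and one third of the raw capacity; any vdev can lose 2 disks, but losing all 3 disks of a single vdev loses the pool.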

> Dedup did not work out well for me.  Aside from the huge memory
> consumption, it didn't save all that much space and to save the max space
> you need to match the VM's filesystem cluster size to the ZVOL block size,
> which means 4k for ext4 and NTFS (unless you change it during a Windows
> install).  Also dedup really really slows down zpool scrubbing and possibly
> rebuild.  This is one of the main reasons I avoid it.  I don't want scrubs
> to take forever, when I'm paranoid of something potentially being wrong.
>
I don't recall anybody having provided proof of anything good coming
out of using dedup.
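The point about matching the guest cluster size to the zvol block size can be illustrated with a toy model in Python. It assumes block-level dedup that matches whole records by hash (ZFS dedup matches blocks by checksum), guest clusters aligned to zvol block boundaries, and a made-up guest image built from a few distinct 4 KiB clusters; it says nothing about dedup's memory cost or its effect on scrubs.

```python
import hashlib
import random
from typing import List


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks when deduplicating fixed-size records."""
    blocks: List[bytes] = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)


def toy_guest_image(n_clusters: int, distinct: int, cluster: int = 4096, seed: int = 1) -> bytes:
    """Hypothetical guest disk: clusters drawn at random from `distinct` possible contents."""
    rng = random.Random(seed)                                   # seeded for repeatability
    contents = [bytes([i]) * cluster for i in range(distinct)]  # distinct must be <= 256
    return b"".join(rng.choice(contents) for _ in range(n_clusters))


if __name__ == "__main__":
    image = toy_guest_image(n_clusters=4096, distinct=8)        # 16 MiB toy image
    for bs in (4096, 16384):
        print(f"volblocksize {bs // 1024}K: dedup ratio {dedup_ratio(image, bs):.2f}")
```

With 4 KiB records every repeated cluster deduplicates; with 16 KiB records a block only matches if the same four clusters appear in the same order, so the ratio collapses.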
> 
> > Regarding write caching: Why not simply use sync
> > directly on the volume?
> 
> Good question.  I don't know.
> 
> 
If you use OmniOS you should install napp-it. With napp-it,
administration of the OmniOS storage cluster is a breeze. Changing the
caching policy takes two mouse clicks.
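On the write-caching question: in ZFS, `sync` is an ordinary per-dataset property that zvols have too, with the values `standard`, `always` and `disabled`, so it can be set directly on a volume from the shell as well as through a GUI. `always` treats every write as synchronous; `disabled` ignores sync requests and can lose recently acknowledged writes on a crash or power failure. The Python sketch below only builds the commands and defaults to a dry run; the dataset name is hypothetical.

```python
import subprocess
from typing import List, Optional

SYNC_VALUES = ("standard", "always", "disabled")


def set_sync_cmd(dataset: str, mode: str) -> List[str]:
    """Build `zfs set sync=<mode> <dataset>`."""
    if mode not in SYNC_VALUES:
        raise ValueError(f"sync must be one of {SYNC_VALUES}, got {mode!r}")
    return ["zfs", "set", f"sync={mode}", dataset]


def get_sync_cmd(dataset: str) -> List[str]:
    """Build `zfs get -H -o value sync <dataset>` (script-friendly output)."""
    return ["zfs", "get", "-H", "-o", "value", "sync", dataset]


def run(argv: List[str], dry_run: bool = True) -> Optional[str]:
    """Print the command in dry-run mode; otherwise execute it and return stdout."""
    if dry_run:
        print(" ".join(argv))
        return None
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    vol = "tank/vm-100-disk-1"   # hypothetical zvol
    run(get_sync_cmd(vol))
    run(set_sync_cmd(vol, "always"))
```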

> I'm in the process of trying to run away from all things Oracle at my
> company.  We keep getting burned by them.  It's so freakin' expensive, and
> they hold you over a barrel with patches for both hardware and software.
> We bought some very expensive hardware from them, and a management
> controller for a blade chassis had major bugs to the point it was
> practically unusable out of the box.  Oracle would not under any
> circumstance supply us with the new firmware unless we spent boatloads of
> cash for a maintenance contract.  We ended up doing this because we needed
> the controller to work as advertised.  This is what annoys me the most with
> them - you buy a product and it doesn't do what is written on the box and
> then you have to pay tons extra for it to do what they said it would do
> when you bought it.  I miss Sun...
> 
Hehe, more or less the same story here. We stick with Oracle database
and application servers for mission-critical data, though, since a
comparable setup from MS puts huge demands on hardware.

-- 
Hilsen/Regards
Michael Rasmussen


