[pve-devel] ZFS Storage Patches
Michael Rasmussen
mir at datanom.net
Fri Mar 14 20:50:16 CET 2014
On Fri, 14 Mar 2014 12:21:43 -0700
Chris Allen <ca.allen at gmail.com> wrote:
> 8k for me too is much better than 4k. With 4k I tend to hit my IOPS limit
> easily, with not much throughput, and I get a lot of IO delay on VMs when
> the SAN is fairly busy. Currently I'm leaning towards 16k, sparse, with
> lz4 compression. If you go the sparse route then compression is a no
> brainer as it accelerates performance on the underlying storage
> considerably. Compression will lower your IOPS and data usage; both are
> good things for performance. ZFS performance drops as usage rises and gets
> really ugly at around 90% capacity. Some people say it starts to drop with
> as little as 10% used, but I have not tested this. With 16k block sizes
> I'm getting good compression ratios - my best volume is 2.21x, my worst
> 1.33x, and the average is 1.63x. So as you can see a lot of the time my
> real block size on disk is going to be effectively smaller than 16k. The
> tradeoff here is that compression ratios will go up with a larger block
> size, but you'll have to do larger operations and thus more waste will
> occur when the VM is doing small I/O. With a large block size on a busy
> SAN your I/O is going to get fragmented before it hits the disk anyway, so
> I think 16k is a good balance. I only have 7200 RPM drives in my array, but
> a ton of RAM and a big ZFS cache device, which is another reason I went
> with 16k, to maximize what I get when I can get it. I think with 15k RPM
> drives 8k block size might be better, as your IOPS limit will be roughly
> double that of 7200 RPM.
>
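A quick back-of-the-envelope check of what those compression ratios mean for the effective on-disk block size (a sketch; the 16k volblocksize and the 2.21x/1.33x/1.63x ratios quoted above are the only inputs):

```python
# Effective on-disk size of a 16 KiB zvol block at the compression
# ratios quoted above (best / worst / average).
VOLBLOCKSIZE = 16 * 1024  # bytes

for label, ratio in [("best", 2.21), ("worst", 1.33), ("average", 1.63)]:
    effective = VOLBLOCKSIZE / ratio
    print(f"{label}: {ratio:.2f}x -> ~{effective / 1024:.1f} KiB per 16 KiB block")
```

At the average 1.63x ratio, a 16 KiB logical block occupies roughly 10 KiB on disk, i.e. closer to the smaller block sizes discussed above.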
Nice catch. I hadn't thought of this. I will experiment some more with
sparse volumes. I already have lz4 activated on all my pools and can
prove that performance actually increases with compression.
Hint: if you haven't already, you should try some performance tests
using RAID10: 12 disks (4 vdevs, each a 3-disk mirror) give excellent
speed/IOPS with reasonable redundancy (you can lose 2 disks in any vdev
and still not lose any data from the pool).
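The layout described above could be created along these lines (a sketch; the pool name `tank` and the device names are placeholders):

```shell
# 12 disks as 4 vdevs, each a 3-way mirror: the pool survives the loss
# of any 2 disks within a single vdev.
zpool create tank \
  mirror sda sdb sdc \
  mirror sdd sde sdf \
  mirror sdg sdh sdi \
  mirror sdj sdk sdl

# lz4 set on the pool root is inherited by all datasets and zvols
# created under it.
zfs set compression=lz4 tank
```

Writes stripe across the four vdevs, so this trades a third of the raw capacity for mirror-level read IOPS and the redundancy noted above.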
> Dedup did not work out well for me. Aside from the huge memory
> consumption, it didn't save all that much space and to save the max space
> you need to match the VM's filesystem cluster size to the ZVOL block size.
> Which means 4k for ext4 and NTFS (unless you change it during a Windows
> install). Also, dedup really, really slows down zpool scrubbing and possibly
> resilvering. This is one of the main reasons I avoid it. I don't want scrubs
> to take forever when I'm paranoid that something might be wrong.
>
I don't recall anybody having provided proof of anything good coming
out of using dedup.
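The memory cost alone is easy to make concrete: a commonly quoted rule of thumb is roughly 320 bytes of dedup table (DDT) per unique block (an estimate, not an exact figure; actual entry size varies by pool and ZFS version). A sketch of the arithmetic:

```python
# Rough DDT memory estimate for dedup, using the rule-of-thumb figure
# of ~320 bytes per unique block.
DDT_ENTRY_BYTES = 320

def ddt_ram_gib(pool_bytes, block_bytes):
    """Approximate DDT size in GiB if every block in the pool is unique."""
    n_blocks = pool_bytes / block_bytes
    return n_blocks * DDT_ENTRY_BYTES / 2**30

one_tib = 2**40
# 1 TiB of unique data at 4k vs 16k blocks:
print(f"4k blocks:  ~{ddt_ram_gib(one_tib, 4 * 1024):.0f} GiB of DDT")
print(f"16k blocks: ~{ddt_ram_gib(one_tib, 16 * 1024):.0f} GiB of DDT")
```

This is why matching the VM filesystem cluster size to the zvol block size, as mentioned above, cuts both ways: the small 4k blocks that maximize dedup matches also multiply the DDT memory footprint.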
>
> > Regards write caching: Why not simply use sync
> > directly on the volume?
>
> Good question. I don't know.
>
>
If you use OmniOS you should install napp-it. With napp-it,
administration of the OmniOS storage cluster is a breeze. Changing the
caching policy takes two mouse clicks.
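For the record, the per-volume sync policy raised in the question above can also be set from the command line (a sketch; `tank/vm-100-disk-1` is a hypothetical zvol name):

```shell
# Force synchronous semantics for every write to this zvol ...
zfs set sync=always tank/vm-100-disk-1

# ... or honor only what the guest explicitly requests (the default):
zfs set sync=standard tank/vm-100-disk-1

# Inspect the current setting:
zfs get sync tank/vm-100-disk-1
```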
> I'm in the process of trying to run away from all things Oracle at my
> company. We keep getting burned by them. It's so freakin' expensive, and
> they hold you over a barrel with patches for both hardware and software.
> We bought some very expensive hardware from them, and a management
> controller for a blade chassis had major bugs to the point it was
> practically unusable out of the box. Oracle would not under any
> circumstance supply us with the new firmware unless we spent boatloads of
> cash for a maintenance contract. We ended up doing this because we needed
> the controller to work as advertised. This is what annoys me the most with
> them - you buy a product and it doesn't do what is written on the box and
> then you have to pay tons extra for it to do what they said it would do
> when you bought it. I miss Sun...
>
Hehe, more or less the same story here. We stick with Oracle database
and application servers for mission-critical data, though, since a
comparable setup from MS puts huge demands on hardware.
--
Hilsen/Regards
Michael Rasmussen
Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
How should I know if it works? That's what beta testers are for. I
only coded it.
-- Attributed to Linus Torvalds, somewhere in a posting