<div dir="ltr"><div>> I have already made some test and I have not be able to make any</div><div>> conclusive tests proving performance should be hurt by using sparse. </div><div><br></div><div>Yeah it shouldn't affect the ZFS mechanics at all, the ZVOL will just lack a reservation.<br>

> Is sparse a way to provision more than 100% then?

Yes. That, and it lets you take advantage of compression on the volume. Without sparse, the volume is always going to take away the same amount of space from the pool (due to the hard reservation), regardless of whether or not compression and/or dedup is on. You just have to be careful to monitor pool capacity. Bad things will happen if your SAN server runs out of space... I attached a quick and dirty script I wrote to monitor pool capacity and status and send an e-mail alert if the pool degrades or a capacity threshold is hit. I run it from cron every 30 minutes.
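
Something along these lines is all it really takes (a bare-bones sketch, not the attached script itself; the pool name, threshold, and address are placeholders):

    #!/bin/sh
    # Alert if the pool is not ONLINE or usage crosses a threshold.
    POOL=tank
    LIMIT=80
    MAILTO=admin@example.com

    HEALTH=$(zpool list -H -o health "$POOL")
    CAP=$(zpool list -H -o capacity "$POOL" | tr -d '%')

    if [ "$HEALTH" != "ONLINE" ] || [ "$CAP" -ge "$LIMIT" ]; then
        zpool status "$POOL" | mailx -s "zpool alert: $POOL $HEALTH ${CAP}%" "$MAILTO"
    fi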

> For me 8k block size for volumes seems to give more write speed.

8k for me too is much better than 4k. With 4k I tend to hit my IOPS limit easily, with not much throughput, and I get a lot of IO delay on VMs when the SAN is fairly busy. Currently I'm leaning towards 16k, sparse, with lz4 compression. If you go the sparse route then compression is a no-brainer, as it accelerates performance on the underlying storage considerably. Compression lowers both your IOPS and your data usage, and both are good things for performance. ZFS performance drops as usage rises and gets really ugly at around 90% capacity. Some people say it starts to drop with as little as 10% used, but I have not tested this. With 16k block sizes I'm getting good compression ratios - my best volume is 2.21x, my worst 1.33x, and the average is 1.63x. So as you can see, a lot of the time my real block size on disk is effectively smaller than 16k. The tradeoff here is that compression ratios go up with a larger block size, but you'll have to do larger operations, and thus more waste occurs when the VM is doing small I/O. With a large block size on a busy SAN your I/O is going to get fragmented before it hits the disk anyway, so I think 16k is a good balance. I only have 7200 RPM drives in my array, but a ton of RAM and a big ZFS cache device, which is another reason I went with 16k - to maximize what I get when I can get it. I think with 15k RPM drives an 8k block size might be better, as your IOPS limit will be roughly double that of 7200 RPM.
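
For reference, the combination I'm describing is created with something like this (pool/volume names are just examples):

    # 100G sparse (-s) zvol, 16k block size, lz4 compression
    zfs create -s -V 100G -b 16k -o compression=lz4 tank/vm-101-disk-1

    # check the ratio once the guest has written some data
    zfs get compressratio,volblocksize tank/vm-101-disk-1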

Dedup did not work out well for me. Aside from the huge memory consumption, it didn't save all that much space, and to save the maximum space you need to match the VM's filesystem cluster size to the ZVOL block size - which means 4k for ext4 and NTFS (unless you change it during a Windows install). Also, dedup really, really slows down zpool scrubbing and possibly rebuilds. This is one of the main reasons I avoid it: I don't want scrubs to take forever when I'm already paranoid that something might be wrong.
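
If you want an idea of what dedup would buy you before turning it on, zdb can simulate the dedup table against existing data (it takes a while on a big pool; the pool name is a placeholder):

    # simulated dedup statistics - prints a DDT histogram and the projected ratio
    zdb -S tank

    # on a pool where dedup is already enabled
    zpool list -o name,dedupratio tank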

> Regarding write caching: Why not simply use sync
> directly on the volume?

Good question. I don't know.
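
(For anyone following along, I read the two options as roughly the following - commands from memory, so double-check the property names on your release:)

    # force synchronous writes at the ZFS level, per zvol
    zfs set sync=always tank/vm-101-disk-1

    # versus disabling the write-back cache on the COMSTAR LU
    stmfadm modify-lu -p wcd=true <lu-guid>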

> I have made no tests on Solaris - license costs are out of my league. I
> regularly test FreeBSD, Linux and OmniOS. In production I only use
> OmniOS (r151008, but I will migrate everything to r151014 when it is released,
> and then only use LTS in the future).

I'm in the process of trying to run away from all things Oracle at my company. We keep getting burned by them. It's so freakin' expensive, and they hold you over a barrel with patches for both hardware and software. We bought some very expensive hardware from them, and a management controller for a blade chassis had major bugs, to the point that it was practically unusable out of the box. Oracle would not under any circumstances supply us with the new firmware unless we spent boatloads of cash on a maintenance contract. We ended up doing this because we needed the controller to work as advertised. This is what annoys me most about them - you buy a product, it doesn't do what is written on the box, and then you have to pay tons extra to make it do what they said it would do when you bought it. I miss Sun...
<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 14, 2014 at 10:52 AM, Michael Rasmussen <span dir="ltr"><<a href="mailto:mir@datanom.net" target="_blank">mir@datanom.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Fri, 14 Mar 2014 10:11:17 -0700<br>
Chris Allen <<a href="mailto:ca.allen@gmail.com">ca.allen@gmail.com</a>> wrote:<br>
<br>
> > > It was also part of latest 3.1. Double-click the mouse over your
> > > storage specification in Datacenter->storage and the panel pops up.
> > > Patched panel attached.
> >
> I forgot to mention that at the moment the code for creating ZFS
> storage is commented out
> in /usr/share/pve-manager/ext4/pvemanagerlib.js, lines 20465-20473.
<div class=""><br>
><br>
> > No I haven't. As far as I understand it, sparse should not affect
> > performance whatsoever; it only changes whether or not a reservation is
> > created on the ZVOL. Turning off write caching on the LU should decrease
> > performance, dramatically so, if you do not have a separate and very fast
> > ZIL device (e.g. ZeusRAM). Every block write to the ZVOL will be done
> > synchronously when write caching is turned off.
> >
> I have already made some tests and I have not been able to make any
> conclusive tests proving that performance is hurt by using sparse. Is
> sparse a way to provision more than 100% then?
>
> > I've done some testing with regards to block size, compression, and dedup.
> > I wanted sparse support for myself and I figured while I was there I might
> > as well add a flag for turning off write caching. For people with the
> > right (and expensive!) hardware the added safety of no write caching might
> > be worth it.
> >
> I have done the same. For me 8k block size for volumes seems to give
> more write speed. Regarding write caching: Why not simply use sync
> directly on the volume?
>
> > Have you tested the ZFS storage plugin on Solaris 11.1? I first tried
> > using it with 11.1, but they changed how the LUN assignment for the views
> > works. In 11.0 and OmniOS the first available LUN will get used when a new
> > view is created if no LUN is given. But in 11.1 it gets populated with a
> > string that says "AUTO". This of course means PVE can't connect to the
> > volume because it can't resolve the LUN. Unfortunately I couldn't find
> > anything in the 11.1 documentation that described how to get the LUN. I'm
> > assuming there's some kind of mechanism in 11.1 where you can get the
> > number on the fly, as it must handle them dynamically now. But after a lot
> > of Googling and fiddling around I gave up and switched to OmniOS. I don't
> > have a support contract with Oracle so that was a no go. Anyway, just
> > thought I'd mention that in case you knew about it.
> >
> > In addition to that problem, 11.1 also has a bug in the handling of the
> > iSCSI feature Immediate Data. It doesn't implement it properly according
> > to the iSCSI RFC, and so you need to turn off Immediate Data on the client
> > in order to connect. The patch is available to paying Oracle support
> > customers only.
> >
> I have made no tests on Solaris - license costs are out of my league. I
> regularly test FreeBSD, Linux and OmniOS. In production I only use
> OmniOS (r151008, but I will migrate everything to r151014 when it is released,
> and then only use LTS in the future).
>
> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael <at> rasmussen <dot> cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir <at> datanom <dot> net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir <at> miras <dot> org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --------------------------------------------------------------
> /usr/games/fortune -es says:
> I never failed to convince an audience that the best thing they
> could do was to go away.