[PVE-User] problems with 3.2-beta
athompso at athompso.net
Sun Feb 2 19:00:55 CET 2014
Overall, the Ceph GUI is great. I actually got Ceph up and running (and
working) this time! Syncing ceph.conf through corosync is such an
obvious way to simplify things... for small clusters, anyway.
I am seeing some problems, however, and I'm not sure if they're just me,
or if I should be opening bugs:
1. I have one node that's up and running just fine, pvecm claims
everything's fine, but I can't migrate VMs that started somewhere else
to it - migration always fails, claiming the node is dead. Nothing
unusual appears in any logfile that I can see... or at least nothing
that looks bad to me. I can create a new VM there, migrate it (online)
to another node and migrate it back (online, again), but VMs that were
started on another node won't migrate.
2. CPU usage in the "Summary" screen of each VM sometimes reports
nonsensical values: right now one VM is using 126% of 1 vCPU.
3. The Wiki page on setting up CEPH Server doesn't mention that you can
do most of the setup from within the GUI. Since I have write access
there, I guess I should fix it myself :-).
4. (This isn't really new...) SPICE continues to be a major PITA when
running Ubuntu 12.04LTS as the management client. Hmm, I just found a
PPA with virt-viewer packages that work. I should update the Wiki with
that info, too.
5. Stopping VMs with HA enabled is now an *extremely* slow process. If
I disable HA for a particular VM, I notice that Stop also produces a
Shutdown task and takes longer than it used to, though not unreasonably
so. I don't understand why Stop isn't instantaneous, though. Typing
"stop" into a qm monitor is also slow... the only way I have to rapidly
stop a VM is to kill the KVM process.
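For the record, here are the commands I'm comparing (VMID 100 is just an
example, and the PID-file path is from my 3.x nodes, so treat this as a
sketch rather than gospel):

```shell
# Graceful shutdown: sends an ACPI power-button event and waits for the guest
qm shutdown 100

# Hard stop: I'd expect this to terminate the KVM process immediately
qm stop 100

# Last resort, what I actually end up doing when even Stop hangs
kill -9 $(cat /var/run/qemu-server/100.pid)
```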
6. I'm not sure if this is new, but when I have a VM under HA, if I stop
it manually, it immediately restarts. I don't know if I ever tried that
under 3.1 Enterprise... maybe it always worked this way?
Ceph speeds are barely acceptable (10-20 MB/s) but that's typical of
Ceph in my experience so far, even with caching turned on. (Still a bit
of a letdown compared to Sheepdog's 300 MB/s burst throughput, though.)
One thing I'm not sure of is OSD placement... if I have two drives per
host dedicated to Ceph (and thus two OSDs), and my pool "size" is 2,
does that mean a single node failure could render some data
unreachable? I've adjusted my "size" to 3 just in case, but I don't
understand how this works. Sheepdog guarantees that multiple copies of
an object won't be stored on the same host for exactly this reason, but
I can't tell what Ceph does.
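For what it's worth, my reading is that this is governed by the CRUSH
rule attached to the pool, not by "size" itself. The stock rule on my
test cluster looks roughly like this (pasted from memory via "ceph osd
crush rule dump", so names may differ on other versions):

```
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
```

If I understand it correctly, "chooseleaf ... type host" picks each
replica from a *different* host bucket, so with size=2 the two copies of
a PG shouldn't both land on OSDs in the same node — which would make
Ceph behave like Sheepdog here. A rule with "type osd" instead would
allow exactly the failure mode I was worried about.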
Also not sure what's going on with thin provisioning; I guess Ceph and
QEMU/KVM don't do thin provisioning at all, in any way, shape or form?