[pve-devel] StorPool storage plugin concerns

Ivaylo Markov ivaylo.markov at storpool.com
Tue Feb 18 12:34:47 CET 2025


Hello,

On 14/02/2025 13:42, Fabian Grünbichler wrote:
> AFAICT from the description above (not looking at code or actually testing anything), issues on your storage layer should be ruled out. But it still leaves issues with anything else, e.g. any long running task (either by PVE, or by the admin) that involves a HA-managed guest is at risk of being "split-brained".
>
> In a regular (HA) setup, another node will only recover the config (and thus ownership) of the guest once the requisite timeouts have passed, which means it *knows* the failed node must have fenced itself. In your setup, this is not the case anymore - the non-quorate node still has the VM config (since it is not quorate, it cannot notice the "theft" of the config by the HA stack running on the quorate partition of the cluster) and thus (from a local point of view) at least RO ownership of that guest. Depending on the sequence of events, such a task might have passed a quorum check earlier and not yet reached the next such check, and thus even think it still has full ownership and act accordingly!
>
> Obviously, writes to your shared storage or to /etc/pve would be blocked, but that doesn't mean that nothing dangerous can happen (e.g., local or external state being corrupted or running out of sync by writes on/from two different nodes).
>
> The only way to make this safe(r) would be to basically disallow any custom integration (to ensure no non-PVE tasks are running) and kill the whole PVE stack on quorum loss, including any spawned tasks and pmxcfs. At that point, all the configs and API would become unavailable as well, so the risk of something/somebody misinterpreting anything should become zero - if there is no information, nothing can be misinterpreted after all ;) This would basically mean "downgrading" a PVE+StorPool node to a StorPool node on quorum loss, which is your intended semantics (I think?).
>
> This approach does come with a new problem though - once this node rejoins the cluster, you'd need to bring up all of the PVE stack again in an orderly fashion.
>
> I hope the above explains why and how PVE is using self-fencing via watchdogs, and the implications of disabling that while keeping HA "enabled". If something is unclear or you have more questions, please reach out!
>
Thank you for the detailed feedback and helpful explanations. Your 
suggestion is essentially what we had in mind with the "automatic 
recovery" idea, and it seems like the right direction for the watchdog 
once it is separated from the plugin.
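
To make that a bit more concrete, below is a rough, untested sketch of the
kind of standalone quorum watcher we have in mind: it polls `pvecm status`
for the Quorate flag and stops/starts the PVE services via systemctl,
leaving corosync and the StorPool services untouched. The service list,
ordering, and output parsing are assumptions on our side rather than a
final design, and we would still need to verify that pvecm keeps working
once pve-cluster is stopped.

#!/usr/bin/env python3
# Untested sketch of a quorum watcher that "downgrades" a node to
# storage-only duty on quorum loss and restores the PVE stack once
# quorum returns. Service names and output parsing are assumptions.

import subprocess
import time

# PVE services to stop on quorum loss (hypothetical selection/order);
# pve-cluster (pmxcfs) goes last on stop and first on start.
PVE_SERVICES = [
    "pve-ha-lrm",
    "pve-ha-crm",
    "pvedaemon",
    "pveproxy",
    "pvestatd",
    "pve-cluster",
]

POLL_INTERVAL = 10  # seconds between quorum checks


def is_quorate() -> bool:
    """Return True if `pvecm status` reports the node as quorate.

    Parsing the human-readable output is fragile; a real version
    would talk to the corosync quorum API directly.
    """
    try:
        out = subprocess.run(["pvecm", "status"],
                             capture_output=True, text=True,
                             timeout=30).stdout
    except (OSError, subprocess.TimeoutExpired):
        return False  # treat "cannot even ask" as not quorate
    for line in out.splitlines():
        if line.strip().startswith("Quorate:"):
            return "Yes" in line
    return False


def set_pve_stack(up: bool) -> None:
    """Start or stop the PVE services in a sensible order."""
    services = reversed(PVE_SERVICES) if up else PVE_SERVICES
    action = "start" if up else "stop"
    for svc in services:
        subprocess.run(["systemctl", action, svc], check=False)


def main() -> None:
    stack_up = True
    while True:
        quorate = is_quorate()
        if stack_up and not quorate:
            # Quorum lost: drop to storage-only mode instead of rebooting.
            set_pve_stack(False)
            stack_up = False
        elif not stack_up and quorate:
            # Rejoined the quorate partition: bring PVE back up.
            set_pve_stack(True)
            stack_up = True
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    main()

The intent is that corosync itself stays up, so the node can notice when
it becomes quorate again and the PVE stack can then be brought back up in
an orderly fashion, as you describe above.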

Best regards,

-- 
Ivaylo Markov
Quality & Automation Engineer
StorPool Storage
https://www.storpool.com
