[pve-devel] [RFC] towards automated integration testing

Tue Oct 17 09:34:56 CEST 2023

Am 16/10/2023 um 17:18 schrieb Lukas Wagner:
> On 10/16/23 13:20, Stefan Hanreich wrote:
>> I can imagine having to setup VMs inside the Test Setup as well for
>> doing various tests. Doing this manually every time could be quite
>> cumbersome / hard to automate. Do you have a mechanism in mind to
>> deploy VMs inside the test system as well? Again, PBS could be an
>> interesting option for this imo.
>>
> Several options come to mind. We could use a virtualized PBS instance
> with a datastore containing the VM backup as part of the fixture.  We
> could use some external backup store (so the same 'source' as for the
> templates themselves) - however that means that the systems under test
> must have network access to that.  We could also think about using
> iPXE to boot test VMs, with the boot image either being provided by some
> template from the fixture, or by some external server.  For both
> approaches, the 'as part of the fixture' approaches seem a bit nicer,
> as they are more self-contained.

What about the following approach:

The tests state that they need one or more VMs with certain properties,
i.e., something like "none" (don't care), "ostype=win*", "memory>=10G"
or the like (this can start out simple w.r.t. supported comparison
features; as long as the base system is there, it can be extended
relatively easily later on).
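
To make that a bit more concrete, here is a rough sketch in Python.
The condition syntax, the guest property dict and the names
`parse_size` and `matches` are all made up for illustration; this is
not how any existing tool does it, just one way the matching could
start out:

```python
import fnmatch
import re

# Hypothetical condition syntax: "none", "key=glob", "key>=size", "key<=size".
_COND = re.compile(r"^(?P<key>[\w-]+)(?P<op>>=|<=|=)(?P<value>.+)$")
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Turn '10G' or '512M' into bytes; plain digits are taken as bytes."""
    text = text.strip()
    if text and text[-1].upper() in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1].upper()]
    return int(text)


def matches(condition, guest):
    """Check one condition string against a guest's property dict."""
    if condition == "none":
        return True  # "don't care"
    m = _COND.match(condition)
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    key, op, value = m["key"], m["op"], m["value"]
    if key not in guest:
        return False
    if op == "=":
        # Glob match, so "ostype=win*" accepts win10, win11, ...
        return fnmatch.fnmatchcase(str(guest[key]), value)
    have, want = parse_size(str(guest[key])), parse_size(value)
    return have >= want if op == ">=" else have <= want
```

Further operators could then be added by extending the regex and the
final comparison, without touching the callers.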

Then, on a run of a test, all those asset dependencies are collected
first.  Depending on further config, they are then either newly created
or selected from existing candidates on the target test-host system.
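
As a sketch of that collect-then-resolve step (Python, purely
illustrative): `CaseSpec`, `collect_assets`, `resolve` and the `create`
callback are hypothetical names, and each requirement is assumed to be
a simple predicate over a guest's properties:

```python
from dataclasses import dataclass, field


@dataclass
class CaseSpec:
    name: str
    # Each asset requirement is a predicate over a guest's property dict.
    assets: list = field(default_factory=list)


def collect_assets(tests):
    """Gather every asset requirement up front, before any test runs."""
    return [(t.name, req) for t in tests for req in t.assets]


def resolve(requirements, existing, create, reuse=True):
    """Map each requirement to an existing guest or to a newly created one."""
    free = list(existing)
    plan = []
    for test_name, req in requirements:
        guest = next((g for g in free if reuse and req(g)), None)
        if guest is not None:
            free.remove(guest)  # do not hand the same guest out twice
        else:
            guest = create(test_name, req)
        plan.append((test_name, guest))
    return plan
```

With `reuse=False` every asset gets created from scratch, which is
roughly what a "force re-creation" knob would map to.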

In general, the test system can add a specific tag (like "test-asset")
to such virtual guests by default, and also add that tag as an implicit
property condition (if no explicit tag condition is already present)
when searching for existing assets.  This way one can re-use guests,
be it because they exist on a bare-metal system that won't get rolled
back, or in some virtual system that gets rolled back to a state that
already has the virtual-guest test-assets configured.  That can also
reduce the time required to set up a clean environment by a lot,
benefiting both use cases.
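
The implicit tag condition could look roughly like this (Python sketch;
the "tag=..." condition form, the guest dicts and both function names
are assumptions on my side, only the "test-asset" tag name is from the
text above):

```python
DEFAULT_TAG = "test-asset"


def with_implicit_tag(conditions, default_tag=DEFAULT_TAG):
    """Add the default tag condition unless a tag condition is already given."""
    if any(c.startswith("tag=") for c in conditions):
        return list(conditions)
    return list(conditions) + [f"tag={default_tag}"]


def candidates(guests, conditions):
    """Select existing guests carrying every tag the conditions ask for."""
    wanted = {
        c.split("=", 1)[1]
        for c in with_implicit_tag(conditions)
        if c.startswith("tag=")
    }
    return [g for g in guests if wanted <= set(g.get("tags", []))]
```

A test that says nothing about tags thus only ever sees guests that
were tagged as test assets, so unrelated guests on a re-used host are
left alone.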

Extra config and/or command line knobs can then force re-creation of
all, or some, assets of a test, or set the base search path for images.
Here it's probably enough to start with a few simple, definitely wanted
knobs to provide the core infra, so that others, maybe more complex
ones, can be added more easily in the future (creating new things is
IMO always harder than extending existing ones, at least if
non-trivial).

> Also, the vmdb2 thingy that thomas mentioned might be interesting for

Because I stumbled upon it today, systemd's mkosi tool could also be
interesting here:

https://github.com/systemd/mkosi
https://github.com/systemd/mkosi/blob/main/mkosi/resources/mkosi.md

> this - i've only glanced at it so far though.
> 
> As of now it seems that this question will not influence the design of
> the test runner much, so it can probably be postponed to a later
> stage.

Not the design of the runner itself, but that of all the setup stuff
around it, so I'd at least try to keep it in mind.  The above features
might not be that much work, but would create lots of flexibility and
allow devs to use it more easily for declarative reproduction attempts
of bugs too.  At least I see it as a big mental roadblock if I have to
set up specific environments for using such tools, and cannot just
re-use my existing ones 1:1.

> 
>>> In theory, the test runner would also be able to drive tests on real
>>> hardware, but of course with some limitations (harder to have a
>>> predictable, reproducible environment, etc.)
>>
>> Maybe utilizing Aaron's installer for setting up those test systems
>> could at least produce somewhat identical setups? Although it is
>> really hard managing systems with different storage types, network
>> cards, ... .
> 
> In general my biggest concern with 'bare-metal' tests - and to be
> precise, that does not really have anything to do with being
> 'bare-metal', more about testing on something that is harder to roll
> back into a clean state that can be used for the next test
> execution - is that I'm afraid that a setup like this could become
> quite brittle and a maintenance burden

I don't see that as an issue, just as two separate things: one is
regression testing in clean states, where we can turn up reporting of
test failures to the max, and the other is integration testing, where
we don't report widely but only provide some way for admins to see the
list of issues and decide.

Bugs in the test system or configuration issues breaking idempotency
assumptions can then be fixed, and other issues that are not visible in
those clean-room tests can become visible.  I see no reason why both
cannot co-exist and have equivalent priority/focus.

New tests can be checked for basic idempotency by running them twice,
with the second run not doing any rollback.
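
Sketched in Python, assuming the "exit code 0 == success" contract
discussed further down; `run_test` and `rollback` are hypothetical
callables that the runner would provide:

```python
def check_idempotent(run_test, rollback):
    """Run a test twice; only the first run starts from a rolled-back state."""
    rollback()
    first = run_test()
    second = run_test()  # deliberately no rollback in between
    return first == 0 and second == 0
```

A test that leaves state behind which makes its own second run fail
would be flagged here, before it ever reaches a system that does not
get rolled back.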

>> I've seen GitLab using tags for runners that specify certain
>> capabilities of systems. Maybe we could also introduce something like
>> that here for different bare-metal systems? E.g. a test case
>> specifies it needs a system with tag `ZFS` and then you can run /
>> skip the respective test case on that system. Managing those tags can
>> introduce quite a lot of churn though, so I'm not sure if this would
>> be a good idea.
> 
> I have thought about a tag system as well - not necessarily for test
> runners, but for test cases. E.g. you could tag tests for the
> authentication system with 'auth' - because at least for the local
> development cycle it might not make much sense to run tests for
> clusters, ceph, etc. while working on the authentication system.

Yes, I thought about something like that too: a known set of tags
(i.e., a centrally managed set, and bail, or at least warn, if a test
uses an unknown one).  Having test runs be filtered by their use
classes, like "migration" or "windows" or your "auth" example, would
definitely be nice.
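
Roughly like this (Python sketch; the tag names are just the examples
from this thread, and `validate_tags`/`select_tests` are hypothetical
names):

```python
import warnings

KNOWN_TAGS = {"auth", "migration", "windows"}  # centrally managed set


def validate_tags(test_name, tags, strict=True):
    """Bail (strict) or warn when a test uses a tag outside the known set."""
    unknown = set(tags) - KNOWN_TAGS
    if not unknown:
        return
    msg = f"test {test_name!r} uses unknown tags: {sorted(unknown)}"
    if strict:
        raise ValueError(msg)
    warnings.warn(msg)


def select_tests(tests, wanted):
    """Keep the tests that carry at least one of the requested tags."""
    wanted = set(wanted)
    return [name for name, tags in tests.items() if wanted & set(tags)]
```

Validating against a fixed set mainly catches typos, which would
otherwise silently drop a test from a filtered run.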

>>> The test script is executed by the test runner; the test outcome is
>>> determined by the exit code of the script. Test scripts could be
>>> written
>> Are you considering capturing output as well? That would make sense
>> when using assertions at least, so in case of failures developers
>> have a starting point for debugging.
> Yup, I'd capture stdout/stderr from all test executables/scripts and
> include it in the final test report.

I guess there would be an (optional) notification to a set of
addresses, passed to the test system via CLI/config by the tester (a
human for manual tests, or derived from changes and maintainers for
automated tests).  It would only contain a summary and a link/pointer
to the full report, which provides the longer outputs of the test
harness and possibly system logs.

> Test output is indeed very useful when determining *why* something
> went wrong.

The journalctl output of all nodes that took part in a test might be
useful too.
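
For example, the runner could note the start and end time of a test
and build one command per node from that.  A Python sketch; fetching
the journal over ssh as root and the function name are assumptions,
the journalctl options used (`--no-pager`, `--since`, `--until`) are
the standard ones:

```python
import shlex
from datetime import datetime


def journal_commands(nodes, start, end):
    """Build one ssh + journalctl command per node for the test's time window."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # ssh hands the remote command to a shell, so quote the timestamps.
    remote = shlex.join([
        "journalctl", "--no-pager",
        "--since", start.strftime(fmt),
        "--until", end.strftime(fmt),
    ])
    return [["ssh", f"root@{node}", remote] for node in nodes]
```

Each returned list can be passed as-is to `subprocess.run` and the
captured output attached to the full report.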

>> Would it make sense to allow specifying an expected exit code for
>> tests that actually should fail - or do you consider this something
>> that should be handled by the test script?
> 
> I guess that's a matter of taste. Personally I'd keep the contract
> between test runner and test script simple and say 0 == success,
> everything else is a failure. If there are any test cases that expect
> a failure of some API call, then the script should 'translate' the
> exit code.

W.r.t. exit code I find that fine.  Maybe we want to allow passing a
more formal result text back, but we can always extend this in the
future by just using some special files that the test script writes
to, or something like that.  Starting out with simply checking the
exit code seems fine enough to me.
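
To illustrate how small that contract is and how a result file could
be bolted on later, a Python sketch; the `TEST_RESULT_FILE` environment
variable and the JSON format are purely hypothetical, nothing like that
has been decided:

```python
import json
import os
import subprocess
import tempfile
from pathlib import Path


def run_test(argv):
    """Run one test script; exit code 0 is success, anything else a failure."""
    with tempfile.TemporaryDirectory() as tmp:
        result_file = Path(tmp) / "result.json"
        # Hypothetical extension point: the script may write details here.
        env = {**os.environ, "TEST_RESULT_FILE": str(result_file)}
        proc = subprocess.run(argv, capture_output=True, text=True, env=env)
        details = None
        if result_file.exists():
            details = json.loads(result_file.read_text())
    return {
        "passed": proc.returncode == 0,
        "stdout": proc.stdout,  # captured for the final test report
        "stderr": proc.stderr,
        "details": details,
    }
```

Scripts that never write the file keep working unchanged, so the
runner can start with the exit code alone.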