[pve-devel] [RFC] towards automated integration testing

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Oct 16 15:57:25 CEST 2023


A few things, most of which we already talked about off-list anyway.

We should check whether we can integrate our existing regression testing
there too, i.e.:

- The qemu autotest that Stefan Reiter started and Fiona still uses. Here
  we should drop the in-git tracked backup that the test VM is restored
  from (replace it with something like a vmdb2 [0] managed Debian image
  that gets generated on demand), replace some hard-coded configs with a
  simple config, and make it public.

[0]: https://vmdb2.liw.fi/

- The selenium based end-to-end tests, which we also use to generate most
  of our screenshots (they can run headless too). Here we also need a few
  clean-ups, but not that many, and we need to make the repo public.

On 13/10/2023 at 15:33, Lukas Wagner wrote:
> I am currently doing the groundwork that should eventually enable us
> to write automated integration tests for our products.
> 
> Part of that endeavor will be to write a custom test runner, which will
>   - setup a specified test environment
>   - execute test cases in that environment

This should be decoupled from everything else, so that I can run it on any
existing installation, bare-metal or not. That allows devs to use it in
their existing setups with almost no change required.

We could then also add it to our existing Buildbot instance relatively
easily, so it would be worth doing even if we might deprecate Buildbot in
the future (for what little it does, it could be simpler).

>   - create some sort of test report

As Stefan mentioned, test output can be good to have. Our Buildbot
instance provides that, and while I don't look at it in 99% of the builds,
when I need to, it's worth *a lot*.

> 
> ## Introduction
> 
> The goal is to establish a framework that allows us to write
> automated integration tests for our products.
> These tests are intended to run in the following situations:
> - When new packages are uploaded to the staging repos (by triggering
>   a test run from repoman, or similar)

*Debian repos, as we could also trigger some tests when git commits are
pushed, just like we do now through Buildbot. Doing so is IMO nice, as it
will catch issues before a package is even bumped, and it is still quite a
bit simpler to implement than the "apply patch series from the list to the
git repos" mechanism from the next point, while it could still act as a
preparation for that.

> - Later, this tests could also be run when patch series are posted to
>   our mailing lists. This requires a  mechanism to automatically
>   discover, fetch and build patches, which will be a separate,
>   follow-up project.

> 
> As a main mode of operation, the Systems under Test (SUTs)
> will be virtualized on top of a Proxmox VE node.

For the fully-automated test system this can be OK as primary mode, as
it indeed makes things like going back to an older software state much
easier.

But if we decouple the test harness, and running it, from that more
automated system, we can also run the harness periodically on our
bare-metal test servers.

> ## Terminology
> - Template: A backup/VM template that can be instantiated by the test
>   runner

I.e., the base of the test host? I'd call this a test-host; template is a
bit too overloaded/generic and might focus too much on the virtual test
environment.

Or is this some part that takes part in the test, i.e., a generalization
of the product to test and the supplementary tools/apps that help with
that test?

Hmm, it could work out OK, and we should be able to specialize things
relatively easily later too, if wanted.

> - Test Case: Some script/executable executed by the test runner, success
>   is determined via exit code.
> - Fixture: Description of a test setup (e.g. which templates are needed,
>   additional setup steps to run, etc.)
> 
> ## Approach
> Test writers write template, fixture, test case definition in
> declarative configuration files (most likely TOML). The test case
> references a test executable/script, which performs the actual test.
> 
> The test script is executed by the test runner; the test outcome is
> determined by the exit code of the script. Test scripts could be written
> in any language, e.g. they could be Perl scripts that use the official
> `libpve-apiclient-perl` to test-drive the SUTs.
> If we notice any emerging patterns, we could write additional helper
> libs that reduce the amount of boilerplate in test scripts.
> 
> In essence, the test runner would do the following:
> - Group testcases by fixture
> - For every fixture:
>     - Instantiate needed templates from their backup snapshot

Should be optional, possibly via a default-on boolean option.

>     - Start VMs

Same.

>     - Run any specified `setup-hooks` (update system, deploy packages,
>     etc.)

Should be as idempotent as possible.

>     - Take a snapshot, including RAM

Should be optional (as in, don't care if it cannot be done, e.g., on
bare metal).

>     - For every testcase using that fixture:
>         - Run testcase (execute test executable, check exit code)
>         - Rollback to snapshot (iff `rollback = true` for that template)
>     - destroy test instances (or at least those which are not needed by
>       other fixtures)

Might be optional for L1 hosts; L2 test VMs might warrant a separate
switch.
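Most of the steps above could probably be governed by per-template
switches, so that the same definitions also work when pointing the runner
at an existing (bare-metal) host. A rough sketch of how that could look;
all key names here are made up purely for illustration and are not part of
the proposal:

```toml
[template.pve-default]
restore = '...'
# Hypothetical, default-on switches to make instantiation, snapshotting
# and teardown opt-out:
instantiate = true        # restore from backup and start the VM
snapshot = true           # ignore gracefully where snapshots are impossible
destroy-on-finish = true  # keep L1 hosts around; L2 test VMs could get
                          # their own switch
```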

> In the beginning, the test scripts would primarily drive the Systems
> under Test (SUTs) via their API. However, the system would also offer
> the flexibility for us to venture into the realm of automated GUI
> testing at some point (e.g. using selenium) - without having to
> change the overall test architecture.

Our existing selenium based UI tests simply use the API to create the
things they need, if not already existing, and sometimes also remove some
of them.

They use some special ranges or values to avoid most conflicts with real
systems, allowing one to point them at existing (production) systems
without problems.

IMO this has big value, and I actually added a bit of resiliency there,
as I find having to set up clean states a bit annoying and, for one of the
main use cases of that tooling, creating screenshots, too sterile.

But always starting out from a very clean state is IMO not only "ugly"
for screenshots, it can also sometimes mask issues that tests run into on
systems with a longer uptime and the "organic mess" that comes from
long-term maintenance.

In practice one naturally wants both, starting from a clean state and
from an existing one; each has its advantages and disadvantages. Messy
systems, for example, might also produce more false positives in
regression tracking.
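If we want both modes, that could eventually even become a per-fixture (or
per-run) switch; again just a made-up sketch to illustrate the idea, not
something from the proposal:

```toml
[fixture.pve-with-ldap-server]
templates = [ 'pve-default', 'ldap-server' ]
# Hypothetical switch: "clean" restores the templates from their backups,
# "existing" points the tests at an already set up (possibly messy) host.
state = "existing"
```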

> 
> ## Mock Test Runner Config
> 
> Besides the actual test scripts, test writers would write test
> configuration. Based on the current requirements and the approach that
> I have chosen, an example config *could* look like the following.
> These would likely be split into multiple files/folders
> (e.g. to group test case definitions and the test script logically).
> 
> ```toml
> [template.pve-default]
> # Backup image to restore from, in this case this would be a previously
> # set up PVE installation
> restore = '...'
> # To check if node is booted successfully, also made available to hook
> # scripts, in case they need to SSH in to setup things.
> ip = "10.0.0.1"
> # Define credentials in separate file - most template could use a
> # default password/SSH key/API token etc.
> credentials = "default"
> # Update to latest packages, install test .debs
> # credentials are passed via env var
> # Maybe this could also be ansible playbooks, if the need arises.

fwiw, one could also define a config-deployment-system, like

- none (already set up)
- cloudinit
- QGA

but that can be added later on too.
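
Just to illustrate, such a deployment setting could be a simple
per-template key; the name and values below are made up and not part of
the proposal:

```toml
[template.pve-default]
restore = '...'
credentials = "default"
# Hypothetical key describing how setup-hooks and configs reach the
# instance:
#   "none"      -> host is already set up, nothing to deploy
#   "cloudinit" -> seed the config via a cloud-init drive
#   "qga"       -> push files/commands through the QEMU guest agent
deploy-method = "cloudinit"
```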

> setup-hooks = [
>     "update.sh",
> ]
> # Take snapshot after setup-hook, roll back after each test case
> rollback = true
> 
> 
> [template.ldap-server]
> # Backup image to restore from
> restore = '...'
> credentials = "default"
> ip = "10.0.0.3"
> # No need to roll back in between test cases, there won't be any changes
> rollback = false
> 
> 
> 
> # Example fixture. They can be used by multiple testcases.
> [fixture.pve-with-ldap-server]
> # Maybe one could specify additional setup-hooks here as well, in case
> # one wants a 'per-fixture' setup? So that we can reduce the number of
> # base images?
> templates = [
>     'pve-default',
>     'ldap-server',
> ]
> 
> 
> # testcases.toml (might be split to multiple files/folders?)

Maybe some sort of predicates could also be nice (even if not there from
the start), i.e., a condition that, if not met, causes a test to be
skipped, like the existence of a ZFS storage or something like that.

While those seem like details, having a general (simple) dependency and,
so to say, anti-dependency system might influence the overall design more.
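
To make that more concrete, a hypothetical predicate syntax could look
roughly like the following; the key names and the ZFS example are made up
for illustration only:

```toml
[testcase.test-zfs-replication]
fixture = 'pve-default'
test-exec = './test-zfs-replication.pl'
# Skip (rather than fail) the test if the fixture cannot provide these:
requires = [ 'storage:zfs' ]
# ...and never run it on setups where it would conflict ("anti-dependency"):
conflicts = [ 'cluster' ]
```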


> [testcase.test-ldap-realms]
> fixture = 'pve-with-ldap-server'
> 
> # - return code is check to determine test case success
> # - stderr/stdout is captured for the final test report
> # - some data is passed via env var:
> #   - name of the test case
> #   - template configuration (IPs, credentials, etc.)
> #   - ...
> test-exec = './test-ldap-realms.pl'
> # Consider test as failed if test script does not finish fast enough
> test-timeout = 60
> # Additional params for the test script, allowing for parameterized
> # tests.
> # Could also turn this into an array and loop over the values, in
> # order to create multiple test cases from the same definition.
> test-params = { foo = "bar" }
> 
> # Second test case, using the same fixture
> [testcase.test-ldap-something-else]
> fixture = 'pve-with-ldap-server'
> test-exec = './test-ldap-something-else.pl'
> 
> ```
> 

Is the order of test cases guaranteed by the TOML parsing, or how else
are intra-fixture dependencies ensured?
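
If we do not want to rely on parser or file order, an explicit dependency
key per test case could be one way out; once more just a made-up sketch:

```toml
[testcase.test-ldap-something-else]
fixture = 'pve-with-ldap-server'
test-exec = './test-ldap-something-else.pl'
# Hypothetical key: only run after the named test case(s) succeeded,
# independent of the order in which the definitions were parsed.
depends-on = [ 'test-ldap-realms' ]
```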

Anyway, the main thing is to get started here, so I don't want to block
anything based on minorish stuff.

The most important thing for me is that the following parts are decoupled
and ideally shippable as a separate Debian package each:

- the parts that manage automated testing, including how the test host's
  base system is set up (the latter could even be its own thing)
- running the tests themselves, including some helper modules/scripts
- the test definitions

That way we can run them anywhere easily and extend, or possibly even
rework, some parts independently, if ever needed.

- Thomas




