Message ID | cover.1591456338.git.lukasstraub2@web.de (mailing list archive) |
---|---|
Series | colo: Introduce resource agent and test suite/CI |
On Sat, 6 Jun 2020 21:17:32 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> Hello Everyone,
> So here is v2. Patch 1 can already be merged independently of the others.
>
> Regards,
> Lukas Straub
>
> Changes:
> v2:
> -use new yank api
> -drop disk_size parameter
> -introduce pick_qemu_util function and use it
>
> Overview:
>
> Hello Everyone,
> These patches introduce a resource agent for fully automatic management of colo
> and a test suite building upon the resource agent to extensively test colo.
>
> Test suite features:
> -Tests failover with peer crashing and hanging and failover during checkpoint
> -Tests network using ssh and iperf3
> -Quick test requires no special configuration
> -Network test for testing colo-compare
> -Stress test: failover all the time with network load
>
> Resource agent features:
> -Fully automatic management of colo
> -Handles many failures: hanging/crashing qemu, replication error, disk error, ...
> -Recovers from hanging qemu by using the "yank" oob command
> -Tracks which node has up-to-date data
> -Works well in clusters with more than 2 nodes
>
> Run times on my laptop:
> Quick test: 200s
> Network test: 800s (tagged as slow)
> Stress test: 1300s (tagged as slow)
>
> The test suite needs access to a network bridge to properly test the network,
> so some parameters need to be given to the test run. See
> tests/acceptance/colo.py for more information.
>
> I wonder how this integrates in existing CI infrastructure. Is there a common
> CI for qemu where this can run or does every subsystem have to run their own
> CI?
>
> Regards,
> Lukas Straub
>
>
> Lukas Straub (7):
>   block/quorum.c: stable children names
>   avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
>   boot_linux.py: Use pick_qemu_util
>   colo: Introduce resource agent
>   colo: Introduce high-level test suite
>   configure,Makefile: Install colo resource-agent
>   MAINTAINERS: Add myself as maintainer for COLO resource agent
>
>  MAINTAINERS                               |    6 +
>  Makefile                                  |    5 +
>  block/quorum.c                            |   20 +-
>  configure                                 |   10 +
>  scripts/colo-resource-agent/colo          | 1466 +++++++++++++++++++++
>  scripts/colo-resource-agent/crm_master    |   44 +
>  scripts/colo-resource-agent/crm_resource  |   12 +
>  tests/acceptance/avocado_qemu/__init__.py |   15 +
>  tests/acceptance/boot_linux.py            |   11 +-
>  tests/acceptance/colo.py                  |  677 ++++++++++
>  10 files changed, 2251 insertions(+), 15 deletions(-)
>  create mode 100755 scripts/colo-resource-agent/colo
>  create mode 100755 scripts/colo-resource-agent/crm_master
>  create mode 100755 scripts/colo-resource-agent/crm_resource
>  create mode 100644 tests/acceptance/colo.py
>
> --
> 2.20.1

Ping...

On 7/5/20 11:37 AM, Lukas Straub wrote:
> On Sat, 6 Jun 2020 21:17:32 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
>
>> Hello Everyone,
>> So here is v2. Patch 1 can already be merged independently of the others.
>>
>> [...]
>
> Ping...

Ping^2?
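
For readers who do not have the series at hand: the changelog mentions a pick_qemu_util function for picking QEMU utility binaries in the acceptance tests. The patch itself is not quoted in this thread, so the following is only a minimal sketch of how such a lookup could work, assuming it prefers a binary from a build directory and otherwise falls back to $PATH. The name `find_util` and the `build_dir` parameter are hypothetical and are not taken from the patch.

```python
import os
import shutil
from typing import Optional


def find_util(name: str, build_dir: Optional[str] = None) -> Optional[str]:
    """Return a path to an executable called `name`, or None if not found.

    Hypothetical helper, NOT the pick_qemu_util from the series:
    prefer a binary inside `build_dir` (assumed layout), then fall
    back to a normal $PATH lookup.
    """
    if build_dir is not None:
        candidate = os.path.join(build_dir, name)
        # Accept the candidate only if it is a regular, executable file.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which returns None when nothing on $PATH matches.
    return shutil.which(name)
```

A test could then skip itself when `find_util("qemu-img", build_dir)` returns None instead of failing later with a missing binary.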