tests/functional: Convert the kvm_xen_guest avocado test

Message ID	20241218113255.232356-1-thuth@redhat.com (mailing list archive)
State	New
Headers	From: Thomas Huth <thuth@redhat.com> To: qemu-devel@nongnu.org Cc: Paul Durrant <paul@xen.org>, David Woodhouse <dwmw2@infradead.org> Subject: [PATCH] tests/functional: Convert the kvm_xen_guest avocado test Date: Wed, 18 Dec 2024 12:32:49 +0100

Thomas Huth Dec. 18, 2024, 11:32 a.m. UTC

Use the serial console to execute the commands in the guest instead
of using ssh since we don't have ssh support in the functional
framework yet.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 MAINTAINERS                                   |  2 +-
 tests/functional/meson.build                  |  2 +
 .../test_x86_64_kvm_xen.py}                   | 81 +++++++++++--------
 3 files changed, 51 insertions(+), 34 deletions(-)
 rename tests/{avocado/kvm_xen_guest.py => functional/test_x86_64_kvm_xen.py} (64%)
 mode change 100644 => 100755

David Woodhouse Dec. 18, 2024, 11:48 a.m. UTC | #1

On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>Use the serial console to execute the commands in the guest instead
>of using ssh since we don't have ssh support in the functional
>framework yet.
>
>Signed-off-by: Thomas Huth <thuth@redhat.com>

Hm, but serial is lossy, and experience shows that it leads to flaky tests if the guest (or host) misses bytes, whereas SSH would just go slower.

Daniel P. Berrangé Dec. 18, 2024, 11:53 a.m. UTC | #2

On Wed, Dec 18, 2024 at 12:48:01PM +0100, David Woodhouse wrote:
> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
> >Use the serial console to execute the commands in the guest instead
> >of using ssh since we don't have ssh support in the functional
> >framework yet.
> >
> >Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Hm, but serial is lossy and experience shows that it leads to
> flaky tests if the guest (or host) misses bytes. While SSH would just go slower.

Practically all of our tests use the serial console for interaction.
QEMU's serial port emulation is generally written to stall if the FIFO is
full, rather than throw away data.
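
The difference between a stalling and a lossy FIFO can be illustrated with a toy model. This is only a sketch and not QEMU's serial code: the `transfer` function, the FIFO size and the drain rate are all made up for illustration. The writer is faster than the reader; with `stall=True` it waits for space, with `stall=False` it drops the byte.

```python
from collections import deque


def transfer(data: bytes, fifo_size: int, drain_every: int, stall: bool) -> bytes:
    """Push data through a bounded FIFO that a slow reader drains.

    The reader removes one byte every `drain_every` writer steps.
    With stall=True the writer waits for space; with stall=False the
    byte that does not fit is discarded.
    """
    fifo = deque()
    received = bytearray()
    pending = deque(data)
    step = 0
    while pending or fifo:
        step += 1
        if pending:
            if len(fifo) < fifo_size:
                fifo.append(pending.popleft())
            elif not stall:
                pending.popleft()  # lossy model: the byte is thrown away
            # stalling model: keep the byte pending and retry next step
        if step % drain_every == 0 and fifo:
            received.append(fifo.popleft())
    return bytes(received)


message = b"Starting dropbear sshd: OK"
stalled = transfer(message, fifo_size=4, drain_every=3, stall=True)
lossy = transfer(message, fifo_size=4, drain_every=3, stall=False)
print(stalled == message)         # the stalling writer delivers everything
print(len(lossy) < len(message))  # the lossy writer loses bytes
```

In the stalling variant the transfer merely takes more steps, which is why a test that matches on console output can stay reliable as long as nothing drops data.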

With regards,
Daniel

Thomas Huth Dec. 18, 2024, 12:40 p.m. UTC | #3

On 18/12/2024 12.48, David Woodhouse wrote:
> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>> Use the serial console to execute the commands in the guest instead
>> of using ssh since we don't have ssh support in the functional
>> framework yet.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.

The issue with the serial console should be fixed since:

  https://gitlab.com/qemu-project/qemu/-/commit/cdad03b74f759857d784e074755

We haven't seen any more issues with the other tests since that was
merged.

  Thomas

Thomas Huth Dec. 18, 2024, 1:38 p.m. UTC | #4

On 18/12/2024 13.40, Thomas Huth wrote:
> On 18/12/2024 12.48, David Woodhouse wrote:
>> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>>> Use the serial console to execute the commands in the guest instead
>>> of using ssh since we don't have ssh support in the functional
>>> framework yet.
>>>
>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>
>> Hm, but serial is lossy and experience shows that it leads to flaky tests 
>> if the guest (or host) misses bytes. While SSH would just go slower.
> 
> The issue with the serial console should be fixed since:
> 
>   https://gitlab.com/qemu-project/qemu/-/commit/cdad03b74f759857d784e074755
> 
> We didn't see any more issues with all the other tests since that has been 
> merged.

But FWIW, there seems to be another issue with this test. While running it 
multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging. 
According to the console output, the guest waits in vain for a device:

2024-12-18 14:32:58,606: Initializing XFRM netlink socket
2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
2024-12-18 14:32:58,609: Segment Routing with IPv6
2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...

Have you seen this problem before?

  Thomas

David Woodhouse Dec. 18, 2024, 2:11 p.m. UTC | #5

On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
> On 18/12/2024 13.40, Thomas Huth wrote:
> > On 18/12/2024 12.48, David Woodhouse wrote:
> > > On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
> > > > Use the serial console to execute the commands in the guest instead
> > > > of using ssh since we don't have ssh support in the functional
> > > > framework yet.
> > > > 
> > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > 
> > > Hm, but serial is lossy and experience shows that it leads to flaky tests 
> > > if the guest (or host) misses bytes. While SSH would just go slower.
> > 
> > The issue with the serial console should be fixed since:
> > 
> >   https://gitlab.com/qemu-project/qemu/-/commit/cdad03b74f759857d784e074755
> > 
> > We didn't see any more issues with all the other tests since that has been 
> > merged.

Fair enough, thanks. In that case:

Acked-by: David Woodhouse <dwmw@amazon.co.uk>

> But FWIW, there seems to be another issue with this test. While running it 
> multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging. 
> According to the console output, the guest waits in vain for a device:
> 
> 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
> 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
> 2024-12-18 14:32:58,609: Segment Routing with IPv6
> 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
> 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
> 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
> 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
> 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
> 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
> 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
> 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
> 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
> 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
> 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
> 
> Have you seen this problem before?

That seems like event channel interrupts aren't being routed to the
legacy i8259 PIC. I've certainly seen that kind of thing before,
especially when asserted level-triggered interrupts weren't correctly
being asserted. But I don't expect that of QEMU. I'll see if I can
reproduce; thanks.

How often does it happen?

Thomas Huth Dec. 18, 2024, 3:54 p.m. UTC | #6

On 18/12/2024 12.48, David Woodhouse wrote:
> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>> Use the serial console to execute the commands in the guest instead
>> of using ssh since we don't have ssh support in the functional
>> framework yet.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.

I have now noticed an issue with the serial console in this test, too.
It looks like the guest does not print "Starting dropbear sshd: OK" atomically;
sometimes other kernel messages appear between the ":" and the "OK".
It works reliably when the "OK" is removed from the string.

  Thomas

Thomas Huth Dec. 18, 2024, 4:19 p.m. UTC | #7

On 18/12/2024 15.11, David Woodhouse wrote:
> On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
...
>> But FWIW, there seems to be another issue with this test. While running it
>> multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging.
>> According to the console output, the guest waits in vain for a device:
>>
>> 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
>> 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
>> 2024-12-18 14:32:58,609: Segment Routing with IPv6
>> 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
>> 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
>> 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
>> 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
>> 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
>> 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
>> 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
>> 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
>> 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
>> 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
>> 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
>>
>> Have you seen this problem before?
> 
> That seems like event channel interrupts aren't being routed to the
> legacy i8259 PIC. I've certainly seen that kind of thing before,
> especially when asserted level-triggered interrupts weren't correctly
> being asserted. But I don't expect that of QEMU. I'll see if I can
> reproduce; thanks.
> 
> How often does it happen?

With the new functional test, it happens maybe 2 times out of 100 test runs.

I wasn't able to reproduce it with the avocado version yet, but that also 
runs 10x slower, so it takes a longer time to get to that many runs...

  Thomas

Thomas Huth Dec. 18, 2024, 5:16 p.m. UTC | #8

On 18/12/2024 17.19, Thomas Huth wrote:
> On 18/12/2024 15.11, David Woodhouse wrote:
>> On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
> ...
>>> But FWIW, there seems to be another issue with this test. While running it
>>> multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging.
>>> According to the console output, the guest waits in vain for a device:
>>>
>>> 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
>>> 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
>>> 2024-12-18 14:32:58,609: Segment Routing with IPv6
>>> 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
>>> 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
>>> 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
>>> 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
>>> 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
>>> 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
>>> 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
>>> 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
>>> 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
>>> 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
>>> 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
>>>
>>> Have you seen this problem before?
>>
>> That seems like event channel interrupts aren't being routed to the
>> legacy i8259 PIC. I've certainly seen that kind of thing before,
>> especially when asserted level-triggered interrupts weren't correctly
>> being asserted. But I don't expect that of QEMU. I'll see if I can
>> reproduce; thanks.
>>
>> How often does it happen?
> 
> With the new functional test, it happens maybe 2 times out of 100 test runs.
> 
> I wasn't able to reproduce it with the avocado version yet, but that also 
> runs 10x slower, so it takes a longer time to get to that many runs...

Ok, FWIW, I've now also seen the problem with the old avocado version of the 
test, so it's nothing that has been introduced by my patch. I just had to 
downgrade to Avocado v88 again since the current version v103 does not seem 
to correctly output the console anymore :-/ (which is another good indicator 
that we really need to get the stuff moved over to the functional framework 
now).

  Thomas

David Woodhouse Dec. 18, 2024, 7:11 p.m. UTC | #9

On 18 December 2024 18:16:13 CET, Thomas Huth <thuth@redhat.com> wrote:
>On 18/12/2024 17.19, Thomas Huth wrote:
>> On 18/12/2024 15.11, David Woodhouse wrote:
>>> On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
>> ...
>>>> But FWIW, there seems to be another issue with this test. While running it
>>>> multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging.
>>>> According to the console output, the guest waits in vain for a device:
>>>> 
>>>> 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
>>>> 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
>>>> 2024-12-18 14:32:58,609: Segment Routing with IPv6
>>>> 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
>>>> 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
>>>> 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
>>>> 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
>>>> 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
>>>> 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
>>>> 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
>>>> 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
>>>> 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
>>>> 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
>>>> 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
>>>> 
>>>> Have you seen this problem before?
>>> 
>>> That seems like event channel interrupts aren't being routed to the
>>> legacy i8259 PIC. I've certainly seen that kind of thing before,
>>> especially when asserted level-triggered interrupts weren't correctly
>>> being asserted. But I don't expect that of QEMU. I'll see if I can
>>> reproduce; thanks.
>>> 
>>> How often does it happen?
>> 
>> With the new functional test, it happens maybe 2 times out of 100 test runs.
>> 
>> I wasn't able to reproduce it with the avocado version yet, but that also runs 10x slower, so it takes a longer time to get to that many runs...
>
>Ok, FWIW, I've now also seen the problem with the old avocado version of the test, so it's nothing that has been introduced by my patch. I just had to downgrade to Avocado v88 again since the current version v103 does not seem to correctly output the console anymore :-/ (which is another good indicator that we really need to get the stuff moved over to the functional framework now).
>
> Thomas
>

I have reproduced it, will look into it. I'm fairly sure this was all working reliably at the time the Xen support was merged; that's why I wrote these test cases after all.

David Woodhouse Dec. 18, 2024, 9:42 p.m. UTC | #10

On Wed, 2024-12-18 at 17:19 +0100, Thomas Huth wrote:
> On 18/12/2024 15.11, David Woodhouse wrote:
> > On Wed, 2024-12-18 at 14:38 +0100, Thomas Huth wrote:
> ...
> > > But FWIW, there seems to be another issue with this test. While running it
> > > multiple times, I sometimes see test_kvm_xen_guest_novector_noapic hanging.
> > > According to the console output, the guest waits in vain for a device:
> > > 
> > > 2024-12-18 14:32:58,606: Initializing XFRM netlink socket
> > > 2024-12-18 14:32:58,607: NET: Registered PF_INET6 protocol family
> > > 2024-12-18 14:32:58,609: Segment Routing with IPv6
> > > 2024-12-18 14:32:58,609: In-situ OAM (IOAM) with IPv6
> > > 2024-12-18 14:32:58,610: NET: Registered PF_PACKET protocol family
> > > 2024-12-18 14:32:58,610: 8021q: 802.1Q VLAN Support v1.8
> > > 2024-12-18 14:32:58,611: 9pnet: Installing 9P2000 support
> > > 2024-12-18 14:32:58,613: NET: Registered PF_VSOCK protocol family
> > > 2024-12-18 14:32:58,614: IPI shorthand broadcast: enabled
> > > 2024-12-18 14:32:58,619: sched_clock: Marking stable (551147059, -6778955)->(590359530, -45991426)
> > > 2024-12-18 14:32:59,507: tsc: Refined TSC clocksource calibration: 2495.952 MHz
> > > 2024-12-18 14:32:59,508: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa49fc138, max_idle_ns: 440795295059 ns
> > > 2024-12-18 14:32:59,509: clocksource: Switched to clocksource tsc
> > > 2024-12-18 14:33:28,667: xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
> > > 
> > > Have you seen this problem before?
> > 
> > That seems like event channel interrupts aren't being routed to the
> > legacy i8259 PIC. I've certainly seen that kind of thing before,
> > especially when asserted level-triggered interrupts weren't correctly
> > being asserted. But I don't expect that of QEMU. I'll see if I can
> > reproduce; thanks.
> > 
> > How often does it happen?
> 
> With the new functional test, it happens maybe 2 times out of 100 test runs.
> 
> I wasn't able to reproduce it with the avocado version yet, but that also 
> runs 10x slower, so it takes a longer time to get to that many runs...

I can reproduce it in probably about one in ten attempts.

It seems like it's because of the way QEMU handles shared level-
triggered interrupts.

We kind of work around it with PCI INTx demultiplexing, but the Xen
guest explicitly asks for the interrupt to be delivered to IRQ10. So it
asserts IRQ10, then I suspect something on the PCI side deasserts it, and
the Xen interrupt is lost.

The guest configures the Xen event channel IRQ to be delivered on
IRQ10, but IRQ10 is *also* the PCI INTx of some device. So if the PCI
device *clears* IRQ10 at the wrong moment, we miss a Xen interrupt...
which means we end up waiting forever.
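
The failure mode described above can be reduced to a toy model. This is a sketch only, assuming the line is stored as a single level where the last writer wins; `SharedLine` and `set_irq` are hypothetical names, not QEMU APIs.

```python
class SharedLine:
    """A GSI modelled as one stored level: the last writer wins."""

    def __init__(self):
        self.level = 0

    def set_irq(self, level: int) -> None:
        # No record of who asserted the line, so any caller can clear it.
        self.level = level


irq10 = SharedLine()
xen_pending = True   # the Xen event channel still has work for the guest

irq10.set_irq(1)     # Xen callback asserts the shared line
irq10.set_irq(0)     # an unrelated device sharing the line deasserts it

# The line is low although Xen still wants it asserted: the interrupt is lost.
print(xen_pending, irq10.level)
```

Because the line keeps no per-source state, the second caller silently erases the first caller's assertion.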

We *really* ought to do this with a callback when the interrupt is
acked in the PIC/IOAPIC, and any interrupt source which wants it to be
still asserted can reassert it from that callback, which is precisely
how the vfio eventfd 'resampler' works. Or *should* work, if QEMU was
actually capable of working that way.
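
A minimal sketch of that ack-callback idea, as a toy model rather than anything QEMU or VFIO actually implements. `ResampledLine`, `add_source` and `ack` are hypothetical names; each source registers a callable that reports whether it still wants the line asserted.

```python
class ResampledLine:
    """Toy interrupt line that re-queries its sources when it is acked."""

    def __init__(self):
        self.level = 0
        self.sources = []  # callables returning True while still asserting

    def add_source(self, still_asserted):
        self.sources.append(still_asserted)

    def set_irq(self, level):
        self.level = level  # same last-writer-wins behaviour as before

    def ack(self):
        # On ack, drop the line, then let any source reassert it.
        self.level = 0
        if any(still_asserted() for still_asserted in self.sources):
            self.level = 1


state = {"xen": False, "pci": False}
line = ResampledLine()
line.add_source(lambda: state["xen"])
line.add_source(lambda: state["pci"])

state["pci"] = True; line.set_irq(1)   # PCI device asserts
state["xen"] = True; line.set_irq(1)   # Xen asserts the same line
state["pci"] = False; line.set_irq(0)  # PCI device clears it: Xen's level is gone
level_before_ack = line.level

line.ack()                             # resample: Xen is still pending
level_after_ack = line.level

state["xen"] = False
line.ack()                             # nothing pending any more
print(level_before_ack, level_after_ack, line.level)
```

The stray deassert still happens, but the ack gives every source a chance to put its level back, so no assertion stays lost.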

I'll see if I can come up with a workaround... or whether I fall into
the rabbit hole of fixing the overall level interrupt stuff.

David Woodhouse Dec. 18, 2024, 10:11 p.m. UTC | #11

On Wed, 2024-12-18 at 22:42 +0100, David Woodhouse wrote:
> 
> It seems like it's because of the way QEMU handles shared level-
> triggered interrupts.

Yeah, this hack seems to confirm it. As I said, PCI INTx manages to
demux correctly, but any time you have non-PCI interrupt sharing, it's
hosed because they all just set/clear the GSI as if they own it, and
there's no OR gate in sight.

Now I have to decide if this is going to provoke me into attempting to
fix it for the general case with callbacks and fixing VFIO resampling
too, or whether I paper over it for QEMU with something *slightly* less
icky than this (which ideally would not lose levels from PCI devices
either)...

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 07bd0c9ab8..4c2e8876e5 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -301,7 +301,20 @@ static void gsi_assert_bh(void *opaque)
         xen_evtchn_set_callback_level(!!vi->evtchn_upcall_pending);
     }
 }
-
+int xen_evtchn_check_gsi(int n, int level)
+{
+    struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+    XenEvtchnState *s = xen_evtchn_singleton;
+    if (!s || n != s->callback_gsi || !vi) {
+        return level;
+    }
+    if (vi->evtchn_upcall_pending && !level) {
+        printf("Refusing to deassert GSI#%d which is asserted by Xen\n",
+               n);
+        return 1;
+    }
+    return level;
+}
 void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis)
 {
     XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b740acfc0d..c1f56869b3 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -31,6 +31,7 @@ struct kvm_irq_routing_entry;
 int xen_evtchn_translate_pirq_msi(struct kvm_irq_routing_entry *route,
                                   uint64_t address, uint32_t data);
 bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data);
+int xen_evtchn_check_gsi(int n, int level);
 
 
 /*
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index dc031af662..4185f467ee 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -452,6 +452,7 @@ void gsi_handler(void *opaque, int n, int level)
     GSIState *s = opaque;
 
     trace_x86_gsi_interrupt(n, level);
+    level = xen_evtchn_check_gsi(n, level);
     switch (n) {
     case 0 ... ISA_NUM_IRQS - 1:
         if (s->i8259_irq[n]) {

David Woodhouse Dec. 18, 2024, 10:14 p.m. UTC | #12

On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
> On 18/12/2024 12.48, David Woodhouse wrote:
> > On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
> > > Use the serial console to execute the commands in the guest instead
> > > of using ssh since we don't have ssh support in the functional
> > > framework yet.
> > > 
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > 
> > Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.
> 
> I now noticed some issue with the serial console in this test, too.
> Looks like the "Starting dropbear sshd: OK" is not print in an atomic way by 
> the guest, sometimes there are other kernel messages between the ":" and the 
> "OK". It works reliable when removing the "OK" from the string.

Nah, that still isn't atomic; you just got lucky because the race
window is smaller. It's not like serial ports are at a premium; can't
you have a separate port for kernel vs. userspace messages?
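
A toy illustration of why dropping the "OK" only narrows the window. It assumes, purely for the sake of the example, that the userspace line reaches the console in two writes and that a kernel message can land between them; `console_stream` is a hypothetical helper, not part of the test framework.

```python
def console_stream(userspace_chunks, kernel_line):
    """Interleave one kernel message between two userspace writes."""
    first, second = userspace_chunks
    return first + "\n" + kernel_line + "\n" + second


kernel_line = "clocksource: Switched to clocksource tsc"

# The status line is split just before "OK".
split_before_ok = console_stream(("Starting dropbear sshd: ", "OK\n"), kernel_line)
# The same line, but the split falls inside the shorter pattern.
split_earlier = console_stream(("Starting drop", "bear sshd: OK\n"), kernel_line)

long_pattern = "Starting dropbear sshd: OK"
short_pattern = "Starting dropbear sshd:"

print(long_pattern in split_before_ok)   # False: the kernel line sits before "OK"
print(short_pattern in split_before_ok)  # True: the shorter pattern survives this split
print(short_pattern in split_earlier)    # False: it can be broken as well
```

Keeping kernel and userspace output on separate channels, or silencing one of them, removes the interleaving instead of relying on where the split happens to fall.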

Thomas Huth Dec. 19, 2024, 8:35 a.m. UTC | #13

On 18/12/2024 23.14, David Woodhouse wrote:
> On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
>> On 18/12/2024 12.48, David Woodhouse wrote:
>>> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>>>> Use the serial console to execute the commands in the guest instead
>>>> of using ssh since we don't have ssh support in the functional
>>>> framework yet.
>>>>
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>
>>> Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.
>>
>> I now noticed some issue with the serial console in this test, too.
>> Looks like the "Starting dropbear sshd: OK" is not print in an atomic way by
>> the guest, sometimes there are other kernel messages between the ":" and the
>> "OK". It works reliable when removing the "OK" from the string.
> 
> Nah, that still isn't atomic; you just got lucky because the race
> window is smaller. It's not like serial ports are at a premium; can't
> you have a separate port for kernel vs. userspace messages?

Maybe easiest solution: Simply add "quiet" to the kernel command line, then 
it does not write the kernel messages to the serial console anymore.

  Thomas

David Woodhouse Dec. 19, 2024, 8:49 a.m. UTC | #14

On 19 December 2024 09:35:13 CET, Thomas Huth <thuth@redhat.com> wrote:
>On 18/12/2024 23.14, David Woodhouse wrote:
>> On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
>>> On 18/12/2024 12.48, David Woodhouse wrote:
>>>> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>>>>> Use the serial console to execute the commands in the guest instead
>>>>> of using ssh since we don't have ssh support in the functional
>>>>> framework yet.
>>>>> 
>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> 
>>>> Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.
>>> 
>>> I now noticed some issue with the serial console in this test, too.
>>> Looks like the "Starting dropbear sshd: OK" is not print in an atomic way by
>>> the guest, sometimes there are other kernel messages between the ":" and the
>>> "OK". It works reliable when removing the "OK" from the string.
>> 
>> Nah, that still isn't atomic; you just got lucky because the race
>> window is smaller. It's not like serial ports are at a premium; can't
>> you have a separate port for kernel vs. userspace messages?
>
>Maybe easiest solution: Simply add "quiet" to the kernel command line, then it does not write the kernel messages to the serial console anymore.

Want to resend the bug report about that test failing again? But without the kernel messages this time... :)

Thomas Huth Dec. 19, 2024, 12:24 p.m. UTC | #15

On 19/12/2024 09.49, David Woodhouse wrote:
> On 19 December 2024 09:35:13 CET, Thomas Huth <thuth@redhat.com> wrote:
>> On 18/12/2024 23.14, David Woodhouse wrote:
>>> On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
>>>> On 18/12/2024 12.48, David Woodhouse wrote:
>>>>> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>>>>>> Use the serial console to execute the commands in the guest instead
>>>>>> of using ssh since we don't have ssh support in the functional
>>>>>> framework yet.
>>>>>>
>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>
>>>>> Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.
>>>>
>>>> I now noticed some issue with the serial console in this test, too.
>>>> Looks like the "Starting dropbear sshd: OK" is not print in an atomic way by
>>>> the guest, sometimes there are other kernel messages between the ":" and the
>>>> "OK". It works reliable when removing the "OK" from the string.
>>>
>>> Nah, that still isn't atomic; you just got lucky because the race
>>> window is smaller. It's not like serial ports are at a premium; can't
>>> you have a separate port for kernel vs. userspace messages?
>>
>> Maybe easiest solution: Simply add "quiet" to the kernel command line, then it does not write the kernel messages to the serial console anymore.
> 
> Want to resend the bug report about that test failing again? But without the kernel messages this time... :)

With "quiet", the output just looks like this when it hangs:

  Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
  Spectre V2 : Kernel not compiled with retpoline; no mitigation available!
  kvm_intel: VMX not supported by CPU 0
  Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
  fail to initialize ptp_kvm

Anyway, to properly track this, I've now created a ticket with the full log:

  https://gitlab.com/qemu-project/qemu/-/issues/2731

  Thomas

David Woodhouse Dec. 19, 2024, 12:56 p.m. UTC | #16

On Thu, 2024-12-19 at 13:24 +0100, Thomas Huth wrote:
> On 19/12/2024 09.49, David Woodhouse wrote:
> > On 19 December 2024 09:35:13 CET, Thomas Huth <thuth@redhat.com> wrote:
> > > On 18/12/2024 23.14, David Woodhouse wrote:
> > > > On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
> > > > > On 18/12/2024 12.48, David Woodhouse wrote:
> > > > > > On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
> > > > > > > Use the serial console to execute the commands in the guest instead
> > > > > > > of using ssh since we don't have ssh support in the functional
> > > > > > > framework yet.
> > > > > > > 
> > > > > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > > > 
> > > > > > Hm, but serial is lossy and experience shows that it leads to flaky tests if the guest (or host) misses bytes. While SSH would just go slower.
> > > > > 
> > > > > I now noticed some issue with the serial console in this test, too.
> > > > > Looks like the "Starting dropbear sshd: OK" is not print in an atomic way by
> > > > > the guest, sometimes there are other kernel messages between the ":" and the
> > > > > "OK". It works reliable when removing the "OK" from the string.
> > > > 
> > > > Nah, that still isn't atomic; you just got lucky because the race
> > > > window is smaller. It's not like serial ports are at a premium; can't
> > > > you have a separate port for kernel vs. userspace messages?
> > > 
> > > Maybe easiest solution: Simply add "quiet" to the kernel command line, then it does not write the kernel messages to the serial console anymore.
> > 
> > Want to resend the bug report about that test failing again? But without the kernel messages this time... :)
> 
> With "quiet", the output just looks like this when it hangs:
> 
>   Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
>   Spectre V2 : Kernel not compiled with retpoline; no mitigation available!
>   kvm_intel: VMX not supported by CPU 0
>   Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
>   fail to initialize ptp_kvm

Yeah, that request was rhetorical. That output is useless for
understanding anything about what happened.

> Anyway, to properly track this, I've now created a ticket with the full log:
> 
>   https://gitlab.com/qemu-project/qemu/-/issues/2731

The patch below should fix it. I don't like it very much; it's very
much papering over a much bigger generic problem with QEMU's handling
of shared interrupts.

Basically, *nothing* should just directly set the system GSIs to
"their" desired level with qemu_set_irq(). Each device should feed into
a multiplexer which is essentially an OR gate, and the *output* of that
mux goes into the actual GSI.
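
As a rough sketch of such a multiplexer (a toy model with hypothetical names, not the patch and not a QEMU API): each source drives its own named input, and only the OR of all inputs is forwarded to the GSI.

```python
class OrGate:
    """Combine several level inputs into one output level."""

    def __init__(self, output):
        self.output = output  # callable that receives the combined level
        self.inputs = {}

    def set_input(self, name, level):
        # Each source only ever changes its own input.
        self.inputs[name] = bool(level)
        self.output(int(any(self.inputs.values())))


gsi_levels = []                  # levels that reach the real GSI, in order
mux = OrGate(gsi_levels.append)

mux.set_input("xen", 1)          # Xen callback asserts
mux.set_input("pci", 1)          # PCI device asserts too
mux.set_input("pci", 0)          # PCI device clears: output stays high for Xen
mux.set_input("xen", 0)          # last source clears: output drops

print(gsi_levels)                # [1, 1, 1, 0]
```

The GSI only goes low once every source has released it, which is the property the direct `qemu_set_irq()` calls lack.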

Alternatively, when the interrupt line is acked at the interrupt
controller, we should do a callback to query whether that line should
still be asserted (which is exactly how VFIO's resampler should work,
but we can't use it correctly from QEMU and currently just invoke it
for *every* MMIO access to a VFIO device!).

This just implements the logical OR for the Xen event channel callback
GSI, since the Xen code *already* has a hook in the x86 gsi_handler()
function to capture external interrupts being routed to event channel
PIRQs. So we just shuffle that a little and let it 'adjust' the level
that's being set.
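
As a rough standalone model of what the patch does (not the patch itself): the struct and the two functions below are hypothetical stand-ins for XenEvtchnState, gsi_handler() and xen_evtchn_set_callback_level(), and `upcall_pending` is a simplification of vcpu_info->evtchn_upcall_pending, which in reality is owned by the guest and the kernel rather than set here.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct CallbackGsiModel {
    bool setting_callback_gsi;  /* true while the evtchn code drives the pin */
    int extern_gsi_level;       /* last level set by anyone else (e.g. INTx) */
    bool upcall_pending;        /* simplified evtchn_upcall_pending */
    int pin_level;              /* what finally reaches the interrupt pin */
} CallbackGsiModel;

/* Models the gsi_handler() path for the callback GSI, including the hook. */
static void model_gsi_handler(CallbackGsiModel *s, int level)
{
    assert(level == 0 || level == 1);

    if (!s->setting_callback_gsi) {
        /* External caller: remember its level ... */
        s->extern_gsi_level = level;
        /* ... and keep the pin high if the event channel still needs it. */
        if (!level && s->upcall_pending) {
            level = 1;
        }
    }
    s->pin_level = level;
}

/* Models the event channel side asserting or deasserting the callback. */
static void model_set_callback_level(CallbackGsiModel *s, int level)
{
    s->upcall_pending = (level != 0);
    s->setting_callback_gsi = true;
    model_gsi_handler(s, level || s->extern_gsi_level);
    s->setting_callback_gsi = false;
}
```

Either side deasserting on its own leaves the pin high for as long as the other side still wants it, which is the logical OR described above.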


diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 07bd0c9ab8..e3bd48d954 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -140,6 +140,8 @@ struct XenEvtchnState {
 
     uint64_t callback_param;
     bool evtchn_in_kernel;
+    bool setting_callback_gsi;
+    int extern_gsi_level;
     uint32_t callback_gsi;
 
     QEMUBH *gsi_bh;
@@ -431,7 +433,9 @@ void xen_evtchn_set_callback_level(int level)
     }
 
     if (s->callback_gsi && s->callback_gsi < s->nr_callback_gsis) {
-        qemu_set_irq(s->callback_gsis[s->callback_gsi], level);
+        s->setting_callback_gsi = true;
+        qemu_set_irq(s->callback_gsis[s->callback_gsi], level || s->extern_gsi_level);
+        s->setting_callback_gsi = false;
         if (level) {
             /* Ensure the vCPU polls for deassertion */
             kvm_xen_set_callback_asserted();
@@ -1596,7 +1600,7 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
     return pirq;
 }
 
-bool xen_evtchn_set_gsi(int gsi, int level)
+bool xen_evtchn_set_gsi(int gsi, int *level)
 {
     XenEvtchnState *s = xen_evtchn_singleton;
     int pirq;
@@ -1608,16 +1612,29 @@ bool xen_evtchn_set_gsi(int gsi, int level)
     }
 
     /*
-     * Check that that it *isn't* the event channel GSI, and thus
-     * that we are not recursing and it's safe to take s->port_lock.
+     * For the callback_gsi we need to implement a logical OR of the event
+     * channel GSI and the external input (e.g. from PCI INTx), because
+     * QEMU itself doesn't support shared level interrupts via demux or
+     * resamplers.
      *
-     * Locking aside, it's perfectly sane to bail out early for that
-     * special case, as it would make no sense for the event channel
-     * GSI to be routed back to event channels, when the delivery
-     * method is to raise the GSI... that recursion wouldn't *just*
-     * be a locking issue.
+     * Remember the level which is being set by *other* callers.
+     *
+     * The event channel GSI cannot be routed to PIRQ, as that would make
+     * no sense. It would also be a locking problem so bail out early and
+     * don't potentially deadlock on s->port_lock.
      */
     if (gsi && gsi == s->callback_gsi) {
+        /* Remember the external state of the GSI pin (e.g. from PCI INTx) */
+        if (!s->setting_callback_gsi) {
+            s->extern_gsi_level = *level;
+            if (!s->extern_gsi_level) {
+                struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+                if (vi && vi->evtchn_upcall_pending) {
+                    *level = 1;
+                }
+            }
+        }
+
         return false;
     }
 
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b740acfc0d..0521ebc092 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -23,7 +23,7 @@ void xen_evtchn_set_callback_level(int level);
 
 int xen_evtchn_set_port(uint16_t port);
 
-bool xen_evtchn_set_gsi(int gsi, int level);
+bool xen_evtchn_set_gsi(int gsi, int *level);
 void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
                           uint64_t addr, uint32_t data, bool is_masked);
 void xen_evtchn_remove_pci_device(PCIDevice *dev);
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index dc031af662..77acd331ee 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -450,28 +450,36 @@ static long get_file_size(FILE *f)
 void gsi_handler(void *opaque, int n, int level)
 {
     GSIState *s = opaque;
+    bool bypass_ioapic = false;
 
     trace_x86_gsi_interrupt(n, level);
-    switch (n) {
-    case 0 ... ISA_NUM_IRQS - 1:
-        if (s->i8259_irq[n]) {
-            /* Under KVM, Kernel will forward to both PIC and IOAPIC */
-            qemu_set_irq(s->i8259_irq[n], level);
-        }
-        /* fall through */
-    case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
+
 #ifdef CONFIG_XEN_EMU
         /*
          * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
          * routing actually works properly under Xen). And then to
          * *either* the PIRQ handling or the I/OAPIC depending on
          * whether the former wants it.
+         *
+         * Additionally, this hook allows the Xen event channel GSI to
+         * work around QEMU's lack of support for shared level interrupts,
+         * by keeping track of the externally driven state of the pin and
+         * implementing a logical OR with the state of the evtchn GSI.
          */
-        if (xen_mode == XEN_EMULATE && xen_evtchn_set_gsi(n, level)) {
-            break;
-        }
+    if (xen_mode == XEN_EMULATE)
+        bypass_ioapic = xen_evtchn_set_gsi(n, &level);
 #endif
-        qemu_set_irq(s->ioapic_irq[n], level);
+
+    switch (n) {
+    case 0 ... ISA_NUM_IRQS - 1:
+        if (s->i8259_irq[n]) {
+            /* Under KVM, Kernel will forward to both PIC and IOAPIC */
+            qemu_set_irq(s->i8259_irq[n], level);
+        }
+        /* fall through */
+    case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
+        if (!bypass_ioapic)
+            qemu_set_irq(s->ioapic_irq[n], level);
         break;
     case IO_APIC_SECONDARY_IRQBASE
         ... IO_APIC_SECONDARY_IRQBASE + IOAPIC_NUM_PINS - 1:

Richard Henderson Dec. 19, 2024, 6:05 p.m. UTC | #17

On 12/19/24 04:56, David Woodhouse wrote:
> On Thu, 2024-12-19 at 13:24 +0100, Thomas Huth wrote:
>> On 19/12/2024 09.49, David Woodhouse wrote:
>>> On 19 December 2024 09:35:13 CET, Thomas Huth <thuth@redhat.com> wrote:
>>>> On 18/12/2024 23.14, David Woodhouse wrote:
>>>>> On Wed, 2024-12-18 at 16:54 +0100, Thomas Huth wrote:
>>>>>> On 18/12/2024 12.48, David Woodhouse wrote:
>>>>>>> On 18 December 2024 12:32:49 CET, Thomas Huth <thuth@redhat.com> wrote:
>>>>>>>> Use the serial console to execute the commands in the guest instead
>>>>>>>> of using ssh since we don't have ssh support in the functional
>>>>>>>> framework yet.
>>>>>>>>
>>>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>>>
>>>>>>> Hm, but serial is lossy, and experience shows that it leads to flaky tests if the guest (or host) misses bytes, while SSH would just go slower.
>>>>>>
>>>>>> I now noticed some issue with the serial console in this test, too.
>>>>>> Looks like the "Starting dropbear sshd: OK" is not printed in an atomic way by
>>>>>> the guest; sometimes there are other kernel messages between the ":" and the
>>>>>> "OK". It works reliably when removing the "OK" from the string.
>>>>>
>>>>> Nah, that still isn't atomic; you just got lucky because the race
>>>>> window is smaller. It's not like serial ports are at a premium; can't
>>>>> you have a separate port for kernel vs. userspace messages?
>>>>
>>>> Maybe easiest solution: Simply add "quiet" to the kernel command line, then it does not write the kernel messages to the serial console anymore.
>>>
>>> Want to resend the bug report about that test failing again? But without the kernel messages this time... :)
>>
>> With "quiet", the output just looks like this when it hangs:
>>
>>    Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
>>    Spectre V2 : Kernel not compiled with retpoline; no mitigation available!
>>    kvm_intel: VMX not supported by CPU 0
>>    Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
>>    fail to initialize ptp_kvm
> 
> Yeah, that request was rhetorical. That output is useless for
> understanding anything about what happened.
> 
>> Anyway, to properly track this, I've now created a ticket with the full log:
>>
>>    https://gitlab.com/qemu-project/qemu/-/issues/2731
> 
> The patch below should fix it. I don't like it very much; it's very
> much papering over a much bigger generic problem with QEMU's handling
> of shared interrupts.
> 
> Basically, *nothing* should just directly set the system GSIs to
> "their" desired level with qemu_set_irq(). Each device should feed into
> a multiplexer which is essentially an OR gate, and the *output* of that
> mux goes into the actual GSI.

We have such a device: include/hw/or-irq.h.
How simple it is to wire that into this machine model is left unexplored.


r~

David Woodhouse Dec. 20, 2024, 12:22 p.m. UTC | #18

On Thu, 2024-12-19 at 10:05 -0800, Richard Henderson wrote:
> On 12/19/24 04:56, David Woodhouse wrote:
> > On Thu, 2024-12-19 at 13:24 +0100, Thomas Huth wrote:
> > 
> > > Anyway, to properly track this, I've now created a ticket with the full log:
> > > 
> > >    https://gitlab.com/qemu-project/qemu/-/issues/2731
> > 
> > The patch below should fix it. I don't like it very much; it's very
> > much papering over a much bigger generic problem with QEMU's handling
> > of shared interrupts.
> > 
> > Basically, *nothing* should just directly set the system GSIs to
> > "their" desired level with qemu_set_irq(). Each device should feed into
> > a multiplexer which is essentially an OR gate, and the *output* of that
> > mux goes into the actual GSI.
> 
> We have such a device: include/hw/or-irq.h.
> How simple it is to wire that into this machine model is left
> unexplored.

It's not trivial; I think every source feeding interrupts into
x86ms->gsi[] would need to have its own input into one of these OrIRQ
devices. At which point parts of the PCI bus INTx routing start to look
a bit pointless since it's mostly just implementing that function (and
the SCI IRQ for ICH9 is also manually OR'd in).

I wouldn't be entirely averse to embarking on that journey, but it
*still* wouldn't fix VFIO. For that we really do want the other mode,
of invoking callbacks to reassert the IRQ if the device still wants
attention. That would actually be a lot nicer for the Xen GSI case too,
as we'd just resample when the IRQ is acked at the APIC, rather than
constantly having to check whether we should deassert it.

And doing *that* one is even more yak shaving. And would also lead to
fixing the whole MSI routing mess to get cached translations right...

For now, given that the Xen code already *had* a hook in gsi_handler, I
think I can live with the patch I posted.
