diff mbox

[v2,6/6] KVM: arm/arm64: vgic: Allow configuration of interrupt groups

Message ID 1530697100-22419-7-git-send-email-christoffer.dall@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Christoffer Dall July 4, 2018, 9:38 a.m. UTC
Implement the required MMIO accessors for GICv2 and GICv3 for the
IGROUPR distributor and redistributor registers.

This can allow guests to change behavior compared to running on previous
versions of KVM, but only to align with the architecture and hardware
implementations.

This also allows userspace to configure the groups for interrupts.  Note
that this potentially results in GICv2 guests not receiving interrupts
after migration if migrating from an older kernel that exposes GICv2
interrupts as group 1.

Cc: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
I implemented (but stashed) a version of this which predicated the
behavior based on the value of GICD_IIDR revision field, falling back to
ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
revision less than 2.  However, current QEMU implementations simply
don't write the GICD_IIDR, so this doesn't help at all without changing
QEMU anyhow.

The only actual fix I can see here to work around the problem in the
kernel is to require an opt-in to allow restoring groups from userspace,
but that's a lot of logic to support cross-kernel version migration.

Question: Do we expect that cross-kernel version migration is a critical
feature that people really expect to work, and do we actually have
examples of catering to this in the kernel elsewhere?  (Also, how would
then that relate to the whole 'adding a new sysreg breaks migration'
situation?)

 virt/kvm/arm/vgic/vgic-init.c    |  2 +-
 virt/kvm/arm/vgic/vgic-mmio-v2.c |  4 +++-
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 11 +++++++++--
 virt/kvm/arm/vgic/vgic-mmio.c    | 38 ++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-mmio.h    |  6 ++++++
 5 files changed, 57 insertions(+), 4 deletions(-)

--
2.7.4

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Comments

Marc Zyngier July 9, 2018, 8:42 a.m. UTC | #1
On 04/07/18 10:38, Christoffer Dall wrote:
> Implement the required MMIO accessors for GICv2 and GICv3 for the
> IGROUPR distributor and redistributor registers.
> 
> This can allow guests to change behavior compared to running on previous
> versions of KVM, but only to align with the architecture and hardware
> implementations.
> 
> This also allows userspace to configure the groups for interrupts.  Note
> that this potentially results in GICv2 guests not receiving interrupts
> after migration if migrating from an older kernel that exposes GICv2
> interrupts as group 1.
> 
> Cc: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
> I implemented (but stashed) a version of this which predicated the
> behavior based on the value of GICD_IIDR revision field, falling back to
> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> revision less than 2.  However, current QEMU implementations simply
> don't write the GICD_IIDR, so this doesn't help at all without changing
> QEMU anyhow.
> 
> The only actual fix I can see here to work around the problem in the
> kernel is to require an opt-in to allow restoring groups from userspace,
> but that's a lot of logic to support cross-kernel version migration.
> 
> Question: Do we expect that cross-kernel version migration is a critical
> feature that people really expect to work, and do we actually have
> examples of catering to this in the kernel elsewhere?  (Also, how would
> then that relate to the whole 'adding a new sysreg breaks migration'
> situation?)

I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
is definitely trying to restore RO sysregs (and that's how we detect
incompatibilities).

I think we should at least give userspace a chance to do the right
thing. If it doesn't, well, too bad.

How bad is that "writable GICD_IIDR" patch? Because at this stage, and
in the absence of any comment, I'm close to just pick that series and
merge it for 4.19.

Thanks,

	M.
Peter Maydell July 9, 2018, 8:52 a.m. UTC | #2
On 9 July 2018 at 09:42, Marc Zyngier <marc.zyngier@arm.com> wrote:
> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> is definitely trying to restore RO sysregs (and that's how we detect
> incompatibilities).

Accident of design, mostly. From QEMU's point of view, GICD_IIDR
is part of the GIC device, which we save and restore as a separate
thing from the CPU. The GIC device was written in what for QEMU
is a more 'traditional' style, where QEMU assumes it knows all the
registers that might have state and saves and restores them all
(and doesn't bother to do anything with constant registers).
The CPU sysregs are done in a completely different style[*], where
we let the kernel be the source of truth for what sysregs exist;
as a side effect of that we end up trying to save and restore
constant sysregs, since QEMU doesn't know they're constant.

[*] there's an argument that in retrospect this was a mistake;
still, it is what we have and trying to upend it now would be
a huge pain.

thanks
-- PMM
Christoffer Dall July 9, 2018, 10:48 p.m. UTC | #3
On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
> On 04/07/18 10:38, Christoffer Dall wrote:
> > Implement the required MMIO accessors for GICv2 and GICv3 for the
> > IGROUPR distributor and redistributor registers.
> > 
> > This can allow guests to change behavior compared to running on previous
> > versions of KVM, but only to align with the architecture and hardware
> > implementations.
> > 
> > This also allows userspace to configure the groups for interrupts.  Note
> > that this potentially results in GICv2 guests not receiving interrupts
> > after migration if migrating from an older kernel that exposes GICv2
> > interrupts as group 1.
> > 
> > Cc: Andrew Jones <drjones@redhat.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > ---
> > I implemented (but stashed) a version of this which predicated the
> > behavior based on the value of GICD_IIDR revision field, falling back to
> > ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> > revision less than 2.  However, current QEMU implementations simply
> > don't write the GICD_IIDR, so this doesn't help at all without changing
> > QEMU anyhow.
> > 
> > The only actual fix I can see here to work around the problem in the
> > kernel is to require an opt-in to allow restoring groups from userspace,
> > but that's a lot of logic to support cross-kernel version migration.
> > 
> > Question: Do we expect that cross-kernel version migration is a critical
> > feature that people really expect to work, and do we actually have
> > examples of catering to this in the kernel elsewhere?  (Also, how would
> > then that relate to the whole 'adding a new sysreg breaks migration'
> > situation?)
> 
> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> is definitely trying to restore RO sysregs (and that's how we detect
> incompatibilities).
> 
> I think we should at least give userspace a chance to do the right
> thing. If it doesn't, well, too bad.

This series should give userspace an option to adjust its behavior.

My main concern is that this version of the series results in the worst
kind of migration failures, where the guest simply doesn't run on the
destination side with no warnings or error messages returned to the
user..

We could add logic to return an error code if trying to write a
different revision than what the kernel has (similar to the invariant
sysregs), so that a simple fix to QEMU to save restore the GICD_IIDR
register at least results in an error being returned to userspace.

However, as QEMU doesn't do anything useful here today (not blaming
anyone, I wrote the apparently broken GIC save/restore code for QEMU
myself), we could also argue that QEMU might as well just fix things up
if it detects a different IIDR.

> 
> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
> in the absence of any comment, I'm close to just pick that series and
> merge it for 4.19.
> 

My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
but requires additional logic in the kernel that returns an error if the
GICD_IIDR don't match on write.

A patch which changes the groups and bumps the IIDR in userspace is
probably more complex.

Sounds like I should add the GICD_IIDR checking patch.  Thoughts?

What I would really like to know is whether this is really an issue or
not.  Do people who run products based on KVM, such as RHEV, have any
expectations about cross-kernel version migration?


Thanks,
-Christoffer
Marc Zyngier July 10, 2018, 8:33 a.m. UTC | #4
On 09/07/18 23:48, Christoffer Dall wrote:
> On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
>> On 04/07/18 10:38, Christoffer Dall wrote:
>>> Implement the required MMIO accessors for GICv2 and GICv3 for the
>>> IGROUPR distributor and redistributor registers.
>>>
>>> This can allow guests to change behavior compared to running on previous
>>> versions of KVM, but only to align with the architecture and hardware
>>> implementations.
>>>
>>> This also allows userspace to configure the groups for interrupts.  Note
>>> that this potentially results in GICv2 guests not receiving interrupts
>>> after migration if migrating from an older kernel that exposes GICv2
>>> interrupts as group 1.
>>>
>>> Cc: Andrew Jones <drjones@redhat.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>>> ---
>>> I implemented (but stashed) a version of this which predicated the
>>> behavior based on the value of GICD_IIDR revision field, falling back to
>>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
>>> revision less than 2.  However, current QEMU implementations simply
>>> don't write the GICD_IIDR, so this doesn't help at all without changing
>>> QEMU anyhow.
>>>
>>> The only actual fix I can see here to work around the problem in the
>>> kernel is to require an opt-in to allow restoring groups from userspace,
>>> but that's a lot of logic to support cross-kernel version migration.
>>>
>>> Question: Do we expect that cross-kernel version migration is a critical
>>> feature that people really expect to work, and do we actually have
>>> examples of catering to this in the kernel elsewhere?  (Also, how would
>>> then that relate to the whole 'adding a new sysreg breaks migration'
>>> situation?)
>>
>> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
>> is definitely trying to restore RO sysregs (and that's how we detect
>> incompatibilities).
>>
>> I think we should at least give userspace a chance to do the right
>> thing. If it doesn't, well, too bad.
> 
> This series should give userspace an option to adjust its behavior.
> 
> My main concern is that this version of the series results in the worst
> kind of migration failures, where the guest simply doesn't run on the
> destination side with no warnings or error messages returned to the
> user..

I agree, which is why I'm suggesting we make IIDR writeable. And yes, it
still requires userspace to be fixed to actually write IIDR.

> 
> We could add logic to return an error code if trying to write a
> different revision than what the kernel has (similar to the invariant
> sysregs), so that a simple fix to QEMU to save restore the GICD_IIDR
> register at least results in an error being returned to userspace.
> 
> However, as QEMU doesn't do anything useful here today (not blaming
> anyone, I wrote the apparently broken GIC save/restore code for QEMU
> myself), we could also argue that QEMU might as well just fix things up
> if it detects a different IIDR.
> 
>>
>> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
>> in the absence of any comment, I'm close to just pick that series and
>> merge it for 4.19.
>>
> 
> My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
> but requires additional logic in the kernel that returns an error if the
> GICD_IIDR don't match on write.
> 
> A patch which changes the groups and bumps the IIDR in userspace is
> probably more complex.
> 
> Sounds like I should add the GICD_IIDR checking patch.  Thoughts?

I'm quite keen on that. It makes the userspace change trivial, aligns it
somehow on the sysreg and ITS ABI behaviours.

> What I would really like to know is whether this is really an issue or
> not.  Do people who run products based on KVM, such as RHEV, have any
> expectations about cross-kernel version migration?

I'd like to know as well.

Thanks,

	M.
Eric Auger July 10, 2018, 9:12 a.m. UTC | #5
Hi Christoffer, Marc,

On 07/10/2018 10:33 AM, Marc Zyngier wrote:
> On 09/07/18 23:48, Christoffer Dall wrote:
>> On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
>>> On 04/07/18 10:38, Christoffer Dall wrote:
>>>> Implement the required MMIO accessors for GICv2 and GICv3 for the
>>>> IGROUPR distributor and redistributor registers.
>>>>
>>>> This can allow guests to change behavior compared to running on previous
>>>> versions of KVM, but only to align with the architecture and hardware
>>>> implementations.
>>>>
>>>> This also allows userspace to configure the groups for interrupts.  Note
>>>> that this potentially results in GICv2 guests not receiving interrupts
>>>> after migration if migrating from an older kernel that exposes GICv2
>>>> interrupts as group 1.
>>>>
>>>> Cc: Andrew Jones <drjones@redhat.com>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>>>> ---
>>>> I implemented (but stashed) a version of this which predicated the
>>>> behavior based on the value of GICD_IIDR revision field, falling back to
>>>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
>>>> revision less than 2.  However, current QEMU implementations simply
>>>> don't write the GICD_IIDR, so this doesn't help at all without changing
>>>> QEMU anyhow.
>>>>
>>>> The only actual fix I can see here to work around the problem in the
>>>> kernel is to require an opt-in to allow restoring groups from userspace,
>>>> but that's a lot of logic to support cross-kernel version migration.
>>>>
>>>> Question: Do we expect that cross-kernel version migration is a critical
>>>> feature that people really expect to work, and do we actually have
>>>> examples of catering to this in the kernel elsewhere?  (Also, how would
>>>> then that relate to the whole 'adding a new sysreg breaks migration'
>>>> situation?)
>>>
>>> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
>>> is definitely trying to restore RO sysregs (and that's how we detect
>>> incompatibilities).
>>>
>>> I think we should at least give userspace a chance to do the right
>>> thing. If it doesn't, well, too bad.
>>
>> This series should give userspace an option to adjust its behavior.
>>
>> My main concern is that this version of the series results in the worst
>> kind of migration failures, where the guest simply doesn't run on the
>> destination side with no warnings or error messages returned to the
>> user..
> 
> I agree, which is why I'm suggesting we make IIDR writeable. And yes, it
> still requires userspace to be fixed to actually write IIDR.
> 
>>
>> We could add logic to return an error code if trying to write a
>> different revision than what the kernel has (similar to the invariant
>> sysregs), so that a simple fix to QEMU to save restore the GICD_IIDR
>> register at least results in an error being returned to userspace.
>>
>> However, as QEMU doesn't do anything useful here today (not blaming
>> anyone, I wrote the apparently broken GIC save/restore code for QEMU
>> myself), we could also argue that QEMU might as well just fix things up
>> if it detects a different IIDR.
>>
>>>
>>> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
>>> in the absence of any comment, I'm close to just pick that series and
>>> merge it for 4.19.
>>>
>>
>> My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
>> but requires additional logic in the kernel that returns an error if the
>> GICD_IIDR don't match on write.
>>
>> A patch which changes the groups and bumps the IIDR in userspace is
>> probably more complex.
>>
>> Sounds like I should add the GICD_IIDR checking patch.  Thoughts?
> 
> I'm quite keen on that. It makes the userspace change trivial, aligns it
> somehow on the sysreg and ITS ABI behaviours.
> 
>> What I would really like to know is whether this is really an issue or
>> not.  Do people who run products based on KVM, such as RHEV, have any
>> expectations about cross-kernel version migration?
> 
> I'd like to know as well.

Sorry I did not have time to review the series properly. My
understanding is we generally care about migration between different
kernels. For the ITS, we made the IIDR writable too to manage the ABI
version properly. See ab01c6bdacc43c41c6b326889f4358f5afc38bf9. But
maybe I missed the crux of the matter.

Thanks

Eric
> 
> Thanks,
> 
> 	M.
>
Peter Maydell July 10, 2018, 9:17 a.m. UTC | #6
On 10 July 2018 at 10:12, Auger Eric <eric.auger@redhat.com> wrote:
> Hi Christoffer, Marc,
>
> On 07/10/2018 10:33 AM, Marc Zyngier wrote:
>> On 09/07/18 23:48, Christoffer Dall wrote:
>>> What I would really like to know is whether this is really an issue or
>>> not.  Do people who run products based on KVM, such as RHEV, have any
>>> expectations about cross-kernel version migration?
>>
>> I'd like to know as well.
>
> Sorry I did not have time to review the series properly. My
> understanding is we generally care about migration between different
> kernels. For the ITS, we made the IIDR writable too to manage the ABI
> version properly. See ab01c6bdacc43c41c6b326889f4358f5afc38bf9. But
> maybe I missed the crux of the matter.

I think the question is whether we're doing this because
we think hypothetical users care, or because we know actual
real in-production users really care...

thanks
-- PMM
Eric Auger July 10, 2018, 9:57 a.m. UTC | #7
Hi,

On 07/10/2018 12:48 AM, Christoffer Dall wrote:
> On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
>> On 04/07/18 10:38, Christoffer Dall wrote:
>>> Implement the required MMIO accessors for GICv2 and GICv3 for the
>>> IGROUPR distributor and redistributor registers.
>>>
>>> This can allow guests to change behavior compared to running on previous
>>> versions of KVM, but only to align with the architecture and hardware
>>> implementations.
>>>
>>> This also allows userspace to configure the groups for interrupts.  Note
>>> that this potentially results in GICv2 guests not receiving interrupts
>>> after migration if migrating from an older kernel that exposes GICv2
>>> interrupts as group 1.
>>>
>>> Cc: Andrew Jones <drjones@redhat.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>>> ---
>>> I implemented (but stashed) a version of this which predicated the
>>> behavior based on the value of GICD_IIDR revision field, falling back to
>>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
>>> revision less than 2.  However, current QEMU implementations simply
>>> don't write the GICD_IIDR, so this doesn't help at all without changing
>>> QEMU anyhow.
>>>
>>> The only actual fix I can see here to work around the problem in the
>>> kernel is to require an opt-in to allow restoring groups from userspace,
>>> but that's a lot of logic to support cross-kernel version migration.
>>>
>>> Question: Do we expect that cross-kernel version migration is a critical
>>> feature that people really expect to work, and do we actually have
>>> examples of catering to this in the kernel elsewhere?  (Also, how would
>>> then that relate to the whole 'adding a new sysreg breaks migration'
>>> situation?)
>>
>> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
>> is definitely trying to restore RO sysregs (and that's how we detect
>> incompatibilities).
>>
>> I think we should at least give userspace a chance to do the right
>> thing. If it doesn't, well, too bad.
> 
> This series should give userspace an option to adjust its behavior.
> 
> My main concern is that this version of the series results in the worst
> kind of migration failures, where the guest simply doesn't run on the
> destination side with no warnings or error messages returned to the
> user..

Adding Dave in the loop to comment about general user perspective.
> 
> We could add logic to return an error code if trying to write a
> different revision than what the kernel has (similar to the invariant
> sysregs), so that a simple fix to QEMU to save restore the GICD_IIDR
> register at least results in an error being returned to userspace.
> 
> However, as QEMU doesn't do anything useful here today (not blaming
> anyone, I wrote the apparently broken GIC save/restore code for QEMU
> myself), we could also argue that QEMU might as well just fix things up
> if it detects a different IIDR.
> 
>>
>> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
>> in the absence of any comment, I'm close to just pick that series and
>> merge it for 4.19.
>>
> 
> My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
> but requires additional logic in the kernel that returns an error if the
> GICD_IIDR don't match on write.
> 
> A patch which changes the groups and bumps the IIDR in userspace is
> probably more complex.
> 
> Sounds like I should add the GICD_IIDR checking patch.  Thoughts?
> 
> What I would really like to know is whether this is really an issue or
> not.  Do people who run products based on KVM, such as RHEV, have any
> expectations about cross-kernel version migration?

Thanks

Eric
> 
> 
> Thanks,
> -Christoffer
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Dr. David Alan Gilbert July 10, 2018, 10:32 a.m. UTC | #8
* Auger Eric (eric.auger@redhat.com) wrote:
> Hi,
> 
> On 07/10/2018 12:48 AM, Christoffer Dall wrote:
> > On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
> >> On 04/07/18 10:38, Christoffer Dall wrote:
> >>> Implement the required MMIO accessors for GICv2 and GICv3 for the
> >>> IGROUPR distributor and redistributor registers.
> >>>
> >>> This can allow guests to change behavior compared to running on previous
> >>> versions of KVM, but only to align with the architecture and hardware
> >>> implementations.
> >>>
> >>> This also allows userspace to configure the groups for interrupts.  Note
> >>> that this potentially results in GICv2 guests not receiving interrupts
> >>> after migration if migrating from an older kernel that exposes GICv2
> >>> interrupts as group 1.
> >>>
> >>> Cc: Andrew Jones <drjones@redhat.com>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> >>> ---
> >>> I implemented (but stashed) a version of this which predicated the
> >>> behavior based on the value of GICD_IIDR revision field, falling back to
> >>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> >>> revision less than 2.  However, current QEMU implementations simply
> >>> don't write the GICD_IIDR, so this doesn't help at all without changing
> >>> QEMU anyhow.
> >>>
> >>> The only actual fix I can see here to work around the problem in the
> >>> kernel is to require an opt-in to allow restoring groups from userspace,
> >>> but that's a lot of logic to support cross-kernel version migration.
> >>>
> >>> Question: Do we expect that cross-kernel version migration is a critical
> >>> feature that people really expect to work, and do we actually have
> >>> examples of catering to this in the kernel elsewhere?  (Also, how would
> >>> then that relate to the whole 'adding a new sysreg breaks migration'
> >>> situation?)
> >>
> >> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> >> is definitely trying to restore RO sysregs (and that's how we detect
> >> incompatibilities).
> >>
> >> I think we should at least give userspace a chance to do the right
> >> thing. If it doesn't, well, too bad.
> > 
> > This series should give userspace an option to adjust its behavior.
> > 
> > My main concern is that this version of the series results in the worst
> > kind of migration failures, where the guest simply doesn't run on the
> > destination side with no warnings or error messages returned to the
> > user..
> 
> Adding Dave in the loop to comment about general user perspective.

Without understanding the details of the GIC, but I have to agree that
a failure without error where the guest is hung is one of the worst
cases - when a user comes to you saying that their VM hangs after
migration with no diagnostics, then you know you're in for some nasty
debug!

On other architectures we definitely provide this level of compatibility
between kernels, libraries, qemu and everything in between - it's not
easy, and we do screwup from time to time; but it's still what we try
and get right.

So ideally you'd make this switchable and wire it into a versioned
machine type in QEMU, so that only virt-3.0 VMs would use this (and
they'd somehow know they needed the new kernel to do it).

If that's not possible then you could add a subsection to the GIC migration
data if you can detect at migration time that this feature is being
used, and make the destination check for the feature/kernel.
Migrating to an older qemu would fail since it wouldn't know about the
new subsection.  This should at least get a clear failure.

For a user, this still gets messy if they do something like start
upgrading some of the hosts in an openstack cluster, which they often
do incrementally; so you'll suddenly get the situation of a VM that
was started on a host with a newer kernel being migrated to an older one
and stuff breaks.

Dave

> > 
> > We could add logic to return an error code if trying to write a
> > different revision than what the kernel has (similar to the invariant
> > sysregs), so that a simple fix to QEMU to save restore the GICD_IIDR
> > register at least results in an error being returned to userspace.
> > 
> > However, as QEMU doesn't do anything useful here today (not blaming
> > anyone, I wrote the apparently broken GIC save/restore code for QEMU
> > myself), we could also argue that QEMU might as well just fix things up
> > if it detects a different IIDR.
> > 
> >>
> >> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
> >> in the absence of any comment, I'm close to just pick that series and
> >> merge it for 4.19.
> >>
> > 
> > My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
> > but requires additional logic in the kernel that returns an error if the
> > GICD_IIDR don't match on write.
> > 
> > A patch which changes the groups and bumps the IIDR in userspace is
> > probably more complex.
> > 
> > Sounds like I should add the GICD_IIDR checking patch.  Thoughts?
> > 
> > What I would really like to know is whether this is really an issue or
> > not.  Do people who run products based on KVM, such as RHEV, have any
> > expectations about cross-kernel version migration?
> 
> Thanks
> 
> Eric
> > 
> > 
> > Thanks,
> > -Christoffer
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Christoffer Dall July 10, 2018, 3:27 p.m. UTC | #9
On Tue, Jul 10, 2018 at 11:32:24AM +0100, Dr. David Alan Gilbert wrote:
> * Auger Eric (eric.auger@redhat.com) wrote:
> > Hi,
> > 
> > On 07/10/2018 12:48 AM, Christoffer Dall wrote:
> > > On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
> > >> On 04/07/18 10:38, Christoffer Dall wrote:
> > >>> Implement the required MMIO accessors for GICv2 and GICv3 for the
> > >>> IGROUPR distributor and redistributor registers.
> > >>>
> > >>> This can allow guests to change behavior compared to running on previous
> > >>> versions of KVM, but only to align with the architecture and hardware
> > >>> implementations.
> > >>>
> > >>> This also allows userspace to configure the groups for interrupts.  Note
> > >>> that this potentially results in GICv2 guests not receiving interrupts
> > >>> after migration if migrating from an older kernel that exposes GICv2
> > >>> interrupts as group 1.
> > >>>
> > >>> Cc: Andrew Jones <drjones@redhat.com>
> > >>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > >>> ---
> > >>> I implemented (but stashed) a version of this which predicated the
> > >>> behavior based on the value of GICD_IIDR revision field, falling back to
> > >>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> > >>> revision less than 2.  However, current QEMU implementations simply
> > >>> don't write the GICD_IIDR, so this doesn't help at all without changing
> > >>> QEMU anyhow.
> > >>>
> > >>> The only actual fix I can see here to work around the problem in the
> > >>> kernel is to require an opt-in to allow restoring groups from userspace,
> > >>> but that's a lot of logic to support cross-kernel version migration.
> > >>>
> > >>> Question: Do we expect that cross-kernel version migration is a critical
> > >>> feature that people really expect to work, and do we actually have
> > >>> examples of catering to this in the kernel elsewhere?  (Also, how would
> > >>> then that relate to the whole 'adding a new sysreg breaks migration'
> > >>> situation?)
> > >>
> > >> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> > >> is definitely trying to restore RO sysregs (and that's how we detect
> > >> incompatibilities).
> > >>
> > >> I think we should at least give userspace a chance to do the right
> > >> thing. If it doesn't, well, too bad.
> > > 
> > > This series should give userspace an option to adjust its behavior.
> > > 
> > > My main concern is that this version of the series results in the worst
> > > kind of migration failures, where the guest simply doesn't run on the
> > > destination side with no warnings or error messages returned to the
> > > user..
> > 
> > Adding Dave in the loop to comment about general user perspective.
> 
> Without understanding the details of the GIC, but I have to agree that
> a failure without error where the guest is hung is one of the worst
> cases - when a user comes to you saying that their VM hangs after
> migration with no diagnostics, then you know you're in for some nasty
> debug!
> 
> On other architectures we definitely provide this level of compatibility
> between kernels, libraries, qemu and everything in between - it's not
> easy, and we do screwup from time to time; but it's still what we try
> and get right.

That's good to know.

> 
> So ideally you'd make this switchable and wire it into a versioned
> machine type in QEMU, so that only virt-3.0 VMs would use this (and
> they'd somehow know they needed the new kernel to do it).
> 
> If that's not possible then you could add a subsection to the GIC migration
> data if you can detect at migration time that this feature is being
> used, and make the destination check for the feature/kernel.
> Migrating to an older qemu would fail since it wouldn't know about the
> new subsection.  This should at least get a clear failure.
> 
> For a user, this still gets messy if they do something like start
> upgrading some of the hosts in an openstack cluster, which they often
> do incrementally; so you'll suddenly get the situation of a VM that
> was started on a host with a newer kernel being migrated to an older one
> and stuff breaks.
> 

I think we should ask userspace to opt-in to the new behavior and fix
userspace to save/restore the IIDR while we're at it.

Unless someone objects, I'll try to come up with a v3 that asks
userspace to confirm it wants writable groups.  Ideally I'd like for
this to happen automatically if userspace writes an IIDR with revision 2
and above, but that may result in either

  (1) imposing ordering on the restore sequence from userspace;
      userspace must write IIDR before IGROUPR if it wants non-default
      groups,
  (2) terrible sequence of locking and resetting everything if IIDR
      hasn't been written before time of first executing a VCPU, or
  (3) an additional bookkeeping flag in the critical path for GICv2
      which ignores the group unless userspace wrote IIDR.

Out of the three, I think (3) is the least desirable because it
precludes the guest from programming its own groups.  I'll have a look
at how (2) looks, because it hides everything, and finally we can fall
back to (1) and document it clearly.

Thanks,
Christoffer
Peter Maydell July 10, 2018, 3:38 p.m. UTC | #10
On 10 July 2018 at 16:27, Christoffer Dall <christoffer.dall@arm.com> wrote:
> Unless someone objects, I'll try to come up with a v3 that asks
> userspace to confirm it wants writable groups.  Ideally I'd like for
> this to happen automatically if userspace writes an IIDR with revision 2
> and above, but that may result in either
>
>   (1) imposing ordering on the restore sequence from userspace;
>       userspace must write IIDR before IGROUPR if it wants non-default
>       groups,
>   (2) terrible sequence of locking and resetting everything if IIDR
>       hasn't been written before time of first executing a VCPU, or
>   (3) an additional bookkeeping flag in the critical path for GICv2
>       which ignores the group unless userspace wrote IIDR.
>
> Out of the three, I think (3) is the least desirable because it
> precludes the guest from programming its own groups.  I'll have a look
> at how (2) looks, because it hides everything, and finally we can fall
> back to (1) and document it clearly.

There are already some ordering restrictions on the restore
sequence for the GICv3, I think (eg there's a comment in QEMU
about having to restore GICR_PROPBASER/PENDBASER before GICR_CTLR),
so (1) isn't the end of the world...

thanks
-- PMM
Marc Zyngier July 10, 2018, 3:50 p.m. UTC | #11
On 10/07/18 16:27, Christoffer Dall wrote:
> On Tue, Jul 10, 2018 at 11:32:24AM +0100, Dr. David Alan Gilbert wrote:
>> * Auger Eric (eric.auger@redhat.com) wrote:
>>> Hi,
>>>
>>> On 07/10/2018 12:48 AM, Christoffer Dall wrote:
>>>> On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
>>>>> On 04/07/18 10:38, Christoffer Dall wrote:
>>>>>> Implement the required MMIO accessors for GICv2 and GICv3 for the
>>>>>> IGROUPR distributor and redistributor registers.
>>>>>>
>>>>>> This can allow guests to change behavior compared to running on previous
>>>>>> versions of KVM, but only to align with the architecture and hardware
>>>>>> implementations.
>>>>>>
>>>>>> This also allows userspace to configure the groups for interrupts.  Note
>>>>>> that this potentially results in GICv2 guests not receiving interrupts
>>>>>> after migration if migrating from an older kernel that exposes GICv2
>>>>>> interrupts as group 1.
>>>>>>
>>>>>> Cc: Andrew Jones <drjones@redhat.com>
>>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>>>>>> ---
>>>>>> I implemented (but stashed) a version of this which predicated the
>>>>>> behavior based on the value of GICD_IIDR revision field, falling back to
>>>>>> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
>>>>>> revision less than 2.  However, current QEMU implementations simply
>>>>>> don't write the GICD_IIDR, so this doesn't help at all without changing
>>>>>> QEMU anyhow.
>>>>>>
>>>>>> The only actual fix I can see here to work around the problem in the
>>>>>> kernel is to require an opt-in to allow restoring groups from userspace,
>>>>>> but that's a lot of logic to support cross-kernel version migration.
>>>>>>
>>>>>> Question: Do we expect that cross-kernel version migration is a critical
>>>>>> feature that people really expect to work, and do we actually have
>>>>>> examples of catering to this in the kernel elsewhere?  (Also, how would
>>>>>> then that relate to the whole 'adding a new sysreg breaks migration'
>>>>>> situation?)
>>>>>
>>>>> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
>>>>> is definitely trying to restore RO sysregs (and that's how we detect
>>>>> incompatibilities).
>>>>>
>>>>> I think we should at least give userspace a chance to do the right
>>>>> thing. If it doesn't, well, too bad.
>>>>
>>>> This series should give userspace an option to adjust its behavior.
>>>>
>>>> My main concern is that this version of the series results in the worst
>>>> kind of migration failures, where the guest simply doesn't run on the
>>>> destination side with no warnings or error messages returned to the
>>>> user..
>>>
>>> Adding Dave in the loop to comment about general user perspective.
>>
>> Without understanding the details of the GIC, but I have to agree that
>> a failure without error where the guest is hung is one of the worst
>> cases - when a user comes to you saying that their VM hangs after
>> migration with no diagnostics, then you know you're in for some nasty
>> debug!
>>
>> On other architectures we definitely provide this level of compatibility
>> between kernels, libraries, qemu and everything in between - it's not
>> easy, and we do screwup from time to time; but it's still what we try
>> and get right.
> 
> That's good to know.
> 
>>
>> So ideally you'd make this switchable and wire it into a versioned
>> machine type in QEMU, so that only virt-3.0 VMs would use this (and
>> they'd somehow know they needed the new kernel to do it).
>>
>> If that's not possible then you could add a subsection to the GIC migration
>> data if you can detect at migration time that this feature is being
>> used, and make the destination check for the feature/kernel.
>> Migrating to an older qemu would fail since it wouldn't know about the
>> new subsection.  This should at least get a clear failure.
>>
>> For a user, this still gets messy if they do something like start
>> upgrading some of the hosts in an openstack cluster, which they often
>> do incrementally; so you'll suddenly get the situation of a VM that
>> was started on a host with a newer kernel being migrated to an older one
>> and stuff breaks.
>>
> 
> I think we should ask userspace to opt-in to the new behavior and fix
> userspace to save/restore the IIDR while we're at it.
> 
> Unless someone objects, I'll try to come up with a v3 that asks
> userspace to confirm it wants writable groups.  Ideally I'd like for
> this to happen automatically if userspace writes an IIDR with revision 2
> and above, but that may result in either
> 
>   (1) imposing ordering on the restore sequence from userspace;
>       userspace must write IIDR before IGROUPR if it wants non-default
>       groups,
>   (2) terrible sequence of locking and resetting everything if IIDR
>       hasn't been written before time of first executing a VCPU, or
>   (3) an additional bookkeeping flag in the critical path for GICv2
>       which ignores the group unless userspace wrote IIDR.
> 
> Out of the three, I think (3) is the least desirable because it
> precludes the guest from programming its own groups.  I'll have a look
> at how (2) looks, because it hides everything, and finally we can fall
> back to (1) and document it clearly.

I don't see any issue with (1). Userspace just has to make sure that
IIDR is the first thing that gets written, like we have for the ITS
(where the IIDR must be written before restoring the tables).

(2) and (3) are really overkill, IMHO.

Thanks,

	M.
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index a7c19cd..c0c0b88 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -313,7 +313,7 @@  int vgic_init(struct kvm *kvm)

        vgic_debug_init(kvm);

-       dist->implementation_rev = 1;
+       dist->implementation_rev = 2;
        dist->initialized = true;

 out:
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index db646f1..a7f09b5 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -26,6 +26,8 @@ 
  * The Revision field in the IIDR have the following meanings:
  *
  * Revision 1: Report GICv2 interrupts as group 0 instead of group 1
+ * Revision 2: Interrupt groups are guest-configurable and signaled using
+ *            their configured groups.
  */

 static unsigned long vgic_mmio_read_v2_misc(struct kvm_vcpu *vcpu,
@@ -371,7 +373,7 @@  static const struct vgic_register_region vgic_v2_dist_registers[] = {
                vgic_mmio_read_v2_misc, vgic_mmio_write_v2_misc, 12,
                VGIC_ACCESS_32bit),
        REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_IGROUP,
-               vgic_mmio_read_raz, vgic_mmio_write_wi, NULL, NULL, 1,
+               vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1,
                VGIC_ACCESS_32bit),
        REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
                vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index ebe10a0..49df2a1 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -59,6 +59,13 @@  bool vgic_supports_direct_msis(struct kvm *kvm)
        return kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm);
 }

+/*
+ * The Revision field in the IIDR have the following meanings:
+ *
+ * Revision 2: Interrupt groups are guest-configurable and signaled using
+ *            their configured groups.
+ */
+
 static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
                                            gpa_t addr, unsigned int len)
 {
@@ -454,7 +461,7 @@  static const struct vgic_register_region vgic_v3_dist_registers[] = {
                vgic_mmio_read_rao, vgic_mmio_write_wi, 4,
                VGIC_ACCESS_32bit),
        REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IGROUPR,
-               vgic_mmio_read_rao, vgic_mmio_write_wi, NULL, NULL, 1,
+               vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1,
                VGIC_ACCESS_32bit),
        REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER,
                vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
@@ -527,7 +534,7 @@  static const struct vgic_register_region vgic_v3_rdbase_registers[] = {

 static const struct vgic_register_region vgic_v3_sgibase_registers[] = {
        REGISTER_DESC_WITH_LENGTH(GICR_IGROUPR0,
-               vgic_mmio_read_rao, vgic_mmio_write_wi, 4,
+               vgic_mmio_read_group, vgic_mmio_write_group, 4,
                VGIC_ACCESS_32bit),
        REGISTER_DESC_WITH_LENGTH(GICR_ISENABLER0,
                vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index ff9655c..ae31bd0 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -40,6 +40,44 @@  void vgic_mmio_write_wi(struct kvm_vcpu *vcpu, gpa_t addr,
        /* Ignore */
 }

+unsigned long vgic_mmio_read_group(struct kvm_vcpu *vcpu,
+                                  gpa_t addr, unsigned int len)
+{
+       u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+       u32 value = 0;
+       int i;
+
+       /* Loop over all IRQs affected by this read */
+       for (i = 0; i < len * 8; i++) {
+               struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+               if (irq->group)
+                       value |= BIT(i);
+
+               vgic_put_irq(vcpu->kvm, irq);
+       }
+
+       return value;
+}
+
+void vgic_mmio_write_group(struct kvm_vcpu *vcpu, gpa_t addr,
+                          unsigned int len, unsigned long val)
+{
+       u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+       int i;
+       unsigned long flags;
+
+       for (i = 0; i < len * 8; i++) {
+               struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+               spin_lock_irqsave(&irq->irq_lock, flags);
+               irq->group = !!(val & BIT(i));
+               vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+
+               vgic_put_irq(vcpu->kvm, irq);
+       }
+}
+
 /*
  * Read accesses to both GICD_ICENABLER and GICD_ISENABLER return the value
  * of the enabled bit, so there is only one function for both here.
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 5693f6df..1079862 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -134,6 +134,12 @@  unsigned long vgic_mmio_read_rao(struct kvm_vcpu *vcpu,
 void vgic_mmio_write_wi(struct kvm_vcpu *vcpu, gpa_t addr,
                        unsigned int len, unsigned long val);

+unsigned long vgic_mmio_read_group(struct kvm_vcpu *vcpu, gpa_t addr,
+                                  unsigned int len);
+
+void vgic_mmio_write_group(struct kvm_vcpu *vcpu, gpa_t addr,
+                          unsigned int len, unsigned long val);
+
 unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
                                    gpa_t addr, unsigned int len);