diff mbox

[3/3] apic: always update the in-kernel status after loading

Message ID 1351599394-24876-4-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini Oct. 30, 2012, 12:16 p.m. UTC
The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
when restoring the CPU the dummy post-reset state is passed to the
in-kernel APIC.

This can cause MSI injection to fail if attempted during the restore of
another device, because the LAPIC believes it's not enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/apic_common.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

Comments

Avi Kivity Oct. 30, 2012, 12:38 p.m. UTC | #1
On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
> when restoring the CPU the dummy post-reset state is passed to the
> in-kernel APIC.
> 
> This can cause MSI injection to fail if attempted during the restore of
> another device, because the LAPIC believes it's not enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/apic_common.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index f373ba8..1ef52b2 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    cpu_put_apic_state(DEVICE(s));
>      return 0;
>  }

Aren't we still dependent on the order of processing?  If the APIC is
restored after the device, won't we get the same problem?
Paolo Bonzini Oct. 30, 2012, 2:16 p.m. UTC | #2
Il 30/10/2012 13:38, Avi Kivity ha scritto:
> On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
>> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
>> when restoring the CPU the dummy post-reset state is passed to the
>> in-kernel APIC.
>>
>> This can cause MSI injection to fail if attempted during the restore of
>> another device, because the LAPIC believes it's not enabled.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/apic_common.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>> index f373ba8..1ef52b2 100644
>> --- a/hw/apic_common.c
>> +++ b/hw/apic_common.c
>> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>>      if (info->post_load) {
>>          info->post_load(s);
>>      }
>> +    cpu_put_apic_state(DEVICE(s));
>>      return 0;
>>  }
> 
> Aren't we still dependent on the order of processing?  If the APIC is
> restored after the device, won't we get the same problem?

Strictly speaking yes, but CPUs and APICs are always the first devices
to be saved.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka Oct. 30, 2012, 6:17 p.m. UTC | #3
On 2012-10-30 13:16, Paolo Bonzini wrote:
> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
> when restoring the CPU the dummy post-reset state is passed to the
> in-kernel APIC.
> 
> This can cause MSI injection to fail if attempted during the restore of
> another device, because the LAPIC believes it's not enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/apic_common.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index f373ba8..1ef52b2 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    cpu_put_apic_state(DEVICE(s));

Just implement a post_load handler for the KVM APIC and trigger putting
from there.

Jan

>      return 0;
>  }
>  
>
Jan Kiszka Oct. 30, 2012, 6:21 p.m. UTC | #4
On 2012-10-30 15:16, Paolo Bonzini wrote:
> Il 30/10/2012 13:38, Avi Kivity ha scritto:
>> On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
>>> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
>>> when restoring the CPU the dummy post-reset state is passed to the
>>> in-kernel APIC.
>>>
>>> This can cause MSI injection to fail if attempted during the restore of
>>> another device, because the LAPIC believes it's not enabled.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  hw/apic_common.c |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>>> index f373ba8..1ef52b2 100644
>>> --- a/hw/apic_common.c
>>> +++ b/hw/apic_common.c
>>> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>>>      if (info->post_load) {
>>>          info->post_load(s);
>>>      }
>>> +    cpu_put_apic_state(DEVICE(s));
>>>      return 0;
>>>  }
>>
>> Aren't we still dependent on the order of processing?  If the APIC is
>> restored after the device, won't we get the same problem?
> 
> Strictly speaking yes, but CPUs and APICs are always the first devices
> to be saved.

Hmm, thinking about this again: Why is the MSI event injected at all
during restore, specifically while the device models are in transitional
state. Can you explain this? Does the same pattern then also apply on
INTx injection?

Jan
Paolo Bonzini Nov. 2, 2012, 2:53 p.m. UTC | #5
Il 30/10/2012 19:21, Jan Kiszka ha scritto:
> > > Aren't we still dependent on the order of processing?  If the APIC is
> > > restored after the device, won't we get the same problem?
> > 
> > Strictly speaking yes, but CPUs and APICs are always the first devices
> > to be saved.
> Hmm, thinking about this again: Why is the MSI event injected at all
> during restore, specifically while the device models are in transitional
> state. Can you explain this?

Because the (virtio-serial) port was connected on the source and
disconnected on the destination, or vice versa.

In my simplified reproducer, I'm really using different command-lines on
the source and destination, but it is not necessary.  For example, if
you have a socket backend, the destination will usually be disconnected
at the time the machine loads.

One alternative fix is a vm_clock timer that expires immediately.  It
would fix both MSI and INTx, on the other hand I thought it was an APIC
bug because the QEMU APIC works nicely.

> Does the same pattern then also apply on INTx injection?

Yes.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka Nov. 2, 2012, 2:59 p.m. UTC | #6
On 2012-11-02 15:53, Paolo Bonzini wrote:
> Il 30/10/2012 19:21, Jan Kiszka ha scritto:
>>>> Aren't we still dependent on the order of processing?  If the APIC is
>>>> restored after the device, won't we get the same problem?
>>>
>>> Strictly speaking yes, but CPUs and APICs are always the first devices
>>> to be saved.
>> Hmm, thinking about this again: Why is the MSI event injected at all
>> during restore, specifically while the device models are in transitional
>> state. Can you explain this?
> 
> Because the (virtio-serial) port was connected on the source and
> disconnected on the destination, or vice versa.
> 
> In my simplified reproducer, I'm really using different command-lines on
> the source and destination, but it is not necessary.  For example, if
> you have a socket backend, the destination will usually be disconnected
> at the time the machine loads.
> 
> One alternative fix is a vm_clock timer that expires immediately.  It
> would fix both MSI and INTx, on the other hand I thought it was an APIC
> bug because the QEMU APIC works nicely.

I think deferring IRQ events to the point when the complete vmstate is
loaded is the cleaner and more robust approach.

Jan
Gerd Hoffmann Nov. 2, 2012, 3:07 p.m. UTC | #7
Hi,

> I think deferring IRQ events to the point when the complete vmstate is
> loaded is the cleaner and more robust approach.

Agree.  Just schedule a bh in post_load.
See also a229c0535bd336efaec786dd6e352a54e0a8187d

cheers,
  Gerd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Nov. 2, 2012, 3:13 p.m. UTC | #8
> Hi,
> 
> > I think deferring IRQ events to the point when the complete vmstate
> > is
> > loaded is the cleaner and more robust approach.
> 
> Agree.  Just schedule a bh in post_load.
> See also a229c0535bd336efaec786dd6e352a54e0a8187d

No, it cannot a bh.  Right now incoming migration is blocking,
but this will change in 1.3.  There is no guarantee that a
bottom half will run after migration has completed.

Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gerd Hoffmann Nov. 2, 2012, 3:17 p.m. UTC | #9
On 11/02/12 16:13, Paolo Bonzini wrote:
>> Hi,
>>
>>> I think deferring IRQ events to the point when the complete vmstate
>>> is
>>> loaded is the cleaner and more robust approach.
>>
>> Agree.  Just schedule a bh in post_load.
>> See also a229c0535bd336efaec786dd6e352a54e0a8187d
> 
> No, it cannot a bh.  Right now incoming migration is blocking,
> but this will change in 1.3.  There is no guarantee that a
> bottom half will run after migration has completed.

Then we'll need some new way to do this, maybe a new post_load handler
which is called once _all_ state is loaded.

cheers,
  Gerd

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Nov. 2, 2012, 3:21 p.m. UTC | #10
Il 02/11/2012 16:17, Gerd Hoffmann ha scritto:
> On 11/02/12 16:13, Paolo Bonzini wrote:
>>> >> Hi,
>>> >>
>>>> >>> I think deferring IRQ events to the point when the complete vmstate
>>>> >>> is loaded is the cleaner and more robust approach.
>>> >>
>>> >> Agree.  Just schedule a bh in post_load.
>>> >> See also a229c0535bd336efaec786dd6e352a54e0a8187d
>> > 
>> > No, it cannot a bh.  Right now incoming migration is blocking,
>> > but this will change in 1.3.  There is no guarantee that a
>> > bottom half will run after migration has completed.
> Then we'll need some new way to do this, maybe a new post_load handler
> which is called once _all_ state is loaded.

The simplest is a vm_clock timer that expires at time 0.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/hw/apic_common.c b/hw/apic_common.c
index f373ba8..1ef52b2 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -362,6 +362,7 @@  static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    cpu_put_apic_state(DEVICE(s));
     return 0;
 }