[1/2] x86/vlapic: Fix handling of writes to APIC_ESR

Message ID 20241128004737.283521-2-andrew.cooper3@citrix.com (mailing list archive)
State New
Series x86/vlapic: Fixes to APIC_ESR handling

Commit Message

Andrew Cooper Nov. 28, 2024, 12:47 a.m. UTC
Xen currently presents APIC_ESR to guests as a simple read/write register.

This is incorrect.  The SDM states:

  The ESR is a write/read register. Before attempt to read from the ESR,
  software should first write to it. (The value written does not affect the
  values read subsequently; only zero may be written in x2APIC mode.) This
  write clears any previously logged errors and updates the ESR with any
  errors detected since the last write to the ESR. This write also rearms the
  APIC error interrupt triggering mechanism.

Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
accumulate errors here, and extend vlapic_reg_write() to discard the written
value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
before.

Importantly, this means that a guest no longer destroys the ESR value it's
looking for in the LVTERR handler when following the SDM instructions.
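
The write/latch behaviour described above can be modelled in isolation. The following is only a minimal sketch and not Xen code: `struct esr_model`, the `model_*` helpers, `legacy_write_esr()` and the `SKETCH_ESR_SENDILL` constant are all hypothetical names, and locking and LVTERR delivery are left out.

```c
#include <stdint.h>

/* Illustrative constant (assumption): bit 5, "Send Illegal Vector". */
#define SKETCH_ESR_SENDILL 0x20u

/* Hypothetical stand-in for the relevant vlapic state; not Xen's layout. */
struct esr_model {
    uint32_t esr;          /* value the guest reads back from APIC_ESR */
    uint32_t pending_esr;  /* errors accumulated since the last ESR write */
};

/* An error is detected: accumulate it without touching the readable value. */
static inline void model_error(struct esr_model *m, uint32_t errmask)
{
    m->pending_esr |= errmask;
}

/* Guest write: the written value is discarded, pending errors are latched. */
static inline void model_write_esr(struct esr_model *m, uint32_t val)
{
    (void)val;
    m->esr = m->pending_esr;
    m->pending_esr = 0;
}

/* Old behaviour for comparison: a plain read/write register. */
static inline void legacy_write_esr(struct esr_model *m, uint32_t val)
{
    m->esr = val;
}

static inline uint32_t model_read_esr(const struct esr_model *m)
{
    return m->esr;
}

/* Guest LVTERR handler following the SDM: write first, then read. */
static inline uint32_t guest_lvterr_handler(struct esr_model *m)
{
    model_write_esr(m, 0);
    return model_read_esr(m);
}
```

In this model, an error recorded with `model_error()` becomes visible to the guest only after its next write. With `legacy_write_esr()` the same write of 0 would overwrite the error bits the handler was about to read.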

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
field to hvm_hw_lapic too.  However, this is a far more obvious backport
candidate.

lapic_check_hidden() might in principle want to audit this field, but it's not
clear what to check.  While prior Xen will never have produced it in the
migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
Xen will currently emulate.

I've checked that this does behave correctly under Intel APIC-V.  Writes to
APIC_ESR drop the written value into the backing page then take a trap-style
EXIT_REASON_APIC_WRITE which allows us to sample/latch properly.
---
 xen/arch/x86/hvm/vlapic.c              | 17 +++++++++++++++--
 xen/include/public/arch-x86/hvm/save.h |  1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

Roger Pau Monné Nov. 28, 2024, 9:03 a.m. UTC | #1
On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
> Xen currently presents APIC_ESR to guests as a simple read/write register.
> 
> This is incorrect.  The SDM states:
> 
>   The ESR is a write/read register. Before attempt to read from the ESR,
>   software should first write to it. (The value written does not affect the
>   values read subsequently; only zero may be written in x2APIC mode.) This
>   write clears any previously logged errors and updates the ESR with any
>   errors detected since the last write to the ESR. This write also rearms the
>   APIC error interrupt triggering mechanism.
> 
> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
> accumulate errors here, and extend vlapic_reg_write() to discard the written
> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
> before.
> 
> Importantly, this means that guests no longer destroys the ESR value it's
> looking for in the LVTERR handler when following the SDM instructions.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
> field to hvm_hw_lapic too.  However, this is a far more obvious backport
> candidate.
> 
> lapic_check_hidden() might in principle want to audit this field, but it's not
> clear what to check.  While prior Xen will never have produced it in the
> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
> Xen will currently emulate.
> 
> I've checked that this does behave correctly under Intel APIC-V.  Writes to
> APIC_ESR drop the written value into the backing page then take a trap-style
> EXIT_REASON_APIC_WRITE which allows us to sample/latch properly.
> ---
>  xen/arch/x86/hvm/vlapic.c              | 17 +++++++++++++++--
>  xen/include/public/arch-x86/hvm/save.h |  1 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 3363926b487b..98394ed26a52 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>      uint32_t esr;
>  
>      spin_lock_irqsave(&vlapic->esr_lock, flags);
> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
> +    esr = vlapic->hw.pending_esr;
>      if ( (esr & errmask) != errmask )
>      {
>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>                   errmask |= APIC_ESR_RECVILL;
>          }
>  
> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
> +        vlapic->hw.pending_esr |= errmask;
>  
>          if ( inj )
>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);

The SDM also contains:

"This write also rearms the APIC error interrupt triggering
mechanism."

Where "this write" is a write to the ESR register.  My understanding
is that the error vector will only be injected for the first reported
error. I think the logic regarding whether to inject the lvterr vector
needs to additionally be gated on whether vlapic->hw.pending_esr ==
0.
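
To make the difference concrete, here is a hedged sketch of the two injection policies as pure predicates. Both function names are hypothetical, and the LVTERR mask bit and the vector validity checks from the real vlapic_error() are deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Guard as in the quoted hunk: an error is suppressed only if all of its
 * bits are already pending.
 */
static inline bool inject_per_bit(uint32_t pending_esr, uint32_t errmask)
{
    return (pending_esr & errmask) != errmask;
}

/*
 * Suggested reading: additionally require that nothing at all is pending,
 * so only the first error after an ESR write raises the LVTERR vector.
 */
static inline bool inject_when_rearmed(uint32_t pending_esr, uint32_t errmask)
{
    return inject_per_bit(pending_esr, errmask) && pending_esr == 0;
}
```

The two predicates differ only when a different error bit is already pending: for example, with 0x20 pending and a new 0x40 error, the first injects and the second does not.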

Thanks, Roger.

Jan Beulich Nov. 28, 2024, 10:31 a.m. UTC | #2
On 28.11.2024 01:47, Andrew Cooper wrote:
> Xen currently presents APIC_ESR to guests as a simple read/write register.
> 
> This is incorrect.  The SDM states:
> 
>   The ESR is a write/read register. Before attempt to read from the ESR,
>   software should first write to it. (The value written does not affect the
>   values read subsequently; only zero may be written in x2APIC mode.) This
>   write clears any previously logged errors and updates the ESR with any
>   errors detected since the last write to the ESR. This write also rearms the
>   APIC error interrupt triggering mechanism.
> 
> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
> accumulate errors here, and extend vlapic_reg_write() to discard the written
> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
> before.
> 
> Importantly, this means that guests no longer destroys the ESR value it's
> looking for in the LVTERR handler when following the SDM instructions.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

No Fixes: tag presumably because the issue had been there forever?

> ---
> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
> field to hvm_hw_lapic too.  However, this is a far more obvious backport
> candidate.
> 
> lapic_check_hidden() might in principle want to audit this field, but it's not
> clear what to check.  While prior Xen will never have produced it in the
> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
> Xen will currently emulate.

The ESR really is an 8-bit value (in a 32-bit register), so checking the
upper bits may be necessary. Plus ...

> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -394,6 +394,7 @@ struct hvm_hw_lapic {
>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>      uint32_t             timer_divisor;
>      uint64_t             tdt_msr;
> +    uint32_t             pending_esr;
>  };

... I think you need to make padding explicit here, and then check that
to be zero.
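
A sketch of what explicit padding plus a zero check might look like. The struct below reproduces only the tail of the record shown in the hunk, and the names `lapic_tail_sketch`, `rsvd_zero` and `check_lapic_padding()` are assumptions for illustration, not a proposal for the real field name or for the body of lapic_check_hidden().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical copy of the tail of the save record, for illustration only. */
struct lapic_tail_sketch {
    uint32_t disabled;
    uint32_t timer_divisor;
    uint64_t tdt_msr;
    uint32_t pending_esr;
    uint32_t rsvd_zero;    /* explicit padding instead of implicit tail padding */
};

/* With the padding spelled out, the size no longer depends on the ABI. */
_Static_assert(sizeof(struct lapic_tail_sketch) == 24,
               "unexpected implicit padding");

/* Reject an incoming record whose padding is not zero. */
static inline int check_lapic_padding(const struct lapic_tail_sketch *s)
{
    return s->rsvd_zero ? -EINVAL : 0;
}
```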

Jan

Andrew Cooper Nov. 28, 2024, 11:01 a.m. UTC | #3
On 28/11/2024 9:03 am, Roger Pau Monné wrote:
> On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 3363926b487b..98394ed26a52 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>      uint32_t esr;
>>  
>>      spin_lock_irqsave(&vlapic->esr_lock, flags);
>> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
>> +    esr = vlapic->hw.pending_esr;
>>      if ( (esr & errmask) != errmask )
>>      {
>>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>                   errmask |= APIC_ESR_RECVILL;
>>          }
>>  
>> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
>> +        vlapic->hw.pending_esr |= errmask;
>>  
>>          if ( inj )
>>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
> The SDM also contains:
>
> "This write also rearms the APIC error interrupt triggering
> mechanism."
>
> Where "this write" is a write to the ESR register.

Correct.

> My understanding
> is that the error vector will only be injected for the first reported
> error. I think the logic regarding whether to inject the lvterr vector
> needs to additionally be gated on whether vlapic->hw.pending_esr ==
> 0.

I think it's clumsy wording.

Bits being set mask subsequent LVTERR's of the same type.  That's what
the "if ( (esr & errmask) != errmask )" guard is doing above.

What I think it's referring to is that writing APIC_ESR will zero
pending_esr and thus any subsequent error will cause LVTERR to deliver.


Having said all that, I can't find anything in the current SDM/APM which
states this.  I think I need to go back to the older manuals.

~Andrew

Jan Beulich Nov. 28, 2024, 11:09 a.m. UTC | #4
On 28.11.2024 12:01, Andrew Cooper wrote:
> On 28/11/2024 9:03 am, Roger Pau Monné wrote:
>> On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>>> index 3363926b487b..98394ed26a52 100644
>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>>      uint32_t esr;
>>>  
>>>      spin_lock_irqsave(&vlapic->esr_lock, flags);
>>> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
>>> +    esr = vlapic->hw.pending_esr;
>>>      if ( (esr & errmask) != errmask )
>>>      {
>>>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>>> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>>                   errmask |= APIC_ESR_RECVILL;
>>>          }
>>>  
>>> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
>>> +        vlapic->hw.pending_esr |= errmask;
>>>  
>>>          if ( inj )
>>>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
>> The SDM also contains:
>>
>> "This write also rearms the APIC error interrupt triggering
>> mechanism."
>>
>> Where "this write" is a write to the ESR register.
> 
> Correct.
> 
>> My understanding
>> is that the error vector will only be injected for the first reported
>> error. I think the logic regarding whether to inject the lvterr vector
>> needs to additionally be gated on whether vlapic->hw.pending_esr ==
>> 0.
> 
> I think it's clumsy wording.
> 
> Bits being set mask subsequent LVTERR's of the same type.  That's what
> the "if ( (esr & errmask) != errmask )" guard is doing above.

That's what we do, yes, but is that correct? I agree with Roger's reading
of that sentence.

> What I think it's referring to is that writing APIC_ESR will zero
> pending_esr and thus any subsequent error will cause LVTERR to deliver.

..., while at the same time preventing LVTERR delivery when there was
another error already pending.

Jan

Andrew Cooper Nov. 28, 2024, 11:10 a.m. UTC | #5
On 28/11/2024 10:31 am, Jan Beulich wrote:
> On 28.11.2024 01:47, Andrew Cooper wrote:
>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>
>> This is incorrect.  The SDM states:
>>
>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>   software should first write to it. (The value written does not affect the
>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>   write clears any previously logged errors and updates the ESR with any
>>   errors detected since the last write to the ESR. This write also rearms the
>>   APIC error interrupt triggering mechanism.
>>
>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>> before.
>>
>> Importantly, this means that guests no longer destroys the ESR value it's
>> looking for in the LVTERR handler when following the SDM instructions.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> No Fixes: tag presumably because the issue had been there forever?

Oh, I forgot to note that.

I can't decide between forever, or since the introduction of the ESR
support (so Xen 4.5 like XSA-462, and still basically forever).

>> ---
>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>> candidate.
>>
>> lapic_check_hidden() might in principle want to audit this field, but it's not
>> clear what to check.  While prior Xen will never have produced it in the
>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>> Xen will currently emulate.
> The ESR really is an 8-bit value (in a 32-bit register), so checking the
> upper bits may be necessary.

It is now, but it may not be in the future.

My concern is that this value is generated by microcode, so we can't
audit based on which reserved bits we think prior versions of Xen never set.

I don't particularly care about a toolstack deciding to feed ~0 in
here.  But, if any bit beyond 7 gets allocated in the future, then
auditing the bottom byte would lead to a migration failure of what is in
practice a correct value.

>  Plus ...
>
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -394,6 +394,7 @@ struct hvm_hw_lapic {
>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>      uint32_t             timer_divisor;
>>      uint64_t             tdt_msr;
>> +    uint32_t             pending_esr;
>>  };
> ... I think you need to make padding explicit here, and then check that
> to be zero.

On further consideration I need to merge this with Alejandro's change. 
His depends on spotting the need to zero-extend beyond tdt_msr to
identify the compatibility case.

~Andrew

Jan Beulich Nov. 28, 2024, 11:50 a.m. UTC | #6
On 28.11.2024 12:10, Andrew Cooper wrote:
> On 28/11/2024 10:31 am, Jan Beulich wrote:
>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>
>>> This is incorrect.  The SDM states:
>>>
>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>   software should first write to it. (The value written does not affect the
>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>   write clears any previously logged errors and updates the ESR with any
>>>   errors detected since the last write to the ESR. This write also rearms the
>>>   APIC error interrupt triggering mechanism.
>>>
>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>> before.
>>>
>>> Importantly, this means that guests no longer destroys the ESR value it's
>>> looking for in the LVTERR handler when following the SDM instructions.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> No Fixes: tag presumably because the issue had been there forever?
> 
> Oh, I forgot to note that.
> 
> I can't decide between forever, or since the introduction of the ESR
> support (so Xen 4.5 like XSA-462, and still basically forever).
>>> ---
>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>> candidate.
>>>
>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>> clear what to check.  While prior Xen will never have produced it in the
>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>> Xen will currently emulate.
>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>> upper bits may be necessary.
> 
> It is now, but it may not be in the future.
> 
> My concern is that this value is generated by microcode, so we can't
> audit based on which reserved bits we think prior versions of Xen never set.
> 
> I don't particularly care about a toolstack deciding to feed ~0 in
> here.  But, if any bit beyond 7 gets allocated in the future, then
> auditing the bottom byte would lead to a migration failure of what is in
> practice a correct value.

If a bit beyond zero got allocated, then it being set in an incoming stream
will, for an unaware Xen version, still be illegal. Such a guest simply can't
be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
auditing would (of course) also need adjustment.
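
The kind of audit being argued for could look roughly like the sketch below. It is a hypothetical illustration only: `SKETCH_ESR_KNOWN_BITS` and `audit_pending_esr()` are invented names, and the choice of mask is exactly the point under discussion in this thread rather than something it settles.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Assumption: the set of ESR bits this (hypothetical) Xen version accepts.
 * Here it is the low 8 bits; it would be widened once Xen learns of more.
 */
#define SKETCH_ESR_KNOWN_BITS 0xffu

/* Reject an incoming pending_esr value with any bit outside the known set. */
static inline int audit_pending_esr(uint32_t pending_esr)
{
    return (pending_esr & ~SKETCH_ESR_KNOWN_BITS) ? -EINVAL : 0;
}
```

Under this scheme a stream carrying a bit outside the mask fails to restore on a Xen that does not know the bit, which is the behaviour described above.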

Jan

Andrew Cooper Nov. 28, 2024, 11:57 a.m. UTC | #7
On 28/11/2024 11:50 am, Jan Beulich wrote:
> On 28.11.2024 12:10, Andrew Cooper wrote:
>> On 28/11/2024 10:31 am, Jan Beulich wrote:
>>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>>
>>>> This is incorrect.  The SDM states:
>>>>
>>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>>   software should first write to it. (The value written does not affect the
>>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>>   write clears any previously logged errors and updates the ESR with any
>>>>   errors detected since the last write to the ESR. This write also rearms the
>>>>   APIC error interrupt triggering mechanism.
>>>>
>>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>>> before.
>>>>
>>>> Importantly, this means that guests no longer destroys the ESR value it's
>>>> looking for in the LVTERR handler when following the SDM instructions.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> No Fixes: tag presumably because the issue had been there forever?
>> Oh, I forgot to note that.
>>
>> I can't decide between forever, or since the introduction of the ESR
>> support (so Xen 4.5 like XSA-462, and still basically forever).
>>>> ---
>>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>>> candidate.
>>>>
>>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>>> clear what to check.  While prior Xen will never have produced it in the
>>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>>> Xen will currently emulate.
>>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>>> upper bits may be necessary.
>> It is now, but it may not be in the future.
>>
>> My concern is that this value is generated by microcode, so we can't
>> audit based on which reserved bits we think prior versions of Xen never set.
>>
>> I don't particularly care about a toolstack deciding to feed ~0 in
>> here.  But, if any bit beyond 7 gets allocated in the future, then
>> auditing the bottom byte would lead to a migration failure of what is in
>> practice a correct value.
> If a bit beyond zero got allocated, then it being set in an incoming stream
> will, for an unaware Xen version, still be illegal. Such a guest simply can't
> be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
> auditing would (of course) also need adjustment.

That's the whole point.  It's not about Xen's awareness; it's what
APIC-V/AVIC might do *in existing configurations* on future hardware
without taking a VMExit.

If there were no APIC-V support to begin with, this would be easy and
auditing would be limited to SENDILL|RECVILL as those are the only two
bits Xen knows about.

~Andrew

Jan Beulich Nov. 28, 2024, 12:16 p.m. UTC | #8
On 28.11.2024 12:57, Andrew Cooper wrote:
> On 28/11/2024 11:50 am, Jan Beulich wrote:
>> On 28.11.2024 12:10, Andrew Cooper wrote:
>>> On 28/11/2024 10:31 am, Jan Beulich wrote:
>>>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>>>
>>>>> This is incorrect.  The SDM states:
>>>>>
>>>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>>>   software should first write to it. (The value written does not affect the
>>>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>>>   write clears any previously logged errors and updates the ESR with any
>>>>>   errors detected since the last write to the ESR. This write also rearms the
>>>>>   APIC error interrupt triggering mechanism.
>>>>>
>>>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>>>> before.
>>>>>
>>>>> Importantly, this means that guests no longer destroys the ESR value it's
>>>>> looking for in the LVTERR handler when following the SDM instructions.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> No Fixes: tag presumably because the issue had been there forever?
>>> Oh, I forgot to note that.
>>>
>>> I can't decide between forever, or since the introduction of the ESR
>>> support (so Xen 4.5 like XSA-462, and still basically forever).
>>>>> ---
>>>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>>>> candidate.
>>>>>
>>>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>>>> clear what to check.  While prior Xen will never have produced it in the
>>>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>>>> Xen will currently emulate.
>>>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>>>> upper bits may be necessary.
>>> It is now, but it may not be in the future.
>>>
>>> My concern is that this value is generated by microcode, so we can't
>>> audit based on which reserved bits we think prior versions of Xen never set.
>>>
>>> I don't particularly care about a toolstack deciding to feed ~0 in
>>> here.  But, if any bit beyond 7 gets allocated in the future, then
>>> auditing the bottom byte would lead to a migration failure of what is in
>>> practice a correct value.
>> If a bit beyond zero got allocated, then it being set in an incoming stream
>> will, for an unaware Xen version, still be illegal. Such a guest simply can't
>> be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
>> auditing would (of course) also need adjustment.
> 
> That's the whole point.  It's not about Xen's awareness; it's what
> APIC-V/AVIC might do *in existing configurations* on future hardware
> without taking a VMExit.

How would you migrate such a guest to arbitrary other hardware, i.e.
potentially lacking support for that bit? If LVTERR triggering is as per
Roger's reading of the SDM, without knowing how many bits hardware
presently checks we couldn't guarantee correctness. Bits from 8 up being
reserved right now even leaves me wondering what happens on present
hardware when one of those top 24 bits is set.

> If there were no APIC-V support to begin with, this would be easy and
> auditing would be limited to SENDILL|RECVILL as those are the only two
> bits Xen knows about.

Limiting to just these two bits would be wrong; future Xen might make
use of more of them, and a guest should then still migrate correctly
(it's just that, once those extra bits had initially been set, it would
never again see any of them become set).

Jan

Patch

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 3363926b487b..98394ed26a52 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -108,7 +108,7 @@  static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
     uint32_t esr;
 
     spin_lock_irqsave(&vlapic->esr_lock, flags);
-    esr = vlapic_get_reg(vlapic, APIC_ESR);
+    esr = vlapic->hw.pending_esr;
     if ( (esr & errmask) != errmask )
     {
         uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
@@ -127,7 +127,7 @@  static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
                  errmask |= APIC_ESR_RECVILL;
         }
 
-        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
+        vlapic->hw.pending_esr |= errmask;
 
         if ( inj )
             vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
@@ -802,6 +802,19 @@  void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         vlapic_set_reg(vlapic, APIC_ID, val);
         break;
 
+    case APIC_ESR:
+    {
+        unsigned long flags;
+
+        spin_lock_irqsave(&vlapic->esr_lock, flags);
+        val = vlapic->hw.pending_esr;
+        vlapic->hw.pending_esr = 0;
+        spin_unlock_irqrestore(&vlapic->esr_lock, flags);
+
+        vlapic_set_reg(vlapic, APIC_ESR, val);
+        break;
+    }
+
     case APIC_TASKPRI:
         vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff);
         break;
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..9c4bfc7ebdac 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,7 @@  struct hvm_hw_lapic {
     uint32_t             disabled; /* VLAPIC_xx_DISABLED */
     uint32_t             timer_divisor;
     uint64_t             tdt_msr;
+    uint32_t             pending_esr;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);