diff mbox series

[v2,1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU

Message ID 20191010113345.6867-2-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series iommu: fixes for interrupt remapping enabling | expand

Commit Message

Roger Pau Monné Oct. 10, 2019, 11:33 a.m. UTC
When interrupt remapping is enabled as part of enabling x2APIC the
IO-APIC entries also need to be translated to the new format and added
to the interrupt remapping table.

This prevents IOMMU interrupt remapping faults when booting on
hardware that has unmasked IO-APIC pins.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
---
Changes since v1:
 - Remove the unneeded iommu_enabled local variable.
---
 xen/arch/x86/apic.c           | 9 +++++++--
 xen/arch/x86/io_apic.c        | 5 +++--
 xen/include/asm-x86/io_apic.h | 3 ++-
 3 files changed, 12 insertions(+), 5 deletions(-)

Comments

Jan Beulich Oct. 10, 2019, 11:54 a.m. UTC | #1
On 10.10.2019 13:33, Roger Pau Monne wrote:
> When interrupt remapping is enabled as part of enabling x2APIC the

Perhaps "unmasked" instead of "the"?

> IO-APIC entries also need to be translated to the new format and added
> to the interrupt remapping table.
> 
> This prevents IOMMU interrupt remapping faults when booting on
> hardware that has unmasked IO-APIC pins.

But in the end it only papers over whatever the spurious interrupts
result form, doesn't it? Which isn't to say this isn't an
improvement. Calling out the ExtInt case here may be worthwhile as
well, as would be pointing out that this case still won't work on
AMD IOMMUs.

> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>      iommu_enable_x2apic();
>      __enable_x2apic();
>  
> -    restore_IO_APIC_setup(ioapic_entries);
> +    restore_IO_APIC_setup(ioapic_entries, true);
>      unmask_8259A();
>  
>  out:
> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>          printk("Switched to APIC driver %s\n", genapic.name);
>  
>  restore_out:
> -    restore_IO_APIC_setup(ioapic_entries);
> +    /*
> +     * NB: do not use raw mode when restoring entries if the iommu has been
> +     * enabled during the process, because the entries need to be translated
> +     * and added to the remapping table in that case.
> +     */
> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);

How is this different in the resume_x2apic() case? The IOMMU gets
enabled in the course of that as well. I.e. I'd expect you want
to pass "false" there, not "true".

Also how would "x2apic_enabled" reflect the transition? It may
have been "true" already before entering the function here.

Jan
Roger Pau Monné Oct. 10, 2019, 12:13 p.m. UTC | #2
On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> On 10.10.2019 13:33, Roger Pau Monne wrote:
> > When interrupt remapping is enabled as part of enabling x2APIC the
> 
> Perhaps "unmasked" instead of "the"?
> 
> > IO-APIC entries also need to be translated to the new format and added
> > to the interrupt remapping table.
> > 
> > This prevents IOMMU interrupt remapping faults when booting on
> > hardware that has unmasked IO-APIC pins.
> 
> But in the end it only papers over whatever the spurious interrupts
> result form, doesn't it? Which isn't to say this isn't an
> improvement. Calling out the ExtInt case here may be worthwhile as
> well, as would be pointing out that this case still won't work on
> AMD IOMMUs.

But the fix for the ExtINT AMD issue should be done in
amd_iommu_ioapic_update_ire then, so that it can properly handle
ExtINT delivery mode, not to this part of the code. I will look
into it, but I think it's kind of tangential to the issue here.

> > --- a/xen/arch/x86/apic.c
> > +++ b/xen/arch/x86/apic.c
> > @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >      iommu_enable_x2apic();
> >      __enable_x2apic();
> >  
> > -    restore_IO_APIC_setup(ioapic_entries);
> > +    restore_IO_APIC_setup(ioapic_entries, true);
> >      unmask_8259A();
> >  
> >  out:
> > @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >          printk("Switched to APIC driver %s\n", genapic.name);
> >  
> >  restore_out:
> > -    restore_IO_APIC_setup(ioapic_entries);
> > +    /*
> > +     * NB: do not use raw mode when restoring entries if the iommu has been
> > +     * enabled during the process, because the entries need to be translated
> > +     * and added to the remapping table in that case.
> > +     */
> > +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> 
> How is this different in the resume_x2apic() case? The IOMMU gets
> enabled in the course of that as well. I.e. I'd expect you want
> to pass "false" there, not "true".

In the resume_x2apic case interrupt remapping should already be
enabled or not, but that function is not going to enable interrupt
remapping if it wasn't enabled before, hence the IO-APIC entries
should already be using the interrupt remapping table and no
translation is needed.

> Also how would "x2apic_enabled" reflect the transition? It may
> have been "true" already before entering the function here.

If x2apic_enabled == true at the point where restore_IO_APIC_setup
is called interrupt remapping must be enabled, because AFAICT at this
point it's not possible to have x2apic_enabled == true and interrupt
remapping disabled.

The issue I can see here is what happens if interrupt remapping was
already enabled by the hardware, and entries in the IO-APIC are
already using the remapping table. I would have to look into how to
detect that case properly, but I think the proposed change is an
improvement overall.

Thanks, Roger.
Jan Beulich Oct. 10, 2019, 12:55 p.m. UTC | #3
On 10.10.2019 14:13, Roger Pau Monné  wrote:
> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>
>> Perhaps "unmasked" instead of "the"?
>>
>>> IO-APIC entries also need to be translated to the new format and added
>>> to the interrupt remapping table.
>>>
>>> This prevents IOMMU interrupt remapping faults when booting on
>>> hardware that has unmasked IO-APIC pins.
>>
>> But in the end it only papers over whatever the spurious interrupts
>> result form, doesn't it? Which isn't to say this isn't an
>> improvement. Calling out the ExtInt case here may be worthwhile as
>> well, as would be pointing out that this case still won't work on
>> AMD IOMMUs.
> 
> But the fix for the ExtINT AMD issue should be done in
> amd_iommu_ioapic_update_ire then, so that it can properly handle
> ExtINT delivery mode, not to this part of the code. I will look
> into it, but I think it's kind of tangential to the issue here.

I'm not talking of you working on fixing this right away. I'm merely
asking that you mention here (a) the ExtInt special case and (b)
that this special case will (continue to) not work in the AMD case.

>>> --- a/xen/arch/x86/apic.c
>>> +++ b/xen/arch/x86/apic.c
>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>>>      iommu_enable_x2apic();
>>>      __enable_x2apic();
>>>  
>>> -    restore_IO_APIC_setup(ioapic_entries);
>>> +    restore_IO_APIC_setup(ioapic_entries, true);
>>>      unmask_8259A();
>>>  
>>>  out:
>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>>>          printk("Switched to APIC driver %s\n", genapic.name);
>>>  
>>>  restore_out:
>>> -    restore_IO_APIC_setup(ioapic_entries);
>>> +    /*
>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
>>> +     * enabled during the process, because the entries need to be translated
>>> +     * and added to the remapping table in that case.
>>> +     */
>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
>>
>> How is this different in the resume_x2apic() case? The IOMMU gets
>> enabled in the course of that as well. I.e. I'd expect you want
>> to pass "false" there, not "true".
> 
> In the resume_x2apic case interrupt remapping should already be
> enabled or not, but that function is not going to enable interrupt
> remapping if it wasn't enabled before, hence the IO-APIC entries
> should already be using the interrupt remapping table and no
> translation is needed.

Who / what would have enabled the IOMMU in the resume case?

>> Also how would "x2apic_enabled" reflect the transition? It may
>> have been "true" already before entering the function here.
> 
> If x2apic_enabled == true at the point where restore_IO_APIC_setup
> is called interrupt remapping must be enabled, because AFAICT at this
> point it's not possible to have x2apic_enabled == true and interrupt
> remapping disabled.
> 
> The issue I can see here is what happens if interrupt remapping was
> already enabled by the hardware, and entries in the IO-APIC are
> already using the remapping table. I would have to look into how to
> detect that case properly, but I think the proposed change is an
> improvement overall.

Firmware handing off with the IOMMU (and hence interrupt remapping)
already enabled is a case which - afaict - we can't currently cope
with. Firmware handing off in x2APIC enabled state is typically
with the IOMMU (and hence interrupt remapping) still disabled. This
is not a forbidden mode, it's just that in such a configuration
interrupts can't be delivered to certain CPUs.

In any event you need to properly distinguish x2APIC enabled state
(or the transition thereof) from IOMMU / interrupt remapping
enabled state (or the transition thereof). I.e. you want to avoid
raw mode restore if interrupt remapping state transitioned from
off to on in the process.

Jan
Roger Pau Monné Oct. 10, 2019, 1:12 p.m. UTC | #4
On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>
> >> Perhaps "unmasked" instead of "the"?
> >>
> >>> IO-APIC entries also need to be translated to the new format and added
> >>> to the interrupt remapping table.
> >>>
> >>> This prevents IOMMU interrupt remapping faults when booting on
> >>> hardware that has unmasked IO-APIC pins.
> >>
> >> But in the end it only papers over whatever the spurious interrupts
> >> result form, doesn't it? Which isn't to say this isn't an
> >> improvement. Calling out the ExtInt case here may be worthwhile as
> >> well, as would be pointing out that this case still won't work on
> >> AMD IOMMUs.
> > 
> > But the fix for the ExtINT AMD issue should be done in
> > amd_iommu_ioapic_update_ire then, so that it can properly handle
> > ExtINT delivery mode, not to this part of the code. I will look
> > into it, but I think it's kind of tangential to the issue here.
> 
> I'm not talking of you working on fixing this right away. I'm merely
> asking that you mention here (a) the ExtInt special case and (b)
> that this special case will (continue to) not work in the AMD case.
> 
> >>> --- a/xen/arch/x86/apic.c
> >>> +++ b/xen/arch/x86/apic.c
> >>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>      iommu_enable_x2apic();
> >>>      __enable_x2apic();
> >>>  
> >>> -    restore_IO_APIC_setup(ioapic_entries);
> >>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>      unmask_8259A();
> >>>  
> >>>  out:
> >>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>  
> >>>  restore_out:
> >>> -    restore_IO_APIC_setup(ioapic_entries);
> >>> +    /*
> >>> +     * NB: do not use raw mode when restoring entries if the iommu has been
> >>> +     * enabled during the process, because the entries need to be translated
> >>> +     * and added to the remapping table in that case.
> >>> +     */
> >>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>
> >> How is this different in the resume_x2apic() case? The IOMMU gets
> >> enabled in the course of that as well. I.e. I'd expect you want
> >> to pass "false" there, not "true".
> > 
> > In the resume_x2apic case interrupt remapping should already be
> > enabled or not, but that function is not going to enable interrupt
> > remapping if it wasn't enabled before, hence the IO-APIC entries
> > should already be using the interrupt remapping table and no
> > translation is needed.
> 
> Who / what would have enabled the IOMMU in the resume case?

I don't think the question is who enables interrupt remapping in the
resume case (which is resume_x2apic when calling iommu_enable_x2apic
AFAICT), the point here is that on resume the entries in the IO-APIC
will already match the state of interrupt remapping, so they shouldn't
be translated. If interrupt remapping was off before suspend it will
still be off after resume, and there won't be any translation needed.
The same is true if interrupt remapping is on before suspend.

> >> Also how would "x2apic_enabled" reflect the transition? It may
> >> have been "true" already before entering the function here.
> > 
> > If x2apic_enabled == true at the point where restore_IO_APIC_setup
> > is called interrupt remapping must be enabled, because AFAICT at this
> > point it's not possible to have x2apic_enabled == true and interrupt
> > remapping disabled.
> > 
> > The issue I can see here is what happens if interrupt remapping was
> > already enabled by the hardware, and entries in the IO-APIC are
> > already using the remapping table. I would have to look into how to
> > detect that case properly, but I think the proposed change is an
> > improvement overall.
> 
> Firmware handing off with the IOMMU (and hence interrupt remapping)
> already enabled is a case which - afaict - we can't currently cope
> with. Firmware handing off in x2APIC enabled state is typically
> with the IOMMU (and hence interrupt remapping) still disabled. This
> is not a forbidden mode, it's just that in such a configuration
> interrupts can't be delivered to certain CPUs.
> 
> In any event you need to properly distinguish x2APIC enabled state
> (or the transition thereof) from IOMMU / interrupt remapping
> enabled state (or the transition thereof). I.e. you want to avoid
> raw mode restore if interrupt remapping state transitioned from
> off to on in the process.

Right, and that's why the call to restore_IO_APIC_setup in
x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
interrupt remapping has been enabled by the call to
iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
in the IO-APIC need to be translated to use the interrupt remapping
table. There's no path that can lead to restore_IO_APIC_setup with
interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
enabled and interrupt remapping disabled).

Hence if interrupt remapping is off before calling x2apic_bsp_setup
(which is what Xen expects to function properly) and x2apic_enabled ==
true when calling restore_IO_APIC_setup it means interrupt remapping
got enabled, and the IO-APIC entries need translation.

Thanks, Roger.
Roger Pau Monné Oct. 10, 2019, 1:29 p.m. UTC | #5
On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>
> >> Perhaps "unmasked" instead of "the"?
> >>
> >>> IO-APIC entries also need to be translated to the new format and added
> >>> to the interrupt remapping table.
> >>>
> >>> This prevents IOMMU interrupt remapping faults when booting on
> >>> hardware that has unmasked IO-APIC pins.
> >>
> >> But in the end it only papers over whatever the spurious interrupts
> >> result form, doesn't it? Which isn't to say this isn't an
> >> improvement. Calling out the ExtInt case here may be worthwhile as
> >> well, as would be pointing out that this case still won't work on
> >> AMD IOMMUs.
> > 
> > But the fix for the ExtINT AMD issue should be done in
> > amd_iommu_ioapic_update_ire then, so that it can properly handle
> > ExtINT delivery mode, not to this part of the code. I will look
> > into it, but I think it's kind of tangential to the issue here.
> 
> I'm not talking of you working on fixing this right away. I'm merely
> asking that you mention here (a) the ExtInt special case and (b)
> that this special case will (continue to) not work in the AMD case.

Please bear with me, I've taken a look at the ExtINT handling for AMD
and I'm not sure I can spot what's missing. Xen seems to handle both
the EIntPass and IV fields of the DTE (see iommu_dte_add_device_entry
and amd_iommu_set_intremap_table), and that should be enough in order
to passthrough such interrupts.

Is there some other handling that I'm missing? (maybe in the handling
of the interrupt itself?)

Thanks, Roger.
Jan Beulich Oct. 10, 2019, 1:46 p.m. UTC | #6
On 10.10.2019 15:12, Roger Pau Monné  wrote:
> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>>>
>>>> Perhaps "unmasked" instead of "the"?
>>>>
>>>>> IO-APIC entries also need to be translated to the new format and added
>>>>> to the interrupt remapping table.
>>>>>
>>>>> This prevents IOMMU interrupt remapping faults when booting on
>>>>> hardware that has unmasked IO-APIC pins.
>>>>
>>>> But in the end it only papers over whatever the spurious interrupts
>>>> result form, doesn't it? Which isn't to say this isn't an
>>>> improvement. Calling out the ExtInt case here may be worthwhile as
>>>> well, as would be pointing out that this case still won't work on
>>>> AMD IOMMUs.
>>>
>>> But the fix for the ExtINT AMD issue should be done in
>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
>>> ExtINT delivery mode, not to this part of the code. I will look
>>> into it, but I think it's kind of tangential to the issue here.
>>
>> I'm not talking of you working on fixing this right away. I'm merely
>> asking that you mention here (a) the ExtInt special case and (b)
>> that this special case will (continue to) not work in the AMD case.
>>
>>>>> --- a/xen/arch/x86/apic.c
>>>>> +++ b/xen/arch/x86/apic.c
>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>>>>>      iommu_enable_x2apic();
>>>>>      __enable_x2apic();
>>>>>  
>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
>>>>>      unmask_8259A();
>>>>>  
>>>>>  out:
>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
>>>>>  
>>>>>  restore_out:
>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>> +    /*
>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
>>>>> +     * enabled during the process, because the entries need to be translated
>>>>> +     * and added to the remapping table in that case.
>>>>> +     */
>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
>>>>
>>>> How is this different in the resume_x2apic() case? The IOMMU gets
>>>> enabled in the course of that as well. I.e. I'd expect you want
>>>> to pass "false" there, not "true".
>>>
>>> In the resume_x2apic case interrupt remapping should already be
>>> enabled or not, but that function is not going to enable interrupt
>>> remapping if it wasn't enabled before, hence the IO-APIC entries
>>> should already be using the interrupt remapping table and no
>>> translation is needed.
>>
>> Who / what would have enabled the IOMMU in the resume case?
> 
> I don't think the question is who enables interrupt remapping in the
> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> AFAICT), the point here is that on resume the entries in the IO-APIC
> will already match the state of interrupt remapping, so they shouldn't
> be translated. If interrupt remapping was off before suspend it will
> still be off after resume, and there won't be any translation needed.
> The same is true if interrupt remapping is on before suspend.

I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
not prior to suspend.

>>>> Also how would "x2apic_enabled" reflect the transition? It may
>>>> have been "true" already before entering the function here.
>>>
>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
>>> is called interrupt remapping must be enabled, because AFAICT at this
>>> point it's not possible to have x2apic_enabled == true and interrupt
>>> remapping disabled.
>>>
>>> The issue I can see here is what happens if interrupt remapping was
>>> already enabled by the hardware, and entries in the IO-APIC are
>>> already using the remapping table. I would have to look into how to
>>> detect that case properly, but I think the proposed change is an
>>> improvement overall.
>>
>> Firmware handing off with the IOMMU (and hence interrupt remapping)
>> already enabled is a case which - afaict - we can't currently cope
>> with. Firmware handing off in x2APIC enabled state is typically
>> with the IOMMU (and hence interrupt remapping) still disabled. This
>> is not a forbidden mode, it's just that in such a configuration
>> interrupts can't be delivered to certain CPUs.
>>
>> In any event you need to properly distinguish x2APIC enabled state
>> (or the transition thereof) from IOMMU / interrupt remapping
>> enabled state (or the transition thereof). I.e. you want to avoid
>> raw mode restore if interrupt remapping state transitioned from
>> off to on in the process.
> 
> Right, and that's why the call to restore_IO_APIC_setup in
> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
> interrupt remapping has been enabled by the call to
> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
> in the IO-APIC need to be translated to use the interrupt remapping
> table. There's no path that can lead to restore_IO_APIC_setup with
> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
> enabled and interrupt remapping disabled).
> 
> Hence if interrupt remapping is off before calling x2apic_bsp_setup
> (which is what Xen expects to function properly) and x2apic_enabled ==
> true when calling restore_IO_APIC_setup it means interrupt remapping
> got enabled, and the IO-APIC entries need translation.

But the code in question sits on a shared success/error path, and
in the error case it matters whether x2apic_enabled was true already
on entry. I realize that io_apic_write() would suitably avoid going
the remapping path, but I think it would be more clear if the
distinction was already made properly at the call site.

Jan
Jan Beulich Oct. 10, 2019, 1:53 p.m. UTC | #7
On 10.10.2019 15:29, Roger Pau Monné  wrote:
> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>>>
>>>> Perhaps "unmasked" instead of "the"?
>>>>
>>>>> IO-APIC entries also need to be translated to the new format and added
>>>>> to the interrupt remapping table.
>>>>>
>>>>> This prevents IOMMU interrupt remapping faults when booting on
>>>>> hardware that has unmasked IO-APIC pins.
>>>>
>>>> But in the end it only papers over whatever the spurious interrupts
>>>> result form, doesn't it? Which isn't to say this isn't an
>>>> improvement. Calling out the ExtInt case here may be worthwhile as
>>>> well, as would be pointing out that this case still won't work on
>>>> AMD IOMMUs.
>>>
>>> But the fix for the ExtINT AMD issue should be done in
>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
>>> ExtINT delivery mode, not to this part of the code. I will look
>>> into it, but I think it's kind of tangential to the issue here.
>>
>> I'm not talking of you working on fixing this right away. I'm merely
>> asking that you mention here (a) the ExtInt special case and (b)
>> that this special case will (continue to) not work in the AMD case.
> 
> Please bear with me, I've taken a look at the ExtINT handling for AMD
> and I'm not sure I can spot what's missing. Xen seems to handle both
> the EIntPass and IV fields of the DTE (see iommu_dte_add_device_entry
> and amd_iommu_set_intremap_table), and that should be enough in order
> to passthrough such interrupts.

EIntPass gets set based on ACPI table information, not what we found
in a particular RTE. That's hopefully fine, but you know how reliable
firmware is especially in corner cases.

> Is there some other handling that I'm missing? (maybe in the handling
> of the interrupt itself?)

Look at the update_intremap_entry_from_ioapic() ->
update_intremap_entry() path: This converts the 3-bit delivery mode
field into a 1-bit int_type one (the field is really 3 bits wide,
but values 010..111 (binary) are documented as reserved; I can't
exclude though that the documentation is wrong here).

Jan
Roger Pau Monné Oct. 10, 2019, 3:19 p.m. UTC | #8
On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>
> >>>> Perhaps "unmasked" instead of "the"?
> >>>>
> >>>>> IO-APIC entries also need to be translated to the new format and added
> >>>>> to the interrupt remapping table.
> >>>>>
> >>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>> hardware that has unmasked IO-APIC pins.
> >>>>
> >>>> But in the end it only papers over whatever the spurious interrupts
> >>>> result form, doesn't it? Which isn't to say this isn't an
> >>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>> well, as would be pointing out that this case still won't work on
> >>>> AMD IOMMUs.
> >>>
> >>> But the fix for the ExtINT AMD issue should be done in
> >>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>> ExtINT delivery mode, not to this part of the code. I will look
> >>> into it, but I think it's kind of tangential to the issue here.
> >>
> >> I'm not talking of you working on fixing this right away. I'm merely
> >> asking that you mention here (a) the ExtInt special case and (b)
> >> that this special case will (continue to) not work in the AMD case.
> >>
> >>>>> --- a/xen/arch/x86/apic.c
> >>>>> +++ b/xen/arch/x86/apic.c
> >>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>      iommu_enable_x2apic();
> >>>>>      __enable_x2apic();
> >>>>>  
> >>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>      unmask_8259A();
> >>>>>  
> >>>>>  out:
> >>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>  
> >>>>>  restore_out:
> >>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>> +    /*
> >>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
> >>>>> +     * enabled during the process, because the entries need to be translated
> >>>>> +     * and added to the remapping table in that case.
> >>>>> +     */
> >>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>
> >>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>> to pass "false" there, not "true".
> >>>
> >>> In the resume_x2apic case interrupt remapping should already be
> >>> enabled or not, but that function is not going to enable interrupt
> >>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>> should already be using the interrupt remapping table and no
> >>> translation is needed.
> >>
> >> Who / what would have enabled the IOMMU in the resume case?
> > 
> > I don't think the question is who enables interrupt remapping in the
> > resume case (which is resume_x2apic when calling iommu_enable_x2apic
> > AFAICT), the point here is that on resume the entries in the IO-APIC
> > will already match the state of interrupt remapping, so they shouldn't
> > be translated. If interrupt remapping was off before suspend it will
> > still be off after resume, and there won't be any translation needed.
> > The same is true if interrupt remapping is on before suspend.
> 
> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> not prior to suspend.

Oh, so maybe that's a misunderstanding on my side. I don't seem to be
able to find a statement about the contents of the IO-APIC registers
(and more specifically the entries) when getting back from
suspension. Are all entries cleared and masked?

Are the values previous to suspension stored?

> >>>> Also how would "x2apic_enabled" reflect the transition? It may
> >>>> have been "true" already before entering the function here.
> >>>
> >>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
> >>> is called interrupt remapping must be enabled, because AFAICT at this
> >>> point it's not possible to have x2apic_enabled == true and interrupt
> >>> remapping disabled.
> >>>
> >>> The issue I can see here is what happens if interrupt remapping was
> >>> already enabled by the hardware, and entries in the IO-APIC are
> >>> already using the remapping table. I would have to look into how to
> >>> detect that case properly, but I think the proposed change is an
> >>> improvement overall.
> >>
> >> Firmware handing off with the IOMMU (and hence interrupt remapping)
> >> already enabled is a case which - afaict - we can't currently cope
> >> with. Firmware handing off in x2APIC enabled state is typically
> >> with the IOMMU (and hence interrupt remapping) still disabled. This
> >> is not a forbidden mode, it's just that in such a configuration
> >> interrupts can't be delivered to certain CPUs.
> >>
> >> In any event you need to properly distinguish x2APIC enabled state
> >> (or the transition thereof) from IOMMU / interrupt remapping
> >> enabled state (or the transition thereof). I.e. you want to avoid
> >> raw mode restore if interrupt remapping state transitioned from
> >> off to on in the process.
> > 
> > Right, and that's why the call to restore_IO_APIC_setup in
> > x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
> > interrupt remapping has been enabled by the call to
> > iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
> > in the IO-APIC need to be translated to use the interrupt remapping
> > table. There's no path that can lead to restore_IO_APIC_setup with
> > interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
> > enabled and interrupt remapping disabled).
> > 
> > Hence if interrupt remapping is off before calling x2apic_bsp_setup
> > (which is what Xen expects to function properly) and x2apic_enabled ==
> > true when calling restore_IO_APIC_setup it means interrupt remapping
> > got enabled, and the IO-APIC entries need translation.
> 
> But the code in question sits on a shared success/error path, and
> in the error case it matters whether x2apic_enabled was true already
> on entry.

But it's simply not possible to reach the call to
restore_IO_APIC_setup with x2apic_enabled == true and interrupt
remapping disabled, regardless of the initial value of
x2apic_enabled.

All the paths that could lead to this scenario are short-circuited
above with a panic.

> I realize that io_apic_write() would suitably avoid going
> the remapping path, but I think it would be more clear if the
> distinction was already made properly at the call site.

I'm afraid I'm slightly loss, do you mean to replace the
ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?

That would be the same as always passing raw == false AFAICT.

Thanks, Roger.
Jan Beulich Oct. 11, 2019, 7:52 a.m. UTC | #9
On 10.10.2019 17:19, Roger Pau Monné  wrote:
> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
>> On 10.10.2019 15:12, Roger Pau Monné  wrote:
>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
>>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>>>>>
>>>>>> Perhaps "unmasked" instead of "the"?
>>>>>>
>>>>>>> IO-APIC entries also need to be translated to the new format and added
>>>>>>> to the interrupt remapping table.
>>>>>>>
>>>>>>> This prevents IOMMU interrupt remapping faults when booting on
>>>>>>> hardware that has unmasked IO-APIC pins.
>>>>>>
>>>>>> But in the end it only papers over whatever the spurious interrupts
>>>>>> result form, doesn't it? Which isn't to say this isn't an
>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
>>>>>> well, as would be pointing out that this case still won't work on
>>>>>> AMD IOMMUs.
>>>>>
>>>>> But the fix for the ExtINT AMD issue should be done in
>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
>>>>> ExtINT delivery mode, not to this part of the code. I will look
>>>>> into it, but I think it's kind of tangential to the issue here.
>>>>
>>>> I'm not talking of you working on fixing this right away. I'm merely
>>>> asking that you mention here (a) the ExtInt special case and (b)
>>>> that this special case will (continue to) not work in the AMD case.
>>>>
>>>>>>> --- a/xen/arch/x86/apic.c
>>>>>>> +++ b/xen/arch/x86/apic.c
>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>>>>>>>      iommu_enable_x2apic();
>>>>>>>      __enable_x2apic();
>>>>>>>  
>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
>>>>>>>      unmask_8259A();
>>>>>>>  
>>>>>>>  out:
>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
>>>>>>>  
>>>>>>>  restore_out:
>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>> +    /*
>>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
>>>>>>> +     * enabled during the process, because the entries need to be translated
>>>>>>> +     * and added to the remapping table in that case.
>>>>>>> +     */
>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
>>>>>>
>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
>>>>>> enabled in the course of that as well. I.e. I'd expect you want
>>>>>> to pass "false" there, not "true".
>>>>>
>>>>> In the resume_x2apic case interrupt remapping should already be
>>>>> enabled or not, but that function is not going to enable interrupt
>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
>>>>> should already be using the interrupt remapping table and no
>>>>> translation is needed.
>>>>
>>>> Who / what would have enabled the IOMMU in the resume case?
>>>
>>> I don't think the question is who enables interrupt remapping in the
>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
>>> AFAICT), the point here is that on resume the entries in the IO-APIC
>>> will already match the state of interrupt remapping, so they shouldn't
>>> be translated. If interrupt remapping was off before suspend it will
>>> still be off after resume, and there won't be any translation needed.
>>> The same is true if interrupt remapping is on before suspend.
>>
>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
>> not prior to suspend.
> 
> Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> able to find a statement about the contents of the IO-APIC registers
> (and more specifically the entries) when getting back from
> suspension. Are all entries cleared and masked?
> 
> Are the values previous to suspension stored?

See ioapic_suspend() / ioapic_resume(): Looks like there's some
redundancy here - I don't think it makes sense for the LAPIC
code to fiddle with all the RTEs if subsequently (I assume;
didn't check) they'll all be overwritten anyway. I would seem
more logical to me if they'd just all get masked for IOMMU
enabling, deferring to ioapic_resume() for everything else.

>>>>>> Also how would "x2apic_enabled" reflect the transition? It may
>>>>>> have been "true" already before entering the function here.
>>>>>
>>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
>>>>> is called interrupt remapping must be enabled, because AFAICT at this
>>>>> point it's not possible to have x2apic_enabled == true and interrupt
>>>>> remapping disabled.
>>>>>
>>>>> The issue I can see here is what happens if interrupt remapping was
>>>>> already enabled by the hardware, and entries in the IO-APIC are
>>>>> already using the remapping table. I would have to look into how to
>>>>> detect that case properly, but I think the proposed change is an
>>>>> improvement overall.
>>>>
>>>> Firmware handing off with the IOMMU (and hence interrupt remapping)
>>>> already enabled is a case which - afaict - we can't currently cope
>>>> with. Firmware handing off in x2APIC enabled state is typically
>>>> with the IOMMU (and hence interrupt remapping) still disabled. This
>>>> is not a forbidden mode, it's just that in such a configuration
>>>> interrupts can't be delivered to certain CPUs.
>>>>
>>>> In any event you need to properly distinguish x2APIC enabled state
>>>> (or the transition thereof) from IOMMU / interrupt remapping
>>>> enabled state (or the transition thereof). I.e. you want to avoid
>>>> raw mode restore if interrupt remapping state transitioned from
>>>> off to on in the process.
>>>
>>> Right, and that's why the call to restore_IO_APIC_setup in
>>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
>>> interrupt remapping has been enabled by the call to
>>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
>>> in the IO-APIC need to be translated to use the interrupt remapping
>>> table. There's no path that can lead to restore_IO_APIC_setup with
>>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
>>> enabled and interrupt remapping disabled).
>>>
>>> Hence if interrupt remapping is off before calling x2apic_bsp_setup
>>> (which is what Xen expects to function properly) and x2apic_enabled ==
>>> true when calling restore_IO_APIC_setup it means interrupt remapping
>>> got enabled, and the IO-APIC entries need translation.
>>
>> But the code in question sits on a shared success/error path, and
>> in the error case it matters whether x2apic_enabled was true already
>> on entry.
> 
> But it's simply not possible to reach the call to
> restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> remapping disabled, regardless of the initial value of
> x2apic_enabled.
> 
> All the paths that could lead to this scenario are short-circuited
> above with a panic.

Hmm, true. Nevertheless it would feel better if the conditionals
were using what actually matters, rather than something derived.

>> I realize that io_apic_write() would suitably avoid going
>> the remapping path, but I think it would be more clear if the
>> distinction was already made properly at the call site.
> 
> I'm afraid I'm slightly loss, do you mean to replace the
> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?

No, because of ...

> That would be the same as always passing raw == false AFAICT.

... this. I'm asking to pass in the argument for "raw" based
on what you want, without relying on io_apic_write()'s
behavior. That's more for code clarity than actual correctness,
since - as said - io_apic_write() would invoke __io_apic_write()
anyway. By setting "raw" correctly you can simply avoid going
through io_apic_write() altogether.

Jan
Roger Pau Monné Oct. 11, 2019, 9:28 a.m. UTC | #10
On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
> On 10.10.2019 17:19, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>>>
> >>>>>> Perhaps "unmasked" instead of "the"?
> >>>>>>
> >>>>>>> IO-APIC entries also need to be translated to the new format and added
> >>>>>>> to the interrupt remapping table.
> >>>>>>>
> >>>>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>>>> hardware that has unmasked IO-APIC pins.
> >>>>>>
> >>>>>> But in the end it only papers over whatever the spurious interrupts
> >>>>>> result form, doesn't it? Which isn't to say this isn't an
> >>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>>>> well, as would be pointing out that this case still won't work on
> >>>>>> AMD IOMMUs.
> >>>>>
> >>>>> But the fix for the ExtINT AMD issue should be done in
> >>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>>>> ExtINT delivery mode, not to this part of the code. I will look
> >>>>> into it, but I think it's kind of tangential to the issue here.
> >>>>
> >>>> I'm not talking of you working on fixing this right away. I'm merely
> >>>> asking that you mention here (a) the ExtInt special case and (b)
> >>>> that this special case will (continue to) not work in the AMD case.
> >>>>
> >>>>>>> --- a/xen/arch/x86/apic.c
> >>>>>>> +++ b/xen/arch/x86/apic.c
> >>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>>>      iommu_enable_x2apic();
> >>>>>>>      __enable_x2apic();
> >>>>>>>  
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>>>      unmask_8259A();
> >>>>>>>  
> >>>>>>>  out:
> >>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>>>  
> >>>>>>>  restore_out:
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    /*
> >>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
> >>>>>>> +     * enabled during the process, because the entries need to be translated
> >>>>>>> +     * and added to the remapping table in that case.
> >>>>>>> +     */
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>>>
> >>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>>>> to pass "false" there, not "true".
> >>>>>
> >>>>> In the resume_x2apic case interrupt remapping should already be
> >>>>> enabled or not, but that function is not going to enable interrupt
> >>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>>>> should already be using the interrupt remapping table and no
> >>>>> translation is needed.
> >>>>
> >>>> Who / what would have enabled the IOMMU in the resume case?
> >>>
> >>> I don't think the question is who enables interrupt remapping in the
> >>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> >>> AFAICT), the point here is that on resume the entries in the IO-APIC
> >>> will already match the state of interrupt remapping, so they shouldn't
> >>> be translated. If interrupt remapping was off before suspend it will
> >>> still be off after resume, and there won't be any translation needed.
> >>> The same is true if interrupt remapping is on before suspend.
> >>
> >> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> >> not prior to suspend.
> > 
> > Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> > able to find a statement about the contents of the IO-APIC registers
> > (and more specifically the entries) when getting back from
> > suspension. Are all entries cleared and masked?
> > 
> > Are the values previous to suspension stored?
> 
> See ioapic_suspend() / ioapic_resume(): Looks like there's some
> redundancy here - I don't think it makes sense for the LAPIC
> code to fiddle with all the RTEs if subsequently (I assume;
> didn't check) they'll all be overwritten anyway. I would seem
> more logical to me if they'd just all get masked for IOMMU
> enabling, deferring to ioapic_resume() for everything else.

Yes, it seems like resume_x2apic shouldn't need to play with the
entries at all TBH, just enabling interrupt remapping should be enough
given that the entries are already save and restored by
ioapic_{suspend/resume}.

Would you agree to leave that as-is in this patch, ie: always pass
true to restore_IO_APIC_setup in resume_x2apic?

> >>>>>> Also how would "x2apic_enabled" reflect the transition? It may
> >>>>>> have been "true" already before entering the function here.
> >>>>>
> >>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
> >>>>> is called interrupt remapping must be enabled, because AFAICT at this
> >>>>> point it's not possible to have x2apic_enabled == true and interrupt
> >>>>> remapping disabled.
> >>>>>
> >>>>> The issue I can see here is what happens if interrupt remapping was
> >>>>> already enabled by the hardware, and entries in the IO-APIC are
> >>>>> already using the remapping table. I would have to look into how to
> >>>>> detect that case properly, but I think the proposed change is an
> >>>>> improvement overall.
> >>>>
> >>>> Firmware handing off with the IOMMU (and hence interrupt remapping)
> >>>> already enabled is a case which - afaict - we can't currently cope
> >>>> with. Firmware handing off in x2APIC enabled state is typically
> >>>> with the IOMMU (and hence interrupt remapping) still disabled. This
> >>>> is not a forbidden mode, it's just that in such a configuration
> >>>> interrupts can't be delivered to certain CPUs.
> >>>>
> >>>> In any event you need to properly distinguish x2APIC enabled state
> >>>> (or the transition thereof) from IOMMU / interrupt remapping
> >>>> enabled state (or the transition thereof). I.e. you want to avoid
> >>>> raw mode restore if interrupt remapping state transitioned from
> >>>> off to on in the process.
> >>>
> >>> Right, and that's why the call to restore_IO_APIC_setup in
> >>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
> >>> interrupt remapping has been enabled by the call to
> >>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
> >>> in the IO-APIC need to be translated to use the interrupt remapping
> >>> table. There's no path that can lead to restore_IO_APIC_setup with
> >>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
> >>> enabled and interrupt remapping disabled).
> >>>
> >>> Hence if interrupt remapping is off before calling x2apic_bsp_setup
> >>> (which is what Xen expects to function properly) and x2apic_enabled ==
> >>> true when calling restore_IO_APIC_setup it means interrupt remapping
> >>> got enabled, and the IO-APIC entries need translation.
> >>
> >> But the code in question sits on a shared success/error path, and
> >> in the error case it matters whether x2apic_enabled was true already
> >> on entry.
> > 
> > But it's simply not possible to reach the call to
> > restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> > remapping disabled, regardless of the initial value of
> > x2apic_enabled.
> > 
> > All the paths that could lead to this scenario are short-circuited
> > above with a panic.
> 
> Hmm, true. Nevertheless it would feel better if the conditionals
> were using what actually matters, rather than something derived.

Would you be OK to using something like v1:

https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html

See the usage of iommu_enabled in x2apic_bsp_setup. I think using
x2apic_enabled is fine given the logic in the function, and the fact
that x2APIC mandates interrupt remapping, but I could live with that
extra local variable.

> >> I realize that io_apic_write() would suitably avoid going
> >> the remapping path, but I think it would be more clear if the
> >> distinction was already made properly at the call site.
> > 
> > I'm afraid I'm slightly loss, do you mean to replace the
> > ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
> 
> No, because of ...
> 
> > That would be the same as always passing raw == false AFAICT.
> 
> ... this. I'm asking to pass in the argument for "raw" based
> on what you want, without relying on io_apic_write()'s
> behavior. That's more for code clarity than actual correctness,
> since - as said - io_apic_write() would invoke __io_apic_write()
> anyway.

Sorry, but I'm afraid I still don't fully understand what you mean
with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
uses __io_apic_write or io_apic_write. I could get rid of the usage of
ioapic_write_entry and just call __io_apic_write or io_apic_write (or
even directly call iommu_update_ire_from_apic), but I'm not sure
what's the benefit of any of this, it's just open-coding logic that's
done in the helpers.

Would you rather mean to rename the parameter of restore_IO_APIC_setup
from 'raw' to 'translate' or some such in order to clearly denote
there's a transformation done before writing the entry?

Thanks, Roger.
Jan Beulich Oct. 11, 2019, 10:01 a.m. UTC | #11
On 11.10.2019 11:28, Roger Pau Monné  wrote:
> On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
>> On 10.10.2019 17:19, Roger Pau Monné  wrote:
>>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
>>>> On 10.10.2019 15:12, Roger Pau Monné  wrote:
>>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
>>>>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
>>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>>>>>>>
>>>>>>>> Perhaps "unmasked" instead of "the"?
>>>>>>>>
>>>>>>>>> IO-APIC entries also need to be translated to the new format and added
>>>>>>>>> to the interrupt remapping table.
>>>>>>>>>
>>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on
>>>>>>>>> hardware that has unmasked IO-APIC pins.
>>>>>>>>
>>>>>>>> But in the end it only papers over whatever the spurious interrupts
>>>>>>>> result form, doesn't it? Which isn't to say this isn't an
>>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
>>>>>>>> well, as would be pointing out that this case still won't work on
>>>>>>>> AMD IOMMUs.
>>>>>>>
>>>>>>> But the fix for the ExtINT AMD issue should be done in
>>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
>>>>>>> ExtINT delivery mode, not to this part of the code. I will look
>>>>>>> into it, but I think it's kind of tangential to the issue here.
>>>>>>
>>>>>> I'm not talking of you working on fixing this right away. I'm merely
>>>>>> asking that you mention here (a) the ExtInt special case and (b)
>>>>>> that this special case will (continue to) not work in the AMD case.
>>>>>>
>>>>>>>>> --- a/xen/arch/x86/apic.c
>>>>>>>>> +++ b/xen/arch/x86/apic.c
>>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>>>>>>>>>      iommu_enable_x2apic();
>>>>>>>>>      __enable_x2apic();
>>>>>>>>>  
>>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
>>>>>>>>>      unmask_8259A();
>>>>>>>>>  
>>>>>>>>>  out:
>>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>>>>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
>>>>>>>>>  
>>>>>>>>>  restore_out:
>>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>>>> +    /*
>>>>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
>>>>>>>>> +     * enabled during the process, because the entries need to be translated
>>>>>>>>> +     * and added to the remapping table in that case.
>>>>>>>>> +     */
>>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
>>>>>>>>
>>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
>>>>>>>> enabled in the course of that as well. I.e. I'd expect you want
>>>>>>>> to pass "false" there, not "true".
>>>>>>>
>>>>>>> In the resume_x2apic case interrupt remapping should already be
>>>>>>> enabled or not, but that function is not going to enable interrupt
>>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
>>>>>>> should already be using the interrupt remapping table and no
>>>>>>> translation is needed.
>>>>>>
>>>>>> Who / what would have enabled the IOMMU in the resume case?
>>>>>
>>>>> I don't think the question is who enables interrupt remapping in the
>>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
>>>>> AFAICT), the point here is that on resume the entries in the IO-APIC
>>>>> will already match the state of interrupt remapping, so they shouldn't
>>>>> be translated. If interrupt remapping was off before suspend it will
>>>>> still be off after resume, and there won't be any translation needed.
>>>>> The same is true if interrupt remapping is on before suspend.
>>>>
>>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
>>>> not prior to suspend.
>>>
>>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be
>>> able to find a statement about the contents of the IO-APIC registers
>>> (and more specifically the entries) when getting back from
>>> suspension. Are all entries cleared and masked?
>>>
>>> Are the values previous to suspension stored?
>>
>> See ioapic_suspend() / ioapic_resume(): Looks like there's some
>> redundancy here - I don't think it makes sense for the LAPIC
>> code to fiddle with all the RTEs if subsequently (I assume;
>> didn't check) they'll all be overwritten anyway. I would seem
>> more logical to me if they'd just all get masked for IOMMU
>> enabling, deferring to ioapic_resume() for everything else.
> 
> Yes, it seems like resume_x2apic shouldn't need to play with the
> entries at all TBH, just enabling interrupt remapping should be enough
> given that the entries are already save and restored by
> ioapic_{suspend/resume}.
> 
> Would you agree to leave that as-is in this patch, ie: always pass
> true to restore_IO_APIC_setup in resume_x2apic?

Properly explained in the description, that's an option. Even better
would of course be to do away with the unnecessary save/restore in
a prereq patch, at which point the question disappears as to what to
pass to the function at that point.

>>> But it's simply not possible to reach the call to
>>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt
>>> remapping disabled, regardless of the initial value of
>>> x2apic_enabled.
>>>
>>> All the paths that could lead to this scenario are short-circuited
>>> above with a panic.
>>
>> Hmm, true. Nevertheless it would feel better if the conditionals
>> were using what actually matters, rather than something derived.
> 
> Would you be OK to using something like v1:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html
> 
> See the usage of iommu_enabled in x2apic_bsp_setup. I think using
> x2apic_enabled is fine given the logic in the function, and the fact
> that x2APIC mandates interrupt remapping, but I could live with that
> extra local variable.

Conceptually this looks better to me. I'd prefer to avoid shadowing
the global "iommu_enabled" though, and set the local variable to
"true" only one we actually enable the IOMMU. Also strictly speaking
you care about the enabling of interrupt remapping, not that of the
IOMMU (which implicitly means DMA remapping). I wonder whether the
code couldn't easily be made cope with IOMMU already being enabled,
assuming that this would lead to {iommu,intremap}_enabled to already
be true on entry. I.e. you'd request a "raw" restore unless
intremap_enabled changed from false to true.

>>>> I realize that io_apic_write() would suitably avoid going
>>>> the remapping path, but I think it would be more clear if the
>>>> distinction was already made properly at the call site.
>>>
>>> I'm afraid I'm slightly loss, do you mean to replace the
>>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
>>
>> No, because of ...
>>
>>> That would be the same as always passing raw == false AFAICT.
>>
>> ... this. I'm asking to pass in the argument for "raw" based
>> on what you want, without relying on io_apic_write()'s
>> behavior. That's more for code clarity than actual correctness,
>> since - as said - io_apic_write() would invoke __io_apic_write()
>> anyway.
> 
> Sorry, but I'm afraid I still don't fully understand what you mean
> with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
> uses __io_apic_write or io_apic_write. I could get rid of the usage of
> ioapic_write_entry and just call __io_apic_write or io_apic_write (or
> even directly call iommu_update_ire_from_apic), but I'm not sure
> what's the benefit of any of this, it's just open-coding logic that's
> done in the helpers.
> 
> Would you rather mean to rename the parameter of restore_IO_APIC_setup
> from 'raw' to 'translate' or some such in order to clearly denote
> there's a transformation done before writing the entry?

Whether it's "raw" or "translate" doesn't really matter to me.
What I'm after is for you to not pass raw=true when you really
mean raw=false (or the other way around), relying on the
intremap_enabled check down the call chain in io_apic_write().

Jan
Roger Pau Monné Oct. 11, 2019, 10:36 a.m. UTC | #12
On Fri, Oct 11, 2019 at 12:01:36PM +0200, Jan Beulich wrote:
> On 11.10.2019 11:28, Roger Pau Monné  wrote:
> > On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
> >> On 10.10.2019 17:19, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> >>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >>>>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>>>>>
> >>>>>>>> Perhaps "unmasked" instead of "the"?
> >>>>>>>>
> >>>>>>>>> IO-APIC entries also need to be translated to the new format and added
> >>>>>>>>> to the interrupt remapping table.
> >>>>>>>>>
> >>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>>>>>> hardware that has unmasked IO-APIC pins.
> >>>>>>>>
> >>>>>>>> But in the end it only papers over whatever the spurious interrupts
> >>>>>>>> result form, doesn't it? Which isn't to say this isn't an
> >>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>>>>>> well, as would be pointing out that this case still won't work on
> >>>>>>>> AMD IOMMUs.
> >>>>>>>
> >>>>>>> But the fix for the ExtINT AMD issue should be done in
> >>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>>>>>> ExtINT delivery mode, not to this part of the code. I will look
> >>>>>>> into it, but I think it's kind of tangential to the issue here.
> >>>>>>
> >>>>>> I'm not talking of you working on fixing this right away. I'm merely
> >>>>>> asking that you mention here (a) the ExtInt special case and (b)
> >>>>>> that this special case will (continue to) not work in the AMD case.
> >>>>>>
> >>>>>>>>> --- a/xen/arch/x86/apic.c
> >>>>>>>>> +++ b/xen/arch/x86/apic.c
> >>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>>>>>      iommu_enable_x2apic();
> >>>>>>>>>      __enable_x2apic();
> >>>>>>>>>  
> >>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>>>>>      unmask_8259A();
> >>>>>>>>>  
> >>>>>>>>>  out:
> >>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>>>>>  
> >>>>>>>>>  restore_out:
> >>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>>>> +    /*
> >>>>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
> >>>>>>>>> +     * enabled during the process, because the entries need to be translated
> >>>>>>>>> +     * and added to the remapping table in that case.
> >>>>>>>>> +     */
> >>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>>>>>
> >>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>>>>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>>>>>> to pass "false" there, not "true".
> >>>>>>>
> >>>>>>> In the resume_x2apic case interrupt remapping should already be
> >>>>>>> enabled or not, but that function is not going to enable interrupt
> >>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>>>>>> should already be using the interrupt remapping table and no
> >>>>>>> translation is needed.
> >>>>>>
> >>>>>> Who / what would have enabled the IOMMU in the resume case?
> >>>>>
> >>>>> I don't think the question is who enables interrupt remapping in the
> >>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> >>>>> AFAICT), the point here is that on resume the entries in the IO-APIC
> >>>>> will already match the state of interrupt remapping, so they shouldn't
> >>>>> be translated. If interrupt remapping was off before suspend it will
> >>>>> still be off after resume, and there won't be any translation needed.
> >>>>> The same is true if interrupt remapping is on before suspend.
> >>>>
> >>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> >>>> not prior to suspend.
> >>>
> >>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> >>> able to find a statement about the contents of the IO-APIC registers
> >>> (and more specifically the entries) when getting back from
> >>> suspension. Are all entries cleared and masked?
> >>>
> >>> Are the values previous to suspension stored?
> >>
> >> See ioapic_suspend() / ioapic_resume(): Looks like there's some
> >> redundancy here - I don't think it makes sense for the LAPIC
> >> code to fiddle with all the RTEs if subsequently (I assume;
> >> didn't check) they'll all be overwritten anyway. I would seem
> >> more logical to me if they'd just all get masked for IOMMU
> >> enabling, deferring to ioapic_resume() for everything else.
> > 
> > Yes, it seems like resume_x2apic shouldn't need to play with the
> > entries at all TBH, just enabling interrupt remapping should be enough
> > given that the entries are already save and restored by
> > ioapic_{suspend/resume}.
> > 
> > Would you agree to leave that as-is in this patch, ie: always pass
> > true to restore_IO_APIC_setup in resume_x2apic?
> 
> Properly explained in the description, that's an option. Even better
> would of course be to do away with the unnecessary save/restore in
> a prereq patch, at which point the question disappears as to what to
> pass to the function at that point.

OK, I can create a pre-patch, but I don't think I have any hardware
that does suspend/resume since it's all server-grade. Someone would
have to test it and assert it actually works correctly, hence my
reluctance to touch any of the suspend/resume logic.

> >>> But it's simply not possible to reach the call to
> >>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> >>> remapping disabled, regardless of the initial value of
> >>> x2apic_enabled.
> >>>
> >>> All the paths that could lead to this scenario are short-circuited
> >>> above with a panic.
> >>
> >> Hmm, true. Nevertheless it would feel better if the conditionals
> >> were using what actually matters, rather than something derived.
> > 
> > Would you be OK to using something like v1:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html
> > 
> > See the usage of iommu_enabled in x2apic_bsp_setup. I think using
> > x2apic_enabled is fine given the logic in the function, and the fact
> > that x2APIC mandates interrupt remapping, but I could live with that
> > extra local variable.
> 
> Conceptually this looks better to me. I'd prefer to avoid shadowing
> the global "iommu_enabled" though, and set the local variable to
> "true" only one we actually enable the IOMMU. Also strictly speaking
> you care about the enabling of interrupt remapping, not that of the
> IOMMU (which implicitly means DMA remapping). I wonder whether the
> code couldn't easily be made cope with IOMMU already being enabled,
> assuming that this would lead to {iommu,intremap}_enabled to already
> be true on entry. I.e. you'd request a "raw" restore unless
> intremap_enabled changed from false to true.

I'm not sure it's that easy. A raw restore wouldn't fully work, since
Xen would have replaced the interrupt remapping table with an empty
one, so Xen would have to add the entries for the unmasked IO-APIC
pins to the IRT and then likely also update the IO-APIC pin entry to
match the newly created IRTE.

I think adding a comment at the top of the function to note that this
won't work properly if there are unmasked IO-APIC pins and interrupt
remapping enabled should be enough for now, as the patch doesn't make
this case worse than it's current state.

> >>>> I realize that io_apic_write() would suitably avoid going
> >>>> the remapping path, but I think it would be more clear if the
> >>>> distinction was already made properly at the call site.
> >>>
> >>> I'm afraid I'm slightly loss, do you mean to replace the
> >>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
> >>
> >> No, because of ...
> >>
> >>> That would be the same as always passing raw == false AFAICT.
> >>
> >> ... this. I'm asking to pass in the argument for "raw" based
> >> on what you want, without relying on io_apic_write()'s
> >> behavior. That's more for code clarity than actual correctness,
> >> since - as said - io_apic_write() would invoke __io_apic_write()
> >> anyway.
> > 
> > Sorry, but I'm afraid I still don't fully understand what you mean
> > with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
> > uses __io_apic_write or io_apic_write. I could get rid of the usage of
> > ioapic_write_entry and just call __io_apic_write or io_apic_write (or
> > even directly call iommu_update_ire_from_apic), but I'm not sure
> > what's the benefit of any of this, it's just open-coding logic that's
> > done in the helpers.
> > 
> > Would you rather mean to rename the parameter of restore_IO_APIC_setup
> > from 'raw' to 'translate' or some such in order to clearly denote
> > there's a transformation done before writing the entry?
> 
> Whether it's "raw" or "translate" doesn't really matter to me.
> What I'm after is for you to not pass raw=true when you really
> mean raw=false (or the other way around), relying on the
> intremap_enabled check down the call chain in io_apic_write().

I don't think that's the case with my proposed code (or at least not
the intention), when raw=false is passed that's because the entries
_must_ be translated, not because x2apic_bsp_setup is relying on the
value of intremap_enabled.

Let me prepare a new series with the proposed changes and we can
continue from there.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 6cdb50cf41..8415d9f787 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -515,7 +515,7 @@  static void resume_x2apic(void)
     iommu_enable_x2apic();
     __enable_x2apic();
 
-    restore_IO_APIC_setup(ioapic_entries);
+    restore_IO_APIC_setup(ioapic_entries, true);
     unmask_8259A();
 
 out:
@@ -961,7 +961,12 @@  void __init x2apic_bsp_setup(void)
         printk("Switched to APIC driver %s\n", genapic.name);
 
 restore_out:
-    restore_IO_APIC_setup(ioapic_entries);
+    /*
+     * NB: do not use raw mode when restoring entries if the iommu has been
+     * enabled during the process, because the entries need to be translated
+     * and added to the remapping table in that case.
+     */
+    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
     unmask_8259A();
 
 out:
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 5d25862bd8..37eabc16c9 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -379,7 +379,8 @@  void mask_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries)
 /*
  * Restore IO APIC entries which was saved in ioapic_entries.
  */
-int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries)
+int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries,
+                          bool raw)
 {
     int apic, pin;
 
@@ -394,7 +395,7 @@  int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries)
             return -ENOMEM;
 
         for (pin = 0; pin < nr_ioapic_entries[apic]; pin++)
-	    ioapic_write_entry(apic, pin, 1, ioapic_entries[apic][pin]);
+	    ioapic_write_entry(apic, pin, raw, ioapic_entries[apic][pin]);
     }
 
     return 0;
diff --git a/xen/include/asm-x86/io_apic.h b/xen/include/asm-x86/io_apic.h
index 0b041f0565..998905186b 100644
--- a/xen/include/asm-x86/io_apic.h
+++ b/xen/include/asm-x86/io_apic.h
@@ -197,7 +197,8 @@  extern struct IO_APIC_route_entry **alloc_ioapic_entries(void);
 extern void free_ioapic_entries(struct IO_APIC_route_entry **ioapic_entries);
 extern int save_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
 extern void mask_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
-extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
+extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries,
+                                 bool raw);
 
 unsigned highest_gsi(void);