Message ID | 20191010113345.6867-2-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | iommu: fixes for interrupt remapping enabling | expand |
On 10.10.2019 13:33, Roger Pau Monne wrote: > When interrupt remapping is enabled as part of enabling x2APIC the Perhaps "unmasked" instead of "the"? > IO-APIC entries also need to be translated to the new format and added > to the interrupt remapping table. > > This prevents IOMMU interrupt remapping faults when booting on > hardware that has unmasked IO-APIC pins. But in the end it only papers over whatever the spurious interrupts result form, doesn't it? Which isn't to say this isn't an improvement. Calling out the ExtInt case here may be worthwhile as well, as would be pointing out that this case still won't work on AMD IOMMUs. > --- a/xen/arch/x86/apic.c > +++ b/xen/arch/x86/apic.c > @@ -515,7 +515,7 @@ static void resume_x2apic(void) > iommu_enable_x2apic(); > __enable_x2apic(); > > - restore_IO_APIC_setup(ioapic_entries); > + restore_IO_APIC_setup(ioapic_entries, true); > unmask_8259A(); > > out: > @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > printk("Switched to APIC driver %s\n", genapic.name); > > restore_out: > - restore_IO_APIC_setup(ioapic_entries); > + /* > + * NB: do not use raw mode when restoring entries if the iommu has been > + * enabled during the process, because the entries need to be translated > + * and added to the remapping table in that case. > + */ > + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); How is this different in the resume_x2apic() case? The IOMMU gets enabled in the course of that as well. I.e. I'd expect you want to pass "false" there, not "true". Also how would "x2apic_enabled" reflect the transition? It may have been "true" already before entering the function here. Jan
On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > On 10.10.2019 13:33, Roger Pau Monne wrote: > > When interrupt remapping is enabled as part of enabling x2APIC the > > Perhaps "unmasked" instead of "the"? > > > IO-APIC entries also need to be translated to the new format and added > > to the interrupt remapping table. > > > > This prevents IOMMU interrupt remapping faults when booting on > > hardware that has unmasked IO-APIC pins. > > But in the end it only papers over whatever the spurious interrupts > result form, doesn't it? Which isn't to say this isn't an > improvement. Calling out the ExtInt case here may be worthwhile as > well, as would be pointing out that this case still won't work on > AMD IOMMUs. But the fix for the ExtINT AMD issue should be done in amd_iommu_ioapic_update_ire then, so that it can properly handle ExtINT delivery mode, not to this part of the code. I will look into it, but I think it's kind of tangential to the issue here. > > --- a/xen/arch/x86/apic.c > > +++ b/xen/arch/x86/apic.c > > @@ -515,7 +515,7 @@ static void resume_x2apic(void) > > iommu_enable_x2apic(); > > __enable_x2apic(); > > > > - restore_IO_APIC_setup(ioapic_entries); > > + restore_IO_APIC_setup(ioapic_entries, true); > > unmask_8259A(); > > > > out: > > @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > > printk("Switched to APIC driver %s\n", genapic.name); > > > > restore_out: > > - restore_IO_APIC_setup(ioapic_entries); > > + /* > > + * NB: do not use raw mode when restoring entries if the iommu has been > > + * enabled during the process, because the entries need to be translated > > + * and added to the remapping table in that case. > > + */ > > + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > > How is this different in the resume_x2apic() case? The IOMMU gets > enabled in the course of that as well. I.e. I'd expect you want > to pass "false" there, not "true". In the resume_x2apic case interrupt remapping should already be enabled or not, but that function is not going to enable interrupt remapping if it wasn't enabled before, hence the IO-APIC entries should already be using the interrupt remapping table and no translation is needed. > Also how would "x2apic_enabled" reflect the transition? It may > have been "true" already before entering the function here. If x2apic_enabled == true at the point where restore_IO_APIC_setup is called interrupt remapping must be enabled, because AFAICT at this point it's not possible to have x2apic_enabled == true and interrupt remapping disabled. The issue I can see here is what happens if interrupt remapping was already enabled by the hardware, and entries in the IO-APIC are already using the remapping table. I would have to look into how to detect that case properly, but I think the proposed change is an improvement overall. Thanks, Roger.
On 10.10.2019 14:13, Roger Pau Monné wrote: > On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: >> On 10.10.2019 13:33, Roger Pau Monne wrote: >>> When interrupt remapping is enabled as part of enabling x2APIC the >> >> Perhaps "unmasked" instead of "the"? >> >>> IO-APIC entries also need to be translated to the new format and added >>> to the interrupt remapping table. >>> >>> This prevents IOMMU interrupt remapping faults when booting on >>> hardware that has unmasked IO-APIC pins. >> >> But in the end it only papers over whatever the spurious interrupts >> result form, doesn't it? Which isn't to say this isn't an >> improvement. Calling out the ExtInt case here may be worthwhile as >> well, as would be pointing out that this case still won't work on >> AMD IOMMUs. > > But the fix for the ExtINT AMD issue should be done in > amd_iommu_ioapic_update_ire then, so that it can properly handle > ExtINT delivery mode, not to this part of the code. I will look > into it, but I think it's kind of tangential to the issue here. I'm not talking of you working on fixing this right away. I'm merely asking that you mention here (a) the ExtInt special case and (b) that this special case will (continue to) not work in the AMD case. >>> --- a/xen/arch/x86/apic.c >>> +++ b/xen/arch/x86/apic.c >>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) >>> iommu_enable_x2apic(); >>> __enable_x2apic(); >>> >>> - restore_IO_APIC_setup(ioapic_entries); >>> + restore_IO_APIC_setup(ioapic_entries, true); >>> unmask_8259A(); >>> >>> out: >>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) >>> printk("Switched to APIC driver %s\n", genapic.name); >>> >>> restore_out: >>> - restore_IO_APIC_setup(ioapic_entries); >>> + /* >>> + * NB: do not use raw mode when restoring entries if the iommu has been >>> + * enabled during the process, because the entries need to be translated >>> + * and added to the remapping table in that case. >>> + */ >>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); >> >> How is this different in the resume_x2apic() case? The IOMMU gets >> enabled in the course of that as well. I.e. I'd expect you want >> to pass "false" there, not "true". > > In the resume_x2apic case interrupt remapping should already be > enabled or not, but that function is not going to enable interrupt > remapping if it wasn't enabled before, hence the IO-APIC entries > should already be using the interrupt remapping table and no > translation is needed. Who / what would have enabled the IOMMU in the resume case? >> Also how would "x2apic_enabled" reflect the transition? It may >> have been "true" already before entering the function here. > > If x2apic_enabled == true at the point where restore_IO_APIC_setup > is called interrupt remapping must be enabled, because AFAICT at this > point it's not possible to have x2apic_enabled == true and interrupt > remapping disabled. > > The issue I can see here is what happens if interrupt remapping was > already enabled by the hardware, and entries in the IO-APIC are > already using the remapping table. I would have to look into how to > detect that case properly, but I think the proposed change is an > improvement overall. Firmware handing off with the IOMMU (and hence interrupt remapping) already enabled is a case which - afaict - we can't currently cope with. Firmware handing off in x2APIC enabled state is typically with the IOMMU (and hence interrupt remapping) still disabled. This is not a forbidden mode, it's just that in such a configuration interrupts can't be delivered to certain CPUs. In any event you need to properly distinguish x2APIC enabled state (or the transition thereof) from IOMMU / interrupt remapping enabled state (or the transition thereof). I.e. you want to avoid raw mode restore if interrupt remapping state transitioned from off to on in the process. Jan
On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > On 10.10.2019 14:13, Roger Pau Monné wrote: > > On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>> When interrupt remapping is enabled as part of enabling x2APIC the > >> > >> Perhaps "unmasked" instead of "the"? > >> > >>> IO-APIC entries also need to be translated to the new format and added > >>> to the interrupt remapping table. > >>> > >>> This prevents IOMMU interrupt remapping faults when booting on > >>> hardware that has unmasked IO-APIC pins. > >> > >> But in the end it only papers over whatever the spurious interrupts > >> result form, doesn't it? Which isn't to say this isn't an > >> improvement. Calling out the ExtInt case here may be worthwhile as > >> well, as would be pointing out that this case still won't work on > >> AMD IOMMUs. > > > > But the fix for the ExtINT AMD issue should be done in > > amd_iommu_ioapic_update_ire then, so that it can properly handle > > ExtINT delivery mode, not to this part of the code. I will look > > into it, but I think it's kind of tangential to the issue here. > > I'm not talking of you working on fixing this right away. I'm merely > asking that you mention here (a) the ExtInt special case and (b) > that this special case will (continue to) not work in the AMD case. > > >>> --- a/xen/arch/x86/apic.c > >>> +++ b/xen/arch/x86/apic.c > >>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) > >>> iommu_enable_x2apic(); > >>> __enable_x2apic(); > >>> > >>> - restore_IO_APIC_setup(ioapic_entries); > >>> + restore_IO_APIC_setup(ioapic_entries, true); > >>> unmask_8259A(); > >>> > >>> out: > >>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > >>> printk("Switched to APIC driver %s\n", genapic.name); > >>> > >>> restore_out: > >>> - restore_IO_APIC_setup(ioapic_entries); > >>> + /* > >>> + * NB: do not use raw mode when restoring entries if the iommu has been > >>> + * enabled during the process, because the entries need to be translated > >>> + * and added to the remapping table in that case. > >>> + */ > >>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > >> > >> How is this different in the resume_x2apic() case? The IOMMU gets > >> enabled in the course of that as well. I.e. I'd expect you want > >> to pass "false" there, not "true". > > > > In the resume_x2apic case interrupt remapping should already be > > enabled or not, but that function is not going to enable interrupt > > remapping if it wasn't enabled before, hence the IO-APIC entries > > should already be using the interrupt remapping table and no > > translation is needed. > > Who / what would have enabled the IOMMU in the resume case? I don't think the question is who enables interrupt remapping in the resume case (which is resume_x2apic when calling iommu_enable_x2apic AFAICT), the point here is that on resume the entries in the IO-APIC will already match the state of interrupt remapping, so they shouldn't be translated. If interrupt remapping was off before suspend it will still be off after resume, and there won't be any translation needed. The same is true if interrupt remapping is on before suspend. > >> Also how would "x2apic_enabled" reflect the transition? It may > >> have been "true" already before entering the function here. > > > > If x2apic_enabled == true at the point where restore_IO_APIC_setup > > is called interrupt remapping must be enabled, because AFAICT at this > > point it's not possible to have x2apic_enabled == true and interrupt > > remapping disabled. > > > > The issue I can see here is what happens if interrupt remapping was > > already enabled by the hardware, and entries in the IO-APIC are > > already using the remapping table. I would have to look into how to > > detect that case properly, but I think the proposed change is an > > improvement overall. > > Firmware handing off with the IOMMU (and hence interrupt remapping) > already enabled is a case which - afaict - we can't currently cope > with. Firmware handing off in x2APIC enabled state is typically > with the IOMMU (and hence interrupt remapping) still disabled. This > is not a forbidden mode, it's just that in such a configuration > interrupts can't be delivered to certain CPUs. > > In any event you need to properly distinguish x2APIC enabled state > (or the transition thereof) from IOMMU / interrupt remapping > enabled state (or the transition thereof). I.e. you want to avoid > raw mode restore if interrupt remapping state transitioned from > off to on in the process. Right, and that's why the call to restore_IO_APIC_setup in x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If interrupt remapping has been enabled by the call to iommu_enable_x2apic x2apic_enabled must be true, and hence the entries in the IO-APIC need to be translated to use the interrupt remapping table. There's no path that can lead to restore_IO_APIC_setup with interrupt remapping enabled and x2APIC mode disabled (or with x2APIC enabled and interrupt remapping disabled). Hence if interrupt remapping is off before calling x2apic_bsp_setup (which is what Xen expects to function properly) and x2apic_enabled == true when calling restore_IO_APIC_setup it means interrupt remapping got enabled, and the IO-APIC entries need translation. Thanks, Roger.
On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > On 10.10.2019 14:13, Roger Pau Monné wrote: > > On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>> When interrupt remapping is enabled as part of enabling x2APIC the > >> > >> Perhaps "unmasked" instead of "the"? > >> > >>> IO-APIC entries also need to be translated to the new format and added > >>> to the interrupt remapping table. > >>> > >>> This prevents IOMMU interrupt remapping faults when booting on > >>> hardware that has unmasked IO-APIC pins. > >> > >> But in the end it only papers over whatever the spurious interrupts > >> result form, doesn't it? Which isn't to say this isn't an > >> improvement. Calling out the ExtInt case here may be worthwhile as > >> well, as would be pointing out that this case still won't work on > >> AMD IOMMUs. > > > > But the fix for the ExtINT AMD issue should be done in > > amd_iommu_ioapic_update_ire then, so that it can properly handle > > ExtINT delivery mode, not to this part of the code. I will look > > into it, but I think it's kind of tangential to the issue here. > > I'm not talking of you working on fixing this right away. I'm merely > asking that you mention here (a) the ExtInt special case and (b) > that this special case will (continue to) not work in the AMD case. Please bear with me, I've taken a look at the ExtINT handling for AMD and I'm not sure I can spot what's missing. Xen seems to handle both the EIntPass and IV fields of the DTE (see iommu_dte_add_device_entry and amd_iommu_set_intremap_table), and that should be enough in order to passthrough such interrupts. Is there some other handling that I'm missing? (maybe in the handling of the interrupt itself?) Thanks, Roger.
On 10.10.2019 15:12, Roger Pau Monné wrote: > On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: >> On 10.10.2019 14:13, Roger Pau Monné wrote: >>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: >>>> On 10.10.2019 13:33, Roger Pau Monne wrote: >>>>> When interrupt remapping is enabled as part of enabling x2APIC the >>>> >>>> Perhaps "unmasked" instead of "the"? >>>> >>>>> IO-APIC entries also need to be translated to the new format and added >>>>> to the interrupt remapping table. >>>>> >>>>> This prevents IOMMU interrupt remapping faults when booting on >>>>> hardware that has unmasked IO-APIC pins. >>>> >>>> But in the end it only papers over whatever the spurious interrupts >>>> result form, doesn't it? Which isn't to say this isn't an >>>> improvement. Calling out the ExtInt case here may be worthwhile as >>>> well, as would be pointing out that this case still won't work on >>>> AMD IOMMUs. >>> >>> But the fix for the ExtINT AMD issue should be done in >>> amd_iommu_ioapic_update_ire then, so that it can properly handle >>> ExtINT delivery mode, not to this part of the code. I will look >>> into it, but I think it's kind of tangential to the issue here. >> >> I'm not talking of you working on fixing this right away. I'm merely >> asking that you mention here (a) the ExtInt special case and (b) >> that this special case will (continue to) not work in the AMD case. >> >>>>> --- a/xen/arch/x86/apic.c >>>>> +++ b/xen/arch/x86/apic.c >>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) >>>>> iommu_enable_x2apic(); >>>>> __enable_x2apic(); >>>>> >>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>> + restore_IO_APIC_setup(ioapic_entries, true); >>>>> unmask_8259A(); >>>>> >>>>> out: >>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) >>>>> printk("Switched to APIC driver %s\n", genapic.name); >>>>> >>>>> restore_out: >>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>> + /* >>>>> + * NB: do not use raw mode when restoring entries if the iommu has been >>>>> + * enabled during the process, because the entries need to be translated >>>>> + * and added to the remapping table in that case. >>>>> + */ >>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); >>>> >>>> How is this different in the resume_x2apic() case? The IOMMU gets >>>> enabled in the course of that as well. I.e. I'd expect you want >>>> to pass "false" there, not "true". >>> >>> In the resume_x2apic case interrupt remapping should already be >>> enabled or not, but that function is not going to enable interrupt >>> remapping if it wasn't enabled before, hence the IO-APIC entries >>> should already be using the interrupt remapping table and no >>> translation is needed. >> >> Who / what would have enabled the IOMMU in the resume case? > > I don't think the question is who enables interrupt remapping in the > resume case (which is resume_x2apic when calling iommu_enable_x2apic > AFAICT), the point here is that on resume the entries in the IO-APIC > will already match the state of interrupt remapping, so they shouldn't > be translated. If interrupt remapping was off before suspend it will > still be off after resume, and there won't be any translation needed. > The same is true if interrupt remapping is on before suspend. I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), not prior to suspend. >>>> Also how would "x2apic_enabled" reflect the transition? It may >>>> have been "true" already before entering the function here. >>> >>> If x2apic_enabled == true at the point where restore_IO_APIC_setup >>> is called interrupt remapping must be enabled, because AFAICT at this >>> point it's not possible to have x2apic_enabled == true and interrupt >>> remapping disabled. >>> >>> The issue I can see here is what happens if interrupt remapping was >>> already enabled by the hardware, and entries in the IO-APIC are >>> already using the remapping table. I would have to look into how to >>> detect that case properly, but I think the proposed change is an >>> improvement overall. >> >> Firmware handing off with the IOMMU (and hence interrupt remapping) >> already enabled is a case which - afaict - we can't currently cope >> with. Firmware handing off in x2APIC enabled state is typically >> with the IOMMU (and hence interrupt remapping) still disabled. This >> is not a forbidden mode, it's just that in such a configuration >> interrupts can't be delivered to certain CPUs. >> >> In any event you need to properly distinguish x2APIC enabled state >> (or the transition thereof) from IOMMU / interrupt remapping >> enabled state (or the transition thereof). I.e. you want to avoid >> raw mode restore if interrupt remapping state transitioned from >> off to on in the process. > > Right, and that's why the call to restore_IO_APIC_setup in > x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If > interrupt remapping has been enabled by the call to > iommu_enable_x2apic x2apic_enabled must be true, and hence the entries > in the IO-APIC need to be translated to use the interrupt remapping > table. There's no path that can lead to restore_IO_APIC_setup with > interrupt remapping enabled and x2APIC mode disabled (or with x2APIC > enabled and interrupt remapping disabled). > > Hence if interrupt remapping is off before calling x2apic_bsp_setup > (which is what Xen expects to function properly) and x2apic_enabled == > true when calling restore_IO_APIC_setup it means interrupt remapping > got enabled, and the IO-APIC entries need translation. But the code in question sits on a shared success/error path, and in the error case it matters whether x2apic_enabled was true already on entry. I realize that io_apic_write() would suitably avoid going the remapping path, but I think it would be more clear if the distinction was already made properly at the call site. Jan
On 10.10.2019 15:29, Roger Pau Monné wrote: > On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: >> On 10.10.2019 14:13, Roger Pau Monné wrote: >>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: >>>> On 10.10.2019 13:33, Roger Pau Monne wrote: >>>>> When interrupt remapping is enabled as part of enabling x2APIC the >>>> >>>> Perhaps "unmasked" instead of "the"? >>>> >>>>> IO-APIC entries also need to be translated to the new format and added >>>>> to the interrupt remapping table. >>>>> >>>>> This prevents IOMMU interrupt remapping faults when booting on >>>>> hardware that has unmasked IO-APIC pins. >>>> >>>> But in the end it only papers over whatever the spurious interrupts >>>> result form, doesn't it? Which isn't to say this isn't an >>>> improvement. Calling out the ExtInt case here may be worthwhile as >>>> well, as would be pointing out that this case still won't work on >>>> AMD IOMMUs. >>> >>> But the fix for the ExtINT AMD issue should be done in >>> amd_iommu_ioapic_update_ire then, so that it can properly handle >>> ExtINT delivery mode, not to this part of the code. I will look >>> into it, but I think it's kind of tangential to the issue here. >> >> I'm not talking of you working on fixing this right away. I'm merely >> asking that you mention here (a) the ExtInt special case and (b) >> that this special case will (continue to) not work in the AMD case. > > Please bear with me, I've taken a look at the ExtINT handling for AMD > and I'm not sure I can spot what's missing. Xen seems to handle both > the EIntPass and IV fields of the DTE (see iommu_dte_add_device_entry > and amd_iommu_set_intremap_table), and that should be enough in order > to passthrough such interrupts. EIntPass gets set based on ACPI table information, not what we found in a particular RTE. That's hopefully fine, but you know how reliable firmware is especially in corner cases. > Is there some other handling that I'm missing? (maybe in the handling > of the interrupt itself?) Look at the update_intremap_entry_from_ioapic() -> update_intremap_entry() path: This converts the 3-bit delivery mode field into a 1-bit int_type one (the field is really 3 bits wide, but values 010..111 (binary) are documented as reserved; I can't exclude though that the documentation is wrong here). Jan
On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: > On 10.10.2019 15:12, Roger Pau Monné wrote: > > On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > >> On 10.10.2019 14:13, Roger Pau Monné wrote: > >>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >>>> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>>>> When interrupt remapping is enabled as part of enabling x2APIC the > >>>> > >>>> Perhaps "unmasked" instead of "the"? > >>>> > >>>>> IO-APIC entries also need to be translated to the new format and added > >>>>> to the interrupt remapping table. > >>>>> > >>>>> This prevents IOMMU interrupt remapping faults when booting on > >>>>> hardware that has unmasked IO-APIC pins. > >>>> > >>>> But in the end it only papers over whatever the spurious interrupts > >>>> result form, doesn't it? Which isn't to say this isn't an > >>>> improvement. Calling out the ExtInt case here may be worthwhile as > >>>> well, as would be pointing out that this case still won't work on > >>>> AMD IOMMUs. > >>> > >>> But the fix for the ExtINT AMD issue should be done in > >>> amd_iommu_ioapic_update_ire then, so that it can properly handle > >>> ExtINT delivery mode, not to this part of the code. I will look > >>> into it, but I think it's kind of tangential to the issue here. > >> > >> I'm not talking of you working on fixing this right away. I'm merely > >> asking that you mention here (a) the ExtInt special case and (b) > >> that this special case will (continue to) not work in the AMD case. > >> > >>>>> --- a/xen/arch/x86/apic.c > >>>>> +++ b/xen/arch/x86/apic.c > >>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) > >>>>> iommu_enable_x2apic(); > >>>>> __enable_x2apic(); > >>>>> > >>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>> + restore_IO_APIC_setup(ioapic_entries, true); > >>>>> unmask_8259A(); > >>>>> > >>>>> out: > >>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > >>>>> printk("Switched to APIC driver %s\n", genapic.name); > >>>>> > >>>>> restore_out: > >>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>> + /* > >>>>> + * NB: do not use raw mode when restoring entries if the iommu has been > >>>>> + * enabled during the process, because the entries need to be translated > >>>>> + * and added to the remapping table in that case. > >>>>> + */ > >>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > >>>> > >>>> How is this different in the resume_x2apic() case? The IOMMU gets > >>>> enabled in the course of that as well. I.e. I'd expect you want > >>>> to pass "false" there, not "true". > >>> > >>> In the resume_x2apic case interrupt remapping should already be > >>> enabled or not, but that function is not going to enable interrupt > >>> remapping if it wasn't enabled before, hence the IO-APIC entries > >>> should already be using the interrupt remapping table and no > >>> translation is needed. > >> > >> Who / what would have enabled the IOMMU in the resume case? > > > > I don't think the question is who enables interrupt remapping in the > > resume case (which is resume_x2apic when calling iommu_enable_x2apic > > AFAICT), the point here is that on resume the entries in the IO-APIC > > will already match the state of interrupt remapping, so they shouldn't > > be translated. If interrupt remapping was off before suspend it will > > still be off after resume, and there won't be any translation needed. > > The same is true if interrupt remapping is on before suspend. > > I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), > not prior to suspend. Oh, so maybe that's a misunderstanding on my side. I don't seem to be able to find a statement about the contents of the IO-APIC registers (and more specifically the entries) when getting back from suspension. Are all entries cleared and masked? Are the values previous to suspension stored? > >>>> Also how would "x2apic_enabled" reflect the transition? It may > >>>> have been "true" already before entering the function here. > >>> > >>> If x2apic_enabled == true at the point where restore_IO_APIC_setup > >>> is called interrupt remapping must be enabled, because AFAICT at this > >>> point it's not possible to have x2apic_enabled == true and interrupt > >>> remapping disabled. > >>> > >>> The issue I can see here is what happens if interrupt remapping was > >>> already enabled by the hardware, and entries in the IO-APIC are > >>> already using the remapping table. I would have to look into how to > >>> detect that case properly, but I think the proposed change is an > >>> improvement overall. > >> > >> Firmware handing off with the IOMMU (and hence interrupt remapping) > >> already enabled is a case which - afaict - we can't currently cope > >> with. Firmware handing off in x2APIC enabled state is typically > >> with the IOMMU (and hence interrupt remapping) still disabled. This > >> is not a forbidden mode, it's just that in such a configuration > >> interrupts can't be delivered to certain CPUs. > >> > >> In any event you need to properly distinguish x2APIC enabled state > >> (or the transition thereof) from IOMMU / interrupt remapping > >> enabled state (or the transition thereof). I.e. you want to avoid > >> raw mode restore if interrupt remapping state transitioned from > >> off to on in the process. > > > > Right, and that's why the call to restore_IO_APIC_setup in > > x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If > > interrupt remapping has been enabled by the call to > > iommu_enable_x2apic x2apic_enabled must be true, and hence the entries > > in the IO-APIC need to be translated to use the interrupt remapping > > table. There's no path that can lead to restore_IO_APIC_setup with > > interrupt remapping enabled and x2APIC mode disabled (or with x2APIC > > enabled and interrupt remapping disabled). > > > > Hence if interrupt remapping is off before calling x2apic_bsp_setup > > (which is what Xen expects to function properly) and x2apic_enabled == > > true when calling restore_IO_APIC_setup it means interrupt remapping > > got enabled, and the IO-APIC entries need translation. > > But the code in question sits on a shared success/error path, and > in the error case it matters whether x2apic_enabled was true already > on entry. But it's simply not possible to reach the call to restore_IO_APIC_setup with x2apic_enabled == true and interrupt remapping disabled, regardless of the initial value of x2apic_enabled. All the paths that could lead to this scenario are short-circuited above with a panic. > I realize that io_apic_write() would suitably avoid going > the remapping path, but I think it would be more clear if the > distinction was already made properly at the call site. I'm afraid I'm slightly loss, do you mean to replace the ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? That would be the same as always passing raw == false AFAICT. Thanks, Roger.
On 10.10.2019 17:19, Roger Pau Monné wrote: > On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: >> On 10.10.2019 15:12, Roger Pau Monné wrote: >>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: >>>> On 10.10.2019 14:13, Roger Pau Monné wrote: >>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: >>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote: >>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the >>>>>> >>>>>> Perhaps "unmasked" instead of "the"? >>>>>> >>>>>>> IO-APIC entries also need to be translated to the new format and added >>>>>>> to the interrupt remapping table. >>>>>>> >>>>>>> This prevents IOMMU interrupt remapping faults when booting on >>>>>>> hardware that has unmasked IO-APIC pins. >>>>>> >>>>>> But in the end it only papers over whatever the spurious interrupts >>>>>> result form, doesn't it? Which isn't to say this isn't an >>>>>> improvement. Calling out the ExtInt case here may be worthwhile as >>>>>> well, as would be pointing out that this case still won't work on >>>>>> AMD IOMMUs. >>>>> >>>>> But the fix for the ExtINT AMD issue should be done in >>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle >>>>> ExtINT delivery mode, not to this part of the code. I will look >>>>> into it, but I think it's kind of tangential to the issue here. >>>> >>>> I'm not talking of you working on fixing this right away. I'm merely >>>> asking that you mention here (a) the ExtInt special case and (b) >>>> that this special case will (continue to) not work in the AMD case. >>>> >>>>>>> --- a/xen/arch/x86/apic.c >>>>>>> +++ b/xen/arch/x86/apic.c >>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) >>>>>>> iommu_enable_x2apic(); >>>>>>> __enable_x2apic(); >>>>>>> >>>>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>>>> + restore_IO_APIC_setup(ioapic_entries, true); >>>>>>> unmask_8259A(); >>>>>>> >>>>>>> out: >>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) >>>>>>> printk("Switched to APIC driver %s\n", genapic.name); >>>>>>> >>>>>>> restore_out: >>>>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>>>> + /* >>>>>>> + * NB: do not use raw mode when restoring entries if the iommu has been >>>>>>> + * enabled during the process, because the entries need to be translated >>>>>>> + * and added to the remapping table in that case. >>>>>>> + */ >>>>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); >>>>>> >>>>>> How is this different in the resume_x2apic() case? The IOMMU gets >>>>>> enabled in the course of that as well. I.e. I'd expect you want >>>>>> to pass "false" there, not "true". >>>>> >>>>> In the resume_x2apic case interrupt remapping should already be >>>>> enabled or not, but that function is not going to enable interrupt >>>>> remapping if it wasn't enabled before, hence the IO-APIC entries >>>>> should already be using the interrupt remapping table and no >>>>> translation is needed. >>>> >>>> Who / what would have enabled the IOMMU in the resume case? >>> >>> I don't think the question is who enables interrupt remapping in the >>> resume case (which is resume_x2apic when calling iommu_enable_x2apic >>> AFAICT), the point here is that on resume the entries in the IO-APIC >>> will already match the state of interrupt remapping, so they shouldn't >>> be translated. If interrupt remapping was off before suspend it will >>> still be off after resume, and there won't be any translation needed. >>> The same is true if interrupt remapping is on before suspend. >> >> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), >> not prior to suspend. > > Oh, so maybe that's a misunderstanding on my side. I don't seem to be > able to find a statement about the contents of the IO-APIC registers > (and more specifically the entries) when getting back from > suspension. Are all entries cleared and masked? > > Are the values previous to suspension stored? See ioapic_suspend() / ioapic_resume(): Looks like there's some redundancy here - I don't think it makes sense for the LAPIC code to fiddle with all the RTEs if subsequently (I assume; didn't check) they'll all be overwritten anyway. I would seem more logical to me if they'd just all get masked for IOMMU enabling, deferring to ioapic_resume() for everything else. >>>>>> Also how would "x2apic_enabled" reflect the transition? It may >>>>>> have been "true" already before entering the function here. >>>>> >>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup >>>>> is called interrupt remapping must be enabled, because AFAICT at this >>>>> point it's not possible to have x2apic_enabled == true and interrupt >>>>> remapping disabled. >>>>> >>>>> The issue I can see here is what happens if interrupt remapping was >>>>> already enabled by the hardware, and entries in the IO-APIC are >>>>> already using the remapping table. I would have to look into how to >>>>> detect that case properly, but I think the proposed change is an >>>>> improvement overall. >>>> >>>> Firmware handing off with the IOMMU (and hence interrupt remapping) >>>> already enabled is a case which - afaict - we can't currently cope >>>> with. Firmware handing off in x2APIC enabled state is typically >>>> with the IOMMU (and hence interrupt remapping) still disabled. This >>>> is not a forbidden mode, it's just that in such a configuration >>>> interrupts can't be delivered to certain CPUs. >>>> >>>> In any event you need to properly distinguish x2APIC enabled state >>>> (or the transition thereof) from IOMMU / interrupt remapping >>>> enabled state (or the transition thereof). I.e. you want to avoid >>>> raw mode restore if interrupt remapping state transitioned from >>>> off to on in the process. >>> >>> Right, and that's why the call to restore_IO_APIC_setup in >>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If >>> interrupt remapping has been enabled by the call to >>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries >>> in the IO-APIC need to be translated to use the interrupt remapping >>> table. There's no path that can lead to restore_IO_APIC_setup with >>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC >>> enabled and interrupt remapping disabled). >>> >>> Hence if interrupt remapping is off before calling x2apic_bsp_setup >>> (which is what Xen expects to function properly) and x2apic_enabled == >>> true when calling restore_IO_APIC_setup it means interrupt remapping >>> got enabled, and the IO-APIC entries need translation. >> >> But the code in question sits on a shared success/error path, and >> in the error case it matters whether x2apic_enabled was true already >> on entry. > > But it's simply not possible to reach the call to > restore_IO_APIC_setup with x2apic_enabled == true and interrupt > remapping disabled, regardless of the initial value of > x2apic_enabled. > > All the paths that could lead to this scenario are short-circuited > above with a panic. Hmm, true. Nevertheless it would feel better if the conditionals were using what actually matters, rather than something derived. >> I realize that io_apic_write() would suitably avoid going >> the remapping path, but I think it would be more clear if the >> distinction was already made properly at the call site. > > I'm afraid I'm slightly loss, do you mean to replace the > ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? No, because of ... > That would be the same as always passing raw == false AFAICT. ... this. I'm asking to pass in the argument for "raw" based on what you want, without relying on io_apic_write()'s behavior. That's more for code clarity than actual correctness, since - as said - io_apic_write() would invoke __io_apic_write() anyway. By setting "raw" correctly you can simply avoid going through io_apic_write() altogether. Jan
On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote: > On 10.10.2019 17:19, Roger Pau Monné wrote: > > On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: > >> On 10.10.2019 15:12, Roger Pau Monné wrote: > >>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > >>>> On 10.10.2019 14:13, Roger Pau Monné wrote: > >>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the > >>>>>> > >>>>>> Perhaps "unmasked" instead of "the"? > >>>>>> > >>>>>>> IO-APIC entries also need to be translated to the new format and added > >>>>>>> to the interrupt remapping table. > >>>>>>> > >>>>>>> This prevents IOMMU interrupt remapping faults when booting on > >>>>>>> hardware that has unmasked IO-APIC pins. > >>>>>> > >>>>>> But in the end it only papers over whatever the spurious interrupts > >>>>>> result form, doesn't it? Which isn't to say this isn't an > >>>>>> improvement. Calling out the ExtInt case here may be worthwhile as > >>>>>> well, as would be pointing out that this case still won't work on > >>>>>> AMD IOMMUs. > >>>>> > >>>>> But the fix for the ExtINT AMD issue should be done in > >>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle > >>>>> ExtINT delivery mode, not to this part of the code. I will look > >>>>> into it, but I think it's kind of tangential to the issue here. > >>>> > >>>> I'm not talking of you working on fixing this right away. I'm merely > >>>> asking that you mention here (a) the ExtInt special case and (b) > >>>> that this special case will (continue to) not work in the AMD case. > >>>> > >>>>>>> --- a/xen/arch/x86/apic.c > >>>>>>> +++ b/xen/arch/x86/apic.c > >>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) > >>>>>>> iommu_enable_x2apic(); > >>>>>>> __enable_x2apic(); > >>>>>>> > >>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>> + restore_IO_APIC_setup(ioapic_entries, true); > >>>>>>> unmask_8259A(); > >>>>>>> > >>>>>>> out: > >>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > >>>>>>> printk("Switched to APIC driver %s\n", genapic.name); > >>>>>>> > >>>>>>> restore_out: > >>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>> + /* > >>>>>>> + * NB: do not use raw mode when restoring entries if the iommu has been > >>>>>>> + * enabled during the process, because the entries need to be translated > >>>>>>> + * and added to the remapping table in that case. > >>>>>>> + */ > >>>>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > >>>>>> > >>>>>> How is this different in the resume_x2apic() case? The IOMMU gets > >>>>>> enabled in the course of that as well. I.e. I'd expect you want > >>>>>> to pass "false" there, not "true". > >>>>> > >>>>> In the resume_x2apic case interrupt remapping should already be > >>>>> enabled or not, but that function is not going to enable interrupt > >>>>> remapping if it wasn't enabled before, hence the IO-APIC entries > >>>>> should already be using the interrupt remapping table and no > >>>>> translation is needed. > >>>> > >>>> Who / what would have enabled the IOMMU in the resume case? > >>> > >>> I don't think the question is who enables interrupt remapping in the > >>> resume case (which is resume_x2apic when calling iommu_enable_x2apic > >>> AFAICT), the point here is that on resume the entries in the IO-APIC > >>> will already match the state of interrupt remapping, so they shouldn't > >>> be translated. If interrupt remapping was off before suspend it will > >>> still be off after resume, and there won't be any translation needed. > >>> The same is true if interrupt remapping is on before suspend. > >> > >> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), > >> not prior to suspend. > > > > Oh, so maybe that's a misunderstanding on my side. I don't seem to be > > able to find a statement about the contents of the IO-APIC registers > > (and more specifically the entries) when getting back from > > suspension. Are all entries cleared and masked? > > > > Are the values previous to suspension stored? > > See ioapic_suspend() / ioapic_resume(): Looks like there's some > redundancy here - I don't think it makes sense for the LAPIC > code to fiddle with all the RTEs if subsequently (I assume; > didn't check) they'll all be overwritten anyway. I would seem > more logical to me if they'd just all get masked for IOMMU > enabling, deferring to ioapic_resume() for everything else. Yes, it seems like resume_x2apic shouldn't need to play with the entries at all TBH, just enabling interrupt remapping should be enough given that the entries are already save and restored by ioapic_{suspend/resume}. Would you agree to leave that as-is in this patch, ie: always pass true to restore_IO_APIC_setup in resume_x2apic? > >>>>>> Also how would "x2apic_enabled" reflect the transition? It may > >>>>>> have been "true" already before entering the function here. > >>>>> > >>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup > >>>>> is called interrupt remapping must be enabled, because AFAICT at this > >>>>> point it's not possible to have x2apic_enabled == true and interrupt > >>>>> remapping disabled. > >>>>> > >>>>> The issue I can see here is what happens if interrupt remapping was > >>>>> already enabled by the hardware, and entries in the IO-APIC are > >>>>> already using the remapping table. I would have to look into how to > >>>>> detect that case properly, but I think the proposed change is an > >>>>> improvement overall. > >>>> > >>>> Firmware handing off with the IOMMU (and hence interrupt remapping) > >>>> already enabled is a case which - afaict - we can't currently cope > >>>> with. Firmware handing off in x2APIC enabled state is typically > >>>> with the IOMMU (and hence interrupt remapping) still disabled. This > >>>> is not a forbidden mode, it's just that in such a configuration > >>>> interrupts can't be delivered to certain CPUs. > >>>> > >>>> In any event you need to properly distinguish x2APIC enabled state > >>>> (or the transition thereof) from IOMMU / interrupt remapping > >>>> enabled state (or the transition thereof). I.e. you want to avoid > >>>> raw mode restore if interrupt remapping state transitioned from > >>>> off to on in the process. > >>> > >>> Right, and that's why the call to restore_IO_APIC_setup in > >>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If > >>> interrupt remapping has been enabled by the call to > >>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries > >>> in the IO-APIC need to be translated to use the interrupt remapping > >>> table. There's no path that can lead to restore_IO_APIC_setup with > >>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC > >>> enabled and interrupt remapping disabled). > >>> > >>> Hence if interrupt remapping is off before calling x2apic_bsp_setup > >>> (which is what Xen expects to function properly) and x2apic_enabled == > >>> true when calling restore_IO_APIC_setup it means interrupt remapping > >>> got enabled, and the IO-APIC entries need translation. > >> > >> But the code in question sits on a shared success/error path, and > >> in the error case it matters whether x2apic_enabled was true already > >> on entry. > > > > But it's simply not possible to reach the call to > > restore_IO_APIC_setup with x2apic_enabled == true and interrupt > > remapping disabled, regardless of the initial value of > > x2apic_enabled. > > > > All the paths that could lead to this scenario are short-circuited > > above with a panic. > > Hmm, true. Nevertheless it would feel better if the conditionals > were using what actually matters, rather than something derived. Would you be OK to using something like v1: https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html See the usage of iommu_enabled in x2apic_bsp_setup. I think using x2apic_enabled is fine given the logic in the function, and the fact that x2APIC mandates interrupt remapping, but I could live with that extra local variable. > >> I realize that io_apic_write() would suitably avoid going > >> the remapping path, but I think it would be more clear if the > >> distinction was already made properly at the call site. > > > > I'm afraid I'm slightly loss, do you mean to replace the > > ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? > > No, because of ... > > > That would be the same as always passing raw == false AFAICT. > > ... this. I'm asking to pass in the argument for "raw" based > on what you want, without relying on io_apic_write()'s > behavior. That's more for code clarity than actual correctness, > since - as said - io_apic_write() would invoke __io_apic_write() > anyway. Sorry, but I'm afraid I still don't fully understand what you mean with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn uses __io_apic_write or io_apic_write. I could get rid of the usage of ioapic_write_entry and just call __io_apic_write or io_apic_write (or even directly call iommu_update_ire_from_apic), but I'm not sure what's the benefit of any of this, it's just open-coding logic that's done in the helpers. Would you rather mean to rename the parameter of restore_IO_APIC_setup from 'raw' to 'translate' or some such in order to clearly denote there's a transformation done before writing the entry? Thanks, Roger.
On 11.10.2019 11:28, Roger Pau Monné wrote: > On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote: >> On 10.10.2019 17:19, Roger Pau Monné wrote: >>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: >>>> On 10.10.2019 15:12, Roger Pau Monné wrote: >>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: >>>>>> On 10.10.2019 14:13, Roger Pau Monné wrote: >>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: >>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote: >>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the >>>>>>>> >>>>>>>> Perhaps "unmasked" instead of "the"? >>>>>>>> >>>>>>>>> IO-APIC entries also need to be translated to the new format and added >>>>>>>>> to the interrupt remapping table. >>>>>>>>> >>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on >>>>>>>>> hardware that has unmasked IO-APIC pins. >>>>>>>> >>>>>>>> But in the end it only papers over whatever the spurious interrupts >>>>>>>> result form, doesn't it? Which isn't to say this isn't an >>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as >>>>>>>> well, as would be pointing out that this case still won't work on >>>>>>>> AMD IOMMUs. >>>>>>> >>>>>>> But the fix for the ExtINT AMD issue should be done in >>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle >>>>>>> ExtINT delivery mode, not to this part of the code. I will look >>>>>>> into it, but I think it's kind of tangential to the issue here. >>>>>> >>>>>> I'm not talking of you working on fixing this right away. I'm merely >>>>>> asking that you mention here (a) the ExtInt special case and (b) >>>>>> that this special case will (continue to) not work in the AMD case. >>>>>> >>>>>>>>> --- a/xen/arch/x86/apic.c >>>>>>>>> +++ b/xen/arch/x86/apic.c >>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) >>>>>>>>> iommu_enable_x2apic(); >>>>>>>>> __enable_x2apic(); >>>>>>>>> >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, true); >>>>>>>>> unmask_8259A(); >>>>>>>>> >>>>>>>>> out: >>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) >>>>>>>>> printk("Switched to APIC driver %s\n", genapic.name); >>>>>>>>> >>>>>>>>> restore_out: >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); >>>>>>>>> + /* >>>>>>>>> + * NB: do not use raw mode when restoring entries if the iommu has been >>>>>>>>> + * enabled during the process, because the entries need to be translated >>>>>>>>> + * and added to the remapping table in that case. >>>>>>>>> + */ >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); >>>>>>>> >>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets >>>>>>>> enabled in the course of that as well. I.e. I'd expect you want >>>>>>>> to pass "false" there, not "true". >>>>>>> >>>>>>> In the resume_x2apic case interrupt remapping should already be >>>>>>> enabled or not, but that function is not going to enable interrupt >>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries >>>>>>> should already be using the interrupt remapping table and no >>>>>>> translation is needed. >>>>>> >>>>>> Who / what would have enabled the IOMMU in the resume case? >>>>> >>>>> I don't think the question is who enables interrupt remapping in the >>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic >>>>> AFAICT), the point here is that on resume the entries in the IO-APIC >>>>> will already match the state of interrupt remapping, so they shouldn't >>>>> be translated. If interrupt remapping was off before suspend it will >>>>> still be off after resume, and there won't be any translation needed. >>>>> The same is true if interrupt remapping is on before suspend. >>>> >>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), >>>> not prior to suspend. >>> >>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be >>> able to find a statement about the contents of the IO-APIC registers >>> (and more specifically the entries) when getting back from >>> suspension. Are all entries cleared and masked? >>> >>> Are the values previous to suspension stored? >> >> See ioapic_suspend() / ioapic_resume(): Looks like there's some >> redundancy here - I don't think it makes sense for the LAPIC >> code to fiddle with all the RTEs if subsequently (I assume; >> didn't check) they'll all be overwritten anyway. I would seem >> more logical to me if they'd just all get masked for IOMMU >> enabling, deferring to ioapic_resume() for everything else. > > Yes, it seems like resume_x2apic shouldn't need to play with the > entries at all TBH, just enabling interrupt remapping should be enough > given that the entries are already save and restored by > ioapic_{suspend/resume}. > > Would you agree to leave that as-is in this patch, ie: always pass > true to restore_IO_APIC_setup in resume_x2apic? Properly explained in the description, that's an option. Even better would of course be to do away with the unnecessary save/restore in a prereq patch, at which point the question disappears as to what to pass to the function at that point. >>> But it's simply not possible to reach the call to >>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt >>> remapping disabled, regardless of the initial value of >>> x2apic_enabled. >>> >>> All the paths that could lead to this scenario are short-circuited >>> above with a panic. >> >> Hmm, true. Nevertheless it would feel better if the conditionals >> were using what actually matters, rather than something derived. > > Would you be OK to using something like v1: > > https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html > > See the usage of iommu_enabled in x2apic_bsp_setup. I think using > x2apic_enabled is fine given the logic in the function, and the fact > that x2APIC mandates interrupt remapping, but I could live with that > extra local variable. Conceptually this looks better to me. I'd prefer to avoid shadowing the global "iommu_enabled" though, and set the local variable to "true" only one we actually enable the IOMMU. Also strictly speaking you care about the enabling of interrupt remapping, not that of the IOMMU (which implicitly means DMA remapping). I wonder whether the code couldn't easily be made cope with IOMMU already being enabled, assuming that this would lead to {iommu,intremap}_enabled to already be true on entry. I.e. you'd request a "raw" restore unless intremap_enabled changed from false to true. >>>> I realize that io_apic_write() would suitably avoid going >>>> the remapping path, but I think it would be more clear if the >>>> distinction was already made properly at the call site. >>> >>> I'm afraid I'm slightly loss, do you mean to replace the >>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? >> >> No, because of ... >> >>> That would be the same as always passing raw == false AFAICT. >> >> ... this. I'm asking to pass in the argument for "raw" based >> on what you want, without relying on io_apic_write()'s >> behavior. That's more for code clarity than actual correctness, >> since - as said - io_apic_write() would invoke __io_apic_write() >> anyway. > > Sorry, but I'm afraid I still don't fully understand what you mean > with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn > uses __io_apic_write or io_apic_write. I could get rid of the usage of > ioapic_write_entry and just call __io_apic_write or io_apic_write (or > even directly call iommu_update_ire_from_apic), but I'm not sure > what's the benefit of any of this, it's just open-coding logic that's > done in the helpers. > > Would you rather mean to rename the parameter of restore_IO_APIC_setup > from 'raw' to 'translate' or some such in order to clearly denote > there's a transformation done before writing the entry? Whether it's "raw" or "translate" doesn't really matter to me. What I'm after is for you to not pass raw=true when you really mean raw=false (or the other way around), relying on the intremap_enabled check down the call chain in io_apic_write(). Jan
On Fri, Oct 11, 2019 at 12:01:36PM +0200, Jan Beulich wrote: > On 11.10.2019 11:28, Roger Pau Monné wrote: > > On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote: > >> On 10.10.2019 17:19, Roger Pau Monné wrote: > >>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote: > >>>> On 10.10.2019 15:12, Roger Pau Monné wrote: > >>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote: > >>>>>> On 10.10.2019 14:13, Roger Pau Monné wrote: > >>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote: > >>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote: > >>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the > >>>>>>>> > >>>>>>>> Perhaps "unmasked" instead of "the"? > >>>>>>>> > >>>>>>>>> IO-APIC entries also need to be translated to the new format and added > >>>>>>>>> to the interrupt remapping table. > >>>>>>>>> > >>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on > >>>>>>>>> hardware that has unmasked IO-APIC pins. > >>>>>>>> > >>>>>>>> But in the end it only papers over whatever the spurious interrupts > >>>>>>>> result form, doesn't it? Which isn't to say this isn't an > >>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as > >>>>>>>> well, as would be pointing out that this case still won't work on > >>>>>>>> AMD IOMMUs. > >>>>>>> > >>>>>>> But the fix for the ExtINT AMD issue should be done in > >>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle > >>>>>>> ExtINT delivery mode, not to this part of the code. I will look > >>>>>>> into it, but I think it's kind of tangential to the issue here. > >>>>>> > >>>>>> I'm not talking of you working on fixing this right away. I'm merely > >>>>>> asking that you mention here (a) the ExtInt special case and (b) > >>>>>> that this special case will (continue to) not work in the AMD case. > >>>>>> > >>>>>>>>> --- a/xen/arch/x86/apic.c > >>>>>>>>> +++ b/xen/arch/x86/apic.c > >>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void) > >>>>>>>>> iommu_enable_x2apic(); > >>>>>>>>> __enable_x2apic(); > >>>>>>>>> > >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, true); > >>>>>>>>> unmask_8259A(); > >>>>>>>>> > >>>>>>>>> out: > >>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) > >>>>>>>>> printk("Switched to APIC driver %s\n", genapic.name); > >>>>>>>>> > >>>>>>>>> restore_out: > >>>>>>>>> - restore_IO_APIC_setup(ioapic_entries); > >>>>>>>>> + /* > >>>>>>>>> + * NB: do not use raw mode when restoring entries if the iommu has been > >>>>>>>>> + * enabled during the process, because the entries need to be translated > >>>>>>>>> + * and added to the remapping table in that case. > >>>>>>>>> + */ > >>>>>>>>> + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); > >>>>>>>> > >>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets > >>>>>>>> enabled in the course of that as well. I.e. I'd expect you want > >>>>>>>> to pass "false" there, not "true". > >>>>>>> > >>>>>>> In the resume_x2apic case interrupt remapping should already be > >>>>>>> enabled or not, but that function is not going to enable interrupt > >>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries > >>>>>>> should already be using the interrupt remapping table and no > >>>>>>> translation is needed. > >>>>>> > >>>>>> Who / what would have enabled the IOMMU in the resume case? > >>>>> > >>>>> I don't think the question is who enables interrupt remapping in the > >>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic > >>>>> AFAICT), the point here is that on resume the entries in the IO-APIC > >>>>> will already match the state of interrupt remapping, so they shouldn't > >>>>> be translated. If interrupt remapping was off before suspend it will > >>>>> still be off after resume, and there won't be any translation needed. > >>>>> The same is true if interrupt remapping is on before suspend. > >>>> > >>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(), > >>>> not prior to suspend. > >>> > >>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be > >>> able to find a statement about the contents of the IO-APIC registers > >>> (and more specifically the entries) when getting back from > >>> suspension. Are all entries cleared and masked? > >>> > >>> Are the values previous to suspension stored? > >> > >> See ioapic_suspend() / ioapic_resume(): Looks like there's some > >> redundancy here - I don't think it makes sense for the LAPIC > >> code to fiddle with all the RTEs if subsequently (I assume; > >> didn't check) they'll all be overwritten anyway. I would seem > >> more logical to me if they'd just all get masked for IOMMU > >> enabling, deferring to ioapic_resume() for everything else. > > > > Yes, it seems like resume_x2apic shouldn't need to play with the > > entries at all TBH, just enabling interrupt remapping should be enough > > given that the entries are already save and restored by > > ioapic_{suspend/resume}. > > > > Would you agree to leave that as-is in this patch, ie: always pass > > true to restore_IO_APIC_setup in resume_x2apic? > > Properly explained in the description, that's an option. Even better > would of course be to do away with the unnecessary save/restore in > a prereq patch, at which point the question disappears as to what to > pass to the function at that point. OK, I can create a pre-patch, but I don't think I have any hardware that does suspend/resume since it's all server-grade. Someone would have to test it and assert it actually works correctly, hence my reluctance to touch any of the suspend/resume logic. > >>> But it's simply not possible to reach the call to > >>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt > >>> remapping disabled, regardless of the initial value of > >>> x2apic_enabled. > >>> > >>> All the paths that could lead to this scenario are short-circuited > >>> above with a panic. > >> > >> Hmm, true. Nevertheless it would feel better if the conditionals > >> were using what actually matters, rather than something derived. > > > > Would you be OK to using something like v1: > > > > https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html > > > > See the usage of iommu_enabled in x2apic_bsp_setup. I think using > > x2apic_enabled is fine given the logic in the function, and the fact > > that x2APIC mandates interrupt remapping, but I could live with that > > extra local variable. > > Conceptually this looks better to me. I'd prefer to avoid shadowing > the global "iommu_enabled" though, and set the local variable to > "true" only one we actually enable the IOMMU. Also strictly speaking > you care about the enabling of interrupt remapping, not that of the > IOMMU (which implicitly means DMA remapping). I wonder whether the > code couldn't easily be made cope with IOMMU already being enabled, > assuming that this would lead to {iommu,intremap}_enabled to already > be true on entry. I.e. you'd request a "raw" restore unless > intremap_enabled changed from false to true. I'm not sure it's that easy. A raw restore wouldn't fully work, since Xen would have replaced the interrupt remapping table with an empty one, so Xen would have to add the entries for the unmasked IO-APIC pins to the IRT and then likely also update the IO-APIC pin entry to match the newly created IRTE. I think adding a comment at the top of the function to note that this won't work properly if there are unmasked IO-APIC pins and interrupt remapping enabled should be enough for now, as the patch doesn't make this case worse than it's current state. > >>>> I realize that io_apic_write() would suitably avoid going > >>>> the remapping path, but I think it would be more clear if the > >>>> distinction was already made properly at the call site. > >>> > >>> I'm afraid I'm slightly loss, do you mean to replace the > >>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup? > >> > >> No, because of ... > >> > >>> That would be the same as always passing raw == false AFAICT. > >> > >> ... this. I'm asking to pass in the argument for "raw" based > >> on what you want, without relying on io_apic_write()'s > >> behavior. That's more for code clarity than actual correctness, > >> since - as said - io_apic_write() would invoke __io_apic_write() > >> anyway. > > > > Sorry, but I'm afraid I still don't fully understand what you mean > > with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn > > uses __io_apic_write or io_apic_write. I could get rid of the usage of > > ioapic_write_entry and just call __io_apic_write or io_apic_write (or > > even directly call iommu_update_ire_from_apic), but I'm not sure > > what's the benefit of any of this, it's just open-coding logic that's > > done in the helpers. > > > > Would you rather mean to rename the parameter of restore_IO_APIC_setup > > from 'raw' to 'translate' or some such in order to clearly denote > > there's a transformation done before writing the entry? > > Whether it's "raw" or "translate" doesn't really matter to me. > What I'm after is for you to not pass raw=true when you really > mean raw=false (or the other way around), relying on the > intremap_enabled check down the call chain in io_apic_write(). I don't think that's the case with my proposed code (or at least not the intention), when raw=false is passed that's because the entries _must_ be translated, not because x2apic_bsp_setup is relying on the value of intremap_enabled. Let me prepare a new series with the proposed changes and we can continue from there. Thanks, Roger.
diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index 6cdb50cf41..8415d9f787 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -515,7 +515,7 @@ static void resume_x2apic(void) iommu_enable_x2apic(); __enable_x2apic(); - restore_IO_APIC_setup(ioapic_entries); + restore_IO_APIC_setup(ioapic_entries, true); unmask_8259A(); out: @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void) printk("Switched to APIC driver %s\n", genapic.name); restore_out: - restore_IO_APIC_setup(ioapic_entries); + /* + * NB: do not use raw mode when restoring entries if the iommu has been + * enabled during the process, because the entries need to be translated + * and added to the remapping table in that case. + */ + restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled); unmask_8259A(); out: diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index 5d25862bd8..37eabc16c9 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -379,7 +379,8 @@ void mask_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries) /* * Restore IO APIC entries which was saved in ioapic_entries. */ -int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries) +int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries, + bool raw) { int apic, pin; @@ -394,7 +395,7 @@ int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries) return -ENOMEM; for (pin = 0; pin < nr_ioapic_entries[apic]; pin++) - ioapic_write_entry(apic, pin, 1, ioapic_entries[apic][pin]); + ioapic_write_entry(apic, pin, raw, ioapic_entries[apic][pin]); } return 0; diff --git a/xen/include/asm-x86/io_apic.h b/xen/include/asm-x86/io_apic.h index 0b041f0565..998905186b 100644 --- a/xen/include/asm-x86/io_apic.h +++ b/xen/include/asm-x86/io_apic.h @@ -197,7 +197,8 @@ extern struct IO_APIC_route_entry **alloc_ioapic_entries(void); extern void free_ioapic_entries(struct IO_APIC_route_entry **ioapic_entries); extern int save_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries); extern void mask_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries); -extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries); +extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries, + bool raw); unsigned highest_gsi(void);
When interrupt remapping is enabled as part of enabling x2APIC the IO-APIC entries also need to be translated to the new format and added to the interrupt remapping table. This prevents IOMMU interrupt remapping faults when booting on hardware that has unmasked IO-APIC pins. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Cc: Juergen Gross <jgross@suse.com> --- Changes since v1: - Remove the unneeded iommu_enabled local variable. --- xen/arch/x86/apic.c | 9 +++++++-- xen/arch/x86/io_apic.c | 5 +++-- xen/include/asm-x86/io_apic.h | 3 ++- 3 files changed, 12 insertions(+), 5 deletions(-)