diff mbox series

x86/ioapic: sanitize IO-APIC pins before enabling the local APIC

Message ID 20230707095338.44244-1-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/ioapic: sanitize IO-APIC pins before enabling the local APIC | expand

Commit Message

Roger Pau Monné July 7, 2023, 9:53 a.m. UTC
The current logic to init the local APIC and the IO-APIC does init the
former first before doing any kind of sanitation on the IO-APIC pin
configuration.  It's already noted on enable_IO_APIC() that Xen
shouldn't trust the IO-APIC being empty at bootup.

At XenServer we have a system where the IO-APIC 0 is handed to Xen
with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
with a vector of 0 (all fields of the RTE are zeroed).  Once the local
APIC is enabled periodic injections from such pin cause a storm of
errors:

APIC error on CPU0: 00(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector

That prevents Xen from booting.

Fix this by moving the masking of IO-APIC pins ahead of the enabling
of the local APIC.  Note that before doing such masking Xen attempts
to detect the pin where the legacy i8259 is connected, and that logic
relies on the pin being unmasked, hence the logic is also moved ahead
of enabling the local APIC.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I've placed the sanitize_IO_APIC() declaration in irq.h with the rest
of related IO-APIC setup functions declarations instead of placing it
in io_apic.h.
---
 xen/arch/x86/apic.c            | 4 ++++
 xen/arch/x86/include/asm/irq.h | 1 +
 xen/arch/x86/io_apic.c         | 4 +---
 xen/arch/x86/smpboot.c         | 4 ++++
 4 files changed, 10 insertions(+), 3 deletions(-)

Comments

Jan Beulich July 10, 2023, 10:56 a.m. UTC | #1
On 07.07.2023 11:53, Roger Pau Monne wrote:
> The current logic to init the local APIC and the IO-APIC does init the
> former first before doing any kind of sanitation on the IO-APIC pin
> configuration.  It's already noted on enable_IO_APIC() that Xen
> shouldn't trust the IO-APIC being empty at bootup.
> 
> At XenServer we have a system where the IO-APIC 0 is handed to Xen
> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> APIC is enabled periodic injections from such pin cause a storm of
> errors:
> 
> APIC error on CPU0: 00(40), Received illegal vector
> APIC error on CPU0: 40(40), Received illegal vector
> APIC error on CPU0: 40(40), Received illegal vector
> APIC error on CPU0: 40(40), Received illegal vector
> APIC error on CPU0: 40(40), Received illegal vector
> APIC error on CPU0: 40(40), Received illegal vector
> 
> That prevents Xen from booting.

And I expect no RTE is in ExtInt mode, so one might conclude that
firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
of course there's then the question whether to change the RTE
rather than masking it. What do ACPI tables say?

> Fix this by moving the masking of IO-APIC pins ahead of the enabling
> of the local APIC.  Note that before doing such masking Xen attempts
> to detect the pin where the legacy i8259 is connected, and that logic
> relies on the pin being unmasked, hence the logic is also moved ahead
> of enabling the local APIC.

A comma after "masking" might help readers here.

> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
>          return -1;
>      }
>  
> +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
> +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
> +        sanitize_IO_APIC();

I'm a little puzzled by the smp_found_config part of the check here,
but not in smp_prepare_cpus(). What's the reason for (a) the check
and (b) the difference? Isn't checking nr_ioapics sufficient in this
regard? (b) probably could be addressed by moving the code to the
beginning of verify_local_APIC(), immediately ahead of which you
insert both call sites. (At which point the function may want naming
verify_IO_APIC() for consistency, but that's surely minor.)

As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
similarly troublesome in that case? Aiui it would not hurt only if
the LAPIC also isn't (going to be) set up. Then again the flag,
among other things, signals a zero base address found in the ACPI or
MP tables, so I guess this is a (largely) separate issue anyway.

Jan
Roger Pau Monné July 10, 2023, 2:43 p.m. UTC | #2
On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
> On 07.07.2023 11:53, Roger Pau Monne wrote:
> > The current logic to init the local APIC and the IO-APIC does init the
> > former first before doing any kind of sanitation on the IO-APIC pin
> > configuration.  It's already noted on enable_IO_APIC() that Xen
> > shouldn't trust the IO-APIC being empty at bootup.
> > 
> > At XenServer we have a system where the IO-APIC 0 is handed to Xen
> > with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> > with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> > APIC is enabled periodic injections from such pin cause a storm of
> > errors:
> > 
> > APIC error on CPU0: 00(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > 
> > That prevents Xen from booting.
> 
> And I expect no RTE is in ExtInt mode, so one might conclude that
> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
> of course there's then the question whether to change the RTE
> rather than masking it. What do ACPI tables say?

There's one relevant override:

[668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
[669h 1641   1]                       Length : 0A
[66Ah 1642   1]                          Bus : 00
[66Bh 1643   1]                       Source : 00
[66Ch 1644   4]                    Interrupt : 00000002
[670h 1648   2]        Flags (decoded below) : 0000
                                    Polarity : 0
                                Trigger Mode : 0

So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
connected to.

> > Fix this by moving the masking of IO-APIC pins ahead of the enabling
> > of the local APIC.  Note that before doing such masking Xen attempts
> > to detect the pin where the legacy i8259 is connected, and that logic
> > relies on the pin being unmasked, hence the logic is also moved ahead
> > of enabling the local APIC.
> 
> A comma after "masking" might help readers here.
> 
> > --- a/xen/arch/x86/apic.c
> > +++ b/xen/arch/x86/apic.c
> > @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
> >          return -1;
> >      }
> >  
> > +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
> > +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
> > +        sanitize_IO_APIC();
> 
> I'm a little puzzled by the smp_found_config part of the check here,
> but not in smp_prepare_cpus(). What's the reason for (a) the check
> and (b) the difference?

This just mimics what gates the call to setup_IO_APIC() in that same
function.  It makes no sense to call sanitize_IO_APIC() if
setup_IO_APIC() is not called, and I wasn't planning on changing the
logic that gates the call setup_IO_APIC() in this patch.

I did note the difference with smp_prepare_cpus(), and I think we
should look at unifying those paths, but didn't want to do it as part
of this fix.

> Isn't checking nr_ioapics sufficient in this
> regard? (b) probably could be addressed by moving the code to the
> beginning of verify_local_APIC(), immediately ahead of which you
> insert both call sites. (At which point the function may want naming
> verify_IO_APIC() for consistency, but that's surely minor.)

I wanted the call to sanitize_IO_APIC() to be done at the same level
where the call to setup_IO_APIC() is done, because it makes the logic
clearer.

> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
> similarly troublesome in that case?

skip_ioapic_setup is set when the IO-APIC address in the MADT is
invalid, so in that case attempting to access IO-APIC registers in
that case won't lead to anything good.

> Aiui it would not hurt only if
> the LAPIC also isn't (going to be) set up. Then again the flag,
> among other things, signals a zero base address found in the ACPI or
> MP tables, so I guess this is a (largely) separate issue anyway.

Oh, yes, indeed.  See my reply above.

Thanks, Roger.
Jan Beulich July 11, 2023, 10:53 a.m. UTC | #3
On 10.07.2023 16:43, Roger Pau Monné wrote:
> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
>> On 07.07.2023 11:53, Roger Pau Monne wrote:
>>> The current logic to init the local APIC and the IO-APIC does init the
>>> former first before doing any kind of sanitation on the IO-APIC pin
>>> configuration.  It's already noted on enable_IO_APIC() that Xen
>>> shouldn't trust the IO-APIC being empty at bootup.
>>>
>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
>>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
>>> APIC is enabled periodic injections from such pin cause a storm of
>>> errors:
>>>
>>> APIC error on CPU0: 00(40), Received illegal vector
>>> APIC error on CPU0: 40(40), Received illegal vector
>>> APIC error on CPU0: 40(40), Received illegal vector
>>> APIC error on CPU0: 40(40), Received illegal vector
>>> APIC error on CPU0: 40(40), Received illegal vector
>>> APIC error on CPU0: 40(40), Received illegal vector
>>>
>>> That prevents Xen from booting.
>>
>> And I expect no RTE is in ExtInt mode, so one might conclude that
>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
>> of course there's then the question whether to change the RTE
>> rather than masking it. What do ACPI tables say?
> 
> There's one relevant override:
> 
> [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
> [669h 1641   1]                       Length : 0A
> [66Ah 1642   1]                          Bus : 00
> [66Bh 1643   1]                       Source : 00
> [66Ch 1644   4]                    Interrupt : 00000002
> [670h 1648   2]        Flags (decoded below) : 0000
>                                     Polarity : 0
>                                 Trigger Mode : 0
> 
> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
> connected to.

Then wouldn't we be better off converting that RTE to ExtInt? That
would allow PIC IRQs (not likely to exist, but still) to work
(without undue other side effects afaics), whereas masking this RTE
would not.

>>> --- a/xen/arch/x86/apic.c
>>> +++ b/xen/arch/x86/apic.c
>>> @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
>>>          return -1;
>>>      }
>>>  
>>> +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
>>> +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
>>> +        sanitize_IO_APIC();
>>
>> I'm a little puzzled by the smp_found_config part of the check here,
>> but not in smp_prepare_cpus(). What's the reason for (a) the check
>> and (b) the difference?
> 
> This just mimics what gates the call to setup_IO_APIC() in that same
> function.  It makes no sense to call sanitize_IO_APIC() if
> setup_IO_APIC() is not called, and I wasn't planning on changing the
> logic that gates the call setup_IO_APIC() in this patch.
> 
> I did note the difference with smp_prepare_cpus(), and I think we
> should look at unifying those paths, but didn't want to do it as part
> of this fix.

Well, consistency is one valid goal. But masking RTEs may need to be
done more aggressively than setting up the IO-APICs. In particular
if we're not to use them, we still want to mask all RTEs. Otherwise
we're likely to observe each IRQ to arrive via two separate routes
(once through the 8259 and once from an unmasked IO-APIC pin).

>> Isn't checking nr_ioapics sufficient in this
>> regard? (b) probably could be addressed by moving the code to the
>> beginning of verify_local_APIC(), immediately ahead of which you
>> insert both call sites. (At which point the function may want naming
>> verify_IO_APIC() for consistency, but that's surely minor.)
> 
> I wanted the call to sanitize_IO_APIC() to be done at the same level
> where the call to setup_IO_APIC() is done, because it makes the logic
> clearer.

Hmm, I see.

>> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
>> similarly troublesome in that case?
> 
> skip_ioapic_setup is set when the IO-APIC address in the MADT is
> invalid, so in that case attempting to access IO-APIC registers in
> that case won't lead to anything good.

Of course, as I did say as well. Still if for some IO-APICs we know
their addresses, we would still be able to deal with those (if we
were to stick to this mask-all-RTEs-early model).

Jan
Roger Pau Monné July 11, 2023, 1:02 p.m. UTC | #4
On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote:
> On 10.07.2023 16:43, Roger Pau Monné wrote:
> > On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
> >> On 07.07.2023 11:53, Roger Pau Monne wrote:
> >>> The current logic to init the local APIC and the IO-APIC does init the
> >>> former first before doing any kind of sanitation on the IO-APIC pin
> >>> configuration.  It's already noted on enable_IO_APIC() that Xen
> >>> shouldn't trust the IO-APIC being empty at bootup.
> >>>
> >>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
> >>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> >>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> >>> APIC is enabled periodic injections from such pin cause a storm of
> >>> errors:
> >>>
> >>> APIC error on CPU0: 00(40), Received illegal vector
> >>> APIC error on CPU0: 40(40), Received illegal vector
> >>> APIC error on CPU0: 40(40), Received illegal vector
> >>> APIC error on CPU0: 40(40), Received illegal vector
> >>> APIC error on CPU0: 40(40), Received illegal vector
> >>> APIC error on CPU0: 40(40), Received illegal vector
> >>>
> >>> That prevents Xen from booting.
> >>
> >> And I expect no RTE is in ExtInt mode, so one might conclude that
> >> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
> >> of course there's then the question whether to change the RTE
> >> rather than masking it. What do ACPI tables say?
> > 
> > There's one relevant override:
> > 
> > [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
> > [669h 1641   1]                       Length : 0A
> > [66Ah 1642   1]                          Bus : 00
> > [66Bh 1643   1]                       Source : 00
> > [66Ch 1644   4]                    Interrupt : 00000002
> > [670h 1648   2]        Flags (decoded below) : 0000
> >                                     Polarity : 0
> >                                 Trigger Mode : 0
> > 
> > So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
> > connected to.
> 
> Then wouldn't we be better off converting that RTE to ExtInt? That
> would allow PIC IRQs (not likely to exist, but still) to work
> (without undue other side effects afaics), whereas masking this RTE
> would not.

I'm kind of worry of trying to automate this logic.  Should we always
convert pin 0 to ExtInt if it's found unmasked, with and invalid
vector and there's a source override entry for the IRQ?

It seems weird to infer this just from the fact that pin 0 is all
zeroed out.

> >>> --- a/xen/arch/x86/apic.c
> >>> +++ b/xen/arch/x86/apic.c
> >>> @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
> >>>          return -1;
> >>>      }
> >>>  
> >>> +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
> >>> +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
> >>> +        sanitize_IO_APIC();
> >>
> >> I'm a little puzzled by the smp_found_config part of the check here,
> >> but not in smp_prepare_cpus(). What's the reason for (a) the check
> >> and (b) the difference?
> > 
> > This just mimics what gates the call to setup_IO_APIC() in that same
> > function.  It makes no sense to call sanitize_IO_APIC() if
> > setup_IO_APIC() is not called, and I wasn't planning on changing the
> > logic that gates the call setup_IO_APIC() in this patch.
> > 
> > I did note the difference with smp_prepare_cpus(), and I think we
> > should look at unifying those paths, but didn't want to do it as part
> > of this fix.
> 
> Well, consistency is one valid goal. But masking RTEs may need to be
> done more aggressively than setting up the IO-APICs. In particular
> if we're not to use them, we still want to mask all RTEs. Otherwise
> we're likely to observe each IRQ to arrive via two separate routes
> (once through the 8259 and once from an unmasked IO-APIC pin).

So avoid the smp_found_config check here and use the same condition as in
smp_prepare_cpus(), I will adjust the patch to that effect.

I do wonder why we don't simply mandate IO-APIC usage and get rid of
the code to handle the 8259.  Is it only to cope with systems that
have a broken IO-APIC configuration?  Because I don't think there are
x86 64bit systems without an IO-APIC?

> >> Isn't checking nr_ioapics sufficient in this
> >> regard? (b) probably could be addressed by moving the code to the
> >> beginning of verify_local_APIC(), immediately ahead of which you
> >> insert both call sites. (At which point the function may want naming
> >> verify_IO_APIC() for consistency, but that's surely minor.)
> > 
> > I wanted the call to sanitize_IO_APIC() to be done at the same level
> > where the call to setup_IO_APIC() is done, because it makes the logic
> > clearer.
> 
> Hmm, I see.
> 
> >> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
> >> similarly troublesome in that case?
> > 
> > skip_ioapic_setup is set when the IO-APIC address in the MADT is
> > invalid, so in that case attempting to access IO-APIC registers in
> > that case won't lead to anything good.
> 
> Of course, as I did say as well. Still if for some IO-APICs we know
> their addresses, we would still be able to deal with those (if we
> were to stick to this mask-all-RTEs-early model).

The issue is that ioapic_init_mappings() will refuse to process
further IO-APIC entries once an entry with address 0 is found.  We
could change that, but I would likely do it as an improvement rather
than tie it to this change.

Thanks, Roger.
Jan Beulich July 12, 2023, 10:07 a.m. UTC | #5
On 11.07.2023 15:02, Roger Pau Monné wrote:
> On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote:
>> On 10.07.2023 16:43, Roger Pau Monné wrote:
>>> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
>>>> On 07.07.2023 11:53, Roger Pau Monne wrote:
>>>>> The current logic to init the local APIC and the IO-APIC does init the
>>>>> former first before doing any kind of sanitation on the IO-APIC pin
>>>>> configuration.  It's already noted on enable_IO_APIC() that Xen
>>>>> shouldn't trust the IO-APIC being empty at bootup.
>>>>>
>>>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
>>>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
>>>>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
>>>>> APIC is enabled periodic injections from such pin cause a storm of
>>>>> errors:
>>>>>
>>>>> APIC error on CPU0: 00(40), Received illegal vector
>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>
>>>>> That prevents Xen from booting.
>>>>
>>>> And I expect no RTE is in ExtInt mode, so one might conclude that
>>>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
>>>> of course there's then the question whether to change the RTE
>>>> rather than masking it. What do ACPI tables say?
>>>
>>> There's one relevant override:
>>>
>>> [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
>>> [669h 1641   1]                       Length : 0A
>>> [66Ah 1642   1]                          Bus : 00
>>> [66Bh 1643   1]                       Source : 00
>>> [66Ch 1644   4]                    Interrupt : 00000002
>>> [670h 1648   2]        Flags (decoded below) : 0000
>>>                                     Polarity : 0
>>>                                 Trigger Mode : 0
>>>
>>> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
>>> connected to.
>>
>> Then wouldn't we be better off converting that RTE to ExtInt? That
>> would allow PIC IRQs (not likely to exist, but still) to work
>> (without undue other side effects afaics), whereas masking this RTE
>> would not.
> 
> I'm kind of worry of trying to automate this logic.  Should we always
> convert pin 0 to ExtInt if it's found unmasked, with and invalid
> vector and there's a source override entry for the IRQ?
> 
> It seems weird to infer this just from the fact that pin 0 is all
> zeroed out.

As you say in the earlier paragraph, it wouldn't be just that. It's
unmasked + bad vector + present source override (indicating that
nothing except perhaps a PIC is connected to the pin).

>>>>> --- a/xen/arch/x86/apic.c
>>>>> +++ b/xen/arch/x86/apic.c
>>>>> @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
>>>>>          return -1;
>>>>>      }
>>>>>  
>>>>> +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
>>>>> +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
>>>>> +        sanitize_IO_APIC();
>>>>
>>>> I'm a little puzzled by the smp_found_config part of the check here,
>>>> but not in smp_prepare_cpus(). What's the reason for (a) the check
>>>> and (b) the difference?
>>>
>>> This just mimics what gates the call to setup_IO_APIC() in that same
>>> function.  It makes no sense to call sanitize_IO_APIC() if
>>> setup_IO_APIC() is not called, and I wasn't planning on changing the
>>> logic that gates the call setup_IO_APIC() in this patch.
>>>
>>> I did note the difference with smp_prepare_cpus(), and I think we
>>> should look at unifying those paths, but didn't want to do it as part
>>> of this fix.
>>
>> Well, consistency is one valid goal. But masking RTEs may need to be
>> done more aggressively than setting up the IO-APICs. In particular
>> if we're not to use them, we still want to mask all RTEs. Otherwise
>> we're likely to observe each IRQ to arrive via two separate routes
>> (once through the 8259 and once from an unmasked IO-APIC pin).
> 
> So avoid the smp_found_config check here and use the same condition as in
> smp_prepare_cpus(), I will adjust the patch to that effect.
> 
> I do wonder why we don't simply mandate IO-APIC usage and get rid of
> the code to handle the 8259.  Is it only to cope with systems that
> have a broken IO-APIC configuration?  Because I don't think there are
> x86 64bit systems without an IO-APIC?

There shouldn't be, yes. And Andrew has been suggesting the same on a
number of occasions, iirc. Yet trying to remove that code is perhaps
more risky than simply keeping it, which may be (besides time
constraints) why nobody has made an attempt so far.

>>>> Isn't checking nr_ioapics sufficient in this
>>>> regard? (b) probably could be addressed by moving the code to the
>>>> beginning of verify_local_APIC(), immediately ahead of which you
>>>> insert both call sites. (At which point the function may want naming
>>>> verify_IO_APIC() for consistency, but that's surely minor.)
>>>
>>> I wanted the call to sanitize_IO_APIC() to be done at the same level
>>> where the call to setup_IO_APIC() is done, because it makes the logic
>>> clearer.
>>
>> Hmm, I see.
>>
>>>> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
>>>> similarly troublesome in that case?
>>>
>>> skip_ioapic_setup is set when the IO-APIC address in the MADT is
>>> invalid, so in that case attempting to access IO-APIC registers in
>>> that case won't lead to anything good.
>>
>> Of course, as I did say as well. Still if for some IO-APICs we know
>> their addresses, we would still be able to deal with those (if we
>> were to stick to this mask-all-RTEs-early model).
> 
> The issue is that ioapic_init_mappings() will refuse to process
> further IO-APIC entries once an entry with address 0 is found.  We
> could change that, but I would likely do it as an improvement rather
> than tie it to this change.

Sure, I didn't mean to suggest that such needs to be done right here
(it could though, but chances are that this would quickly grow too
much). What we want to do right here is make the change maximally
useful without needing to tweak overly many other places in the code.

Jan
Roger Pau Monné July 12, 2023, 1:53 p.m. UTC | #6
On Wed, Jul 12, 2023 at 12:07:43PM +0200, Jan Beulich wrote:
> On 11.07.2023 15:02, Roger Pau Monné wrote:
> > On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote:
> >> On 10.07.2023 16:43, Roger Pau Monné wrote:
> >>> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
> >>>> On 07.07.2023 11:53, Roger Pau Monne wrote:
> >>>>> The current logic to init the local APIC and the IO-APIC does init the
> >>>>> former first before doing any kind of sanitation on the IO-APIC pin
> >>>>> configuration.  It's already noted on enable_IO_APIC() that Xen
> >>>>> shouldn't trust the IO-APIC being empty at bootup.
> >>>>>
> >>>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
> >>>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> >>>>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> >>>>> APIC is enabled periodic injections from such pin cause a storm of
> >>>>> errors:
> >>>>>
> >>>>> APIC error on CPU0: 00(40), Received illegal vector
> >>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>
> >>>>> That prevents Xen from booting.
> >>>>
> >>>> And I expect no RTE is in ExtInt mode, so one might conclude that
> >>>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
> >>>> of course there's then the question whether to change the RTE
> >>>> rather than masking it. What do ACPI tables say?
> >>>
> >>> There's one relevant override:
> >>>
> >>> [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
> >>> [669h 1641   1]                       Length : 0A
> >>> [66Ah 1642   1]                          Bus : 00
> >>> [66Bh 1643   1]                       Source : 00
> >>> [66Ch 1644   4]                    Interrupt : 00000002
> >>> [670h 1648   2]        Flags (decoded below) : 0000
> >>>                                     Polarity : 0
> >>>                                 Trigger Mode : 0
> >>>
> >>> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
> >>> connected to.
> >>
> >> Then wouldn't we be better off converting that RTE to ExtInt? That
> >> would allow PIC IRQs (not likely to exist, but still) to work
> >> (without undue other side effects afaics), whereas masking this RTE
> >> would not.
> > 
> > I'm kind of worry of trying to automate this logic.  Should we always
> > convert pin 0 to ExtInt if it's found unmasked, with and invalid
> > vector and there's a source override entry for the IRQ?
> > 
> > It seems weird to infer this just from the fact that pin 0 is all
> > zeroed out.
> 
> As you say in the earlier paragraph, it wouldn't be just that. It's
> unmasked + bad vector + present source override (indicating that
> nothing except perhaps a PIC is connected to the pin).

I can do this as a separate patch, but not really here IMO.  The
purpose of this patch is strictly to perform the IO-APIC sanitation
ahead of enabling the local APIC.  I will add a followup patch to the
series, albeit I'm unconvinced we want to infer IO-APIC pin
configuration based on firmware miss configurations.

Thanks, Roger.
Jan Beulich July 12, 2023, 2:50 p.m. UTC | #7
On 12.07.2023 15:53, Roger Pau Monné wrote:
> On Wed, Jul 12, 2023 at 12:07:43PM +0200, Jan Beulich wrote:
>> On 11.07.2023 15:02, Roger Pau Monné wrote:
>>> On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote:
>>>> On 10.07.2023 16:43, Roger Pau Monné wrote:
>>>>> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
>>>>>> On 07.07.2023 11:53, Roger Pau Monne wrote:
>>>>>>> The current logic to init the local APIC and the IO-APIC does init the
>>>>>>> former first before doing any kind of sanitation on the IO-APIC pin
>>>>>>> configuration.  It's already noted on enable_IO_APIC() that Xen
>>>>>>> shouldn't trust the IO-APIC being empty at bootup.
>>>>>>>
>>>>>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
>>>>>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
>>>>>>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
>>>>>>> APIC is enabled periodic injections from such pin cause a storm of
>>>>>>> errors:
>>>>>>>
>>>>>>> APIC error on CPU0: 00(40), Received illegal vector
>>>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>>> APIC error on CPU0: 40(40), Received illegal vector
>>>>>>>
>>>>>>> That prevents Xen from booting.
>>>>>>
>>>>>> And I expect no RTE is in ExtInt mode, so one might conclude that
>>>>>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
>>>>>> of course there's then the question whether to change the RTE
>>>>>> rather than masking it. What do ACPI tables say?
>>>>>
>>>>> There's one relevant override:
>>>>>
>>>>> [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
>>>>> [669h 1641   1]                       Length : 0A
>>>>> [66Ah 1642   1]                          Bus : 00
>>>>> [66Bh 1643   1]                       Source : 00
>>>>> [66Ch 1644   4]                    Interrupt : 00000002
>>>>> [670h 1648   2]        Flags (decoded below) : 0000
>>>>>                                     Polarity : 0
>>>>>                                 Trigger Mode : 0
>>>>>
>>>>> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
>>>>> connected to.
>>>>
>>>> Then wouldn't we be better off converting that RTE to ExtInt? That
>>>> would allow PIC IRQs (not likely to exist, but still) to work
>>>> (without undue other side effects afaics), whereas masking this RTE
>>>> would not.
>>>
>>> I'm kind of worry of trying to automate this logic.  Should we always
>>> convert pin 0 to ExtInt if it's found unmasked, with and invalid
>>> vector and there's a source override entry for the IRQ?
>>>
>>> It seems weird to infer this just from the fact that pin 0 is all
>>> zeroed out.
>>
>> As you say in the earlier paragraph, it wouldn't be just that. It's
>> unmasked + bad vector + present source override (indicating that
>> nothing except perhaps a PIC is connected to the pin).
> 
> I can do this as a separate patch, but not really here IMO.  The
> purpose of this patch is strictly to perform the IO-APIC sanitation
> ahead of enabling the local APIC.  I will add a followup patch to the
> series, albeit I'm unconvinced we want to infer IO-APIC pin
> configuration based on firmware miss configurations.

Hmm. The question really is which of the changes we want to backport.
That one should be first. While it's largely guesswork, I'd be more
inclined to take the change that has less of an effect overall.

That said I can see that Linux have their enable_IO_APIC() calls
also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
setup_local_APIC()). IOW with your change we'd do the masking yet
earlier than them. This may of course even be advantageous, but
there may also be good reasons for the sequence they're using.

Jan
Roger Pau Monné July 12, 2023, 3:41 p.m. UTC | #8
On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote:
> On 12.07.2023 15:53, Roger Pau Monné wrote:
> > On Wed, Jul 12, 2023 at 12:07:43PM +0200, Jan Beulich wrote:
> >> On 11.07.2023 15:02, Roger Pau Monné wrote:
> >>> On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote:
> >>>> On 10.07.2023 16:43, Roger Pau Monné wrote:
> >>>>> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
> >>>>>> On 07.07.2023 11:53, Roger Pau Monne wrote:
> >>>>>>> The current logic to init the local APIC and the IO-APIC does init the
> >>>>>>> former first before doing any kind of sanitation on the IO-APIC pin
> >>>>>>> configuration.  It's already noted on enable_IO_APIC() that Xen
> >>>>>>> shouldn't trust the IO-APIC being empty at bootup.
> >>>>>>>
> >>>>>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen
> >>>>>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> >>>>>>> with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> >>>>>>> APIC is enabled periodic injections from such pin cause a storm of
> >>>>>>> errors:
> >>>>>>>
> >>>>>>> APIC error on CPU0: 00(40), Received illegal vector
> >>>>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>>> APIC error on CPU0: 40(40), Received illegal vector
> >>>>>>>
> >>>>>>> That prevents Xen from booting.
> >>>>>>
> >>>>>> And I expect no RTE is in ExtInt mode, so one might conclude that
> >>>>>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
> >>>>>> of course there's then the question whether to change the RTE
> >>>>>> rather than masking it. What do ACPI tables say?
> >>>>>
> >>>>> There's one relevant override:
> >>>>>
> >>>>> [668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
> >>>>> [669h 1641   1]                       Length : 0A
> >>>>> [66Ah 1642   1]                          Bus : 00
> >>>>> [66Bh 1643   1]                       Source : 00
> >>>>> [66Ch 1644   4]                    Interrupt : 00000002
> >>>>> [670h 1648   2]        Flags (decoded below) : 0000
> >>>>>                                     Polarity : 0
> >>>>>                                 Trigger Mode : 0
> >>>>>
> >>>>> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
> >>>>> connected to.
> >>>>
> >>>> Then wouldn't we be better off converting that RTE to ExtInt? That
> >>>> would allow PIC IRQs (not likely to exist, but still) to work
> >>>> (without undue other side effects afaics), whereas masking this RTE
> >>>> would not.
> >>>
> >>> I'm kind of worry of trying to automate this logic.  Should we always
> >>> convert pin 0 to ExtInt if it's found unmasked, with and invalid
> >>> vector and there's a source override entry for the IRQ?
> >>>
> >>> It seems weird to infer this just from the fact that pin 0 is all
> >>> zeroed out.
> >>
> >> As you say in the earlier paragraph, it wouldn't be just that. It's
> >> unmasked + bad vector + present source override (indicating that
> >> nothing except perhaps a PIC is connected to the pin).
> > 
> > I can do this as a separate patch, but not really here IMO.  The
> > purpose of this patch is strictly to perform the IO-APIC sanitation
> > ahead of enabling the local APIC.  I will add a followup patch to the
> > series, albeit I'm unconvinced we want to infer IO-APIC pin
> > configuration based on firmware miss configurations.
> 
> Hmm. The question really is which of the changes we want to backport.
> That one should be first. While it's largely guesswork, I'd be more
> inclined to take the change that has less of an effect overall.

My views would be the other way around I think.  Current code already
has a comment to notice that IO-APIC pins might be wrongly unmasked,
and there's also logic for masking them when the IO-APICs are
initialized.  The fact that such logic is placed after the local APIC
has been initialized is IMO a bug, as having unmasked unconfigured
IO-APIC pins when the local APIC is enabled should be avoided.

> That said I can see that Linux have their enable_IO_APIC() calls
> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
> setup_local_APIC()). IOW with your change we'd do the masking yet
> earlier than them. This may of course even be advantageous, but
> there may also be good reasons for the sequence they're using.

Linux calls enable_IO_APIC() before setting up the local
APIC LVT Error vector (that's done in end_local_APIC_setup()).  That
logic seems to be introduced by commit:

1c69524c2e5b x86: clear IO_APIC before enabing apic error vector.

Might it be less controversial to do like Linux does and call
enable_IO_APIC() before the local APIC ESR is setup?

Thanks, Roger.
Jan Beulich July 13, 2023, 7:49 a.m. UTC | #9
On 12.07.2023 17:41, Roger Pau Monné wrote:
> On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote:
>> Hmm. The question really is which of the changes we want to backport.
>> That one should be first. While it's largely guesswork, I'd be more
>> inclined to take the change that has less of an effect overall.
> 
> My views would be the other way around I think.  Current code already
> has a comment to notice that IO-APIC pins might be wrongly unmasked,
> and there's also logic for masking them when the IO-APICs are
> initialized.  The fact that such logic is placed after the local APIC
> has been initialized is IMO a bug, as having unmasked unconfigured
> IO-APIC pins when the local APIC is enabled should be avoided.

Hmm, if you really meant this (and not setting up / unmasking of
LVTERR), then your change would still be insufficient. We may enable
the APIC for the BSP in init_bsp_APIC() (which is called quite a bit
earlier), and the APIC may also have been enabled by firmware already,
so I don't view this argument as fully convincing. (That said,
init_bsp_APIC() calls clear_local_APIC(), so while the LAPIC may be
enabled, errors would be reported only in ESR, not by delivering
interrupts.)

Which gets me back to another aspect of your scenario that I haven't
fully understood: In the description you say that booting fails.
Since we handle the errors, the implication is that the pin remains
constantly active (thus triggering errors over and over again). Yet
how would this not already cause problems ahead of smp_prepare_cpus()
if the LAPIC was already enabled? Wouldn't we need to do part of
clear_local_APIC() from init_bsp_APIC() before bailing from there
when smp_found_config is set? ("Part of" because as per the comment
at the top of init_bsp_APIC() we apparently would need to leave LVT0
[and then perhaps also LVT1] unmasked.)

>> That said I can see that Linux have their enable_IO_APIC() calls
>> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
>> setup_local_APIC()). IOW with your change we'd do the masking yet
>> earlier than them. This may of course even be advantageous, but
>> there may also be good reasons for the sequence they're using.
> 
> Linux calls enable_IO_APIC() before setting up the local
> APIC LVT Error vector (that's done in end_local_APIC_setup()).  That
> logic seems to be introduced by commit:
> 
> 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector.
> 
> Might it be less controversial to do like Linux does and call
> enable_IO_APIC() before the local APIC ESR is setup?

You already do so, just that you do it yet earlier. I'm not
convinced it needs doing from the middle of setup_local_APIC() (or,
like nowaday's Linux has it, with ESR / LVTERR setup split to a
separate function, and enable_IO_APIC() called between those two
LAPIC related calls). You also disliked putting the call at the
beginning of setup_local_APIC(), so putting it in the middle of it
might be yet worse when taking that perspective. (Another downside
of calling it from anywhere in setup_local_APIC() is that this
would be another __init function called from a non-__init one. We
have examples of such, and keying the call to "bsp" would leave it
safe, but avoiding such calls when we easily can is probably a
worthwhile secondary goal.)

Question is whether it can sensibly be moved at least a little
later: verify_local_APIC() isn't of much use anyway, first and
foremost because we ignore its return value. And connect_bsp_APIC()
largely is concerned about leaving PIC mode. So maybe put the call
immediately ahead of the setup_local_APIC(true) ones?

A further question is whether, considering that Linux continues to
use that name, we wouldn't be better off not renaming the function.

One other thing I finally figured was confusing me in the
description of the patch; re-quoting that paragraph here:

"Fix this by moving the masking of IO-APIC pins ahead of the enabling
 of the local APIC.  Note that before doing such masking Xen attempts
 to detect the pin where the legacy i8259 is connected, and that logic
 relies on the pin being unmasked, hence the logic is also moved ahead
 of enabling the local APIC."

The last sentence saying "also" is kind of odd with the first one
already stating this very movement. To me it's therefore unclear what
exactly the second sentence is intended to be telling me. I guess you
want to express that together with the making the detection logic is
also moved (i.e. the entire function is called earlier), but I'm
afraid this isn't the only way to read that second sentence.

Jan
Roger Pau Monné July 13, 2023, 11:29 a.m. UTC | #10
On Thu, Jul 13, 2023 at 09:49:14AM +0200, Jan Beulich wrote:
> On 12.07.2023 17:41, Roger Pau Monné wrote:
> > On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote:
> >> Hmm. The question really is which of the changes we want to backport.
> >> That one should be first. While it's largely guesswork, I'd be more
> >> inclined to take the change that has less of an effect overall.
> > 
> > My views would be the other way around I think.  Current code already
> > has a comment to notice that IO-APIC pins might be wrongly unmasked,
> > and there's also logic for masking them when the IO-APICs are
> > initialized.  The fact that such logic is placed after the local APIC
> > has been initialized is IMO a bug, as having unmasked unconfigured
> > IO-APIC pins when the local APIC is enabled should be avoided.
> 
> Hmm, if you really meant this (and not setting up / unmasking of
> LVTERR), then your change would still be insufficient. We may enable
> the APIC for the BSP in init_bsp_APIC() (which is called quite a bit
> earlier), and the APIC may also have been enabled by firmware already,
> so I don't view this argument as fully convincing. (That said,
> init_bsp_APIC() calls clear_local_APIC(), so while the LAPIC may be
> enabled, errors would be reported only in ESR, not by delivering
> interrupts.)

Indeed.  I should mention the error vector explicitly, as it's not the
local APIC being enabled, but the error vector being setup which
causes the loop here.

> Which gets me back to another aspect of your scenario that I haven't
> fully understood: In the description you say that booting fails.
> Since we handle the errors, the implication is that the pin remains
> constantly active (thus triggering errors over and over again). Yet
> how would this not already cause problems ahead of smp_prepare_cpus()

Before smp_prepare_cpus() (and explicitly before setup_local_APIC())
the error vector is not setup (or that would be my expectation), so
errors are just reported on the ESR, but there's no vector injected.

> if the LAPIC was already enabled? Wouldn't we need to do part of
> clear_local_APIC() from init_bsp_APIC() before bailing from there
> when smp_found_config is set? ("Part of" because as per the comment
> at the top of init_bsp_APIC() we apparently would need to leave LVT0
> [and then perhaps also LVT1] unmasked.)

Right, if the APIC is already enabled by the firmware we do currently
rely on the error vector being masked.  OI think this is a reasonable
expectation, as handing being done with an unknown error vector
unmasked in the local APIC would be bad.  Even with the CPU running
with interrupts disabled we would get error vectors set in IRR.

> 
> >> That said I can see that Linux have their enable_IO_APIC() calls
> >> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
> >> setup_local_APIC()). IOW with your change we'd do the masking yet
> >> earlier than them. This may of course even be advantageous, but
> >> there may also be good reasons for the sequence they're using.
> > 
> > Linux calls enable_IO_APIC() before setting up the local
> > APIC LVT Error vector (that's done in end_local_APIC_setup()).  That
> > logic seems to be introduced by commit:
> > 
> > 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector.
> > 
> > Might it be less controversial to do like Linux does and call
> > enable_IO_APIC() before the local APIC ESR is setup?
> 
> You already do so, just that you do it yet earlier. I'm not
> convinced it needs doing from the middle of setup_local_APIC() (or,
> like nowaday's Linux has it, with ESR / LVTERR setup split to a
> separate function, and enable_IO_APIC() called between those two
> LAPIC related calls). You also disliked putting the call at the
> beginning of setup_local_APIC(), so putting it in the middle of it
> might be yet worse when taking that perspective. (Another downside
> of calling it from anywhere in setup_local_APIC() is that this
> would be another __init function called from a non-__init one. We
> have examples of such, and keying the call to "bsp" would leave it
> safe, but avoiding such calls when we easily can is probably a
> worthwhile secondary goal.)

Yes, it's not my preference to call it from setup_local_APIC(),
neither splitting the setup of LVTERR/ESR into a separate function.

> 
> Question is whether it can sensibly be moved at least a little
> later: verify_local_APIC() isn't of much use anyway, first and
> foremost because we ignore its return value. And connect_bsp_APIC()
> largely is concerned about leaving PIC mode. So maybe put the call
> immediately ahead of the setup_local_APIC(true) ones?

Strictly speaking for the issue I have at hand it needs to be done
ahead of setting up LVTERR/ESR.

> A further question is whether, considering that Linux continues to
> use that name, we wouldn't be better off not renaming the function.

Yes, I will leave the name alone, albeit it's IMO a bit misleading,
it's not actually enabling anything, but rather masking the pins and
figuring out the pin where the i8259 might be connected.

> One other thing I finally figured was confusing me in the
> description of the patch; re-quoting that paragraph here:
> 
> "Fix this by moving the masking of IO-APIC pins ahead of the enabling
>  of the local APIC.  Note that before doing such masking Xen attempts
>  to detect the pin where the legacy i8259 is connected, and that logic
>  relies on the pin being unmasked, hence the logic is also moved ahead
>  of enabling the local APIC."
> 
> The last sentence saying "also" is kind of odd with the first one
> already stating this very movement. To me it's therefore unclear what
> exactly the second sentence is intended to be telling me. I guess you
> want to express that together with the making the detection logic is
> also moved (i.e. the entire function is called earlier), but I'm
> afraid this isn't the only way to read that second sentence.

Yes, you are reading it as I was intending.  It's merely a logic that
the i8259 detection is also done slightly earlier, as it relies on the
RTE masking not being tampered with.

What about:

"Fix this by moving the masking of IO-APIC pins ahead of the enabling
 of the local APIC.  Note that before doing such masking Xen attempts
 to detect the pin where the legacy i8259 is connected, and that logic
 relies on the pin being unmasked, hence the logic is also moved ahead
 of enabling the local APIC."

"Move the masking of the IO-APIC pins ahead of the setup of the local
 APIC.  This has the side effect of also moving the detection of the
 pin where the i8259 is connected, as it must be done before masking
 any pins."

Is the aboce any better?  Would you rather prefer me to drop any
mention of the i8259 detection being moved? (as it's just a side
effect of moving the masking).

So to recap, I think we are in agreement that calling enable_IO_APIC()
just ahead of the call to setup_local_APIC() is the preferred
solution?

Thanks, Roger.
Jan Beulich July 13, 2023, 12:18 p.m. UTC | #11
On 13.07.2023 13:29, Roger Pau Monné wrote:
> On Thu, Jul 13, 2023 at 09:49:14AM +0200, Jan Beulich wrote:
>> One other thing I finally figured was confusing me in the
>> description of the patch; re-quoting that paragraph here:
>>
>> "Fix this by moving the masking of IO-APIC pins ahead of the enabling
>>  of the local APIC.  Note that before doing such masking Xen attempts
>>  to detect the pin where the legacy i8259 is connected, and that logic
>>  relies on the pin being unmasked, hence the logic is also moved ahead
>>  of enabling the local APIC."
>>
>> The last sentence saying "also" is kind of odd with the first one
>> already stating this very movement. To me it's therefore unclear what
>> exactly the second sentence is intended to be telling me. I guess you
>> want to express that together with the making the detection logic is
>> also moved (i.e. the entire function is called earlier), but I'm
>> afraid this isn't the only way to read that second sentence.
> 
> Yes, you are reading it as I was intending.  It's merely a logic that
> the i8259 detection is also done slightly earlier, as it relies on the
> RTE masking not being tampered with.
> 
> What about:
> 
> "Fix this by moving the masking of IO-APIC pins ahead of the enabling
>  of the local APIC.  Note that before doing such masking Xen attempts
>  to detect the pin where the legacy i8259 is connected, and that logic
>  relies on the pin being unmasked, hence the logic is also moved ahead
>  of enabling the local APIC."
> 
> "Move the masking of the IO-APIC pins ahead of the setup of the local
>  APIC.  This has the side effect of also moving the detection of the
>  pin where the i8259 is connected, as it must be done before masking
>  any pins."
> 
> Is the aboce any better?

It is, thanks.

>  Would you rather prefer me to drop any
> mention of the i8259 detection being moved? (as it's just a side
> effect of moving the masking).

No, better mention this aspect.

> So to recap, I think we are in agreement that calling enable_IO_APIC()
> just ahead of the call to setup_local_APIC() is the preferred
> solution?

Well, yes and no. My preferred course of action for the issue at hand
would be to convert RTE 0 to ExtInt (under the mentioned set of
conditions). I agree though that we also want to move the masking of
RTEs, and for that I further agree with the placement mentioned above.

That said, you're the contributor, so it's still up to you what you
submit in what order (you could still decide to leave out the RTE
conversion change). The two of us not reaching agreement merely makes
a little more difficult to decide which of the changes to backport
then.

Jan
Roger Pau Monné July 14, 2023, 4:05 p.m. UTC | #12
On Thu, Jul 13, 2023 at 02:18:29PM +0200, Jan Beulich wrote:
> On 13.07.2023 13:29, Roger Pau Monné wrote:
> > So to recap, I think we are in agreement that calling enable_IO_APIC()
> > just ahead of the call to setup_local_APIC() is the preferred
> > solution?
> 
> Well, yes and no. My preferred course of action for the issue at hand
> would be to convert RTE 0 to ExtInt (under the mentioned set of
> conditions). I agree though that we also want to move the masking of
> RTEs, and for that I further agree with the placement mentioned above.

So I hacked up a change to set pin 0 to ExtINT mode (and avoid doing
the masking early), and I got:

(XEN) spurious 8259A interrupt: IRQ7.

This was a single interrupt, but still I think the masking is the
critical part to get backported.

Thanks, Roger.
Jan Beulich July 17, 2023, 6:40 a.m. UTC | #13
On 14.07.2023 18:05, Roger Pau Monné wrote:
> On Thu, Jul 13, 2023 at 02:18:29PM +0200, Jan Beulich wrote:
>> On 13.07.2023 13:29, Roger Pau Monné wrote:
>>> So to recap, I think we are in agreement that calling enable_IO_APIC()
>>> just ahead of the call to setup_local_APIC() is the preferred
>>> solution?
>>
>> Well, yes and no. My preferred course of action for the issue at hand
>> would be to convert RTE 0 to ExtInt (under the mentioned set of
>> conditions). I agree though that we also want to move the masking of
>> RTEs, and for that I further agree with the placement mentioned above.
> 
> So I hacked up a change to set pin 0 to ExtINT mode (and avoid doing
> the masking early), and I got:
> 
> (XEN) spurious 8259A interrupt: IRQ7.
> 
> This was a single interrupt, but still I think the masking is the
> critical part to get backported.

One way to look at it, yes. My perspective is different though: If
there truly is a (spurious or not) IRQ at the 8259, then it needs
dealing with sooner or later (in the process of booting). Aiui if
we masked pin 0 initially and then later unmasked it (after
switching to ExtInt, if not set so by firmware), we'd still see a
spurious IRQ7.

Jan
Roger Pau Monné July 17, 2023, 12:04 p.m. UTC | #14
On Mon, Jul 17, 2023 at 08:40:05AM +0200, Jan Beulich wrote:
> On 14.07.2023 18:05, Roger Pau Monné wrote:
> > On Thu, Jul 13, 2023 at 02:18:29PM +0200, Jan Beulich wrote:
> >> On 13.07.2023 13:29, Roger Pau Monné wrote:
> >>> So to recap, I think we are in agreement that calling enable_IO_APIC()
> >>> just ahead of the call to setup_local_APIC() is the preferred
> >>> solution?
> >>
> >> Well, yes and no. My preferred course of action for the issue at hand
> >> would be to convert RTE 0 to ExtInt (under the mentioned set of
> >> conditions). I agree though that we also want to move the masking of
> >> RTEs, and for that I further agree with the placement mentioned above.
> > 
> > So I hacked up a change to set pin 0 to ExtINT mode (and avoid doing
> > the masking early), and I got:
> > 
> > (XEN) spurious 8259A interrupt: IRQ7.
> > 
> > This was a single interrupt, but still I think the masking is the
> > critical part to get backported.
> 
> One way to look at it, yes. My perspective is different though: If
> there truly is a (spurious or not) IRQ at the 8259, then it needs
> dealing with sooner or later (in the process of booting). Aiui if
> we masked pin 0 initially and then later unmasked it (after
> switching to ExtInt, if not set so by firmware), we'd still see a
> spurious IRQ7.

Hm, I see.  I'm not very familiar with 8259, I was expecting that
maybe we would mask the IRQ before getting such spurious injection,
but I'm not even sure that's possible or whether it would work.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index f71474d47dd1..9197b9532480 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1476,6 +1476,10 @@  int __init APIC_init_uniprocessor (void)
         return -1;
     }
 
+    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
+        /* Sanitize the IO-APIC pins before enabling the local APIC. */
+        sanitize_IO_APIC();
+
     verify_local_APIC();
 
     connect_bsp_APIC();
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 424b0e1af8f4..dfa681846255 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -118,6 +118,7 @@  bool bogus_8259A_irq(unsigned int irq);
 int i8259A_suspend(void);
 int i8259A_resume(void);
 
+void sanitize_IO_APIC(void);
 void setup_IO_APIC(void);
 void disable_IO_APIC(void);
 void setup_ioapic_dest(void);
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 9b8a972cf570..120c231e0302 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1273,7 +1273,7 @@  static void cf_check _print_IO_APIC_keyhandler(unsigned char key)
     __print_IO_APIC(0);
 }
 
-static void __init enable_IO_APIC(void)
+void __init sanitize_IO_APIC(void)
 {
     int i8259_apic, i8259_pin;
     int i, apic;
@@ -2067,8 +2067,6 @@  static void __init ioapic_pm_state_alloc(void)
 
 void __init setup_IO_APIC(void)
 {
-    enable_IO_APIC();
-
     if (acpi_ioapic)
         io_apic_irqs = ~0;	/* all IRQs go through IOAPIC */
     else
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index cf9bb220f90d..f9e27a23d383 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1221,6 +1221,10 @@  void __init smp_prepare_cpus(void)
         goto init_uniprocessor;
     }
 
+    if ( !skip_ioapic_setup && nr_ioapics )
+        /* Sanitize the IO-APIC pins before enabling the local APIC. */
+        sanitize_IO_APIC();
+
     verify_local_APIC();
 
     connect_bsp_APIC();