
x86/IRQ: bump max number of guests for a shared IRQ to 31

Message ID 1606780777-30718-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State New, archived
Series x86/IRQ: bump max number of guests for a shared IRQ to 31

Commit Message

Igor Druzhinin Nov. 30, 2020, 11:59 p.m. UTC
The current limit of 7 is too restrictive for modern systems, where one GSI
could be shared by potentially many PCI INTx sources, each of which
corresponds to a device passed through to its own guest. Some systems do not
apply due diligence in swizzling INTx links when, e.g., INTA is declared as
the interrupt pin for the majority of PCI devices behind a single router,
resulting in overuse of a GSI.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---

If people think that would make sense, I can rework the array into a list of
domain pointers to avoid the limit.

---
 xen/arch/x86/irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Beulich Dec. 2, 2020, 9:25 a.m. UTC | #1
On 01.12.2020 00:59, Igor Druzhinin wrote:
> The current limit of 7 is too restrictive for modern systems, where one GSI
> could be shared by potentially many PCI INTx sources, each of which
> corresponds to a device passed through to its own guest. Some systems do not
> apply due diligence in swizzling INTx links when, e.g., INTA is declared as
> the interrupt pin for the majority of PCI devices behind a single router,
> resulting in overuse of a GSI.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> 
> If people think that would make sense, I can rework the array into a list of
> domain pointers to avoid the limit.

Not sure about this. What is the biggest number you've found on any
system? (I assume the chosen value of 31 has some headroom.)

Instead I'm wondering whether this wouldn't better be a Kconfig
setting (or even command line controllable). There don't look to be
any restrictions on the precise value chosen (i.e. no need to be 2**n-1,
as is the case for the old and new values here, for whatever reason), so a
simple permitted range of like 4...64 would seem fine to specify.
Whether the default then would want to be 8 (close to the current
7) or higher (around the actually observed maximum) is a different
question.

Jan
Igor Druzhinin Dec. 2, 2020, 2:53 p.m. UTC | #2
On 02/12/2020 09:25, Jan Beulich wrote:
> On 01.12.2020 00:59, Igor Druzhinin wrote:
>> The current limit of 7 is too restrictive for modern systems, where one GSI
>> could be shared by potentially many PCI INTx sources, each of which
>> corresponds to a device passed through to its own guest. Some systems do not
>> apply due diligence in swizzling INTx links when, e.g., INTA is declared as
>> the interrupt pin for the majority of PCI devices behind a single router,
>> resulting in overuse of a GSI.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>>
>> If people think that would make sense, I can rework the array into a list of
>> domain pointers to avoid the limit.
> 
> Not sure about this. What is the biggest number you've found on any
> system? (I assume the chosen value of 31 has some headroom.)

The value 31 was taken as a practical maximum for one specific HP system
when all of the PCI slots in all of its riser cards are occupied with GPUs.
The value was obtained by reverse-engineering their ACPI tables. Currently
we're only concerned with 8 (our graphics vendors do not recommend
installing more cards than that), but it's only a matter of time before it
grows. I'm also not sure why this routing scheme was chosen in the first
place: whether it's dictated by hardware restrictions or by firmware
engineers being lazy; we're working with HP to determine that.

> Instead I'm wondering whether this wouldn't better be a Kconfig
> setting (or even command line controllable). There don't look to be
> any restrictions on the precise value chosen (i.e. no need to be 2**n-1,
> as is the case for the old and new values here, for whatever reason), so a
> simple permitted range of like 4...64 would seem fine to specify.
> Whether the default then would want to be 8 (close to the current
> 7) or higher (around the actually observed maximum) is a different
> question.

I'm in favor of a command line argument here - it would be much less trouble
if a higher limit suddenly became necessary in the field. The default IMO
should definitely be higher than 8 - I'd stick with 32, which to me
should cover our real-world scenarios and leave some headroom for the future.

Igor
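
As a sketch of the command-line route discussed above, the limit could be
registered as a runtime-selectable parameter in xen/arch/x86/irq.c. The
parameter name "irq-max-guests", the variable and helper names, the default
of 32 and the 4...64 clamp below are illustrative assumptions drawn from the
thread, not code from the posted patch; integer_param() is Xen's existing
helper for integer command line options.

    /* Illustrative sketch only: name, default and bounds are assumptions. */
    static unsigned int __read_mostly irq_max_guests = 32;
    integer_param("irq-max-guests", irq_max_guests);

    static void __init sanitise_irq_max_guests(void)
    {
        /* Keep the value within the 4...64 window suggested above. */
        if ( irq_max_guests < 4 )
            irq_max_guests = 4;
        else if ( irq_max_guests > 64 )
            irq_max_guests = 64;
    }
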
Jan Beulich Dec. 2, 2020, 3:21 p.m. UTC | #3
On 02.12.2020 15:53, Igor Druzhinin wrote:
> On 02/12/2020 09:25, Jan Beulich wrote:
>> On 01.12.2020 00:59, Igor Druzhinin wrote:
>>> The current limit of 7 is too restrictive for modern systems, where one GSI
>>> could be shared by potentially many PCI INTx sources, each of which
>>> corresponds to a device passed through to its own guest. Some systems do not
>>> apply due diligence in swizzling INTx links when, e.g., INTA is declared as
>>> the interrupt pin for the majority of PCI devices behind a single router,
>>> resulting in overuse of a GSI.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> ---
>>>
>>> If people think that would make sense, I can rework the array into a list of
>>> domain pointers to avoid the limit.
>>
>> Not sure about this. What is the biggest number you've found on any
>> system? (I assume the chosen value of 31 has some headroom.)
> 
> The value 31 was taken as a practical maximum for one specific HP system
> when all of the PCI slots in all of its riser cards are occupied with GPUs.
> The value was obtained by reverse-engineering their ACPI tables. Currently
> we're only concerned with 8 (our graphics vendors do not recommend
> installing more cards than that), but it's only a matter of time before it
> grows. I'm also not sure why this routing scheme was chosen in the first
> place: whether it's dictated by hardware restrictions or by firmware
> engineers being lazy; we're working with HP to determine that.

Thanks for the insight.

>> Instead I'm wondering whether this wouldn't better be a Kconfig
>> setting (or even command line controllable). There don't look to be
>> any restrictions on the precise value chosen (i.e. no need to be 2**n-1,
>> as is the case for the old and new values here, for whatever reason), so a
>> simple permitted range of like 4...64 would seem fine to specify.
>> Whether the default then would want to be 8 (close to the current
>> 7) or higher (around the actually observed maximum) is a different
>> question.
> 
> I'm in favor of a command line argument here - it would be much less trouble
> if a higher limit suddenly became necessary in the field. The default IMO
> should definitely be higher than 8 - I'd stick with 32, which to me
> should cover our real-world scenarios and leave some headroom for the future.

Well, I'm concerned about the extra memory overhead. Every IRQ,
sharable or not, will get the extra slots allocated with the
current scheme. Perhaps a prereq change then would be to only
allocate multi-guest arrays for sharable IRQs, effectively
shrinking the overhead in particular for all MSI ones?

Jan
Igor Druzhinin Dec. 2, 2020, 4:34 p.m. UTC | #4
On 02/12/2020 15:21, Jan Beulich wrote:
> On 02.12.2020 15:53, Igor Druzhinin wrote:
>> On 02/12/2020 09:25, Jan Beulich wrote:
>>> Instead I'm wondering whether this wouldn't better be a Kconfig
>>> setting (or even command line controllable). There don't look to be
>>> any restrictions on the precise value chosen (i.e. no need to be 2**n-1,
>>> as is the case for the old and new values here, for whatever reason), so a
>>> simple permitted range of like 4...64 would seem fine to specify.
>>> Whether the default then would want to be 8 (close to the current
>>> 7) or higher (around the actually observed maximum) is a different
>>> question.
>>
>> I'm in favor of a command line argument here - it would be much less trouble
>> if a higher limit suddenly became necessary in the field. The default IMO
>> should definitely be higher than 8 - I'd stick with 32, which to me
>> should cover our real-world scenarios and leave some headroom for the future.
> 
> Well, I'm concerned about the extra memory overhead. Every IRQ,
> sharable or not, will get the extra slots allocated with the
> current scheme. Perhaps a prereq change then would be to only
> allocate multi-guest arrays for sharable IRQs, effectively
> shrinking the overhead in particular for all MSI ones?

That's one way to improve overall system scalability, but in that area
there are certainly much bigger fish to fry elsewhere. With 32 elements in the
array we get 200 bytes of overhead per structure; with 16 it's just 72 extra
bytes, which in the unattainable worst-case scenario of every single vector
taken in a 512-CPU machine would only account for several MB of overhead.

I'd start with dynamic array allocation first and set the limit to 16, which
should be enough for now. Then, if that default value needs to be raised,
we can consider further improvements.

Igor
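
To make the arithmetic above easy to check: the per-structure figures follow
from 8-byte struct domain pointers, and the worst-case total assumes roughly
192 usable dynamic vectors per CPU; both of these are editorial assumptions
used for illustration, not numbers stated in the thread.

    (32 - 7) slots * 8 bytes = 200 extra bytes per IRQ
    (16 - 7) slots * 8 bytes =  72 extra bytes per IRQ
    512 CPUs * ~192 vectors * 72 bytes ~= 7 MB in the (unattainable) worst case
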
Jan Beulich Dec. 3, 2020, 8:20 a.m. UTC | #5
On 02.12.2020 17:34, Igor Druzhinin wrote:
> On 02/12/2020 15:21, Jan Beulich wrote:
>> On 02.12.2020 15:53, Igor Druzhinin wrote:
>>> On 02/12/2020 09:25, Jan Beulich wrote:
>>>> Instead I'm wondering whether this wouldn't better be a Kconfig
>>>> setting (or even command line controllable). There don't look to be
>>>> any restrictions on the precise value chosen (i.e. no need to be 2**n-1,
>>>> as is the case for the old and new values here, for whatever reason), so a
>>>> simple permitted range of like 4...64 would seem fine to specify.
>>>> Whether the default then would want to be 8 (close to the current
>>>> 7) or higher (around the actually observed maximum) is a different
>>>> question.
>>>
>>> I'm in favor of a command line argument here - it would be much less trouble
>>> if a higher limit suddenly became necessary in the field. The default IMO
>>> should definitely be higher than 8 - I'd stick with 32, which to me
>>> should cover our real-world scenarios and leave some headroom for the future.
>>
>> Well, I'm concerned about the extra memory overhead. Every IRQ,
>> sharable or not, will get the extra slots allocated with the
>> current scheme. Perhaps a prereq change then would be to only
>> allocate multi-guest arrays for sharable IRQs, effectively
>> shrinking the overhead in particular for all MSI ones?
> 
> That's one way to improve overall system scalability, but in that area
> there are certainly much bigger fish to fry elsewhere. With 32 elements in the
> array we get 200 bytes of overhead per structure; with 16 it's just 72 extra
> bytes, which in the unattainable worst-case scenario of every single vector
> taken in a 512-CPU machine would only account for several MB of overhead.

I'm generally unhappy with this way of thinking, as this is what has
been leading to unnecessary growth of all sorts of software and its
resource needs. Yes, there surely are larger gains to be had
elsewhere, but that's imo still no excuse to grow memory allocations
"blindly" despite it being clear that in a fair share of cases a
fair part of the allocated memory won't be used. This said, ...

> I'd start with dynamic array allocation first and set the limit to 16, which
> should be enough for now. Then, if that default value needs to be raised,
> we can consider further improvements.

... I'm puzzled by this plan of yours because, unless I'm
misunderstanding, dynamic array allocation is effectively what I've
been asking for. Now that we have xmalloc_flex_struct(), this
should even be relatively straightforward, i.e. in particular with
no need to open-code complex expressions.

Jan
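
To make the dynamic-allocation idea concrete, below is a minimal sketch of
what the guest array could become with a flexible array member allocated via
xmalloc_flex_struct(). The structure name, the will_share flag and the
irq_max_guests limit are assumptions for illustration; only the nr_guests and
in_flight fields are visible in the patch below.

    /* Sketch: replace the fixed-size guest array with a flexible array
     * member, so the allocation can be sized per IRQ. */
    typedef struct {
        u8 nr_guests;
        u8 in_flight;
        /* ... remaining fields of the existing structure ... */
        struct domain *guest[];        /* sized at allocation time */
    } irq_guest_action_t;

    /* Allocation could then also address the overhead concern raised
     * earlier: a single slot for IRQs that will never be shared (e.g.
     * MSIs), and the configurable limit otherwise. */
    unsigned int nr = will_share ? irq_max_guests : 1;
    irq_guest_action_t *action = xmalloc_flex_struct(irq_guest_action_t,
                                                     guest, nr);

    if ( action == NULL )
        return -ENOMEM;
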

Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8d1f9a9..194f660 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1028,7 +1028,7 @@  int __init setup_irq(unsigned int irq, unsigned int irqflags,
  * HANDLING OF GUEST-BOUND PHYSICAL IRQS
  */
 
-#define IRQ_MAX_GUESTS 7
+#define IRQ_MAX_GUESTS 31
 typedef struct {
     u8 nr_guests;
     u8 in_flight;