Message ID | 1606780777-30718-1-git-send-email-igor.druzhinin@citrix.com
---|---
State | New, archived
Series | x86/IRQ: bump max number of guests for a shared IRQ to 31
On 01.12.2020 00:59, Igor Druzhinin wrote:
> Current limit of 7 is too restrictive for modern systems where one GSI
> could be shared by potentially many PCI INTx sources, each of which
> corresponds to a device passed through to its own guest. Some systems do not
> apply due diligence in swizzling INTx links in case e.g. INTA is declared as
> interrupt pin for the majority of PCI devices behind a single router,
> resulting in overuse of a GSI.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
>
> If people think that would make sense - I can rework the array to a list of
> domain pointers to avoid the limit.

Not sure about this. What is the biggest number you've found on any
system? (I assume the chosen value of 31 has some headroom.)

Instead I'm wondering whether this wouldn't better be a Kconfig
setting (or even command line controllable). There don't look to be
any restrictions on the precise value chosen (i.e. 2**n-1 like is
the case for old and new values here, for whatever reason), so a
simple permitted range of like 4...64 would seem fine to specify.
Whether the default then would want to be 8 (close to the current
7) or higher (around the actually observed maximum) is a different
question.

Jan
On 02/12/2020 09:25, Jan Beulich wrote:
> On 01.12.2020 00:59, Igor Druzhinin wrote:
>> Current limit of 7 is too restrictive for modern systems where one GSI
>> could be shared by potentially many PCI INTx sources, each of which
>> corresponds to a device passed through to its own guest. Some systems do not
>> apply due diligence in swizzling INTx links in case e.g. INTA is declared as
>> interrupt pin for the majority of PCI devices behind a single router,
>> resulting in overuse of a GSI.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>>
>> If people think that would make sense - I can rework the array to a list of
>> domain pointers to avoid the limit.
>
> Not sure about this. What is the biggest number you've found on any
> system? (I assume the chosen value of 31 has some headroom.)

The value 31 was taken as a practical maximum for one specific HP system
if all of the PCI slots in all of its riser cards are occupied with GPUs.
The value was obtained by reverse engineering their ACPI tables. Currently
we're only concerned with the number 8 (our graphics vendors do not
recommend installing more cards than that), but it's only a matter of time
before it grows. I'm also not sure why this routing scheme was chosen in
the first place: is it dictated by hardware restrictions, or by firmware
engineers being lazy? We're working with HP to determine that.

> Instead I'm wondering whether this wouldn't better be a Kconfig
> setting (or even command line controllable). There don't look to be
> any restrictions on the precise value chosen (i.e. 2**n-1 like is
> the case for old and new values here, for whatever reason), so a
> simple permitted range of like 4...64 would seem fine to specify.
> Whether the default then would want to be 8 (close to the current
> 7) or higher (around the actually observed maximum) is a different
> question.

I'm in favor of a command line argument here - it would be much less
trouble if a higher limit were suddenly necessary in the field. The
default IMO should definitely be higher than 8 - I'd stick with 32, which
to me should cover our real-world scenarios and leave some headroom for
the future.

Igor
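For illustration, the kind of command line control being discussed might look like the sketch below. Xen's `integer_param()` registration macro is real; the parameter name `irq-max-guests`, the default of 16, and the clamping helper are assumptions for the sketch, not part of the patch under review:

```c
/* Sketch: a boot-time-adjustable limit, e.g. in xen/arch/x86/irq.c. */
static unsigned int __read_mostly irq_max_guests = 16;
integer_param("irq-max-guests", irq_max_guests);

/*
 * Clamp to a permitted range along the lines of the 4...64 suggested
 * above; this could be called once during IRQ initialisation.
 */
static void __init clamp_irq_max_guests(void)
{
    if ( irq_max_guests < 4 || irq_max_guests > 64 )
        irq_max_guests = 16;
}
```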
On 02.12.2020 15:53, Igor Druzhinin wrote:
> On 02/12/2020 09:25, Jan Beulich wrote:
>> On 01.12.2020 00:59, Igor Druzhinin wrote:
>>> Current limit of 7 is too restrictive for modern systems where one GSI
>>> could be shared by potentially many PCI INTx sources, each of which
>>> corresponds to a device passed through to its own guest. Some systems do not
>>> apply due diligence in swizzling INTx links in case e.g. INTA is declared as
>>> interrupt pin for the majority of PCI devices behind a single router,
>>> resulting in overuse of a GSI.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> ---
>>>
>>> If people think that would make sense - I can rework the array to a list of
>>> domain pointers to avoid the limit.
>>
>> Not sure about this. What is the biggest number you've found on any
>> system? (I assume the chosen value of 31 has some headroom.)
>
> The value 31 was taken as a practical maximum for one specific HP system
> if all of the PCI slots in all of its riser cards are occupied with GPUs.
> The value was obtained by reverse engineering their ACPI tables. Currently
> we're only concerned with the number 8 (our graphics vendors do not
> recommend installing more cards than that), but it's only a matter of time
> before it grows. I'm also not sure why this routing scheme was chosen in
> the first place: is it dictated by hardware restrictions, or by firmware
> engineers being lazy? We're working with HP to determine that.

Thanks for the insight.

>> Instead I'm wondering whether this wouldn't better be a Kconfig
>> setting (or even command line controllable). There don't look to be
>> any restrictions on the precise value chosen (i.e. 2**n-1 like is
>> the case for old and new values here, for whatever reason), so a
>> simple permitted range of like 4...64 would seem fine to specify.
>> Whether the default then would want to be 8 (close to the current
>> 7) or higher (around the actually observed maximum) is a different
>> question.
>
> I'm in favor of a command line argument here - it would be much less
> trouble if a higher limit were suddenly necessary in the field. The
> default IMO should definitely be higher than 8 - I'd stick with 32, which
> to me should cover our real-world scenarios and leave some headroom for
> the future.

Well, I'm concerned about the extra memory overhead. Every IRQ,
sharable or not, will get the extra slots allocated with the
current scheme. Perhaps a prereq change then would be to only
allocate multi-guest arrays for sharable IRQs, effectively
shrinking the overhead in particular for all MSI ones?

Jan
On 02/12/2020 15:21, Jan Beulich wrote:
> On 02.12.2020 15:53, Igor Druzhinin wrote:
>> On 02/12/2020 09:25, Jan Beulich wrote:
>>> Instead I'm wondering whether this wouldn't better be a Kconfig
>>> setting (or even command line controllable). There don't look to be
>>> any restrictions on the precise value chosen (i.e. 2**n-1 like is
>>> the case for old and new values here, for whatever reason), so a
>>> simple permitted range of like 4...64 would seem fine to specify.
>>> Whether the default then would want to be 8 (close to the current
>>> 7) or higher (around the actually observed maximum) is a different
>>> question.
>>
>> I'm in favor of a command line argument here - it would be much less
>> trouble if a higher limit were suddenly necessary in the field. The
>> default IMO should definitely be higher than 8 - I'd stick with 32, which
>> to me should cover our real-world scenarios and leave some headroom for
>> the future.
>
> Well, I'm concerned about the extra memory overhead. Every IRQ,
> sharable or not, will get the extra slots allocated with the
> current scheme. Perhaps a prereq change then would be to only
> allocate multi-guest arrays for sharable IRQs, effectively
> shrinking the overhead in particular for all MSI ones?

That's one way to improve overall system scalability, but in that area
there is certainly much bigger fish to fry elsewhere. With 32 elements in
the array we get 200 bytes of overhead per structure; with 16 it's just 72
extra bytes, which even in the unattainable worst-case scenario of every
single vector being taken in a 512-CPU machine would only account for
several MB of overhead.

I'd start with dynamic array allocation first and set the limit to 16,
which should be enough for now. Then, if that default value needs to be
raised, we can consider further improvements.

Igor
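Spelled out, the arithmetic behind those figures, assuming 8-byte `struct domain` pointers on x86-64; the total IRQ count below is a rough order-of-magnitude bound, not a measured value:

```c
/*
 * Extra guest[] slots relative to the current 7, at 8 bytes each:
 *   32 slots: (32 - 7) * 8 = 200 extra bytes per irq_guest_action_t
 *   16 slots: (16 - 7) * 8 =  72 extra bytes
 *
 * Worst case sketched above - every vector of a 512-CPU machine bound
 * to guests, i.e. on the order of 100k IRQs - comes to roughly
 * 100000 * 72 bytes, i.e. around 7 MB.
 */
```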
On 02.12.2020 17:34, Igor Druzhinin wrote:
> On 02/12/2020 15:21, Jan Beulich wrote:
>> On 02.12.2020 15:53, Igor Druzhinin wrote:
>>> On 02/12/2020 09:25, Jan Beulich wrote:
>>>> Instead I'm wondering whether this wouldn't better be a Kconfig
>>>> setting (or even command line controllable). There don't look to be
>>>> any restrictions on the precise value chosen (i.e. 2**n-1 like is
>>>> the case for old and new values here, for whatever reason), so a
>>>> simple permitted range of like 4...64 would seem fine to specify.
>>>> Whether the default then would want to be 8 (close to the current
>>>> 7) or higher (around the actually observed maximum) is a different
>>>> question.
>>>
>>> I'm in favor of a command line argument here - it would be much less
>>> trouble if a higher limit were suddenly necessary in the field. The
>>> default IMO should definitely be higher than 8 - I'd stick with 32, which
>>> to me should cover our real-world scenarios and leave some headroom for
>>> the future.
>>
>> Well, I'm concerned about the extra memory overhead. Every IRQ,
>> sharable or not, will get the extra slots allocated with the
>> current scheme. Perhaps a prereq change then would be to only
>> allocate multi-guest arrays for sharable IRQs, effectively
>> shrinking the overhead in particular for all MSI ones?
>
> That's one way to improve overall system scalability, but in that area
> there is certainly much bigger fish to fry elsewhere. With 32 elements in
> the array we get 200 bytes of overhead per structure; with 16 it's just 72
> extra bytes, which even in the unattainable worst-case scenario of every
> single vector being taken in a 512-CPU machine would only account for
> several MB of overhead.

I'm generally unhappy with this way of thinking, as it is what has been
leading to unnecessary growth of all sorts of software and its resource
needs. Yes, there surely are larger gains to be had elsewhere, but that's
imo still no excuse to grow memory allocations "blindly" despite it being
clear that in a fair share of cases a fair part of the allocated memory
won't be used. This said, ...

> I'd start with dynamic array allocation first and set the limit to 16,
> which should be enough for now. Then, if that default value needs to be
> raised, we can consider further improvements.

... I'm puzzled by this plan of yours, because unless I'm misunderstanding,
dynamic array allocation is effectively what I've been asking for. Now that
we have xmalloc_flex_struct(), this should even be relatively
straightforward, i.e. in particular with no need to open-code complex
expressions.

Jan
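A minimal sketch of the shape this could take, using the `xmalloc_flex_struct()` helper Jan mentions. The structure layout is abridged from irq.c, and `irq_max_guests` plus the per-IRQ sizing policy are assumptions carried over from the discussion above, not an actual follow-up patch:

```c
typedef struct {
    u8 nr_guests;
    u8 in_flight;
    u8 shareable;
    u8 ack_type;
    cpumask_var_t cpu_eoi_map;
    struct timer eoi_timer;
    struct domain *guest[];   /* was: struct domain *guest[IRQ_MAX_GUESTS] */
} irq_guest_action_t;

/*
 * At the allocation site (pirq_guest_bind() has a will_share parameter):
 * size the array per IRQ, so that non-sharable IRQs - notably all MSI
 * ones - carry only a single slot.
 */
action = xmalloc_flex_struct(irq_guest_action_t, guest,
                             will_share ? irq_max_guests : 1);
if ( action == NULL )
    return -ENOMEM;
```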
```diff
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8d1f9a9..194f660 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1028,7 +1028,7 @@ int __init setup_irq(unsigned int irq, unsigned int irqflags,
  * HANDLING OF GUEST-BOUND PHYSICAL IRQS
  */
 
-#define IRQ_MAX_GUESTS 7
+#define IRQ_MAX_GUESTS 31
 typedef struct {
     u8 nr_guests;
     u8 in_flight;
```
Current limit of 7 is too restrictive for modern systems where one GSI
could be shared by potentially many PCI INTx sources, each of which
corresponds to a device passed through to its own guest. Some systems do not
apply due diligence in swizzling INTx links in case e.g. INTA is declared as
interrupt pin for the majority of PCI devices behind a single router,
resulting in overuse of a GSI.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
If people think that would make sense - I can rework the array to a list of
domain pointers to avoid the limit.
---
 xen/arch/x86/irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)