diff mbox series

[v3,3/3] cmdline: document and enforce "extra_guest_irqs" upper bounds

Message ID 3c3a1d0c-06f2-a392-b2f9-381bed5c5e7b@suse.com (mailing list archive)
State New, archived
Headers show
Series new CONFIG_HAS_PIRQ and extra_guest_irqs adjustment | expand

Commit Message

Jan Beulich July 27, 2023, 7:38 a.m. UTC
PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
more than 32k pIRQ-s can be used by a domain on x86. Document this upper
bound.

To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
parameter type) and setup_system_domains(). This is primarily to avoid
exposing the two static variables or introducing yet further arch hooks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Instead of passing dom_xen into arch_hwdom_irqs(), NULL could also be
used. That would make the connection to setup_system_domains() yet more
weak, though.

Passing the domain pointer instead of the domain ID would also allow
to return a possibly different value if sensible for PVH Dom0 (which
presently has no access to PHYSDEVOP_pirq_eoi_gmfn_v<N> in the first
place).
---
v2: Also enforce these bounds. Adjust doc to constrain the bound to x86
    only. Re-base over new earlier patch.

Comments

Roger Pau Monné July 1, 2024, 9:55 a.m. UTC | #1
On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
> PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
> more than 32k pIRQ-s can be used by a domain on x86. Document this upper
> bound.
> 
> To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
> parameter type) and setup_system_domains(). This is primarily to avoid
> exposing the two static variables or introducing yet further arch hooks.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Instead of passing dom_xen into arch_hwdom_irqs(), NULL could also be
> used. That would make the connection to setup_system_domains() yet more
> weak, though.
> 
> Passing the domain pointer instead of the domain ID would also allow
> to return a possibly different value if sensible for PVH Dom0 (which
> presently has no access to PHYSDEVOP_pirq_eoi_gmfn_v<N> in the first
> place).
> ---
> v2: Also enforce these bounds. Adjust doc to constrain the bound to x86
>     only. Re-base over new earlier patch.
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1146,7 +1146,8 @@ common for all domUs, while the optional
>  is for dom0.  Changing the setting for domU has no impact on dom0 and vice
>  versa.  For example to change dom0 without changing domU, use
>  `extra_guest_irqs=,512`.  The default value for Dom0 and an eventual separate
> -hardware domain is architecture dependent.
> +hardware domain is architecture dependent.  The upper limit for both values on
> +x86 is such that the resulting total number of IRQs can't be higher than 32768.
>  Note that specifying zero as domU value means zero, while for dom0 it means
>  to use the default.
>  
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -2663,18 +2663,21 @@ void __init ioapic_init(void)
>             nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
>  }
>  
> -unsigned int arch_hwdom_irqs(domid_t domid)
> +unsigned int arch_hwdom_irqs(const struct domain *d)

While at it, should this be __hwdom_init?

I'm fine with changing the function to take a domain parameter...

>  {
>      unsigned int n = fls(num_present_cpus());
>  
> -    if ( !domid )
> +    if ( is_system_domain(d) )
> +        return PAGE_SIZE * BITS_PER_BYTE;

... but why do we need a function call just to get a constant value?
Wouldn't this better be a define in a header?

> +
> +    if ( !d->domain_id )
>          n = min(n, dom0_max_vcpus());
>      n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
>  
>      /* Bounded by the domain pirq eoi bitmap gfn. */
>      n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);

So that could also use the same constant here?

> -    printk("Dom%d has maximum %u PIRQs\n", domid, n);
> +    printk("%pd has maximum %u PIRQs\n", d, n);
>  
>      return n;
>  }
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
>          else
>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
> -                                           : arch_hwdom_irqs(domid);
> +                                           : arch_hwdom_irqs(d);
>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>  
>          radix_tree_init(&d->pirq_tree);
> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
>      if ( IS_ERR(dom_xen) )
>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
>  
> +#ifdef CONFIG_HAS_PIRQ
> +    /* Bound-check values passed via "extra_guest_irqs=". */
> +    {
> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
> +
> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
> +        {
> +            extra_hwdom_irqs = n - nr_static_irqs;
> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
> +        }
> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
> +        {
> +            extra_domU_irqs = n - nr_static_irqs;
> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
> +        }
> +    }
> +#endif

IMO this is kind of a weird placement. Wouldn't this be more naturally
handled in parse_extra_guest_irqs()?

Thanks, Roger.
Jan Beulich July 1, 2024, 10:40 a.m. UTC | #2
On 01.07.2024 11:55, Roger Pau Monné wrote:
> On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -2663,18 +2663,21 @@ void __init ioapic_init(void)
>>             nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
>>  }
>>  
>> -unsigned int arch_hwdom_irqs(domid_t domid)
>> +unsigned int arch_hwdom_irqs(const struct domain *d)
> 
> While at it, should this be __hwdom_init?

It indeed can be, so I've done this for v4.

> I'm fine with changing the function to take a domain parameter...
> 
>>  {
>>      unsigned int n = fls(num_present_cpus());
>>  
>> -    if ( !domid )
>> +    if ( is_system_domain(d) )
>> +        return PAGE_SIZE * BITS_PER_BYTE;
> 
> ... but why do we need a function call just to get a constant value?
> Wouldn't this better be a define in a header?

Would be an option, but would result in parts of the logic living is
distinct places.

>> +
>> +    if ( !d->domain_id )
>>          n = min(n, dom0_max_vcpus());
>>      n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
>>  
>>      /* Bounded by the domain pirq eoi bitmap gfn. */
>>      n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
> 
> So that could also use the same constant here?
> 
>> -    printk("Dom%d has maximum %u PIRQs\n", domid, n);
>> +    printk("%pd has maximum %u PIRQs\n", d, n);
>>  
>>      return n;
>>  }
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
>>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
>>          else
>>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
>> -                                           : arch_hwdom_irqs(domid);
>> +                                           : arch_hwdom_irqs(d);
>>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>>  
>>          radix_tree_init(&d->pirq_tree);
>> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
>>      if ( IS_ERR(dom_xen) )
>>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
>>  
>> +#ifdef CONFIG_HAS_PIRQ
>> +    /* Bound-check values passed via "extra_guest_irqs=". */
>> +    {
>> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
>> +
>> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
>> +        {
>> +            extra_hwdom_irqs = n - nr_static_irqs;
>> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
>> +        }
>> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
>> +        {
>> +            extra_domU_irqs = n - nr_static_irqs;
>> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
>> +        }
>> +    }
>> +#endif
> 
> IMO this is kind of a weird placement. Wouldn't this be more naturally
> handled in parse_extra_guest_irqs()?

Indeed it is and yes it would, but no, it can't. We shouldn't rely on
the particular behavior of arch_hwdom_irqs(), and in the general case
we can't call it as early as when command line arguments are parsed. I
couldn't think of a neater way of doing this, and it not being pretty
is why I'm saying "(ab)use" in the description.

Jan
Roger Pau Monné July 1, 2024, 1:29 p.m. UTC | #3
On Mon, Jul 01, 2024 at 12:40:35PM +0200, Jan Beulich wrote:
> On 01.07.2024 11:55, Roger Pau Monné wrote:
> > On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
> >> --- a/xen/arch/x86/io_apic.c
> >> +++ b/xen/arch/x86/io_apic.c
> >> @@ -2663,18 +2663,21 @@ void __init ioapic_init(void)
> >>             nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
> >>  }
> >>  
> >> -unsigned int arch_hwdom_irqs(domid_t domid)
> >> +unsigned int arch_hwdom_irqs(const struct domain *d)
> > 
> > While at it, should this be __hwdom_init?
> 
> It indeed can be, so I've done this for v4.
> 
> > I'm fine with changing the function to take a domain parameter...
> > 
> >>  {
> >>      unsigned int n = fls(num_present_cpus());
> >>  
> >> -    if ( !domid )
> >> +    if ( is_system_domain(d) )
> >> +        return PAGE_SIZE * BITS_PER_BYTE;
> > 
> > ... but why do we need a function call just to get a constant value?
> > Wouldn't this better be a define in a header?
> 
> Would be an option, but would result in parts of the logic living is
> distinct places.
> 
> >> +
> >> +    if ( !d->domain_id )
> >>          n = min(n, dom0_max_vcpus());
> >>      n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
> >>  
> >>      /* Bounded by the domain pirq eoi bitmap gfn. */
> >>      n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
> > 
> > So that could also use the same constant here?

I would have a slight preference for PAGE_SIZE * BITS_PER_BYTE being
defined inside of this function as:

/* Bounded by the domain pirq eoi bitmap gfn. */
const unsigned int max_irqs = PAGE_SIZE * BITS_PER_BYTE;

Or similar for clarity purposes.

While at it, I've noticed that PHYSDEVOP_pirq_eoi_gmfn_v{1,2} is not
available to HVM guests (not even when exposing PIRQ support) and
hence I wonder if we should special case PVH dom0, but maybe it's best
to deal with this properly rather than hacking something special
just for PVH dom0.  At the end of the day the current limit is high
enough to not cause issues on current systems I would expect.

> >> -    printk("Dom%d has maximum %u PIRQs\n", domid, n);
> >> +    printk("%pd has maximum %u PIRQs\n", d, n);
> >>  
> >>      return n;
> >>  }
> >> --- a/xen/common/domain.c
> >> +++ b/xen/common/domain.c
> >> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
> >>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
> >>          else
> >>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
> >> -                                           : arch_hwdom_irqs(domid);
> >> +                                           : arch_hwdom_irqs(d);
> >>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
> >>  
> >>          radix_tree_init(&d->pirq_tree);
> >> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
> >>      if ( IS_ERR(dom_xen) )
> >>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
> >>  
> >> +#ifdef CONFIG_HAS_PIRQ
> >> +    /* Bound-check values passed via "extra_guest_irqs=". */
> >> +    {
> >> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
> >> +
> >> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
> >> +        {
> >> +            extra_hwdom_irqs = n - nr_static_irqs;
> >> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
> >> +        }
> >> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
> >> +        {
> >> +            extra_domU_irqs = n - nr_static_irqs;
> >> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
> >> +        }
> >> +    }
> >> +#endif
> > 
> > IMO this is kind of a weird placement. Wouldn't this be more naturally
> > handled in parse_extra_guest_irqs()?
> 
> Indeed it is and yes it would, but no, it can't. We shouldn't rely on
> the particular behavior of arch_hwdom_irqs(), and in the general case
> we can't call it as early as when command line arguments are parsed. I
> couldn't think of a neater way of doing this, and it not being pretty
> is why I'm saying "(ab)use" in the description.

I see, nr_static_irqs is an alias of nr_irqs_gsi, which is not properly
set by the time we evaluate command line arguments.

My only possible suggestion would be to do it as a presmp initcall,
and define/register such initcall for x86 only, the only benefit would
be that such inicall could be defined in the same translation unit as
arch_hwdom_irqs() then.

Thanks, Roger.
Jan Beulich July 1, 2024, 3:07 p.m. UTC | #4
On 01.07.2024 15:29, Roger Pau Monné wrote:
> On Mon, Jul 01, 2024 at 12:40:35PM +0200, Jan Beulich wrote:
>> On 01.07.2024 11:55, Roger Pau Monné wrote:
>>> On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/io_apic.c
>>>> +++ b/xen/arch/x86/io_apic.c
>>>> @@ -2663,18 +2663,21 @@ void __init ioapic_init(void)
>>>>             nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
>>>>  }
>>>>  
>>>> -unsigned int arch_hwdom_irqs(domid_t domid)
>>>> +unsigned int arch_hwdom_irqs(const struct domain *d)
>>>
>>> While at it, should this be __hwdom_init?
>>
>> It indeed can be, so I've done this for v4.
>>
>>> I'm fine with changing the function to take a domain parameter...
>>>
>>>>  {
>>>>      unsigned int n = fls(num_present_cpus());
>>>>  
>>>> -    if ( !domid )
>>>> +    if ( is_system_domain(d) )
>>>> +        return PAGE_SIZE * BITS_PER_BYTE;
>>>
>>> ... but why do we need a function call just to get a constant value?
>>> Wouldn't this better be a define in a header?
>>
>> Would be an option, but would result in parts of the logic living is
>> distinct places.
>>
>>>> +
>>>> +    if ( !d->domain_id )
>>>>          n = min(n, dom0_max_vcpus());
>>>>      n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
>>>>  
>>>>      /* Bounded by the domain pirq eoi bitmap gfn. */
>>>>      n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
>>>
>>> So that could also use the same constant here?
> 
> I would have a slight preference for PAGE_SIZE * BITS_PER_BYTE being
> defined inside of this function as:
> 
> /* Bounded by the domain pirq eoi bitmap gfn. */
> const unsigned int max_irqs = PAGE_SIZE * BITS_PER_BYTE;
> 
> Or similar for clarity purposes.

Can do, sure.

> While at it, I've noticed that PHYSDEVOP_pirq_eoi_gmfn_v{1,2} is not
> available to HVM guests (not even when exposing PIRQ support) and
> hence I wonder if we should special case PVH dom0, but maybe it's best
> to deal with this properly rather than hacking something special
> just for PVH dom0.  At the end of the day the current limit is high
> enough to not cause issues on current systems I would expect.

Oh, so entirely the other way around than mentioned when we talked, once
again due to the filtering in hvm/hypercall.h that I keep forgetting. So
in principle we could avoid the bounding for HVM. Just that right now
extra_domU_irqs covers both PV and HVM, and would hence need splitting
first.

>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
>>>>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
>>>>          else
>>>>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
>>>> -                                           : arch_hwdom_irqs(domid);
>>>> +                                           : arch_hwdom_irqs(d);
>>>>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>>>>  
>>>>          radix_tree_init(&d->pirq_tree);
>>>> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
>>>>      if ( IS_ERR(dom_xen) )
>>>>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
>>>>  
>>>> +#ifdef CONFIG_HAS_PIRQ
>>>> +    /* Bound-check values passed via "extra_guest_irqs=". */
>>>> +    {
>>>> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
>>>> +
>>>> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
>>>> +        {
>>>> +            extra_hwdom_irqs = n - nr_static_irqs;
>>>> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
>>>> +        }
>>>> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
>>>> +        {
>>>> +            extra_domU_irqs = n - nr_static_irqs;
>>>> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
>>>> +        }
>>>> +    }
>>>> +#endif
>>>
>>> IMO this is kind of a weird placement. Wouldn't this be more naturally
>>> handled in parse_extra_guest_irqs()?
>>
>> Indeed it is and yes it would, but no, it can't. We shouldn't rely on
>> the particular behavior of arch_hwdom_irqs(), and in the general case
>> we can't call it as early as when command line arguments are parsed. I
>> couldn't think of a neater way of doing this, and it not being pretty
>> is why I'm saying "(ab)use" in the description.
> 
> I see, nr_static_irqs is an alias of nr_irqs_gsi, which is not properly
> set by the time we evaluate command line arguments.
> 
> My only possible suggestion would be to do it as a presmp initcall,
> and define/register such initcall for x86 only, the only benefit would
> be that such inicall could be defined in the same translation unit as
> arch_hwdom_irqs() then.

Which then would require making extra_{hwdom,domU}_irqs available to
x86/io_apic.c, which also wouldn't be very nice. To be honest, I'd prefer
to keep the logic where it is, until such time where perhaps we move pIRQ
stuff wholesale to x86-only files.

Jan
Roger Pau Monné July 1, 2024, 3:17 p.m. UTC | #5
On Mon, Jul 01, 2024 at 05:07:19PM +0200, Jan Beulich wrote:
> On 01.07.2024 15:29, Roger Pau Monné wrote:
> > On Mon, Jul 01, 2024 at 12:40:35PM +0200, Jan Beulich wrote:
> >> On 01.07.2024 11:55, Roger Pau Monné wrote:
> >>> On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/io_apic.c
> >>>> +++ b/xen/arch/x86/io_apic.c
> >>>> @@ -2663,18 +2663,21 @@ void __init ioapic_init(void)
> >>>>             nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
> >>>>  }
> >>>>  
> >>>> -unsigned int arch_hwdom_irqs(domid_t domid)
> >>>> +unsigned int arch_hwdom_irqs(const struct domain *d)
> >>>
> >>> While at it, should this be __hwdom_init?
> >>
> >> It indeed can be, so I've done this for v4.
> >>
> >>> I'm fine with changing the function to take a domain parameter...
> >>>
> >>>>  {
> >>>>      unsigned int n = fls(num_present_cpus());
> >>>>  
> >>>> -    if ( !domid )
> >>>> +    if ( is_system_domain(d) )
> >>>> +        return PAGE_SIZE * BITS_PER_BYTE;
> >>>
> >>> ... but why do we need a function call just to get a constant value?
> >>> Wouldn't this better be a define in a header?
> >>
> >> Would be an option, but would result in parts of the logic living is
> >> distinct places.
> >>
> >>>> +
> >>>> +    if ( !d->domain_id )
> >>>>          n = min(n, dom0_max_vcpus());
> >>>>      n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
> >>>>  
> >>>>      /* Bounded by the domain pirq eoi bitmap gfn. */
> >>>>      n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
> >>>
> >>> So that could also use the same constant here?
> > 
> > I would have a slight preference for PAGE_SIZE * BITS_PER_BYTE being
> > defined inside of this function as:
> > 
> > /* Bounded by the domain pirq eoi bitmap gfn. */
> > const unsigned int max_irqs = PAGE_SIZE * BITS_PER_BYTE;
> > 
> > Or similar for clarity purposes.
> 
> Can do, sure.
> 
> > While at it, I've noticed that PHYSDEVOP_pirq_eoi_gmfn_v{1,2} is not
> > available to HVM guests (not even when exposing PIRQ support) and
> > hence I wonder if we should special case PVH dom0, but maybe it's best
> > to deal with this properly rather than hacking something special
> > just for PVH dom0.  At the end of the day the current limit is high
> > enough to not cause issues on current systems I would expect.
> 
> Oh, so entirely the other way around than mentioned when we talked, once
> again due to the filtering in hvm/hypercall.h that I keep forgetting. So
> in principle we could avoid the bounding for HVM. Just that right now
> extra_domU_irqs covers both PV and HVM, and would hence need splitting
> first.

Yes, we would need to split, that's why I'm OK with what you propose
here.  We can do the split as a later change.

> >>>> --- a/xen/common/domain.c
> >>>> +++ b/xen/common/domain.c
> >>>> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
> >>>>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
> >>>>          else
> >>>>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
> >>>> -                                           : arch_hwdom_irqs(domid);
> >>>> +                                           : arch_hwdom_irqs(d);
> >>>>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
> >>>>  
> >>>>          radix_tree_init(&d->pirq_tree);
> >>>> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
> >>>>      if ( IS_ERR(dom_xen) )
> >>>>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
> >>>>  
> >>>> +#ifdef CONFIG_HAS_PIRQ
> >>>> +    /* Bound-check values passed via "extra_guest_irqs=". */
> >>>> +    {
> >>>> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
> >>>> +
> >>>> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
> >>>> +        {
> >>>> +            extra_hwdom_irqs = n - nr_static_irqs;
> >>>> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
> >>>> +        }
> >>>> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
> >>>> +        {
> >>>> +            extra_domU_irqs = n - nr_static_irqs;
> >>>> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
> >>>> +        }
> >>>> +    }
> >>>> +#endif
> >>>
> >>> IMO this is kind of a weird placement. Wouldn't this be more naturally
> >>> handled in parse_extra_guest_irqs()?
> >>
> >> Indeed it is and yes it would, but no, it can't. We shouldn't rely on
> >> the particular behavior of arch_hwdom_irqs(), and in the general case
> >> we can't call it as early as when command line arguments are parsed. I
> >> couldn't think of a neater way of doing this, and it not being pretty
> >> is why I'm saying "(ab)use" in the description.
> > 
> > I see, nr_static_irqs is an alias of nr_irqs_gsi, which is not properly
> > set by the time we evaluate command line arguments.
> > 
> > My only possible suggestion would be to do it as a presmp initcall,
> > and define/register such initcall for x86 only, the only benefit would
> > be that such inicall could be defined in the same translation unit as
> > arch_hwdom_irqs() then.
> 
> Which then would require making extra_{hwdom,domU}_irqs available to
> x86/io_apic.c, which also wouldn't be very nice. To be honest, I'd prefer
> to keep the logic where it is, until such time where perhaps we move pIRQ
> stuff wholesale to x86-only files.

Fine by me.

I think we are in agreement about what needs doing.  I can provide:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

With the changes we have agreed to arch_hwdom_irqs().

Thanks, Roger.
Jan Beulich July 2, 2024, 8:26 a.m. UTC | #6
On 01.07.2024 17:17, Roger Pau Monné wrote:
> I think we are in agreement about what needs doing.  I can provide:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> With the changes we have agreed to arch_hwdom_irqs().

Yet another request for considering for a release-ack.

Thanks, Jan
Jan Beulich July 2, 2024, 8:30 a.m. UTC | #7
On 01.07.2024 17:17, Roger Pau Monné wrote:
> On Mon, Jul 01, 2024 at 05:07:19PM +0200, Jan Beulich wrote:
>> On 01.07.2024 15:29, Roger Pau Monné wrote:
>>> On Mon, Jul 01, 2024 at 12:40:35PM +0200, Jan Beulich wrote:
>>>> On 01.07.2024 11:55, Roger Pau Monné wrote:
>>>>> On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
>>>>>> --- a/xen/common/domain.c
>>>>>> +++ b/xen/common/domain.c
>>>>>> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
>>>>>>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
>>>>>>          else
>>>>>>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
>>>>>> -                                           : arch_hwdom_irqs(domid);
>>>>>> +                                           : arch_hwdom_irqs(d);
>>>>>>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>>>>>>  
>>>>>>          radix_tree_init(&d->pirq_tree);
>>>>>> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
>>>>>>      if ( IS_ERR(dom_xen) )
>>>>>>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
>>>>>>  
>>>>>> +#ifdef CONFIG_HAS_PIRQ
>>>>>> +    /* Bound-check values passed via "extra_guest_irqs=". */
>>>>>> +    {
>>>>>> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
>>>>>> +
>>>>>> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
>>>>>> +        {
>>>>>> +            extra_hwdom_irqs = n - nr_static_irqs;
>>>>>> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
>>>>>> +        }
>>>>>> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
>>>>>> +        {
>>>>>> +            extra_domU_irqs = n - nr_static_irqs;
>>>>>> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
>>>>>> +        }
>>>>>> +    }
>>>>>> +#endif
>>>>>
>>>>> IMO this is kind of a weird placement. Wouldn't this be more naturally
>>>>> handled in parse_extra_guest_irqs()?
>>>>
>>>> Indeed it is and yes it would, but no, it can't. We shouldn't rely on
>>>> the particular behavior of arch_hwdom_irqs(), and in the general case
>>>> we can't call it as early as when command line arguments are parsed. I
>>>> couldn't think of a neater way of doing this, and it not being pretty
>>>> is why I'm saying "(ab)use" in the description.
>>>
>>> I see, nr_static_irqs is an alias of nr_irqs_gsi, which is not properly
>>> set by the time we evaluate command line arguments.
>>>
>>> My only possible suggestion would be to do it as a presmp initcall,
>>> and define/register such initcall for x86 only, the only benefit would
>>> be that such inicall could be defined in the same translation unit as
>>> arch_hwdom_irqs() then.
>>
>> Which then would require making extra_{hwdom,domU}_irqs available to
>> x86/io_apic.c, which also wouldn't be very nice. To be honest, I'd prefer
>> to keep the logic where it is, until such time where perhaps we move pIRQ
>> stuff wholesale to x86-only files.
> 
> Fine by me.
> 
> I think we are in agreement about what needs doing.

Hmm, after further thinking I'm not sure splitting would be the way to go.
We should rather properly remove the concept of pIRQ from PVH, i.e. not
only not have them visible to the kernel, but also not use e.g. ->nr_pirqs
and ->pirq_tree internally. Then we could presumably drop the constraining
via this command line option (documenting it as inapplicable to PVH).

>  I can provide:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> With the changes we have agreed to arch_hwdom_irqs().

Thanks; changes done locally.

Jan
Roger Pau Monné July 2, 2024, 8:45 a.m. UTC | #8
On Tue, Jul 02, 2024 at 10:30:03AM +0200, Jan Beulich wrote:
> On 01.07.2024 17:17, Roger Pau Monné wrote:
> > On Mon, Jul 01, 2024 at 05:07:19PM +0200, Jan Beulich wrote:
> >> On 01.07.2024 15:29, Roger Pau Monné wrote:
> >>> On Mon, Jul 01, 2024 at 12:40:35PM +0200, Jan Beulich wrote:
> >>>> On 01.07.2024 11:55, Roger Pau Monné wrote:
> >>>>> On Thu, Jul 27, 2023 at 09:38:55AM +0200, Jan Beulich wrote:
> >>>>>> --- a/xen/common/domain.c
> >>>>>> +++ b/xen/common/domain.c
> >>>>>> @@ -693,7 +693,7 @@ struct domain *domain_create(domid_t dom
> >>>>>>              d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
> >>>>>>          else
> >>>>>>              d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
> >>>>>> -                                           : arch_hwdom_irqs(domid);
> >>>>>> +                                           : arch_hwdom_irqs(d);
> >>>>>>          d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
> >>>>>>  
> >>>>>>          radix_tree_init(&d->pirq_tree);
> >>>>>> @@ -819,6 +819,24 @@ void __init setup_system_domains(void)
> >>>>>>      if ( IS_ERR(dom_xen) )
> >>>>>>          panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
> >>>>>>  
> >>>>>> +#ifdef CONFIG_HAS_PIRQ
> >>>>>> +    /* Bound-check values passed via "extra_guest_irqs=". */
> >>>>>> +    {
> >>>>>> +        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
> >>>>>> +
> >>>>>> +        if ( extra_hwdom_irqs > n - nr_static_irqs )
> >>>>>> +        {
> >>>>>> +            extra_hwdom_irqs = n - nr_static_irqs;
> >>>>>> +            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
> >>>>>> +        }
> >>>>>> +        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
> >>>>>> +        {
> >>>>>> +            extra_domU_irqs = n - nr_static_irqs;
> >>>>>> +            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +#endif
> >>>>>
> >>>>> IMO this is kind of a weird placement. Wouldn't this be more naturally
> >>>>> handled in parse_extra_guest_irqs()?
> >>>>
> >>>> Indeed it is and yes it would, but no, it can't. We shouldn't rely on
> >>>> the particular behavior of arch_hwdom_irqs(), and in the general case
> >>>> we can't call it as early as when command line arguments are parsed. I
> >>>> couldn't think of a neater way of doing this, and it not being pretty
> >>>> is why I'm saying "(ab)use" in the description.
> >>>
> >>> I see, nr_static_irqs is an alias of nr_irqs_gsi, which is not properly
> >>> set by the time we evaluate command line arguments.
> >>>
> >>> My only possible suggestion would be to do it as a presmp initcall,
> >>> and define/register such initcall for x86 only, the only benefit would
> >>> be that such inicall could be defined in the same translation unit as
> >>> arch_hwdom_irqs() then.
> >>
> >> Which then would require making extra_{hwdom,domU}_irqs available to
> >> x86/io_apic.c, which also wouldn't be very nice. To be honest, I'd prefer
> >> to keep the logic where it is, until such time where perhaps we move pIRQ
> >> stuff wholesale to x86-only files.
> > 
> > Fine by me.
> > 
> > I think we are in agreement about what needs doing.
> 
> Hmm, after further thinking I'm not sure splitting would be the way to go.
> We should rather properly remove the concept of pIRQ from PVH, i.e. not
> only not have them visible to the kernel, but also not use e.g. ->nr_pirqs
> and ->pirq_tree internally. Then we could presumably drop the constraining
> via this command line option (documenting it as inapplicable to PVH).

Removing it from PVH would also imply removing from HVM AFAICT (unless
it's HVM with explicitly pIRQ support).  I think the main issue there
would be dealing with the change in all the interfaces that currently
take pIRQ parameters, albeit I could be wrong.  Seems like a
worthwhile improvement.

Thanks, Roger.
Oleksii Kurochko July 2, 2024, 9:50 a.m. UTC | #9
On Tue, 2024-07-02 at 10:26 +0200, Jan Beulich wrote:
> On 01.07.2024 17:17, Roger Pau Monné wrote:
> > I think we are in agreement about what needs doing.  I can provide:
> > 
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> > 
> > With the changes we have agreed to arch_hwdom_irqs().
> 
> Yet another request for considering for a release-ack.
It doesn't fix any real issue, does it? It just provides limitation of
how many pIRQs can be used by domain.

Anyway this change is low-risk so:
 Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

~ Oleksii
Jan Beulich July 2, 2024, 9:55 a.m. UTC | #10
On 02.07.2024 11:50, Oleksii wrote:
> On Tue, 2024-07-02 at 10:26 +0200, Jan Beulich wrote:
>> On 01.07.2024 17:17, Roger Pau Monné wrote:
>>> I think we are in agreement about what needs doing.  I can provide:
>>>
>>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>>>
>>> With the changes we have agreed to arch_hwdom_irqs().
>>
>> Yet another request for considering for a release-ack.
> It doesn't fix any real issue, does it?

It does address a bug report we had, by folks specifying excessively large
values with that command line option.

> It just provides limitation of how many pIRQs can be used by domain.
> 
> Anyway this change is low-risk so:
>  Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

Jan
diff mbox series

Patch

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1146,7 +1146,8 @@  common for all domUs, while the optional
 is for dom0.  Changing the setting for domU has no impact on dom0 and vice
 versa.  For example to change dom0 without changing domU, use
 `extra_guest_irqs=,512`.  The default value for Dom0 and an eventual separate
-hardware domain is architecture dependent.
+hardware domain is architecture dependent.  The upper limit for both values on
+x86 is such that the resulting total number of IRQs can't be higher than 32768.
 Note that specifying zero as domU value means zero, while for dom0 it means
 to use the default.
 
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2663,18 +2663,21 @@  void __init ioapic_init(void)
            nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
 }
 
-unsigned int arch_hwdom_irqs(domid_t domid)
+unsigned int arch_hwdom_irqs(const struct domain *d)
 {
     unsigned int n = fls(num_present_cpus());
 
-    if ( !domid )
+    if ( is_system_domain(d) )
+        return PAGE_SIZE * BITS_PER_BYTE;
+
+    if ( !d->domain_id )
         n = min(n, dom0_max_vcpus());
     n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
 
     /* Bounded by the domain pirq eoi bitmap gfn. */
     n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
 
-    printk("Dom%d has maximum %u PIRQs\n", domid, n);
+    printk("%pd has maximum %u PIRQs\n", d, n);
 
     return n;
 }
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -693,7 +693,7 @@  struct domain *domain_create(domid_t dom
             d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
         else
             d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
-                                           : arch_hwdom_irqs(domid);
+                                           : arch_hwdom_irqs(d);
         d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
         radix_tree_init(&d->pirq_tree);
@@ -819,6 +819,24 @@  void __init setup_system_domains(void)
     if ( IS_ERR(dom_xen) )
         panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
 
+#ifdef CONFIG_HAS_PIRQ
+    /* Bound-check values passed via "extra_guest_irqs=". */
+    {
+        unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
+
+        if ( extra_hwdom_irqs > n - nr_static_irqs )
+        {
+            extra_hwdom_irqs = n - nr_static_irqs;
+            printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
+        }
+        if ( extra_domU_irqs > max(32U, n - nr_static_irqs) )
+        {
+            extra_domU_irqs = n - nr_static_irqs;
+            printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
+        }
+    }
+#endif
+
     /*
      * Initialise our DOMID_IO domain.
      * This domain owns I/O pages that are within the range of the page_info
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -173,8 +173,9 @@  extern irq_desc_t *pirq_spin_lock_irq_de
 
 unsigned int set_desc_affinity(struct irq_desc *, const cpumask_t *);
 
+/* When passed a system domain, this returns the maximum permissible value. */
 #ifndef arch_hwdom_irqs
-unsigned int arch_hwdom_irqs(domid_t);
+unsigned int arch_hwdom_irqs(const struct domain *);
 #endif
 
 #ifndef arch_evtchn_bind_pirq