
[1/3] sched/isolation: Add io_queue housekeeping option

Message ID 20240621-isolcpus-io-queues-v1-1-8b169bf41083@suse.de (mailing list archive)
State New, archived
Series nvme-pci: honor isolcpus configuration

Commit Message

Daniel Wagner June 21, 2024, 1:53 p.m. UTC
Drivers such as nvme-pci spread their IO queues over all CPUs. This is
not necessarily an invalid setup when using isolcpus, but there are
also users of isolcpus who do not want any IO workload on the isolated
CPUs.

Add a new housekeeping type so that the infrastructure code and drivers
are able to distinguish between the two setups.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 include/linux/sched/isolation.h | 1 +
 kernel/sched/isolation.c        | 7 +++++++
 2 files changed, 8 insertions(+)
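
For illustration only, a minimal sketch (not part of this series) of how a
driver could consult the new type through the existing housekeeping helpers
when sizing its IO queues; the function name is made up for the example,
while HK_TYPE_IO_QUEUE is the type added by this patch.

  #include <linux/cpumask.h>
  #include <linux/sched/isolation.h>

  /* Hypothetical helper: how many CPUs a driver should spread IO queues over. */
  static unsigned int example_nr_io_queue_cpus(void)
  {
  	if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
  		return num_possible_cpus();

  	return cpumask_weight(housekeeping_cpumask(HK_TYPE_IO_QUEUE));
  }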

Comments

Christoph Hellwig June 22, 2024, 5:11 a.m. UTC | #1
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 2b461129d1fa..fe751d704e99 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -16,6 +16,7 @@ enum hk_type {
>  	HK_TYPE_WQ,
>  	HK_TYPE_MANAGED_IRQ,
>  	HK_TYPE_KTHREAD,
> +	HK_TYPE_IO_QUEUE,
>  	HK_TYPE_MAX
>  };

It might be a good time to write comments explaining these types?
Daniel Wagner June 24, 2024, 7:13 a.m. UTC | #2
On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 2b461129d1fa..fe751d704e99 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -16,6 +16,7 @@ enum hk_type {
> >  	HK_TYPE_WQ,
> >  	HK_TYPE_MANAGED_IRQ,
> >  	HK_TYPE_KTHREAD,
> > +	HK_TYPE_IO_QUEUE,
> >  	HK_TYPE_MAX
> >  };
> 
> It might be a good time to write comments explaining these types?

Sure, will do.

Do you think we should introduce a new type or just use the existing
managed_irq for this?
Christoph Hellwig June 24, 2024, 8:47 a.m. UTC | #3
On Mon, Jun 24, 2024 at 09:13:06AM +0200, Daniel Wagner wrote:
> On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > > index 2b461129d1fa..fe751d704e99 100644
> > > --- a/include/linux/sched/isolation.h
> > > +++ b/include/linux/sched/isolation.h
> > > @@ -16,6 +16,7 @@ enum hk_type {
> > >  	HK_TYPE_WQ,
> > >  	HK_TYPE_MANAGED_IRQ,
> > >  	HK_TYPE_KTHREAD,
> > > +	HK_TYPE_IO_QUEUE,
> > >  	HK_TYPE_MAX
> > >  };
> > 
> > It might be a good time to write comments explaining these types?
> 
> Sure, will do.
> 
> Do you think we should introduce a new type or just use the existing
> managed_irq for this?

No idea really.  What was the reason for adding a new one?  The best
person to comment on this is probably Thomas.
Daniel Wagner June 24, 2024, 9 a.m. UTC | #4
On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > Do you think we should introduce a new type or just use the existing
> > managed_irq for this?
> 
> No idea really.  What was the reason for adding a new one?

I've added the new type so that the current behavior of spreading the
queues over the isolated CPUs is still possible. I don't know if this is
a valid use case or not; I just didn't want to kill this feature without
having discussed it first.

But if we agree this doesn't really make sense with isolcpus, then I
think we should use the managed_irq one, as nvme-pci is using the
managed IRQ API.
Hannes Reinecke June 25, 2024, 6:37 a.m. UTC | #5
On 6/24/24 11:00, Daniel Wagner wrote:
> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>> Do you think we should introduce a new type or just use the existing
>>> managed_irq for this?
>>
>> No idea really.  What was the reason for adding a new one?
> 
> I've added the new type so that the current behavior of spreading the
> queues over the isolated CPUs is still possible. I don't know if this is
> a valid use case or not; I just didn't want to kill this feature without
> having discussed it first.
> 
> But if we agree this doesn't really make sense with isolcpus, then I
> think we should use the managed_irq one, as nvme-pci is using the
> managed IRQ API.
> 
I'm in favour of expanding/modifying the managed irq case.
For managed irqs the driver will be running on the housekeeping CPUs 
only, and has no way of even installing irq handlers for the isolcpus.

Cheers,

Hannes
Thomas Gleixner June 25, 2024, 7:07 a.m. UTC | #6
On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> On 6/24/24 11:00, Daniel Wagner wrote:
>> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>>> Do you think we should introduce a new type or just use the existing
>>>> managed_irq for this?
>>>
>>> No idea really.  What was the reason for adding a new one?
>> 
>> I've added the new type so that the current behavior of spreading the
>> queues over the isolated CPUs is still possible. I don't know if this is
>> a valid use case or not; I just didn't want to kill this feature without
>> having discussed it first.
>> 
>> But if we agree this doesn't really make sense with isolcpus, then I
>> think we should use the managed_irq one, as nvme-pci is using the
>> managed IRQ API.
>> 
> I'm in favour of expanding/modifying the managed irq case.
> For managed irqs the driver will be running on the housekeeping CPUs 
> only, and has no way of even installing irq handlers for the isolcpus.

Yes, that's preferred, but please double check with the people who
introduced that in the first place.

Thanks,

        tglx
Daniel Wagner June 25, 2024, 8:57 a.m. UTC | #7
On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > On 6/24/24 11:00, Daniel Wagner wrote:
> >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> >>>> Do you think we should introduce a new type or just use the existing
> >>>> managed_irq for this?
> >>>
> >>> No idea really.  What was the reason for adding a new one?
> >> 
> >> I've added the new type so that the current behavior of spreading the
> >> queues over the isolated CPUs is still possible. I don't know if this is
> >> a valid use case or not; I just didn't want to kill this feature without
> >> having discussed it first.
> >> 
> >> But if we agree this doesn't really make sense with isolcpus, then I
> >> think we should use the managed_irq one, as nvme-pci is using the
> >> managed IRQ API.
> >> 
> > I'm in favour of expanding/modifying the managed irq case.
> > For managed irqs the driver will be running on the housekeeping CPUs 
> > only, and has no way of even installing irq handlers for the isolcpus.
> 
> Yes, that's preferred, but please double check with the people who
> introduced that in the first place.

The relevant code was added by Ming:

11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
interrupts")

    [...] it can happen that a managed interrupt whose affinity
    mask contains both isolated and housekeeping CPUs is routed to an isolated
    CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
    on the isolated CPU.

    Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
    logic in the interrupt affinity selection code.

    The subparameter indicates to the interrupt affinity selection logic that
    it should try to avoid the above scenario.
    [...]

From the commit message I read that the original intent is that
managed_irq should avoid spreading queues on isolated CPUs.

Ming, do you agree with using the managed_irq mask to limit the queue
spreading on isolated CPUs? It would make the io_queue option obsolete.
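
For reference, that sub-parameter is passed on the kernel command line
together with a CPU list; the CPU range below is only an example:

  isolcpus=managed_irq,2-7
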
Ming Lei June 30, 2024, 1:47 p.m. UTC | #8
On Tue, Jun 25, 2024 at 10:57:42AM +0200, Daniel Wagner wrote:
> On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> > On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > > On 6/24/24 11:00, Daniel Wagner wrote:
> > >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > >>>> Do you think we should introduce a new type or just use the existing
> > >>>> managed_irq for this?
> > >>>
> > >>> No idea really.  What was the reason for adding a new one?
> > >> 
> > >> I've added the new type so that the current behavior of spreading the
> > >> queues over the isolated CPUs is still possible. I don't know if this is
> > >> a valid use case or not; I just didn't want to kill this feature without
> > >> having discussed it first.
> > >> 
> > >> But if we agree this doesn't really make sense with isolcpus, then I
> > >> think we should use the managed_irq one, as nvme-pci is using the
> > >> managed IRQ API.
> > >> 
> > > I'm in favour of expanding/modifying the managed irq case.
> > > For managed irqs the driver will be running on the housekeeping CPUs 
> > > only, and has no way of even installing irq handlers for the isolcpus.
> > 
> > Yes, that's preferred, but please double check with the people who
> > introduced that in the first place.
> 
> The relevant code was added by Ming:
> 
> 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
> interrupts")
> 
>     [...] it can happen that a managed interrupt whose affinity
>     mask contains both isolated and housekeeping CPUs is routed to an isolated
>     CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
>     on the isolated CPU.
> 
>     Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
>     logic in the interrupt affinity selection code.
> 
>     The subparameter indicates to the interrupt affinity selection logic that
>     it should try to avoid the above scenario.
>     [...]
> 
> From the commit message I read that the original intent is that
> managed_irq should avoid spreading queues on isolated CPUs.
> 
> Ming, do you agree with using the managed_irq mask to limit the queue
> spreading on isolated CPUs? It would make the io_queue option obsolete.

Yes, managed_irq was introduced to keep managed IO interrupts off the
isolated CPUs, and it is supposed to work well.

The only problem with managed_irq is that the isolated CPUs are still
included in the queue spread, but they are excluded from the irq
effective masks.


Thanks,
Ming
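
To illustrate Ming's point, a paraphrased sketch (not verbatim kernel code)
of the managed_irq handling that commit 11ea68f553e2 added to the affinity
setting path; data, mask, tmp_mask and prog_mask stand in for the local
variables of that path. The spread affinity mask may still contain isolated
CPUs, but the mask actually programmed into the interrupt chip is narrowed
to housekeeping CPUs whenever an online one is available.

  if (irqd_affinity_is_managed(data) &&
      housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) {
  	const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);

  	/* Drop the isolated CPUs from the mask that gets programmed ... */
  	cpumask_and(&tmp_mask, mask, hk_mask);
  	if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
  		prog_mask = mask;	/* ... unless no housekeeping CPU is online */
  	else
  		prog_mask = &tmp_mask;
  } else {
  	prog_mask = mask;
  }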

Patch

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..fe751d704e99 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -16,6 +16,7 @@  enum hk_type {
 	HK_TYPE_WQ,
 	HK_TYPE_MANAGED_IRQ,
 	HK_TYPE_KTHREAD,
+	HK_TYPE_IO_QUEUE,
 	HK_TYPE_MAX
 };
 
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..91d7a434330c 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -18,6 +18,7 @@  enum hk_flags {
 	HK_FLAG_WQ		= BIT(HK_TYPE_WQ),
 	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
 	HK_FLAG_KTHREAD		= BIT(HK_TYPE_KTHREAD),
+	HK_FLAG_IO_QUEUE	= BIT(HK_TYPE_IO_QUEUE),
 };
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -228,6 +229,12 @@  static int __init housekeeping_isolcpus_setup(char *str)
 			continue;
 		}
 
+		if (!strncmp(str, "io_queue,", 9)) {
+			str += 9;
+			flags |= HK_FLAG_IO_QUEUE;
+			continue;
+		}
+
 		/*
 		 * Skip unknown sub-parameter and validate that it is not
 		 * containing an invalid character.
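
With this patch applied, the new sub-parameter would be selected in the same
way as the existing ones, for example (CPU list chosen arbitrarily):

  isolcpus=io_queue,2-7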