Message ID | 20240621-isolcpus-io-queues-v1-1-8b169bf41083@suse.de (mailing list archive)
---|---
State | New, archived
Series | nvme-pci: honor isolcpus configuration
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 2b461129d1fa..fe751d704e99 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -16,6 +16,7 @@ enum hk_type {
>  	HK_TYPE_WQ,
>  	HK_TYPE_MANAGED_IRQ,
>  	HK_TYPE_KTHREAD,
> +	HK_TYPE_IO_QUEUE,
>  	HK_TYPE_MAX
>  };

It might be a good time to write comments explaining these types?
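A minimal sketch of what such comments could look like for the members touched
by this hunk; the one-line summaries assume the commonly understood meaning of
each type and are illustrative only, not wording from the patch:

    enum hk_type {
            /* ... earlier members omitted ... */
            /* CPUs that may run unbound workqueue work items */
            HK_TYPE_WQ,
            /* CPUs that managed interrupts may be affined to */
            HK_TYPE_MANAGED_IRQ,
            /* CPUs that may run unbound kernel threads */
            HK_TYPE_KTHREAD,
            /* CPUs that may serve block I/O queues (type proposed by this patch) */
            HK_TYPE_IO_QUEUE,
            HK_TYPE_MAX
    };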
On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 2b461129d1fa..fe751d704e99 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -16,6 +16,7 @@ enum hk_type {
> >  	HK_TYPE_WQ,
> >  	HK_TYPE_MANAGED_IRQ,
> >  	HK_TYPE_KTHREAD,
> > +	HK_TYPE_IO_QUEUE,
> >  	HK_TYPE_MAX
> >  };
>
> It might be a good time to write comments explaining these types?

Sure, will do.

Do you think we should introduce a new type or just use the existing
managed_irq for this?
On Mon, Jun 24, 2024 at 09:13:06AM +0200, Daniel Wagner wrote:
> On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > > index 2b461129d1fa..fe751d704e99 100644
> > > --- a/include/linux/sched/isolation.h
> > > +++ b/include/linux/sched/isolation.h
> > > @@ -16,6 +16,7 @@ enum hk_type {
> > >  	HK_TYPE_WQ,
> > >  	HK_TYPE_MANAGED_IRQ,
> > >  	HK_TYPE_KTHREAD,
> > > +	HK_TYPE_IO_QUEUE,
> > >  	HK_TYPE_MAX
> > >  };
> >
> > It might be a good time to write comments explaining these types?
>
> Sure, will do.
>
> Do you think we should introduce a new type or just use the existing
> managed_irq for this?

No idea really. What was the reason for adding a new one?

The best person to comment on this is probably Thomas.
On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > Do you think we should introduce a new type or just use the existing
> > managed_irq for this?
>
> No idea really. What was the reason for adding a new one?

I've added the new type so that the current behavior of spreading the
queues over the isolated CPUs is still possible. I don't know if this
is a valid use case or not. I just didn't want to kill this feature
without having discussed it first.

But if we agree this doesn't really make sense with isolcpus, then I
think we should use the managed_irq one, as nvme-pci is using the
managed IRQ API.
On 6/24/24 11:00, Daniel Wagner wrote:
> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>> Do you think we should introduce a new type or just use the existing
>>> managed_irq for this?
>>
>> No idea really. What was the reason for adding a new one?
>
> I've added the new type so that the current behavior of spreading the
> queues over the isolated CPUs is still possible. I don't know if this
> is a valid use case or not. I just didn't want to kill this feature
> without having discussed it first.
>
> But if we agree this doesn't really make sense with isolcpus, then I
> think we should use the managed_irq one, as nvme-pci is using the
> managed IRQ API.
>
I'm in favour of expanding/modifying the managed irq case.
For managed irqs the driver will be running on the housekeeping CPUs
only, and has no way of even installing irq handlers for the isolcpus.

Cheers,

Hannes
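For reference, the managed_irq behaviour Hannes describes comes from the
interrupt affinity selection path. Below is a condensed sketch of that logic,
loosely modelled on irq_do_set_affinity() in kernel/irq/manage.c; locking,
error handling and the exact variable handling are omitted, so this is not the
literal kernel source:

    /* Prefer housekeeping CPUs when programming a managed interrupt. */
    const struct cpumask *prog_mask = mask;
    static struct cpumask tmp_mask;   /* protected by a lock in the real code */

    if (irqd_affinity_is_managed(data) &&
        housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) {
            const struct cpumask *hk_mask =
                    housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);

            cpumask_and(&tmp_mask, mask, hk_mask);
            if (!cpumask_empty(&tmp_mask))
                    prog_mask = &tmp_mask;   /* keep isolated CPUs out */
            /* else: the mask contains only isolated CPUs, use it as-is */
    }

    ret = chip->irq_set_affinity(data, prog_mask, force);

This is why the handlers of managed interrupts end up on housekeeping CPUs
whenever the affinity mask contains at least one of them.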
On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> On 6/24/24 11:00, Daniel Wagner wrote:
>> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>>> Do you think we should introduce a new type or just use the existing
>>>> managed_irq for this?
>>>
>>> No idea really. What was the reason for adding a new one?
>>
>> I've added the new type so that the current behavior of spreading the
>> queues over the isolated CPUs is still possible. I don't know if this
>> is a valid use case or not. I just didn't want to kill this feature
>> without having discussed it first.
>>
>> But if we agree this doesn't really make sense with isolcpus, then I
>> think we should use the managed_irq one, as nvme-pci is using the
>> managed IRQ API.
>>
> I'm in favour of expanding/modifying the managed irq case.
> For managed irqs the driver will be running on the housekeeping CPUs
> only, and has no way of even installing irq handlers for the isolcpus.

Yes, that's preferred, but please double check with the people who
introduced that in the first place.

Thanks,

tglx
On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > On 6/24/24 11:00, Daniel Wagner wrote:
> >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> >>>> Do you think we should introduce a new type or just use the existing
> >>>> managed_irq for this?
> >>>
> >>> No idea really. What was the reason for adding a new one?
> >>
> >> I've added the new type so that the current behavior of spreading the
> >> queues over the isolated CPUs is still possible. I don't know if this
> >> is a valid use case or not. I just didn't want to kill this feature
> >> without having discussed it first.
> >>
> >> But if we agree this doesn't really make sense with isolcpus, then I
> >> think we should use the managed_irq one, as nvme-pci is using the
> >> managed IRQ API.
> >>
> > I'm in favour of expanding/modifying the managed irq case.
> > For managed irqs the driver will be running on the housekeeping CPUs
> > only, and has no way of even installing irq handlers for the isolcpus.
>
> Yes, that's preferred, but please double check with the people who
> introduced that in the first place.

The relevant code was added by Ming:

11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
interrupts")

  [...] it can happen that a managed interrupt whose affinity
  mask contains both isolated and housekeeping CPUs is routed to an
  isolated CPU. As a consequence IO submitted on a housekeeping CPU
  causes interrupts on the isolated CPU.

  Add a new sub-parameter 'managed_irq' for 'isolcpus' and the
  corresponding logic in the interrupt affinity selection code.

  The subparameter indicates to the interrupt affinity selection logic
  that it should try to avoid the above scenario.
  [...]

From the commit message I read that the original intent is that managed_irq
should avoid spreading queues on isolated CPUs.

Ming, do you agree to use the managed_irq mask to limit the queue
spreading on isolated CPUs? It would make the io_queue option obsolete.
On Tue, Jun 25, 2024 at 10:57:42AM +0200, Daniel Wagner wrote:
> On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> > On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > > On 6/24/24 11:00, Daniel Wagner wrote:
> > >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > >>>> Do you think we should introduce a new type or just use the existing
> > >>>> managed_irq for this?
> > >>>
> > >>> No idea really. What was the reason for adding a new one?
> > >>
> > >> I've added the new type so that the current behavior of spreading the
> > >> queues over the isolated CPUs is still possible. I don't know if this
> > >> is a valid use case or not. I just didn't want to kill this feature
> > >> without having discussed it first.
> > >>
> > >> But if we agree this doesn't really make sense with isolcpus, then I
> > >> think we should use the managed_irq one, as nvme-pci is using the
> > >> managed IRQ API.
> > >>
> > > I'm in favour of expanding/modifying the managed irq case.
> > > For managed irqs the driver will be running on the housekeeping CPUs
> > > only, and has no way of even installing irq handlers for the isolcpus.
> >
> > Yes, that's preferred, but please double check with the people who
> > introduced that in the first place.
>
> The relevant code was added by Ming:
>
> 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
> interrupts")
>
>   [...] it can happen that a managed interrupt whose affinity
>   mask contains both isolated and housekeeping CPUs is routed to an
>   isolated CPU. As a consequence IO submitted on a housekeeping CPU
>   causes interrupts on the isolated CPU.
>
>   Add a new sub-parameter 'managed_irq' for 'isolcpus' and the
>   corresponding logic in the interrupt affinity selection code.
>
>   The subparameter indicates to the interrupt affinity selection logic
>   that it should try to avoid the above scenario.
>   [...]
>
> From the commit message I read that the original intent is that managed_irq
> should avoid spreading queues on isolated CPUs.
>
> Ming, do you agree to use the managed_irq mask to limit the queue
> spreading on isolated CPUs? It would make the io_queue option obsolete.

Yes, managed_irq was introduced to keep the isolated CPUs out of managed
interrupt handling, and it is supposed to work well. The only problem
with managed_irq is that the isolated CPUs are still included when the
queues are spread, but they are excluded from the irq effective masks.

Thanks,
Ming
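A hypothetical sketch of what "using the managed_irq mask to limit the queue
spreading" could look like on the block/driver side; the helper name and the
fallback policy are invented for illustration and are not part of this series:

    #include <linux/cpumask.h>
    #include <linux/sched/isolation.h>

    /*
     * Hypothetical helper: the CPUs that I/O queues may be spread across.
     * Without "isolcpus=managed_irq,..." housekeeping_enabled() is false,
     * so all possible CPUs are used and today's behaviour is kept.
     */
    static const struct cpumask *io_queue_spread_cpus(void)
    {
            if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
                    return housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
            return cpu_possible_mask;
    }

The spreading code would then build the queue-to-CPU map from this mask
instead of relying only on the effective-affinity exclusion Ming describes.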
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..fe751d704e99 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -16,6 +16,7 @@ enum hk_type {
 	HK_TYPE_WQ,
 	HK_TYPE_MANAGED_IRQ,
 	HK_TYPE_KTHREAD,
+	HK_TYPE_IO_QUEUE,
 	HK_TYPE_MAX
 };
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..91d7a434330c 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -18,6 +18,7 @@ enum hk_flags {
 	HK_FLAG_WQ = BIT(HK_TYPE_WQ),
 	HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
 	HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD),
+	HK_FLAG_IO_QUEUE = BIT(HK_TYPE_IO_QUEUE),
 };
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -228,6 +229,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
 			continue;
 		}
 
+		if (!strncmp(str, "io_queue,", 9)) {
+			str += 9;
+			flags |= HK_FLAG_IO_QUEUE;
+			continue;
+		}
+
 		/*
 		 * Skip unknown sub-parameter and validate that it is not
 		 * containing an invalid character.
Drivers such as nvme-pci are spreading the IO queues on all CPUs. This
is not necessarily an invalid setup when using isolcpus, but there are
also users of isolcpus who do not want any IO workload on the isolated
CPUs.

Add a new housekeeping type so the infrastructure code and drivers are
able to distinguish between the two setups.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 include/linux/sched/isolation.h | 1 +
 kernel/sched/isolation.c        | 7 +++++++
 2 files changed, 8 insertions(+)
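Taken together with the parsing hunk above, the new type would be requested on
the kernel command line like the existing isolcpus sub-parameters; the CPU
range below is only an example:

    isolcpus=io_queue,2-7

It could also be combined with the existing flag, e.g.
isolcpus=io_queue,managed_irq,2-7, to keep both the I/O queues and the managed
interrupts off the isolated CPUs.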