Message ID | 20240621-isolcpus-io-queues-v1-1-8b169bf41083@suse.de (mailing list archive)
---|---
State | New, archived
Series | nvme-pci: honor isolcpus configuration
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 2b461129d1fa..fe751d704e99 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -16,6 +16,7 @@ enum hk_type {
>  	HK_TYPE_WQ,
>  	HK_TYPE_MANAGED_IRQ,
>  	HK_TYPE_KTHREAD,
> +	HK_TYPE_IO_QUEUE,
>  	HK_TYPE_MAX
>  };

It might be a good time to write comments explaining these types?
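A minimal sketch of what such comments could look like for the members touched
by this hunk; the one-line summaries assume the commonly understood meaning of
each type and are illustrative only, not wording from the patch:

    enum hk_type {
            /* ... earlier members omitted ... */
            /* CPUs that may run unbound workqueue work items */
            HK_TYPE_WQ,
            /* CPUs that managed interrupts may be affined to */
            HK_TYPE_MANAGED_IRQ,
            /* CPUs that may run unbound kernel threads */
            HK_TYPE_KTHREAD,
            /* CPUs that may serve block I/O queues (type proposed by this patch) */
            HK_TYPE_IO_QUEUE,
            HK_TYPE_MAX
    };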
On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 2b461129d1fa..fe751d704e99 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -16,6 +16,7 @@ enum hk_type {
> >  	HK_TYPE_WQ,
> >  	HK_TYPE_MANAGED_IRQ,
> >  	HK_TYPE_KTHREAD,
> > +	HK_TYPE_IO_QUEUE,
> >  	HK_TYPE_MAX
> >  };
>
> It might be a good time to write comments explaining these types?

Sure, will do.

Do you think we should introduce a new type or just use the existing
managed_irq for this?
On Mon, Jun 24, 2024 at 09:13:06AM +0200, Daniel Wagner wrote:
> On Sat, Jun 22, 2024 at 07:11:57AM GMT, Christoph Hellwig wrote:
> > > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > > index 2b461129d1fa..fe751d704e99 100644
> > > --- a/include/linux/sched/isolation.h
> > > +++ b/include/linux/sched/isolation.h
> > > @@ -16,6 +16,7 @@ enum hk_type {
> > >  	HK_TYPE_WQ,
> > >  	HK_TYPE_MANAGED_IRQ,
> > >  	HK_TYPE_KTHREAD,
> > > +	HK_TYPE_IO_QUEUE,
> > >  	HK_TYPE_MAX
> > >  };
> >
> > It might be a good time to write comments explaining these types?
>
> Sure, will do.
>
> Do you think we should introduce a new type or just use the existing
> managed_irq for this?

No idea really. What was the reason for adding a new one?

The best person to comment on this is probably Thomas.
On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > Do you think we should introduce a new type or just use the existing
> > managed_irq for this?
>
> No idea really. What was the reason for adding a new one?

I've added the new type so that the current behavior of spreading the
queues over the isolated CPUs is still possible. I don't know if this
is a valid use case or not. I just didn't want to kill this feature
without having discussed it first.

But if we agree this doesn't really make sense with isolcpus, then I
think we should use the managed_irq one, as nvme-pci is using the
managed IRQ API.
On 6/24/24 11:00, Daniel Wagner wrote:
> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>> Do you think we should introduce a new type or just use the existing
>>> managed_irq for this?
>>
>> No idea really. What was the reason for adding a new one?
>
> I've added the new type so that the current behavior of spreading the
> queues over the isolated CPUs is still possible. I don't know if this
> is a valid use case or not. I just didn't want to kill this feature
> without having discussed it first.
>
> But if we agree this doesn't really make sense with isolcpus, then I
> think we should use the managed_irq one, as nvme-pci is using the
> managed IRQ API.
>
I'm in favour of expanding/modifying the managed irq case.
For managed irqs the driver will be running on the housekeeping CPUs
only, and has no way of even installing irq handlers for the isolcpus.

Cheers,

Hannes
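For reference, the managed_irq behaviour Hannes describes comes from the
interrupt affinity selection path. Below is a condensed sketch of that logic,
loosely modelled on irq_do_set_affinity() in kernel/irq/manage.c; locking,
error handling and the exact variable handling are omitted, so this is not the
literal kernel source:

    /* Prefer housekeeping CPUs when programming a managed interrupt. */
    const struct cpumask *prog_mask = mask;
    static struct cpumask tmp_mask;   /* protected by a lock in the real code */

    if (irqd_affinity_is_managed(data) &&
        housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) {
            const struct cpumask *hk_mask =
                    housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);

            cpumask_and(&tmp_mask, mask, hk_mask);
            if (!cpumask_empty(&tmp_mask))
                    prog_mask = &tmp_mask;   /* keep isolated CPUs out */
            /* else: the mask contains only isolated CPUs, use it as-is */
    }

    ret = chip->irq_set_affinity(data, prog_mask, force);

This is why the handlers of managed interrupts end up on housekeeping CPUs
whenever the affinity mask contains at least one of them.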
On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> On 6/24/24 11:00, Daniel Wagner wrote:
>> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
>>>> Do you think we should introduce a new type or just use the existing
>>>> managed_irq for this?
>>>
>>> No idea really. What was the reason for adding a new one?
>>
>> I've added the new type so that the current behavior of spreading the
>> queues over the isolated CPUs is still possible. I don't know if this
>> is a valid use case or not. I just didn't want to kill this feature
>> without having discussed it first.
>>
>> But if we agree this doesn't really make sense with isolcpus, then I
>> think we should use the managed_irq one, as nvme-pci is using the
>> managed IRQ API.
>>
> I'm in favour of expanding/modifying the managed irq case.
> For managed irqs the driver will be running on the housekeeping CPUs
> only, and has no way of even installing irq handlers for the isolcpus.

Yes, that's preferred, but please double check with the people who
introduced that in the first place.

Thanks,

tglx
On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > On 6/24/24 11:00, Daniel Wagner wrote:
> >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> >>>> Do you think we should introduce a new type or just use the existing
> >>>> managed_irq for this?
> >>>
> >>> No idea really. What was the reason for adding a new one?
> >>
> >> I've added the new type so that the current behavior of spreading the
> >> queues over the isolated CPUs is still possible. I don't know if this
> >> is a valid use case or not. I just didn't want to kill this feature
> >> without having discussed it first.
> >>
> >> But if we agree this doesn't really make sense with isolcpus, then I
> >> think we should use the managed_irq one, as nvme-pci is using the
> >> managed IRQ API.
> >>
> > I'm in favour of expanding/modifying the managed irq case.
> > For managed irqs the driver will be running on the housekeeping CPUs
> > only, and has no way of even installing irq handlers for the isolcpus.
>
> Yes, that's preferred, but please double check with the people who
> introduced that in the first place.

The relevant code was added by Ming:

11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
interrupts")

  [...] it can happen that a managed interrupt whose affinity
  mask contains both isolated and housekeeping CPUs is routed to an
  isolated CPU. As a consequence IO submitted on a housekeeping CPU
  causes interrupts on the isolated CPU.

  Add a new sub-parameter 'managed_irq' for 'isolcpus' and the
  corresponding logic in the interrupt affinity selection code.

  The subparameter indicates to the interrupt affinity selection logic
  that it should try to avoid the above scenario.
  [...]

From the commit message I read that the original intent is that managed_irq
should avoid spreading queues on isolated CPUs.

Ming, do you agree to use the managed_irq mask to limit the queue
spreading on isolated CPUs? It would make the io_queue option obsolete.
On Tue, Jun 25, 2024 at 10:57:42AM +0200, Daniel Wagner wrote:
> On Tue, Jun 25, 2024 at 09:07:30AM GMT, Thomas Gleixner wrote:
> > On Tue, Jun 25 2024 at 08:37, Hannes Reinecke wrote:
> > > On 6/24/24 11:00, Daniel Wagner wrote:
> > >> On Mon, Jun 24, 2024 at 10:47:05AM GMT, Christoph Hellwig wrote:
> > >>>> Do you think we should introduce a new type or just use the existing
> > >>>> managed_irq for this?
> > >>>
> > >>> No idea really. What was the reason for adding a new one?
> > >>
> > >> I've added the new type so that the current behavior of spreading the
> > >> queues over the isolated CPUs is still possible. I don't know if this
> > >> is a valid use case or not. I just didn't want to kill this feature
> > >> without having discussed it first.
> > >>
> > >> But if we agree this doesn't really make sense with isolcpus, then I
> > >> think we should use the managed_irq one, as nvme-pci is using the
> > >> managed IRQ API.
> > >>
> > > I'm in favour of expanding/modifying the managed irq case.
> > > For managed irqs the driver will be running on the housekeeping CPUs
> > > only, and has no way of even installing irq handlers for the isolcpus.
> >
> > Yes, that's preferred, but please double check with the people who
> > introduced that in the first place.
>
> The relevant code was added by Ming:
>
> 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed
> interrupts")
>
>   [...] it can happen that a managed interrupt whose affinity
>   mask contains both isolated and housekeeping CPUs is routed to an
>   isolated CPU. As a consequence IO submitted on a housekeeping CPU
>   causes interrupts on the isolated CPU.
>
>   Add a new sub-parameter 'managed_irq' for 'isolcpus' and the
>   corresponding logic in the interrupt affinity selection code.
>
>   The subparameter indicates to the interrupt affinity selection logic
>   that it should try to avoid the above scenario.
>   [...]
>
> From the commit message I read that the original intent is that managed_irq
> should avoid spreading queues on isolated CPUs.
>
> Ming, do you agree to use the managed_irq mask to limit the queue
> spreading on isolated CPUs? It would make the io_queue option obsolete.

Yes, managed_irq was introduced to keep the isolated CPUs out of managed
interrupt handling, and it is supposed to work well. The only problem
with managed_irq is that the isolated CPUs are still included when the
queues are spread, but they are excluded from the irq effective masks.

Thanks,
Ming
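A hypothetical sketch of what "using the managed_irq mask to limit the queue
spreading" could look like on the block/driver side; the helper name and the
fallback policy are invented for illustration and are not part of this series:

    #include <linux/cpumask.h>
    #include <linux/sched/isolation.h>

    /*
     * Hypothetical helper: the CPUs that I/O queues may be spread across.
     * Without "isolcpus=managed_irq,..." housekeeping_enabled() is false,
     * so all possible CPUs are used and today's behaviour is kept.
     */
    static const struct cpumask *io_queue_spread_cpus(void)
    {
            if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
                    return housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
            return cpu_possible_mask;
    }

The spreading code would then build the queue-to-CPU map from this mask
instead of relying only on the effective-affinity exclusion Ming describes.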
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..fe751d704e99 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -16,6 +16,7 @@ enum hk_type {
 	HK_TYPE_WQ,
 	HK_TYPE_MANAGED_IRQ,
 	HK_TYPE_KTHREAD,
+	HK_TYPE_IO_QUEUE,
 	HK_TYPE_MAX
 };
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..91d7a434330c 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -18,6 +18,7 @@ enum hk_flags {
 	HK_FLAG_WQ = BIT(HK_TYPE_WQ),
 	HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
 	HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD),
+	HK_FLAG_IO_QUEUE = BIT(HK_TYPE_IO_QUEUE),
 };
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -228,6 +229,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
 			continue;
 		}
 
+		if (!strncmp(str, "io_queue,", 9)) {
+			str += 9;
+			flags |= HK_FLAG_IO_QUEUE;
+			continue;
+		}
+
 		/*
 		 * Skip unknown sub-parameter and validate that it is not
 		 * containing an invalid character.
Drivers such as nvme-pci are spreading the IO queues on all CPUs. This
is not necessarily an invalid setup when using isolcpus, but there are
also users of isolcpus who do not want any IO workload on the isolated
CPUs.

Add a new housekeeping type so the infrastructure code and drivers are
able to distinguish between the two setups.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 include/linux/sched/isolation.h | 1 +
 kernel/sched/isolation.c        | 7 +++++++
 2 files changed, 8 insertions(+)
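Taken together with the parsing hunk above, the new type would be requested on
the kernel command line like the existing isolcpus sub-parameters; the CPU
range below is only an example:

    isolcpus=io_queue,2-7

It could also be combined with the existing flag, e.g.
isolcpus=io_queue,managed_irq,2-7, to keep both the I/O queues and the managed
interrupts off the isolated CPUs.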