Message ID | 20210501021832.743094-1-jesse.brandeburg@intel.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [tip:irq/core,v1] genirq: remove auto-set of the mask when setting the hint | expand |
On 2021-05-01 03:18, Jesse Brandeburg wrote: > It was pointed out by Nitesh that the original work I did in 2014 > to automatically set the interrupt affinity when requesting a > mask is no longer necessary. The kernel has moved on and no > longer has the original problem, BUT the original patch > introduced a subtle bug when booting a system with reserved or > excluded CPUs. Drivers calling this function with a mask value > that included a CPU that was currently or in the future > unavailable would generally not update the hint. > > I'm sure there are a million ways to solve this, but the simplest > one is to just remove a little code that tries to force the > affinity, as Nitesh has shown it fixes the bug and doesn't seem > to introduce immediate side effects. Unfortunately, I think there are quite a few other drivers now relying on this behaviour, since they are really using irq_set_affinity_hint() as a proxy for irq_set_affinity(). Partly since the latter isn't exported to modules, but also I have a vague memory of it being said that it's nice to update the user-visible hint to match when the affinity does have to be forced to something specific. Robin. > While I'm here, introduce a kernel-doc for the hint function. > > Ref: https://lore.kernel.org/lkml/CAFki+L=_dd+JgAR12_eBPX0kZO2_6=1dGdgkwHE=u=K6chMeLQ@mail.gmail.com/ > Cc: netdev@vger.kernel.org > Fixes: 4fe7ffb7e17c ("genirq: Fix null pointer reference in irq_set_affinity_hint()") > Fixes: e2e64a932556 ("genirq: Set initial affinity in irq_set_affinity_hint()") > Reported-by: Nitesh Lal <nilal@redhat.com> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > --- > > !!! NOTE: Compile tested only, would appreciate feedback > > --- > kernel/irq/manage.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c > index e976c4927b25..a31df64662d5 100644 > --- a/kernel/irq/manage.c > +++ b/kernel/irq/manage.c > @@ -456,6 +456,16 @@ int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force) > return ret; > } > > +/** > + * irq_set_affinity_hint - set the hint for an irq > + * @irq: Interrupt for which to set the hint > + * @m: Mask to indicate which CPUs to suggest for the interrupt, use > + * NULL here to indicate to clear the value. > + * > + * Use this function to recommend which CPU should handle the > + * interrupt to any userspace that uses /proc/irq/nn/smp_affinity_hint > + * in order to align interrupts. Pass NULL as the mask to clear the hint. > + */ > int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m) > { > unsigned long flags; > @@ -465,9 +475,6 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m) > return -EINVAL; > desc->affinity_hint = m; > irq_put_desc_unlock(desc, flags); > - /* set the initial affinity to prevent every interrupt being on CPU0 */ > - if (m) > - __irq_set_affinity(irq, m, false); > return 0; > } > EXPORT_SYMBOL_GPL(irq_set_affinity_hint); > > base-commit: 765822e1569a37aab5e69736c52d4ad4a289eba6 >
On Tue, May 4, 2021 at 8:15 AM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2021-05-01 03:18, Jesse Brandeburg wrote: > > It was pointed out by Nitesh that the original work I did in 2014 > > to automatically set the interrupt affinity when requesting a > > mask is no longer necessary. The kernel has moved on and no > > longer has the original problem, BUT the original patch > > introduced a subtle bug when booting a system with reserved or > > excluded CPUs. Drivers calling this function with a mask value > > that included a CPU that was currently or in the future > > unavailable would generally not update the hint. > > > > I'm sure there are a million ways to solve this, but the simplest > > one is to just remove a little code that tries to force the > > affinity, as Nitesh has shown it fixes the bug and doesn't seem > > to introduce immediate side effects. > > Unfortunately, I think there are quite a few other drivers now relying > on this behaviour, since they are really using irq_set_affinity_hint() > as a proxy for irq_set_affinity(). That's true. > Partly since the latter isn't > exported to modules, but also I have a vague memory of it being said > that it's nice to update the user-visible hint to match when the > affinity does have to be forced to something specific. If you see the downside of it we are forcing the affinity to match the hint mask without considering the default SMP affinity mask. Also, we are repeating things here. First, we set certain mask for a device IRQ via request_irq code path which does consider the default SMP mask but then we are letting the driver over-write it. If we want to set the IRQ mask in a certain way then it should be done at the time of initial setup itself. Do you know about a workload/use case that can show the benefit of this behavior? As then we can try fixing it in the right way. -- Thanks Nitesh
Robin Murphy wrote: > On 2021-05-01 03:18, Jesse Brandeburg wrote: > > It was pointed out by Nitesh that the original work I did in 2014 > > to automatically set the interrupt affinity when requesting a > > mask is no longer necessary. The kernel has moved on and no > > longer has the original problem, BUT the original patch > > introduced a subtle bug when booting a system with reserved or > > excluded CPUs. Drivers calling this function with a mask value > > that included a CPU that was currently or in the future > > unavailable would generally not update the hint. > > > > I'm sure there are a million ways to solve this, but the simplest > > one is to just remove a little code that tries to force the > > affinity, as Nitesh has shown it fixes the bug and doesn't seem > > to introduce immediate side effects. > > Unfortunately, I think there are quite a few other drivers now relying > on this behaviour, since they are really using irq_set_affinity_hint() > as a proxy for irq_set_affinity(). Partly since the latter isn't > exported to modules, but also I have a vague memory of it being said > that it's nice to update the user-visible hint to match when the > affinity does have to be forced to something specific. > > Robin. Thanks for your feedback Robin, but there is definitely a bug here that is being exposed by this code. The fact that people are using this function means they're all exposed to this bug. Not sure if you saw, but this analysis from Nitesh explains what happened chronologically to the kernel w.r.t this code, it's a useful analysis! [1] I'd add in addition that irqbalance daemon *stopped* paying attention to hints quite a while ago, so I'm not quite sure what purpose they serve. [1] https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/
On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg <jesse.brandeburg@intel.com> wrote: > > Robin Murphy wrote: > > > On 2021-05-01 03:18, Jesse Brandeburg wrote: > > > It was pointed out by Nitesh that the original work I did in 2014 > > > to automatically set the interrupt affinity when requesting a > > > mask is no longer necessary. The kernel has moved on and no > > > longer has the original problem, BUT the original patch > > > introduced a subtle bug when booting a system with reserved or > > > excluded CPUs. Drivers calling this function with a mask value > > > that included a CPU that was currently or in the future > > > unavailable would generally not update the hint. > > > > > > I'm sure there are a million ways to solve this, but the simplest > > > one is to just remove a little code that tries to force the > > > affinity, as Nitesh has shown it fixes the bug and doesn't seem > > > to introduce immediate side effects. > > > > Unfortunately, I think there are quite a few other drivers now relying > > on this behaviour, since they are really using irq_set_affinity_hint() > > as a proxy for irq_set_affinity(). Partly since the latter isn't > > exported to modules, but also I have a vague memory of it being said > > that it's nice to update the user-visible hint to match when the > > affinity does have to be forced to something specific. > > > > Robin. > > Thanks for your feedback Robin, but there is definitely a bug here that > is being exposed by this code. The fact that people are using this > function means they're all exposed to this bug. > > Not sure if you saw, but this analysis from Nitesh explains what > happened chronologically to the kernel w.r.t this code, it's a useful > analysis! [1] > > I'd add in addition that irqbalance daemon *stopped* paying attention > to hints quite a while ago, so I'm not quite sure what purpose they > serve. > > [1] > https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/ > Wanted to follow up to see if there are any more objections or even suggestions to take this forward?
On 2021-05-17 17:57, Nitesh Lal wrote: > On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg > <jesse.brandeburg@intel.com> wrote: >> >> Robin Murphy wrote: >> >>> On 2021-05-01 03:18, Jesse Brandeburg wrote: >>>> It was pointed out by Nitesh that the original work I did in 2014 >>>> to automatically set the interrupt affinity when requesting a >>>> mask is no longer necessary. The kernel has moved on and no >>>> longer has the original problem, BUT the original patch >>>> introduced a subtle bug when booting a system with reserved or >>>> excluded CPUs. Drivers calling this function with a mask value >>>> that included a CPU that was currently or in the future >>>> unavailable would generally not update the hint. >>>> >>>> I'm sure there are a million ways to solve this, but the simplest >>>> one is to just remove a little code that tries to force the >>>> affinity, as Nitesh has shown it fixes the bug and doesn't seem >>>> to introduce immediate side effects. >>> >>> Unfortunately, I think there are quite a few other drivers now relying >>> on this behaviour, since they are really using irq_set_affinity_hint() >>> as a proxy for irq_set_affinity(). Partly since the latter isn't >>> exported to modules, but also I have a vague memory of it being said >>> that it's nice to update the user-visible hint to match when the >>> affinity does have to be forced to something specific. >>> >>> Robin. >> >> Thanks for your feedback Robin, but there is definitely a bug here that >> is being exposed by this code. The fact that people are using this >> function means they're all exposed to this bug. >> >> Not sure if you saw, but this analysis from Nitesh explains what >> happened chronologically to the kernel w.r.t this code, it's a useful >> analysis! [1] >> >> I'd add in addition that irqbalance daemon *stopped* paying attention >> to hints quite a while ago, so I'm not quite sure what purpose they >> serve. >> >> [1] >> https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/ >> > > Wanted to follow up to see if there are any more objections or even > suggestions to take this forward? Oops, sorry, seems I got distracted before getting round to actually typing up my response :) I'm not implying that there isn't a bug, or that this code ever made sense in the first place, just that fixing it will unfortunately be a bit more involved than a simple revert. This patch as-is *will* subtly break at least the system PMU drivers currently using irq_set_affinity_hint() - those I know require the IRQ affinity to follow whichever CPU the PMU context is bound to, in order to meet perf core's assumptions about mutual exclusion. As far as the consistency argument goes, maybe that's just backwards and it should be irq_set_affinity() that also sets the hint, to indicate to userspace that the affinity has been forced by the kernel? Either way we'll need to do a little more diligence to figure out which callers actually care about more than just the hint, and sort them out first. Robin.
On Mon, May 17 2021 at 18:26, Robin Murphy wrote: > On 2021-05-17 17:57, Nitesh Lal wrote: > I'm not implying that there isn't a bug, or that this code ever made > sense in the first place, just that fixing it will unfortunately be a > bit more involved than a simple revert. This patch as-is *will* subtly > break at least the system PMU drivers currently using s/using/abusing/ > irq_set_affinity_hint() - those I know require the IRQ affinity to > follow whichever CPU the PMU context is bound to, in order to meet perf > core's assumptions about mutual exclusion. Which driver is that? Thanks, tglx
On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2021-05-17 17:57, Nitesh Lal wrote: > > On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg > > <jesse.brandeburg@intel.com> wrote: > >> > >> Robin Murphy wrote: > >> > >>> On 2021-05-01 03:18, Jesse Brandeburg wrote: > >>>> It was pointed out by Nitesh that the original work I did in 2014 > >>>> to automatically set the interrupt affinity when requesting a > >>>> mask is no longer necessary. The kernel has moved on and no > >>>> longer has the original problem, BUT the original patch > >>>> introduced a subtle bug when booting a system with reserved or > >>>> excluded CPUs. Drivers calling this function with a mask value > >>>> that included a CPU that was currently or in the future > >>>> unavailable would generally not update the hint. > >>>> > >>>> I'm sure there are a million ways to solve this, but the simplest > >>>> one is to just remove a little code that tries to force the > >>>> affinity, as Nitesh has shown it fixes the bug and doesn't seem > >>>> to introduce immediate side effects. > >>> > >>> Unfortunately, I think there are quite a few other drivers now relying > >>> on this behaviour, since they are really using irq_set_affinity_hint() > >>> as a proxy for irq_set_affinity(). Partly since the latter isn't > >>> exported to modules, but also I have a vague memory of it being said > >>> that it's nice to update the user-visible hint to match when the > >>> affinity does have to be forced to something specific. > >>> > >>> Robin. > >> > >> Thanks for your feedback Robin, but there is definitely a bug here that > >> is being exposed by this code. The fact that people are using this > >> function means they're all exposed to this bug. > >> > >> Not sure if you saw, but this analysis from Nitesh explains what > >> happened chronologically to the kernel w.r.t this code, it's a useful > >> analysis! [1] > >> > >> I'd add in addition that irqbalance daemon *stopped* paying attention > >> to hints quite a while ago, so I'm not quite sure what purpose they > >> serve. > >> > >> [1] > >> https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/ > >> > > > > Wanted to follow up to see if there are any more objections or even > > suggestions to take this forward? > > Oops, sorry, seems I got distracted before getting round to actually > typing up my response :) No worries. > > I'm not implying that there isn't a bug, or that this code ever made > sense in the first place, just that fixing it will unfortunately be a > bit more involved than a simple revert. Fair point. > This patch as-is *will* subtly > break at least the system PMU drivers currently using > irq_set_affinity_hint() - those I know require the IRQ affinity to > follow whichever CPU the PMU context is bound to, in order to meet perf > core's assumptions about mutual exclusion. Thanks for bringing this up. Please correct me if I am wrong, so the PMU driver(s) is/are written in a way that it uses the hint API to overwrite the previously set affinity mask with a CPU to which the PMU context is bound to? Is this context information exposed in the userspace and can we modify the IRQ affinity mask from the userspace based on that? I do understand that this is a behavior change from the PMU drivers perspective. > > As far as the consistency argument goes, maybe that's just backwards and > it should be irq_set_affinity() that also sets the hint, to indicate to > userspace that the affinity has been forced by the kernel? Either way > we'll need to do a little more diligence to figure out which callers > actually care about more than just the hint, and sort them out first. > We can use irq_set_affinity() to set the hint mask as well, however, maybe there is a specific reason behind separating those two in the first place (maybe not?). But even in this case, we have to either modify the PMU drivers' IRQs affinity from the userspace or we will have to make changes in the existing request_irq code path. I am not sure about the latter because we already have the required controls to adjust the device IRQ mask (by using default_smp_affinity or by modifying them manually).
On 2021-05-17 19:08, Thomas Gleixner wrote: > On Mon, May 17 2021 at 18:26, Robin Murphy wrote: >> On 2021-05-17 17:57, Nitesh Lal wrote: >> I'm not implying that there isn't a bug, or that this code ever made >> sense in the first place, just that fixing it will unfortunately be a >> bit more involved than a simple revert. This patch as-is *will* subtly >> break at least the system PMU drivers currently using > > s/using/abusing/ > >> irq_set_affinity_hint() - those I know require the IRQ affinity to >> follow whichever CPU the PMU context is bound to, in order to meet perf >> core's assumptions about mutual exclusion. > > Which driver is that? Right now, any driver which wants to control an IRQ's affinity and also build as a module, for one thing. I'm familiar with drivers/perf/ where a basic pattern has been widely copied; some of the callers in other subsystems appear to *expect* it to set the underlying affinity as well, but whether any of those added within the last 6 years represent a functional dependency rather than just a performance concern I don't know. Robin.
On Mon, May 17 2021 at 19:50, Robin Murphy wrote: > On 2021-05-17 19:08, Thomas Gleixner wrote: >> On Mon, May 17 2021 at 18:26, Robin Murphy wrote: >>> On 2021-05-17 17:57, Nitesh Lal wrote: >>> I'm not implying that there isn't a bug, or that this code ever made >>> sense in the first place, just that fixing it will unfortunately be a >>> bit more involved than a simple revert. This patch as-is *will* subtly >>> break at least the system PMU drivers currently using >> >> s/using/abusing/ >> >>> irq_set_affinity_hint() - those I know require the IRQ affinity to >>> follow whichever CPU the PMU context is bound to, in order to meet perf >>> core's assumptions about mutual exclusion. >> >> Which driver is that? > > Right now, any driver which wants to control an IRQ's affinity and also > build as a module, for one thing. I'm familiar with drivers/perf/ where > a basic pattern has been widely copied; Bah. Why the heck can't people talk and just go and rumage until they find something which hopefully does what they want... The name of that function should have rang all alarm bells... > some of the callers in other subsystems appear to *expect* it to set > the underlying affinity as well, but whether any of those added within > the last 6 years represent a functional dependency rather than just a > performance concern I don't know. Sigh. Let me do yet another tree wide audit... Thanks, tglx
On Mon, May 17 2021 at 21:08, Thomas Gleixner wrote: > On Mon, May 17 2021 at 19:50, Robin Murphy wrote: >> some of the callers in other subsystems appear to *expect* it to set >> the underlying affinity as well, but whether any of those added within >> the last 6 years represent a functional dependency rather than just a >> performance concern I don't know. > > Sigh. Let me do yet another tree wide audit... It's clearly only the perf muck which has a functional dependency. None of the other usage sites has IRQF_NOBALANCING set which clearly makes this a hint because user space can freely muck with the affinity. Thanks, tglx
Nitesh, On Mon, May 17 2021 at 14:21, Nitesh Lal wrote: > On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote: > > We can use irq_set_affinity() to set the hint mask as well, however, maybe > there is a specific reason behind separating those two in the > first place (maybe not?). Yes, because kernel side settings might overwrite the hint. > But even in this case, we have to either modify the PMU drivers' IRQs > affinity from the userspace or we will have to make changes in the existing > request_irq code path. Adjusting them from user space does not work for various reasons, especially CPU hotplug. > I am not sure about the latter because we already have the required controls > to adjust the device IRQ mask (by using default_smp_affinity or by modifying > them manually). default_smp_affinity does not help at all and there is nothing a module can modify manually. I'll send out a patch series which cleans that up soon. Thanks, tglx
On Mon, May 17 2021 at 21:08, Thomas Gleixner wrote: > On Mon, May 17 2021 at 19:50, Robin Murphy wrote: >> On 2021-05-17 19:08, Thomas Gleixner wrote: >>> On Mon, May 17 2021 at 18:26, Robin Murphy wrote: >>>> On 2021-05-17 17:57, Nitesh Lal wrote: >>>> I'm not implying that there isn't a bug, or that this code ever made >>>> sense in the first place, just that fixing it will unfortunately be a >>>> bit more involved than a simple revert. This patch as-is *will* subtly >>>> break at least the system PMU drivers currently using >>> >>> s/using/abusing/ >>> >>>> irq_set_affinity_hint() - those I know require the IRQ affinity to >>>> follow whichever CPU the PMU context is bound to, in order to meet perf >>>> core's assumptions about mutual exclusion. >>> >>> Which driver is that? >> >> Right now, any driver which wants to control an IRQ's affinity and also >> build as a module, for one thing. I'm familiar with drivers/perf/ where >> a basic pattern has been widely copied; > > Bah. Why the heck can't people talk and just go and rumage until they > find something which hopefully does what they want... > > The name of that function should have rang all alarm bells... Aside of that all the warnings around the return value are useless cargo cult. Why? The only reason why this function returns an error code is when there is no irq descriptor assigned to the interrupt number, which is well close to impossible in that context. But it does _NOT_ return an error when the actual affinity setting fails... Thanks, tglx
On Tue, May 04 2021 at 09:23, Jesse Brandeburg wrote: > I'd add in addition that irqbalance daemon *stopped* paying attention > to hints quite a while ago, so I'm not quite sure what purpose they > serve. The hint was added so that userspace has a better understanding where it should place the interrupt. So if irqbalanced ignores it anyway, then what's the point of the hint? IOW, why is it still used drivers? Now there is another aspect to that. What happens if irqbalanced does not run at all and a driver relies on the side effect of the hint setting the initial affinity. Bah... While none of the drivers (except the perf muck) actually prevents userspace from fiddling with the affinity (via IRQF_NOBALANCING) a deeper inspection shows that they actually might rely on the current behaviour if irqbalanced is disabled. Of course every driver has its own convoluted way to do that and all of those functions are well documented. What a mess. If the hint still serves a purpose then we can provide a variant which solely applies the hint and does not fiddle with the actual affinity, but if the hint is useless anyway then we have a way better option to clean that up. Most users are in networking, there are a few in crypto, a couple of leftovers in scsi, virtio and a handfull of oddball drivers. The perf muck wants to be cleaned up anyway as it's just crystal clear abuse. Thanks, tglx
On Mon, May 17, 2021 at 3:47 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Nitesh, > > On Mon, May 17 2021 at 14:21, Nitesh Lal wrote: > > On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote: > > > > We can use irq_set_affinity() to set the hint mask as well, however, maybe > > there is a specific reason behind separating those two in the > > first place (maybe not?). > > Yes, because kernel side settings might overwrite the hint. > > > But even in this case, we have to either modify the PMU drivers' IRQs > > affinity from the userspace or we will have to make changes in the existing > > request_irq code path. > > Adjusting them from user space does not work for various reasons, > especially CPU hotplug. > > > I am not sure about the latter because we already have the required controls > > to adjust the device IRQ mask (by using default_smp_affinity or by modifying > > them manually). > > default_smp_affinity does not help at all and there is nothing a module > can modify manually. Right, it will not help a module. > > I'll send out a patch series which cleans that up soon. Ack, thanks. > > Thanks, > > tglx > > -- Nitesh
On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, May 04 2021 at 09:23, Jesse Brandeburg wrote: > > I'd add in addition that irqbalance daemon *stopped* paying attention > > to hints quite a while ago, so I'm not quite sure what purpose they > > serve. > > The hint was added so that userspace has a better understanding where it > should place the interrupt. So if irqbalanced ignores it anyway, then > what's the point of the hint? IOW, why is it still used drivers? > Took a quick look at the irqbalance repo and saw the following commit: dcc411e7bf remove affinity_hint infrastructure The commit message mentions that "PJ is redesiging how affinity hinting works in the kernel, the future model will just tell us to ignore an IRQ, and the kernel will handle placement for us. As such we can remove the affinity_hint recognition entirely". This does indicate that apparently, irqbalance moved away from the usage of affinity_hint. However, the next question is what was this future model? I don't know but I can surely look into it if that helps or maybe someone here already knows about it? > Now there is another aspect to that. What happens if irqbalanced does > not run at all and a driver relies on the side effect of the hint > setting the initial affinity. Bah... > Right, but if they only rely on this API so that the IRQs are spread across all the CPUs then that issue is already resolved and these other drivers should not regress because of changing this behavior. Isn't it? > While none of the drivers (except the perf muck) actually prevents > userspace from fiddling with the affinity (via IRQF_NOBALANCING) a > deeper inspection shows that they actually might rely on the current > behaviour if irqbalanced is disabled. Of course every driver has its own > convoluted way to do that and all of those functions are well > documented. What a mess. > > If the hint still serves a purpose then we can provide a variant which > solely applies the hint and does not fiddle with the actual affinity, > but if the hint is useless anyway then we have a way better option to > clean that up. > +1
On Mon, May 17 2021 at 18:44, Nitesh Lal wrote: > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> The hint was added so that userspace has a better understanding where it >> should place the interrupt. So if irqbalanced ignores it anyway, then >> what's the point of the hint? IOW, why is it still used drivers? >> > Took a quick look at the irqbalance repo and saw the following commit: > > dcc411e7bf remove affinity_hint infrastructure > > The commit message mentions that "PJ is redesiging how affinity hinting > works in the kernel, the future model will just tell us to ignore an IRQ, > and the kernel will handle placement for us. As such we can remove the > affinity_hint recognition entirely". No idea who PJ is. I really love useful commit messages. Maybe Neil can shed some light on that. > This does indicate that apparently, irqbalance moved away from the usage of > affinity_hint. However, the next question is what was this future > model? I might have missed something in the last 5 years, but that's the first time I hear about someone trying to cleanup that thing. > I don't know but I can surely look into it if that helps or maybe someone > here already knows about it? I CC'ed Neil :) >> Now there is another aspect to that. What happens if irqbalanced does >> not run at all and a driver relies on the side effect of the hint >> setting the initial affinity. Bah... >> > > Right, but if they only rely on this API so that the IRQs are spread across > all the CPUs then that issue is already resolved and these other drivers > should not regress because of changing this behavior. Isn't it? Is that true for all architectures? >> While none of the drivers (except the perf muck) actually prevents >> userspace from fiddling with the affinity (via IRQF_NOBALANCING) a >> deeper inspection shows that they actually might rely on the current >> behaviour if irqbalanced is disabled. Of course every driver has its own >> convoluted way to do that and all of those functions are well >> documented. What a mess. >> >> If the hint still serves a purpose then we can provide a variant which >> solely applies the hint and does not fiddle with the actual affinity, >> but if the hint is useless anyway then we have a way better option to >> clean that up. >> > > +1 = 1 Thanks, tglx
On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Mon, May 17 2021 at 18:44, Nitesh Lal wrote: > > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: > >> The hint was added so that userspace has a better understanding where it > >> should place the interrupt. So if irqbalanced ignores it anyway, then > >> what's the point of the hint? IOW, why is it still used drivers? > >> > > Took a quick look at the irqbalance repo and saw the following commit: > > > > dcc411e7bf remove affinity_hint infrastructure > > > > The commit message mentions that "PJ is redesiging how affinity hinting > > works in the kernel, the future model will just tell us to ignore an IRQ, > > and the kernel will handle placement for us. As such we can remove the > > affinity_hint recognition entirely". > > No idea who PJ is. I really love useful commit messages. Maybe Neil can > shed some light on that. > > > This does indicate that apparently, irqbalance moved away from the usage of > > affinity_hint. However, the next question is what was this future > > model? > > I might have missed something in the last 5 years, but that's the first > time I hear about someone trying to cleanup that thing. > > > I don't know but I can surely look into it if that helps or maybe someone > > here already knows about it? > > I CC'ed Neil :) Thanks, I have added PJ Waskiewicz as well who I think was referred in that commit message as PJ. > > >> Now there is another aspect to that. What happens if irqbalanced does > >> not run at all and a driver relies on the side effect of the hint > >> setting the initial affinity. Bah... > >> > > > > Right, but if they only rely on this API so that the IRQs are spread across > > all the CPUs then that issue is already resolved and these other drivers > > should not regress because of changing this behavior. Isn't it? > > Is that true for all architectures? Unfortunately, I don't know and that's probably why we have to be careful. -- Nitesh
On Mon, May 17, 2021 at 8:23 PM Nitesh Lal <nilal@redhat.com> wrote: > > On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > On Mon, May 17 2021 at 18:44, Nitesh Lal wrote: > > > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > >> The hint was added so that userspace has a better understanding where it > > >> should place the interrupt. So if irqbalanced ignores it anyway, then > > >> what's the point of the hint? IOW, why is it still used drivers? > > >> > > > Took a quick look at the irqbalance repo and saw the following commit: > > > > > > dcc411e7bf remove affinity_hint infrastructure > > > > > > The commit message mentions that "PJ is redesiging how affinity hinting > > > works in the kernel, the future model will just tell us to ignore an IRQ, > > > and the kernel will handle placement for us. As such we can remove the > > > affinity_hint recognition entirely". > > > > No idea who PJ is. I really love useful commit messages. Maybe Neil can > > shed some light on that. > > > > > This does indicate that apparently, irqbalance moved away from the usage of > > > affinity_hint. However, the next question is what was this future > > > model? > > > > I might have missed something in the last 5 years, but that's the first > > time I hear about someone trying to cleanup that thing. > > > > > I don't know but I can surely look into it if that helps or maybe someone > > > here already knows about it? > > > > I CC'ed Neil :) > > Thanks, I have added PJ Waskiewicz as well who I think was referred in > that commit message as PJ. > > > > > >> Now there is another aspect to that. What happens if irqbalanced does > > >> not run at all and a driver relies on the side effect of the hint > > >> setting the initial affinity. Bah... > > >> > > > > > > Right, but if they only rely on this API so that the IRQs are spread across > > > all the CPUs then that issue is already resolved and these other drivers > > > should not regress because of changing this behavior. Isn't it? > > > > Is that true for all architectures? > > Unfortunately, I don't know and that's probably why we have to be careful. I think here to ensure that we are not breaking any of the drivers we have to first analyze all the existing drivers and understand how they are using this API. AFAIK there are three possible scenarios: - A driver use this API to spread the IRQs + For this case we should be safe considering the spreading is naturally done from the IRQ subsystem itself. - A driver use this API to actually set the hint + These drivers should have no functional impact because of this revert - Driver use this API to force a certain affinity mask + In this case we have to replace the API with the irq_force_affinity() I can start looking into the individual drivers, however, testing them will be a challenge. Any thoughts? -- Thanks Nitesh
On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote: > > On Mon, May 17, 2021 at 8:23 PM Nitesh Lal <nilal@redhat.com> wrote: > > > > On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > On Mon, May 17 2021 at 18:44, Nitesh Lal wrote: > > > > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > >> The hint was added so that userspace has a better understanding where it > > > >> should place the interrupt. So if irqbalanced ignores it anyway, then > > > >> what's the point of the hint? IOW, why is it still used drivers? > > > >> > > > > Took a quick look at the irqbalance repo and saw the following commit: > > > > > > > > dcc411e7bf remove affinity_hint infrastructure > > > > > > > > The commit message mentions that "PJ is redesiging how affinity hinting > > > > works in the kernel, the future model will just tell us to ignore an IRQ, > > > > and the kernel will handle placement for us. As such we can remove the > > > > affinity_hint recognition entirely". > > > > > > No idea who PJ is. I really love useful commit messages. Maybe Neil can > > > shed some light on that. > > > > > > > This does indicate that apparently, irqbalance moved away from the usage of > > > > affinity_hint. However, the next question is what was this future > > > > model? > > > > > > I might have missed something in the last 5 years, but that's the first > > > time I hear about someone trying to cleanup that thing. > > > > > > > I don't know but I can surely look into it if that helps or maybe someone > > > > here already knows about it? > > > > > > I CC'ed Neil :) > > > > Thanks, I have added PJ Waskiewicz as well who I think was referred in > > that commit message as PJ. > > > > > > > > >> Now there is another aspect to that. What happens if irqbalanced does > > > >> not run at all and a driver relies on the side effect of the hint > > > >> setting the initial affinity. Bah... > > > >> > > > > > > > > Right, but if they only rely on this API so that the IRQs are spread across > > > > all the CPUs then that issue is already resolved and these other drivers > > > > should not regress because of changing this behavior. Isn't it? > > > > > > Is that true for all architectures? > > > > Unfortunately, I don't know and that's probably why we have to be careful. > > I think here to ensure that we are not breaking any of the drivers we have > to first analyze all the existing drivers and understand how they are using > this API. > AFAIK there are three possible scenarios: > > - A driver use this API to spread the IRQs > + For this case we should be safe considering the spreading is naturally > done from the IRQ subsystem itself. Forgot to mention another thing in the above case is to determine whether it is true for all architectures or not as Thomas mentioned. > > - A driver use this API to actually set the hint > + These drivers should have no functional impact because of this revert > > - Driver use this API to force a certain affinity mask > + In this case we have to replace the API with the irq_force_affinity() > > I can start looking into the individual drivers, however, testing them will > be a challenge. > > Any thoughts? > > -- > Thanks > Nitesh
Nitesh, On Thu, May 20 2021 at 20:03, Nitesh Lal wrote: > On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote: >> I think here to ensure that we are not breaking any of the drivers we have >> to first analyze all the existing drivers and understand how they are using >> this API. >> AFAIK there are three possible scenarios: >> >> - A driver use this API to spread the IRQs >> + For this case we should be safe considering the spreading is naturally >> done from the IRQ subsystem itself. > > Forgot to mention another thing in the above case is to determine whether > it is true for all architectures or not as Thomas mentioned. Yes. >> >> - A driver use this API to actually set the hint >> + These drivers should have no functional impact because of this revert Correct. >> - Driver use this API to force a certain affinity mask >> + In this case we have to replace the API with the irq_force_affinity() irq_set_affinity() or irq_set_affinity_and_hint() >> I can start looking into the individual drivers, however, testing them will >> be a challenge. The only way to do that is to have the core infrastructure added and then send patches changing it in the way you think. The relevant maintainers/developers should be able to tell you when your analysis went south. :) Been there, done that. It's just lots of work :) Thanks, tglx
On Fri, May 21, 2021 at 7:56 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Nitesh, > > On Thu, May 20 2021 at 20:03, Nitesh Lal wrote: > > On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote: > >> I think here to ensure that we are not breaking any of the drivers we have > >> to first analyze all the existing drivers and understand how they are using > >> this API. > >> AFAIK there are three possible scenarios: > >> > >> - A driver use this API to spread the IRQs > >> + For this case we should be safe considering the spreading is naturally > >> done from the IRQ subsystem itself. > > > > Forgot to mention another thing in the above case is to determine whether > > it is true for all architectures or not as Thomas mentioned. > > Yes. > > >> > >> - A driver use this API to actually set the hint > >> + These drivers should have no functional impact because of this revert > > Correct. > > > >> - Driver use this API to force a certain affinity mask > >> + In this case we have to replace the API with the irq_force_affinity() > > irq_set_affinity() or irq_set_affinity_and_hint() Ah yes! my bad. _force_ doesn't check the mask against the online CPUs. Hmm, I didn't realize that you also added irq_set_affinity_and_hint() in your last patchset. > > >> I can start looking into the individual drivers, however, testing them will > >> be a challenge. > > The only way to do that is to have the core infrastructure added and Right. > then send patches changing it in the way you think. The relevant > maintainers/developers should be able to tell you when your analysis > went south. :) Ack will start looking into this. -- Thanks Nitesh
On Fri, May 21 2021 at 09:46, Nitesh Lal wrote: > On Fri, May 21, 2021 at 7:56 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> >> - Driver use this API to force a certain affinity mask >> >> + In this case we have to replace the API with the irq_force_affinity() >> >> irq_set_affinity() or irq_set_affinity_and_hint() > > Ah yes! my bad. _force_ doesn't check the mask against the online CPUs. > Hmm, I didn't realize that you also added irq_set_affinity_and_hint() > in your last patchset. I did not. It just exposed irq_set_affinity(). See https://lore.kernel.org/r/87wnrs9tvp.ffs@nanos.tec.linutronix.de for the new hint interface I came up with. Thanks, tglx
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index e976c4927b25..a31df64662d5 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -456,6 +456,16 @@ int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force) return ret; } +/** + * irq_set_affinity_hint - set the hint for an irq + * @irq: Interrupt for which to set the hint + * @m: Mask to indicate which CPUs to suggest for the interrupt, use + * NULL here to indicate to clear the value. + * + * Use this function to recommend which CPU should handle the + * interrupt to any userspace that uses /proc/irq/nn/smp_affinity_hint + * in order to align interrupts. Pass NULL as the mask to clear the hint. + */ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m) { unsigned long flags; @@ -465,9 +475,6 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m) return -EINVAL; desc->affinity_hint = m; irq_put_desc_unlock(desc, flags); - /* set the initial affinity to prevent every interrupt being on CPU0 */ - if (m) - __irq_set_affinity(irq, m, false); return 0; } EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
It was pointed out by Nitesh that the original work I did in 2014 to automatically set the interrupt affinity when requesting a mask is no longer necessary. The kernel has moved on and no longer has the original problem, BUT the original patch introduced a subtle bug when booting a system with reserved or excluded CPUs. Drivers calling this function with a mask value that included a CPU that was currently or in the future unavailable would generally not update the hint. I'm sure there are a million ways to solve this, but the simplest one is to just remove a little code that tries to force the affinity, as Nitesh has shown it fixes the bug and doesn't seem to introduce immediate side effects. While I'm here, introduce a kernel-doc for the hint function. Ref: https://lore.kernel.org/lkml/CAFki+L=_dd+JgAR12_eBPX0kZO2_6=1dGdgkwHE=u=K6chMeLQ@mail.gmail.com/ Cc: netdev@vger.kernel.org Fixes: 4fe7ffb7e17c ("genirq: Fix null pointer reference in irq_set_affinity_hint()") Fixes: e2e64a932556 ("genirq: Set initial affinity in irq_set_affinity_hint()") Reported-by: Nitesh Lal <nilal@redhat.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> --- !!! NOTE: Compile tested only, would appreciate feedback --- kernel/irq/manage.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) base-commit: 765822e1569a37aab5e69736c52d4ad4a289eba6