Message ID | 1430709339-29083-8-git-send-email-jiang.liu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: > The field 'affinity' in irq_desc won't change once the irq_desc data > structure is created. So cache irq_desc->affinity instead of irq_desc. > This also helps to hide struct irq_desc from device drivers. Hi Jiang, I might not understand the new changes irq core, but up until now affinity was changed when the user changed it through /proc/irq/<IRQ>/smp_affinity. This code is monitoring the affinity from the napi_poll context to detect affinity changes, and prevent napi from keep running on the wrong CPU. Therefore, the affinity can't be cached at the beginning. Please revert this caching. Thanks, Amir -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/5/4 20:10, Amir Vadai wrote: > On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: >> The field 'affinity' in irq_desc won't change once the irq_desc data >> structure is created. So cache irq_desc->affinity instead of irq_desc. >> This also helps to hide struct irq_desc from device drivers. > > Hi Jiang, > > I might not understand the new changes irq core, but up until now > affinity was changed when the user changed it through > /proc/irq/<IRQ>/smp_affinity. > This code is monitoring the affinity from the napi_poll context to > detect affinity changes, and prevent napi from keep running on the > wrong CPU. > Therefore, the affinity can't be cached at the beginning. Please > revert this caching. Hi Amir, Thanks for review:) We want to hide irq_desc implementation details from device drivers, so made these changes. Function irq_get_affinity_mask() returns 'struct cpumask *' and we cache the returned pointer. On the other hand, user may change IRQ affinity through /proc/irq/<IRQ>/smp_affinity, but that only changes the bitmap pointed to by the cached pointer and won't change the pointer itself. So it should always return the latest affinity setting by calling cpumask_test_cpu(cpu_curr, cq->irq_affinity). Or am I missing something here? Thanks! Gerry > > Thanks, > Amir > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 4 May 2015, Amir Vadai wrote: > On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: > > The field 'affinity' in irq_desc won't change once the irq_desc data > > structure is created. So cache irq_desc->affinity instead of irq_desc. > > This also helps to hide struct irq_desc from device drivers. > > Hi Jiang, > > I might not understand the new changes irq core, but up until now > affinity was changed when the user changed it through > /proc/irq/<IRQ>/smp_affinity. > This code is monitoring the affinity from the napi_poll context to > detect affinity changes, and prevent napi from keep running on the > wrong CPU. > Therefore, the affinity can't be cached at the beginning. Please > revert this caching. Sigh. All of this is completely wrong. Polling in the internals of irq_desc is just crap. irq_set_affinity_notifier() has been added for exactly this purpose. Please get rid of your irq_desc abuse. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 4, 2015 at 5:00 PM, Jiang Liu <jiang.liu@linux.intel.com> wrote: > On 2015/5/4 20:10, Amir Vadai wrote: >> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: >>> The field 'affinity' in irq_desc won't change once the irq_desc data >>> structure is created. So cache irq_desc->affinity instead of irq_desc. >>> This also helps to hide struct irq_desc from device drivers. >> >> Hi Jiang, >> >> I might not understand the new changes irq core, but up until now >> affinity was changed when the user changed it through >> /proc/irq/<IRQ>/smp_affinity. >> This code is monitoring the affinity from the napi_poll context to >> detect affinity changes, and prevent napi from keep running on the >> wrong CPU. >> Therefore, the affinity can't be cached at the beginning. Please >> revert this caching. > Hi Amir, > Thanks for review:) We want to hide irq_desc implementation > details from device drivers, so made these changes. > Function irq_get_affinity_mask() returns 'struct cpumask *' > and we cache the returned pointer. On the other hand, user may change > IRQ affinity through /proc/irq/<IRQ>/smp_affinity, but that only > changes the bitmap pointed to by the cached pointer and won't change > the pointer itself. So it should always return the latest affinity > setting by calling cpumask_test_cpu(cpu_curr, cq->irq_affinity). > Or am I missing something here? No - you are completely right - I missed it. Anyway I tested it to make sure, and everything works as expected. Acked-By: Amir Vadai <amirv@mellanox.com> Thanks, Amir > Thanks! > Gerry >> >> Thanks, >> Amir >> > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > On Mon, 4 May 2015, Amir Vadai wrote: > >> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: >> > The field 'affinity' in irq_desc won't change once the irq_desc data >> > structure is created. So cache irq_desc->affinity instead of irq_desc. >> > This also helps to hide struct irq_desc from device drivers. >> >> Hi Jiang, >> >> I might not understand the new changes irq core, but up until now >> affinity was changed when the user changed it through >> /proc/irq/<IRQ>/smp_affinity. >> This code is monitoring the affinity from the napi_poll context to >> detect affinity changes, and prevent napi from keep running on the >> wrong CPU. >> Therefore, the affinity can't be cached at the beginning. Please >> revert this caching. > > Sigh. All of this is completely wrong. Polling in the internals of > irq_desc is just crap. Hi, > > irq_set_affinity_notifier() has been added for exactly this purpose. rmap is already registered on irq_set_affinity_notifier(). Actually, I did post an extension to the affinity notifier, to use affinity chain which you rejected [1] (and BTW, convinced me that this is not the way to do it). Current code is according to your suggestion. [1] - https://lkml.org/lkml/2014/5/26/222 Amir > > Please get rid of your irq_desc abuse. > > Thanks, > > tglx > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 5 May 2015, Amir Vadai wrote: > On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Mon, 4 May 2015, Amir Vadai wrote: > > > >> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: > >> > The field 'affinity' in irq_desc won't change once the irq_desc data > >> > structure is created. So cache irq_desc->affinity instead of irq_desc. > >> > This also helps to hide struct irq_desc from device drivers. > >> > >> Hi Jiang, > >> > >> I might not understand the new changes irq core, but up until now > >> affinity was changed when the user changed it through > >> /proc/irq/<IRQ>/smp_affinity. > >> This code is monitoring the affinity from the napi_poll context to > >> detect affinity changes, and prevent napi from keep running on the > >> wrong CPU. > >> Therefore, the affinity can't be cached at the beginning. Please > >> revert this caching. > > > > Sigh. All of this is completely wrong. Polling in the internals of > > irq_desc is just crap. > Hi, > > > > > irq_set_affinity_notifier() has been added for exactly this purpose. > rmap is already registered on irq_set_affinity_notifier(). > Actually, I did post an extension to the affinity notifier, to use > affinity chain which you rejected [1] (and BTW, convinced me that this > is not the way to do it). > Current code is according to your suggestion. > > [1] - https://lkml.org/lkml/2014/5/26/222 No, it's not. I wrote: "So what's the point of adding notifier call chain complexity, ordering problems etc., if you can simply note the fact that the affinity changed in the rmap itself and check that in the RX path?" I did not tell you to fiddle in irqdesc, really. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/5/2015 5:53 PM, Thomas Gleixner wrote: > On Tue, 5 May 2015, Amir Vadai wrote: > >> On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote: >>> On Mon, 4 May 2015, Amir Vadai wrote: >>> >>>> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote: >>>>> The field 'affinity' in irq_desc won't change once the irq_desc data >>>>> structure is created. So cache irq_desc->affinity instead of irq_desc. >>>>> This also helps to hide struct irq_desc from device drivers. >>>> >>>> Hi Jiang, >>>> >>>> I might not understand the new changes irq core, but up until now >>>> affinity was changed when the user changed it through >>>> /proc/irq/<IRQ>/smp_affinity. >>>> This code is monitoring the affinity from the napi_poll context to >>>> detect affinity changes, and prevent napi from keep running on the >>>> wrong CPU. >>>> Therefore, the affinity can't be cached at the beginning. Please >>>> revert this caching. >>> >>> Sigh. All of this is completely wrong. Polling in the internals of >>> irq_desc is just crap. >> Hi, >> >>> >>> irq_set_affinity_notifier() has been added for exactly this purpose. >> rmap is already registered on irq_set_affinity_notifier(). >> Actually, I did post an extension to the affinity notifier, to use >> affinity chain which you rejected [1] (and BTW, convinced me that this >> is not the way to do it). >> Current code is according to your suggestion. >> >> [1] - https://lkml.org/lkml/2014/5/26/222 > > No, it's not. I wrote: > > "So what's the point of adding notifier call chain complexity, > ordering problems etc., if you can simply note the fact that the > affinity changed in the rmap itself and check that in the RX path?" > > I did not tell you to fiddle in irqdesc, really. > > Thanks, > > tglx > It is problematic, since I didn't want to link the napi polling affinity issue and CONFIG_CPU_RMAP being selected. That's why I CC'd you when submitted the patch [1]. Guess it was my bad that I didn't do it in a better way that would give you a chance to fully review it. Anyway, I understand now that I shouldn't access irqdesc attributes from a device driver. I will see if I can live with disabling this whole feature (migrate a currently looping NAPI when affinity is changed), when CONFIG_CPU_RMAP is not selected. If not, I might modify the rmap code to give me this information even when it is not selected. I assume that rmap notifier callback can access this information. Thanks, Amir -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c index 22da4d0d0f05..a03a01625398 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c @@ -31,6 +31,7 @@ * */ +#include <linux/irq.h> #include <linux/mlx4/cq.h> #include <linux/mlx4/qp.h> #include <linux/mlx4/cmd.h> @@ -135,9 +136,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq, mdev->dev->caps.num_comp_vectors; } - cq->irq_desc = - irq_to_desc(mlx4_eq_get_irq(mdev->dev, - cq->vector)); + cq->irq_affinity = irq_get_affinity_mask( + mlx4_eq_get_irq(mdev->dev, cq->vector)); } else { /* For TX we use the same irq per ring we assigned for the RX */ diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 4fdd3c37e47b..59d8395ef31c 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -1022,14 +1022,11 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget) /* If we used up all the quota - we're probably not done yet... */ if (done == budget) { int cpu_curr; - const struct cpumask *aff; INC_PERF_COUNTER(priv->pstats.napi_quota); cpu_curr = smp_processor_id(); - aff = irq_desc_get_irq_data(cq->irq_desc)->affinity; - - if (likely(cpumask_test_cpu(cpu_curr, aff))) + if (likely(cpumask_test_cpu(cpu_curr, cq->irq_affinity))) return budget; /* Current cpu is not according to smp_irq_affinity - diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index 9de30216b146..5de5e9a51d76 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -357,7 +357,7 @@ struct mlx4_en_cq { #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD) spinlock_t poll_lock; /* protects from LLS/napi conflicts */ #endif /* CONFIG_NET_RX_BUSY_POLL */ - struct irq_desc *irq_desc; + struct cpumask *irq_affinity; }; struct mlx4_en_port_profile {
The field 'affinity' in irq_desc won't change once the irq_desc data structure is created. So cache irq_desc->affinity instead of irq_desc. This also helps to hide struct irq_desc from device drivers. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> --- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 6 +++--- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 +---- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 +- 3 files changed, 5 insertions(+), 8 deletions(-)