Message ID | 20240618150902.345881-2-shayd@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Series | Introduce auxiliary bus IRQs sysfs |
On 6/18/24 17:09, Shay Drory wrote:
> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
> and virtual functions are anchored on the PCI bus. The irq information
> of each such function is visible to users via sysfs directory "msi_irqs"
> containing files for each irq entry. However, for PCI SFs such
> information is unavailable. Due to this users have no visibility on IRQs
> used by the SFs.
> Secondly, an SF can be multi function device supporting rdma, netdevice
> and more. Without irq information at the bus level, the user is unable
> to view or use the affinity of the SF IRQs.
>
> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
> for supporting auxiliary devices, containing file for each irq entry.
>
> For example:
> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
> 50 51 52 53 54 55 56 57 58
>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Shay Drory <shayd@nvidia.com>
>
> ---
> v6-v7:
> - dynamically creating irqs directory when first irq file created (Greg)
> - removed irqs flag and simplified the dev_add() API (Greg)
> - move sysfs related new code to a new auxiliary_sysfs.c file (Greg)

[...]

> +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&auxdev->lock);
> +	if (auxdev->dir_exists)
> +		goto unlock;
> +
> +	xa_init(&auxdev->irqs);

due to below error handling you could end up with calling xa_init()
twice (and this is a "library" code, so it does not matter how you
handle this error in the current sole user ;))

> +	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
> +	if (!ret)
> +		auxdev->dir_exists = 1;
> +
> +unlock:
> +	mutex_unlock(&auxdev->lock);
> +	return ret;
> +}
> +

[...]

> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -58,6 +58,7 @@
>   * in
>   * @name: Match name found by the auxiliary device driver,
>   * @id: unique identitier if multiple devices of the same name are exported,
> + * @irqs: irqs xarray contains irq indices which are used by the device,
>   *
>   * An auxiliary_device represents a part of its parent device's functionality.
>   * It is given a name that, combined with the registering drivers
> @@ -138,7 +139,10 @@
>  struct auxiliary_device {
>  	struct device dev;
>  	const char *name;
> +	struct xarray irqs;
> +	struct mutex lock; /* Protects "irqs" directory creation */
>  	u32 id;
> +	u8 dir_exists:1;

nit: I would make it a bool, or `bool: 1` if you really want

>  };

[...]
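To illustrate the xa_init() remark: if the group creation fails, dir_exists stays 0, so a later call runs the initialisation again. The following is a minimal userspace sketch of one way to avoid that, not the patch author's fix. The names `aux_model`, `add_group_model()` and `irq_dir_prepare_model()` are hypothetical stand-ins, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields added to struct auxiliary_device. */
struct aux_model {
	pthread_mutex_t lock;
	bool irqs_ready;	/* models "xa_init() has already run" */
	bool dir_exists;	/* models the "irqs" directory being present */
	int init_calls;		/* counts initialisations, for illustration only */
};

static void aux_model_init(struct aux_model *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->irqs_ready = false;
	dev->dir_exists = false;
	dev->init_calls = 0;
}

/* Stand-in for devm_device_add_group(); fails on request. */
static int add_group_model(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static int irq_dir_prepare_model(struct aux_model *dev, bool fail_add)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (dev->dir_exists)
		goto unlock;

	/* Initialise the container at most once, even if the step below fails. */
	if (!dev->irqs_ready) {
		dev->init_calls++;
		dev->irqs_ready = true;
	}
	assert(dev->irqs_ready);

	ret = add_group_model(fail_add);
	if (!ret)
		dev->dir_exists = true;
unlock:
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

Another option, also only an assumption here, would be to initialise the xarray once when the device itself is initialised, so the prepare path never touches it.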
On Tue, Jun 18, 2024 at 05:47:15PM +0200, Przemek Kitszel wrote:
> On 6/18/24 17:09, Shay Drory wrote:
> > PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
> > and virtual functions are anchored on the PCI bus. The irq information
> > of each such function is visible to users via sysfs directory "msi_irqs"
> > containing files for each irq entry. However, for PCI SFs such
> > information is unavailable. Due to this users have no visibility on IRQs
> > used by the SFs.
> > Secondly, an SF can be multi function device supporting rdma, netdevice
> > and more. Without irq information at the bus level, the user is unable
> > to view or use the affinity of the SF IRQs.
> >
> > Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
> > for supporting auxiliary devices, containing file for each irq entry.
> >
> > For example:
> > $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
> > 50 51 52 53 54 55 56 57 58
> >
> > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > Signed-off-by: Shay Drory <shayd@nvidia.com>
> >
> > ---
> > v6-v7:
> > - dynamically creating irqs directory when first irq file created (Greg)
> > - removed irqs flag and simplified the dev_add() API (Greg)
> > - move sysfs related new code to a new auxiliary_sysfs.c file (Greg)
>
> [...]
>
> > +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
> > +{
> > +	int ret = 0;
> > +
> > +	mutex_lock(&auxdev->lock);
> > +	if (auxdev->dir_exists)
> > +		goto unlock;
> > +
> > +	xa_init(&auxdev->irqs);
>
> due to below error handling you could end up with calling xa_init()
> twice (and this is a "library" code, so it does not matter how you
> handle this error in the current sole user ;))
>
> > +	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
> > +	if (!ret)
> > +		auxdev->dir_exists = 1;
> > +
> > +unlock:
> > +	mutex_unlock(&auxdev->lock);
> > +	return ret;
> > +}
> > +
>
> [...]
>
> > --- a/include/linux/auxiliary_bus.h
> > +++ b/include/linux/auxiliary_bus.h
> > @@ -58,6 +58,7 @@
> >   * in
> >   * @name: Match name found by the auxiliary device driver,
> >   * @id: unique identitier if multiple devices of the same name are exported,
> > + * @irqs: irqs xarray contains irq indices which are used by the device,
> >   *
> >   * An auxiliary_device represents a part of its parent device's functionality.
> >   * It is given a name that, combined with the registering drivers
> > @@ -138,7 +139,10 @@
> >  struct auxiliary_device {
> >  	struct device dev;
> >  	const char *name;
> > +	struct xarray irqs;
> > +	struct mutex lock; /* Protects "irqs" directory creation */
> >  	u32 id;
> > +	u8 dir_exists:1;
>
> nit: I would make it a bool, or `bool: 1` if you really want

Why is this even needed? It should "know" if the directory is there or
not, it can always be looked up, right?

thanks,

greg k-h
On Tue, Jun 18, 2024 at 06:09:01PM +0300, Shay Drory wrote:
> diff --git a/drivers/base/auxiliary_sysfs.c b/drivers/base/auxiliary_sysfs.c
> new file mode 100644
> index 000000000000..3f112fd26e72
> --- /dev/null
> +++ b/drivers/base/auxiliary_sysfs.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + */
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/slab.h>
> +
> +struct auxiliary_irq_info {
> +	struct device_attribute sysfs_attr;
> +};
> +
> +static struct attribute *auxiliary_irq_attrs[] = {
> +	NULL
> +};
> +
> +static const struct attribute_group auxiliary_irqs_group = {
> +	.name = "irqs",
> +	.attrs = auxiliary_irq_attrs,
> +};
> +
> +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&auxdev->lock);
> +	if (auxdev->dir_exists)
> +		goto unlock;

You do know about cleanup.h, right? Please use it.

But what exactly are you trying to protect here? How will you race and
add two irqs at the same time? Driver probe is always single threaded,
so what would be calling this at the same time from multiple places?

> +
> +	xa_init(&auxdev->irqs);
> +	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
> +	if (!ret)
> +		auxdev->dir_exists = 1;
> +
> +unlock:
> +	mutex_unlock(&auxdev->lock);
> +	return ret;
> +}
> +
> +/**
> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: The associated interrupt number.
> + *
> + * This function should be called after auxiliary device have successfully
> + * received the irq.
> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct device *dev = &auxdev->dev;
> +	struct auxiliary_irq_info *info;
> +	int ret;
> +
> +	ret = auxiliary_irq_dir_prepare(auxdev);
> +	if (ret)
> +		return ret;
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return -ENOMEM;
> +
> +	sysfs_attr_init(&info->sysfs_attr.attr);
> +	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
> +	if (!info->sysfs_attr.attr.name) {
> +		ret = -ENOMEM;
> +		goto name_err;
> +	}
> +
> +	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);

So no lock happening here, either use it always, or not at all?

> +	if (ret)
> +		goto auxdev_xa_err;
> +
> +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
> +				      auxiliary_irqs_group.name);

You do know that you are never going to see these files from the
userspace library tools that watch sysfs, right? libudev will never see
them as you are adding them AFTER the device is created.

So, because of that, who is really going to use these files?

> +	if (ret)
> +		goto sysfs_add_err;
> +
> +	return 0;
> +
> +sysfs_add_err:
> +	xa_erase(&auxdev->irqs, irq);
> +auxdev_xa_err:
> +	kfree(info->sysfs_attr.attr.name);
> +name_err:
> +	kfree(info);

Again, cleanup.h is your friend.

> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
> +
> +/**
> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: the IRQ to remove.
> + *
> + * This function should be called to remove an IRQ sysfs entry.
> + */
> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
> +	struct device *dev = &auxdev->dev;
> +
> +	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
> +				     auxiliary_irqs_group.name);
> +	xa_erase(&auxdev->irqs, irq);
> +	kfree(info->sysfs_attr.attr.name);
> +	kfree(info);
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
> index de21d9d24a95..96be140bd1ff 100644
> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -58,6 +58,7 @@
>   * in
>   * @name: Match name found by the auxiliary device driver,
>   * @id: unique identitier if multiple devices of the same name are exported,
> + * @irqs: irqs xarray contains irq indices which are used by the device,
>   *
>   * An auxiliary_device represents a part of its parent device's functionality.
>   * It is given a name that, combined with the registering drivers
> @@ -138,7 +139,10 @@
>  struct auxiliary_device {
>  	struct device dev;
>  	const char *name;
> +	struct xarray irqs;
> +	struct mutex lock; /* Protects "irqs" directory creation */

Protects it from what?

>  	u32 id;
> +	u8 dir_exists:1;

I don't think this is needed, but if it really is, just use a bool.

>  };
>
>  /**
> @@ -212,8 +216,24 @@ int auxiliary_device_init(struct auxiliary_device *auxdev);
>  int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
>  #define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
>
> +#ifdef CONFIG_SYSFS
> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq);
> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
> +				       int irq);

You can use longer lines :)

thanks,

greg k-h
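For readers unfamiliar with the cleanup.h request: the kernel header provides scope-based helpers (such as `guard(mutex)` and `__free(kfree)`) built on the compiler's `cleanup` attribute, which removes the need for goto-based unwinding. The sketch below shows the same idea in userspace C with GCC or Clang. It is an analogue under stated assumptions, not kernel code: `auto_free`, `scoped_lock()`, `table` and `irq_name_add()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of __free(kfree): free the pointer when it leaves scope. */
static void free_ptr(void *p)
{
	free(*(void **)p);
}
#define auto_free __attribute__((cleanup(free_ptr)))

/* Userspace analogue of guard(mutex): unlock when the guard leaves scope. */
static void unlock_ptr(pthread_mutex_t **m)
{
	int err = pthread_mutex_unlock(*m);

	assert(err == 0);
	(void)err;
}
#define scoped_lock(m) \
	pthread_mutex_t *scope_guard __attribute__((cleanup(unlock_ptr))) = (m); \
	pthread_mutex_lock(scope_guard)

#define MAX_IRQ 64

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static char *table[MAX_IRQ];

/* Registers a name for @irq; returns 0 on success, -1 on any failure. */
static int irq_name_add(int irq)
{
	auto_free char *name = NULL;

	if (irq < 0 || irq >= MAX_IRQ)
		return -1;

	name = malloc(16);
	if (!name)
		return -1;
	snprintf(name, 16, "%d", irq);

	scoped_lock(&table_lock);
	if (table[irq])
		return -1;	/* name is freed and the lock dropped automatically */

	table[irq] = name;
	name = NULL;		/* ownership moved to the table, so nothing is freed */
	return 0;
}
```

Every early return releases what was acquired so far, which is the property the three error labels in the quoted function provide by hand.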
On 18/06/2024 19:13, Greg KH wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Jun 18, 2024 at 06:09:01PM +0300, Shay Drory wrote:
>> diff --git a/drivers/base/auxiliary_sysfs.c b/drivers/base/auxiliary_sysfs.c
>> new file mode 100644
>> index 000000000000..3f112fd26e72
>> --- /dev/null
>> +++ b/drivers/base/auxiliary_sysfs.c
>> @@ -0,0 +1,110 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
>> + */
>> +
>> +#include <linux/auxiliary_bus.h>
>> +#include <linux/slab.h>
>> +
>> +struct auxiliary_irq_info {
>> +	struct device_attribute sysfs_attr;
>> +};
>> +
>> +static struct attribute *auxiliary_irq_attrs[] = {
>> +	NULL
>> +};
>> +
>> +static const struct attribute_group auxiliary_irqs_group = {
>> +	.name = "irqs",
>> +	.attrs = auxiliary_irq_attrs,
>> +};
>> +
>> +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
>> +{
>> +	int ret = 0;
>> +
>> +	mutex_lock(&auxdev->lock);
>> +	if (auxdev->dir_exists)
>> +		goto unlock;
>
> You do know about cleanup.h, right? Please use it.
>
> But what exactly are you trying to protect here? How will you race and
> add two irqs at the same time? Driver probe is always single threaded,
> so what would be calling this at the same time from multiple places?

mlx5 driver requests IRQs on demand for PCI PF, VF, SFs.
And it occurs from multiple threads, hence we need to protect it.

>
>
>> +
>> +	xa_init(&auxdev->irqs);
>> +	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
>> +	if (!ret)
>> +		auxdev->dir_exists = 1;
>> +
>> +unlock:
>> +	mutex_unlock(&auxdev->lock);
>> +	return ret;
>> +}
>> +
>> +/**
>> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
>> + * @auxdev: auxiliary bus device to add the sysfs entry.
>> + * @irq: The associated interrupt number.
>> + *
>> + * This function should be called after auxiliary device have successfully
>> + * received the irq.
>> + *
>> + * Return: zero on success or an error code on failure.
>> + */
>> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
>> +{
>> +	struct device *dev = &auxdev->dev;
>> +	struct auxiliary_irq_info *info;
>> +	int ret;
>> +
>> +	ret = auxiliary_irq_dir_prepare(auxdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
>> +	if (!info)
>> +		return -ENOMEM;
>> +
>> +	sysfs_attr_init(&info->sysfs_attr.attr);
>> +	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
>> +	if (!info->sysfs_attr.attr.name) {
>> +		ret = -ENOMEM;
>> +		goto name_err;
>> +	}
>> +
>> +	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
>
> So no lock happening here, either use it always, or not at all?

the lock is only needed to protect the group (directory) creation, which
will be used by all the IRQs of this auxdev.
parallel calls to this API will always be with different IRQs, which
means each IRQ have a unique index.

>
>
>> +	if (ret)
>> +		goto auxdev_xa_err;
>> +
>> +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
>> +				      auxiliary_irqs_group.name);
>
> You do know that you are never going to see these files from the
> userspace library tools that watch sysfs, right? libudev will never see
> them as you are adding them AFTER the device is created.
>
> So, because of that, who is really going to use these files?

To learn about the interrupt mapping of the SF IRQs.

>
>
>> +	if (ret)
>> +		goto sysfs_add_err;
>> +
>> +	return 0;
>> +
>> +sysfs_add_err:
>> +	xa_erase(&auxdev->irqs, irq);
>> +auxdev_xa_err:
>> +	kfree(info->sysfs_attr.attr.name);
>> +name_err:
>> +	kfree(info);
>
> Again, cleanup.h is your friend.
>
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
>> +
>> +/**
>> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
>> + * @auxdev: auxiliary bus device to add the sysfs entry.
>> + * @irq: the IRQ to remove.
>> + *
>> + * This function should be called to remove an IRQ sysfs entry.
>> + */
>> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
>> +{
>> +	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
>> +	struct device *dev = &auxdev->dev;
>> +
>> +	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
>> +				     auxiliary_irqs_group.name);
>> +	xa_erase(&auxdev->irqs, irq);
>> +	kfree(info->sysfs_attr.attr.name);
>> +	kfree(info);
>> +}
>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
>> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
>> index de21d9d24a95..96be140bd1ff 100644
>> --- a/include/linux/auxiliary_bus.h
>> +++ b/include/linux/auxiliary_bus.h
>> @@ -58,6 +58,7 @@
>>   * in
>>   * @name: Match name found by the auxiliary device driver,
>>   * @id: unique identitier if multiple devices of the same name are exported,
>> + * @irqs: irqs xarray contains irq indices which are used by the device,
>>   *
>>   * An auxiliary_device represents a part of its parent device's functionality.
>>   * It is given a name that, combined with the registering drivers
>> @@ -138,7 +139,10 @@
>>  struct auxiliary_device {
>>  	struct device dev;
>>  	const char *name;
>> +	struct xarray irqs;
>> +	struct mutex lock; /* Protects "irqs" directory creation */
>
> Protects it from what?

please look the answer above

>
>
>>  	u32 id;
>> +	u8 dir_exists:1;
>
> I don't think this is needed, but if it really is, just use a bool.

If you know of an API that query whether a specific group is exists on
some device, can you please share it with me?
I came out empty when I looked for one :(

>
>
>>  };
>>
>>  /**
>> @@ -212,8 +216,24 @@ int auxiliary_device_init(struct auxiliary_device *auxdev);
>>  int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
>>  #define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
>>
>> +#ifdef CONFIG_SYSFS
>> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq);
>> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
>> +				       int irq);
>
> You can use longer lines :)
>
> thanks,
>
> greg k-h
On Wed, Jun 19, 2024 at 09:33:12AM +0300, Shay Drori wrote:
>
>
> On 18/06/2024 19:13, Greg KH wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Jun 18, 2024 at 06:09:01PM +0300, Shay Drory wrote:
> > > diff --git a/drivers/base/auxiliary_sysfs.c b/drivers/base/auxiliary_sysfs.c
> > > new file mode 100644
> > > index 000000000000..3f112fd26e72
> > > --- /dev/null
> > > +++ b/drivers/base/auxiliary_sysfs.c
> > > @@ -0,0 +1,110 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> > > + */
> > > +
> > > +#include <linux/auxiliary_bus.h>
> > > +#include <linux/slab.h>
> > > +
> > > +struct auxiliary_irq_info {
> > > +	struct device_attribute sysfs_attr;
> > > +};
> > > +
> > > +static struct attribute *auxiliary_irq_attrs[] = {
> > > +	NULL
> > > +};
> > > +
> > > +static const struct attribute_group auxiliary_irqs_group = {
> > > +	.name = "irqs",
> > > +	.attrs = auxiliary_irq_attrs,
> > > +};
> > > +
> > > +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
> > > +{
> > > +	int ret = 0;
> > > +
> > > +	mutex_lock(&auxdev->lock);
> > > +	if (auxdev->dir_exists)
> > > +		goto unlock;
> >
> > You do know about cleanup.h, right? Please use it.
> >
> > But what exactly are you trying to protect here? How will you race and
> > add two irqs at the same time? Driver probe is always single threaded,
> > so what would be calling this at the same time from multiple places?
>
>
> mlx5 driver requests IRQs on demand for PCI PF, VF, SFs.
> And it occurs from multiple threads, hence we need to protect it.

How are irqs asked for, for the same device, from multiple threads?
What threads exactly? What is causing these irqs to be asked for?

But ok, that's fine, if you want to do this, then properly protect the
allocation, don't just half-protect it like you did here :(

> > > +
> > > +	xa_init(&auxdev->irqs);
> > > +	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
> > > +	if (!ret)
> > > +		auxdev->dir_exists = 1;
> > > +
> > > +unlock:
> > > +	mutex_unlock(&auxdev->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +/**
> > > + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
> > > + * @auxdev: auxiliary bus device to add the sysfs entry.
> > > + * @irq: The associated interrupt number.
> > > + *
> > > + * This function should be called after auxiliary device have successfully
> > > + * received the irq.
> > > + *
> > > + * Return: zero on success or an error code on failure.
> > > + */
> > > +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
> > > +{
> > > +	struct device *dev = &auxdev->dev;
> > > +	struct auxiliary_irq_info *info;
> > > +	int ret;
> > > +
> > > +	ret = auxiliary_irq_dir_prepare(auxdev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> > > +	if (!info)
> > > +		return -ENOMEM;
> > > +
> > > +	sysfs_attr_init(&info->sysfs_attr.attr);
> > > +	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
> > > +	if (!info->sysfs_attr.attr.name) {
> > > +		ret = -ENOMEM;
> > > +		goto name_err;
> > > +	}
> > > +
> > > +	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
> >
> > So no lock happening here, either use it always, or not at all?
>
>
> the lock is only needed to protect the group (directory) creation, which
> will be used by all the IRQs of this auxdev.
> parallel calls to this API will always be with different IRQs, which
> means each IRQ have a unique index.

You are inserting into the sysfs group at the same time? You are
calling xa_insert() at the same time? Is that protected with some
internal lock? If so, this needs to be documented a bunch here.

Allocating irqs is NOT a fast path, just grab a lock and do it right
please, don't make us constantly have to stare at the code to ensure it
is correct.

> > > +	if (ret)
> > > +		goto auxdev_xa_err;
> > > +
> > > +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
> > > +				      auxiliary_irqs_group.name);
> >
> > You do know that you are never going to see these files from the
> > userspace library tools that watch sysfs, right? libudev will never see
> > them as you are adding them AFTER the device is created.
> >
> > So, because of that, who is really going to use these files?
>
> To learn about the interrupt mapping of the SF IRQs.

Who is going to "learn"? Again, you are creating files that our
userspace tools will miss, so what userspace tools are going to be able
to learn anything here?

This is strongly implying that all of this is just a debugging aid. So
please, put this in debugfs where that type of thing belongs.

> > > +	if (ret)
> > > +		goto sysfs_add_err;
> > > +
> > > +	return 0;
> > > +
> > > +sysfs_add_err:
> > > +	xa_erase(&auxdev->irqs, irq);
> > > +auxdev_xa_err:
> > > +	kfree(info->sysfs_attr.attr.name);
> > > +name_err:
> > > +	kfree(info);
> >
> > Again, cleanup.h is your friend.
> >
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
> > > +
> > > +/**
> > > + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
> > > + * @auxdev: auxiliary bus device to add the sysfs entry.
> > > + * @irq: the IRQ to remove.
> > > + *
> > > + * This function should be called to remove an IRQ sysfs entry.
> > > + */
> > > +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
> > > +{
> > > +	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
> > > +	struct device *dev = &auxdev->dev;
> > > +
> > > +	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
> > > +				     auxiliary_irqs_group.name);
> > > +	xa_erase(&auxdev->irqs, irq);
> > > +	kfree(info->sysfs_attr.attr.name);
> > > +	kfree(info);
> > > +}
> > > +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
> > > diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
> > > index de21d9d24a95..96be140bd1ff 100644
> > > --- a/include/linux/auxiliary_bus.h
> > > +++ b/include/linux/auxiliary_bus.h
> > > @@ -58,6 +58,7 @@
> > >   * in
> > >   * @name: Match name found by the auxiliary device driver,
> > >   * @id: unique identitier if multiple devices of the same name are exported,
> > > + * @irqs: irqs xarray contains irq indices which are used by the device,
> > >   *
> > >   * An auxiliary_device represents a part of its parent device's functionality.
> > >   * It is given a name that, combined with the registering drivers
> > > @@ -138,7 +139,10 @@
> > >  struct auxiliary_device {
> > >  	struct device dev;
> > >  	const char *name;
> > > +	struct xarray irqs;
> > > +	struct mutex lock; /* Protects "irqs" directory creation */
> >
> > Protects it from what?
>
> please look the answer above

You need to document it here. Or somewhere. Don't rely on an email
thread from 10 years ago for when you look at this in 10 years and
wonder what is going on...

> > >  	u32 id;
> > > +	u8 dir_exists:1;
> >
> > I don't think this is needed, but if it really is, just use a bool.
>
>
> If you know of an API that query whether a specific group is exists on
> some device, can you please share it with me?
> I came out empty when I looked for one :(

Normally sysfs groups are NOT created this way at all. Oh wait, they
can be now, why not use the new feature where a group is created by the
core but only exposed if an attribute is added there?

Will that work here? See commit d87c295f599c ("sysfs: Introduce a
mechanism to hide static attribute_groups") for details. That should
solve the issue of trying to figure out if the directory is present or
not logic.

thanks,

greg k-h
On 19/06/2024 9:45, Greg KH wrote: > External email: Use caution opening links or attachments > > > On Wed, Jun 19, 2024 at 09:33:12AM +0300, Shay Drori wrote: >> >> >> On 18/06/2024 19:13, Greg KH wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Tue, Jun 18, 2024 at 06:09:01PM +0300, Shay Drory wrote: >>>> diff --git a/drivers/base/auxiliary_sysfs.c b/drivers/base/auxiliary_sysfs.c >>>> new file mode 100644 >>>> index 000000000000..3f112fd26e72 >>>> --- /dev/null >>>> +++ b/drivers/base/auxiliary_sysfs.c >>>> @@ -0,0 +1,110 @@ >>>> +// SPDX-License-Identifier: GPL-2.0 >>>> +/* >>>> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES >>>> + */ >>>> + >>>> +#include <linux/auxiliary_bus.h> >>>> +#include <linux/slab.h> >>>> + >>>> +struct auxiliary_irq_info { >>>> + struct device_attribute sysfs_attr; >>>> +}; >>>> + >>>> +static struct attribute *auxiliary_irq_attrs[] = { >>>> + NULL >>>> +}; >>>> + >>>> +static const struct attribute_group auxiliary_irqs_group = { >>>> + .name = "irqs", >>>> + .attrs = auxiliary_irq_attrs, >>>> +}; >>>> + >>>> +static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev) >>>> +{ >>>> + int ret = 0; >>>> + >>>> + mutex_lock(&auxdev->lock); >>>> + if (auxdev->dir_exists) >>>> + goto unlock; >>> >>> You do know about cleanup.h, right? Please use it. >>> >>> But what exactly are you trying to protect here? How will you race and >>> add two irqs at the same time? Driver probe is always single threaded, >>> so what would be calling this at the same time from multiple places? >> >> >> mlx5 driver requests IRQs on demand for PCI PF, VF, SFs. >> And it occurs from multiple threads, hence we need to protect it. > > How are irqs asked for, for the same device, from multiple threads? > What threads exactly? What is causing these irqs to be asked for? 
> > But ok, that's fine, if you want to do this, then properly protect the > allocation, don't just half-protect it like you did here :( Thanks for the comment, will protect all the allocations > >>>> + >>>> + xa_init(&auxdev->irqs); >>>> + ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group); >>>> + if (!ret) >>>> + auxdev->dir_exists = 1; >>>> + >>>> +unlock: >>>> + mutex_unlock(&auxdev->lock); >>>> + return ret; >>>> +} >>>> + >>>> +/** >>>> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ >>>> + * @auxdev: auxiliary bus device to add the sysfs entry. >>>> + * @irq: The associated interrupt number. >>>> + * >>>> + * This function should be called after auxiliary device have successfully >>>> + * received the irq. >>>> + * >>>> + * Return: zero on success or an error code on failure. >>>> + */ >>>> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq) >>>> +{ >>>> + struct device *dev = &auxdev->dev; >>>> + struct auxiliary_irq_info *info; >>>> + int ret; >>>> + >>>> + ret = auxiliary_irq_dir_prepare(auxdev); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + info = kzalloc(sizeof(*info), GFP_KERNEL); >>>> + if (!info) >>>> + return -ENOMEM; >>>> + >>>> + sysfs_attr_init(&info->sysfs_attr.attr); >>>> + info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq); >>>> + if (!info->sysfs_attr.attr.name) { >>>> + ret = -ENOMEM; >>>> + goto name_err; >>>> + } >>>> + >>>> + ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL); >>> >>> So no lock happening here, either use it always, or not at all? >> >> >> the lock is only needed to protect the group (directory) creation, which >> will be used by all the IRQs of this auxdev. >> parallel calls to this API will always be with different IRQs, which >> means each IRQ have a unique index. > > You are inserting into the sysfs group at the same time? You are > calling xa_insert() at the same time? Is that protected with some > internal lock? 
If so, this needs to be documented a bunch here. > > Allocating irqs is NOT a fast path, just grab a lock and do it right > please, don't make us constantly have to stare at the code to ensure it > is correct. like I said above, I will protect all the allocations > >>>> + if (ret) >>>> + goto auxdev_xa_err; >>>> + >>>> + ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr, >>>> + auxiliary_irqs_group.name); >>> >>> You do know that you are never going to see these files from the >>> userspace library tools that watch sysfs, right? libudev will never see >>> them as you are adding them AFTER the device is created. >>> >>> So, because of that, who is really going to use these files? >> >> To learn about the interrupt mapping of the SF IRQs. > > Who is going to "learn"? Again, you are creating files that our > userspace tools will miss, so what userspace tools are going to be able > to learn anything here? > > This is strongly implying that all of this is just a debugging aid. So > please, put this in debugfs where that type of thing belongs. It is certainly a debugging aid as I described in the commit log. But it is one of the purpose. The motivation was clear but probably I should have better written. The irq affinity setting code [1] needs to read the irqs number of the device. Tools like irqbalance [1] are using the sysfs. And one should be able to do the same for the PCI SF too. They cannot rely on the debugfs. [1] https://github.com/Irqbalance/irqbalance/blob/ba44a683cdfaa688e89e0d887952032766fb89aa/classify.c#L631 > >>>> + if (ret) >>>> + goto sysfs_add_err; >>>> + >>>> + return 0; >>>> + >>>> +sysfs_add_err: >>>> + xa_erase(&auxdev->irqs, irq); >>>> +auxdev_xa_err: >>>> + kfree(info->sysfs_attr.attr.name); >>>> +name_err: >>>> + kfree(info); >>> >>> Again, cleanup.h is your friend. 
>>> >>>> + return ret; >>>> +} >>>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add); >>>> + >>>> +/** >>>> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ >>>> + * @auxdev: auxiliary bus device to add the sysfs entry. >>>> + * @irq: the IRQ to remove. >>>> + * >>>> + * This function should be called to remove an IRQ sysfs entry. >>>> + */ >>>> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) >>>> +{ >>>> + struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq); >>>> + struct device *dev = &auxdev->dev; >>>> + >>>> + sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr, >>>> + auxiliary_irqs_group.name); >>>> + xa_erase(&auxdev->irqs, irq); >>>> + kfree(info->sysfs_attr.attr.name); >>>> + kfree(info); >>>> +} >>>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove); >>>> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h >>>> index de21d9d24a95..96be140bd1ff 100644 >>>> --- a/include/linux/auxiliary_bus.h >>>> +++ b/include/linux/auxiliary_bus.h >>>> @@ -58,6 +58,7 @@ >>>> * in >>>> * @name: Match name found by the auxiliary device driver, >>>> * @id: unique identitier if multiple devices of the same name are exported, >>>> + * @irqs: irqs xarray contains irq indices which are used by the device, >>>> * >>>> * An auxiliary_device represents a part of its parent device's functionality. >>>> * It is given a name that, combined with the registering drivers >>>> @@ -138,7 +139,10 @@ >>>> struct auxiliary_device { >>>> struct device dev; >>>> const char *name; >>>> + struct xarray irqs; >>>> + struct mutex lock; /* Protects "irqs" directory creation */ >>> >>> Protects it from what? >> >> please look the answer above > > You need to document it here. Or somewhere. Don't rely on an email > thread from 10 years ago for when you look at this in 10 years and > wonder what is going on... 
Thanks, I will document it better in the next version.

>
>>>>  	u32 id;
>>>> +	u8 dir_exists:1;
>>>
>>> I don't think this is needed, but if it really is, just use a bool.
>>
>>
>> If you know of an API that query whether a specific group is exists on
>> some device, can you please share it with me?
>> I came out empty when I looked for one :(
>
> Normally sysfs groups are NOT created this way at all. Oh wait, they
> can be now, why not use the new feature where a group is created by the
> core but only exposed if an attribute is added there?
>
> Will that work here? See commit d87c295f599c ("sysfs: Introduce a
> mechanism to hide static attribute_groups") for details. That should
> solve the issue of trying to figure out if the directory is present or
> not logic.

Thanks for the suggestion :) I will give it a shot.

>
> thanks,
>
> greg k-h
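
To make the irqbalance argument from this reply concrete: a userspace
consumer typically walks a per-device directory whose file names are IRQ
numbers. The sketch below shows that pattern in plain C. It is only an
illustration under stated assumptions: read_irq_dir() is a hypothetical
helper, not code from irqbalance or from this patch, and the path used in
the usage note is the example path from the commit message.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Collect the IRQ numbers exposed as file names in an "irqs"-style
 * directory (hypothetical helper, for illustration only).
 *
 * Returns the number of IRQs stored in out[], or -1 if the directory
 * cannot be opened. Entries that are not plain decimal numbers are skipped.
 */
static int read_irq_dir(const char *path, int *out, int max)
{
	struct dirent *entry;
	int count = 0;
	DIR *dir;

	assert(path && out && max >= 0);

	dir = opendir(path);
	if (!dir)
		return -1;

	while (count < max && (entry = readdir(dir)) != NULL) {
		char *end;
		long irq;

		errno = 0;
		irq = strtol(entry->d_name, &end, 10);
		/* Skip ".", ".." and any name that is not purely numeric. */
		if (errno || end == entry->d_name || *end != '\0')
			continue;
		if (irq < 0 || irq > INT_MAX)
			continue;
		out[count++] = (int)irq;
	}

	closedir(dir);
	return count;
}
```

With the proposed layout, a caller would pass something like
"/sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs" and then apply its
affinity policy to each returned number. The order of the results follows
readdir() and is not guaranteed to be sorted.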
On Tue, Jun 18, 2024 at 06:09:01PM +0300, Shay Drory wrote:
> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
> and virtual functions are anchored on the PCI bus. The irq information
> of each such function is visible to users via sysfs directory "msi_irqs"
> containing files for each irq entry. However, for PCI SFs such
> information is unavailable. Due to this users have no visibility on IRQs
> used by the SFs.
> Secondly, an SF can be multi function device supporting rdma, netdevice
> and more. Without irq information at the bus level, the user is unable
> to view or use the affinity of the SF IRQs.
>
> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
> for supporting auxiliary devices, containing file for each irq entry.
>
> For example:
> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
> 50 51 52 53 54 55 56 57 58
>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Shay Drory <shayd@nvidia.com>

...

> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -58,6 +58,7 @@
>   *       in
>   * @name: Match name found by the auxiliary device driver,
>   * @id: unique identitier if multiple devices of the same name are exported,
> + * @irqs: irqs xarray contains irq indices which are used by the device,

Hi Shay,

A minor nit from my side: please also add entries for @lock and @dir_exists.

Flagged by kernel-doc -none

>   *
>   * An auxiliary_device represents a part of its parent device's functionality.
>   * It is given a name that, combined with the registering drivers
> @@ -138,7 +139,10 @@
>  struct auxiliary_device {
>  	struct device dev;
>  	const char *name;
> +	struct xarray irqs;
> +	struct mutex lock; /* Protects "irqs" directory creation */
>  	u32 id;
> +	u8 dir_exists:1;
>  };
>
>  /**

...
Hi Greg

On 20/06/2024 8:47, Shay Drori wrote:
>>>>>  	u32 id;
>>>>> +	u8 dir_exists:1;
>>>>
>>>> I don't think this is needed, but if it really is, just use a bool.
>>>
>>>
>>> If you know of an API that query whether a specific group is exists on
>>> some device, can you please share it with me?
>>> I came out empty when I looked for one
diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary
new file mode 100644
index 000000000000..e8752c2354bc
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-auxiliary
@@ -0,0 +1,7 @@
+What:		/sys/bus/auxiliary/devices/.../irqs/
+Date:		April, 2024
+Contact:	Shay Drory <shayd@nvidia.com>
+Description:
+		The /sys/devices/.../irqs directory contains a variable set of
+		files, with each file is named as irq number similar to PCI PF
+		or VF's irq number located in msi_irqs directory.
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 3079bfe53d04..7fb21768ca36 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_NUMA)	+= node.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory.o
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_MODULES)	+= module.o
+obj-$(CONFIG_AUXILIARY_BUS) += auxiliary_sysfs.o
 endif
 obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
 obj-$(CONFIG_REGMAP)	+= regmap/
diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
index d3a2c40c2f12..55bde375150f 100644
--- a/drivers/base/auxiliary.c
+++ b/drivers/base/auxiliary.c
@@ -287,6 +287,7 @@ int auxiliary_device_init(struct auxiliary_device *auxdev)
 
 	dev->bus = &auxiliary_bus_type;
 	device_initialize(&auxdev->dev);
+	mutex_init(&auxdev->lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(auxiliary_device_init);
diff --git a/drivers/base/auxiliary_sysfs.c b/drivers/base/auxiliary_sysfs.c
new file mode 100644
index 000000000000..3f112fd26e72
--- /dev/null
+++ b/drivers/base/auxiliary_sysfs.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include <linux/auxiliary_bus.h>
+#include <linux/slab.h>
+
+struct auxiliary_irq_info {
+	struct device_attribute sysfs_attr;
+};
+
+static struct attribute *auxiliary_irq_attrs[] = {
+	NULL
+};
+
+static const struct attribute_group auxiliary_irqs_group = {
+	.name = "irqs",
+	.attrs = auxiliary_irq_attrs,
+};
+
+static int auxiliary_irq_dir_prepare(struct auxiliary_device *auxdev)
+{
+	int ret = 0;
+
+	mutex_lock(&auxdev->lock);
+	if (auxdev->dir_exists)
+		goto unlock;
+
+	xa_init(&auxdev->irqs);
+	ret = devm_device_add_group(&auxdev->dev, &auxiliary_irqs_group);
+	if (!ret)
+		auxdev->dir_exists = 1;
+
+unlock:
+	mutex_unlock(&auxdev->lock);
+	return ret;
+}
+
+/**
+ * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: The associated interrupt number.
+ *
+ * This function should be called after auxiliary device have successfully
+ * received the irq.
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
+{
+	struct device *dev = &auxdev->dev;
+	struct auxiliary_irq_info *info;
+	int ret;
+
+	ret = auxiliary_irq_dir_prepare(auxdev);
+	if (ret)
+		return ret;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	sysfs_attr_init(&info->sysfs_attr.attr);
+	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
+	if (!info->sysfs_attr.attr.name) {
+		ret = -ENOMEM;
+		goto name_err;
+	}
+
+	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
+	if (ret)
+		goto auxdev_xa_err;
+
+	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
+				      auxiliary_irqs_group.name);
+	if (ret)
+		goto sysfs_add_err;
+
+	return 0;
+
+sysfs_add_err:
+	xa_erase(&auxdev->irqs, irq);
+auxdev_xa_err:
+	kfree(info->sysfs_attr.attr.name);
+name_err:
+	kfree(info);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
+
+/**
+ * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: the IRQ to remove.
+ *
+ * This function should be called to remove an IRQ sysfs entry.
+ */
+void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
+{
+	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
+	struct device *dev = &auxdev->dev;
+
+	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
+				     auxiliary_irqs_group.name);
+	xa_erase(&auxdev->irqs, irq);
+	kfree(info->sysfs_attr.attr.name);
+	kfree(info);
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
index de21d9d24a95..96be140bd1ff 100644
--- a/include/linux/auxiliary_bus.h
+++ b/include/linux/auxiliary_bus.h
@@ -58,6 +58,7 @@
  *       in
  * @name: Match name found by the auxiliary device driver,
  * @id: unique identitier if multiple devices of the same name are exported,
+ * @irqs: irqs xarray contains irq indices which are used by the device,
  *
  * An auxiliary_device represents a part of its parent device's functionality.
  * It is given a name that, combined with the registering drivers
@@ -138,7 +139,10 @@
 struct auxiliary_device {
 	struct device dev;
 	const char *name;
+	struct xarray irqs;
+	struct mutex lock; /* Protects "irqs" directory creation */
 	u32 id;
+	u8 dir_exists:1;
 };
 
 /**
@@ -212,8 +216,24 @@ int auxiliary_device_init(struct auxiliary_device *auxdev);
 int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
 #define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
 
+#ifdef CONFIG_SYSFS
+int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq);
+void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
+				       int irq);
+#else /* CONFIG_SYSFS */
+static inline int
+auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
+{
+	return 0;
+}
+
+static inline void
+auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) {}
+#endif
+
 static inline void auxiliary_device_uninit(struct auxiliary_device *auxdev)
 {
+	mutex_destroy(&auxdev->lock);
 	put_device(&auxdev->dev);
 }
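
The review comments on this patch suggest cleanup.h in place of the goto
ladder in auxiliary_device_sysfs_irq_add(). The kernel's __free() helper is
built on the compiler's cleanup attribute, so the idea can be tried in
userspace. What follows is a rough sketch only, assuming GCC or Clang:
struct irq_info, irq_info_create() and the auto_free_* macros are
hypothetical stand-ins, malloc()/free() replace kzalloc()/kfree(), and the
fail_late flag merely simulates a late failure such as xa_insert() or
sysfs_add_file_to_group() returning an error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for struct auxiliary_irq_info (hypothetical layout). */
struct irq_info {
	char *name;	/* decimal IRQ number, e.g. "50" */
};

/* Cleanup handlers receive a pointer to the annotated variable. */
static void free_charp(char **p)
{
	free(*p);
}

static void free_infop(struct irq_info **p)
{
	free(*p);
}

/* Rough userspace analogues of the kernel's __free(kfree) annotation. */
#define auto_free_str	__attribute__((cleanup(free_charp)))
#define auto_free_info	__attribute__((cleanup(free_infop)))

/*
 * Build an irq_info for @irq. A non-zero @fail_late simulates an error
 * after both allocations succeeded. Returns NULL on any failure, with
 * everything allocated so far released automatically at scope exit.
 */
static struct irq_info *irq_info_create(int irq, int fail_late)
{
	struct irq_info *info auto_free_info = calloc(1, sizeof(*info));
	char *name auto_free_str = NULL;
	struct irq_info *ret;
	int len;

	if (!info)
		return NULL;

	len = snprintf(NULL, 0, "%d", irq);
	assert(len > 0);
	name = malloc((size_t)len + 1);
	if (!name)
		return NULL;	/* info is freed by its cleanup handler */
	snprintf(name, (size_t)len + 1, "%d", irq);

	if (fail_late)
		return NULL;	/* both name and info are freed */

	/* Success: transfer ownership and disarm both handlers. */
	info->name = name;
	name = NULL;
	ret = info;
	info = NULL;
	return ret;
}

/* Counterpart to irq_info_create(); accepts NULL. */
static void irq_info_destroy(struct irq_info *info)
{
	if (!info)
		return;
	free(info->name);
	free(info);
}
```

Every early return releases what was allocated so far without any labels,
which is the property the review asks for. In the kernel, the ownership
hand-off at the end would normally be written with no_free_ptr() or
return_ptr() rather than by clearing the variables by hand.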