Message ID | 20240508040453.602230-2-shayd@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | Introduce auxiliary bus IRQs sysfs |
On 5/8/24 06:04, Shay Drory wrote: > PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical > and virtual functions are anchored on the PCI bus; the irq information > of each such function is visible to users via sysfs directory "msi_irqs" > containing file for each irq entry. However, for PCI SFs such information > is unavailable. Due to this users have no visibility on IRQs used by the > SFs. > Secondly, an SF is a multi function device supporting rdma, netdevice > and more. Without irq information at the bus level, the user is unable > to view or use the affinity of the SF IRQs. > > Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory, > for supporting auxiliary devices, containing file for each irq entry. > > Additionally, the PCI SFs sometimes share the IRQs with peer SFs. This > information is also not available to the users. To overcome this > limitation, each irq sysfs entry shows if irq is exclusive or shared. > > For example: > $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/ > 50 51 52 53 54 55 56 57 58 > $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/52 > exclusive > > Reviewed-by: Parav Pandit <parav@nvidia.com> > Signed-off-by: Shay Drory <shayd@nvidia.com> > > --- > v2->v3: > - fix function declaration in case SYSFS isn't defined (Parav) > - convert auxdev->groups array with auxiliary_irqs_groups (Przemek) > v1->v2: > - move #ifdefs from drivers/base/auxiliary.c to > include/linux/auxiliary_bus.h (Greg) > - use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg) > - Fix kzalloc(ref) to kzalloc(*ref) (Simon) > - Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon) > - Fix auxiliary_irq_mode_show doc (kernel test boot) > --- > Documentation/ABI/testing/sysfs-bus-auxiliary | 14 ++ > drivers/base/auxiliary.c | 171 +++++++++++++++++- > include/linux/auxiliary_bus.h | 24 ++- > 3 files changed, 206 insertions(+), 3 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-auxiliary > > 
diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary > new file mode 100644 > index 000000000000..3b8299d49d9e > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary > @@ -0,0 +1,14 @@ > +What: /sys/bus/auxiliary/devices/.../irqs/ > +Date: April, 2024 > +Contact: Shay Drory <shayd@nvidia.com> > +Description: > + The /sys/devices/.../irqs directory contains a variable set of > + files, with each file is named as irq number similar to PCI PF > + or VF's irq number located in msi_irqs directory. > + > +What: /sys/bus/auxiliary/devices/.../irqs/<N> > +Date: April, 2024 > +Contact: Shay Drory <shayd@nvidia.com> > +Description: > + auxiliary devices can share IRQs. This attribute indicates if > + the irq is shared with other SFs or exclusively used by the SF. > diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c > index d3a2c40c2f12..6293c6707e1e 100644 > --- a/drivers/base/auxiliary.c > +++ b/drivers/base/auxiliary.c > @@ -158,6 +158,169 @@ > * }; > */ > > +#ifdef CONFIG_SYSFS > +/* Xarray of irqs to determine if irq is exclusive or shared. */ > +static DEFINE_XARRAY(irqs); > +/* Protects insertions into the irtqs xarray. */ > +static DEFINE_MUTEX(irqs_lock); sorry for not catching it earlier, you don't need a separate lock, xarray provides one, please see below [1], [2] > + > +struct auxiliary_irq_info { > + struct device_attribute sysfs_attr; > + int irq; > +}; > + > +static struct attribute *auxiliary_irq_attrs[] = { > + NULL > +}; > + > +static const struct attribute_group auxiliary_irqs_group = { > + .name = "irqs", > + .attrs = auxiliary_irq_attrs, > +}; > + > +static const struct attribute_group *auxiliary_irqs_groups[2] = { > + &auxiliary_irqs_group, > + NULL > +}; > + > +/* Auxiliary devices can share IRQs. Expose to user whether the provided IRQ is > + * shared or exclusive. 
> + */ > +static ssize_t auxiliary_irq_mode_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct auxiliary_irq_info *info = > + container_of(attr, struct auxiliary_irq_info, sysfs_attr); > + > + if (refcount_read(xa_load(&irqs, info->irq)) > 1) I didn't checked if it is possible with current implementation, but please imagine a scenario where user open()'s sysfs file, then triggers operation to remove irq (to call auxiliary_irq_destroy()), and only then read()'s sysfs contents, what results in nullptr dereference (xa_load() returning NULL). Splitting the code into two if statements would resolve this issue. > + return sysfs_emit(buf, "%s\n", "shared"); > + else > + return sysfs_emit(buf, "%s\n", "exclusive"); > +} > + > +static void auxiliary_irq_destroy(int irq) > +{ > + refcount_t *ref; > + > + xa_lock(&irqs); > + ref = xa_load(&irqs, irq); > + if (refcount_dec_and_test(ref)) { > + __xa_erase(&irqs, irq); > + kfree(ref); > + } > + xa_unlock(&irqs); > +} > + > +static int auxiliary_irq_create(int irq) > +{ > + refcount_t *ref; > + int ret = 0; > + > + mutex_lock(&irqs_lock); [1] xa_lock() instead ... > + ref = xa_load(&irqs, irq); > + if (ref && refcount_inc_not_zero(ref)) > + goto out; `&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on saturation, instead the logic should be: if (ref) { refcount_inc(ref); goto out; } anyway allocating under a lock taken is not the best idea in general, although xarray API somehow encourages this - alternative is to preallocate and free when not used, or some lock dance that will be easy to get wrong - and that's the raison d'etre of xa_reserve() :) > + > + ref = kzalloc(sizeof(*ref), GFP_KERNEL); > + if (!ref) { > + ret = -ENOMEM; > + goto out; > + } > + > + refcount_set(ref, 1); > + ret = xa_insert(&irqs, irq, ref, GFP_KERNEL); [2] ... 
then __xa_insert() here > + if (ret) > + kfree(ref); > + > +out: > + mutex_unlock(&irqs_lock); > + return ret; > +} > + > +/** > + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ > + * @auxdev: auxiliary bus device to add the sysfs entry. > + * @irq: The associated Linux interrupt number. > + * > + * This function should be called after auxiliary device have successfully > + * received the irq. > + * > + * Return: zero on success or an error code on failure. > + */ > +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq) > +{ > + struct device *dev = &auxdev->dev; > + struct auxiliary_irq_info *info; > + int ret; > + > + ret = auxiliary_irq_create(irq); > + if (ret) > + return ret; > + > + info = kzalloc(sizeof(*info), GFP_KERNEL); > + if (!info) { > + ret = -ENOMEM; > + goto info_err; > + } > + > + sysfs_attr_init(&info->sysfs_attr.attr); > + info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq); > + if (!info->sysfs_attr.attr.name) { > + ret = -ENOMEM; > + goto name_err; > + } > + info->irq = irq; > + info->sysfs_attr.attr.mode = 0444; > + info->sysfs_attr.show = auxiliary_irq_mode_show; > + > + ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL); > + if (ret) > + goto auxdev_xa_err; > + > + ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr, > + auxiliary_irqs_group.name); > + if (ret) > + goto sysfs_add_err; > + > + return 0; > + > +sysfs_add_err: > + xa_erase(&auxdev->irqs, irq); > +auxdev_xa_err: > + kfree(info->sysfs_attr.attr.name); > +name_err: > + kfree(info); > +info_err: > + auxiliary_irq_destroy(irq); > + return ret; > +} > +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add); > + > +/** > + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ > + * @auxdev: auxiliary bus device to add the sysfs entry. > + * @irq: the IRQ to remove. > + * > + * This function should be called to remove an IRQ sysfs entry. 
> + */ > +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) > +{ > + struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq); > + struct device *dev = &auxdev->dev; > + > + if (WARN_ON(!info)) > + return; > + > + sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr, > + auxiliary_irqs_group.name); > + xa_erase(&auxdev->irqs, irq); > + kfree(info->sysfs_attr.attr.name); > + kfree(info); > + auxiliary_irq_destroy(irq); > +} > +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove); > +#endif > + > static const struct auxiliary_device_id *auxiliary_match_id(const struct auxiliary_device_id *id, > const struct auxiliary_device *auxdev) > { > @@ -295,6 +458,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init); > * __auxiliary_device_add - add an auxiliary bus device > * @auxdev: auxiliary bus device to add to the bus > * @modname: name of the parent device's driver module > + * @irqs_sysfs_enable: whether to enable IRQs sysfs > * > * This is the third step in the three-step process to register an > * auxiliary_device. > @@ -310,7 +474,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init); > * parameter. Only if a user requires a custom name would this version be > * called directly. 
> */ > -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname) > +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname, > + bool irqs_sysfs_enable) > { > struct device *dev = &auxdev->dev; > int ret; > @@ -325,6 +490,10 @@ int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname) > dev_err(dev, "auxiliary device dev_set_name failed: %d\n", ret); > return ret; > } > + if (irqs_sysfs_enable) { > + dev->groups = auxiliary_irqs_groups; > + xa_init(&auxdev->irqs); > + } > > ret = device_add(dev); > if (ret) > diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h > index de21d9d24a95..760fadb26620 100644 > --- a/include/linux/auxiliary_bus.h > +++ b/include/linux/auxiliary_bus.h > @@ -58,6 +58,7 @@ > * in > * @name: Match name found by the auxiliary device driver, > * @id: unique identitier if multiple devices of the same name are exported, > + * @irqs: irqs xarray contains irq indices which are used by the device, > * > * An auxiliary_device represents a part of its parent device's functionality. 
> * It is given a name that, combined with the registering drivers > @@ -138,6 +139,7 @@ > struct auxiliary_device { > struct device dev; > const char *name; > + struct xarray irqs; > u32 id; > }; > > @@ -209,8 +211,26 @@ static inline struct auxiliary_driver *to_auxiliary_drv(struct device_driver *dr > } > > int auxiliary_device_init(struct auxiliary_device *auxdev); > -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname); > -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME) > +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname, > + bool irqs_sysfs_enable); > +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME, false) > +#define auxiliary_device_add_with_irqs(auxdev) \ > + __auxiliary_device_add(auxdev, KBUILD_MODNAME, true) > + > +#ifdef CONFIG_SYSFS > +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq); > +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, > + int irq); > +#else /* CONFIG_SYSFS */ > +static inline int > +auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq) > +{ > + return 0; > +} > + > +static inline void > +auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) {} > +#endif > > static inline void auxiliary_device_uninit(struct auxiliary_device *auxdev) > {
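To make the two review points above concrete (check the lookup result before dereferencing it, and always count the new user when an entry already exists), here is a minimal userspace sketch. It is not kernel code and not the patch author's implementation: a fixed array stands in for the global `irqs` xarray, a plain counter stands in for `refcount_t`, locking is left out entirely, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_MAX_IRQ 64

/* Hypothetical stand-in for the global "irqs" xarray: irq -> user count. */
static unsigned int *model_irqs[MODEL_MAX_IRQ];

/* Models auxiliary_irq_create(): take a reference, allocating on first use. */
int model_irq_create(int irq)
{
	unsigned int *ref;

	if (irq < 0 || irq >= MODEL_MAX_IRQ)
		return -EINVAL;

	ref = model_irqs[irq];
	if (ref) {
		/* Entry exists: always count the new user. */
		(*ref)++;
		return 0;
	}

	ref = calloc(1, sizeof(*ref));
	if (!ref)
		return -ENOMEM;

	*ref = 1;
	model_irqs[irq] = ref;
	return 0;
}

/* Models auxiliary_irq_destroy(): drop a reference, free on the last put. */
void model_irq_destroy(int irq)
{
	unsigned int *ref;

	if (irq < 0 || irq >= MODEL_MAX_IRQ)
		return;

	ref = model_irqs[irq];
	if (!ref)
		return;

	assert(*ref > 0);
	if (--(*ref) == 0) {
		model_irqs[irq] = NULL;
		free(ref);
	}
}

/* Models auxiliary_irq_mode_show(), with lookup and comparison split in two. */
const char *model_irq_mode(int irq)
{
	unsigned int *ref;

	if (irq < 0 || irq >= MODEL_MAX_IRQ)
		return NULL;

	ref = model_irqs[irq];
	if (!ref)
		return NULL;	/* entry already gone: caller reports an error */

	if (*ref > 1)
		return "shared";
	return "exclusive";
}
```

Creating the same IRQ twice moves the reported mode from "exclusive" to "shared", and a lookup after the last destroy returns NULL rather than dereferencing a missing entry.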
On 08/05/2024 12:34, Przemek Kitszel wrote: > External email: Use caution opening links or attachments > > > On 5/8/24 06:04, Shay Drory wrote: >> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical >> and virtual functions are anchored on the PCI bus; the irq information >> of each such function is visible to users via sysfs directory "msi_irqs" >> containing file for each irq entry. However, for PCI SFs such information >> is unavailable. Due to this users have no visibility on IRQs used by the >> SFs. >> Secondly, an SF is a multi function device supporting rdma, netdevice >> and more. Without irq information at the bus level, the user is unable >> to view or use the affinity of the SF IRQs. >> >> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory, >> for supporting auxiliary devices, containing file for each irq entry. >> >> Additionally, the PCI SFs sometimes share the IRQs with peer SFs. This >> information is also not available to the users. To overcome this >> limitation, each irq sysfs entry shows if irq is exclusive or shared. 
>> >> For example: >> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/ >> 50 51 52 53 54 55 56 57 58 >> $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/52 >> exclusive >> >> Reviewed-by: Parav Pandit <parav@nvidia.com> >> Signed-off-by: Shay Drory <shayd@nvidia.com> >> >> --- >> v2->v3: >> - fix function declaration in case SYSFS isn't defined (Parav) >> - convert auxdev->groups array with auxiliary_irqs_groups (Przemek) >> v1->v2: >> - move #ifdefs from drivers/base/auxiliary.c to >> include/linux/auxiliary_bus.h (Greg) >> - use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg) >> - Fix kzalloc(ref) to kzalloc(*ref) (Simon) >> - Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon) >> - Fix auxiliary_irq_mode_show doc (kernel test boot) >> --- >> Documentation/ABI/testing/sysfs-bus-auxiliary | 14 ++ >> drivers/base/auxiliary.c | 171 +++++++++++++++++- >> include/linux/auxiliary_bus.h | 24 ++- >> 3 files changed, 206 insertions(+), 3 deletions(-) >> create mode 100644 Documentation/ABI/testing/sysfs-bus-auxiliary >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary >> b/Documentation/ABI/testing/sysfs-bus-auxiliary >> new file mode 100644 >> index 000000000000..3b8299d49d9e >> --- /dev/null >> +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary >> @@ -0,0 +1,14 @@ >> +What: /sys/bus/auxiliary/devices/.../irqs/ >> +Date: April, 2024 >> +Contact: Shay Drory <shayd@nvidia.com> >> +Description: >> + The /sys/devices/.../irqs directory contains a variable >> set of >> + files, with each file is named as irq number similar to >> PCI PF >> + or VF's irq number located in msi_irqs directory. >> + >> +What: /sys/bus/auxiliary/devices/.../irqs/<N> >> +Date: April, 2024 >> +Contact: Shay Drory <shayd@nvidia.com> >> +Description: >> + auxiliary devices can share IRQs. This attribute >> indicates if >> + the irq is shared with other SFs or exclusively used by >> the SF. 
>> diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c >> index d3a2c40c2f12..6293c6707e1e 100644 >> --- a/drivers/base/auxiliary.c >> +++ b/drivers/base/auxiliary.c >> @@ -158,6 +158,169 @@ >> * }; >> */ >> >> +#ifdef CONFIG_SYSFS >> +/* Xarray of irqs to determine if irq is exclusive or shared. */ >> +static DEFINE_XARRAY(irqs); >> +/* Protects insertions into the irtqs xarray. */ >> +static DEFINE_MUTEX(irqs_lock); > > sorry for not catching it earlier, you don't need a separate lock, > xarray provides one, please see below [1], [2] > >> + >> +struct auxiliary_irq_info { >> + struct device_attribute sysfs_attr; >> + int irq; >> +}; >> + >> +static struct attribute *auxiliary_irq_attrs[] = { >> + NULL >> +}; >> + >> +static const struct attribute_group auxiliary_irqs_group = { >> + .name = "irqs", >> + .attrs = auxiliary_irq_attrs, >> +}; >> + >> +static const struct attribute_group *auxiliary_irqs_groups[2] = { >> + &auxiliary_irqs_group, >> + NULL >> +}; >> + >> +/* Auxiliary devices can share IRQs. Expose to user whether the >> provided IRQ is >> + * shared or exclusive. >> + */ >> +static ssize_t auxiliary_irq_mode_show(struct device *dev, >> + struct device_attribute *attr, >> char *buf) >> +{ >> + struct auxiliary_irq_info *info = >> + container_of(attr, struct auxiliary_irq_info, sysfs_attr); >> + >> + if (refcount_read(xa_load(&irqs, info->irq)) > 1) > > I didn't checked if it is possible with current implementation, but > please imagine a scenario where user open()'s sysfs file, then triggers > operation to remove irq (to call auxiliary_irq_destroy()), and only then > read()'s sysfs contents, what results in nullptr dereference (xa_load() > returning NULL). Splitting the code into two if statements would resolve > this issue. the first function in auxiliary_irq_destroy() is removing the sysfs. I don't see how after that user can read() the sysfs... 
> >> + return sysfs_emit(buf, "%s\n", "shared"); >> + else >> + return sysfs_emit(buf, "%s\n", "exclusive"); >> +} >> + >> +static void auxiliary_irq_destroy(int irq) >> +{ >> + refcount_t *ref; >> + >> + xa_lock(&irqs); >> + ref = xa_load(&irqs, irq); >> + if (refcount_dec_and_test(ref)) { >> + __xa_erase(&irqs, irq); >> + kfree(ref); >> + } >> + xa_unlock(&irqs); >> +} >> + >> +static int auxiliary_irq_create(int irq) >> +{ >> + refcount_t *ref; >> + int ret = 0; >> + >> + mutex_lock(&irqs_lock); > > [1] xa_lock() instead ... > >> + ref = xa_load(&irqs, irq); >> + if (ref && refcount_inc_not_zero(ref)) >> + goto out; > > `&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on > saturation, instead the logic should be: > if (ref) { > refcount_inc(ref); > goto out; > } > > anyway allocating under a lock taken is not the best idea in general, > although xarray API somehow encourages this - > alternative is to > preallocate and free when not used, or some lock dance that will be easy > to get wrong - and that's the raison d'etre of xa_reserve() :) I don't understand what you picture here? xa_reserve() can drop the lock while allocating the xa_entry, so how it will help? > >> + >> + ref = kzalloc(sizeof(*ref), GFP_KERNEL); >> + if (!ref) { >> + ret = -ENOMEM; >> + goto out; >> + } >> + >> + refcount_set(ref, 1); >> + ret = xa_insert(&irqs, irq, ref, GFP_KERNEL); > > [2] ... then __xa_insert() here __xa_insert() can drop the lock as well... > >> + if (ret) >> + kfree(ref); >> + >> +out: >> + mutex_unlock(&irqs_lock); >> + return ret; >> +} >> + >> +/** >> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ >> + * @auxdev: auxiliary bus device to add the sysfs entry. >> + * @irq: The associated Linux interrupt number. >> + * >> + * This function should be called after auxiliary device have >> successfully >> + * received the irq. >> + * >> + * Return: zero on success or an error code on failure. 
>> + */ >> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, >> int irq) >> +{ >> + struct device *dev = &auxdev->dev; >> + struct auxiliary_irq_info *info; >> + int ret; >> + >> + ret = auxiliary_irq_create(irq); >> + if (ret) >> + return ret; >> + >> + info = kzalloc(sizeof(*info), GFP_KERNEL); >> + if (!info) { >> + ret = -ENOMEM; >> + goto info_err; >> + } >> + >> + sysfs_attr_init(&info->sysfs_attr.attr); >> + info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq); >> + if (!info->sysfs_attr.attr.name) { >> + ret = -ENOMEM; >> + goto name_err; >> + } >> + info->irq = irq; >> + info->sysfs_attr.attr.mode = 0444; >> + info->sysfs_attr.show = auxiliary_irq_mode_show; >> + >> + ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL); >> + if (ret) >> + goto auxdev_xa_err; >> + >> + ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr, >> + auxiliary_irqs_group.name); >> + if (ret) >> + goto sysfs_add_err; >> + >> + return 0; >> + >> +sysfs_add_err: >> + xa_erase(&auxdev->irqs, irq); >> +auxdev_xa_err: >> + kfree(info->sysfs_attr.attr.name); >> +name_err: >> + kfree(info); >> +info_err: >> + auxiliary_irq_destroy(irq); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add); >> + >> +/** >> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the >> given IRQ >> + * @auxdev: auxiliary bus device to add the sysfs entry. >> + * @irq: the IRQ to remove. >> + * >> + * This function should be called to remove an IRQ sysfs entry. 
>> + */ >> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device >> *auxdev, int irq) >> +{ >> + struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq); >> + struct device *dev = &auxdev->dev; >> + >> + if (WARN_ON(!info)) >> + return; >> + >> + sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr, >> + auxiliary_irqs_group.name); >> + xa_erase(&auxdev->irqs, irq); >> + kfree(info->sysfs_attr.attr.name); >> + kfree(info); >> + auxiliary_irq_destroy(irq); >> +} >> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove); >> +#endif >> + >> static const struct auxiliary_device_id *auxiliary_match_id(const >> struct auxiliary_device_id *id, >> const struct >> auxiliary_device *auxdev) >> { >> @@ -295,6 +458,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init); >> * __auxiliary_device_add - add an auxiliary bus device >> * @auxdev: auxiliary bus device to add to the bus >> * @modname: name of the parent device's driver module >> + * @irqs_sysfs_enable: whether to enable IRQs sysfs >> * >> * This is the third step in the three-step process to register an >> * auxiliary_device. >> @@ -310,7 +474,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init); >> * parameter. Only if a user requires a custom name would this >> version be >> * called directly. 
>> */ >> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const >> char *modname) >> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const >> char *modname, >> + bool irqs_sysfs_enable) >> { >> struct device *dev = &auxdev->dev; >> int ret; >> @@ -325,6 +490,10 @@ int __auxiliary_device_add(struct >> auxiliary_device *auxdev, const char *modname) >> dev_err(dev, "auxiliary device dev_set_name failed: >> %d\n", ret); >> return ret; >> } >> + if (irqs_sysfs_enable) { >> + dev->groups = auxiliary_irqs_groups; >> + xa_init(&auxdev->irqs); >> + } >> >> ret = device_add(dev); >> if (ret) >> diff --git a/include/linux/auxiliary_bus.h >> b/include/linux/auxiliary_bus.h >> index de21d9d24a95..760fadb26620 100644 >> --- a/include/linux/auxiliary_bus.h >> +++ b/include/linux/auxiliary_bus.h >> @@ -58,6 +58,7 @@ >> * in >> * @name: Match name found by the auxiliary device driver, >> * @id: unique identitier if multiple devices of the same name are >> exported, >> + * @irqs: irqs xarray contains irq indices which are used by the device, >> * >> * An auxiliary_device represents a part of its parent device's >> functionality. 
>> * It is given a name that, combined with the registering drivers >> @@ -138,6 +139,7 @@ >> struct auxiliary_device { >> struct device dev; >> const char *name; >> + struct xarray irqs; >> u32 id; >> }; >> >> @@ -209,8 +211,26 @@ static inline struct auxiliary_driver >> *to_auxiliary_drv(struct device_driver *dr >> } >> >> int auxiliary_device_init(struct auxiliary_device *auxdev); >> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const >> char *modname); >> -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, >> KBUILD_MODNAME) >> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const >> char *modname, >> + bool irqs_sysfs_enable); >> +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, >> KBUILD_MODNAME, false) >> +#define auxiliary_device_add_with_irqs(auxdev) \ >> + __auxiliary_device_add(auxdev, KBUILD_MODNAME, true) >> + >> +#ifdef CONFIG_SYSFS >> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, >> int irq); >> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, >> + int irq); >> +#else /* CONFIG_SYSFS */ >> +static inline int >> +auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq) >> +{ >> + return 0; >> +} >> + >> +static inline void >> +auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, >> int irq) {} >> +#endif >> >> static inline void auxiliary_device_uninit(struct auxiliary_device >> *auxdev) >> { >
please note that v4+ is already being discussed

On 5/8/24 13:33, Shay Drori wrote:
> On 08/05/2024 12:34, Przemek Kitszel wrote:

// ...

>>> +
>>> +/* Auxiliary devices can share IRQs. Expose to user whether the
>>> provided IRQ is
>>> + * shared or exclusive.
>>> + */
>>> +static ssize_t auxiliary_irq_mode_show(struct device *dev,
>>> +				       struct device_attribute *attr,
>>> char *buf)
>>> +{
>>> +	struct auxiliary_irq_info *info =
>>> +		container_of(attr, struct auxiliary_irq_info, sysfs_attr);
>>> +
>>> +	if (refcount_read(xa_load(&irqs, info->irq)) > 1)
>>
>> I didn't checked if it is possible with current implementation, but
>> please imagine a scenario where user open()'s sysfs file, then triggers
>> operation to remove irq (to call auxiliary_irq_destroy()), and only then
>> read()'s sysfs contents, what results in nullptr dereference (xa_load()
>> returning NULL). Splitting the code into two if statements would resolve
>> this issue.
>
> the first function in auxiliary_irq_destroy() is removing the sysfs.
> I don't see how after that user can read() the sysfs...

Let me illustrate, but with my running kernel instead of your series:
 # strace cat /sys/class/net/enp0s31f6/duplex 2>&1 | grep -e open -e read
yields (among others):
openat(AT_FDCWD, "/sys/class/net/enp0s31f6/duplex", O_RDONLY) = 3
read(3, "full\n", 131072) = 5

And now imagine that other, concurrent user app or any HW event triggers
this IRQ removal (resulting with xarray entry removed (!), likely sysfs
attr refcount dropped to 0 [A], so new open()s will be declined, but
that is irrelevant).
My assumption is that, until close()d, user is free to call read() on
fd received from openat(), but it's possible that xa_load() would return
NULL (because of [A] above).
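The assumption stated in the paragraph above (a descriptor stays usable until close(), even after the name it was opened through is gone) is the generic POSIX behaviour for regular files, and it can be checked with a small userspace program. This is only a sketch under that assumption: it uses a scratch file under /tmp rather than a sysfs attribute, the function name `read_after_unlink` is made up for the illustration, and it says nothing about whether sysfs still invokes an attribute's show() callback once the attribute has been removed, which is the point the two sides of this thread disagree on.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch file, remove its name, then read it back through the
 * descriptor that was opened earlier.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_after_unlink(char *buf, size_t len)
{
	char path[] = "/tmp/aux_irq_demo_XXXXXX";	/* hypothetical scratch file */
	static const char msg[] = "exclusive\n";
	const ssize_t msg_len = (ssize_t)(sizeof(msg) - 1);
	ssize_t wrote, n = -1;
	int fd;

	assert(buf && len > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	wrote = write(fd, msg, (size_t)msg_len);

	/* Remove the name: new open() calls on the path fail from here on. */
	if (unlink(path) != 0 || wrote != msg_len)
		goto out;

	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;

	/* The descriptor obtained before the removal is still readable. */
	n = read(fd, buf, len);
out:
	close(fd);
	return n;
}
```

For a regular file the read succeeds after the unlink. That is why a show() handler reachable in the same way would need to cope with a lookup that returns NULL.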
>
>>
>>> +		return sysfs_emit(buf, "%s\n", "shared");
>>> +	else
>>> +		return sysfs_emit(buf, "%s\n", "exclusive");
>>> +}
>>> +
>>> +static void auxiliary_irq_destroy(int irq)
>>> +{
>>> +	refcount_t *ref;
>>> +
>>> +	xa_lock(&irqs);
>>> +	ref = xa_load(&irqs, irq);
>>> +	if (refcount_dec_and_test(ref)) {
>>> +		__xa_erase(&irqs, irq);
>>> +		kfree(ref);
>>> +	}
>>> +	xa_unlock(&irqs);
>>> +}
>>> +
>>> +static int auxiliary_irq_create(int irq)
>>> +{
>>> +	refcount_t *ref;
>>> +	int ret = 0;
>>> +
>>> +	mutex_lock(&irqs_lock);
>>
>> [1] xa_lock() instead ...
>>
>>> +	ref = xa_load(&irqs, irq);
>>> +	if (ref && refcount_inc_not_zero(ref))
>>> +		goto out;
>>
>> `&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on
>> saturation, instead the logic should be:
>>	if (ref) {
>>		refcount_inc(ref);
>>		goto out;
>>	}
>>

<digression>

>> anyway allocating under a lock taken is not the best idea in general,
>> although xarray API somehow encourages this -
>
>> alternative is to
>> preallocate and free when not used, or some lock dance that will be easy
>> to get wrong - and that's the raison d'etre of xa_reserve() :)
>
> I don't understand what you picture here?

Here I was digressing, sorry for not marking it clearly as that.
IMO xarray API needs an extension to make this and similar use case
easier to code right. I will CC you ofc.
</digression>

> xa_reserve() can drop the lock while allocating the xa_entry, so how it
> will help?
>
>>
>>> +
>>> +	ref = kzalloc(sizeof(*ref), GFP_KERNEL);
>>> +	if (!ref) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>>> +	}
>>> +
>>> +	refcount_set(ref, 1);
>>> +	ret = xa_insert(&irqs, irq, ref, GFP_KERNEL);
>>
>> [2] ... then __xa_insert() here
>
> __xa_insert() can drop the lock as well...

Thank you for pointing that out to me.

Let's move future discussion on this series to your newer submissions.

// ...
On 10/05/2024 16:07, Przemek Kitszel wrote: > External email: Use caution opening links or attachments > > > please not that v4+ is already being discussed > > On 5/8/24 13:33, Shay Drori wrote: >> On 08/05/2024 12:34, Przemek Kitszel wrote: > > // ... > >>>> + >>>> +/* Auxiliary devices can share IRQs. Expose to user whether the >>>> provided IRQ is >>>> + * shared or exclusive. >>>> + */ >>>> +static ssize_t auxiliary_irq_mode_show(struct device *dev, >>>> + struct device_attribute *attr, >>>> char *buf) >>>> +{ >>>> + struct auxiliary_irq_info *info = >>>> + container_of(attr, struct auxiliary_irq_info, >>>> sysfs_attr); >>>> + >>>> + if (refcount_read(xa_load(&irqs, info->irq)) > 1) >>> >>> I didn't checked if it is possible with current implementation, but >>> please imagine a scenario where user open()'s sysfs file, then triggers >>> operation to remove irq (to call auxiliary_irq_destroy()), and only then >>> read()'s sysfs contents, what results in nullptr dereference (xa_load() >>> returning NULL). Splitting the code into two if statements would resolve >>> this issue. >> >> the first function in auxiliary_irq_destroy() is removing the sysfs. >> I don't see how after that user can read() the sysfs... > > Let me illustrate, but with my running kernel instead of your series: > # strace cat /sys/class/net/enp0s31f6/duplex 2>&1 | grep -e open -e read > yields (among others): > openat(AT_FDCWD, "/sys/class/net/enp0s31f6/duplex", O_RDONLY) = 3 > read(3, "full\n", 131072) = 5 > > And now imagine that other, concurrent user app or any HW event triggers > this IRQ removal (resulting with xarray entry removed (!), likely sysfs > attr refcount dropped to 0 [A], so new open()s will be declined, but > that is irrelevant). > My assumption is that, until close()d, user is free to call read() on > fd received from openat(), but it's possible that xa_load() would return > NULL (because of [A] above). 
> >> >>> >>>> + return sysfs_emit(buf, "%s\n", "shared"); >>>> + else >>>> + return sysfs_emit(buf, "%s\n", "exclusive"); >>>> +} >>>> + >>>> +static void auxiliary_irq_destroy(int irq) >>>> +{ >>>> + refcount_t *ref; >>>> + >>>> + xa_lock(&irqs); >>>> + ref = xa_load(&irqs, irq); >>>> + if (refcount_dec_and_test(ref)) { >>>> + __xa_erase(&irqs, irq); >>>> + kfree(ref); >>>> + } >>>> + xa_unlock(&irqs); >>>> +} >>>> + >>>> +static int auxiliary_irq_create(int irq) >>>> +{ >>>> + refcount_t *ref; >>>> + int ret = 0; >>>> + >>>> + mutex_lock(&irqs_lock); >>> >>> [1] xa_lock() instead ... >>> >>>> + ref = xa_load(&irqs, irq); >>>> + if (ref && refcount_inc_not_zero(ref)) >>>> + goto out; >>> >>> `&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on >>> saturation, instead the logic should be: >>> if (ref) { >>> refcount_inc(ref); >>> goto out; >>> } >>> > > > <digression> > >>> anyway allocating under a lock taken is not the best idea in general, >>> although xarray API somehow encourages this - >> >>> alternative is to >>> preallocate and free when not used, or some lock dance that will be easy >>> to get wrong - and that's the raison d'etre of xa_reserve() :) >> >> I don't understand what you picture here? > > Here I was digressing, sorry for not marking it clearly as that. > IMO xarray API need an extension to make this and similar use case > easier to code right. I will CC you ofc. > </digression> > >> xa_reserve() can drop the lock while allocating the xa_entry, so how it >> will help? >> >>> >>>> + >>>> + ref = kzalloc(sizeof(*ref), GFP_KERNEL); >>>> + if (!ref) { >>>> + ret = -ENOMEM; >>>> + goto out; >>>> + } >>>> + >>>> + refcount_set(ref, 1); >>>> + ret = xa_insert(&irqs, irq, ref, GFP_KERNEL); >>> >>> [2] ... then __xa_insert() here >> >> __xa_insert() can drop the lock as well... > > Thank you for pointing it to me. > > Let's move future discussion on this series to your newer submissions. 
Thanks for the quick reviews :)
Let's continue in the v4 series.

>
> // ...
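The locking exchange in this thread turns on one constraint: `xa_lock()` is a spinlock, so a sleeping allocation cannot simply be made while it is held, and `__xa_insert()` may release and retake the lock when the gfp flags allow blocking. One alternative named in the digression, preallocating and then freeing the allocation if it turns out to be unused, can be sketched in userspace C. This is an illustration under stated assumptions, not a proposal for the series: a fixed table stands in for the xarray, a C11 `atomic_flag` stands in for `xa_lock`, `malloc()` stands in for `kzalloc()`, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_MAX_IRQ 64

/* Hypothetical stand-ins: a fixed table for the xarray, a flag for xa_lock. */
static unsigned int *sketch_irqs[SKETCH_MAX_IRQ];
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

static void sketch_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&sketch_lock,
						 memory_order_acquire))
		;	/* busy-wait: the holder must never block */
}

static void sketch_lock_release(void)
{
	atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}

/* Take a reference on irq; the allocation happens before the lock is taken. */
int sketch_irq_get(int irq)
{
	unsigned int *fresh, *ref;

	if (irq < 0 || irq >= SKETCH_MAX_IRQ)
		return -EINVAL;

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return -ENOMEM;
	*fresh = 1;

	sketch_lock_acquire();
	ref = sketch_irqs[irq];
	if (ref) {
		(*ref)++;			/* existing entry: count the new user */
	} else {
		sketch_irqs[irq] = fresh;	/* first user: publish the preallocation */
		fresh = NULL;
	}
	sketch_lock_release();

	free(fresh);	/* frees only an unused preallocation; free(NULL) is a no-op */
	return 0;
}

/* Drop a reference; the entry is unpublished under the lock, freed after it. */
void sketch_irq_put(int irq)
{
	unsigned int *ref, *dead = NULL;

	if (irq < 0 || irq >= SKETCH_MAX_IRQ)
		return;

	sketch_lock_acquire();
	ref = sketch_irqs[irq];
	if (ref) {
		assert(*ref > 0);
		if (--(*ref) == 0) {
			sketch_irqs[irq] = NULL;
			dead = ref;
		}
	}
	sketch_lock_release();

	free(dead);
}

/* Number of current users of irq, or 0 if it is not tracked. */
unsigned int sketch_irq_users(int irq)
{
	unsigned int users = 0;

	if (irq < 0 || irq >= SKETCH_MAX_IRQ)
		return 0;

	sketch_lock_acquire();
	if (sketch_irqs[irq])
		users = *sketch_irqs[irq];
	sketch_lock_release();
	return users;
}
```

The cost of this shape is one allocation per call that is usually thrown away when the IRQ is already tracked. In exchange, the locked section only reads and writes pointers and never has to drop the lock.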
diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary new file mode 100644 index 000000000000..3b8299d49d9e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary @@ -0,0 +1,14 @@ +What: /sys/bus/auxiliary/devices/.../irqs/ +Date: April, 2024 +Contact: Shay Drory <shayd@nvidia.com> +Description: + The /sys/devices/.../irqs directory contains a variable set of + files, with each file is named as irq number similar to PCI PF + or VF's irq number located in msi_irqs directory. + +What: /sys/bus/auxiliary/devices/.../irqs/<N> +Date: April, 2024 +Contact: Shay Drory <shayd@nvidia.com> +Description: + auxiliary devices can share IRQs. This attribute indicates if + the irq is shared with other SFs or exclusively used by the SF. diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c index d3a2c40c2f12..6293c6707e1e 100644 --- a/drivers/base/auxiliary.c +++ b/drivers/base/auxiliary.c @@ -158,6 +158,169 @@ * }; */ +#ifdef CONFIG_SYSFS +/* Xarray of irqs to determine if irq is exclusive or shared. */ +static DEFINE_XARRAY(irqs); +/* Protects insertions into the irtqs xarray. */ +static DEFINE_MUTEX(irqs_lock); + +struct auxiliary_irq_info { + struct device_attribute sysfs_attr; + int irq; +}; + +static struct attribute *auxiliary_irq_attrs[] = { + NULL +}; + +static const struct attribute_group auxiliary_irqs_group = { + .name = "irqs", + .attrs = auxiliary_irq_attrs, +}; + +static const struct attribute_group *auxiliary_irqs_groups[2] = { + &auxiliary_irqs_group, + NULL +}; + +/* Auxiliary devices can share IRQs. Expose to user whether the provided IRQ is + * shared or exclusive. 
+ */
+static ssize_t auxiliary_irq_mode_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct auxiliary_irq_info *info =
+		container_of(attr, struct auxiliary_irq_info, sysfs_attr);
+
+	if (refcount_read(xa_load(&irqs, info->irq)) > 1)
+		return sysfs_emit(buf, "%s\n", "shared");
+	else
+		return sysfs_emit(buf, "%s\n", "exclusive");
+}
+
+static void auxiliary_irq_destroy(int irq)
+{
+	refcount_t *ref;
+
+	xa_lock(&irqs);
+	ref = xa_load(&irqs, irq);
+	if (refcount_dec_and_test(ref)) {
+		__xa_erase(&irqs, irq);
+		kfree(ref);
+	}
+	xa_unlock(&irqs);
+}
+
+static int auxiliary_irq_create(int irq)
+{
+	refcount_t *ref;
+	int ret = 0;
+
+	mutex_lock(&irqs_lock);
+	ref = xa_load(&irqs, irq);
+	if (ref && refcount_inc_not_zero(ref))
+		goto out;
+
+	ref = kzalloc(sizeof(*ref), GFP_KERNEL);
+	if (!ref) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	refcount_set(ref, 1);
+	ret = xa_insert(&irqs, irq, ref, GFP_KERNEL);
+	if (ret)
+		kfree(ref);
+
+out:
+	mutex_unlock(&irqs_lock);
+	return ret;
+}
+
+/**
+ * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: The associated Linux interrupt number.
+ *
+ * This function should be called after auxiliary device have successfully
+ * received the irq.
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
+{
+	struct device *dev = &auxdev->dev;
+	struct auxiliary_irq_info *info;
+	int ret;
+
+	ret = auxiliary_irq_create(irq);
+	if (ret)
+		return ret;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		ret = -ENOMEM;
+		goto info_err;
+	}
+
+	sysfs_attr_init(&info->sysfs_attr.attr);
+	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
+	if (!info->sysfs_attr.attr.name) {
+		ret = -ENOMEM;
+		goto name_err;
+	}
+	info->irq = irq;
+	info->sysfs_attr.attr.mode = 0444;
+	info->sysfs_attr.show = auxiliary_irq_mode_show;
+
+	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
+	if (ret)
+		goto auxdev_xa_err;
+
+	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
+				      auxiliary_irqs_group.name);
+	if (ret)
+		goto sysfs_add_err;
+
+	return 0;
+
+sysfs_add_err:
+	xa_erase(&auxdev->irqs, irq);
+auxdev_xa_err:
+	kfree(info->sysfs_attr.attr.name);
+name_err:
+	kfree(info);
+info_err:
+	auxiliary_irq_destroy(irq);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
+
+/**
+ * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: the IRQ to remove.
+ *
+ * This function should be called to remove an IRQ sysfs entry.
+ */
+void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
+{
+	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
+	struct device *dev = &auxdev->dev;
+
+	if (WARN_ON(!info))
+		return;
+
+	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
+				     auxiliary_irqs_group.name);
+	xa_erase(&auxdev->irqs, irq);
+	kfree(info->sysfs_attr.attr.name);
+	kfree(info);
+	auxiliary_irq_destroy(irq);
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
+#endif
+
 static const struct auxiliary_device_id *auxiliary_match_id(const struct auxiliary_device_id *id,
 							    const struct auxiliary_device *auxdev)
 {
@@ -295,6 +458,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
  * __auxiliary_device_add - add an auxiliary bus device
  * @auxdev: auxiliary bus device to add to the bus
  * @modname: name of the parent device's driver module
+ * @irqs_sysfs_enable: whether to enable IRQs sysfs
  *
  * This is the third step in the three-step process to register an
  * auxiliary_device.
@@ -310,7 +474,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
  * parameter. Only if a user requires a custom name would this version be
  * called directly.
  */
-int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname)
+int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname,
+			   bool irqs_sysfs_enable)
 {
 	struct device *dev = &auxdev->dev;
 	int ret;
@@ -325,6 +490,10 @@ int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname)
 		dev_err(dev, "auxiliary device dev_set_name failed: %d\n", ret);
 		return ret;
 	}
+	if (irqs_sysfs_enable) {
+		dev->groups = auxiliary_irqs_groups;
+		xa_init(&auxdev->irqs);
+	}
 
 	ret = device_add(dev);
 	if (ret)
diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
index de21d9d24a95..760fadb26620 100644
--- a/include/linux/auxiliary_bus.h
+++ b/include/linux/auxiliary_bus.h
@@ -58,6 +58,7 @@
  *       in
  * @name: Match name found by the auxiliary device driver,
  * @id: unique identitier if multiple devices of the same name are exported,
+ * @irqs: irqs xarray contains irq indices which are used by the device,
  *
  * An auxiliary_device represents a part of its parent device's functionality.
  * It is given a name that, combined with the registering drivers
@@ -138,6 +139,7 @@
 struct auxiliary_device {
 	struct device dev;
 	const char *name;
+	struct xarray irqs;
 	u32 id;
 };
 
@@ -209,8 +211,26 @@ static inline struct auxiliary_driver *to_auxiliary_drv(struct device_driver *dr
 }
 
 int auxiliary_device_init(struct auxiliary_device *auxdev);
-int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
-#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
+int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname,
+			   bool irqs_sysfs_enable);
+#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME, false)
+#define auxiliary_device_add_with_irqs(auxdev) \
+	__auxiliary_device_add(auxdev, KBUILD_MODNAME, true)
+
+#ifdef CONFIG_SYSFS
+int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq);
+void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
+				       int irq);
+#else /* CONFIG_SYSFS */
+static inline int
+auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
+{
+	return 0;
+}
+
+static inline void
+auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) {}
+#endif
 
 static inline void auxiliary_device_uninit(struct auxiliary_device *auxdev)
 {