Message ID | 20240613161912.300785-2-shayd@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Series | Introduce auxiliary bus IRQs sysfs |
On Thu, Jun 13, 2024 at 07:19:11PM +0300, Shay Drory wrote:
> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
> and virtual functions are anchored on the PCI bus. The irq information
> of each such function is visible to users via sysfs directory "msi_irqs"
> containing files for each irq entry. However, for PCI SFs such
> information is unavailable. Due to this users have no visibility on IRQs
> used by the SFs.
> Secondly, an SF can be multi function device supporting rdma, netdevice
> and more. Without irq information at the bus level, the user is unable
> to view or use the affinity of the SF IRQs.
>
> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
> for supporting auxiliary devices, containing file for each irq entry.
>
> For example:
> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
> 50 51 52 53 54 55 56 57 58
>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Shay Drory <shayd@nvidia.com>
>
> ---
> v5-v6:
> - removed concept of shared and exclusive and hence global xarray (Greg)
> v4-v5:
> - restore global mutex and replace refcount_t with simple integer (Greg)
> v3->4:
> - remove global mutex (Przemek)
> v2->v3:
> - fix function declaration in case SYSFS isn't defined
> v1->v2:
> - move #ifdefs from drivers/base/auxiliary.c to
>   include/linux/auxiliary_bus.h (Greg)
> - use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg)
> - Fix kzalloc(ref) to kzalloc(*ref) (Simon)
> - Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon)
> - Fix auxiliary_irq_mode_show doc (kernel test boot)
> ---
>  Documentation/ABI/testing/sysfs-bus-auxiliary |  7 ++
>  drivers/base/auxiliary.c                      | 96 ++++++++++++++++++-
>  include/linux/auxiliary_bus.h                 | 24 ++++-
>  3 files changed, 124 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-auxiliary
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary
> new file mode 100644
> index 000000000000..e8752c2354bc
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary
> @@ -0,0 +1,7 @@
> +What:		/sys/bus/auxiliary/devices/.../irqs/
> +Date:		April, 2024
> +Contact:	Shay Drory <shayd@nvidia.com>
> +Description:
> +		The /sys/devices/.../irqs directory contains a variable set of
> +		files, with each file is named as irq number similar to PCI PF
> +		or VF's irq number located in msi_irqs directory.
> diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
> index d3a2c40c2f12..fcd7dbf20f88 100644
> --- a/drivers/base/auxiliary.c
> +++ b/drivers/base/auxiliary.c
> @@ -158,6 +158,94 @@
>   *	};
>   */
>
> +#ifdef CONFIG_SYSFS

People really build boxes without sysfs?  Ok :(

But if so, why not move this to a whole new file?  That would make it
simpler to maintain.

> +struct auxiliary_irq_info {
> +	struct device_attribute sysfs_attr;
> +};
> +
> +static struct attribute *auxiliary_irq_attrs[] = {
> +	NULL
> +};
> +
> +static const struct attribute_group auxiliary_irqs_group = {
> +	.name = "irqs",
> +	.attrs = auxiliary_irq_attrs,
> +};
> +
> +static const struct attribute_group *auxiliary_irqs_groups[] = {
> +	&auxiliary_irqs_group,
> +	NULL
> +};
> +
> +/**
> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: The associated interrupt number.
> + *
> + * This function should be called after auxiliary device have successfully
> + * received the irq.
> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct device *dev = &auxdev->dev;
> +	struct auxiliary_irq_info *info;
> +	int ret;
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return -ENOMEM;
> +
> +	sysfs_attr_init(&info->sysfs_attr.attr);
> +	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
> +	if (!info->sysfs_attr.attr.name) {
> +		ret = -ENOMEM;
> +		goto name_err;
> +	}
> +
> +	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
> +	if (ret)
> +		goto auxdev_xa_err;
> +
> +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
> +				      auxiliary_irqs_group.name);

Dynamic attributes are rough, because:

> +	if (ret)
> +		goto sysfs_add_err;
> +
> +	return 0;
> +
> +sysfs_add_err:
> +	xa_erase(&auxdev->irqs, irq);
> +auxdev_xa_err:
> +	kfree(info->sysfs_attr.attr.name);
> +name_err:
> +	kfree(info);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
> +
> +/**
> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: the IRQ to remove.
> + *
> + * This function should be called to remove an IRQ sysfs entry.
> + */
> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
> +	struct device *dev = &auxdev->dev;
> +
> +	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
> +				     auxiliary_irqs_group.name);
> +	xa_erase(&auxdev->irqs, irq);
> +	kfree(info->sysfs_attr.attr.name);
> +	kfree(info);
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);

What is forcing you to remove the irqs after a device is removed from
the system?

Why not just remove them all automatically?  Why would you ever want to
remove them after they were added, will they ever actually change over
the lifespan of a device?

>  int auxiliary_device_init(struct auxiliary_device *auxdev);
> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
> -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname,
> +			   bool irqs_sysfs_enable);
> +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME, false)
> +#define auxiliary_device_add_with_irqs(auxdev) \
> +	__auxiliary_device_add(auxdev, KBUILD_MODNAME, true)

Ick, no, that way lies madness.

Just keep the original function:
	auxiliary_device_add()
as is.

Then, if someone DOES call auxiliary_device_sysfs_irq_add() then add the
irq directory and file as needed then.

That way no "normal" paths are messed up and over time, we don't keep
having an explosion of combinations of function calls to create an aux
device (as we all know, this is NOT going to be the last feature ever
added to them...)

thanks,

greg k-h
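
The suggestion at the end of this reply (leave auxiliary_device_add()
alone and create the "irqs" directory only when the first IRQ entry is
added) can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: struct model_auxdev, model_device_add()
and model_sysfs_irq_add() are hypothetical names invented here, the
directory is reduced to a boolean flag, and nothing below is kernel code
or the eventual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_IRQS 16

/* Hypothetical stand-in for struct auxiliary_device; not the kernel type. */
struct model_auxdev {
	bool irqs_dir_created;		/* has the "irqs" directory been set up? */
	int irqs[MODEL_MAX_IRQS];	/* IRQ numbers currently exposed */
	size_t nr_irqs;
};

/* Plain add path: it knows nothing about IRQs, so existing callers are unaffected. */
void model_device_add(struct model_auxdev *adev)
{
	adev->irqs_dir_created = false;
	adev->nr_irqs = 0;
}

/* Creates the directory on first use, then records the IRQ. */
int model_sysfs_irq_add(struct model_auxdev *adev, int irq)
{
	size_t i;

	assert(adev != NULL);

	for (i = 0; i < adev->nr_irqs; i++)
		if (adev->irqs[i] == irq)
			return -1;	/* duplicate entry */
	if (adev->nr_irqs == MODEL_MAX_IRQS)
		return -2;		/* model table is full */

	if (!adev->irqs_dir_created)
		adev->irqs_dir_created = true;	/* lazy, one-time setup */

	adev->irqs[adev->nr_irqs++] = irq;
	return 0;
}
```

With this shape, a device that never exposes an IRQ never gets the
directory, and no second "add" variant or extra boolean parameter is
needed.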
On 13/06/2024 19:33, Greg KH wrote:
> On Thu, Jun 13, 2024 at 07:19:11PM +0300, Shay Drory wrote:
[...]
>> +#ifdef CONFIG_SYSFS
>
> People really build boxes without sysfs?  Ok :(
>
> But if so, why not move this to a whole new file?  That would make it
> simpler to maintain.

Sounds good. Will move them to a new sysfs.c.

[...]
>> +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
>> +				      auxiliary_irqs_group.name);
>
> Dynamic attributes are rough, because:

Your response after "because" is missing.
Can you please elaborate?

[...]
>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
>
> What is forcing you to remove the irqs after a device is removed from
> the system?

We are removing the irqs _before_ removing the device.
IRQ removal exactly mirrors the add flow.

> Why not just remove them all automatically?  Why would you ever want to
> remove them after they were added, will they ever actually change over
> the lifespan of a device?

IRQs of the SFs are allocated and removed as the SF's resources are
created and torn down.
For example, the devlink reload flow re-initializes the whole device by
releasing its IRQs and allocating a new set.
Certain driver-internal health recovery flows can also trigger a similar
re-initialization.

[...]
>> +#define auxiliary_device_add_with_irqs(auxdev) \
>> +	__auxiliary_device_add(auxdev, KBUILD_MODNAME, true)
>
> Ick, no, that way lies madness.
>
> Just keep the original function:
>	auxiliary_device_add()
> as is.
>
> Then, if someone DOES call auxiliary_device_sysfs_irq_add() then add the
> irq directory and file as needed then.
>
> That way no "normal" paths are messed up and over time, we don't keep
> having an explosion of combinations of function calls to create an aux
> device (as we all know, this is NOT going to be the last feature ever
> added to them...)

Thanks for the suggestion, will change in the next version.

> thanks,
>
> greg k-h
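
The lifecycle described in this reply (the set of IRQs can change while
the device stays registered, for instance across a reload) can be
modelled in plain userspace C. This is a sketch, not driver code:
struct sf_model, sf_irq_add(), sf_irq_remove() and sf_reload() are
hypothetical names, and the fixed-size array is an assumption made only
to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IRQS 32

/* Hypothetical model of one SF's exposed IRQ entries; not a kernel type. */
struct sf_model {
	int irqs[MAX_IRQS];
	size_t nr;
};

/* Add one entry; called once per IRQ, after the IRQ has been obtained. */
int sf_irq_add(struct sf_model *sf, int irq)
{
	assert(sf != NULL);
	if (sf->nr == MAX_IRQS)
		return -1;
	sf->irqs[sf->nr++] = irq;
	return 0;
}

/* Remove one entry; returns -1 if the IRQ was never added. */
int sf_irq_remove(struct sf_model *sf, int irq)
{
	size_t i;

	for (i = 0; i < sf->nr; i++) {
		if (sf->irqs[i] == irq) {
			sf->nr--;
			sf->irqs[i] = sf->irqs[sf->nr];	/* fill the hole */
			return 0;
		}
	}
	return -1;
}

/* Reload: release every current IRQ, newest first, then take the new set. */
int sf_reload(struct sf_model *sf, const int *new_irqs, size_t n)
{
	size_t i;

	while (sf->nr > 0)
		sf_irq_remove(sf, sf->irqs[sf->nr - 1]);
	for (i = 0; i < n; i++)
		if (sf_irq_add(sf, new_irqs[i]))
			return -1;
	return 0;
}
```

The device object outlives the reload; only its IRQ entries are torn
down and rebuilt, which is why a per-IRQ remove call is useful in
addition to whatever cleanup happens at device removal.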
On 6/17/24 08:38, Shay Drori wrote:
> On 13/06/2024 19:33, Greg KH wrote:
[...]
>> But if so, why not move this to a whole new file?  That would make it
>> simpler to maintain.
>
> sounds good. Will move them to new sysfs.c

Your proposed name, combined with the directory, would suggest that this
is the base sysfs code for drivers (drivers/base/sysfs.c).
Please add an aux_ prefix, or similar.

[...]
>> Dynamic attributes are rough, because:
>
> Your response after "because" is missing.
> Can you please elaborate?

You have a "complicated" unwinding/error path (compared to "nothing"
for static attrs).

[...]
>> Why not just remove them all automatically?  Why would you ever want to
>> remove them after they were added, will they ever actually change over
>> the lifespan of a device?
>
> IRQs of the SFs are allocated and removed when the resources are
> created.
> for example, devlink reload flow that re-initialize the whole device by
> releasing and re-allocating new set of IRQs.
> Certain driver internal health recovery flow can also trigger similar
> re-initialize.

I read it as "removing all is what we use 'remove-one' for".
Am I correct?
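
The point about the unwinding path can be made concrete with a userspace
model of the add function's goto ladder. This is a sketch under stated
assumptions: the enum fail_at injection points, the allocation counter
and the single-pointer "slot" (standing in for the xarray) are all
hypothetical devices invented for illustration; only the label ordering
mirrors the quoted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure-injection points, for illustration only. */
enum fail_at { FAIL_NONE, FAIL_NAME, FAIL_INSERT, FAIL_SYSFS };

struct irq_entry {
	char *name;	/* file name: the IRQ number in decimal */
};

/* Counts live allocations so a leak on any error path is detectable. */
int live_allocs;

void *counted_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * Same shape as the add path under review: allocate, name, insert,
 * publish. Each failure jumps to the label that undoes only the steps
 * that have already completed.
 */
int irq_entry_add(struct irq_entry **slot, int irq, enum fail_at fail)
{
	struct irq_entry *e;
	int ret;

	assert(slot != NULL);

	e = counted_alloc(sizeof(*e));
	if (!e)
		return -1;

	e->name = (fail == FAIL_NAME) ? NULL : counted_alloc(16);
	if (!e->name) {
		ret = -1;
		goto name_err;
	}
	snprintf(e->name, 16, "%d", irq);

	if (fail == FAIL_INSERT) {
		ret = -2;
		goto insert_err;
	}
	*slot = e;		/* stands in for xa_insert() */

	if (fail == FAIL_SYSFS) {
		ret = -3;
		goto sysfs_err;
	}
	return 0;

sysfs_err:
	*slot = NULL;		/* stands in for xa_erase() */
insert_err:
	counted_free(e->name);
name_err:
	counted_free(e);
	return ret;
}
```

A statically declared attribute needs none of this: there is nothing to
allocate per entry, so there is nothing to unwind.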
On 17/06/2024 12:52, Przemek Kitszel wrote:
> On 6/17/24 08:38, Shay Drori wrote:
[...]
>> sounds good. Will move them to new sysfs.c
>
> your proposed name combined with the directory would suggest that this
> is base sysfs for drivers - drivers/base/sysfs.c
> please add aux_ prefix, or similar

Correct, thanks for the suggestion.

[...]
>>> Dynamic attributes are rough, because:
>>
>> Your response after "because" is missing.
>> Can you please elaborate?
>
> you have "complicated" (compared to "nothing" for static attrs)
> unwinding/error path

Dynamic attributes are needed because the number of IRQs per SF is
dynamic: one SF can have 10 IRQs and another can have 15.
In addition, an SF requests its IRQs on demand, for example when the SF
net-device opens its channels.

[...]
>> IRQs of the SFs are allocated and removed when the resources are
>> created.
>> for example, devlink reload flow that re-initialize the whole device by
>> releasing and re-allocating new set of IRQs.
>> Certain driver internal health recovery flow can also trigger similar
>> re-initialize.
>
> I read it as "removing all is what we use 'remove-one' for",
> I'm correct?

I am not sure what you meant here. What I wanted to say is that an SF
requests and frees its IRQs one by one, not in bulk.
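
The argument in this reply (each SF has its own, varying number of IRQs,
and they come and go one at a time) can be sketched as a userspace model
in which entries are created and destroyed individually. The names
struct irq_node, struct sf_irqs, sf_irqs_add() and sf_irqs_remove() are
hypothetical, and the linked list merely stands in for the xarray used
by the patch. One deliberate difference, flagged as this sketch's own
choice: the remove helper reports an unknown IRQ instead of using the
lookup result unchecked, whereas the posted remove function passes the
xa_load() result on without a NULL check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-IRQ record; the name doubles as the file name. */
struct irq_node {
	int irq;
	char name[12];		/* fits any 32-bit int in decimal plus NUL */
	struct irq_node *next;
};

/* Hypothetical stand-in for one SF's set of exposed IRQs. */
struct sf_irqs {
	struct irq_node *head;
	size_t count;
};

/* Called once per IRQ, at the moment the SF obtains it. */
int sf_irqs_add(struct sf_irqs *s, int irq)
{
	struct irq_node *n;

	assert(s != NULL);

	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->irq = irq;
	snprintf(n->name, sizeof(n->name), "%d", irq);
	n->next = s->head;
	s->head = n;
	s->count++;
	return 0;
}

/* Called once per IRQ on release; an unknown IRQ is reported, not dereferenced. */
int sf_irqs_remove(struct sf_irqs *s, int irq)
{
	struct irq_node **pp;

	for (pp = &s->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->irq == irq) {
			struct irq_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			s->count--;
			return 0;
		}
	}
	return -1;
}
```

Two SFs built this way can hold 10 and 15 entries respectively without
either count being known when the device is registered, which is the
property a static attribute list cannot provide.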
diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary
new file mode 100644
index 000000000000..e8752c2354bc
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-auxiliary
@@ -0,0 +1,7 @@
+What:		/sys/bus/auxiliary/devices/.../irqs/
+Date:		April, 2024
+Contact:	Shay Drory <shayd@nvidia.com>
+Description:
+		The /sys/devices/.../irqs directory contains a variable set of
+		files, with each file is named as irq number similar to PCI PF
+		or VF's irq number located in msi_irqs directory.
diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
index d3a2c40c2f12..fcd7dbf20f88 100644
--- a/drivers/base/auxiliary.c
+++ b/drivers/base/auxiliary.c
@@ -158,6 +158,94 @@
  *	};
  */
 
+#ifdef CONFIG_SYSFS
+struct auxiliary_irq_info {
+	struct device_attribute sysfs_attr;
+};
+
+static struct attribute *auxiliary_irq_attrs[] = {
+	NULL
+};
+
+static const struct attribute_group auxiliary_irqs_group = {
+	.name = "irqs",
+	.attrs = auxiliary_irq_attrs,
+};
+
+static const struct attribute_group *auxiliary_irqs_groups[] = {
+	&auxiliary_irqs_group,
+	NULL
+};
+
+/**
+ * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: The associated interrupt number.
+ *
+ * This function should be called after auxiliary device have successfully
+ * received the irq.
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
+{
+	struct device *dev = &auxdev->dev;
+	struct auxiliary_irq_info *info;
+	int ret;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	sysfs_attr_init(&info->sysfs_attr.attr);
+	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
+	if (!info->sysfs_attr.attr.name) {
+		ret = -ENOMEM;
+		goto name_err;
+	}
+
+	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
+	if (ret)
+		goto auxdev_xa_err;
+
+	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
+				      auxiliary_irqs_group.name);
+	if (ret)
+		goto sysfs_add_err;
+
+	return 0;
+
+sysfs_add_err:
+	xa_erase(&auxdev->irqs, irq);
+auxdev_xa_err:
+	kfree(info->sysfs_attr.attr.name);
+name_err:
+	kfree(info);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
+
+/**
+ * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
+ * @auxdev: auxiliary bus device to add the sysfs entry.
+ * @irq: the IRQ to remove.
+ *
+ * This function should be called to remove an IRQ sysfs entry.
+ */
+void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
+{
+	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
+	struct device *dev = &auxdev->dev;
+
+	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
+				     auxiliary_irqs_group.name);
+	xa_erase(&auxdev->irqs, irq);
+	kfree(info->sysfs_attr.attr.name);
+	kfree(info);
+}
+EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
+#endif
+
 static const struct auxiliary_device_id *auxiliary_match_id(const struct auxiliary_device_id *id,
 							    const struct auxiliary_device *auxdev)
 {
@@ -295,6 +383,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
  * __auxiliary_device_add - add an auxiliary bus device
  * @auxdev: auxiliary bus device to add to the bus
  * @modname: name of the parent device's driver module
+ * @irqs_sysfs_enable: whether to enable IRQs sysfs
  *
  * This is the third step in the three-step process to register an
  * auxiliary_device.
@@ -310,7 +399,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
  * parameter.  Only if a user requires a custom name would this version be
  * called directly.
*/ -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname) +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname, + bool irqs_sysfs_enable) { struct device *dev = &auxdev->dev; int ret; @@ -325,6 +415,10 @@ int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname) dev_err(dev, "auxiliary device dev_set_name failed: %d\n", ret); return ret; } + if (irqs_sysfs_enable) { + dev->groups = auxiliary_irqs_groups; + xa_init(&auxdev->irqs); + } ret = device_add(dev); if (ret) diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h index de21d9d24a95..760fadb26620 100644 --- a/include/linux/auxiliary_bus.h +++ b/include/linux/auxiliary_bus.h @@ -58,6 +58,7 @@ * in * @name: Match name found by the auxiliary device driver, * @id: unique identitier if multiple devices of the same name are exported, + * @irqs: irqs xarray contains irq indices which are used by the device, * * An auxiliary_device represents a part of its parent device's functionality. 
* It is given a name that, combined with the registering drivers @@ -138,6 +139,7 @@ struct auxiliary_device { struct device dev; const char *name; + struct xarray irqs; u32 id; }; @@ -209,8 +211,26 @@ static inline struct auxiliary_driver *to_auxiliary_drv(struct device_driver *dr } int auxiliary_device_init(struct auxiliary_device *auxdev); -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname); -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME) +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname, + bool irqs_sysfs_enable); +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME, false) +#define auxiliary_device_add_with_irqs(auxdev) \ + __auxiliary_device_add(auxdev, KBUILD_MODNAME, true) + +#ifdef CONFIG_SYSFS +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq); +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, + int irq); +#else /* CONFIG_SYSFS */ +static inline int +auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq) +{ + return 0; +} + +static inline void +auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) {} +#endif static inline void auxiliary_device_uninit(struct auxiliary_device *auxdev) {