Message ID | 20200530091505.56664-1-song.bao.hua@hisilicon.com (mailing list archive) |
---|---|
State | New, archived |
Series | iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs |
On 2020-05-30 10:15, Barry Song wrote:
> As tests show, the latency of dma_unmap can increase dramatically when
> it is called across NUMA nodes, especially across CPU packages, e.g.
> 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> empty command queue. The large latency caused by the remote node will
> in turn make contention on the command queue more serious, and enlarge
> the latency of DMA users within the local NUMA node.
>
> Users might intend to enforce NUMA locality based on the position of
> the SMMU. The patch provides a minor benefit by presenting this
> information to users directly, as they might want to know it without
> checking the hardware spec at all.

Hmm, given that dev_to_node() is a standard driver model thing, is there
not already some generic device property that can expose it - and if
not, should there be? Presumably if userspace cares enough to want to
know whereabouts in the system an IOMMU is, it probably also cares where
the actual endpoint devices are too.

At the very least, it doesn't seem right for it to be specific to one
single IOMMU driver.

Robin.

> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 40 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 82508730feb7..754c4d59498b 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
>  	return err;
>  }
>
> +static ssize_t numa_node_show(struct device *dev,
> +			      struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%d\n", dev_to_node(dev));
> +}
> +static DEVICE_ATTR_RO(numa_node);
> +
> +static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
> +					  int n)
> +{
> +	struct device *dev = container_of(kobj, typeof(*dev), kobj);
> +
> +	if (!IS_ENABLED(CONFIG_NUMA))
> +		return 0;
> +
> +	if (a == &dev_attr_numa_node.attr &&
> +	    dev_to_node(dev) == NUMA_NO_NODE)
> +		return 0;
> +
> +	return a->mode;
> +}
> +
> +static struct attribute *arm_smmu_dev_attrs[] = {
> +	&dev_attr_numa_node.attr,
> +	NULL
> +};
> +
> +static struct attribute_group arm_smmu_dev_attrs_group = {
> +	.attrs = arm_smmu_dev_attrs,
> +	.is_visible = arm_smmu_numa_attr_visible,
> +};
> +
> +
> +static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
> +	&arm_smmu_dev_attrs_group,
> +	NULL,
> +};
> +
>  static int arm_smmu_device_probe(struct platform_device *pdev)
>  {
>  	int irq, ret;
> @@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  		return ret;
>
>  	/* And we're up. Go go go! */
> -	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
> +	ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
>  				     "smmu3.%pa", &ioaddr);
>  	if (ret)
>  		return ret;
>
> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Tuesday, June 2, 2020 1:14 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; will@kernel.org;
> hch@lst.de; m.szyprowski@samsung.com; iommu@lists.linux-foundation.org
> Cc: Linuxarm <linuxarm@huawei.com>; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
> users in sysfs
>
> On 2020-05-30 10:15, Barry Song wrote:
> > As tests show, the latency of dma_unmap can increase dramatically when
> > it is called across NUMA nodes, especially across CPU packages, e.g.
> > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> > empty command queue. The large latency caused by the remote node will
> > in turn make contention on the command queue more serious, and enlarge
> > the latency of DMA users within the local NUMA node.
> >
> > Users might intend to enforce NUMA locality based on the position of
> > the SMMU. The patch provides a minor benefit by presenting this
> > information to users directly, as they might want to know it without
> > checking the hardware spec at all.
>
> Hmm, given that dev_to_node() is a standard driver model thing, is there
> not already some generic device property that can expose it - and if
> not, should there be? Presumably if userspace cares enough to want to
> know whereabouts in the system an IOMMU is, it probably also cares where
> the actual endpoint devices are too.
>
> At the very least, it doesn't seem right for it to be specific to one
> single IOMMU driver.

Right now, PCI devices generally get numa_node in sysfs via
drivers/pci/pci-sysfs.c:

static ssize_t numa_node_store(struct device *dev,
			       struct device_attribute *attr, const char *buf,
			       size_t count)
{
	...
	add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
	pci_alert(pdev, FW_BUG "Overriding NUMA node to %d. Contact your vendor for updates.",
		  node);

	dev->numa_node = node;
	return count;
}

static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
			      char *buf)
{
	return sprintf(buf, "%d\n", dev->numa_node);
}
static DEVICE_ATTR_RW(numa_node);

For other devices that care about NUMA information, the individual
drivers expose it themselves, for example:

drivers/dax/bus.c:        if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA))
drivers/dax/bus.c:        &dev_attr_numa_node.attr,
drivers/dma/idxd/sysfs.c: &dev_attr_numa_node.attr,
drivers/hv/vmbus_drv.c:   &dev_attr_numa_node.attr,
drivers/nvdimm/bus.c:     &dev_attr_numa_node.attr,
drivers/nvme/host/core.c: &dev_attr_numa_node.attr,

The SMMU is usually a platform device. We could actually expose
numa_node for platform devices, or globally expose numa_node for a
generic "device", if people don't oppose it.

Barry

>
> Robin.
> >
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 40 ++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 82508730feb7..754c4d59498b 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
> >  	return err;
> >  }
> >
> > +static ssize_t numa_node_show(struct device *dev,
> > +			      struct device_attribute *attr, char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", dev_to_node(dev));
> > +}
> > +static DEVICE_ATTR_RO(numa_node);
> > +
> > +static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
> > +					  int n)
> > +{
> > +	struct device *dev = container_of(kobj, typeof(*dev), kobj);
> > +
> > +	if (!IS_ENABLED(CONFIG_NUMA))
> > +		return 0;
> > +
> > +	if (a == &dev_attr_numa_node.attr &&
> > +	    dev_to_node(dev) == NUMA_NO_NODE)
> > +		return 0;
> > +
> > +	return a->mode;
> > +}
> > +
> > +static struct attribute *arm_smmu_dev_attrs[] = {
> > +	&dev_attr_numa_node.attr,
> > +	NULL
> > +};
> > +
> > +static struct attribute_group arm_smmu_dev_attrs_group = {
> > +	.attrs = arm_smmu_dev_attrs,
> > +	.is_visible = arm_smmu_numa_attr_visible,
> > +};
> > +
> > +
> > +static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
> > +	&arm_smmu_dev_attrs_group,
> > +	NULL,
> > +};
> > +
> >  static int arm_smmu_device_probe(struct platform_device *pdev)
> >  {
> >  	int irq, ret;
> > @@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> >  		return ret;
> >
> >  	/* And we're up. Go go go! */
> > -	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
> > +	ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
> >  				     "smmu3.%pa", &ioaddr);
> >  	if (ret)
> >  		return ret;
> >
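A minimal sketch of the "globally expose numa_node for a generic device" idea suggested above, assuming it were wired up as an extra attribute group in the driver core. This is illustration only, not the patch that was later posted in this thread, and the dev_numa_* names are invented for the example:

static ssize_t numa_node_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", dev_to_node(dev));
}
static DEVICE_ATTR_RO(numa_node);

/* Hide the file when NUMA is disabled or the device has no node assigned */
static umode_t dev_numa_attr_is_visible(struct kobject *kobj,
					struct attribute *a, int n)
{
	struct device *dev = kobj_to_dev(kobj);

	if (!IS_ENABLED(CONFIG_NUMA) || dev_to_node(dev) == NUMA_NO_NODE)
		return 0;

	return a->mode;
}

static struct attribute *dev_numa_attrs[] = {
	&dev_attr_numa_node.attr,
	NULL,
};

static const struct attribute_group dev_numa_attr_group = {
	.attrs = dev_numa_attrs,
	.is_visible = dev_numa_attr_is_visible,
};

The group would then still have to be added to the device's default groups (or to a particular bus's dev_groups) so that matching devices pick it up; where exactly to hook that in is the open question in this thread.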
On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:
> As tests show, the latency of dma_unmap can increase dramatically when
> it is called across NUMA nodes, especially across CPU packages, e.g.
> 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> empty command queue. The large latency caused by the remote node will
> in turn make contention on the command queue more serious, and enlarge
> the latency of DMA users within the local NUMA node.
>
> Users might intend to enforce NUMA locality based on the position of
> the SMMU. The patch provides a minor benefit by presenting this
> information to users directly, as they might want to know it without
> checking the hardware spec at all.

I don't think that's a very good reason to expose things to userspace.
I know sysfs shouldn't be treated as ABI, but the grim reality is that
once somebody relies on this stuff then we can't change it, so I'd
rather avoid exposing it unless it's absolutely necessary.

Thanks,

Will
> -----Original Message-----
> From: Will Deacon [mailto:will@kernel.org]
> Sent: Saturday, July 4, 2020 4:22 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: robin.murphy@arm.com; hch@lst.de; m.szyprowski@samsung.com;
> iommu@lists.linux-foundation.org; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
> users in sysfs
>
> On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:
> > As tests show, the latency of dma_unmap can increase dramatically when
> > it is called across NUMA nodes, especially across CPU packages, e.g.
> > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> > empty command queue. The large latency caused by the remote node will
> > in turn make contention on the command queue more serious, and enlarge
> > the latency of DMA users within the local NUMA node.
> >
> > Users might intend to enforce NUMA locality based on the position of
> > the SMMU. The patch provides a minor benefit by presenting this
> > information to users directly, as they might want to know it without
> > checking the hardware spec at all.
>
> I don't think that's a very good reason to expose things to userspace.
> I know sysfs shouldn't be treated as ABI, but the grim reality is that
> once somebody relies on this stuff then we can't change it, so I'd
> rather avoid exposing it unless it's absolutely necessary.

Will, thanks for taking a look!

I am not sure if it is absolutely necessary, but it is useful to users.
The whole story started from some users who wanted to see the hardware
topology clearly by reading sysfs nodes, just like they are able to do
for PCI devices. The intention is that users can learn the hardware
topology of various devices easily from Linux, since they may not know
all the hardware details.

For PCI devices, the kernel has already done that. And there are some
other drivers outside PCI exposing numa_node as well. It seems hard to
say it is absolutely necessary for them either, since sysfs shouldn't
be treated as ABI.

I got some input from Linux users who also wanted to know the NUMA node
for devices which are not PCI, for example, platform devices. I thought
the requirement was reasonable, so I also sent another patch to support
this kind of requirement generally; with the patch below, this SMMU
patch is not necessary any more:
https://lkml.org/lkml/2020/6/18/1257

For platform devices created by ARM ACPI/IORT and the general
acpi_create_platform_device(), in drivers/acpi/scan.c:

static void acpi_default_enumeration(struct acpi_device *device)
{
	...
	if (!device->flags.enumeration_by_parent) {
		acpi_create_platform_device(device, NULL);
		acpi_device_set_enumerated(device);
	}
}

struct platform_device *acpi_create_platform_device(struct acpi_device *adev,
						    struct property_entry *properties)
{
	...
	pdev = platform_device_register_full(&pdevinfo);
	if (IS_ERR(pdev))
		...
	else {
		set_dev_node(&pdev->dev, acpi_get_node(adev->handle));
		...
	}
	...
}

numa_node is set for this kind of device.

Anyway, I just want to explain the background: some people want to know
the hardware topology from Linux in the same simple way. And it seems a
reasonable requirement to me :-)

>
> Thanks,
>
> Will

Thanks
barry
+CC Brice.

On Sun, 5 Jul 2020 09:53:58 +0000
"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:

> > -----Original Message-----
> > From: Will Deacon [mailto:will@kernel.org]
> > Sent: Saturday, July 4, 2020 4:22 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: robin.murphy@arm.com; hch@lst.de; m.szyprowski@samsung.com;
> > iommu@lists.linux-foundation.org; linux-arm-kernel@lists.infradead.org;
> > Linuxarm <linuxarm@huawei.com>
> > Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
> > users in sysfs
> >
> > On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:
> > > As tests show, the latency of dma_unmap can increase dramatically when
> > > it is called across NUMA nodes, especially across CPU packages, e.g.
> > > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> > > empty command queue. The large latency caused by the remote node will
> > > in turn make contention on the command queue more serious, and enlarge
> > > the latency of DMA users within the local NUMA node.
> > >
> > > Users might intend to enforce NUMA locality based on the position of
> > > the SMMU. The patch provides a minor benefit by presenting this
> > > information to users directly, as they might want to know it without
> > > checking the hardware spec at all.
> >
> > I don't think that's a very good reason to expose things to userspace.
> > I know sysfs shouldn't be treated as ABI, but the grim reality is that
> > once somebody relies on this stuff then we can't change it, so I'd
> > rather avoid exposing it unless it's absolutely necessary.
>
> Will, thanks for taking a look!
>
> I am not sure if it is absolutely necessary, but it is useful to users.
> The whole story started from some users who wanted to see the hardware
> topology clearly by reading sysfs nodes, just like they are able to do
> for PCI devices. The intention is that users can learn the hardware
> topology of various devices easily from Linux, since they may not know
> all the hardware details.
>
> For PCI devices, the kernel has already done that. And there are some
> other drivers outside PCI exposing numa_node as well. It seems hard to
> say it is absolutely necessary for them either, since sysfs shouldn't
> be treated as ABI.

Brice,

Given hwloc is probably the most demanding user of topology information
currently...

How useful would this info be for hwloc and hwloc users?
Sort of feels like it might be useful in some cases.

The very brief description of what we have here is exposing the numa node
of an IOMMU. The discussion also diverted into whether it just makes sense
to expose this for all platform devices or even do it at the device level.

Jonathan

> I got some input from Linux users who also wanted to know the NUMA node
> for devices which are not PCI, for example, platform devices. I thought
> the requirement was reasonable, so I also sent another patch to support
> this kind of requirement generally; with the patch below, this SMMU
> patch is not necessary any more:
> https://lkml.org/lkml/2020/6/18/1257
>
> For platform devices created by ARM ACPI/IORT and the general
> acpi_create_platform_device(), in drivers/acpi/scan.c:
>
> static void acpi_default_enumeration(struct acpi_device *device)
> {
> 	...
> 	if (!device->flags.enumeration_by_parent) {
> 		acpi_create_platform_device(device, NULL);
> 		acpi_device_set_enumerated(device);
> 	}
> }
>
> struct platform_device *acpi_create_platform_device(struct acpi_device *adev,
> 						    struct property_entry *properties)
> {
> 	...
> 	pdev = platform_device_register_full(&pdevinfo);
> 	if (IS_ERR(pdev))
> 		...
> 	else {
> 		set_dev_node(&pdev->dev, acpi_get_node(adev->handle));
> 		...
> 	}
> 	...
> }
>
> numa_node is set for this kind of device.
>
> Anyway, I just want to explain the background: some people want to know
> the hardware topology from Linux in the same simple way. And it seems a
> reasonable requirement to me :-)
>
> >
> > Thanks,
> >
> > Will
>
> Thanks
> barry
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
On 06/07/2020 at 10:26, Jonathan Cameron wrote:
> On Sun, 5 Jul 2020 09:53:58 +0000
> "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:
>
>>> -----Original Message-----
>>> From: Will Deacon [mailto:will@kernel.org]
>>> Sent: Saturday, July 4, 2020 4:22 AM
>>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
>>> Cc: robin.murphy@arm.com; hch@lst.de; m.szyprowski@samsung.com;
>>> iommu@lists.linux-foundation.org; linux-arm-kernel@lists.infradead.org;
>>> Linuxarm <linuxarm@huawei.com>
>>> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
>>> users in sysfs
>>>
>>> On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:
>>>> As tests show, the latency of dma_unmap can increase dramatically when
>>>> it is called across NUMA nodes, especially across CPU packages, e.g.
>>>> 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
>>>> empty command queue. The large latency caused by the remote node will
>>>> in turn make contention on the command queue more serious, and enlarge
>>>> the latency of DMA users within the local NUMA node.
>>>>
>>>> Users might intend to enforce NUMA locality based on the position of
>>>> the SMMU. The patch provides a minor benefit by presenting this
>>>> information to users directly, as they might want to know it without
>>>> checking the hardware spec at all.
>>> I don't think that's a very good reason to expose things to userspace.
>>> I know sysfs shouldn't be treated as ABI, but the grim reality is that
>>> once somebody relies on this stuff then we can't change it, so I'd
>>> rather avoid exposing it unless it's absolutely necessary.
>> Will, thanks for taking a look!
>>
>> I am not sure if it is absolutely necessary, but it is useful to users.
>> The whole story started from some users who wanted to see the hardware
>> topology clearly by reading sysfs nodes, just like they are able to do
>> for PCI devices. The intention is that users can learn the hardware
>> topology of various devices easily from Linux, since they may not know
>> all the hardware details.
>>
>> For PCI devices, the kernel has already done that. And there are some
>> other drivers outside PCI exposing numa_node as well. It seems hard to
>> say it is absolutely necessary for them either, since sysfs shouldn't
>> be treated as ABI.
> Brice,
>
> Given hwloc is probably the most demanding user of topology information
> currently...
>
> How useful would this info be for hwloc and hwloc users?
> Sort of feels like it might be useful in some cases.
>
> The very brief description of what we have here is exposing the numa node
> of an IOMMU. The discussion also diverted into whether it just makes sense
> to expose this for all platform devices or even do it at the device level.

Hello

We don't have anything about IOMMUs in hwloc so far, likely because
their locality never mattered in the past? I guess we'll get some user
requests for it once more platforms show this issue and some
performance-critical applications are not happy with it.

Can you clarify what the whole machine topology looks like? Are we
talking about some PCI devices being attached to one socket but talking
to the IOMMU of the other socket?

Brice
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..754c4d59498b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -4021,6 +4021,44 @@ err_reset_pci_ops: __maybe_unused;
 	return err;
 }
 
+static ssize_t numa_node_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static umode_t arm_smmu_numa_attr_visible(struct kobject *kobj, struct attribute *a,
+					  int n)
+{
+	struct device *dev = container_of(kobj, typeof(*dev), kobj);
+
+	if (!IS_ENABLED(CONFIG_NUMA))
+		return 0;
+
+	if (a == &dev_attr_numa_node.attr &&
+	    dev_to_node(dev) == NUMA_NO_NODE)
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute *arm_smmu_dev_attrs[] = {
+	&dev_attr_numa_node.attr,
+	NULL
+};
+
+static struct attribute_group arm_smmu_dev_attrs_group = {
+	.attrs = arm_smmu_dev_attrs,
+	.is_visible = arm_smmu_numa_attr_visible,
+};
+
+
+static const struct attribute_group *arm_smmu_dev_attrs_groups[] = {
+	&arm_smmu_dev_attrs_group,
+	NULL,
+};
+
 static int arm_smmu_device_probe(struct platform_device *pdev)
 {
 	int irq, ret;
@@ -4097,7 +4135,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return ret;
 
 	/* And we're up. Go go go! */
-	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
+	ret = iommu_device_sysfs_add(&smmu->iommu, dev, arm_smmu_dev_attrs_groups,
 				     "smmu3.%pa", &ioaddr);
 	if (ret)
 		return ret;
As tests show, the latency of dma_unmap can increase dramatically when
it is called across NUMA nodes, especially across CPU packages, e.g.
300ns vs 800ns while waiting for the completion of CMD_SYNC in an
empty command queue. The large latency caused by the remote node will
in turn make contention on the command queue more serious, and enlarge
the latency of DMA users within the local NUMA node.

Users might intend to enforce NUMA locality based on the position of
the SMMU. The patch provides a minor benefit by presenting this
information to users directly, as they might want to know it without
checking the hardware spec at all.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 drivers/iommu/arm-smmu-v3.c | 40 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)
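For reference, a rough sketch of how userspace might consume the attribute once it exists, in line with the "enforce NUMA locality" motivation above. The sysfs path is hypothetical (the "smmu3.%pa" device name depends on the SMMU's base address on the platform), and the example assumes libnuma is available (build with -lnuma):

#include <numa.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical path - adjust to the smmu3.<address> name on the target system */
	const char *path = "/sys/class/iommu/smmu3.0x0000000140000000/numa_node";
	FILE *f = fopen(path, "r");
	int node = -1;

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fscanf(f, "%d", &node) != 1)
		node = -1;
	fclose(f);

	if (node < 0 || numa_available() < 0) {
		fprintf(stderr, "no usable NUMA information for this SMMU\n");
		return 1;
	}

	/* Keep this process on CPUs local to the SMMU so CMD_SYNC waits stay node-local */
	if (numa_run_on_node(node))
		perror("numa_run_on_node");

	/* ... set up and run the DMA-heavy workload here ... */
	return 0;
}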