nvdimm: Export supported alignments via sysfs

Message ID 20170427091552.17694-1-oohall@gmail.com (mailing list archive)
State New, archived

Commit Message

Oliver O'Halloran April 27, 2017, 9:15 a.m. UTC
Adds two new sysfs attributes for pfn (and dax) devices:
supported_alignments and default_alignment. These advertise to
userspace what alignments this kernel supports, and provide a nominal
default alignment to use.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
I'm not sure it makes sense to provide these for pfn devices. In the dax
case we have hard restrictions because of how fault handling works, but
I'm not convinced this makes sense for the pfn case since it's going to
be used with fs-dax.
---
 drivers/nvdimm/pfn_devs.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Dan Williams April 27, 2017, 3:59 p.m. UTC | #1
On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> Adds two new sysfs attributes for pfn (and dax) devices:
> supported_alignements and default_alignment. These advertise to
> userspace what alignments this kernel supports, and provides a nominal
> default alignment to use.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> I'm not sure it makes sense to provide these for pfn devices. In the dax
> case we have hard restrictions because of how fault handling works, but
> I'm not convinced this makes sense for the pfn case since it's going to
> be used with fs-dax.

We still want this for fs-dax so we can make sure that the namespace
is aligned to allow for opportunistic large mappings. We have pmd
support for fs-dax currently shipping, and we are looking to expand
that to pud support.

> ---
>  drivers/nvdimm/pfn_devs.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 6c033c9a2f06..5157e7d89f0b 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -260,6 +260,30 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>
> +static ssize_t supported_alignments_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       /* Fun fact: These aren't always constants! */
> +       unsigned long supported_alignments[] = {
> +               PAGE_SIZE,
> +               HPAGE_PMD_SIZE,
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +               HPAGE_PUD_SIZE,
> +#endif
> +               0,
> +       };
> +
> +       return nd_sector_size_show(0, supported_alignments, buf);
> +}
> +DEVICE_ATTR_RO(supported_alignments);
> +
> +static ssize_t default_alignment_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       return sprintf(buf, "%ld\n", HPAGE_PMD_SIZE);
> +}
> +DEVICE_ATTR_RO(default_alignment);
> +
>  static struct attribute *nd_pfn_attributes[] = {
>         &dev_attr_mode.attr,
>         &dev_attr_namespace.attr,
> @@ -267,6 +291,8 @@ static struct attribute *nd_pfn_attributes[] = {
>         &dev_attr_align.attr,
>         &dev_attr_resource.attr,
>         &dev_attr_size.attr,
> +       &dev_attr_supported_alignments.attr,
> +       &dev_attr_default_alignment.attr,
>         NULL,

So, we don't need DEVICE_ATTR_RO(default_alignment); that can be
reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default and
passing nd_pfn->align to nd_sector_size_show(). We should probably rename
nd_sector_size_show() to nd_size_select_show().

The other concern is that the current DEVICE_ATTR_RW(align) can be
made redundant by this new interface if you make it writable. I wonder
if we can avoid breaking old ndctl versions by making the current
align setting the first one in the output? Worst comes to worst, we can
live with two attributes, 'align' and 'aligns', but I'd like to see if
we can add this to the existing attribute.
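
For readers unfamiliar with the bracketed "select" format discussed here,
the following is a minimal userspace sketch of a formatter in that style.
It is not the kernel's nd_sector_size_show(); the function name
size_select_format() and its buffer-length parameter are assumptions made
for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: print a zero-terminated list of sizes into buf,
 * wrapping the entry equal to `current` in square brackets.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
static int size_select_format(unsigned long current,
			      const unsigned long *supported,
			      char *buf, size_t buflen)
{
	size_t len = 0;
	int n;

	for (size_t i = 0; supported[i] != 0; i++) {
		if (supported[i] == current)
			n = snprintf(buf + len, buflen - len, "[%lu] ",
				     supported[i]);
		else
			n = snprintf(buf + len, buflen - len, "%lu ",
				     supported[i]);
		/* snprintf reports truncation by returning >= space left */
		if (n < 0 || (size_t)n >= buflen - len)
			return -1;
		len += (size_t)n;
	}

	n = snprintf(buf + len, buflen - len, "\n");
	if (n < 0 || (size_t)n >= buflen - len)
		return -1;
	return (int)(len + (size_t)n);
}
```

Because the list is zero-terminated, passing 0 as the current value (as the
patch does) never matches an entry and so brackets nothing. Passing the
device's actual alignment instead would mark the active entry, which is the
effect of the suggestion above.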

Dan Williams April 27, 2017, 4:18 p.m. UTC | #2
On Thu, Apr 27, 2017 at 8:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> Adds two new sysfs attributes for pfn (and dax) devices:
>> supported_alignements and default_alignment. These advertise to
>> userspace what alignments this kernel supports, and provides a nominal
>> default alignment to use.
>>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>> case we have hard restrictions because of how fault handling works, but
>> I'm not convinced this makes sense for the pfn case since it's going to
>> be used with fs-dax.
>
> We still want this for fs-dax so we can make sure that the namespace
> is aligned to allow for opportunistic large mappings. We have pmd
> support for fs-dax currently shipping, and looking to expand that to
> pud support.
>
> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default and
> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
> nd_sector_size_show() to nd_size_select_show().
>
> The other concern is that the current DEVICE_ATTR_RW(align) can be
> made redundant by this new interface if you make it writable. I wonder
> if we can avoid breaking old ndctl versions by making the current
> align setting the first one in the output? Worse comes to worse we can
> live with two attributes 'align' and 'aligns', but I'd like to see if
> can add this to the existing attribute.

Ok, so we can make this backward compatible, all that is needed is to
list the current setting as the first entry in the list and make it
un-decorated. For example a size list like this with 528 selected:

    "512 520 [528] 4096 4104 4160 4224"

...would become this:

    "528 512 520 [528] 4096 4104 4160 4224"

...slightly messy, but it allows us to avoid growing redundant attributes.
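
To illustrate why the leading undecorated entry keeps older readers
working, here is a small sketch of two userspace parsers. It assumes,
without having checked ndctl's source, that a legacy reader converts only
the leading decimal number of the attribute; both function names are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Assumed legacy behaviour: parse only the leading number and ignore
 * whatever follows it in the attribute string.
 */
static unsigned long parse_leading_size(const char *attr)
{
	return strtoul(attr, NULL, 10);
}

/*
 * Newer-style reader: return the entry wrapped in square brackets,
 * or 0 if no entry is decorated.
 */
static unsigned long parse_selected_size(const char *attr)
{
	const char *open = strchr(attr, '[');

	return open ? strtoul(open + 1, NULL, 10) : 0;
}
```

Under that assumption, the legacy reader sees 528 for the second string
above but 512 for the first, which is why the current setting has to come
first and be free of brackets.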

Oliver O'Halloran April 28, 2017, 5:59 a.m. UTC | #3
On Fri, Apr 28, 2017 at 2:18 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 27, 2017 at 8:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>> Adds two new sysfs attributes for pfn (and dax) devices:
>>> supported_alignements and default_alignment. These advertise to
>>> userspace what alignments this kernel supports, and provides a nominal
>>> default alignment to use.
>>>
>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>> ---
>>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>>> case we have hard restrictions because of how fault handling works, but
>>> I'm not convinced this makes sense for the pfn case since it's going to
>>> be used with fs-dax.
>>
>> We still want this for fs-dax so we can make sure that the namespace
>> is aligned to allow for opportunistic large mappings. We have pmd
>> support for fs-dax currently shipping, and looking to expand that to
>> pud support.
>>
>> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
>> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default and
>> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
>> nd_sector_size_show() to nd_size_select_show().
>>
>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>> made redundant by this new interface if you make it writable. I wonder
>> if we can avoid breaking old ndctl versions by making the current
>> align setting the first one in the output? Worse comes to worse we can
>> live with two attributes 'align' and 'aligns', but I'd like to see if
>> can add this to the existing attribute.
>
> Ok, so we can make this backward compatible, all that is needed is to
> list the current setting as the first entry in the list and make it
> un-decorated. For example a size list like this with 528 selected:
>
>     "512 520 [528] 4096 4104 4160 4224"
>
> ...would become this:
>
>     "528 512 520 [528] 4096 4104 4160 4224"
>
> ...slightly messy, but it allows us to avoid growing redundant attributes.

This is pretty gross, are you sure you want to do this?

Oliver O'Halloran April 28, 2017, 7:31 a.m. UTC | #4
On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> Adds two new sysfs attributes for pfn (and dax) devices:
>> supported_alignements and default_alignment. These advertise to
>> userspace what alignments this kernel supports, and provides a nominal
>> default alignment to use.
>>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>> case we have hard restrictions because of how fault handling works, but
>> I'm not convinced this makes sense for the pfn case since it's going to
>> be used with fs-dax.

> We still want this for fs-dax so we can make sure that the namespace
> is aligned to allow for opportunistic large mappings. We have pmd
> support for fs-dax currently shipping, and looking to expand that to
> pud support.

Sure, but whether we can use a PUD for userspace mappings mostly
depends on the allocation decisions of the filesystem rather than the
alignment of the namespace. The reservations for the PFN superblock,
altmap and dax labels mean the namespace is always going to be
unaligned so forcing a PUD alignment will result in a lot of wasted
space for dubious benefits. I suppose there's no reason not to provide
the functionality, but I don't see it buying us much.
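
The wasted-space argument can be made concrete with a little arithmetic.
The sketch below is only an illustration: the helper names, the base
address, and the reservation size used in the note afterwards are made-up
numbers, not the real sizes of the pfn superblock, altmap, or labels.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes of padding needed so that the data area, which starts after
 * `reserved` bytes of metadata at `base`, begins on an `align` boundary.
 */
static uint64_t padding_for_alignment(uint64_t base, uint64_t reserved,
				      uint64_t align)
{
	uint64_t data_start = base + reserved;

	return align_up(data_start, align) - data_start;
}
```

With a hypothetical namespace at 4 GiB and 2 MiB of reserved metadata, a
2 MiB (PMD-sized) alignment needs no padding, while a 1 GiB (PUD-sized)
alignment would skip 1 GiB minus 2 MiB before the data area could start.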

> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.

Hmm true, if we do this then we can use the alignment of the seed as
the default rather than having a separate attribute.

> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
> nd_sector_size_show() to nd_size_select_show().

I agree. I figured another respin would be required so I kept the
changes to a minimum.

> The other concern is that the current DEVICE_ATTR_RW(align) can be
> made redundant by this new interface if you make it writable. I wonder
> if we can avoid breaking old ndctl versions by making the current
> align setting the first one in the output? Worse comes to worse we can
> live with two attributes 'align' and 'aligns', but I'd like to see if
> can add this to the existing attribute.

I'd rather have a small amount of redundancy and keep the
attribute consistent with the btt sector size attribute. We could
always remove align some time down the track since I imagine ndctl is
the only thing that consumes that part of the interface and ndctl
already handles align being missing.

Oliver

Dan Williams May 2, 2017, 9:57 p.m. UTC | #5
On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>> Adds two new sysfs attributes for pfn (and dax) devices:
>>> supported_alignements and default_alignment. These advertise to
>>> userspace what alignments this kernel supports, and provides a nominal
>>> default alignment to use.
>>>
>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>> ---
>>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>>> case we have hard restrictions because of how fault handling works, but
>>> I'm not convinced this makes sense for the pfn case since it's going to
>>> be used with fs-dax.
>
>> We still want this for fs-dax so we can make sure that the namespace
>> is aligned to allow for opportunistic large mappings. We have pmd
>> support for fs-dax currently shipping, and looking to expand that to
>> pud support.
>
> Sure, but whether we can use a PUD for userspace mappings mostly
> depends on the allocation decisions of the filesystem rather than the
> alignment of the namespace. The reservations for the PFN superblock,
> altmap and dax labels mean the namespace is always going to be
> unaligned so forcing a PUD alignment will result in a lot of wasted
> space for dubious benefits. I suppose there's no reason not to provide
> the functionality, but I don't see it buying us much.
>
>> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
>> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.
>
> Hmm true, if we do this then we can use the alignment of the seed as
> the default rather than having a separate attribute.
>
>> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
>> nd_sector_size_show() to nd_size_select_show().
>
> I agree. I figured another respin would be required so I kept the
> changes to a minimum.
>
>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>> made redundant by this new interface if you make it writable. I wonder
>> if we can avoid breaking old ndctl versions by making the current
>> align setting the first one in the output? Worse comes to worse we can
>> live with two attributes 'align' and 'aligns', but I'd like to see if
>> can add this to the existing attribute.
>
> I'd rather have a small amount of redundancy and keep the the
> attribute consistent with the the btt sector size attribute.

I'd rather not, that's expanding the kernel-user ABI for only vanity
reasons as far as I can see.

> We could
> always remove align some time down the track since I imagine ndctl is
> the only thing that consumes that part of the interface and ndctl
> already handles align being missing.

No, that breaks old ndctl binaries that depend on the align attribute
to be there if the kernel supports device-dax.

Oliver O'Halloran May 3, 2017, 3:25 a.m. UTC | #6
On Wed, May 3, 2017 at 7:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>>> Adds two new sysfs attributes for pfn (and dax) devices:
>>>> supported_alignements and default_alignment. These advertise to
>>>> userspace what alignments this kernel supports, and provides a nominal
>>>> default alignment to use.
>>>>
>>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>>> ---
>>>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>>>> case we have hard restrictions because of how fault handling works, but
>>>> I'm not convinced this makes sense for the pfn case since it's going to
>>>> be used with fs-dax.
>>
>>> We still want this for fs-dax so we can make sure that the namespace
>>> is aligned to allow for opportunistic large mappings. We have pmd
>>> support for fs-dax currently shipping, and looking to expand that to
>>> pud support.
>>
>> Sure, but whether we can use a PUD for userspace mappings mostly
>> depends on the allocation decisions of the filesystem rather than the
>> alignment of the namespace. The reservations for the PFN superblock,
>> altmap and dax labels mean the namespace is always going to be
>> unaligned so forcing a PUD alignment will result in a lot of wasted
>> space for dubious benefits. I suppose there's no reason not to provide
>> the functionality, but I don't see it buying us much.
>>
>>> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
>>> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.
>>
>> Hmm true, if we do this then we can use the alignment of the seed as
>> the default rather than having a separate attribute.
>>
>>> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
>>> nd_sector_size_show() to nd_size_select_show().
>>
>> I agree. I figured another respin would be required so I kept the
>> changes to a minimum.
>>
>>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>>> made redundant by this new interface if you make it writable. I wonder
>>> if we can avoid breaking old ndctl versions by making the current
>>> align setting the first one in the output? Worse comes to worse we can
>>> live with two attributes 'align' and 'aligns', but I'd like to see if
>>> can add this to the existing attribute.
>>
>> I'd rather have a small amount of redundancy and keep the the
>> attribute consistent with the the btt sector size attribute.
>
> I'd rather not, that's expanding the kernel-user ABI for only vanity
> reasons as far as I can see.

It's an extension of the user-kernel ABI in any case. This is just the
most byzantine way to do it.

>> We could
>> always remove align some time down the track since I imagine ndctl is
>> the only thing that consumes that part of the interface and ndctl
>> already handles align being missing.
>
> No, that breaks old ndctl binaries that depend on the align attribute
> to be there if the kernel supports device-dax.

Fair enough.

Dan Williams May 3, 2017, 4:17 a.m. UTC | #7
On Tue, May 2, 2017 at 8:25 PM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Wed, May 3, 2017 at 7:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>> On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>>>> Adds two new sysfs attributes for pfn (and dax) devices:
>>>>> supported_alignements and default_alignment. These advertise to
>>>>> userspace what alignments this kernel supports, and provides a nominal
>>>>> default alignment to use.
>>>>>
>>>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>>>> ---
>>>>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>>>>> case we have hard restrictions because of how fault handling works, but
>>>>> I'm not convinced this makes sense for the pfn case since it's going to
>>>>> be used with fs-dax.
>>>
>>>> We still want this for fs-dax so we can make sure that the namespace
>>>> is aligned to allow for opportunistic large mappings. We have pmd
>>>> support for fs-dax currently shipping, and looking to expand that to
>>>> pud support.
>>>
>>> Sure, but whether we can use a PUD for userspace mappings mostly
>>> depends on the allocation decisions of the filesystem rather than the
>>> alignment of the namespace. The reservations for the PFN superblock,
>>> altmap and dax labels mean the namespace is always going to be
>>> unaligned so forcing a PUD alignment will result in a lot of wasted
>>> space for dubious benefits. I suppose there's no reason not to provide
>>> the functionality, but I don't see it buying us much.
>>>
>>>> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
>>>> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.
>>>
>>> Hmm true, if we do this then we can use the alignment of the seed as
>>> the default rather than having a separate attribute.
>>>
>>>> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
>>>> nd_sector_size_show() to nd_size_select_show().
>>>
>>> I agree. I figured another respin would be required so I kept the
>>> changes to a minimum.
>>>
>>>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>>>> made redundant by this new interface if you make it writable. I wonder
>>>> if we can avoid breaking old ndctl versions by making the current
>>>> align setting the first one in the output? Worse comes to worse we can
>>>> live with two attributes 'align' and 'aligns', but I'd like to see if
>>>> can add this to the existing attribute.
>>>
>>> I'd rather have a small amount of redundancy and keep the the
>>> attribute consistent with the the btt sector size attribute.
>>
>> I'd rather not, that's expanding the kernel-user ABI for only vanity
>> reasons as far as I can see.
>
> It's an extension of the user-kernel ABI in any case. This is just the
> most byzantine way to do it.
>
>>> We could
>>> always remove align some time down the track since I imagine ndctl is
>>> the only thing that consumes that part of the interface and ndctl
>>> already handles align being missing.
>>
>> No, that breaks old ndctl binaries that depend on the align attribute
>> to be there if the kernel supports device-dax.
>
> Fair enough.

All that said, there's nothing stopping us from making 'align' its
own mechanism, where the first entry in the list is the current
setting, in contrast to btt, which decorates the current sector-size
setting with square brackets.

Oliver O'Halloran May 3, 2017, 7:08 a.m. UTC | #8
On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, May 2, 2017 at 8:25 PM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> On Wed, May 3, 2017 at 7:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>>> On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>>> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>>>>>> Adds two new sysfs attributes for pfn (and dax) devices:
>>>>>> supported_alignements and default_alignment. These advertise to
>>>>>> userspace what alignments this kernel supports, and provides a nominal
>>>>>> default alignment to use.
>>>>>>
>>>>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>>>>> ---
>>>>>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>>>>>> case we have hard restrictions because of how fault handling works, but
>>>>>> I'm not convinced this makes sense for the pfn case since it's going to
>>>>>> be used with fs-dax.
>>>>
>>>>> We still want this for fs-dax so we can make sure that the namespace
>>>>> is aligned to allow for opportunistic large mappings. We have pmd
>>>>> support for fs-dax currently shipping, and looking to expand that to
>>>>> pud support.
>>>>
>>>> Sure, but whether we can use a PUD for userspace mappings mostly
>>>> depends on the allocation decisions of the filesystem rather than the
>>>> alignment of the namespace. The reservations for the PFN superblock,
>>>> altmap and dax labels mean the namespace is always going to be
>>>> unaligned so forcing a PUD alignment will result in a lot of wasted
>>>> space for dubious benefits. I suppose there's no reason not to provide
>>>> the functionality, but I don't see it buying us much.
>>>>
>>>>>> ---
>>>>>>  drivers/nvdimm/pfn_devs.c | 26 ++++++++++++++++++++++++++
>>>>>>  1 file changed, 26 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>>>>>> index 6c033c9a2f06..5157e7d89f0b 100644
>>>>>> --- a/drivers/nvdimm/pfn_devs.c
>>>>>> +++ b/drivers/nvdimm/pfn_devs.c
>>>>>> @@ -260,6 +260,30 @@ static ssize_t size_show(struct device *dev,
>>>>>>  }
>>>>>>  static DEVICE_ATTR_RO(size);
>>>>>>
>>>>>> +static ssize_t supported_alignments_show(struct device *dev,
>>>>>> +               struct device_attribute *attr, char *buf)
>>>>>> +{
>>>>>> +       /* Fun fact: These aren't always constants! */
>>>>>> +       unsigned long supported_alignments[] = {
>>>>>> +               PAGE_SIZE,
>>>>>> +               HPAGE_PMD_SIZE,
>>>>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>>>> +               HPAGE_PUD_SIZE,
>>>>>> +#endif
>>>>>> +               0,
>>>>>> +       };
>>>>>> +
>>>>>> +       return nd_sector_size_show(0, supported_alignments, buf);
>>>>>> +}
>>>>>> +DEVICE_ATTR_RO(supported_alignments);
>>>>>> +
>>>>>> +static ssize_t default_alignment_show(struct device *dev,
>>>>>> +               struct device_attribute *attr, char *buf)
>>>>>> +{
>>>>>> +       return sprintf(buf, "%ld\n", HPAGE_PMD_SIZE);
>>>>>> +}
>>>>>> +DEVICE_ATTR_RO(default_alignment);
>>>>>> +
>>>>>>  static struct attribute *nd_pfn_attributes[] = {
>>>>>>         &dev_attr_mode.attr,
>>>>>>         &dev_attr_namespace.attr,
>>>>>> @@ -267,6 +291,8 @@ static struct attribute *nd_pfn_attributes[] = {
>>>>>>         &dev_attr_align.attr,
>>>>>>         &dev_attr_resource.attr,
>>>>>>         &dev_attr_size.attr,
>>>>>> +       &dev_attr_supported_alignments.attr,
>>>>>> +       &dev_attr_default_alignment.attr,
>>>>>>         NULL,
>>>>>
>>>>> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
>>>>> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.
>>>>
>>>> Hmm true, if we do this then we can use the alignment of the seed as
>>>> the default rather than having a separate attribute.
>>>>
>>>>> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
>>>>> nd_sector_size_show() to nd_size_select_show().
>>>>
>>>> I agree. I figured another respin would be required so I kept the
>>>> changes to a minimum.
>>>>
>>>>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>>>>> made redundant by this new interface if you make it writable. I wonder
>>>>> if we can avoid breaking old ndctl versions by making the current
>>>>> align setting the first one in the output? Worst comes to worst we can
>>>>> live with two attributes 'align' and 'aligns', but I'd like to see if
>>>>> we can add this to the existing attribute.
>>>>
>>>> I'd rather have a small amount of redundancy and keep the
>>>> attribute consistent with the btt sector size attribute.
>>>
>>> I'd rather not, that's expanding the kernel-user ABI for only vanity
>>> reasons as far as I can see.
>>
>> It's an extension of the user-kernel ABI in any case. This is just the
>> most byzantine way to do it.
>>
>>>> We could
>>>> always remove align some time down the track since I imagine ndctl is
>>>> the only thing that consumes that part of the interface and ndctl
>>>> already handles align being missing.
>>>
>>> No, that breaks old ndctl binaries that depend on the align attribute
>>> to be there if the kernel supports device-dax.
>>
>> Fair enough.
>
> All that said, there's nothing stopping us from making 'align' its
> own mechanism. Where the first entry in the list is the current
> setting, in contrast to btt that decorates the current sector-size
> setting with square brackets.

I'd be okay with this provided we force the alignment to one of the
supported values. Currently the only validation done by the kernel is:

        if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
                return -EINVAL;

So you can set an unsupported value by poking at sysfs directly. This
behaviour is useful for testing since you can use it to force an
alignment failure in the DAX fault handler. I'm not overly concerned
if it goes, but it's something to keep in mind. I still think it would
be cleaner if we just added a separate attribute.

Oliver
Dan Williams May 3, 2017, 3:38 p.m. UTC | #9
On Wed, May 3, 2017 at 12:08 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
[..]
>>> Fair enough.
>>
>> All that said, there's nothing stopping us from making 'align' its
>> own mechanism. Where the first entry in the list is the current
>> setting, in contrast to btt that decorates the current sector-size
>> setting with square brackets.
>
> I'd be okay with this provided we force the alignment to one of the
> supported values. Currently the only validation done by the kernel is:
>
>         if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
>                 return -EINVAL;

Yes, we'd need to validate the input against the supported values.
There are no known binaries in the wild that I know of that depend on
this looser definition, so we should be ok to change it.
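
As a rough illustration of the difference between the two checks, here is
a userspace sketch, not the kernel implementation. All EX_* constants and
ex_* helper names are hypothetical, and the sizes are example values only;
in the kernel the real values come from PAGE_SIZE, HPAGE_PMD_SIZE and
HPAGE_PUD_SIZE and depend on the architecture and configuration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Example values only (assumed 4K base pages, 2M PMD, 1G PUD). */
#define EX_PAGE_SIZE	(1UL << 12)
#define EX_PMD_SIZE	(1UL << 21)
#define EX_PUD_SIZE	(1UL << 30)
#define EX_SZ_1G	(1UL << 30)

/* Zero-terminated list, same shape as the array in the patch. */
static const unsigned long ex_supported[] = {
	EX_PAGE_SIZE,
	EX_PMD_SIZE,
	EX_PUD_SIZE,
	0,
};

static bool ex_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Models the looser check quoted above: any power of 2 in range passes. */
static int ex_validate_loose(unsigned long val)
{
	if (!ex_is_power_of_2(val) || val < EX_PAGE_SIZE || val > EX_SZ_1G)
		return -EINVAL;
	return 0;
}

/* Stricter variant: the value must appear in the supported list. */
static int ex_validate_strict(unsigned long val,
		const unsigned long *supported)
{
	size_t i;

	for (i = 0; supported[i]; i++)
		if (supported[i] == val)
			return 0;
	return -EINVAL;
}
```

Under these example values, 8192 passes the loose check but is rejected by
the strict one, which is the class of input that currently can be written
through sysfs and used to provoke an alignment failure.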

> So you can set an unsupported value by poking at sysfs directly. This
> behaviour is useful for testing since you can use it to force an
> alignment failure in the DAX fault handler.

I'd rather move that test support to something like the nfit_test
infrastructure.

> I'm not overly concerned
> if it goes, but it's something to keep in mind. I still think it would
> be cleaner if we just added a separate attribute.

I'm still having a hard time seeing how redundant sysfs attributes are "clean".
Dan Williams May 12, 2017, 11:01 p.m. UTC | #10
On Wed, May 3, 2017 at 8:38 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Wed, May 3, 2017 at 12:08 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> [..]
>>>> Fair enough.
>>>
>>> All that said, there's nothing stopping us from making 'align' its
>>> own mechanism. Where the first entry in the list is the current
>>> setting, in contrast to btt that decorates the current sector-size
>>> setting with square brackets.
>>
>> I'd be okay with this provided we force the alignment to one of the
>> supported values. Currently the only validation done by the kernel is:
>>
>>         if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
>>                 return -EINVAL;
>
> Yes, we'd need to validate the input against the supported values.
> There are no known binaries in the wild that I know of that depend on
> this looser definition, so we should be ok to change it.
>
>> So you can set an unsupported value by poking at sysfs directly. This
>> behaviour is useful for testing since you can use it to force an
>> alignment failure in the DAX fault handler.
>
> I'd rather move that test support to something like the nfit_test
> infrastructure.
>
>> I'm not overly concerned
>> if it goes, but it's something to keep in mind. I still think it would
>> be cleaner if we just added a separate attribute.
>
> I'm still having a hard time seeing how redundant sysfs attributes are "clean".

It turns out the NVML project is also parsing the 'align' attribute
outside of ndctl. So, now I'm with you, I think it would be better to
move the 'possible alignments' to its own read-only attribute
('aligns'?) and leave 'align' as the interface to read/write the
current setting.

Patch

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 6c033c9a2f06..5157e7d89f0b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -260,6 +260,30 @@  static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t supported_alignments_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	/* Fun fact: These aren't always constants! */
+	unsigned long supported_alignments[] = {
+		PAGE_SIZE,
+		HPAGE_PMD_SIZE,
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+		HPAGE_PUD_SIZE,
+#endif
+		0,
+	};
+
+	return nd_sector_size_show(0, supported_alignments, buf);
+}
+DEVICE_ATTR_RO(supported_alignments);
+
+static ssize_t default_alignment_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%ld\n", HPAGE_PMD_SIZE);
+}
+DEVICE_ATTR_RO(default_alignment);
+
 static struct attribute *nd_pfn_attributes[] = {
 	&dev_attr_mode.attr,
 	&dev_attr_namespace.attr,
@@ -267,6 +291,8 @@  static struct attribute *nd_pfn_attributes[] = {
 	&dev_attr_align.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
+	&dev_attr_supported_alignments.attr,
+	&dev_attr_default_alignment.attr,
 	NULL,
 };