Message ID | 20170427091552.17694-1-oohall@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> Adds two new sysfs attributes for pfn (and dax) devices:
> supported_alignments and default_alignment. These advertise to
> userspace what alignments this kernel supports, and provide a nominal
> default alignment to use.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> I'm not sure it makes sense to provide these for pfn devices. In the dax
> case we have hard restrictions because of how fault handling works, but
> I'm not convinced this makes sense for the pfn case since it's going to
> be used with fs-dax.

We still want this for fs-dax so we can make sure that the namespace
is aligned to allow for opportunistic large mappings. We have pmd
support for fs-dax currently shipping, and are looking to expand that
to pud support.

> ---
>  drivers/nvdimm/pfn_devs.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 6c033c9a2f06..5157e7d89f0b 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -260,6 +260,30 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>
> +static ssize_t supported_alignments_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	/* Fun fact: These aren't always constants! */
> +	unsigned long supported_alignments[] = {
> +		PAGE_SIZE,
> +		HPAGE_PMD_SIZE,
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +		HPAGE_PUD_SIZE,
> +#endif
> +		0,
> +	};
> +
> +	return nd_sector_size_show(0, supported_alignments, buf);
> +}
> +DEVICE_ATTR_RO(supported_alignments);
> +
> +static ssize_t default_alignment_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%ld\n", HPAGE_PMD_SIZE);
> +}
> +DEVICE_ATTR_RO(default_alignment);
> +
>  static struct attribute *nd_pfn_attributes[] = {
>  	&dev_attr_mode.attr,
>  	&dev_attr_namespace.attr,
> @@ -267,6 +291,8 @@ static struct attribute *nd_pfn_attributes[] = {
>  	&dev_attr_align.attr,
>  	&dev_attr_resource.attr,
>  	&dev_attr_size.attr,
> +	&dev_attr_supported_alignments.attr,
> +	&dev_attr_default_alignment.attr,
>  	NULL,

So, we don't need DEVICE_ATTR_RO(default_alignment); the default can be
reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default and
passing nd_pfn->align to nd_sector_size_show(). We should probably
rename nd_sector_size_show() to nd_size_select_show().

The other concern is that the current DEVICE_ATTR_RW(align) can be
made redundant by this new interface if you make it writable. I wonder
if we can avoid breaking old ndctl versions by making the current
align setting the first one in the output? Worst comes to worst, we can
live with two attributes, 'align' and 'aligns', but I'd like to see if
we can add this to the existing attribute.

On Thu, Apr 27, 2017 at 8:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> The other concern is that the current DEVICE_ATTR_RW(align) can be
> made redundant by this new interface if you make it writable. I wonder
> if we can avoid breaking old ndctl versions by making the current
> align setting the first one in the output? Worst comes to worst, we can
> live with two attributes, 'align' and 'aligns', but I'd like to see if
> we can add this to the existing attribute.

Ok, so we can make this backward compatible: all that is needed is to
list the current setting as the first entry in the list and leave it
un-decorated. For example, a size list like this with 528 selected:

    "512 520 [528] 4096 4104 4160 4224"

...would become this:

    "528 512 520 [528] 4096 4104 4160 4224"

...slightly messy, but it allows us to avoid growing redundant
attributes.

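
To make the proposed output format concrete, here is a minimal userspace sketch. It is not the kernel's nd_sector_size_show(); format_size_list() and its signature are hypothetical names chosen for illustration, and the only thing taken from the thread is the output layout (current setting first and un-decorated, followed by the bracket-decorated list).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the proposed attribute output.
 * 'supported' is a zero-terminated array, as in the patch under review.
 * Returns the number of characters written, or -1 if 'buf' is too small.
 */
static int format_size_list(char *buf, size_t len, unsigned long cur,
                            const unsigned long *supported)
{
        size_t off;
        int n;

        /* Leading, un-decorated entry: the current setting. */
        n = snprintf(buf, len, "%lu", cur);
        if (n < 0 || (size_t)n >= len)
                return -1;
        off = (size_t)n;

        /* The full list follows, with the current setting in brackets. */
        for (size_t i = 0; supported[i]; i++) {
                if (supported[i] == cur)
                        n = snprintf(buf + off, len - off, " [%lu]",
                                     supported[i]);
                else
                        n = snprintf(buf + off, len - off, " %lu",
                                     supported[i]);
                if (n < 0 || (size_t)n >= len - off)
                        return -1;
                off += (size_t)n;
        }
        return (int)off;
}
```

Called with 528 as the current value and the list 512, 520, 528, 4096, 4104, 4160, 4224, this fills the buffer with the second string from the example above.
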
On Fri, Apr 28, 2017 at 2:18 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Ok, so we can make this backward compatible, all that is needed is to
> list the current setting as the first entry in the list and make it
> un-decorated. For example a size list like this with 528 selected:
>
> "512 520 [528] 4096 4104 4160 4224"
>
> ...would become this:
>
> "528 512 520 [528] 4096 4104 4160 4224"
>
> ...slightly messy, but it allows us to avoid growing redundant attributes.

This is pretty gross, are you sure you want to do this?

On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 27, 2017 at 2:15 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> I'm not sure it makes sense to provide these for pfn devices. In the dax
>> case we have hard restrictions because of how fault handling works, but
>> I'm not convinced this makes sense for the pfn case since it's going to
>> be used with fs-dax.
>
> We still want this for fs-dax so we can make sure that the namespace
> is aligned to allow for opportunistic large mappings. We have pmd
> support for fs-dax currently shipping, and looking to expand that to
> pud support.

Sure, but whether we can use a PUD for userspace mappings mostly
depends on the allocation decisions of the filesystem rather than the
alignment of the namespace. The reservations for the PFN superblock,
altmap and dax labels mean the namespace is always going to be
unaligned, so forcing a PUD alignment will result in a lot of wasted
space for dubious benefits. I suppose there's no reason not to provide
the functionality, but I don't see it buying us much.

> So, we don't need DEVICE_ATTR_RO(default_alignment), that can be
> reflected by setting nd_pfn->align to HPAGE_PMD_SIZE by default.

Hmm, true. If we do this then we can use the alignment of the seed as
the default rather than having a separate attribute.

> passing nd_pfn->align to nd_sector_size_show(). Should probably rename
> nd_sector_size_show() to nd_size_select_show().

I agree. I figured another respin would be required so I kept the
changes to a minimum.

> The other concern is that the current DEVICE_ATTR_RW(align) can be
> made redundant by this new interface if you make it writable. I wonder
> if we can avoid breaking old ndctl versions by making the current
> align setting the first one in the output? Worse comes to worse we can
> live with two attributes 'align' and 'aligns', but I'd like to see if
> can add this to the existing attribute.

I'd rather have a small amount of redundancy and keep the attribute
consistent with the btt sector size attribute. We could always remove
align some time down the track, since I imagine ndctl is the only thing
that consumes that part of the interface and ndctl already handles
align being missing.

Oliver

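
The wasted-space concern above can be illustrated with a little arithmetic. The sketch below is a userspace model only: align_up() and padding_for() are hypothetical helpers, and the base address and metadata size used in the usage note are invented numbers, not values taken from any real namespace.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
        return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes skipped between the end of the reserved metadata and the first
 * data byte that satisfies the requested alignment.
 */
static uint64_t padding_for(uint64_t base, uint64_t metadata, uint64_t align)
{
        uint64_t data_start = base + metadata;

        return align_up(data_start, align) - data_start;
}
```

For example, assuming a namespace based at 4 GiB with 16 MiB of reserved metadata, a 2 MiB alignment needs no padding, while a 1 GiB alignment skips 1008 MiB before the first usable byte.
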
On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Fri, Apr 28, 2017 at 1:59 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> The other concern is that the current DEVICE_ATTR_RW(align) can be
>> made redundant by this new interface if you make it writable. I wonder
>> if we can avoid breaking old ndctl versions by making the current
>> align setting the first one in the output? Worse comes to worse we can
>> live with two attributes 'align' and 'aligns', but I'd like to see if
>> can add this to the existing attribute.
>
> I'd rather have a small amount of redundancy and keep the
> attribute consistent with the btt sector size attribute.

I'd rather not, that's expanding the kernel-user ABI for only vanity
reasons as far as I can see.

> We could
> always remove align some time down the track since I imagine ndctl is
> the only thing that consumes that part of the interface and ndctl
> already handles align being missing.

No, that breaks old ndctl binaries that depend on the align attribute
to be there if the kernel supports device-dax.

On Wed, May 3, 2017 at 7:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Apr 28, 2017 at 12:31 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> I'd rather have a small amount of redundancy and keep the
>> attribute consistent with the btt sector size attribute.
>
> I'd rather not, that's expanding the kernel-user ABI for only vanity
> reasons as far as I can see.

It's an extension of the user-kernel ABI in any case. This is just the
most byzantine way to do it.

>> We could
>> always remove align some time down the track since I imagine ndctl is
>> the only thing that consumes that part of the interface and ndctl
>> already handles align being missing.
>
> No, that breaks old ndctl binaries that depend on the align attribute
> to be there if the kernel supports device-dax.

Fair enough.

On Tue, May 2, 2017 at 8:25 PM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Wed, May 3, 2017 at 7:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> I'd rather not, that's expanding the kernel-user ABI for only vanity
>> reasons as far as I can see.
>
> It's an extension of the user-kernel ABI in any case. This is just the
> most byzantine way to do it.
>
>> No, that breaks old ndctl binaries that depend on the align attribute
>> to be there if the kernel supports device-dax.
>
> Fair enough.

All that said, there's nothing stopping us from making 'align' its
own mechanism, where the first entry in the list is the current
setting, in contrast to btt, which decorates the current sector-size
setting with square brackets.

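
The difference between the two conventions can be sketched from the reader's side. This is a userspace illustration under a stated assumption: that an older consumer reads the attribute with a plain leading-integer conversion such as strtoul(). Whether any particular ndctl release parses 'align' that way is not confirmed in this thread, and both function names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * First-entry convention: a leading-integer parse stops at the first
 * space, so it yields the current setting and ignores the rest.
 */
static unsigned long current_from_first_entry(const char *buf)
{
        return strtoul(buf, NULL, 0);
}

/*
 * btt-style convention: the current setting is the bracketed entry.
 * Returns 0 when no bracketed entry is present.
 */
static unsigned long current_from_brackets(const char *buf)
{
        const char *open = strchr(buf, '[');

        return open ? strtoul(open + 1, NULL, 0) : 0;
}
```

Under that assumption, a reader that only knows a single-value 'align' keeps getting the current setting from a first-entry list, whereas the same reader applied to a bracket-only list would pick up whichever size happens to be listed first.
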
On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> All that said, there's nothing stopping us from making 'align' it's
> own mechanism. Where the first entry in the list is the current
> setting, in contrast to btt that decorates the current sector-size
> setting with square brackets.

I'd be okay with this provided we force the alignment to one of the
supported values. Currently the only validation done by the kernel is:

	if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
		return -EINVAL;

So you can set an unsupported value by poking at sysfs directly. This
behaviour is useful for testing since you can use it to force an
alignment failure in the DAX fault handler. I'm not overly concerned if
it goes, but it's something to keep in mind.

I still think it would be cleaner if we just added a separate
attribute.

Oliver
On Wed, May 3, 2017 at 12:08 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
[..]
>>> Fair enough.
>>
>> All that said, there's nothing stopping us from making 'align' it's
>> own mechanism. Where the first entry in the list is the current
>> setting, in contrast to btt that decorates the current sector-size
>> setting with square brackets.
>
> I'd be okay with this provided we force the alignment to one of the
> supported values. Currently the only validation done by the kernel is:
>
> if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
> 	return -EINVAL;

Yes, we'd need to validate the input against the supported values.
There are no known binaries in the wild that I know of that depend on
this looser definition, so we should be ok to change it.

> So you can set an unsupported value by poking at sysfs directly. This
> behaviour is useful for testing since you can use it to force an
> alignment failure in the DAX fault handler.

I'd rather move that test support to something like the nfit_test
infrastructure.

> I'm not overly concerned
> if it goes, but it's something to keep in mind. I still think it would
> be cleaner if we just added a separate attribute.

I'm still having a hard time seeing how redundant sysfs attributes is "clean".
On Wed, May 3, 2017 at 8:38 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Wed, May 3, 2017 at 12:08 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> On Wed, May 3, 2017 at 2:17 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> [..]
>>>> Fair enough.
>>>
>>> All that said, there's nothing stopping us from making 'align' it's
>>> own mechanism. Where the first entry in the list is the current
>>> setting, in contrast to btt that decorates the current sector-size
>>> setting with square brackets.
>>
>> I'd be okay with this provided we force the alignment to one of the
>> supported values. Currently the only validation done by the kernel is:
>>
>> if (!is_power_of_2(val) || val < PAGE_SIZE || val > SZ_1G)
>> 	return -EINVAL;
>
> Yes, we'd need to validate the input against the supported values.
> There are no known binaries in the wild that I know of that depend on
> this looser definition, so we should be ok to change it.
>
>> So you can set an unsupported value by poking at sysfs directly. This
>> behaviour is useful for testing since you can use it to force an
>> alignment failure in the DAX fault handler.
>
> I'd rather move that test support to something like the nfit_test
> infrastructure.
>
>> I'm not overly concerned
>> if it goes, but it's something to keep in mind. I still think it would
>> be cleaner if we just added a separate attribute.
>
> I'm still having a hard time seeing how redundant sysfs attributes is "clean".

It turns out the NVML project is also parsing the 'align' attribute
outside of ndctl. So, now I'm with you, I think it would be better to
move the 'possible alignments' to its own read-only attribute
('aligns'?) and leave 'align' as the interface to read/write the
current setting.
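For readers following the validation point in the thread, here is a minimal userspace sketch of the difference between the loose check quoted above (any power of two between PAGE_SIZE and 1G) and a check against a zero-terminated list of supported alignments. It is not kernel code. The SKETCH_* constants, the sketch_supported array, and both function names are hypothetical, and the sizes assume a typical x86-64 configuration (4K pages, 2M PMD, 1G PUD).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SIZE, HPAGE_PMD_SIZE and HPAGE_PUD_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PMD_SIZE  (2UL << 20)
#define SKETCH_PUD_SIZE  (1UL << 30)

/* Zero-terminated, like the supported_alignments[] array in the patch. */
const unsigned long sketch_supported[] = {
	SKETCH_PAGE_SIZE, SKETCH_PMD_SIZE, SKETCH_PUD_SIZE, 0,
};

/* Loose check from the thread: any power of two in [PAGE_SIZE, 1G]. */
bool align_ok_loose(unsigned long val)
{
	bool pow2 = val != 0 && (val & (val - 1)) == 0;

	return pow2 && val >= SKETCH_PAGE_SIZE && val <= SKETCH_PUD_SIZE;
}

/* Stricter check under discussion: accept only values found in the list. */
bool align_ok_strict(unsigned long val, const unsigned long *supported)
{
	assert(supported != NULL);

	for (size_t i = 0; supported[i]; i++)
		if (supported[i] == val)
			return true;
	return false;
}
```

With these definitions, a value such as 8192 passes align_ok_loose() but is rejected by align_ok_strict(), which is the class of "unsupported but accepted" setting the thread describes reaching by writing to sysfs directly.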
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 6c033c9a2f06..5157e7d89f0b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -260,6 +260,30 @@ static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t supported_alignments_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	/* Fun fact: These aren't always constants! */
+	unsigned long supported_alignments[] = {
+		PAGE_SIZE,
+		HPAGE_PMD_SIZE,
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+		HPAGE_PUD_SIZE,
+#endif
+		0,
+	};
+
+	return nd_sector_size_show(0, supported_alignments, buf);
+}
+DEVICE_ATTR_RO(supported_alignments);
+
+static ssize_t default_alignment_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%ld\n", HPAGE_PMD_SIZE);
+}
+DEVICE_ATTR_RO(default_alignment);
+
 static struct attribute *nd_pfn_attributes[] = {
 	&dev_attr_mode.attr,
 	&dev_attr_namespace.attr,
@@ -267,6 +291,8 @@ static struct attribute *nd_pfn_attributes[] = {
 	&dev_attr_align.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
+	&dev_attr_supported_alignments.attr,
+	&dev_attr_default_alignment.attr,
 	NULL,
 };
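The patch passes 0 as the "current" value to the list-printing helper, so no entry is marked, whereas the thread notes that btt decorates the current sector size with square brackets. The sketch below is a userspace approximation of that output convention, assuming a space-separated list followed by a newline. The function name sketch_size_select_show() is hypothetical (loosely borrowing the rename suggested in the thread), and the explicit buffer-length handling is an addition for userspace safety rather than something taken from the kernel helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print each entry of a zero-terminated list into buf, wrapping the entry
 * equal to 'current' in square brackets. Returns the number of characters
 * written, or -1 if the output does not fit in 'len' bytes.
 */
int sketch_size_select_show(unsigned long current,
		const unsigned long *supported, char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL && supported != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; supported[i]; i++) {
		int n;

		if (supported[i] == current)
			n = snprintf(buf + off, len - off, "[%lu] ", supported[i]);
		else
			n = snprintf(buf + off, len - off, "%lu ", supported[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;	/* entry would be truncated */
		off += (size_t)n;
	}

	if (off + 2 > len)
		return -1;	/* no room for newline and terminator */
	buf[off++] = '\n';
	buf[off] = '\0';
	return (int)off;
}
```

Calling it with current set to 0 gives an undecorated list, which matches how the patch invokes the helper; passing one of the listed values brackets that entry, which is the btt-style presentation the thread contrasts with a "current setting first" ordering.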
Adds two new sysfs attributes for pfn (and dax) devices:
supported_alignements and default_alignment. These advertise to
userspace what alignments this kernel supports, and provides a nominal
default alignment to use.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
I'm not sure it makes sense to provide these for pfn devices. In the dax
case we have hard restrictions because of how fault handling works, but
I'm not convinced this makes sense for the pfn case since it's going to
be used with fs-dax.
---
 drivers/nvdimm/pfn_devs.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)