Message ID | 20200108064905.170394-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] libnvdimm: Update the meaning for persistence_domain values | expand |
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > Currently, kernel shows the below values > "persistence_domain":"cpu_cache" > "persistence_domain":"memory_controller" > "persistence_domain":"unknown" > > This patch updates the meaning of these values such that > > "cpu_cache" indicates no extra instructions is needed to ensure the persistence > of data in the pmem media on power failure. > > "memory_controller" indicates platform provided instructions need to be issued > as per documented sequence to make sure data flushed is guaranteed to be on pmem > media in case of system power loss. > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > --- > arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++- > include/linux/libnvdimm.h | 6 +++--- > 2 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c > index c2ef320ba1bf..26a5ef263758 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -360,8 +360,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) > > if (p->is_volatile) > p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); > - else > + else { > + /* > + * We need to flush things correctly to guarantee persistance > + */ > + set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags); > p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); > + } > if (!p->region) { > dev_err(dev, "Error registering region %pR from %pOF\n", > ndr_desc.res, p->dn); Would you also update of_pmem to indicate the persistence domain, please? > diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h > index f2a33f2e3ba8..9126737377e1 100644 > --- a/include/linux/libnvdimm.h > +++ b/include/linux/libnvdimm.h > @@ -52,9 +52,9 @@ enum { > */ > ND_REGION_PERSIST_CACHE = 1, > /* > - * Platform provides mechanisms to automatically flush outstanding > - * write data from memory controler to pmem on system power loss. > - * (ADR) > + * Platform provides instructions to flush data such that on completion > + * of the instructions, data flushed is guaranteed to be on pmem even > + * in case of a system power loss. I find the prior description easier to understand. -Jeff
On 1/15/20 10:25 PM, Jeff Moyer wrote: > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > >> Currently, kernel shows the below values >> "persistence_domain":"cpu_cache" >> "persistence_domain":"memory_controller" >> "persistence_domain":"unknown" >> >> This patch updates the meaning of these values such that >> >> "cpu_cache" indicates no extra instructions is needed to ensure the persistence >> of data in the pmem media on power failure. >> >> "memory_controller" indicates platform provided instructions need to be issued >> as per documented sequence to make sure data flushed is guaranteed to be on pmem >> media in case of system power loss. >> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> --- >> arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++- >> include/linux/libnvdimm.h | 6 +++--- >> 2 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c >> index c2ef320ba1bf..26a5ef263758 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -360,8 +360,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) >> >> if (p->is_volatile) >> p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); >> - else >> + else { >> + /* >> + * We need to flush things correctly to guarantee persistance >> + */ >> + set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags); >> p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); >> + } >> if (!p->region) { >> dev_err(dev, "Error registering region %pR from %pOF\n", >> ndr_desc.res, p->dn); > > Would you also update of_pmem to indicate the persistence domain, > please? > sure. >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >> index f2a33f2e3ba8..9126737377e1 100644 >> --- a/include/linux/libnvdimm.h >> +++ b/include/linux/libnvdimm.h >> @@ -52,9 +52,9 @@ enum { >> */ >> ND_REGION_PERSIST_CACHE = 1, >> /* >> - * Platform provides mechanisms to automatically flush outstanding >> - * write data from memory controler to pmem on system power loss. >> - * (ADR) >> + * Platform provides instructions to flush data such that on completion >> + * of the instructions, data flushed is guaranteed to be on pmem even >> + * in case of a system power loss. > > I find the prior description easier to understand. I was trying to avoid the term 'automatically, 'memory controler' and ADR. Can I update the above as /* * Platform provides mechanisms to flush outstanding write data * to pmem on system power loss. */ -aneesh
On 1/15/20 10:57 PM, Aneesh Kumar K.V wrote: > On 1/15/20 10:25 PM, Jeff Moyer wrote: >> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >> >>> Currently, kernel shows the below values >>> "persistence_domain":"cpu_cache" >>> "persistence_domain":"memory_controller" >>> "persistence_domain":"unknown" >>> >>> This patch updates the meaning of these values such that >>> >>> "cpu_cache" indicates no extra instructions is needed to ensure the >>> persistence >>> of data in the pmem media on power failure. >>> >>> "memory_controller" indicates platform provided instructions need to >>> be issued >>> as per documented sequence to make sure data flushed is guaranteed to >>> be on pmem >>> media in case of system power loss. >>> >>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>> --- >>> arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++- >>> include/linux/libnvdimm.h | 6 +++--- >>> 2 files changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >>> b/arch/powerpc/platforms/pseries/papr_scm.c >>> index c2ef320ba1bf..26a5ef263758 100644 >>> --- a/arch/powerpc/platforms/pseries/papr_scm.c >>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >>> @@ -360,8 +360,13 @@ static int papr_scm_nvdimm_init(struct >>> papr_scm_priv *p) >>> if (p->is_volatile) >>> p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); >>> - else >>> + else { >>> + /* >>> + * We need to flush things correctly to guarantee persistance >>> + */ >>> + set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags); >>> p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); >>> + } >>> if (!p->region) { >>> dev_err(dev, "Error registering region %pR from %pOF\n", >>> ndr_desc.res, p->dn); >> >> Would you also update of_pmem to indicate the persistence domain, >> please? >> > > sure. > > >>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >>> index f2a33f2e3ba8..9126737377e1 100644 >>> --- a/include/linux/libnvdimm.h >>> +++ b/include/linux/libnvdimm.h >>> @@ -52,9 +52,9 @@ enum { >>> */ >>> ND_REGION_PERSIST_CACHE = 1, >>> /* >>> - * Platform provides mechanisms to automatically flush outstanding >>> - * write data from memory controler to pmem on system power loss. >>> - * (ADR) >>> + * Platform provides instructions to flush data such that on >>> completion >>> + * of the instructions, data flushed is guaranteed to be on pmem >>> even >>> + * in case of a system power loss. >> >> I find the prior description easier to understand. > > > I was trying to avoid the term 'automatically, 'memory controler' and > ADR. Can I update the above as > > /* > * Platform provides mechanisms to flush outstanding write data > * to pmem on system power loss. > */ > Wanted to add more details. So with the above interpretation, if the persistence_domain is found to be 'cpu_cache', application can expect a store instruction to guarantee persistence. If it is 'none' there is no persistence ( I am not sure how that is the difference from 'volatile' pmem region). If it is 'memory_controller' ( I am not sure whether that is the right term), application needs to follow the recommended mechanism to flush write data to pmem. -aneesh
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >> Would you also update of_pmem to indicate the persistence domain, >> please? >> > > sure. Thanks! >>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >>> index f2a33f2e3ba8..9126737377e1 100644 >>> --- a/include/linux/libnvdimm.h >>> +++ b/include/linux/libnvdimm.h >>> @@ -52,9 +52,9 @@ enum { >>> */ >>> ND_REGION_PERSIST_CACHE = 1, >>> /* >>> - * Platform provides mechanisms to automatically flush outstanding >>> - * write data from memory controler to pmem on system power loss. >>> - * (ADR) >>> + * Platform provides instructions to flush data such that on completion >>> + * of the instructions, data flushed is guaranteed to be on pmem even >>> + * in case of a system power loss. >> >> I find the prior description easier to understand. > > I was trying to avoid the term 'automatically, 'memory controler' and > ADR. Can I update the above as I can understand avoiding the very x86-specific "ADR," but is memory controller not accurate for your platform? > /* > * Platform provides mechanisms to flush outstanding write data > * to pmem on system power loss. > */ That's way too broad. :) The comments are describing the persistence domain. i.e. if you get data to $HERE, it is guaranteed to make it out to stable media. -Jeff
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >> /* >> * Platform provides mechanisms to flush outstanding write data >> * to pmem on system power loss. >> */ >> > > Wanted to add more details. So with the above interpretation, if the > persistence_domain is found to be 'cpu_cache', application can expect > a store instruction to guarantee persistence. If it is 'none' there is > no persistence ( I am not sure how that is the difference from > 'volatile' pmem region). If it is 'memory_controller' ( I am not sure > whether that is the right term), application needs to follow the > recommended mechanism to flush write data to pmem. There is no enum for "NONE". If there is no persistence domain specified, then it is undefined/unknown, which results in the message: nd_pmem namespace0.0: unable to guarantee persistence of writes Other than that, yes, that's right. -Jeff
On 1/15/20 11:05 PM, Jeff Moyer wrote: > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > >>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >>>> index f2a33f2e3ba8..9126737377e1 100644 >>>> --- a/include/linux/libnvdimm.h >>>> +++ b/include/linux/libnvdimm.h >>>> @@ -52,9 +52,9 @@ enum { >>>> */ >>>> ND_REGION_PERSIST_CACHE = 1, >>>> /* >>>> - * Platform provides mechanisms to automatically flush outstanding >>>> - * write data from memory controler to pmem on system power loss. >>>> - * (ADR) >>>> + * Platform provides instructions to flush data such that on completion >>>> + * of the instructions, data flushed is guaranteed to be on pmem even >>>> + * in case of a system power loss. >>> >>> I find the prior description easier to understand. >> >> I was trying to avoid the term 'automatically, 'memory controler' and >> ADR. Can I update the above as > > I can understand avoiding the very x86-specific "ADR," but is memory > controller not accurate for your platform? > >> /* >> * Platform provides mechanisms to flush outstanding write data >> * to pmem on system power loss. >> */ > > That's way too broad. :) The comments are describing the persistence > domain. i.e. if you get data to $HERE, it is guaranteed to make it out > to stable media. With technologies like OpenCAPI, we possibly may not want to call them "memory controller"? In a way platform mechanism will flush them such that on power failure, it is guaranteed to be on the pmem media. But should we call these boundary "memory_controller"? May be we can consider "memory controller" an overloaded term there. Considering we are calling this as memory_controller for application to parse via sysfs, may be the documentation can also carry the same term. -aneesh
On Wed, Jan 15, 2020 at 9:31 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > On 1/15/20 10:57 PM, Aneesh Kumar K.V wrote: > > On 1/15/20 10:25 PM, Jeff Moyer wrote: > >> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > >> > >>> Currently, kernel shows the below values > >>> "persistence_domain":"cpu_cache" > >>> "persistence_domain":"memory_controller" > >>> "persistence_domain":"unknown" > >>> > >>> This patch updates the meaning of these values such that > >>> > >>> "cpu_cache" indicates no extra instructions is needed to ensure the > >>> persistence > >>> of data in the pmem media on power failure. > >>> > >>> "memory_controller" indicates platform provided instructions need to > >>> be issued > >>> as per documented sequence to make sure data flushed is guaranteed to > >>> be on pmem > >>> media in case of system power loss. > >>> > >>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > >>> --- > >>> arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++- > >>> include/linux/libnvdimm.h | 6 +++--- > >>> 2 files changed, 9 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > >>> b/arch/powerpc/platforms/pseries/papr_scm.c > >>> index c2ef320ba1bf..26a5ef263758 100644 > >>> --- a/arch/powerpc/platforms/pseries/papr_scm.c > >>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c > >>> @@ -360,8 +360,13 @@ static int papr_scm_nvdimm_init(struct > >>> papr_scm_priv *p) > >>> if (p->is_volatile) > >>> p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); > >>> - else > >>> + else { > >>> + /* > >>> + * We need to flush things correctly to guarantee persistance > >>> + */ > >>> + set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags); > >>> p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); > >>> + } > >>> if (!p->region) { > >>> dev_err(dev, "Error registering region %pR from %pOF\n", > >>> ndr_desc.res, p->dn); > >> > >> Would you also update of_pmem to indicate the persistence domain, > >> please? > >> > > > > sure. > > > > > >>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h > >>> index f2a33f2e3ba8..9126737377e1 100644 > >>> --- a/include/linux/libnvdimm.h > >>> +++ b/include/linux/libnvdimm.h > >>> @@ -52,9 +52,9 @@ enum { > >>> */ > >>> ND_REGION_PERSIST_CACHE = 1, > >>> /* > >>> - * Platform provides mechanisms to automatically flush outstanding > >>> - * write data from memory controler to pmem on system power loss. > >>> - * (ADR) > >>> + * Platform provides instructions to flush data such that on > >>> completion > >>> + * of the instructions, data flushed is guaranteed to be on pmem > >>> even > >>> + * in case of a system power loss. > >> > >> I find the prior description easier to understand. > > > > > > I was trying to avoid the term 'automatically, 'memory controler' and > > ADR. Can I update the above as > > > > /* > > * Platform provides mechanisms to flush outstanding write data > > * to pmem on system power loss. > > */ > > > > Wanted to add more details. So with the above interpretation, if the > persistence_domain is found to be 'cpu_cache', application can expect a > store instruction to guarantee persistence. The expectation is globally visible stores are persisted > If it is 'none' there is no > persistence ( I am not sure how that is the difference from 'volatile' > pmem region). "None" means the "platform does not enumerate a persistence domain". That's not necessarily "volatile" because the user may know that they have battery backup, or some other private/out-of-band capability not exported by the platform. In which case they would manually need to manipulate the pmemX/write_cache property manually. > If it is 'memory_controller' ( I am not sure whether that > is the right term), application needs to follow the recommended > mechanism to flush write data to pmem. If it is memory_controller the expectation is that flushing data out of the cpu caches and making those writebacks to memory globally visible is sufficient for the platform to persist flushed data on a power loss.
On Wed, Jan 15, 2020 at 9:56 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > On 1/15/20 11:05 PM, Jeff Moyer wrote: > > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > > > > >>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h > >>>> index f2a33f2e3ba8..9126737377e1 100644 > >>>> --- a/include/linux/libnvdimm.h > >>>> +++ b/include/linux/libnvdimm.h > >>>> @@ -52,9 +52,9 @@ enum { > >>>> */ > >>>> ND_REGION_PERSIST_CACHE = 1, > >>>> /* > >>>> - * Platform provides mechanisms to automatically flush outstanding > >>>> - * write data from memory controler to pmem on system power loss. > >>>> - * (ADR) > >>>> + * Platform provides instructions to flush data such that on completion > >>>> + * of the instructions, data flushed is guaranteed to be on pmem even > >>>> + * in case of a system power loss. > >>> > >>> I find the prior description easier to understand. > >> > >> I was trying to avoid the term 'automatically, 'memory controler' and > >> ADR. Can I update the above as > > > > I can understand avoiding the very x86-specific "ADR," but is memory > > controller not accurate for your platform? > > > >> /* > >> * Platform provides mechanisms to flush outstanding write data > >> * to pmem on system power loss. > >> */ > > > > That's way too broad. :) The comments are describing the persistence > > domain. i.e. if you get data to $HERE, it is guaranteed to make it out > > to stable media. > > With technologies like OpenCAPI, we possibly may not want to call them > "memory controller"? In a way platform mechanism will flush them such > that on power failure, it is guaranteed to be on the pmem media. But > should we call these boundary "memory_controller"? May be we can > consider "memory controller" an overloaded term there. Considering we > are calling this as memory_controller for application to parse via > sysfs, may be the documentation can also carry the same term. I don't see how OpenCAPI or any other transport has any bearing on the "memory_controller" term. It's still a controller of persistent memory and it needs to have the write data received at its buffers / queue to ensure that the data gets persisted, or, as in the cpu_cache case, some other agent takes responsibility for shuttling pending writes that have hit the cache out over the transport to be persisted.
On 1/16/20 1:18 AM, Dan Williams wrote: > On Wed, Jan 15, 2020 at 9:56 AM Aneesh Kumar K.V > <aneesh.kumar@linux.ibm.com> wrote: >> >> On 1/15/20 11:05 PM, Jeff Moyer wrote: >>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >>> >> >>>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >>>>>> index f2a33f2e3ba8..9126737377e1 100644 >>>>>> --- a/include/linux/libnvdimm.h >>>>>> +++ b/include/linux/libnvdimm.h >>>>>> @@ -52,9 +52,9 @@ enum { >>>>>> */ >>>>>> ND_REGION_PERSIST_CACHE = 1, >>>>>> /* >>>>>> - * Platform provides mechanisms to automatically flush outstanding >>>>>> - * write data from memory controler to pmem on system power loss. >>>>>> - * (ADR) >>>>>> + * Platform provides instructions to flush data such that on completion >>>>>> + * of the instructions, data flushed is guaranteed to be on pmem even >>>>>> + * in case of a system power loss. >>>>> >>>>> I find the prior description easier to understand. >>>> >>>> I was trying to avoid the term 'automatically, 'memory controler' and >>>> ADR. Can I update the above as >>> >>> I can understand avoiding the very x86-specific "ADR," but is memory >>> controller not accurate for your platform? >>> >>>> /* >>>> * Platform provides mechanisms to flush outstanding write data >>>> * to pmem on system power loss. >>>> */ >>> >>> That's way too broad. :) The comments are describing the persistence >>> domain. i.e. if you get data to $HERE, it is guaranteed to make it out >>> to stable media. >> >> With technologies like OpenCAPI, we possibly may not want to call them >> "memory controller"? In a way platform mechanism will flush them such >> that on power failure, it is guaranteed to be on the pmem media. But >> should we call these boundary "memory_controller"? May be we can >> consider "memory controller" an overloaded term there. Considering we >> are calling this as memory_controller for application to parse via >> sysfs, may be the documentation can also carry the same term. > > I don't see how OpenCAPI or any other transport has any bearing on the > "memory_controller" term. It's still a controller of persistent memory > and it needs to have the write data received at its buffers / queue to > ensure that the data gets persisted, or, as in the cpu_cache case, > some other agent takes responsibility for shuttling pending writes > that have hit the cache out over the transport to be persisted. > Agreed. I want to make sure we document that details correctly. It is a controller of persistent memory and in some cases, there is no reserve power available to keep things in self-refresh mode and to flush things automatically. The platform provided mechanism will ensure the write data is in the pmem media. Should the later have a persistence_domain value "pmem media" ? Even then I am not sure how applications are supposed to use this information. IMHO what is important for application is to differentiate between whether a platform specific flush mechanism is needed or not. Hence was trying to keep this as two value property. IS there any other detail application is supposed to infer from this property? -aneesh
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index c2ef320ba1bf..26a5ef263758 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -360,8 +360,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) if (p->is_volatile) p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); - else + else { + /* + * We need to flush things correctly to guarantee persistance + */ + set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags); p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); + } if (!p->region) { dev_err(dev, "Error registering region %pR from %pOF\n", ndr_desc.res, p->dn); diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index f2a33f2e3ba8..9126737377e1 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -52,9 +52,9 @@ enum { */ ND_REGION_PERSIST_CACHE = 1, /* - * Platform provides mechanisms to automatically flush outstanding - * write data from memory controler to pmem on system power loss. - * (ADR) + * Platform provides instructions to flush data such that on completion + * of the instructions, data flushed is guaranteed to be on pmem even + * in case of a system power loss. */ ND_REGION_PERSIST_MEMCTRL = 2,
Currently, kernel shows the below values "persistence_domain":"cpu_cache" "persistence_domain":"memory_controller" "persistence_domain":"unknown" This patch updates the meaning of these values such that "cpu_cache" indicates no extra instructions is needed to ensure the persistence of data in the pmem media on power failure. "memory_controller" indicates platform provided instructions need to be issued as per documented sequence to make sure data flushed is guaranteed to be on pmem media in case of system power loss. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> --- arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++- include/linux/libnvdimm.h | 6 +++--- 2 files changed, 9 insertions(+), 4 deletions(-)