Message ID | 161651910115.13873.14215644994307713797.stgit@6532096d84d3 (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall | expand |
Hi Shiva, Thanks for the patch. Few minor review comments: Shivaprasad G Bhat <sbhat@linux.ibm.com> writes: > Add support for ND_REGION_ASYNC capability if the device tree > indicates 'ibm,hcall-flush-required' property in the NVDIMM node. > Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor. > > If the flush request failed, the hypervisor is expected to > to reflect the problem in the subsequent dimm health request call. s/dimm/nvdimm s/health request call/H_SCM_HEALTH hcall/ > > This patch prevents mmap of namespaces with MAP_SYNC flag if the > nvdimm requires explicit flush[1]. s/explicit/an explicit/ > > References: > [1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c > > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> > --- > v1 - https://www.spinics.net/lists/kvm-ppc/msg18272.html > Changes from v1: > - Hcall semantics finalized, all changes are to accomodate them. > > Documentation/powerpc/papr_hcalls.rst | 14 ++++++++++ > arch/powerpc/include/asm/hvcall.h | 3 +- > arch/powerpc/platforms/pseries/papr_scm.c | 39 +++++++++++++++++++++++++++++ > 3 files changed, 55 insertions(+), 1 deletion(-) > > diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst > index 48fcf1255a33..648f278eea8f 100644 > --- a/Documentation/powerpc/papr_hcalls.rst > +++ b/Documentation/powerpc/papr_hcalls.rst > @@ -275,6 +275,20 @@ Health Bitmap Flags: > Given a DRC Index collect the performance statistics for NVDIMM and copy them > to the resultBuffer. > > +**H_SCM_FLUSH** > + > +| Input: *drcIndex, continue-token* > +| Out: *continue-token* > +| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY* > + > +Given a DRC Index Flush the data to backend NVDIMM device. > + > +The hcall returns H_BUSY when the flush takes longer time and the hcall needs > +to be issued multiple times in order to be completely serviced. The > +*continue-token* from the output to be passed in the argument list of > +subsequent hcalls to the hypervisor until the hcall is completely serviced > +at which point H_SUCCESS or other error is returned by the hypervisor. > + > References > ========== > .. [1] "Power Architecture Platform Reference" > diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h > index ed6086d57b22..9f7729a97ebd 100644 > --- a/arch/powerpc/include/asm/hvcall.h > +++ b/arch/powerpc/include/asm/hvcall.h > @@ -315,7 +315,8 @@ > #define H_SCM_HEALTH 0x400 > #define H_SCM_PERFORMANCE_STATS 0x418 > #define H_RPT_INVALIDATE 0x448 > -#define MAX_HCALL_OPCODE H_RPT_INVALIDATE > +#define H_SCM_FLUSH 0x44C > +#define MAX_HCALL_OPCODE H_SCM_FLUSH > > /* Scope args for H_SCM_UNBIND_ALL */ > #define H_UNBIND_SCOPE_ALL (0x1) > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c > index 835163f54244..f0407e135410 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -93,6 +93,7 @@ struct papr_scm_priv { > uint64_t block_size; > int metadata_size; > bool is_volatile; > + bool hcall_flush_required; > > uint64_t bound_addr; > > @@ -117,6 +118,38 @@ struct papr_scm_priv { > size_t stat_buffer_len; > }; > > +static int papr_scm_pmem_flush(struct nd_region *nd_region, > + struct bio *bio __maybe_unused) > +{ > + struct papr_scm_priv *p = nd_region_provider_data(nd_region); > + unsigned long ret_buf[PLPAR_HCALL_BUFSIZE]; > + uint64_t token = 0; > + int64_t rc; > + Suggest adding a dev_dbg to to indicate a flush request to a drc. That way if the loop below gets stuck the issue can be debugged with kernel logs. > + do { > + rc = plpar_hcall(H_SCM_FLUSH, ret_buf, p->drc_index, token); > + token = ret_buf[0]; > + > + /* Check if we are stalled for some time */ > + if (H_IS_LONG_BUSY(rc)) { > + msleep(get_longbusy_msecs(rc)); > + rc = H_BUSY; > + } else if (rc == H_BUSY) { > + cond_resched(); > + } > + > + } while (rc == H_BUSY); > + > + if (rc) { > + dev_err(&p->pdev->dev, "flush error: %lld", rc); > + rc = -EIO; > + } else { > + dev_dbg(&p->pdev->dev, "flush drc 0x%x complete", p->drc_index); > + } > + > + return rc; > +} > + > static LIST_HEAD(papr_nd_regions); > static DEFINE_MUTEX(papr_ndr_lock); > > @@ -943,6 +976,11 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) > ndr_desc.num_mappings = 1; > ndr_desc.nd_set = &p->nd_set; > > + if (p->hcall_flush_required) { > + set_bit(ND_REGION_ASYNC, &ndr_desc.flags); > + ndr_desc.flush = papr_scm_pmem_flush; > + } > + > if (p->is_volatile) > p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); > else { > @@ -1088,6 +1126,7 @@ static int papr_scm_probe(struct platform_device *pdev) > p->block_size = block_size; > p->blocks = blocks; > p->is_volatile = !of_property_read_bool(dn, "ibm,cache-flush-required"); > + p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required"); > > /* We just need to ensure that set cookies are unique across */ > uuid_parse(uuid_str, (uuid_t *) uuid); > >
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst index 48fcf1255a33..648f278eea8f 100644 --- a/Documentation/powerpc/papr_hcalls.rst +++ b/Documentation/powerpc/papr_hcalls.rst @@ -275,6 +275,20 @@ Health Bitmap Flags: Given a DRC Index collect the performance statistics for NVDIMM and copy them to the resultBuffer. +**H_SCM_FLUSH** + +| Input: *drcIndex, continue-token* +| Out: *continue-token* +| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY* + +Given a DRC Index Flush the data to backend NVDIMM device. + +The hcall returns H_BUSY when the flush takes longer time and the hcall needs +to be issued multiple times in order to be completely serviced. The +*continue-token* from the output to be passed in the argument list of +subsequent hcalls to the hypervisor until the hcall is completely serviced +at which point H_SUCCESS or other error is returned by the hypervisor. + References ========== .. [1] "Power Architecture Platform Reference" diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index ed6086d57b22..9f7729a97ebd 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -315,7 +315,8 @@ #define H_SCM_HEALTH 0x400 #define H_SCM_PERFORMANCE_STATS 0x418 #define H_RPT_INVALIDATE 0x448 -#define MAX_HCALL_OPCODE H_RPT_INVALIDATE +#define H_SCM_FLUSH 0x44C +#define MAX_HCALL_OPCODE H_SCM_FLUSH /* Scope args for H_SCM_UNBIND_ALL */ #define H_UNBIND_SCOPE_ALL (0x1) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 835163f54244..f0407e135410 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -93,6 +93,7 @@ struct papr_scm_priv { uint64_t block_size; int metadata_size; bool is_volatile; + bool hcall_flush_required; uint64_t bound_addr; @@ -117,6 +118,38 @@ struct papr_scm_priv { size_t stat_buffer_len; }; +static int papr_scm_pmem_flush(struct nd_region *nd_region, + struct bio *bio __maybe_unused) +{ + struct papr_scm_priv *p = nd_region_provider_data(nd_region); + unsigned long ret_buf[PLPAR_HCALL_BUFSIZE]; + uint64_t token = 0; + int64_t rc; + + do { + rc = plpar_hcall(H_SCM_FLUSH, ret_buf, p->drc_index, token); + token = ret_buf[0]; + + /* Check if we are stalled for some time */ + if (H_IS_LONG_BUSY(rc)) { + msleep(get_longbusy_msecs(rc)); + rc = H_BUSY; + } else if (rc == H_BUSY) { + cond_resched(); + } + + } while (rc == H_BUSY); + + if (rc) { + dev_err(&p->pdev->dev, "flush error: %lld", rc); + rc = -EIO; + } else { + dev_dbg(&p->pdev->dev, "flush drc 0x%x complete", p->drc_index); + } + + return rc; +} + static LIST_HEAD(papr_nd_regions); static DEFINE_MUTEX(papr_ndr_lock); @@ -943,6 +976,11 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) ndr_desc.num_mappings = 1; ndr_desc.nd_set = &p->nd_set; + if (p->hcall_flush_required) { + set_bit(ND_REGION_ASYNC, &ndr_desc.flags); + ndr_desc.flush = papr_scm_pmem_flush; + } + if (p->is_volatile) p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc); else { @@ -1088,6 +1126,7 @@ static int papr_scm_probe(struct platform_device *pdev) p->block_size = block_size; p->blocks = blocks; p->is_volatile = !of_property_read_bool(dn, "ibm,cache-flush-required"); + p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required"); /* We just need to ensure that set cookies are unique across */ uuid_parse(uuid_str, (uuid_t *) uuid);
Add support for ND_REGION_ASYNC capability if the device tree indicates 'ibm,hcall-flush-required' property in the NVDIMM node. Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor. If the flush request failed, the hypervisor is expected to to reflect the problem in the subsequent dimm health request call. This patch prevents mmap of namespaces with MAP_SYNC flag if the nvdimm requires explicit flush[1]. References: [1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> --- v1 - https://www.spinics.net/lists/kvm-ppc/msg18272.html Changes from v1: - Hcall semantics finalized, all changes are to accomodate them. Documentation/powerpc/papr_hcalls.rst | 14 ++++++++++ arch/powerpc/include/asm/hvcall.h | 3 +- arch/powerpc/platforms/pseries/papr_scm.c | 39 +++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+), 1 deletion(-)