Message ID | 164396252398.109112.13436924292537517470.stgit@ltczzess4.aus.stglabs.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | spapr: nvdimm: Introduce spapr-nvdimm device | expand |
On 2/4/22 05:15, Shivaprasad G Bhat wrote: > If the device backend is not persistent memory for the nvdimm, there > is need for explicit IO flushes to ensure persistence. > > On SPAPR, the issue is addressed by adding a new hcall to request for > an explicit flush from the guest when the backend is not pmem. > So, the approach here is to convey when the hcall flush is required > in a device tree property. The guest once it knows the device needs > explicit flushes, makes the hcall as and when required. > > It was suggested to create a new device type to address the > explicit flush for such backends on PPC instead of extending the > generic nvdimm device with new property. So, the patch introduces > the spapr-nvdimm device. The new device inherits the nvdimm device > with the new bahviour such that if the backend has pmem=no, the > device tree property is set by default. > > The below demonstration shows the map_sync behavior for non-pmem > backends. > (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c) > > The pmem0 is from spapr-nvdimm with with backend pmem=on, and pmem1 is > from spapr-nvdimm with pmem=off, mounted as > /dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota) > /dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota) > > [root@atest-guest ~]# ./mapsync /mnt1/newfile ----> When pmem=on > [root@atest-guest ~]# ./mapsync /mnt2/newfile ----> when pmem=off > Failed to mmap with Operation not supported > > First patch adds the realize/unrealize call backs to the generic device > for the new device's vmstate registration. The second patch implements > the hcall, adds the necessary vmstate properties to spapr machine structure > for carrying the hcall status during save-restore. The nature of the hcall > being asynchronus, the patch uses aio utilities to offload the flush. The > third patch introduces the spapr-nvdimm device, adds the device tree > property for the guest when spapr-nvdimm is used with pmem=no on the > backend. Also adds new property pmem-override(?, suggest if you have better > name) to the spapr-nvdimm which hints at forcing the hcall based flushes even > on pmem backed devices. > > The kernel changes to exploit this hcall is at > https://github.com/linuxppc/linux/commit/75b7c05ebf9026.patch > > --- I noted that we have only two nvdimm tests in QEMU, both in tests/qtest/bios-tables-test.c. It would be a good future improvement to add some spapr-nvdimm tests there as well. Thanks, Daniel > v6 - https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg00322.html > Changes from v6: > - Addressed commen from Daniel. > Fixed a typo > Fetch the memory backend FD in the flush_worker_cb(), updated hcall > return values in the comments description) > - Updated the signatures. > > v5 - https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg01741.html > Changes from v5: > - Taken care of all comments from David > - Moved the flush lists from spapr machine into the spapr-nvdimm device > state structures. So, all corresponding data structures adjusted > accordingly as required. > - New property pmem-overrride is added to the spapr-nvdimm device. The > hcall flushes are allowed when pmem-override is set for the device. > - The flush for pmem backend devices are made to use pmem_persist(). > - The vmstate structures are also made part of device state instead of > global spapr. > - Passing the flush token to destination during migration, I think its > better than finding, deriving it from the outstanding ones. > > v4 - https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg05982.html > Changes from v4: > - Introduce spapr-nvdimm device with nvdimm device as the parent. > - The new spapr-nvdimm has no new properties. As this is a new > device and there is no migration related dependencies to be > taken care of, the device behavior is made to set the device tree > property and enable hcall when the device type spapr-nvdimm is > used with pmem=off > - Fixed commit messages > - Added checks to ensure the backend is actualy file and not memory > - Addressed things pointed out by Eric > > v3 - https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07916.html > Changes from v3: > - Fixed the forward declaration coding guideline violations in 1st patch. > - Removed the code waiting for the flushes to complete during migration, > instead restart the flush worker on destination qemu in post load. > - Got rid of the randomization of the flush tokens, using simple > counter. > - Got rid of the redundant flush state lock, relying on the BQL now. > - Handling the memory-backend-ram usage > - Changed the sync-dax symantics from on/off to 'unsafe','writeback' and 'direct'. > Added prevention code using 'writeback' on arm and x86_64. > - Fixed all the miscellaneous comments. > > v2 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg07031.html > Changes from v2: > - Using the thread pool based approach as suggested > - Moved the async hcall handling code to spapr_nvdimm.c along > with some simplifications > - Added vmstate to preserve the hcall status during save-restore > along with pre_save handler code to complete all ongoning flushes. > - Added hw_compat magic for sync-dax 'on' on previous machines. > - Miscellanious minor fixes. > > v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html > Changes from v1 > - Fixed a missed-out unlock > - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating token > > Shivaprasad G Bhat (3): > nvdimm: Add realize, unrealize callbacks to NVDIMMDevice class > spapr: nvdimm: Implement H_SCM_FLUSH hcall > spapr: nvdimm: Introduce spapr-nvdimm device > > > hw/mem/nvdimm.c | 16 ++ > hw/mem/pc-dimm.c | 5 + > hw/ppc/spapr.c | 2 + > hw/ppc/spapr_nvdimm.c | 394 ++++++++++++++++++++++++++++++++++ > include/hw/mem/nvdimm.h | 2 + > include/hw/mem/pc-dimm.h | 1 + > include/hw/ppc/spapr.h | 4 +- > include/hw/ppc/spapr_nvdimm.h | 1 + > 8 files changed, 424 insertions(+), 1 deletion(-) > > -- > Signature >
On 2/4/22 09:15, Shivaprasad G Bhat wrote: > If the device backend is not persistent memory for the nvdimm, there > is need for explicit IO flushes to ensure persistence. > > On SPAPR, the issue is addressed by adding a new hcall to request for > an explicit flush from the guest when the backend is not pmem. > So, the approach here is to convey when the hcall flush is required > in a device tree property. The guest once it knows the device needs > explicit flushes, makes the hcall as and when required. > > It was suggested to create a new device type to address the > explicit flush for such backends on PPC instead of extending the > generic nvdimm device with new property. So, the patch introduces > the spapr-nvdimm device. The new device inherits the nvdimm device > with the new bahviour such that if the backend has pmem=no, the > device tree property is set by default. > > The below demonstration shows the map_sync behavior for non-pmem > backends. > (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c) > > The pmem0 is from spapr-nvdimm with with backend pmem=on, and pmem1 is > from spapr-nvdimm with pmem=off, mounted as > /dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota) > /dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota) > > [root@atest-guest ~]# ./mapsync /mnt1/newfile ----> When pmem=on > [root@atest-guest ~]# ./mapsync /mnt2/newfile ----> when pmem=off > Failed to mmap with Operation not supported > > First patch adds the realize/unrealize call backs to the generic device > for the new device's vmstate registration. The second patch implements > the hcall, adds the necessary vmstate properties to spapr machine structure > for carrying the hcall status during save-restore. The nature of the hcall > being asynchronus, the patch uses aio utilities to offload the flush. The > third patch introduces the spapr-nvdimm device, adds the device tree > property for the guest when spapr-nvdimm is used with pmem=no on the > backend. Also adds new property pmem-override(?, suggest if you have better > name) to the spapr-nvdimm which hints at forcing the hcall based flushes even > on pmem backed devices. > > The kernel changes to exploit this hcall is at > https://github.com/linuxppc/linux/commit/75b7c05ebf9026.patch Applied for ppc-7.0 Thanks, C.