| Message ID | 20180207073331.14158-5-haozhong.zhang@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
>  {
>      MemoryRegion *mr;
>      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> +                                            "pmem", NULL);
>      uint64_t backend_offset;
>
>      nvdimm_validate_rw_label_data(nvdimm, size, offset);
>
> -    memcpy(nvdimm->label_data + offset, buf, size);
> +    if (!is_pmem) {
> +        memcpy(nvdimm->label_data + offset, buf, size);
> +    } else {
> +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> +    }

Is this enough to prevent label corruption in case of power failure?

pmem_memcpy_persist() is not atomic. Power failure can result in a mix
of the old and new label data.

If we want this operation to be 100% safe there needs to be some kind of
update protocol that makes the change atomic, like a Label A and Label B
area with a single Label Index field that can be updated atomically to
point to the active Label A/B area.

Stefan
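As a rough illustration of the A/B commit scheme described above, here is a
minimal C sketch. The layout and names (label_area, slot, active,
LABEL_SLOT_SIZE) are hypothetical and do not follow the actual UEFI label
format; it assumes libpmem's pmem_memcpy_persist() and pmem_persist() for
flushing, and that the caller keeps len within one slot:

/* Hypothetical two-slot label area; illustration only, not the real
 * UEFI Label Storage Area layout. */
#include <stdint.h>
#include <stddef.h>
#include <libpmem.h>

#define LABEL_SLOT_SIZE 4096

struct label_area {
    uint8_t  slot[2][LABEL_SLOT_SIZE];  /* Label A and Label B */
    uint32_t active;                    /* which slot is valid: 0 or 1 */
};

void label_update(struct label_area *la, const void *buf, size_t len)
{
    uint32_t next = la->active ^ 1;     /* always write the inactive slot */

    /* 1. Copy and flush the new label data; the active slot is untouched,
     *    so a crash at this point still leaves the old label readable. */
    pmem_memcpy_persist(la->slot[next], buf, len);

    /* 2. Flip the selector with a single small store and flush it.  This
     *    is the only commit point: before it the old label wins, after it
     *    the new one does. */
    la->active = next;
    pmem_persist(&la->active, sizeof(la->active));
}

The key property is that the bulk copy never touches the currently active
slot, so the only durable state change that matters is the one small flush
of the selector.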
On 02/09/18 14:27 +0000, Stefan Hajnoczi wrote:
> On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> > @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> >  {
> >      MemoryRegion *mr;
> >      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> > +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> > +                                            "pmem", NULL);
> >      uint64_t backend_offset;
> >
> >      nvdimm_validate_rw_label_data(nvdimm, size, offset);
> >
> > -    memcpy(nvdimm->label_data + offset, buf, size);
> > +    if (!is_pmem) {
> > +        memcpy(nvdimm->label_data + offset, buf, size);
> > +    } else {
> > +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> > +    }
>
> Is this enough to prevent label corruption in case of power failure?
>
> pmem_memcpy_persist() is not atomic. Power failure can result in a mix
> of the old and new label data.
>
> If we want this operation to be 100% safe there needs to be some kind of
> update protocol that makes the change atomic, like a Label A and Label B
> area with a single Label Index field that can be updated atomically to
> point to the active Label A/B area.

All this patch series guarantees is: if the guest is still alive and
running, all of its previous writes to pmem, which were performed by
QEMU, are still persistent on pmem.

If a power failure happens before QEMU returns to the guest, e.g., in
the middle of the pmem_memcpy_persist() above, then yes, the guest label
data may be left in an inconsistent state, but the guest also has no
chance to make progress. That is what could happen in a
non-virtualization environment as well, and it's the responsibility of
the (guest) software to defend against such failures, e.g., by the
protocol you mentioned.

Haozhong
On Fri, Feb 09, 2018 at 10:57:26PM +0800, Haozhong Zhang wrote:
> On 02/09/18 14:27 +0000, Stefan Hajnoczi wrote:
> > On Wed, Feb 07, 2018 at 03:33:27PM +0800, Haozhong Zhang wrote:
> > > @@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> > >  {
> > >      MemoryRegion *mr;
> > >      PCDIMMDevice *dimm = PC_DIMM(nvdimm);
> > > +    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
> > > +                                            "pmem", NULL);
> > >      uint64_t backend_offset;
> > >
> > >      nvdimm_validate_rw_label_data(nvdimm, size, offset);
> > >
> > > -    memcpy(nvdimm->label_data + offset, buf, size);
> > > +    if (!is_pmem) {
> > > +        memcpy(nvdimm->label_data + offset, buf, size);
> > > +    } else {
> > > +        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
> > > +    }
> >
> > Is this enough to prevent label corruption in case of power failure?
> >
> > pmem_memcpy_persist() is not atomic. Power failure can result in a mix
> > of the old and new label data.
> >
> > If we want this operation to be 100% safe there needs to be some kind of
> > update protocol that makes the change atomic, like a Label A and Label B
> > area with a single Label Index field that can be updated atomically to
> > point to the active Label A/B area.
>
> All this patch series guarantees is: if the guest is still alive and
> running, all of its previous writes to pmem, which were performed by
> QEMU, are still persistent on pmem.
>
> If a power failure happens before QEMU returns to the guest, e.g., in
> the middle of the pmem_memcpy_persist() above, then yes, the guest label
> data may be left in an inconsistent state, but the guest also has no
> chance to make progress. That is what could happen in a
> non-virtualization environment as well, and it's the responsibility of
> the (guest) software to defend against such failures, e.g., by the
> protocol you mentioned.

Thanks for explaining! I thought the atomic update was done by the ACPI
_LSW DSM method, but it turns out the driver must do it.

For anyone else who is interested, a detailed description of the Label
Storage Area is on page 652 of the UEFI Specification version 2.7, and
the Linux driver implements the atomic update algorithm in
drivers/nvdimm/label.c:__pmem_label_update().

Stefan
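As background for the algorithm referenced above, the two on-media index
blocks carry a small cyclic sequence number, and readers treat the copy with
the newer sequence as current. The following is a simplified sketch of that
selection rule only, not the actual drivers/nvdimm/label.c code, and the
function names (seq_next, seq_is_current) are made up for illustration:

#include <stdint.h>

/* Cycle 1 -> 2 -> 3 -> 1; 0 is never produced, since it marks an
 * uninitialized index block. */
static inline uint32_t seq_next(uint32_t seq)
{
    return seq == 3 ? 1 : seq + 1;
}

/* Return 1 if an index block with sequence a should be treated as
 * current over one with sequence b. */
static inline int seq_is_current(uint32_t a, uint32_t b)
{
    if (a == 0 || b == 0) {
        return a != 0;           /* a valid block beats an invalid one */
    }
    return seq_next(b) == a;     /* the cyclic successor is newer */
}

Because updates always rewrite the non-current index block with the next
sequence value, persisting that single block is what atomically switches
which copy readers consider current.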
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 61e677f92f..18861d1a7a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@
  */

 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
 {
     MemoryRegion *mr;
     PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+                                            "pmem", NULL);
     uint64_t backend_offset;

     nvdimm_validate_rw_label_data(nvdimm, size, offset);

-    memcpy(nvdimm->label_data + offset, buf, size);
+    if (!is_pmem) {
+        memcpy(nvdimm->label_data + offset, buf, size);
+    } else {
+        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+    }

     mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
     backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 0000000000..9017596ff0
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,31 @@
+/*
+ * Stub functions for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include <libpmem.h>
+#else  /* !CONFIG_LIBPMEM */
+
+#include <string.h>
+
+/* Stubs */
+
+static inline void *
+pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+    return memcpy(pmemdest, src, len);
+}
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */
Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is real persistent memory, QEMU needs
to perform the proper operations to ensure those writes persist on the
persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/mem/nvdimm.c     |  9 ++++++++-
 include/qemu/pmem.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h
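For readers unfamiliar with libpmem, here is a small standalone usage sketch
of the call the patch relies on. The file path /mnt/pmem0/label-test is
hypothetical, a DAX-capable filesystem is assumed, and the program is built
with -lpmem:

#include <stdio.h>
#include <string.h>
#include <libpmem.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Map (creating if necessary) a 4 KiB file on the assumed pmem mount. */
    void *dst = pmem_map_file("/mnt/pmem0/label-test", 4096,
                              PMEM_FILE_CREATE, 0600, &mapped_len, &is_pmem);
    if (!dst) {
        perror("pmem_map_file");
        return 1;
    }

    const char label[] = "example label data";
    if (is_pmem) {
        /* Copy and flush CPU caches so the data is durable on pmem. */
        pmem_memcpy_persist(dst, label, sizeof(label));
    } else {
        /* Not real pmem (e.g. a regular file): flush via msync() instead. */
        memcpy(dst, label, sizeof(label));
        pmem_msync(dst, sizeof(label));
    }

    pmem_unmap(dst, mapped_len);
    return 0;
}

On real persistent memory, pmem_memcpy_persist() makes the copy durable by
flushing CPU caches or using non-temporal stores; the pmem.h stub added by
the patch simply degrades to a plain memcpy() when QEMU is built without
libpmem.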