Message ID | 156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: Sub-section memory hotplug support |
Dan Williams <dan.j.williams@intel.com> writes:

> At namespace creation time there is the potential for the "expected to
> be zero" fields of a 'pfn' info-block to be filled with indeterminate
> data. While the kernel buffer is zeroed on allocation, it is immediately
> overwritten by nd_pfn_validate() filling it with the current contents of
> the on-media info-block location. For fields like 'flags' and 'padding'
> this potentially means that future implementations cannot rely on those
> fields being zero.
>
> In preparation to stop using the 'start_pad' and 'end_trunc' fields for
> section alignment, arrange for fields that are not explicitly
> initialized to be guaranteed zero. Bump the minor version to indicate it
> is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
> corruption is expected to be benign since all other critical fields are
> explicitly initialized.
>
> Note: the cc: stable is about spreading this new policy to as many
> kernels as possible, not fixing an issue in those kernels. It is not
> until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
> section alignment" that this improper initialization becomes a problem.
> So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
> namespaces to section alignment" (which is not tagged for stable), make
> sure this prerequisite is flagged.

Don't we need a change like below in this patch?

modified   drivers/nvdimm/pfn_devs.c
@@ -452,10 +452,11 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 	if (memcmp(pfn_sb->parent_uuid, parent_uuid, 16) != 0)
 		return -ENODEV;

-	if (__le16_to_cpu(pfn_sb->version_minor) < 1) {
-		pfn_sb->start_pad = 0;
-		pfn_sb->end_trunc = 0;
-	}
+	if ((__le16_to_cpu(pfn_sb->version_minor) < 1) ||
+	    (__le16_to_cpu(pfn_sb->version_minor) >= 3)) {
+		pfn_sb->start_pad = 0;
+		pfn_sb->end_trunc = 0;
+	}

IIUC we want to force the start_pad and end_trunc to zero if the pfn_sb
minor version number is >= 3. So once we have this patch backported and an
older kernel finds a pfn_sb with minor version 3, it will ignore the
start_pad read from the nvdimm and overwrite that with zero here.
This patch doesn't enforce that, right? After the next patch we can have
values other than 0 in pfn_sb->start_pad?

>
> Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/nvdimm/dax_devs.c |    2 +-
>  drivers/nvdimm/pfn.h      |    1 +
>  drivers/nvdimm/pfn_devs.c |   18 +++++++++++++++---
>  3 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
> index 49fc18ee0565..6d22b0f83b3b 100644
> --- a/drivers/nvdimm/dax_devs.c
> +++ b/drivers/nvdimm/dax_devs.c
> @@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
>  	nvdimm_bus_unlock(&ndns->dev);
>  	if (!dax_dev)
>  		return -ENOMEM;
> -	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
> +	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
>  	nd_pfn->pfn_sb = pfn_sb;
>  	rc = nd_pfn_validate(nd_pfn, DAX_SIG);
>  	dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
> diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
> index f58b849e455b..dfb2bcda8f5a 100644
> --- a/drivers/nvdimm/pfn.h
> +++ b/drivers/nvdimm/pfn.h
> @@ -28,6 +28,7 @@ struct nd_pfn_sb {
>  	__le32 end_trunc;
>  	/* minor-version-2 record the base alignment of the mapping */
>  	__le32 align;
> +	/* minor-version-3 guarantee the padding and flags are zero */
>  	u8 padding[4000];
>  	__le64 checksum;
>  };
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 0f81fc56bbfd..4977424693b0 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
>  	return 0;
>  }
>
> +/**
> + * nd_pfn_validate - read and validate info-block
> + * @nd_pfn: fsdax namespace runtime state / properties
> + * @sig: 'devdax' or 'fsdax' signature
> + *
> + * Upon return the info-block buffer contents (->pfn_sb) are
> + * indeterminate when validation fails, and a coherent info-block
> + * otherwise.
> + */
>  int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
>  {
>  	u64 checksum, offset;
> @@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
>  	nvdimm_bus_unlock(&ndns->dev);
>  	if (!pfn_dev)
>  		return -ENOMEM;
> -	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
> +	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
>  	nd_pfn = to_nd_pfn(pfn_dev);
>  	nd_pfn->pfn_sb = pfn_sb;
>  	rc = nd_pfn_validate(nd_pfn, PFN_SIG);
> @@ -694,7 +703,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  	u64 checksum;
>  	int rc;
>
> -	pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
> +	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
>  	if (!pfn_sb)
>  		return -ENOMEM;
>
> @@ -703,11 +712,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  		sig = DAX_SIG;
>  	else
>  		sig = PFN_SIG;
> +
>  	rc = nd_pfn_validate(nd_pfn, sig);
>  	if (rc != -ENODEV)
>  		return rc;
>
>  	/* no info block, do init */;
> +	memset(pfn_sb, 0, sizeof(*pfn_sb));
> +
>  	nd_region = to_nd_region(nd_pfn->dev.parent);
>  	if (nd_region->ro) {
>  		dev_info(&nd_pfn->dev,
> @@ -760,7 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
>  	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
>  	pfn_sb->version_major = cpu_to_le16(1);
> -	pfn_sb->version_minor = cpu_to_le16(2);
> +	pfn_sb->version_minor = cpu_to_le16(3);
>  	pfn_sb->start_pad = cpu_to_le32(start_pad);
>  	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
>  	pfn_sb->align = cpu_to_le32(nd_pfn->align);
On Wed, Jun 19, 2019 at 9:30 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > At namespace creation time there is the potential for the "expected to
> > be zero" fields of a 'pfn' info-block to be filled with indeterminate
> > data. While the kernel buffer is zeroed on allocation, it is immediately
> > overwritten by nd_pfn_validate() filling it with the current contents of
> > the on-media info-block location. For fields like 'flags' and 'padding'
> > this potentially means that future implementations cannot rely on those
> > fields being zero.
> >
> > In preparation to stop using the 'start_pad' and 'end_trunc' fields for
> > section alignment, arrange for fields that are not explicitly
> > initialized to be guaranteed zero. Bump the minor version to indicate it
> > is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
> > corruption is expected to be benign since all other critical fields are
> > explicitly initialized.
> >
> > Note: the cc: stable is about spreading this new policy to as many
> > kernels as possible, not fixing an issue in those kernels. It is not
> > until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
> > section alignment" that this improper initialization becomes a problem.
> > So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
> > namespaces to section alignment" (which is not tagged for stable), make
> > sure this prerequisite is flagged.
>
> Don't we need a change like below in this patch?
>
> modified   drivers/nvdimm/pfn_devs.c
> @@ -452,10 +452,11 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
>  	if (memcmp(pfn_sb->parent_uuid, parent_uuid, 16) != 0)
>  		return -ENODEV;
>
> -	if (__le16_to_cpu(pfn_sb->version_minor) < 1) {
> -		pfn_sb->start_pad = 0;
> -		pfn_sb->end_trunc = 0;
> -	}
> +	if ((__le16_to_cpu(pfn_sb->version_minor) < 1) ||
> +	    (__le16_to_cpu(pfn_sb->version_minor) >= 3)) {
> +		pfn_sb->start_pad = 0;
> +		pfn_sb->end_trunc = 0;
> +	}

No, this kills off start_pad and end_trunc permanently.

> IIUC we want to force the start_pad and end_trunc to zero if the pfn_sb
> minor version number is >= 3. So once we have this patch backported and an
> older kernel finds a pfn_sb with minor version 3, it will ignore the
> start_pad read from the nvdimm and overwrite that with zero here.
> This patch doesn't enforce that, right? After the next patch we can have
> values other than 0 in pfn_sb->start_pad?

The reason for the version bump is for the kernel to safely assume
that uninitialized fields default to zero, but it's otherwise a nop
when the implementation is explicitly initializing every field by
default.
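To make the distinction in the reply above concrete, here is a minimal userspace sketch of version-gated handling of an info-block copy. The `< 1` check mirrors the existing check in nd_pfn_validate() quoted in the thread. The `< 3` check is only an assumption about how a future reader might use the minor-version-3 guarantee; this patch does not add it. The names `struct sb_view` and `sb_normalize()` are hypothetical, the layout is not the kernel's, and endian conversion is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of an info-block; NOT the nd_pfn_sb layout. */
struct sb_view {
	uint32_t flags;
	uint16_t version_minor;
	uint32_t start_pad;
	uint32_t end_trunc;
	uint8_t  padding[32];
};

/* Normalize an in-memory copy according to the minor version it claims. */
void sb_normalize(struct sb_view *sb)
{
	/* Mirrors the existing kernel check: these fields predate minor 1. */
	if (sb->version_minor < 1) {
		sb->start_pad = 0;
		sb->end_trunc = 0;
	}

	/*
	 * Assumption, not part of the patch: before minor 3 the contents of
	 * 'flags' and 'padding' are indeterminate, so a reader ignores them.
	 */
	if (sb->version_minor < 3) {
		sb->flags = 0;
		memset(sb->padding, 0, sizeof(sb->padding));
	}

	/* Minor >= 3 leaves start_pad and end_trunc exactly as read. */
}
```

Note that a minor-version-3 block keeps whatever start_pad and end_trunc it carries, which is why adding a `>= 3` clause to the first check would retire those fields for good.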
diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 49fc18ee0565..6d22b0f83b3b 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!dax_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, DAX_SIG);
 	dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index f58b849e455b..dfb2bcda8f5a 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -28,6 +28,7 @@ struct nd_pfn_sb {
 	__le32 end_trunc;
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
 	u8 padding[4000];
 	__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 0f81fc56bbfd..4977424693b0 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }

+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
 	u64 checksum, offset;
@@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!pfn_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn = to_nd_pfn(pfn_dev);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -694,7 +703,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	u64 checksum;
 	int rc;

-	pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
 	if (!pfn_sb)
 		return -ENOMEM;

@@ -703,11 +712,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		sig = DAX_SIG;
 	else
 		sig = PFN_SIG;
+
 	rc = nd_pfn_validate(nd_pfn, sig);
 	if (rc != -ENODEV)
 		return rc;

 	/* no info block, do init */;
+	memset(pfn_sb, 0, sizeof(*pfn_sb));
+
 	nd_region = to_nd_region(nd_pfn->dev.parent);
 	if (nd_region->ro) {
 		dev_info(&nd_pfn->dev,
@@ -760,7 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(2);
+	pfn_sb->version_minor = cpu_to_le16(3);
 	pfn_sb->start_pad = cpu_to_le32(start_pad);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation, it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and 'padding'
this potentially means that future implementations cannot rely on those
fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate it
is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
corruption is expected to be benign since all other critical fields are
explicitly initialized.

Note: the cc: stable is about spreading this new policy to as many
kernels as possible, not fixing an issue in those kernels. It is not
until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
section alignment" that this improper initialization becomes a problem.
So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
namespaces to section alignment" (which is not tagged for stable), make
sure this prerequisite is flagged.

Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/dax_devs.c |    2 +-
 drivers/nvdimm/pfn.h      |    1 +
 drivers/nvdimm/pfn_devs.c |   18 +++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)
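
The ordering problem described in the changelog can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct demo_sb`, `DEMO_SIG`, `demo_validate()` and `demo_init()` are hypothetical stand-ins for nd_pfn_sb, PFN_SIG, nd_pfn_validate() and nd_pfn_init(), the layout is simplified, and -1 stands in for -ENODEV. It shows why zeroing at allocation time is not enough and why the memset() has to come after the failed validation.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SIG "DEMO_PFN_INFO"	/* hypothetical signature, compared with its NUL */

/* Hypothetical, simplified info-block; NOT the real nd_pfn_sb layout. */
struct demo_sb {
	uint8_t  signature[16];
	uint32_t flags;
	uint16_t version_major;
	uint16_t version_minor;
	uint32_t align;
	uint8_t  padding[64];
};

/*
 * Stand-in for nd_pfn_validate(): reads the media into the buffer first,
 * then checks it. On failure the buffer holds whatever was on the media.
 */
int demo_validate(struct demo_sb *buf, const struct demo_sb *media)
{
	memcpy(buf, media, sizeof(*buf));
	if (memcmp(buf->signature, DEMO_SIG, sizeof(DEMO_SIG)) != 0)
		return -1;	/* stand-in for -ENODEV */
	return 0;
}

/* Stand-in for nd_pfn_init(): returns 0 if a block exists, 1 if one was created. */
int demo_init(struct demo_sb *buf, const struct demo_sb *media, uint32_t align)
{
	if (demo_validate(buf, media) == 0)
		return 0;

	/* No info block: clear the stale media contents before filling fields. */
	memset(buf, 0, sizeof(*buf));

	memcpy(buf->signature, DEMO_SIG, sizeof(DEMO_SIG));
	buf->version_major = 1;
	buf->version_minor = 3;	/* 'flags' and 'padding' are now known zero */
	buf->align = align;
	return 1;
}
```

Without the memset() in demo_init(), 'flags' and 'padding' would keep the bytes copied in by the failed demo_validate() call, regardless of how the buffer was allocated.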