Message ID | 20250319113215.520902-1-rrichter@amd.com |
---|---|
State | Superseded |
Headers | show |
Series | libnvdimm/labels: Fix divide error in nd_label_data_init() | expand |
On 3/19/2025 12:32 PM, Robert Richter wrote: > If a CXL memory device returns a broken zero LSA size in its memory > device information (Identify Memory Device (Opcode 4000h), CXL > spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm > driver: > > Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI > RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm] > > Code and flow: > > 1) CXL Command 4000h returns LSA size = 0, > 2) config_size is assigned to zero LSA size (CXL pmem driver): > > drivers/cxl/pmem.c: .config_size = mds->lsa_size, > > 3) max_xfer is set to zero (nvdimm driver): > > drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size); > drivers/nvdimm/label.c: if (read_size < max_xfer) { > drivers/nvdimm/label.c- /* trim waste */ > > 4) DIV_ROUND_UP() causes division by zero: > > drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) / > drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer); > drivers/nvdimm/label.c- /* make certain we read indexes in exactly 1 read */ > drivers/nvdimm/label.c: if (max_xfer < read_size) > drivers/nvdimm/label.c: max_xfer = read_size; > drivers/nvdimm/label.c- } > > Fix this by checking the config size parameter by extending an > existing check. > > Signed-off-by: Robert Richter <rrichter@amd.com> LGTM Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> > --- > drivers/nvdimm/label.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c > index 082253a3a956..04f4a049599a 100644 > --- a/drivers/nvdimm/label.c > +++ b/drivers/nvdimm/label.c > @@ -442,7 +442,8 @@ int nd_label_data_init(struct nvdimm_drvdata *ndd) > if (ndd->data) > return 0; > > - if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0) { > + if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0 || > + ndd->nsarea.config_size == 0) { > dev_dbg(ndd->dev, "failed to init config data area: (%u:%u)\n", > ndd->nsarea.max_xfer, ndd->nsarea.config_size); > return -ENXIO;
Robert Richter wrote: > If a CXL memory device returns a broken zero LSA size in its memory > device information (Identify Memory Device (Opcode 4000h), CXL > spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm > driver: > > Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI > RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm] > > Code and flow: > > 1) CXL Command 4000h returns LSA size = 0, > 2) config_size is assigned to zero LSA size (CXL pmem driver): > > drivers/cxl/pmem.c: .config_size = mds->lsa_size, > > 3) max_xfer is set to zero (nvdimm driver): > > drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size); > drivers/nvdimm/label.c: if (read_size < max_xfer) { > drivers/nvdimm/label.c- /* trim waste */ > > 4) DIV_ROUND_UP() causes division by zero: > > drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) / > drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer); I think this is the wrong DIV_ROUND_UP which is failing because read_size is never less than max_xfer is it? I believe the failing DIV_ROUND_UP is after if statement here: 489 /* Make our initial read size a multiple of max_xfer size */ 490 read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer, 491 config_size); Apparently nvdimm_get_config_data() was intended to check for this implicitly but it is too late. Anyway all this side tracked me a bit. I assume this is a broken device which is in the real world? The fix looks fine. But could you re-spin with a clean up of the commit message and I'll queue it up. Reviewed-by: Ira Weiny <ira.weiny@intel.com> [snip]
On 19.03.25 14:33:54, Ira Weiny wrote: > Robert Richter wrote: > > If a CXL memory device returns a broken zero LSA size in its memory > > device information (Identify Memory Device (Opcode 4000h), CXL > > spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm > > driver: > > > > Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI > > RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm] > > > > Code and flow: > > > > 1) CXL Command 4000h returns LSA size = 0, > > 2) config_size is assigned to zero LSA size (CXL pmem driver): > > > > drivers/cxl/pmem.c: .config_size = mds->lsa_size, > > > > 3) max_xfer is set to zero (nvdimm driver): > > > > drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size); > > drivers/nvdimm/label.c: if (read_size < max_xfer) { > > drivers/nvdimm/label.c- /* trim waste */ > > > > 4) DIV_ROUND_UP() causes division by zero: > > > > drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) / > > drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer); > > I think this is the wrong DIV_ROUND_UP which is failing because read_size is > never less than max_xfer is it? > > I believe the failing DIV_ROUND_UP is after if statement here: > > 489 /* Make our initial read size a multiple of max_xfer size */ > 490 read_size = min(DIV_ROUND_UP(read_size, max_xfer) * max_xfer, > 491 config_size); Yes, it is this one. > > Apparently nvdimm_get_config_data() was intended to check for this implicitly > but it is too late. > > Anyway all this side tracked me a bit. > > I assume this is a broken device which is in the real world? The fix looks > fine. But could you re-spin with a clean up of the commit message and I'll > queue it up. Yes, it was caused by a faulty device. Sure, will update description and resend. > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> Thanks for review, -Robert
diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c index 082253a3a956..04f4a049599a 100644 --- a/drivers/nvdimm/label.c +++ b/drivers/nvdimm/label.c @@ -442,7 +442,8 @@ int nd_label_data_init(struct nvdimm_drvdata *ndd) if (ndd->data) return 0; - if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0) { + if (ndd->nsarea.status || ndd->nsarea.max_xfer == 0 || + ndd->nsarea.config_size == 0) { dev_dbg(ndd->dev, "failed to init config data area: (%u:%u)\n", ndd->nsarea.max_xfer, ndd->nsarea.config_size); return -ENXIO;
If a CXL memory device returns a broken zero LSA size in its memory device information (Identify Memory Device (Opcode 4000h), CXL spec. 3.1, 8.2.9.9.1.1), a divide error occurs in the libnvdimm driver: Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:nd_label_data_init+0x10e/0x800 [libnvdimm] Code and flow: 1) CXL Command 4000h returns LSA size = 0, 2) config_size is assigned to zero LSA size (CXL pmem driver): drivers/cxl/pmem.c: .config_size = mds->lsa_size, 3) max_xfer is set to zero (nvdimm driver): drivers/nvdimm/label.c: max_xfer = min_t(size_t, ndd->nsarea.max_xfer, config_size); drivers/nvdimm/label.c: if (read_size < max_xfer) { drivers/nvdimm/label.c- /* trim waste */ 4) DIV_ROUND_UP() causes division by zero: drivers/nvdimm/label.c: max_xfer -= ((max_xfer - 1) - (config_size - 1) % max_xfer) / drivers/nvdimm/label.c: DIV_ROUND_UP(config_size, max_xfer); drivers/nvdimm/label.c- /* make certain we read indexes in exactly 1 read */ drivers/nvdimm/label.c: if (max_xfer < read_size) drivers/nvdimm/label.c: max_xfer = read_size; drivers/nvdimm/label.c- } Fix this by checking the config size parameter by extending an existing check. Signed-off-by: Robert Richter <rrichter@amd.com> --- drivers/nvdimm/label.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)