Message ID | 20220921181009.oxytvicy6sry6it7@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RESEND] i5000_edac: fix slot number passed to determine_mtr() | expand |
On Wed, Sep 21, 2022 at 02:10:09PM -0400, Aristeu Rozanski wrote: > When the logic mapping branch/slot/channel was reworked back in > 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information") > i5000_init_csrows() was not updated and kept passing twice the number > of slots to determine_mtr(), which leads to acessing past the end of > i5000_pvt.b1_mtr[]. This was found by KASAN. > > Fixes: 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information") > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Borislav Petkov <bp@suse.de> > Signed-off-by: Aristeu Rozanski <aris@redhat.com> > > --- > drivers/edac/i5000_edac.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > --- linus-2.6.orig/drivers/edac/i5000_edac.c 2022-07-25 15:26:40.093989879 -0400 > +++ linus-2.6/drivers/edac/i5000_edac.c 2022-07-26 14:32:23.644694778 -0400 > @@ -1249,14 +1249,12 @@ static int i5000_init_csrows(struct mem_ > struct i5000_pvt *pvt; > struct dimm_info *dimm; > int empty; > - int max_csrows; > int mtr; > int csrow_megs; > int channel; > int slot; > > pvt = mci->pvt_info; > - max_csrows = pvt->maxdimmperch * 2; So it looks to me like back then, the number of CS rows was twice the DIMMs per channel, implying two CS rows per DIMM, I guess dual-ranked DIMMs or so. Now you're saying the number CS rows is the number of DIMMs. Which means, single-ranked DIMMs? But all this is broken as it needs to deal with both single-ranked and dual-ranked DIMMs elegantly and not assume any DIMM types... Do we have hardware where we can test this or can we declare this hw for dead? Because https://www.intel.com/content/www/us/en/products/sku/27746/intel-5000p-memory-controller/specifications.html says launch date Q1 2006 which is a whole another era in IT years.
On Wed, Sep 21, 2022 at 09:38:06PM +0200, Borislav Petkov wrote: > So it looks to me like back then, the number of CS rows was twice the > DIMMs per channel, implying two CS rows per DIMM, I guess dual-ranked > DIMMs or so. > > Now you're saying the number CS rows is the number of DIMMs. Which > means, single-ranked DIMMs? I'm only fixing a out of bounds access, on purpose. Look at 64e1fdaf55d6 and the new version of determine_mtr(). > But all this is broken as it needs to deal with both single-ranked and > dual-ranked DIMMs elegantly and not assume any DIMM types... That's correct, while talking to Mauro he suggested fixing the entire thing. > Do we have hardware where we can test this or can we declare this hw for > dead? And this is the reason why I decided to not do it. We do have the hardware but last time I checked none of them had functional EINJ.
On Thu, Sep 22, 2022 at 09:46:59AM -0400, Aristeu Rozanski wrote: > I'm only fixing a out of bounds access, on purpose. > Look at 64e1fdaf55d6 and the new version of determine_mtr(). I have been looking at that one. And I'm questioning it: "The logic there is broken: it basically creates two csrows for each DIMM and assumes that all DIMM's are dual rank." That commit message doesn't explain why this assumption is wrong. Furthermore, "If single rank memories are found, they'll be marked with 0 bytes." So this looks to me like this commit is breaking dual-ranked DIMMs configs in order to fix single-ranked ones. > And this is the reason why I decided to not do it. We do have the hardware > but last time I checked none of them had functional EINJ. You don't need EINJ - there's arch/x86/kernel/cpu/mce/inject.c with which you can do sw-only injection to test the decoding paths.
On Thu, Sep 22, 2022 at 03:55:57PM +0200, Borislav Petkov wrote: > On Thu, Sep 22, 2022 at 09:46:59AM -0400, Aristeu Rozanski wrote: > > I'm only fixing a out of bounds access, on purpose. > > Look at 64e1fdaf55d6 and the new version of determine_mtr(). > > I have been looking at that one. And I'm questioning it: > > "The logic there is broken: it basically creates two csrows for each > DIMM and assumes that all DIMM's are dual rank." > > That commit message doesn't explain why this assumption is wrong. > > Furthermore, "If single rank memories are found, they'll be marked with > 0 bytes." > > So this looks to me like this commit is breaking dual-ranked DIMMs > configs in order to fix single-ranked ones. Declare it dead. Only have a single working system and I don't have physical access to it along with the fact it's unlikely we have memory modules single and dual rank to swap around for testing purposes.
On Mon, Sep 26, 2022 at 12:51:38PM -0400, Aristeu Rozanski wrote: > Declare it dead. Only have a single working system and I don't have > physical access to it along with the fact it's unlikely we have memory > modules single and dual rank to swap around for testing purposes. Wanna send a "depends on BROKEN" patch and explain why we're doing that in the commit message? Thx.
--- linus-2.6.orig/drivers/edac/i5000_edac.c 2022-07-25 15:26:40.093989879 -0400 +++ linus-2.6/drivers/edac/i5000_edac.c 2022-07-26 14:32:23.644694778 -0400 @@ -1249,14 +1249,12 @@ static int i5000_init_csrows(struct mem_ struct i5000_pvt *pvt; struct dimm_info *dimm; int empty; - int max_csrows; int mtr; int csrow_megs; int channel; int slot; pvt = mci->pvt_info; - max_csrows = pvt->maxdimmperch * 2; empty = 1; /* Assume NO memory */ @@ -1267,7 +1265,7 @@ struct i5000_pvt *pvt; * to map the dimms. A good cleanup would be to remove this array, * and do a loop here with branch, channel, slot */ - for (slot = 0; slot < max_csrows; slot++) { + for (slot = 0; slot < pvt->maxdimmperch; slot++) { for (channel = 0; channel < pvt->maxch; channel++) { mtr = determine_mtr(pvt, slot, channel);
When the logic mapping branch/slot/channel was reworked back in 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information") i5000_init_csrows() was not updated and kept passing twice the number of slots to determine_mtr(), which leads to acessing past the end of i5000_pvt.b1_mtr[]. This was found by KASAN. Fixes: 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information") Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Aristeu Rozanski <aris@redhat.com> --- drivers/edac/i5000_edac.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)