diff mbox series

[RESEND] i5000_edac: fix slot number passed to determine_mtr()

Message ID 20220921181009.oxytvicy6sry6it7@redhat.com (mailing list archive)
State New, archived
Headers show
Series [RESEND] i5000_edac: fix slot number passed to determine_mtr() | expand

Commit Message

Aristeu Rozanski Sept. 21, 2022, 6:10 p.m. UTC
When the logic mapping branch/slot/channel was reworked back in
64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information")
i5000_init_csrows() was not updated and kept passing twice the number
of slots to determine_mtr(), which leads to acessing past the end of
i5000_pvt.b1_mtr[]. This was found by KASAN.

Fixes: 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information")
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 drivers/edac/i5000_edac.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Borislav Petkov Sept. 21, 2022, 7:38 p.m. UTC | #1
On Wed, Sep 21, 2022 at 02:10:09PM -0400, Aristeu Rozanski wrote:
> When the logic mapping branch/slot/channel was reworked back in
> 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information")
> i5000_init_csrows() was not updated and kept passing twice the number
> of slots to determine_mtr(), which leads to acessing past the end of
> i5000_pvt.b1_mtr[]. This was found by KASAN.
> 
> Fixes: 64e1fdaf55d6 ("i5000_edac: Fix the logic that retrieves memory information")
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Borislav Petkov <bp@suse.de>
> Signed-off-by: Aristeu Rozanski <aris@redhat.com>
> 
> ---
>  drivers/edac/i5000_edac.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> --- linus-2.6.orig/drivers/edac/i5000_edac.c	2022-07-25 15:26:40.093989879 -0400
> +++ linus-2.6/drivers/edac/i5000_edac.c	2022-07-26 14:32:23.644694778 -0400
> @@ -1249,14 +1249,12 @@ static int i5000_init_csrows(struct mem_
>  	struct i5000_pvt *pvt;
>  	struct dimm_info *dimm;
>  	int empty;
> -	int max_csrows;
>  	int mtr;
>  	int csrow_megs;
>  	int channel;
>  	int slot;
>  
>  	pvt = mci->pvt_info;
> -	max_csrows = pvt->maxdimmperch * 2;

So it looks to me like back then, the number of CS rows was twice the
DIMMs per channel, implying two CS rows per DIMM, I guess dual-ranked
DIMMs or so.

Now you're saying the number CS rows is the number of DIMMs. Which
means, single-ranked DIMMs?

But all this is broken as it needs to deal with both single-ranked and
dual-ranked DIMMs elegantly and not assume any DIMM types...

Do we have hardware where we can test this or can we declare this hw for
dead?

Because

https://www.intel.com/content/www/us/en/products/sku/27746/intel-5000p-memory-controller/specifications.html

says launch date Q1 2006 which is a whole another era in IT years.
Aristeu Rozanski Sept. 22, 2022, 1:46 p.m. UTC | #2
On Wed, Sep 21, 2022 at 09:38:06PM +0200, Borislav Petkov wrote:
> So it looks to me like back then, the number of CS rows was twice the
> DIMMs per channel, implying two CS rows per DIMM, I guess dual-ranked
> DIMMs or so.
> 
> Now you're saying the number CS rows is the number of DIMMs. Which
> means, single-ranked DIMMs?

I'm only fixing a out of bounds access, on purpose.
Look at 64e1fdaf55d6 and the new version of determine_mtr().

> But all this is broken as it needs to deal with both single-ranked and
> dual-ranked DIMMs elegantly and not assume any DIMM types...

That's correct, while talking to Mauro he suggested fixing the entire
thing.

> Do we have hardware where we can test this or can we declare this hw for
> dead?

And this is the reason why I decided to not do it. We do have the hardware
but last time I checked none of them had functional EINJ.
Borislav Petkov Sept. 22, 2022, 1:55 p.m. UTC | #3
On Thu, Sep 22, 2022 at 09:46:59AM -0400, Aristeu Rozanski wrote:
> I'm only fixing a out of bounds access, on purpose.
> Look at 64e1fdaf55d6 and the new version of determine_mtr().

I have been looking at that one. And I'm questioning it:

"The logic there is broken: it basically creates two csrows for each
DIMM and assumes that all DIMM's are dual rank."

That commit message doesn't explain why this assumption is wrong.

Furthermore, "If single rank memories are found, they'll be marked with
0 bytes."

So this looks to me like this commit is breaking dual-ranked DIMMs
configs in order to fix single-ranked ones.

> And this is the reason why I decided to not do it. We do have the hardware
> but last time I checked none of them had functional EINJ.

You don't need EINJ - there's arch/x86/kernel/cpu/mce/inject.c with
which you can do sw-only injection to test the decoding paths.
Aristeu Rozanski Sept. 26, 2022, 4:51 p.m. UTC | #4
On Thu, Sep 22, 2022 at 03:55:57PM +0200, Borislav Petkov wrote:
> On Thu, Sep 22, 2022 at 09:46:59AM -0400, Aristeu Rozanski wrote:
> > I'm only fixing a out of bounds access, on purpose.
> > Look at 64e1fdaf55d6 and the new version of determine_mtr().
> 
> I have been looking at that one. And I'm questioning it:
> 
> "The logic there is broken: it basically creates two csrows for each
> DIMM and assumes that all DIMM's are dual rank."
> 
> That commit message doesn't explain why this assumption is wrong.
> 
> Furthermore, "If single rank memories are found, they'll be marked with
> 0 bytes."
> 
> So this looks to me like this commit is breaking dual-ranked DIMMs
> configs in order to fix single-ranked ones.

Declare it dead. Only have a single working system and I don't have
physical access to it along with the fact it's unlikely we have memory
modules single and dual rank to swap around for testing purposes.
Borislav Petkov Sept. 26, 2022, 6:02 p.m. UTC | #5
On Mon, Sep 26, 2022 at 12:51:38PM -0400, Aristeu Rozanski wrote:
> Declare it dead. Only have a single working system and I don't have
> physical access to it along with the fact it's unlikely we have memory
> modules single and dual rank to swap around for testing purposes.

Wanna send a "depends on BROKEN" patch and explain why we're doing that
in the commit message?

Thx.
diff mbox series

Patch

--- linus-2.6.orig/drivers/edac/i5000_edac.c	2022-07-25 15:26:40.093989879 -0400
+++ linus-2.6/drivers/edac/i5000_edac.c	2022-07-26 14:32:23.644694778 -0400
@@ -1249,14 +1249,12 @@  static int i5000_init_csrows(struct mem_
 	struct i5000_pvt *pvt;
 	struct dimm_info *dimm;
 	int empty;
-	int max_csrows;
 	int mtr;
 	int csrow_megs;
 	int channel;
 	int slot;
 
 	pvt = mci->pvt_info;
-	max_csrows = pvt->maxdimmperch * 2;
 
 	empty = 1;		/* Assume NO memory */
 
@@ -1267,7 +1265,7 @@  struct i5000_pvt *pvt;
 	 * to map the dimms. A good cleanup would be to remove this array,
 	 * and do a loop here with branch, channel, slot
 	 */
-	for (slot = 0; slot < max_csrows; slot++) {
+	for (slot = 0; slot < pvt->maxdimmperch; slot++) {
 		for (channel = 0; channel < pvt->maxch; channel++) {
 
 			mtr = determine_mtr(pvt, slot, channel);