Message ID | 20200827075600.22335-1-gregor.herburger@ew.tq-group.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/1] edac: fsl_ddr_edac: fix expected data message | expand |
On Thu, Aug 27, 2020 at 09:56:00AM +0200, Gregor Herburger wrote: > When a correctable single bit error occurs, the driver calculates the > bad_data_bit respectively the bad_ecc_bit. If there is no error in the > corresponding data, the value becomes -1. With this the expected data > message is calculated. > > In the case of an error in the lower 32 bits or no error (-1) the right > side operand of the bit-shift becomes negative which is undefined > behavior. > > This can result in wrong and misleading messages like this: > [ 311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36 > [ 311.108490] EDAC FSL_DDR MC0: Expected Data / ECC: 0xffffffef_ffffffff / 0x80000059 > [ 311.116135] EDAC FSL_DDR MC0: Captured Data / ECC: 0xffffffff_ffffffef / 0x59 > > Fix this by only calculating the expected data where the error occurred. > > With the fix the dmesg output looks like this: > [ 311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36 > [ 311.108490] EDAC FSL_DDR MC0: Expected Data / ECC: 0xffffffef_ffffffef / 0x59 > [ 311.116135] EDAC FSL_DDR MC0: Captured Data / ECC: 0xffffffff_ffffffef / 0x59 > > Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> > --- > drivers/edac/fsl_ddr_edac.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/edac/fsl_ddr_edac.c b/drivers/edac/fsl_ddr_edac.c > index 6d8ea226010d..4b6989cf1947 100644 > --- a/drivers/edac/fsl_ddr_edac.c > +++ b/drivers/edac/fsl_ddr_edac.c > @@ -343,9 +343,9 @@ static void fsl_mc_check(struct mem_ctl_info *mci) > > fsl_mc_printk(mci, KERN_ERR, > "Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n", > - cap_high ^ (1 << (bad_data_bit - 32)), > - cap_low ^ (1 << bad_data_bit), > - syndrome ^ (1 << bad_ecc_bit)); > + (bad_data_bit > 31) ? cap_high ^ (1 << (bad_data_bit - 32)) : cap_high, > + (bad_data_bit <= 31) ? cap_low ^ (1 << (bad_data_bit)) : cap_low, But if bad_data_bit is -1, this check above will hit and you'd still shift by -1, IINM. How about you fix it properly, clean it up and make it more readable in the process (pasting the code directly instead of a diff because a diff is less readable): if ((err_detect & DDR_EDE_SBE) && (bus_width == 64)) { sbe_ecc_decode(cap_high, cap_low, syndrome, &bad_data_bit, &bad_ecc_bit); if (bad_data_bit != -1) { if (bad_data_bit > 31) cap_high ^= 1 << (bad_data_bit - 32); else cap_low ^= 1 << bad_data_bit; fsl_mc_printk(mci, KERN_ERR, "Faulty Data bit: %d\n", bad_data_bit); fsl_mc_printk(mci, KERN_ERR, "Expected Data: %#8.8x_%08x\n", cap_high, cap_low); } if (bad_ecc_bit != -1) { fsl_mc_printk(mci, KERN_ERR, "Faulty ECC bit: %d\n", bad_ecc_bit); fsl_mc_printk(mci, KERN_ERR, "Expected ECC: %#2.2x\n", syndrome ^ (1 << bad_ecc_bit)); } } This way you print only when the respective faulty bits have been properly found and not print anything otherwise. Hmm?
diff --git a/drivers/edac/fsl_ddr_edac.c b/drivers/edac/fsl_ddr_edac.c index 6d8ea226010d..4b6989cf1947 100644 --- a/drivers/edac/fsl_ddr_edac.c +++ b/drivers/edac/fsl_ddr_edac.c @@ -343,9 +343,9 @@ static void fsl_mc_check(struct mem_ctl_info *mci) fsl_mc_printk(mci, KERN_ERR, "Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n", - cap_high ^ (1 << (bad_data_bit - 32)), - cap_low ^ (1 << bad_data_bit), - syndrome ^ (1 << bad_ecc_bit)); + (bad_data_bit > 31) ? cap_high ^ (1 << (bad_data_bit - 32)) : cap_high, + (bad_data_bit <= 31) ? cap_low ^ (1 << (bad_data_bit)) : cap_low, + (bad_ecc_bit != -1) ? syndrome ^ (1 << (bad_ecc_bit)) : syndrome); } fsl_mc_printk(mci, KERN_ERR,
When a correctable single bit error occurs, the driver calculates the bad_data_bit respectively the bad_ecc_bit. If there is no error in the corresponding data, the value becomes -1. With this the expected data message is calculated. In the case of an error in the lower 32 bits or no error (-1) the right side operand of the bit-shift becomes negative which is undefined behavior. This can result in wrong and misleading messages like this: [ 311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36 [ 311.108490] EDAC FSL_DDR MC0: Expected Data / ECC: 0xffffffef_ffffffff / 0x80000059 [ 311.116135] EDAC FSL_DDR MC0: Captured Data / ECC: 0xffffffff_ffffffef / 0x59 Fix this by only calculating the expected data where the error occurred. With the fix the dmesg output looks like this: [ 311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36 [ 311.108490] EDAC FSL_DDR MC0: Expected Data / ECC: 0xffffffef_ffffffef / 0x59 [ 311.116135] EDAC FSL_DDR MC0: Captured Data / ECC: 0xffffffff_ffffffef / 0x59 Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> --- drivers/edac/fsl_ddr_edac.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)