Message ID | 20241209215636.2744733-1-avadhut.naik@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | EDAC/amd64: Fix possible module load failure on some UMC usage combinations | expand |
On Mon, Dec 09, 2024 at 09:55:10PM +0000, Avadhut Naik wrote: > Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per > socket. > > When the amd64_edac module is being loaded, these UMCs are traversed to > determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30]) > bits set and create masks in umc_en_mask and ecc_en_mask respectively. > > However, the current data type of these variables is u8. As a result, if > only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized, > umc_ecc_enabled() will return false. Consequently, the module may fail to > load on these systems. > > Change the data type of these variables to u16. No need to explain what the patch does. The "why" is enough. > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- > drivers/edac/amd64_edac.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) This looks like it needs a CC:stable and a Fixes: tag, right? While at it, you can simply make those vars int and be done with it. Thx.
On 12/10/2024 03:55, Borislav Petkov wrote: > On Mon, Dec 09, 2024 at 09:55:10PM +0000, Avadhut Naik wrote: >> Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per >> socket. >> >> When the amd64_edac module is being loaded, these UMCs are traversed to >> determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30]) >> bits set and create masks in umc_en_mask and ecc_en_mask respectively. >> >> However, the current data type of these variables is u8. As a result, if >> only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized, >> umc_ecc_enabled() will return false. Consequently, the module may fail to >> load on these systems. >> >> Change the data type of these variables to u16. > > No need to explain what the patch does. The "why" is enough. > Will remove this line. >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> >> --- >> drivers/edac/amd64_edac.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > This looks like it needs a CC:stable and a Fixes: tag, right? > It does need a Fixes: tag. Will add that. But not so sure about the CC:stable. Quoting from stable-kernel-rules about the patches that are accepted into stable. It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). This has not been reported by anyone yet. IMO, it classifies as "This could be a problem" type issue. Would you agree? > While at it, you can simply make those vars int and be done with it. > Okay, will change the data type of these 2 variables to int. > Thx. >
On Tue, Dec 10, 2024 at 11:32:06AM -0600, Naik, Avadhut wrote: > But not so sure about the CC:stable. > Quoting from stable-kernel-rules about the patches that are accepted into stable. > > It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). > > This has not been reported by anyone yet. IMO, it classifies as > "This could be a problem" type issue. > Would you agree? So if someone populates the last 4 UMCs, he won't have EDAC load on long-term kernels. So are you sure about "this could be a problem"? Or is it more like a "this is real problem for some configurations" issue? IOW, at least "oh, that's not good" issue. Or?
On 12/10/2024 12:04, Borislav Petkov wrote: > On Tue, Dec 10, 2024 at 11:32:06AM -0600, Naik, Avadhut wrote: >> But not so sure about the CC:stable. >> Quoting from stable-kernel-rules about the patches that are accepted into stable. >> >> It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). >> >> This has not been reported by anyone yet. IMO, it classifies as >> "This could be a problem" type issue. >> Would you agree? > > So if someone populates the last 4 UMCs, he won't have EDAC load on long-term > kernels. So are you sure about "this could be a problem"? > > Or is it more like a "this is real problem for some configurations" issue? > > IOW, at least "oh, that's not good" issue. > > Or? Agreed that this is a real problem for some configurations, particularly since the module wont load, even on long term kernels. Will add stable to Cc and resend. >
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index ddfbdb66b794..583685e8e60f 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3362,7 +3362,7 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt) static bool umc_ecc_enabled(struct amd64_pvt *pvt) { - u8 umc_en_mask = 0, ecc_en_mask = 0; + u16 umc_en_mask = 0, ecc_en_mask = 0; u16 nid = pvt->mc_node_id; struct amd64_umc *umc; u8 ecc_en = 0, i;
Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per socket. When the amd64_edac module is being loaded, these UMCs are traversed to determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30]) bits set and create masks in umc_en_mask and ecc_en_mask respectively. However, the current data type of these variables is u8. As a result, if only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized, umc_ecc_enabled() will return false. Consequently, the module may fail to load on these systems. Change the data type of these variables to u16. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> --- drivers/edac/amd64_edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)