Message ID | 20241025024602.24318-4-qiuxu.zhuo@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/mce: Clean up some x86/mce code | expand |
On Fri, Oct 25, 2024 at 10:45:55AM +0800, Qiuxu Zhuo wrote: > @@ -1748,7 +1748,7 @@ static void mce_timer_delete_all(void) > * Can be called from interrupt context, but not from machine check/NMI > * context. > */ > -int mce_notify_irq(void) > +bool mce_notify_user(void) So why are you not fixing the comment above it too then? Have you audited all the code to make sure this function is not called from IRQ context anymore? Did you do git archeology to find out why it was called "_irq" at all in the first place and why is it ok to change the name and adjust the comment above it now? In any case, this change needs to be a separate patch along with all the explanations why is it ok to rename it. Thx.
Hi Boris, Thanks for your time reviewing the series. > From: Borislav Petkov <bp@alien8.de> > [...] > On Fri, Oct 25, 2024 at 10:45:55AM +0800, Qiuxu Zhuo wrote: > > @@ -1748,7 +1748,7 @@ static void mce_timer_delete_all(void) > > * Can be called from interrupt context, but not from machine check/NMI > > * context. > > */ > > -int mce_notify_irq(void) > > +bool mce_notify_user(void) > > So why are you not fixing the comment above it too then? > > Have you audited all the code to make sure this function is not called from > IRQ context anymore? Currently this function is called from either interrupt context (timer) or process context (notifier chain). > Did you do git archeology to find out why it was called "_irq" at all in the first Thanks for reminding me this. Commit 9ff36ee9668f ("x86, mce: rename mce_notify_user to mce_notify_irq") renamed mce_notify_user() to mce_notify_irq() to indicate that this function should only be called from interrupt context and not used in machine check or NMI context. > place and why is it ok to change the name and adjust the comment above it > now? At first glance, the function name mce_notify_irq() it looks like "MCE notifies IRQ ...", which is confusing and doesn't clearly reflect what it does. After the "git archeology" above and offline communication from Andi (the commit author), it was clarified that the suffix '_irq' was only used to indicate that the function can only be used in interrupt context. But I think the comments above the function clearly indicates which types of context it can be used in, so it doesn't need the suffix '_irq' in the function name. Renaming it back to mce_notify_user() can better reflect its function of notifying the user(s) about the new machine check events. > In any case, this change needs to be a separate patch along with all the > explanations why is it ok to rename it. OK. I'll sperate this patch in the next version. How about the following diff? From f33f7340a40d3db1f4acf3c9ded574d1b9801fa7 Mon Sep 17 00:00:00 2001 From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Date: Tue, 29 Oct 2024 09:07:47 +0800 Subject: [PATCH 1/1] x86/mce: Rename mce_notify_irq() back to mce_notify_user() Commit 9ff36ee9668f ("x86, mce: rename mce_notify_user to mce_notify_irq") renamed mce_notify_user() to mce_notify_irq() to indicate that this function should only be called from interrupt context and not used in machine check or NMI context. However, the function name mce_notify_irq() is confusing and doesn't clearly reflect what it does. The comments above the function clearly indicates which types of context it can be used in, so it doesn't need the suffix '_irq' in the function name. Rename it back to mce_notify_user() to better reflect its function of notifying the user(s) about the new machine check events and adjust the comment to indicate that it can also be called from process context. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> --- arch/x86/include/asm/mce.h | 2 +- arch/x86/kernel/cpu/mce/core.c | 10 +++++----- arch/x86/kernel/cpu/mce/inject.c | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 7a01bb5b19d3..dd9effd7c8b5 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -264,7 +264,7 @@ enum mcp_flags { void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); -bool mce_notify_irq(void); +bool mce_notify_user(void); DECLARE_PER_CPU(struct mce, injectm); diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 725c1d6fb1e5..5f42ab2ea944 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -584,7 +584,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val, set_bit(0, &mce_need_notify); - mce_notify_irq(); + mce_notify_user(); return NOTIFY_DONE; } @@ -1704,7 +1704,7 @@ static void mce_timer_fn(struct timer_list *t) * Alert userspace if needed. If we logged an MCE, reduce the polling * interval, otherwise increase the polling interval. */ - if (mce_notify_irq()) + if (mce_notify_user()) iv = max(iv / 2, (unsigned long) HZ/100); else iv = min(iv * 2, round_jiffies_relative(check_interval * HZ)); @@ -1745,10 +1745,10 @@ static void mce_timer_delete_all(void) /* * Notify the user(s) about new machine check events. - * Can be called from interrupt context, but not from machine check/NMI + * Can be called from process/interrupt context, but not from machine check/NMI * context. */ -bool mce_notify_irq(void) +bool mce_notify_user(void) { /* Not more than two messages every minute */ static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2); @@ -1763,7 +1763,7 @@ bool mce_notify_irq(void) } return false; } -EXPORT_SYMBOL_GPL(mce_notify_irq); +EXPORT_SYMBOL_GPL(mce_notify_user); static void __mcheck_cpu_mce_banks_init(void) { diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 49ed3428785d..f5a7dae385c6 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -229,7 +229,7 @@ static int raise_local(void) } else if (m->status) { pr_info("Starting machine check poll CPU %d\n", cpu); raise_poll(m); - mce_notify_irq(); + mce_notify_user(); pr_info("Machine check poll done on CPU %d\n", cpu); } else m->finished = 0; -- 2.17.1
On Tue, Oct 29, 2024 at 03:32:00AM +0000, Zhuo, Qiuxu wrote: > At first glance, the function name mce_notify_irq() it looks like "MCE > notifies IRQ ...", which is confusing and doesn't clearly reflect what it > does. Maybe to you but the name means exactly that - it is run in irq context. > But I think the comments above the function clearly indicates which types of > context it can be used in, so it doesn't need the suffix '_irq' in the > function name. Renaming it back to mce_notify_user() can better reflect its > function of notifying the user(s) about the new machine check events. Who else would you be notifying except the users?! > renamed mce_notify_user() to mce_notify_irq() to indicate that this function > should only be called from interrupt context and not used in machine check > or NMI context. However, the function name mce_notify_irq() is confusing and > doesn't clearly reflect what it does. Maybe it confuses you only. Considering how there's not a notify-in-NMI/MCE counterpart, I guess this function could be renamed to "mce_notify" simply. Or not do anything at all. It has been that way for over a decade and hasn't bothered anyone. Let's not get overeager. Thx.
Hi Boris, > From: Borislav Petkov <bp@alien8.de> > [...] > Or not do anything at all. It has been that way for over a decade and hasn't > bothered anyone. Let's not get overeager. Thanks for letting me know your thoughts. OK. I'll drop this part in the next version. -Qiuxu
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 3b9970117a0f..dd9effd7c8b5 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -244,7 +244,7 @@ static inline void cmci_rediscover(void) {} static inline void cmci_recheck(void) {} #endif -int mce_available(struct cpuinfo_x86 *c); +bool mce_available(struct cpuinfo_x86 *c); bool mce_is_memory_error(struct mce *m); bool mce_is_correctable(struct mce *m); bool mce_usable_address(struct mce *m); @@ -264,7 +264,7 @@ enum mcp_flags { void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); -int mce_notify_irq(void); +bool mce_notify_user(void); DECLARE_PER_CPU(struct mce, injectm); diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 14bf8c232e45..4dae9841ee38 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -381,7 +381,7 @@ static bool lvt_interrupt_supported(unsigned int bank, u32 msr_high_bits) return msr_high_bits & BIT(28); } -static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi) +static bool lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi) { int msr = (hi & MASK_LVTOFF_HI) >> 20; @@ -389,7 +389,7 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi) pr_err(FW_BUG "cpu %d, failed to setup threshold interrupt " "for bank %d, block %d (MSR%08X=0x%x%08x)\n", b->cpu, b->bank, b->block, b->address, hi, lo); - return 0; + return false; } if (apic != msr) { @@ -399,15 +399,15 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi) * was set is reserved. Return early here: */ if (mce_flags.smca) - return 0; + return false; pr_err(FW_BUG "cpu %d, invalid threshold interrupt offset %d " "for bank %d, block %d (MSR%08X=0x%x%08x)\n", b->cpu, apic, b->bank, b->block, b->address, hi, lo); - return 0; + return false; } - return 1; + return true; }; /* Reprogram MCx_MISC MSR behind this threshold bank. */ diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2a938f429c4d..57c05015f984 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -479,10 +479,10 @@ static noinstr void mce_gather_info(struct mce *m, struct pt_regs *regs) } } -int mce_available(struct cpuinfo_x86 *c) +bool mce_available(struct cpuinfo_x86 *c) { if (mca_cfg.disabled) - return 0; + return false; return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA); } @@ -584,7 +584,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val, set_bit(0, &mce_need_notify); - mce_notify_irq(); + mce_notify_user(); return NOTIFY_DONE; } @@ -1704,7 +1704,7 @@ static void mce_timer_fn(struct timer_list *t) * Alert userspace if needed. If we logged an MCE, reduce the polling * interval, otherwise increase the polling interval. */ - if (mce_notify_irq()) + if (mce_notify_user()) iv = max(iv / 2, (unsigned long) HZ/100); else iv = min(iv * 2, round_jiffies_relative(check_interval * HZ)); @@ -1748,7 +1748,7 @@ static void mce_timer_delete_all(void) * Can be called from interrupt context, but not from machine check/NMI * context. */ -int mce_notify_irq(void) +bool mce_notify_user(void) { /* Not more than two messages every minute */ static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2); @@ -1759,11 +1759,11 @@ int mce_notify_irq(void) if (__ratelimit(&ratelimit)) pr_info(HW_ERR "Machine check events logged\n"); - return 1; + return true; } - return 0; + return false; } -EXPORT_SYMBOL_GPL(mce_notify_irq); +EXPORT_SYMBOL_GPL(mce_notify_user); static void __mcheck_cpu_mce_banks_init(void) { @@ -1985,25 +1985,25 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) return 0; } -static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c) +static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c) { if (c->x86 != 5) - return 0; + return false; switch (c->x86_vendor) { case X86_VENDOR_INTEL: intel_p5_mcheck_init(c); mce_flags.p5 = 1; - return 1; + return true; case X86_VENDOR_CENTAUR: winchip_mcheck_init(c); mce_flags.winchip = 1; - return 1; + return true; default: - return 0; + return false; } - return 0; + return false; } /* diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 49ed3428785d..f5a7dae385c6 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -229,7 +229,7 @@ static int raise_local(void) } else if (m->status) { pr_info("Starting machine check poll CPU %d\n", cpu); raise_poll(m); - mce_notify_irq(); + mce_notify_user(); pr_info("Machine check poll done on CPU %d\n", cpu); } else m->finished = 0; diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index b3cd2c61b11d..f863df0ff42c 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -75,12 +75,12 @@ static u16 cmci_threshold[MAX_NR_BANKS]; */ #define CMCI_STORM_THRESHOLD 32749 -static int cmci_supported(int *banks) +static bool cmci_supported(int *banks) { u64 cap; if (mca_cfg.cmci_disabled || mca_cfg.ignore_ce) - return 0; + return false; /* * Vendor check is not strictly needed, but the initial @@ -89,10 +89,11 @@ static int cmci_supported(int *banks) */ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL && boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN) - return 0; + return false; if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6) - return 0; + return false; + rdmsrl(MSR_IA32_MCG_CAP, cap); *banks = min_t(unsigned, MAX_NR_BANKS, cap & MCG_BANKCNT_MASK); return !!(cap & MCG_CMCI_P);