Message ID | YWCPyK+xotTgUMy/@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] arm64 fixes for 5.15-rc5 | expand |
On Fri, Oct 8, 2021 at 11:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > Pingfan Liu (2): > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch optional > arm64: entry: avoid double-accounting IRQ RCU entry Ugh. This is *really* ugly. And it seems to be going exactly the wrong way. I read the commit descriptions, and it still doesn't answer the fundamental question of why arm64 needs to do the accounting in arch-specific code, and disable the generic code. It says To fix this, we must perform all the accounting from the architecture code. We prevent the IRQ domain code from performing any accounting by selecting HAVE_ARCH_IRQENTRY, and must call irq_enter_rcu() and irq_exit_rcu() around invoking the root IRQ handler. but at no point does it actually explain *why* all the accounting needs to be done by the architecture code. Yes, yes, I read the previous paragraph. But why isn't the fix to just stop doing the double accounting in the arm64 specific code? Instead it doubles down on that "let's do this non-arch-specific accounting in arch-specific code", making the common code uglier and weaker. I initially pulled this, and then I just unpulled in disgust. Please explain why arm64 does this bad thing, and why the fix isn't "fix arm64", but instead "make the generic code uglier and harder to maintain and follow". Really, from all the explanations those commits give, the natural thing to do would be "just fix arm64". So if that really isn't the answer, then the explanations are clearly lacking. Linus
Hi Linus, [adding Paul and Peter, just in case we need to discuss the RCU and accounting bits further] On Fri, Oct 08, 2021 at 01:25:51PM -0700, Linus Torvalds wrote: > On Fri, Oct 8, 2021 at 11:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > Pingfan Liu (2): > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch optional > > arm64: entry: avoid double-accounting IRQ RCU entry > > Ugh. This is *really* ugly. And it seems to be going exactly the wrong way. > > I read the commit descriptions, and it still doesn't answer the > fundamental question of why arm64 needs to do the accounting in > arch-specific code, and disable the generic code. > > It says > > To fix this, we must perform all the accounting from the architecture > code. We prevent the IRQ domain code from performing any accounting by > selecting HAVE_ARCH_IRQENTRY, and must call irq_enter_rcu() and > irq_exit_rcu() around invoking the root IRQ handler. > > but at no point does it actually explain *why* all the accounting > needs to be done by the architecture code. Sorry; I agree that commit messages don't explain this thoroughly. I can go and rework the commit messages to clarify things, if you'd like? The TL;DR is that a bunch of constraints conspire such that we can't defer accounting things to the irqdomain or irqchip code without making this even more complicated/fragile, and many architectures get this subtly wrong today -- it's not that arm64 is necessarily much different from other architectures using this code, just that we're trying to solve this first. More specifically, the problem here is the combination of: * During entry, rcu_irq_enter() must be called *before* trace_hardirqs_off_finish(), because the latter needs RCU to be watching, but we need to have informed lockdep before poking RCU or lockdep will complain withing the RCU code. To handle that, kernel/entry/common.c and arch/arm64/kernel/entry-common.c have the sequence: lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(): // or rcu_irq_enter_check_tick(); trace_hardirqs_off_finish(); ... and since we don't want something to come in the middle of the sequence, we want those sandwiched together in a noinstr function called before we invoke the root irqchip's handler function. A bunch of architectures don't follow this sequence, and are potentially subtly broken today in some configurations. Note: the plan is to move arm64 over to the generic entry code, but that has no effect on this specific issue. * rcu_irq_enter() must be called *precisely once* upon taking an interrupt exception, or we can end up mis-accounting quiescent periods as non-quiescent (since this maintains a nesting count in rcu_data::dynticks_nmi_nesting, checked by rcu_is_cpu_rrupt_from_idle()). * Prior to these patches, we call rcu_irq_enter() at least twice on arm64, because the architectural entry code must call rcu_irq_enter() for the lockdep bits, then when we invoke the irqchip (e.g. GICv3), we do: gic_handle_irq() handle_domain_irq() irq_enter() rcu_irq_enter() irq_enter_rcu() ... and so if we take a sched clock IRQ and call rcu_sched_clock_irq(), rcu_is_cpu_rrupt_from_idle() may return false when it should return true(), and we don't decide to preempt the task, leading to stalls. Note that since irqchips can be chained (e.g. there can be a second interrupt controller fed into the GIC), handle_domain_irq() can be nested, so even if we somehow removed the accounting from arch code we'd need to handle the nested calls to handle_domain_irq() somehow. AFAICT it's far simpler to do that *once* outside of the irqchip code than it is to try to handle that nesting. Note that x86 doesn't use handle_domain_irq(), and just maps the raw irqnrs itself, and just calls irq_enter_rcu() when transitioning to the IRQ stack. > Yes, yes, I read the previous paragraph. But why isn't the fix to just > stop doing the double accounting in the arm64 specific code? As above, that'd require moving *some* of the low-level entry logic into the irqchip/irqdomain code, and handling nesting, which is *more* fragile. FWIW, we do need to fix the other architectures too, but that involves more substantial rework to each of those (e.g. moving to generic entry), since there are a bunch of problems today. The thinking was that this gave a way to fix each of those on its own, then remove the old behaviour. Does that all make sense to you? Thanks, Mark.
On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote: > > Sorry; I agree that commit messages don't explain this thoroughly. I can > go and rework the commit messages to clarify things, if you'd like? So I really hate that patch, even with explanation. Having the Kconfig option for "do the right thing" is not how this should be fixed. Either the generic code is generic, or it isn't. And if the generic code doesn't work for arm64, we shouldn't add a random Kconfig option for "this architecture wants different behavior". > The TL;DR is that a bunch of constraints conspire such that we can't > defer accounting things to the irqdomain or irqchip code without making > this even more complicated/fragile, and many architectures get this > subtly wrong today -- it's not that arm64 is necessarily much different > from other architectures using this code, just that we're trying to > solve this first. So then I think the fix is to just say "accounting in this generic function is wrong, and the accounting needs to be moved to the callers". That's particularly true if you think arm64 is only the only actually _tested_ case that gets this wrong, and other architectures most likely have the exact same issue, but you only fixed it for arm64. So do it unconditionally - possibly even using a coccinelle script if there are lots of callers. Because generic code that just isn't generic, but randomly does different things based on subtle Kconfig options that are different for different architectures is bad, bad, bad. And yes, I realize that we've had that kind of stuff before (and we have odd Kconfig option testing in that irqdesc.c file elsewhere), but the Kconfig options we currently have are mostly either (a) actual real honest-to-goodness config options with semantic meaning (ie things like CONFIG_SMP and CONFIG_NUMA) (b) really ugly workarounds for odd special module exports (that CONFIG_KVM_BOOK3S_64_HV_MODULE thing for irq_to_desc() that we *tried* but failed to remove). And so the reason I really hate that patch is that it introduces a new "different architectures randomly and inexplicably do different things, and the generic behavior is very different on arm64 than it is elsewhere". That's just the worst kind of hack to me. And in this case, it's really *horribly* hard to see what the call chain is. It all ends up being actively obfuscated and obscured through that 'handle_arch_irq' function pointer, that is sometimes set through set_handle_irq(), and sometimes set directly. I really think that if the rule is "we can't do accounting in handle_domain_irq(), because it's too late for arm64", then the fix really should be to just not do that. Move the irq_enter()/irq_exit() to the callers - quite possibly far up the call chain to the root of it all, and just say "architecture code needs to do this in the low-level code before calling handle_arch_irq". Or, if it turns out that 99% of callers do want it - and don't have the issues arm64 has - maybe we can have a helper wrapper that does the irq_enter/irq_exit, and another version that doesn't do it, and at least it's clear to the callers which one it is. But it looks like particularly with the odd indirection through that handle_arch_irq function, it's best to just say "this needs to be done". What architectures actually care? From what I can follow, it's really GENERIC_IRQ_MULTI_HANDLER that ends up doing this all, and then arm64 has it's own special case for no obvious reason (why isn't it using GENERIC_IRQ_MULTI_HANDLER that seems to do the EXACT same thing in generic code except for a special "default != NULL" case?) Anyway, it _looks_ to me like the pattern is very simple: Step 1: - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. This clearly doesn't change anything at all, but also doesn't fix the problem you have. But it's easy to verify that the code is the same before-and-after. Step 2 is the pattern matching step: - if the caller of handle_domain_irq() ends up being a function that is registered with set_handle_irq(), then we (a) remove the irq_enter/irq_exit from it (b) add it to the architectures that use handle_arch_irq. (c) make sure that if there are other callers of it (not through handle_arch_irq) we move that irq_enter/irq_exit into them too I _suspect_ - but didn't check - that Step 2(c) doesn't actually exist. But who knows. It really looks like there is a very tight connection between "uses handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No? Is this a bit more work? Yes. Would this fix arm64? Yes - it ends up effectively doing what you did. Would this fix _other_ architectures doing the same thing that you suspect are broken? YES. Instead of leaving them randomly broken. And most importantly, it would make the rules for "generic" code clear, and actually generic. Now, it may be that I'm wrong. I'm willing to be convinced, and if you beat me over the head enough I guess I can take that pull request and just be unhappy about it. So I'm not trying to draw some line in the sand and be an arsehole and say "you have to do it this way". But I look at that patch in that pull request, and it really screams "this is horribly wrong" to me. And I _think_ the above suggestion can be done almost mechanically. And maybe it's easier to combine step1/step2 into one thing, if that handle_domain_irq/handle_arch_irq" use is really as tightly bound as it seems to be from a quick look. So I'm not saying that they have to be separate steps, I wrote them out that way mainly to kind of show the logic of the change - and in case there are situations where you end up calling things without going through that handle_arch_irq thing. Linus
Linus, On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote: > On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote: > And so the reason I really hate that patch is that it introduces a new > "different architectures randomly and inexplicably do different > things, and the generic behavior is very different on arm64 than it is > elsewhere". > > That's just the worst kind of hack to me. > > And in this case, it's really *horribly* hard to see what the call > chain is. It all ends up being actively obfuscated and obscured > through that 'handle_arch_irq' function pointer, that is sometimes set > through set_handle_irq(), and sometimes set directly. > > I really think that if the rule is "we can't do accounting in > handle_domain_irq(), because it's too late for arm64", then the fix > really should be to just not do that. > > Move the irq_enter()/irq_exit() to the callers - quite possibly far up > the call chain to the root of it all, and just say "architecture code > needs to do this in the low-level code before calling > handle_arch_irq". That's where it belongs. It's mandatory to have it there for NOHZ_FULL to work correctly vs. instrumentation etc. I've pointed that out back then after we fed the X86 entry code into the mincer and added noinstr sections to keep tracers, BPF and kprobes away from it. Looking at the architectures which "support" that by selecting HAVE_CONTEXT_TRACKING: arch/arm/Kconfig: select HAVE_CONTEXT_TRACKING arch/arm64/Kconfig: select HAVE_CONTEXT_TRACKING arch/csky/Kconfig: select HAVE_CONTEXT_TRACKING arch/mips/Kconfig: select HAVE_CONTEXT_TRACKING arch/powerpc/Kconfig: select HAVE_CONTEXT_TRACKING if PPC64 arch/riscv/Kconfig: select HAVE_CONTEXT_TRACKING arch/sparc/Kconfig: select HAVE_CONTEXT_TRACKING arch/x86/Kconfig: select HAVE_CONTEXT_TRACKING if X86_64 S390 and X86 are (mostly) complete and use the generic entry code. S390 does not even select HAVE_CONTEXT_TRACKING! PPC64 has done quite some work to fix that, but it looks not yet complete. Mark is working on ARM64. There is some effort underway to convert MIPS over to generic entry. The rest needs all the fundamental architecture side changes. > Anyway, it _looks_ to me like the pattern is very simple: > > Step 1: > - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. > > This clearly doesn't change anything at all, but also doesn't fix the > problem you have. But it's easy to verify that the code is the same > before-and-after. > > Step 2 is the pattern matching step: > > - if the caller of handle_domain_irq() ends up being a function that > is registered with set_handle_irq(), then we > (a) remove the irq_enter/irq_exit from it > (b) add it to the architectures that use handle_arch_irq. > (c) make sure that if there are other callers of it (not through > handle_arch_irq) we move that irq_enter/irq_exit into them too > > I _suspect_ - but didn't check - that Step 2(c) doesn't actually > exist. But who knows. It only exists with chained handlers, but they do not need that at all because: irq_enter() arch_handle_irq() handle_domain_irq() chained_handler() handle_domain_irq() which is still the same interrupt context and not a nested interrupt. > It really looks like there is a very tight connection between "uses > handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No? Looks like. That might conflict with the MIPS rework though. I don't know how far that came already. Cc'ed the MIPS people. Thanks, tglx
Hi Linus, Thomas, A couple of minor comments below -- I'll have a go at the rework Linus suggested and see what blows up. On Tue, Oct 12, 2021 at 03:18:16PM +0200, Thomas Gleixner wrote: > On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote: > > On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote: > > And so the reason I really hate that patch is that it introduces a new > > "different architectures randomly and inexplicably do different > > things, and the generic behavior is very different on arm64 than it is > > elsewhere". > > > > That's just the worst kind of hack to me. > > > > And in this case, it's really *horribly* hard to see what the call > > chain is. It all ends up being actively obfuscated and obscured > > through that 'handle_arch_irq' function pointer, that is sometimes set > > through set_handle_irq(), and sometimes set directly. > > > > I really think that if the rule is "we can't do accounting in > > handle_domain_irq(), because it's too late for arm64", then the fix > > really should be to just not do that. > > > > Move the irq_enter()/irq_exit() to the callers - quite possibly far up > > the call chain to the root of it all, and just say "architecture code > > needs to do this in the low-level code before calling > > handle_arch_irq". > > That's where it belongs. It's mandatory to have it there for NOHZ_FULL > to work correctly vs. instrumentation etc. I've pointed that out back > then after we fed the X86 entry code into the mincer and added noinstr > sections to keep tracers, BPF and kprobes away from it. > > Looking at the architectures which "support" that by selecting > HAVE_CONTEXT_TRACKING: > > arch/arm/Kconfig: select HAVE_CONTEXT_TRACKING > arch/arm64/Kconfig: select HAVE_CONTEXT_TRACKING > arch/csky/Kconfig: select HAVE_CONTEXT_TRACKING > arch/mips/Kconfig: select HAVE_CONTEXT_TRACKING > arch/powerpc/Kconfig: select HAVE_CONTEXT_TRACKING if PPC64 > arch/riscv/Kconfig: select HAVE_CONTEXT_TRACKING > arch/sparc/Kconfig: select HAVE_CONTEXT_TRACKING > arch/x86/Kconfig: select HAVE_CONTEXT_TRACKING if X86_64 > > S390 and X86 are (mostly) complete and use the generic entry code. S390 > does not even select HAVE_CONTEXT_TRACKING! > > PPC64 has done quite some work to fix that, but it looks not yet complete. > > Mark is working on ARM64. > > There is some effort underway to convert MIPS over to generic entry. > > The rest needs all the fundamental architecture side changes. > > > Anyway, it _looks_ to me like the pattern is very simple: > > > > Step 1: > > - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. > > > > This clearly doesn't change anything at all, but also doesn't fix the > > problem you have. But it's easy to verify that the code is the same > > before-and-after. I'm happy with this in principle. The only reason we didn't go down that route initially is because the callers are (typically) in the bowels of arch asm or platform code, they all need to be fixed in one go to avoid breaking anything, and it's a headache if we collide with any rework (e.g. MIPS moving to generic entry). As above, I'll have a go and see what blows up. It should be possible to have C wrappers for the common cases, and have the asm call that instead of branching directly to whichever irqchip handler handle_arch_irq points at. > > Step 2 is the pattern matching step: > > > > - if the caller of handle_domain_irq() ends up being a function that > > is registered with set_handle_irq(), then we > > (a) remove the irq_enter/irq_exit from it > > (b) add it to the architectures that use handle_arch_irq. > > (c) make sure that if there are other callers of it (not through > > handle_arch_irq) we move that irq_enter/irq_exit into them too > > > > I _suspect_ - but didn't check - that Step 2(c) doesn't actually > > exist. But who knows. > > It only exists with chained handlers, but they do not need that at all > because: > > irq_enter() > arch_handle_irq() > handle_domain_irq() > chained_handler() > handle_domain_irq() ... and this is exactly the shape of case where we need to avoid the nested calls so as to not break RCU's view of nesting. > which is still the same interrupt context and not a nested interrupt. > > It really looks like there is a very tight connection between "uses > > handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No? > > Looks like. That might conflict with the MIPS rework though. I don't > know how far that came already. Cc'ed the MIPS people. There's also a bunch of old platforms on arch/arm which have a hard-coded handler (so not using handle_arch_irq/set_handle_irq()) which calls handle_domain_irq() -- those can be fixed up. Thanks, Mark.
On Tue, Oct 12 2021 at 15:02, Mark Rutland wrote: > On Tue, Oct 12, 2021 at 03:18:16PM +0200, Thomas Gleixner wrote: >> On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote: > I'm happy with this in principle. The only reason we didn't go down that > route initially is because the callers are (typically) in the bowels of > arch asm or platform code, they all need to be fixed in one go to avoid > breaking anything, and it's a headache if we collide with any rework > (e.g. MIPS moving to generic entry). mips-next looks pretty empty vs. that. >> > It really looks like there is a very tight connection between "uses >> > handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No? >> >> Looks like. That might conflict with the MIPS rework though. I don't >> know how far that came already. Cc'ed the MIPS people. > > There's also a bunch of old platforms on arch/arm which have a > hard-coded handler (so not using handle_arch_irq/set_handle_irq()) which > calls handle_domain_irq() -- those can be fixed up. If that turns out to be ugly, then somehting like the below might be less horrible as a stop gap. Thanks, tglx --- --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -268,20 +268,16 @@ void xen_send_IPI_allbutself(int vector) static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id) { - irq_enter(); generic_smp_call_function_interrupt(); inc_irq_stat(irq_call_count); - irq_exit(); return IRQ_HANDLED; } static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id) { - irq_enter(); generic_smp_call_function_single_interrupt(); inc_irq_stat(irq_call_count); - irq_exit(); return IRQ_HANDLED; } --- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -458,10 +458,8 @@ static void xen_pv_stop_other_cpus(int w static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id) { - irq_enter(); irq_work_run(); inc_irq_stat(apic_irq_work_irqs); - irq_exit(); return IRQ_HANDLED; } --- a/arch/Kconfig +++ b/arch/Kconfig @@ -33,6 +33,9 @@ config HOTPLUG_SMT config GENERIC_ENTRY bool +config ARCH_ENTRY_RCU_CLEAN + bool + config KPROBES bool "Kprobes" depends on MODULES --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -66,6 +66,7 @@ config X86 select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG select ARCH_ENABLE_SPLIT_PMD_PTLOCK if (PGTABLE_LEVELS > 2) && (X86_64 || X86_PAE) select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE + select ARCH_ENTRY_RCU_CLEAN select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI select ARCH_HAS_CACHE_LINE_SIZE select ARCH_HAS_DEBUG_VIRTUAL --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -677,24 +677,13 @@ int generic_handle_domain_irq(struct irq EXPORT_SYMBOL_GPL(generic_handle_domain_irq); #ifdef CONFIG_HANDLE_DOMAIN_IRQ -/** - * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain, - * usually for a root interrupt controller - * @domain: The domain where to perform the lookup - * @hwirq: The HW irq number to convert to a logical one - * @regs: Register file coming from the low-level handling code - * - * Returns: 0 on success, or -EINVAL if conversion has failed - */ -int handle_domain_irq(struct irq_domain *domain, - unsigned int hwirq, struct pt_regs *regs) +static int __handle_domain_irq(struct irq_domain *domain, + unsigned int hwirq, struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); struct irq_desc *desc; int ret = 0; - irq_enter(); - /* The irqdomain code provides boundary checks */ desc = irq_resolve_mapping(domain, hwirq); if (likely(desc)) @@ -702,12 +691,41 @@ int handle_domain_irq(struct irq_domain else ret = -EINVAL; - irq_exit(); set_irq_regs(old_regs); return ret; } /** + * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain, + * usually for a root interrupt controller + * @domain: The domain where to perform the lookup + * @hwirq: The HW irq number to convert to a logical one + * @regs: Register file coming from the low-level handling code + * + * Returns: 0 on success, or -EINVAL if conversion has failed + */ +#ifdef CONFIG_ARCH_ENTRY_RCU_CLEAN +int handle_domain_irq(struct irq_domain *domain, + unsigned int hwirq, struct pt_regs *regs) +{ + __handle_domain_irq(domain, hwirq, regs); +} +#else +int handle_domain_irq(struct irq_domain *domain, + unsigned int hwirq, struct pt_regs *regs) +{ + /* + * irq_enter()/exit() has to be done in low level + * architecture code. Bandaid for not yet fixed + * architectures. + */ + irq_enter(); + __handle_domain_irq(domain, hwirq, regs); + irq_exit(); +} +#endif + +/** * handle_domain_nmi - Invoke the handler for a HW irq belonging to a domain * @domain: The domain where to perform the lookup * @hwirq: The HW irq number to convert to a logical one --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -601,6 +601,7 @@ void irq_enter_rcu(void) account_hardirq_enter(current); } +#ifndef ARCH_ENTRY_RCU_CLEAN /** * irq_enter - Enter an interrupt context including RCU update */ @@ -609,6 +610,7 @@ void irq_enter(void) rcu_irq_enter(); irq_enter_rcu(); } +#endif static inline void tick_irq_exit(void) { @@ -650,6 +652,7 @@ void irq_exit_rcu(void) lockdep_hardirq_exit(); } +#ifndef ARCH_ENTRY_RCU_CLEAN /** * irq_exit - Exit an interrupt context, update RCU and lockdep * @@ -662,6 +665,7 @@ void irq_exit(void) /* must be last! */ lockdep_hardirq_exit(); } +#endif /* * This function must run with irqs disabled!
Hi Linus, I am not in the Cc list, so just aware of it the day before yesterday. On Mon, Oct 11, 2021 at 12:54:49PM -0700, Linus Torvalds wrote: > On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > Sorry; I agree that commit messages don't explain this thoroughly. I can > > go and rework the commit messages to clarify things, if you'd like? > > So I really hate that patch, even with explanation. > > Having the Kconfig option for "do the right thing" is not how this > should be fixed. > > Either the generic code is generic, or it isn't. > > And if the generic code doesn't work for arm64, we shouldn't add a > random Kconfig option for "this architecture wants different > behavior". > > > The TL;DR is that a bunch of constraints conspire such that we can't > > defer accounting things to the irqdomain or irqchip code without making > > this even more complicated/fragile, and many architectures get this > > subtly wrong today -- it's not that arm64 is necessarily much different > > from other architectures using this code, just that we're trying to > > solve this first. > > So then I think the fix is to just say "accounting in this generic > function is wrong, and the accounting needs to be moved to the > callers". > > That's particularly true if you think arm64 is only the only actually > _tested_ case that gets this wrong, and other architectures most > likely have the exact same issue, but you only fixed it for arm64. > > So do it unconditionally - possibly even using a coccinelle script if > there are lots of callers. > > Because generic code that just isn't generic, but randomly does > different things based on subtle Kconfig options that are different > for different architectures is bad, bad, bad. > When composing the patch, I failed to realize it, but now, I learn. > And yes, I realize that we've had that kind of stuff before (and we > have odd Kconfig option testing in that irqdesc.c file elsewhere), but > the Kconfig options we currently have are mostly either > > (a) actual real honest-to-goodness config options with semantic > meaning (ie things like CONFIG_SMP and CONFIG_NUMA) > > (b) really ugly workarounds for odd special module exports (that > CONFIG_KVM_BOOK3S_64_HV_MODULE thing for irq_to_desc() that we *tried* > but failed to remove). > > And so the reason I really hate that patch is that it introduces a new > "different architectures randomly and inexplicably do different > things, and the generic behavior is very different on arm64 than it is > elsewhere". > > That's just the worst kind of hack to me. > > And in this case, it's really *horribly* hard to see what the call > chain is. It all ends up being actively obfuscated and obscured > through that 'handle_arch_irq' function pointer, that is sometimes set > through set_handle_irq(), and sometimes set directly. > > I really think that if the rule is "we can't do accounting in > handle_domain_irq(), because it's too late for arm64", then the fix > really should be to just not do that. > > Move the irq_enter()/irq_exit() to the callers - quite possibly far up > the call chain to the root of it all, and just say "architecture code > needs to do this in the low-level code before calling > handle_arch_irq". > > Or, if it turns out that 99% of callers do want it - and don't have > the issues arm64 has - maybe we can have a helper wrapper that does > the irq_enter/irq_exit, and another version that doesn't do it, and at > least it's clear to the callers which one it is. But it looks like > particularly with the odd indirection through that handle_arch_irq > function, it's best to just say "this needs to be done". > > What architectures actually care? From what I can follow, it's really > GENERIC_IRQ_MULTI_HANDLER that ends up doing this all, and then arm64 > has it's own special case for no obvious reason (why isn't it using > GENERIC_IRQ_MULTI_HANDLER that seems to do the EXACT same thing in > generic code except for a special "default != NULL" case?) > > Anyway, it _looks_ to me like the pattern is very simple: > > Step 1: > - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. > > This clearly doesn't change anything at all, but also doesn't fix the > problem you have. But it's easy to verify that the code is the same > before-and-after. > > Step 2 is the pattern matching step: > > - if the caller of handle_domain_irq() ends up being a function that > is registered with set_handle_irq(), then we > (a) remove the irq_enter/irq_exit from it > (b) add it to the architectures that use handle_arch_irq. > (c) make sure that if there are other callers of it (not through > handle_arch_irq) we move that irq_enter/irq_exit into them too > > I _suspect_ - but didn't check - that Step 2(c) doesn't actually > exist. But who knows. > > It really looks like there is a very tight connection between "uses > handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No? > > Is this a bit more work? Yes. > > Would this fix arm64? Yes - it ends up effectively doing what you did. > > Would this fix _other_ architectures doing the same thing that you > suspect are broken? YES. Instead of leaving them randomly broken. > > And most importantly, it would make the rules for "generic" code > clear, and actually generic. > > Now, it may be that I'm wrong. I'm willing to be convinced, and if you > beat me over the head enough I guess I can take that pull request and > just be unhappy about it. > I had thought if any arch adapts GENERIC_ENTRY, then handle_domain_irq() can be a bug. As GENERIC_ENTRY is a not _random_ config option, so try to re-anchor the config depending on GENERIC_ENTRY. But finally I gave up, since there is no direct link between them at a glance. And what is more, as you said, "the rules for "generic" code clear, and actually generic", so it is better to go in that way. I think Mark has started the work or I will be happy to re-work on these patches. Thanks, Pingfan
On Tue, Oct 12, 2021 at 03:02:43PM +0100, Mark Rutland wrote: > On Tue, Oct 12, 2021 at 03:18:16PM +0200, Thomas Gleixner wrote: > > On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote: > > > On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > And so the reason I really hate that patch is that it introduces a new > > > "different architectures randomly and inexplicably do different > > > things, and the generic behavior is very different on arm64 than it is > > > elsewhere". > > > > > > That's just the worst kind of hack to me. > > > > > > And in this case, it's really *horribly* hard to see what the call > > > chain is. It all ends up being actively obfuscated and obscured > > > through that 'handle_arch_irq' function pointer, that is sometimes set > > > through set_handle_irq(), and sometimes set directly. > > > > > > I really think that if the rule is "we can't do accounting in > > > handle_domain_irq(), because it's too late for arm64", then the fix > > > really should be to just not do that. > > > > > > Move the irq_enter()/irq_exit() to the callers - quite possibly far up > > > the call chain to the root of it all, and just say "architecture code > > > needs to do this in the low-level code before calling > > > handle_arch_irq". I've spent the last few days attacking this, and I have a series which reworks things to pull irq_{enter,exit}() out of the irqchip code and into arch/entry code where it belongs, removig CONFIG_HANDLE_DOMAIN_IRQ entirely in the process. I'll post that out soon once I've cleaned up the commit messages and given it a decent cover letter. > > > Anyway, it _looks_ to me like the pattern is very simple: > > > > > > Step 1: > > > - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers. > > > > > > This clearly doesn't change anything at all, but also doesn't fix the > > > problem you have. But it's easy to verify that the code is the same > > > before-and-after. > > > > > > Step 2 is the pattern matching step: > > > > > > - if the caller of handle_domain_irq() ends up being a function that > > > is registered with set_handle_irq(), then we > > > (a) remove the irq_enter/irq_exit from it > > > (b) add it to the architectures that use handle_arch_irq. > > > (c) make sure that if there are other callers of it (not through > > > handle_arch_irq) we move that irq_enter/irq_exit into them too > > > > > > I _suspect_ - but didn't check - that Step 2(c) doesn't actually I had a go with the approach suggested above, but that didn't really work out and I ended up splitting the problem a different way. Comments belwo for the sake of posterity. Attacking this as a per-caller issue is *really* chury, and interdependencies force you to fix all drivers and all architectures in one go, which makes it really hard to see the wood for the trees. The underlying issue was with CONFIG_HANDLE_DOMAIN_IRQ, so just looking as set_handle_irq (which indicates CONFIG_GENERIC_IRQ_MULTI_HANDLER) also wasn't sufficient, and I had to go digging through each of the affected architectures' entry code. Instead, I've added a temporary shim, migrated each architecture in turn, then removed the shim and CONFIG_HANDLE_DOMAIN_IRQ entirely, which also ends up simplifying the drivers a bit. Thanks, Mark.