Message ID | 20200914204209.256266093@linutronix.de (mailing list archive) |
---|---|
Headers | show |
Series | preempt: Make preempt count unconditional | expand |
On Mon, 14 Sep 2020 22:42:09 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:
> 21 files changed, 23 insertions(+), 92 deletions(-)
This alone makes it look promising, and hopefully acceptable by Linus :-)
-- Steve
On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Recently merged code does: > > gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC; > > Looks obviously correct, except for the fact that preemptible() is > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in > that code use GFP_ATOMIC on such kernels. I don't think this is a good reason to entirely get rid of the no-preempt thing. The above is just garbage. It's bogus. You can't do it. Blaming the no-preempt code for this bug is extremely unfair, imho. And the no-preempt code does help make for much better code generation for simple spinlocks. Where is that horribly buggy recent code? It's not in that exact format, certainly, since 'grep' doesn't find it. Linus
On Mon, Sep 14 2020 at 13:59, Linus Torvalds wrote: > On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> >> Recently merged code does: >> >> gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC; >> >> Looks obviously correct, except for the fact that preemptible() is >> unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in >> that code use GFP_ATOMIC on such kernels. > > I don't think this is a good reason to entirely get rid of the > no-preempt thing. I did not say that this is a good reason. It just illustrates the problem. > The above is just garbage. It's bogus. You can't do it. > > Blaming the no-preempt code for this bug is extremely unfair, imho. I'm not blaming the no-preempt code. I'm blaming inconsistency and there is no real good argument for inconsistent behaviour, TBH. > And the no-preempt code does help make for much better code generation > for simple spinlocks. Yes it does generate better code, but I tried hard to spot a difference in various metrics exposed by perf. It's all in the noise and I only can spot a difference when the actual preemption check after the decrement which still depends on CONFIG_PREEMPT is in place, but that's not the case for PREEMPT_NONE or PREEMPT_VOLUNTARY kernels where the decrement is just a decrement w/o any conditional behind it. > Where is that horribly buggy recent code? It's not in that exact > format, certainly, since 'grep' doesn't find it. Bah, that was stuff in next which got dropped again. But just look at any check which uses preemptible(), especially those which check !preemptible(): In the X86 #GP handler we have: /* * To be potentially processing a kprobe fault and to trust the result * from kprobe_running(), we have to be non-preemptible. */ if (!preemptible() && kprobe_running() && kprobe_fault_handler(regs, X86_TRAP_GP)) goto exit; and a similar check in the S390 code in kprobe_exceptions_notify(). That all magically 'works' because that code might have been actually tested with lockdep enabled which enforces PREEMPT_COUNT... The SG code has some interesting usage as well: if (miter->__flags & SG_MITER_ATOMIC) { WARN_ON_ONCE(preemptible()); kunmap_atomic(miter->addr); How is that WARN_ON_ONCE() supposed to catch anything? Especially as calling code does: flags = SG_MITER_TO_SG; if (!preemptible()) flags |= SG_MITER_ATOMIC; which is equally useless on kernels which have PREEMPT_COUNT=n. There are bugs which are related to in_atomic() or other in_***() usage all over the place as well. Inconsistency at the core level is a clear recipe for disaster and at some point we have to bite the bullet and accept that consistency is more important than the non measurable 3 cycles? Thanks, tglx
On Mon, Sep 14, 2020 at 01:59:15PM -0700, Linus Torvalds wrote: > On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Recently merged code does: > > > > gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC; > > > > Looks obviously correct, except for the fact that preemptible() is > > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in > > that code use GFP_ATOMIC on such kernels. > > I don't think this is a good reason to entirely get rid of the no-preempt thing. > > The above is just garbage. It's bogus. You can't do it. > > Blaming the no-preempt code for this bug is extremely unfair, imho. > > And the no-preempt code does help make for much better code generation > for simple spinlocks. > > Where is that horribly buggy recent code? It's not in that exact > format, certainly, since 'grep' doesn't find it. It would be convenient for that "gfp =" code to work, as this would allow better cache locality while invoking RCU callbacks, and would further provide better robustness to callback floods. The full story is quite long, but here are alternatives have not yet been proven to be abject failures: 1. Use workqueues to do the allocations in a clean context. While waiting for the allocations, the callbacks are queued in the old cache-busting manner. This functions correctly, but in the meantime (which on busy systems can be some time) the cache locality and robustness are lost. 2. Provide the ability to allocate memory in raw atomic context. This is extremely effective, especially when used in combination with #1 above, but as you might suspect, the MM guys don't like it much. In contrast, with Thomas's patch series, call_rcu() and kvfree_rcu() could just look at preemptible() to see whether or not it was safe to allocate memory, even in !PREEMPT kernels -- and in the common case, it almost always would be safe. It is quite possible that this approach would work in isolation, or failing that, that adding #1 above would do the trick. I understand that this is all very hand-wavy, and I do apologize for that. If you really want the full sad story with performance numbers and the works, let me know! Thanx, Paul