From patchwork Mon Aug 21 14:19:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 9912911
Date: Tue, 22 Aug 2017 00:19:28 +1000
From: Nicholas Piggin
To: Jonathan Cameron
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Message-ID: <20170822001928.74f01322@roar.ozlabs.ibm.com>
In-Reply-To: <20170821111833.00006753@huawei.com>
References: <20170731162757.000058ba@huawei.com>
 <20170801184646.GE3730@linux.vnet.ibm.com>
 <20170802172555.0000468a@huawei.com>
 <20170815154743.GK7017@linux.vnet.ibm.com>
 <87wp63smwn.fsf@concordia.ellerman.id.au>
 <20170816125617.GY7017@linux.vnet.ibm.com>
 <20170816162731.GA22978@linux.vnet.ibm.com>
 <20170820144553.2ab2727b@ppc64le>
 <20170820230040.706b62ac@roar.ozlabs.ibm.com>
 <20170820183514.GM11320@linux.vnet.ibm.com>
 <20170820211429.GA27111@linux.vnet.ibm.com>
 <20170821105258.191d04b1@roar.ozlabs.ibm.com>
 <20170821160605.5b1cc019@roar.ozlabs.ibm.com>
 <20170821111833.00006753@huawei.com>
Organization: IBM
X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu)
Cc: dzickus@redhat.com, sfr@canb.auug.org.au, Michael Ellerman, linuxarm@huawei.com, David Miller, abdhalee@linux.vnet.ibm.com, john.stultz@linaro.org, sparclinux@vger.kernel.org, tglx@linutronix.de, "Paul E.
McKenney", linuxppc-dev@lists.ozlabs.org, akpm@linux-foundation.org, linux-arm-kernel@lists.infradead.org
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Mon, 21 Aug 2017 11:18:33 +0100
Jonathan Cameron wrote:

> On Mon, 21 Aug 2017 16:06:05 +1000
> Nicholas Piggin wrote:
>
> > On Mon, 21 Aug 2017 10:52:58 +1000
> > Nicholas Piggin wrote:
> >
> > > On Sun, 20 Aug 2017 14:14:29 -0700
> > > "Paul E. McKenney" wrote:
> > >
> > > > On Sun, Aug 20, 2017 at 11:35:14AM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Aug 20, 2017 at 11:00:40PM +1000, Nicholas Piggin wrote:
> > > > > > On Sun, 20 Aug 2017 14:45:53 +1000
> > > > > > Nicholas Piggin wrote:
> > > > > >
> > > > > > > On Wed, 16 Aug 2017 09:27:31 -0700
> > > > > > > "Paul E. McKenney" wrote:
> > > > > > > > On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote:
> > > > > > > >
> > > > > > > > Thomas, John, am I misinterpreting the timer trace event messages?
> > > > > > >
> > > > > > > So I did some digging, and what you find is that rcu_sched seems to do a
> > > > > > > simple schedule_timeout(1) and just goes out to lunch for many seconds.
> > > > > > > The process_timeout timer never fires (when it finally does wake after
> > > > > > > one of these events, it usually removes the timer with del_timer_sync).
> > > > > > >
> > > > > > > So this patch seems to fix it. Testing, comments welcome.
> > > > > >
> > > > > > Okay this had a problem of trying to forward the timer from a timer
> > > > > > callback function.
> > > > > >
> > > > > > This was my other approach which also fixes the RCU warnings, but it's
> > > > > > a little more complex. I reworked it a bit so the mod_timer fast path
> > > > > > hopefully doesn't have much more overhead (actually by reading jiffies
> > > > > > only when needed, it probably saves a load).
> > > > >
> > > > > Giving this one a whirl!
> > > >
> > > > No joy here, but then again there are other reasons to believe that I
> > > > am seeing a different bug than Dave and Jonathan are.
> > > >
> > > > OK, not -entirely- without joy -- 10 of 14 runs were error-free, which
> > > > is a good improvement over 0 of 84 for your earlier patch. ;-) But
> > > > not statistically different from what I see without either patch.
> > > >
> > > > But no statistical difference compared to without patch, and I still
> > > > see the "rcu_sched kthread starved" messages. For whatever it is worth,
> > > > by the way, I also see this: "hrtimer: interrupt took 5712368 ns".
> > > > Hmmm... I am also seeing that without any of your patches. Might
> > > > be hypervisor preemption, I guess.
> > >
> > > Okay it makes the warnings go away for me, but I'm just booting then
> > > leaving the system idle. You're doing some CPU hotplug activity?
> >
> > Okay found a bug in the patch (it was not forwarding properly before
> > adding the first timer after an idle) and a few other concerns.
> >
> > There's still a problem of a timer function doing a mod_timer from
> > within expire_timers. It can't forward the base, which might currently
> > be quite a way behind. I *think* after we close these gaps and get
> > timely wakeups for timers on there, it should not get too far behind
> > for standard timers.
> >
> > Deferrable is a different story. Firstly it has no idle tracking so we
> > never forward it. Even if we wanted to, we can't do it reliably because
> > it could contain timers way behind the base. They are "deferrable", so
> > you get what you pay for, but this still means there's a window where
> > you can add a deferrable timer and get a far later expiry than you
> > asked for despite the CPU never going idle after you added it.
> >
> > All these problems would seem to go away if mod_timer just queued up
> > the timer to a single list on the base then pushed them into the
> > wheel during your wheel processing softirq... Although maybe you end
> > up with excessive passes over a big queue of timers. Anyway that
> > wouldn't be suitable for 4.13 even if it could work.
> >
> > I'll send out an updated minimal fix after some more testing...
>
> Hi All,
>
> I'm back in the office with hardware access on our D05 64 core ARM64
> boards.
>
> I think we still have by far the quickest test cases for this so
> feel free to ping me anything you want tested quickly (we were
> looking at an average of less than 10 minutes to trigger
> with machine idling).
>
> Nick, I'm currently running your previous version and we are over an
> hour in without any instances of the issue, so it looks like a
> considerable improvement. I'll see if I can line a couple of boards
> up for an overnight run if you have your updated version out by then.
>
> Be great to finally put this one to bed.

Hi Jonathan,

Thanks, here's an updated version with a couple more bugs fixed. If
you could try testing, that would be much appreciated.

Thanks,
Nick

Tested-by: Jonathan Cameron
---
 kernel/time/timer.c | 43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8f5d1bf18854..2b9d2cdb3fac 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -203,6 +203,7 @@ struct timer_base {
 	bool			migration_enabled;
 	bool			nohz_active;
 	bool			is_idle;
+	bool			was_idle; /* was it idle since last run/fwded */
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 } ____cacheline_aligned;
@@ -856,13 +857,19 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-	unsigned long jnow = READ_ONCE(jiffies);
+	unsigned long jnow;
 
 	/*
-	 * We only forward the base when it's idle and we have a delta between
-	 * base clock and jiffies.
+	 * We only forward the base when we are idle or have just come out
+	 * of idle (was_idle logic), and have a delta between base clock
+	 * and jiffies. In the common case, run_timers will take care of it.
 	 */
-	if (!base->is_idle || (long) (jnow - base->clk) < 2)
+	if (likely(!base->was_idle))
+		return;
+
+	jnow = READ_ONCE(jiffies);
+	base->was_idle = base->is_idle;
+	if ((long)(jnow - base->clk) < 2)
 		return;
 
 	/*
@@ -938,6 +945,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 	 * same array bucket then just return:
 	 */
 	if (timer_pending(timer)) {
+		/*
+		 * The downside of this optimization is that it can result in
+		 * larger granularity than you would get from adding a new
+		 * timer with this expiry. Would a timer flag for networking
+		 * be appropriate, then we can try to keep expiry of general
+		 * timers within ~1/8th of their interval?
+		 */
 		if (timer->expires == expires)
 			return 1;
 
@@ -948,6 +962,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		 * dequeue/enqueue dance.
 		 */
 		base = lock_timer_base(timer, &flags);
+		forward_timer_base(base);
 
 		clk = base->clk;
 		idx = calc_wheel_index(expires, clk);
@@ -964,6 +979,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		}
 	} else {
 		base = lock_timer_base(timer, &flags);
+		forward_timer_base(base);
 	}
 
 	ret = detach_if_pending(timer, base, false);
@@ -991,12 +1007,10 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 			raw_spin_lock(&base->lock);
 			WRITE_ONCE(timer->flags,
 				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);
+			forward_timer_base(base);
 		}
 	}
 
-	/* Try to forward a stale timer base clock */
-	forward_timer_base(base);
-
 	timer->expires = expires;
 	/*
 	 * If 'idx' was calculated above and the base time did not advance
@@ -1499,8 +1513,10 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 		/*
 		 * If we expect to sleep more than a tick, mark the base idle:
 		 */
-		if ((expires - basem) > TICK_NSEC)
+		if ((expires - basem) > TICK_NSEC) {
+			base->was_idle = true;
 			base->is_idle = true;
+		}
 	}
 	raw_spin_unlock(&base->lock);
 
@@ -1611,6 +1627,17 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
+	/*
+	 * was_idle must be cleared before running timers so that any timer
+	 * functions that call mod_timer will not try to forward the base.
+	 *
+	 * The deferrable base does not do idle tracking at all, so we do
+	 * not forward it. This can result in very large variations in
+	 * granularity for deferrable timers, but they can be deferred for
+	 * long periods due to idle.
+	 */
+	base->was_idle = false;
+
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
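
For anyone following along without a kernel tree handy, the was_idle
handling in the patch above can be modelled roughly in userspace. This is
only a sketch under stated assumptions, not kernel code: every sim_* name
is hypothetical, the base is reduced to three fields, the forward step
jumps straight to the current jiffies value (the real function also has to
respect pending timers), and sim_exit_idle() stands in for wherever the
kernel clears is_idle on leaving idle. It compares the old check with the
patched one for a timer modified just after idle exit, before the timer
softirq has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct timer_base (not kernel code). */
struct sim_timer_base {
	unsigned long clk;	/* base clock, in jiffies */
	bool is_idle;		/* base is currently marked idle */
	bool was_idle;		/* base was idle since the last run/forward */
};

/* Models the point where get_next_timer_interrupt() marks the base idle. */
static void sim_enter_idle(struct sim_timer_base *base)
{
	base->was_idle = true;
	base->is_idle = true;
}

/* Assumption: stands in for wherever is_idle is cleared on idle exit. */
static void sim_exit_idle(struct sim_timer_base *base)
{
	base->is_idle = false;
}

/* Pre-patch check: forwards only while the base is still marked idle. */
static bool sim_forward_old(struct sim_timer_base *base, unsigned long jnow)
{
	if (!base->is_idle || (long)(jnow - base->clk) < 2)
		return false;
	base->clk = jnow;	/* simplified: ignores pending timers */
	return true;
}

/* Patched check: also forwards once right after coming out of idle. */
static bool sim_forward_new(struct sim_timer_base *base, unsigned long jnow)
{
	if (!base->was_idle)
		return false;
	base->was_idle = base->is_idle;
	if ((long)(jnow - base->clk) < 2)
		return false;
	base->clk = jnow;	/* simplified: ignores pending timers */
	return true;
}

/* Models run_timer_softirq(): clear was_idle, then catch the clock up. */
static void sim_run_timers(struct sim_timer_base *base, unsigned long jnow)
{
	base->was_idle = false;
	if ((long)(jnow - base->clk) >= 0)
		base->clk = jnow + 1;
}

/* Walks both variants through idle entry, idle exit and a late mod_timer. */
static void sim_selfcheck(void)
{
	struct sim_timer_base old_b = { .clk = 100 };
	struct sim_timer_base new_b = { .clk = 100 };

	sim_enter_idle(&old_b);
	sim_exit_idle(&old_b);
	sim_enter_idle(&new_b);
	sim_exit_idle(&new_b);

	/* 400 jiffies later a timer is modified before the softirq runs. */
	assert(!sim_forward_old(&old_b, 500));	/* clock stays stale at 100 */
	assert(sim_forward_new(&new_b, 500));	/* clock is forwarded to 500 */
	assert(new_b.clk == 500 && !new_b.was_idle);

	/* Once the softirq has run, the fast path skips forwarding again. */
	sim_run_timers(&new_b, 501);
	assert(!sim_forward_new(&new_b, 600));
}
```

Calling sim_selfcheck() from a test main() exercises both paths. In this
model the old check leaves the base clock 400 jiffies stale after idle
exit, which is the kind of gap that would let a freshly added timer land
in a much coarser wheel level than its interval warrants.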