From patchwork Fri Jun 23 17:12:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13290836
From: Sebastian Andrzej Siewior
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: "Luis Claudio R. Goncalves", Andrew Morton, Boqun Feng, Ingo Molnar,
 John Ogness, Mel Gorman, Michal Hocko, Peter Zijlstra, Petr Mladek,
 Tetsuo Handa, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior
Subject: [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Fri, 23 Jun 2023 19:12:32 +0200
Message-Id: <20230623171232.892937-3-bigeasy@linutronix.de>
In-Reply-To: <20230623171232.892937-1-bigeasy@linutronix.de>
References: <20230623171232.892937-1-bigeasy@linutronix.de>

__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on
PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
lock on PREEMPT_RT and must not be acquired with disabled interrupts.

The API provides write_seqlock_irqsave() which does the right thing in
one step.
printk_deferred_enter() has to be invoked in non-migrate-able context to
ensure that deferred printing is enabled and disabled on the same CPU.
This is the case after zonelist_update_seq has been acquired.

There was discussion on the first submission that the order should be:
	local_irq_disable();
	printk_deferred_enter();
	write_seqlock();

to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the
lockdep annotation happens after the sequence count is incremented.
This is exceptional and subject to change.

It was also pointed out that PREEMPT_RT can be affected by the printk
problem since its write_seqlock_irqsave() does not really disable
interrupts. This isn't the case because PREEMPT_RT's printk
implementation differs from the mainline implementation in two important
aspects:
- Printing happens in dedicated threads and not during the invocation
  of printk().
- In emergency cases where synchronous printing is used, a different
  driver is used which does not use tty_port::lock.

Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.

Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b7..99b7e7d09c5c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5808,11 +5808,10 @@ static void __build_all_zonelists(void *data)
 	unsigned long flags;
 
 	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 * The zonelist_update_seq must be acquired with irqsave because the
+	 * reader can be invoked from IRQ with GFP_ATOMIC.
 	 */
-	local_irq_save(flags);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 	/*
 	 * Explicitly disable this CPU's synchronous printk() before taking
 	 * seqlock to prevent any printk() from trying to hold port->lock, for
@@ -5820,7 +5819,6 @@ static void __build_all_zonelists(void *data)
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init
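
Illustration only, not part of the patch: the userspace sketch below
shows why the write side of zonelist_update_seq has to be taken with
interrupts disabled. It is not kernel code. struct demo_seqcount and
all demo_* functions are hypothetical names, and C11 atomics with their
default sequentially consistent ordering stand in for the kernel's
seqlock_t. While the count is odd a reader cannot make progress, so an
IRQ handler that entered the read side (e.g. an allocation with
GFP_ATOMIC) on the CPU that holds the write side would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sequence counter, not the kernel's seqlock_t. */
struct demo_seqcount {
	atomic_uint sequence;
};

/* Writer enters: the count becomes odd, meaning "update in progress". */
static void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Writer leaves: the count becomes even again. */
static void demo_write_end(struct demo_seqcount *s)
{
	unsigned int prev = atomic_fetch_add(&s->sequence, 1);

	/* Must pair with demo_write_begin(), so the old value was odd. */
	assert(prev & 1);
	(void)prev;
}

/*
 * Reader entry. A real reader loops until the count is even. This sketch
 * returns false instead, so the "would spin" case can be observed.
 */
static bool demo_read_try_begin(struct demo_seqcount *s, unsigned int *seq)
{
	*seq = atomic_load(&s->sequence);
	return (*seq & 1) == 0;
}

/* Reader exit: true if a writer ran in between and the read must be redone. */
static bool demo_read_retry(struct demo_seqcount *s, unsigned int seq)
{
	return atomic_load(&s->sequence) != seq;
}

/* Walks through the cases; returns 0 if every step behaves as described. */
static int demo_run(void)
{
	struct demo_seqcount sc;
	unsigned int seq;

	atomic_init(&sc.sequence, 0);

	if (!demo_read_try_begin(&sc, &seq))
		return 1;	/* idle: a reader may start */

	demo_write_begin(&sc);
	if (demo_read_try_begin(&sc, &seq))
		return 2;	/* mid-update: a reader has to wait */
	demo_write_end(&sc);

	if (!demo_read_try_begin(&sc, &seq))
		return 3;	/* update finished: a reader may start */
	if (demo_read_retry(&sc, seq))
		return 4;	/* nothing changed: no retry needed */

	demo_write_begin(&sc);
	demo_write_end(&sc);
	if (!demo_read_retry(&sc, seq))
		return 5;	/* a writer ran: the read must be redone */

	return 0;
}
```

demo_run() returns 0 when each step behaves as described. The step that
returns 2 on failure is the one that corresponds to an interrupt
arriving between the write-side acquire and release.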