From patchwork Fri Jun 23 20:15:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 13290905
Date: Fri, 23 Jun 2023 22:15:17 +0200
From: Sebastian Andrzej Siewior
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 "Luis Claudio R. Goncalves", Andrew Morton, Boqun Feng, Ingo Molnar,
 John Ogness, Mel Gorman, Peter Zijlstra, Petr Mladek, Tetsuo Handa,
 Thomas Gleixner, Waiman Long, Will Deacon
Subject: [PATCH v3 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead
 write_seqlock() + local_irq_save().
Message-ID: <20230623201517.yw286Knb@linutronix.de>
References: <20230623171232.892937-1-bigeasy@linutronix.de>
 <20230623171232.892937-3-bigeasy@linutronix.de>
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on
PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
lock on PREEMPT_RT and must not be acquired with disabled interrupts.

The API provides write_seqlock_irqsave() which does the right thing in
one step.
printk_deferred_enter() has to be invoked in a non-migratable context to
ensure that deferred printing is enabled and disabled on the same CPU.
This is the case after zonelist_update_seq has been acquired.

There was discussion on the first submission that the order should be:
	local_irq_disable();
	printk_deferred_enter();
	write_seqlock();

to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the
lockdep annotation happens after the sequence count is incremented.
This is exceptional and subject to change.

It was also pointed out that PREEMPT_RT can be affected by the printk
problem since its write_seqlock_irqsave() does not really disable
interrupts. This isn't the case because PREEMPT_RT's printk
implementation differs from the mainline implementation in two important
aspects:
- Printing happens in dedicated threads and not during the invocation
  of printk().
- In emergency cases where synchronous printing is used, a different
  driver is used which does not use tty_port::lock.

Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.

Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Acked-by: Mel Gorman
---
v2…v3:
  - Update comment as per Michal's suggestion.

v1…v2:
  - Improve commit description

 mm/page_alloc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b7..440e9af67b48d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5808,19 +5808,17 @@ static void __build_all_zonelists(void *data)
 	unsigned long flags;
 
 	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 * The zonelist_update_seq must be acquired with irqsave because the
+	 * reader can be invoked from IRQ with GFP_ATOMIC.
 	 */
-	local_irq_save(flags);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * Also disable synchronous printk() to prevent any printk() from
+	 * trying to hold port->lock, for
 	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init
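
For readers less familiar with sequence counters, the following is a
minimal userspace sketch (not kernel code) of why the writer has to keep
interrupts off while it holds zonelist_update_seq. It assumes a
simplified single-threaded model; struct toy_seq, toy_write_begin(),
toy_write_end(), toy_read() and toy_demo() are hypothetical names made up
for this illustration and are not part of the seqlock API. A reader that
runs "in the middle" of a write on the same CPU always sees an odd
sequence and can only retry, which is the livelock the irqsave variant
is meant to rule out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a sequence counter; not the kernel API. */
struct toy_seq {
	unsigned int sequence;	/* odd while a write is in progress */
	int data;		/* value protected by the counter */
};

/* Writer side: the sequence is odd between begin and end. */
static void toy_write_begin(struct toy_seq *s) { s->sequence++; }
static void toy_write_end(struct toy_seq *s)   { s->sequence++; }

/*
 * Reader side: try at most max_tries times to get a stable snapshot.
 * Returns true and stores the value in *out on success. A real reader
 * would retry without a bound, which is what makes the same-CPU case
 * a livelock.
 */
static bool toy_read(const struct toy_seq *s, int max_tries, int *out)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int start = s->sequence;

		if (start & 1)
			continue;	/* writer active: retry */

		int v = s->data;

		if (s->sequence == start) {
			*out = v;
			return true;
		}
	}
	return false;
}

/* Returns 0 if the model behaves as described, non-zero otherwise. */
static int toy_demo(void)
{
	struct toy_seq s = { .sequence = 0, .data = 1 };
	int v = 0;

	/* No writer active: the read succeeds on the first try. */
	if (!toy_read(&s, 1000, &v) || v != 1)
		return 1;

	/*
	 * Model an interrupt handler reading while the interrupted code
	 * on the same CPU is in the middle of a write: the writer cannot
	 * make progress, so every retry sees an odd sequence.
	 */
	toy_write_begin(&s);
	s.data = 2;
	if (toy_read(&s, 1000, &v))
		return 2;
	toy_write_end(&s);

	/* After the write section ends, the new value is visible. */
	if (!toy_read(&s, 1000, &v) || v != 2)
		return 3;

	return 0;
}
```

Calling toy_demo() from a small main() is expected to return 0. The
bounded retry count exists only so that the sketch terminates; it does
not model anything in the patch.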