From patchwork Tue Apr  4 14:31:58 2023
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 13200279
Message-ID: <8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp>
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH v2] mm/page_alloc: fix potential deadlock on
 zonelist_update_seq seqlock
From: Tetsuo Handa
To: Michal Hocko, Petr Mladek
Cc: Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton,
 Sergey Senozhatsky, Steven Rostedt, John Ogness,
 syzkaller-bugs@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
References: <78ff6e70-e986-1fcb-eafb-3edd5f2bceae@I-love.SAKURA.ne.jp>
 <6266b161-e4c3-7d65-6590-da6cc04d93ec@I-love.SAKURA.ne.jp>
 <0585ddb9-5de8-8cdd-202e-53887bbb6b5f@I-love.SAKURA.ne.jp>


syzbot is reporting a circular locking dependency which involves the
zonelist_update_seq seqlock [1], because this lock is checked even by
memory allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, because retry applies only to __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.

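The first scenario can be modelled in userspace. The following is only a
minimal sketch, not kernel code: every toy_* name is hypothetical, the
interrupt handler is an ordinary nested call, and the reader's wait loop is
bounded (unlike read_seqbegin()) so that the model terminates. The
check_always flag switches between the current behaviour and the
alternative of skipping the check for requests that cannot retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the counter inside the zonelist_update_seq seqlock. */
struct toy_seqcount {
	unsigned int sequence;	/* odd while an update is in progress */
};

static void toy_write_begin(struct toy_seqcount *s) { s->sequence++; }
static void toy_write_end(struct toy_seqcount *s) { s->sequence++; }

/*
 * Bounded stand-in for the reader side. The kernel primitive keeps
 * waiting while the count is odd; this model gives up after max_spins
 * and returns false where the kernel would spin forever.
 */
static bool toy_read_begin(const struct toy_seqcount *s, unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		if ((s->sequence & 1) == 0)
			return true;
	}
	return false;
}

/*
 * Hypothetical allocation path. may_retry models __GFP_DIRECT_RECLAIM.
 * With check_always set, every request reads the counter; otherwise only
 * requests that may retry do.
 */
static bool toy_alloc(const struct toy_seqcount *s, bool may_retry,
		      bool check_always)
{
	if (check_always || may_retry)
		return toy_read_begin(s, 1000);
	return true;
}

/*
 * Model an interrupt handler doing an atomic (non-retrying) allocation
 * on the CPU that is in the middle of the update. The writer cannot make
 * progress until the nested call returns. Returns whether the handler
 * completed without getting stuck.
 */
bool toy_irq_during_update(bool check_always)
{
	struct toy_seqcount s = { 0 };
	bool completed;

	toy_write_begin(&s);		/* count becomes odd */
	completed = toy_alloc(&s, false, check_always);
	toy_write_end(&s);		/* count becomes even again */
	assert((s.sequence & 1) == 0);
	return completed;
}
```

In this model toy_irq_during_update(true) reports that the handler got
stuck, while toy_irq_during_update(false) reports that it completed.
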
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer()
with port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                  }
                }
              }
            }
          }
        }
      }
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
      spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by preventing interrupt context
from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling
console_flush_all() while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and
disabling synchronous printk() in order to avoid console_flush_all().

As a side effect of minimizing the time during which
zonelist_update_seq.seqcount is odd (by disabling synchronous printk()),
latency at read_seqbegin(&zonelist_update_seq) will be reduced for both
!__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests.
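
The effect of deferring printk() can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the toy_* names are
hypothetical, console_writes stands in for taking port->lock and writing to
the console, and the flush happens directly in toy_deferred_exit(), which is
a simplification and not how the kernel schedules deferred console output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_LOG_SLOTS 8
#define TOY_LOG_LEN   64

/* Toy logger; not the kernel's printk implementation. */
struct toy_log {
	int deferred_depth;	/* > 0: must not touch the console */
	int queued;		/* messages waiting to be flushed */
	int console_writes;	/* times the "console" was written to */
	char ring[TOY_LOG_SLOTS][TOY_LOG_LEN];
};

/* Stands in for taking port->lock and emitting every queued message. */
static void toy_console_flush(struct toy_log *log)
{
	for (int i = 0; i < log->queued; i++)
		log->console_writes++;
	log->queued = 0;
}

/* Queue the message; flush synchronously only when not deferred. */
static void toy_printk(struct toy_log *log, const char *msg)
{
	if (log->queued < TOY_LOG_SLOTS) {
		snprintf(log->ring[log->queued], TOY_LOG_LEN, "%s", msg);
		log->queued++;
	}
	if (log->deferred_depth == 0)
		toy_console_flush(log);
}

static void toy_deferred_enter(struct toy_log *log)
{
	log->deferred_depth++;
}

/* Simplification: flush as soon as the outermost deferred section ends. */
static void toy_deferred_exit(struct toy_log *log)
{
	log->deferred_depth--;
	if (log->deferred_depth == 0)
		toy_console_flush(log);
}

/*
 * Returns how many console writes happened while the (modelled) sequence
 * count was odd, with or without deferring.
 */
int toy_writes_while_odd(bool defer)
{
	struct toy_log log = { 0 };
	int inside;

	if (defer)
		toy_deferred_enter(&log);
	/* write_seqlock(&zonelist_update_seq) would be taken here */
	toy_printk(&log, "message A");
	toy_printk(&log, "message B");
	inside = log.console_writes;
	/* write_sequnlock(&zonelist_update_seq) would be dropped here */
	if (defer)
		toy_deferred_exit(&log);
	assert(log.queued == 0);	/* nothing is lost, only delayed */
	return inside;
}
```

Without deferring, both messages reach the console inside the modelled
critical section; with deferring, none do, and both are flushed afterwards.
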
From lockdep's perspective, not calling
read_seqbegin(&zonelist_update_seq) from interrupt context (i.e. not
recording an unnecessary locking dependency) is still preferable, even if
we don't allow calling kmalloc(GFP_ATOMIC) inside the
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section.

Reported-by: syzbot
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Signed-off-by: Tetsuo Handa
Cc: Michal Hocko
Cc: Petr Mladek
Acked-by: Michal Hocko
Acked-by: Mel Gorman
---
Changes in v2:
  Update patch description and comment.

 mm/page_alloc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Explicitly disable this CPU's interrupts before taking seqlock
+	 * to prevent any IRQ handler from calling into the page allocator
+	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 */
+	local_irq_save(flags);
+	/*
+	 * Explicitly disable this CPU's synchronous printk() before taking
+	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
+	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+	 */
+	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
 	}
 
 	write_sequnlock(&zonelist_update_seq);
+	printk_deferred_exit();
+	local_irq_restore(flags);
 }
 
 static noinline void __init