From patchwork Tue Aug 15 07:07:12 2023
X-Patchwork-Submitter: mawupeng
X-Patchwork-Id: 13353624
Message-ID: <4263470f-77f8-47e2-be03-e1f8d790999e@huawei.com>
Date: Tue, 15 Aug 2023 15:07:12 +0800
From: mawupeng
Subject: [5.10/5.15 LTS] Question on mlock race between ksm and cow
Sender: owner-linux-mm@kvack.org

Our syzbot reports a "Bad page state" warning: the mlocked flag is still
set when a page is freed.

In try_to_merge_one_page() in ksm, kpage is mlocked again if the vma has
VM_LOCKED set. However, this flag may have just been cleared in
wp_page_copy(). Since the mapcount of this kpage is -1 at that point,
nobody can clear its mlocked flag before it is freed, which leads to the
bad page report.

Since mlock changed a lot in v5.18-rc1 [1], the latest kernel does not
have this problem. The 5.10/5.15 LTS kernels do have this issue.
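To make the ordering problem concrete, here is a small user-space toy model of it. This is only a sketch and not kernel code: struct toy_page and every toy_* function are hypothetical names made up for illustration, the model is single-threaded, and it only replays the two steps in a fixed order. It assumes, as in the kernel, that a mapcount of -1 means "not mapped" and that removing the last mapping clears the mlocked flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the two fields the race touches. */
struct toy_page {
	int mapcount;	/* -1 means "not mapped", 0 means mapped once */
	bool mlocked;	/* stands in for the mlocked page flag */
};

/* Models rmap removal: drop one mapping; if that was the last one,
 * clear the mlocked flag when it is set. */
static void toy_remove_rmap(struct toy_page *p)
{
	assert(p->mapcount >= 0);	/* cannot unmap an unmapped page */
	if (--p->mapcount < 0 && p->mlocked)
		p->mlocked = false;
}

/* Models the re-mlock of kpage done by ksm for a VM_LOCKED vma. */
static void toy_ksm_remlock(struct toy_page *kpage)
{
	if (!kpage->mlocked)
		kpage->mlocked = true;
}

/* Models the free-time check: true means the page would be reported. */
static bool toy_bad_at_free(const struct toy_page *p)
{
	return p->mapcount == -1 && p->mlocked;
}

/* Problematic order: the unmap happens first, then ksm sets mlocked. */
static bool toy_replay_race(void)
{
	struct toy_page kpage = { .mapcount = 0, .mlocked = false };

	toy_remove_rmap(&kpage);	/* wp_page_copy side */
	toy_ksm_remlock(&kpage);	/* ksm side, now on an unmapped page */
	return toy_bad_at_free(&kpage);
}

/* Harmless order: ksm sets mlocked first, the later unmap clears it. */
static bool toy_replay_safe(void)
{
	struct toy_page kpage = { .mapcount = 0, .mlocked = false };

	toy_ksm_remlock(&kpage);
	toy_remove_rmap(&kpage);
	return toy_bad_at_free(&kpage);
}
```

In this model toy_replay_race() leaves the flag set on an unmapped page, while toy_replay_safe() does not, which is the difference between the bad and the good interleaving described above.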
Here is the simplified calltrace:

try_to_merge_one_page                   wp_page_copy
try_to_merge_one_page
                                        // clear page mlocked during rmap removal
  replace_page                          page_remove_rmap
                                          if (unlikely(PageMlocked(page)))
                                            clear_page_mlock(compound_head(page));
  if ((vma->vm_flags & VM_LOCKED)
                                        lock_page(old_page);
                                        if (vma->vm_flags & VM_LOCKED)
                                          if (PageMlocked(old_page))
                                            munlock_vma_page(old_page);
    if (!PageMlocked(kpage))
      lock_page(kpage);
      mlock_vma_page(kpage);
      unlock_page(kpage);

-------------------------------------------------

This problem can be easily reproduced with the following modifications:

1. enable the following configs:
   a) CONFIG_DEBUG_VM
   b) CONFIG_KSM
   c) CONFIG_MEMORY_FAILURE
2. add a delay in try_to_merge_one_page
3. run syzbot with the following content:

madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc000, 0xc)
mlockall(0x1)
mlockall(0x5)
madvise(&(0x7f0000ff3000/0xc000)=nil, 0xc04c, 0x65)
madvise(&(0x7f0000ff5000/0x4000)=nil, 0x4000, 0xc)
mlockall(0x1)
mlockall(0xa5)
mlockall(0x0)
munlock(&(0x7f0000ff7000/0x4000)=nil, 0x4000)

-------------------------------------------------

The detailed bug report is as follows:

BUG: Bad page state in process rs:main Q:Reg  pfn:11406a
page:fffff7b004501a80 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20ff4 pfn:0x11406a
flags: 0x30000000028000e(referenced|uptodate|dirty|swapbacked|mlocked|node=0|zone=3)
raw: 030000000028000e fffff7b00456aec8 fffff7b011439908 0000000000000000
Soft offlining pfn 0x455e8f at process virtual address 0x20ff6000
raw: 0000000000020ff4 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Modules linked in:
CPU: 1 PID: 239 Comm: rs:main Q:Reg Not tainted 5.15.126+ #580
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack_lvl+0x33/0x46
 bad_page+0x9e/0xe0
 free_pcp_prepare+0x14b/0x1f0
 free_unref_page_list+0x7c/0x210
 release_pages+0x2fe/0x3c0
 __pagevec_lru_add+0x21a/0x360
 lru_cache_add+0x80/0xe0
 add_to_page_cache_lru+0x71/0xd0
 pagecache_get_page+0x245/0x460
 grab_cache_page_write_begin+0x1a/0x40
 ext4_da_write_begin+0xb7/0x280
 generic_perform_write+0xb4/0x1e0
 ext4_buffered_write_iter+0x9c/0x140
 ext4_file_write_iter+0x5b/0x840
 ? do_futex+0x1af/0xb60
 ? check_preempt_curr+0x21/0x60
 ? ttwu_do_wakeup.isra.140+0xd/0xf0
 new_sync_write+0x117/0x1b0
 vfs_write+0x1ff/0x260
 ksys_write+0xa0/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x67/0xd1
RIP: 0033:0x7fb815cef32f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
RSP: 002b:00007fb814b2b860 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fb808004f20 RCX: 00007fb815cef32f
RDX: 000000000000006e RSI: 00007fb808004f20 RDI: 0000000000000007
RBP: 00007fb808004c40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007fb808009550
R13: 000000000000006e R14: 0000000000000000 R15: 0000000000000000

[1]: https://lore.kernel.org/linux-mm/e7fbbdca-6590-7e45-3efd-279fba7f8376@suse.cz/T/#m0cb6e42b2a5ad634e1ec16e59f0f98f2e9382460

diff --git a/mm/ksm.c b/mm/ksm.c
index a5716fdec1aa..f9ee2ec615ac 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1248,8 +1248,10 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,

 	if ((vma->vm_flags & VM_LOCKED) && kpage && !err) {
 		munlock_vma_page(page);
+		mdelay(10);
 		if (!PageMlocked(kpage)) {
 			unlock_page(page);
+			mdelay(100);
 			lock_page(kpage);
 			mlock_vma_page(kpage);
 			page = kpage;		/* for final unlock */
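
For readers decoding the raw numbers in the syz program from step 3, the sketch below spells out what they appear to mean. It assumes the x86-64 values of the Linux UAPI constants; the SYZ_* macros and syz_mlockall_flags_ok() are hypothetical local names, not kernel or libc symbols. The helper only mirrors the flag validation done on entry to mlockall(), so it shows which of the mlockall() calls in the program would be rejected with EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the UAPI values (assumed x86-64); the SYZ_ prefix marks
 * them as illustrative, not the real macros from <sys/mman.h>. */
#define SYZ_MCL_CURRENT        0x1   /* MCL_CURRENT */
#define SYZ_MCL_FUTURE         0x2   /* MCL_FUTURE */
#define SYZ_MCL_ONFAULT        0x4   /* MCL_ONFAULT */
#define SYZ_MADV_MERGEABLE     0xc   /* MADV_MERGEABLE: opt the range into KSM */
#define SYZ_MADV_SOFT_OFFLINE  0x65  /* MADV_SOFT_OFFLINE: needs CONFIG_MEMORY_FAILURE */

static_assert(SYZ_MADV_SOFT_OFFLINE == 101, "0x65 is 101 in decimal");

/* Returns true when mlockall() would accept these flags, false when the
 * flag validation would fail the call with EINVAL. */
static bool syz_mlockall_flags_ok(int flags)
{
	const int known = SYZ_MCL_CURRENT | SYZ_MCL_FUTURE | SYZ_MCL_ONFAULT;

	/* Empty flags, unknown bits, or MCL_ONFAULT on its own are invalid. */
	if (flags == 0 || (flags & ~known) || flags == SYZ_MCL_ONFAULT)
		return false;
	return true;
}
```

Under these assumptions mlockall(0x1) and mlockall(0x5) are valid, mlockall(0xa5) and mlockall(0x0) are rejected, and the madvise() call with 0x65 is the one that matches the "Soft offlining pfn" line in the report.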