From patchwork Wed Jun 9 04:12:20 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12308831
Date: Tue, 8 Jun 2021 21:12:20 -0700 (PDT)
From: Hugh Dickins
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting (fwd)
Message-ID: <6b2b6683-d9a7-b7d0-a3e5-425b96338d63@google.com>

---------- Forwarded message ----------
Date: Tue, 8 Jun 2021 21:10:19 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
    Matthew Wilcox, Naoya Horiguchi, Alistair Popple, Ralph Campbell,
    Zi Yan, Miaohe Lin, Minchan Kim, Jue Wang, Peter Xu, Jan Kara,
    Shakeel Butt, Oscar Salvador
Subject: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

Stressing huge tmpfs often crashed on unmap_page()'s
VM_BUG_ON_PAGE(!unmap_success): with dump_page() showing mapcount:1,
but then its raw struct page output showing _mapcount ffffffff,
i.e. mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all perhaps indicative of some mapcount difficulty in development here.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.

I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the
same for both: the slight overhead added should rarely matter, except
perhaps if splitting sparsely-populated multiply-mapped shmem. Once
confident that bugs are fixed, TTU_SYNC here can be removed, and the
race tolerated.

Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins
Cc:
Acked-by: Kirill A. Shutemov
Reviewed-by: Yang Shi
---
v2: moved TTU_SYNC definition up, to avoid conflict with other patchset
    use TTU_SYNC even when non-debug, per Peter Xu and Yang Shi
    expanded PVMW_SYNC's spin_unlock(pmd_lock()), per Kirill and Peter
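[Editor's illustration, not part of the patch: the fix relies on a
classic idiom: take and immediately drop the lock that a racing
critical section holds, so that everything done under that lock is
known to be complete before we proceed. Below is a minimal userspace
analogue in C of that idiom; zapper() and unmap_check_sync() are
hypothetical names standing in for zap_pte_range()/page_remove_rmap()
and for try_to_unmap() with TTU_SYNC. It is a sketch of the idiom,
not kernel code.]

/*
 * Build with: cc -pthread sketch.c
 * zapper() plays zap_pte_range(): under the lock it clears the "pte"
 * and only then decrements the "mapcount".  unmap_check_sync() plays
 * try_to_unmap() with TTU_SYNC: if it finds the pte already cleared,
 * it takes and drops the same lock, so any critical section that
 * cleared the pte has also finished its mapcount decrement before
 * mapcount is read.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void *pte = (void *)1;	/* stands in for a mapped page table entry */
static int mapcount = 1;	/* stands in for the page's mapcount */

static void *zapper(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	pte = NULL;		/* like ptep_get_and_clear_full() */
	mapcount--;		/* like page_remove_rmap() */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static bool unmap_check_sync(void)
{
	/*
	 * Lockless peek, like page_vma_mapped_walk() finding
	 * !pmd_present(): the entry may already be gone while the
	 * zapper has not yet reached its mapcount decrement.
	 * (Deliberately racy, as in the unsynchronized kernel path.)
	 */
	if (pte == NULL) {
		/* The TTU_SYNC/PVMW_SYNC trick: wait via lock/unlock. */
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
	}
	/*
	 * Without the lock/unlock above, this could read 1 while the
	 * zapper was mid-critical-section: "false when it is about to
	 * become true", exactly the tolerated race the patch describes.
	 */
	return mapcount == 0;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, zapper, NULL);
	printf("unmapped: %d\n", unmap_check_sync());
	pthread_join(t, NULL);
	return 0;
}
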
 include/linux/rmap.h |  1 +
 mm/huge_memory.c     |  2 +-
 mm/page_vma_mapped.c | 11 +++++++++++
 mm/rmap.c            | 17 ++++++++++++++++-
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index def5c62c93b3..8d04e7deedc6 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -91,6 +91,7 @@ enum ttu_flags {
 
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
+	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
 	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
 	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
 					 * and caller guarantees they will
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5885c5f5836f..84ab735139dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 
 static void unmap_page(struct page *page)
 {
-	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
 	bool unmap_success;
 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 2cf01d933f13..5b559967410e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -212,6 +212,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			pvmw->ptl = NULL;
 		}
 	} else if (!pmd_present(pmde)) {
+		/*
+		 * If PVMW_SYNC, take and drop THP pmd lock so that we
+		 * cannot return prematurely, while zap_huge_pmd() has
+		 * cleared *pmd but not decremented compound_mapcount().
+		 */
+		if ((pvmw->flags & PVMW_SYNC) &&
+		    PageTransCompound(pvmw->page)) {
+			spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+			spin_unlock(ptl);
+		}
 		return false;
 	}
 	if (!map_pte(pvmw))
diff --git a/mm/rmap.c b/mm/rmap.c
index 693a610e181d..07811b4ae793 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		return true;
@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	else
 		rmap_walk(page, &rwc);
 
-	return !page_mapcount(page) ? true : false;
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	return !page_mapcount(page);
 }
 
 /**
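
[Editor's note, not part of the patch: one possible interleaving of the
race described above, which TTU_SYNC closes when splitting a THP.]

  CPU0: zap_huge_pmd()             CPU1: try_to_unmap() for the split
  --------------------             ----------------------------------
  ptl = pmd_lock(mm, pmd)
  clear *pmd
                                   page_vma_mapped_walk() sees
                                   !pmd_present(pmde)
                                   without PVMW_SYNC: return false;
                                   try_to_unmap() reads mapcount
                                   before CPU0 decrements it
                                   with PVMW_SYNC: pmd_lock() blocks...
  page_remove_rmap() decrements
  compound_mapcount
  spin_unlock(ptl)
                                   ...lock acquired and dropped:
                                   the mapcount check is now stable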