From patchwork Mon Nov 7 02:53:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13033758
Date: Sun, 6 Nov 2022 18:53:59 -0800
In-Reply-To: <20221107025359.2911028-1-jiaqiyan@google.com>
References: <20221107025359.2911028-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog
Message-ID: <20221107025359.2911028-3-jiaqiyan@google.com>
Subject: [PATCH v6 2/2] mm/khugepaged: recover from poisoned file-backed memory
From: Jiaqi Yan
To: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 shy828301@gmail.com, tongtiangen@huawei.com
Cc: tony.luck@intel.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
 jiaqiyan@google.com, linux-mm@kvack.org, akpm@linux-foundation.org
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Make
collapse_file roll back when copying pages fails. More concretely:
- extract copying operations into a separate loop
- postpone the updates for nr_none until both scanning and copying
  have succeeded
- postpone joining small xarray entries until both scanning and copying
  have succeeded
- postpone the update operations to NR_XXX_THPS until both scanning and
  copying have succeeded
- for a non-SHMEM file, roll back filemap_nr_thps_inc if the scan
  succeeded but copying failed

Tested manually:
0. Enable khugepaged on the system under test. Mount tmpfs at
   /mnt/ramdisk.
1. Start a two-thread application. Each thread allocates a chunk of
   non-huge memory buffer from /mnt/ramdisk.
2. Pick 4 random buffer addresses (2 in each thread) and inject
   uncorrectable memory errors at the corresponding physical addresses.
3. Signal both threads to make their memory buffers collapsible, i.e.
   by calling madvise(MADV_HUGEPAGE).
4. Wait and then check the kernel log: khugepaged is able to recover
   from poisoned pages by skipping them.
5. Signal both threads to inspect their buffer contents and make sure
   there is no data corruption.

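For readers who want to reproduce steps 1 and 3, the following is a minimal userspace sketch, not the test program actually used. The helper names sketch_map_file() and sketch_make_collapsible() are hypothetical; only open(), ftruncate(), mmap() and madvise() are real interfaces. Error injection (step 2) and the two-thread signalling are not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or open) the file at `path`, size it to
 * `len` bytes and map it shared. Returns NULL on any failure.
 */
static void *sketch_map_file(const char *path, size_t len)
{
	void *buf;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return NULL;
	}
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	return buf == MAP_FAILED ? NULL : buf;
}

/*
 * Hypothetical helper for step 3: ask the kernel to consider the range
 * for huge page collapse. Returns madvise()'s result (0 or -1), which
 * depends on whether transparent huge pages are available.
 */
static int sketch_make_collapsible(void *buf, size_t len)
{
	return madvise(buf, len, MADV_HUGEPAGE);
}
```

In the procedure above, the path would name a file under /mnt/ramdisk, and each thread would touch its buffer before step 2 so that small pages are actually populated.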
Signed-off-by: Jiaqi Yan
---
 mm/khugepaged.c | 74 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c1f225327bc05..d54527b77e365 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1776,7 +1776,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			 struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
-	struct page *hpage;
+	struct page *hpage, *page, *tmp;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1821,7 +1821,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		struct page *page = xas_next(&xas);
+		page = xas_next(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -2003,10 +2003,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 
 	nr = thp_nr_pages(hpage);
-	if (is_shmem)
-		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-	else {
-		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	if (!is_shmem) {
 		filemap_nr_thps_inc(mapping);
 		/*
 		 * Paired with smp_mb() in do_dentry_open() to ensure
@@ -2017,21 +2014,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		smp_mb();
 		if (inode_is_open_for_write(mapping->host)) {
 			result = SCAN_FAIL;
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr);
 			filemap_nr_thps_dec(mapping);
 			goto xa_locked;
 		}
 	}
-
-	if (nr_none) {
-		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-		/* nr_none is always 0 for non-shmem. */
-		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-	}
-
-	/* Join all the small entries into a single multi-index entry */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2044,20 +2030,34 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	try_to_unmap_flush();
 
 	if (result == SCAN_SUCCEED) {
-		struct page *page, *tmp;
-
 		/*
 		 * Replacing old pages with new one has succeeded, now we
-		 * need to copy the content and free the old pages.
+		 * attempt to copy the contents.
 		 */
 		index = start;
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_for_each_entry(page, &pagelist, lru) {
 			while (index < page->index) {
 				clear_highpage(hpage + (index % HPAGE_PMD_NR));
 				index++;
 			}
-			copy_highpage(hpage + (page->index % HPAGE_PMD_NR),
-				      page);
+			if (copy_highpage_mc(hpage + (page->index % HPAGE_PMD_NR), page)) {
+				result = SCAN_COPY_MC;
+				break;
+			}
+			index++;
+		}
+		while (result == SCAN_SUCCEED && index < end) {
+			clear_highpage(hpage + (index % HPAGE_PMD_NR));
+			index++;
+		}
+	}
+
+	if (result == SCAN_SUCCEED) {
+		/*
+		 * Copying old pages to huge one has succeeded, now we
+		 * need to free the old pages.
+		 */
+		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 			list_del(&page->lru);
 			page->mapping = NULL;
 			page_ref_unfreeze(page, 1);
@@ -2065,12 +2065,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			ClearPageUnevictable(page);
 			unlock_page(page);
 			put_page(page);
-			index++;
 		}
-		while (index < end) {
-			clear_highpage(hpage + (index % HPAGE_PMD_NR));
-			index++;
+
+		xas_lock_irq(&xas);
+		if (is_shmem)
+			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+		else
+			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+
+		if (nr_none) {
+			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+			/* nr_none is always 0 for non-shmem. */
+			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 		}
+		/* Join all the small entries into a single multi-index entry. */
+		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+		xas_store(&xas, hpage);
+		xas_unlock_irq(&xas);
 
 		SetPageUptodate(hpage);
 		page_ref_add(hpage, HPAGE_PMD_NR - 1);
@@ -2086,8 +2097,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		unlock_page(hpage);
 		hpage = NULL;
 	} else {
-		struct page *page;
-
 		/* Something went wrong: roll back page cache changes */
 		xas_lock_irq(&xas);
 		if (nr_none) {
@@ -2121,6 +2130,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			xas_lock_irq(&xas);
 		}
 		VM_BUG_ON(nr_none);
+		/*
+		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
+		 * This undo is not needed unless failure is due to SCAN_COPY_MC.
+		 */
+		if (!is_shmem && result == SCAN_COPY_MC)
+			filemap_nr_thps_dec(mapping);
+
 		xas_unlock_irq(&xas);
 
 		hpage->mapping = NULL;
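
The ordering this patch moves to (copy everything first, publish shared state only if every copy succeeded) can be illustrated with a small userspace model. This is a simplified sketch, not kernel code: every sketch_* name and constant is hypothetical, the `poisoned` flag merely simulates what copy_highpage_mc() would report, and it omits the one piece of state the patch still updates before copying (filemap_nr_thps_inc, which is undone on SCAN_COPY_MC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64	/* tiny stand-in for PAGE_SIZE */
#define SKETCH_NR_PAGES  8	/* tiny stand-in for HPAGE_PMD_NR */

/* Hypothetical small page; `poisoned` simulates an uncorrectable error. */
struct sketch_page {
	bool present;	/* false models a hole in the file */
	bool poisoned;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical shared state that must change only on full success. */
struct sketch_cache {
	int nr_thps;		/* models the NR_XXX_THPS counters */
	bool huge_installed;	/* models the multi-index xarray entry */
};

enum sketch_result { SKETCH_SUCCEED, SKETCH_COPY_MC };

/* Models a machine-check-safe copy: non-zero if the source is poisoned. */
static int sketch_copy_mc(unsigned char *dst, const struct sketch_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SKETCH_PAGE_SIZE);
	return 0;
}

static enum sketch_result sketch_collapse(struct sketch_cache *cache,
					  const struct sketch_page *pages,
					  unsigned char *huge)
{
	size_t i;

	assert(cache && pages && huge);

	/* Phase 1: copy or zero-fill each slot; touch no shared state. */
	for (i = 0; i < SKETCH_NR_PAGES; i++) {
		unsigned char *dst = huge + i * SKETCH_PAGE_SIZE;

		if (!pages[i].present)
			memset(dst, 0, SKETCH_PAGE_SIZE);
		else if (sketch_copy_mc(dst, &pages[i]))
			return SKETCH_COPY_MC;	/* nothing published yet */
	}

	/* Phase 2: every copy succeeded, so publish the result. */
	cache->nr_thps++;
	cache->huge_installed = true;
	return SKETCH_SUCCEED;
}
```

Because the counters and the installed entry are only written in the second phase, a failed copy in this model leaves `struct sketch_cache` exactly as it was, which is the property the postponed NR_XXX_THPS, nr_none and xas_store() updates aim for in collapse_file.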