From patchwork Fri Nov 18 01:31:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13047581
Date: Thu, 17 Nov 2022 17:31:55 -0800
Message-ID: <20221118013157.1333622-1-jiaqiyan@google.com>
Subject: [PATCH v7 0/2] Memory poison recovery in khugepaged collapsing
From: Jiaqi Yan
To: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 shy828301@gmail.com, tongtiangen@huawei.com, akpm@linux-foundation.org
Cc: tony.luck@intel.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
 juew@google.com, jiaqiyan@google.com, linux-mm@kvack.org,
 osalvador@suse.de

Problem
=======

Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
As memory size and density increase, so do the likelihood and the number
of memory errors. The increasing size and density of server RAM in data
centers and the cloud has been accompanied by an increase in
uncorrectable memory errors. There are already mechanisms in the kernel
to recover from uncorrectable memory errors. This series of patches
provides the recovery mechanism for the particular kernel agent
khugepaged when it collapses memory pages.

Impact
======

The main reason we chose to make khugepaged collapsing tolerant of
memory failures is its high likelihood of accessing poisoned memory
while performing functionally optional compaction actions. Standard
applications typically don't have strict requirements on the size of
their pages, so the kernel gives them 4K pages. The kernel is able to
improve application performance by either 1) giving applications 2M
pages to begin with, or 2) collapsing 4K pages into 2M pages when
possible. This collapsing operation is done by khugepaged, a kernel
agent that is constantly scanning memory. When collapsing 4K pages into
a 2M page, it must copy the data from the 4K pages into a physically
contiguous 2M page. Therefore, as long as there exists one poisoned
cache line in collapsible 4K pages, khugepaged will eventually access
it. The current impact to users is a kernel panic triggered by a machine
check exception. However, khugepaged's compaction operations are not
functionally required kernel actions. Therefore making khugepaged
tolerant of poisoned memory will greatly improve user experience.

This patch series is for cases where khugepaged is the first to detect
the memory errors on the poisoned pages. IOW, the pages are not known to
have memory errors when khugepaged collapsing gets to them. In our
observation, this happens frequently when the huge page ratio of the
system is relatively low, which is fairly common in virtual machines
running in the cloud.

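To make the copy step above concrete, here is a minimal user-space
sketch of why khugepaged ends up touching every source page. It is a
simulation only and not the kernel code: struct sim_page,
sim_copy_page_mc(), sim_collapse_copy() and sim_run() are hypothetical
names invented for illustration, and the "poisoned" flag merely models a
page with an uncorrectable error. The sketch assumes a copy helper that
reports failure instead of crashing, so the loop can stop at the first
bad page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096u                               /* 4K base page */
#define HUGE_PAGE_SIZE  (2u * 1024u * 1024u)                /* 2M huge page */
#define PAGES_PER_HUGE  (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE)  /* 512 */

/* Hypothetical stand-in for one 4K source page. 'poisoned' models a page
 * containing an uncorrectable memory error. */
struct sim_page {
	bool poisoned;
	unsigned char data[SMALL_PAGE_SIZE];
};

/* Hypothetical model of a machine-check-safe copy: returns 0 on success
 * and -1 when the source is poisoned, instead of bringing the system down. */
int sim_copy_page_mc(unsigned char *dst, const struct sim_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return 0;
}

/* Copy every 4K page into its slot in the contiguous 2M buffer, stopping
 * at the first failed copy. Returns the number of pages copied, which is
 * PAGES_PER_HUGE only if the whole group was clean. */
unsigned int sim_collapse_copy(unsigned char *huge,
			       const struct sim_page *pages)
{
	unsigned int i;

	for (i = 0; i < PAGES_PER_HUGE; i++) {
		if (sim_copy_page_mc(huge + (size_t)i * SMALL_PAGE_SIZE,
				     &pages[i]))
			break;
	}
	return i;
}

/* Build 512 source pages, optionally poison one of them, and run the
 * copy. A negative poison_index means no page is poisoned. */
unsigned int sim_run(int poison_index)
{
	static struct sim_page pages[PAGES_PER_HUGE];
	static unsigned char huge[HUGE_PAGE_SIZE];
	unsigned int i;

	for (i = 0; i < PAGES_PER_HUGE; i++) {
		pages[i].poisoned = ((int)i == poison_index);
		memset(pages[i].data, (int)(i & 0xff), SMALL_PAGE_SIZE);
	}
	return sim_collapse_copy(huge, pages);
}
```

In this model sim_run(-1) copies all 512 pages, while sim_run(100) stops
after 100 pages. Because the loop has to visit every page in the group,
a single poisoned page anywhere in the 2M range is always reached; a
caller that sees a short count can then give up on the whole group
rather than crash.
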
Solution
========

As stated before, it is undesirable to crash the system only because
khugepaged accesses poisoned pages while it is collapsing 4K pages. The
high-level idea of this patch series is to skip the group of pages
(usually 512 4K-size pages) once khugepaged finds one of them is
poisoned, as these pages have become ineligible to be collapsed.

We are also careful to unwind operations khugepaged has performed before
it detects memory failures. For example, before copying and collapsing a
group of anonymous pages into a huge page, the source pages will be
isolated and their page table is unlinked from their PMD. These
operations need to be undone in order to ensure these pages are not
changed/lost from the perspective of other threads (both user and kernel
space). As for file-backed memory pages, there already exists a rollback
case. This patch series just extends it so that khugepaged also
correctly rolls back when it fails to copy poisoned 4K pages.

Changelog
=========

v7 changes
- Fix a bug "KASAN: stack-out-of-bounds Read in collapse_file". After
  copying all pages into the huge page, clear_highpage should use index
  instead of page->index.

v6 changes
- Address comments from Kirill Shutemov
- Rewrite __collapse_huge_page_copy to make rollback operations clearer
  to its reader.
- Add detailed test steps in each commit message.

v5 changes
- Rebase patches to mm-unstable at commit ffb39098bf87 ("Merge tag
  'linux-kselftest-kunit-6.1-rc1' of
  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest").
- Resolve conflicts with:
    commit 2f55f070e5b8 ("mm/khugepaged: minor cleanup for collapse_file")
    commit 1baec203b77c ("mm/khugepaged: try to free transhuge swapcache
    when possible")

v4 changes
- Incorporate feedback from Yang Shi
- Remove tracepoint for __collapse_huge_page_copy, just keep
  SCAN_COPY_MC and let trace_mm_collapse_huge_page report it
- Remove unnecessary comments

v3 changes
- Incorporate feedback from Yang Shi
- Add tracepoint for __collapse_huge_page_copy
- Restore PMD in collapse_huge_page
- Correct comment about mmap_read_lock

v2 changes
- Incorporate feedback from Yang Shi
- Only keep copy_highpage_mc
- Add new scan_result SCAN_COPY_MC
- Defer NR_FILE_THPS update until copying succeeded

Jiaqi Yan (2):
  mm/khugepaged: recover from poisoned anonymous memory
  mm/khugepaged: recover from poisoned file-backed memory

 include/linux/highmem.h            |  19 +++
 include/trace/events/huge_memory.h |   3 +-
 mm/khugepaged.c                    | 233 ++++++++++++++++++++---------
 3 files changed, 183 insertions(+), 72 deletions(-)