From patchwork Mon Jan 16 19:39:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13103610
Date: Mon, 16 Jan 2023 19:39:00 +0000
In-Reply-To: <20230116193902.1315236-1-jiaqiyan@google.com>
References: <20230116193902.1315236-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230116193902.1315236-2-jiaqiyan@google.com>
Subject: [PATCH v1 1/3] mm: memory-failure: Add memory failure stats to sysfs
From: Jiaqi Yan
To: tony.luck@intel.com, naoya.horiguchi@nec.com
Cc: jiaqiyan@google.com, duenwen@google.com, rientjes@google.com,
 linux-mm@kvack.org, shy828301@gmail.com, akpm@linux-foundation.org,
 wangkefeng.wang@huawei.com

Today kernel provides following memory error info to userspace, but each
has its own disadvantage
* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
  not per NUMA node stats though
* ras:memory_failure_event: only available after explicitly enabled
* /dev/mcelog provides many useful info about the MCEs, but doesn't
  capture how memory_failure recovered memory MCEs
* kernel logs: userspace needs to process log text

Exposes per NUMA node memory error stats as sysfs entries:

  /sys/devices/system/node/node${X}/memory_failure/pages_poisoned
  /sys/devices/system/node/node${X}/memory_failure/pages_recovered
  /sys/devices/system/node/node${X}/memory_failure/pages_ignored
  /sys/devices/system/node/node${X}/memory_failure/pages_failed
  /sys/devices/system/node/node${X}/memory_failure/pages_delayed

These counters describe how many raw pages are poisoned and after the
attempted recoveries by the kernel, their resolutions: how many are
recovered, ignored, failed, or delayed respectively.

The following math holds for the statistics:
* pages_poisoned = pages_recovered + pages_ignored + pages_failed +
  pages_delayed
* pages_poisoned * PAGE_SIZE = /proc/meminfo/HardwareCorrupted

Acked-by: David Rientjes
Signed-off-by: Jiaqi Yan
---
 drivers/base/node.c    |  3 +++
 include/linux/mm.h     |  5 +++++
 include/linux/mmzone.h | 28 ++++++++++++++++++++++++++++
 mm/memory-failure.c    | 35 +++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index faf3597a96da..b46db17124f3 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -586,6 +586,9 @@ static const struct attribute_group *node_dev_groups[] = {
 	&node_dev_group,
 #ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
 	&arch_node_dev_group,
+#endif
+#ifdef CONFIG_MEMORY_FAILURE
+	&memory_failure_attr_group,
 #endif
 	NULL
 };
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..888576884eb9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3521,6 +3521,11 @@ enum mf_action_page_type {
 	MF_MSG_UNKNOWN,
 };
 
+/*
+ * Sysfs entries for memory failure handling statistics.
+ */
+extern const struct attribute_group memory_failure_attr_group;
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..0a14b35a96da 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1110,6 +1110,31 @@ struct deferred_split {
 };
 #endif
 
+#ifdef CONFIG_MEMORY_FAILURE
+/*
+ * Per NUMA node memory failure handling statistics.
+ */
+struct memory_failure_stats {
+	/*
+	 * Number of pages poisoned.
+	 * Cases not accounted: memory outside kernel control, offline page,
+	 * arch-specific memory_failure (SGX), and hwpoison_filter()
+	 * filtered error events.
+	 */
+	unsigned long pages_poisoned;
+	/*
+	 * Recovery results of poisoned pages handled by memory_failure,
+	 * in sync with mf_result.
+	 * pages_poisoned = pages_ignored + pages_failed +
+	 *		    pages_delayed + pages_recovered
+	 */
+	unsigned long pages_ignored;
+	unsigned long pages_failed;
+	unsigned long pages_delayed;
+	unsigned long pages_recovered;
+};
+#endif
+
 /*
  * On NUMA machines, each NUMA node would have a pg_data_t to describe
  * it's memory layout. On UMA machines there is a single pglist_data which
@@ -1253,6 +1278,9 @@ typedef struct pglist_data {
 #ifdef CONFIG_NUMA
 	struct memory_tier __rcu *memtier;
 #endif
+#ifdef CONFIG_MEMORY_FAILURE
+	struct memory_failure_stats mf_stats;
+#endif
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c77a9e37e27e..cb782fa552d5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -87,6 +87,41 @@ inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 	memblk_nr_poison_sub(pfn, i);
 }
 
+/**
+ * MF_ATTR_RO - Create sysfs entry for each memory failure statistics.
+ * @_name: name of the file in the per NUMA sysfs directory.
+ */
+#define MF_ATTR_RO(_name)					\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *attr,	\
+			    char *buf)				\
+{								\
+	struct memory_failure_stats *mf_stats =			\
+		&NODE_DATA(dev->id)->mf_stats;			\
+	return sprintf(buf, "%lu\n", mf_stats->_name);		\
+}								\
+static DEVICE_ATTR_RO(_name)
+
+MF_ATTR_RO(pages_poisoned);
+MF_ATTR_RO(pages_ignored);
+MF_ATTR_RO(pages_failed);
+MF_ATTR_RO(pages_delayed);
+MF_ATTR_RO(pages_recovered);
+
+static struct attribute *memory_failure_attr[] = {
+	&dev_attr_pages_poisoned.attr,
+	&dev_attr_pages_ignored.attr,
+	&dev_attr_pages_failed.attr,
+	&dev_attr_pages_delayed.attr,
+	&dev_attr_pages_recovered.attr,
+	NULL,
+};
+
+const struct attribute_group memory_failure_attr_group = {
+	.name = "memory_failure",
+	.attrs = memory_failure_attr,
+};
+
 /*
  * Return values:
  *   1:   the page is dissolved (if needed) and taken off from buddy,

From patchwork Mon Jan 16 19:39:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13103611
Date: Mon, 16 Jan 2023 19:39:01 +0000
In-Reply-To: <20230116193902.1315236-1-jiaqiyan@google.com>
References: <20230116193902.1315236-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230116193902.1315236-3-jiaqiyan@google.com>
Subject: [PATCH v1 2/3] mm: memory-failure: Bump memory failure stats to
 pglist_data
From: Jiaqi Yan
To: tony.luck@intel.com, naoya.horiguchi@nec.com
Cc: jiaqiyan@google.com, duenwen@google.com, rientjes@google.com,
 linux-mm@kvack.org, shy828301@gmail.com, akpm@linux-foundation.org,
 wangkefeng.wang@huawei.com

Right before memory_failure finishes its handling, accumulate poisoned
page's resolution counters to pglist_data's memory_failure_stats, so as
to update the corresponding sysfs entries.

Tested:
1) Start an application to allocate memory buffer chunks
2) Convert random memory buffer addresses to physical addresses
3) Inject memory errors using EINJ at chosen physical addresses
4) Access poisoned memory buffer and recover from SIGBUS
5) Check counter values under
   /sys/devices/system/node/node*/memory_failure/pages_*

Acked-by: David Rientjes
Signed-off-by: Jiaqi Yan
---
 mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index cb782fa552d5..c90417cfcda4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1227,6 +1227,39 @@ static struct page_state error_states[] = {
 #undef slab
 #undef reserved
 
+static void update_per_node_mf_stats(unsigned long pfn,
+				     enum mf_result result)
+{
+	int nid = MAX_NUMNODES;
+	struct memory_failure_stats *mf_stats = NULL;
+
+	nid = pfn_to_nid(pfn);
+	if (unlikely(nid < 0 || nid >= MAX_NUMNODES)) {
+		WARN_ONCE(1, "Memory failure: pfn=%#lx, invalid nid=%d", pfn, nid);
+		return;
+	}
+
+	mf_stats = &NODE_DATA(nid)->mf_stats;
+	switch (result) {
+	case MF_IGNORED:
+		++mf_stats->pages_ignored;
+		break;
+	case MF_FAILED:
+		++mf_stats->pages_failed;
+		break;
+	case MF_DELAYED:
+		++mf_stats->pages_delayed;
+		break;
+	case MF_RECOVERED:
+		++mf_stats->pages_recovered;
+		break;
+	default:
+		WARN_ONCE(1, "Memory failure: mf_result=%d is not properly handled", result);
+		break;
+	}
+	++mf_stats->pages_poisoned;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1237,6 +1270,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	trace_memory_failure_event(pfn, type, result);
 
 	num_poisoned_pages_inc(pfn);
+
+	update_per_node_mf_stats(pfn, result);
+
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 

From patchwork Mon Jan 16 19:39:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13103612
Date: Mon, 16 Jan 2023 19:39:02 +0000
In-Reply-To: <20230116193902.1315236-1-jiaqiyan@google.com>
References: <20230116193902.1315236-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230116193902.1315236-4-jiaqiyan@google.com>
Subject: [PATCH v1 3/3] mm: memory-failure: Document memory failure stats
From: Jiaqi Yan
To: tony.luck@intel.com, naoya.horiguchi@nec.com
Cc: jiaqiyan@google.com, duenwen@google.com, rientjes@google.com,
 linux-mm@kvack.org, shy828301@gmail.com, akpm@linux-foundation.org,
 wangkefeng.wang@huawei.com

Add documentation for memory_failure's per NUMA node sysfs entries.

Signed-off-by: Jiaqi Yan
---
 Documentation/ABI/stable/sysfs-devices-node | 39 +++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 8db67aa472f1..a2fefb0e61a7 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -182,3 +182,42 @@ Date:		November 2021
 Contact:	Jarkko Sakkinen
 Description:
 		The total amount of SGX physical memory in bytes.
+
+What:		/sys/devices/system/node/nodeX/memory_failure/pages_poisoned
+Date:		January 2023
+Contact:	Jiaqi Yan
+Description:
+		The total number of raw poisoned pages (pages containing
+		corrupted data due to memory errors) on a NUMA node.
+
+What:		/sys/devices/system/node/nodeX/memory_failure/pages_ignored
+Date:		January 2023
+Contact:	Jiaqi Yan
+Description:
+		Of the raw poisoned pages on a NUMA node, how many pages are
+		ignored by memory error recovery attempt, usually because
+		support for this type of pages is unavailable, and kernel
+		gives up the recovery.
+
+What:		/sys/devices/system/node/nodeX/memory_failure/pages_failed
+Date:		January 2023
+Contact:	Jiaqi Yan
+Description:
+		Of the raw poisoned pages on a NUMA node, how many pages are
+		failed by memory error recovery attempt. This usually means
+		a key recovery operation failed.
+
+What:		/sys/devices/system/node/nodeX/memory_failure/pages_delayed
+Date:		January 2023
+Contact:	Jiaqi Yan
+Description:
+		Of the raw poisoned pages on a NUMA node, how many pages are
+		delayed by memory error recovery attempt. Delayed poisoned
+		pages usually will be retried by kernel.
+
+What:		/sys/devices/system/node/nodeX/memory_failure/pages_recovered
+Date:		January 2023
+Contact:	Jiaqi Yan
+Description:
+		Of the raw poisoned pages on a NUMA node, how many pages are
+		recovered by memory error recovery attempt.
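
Illustration only, not part of the series: a minimal userspace sketch of
how a monitoring tool might consume the counters described in patch 1/3
and check the stated relation pages_poisoned = pages_recovered +
pages_ignored + pages_failed + pages_delayed. It assumes the v1 file
names from this series. The struct and function names (mf_counters,
mf_parse_counter, mf_read_counter, mf_read_node, mf_consistent) are
hypothetical and exist nowhere in the kernel tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace mirror of the five per-node counters. */
struct mf_counters {
	unsigned long poisoned, recovered, ignored, failed, delayed;
};

/* Parse one sysfs value such as "42\n". Returns 0 on success, -1 on error. */
int mf_parse_counter(const char *text, unsigned long *out)
{
	char *end;

	assert(text != NULL && out != NULL);
	errno = 0;
	*out = strtoul(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	/* Accept only a bare number, optionally followed by a newline. */
	return (*end == '\0' || *end == '\n') ? 0 : -1;
}

/* Read <dir>/<name>, where dir is e.g. ".../node0/memory_failure". */
int mf_read_counter(const char *dir, const char *name, unsigned long *out)
{
	char path[256], buf[32];
	FILE *f;
	int n, ret = -1;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = mf_parse_counter(buf, out);
	fclose(f);
	return ret;
}

/* Read all five counters of one node. Returns 0 on success, -1 on error. */
int mf_read_node(const char *dir, struct mf_counters *c)
{
	if (mf_read_counter(dir, "pages_poisoned", &c->poisoned) ||
	    mf_read_counter(dir, "pages_recovered", &c->recovered) ||
	    mf_read_counter(dir, "pages_ignored", &c->ignored) ||
	    mf_read_counter(dir, "pages_failed", &c->failed) ||
	    mf_read_counter(dir, "pages_delayed", &c->delayed))
		return -1;
	return 0;
}

/* Nonzero if the relation from the commit message holds for this sample. */
int mf_consistent(const struct mf_counters *c)
{
	return c->poisoned ==
	       c->recovered + c->ignored + c->failed + c->delayed;
}
```

A caller would pass a directory such as
/sys/devices/system/node/node0/memory_failure to mf_read_node() and then
test the result with mf_consistent(). The five files are read one at a
time, so a sample taken while the kernel is handling a new error may
fail the check transiently; re-reading before reporting a mismatch is
probably wise.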