Date: Fri, 20 Jan 2023 03:46:19 +0000
Message-ID: <20230120034622.2698268-1-jiaqiyan@google.com>
Subject: [PATCH v2 0/3] Introduce per NUMA node memory error statistics
From: Jiaqi Yan
To: tony.luck@intel.com, naoya.horiguchi@nec.com
Cc: jiaqiyan@google.com, duenwen@google.com, rientjes@google.com,
 linux-mm@kvack.org, shy828301@gmail.com, akpm@linux-foundation.org,
 wangkefeng.wang@huawei.com

Background
==========

In the RFC for Kernel
Support of Memory Error Detection [1], one advantage of software-based
scanning over the hardware patrol scrubber is the ability to make
statistics visible to system administrators. The statistics fall into 2
categories:

* Memory error statistics, for example, how many memory errors are
  encountered and how many of them are recovered by the kernel. Note that
  these memory errors are non-fatal to the kernel: during machine check
  exception (MCE) handling, the kernel has already classified the MCE's
  severity as not requiring a panic (but either action required or action
  optional).
* Scanner statistics, for example, how many times the scanner has fully
  scanned a NUMA node and how many errors are first detected by the
  scanner.

The memory error statistics are useful to userspace and are not specific
to scanner-detected memory errors; they are the focus of this patchset.

Motivation
==========

Memory error stats are important to userspace but insufficient in the
kernel today. Datacenter administrators can better monitor a machine's
memory health with visible stats. For example, while memory errors are
inevitable on servers with 10+ TB of memory, starting server maintenance
when there are only 1~2 recovered memory errors could be overreacting; in
a cloud production environment, maintenance usually means live migrating
all the workloads running on the server, and this usually causes
nontrivial disruption to the customer. Providing insight into the scope
of memory errors on a system helps to determine the appropriate follow-up
action. In addition, the kernel's existing memory error stats need to be
standardized so that userspace can reliably count on their usefulness.

Today the kernel provides the following memory error info to userspace,
but each source is insufficient or has disadvantages:

* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
  but not broken down per NUMA node
* ras:memory_failure_event: only available after being explicitly enabled
* /dev/mcelog: provides much useful info about the MCEs, but doesn't
  capture how memory_failure recovered memory MCEs
* kernel logs: userspace needs to process log text

Exposing memory error stats is also a good start for the in-kernel memory
error detector. Today the data sources of memory error stats are either
direct memory error consumption or hardware patrol scrubber detection
(signaled as either UCNA or SRAO). Once the in-kernel memory scanner is
implemented, it will be the main source, as it is usually configured to
scan memory DIMMs constantly and faster than the hardware patrol scrubber.

How Implemented
===============

As Naoya pointed out [2], exposing memory error statistics to userspace is
useful independent of a software or hardware scanner. Therefore we
implement the memory error statistics independently of the in-kernel
memory error detector. The patchset exposes the following per NUMA node
memory error counters:

  /sys/devices/system/node/node${X}/memory_failure/total
  /sys/devices/system/node/node${X}/memory_failure/recovered
  /sys/devices/system/node/node${X}/memory_failure/ignored
  /sys/devices/system/node/node${X}/memory_failure/failed
  /sys/devices/system/node/node${X}/memory_failure/delayed

These counters describe how many raw pages are poisoned and, after the
kernel's attempted recoveries, their resolutions: how many are recovered,
ignored, failed, or delayed, respectively. This approach can be easier to
extend for future use cases than /proc/meminfo, trace events, and logs.
The following math holds for the statistics:

* total = recovered + ignored + failed + delayed

These memory error stats are reset during machine boot.

The 1st commit introduces these sysfs entries.
The 2nd commit populates memory error stats every time memory_failure
attempts memory error recovery. The 3rd commit adds documentation for
the introduced stats.

[1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af
[2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6

Changelog

v2 changes:
- Incorporate feedback from Andrew Morton and Horiguchi Naoya.
- Correction in cover letter: both UCNA and SRAO are handled by
  memory_failure().
- Rename `pages_poisoned` to `total`.
- Remove the "pages_" prefix from counter names.
- Correction in cover letter and commit message: `total` * PAGE_SIZE *
  #nodes is not exactly equal to /proc/meminfo/HardwareCorrupted due to
  cases not accounted.

Jiaqi Yan (3):
  mm: memory-failure: Add memory failure stats to sysfs
  mm: memory-failure: Bump memory failure stats to pglist_data
  mm: memory-failure: Document memory failure stats

 Documentation/ABI/stable/sysfs-devices-node | 39 +++++++++++
 drivers/base/node.c                         |  3 +
 include/linux/mm.h                          |  5 ++
 include/linux/mmzone.h                      | 28 ++++++++
 mm/memory-failure.c                         | 71 +++++++++++++++++++++
 5 files changed, 146 insertions(+)
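
For illustration only, here is a minimal userspace sketch of how a
monitoring agent might read the per-node counters listed under "How
Implemented" and check the stated relation total = recovered + ignored +
failed + delayed. It is not part of the patchset. It assumes each sysfs
file holds a single decimal integer; the function names and the
make_fake_node() helper are hypothetical and exist only so the sketch can
run without a patched kernel.

```python
import os
import tempfile

# Counter file names as listed in the cover letter.
COUNTERS = ("total", "recovered", "ignored", "failed", "delayed")


def read_memory_failure_stats(node_dir):
    """Read the memory_failure counters under one node directory.

    node_dir is a path such as /sys/devices/system/node/node0.
    Assumption: each counter file contains one decimal integer.
    """
    stats = {}
    for name in COUNTERS:
        path = os.path.join(node_dir, "memory_failure", name)
        with open(path) as f:
            stats[name] = int(f.read().strip())
    return stats


def is_consistent(stats):
    """Check total == recovered + ignored + failed + delayed."""
    resolved = (stats["recovered"] + stats["ignored"]
                + stats["failed"] + stats["delayed"])
    return stats["total"] == resolved


def make_fake_node(root, values):
    """Hypothetical test helper: create a fake node0 directory tree."""
    node_dir = os.path.join(root, "node0")
    mf_dir = os.path.join(node_dir, "memory_failure")
    os.makedirs(mf_dir)
    for name, value in values.items():
        with open(os.path.join(mf_dir, name), "w") as f:
            f.write(f"{value}\n")
    return node_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        node = make_fake_node(root, {"total": 5, "recovered": 2,
                                     "ignored": 1, "failed": 1,
                                     "delayed": 1})
        stats = read_memory_failure_stats(node)
        print(stats, "consistent:", is_consistent(stats))
```

On a live system the five files are read one after another rather than
atomically, so a recovery that completes between two reads could make the
check fail transiently; a real agent would likely re-read before raising
an alarm.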