From patchwork Sat Jan 18 23:15:49 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13944263
Date: Sat, 18 Jan 2025 23:15:49 +0000
In-Reply-To: <20250118231549.1652825-1-jiaqiyan@google.com>
References: <20250118231549.1652825-1-jiaqiyan@google.com>
Message-ID: <20250118231549.1652825-4-jiaqiyan@google.com>
Subject: [RFC PATCH v1 3/3] Documentation: add userspace MF recovery policy via memfd
From: Jiaqi Yan
To: nao.horiguchi@gmail.com, linmiaohe@huawei.com
Cc: tony.luck@intel.com, wangkefeng.wang@huawei.com, willy@infradead.org,
 jane.chu@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 rientjes@google.com, duenwen@google.com, jthoughton@google.com,
 jgg@nvidia.com, ankita@nvidia.com, peterx@redhat.com,
 sidhartha.kumar@oracle.com, david@redhat.com, dave.hansen@linux.intel.com,
 muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Jiaqi Yan

Document its motivation and userspace API.

Signed-off-by: Jiaqi Yan
---
 Documentation/userspace-api/index.rst         |  1 +
 .../userspace-api/mfd_mfr_policy.rst          | 55 +++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 Documentation/userspace-api/mfd_mfr_policy.rst

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 274cc7546efc2..0f9783b8807ea 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -63,6 +63,7 @@ Everything else
    vduse
    futex2
    perf_ring_buffer
+   mfd_mfr_policy
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/mfd_mfr_policy.rst b/Documentation/userspace-api/mfd_mfr_policy.rst
new file mode 100644
index 0000000000000..d4557693c2c40
--- /dev/null
+++ b/Documentation/userspace-api/mfd_mfr_policy.rst
@@ -0,0 +1,55 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================================
+Userspace Memory Failure Recovery Policy via memfd
+==================================================
+
+:Author:
+    Jiaqi Yan
+
+
+Motivation
+==========
+
+When a userspace process is able to recover from memory failures (MF)
+caused by uncorrected memory errors (UE) in the DIMM, especially when it is
+able to avoid consuming known UEs, keeping the memory page mapped and
+accessible may be beneficial to the owning process for a couple of reasons:
+
+- The page affected by a UE can be large, for example a 1G hugepage, while
+  the actually corrupted part is only several cachelines. Losing the entire
+  hugepage of data is unacceptable to the application.
+- In addition to keeping the data accessible, the application still wants
+  to access it with as large a page size as possible for the fastest
+  virtual-to-physical translations.
+
+Memory failure recovery for 1G or larger HugeTLB is a good example. With
+memfd, a userspace process can control whether the kernel hard offlines
+the memory (huge)pages that back the in-RAM file created by memfd.
+
+
+User API
+========
+
+``int memfd_create(const char *name, unsigned int flags)``
+
+``MFD_MF_KEEP_UE_MAPPED``
+    When the ``MFD_MF_KEEP_UE_MAPPED`` bit is set in ``flags``, MF recovery
+    in the kernel does not hard offline memory due to UE until the
+    returned ``memfd`` is released. In other words, the HWPoison-ed memory
+    remains accessible via the returned ``memfd`` or a memory mapping
+    created with it. Note that the affected memory will be immediately
+    protected and isolated from future use (by both kernel and userspace)
+    once the owning process is gone. By default ``MFD_MF_KEEP_UE_MAPPED``
+    is not set, and the kernel hard offlines memory having UEs.
+
+Notes about the behavior and limitations:
+
+- Even if the page affected by UE is kept, a portion of the (huge)page is
+  already lost due to hardware corruption, and the size of that portion
+  is the smallest page size that the kernel uses to manage memory on the
+  architecture, i.e. PAGESIZE. Accessing a virtual address within any of
+  these portions results in a SIGBUS; virtual addresses outside these
+  portions remain usable until corrupted by a new memory error.
+- ``MFD_MF_KEEP_UE_MAPPED`` currently only works for HugeTLB, so
+  ``MFD_HUGETLB`` must also be set when setting ``MFD_MF_KEEP_UE_MAPPED``.
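
For illustration only, and not part of the diff above: a minimal C sketch
of how a userspace process might request the documented policy and reason
about which part of a hugepage is lost. The numeric value given to
MFD_MF_KEEP_UE_MAPPED is an assumed placeholder (the real definition
belongs to the kernel patch that introduces the flag), and the helper names
create_ue_tolerant_memfd, lost_page_start and offset_is_lost are
hypothetical. The raw syscall is used so the sketch does not depend on the
libc providing a memfd_create() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif
#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U
#endif

/*
 * ASSUMPTION: placeholder value for illustration. The real value is
 * defined by the kernel patch adding the flag, not by this document.
 */
#ifndef MFD_MF_KEEP_UE_MAPPED
#define MFD_MF_KEEP_UE_MAPPED 0x0020U
#endif

/*
 * Hypothetical helper: create a HugeTLB-backed memfd and ask the kernel
 * to keep UE-affected pages mapped. If the kernel rejects the flags with
 * EINVAL (for example because it does not know MFD_MF_KEEP_UE_MAPPED),
 * retry without the new flag. *kept is 1 if the first attempt succeeded,
 * 0 otherwise. Returns the fd, or -1 with errno set.
 */
int create_ue_tolerant_memfd(const char *name, int *kept)
{
	unsigned int flags = MFD_CLOEXEC | MFD_HUGETLB;
	int fd = (int)syscall(SYS_memfd_create, name,
			      flags | MFD_MF_KEEP_UE_MAPPED);

	*kept = (fd >= 0);
	if (fd < 0 && errno == EINVAL)
		fd = (int)syscall(SYS_memfd_create, name, flags);
	return fd;
}

/* Start offset of the PAGESIZE-sized portion that contains fault_off. */
size_t lost_page_start(size_t fault_off, size_t page_size)
{
	/* page_size must be a non-zero power of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return fault_off & ~(page_size - 1);
}

/* Nonzero if off lies in the same base page as the UE at fault_off. */
int offset_is_lost(size_t off, size_t fault_off, size_t page_size)
{
	return lost_page_start(off, page_size) ==
	       lost_page_start(fault_off, page_size);
}
```

A caller would typically size the file with ftruncate(), mmap() the
returned fd, and, after a SIGBUS, use the faulting offset with
offset_is_lost() to avoid touching the lost portion again while continuing
to use the rest of the hugepage.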