From patchwork Mon Feb 6 01:02:29 2023
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13129225
Subject: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: David Hildenbrand, Kees Cook, stable@vger.kernel.org, Dave Hansen, Michal Hocko, linux-mm@kvack.org, linux-acpi@vger.kernel.org
Date: Sun, 05 Feb 2023 17:02:29 -0800
Message-ID: <167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

Summary:
--------

CXL RAM support allows for the dynamic provisioning of new CXL RAM
regions, and more routinely, assembling a region from an existing
configuration established by platform-firmware. The latter is motivated
by CXL memory RAS (Reliability, Availability and Serviceability)
support, which requires associating device events with System Physical
Address ranges and vice versa.

The 'Soft Reserved' policy rework arranges for performance
differentiated memory like CXL attached DRAM, or high-bandwidth memory,
to be designated for 'System RAM' by default, rather than the device-dax
dedicated access mode. The current device-dax default is confusing and
surprising for the majority of users, who do not expect memory to be
quarantined for dedicated access by default. Most users expect all
'System RAM'-capable memory to show up in free(1).

Details:
--------

Recall that the Linux 'Soft Reserved' designation for memory is a
reaction to platform-firmware, like EFI EDK2, delineating memory with
the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
alternative way to think of that attribute is that it specifies the
*not* general-purpose memory pool. It is memory that may be too precious
for general usage or not performant enough for some hot data structures.
However, in the absence of explicit policy it should just be 'System
RAM' by default.

Rather than require every distribution to ship a udev policy to assign
dax devices to dax_kmem (the device-memory hotplug driver) just make
that the kernel default.
This is similar to the rationale in:

commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")

With this change the relatively niche use case of accessing this memory
via mapping a device-dax instance can be achieved by building with
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
memhp_default_state=offline at boot, and then using:

    daxctl reconfigure-device $device -m devdax --force

...to shift the corresponding address range to device-dax access.

The process of assembling a device-dax instance for a given CXL region
device configuration is similar to the process of assembling a
Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
probing by the PCI and driver core enumerates all CXL endpoints and
their decoders. Then, once enough decoders have arrived to describe a
given region, that region is passed to the device-dax subsystem where it
is subject to the above 'dax_kmem' policy. This assignment and policy
choice is only possible if memory is set aside by the 'Soft Reserved'
designation. Otherwise, CXL memory that is mapped as 'System RAM'
becomes immutable by CXL driver mechanisms, but is still enumerated for
RAS purposes.

This series is also available via:

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region

...and has gone through some preview testing in various forms.
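
As an illustration of the "once enough decoders have arrived" step
described above, here is a minimal userspace sketch. It is not code from
this series: struct region_sketch, region_add_decoder() and MAX_TARGETS
are hypothetical names invented for this example, and the real logic in
drivers/cxl/core/region.c tracks considerably more state. The sketch only
shows the array-assembly idea: decoders arrive in arbitrary order, each
fills one interleave position, and the region is handed off exactly once,
when the last position is filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TARGETS 16	/* hypothetical bound for this sketch */

/* Hypothetical stand-in for a region under assembly (not struct cxl_region). */
struct region_sketch {
	int nr_targets_expected;	/* interleave ways the region needs */
	int nr_targets_seen;		/* decoders that have arrived so far */
	bool slot_filled[MAX_TARGETS];	/* which positions are populated */
	bool handed_to_dax;		/* set once the region is complete */
};

/*
 * Record one endpoint decoder arriving at interleave position 'pos'.
 * Returns true exactly once: on the arrival that completes the region,
 * i.e. the point at which the region would be passed to device-dax.
 */
static bool region_add_decoder(struct region_sketch *r, int pos)
{
	assert(r != NULL);

	if (pos < 0 || pos >= r->nr_targets_expected || pos >= MAX_TARGETS)
		return false;	/* position does not belong to this region */
	if (r->slot_filled[pos])
		return false;	/* duplicate arrival, nothing new learned */

	r->slot_filled[pos] = true;
	r->nr_targets_seen++;

	if (r->nr_targets_seen < r->nr_targets_expected)
		return false;	/* still waiting for more decoders */

	r->handed_to_dax = true;
	return true;
}
```

A caller would initialize a struct region_sketch with the expected
number of targets and invoke region_add_decoder() from whatever
asynchronous probe path discovers each decoder; only the call that
returns true needs to trigger the hand-off.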

Tested-by: Fan Ni

---

Dan Williams (18):
      cxl/Documentation: Update references to attributes added in v6.0
      cxl/region: Add a mode attribute for regions
      cxl/region: Support empty uuids for non-pmem regions
      cxl/region: Validate region mode vs decoder mode
      cxl/region: Add volatile region creation support
      cxl/region: Refactor attach_target() for autodiscovery
      cxl/region: Move region-position validation to a helper
      kernel/range: Uplevel the cxl subsystem's range_contains() helper
      cxl/region: Enable CONFIG_CXL_REGION to be toggled
      cxl/region: Fix passthrough-decoder detection
      cxl/region: Add region autodiscovery
      tools/testing/cxl: Define a fixed volatile configuration to parse
      dax/hmem: Move HMAT and Soft reservation probe initcall level
      dax/hmem: Drop unnecessary dax_hmem_remove()
      dax/hmem: Convey the dax range via memregion_info()
      dax/hmem: Move hmem device registration to dax_hmem.ko
      dax: Assign RAM regions to memory-hotplug by default
      cxl/dax: Create dax devices for CXL RAM regions

 Documentation/ABI/testing/sysfs-bus-cxl |   64 +-
 MAINTAINERS                             |    1
 drivers/acpi/numa/hmat.c                |    4
 drivers/cxl/Kconfig                     |   12
 drivers/cxl/acpi.c                      |    3
 drivers/cxl/core/core.h                 |    7
 drivers/cxl/core/hdm.c                  |    8
 drivers/cxl/core/pci.c                  |    5
 drivers/cxl/core/port.c                 |   34 +
 drivers/cxl/core/region.c               |  848 ++++++++++++++++++++++++++++---
 drivers/cxl/cxl.h                       |   46 ++
 drivers/cxl/cxlmem.h                    |    3
 drivers/cxl/port.c                      |   26 +
 drivers/dax/Kconfig                     |   17 +
 drivers/dax/Makefile                    |    2
 drivers/dax/bus.c                       |   53 +-
 drivers/dax/bus.h                       |   12
 drivers/dax/cxl.c                       |   53 ++
 drivers/dax/device.c                    |    3
 drivers/dax/hmem/Makefile               |    3
 drivers/dax/hmem/device.c               |  102 ++--
 drivers/dax/hmem/hmem.c                 |  148 +++++
 drivers/dax/kmem.c                      |    1
 include/linux/dax.h                     |    7
 include/linux/memregion.h               |    2
 include/linux/range.h                   |    5
 lib/stackinit_kunit.c                   |    6
 tools/testing/cxl/test/cxl.c            |  146 +++++
 28 files changed, 1355 insertions(+), 266 deletions(-)
 create mode 100644 drivers/dax/cxl.c

base-commit: 172738bbccdb4ef76bdd72fc72a315c741c39161