From patchwork Sun Mar 24 23:18:04 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13600988
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:04 -0700
Subject: [PATCH 01/26] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
Message-Id: <20240324-dcd-type2-upstream-v1-1-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
From: Navneet Singh

Per the CXL 3.1 specification software must check the Command Effects
Log (CEL) to know if a device supports dynamic capacity (DC). If the
device does support DC the specifics of the DC Regions (0-7) are read
through the mailbox.

Flag DC Device (DCD) commands in a device if they are supported.
Subsequent patches will key off these bits to configure DCD.

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron
Reviewed-by: Fan Ni
Reviewed-by: Davidlohr Bueso
Reviewed-by: Dave Jiang

---
Changes for v1
[iweiny: update to latest master]
[iweiny: update commit message]
[iweiny: Based on the fix: https://lore.kernel.org/all/20230903-cxl-cel-fix-v1-1-e260c9467be3@intel.com/]
[jonathan: remove unneeded format change]
[jonathan: don't split security code in mbox.c]
---
 drivers/cxl/core/mbox.c | 33 +++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 15 +++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..ed4131c6f50b 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -161,6 +161,34 @@ static void cxl_set_security_cmd_enabled(struct cxl_security_state *security,
 	}
 }
 
+static bool cxl_is_dcd_command(u16 opcode)
+{
+#define CXL_MBOX_OP_DCD_CMDS 0x48
+
+	return (opcode >> 8) == CXL_MBOX_OP_DCD_CMDS;
+}
+
+static void cxl_set_dcd_cmd_enabled(struct cxl_memdev_state *mds,
+				    u16 opcode)
+{
+	switch (opcode) {
+	case CXL_MBOX_OP_GET_DC_CONFIG:
+		set_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+		set_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_ADD_DC_RESPONSE:
+		set_bit(CXL_DCD_ENABLED_ADD_RESPONSE, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_RELEASE_DC:
+		set_bit(CXL_DCD_ENABLED_RELEASE, mds->dcd_cmds);
+		break;
+	default:
+		break;
+	}
+}
+
 static bool cxl_is_poison_command(u16 opcode)
 {
 #define CXL_MBOX_OP_POISON_CMDS 0x43
@@ -733,6 +761,11 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
 			enabled++;
 		}
 
+		if (cxl_is_dcd_command(opcode)) {
+			cxl_set_dcd_cmd_enabled(mds, opcode);
+			enabled++;
+		}
+
 		dev_dbg(dev, "Opcode 0x%04x %s\n", opcode,
 			enabled ? "enabled" : "unsupported by driver");
 	}

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 20fb3b35e89e..79a67cff9143 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -238,6 +238,15 @@ struct cxl_event_state {
 	struct mutex log_lock;
 };
 
+/* Device enabled DCD commands */
+enum dcd_cmd_enabled_bits {
+	CXL_DCD_ENABLED_GET_CONFIG,
+	CXL_DCD_ENABLED_GET_EXTENT_LIST,
+	CXL_DCD_ENABLED_ADD_RESPONSE,
+	CXL_DCD_ENABLED_RELEASE,
+	CXL_DCD_ENABLED_MAX
+};
+
 /* Device enabled poison commands */
 enum poison_cmd_enabled_bits {
 	CXL_POISON_ENABLED_LIST,
@@ -454,6 +463,7 @@ struct cxl_dev_state {
  *                (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
  * @mbox_mutex: Mutex to synchronize mailbox access.
  * @firmware_version: Firmware version for the memory device.
+ * @dcd_cmds: List of DCD commands implemented by memory device
  * @enabled_cmds: Hardware commands found enabled in CEL.
  * @exclusive_cmds: Commands that are kernel-internal only
  * @total_bytes: sum of all possible capacities
@@ -481,6 +491,7 @@ struct cxl_memdev_state {
 	size_t lsa_size;
 	struct mutex mbox_mutex; /* Protects device mailbox and firmware */
 	char firmware_version[0x10];
+	DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
 	DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
 	DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
 	u64 total_bytes;
@@ -551,6 +562,10 @@ enum cxl_opcode {
 	CXL_MBOX_OP_UNLOCK = 0x4503,
 	CXL_MBOX_OP_FREEZE_SECURITY = 0x4504,
 	CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE = 0x4505,
+	CXL_MBOX_OP_GET_DC_CONFIG = 0x4800,
+	CXL_MBOX_OP_GET_DC_EXTENT_LIST = 0x4801,
+	CXL_MBOX_OP_ADD_DC_RESPONSE = 0x4802,
+	CXL_MBOX_OP_RELEASE_DC = 0x4803,
 	CXL_MBOX_OP_MAX = 0x10000
 };

From patchwork Sun Mar 24 23:18:05 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601116
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:05 -0700
Subject: [PATCH 02/26] cxl/core: Separate region mode from decoder mode
Message-Id: <20240324-dcd-type2-upstream-v1-2-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Jonathan Cameron

From: Navneet Singh

Until now region modes and decoder modes were equivalent in that they
were either PMEM or RAM. With the upcoming addition of Dynamic Capacity
regions (which will represent an array of device regions [better named
partitions] the index of which could be different on different
interleaved devices), the mode of an endpoint decoder and a region will
no longer be equivalent.

Define a new region mode enumeration and adjust the code for it.
Suggested-by: Jonathan Cameron
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny

---
Changes for v1
---
 drivers/cxl/core/region.c | 77 +++++++++++++++++++++++++++++++++++------------
 drivers/cxl/cxl.h         | 26 ++++++++++++++--
 2 files changed, 81 insertions(+), 22 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4c7fd2d5cccb..1723d17f121e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -40,7 +40,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
 	rc = down_read_interruptible(&cxl_region_rwsem);
 	if (rc)
 		return rc;
-	if (cxlr->mode != CXL_DECODER_PMEM)
+	if (cxlr->mode != CXL_REGION_PMEM)
 		rc = sysfs_emit(buf, "\n");
 	else
 		rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
@@ -353,7 +353,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
 	 * Support tooling that expects to find a 'uuid' attribute for all
 	 * regions regardless of mode.
 	 */
-	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
+	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_REGION_PMEM)
 		return 0444;
 	return a->mode;
 }
@@ -516,7 +516,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
 
-	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+	return sysfs_emit(buf, "%s\n", cxl_region_mode_name(cxlr->mode));
 }
 static DEVICE_ATTR_RO(mode);
@@ -542,7 +542,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
 	/* ways, granularity and uuid (if PMEM) need to be set before HPA */
 	if (!p->interleave_ways || !p->interleave_granularity ||
-	    (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
+	    (cxlr->mode == CXL_REGION_PMEM && uuid_is_null(&p->uuid)))
 		return -ENXIO;
 
 	div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
@@ -1683,6 +1683,17 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
 	return rc;
 }
 
+static bool
+cxl_modes_compatible(enum cxl_region_mode rmode,
+		     enum cxl_decoder_mode dmode)
+{
+	if (rmode == CXL_REGION_RAM && dmode == CXL_DECODER_RAM)
+		return true;
+	if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
+		return true;
+
+	return false;
+}
+
 static int cxl_region_attach(struct cxl_region *cxlr,
 			     struct cxl_endpoint_decoder *cxled, int pos)
 {
@@ -1693,9 +1704,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	struct cxl_dport *dport;
 	int rc = -ENXIO;
 
-	if (cxled->mode != cxlr->mode) {
-		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
-			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
+	if (!cxl_modes_compatible(cxlr->mode, cxled->mode)) {
+		dev_dbg(&cxlr->dev, "%s region mode: %s mismatch decoder: %s\n",
+			dev_name(&cxled->cxld.dev),
+			cxl_region_mode_name(cxlr->mode),
+			cxl_decoder_mode_name(cxled->mode));
 		return -EINVAL;
 	}
@@ -2168,7 +2181,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  * devm_cxl_add_region - Adds a region to a decoder
  * @cxlrd: root decoder
  * @id: memregion id to create, or memregion_free() on failure
- * @mode: mode for the endpoint decoders of this region
+ * @mode: mode of this region
  * @type: select whether this is an expander or accelerator (type-2 or type-3)
  *
  * This is the second step of region initialization. Regions exist within an
@@ -2179,7 +2192,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  */
 static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 					      int id,
-					      enum cxl_decoder_mode mode,
+					      enum cxl_region_mode mode,
 					      enum cxl_decoder_type type)
 {
 	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
@@ -2188,11 +2201,12 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	int rc;
 
 	switch (mode) {
-	case CXL_DECODER_RAM:
-	case CXL_DECODER_PMEM:
+	case CXL_REGION_RAM:
+	case CXL_REGION_PMEM:
 		break;
 	default:
-		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
+		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
+			cxl_region_mode_name(mode));
 		return ERR_PTR(-EINVAL);
 	}
@@ -2242,7 +2256,7 @@ static ssize_t create_ram_region_show(struct device *dev,
 }
 
 static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
-					  enum cxl_decoder_mode mode, int id)
+					  enum cxl_region_mode mode, int id)
 {
 	int rc;
@@ -2270,7 +2284,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
 	if (rc != 1)
 		return -EINVAL;
 
-	cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
+	cxlr = __create_region(cxlrd, CXL_REGION_PMEM, id);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
@@ -2290,7 +2304,7 @@ static ssize_t create_ram_region_store(struct device *dev,
 	if (rc != 1)
 		return -EINVAL;
 
-	cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id);
+	cxlr = __create_region(cxlrd, CXL_REGION_RAM, id);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
@@ -2800,6 +2814,24 @@ static int match_region_by_range(struct device *dev, void *data)
 	return rc;
 }
 
+static enum cxl_region_mode
+cxl_decoder_to_region_mode(enum cxl_decoder_mode mode)
+{
+	switch (mode) {
+	case CXL_DECODER_NONE:
+		return CXL_REGION_NONE;
+	case CXL_DECODER_RAM:
+		return CXL_REGION_RAM;
+	case CXL_DECODER_PMEM:
+		return CXL_REGION_PMEM;
+	case CXL_DECODER_MIXED:
+	default:
+		return CXL_REGION_MIXED;
+	}
+
+	return CXL_REGION_MIXED;
+}
+
 /* Establish an empty region covering the given HPA range */
 static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 					   struct cxl_endpoint_decoder *cxled)
@@ -2808,12 +2840,17 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 	struct cxl_port *port = cxlrd_to_port(cxlrd);
 	struct range *hpa = &cxled->cxld.hpa_range;
 	struct cxl_region_params *p;
+	enum cxl_region_mode mode;
 	struct cxl_region *cxlr;
 	struct resource *res;
 	int rc;
 
+	if (cxled->mode == CXL_DECODER_DEAD)
+		return ERR_PTR(-EINVAL);
+
+	mode = cxl_decoder_to_region_mode(cxled->mode);
 	do {
-		cxlr = __create_region(cxlrd, cxled->mode,
+		cxlr = __create_region(cxlrd, mode,
 				       atomic_read(&cxlrd->region_id));
 	} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
@@ -2996,9 +3033,9 @@ static int cxl_region_probe(struct device *dev)
 		return rc;
 
 	switch (cxlr->mode) {
-	case CXL_DECODER_PMEM:
+	case CXL_REGION_PMEM:
 		return devm_cxl_add_pmem_region(cxlr);
-	case CXL_DECODER_RAM:
+	case CXL_REGION_RAM:
 		/*
 		 * The region can not be manged by CXL if any portion of
 		 * it is already online as 'System RAM'
@@ -3010,8 +3047,8 @@ static int cxl_region_probe(struct device *dev)
 			return 0;
 		return devm_cxl_add_dax_region(cxlr);
 	default:
-		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
-			cxlr->mode);
+		dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
+			cxl_region_mode_name(cxlr->mode));
 		return -ENXIO;
 	}
 }

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 003feebab79b..9a0cce1e6fca 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -383,6 +383,27 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
 	return "mixed";
 }
 
+enum cxl_region_mode {
+	CXL_REGION_NONE,
+	CXL_REGION_RAM,
+	CXL_REGION_PMEM,
+	CXL_REGION_MIXED,
+};
+
+static inline const char *cxl_region_mode_name(enum cxl_region_mode mode)
+{
+	static const char * const names[] = {
+		[CXL_REGION_NONE] = "none",
+		[CXL_REGION_RAM] = "ram",
+		[CXL_REGION_PMEM] = "pmem",
+		[CXL_REGION_MIXED] = "mixed",
+	};
+
+	if (mode >= CXL_REGION_NONE && mode <= CXL_REGION_MIXED)
+		return names[mode];
+	return "mixed";
+}
+
 /*
  * Track whether this decoder is reserved for region autodiscovery, or
  * free for userspace provisioning.
@@ -511,7 +532,8 @@ struct cxl_region_params {
  * struct cxl_region - CXL region
  * @dev: This region's device
  * @id: This region's id. Id is globally unique across all regions
- * @mode: Endpoint decoder allocation / access mode
+ * @mode: Region mode which defines which endpoint decoder mode the region is
+ *	  compatible with
  * @type: Endpoint decoder target type
  * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
  * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
@@ -521,7 +543,7 @@ struct cxl_region_params {
 struct cxl_region {
 	struct device dev;
 	int id;
-	enum cxl_decoder_mode mode;
+	enum cxl_region_mode mode;
 	enum cxl_decoder_type type;
 	struct cxl_nvdimm_bridge *cxl_nvb;
 	struct cxl_pmem_region *cxlr_pmem;

From patchwork Sun Mar 24 23:18:06 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601013
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:06 -0700
Subject: [PATCH 03/26] cxl/mem: Read dynamic capacity configuration from the device
Message-Id: <20240324-dcd-type2-upstream-v1-3-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

Devices can optionally support Dynamic Capacity (DC). These devices are
known as Dynamic Capacity Devices (DCD).

Implement the DC mailbox commands as specified in CXL 3.1 section
8.2.9.9.9 (opcodes 48XXh). Read the DC configuration and store the DC
region information in the device state.
Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes for v1 [Jørgen: ensure CXL 2.0 device support by removing dc_event_log_size] [iweiny/Jørgen: use get DC config command to signal DCD support] [djiang: fix subject] [Fan: add additional region configuration checks] [Jonathan/djiang: split out region mode changes] [Jonathan: fix up comments/kdoc] [Jonathan: s/cxl_get_dc_id/cxl_get_dc_config/] [Jonathan: use __free() in identify call] [Jonathan: remove unneeded formatting changes] [Jonathan: s/cxl_mbox_dynamic_capacity/cxl_mbox_get_dc_config_out/] [Jonathan: s/cxl_mbox_get_dc_config/cxl_mbox_get_dc_config_in/] [iweiny: remove type2 work dependancy/rebase on master] [iweiny: fix 0day build issues] --- drivers/cxl/core/mbox.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/cxlmem.h | 49 +++++++++++++ drivers/cxl/pci.c | 4 ++ 3 files changed, 236 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index ed4131c6f50b..14e8a7528a8b 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1123,7 +1123,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds) if (rc < 0) return rc; - mds->total_bytes = + mds->static_cap = le64_to_cpu(id.total_capacity) * CXL_CAPACITY_MULTIPLIER; mds->volatile_only_bytes = le64_to_cpu(id.volatile_capacity) * CXL_CAPACITY_MULTIPLIER; @@ -1230,6 +1230,175 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) return rc; } +static int cxl_dc_save_region_info(struct cxl_memdev_state *mds, u8 index, + struct cxl_dc_region_config *region_config) +{ + struct cxl_dc_region_info *dcr = &mds->dc_region[index]; + struct device *dev = mds->cxlds.dev; + + dcr->base = le64_to_cpu(region_config->region_base); + dcr->decode_len = le64_to_cpu(region_config->region_decode_length); + dcr->decode_len *= CXL_CAPACITY_MULTIPLIER; + dcr->len = le64_to_cpu(region_config->region_length); + dcr->blk_size = 
le64_to_cpu(region_config->region_block_size); + dcr->dsmad_handle = le32_to_cpu(region_config->region_dsmad_handle); + dcr->flags = region_config->flags; + snprintf(dcr->name, CXL_DC_REGION_STRLEN, "dc%d", index); + + /* Check regions are in increasing DPA order */ + if (index > 0) { + struct cxl_dc_region_info *prev_dcr = &mds->dc_region[index - 1]; + + if ((prev_dcr->base + prev_dcr->decode_len) > dcr->base) { + dev_err(dev, + "DPA ordering violation for DC region %d and %d\n", + index - 1, index); + return -EINVAL; + } + } + + if (!IS_ALIGNED(dcr->base, SZ_256M) || + !IS_ALIGNED(dcr->base, dcr->blk_size)) { + dev_err(dev, "DC region %d invalid base %#llx blk size %#llx\n", index, + dcr->base, dcr->blk_size); + return -EINVAL; + } + + if (dcr->decode_len == 0 || dcr->len == 0 || dcr->decode_len < dcr->len || + !IS_ALIGNED(dcr->len, dcr->blk_size)) { + dev_err(dev, "DC region %d invalid length; decode %#llx len %#llx blk size %#llx\n", + index, dcr->decode_len, dcr->len, dcr->blk_size); + return -EINVAL; + } + + if (dcr->blk_size == 0 || dcr->blk_size % 0x40 || + !is_power_of_2(dcr->blk_size)) { + dev_err(dev, "DC region %d invalid block size; %#llx\n", + index, dcr->blk_size); + return -EINVAL; + } + + dev_dbg(dev, + "DC region %s DPA: %#llx LEN: %#llx BLKSZ: %#llx\n", + dcr->name, dcr->base, dcr->decode_len, dcr->blk_size); + + return 0; +} + +/* Returns the number of regions in dc_resp or -ERRNO */ +static int cxl_get_dc_config(struct cxl_memdev_state *mds, u8 start_region, + struct cxl_mbox_get_dc_config_out *dc_resp, + size_t dc_resp_size) +{ + struct cxl_mbox_get_dc_config_in get_dc = (struct cxl_mbox_get_dc_config_in) { + .region_count = CXL_MAX_DC_REGION, + .start_region_index = start_region, + }; + struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_CONFIG, + .payload_in = &get_dc, + .size_in = sizeof(get_dc), + .size_out = dc_resp_size, + .payload_out = dc_resp, + .min_out = 1, + }; + struct device *dev = 
mds->cxlds.dev; + int rc; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + return rc; + + rc = dc_resp->avail_region_count - start_region; + + /* + * The number of regions in the payload may have been truncated due to + * payload_size limits; if so adjust the returned count to match. + */ + if (mbox_cmd.size_out < sizeof(*dc_resp)) + rc = CXL_REGIONS_RETURNED(mbox_cmd.size_out); + + dev_dbg(dev, "Read %d/%d DC regions\n", rc, dc_resp->avail_region_count); + + return rc; +} + +static bool cxl_dcd_supported(struct cxl_memdev_state *mds) +{ + return test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds); +} + +/** + * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity + * information from the device. + * @mds: The memory device state + * + * Read Dynamic Capacity information from the device and populate the state + * structures for later use. + * + * Return: 0 if identify was executed successfully, -ERRNO on error. + */ +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds) +{ + size_t dc_resp_size = mds->payload_size; + struct device *dev = mds->cxlds.dev; + u8 start_region, i; + int rc = 0; + + for (i = 0; i < CXL_MAX_DC_REGION; i++) + snprintf(mds->dc_region[i].name, CXL_DC_REGION_STRLEN, ""); + + /* Check GET_DC_CONFIG is supported by device */ + if (!cxl_dcd_supported(mds)) { + dev_dbg(dev, "DCD not supported\n"); + return 0; + } + + struct cxl_mbox_get_dc_config_out *dc_resp __free(kfree) = + kvmalloc(dc_resp_size, GFP_KERNEL); + if (!dc_resp) + return -ENOMEM; + + start_region = 0; + do { + int j; + + rc = cxl_get_dc_config(mds, start_region, dc_resp, dc_resp_size); + if (rc < 0) { + dev_dbg(dev, "Failed to get DC config: %d\n", rc); + return rc; + } + + mds->nr_dc_region += rc; + + if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) { + dev_err(dev, "Invalid num of dynamic capacity regions %d\n", + mds->nr_dc_region); + return -EINVAL; + } + + for (i = start_region, j = 0; i < mds->nr_dc_region; i++, j++) 
{ + rc = cxl_dc_save_region_info(mds, i, &dc_resp->region[j]); + if (rc) { + dev_dbg(dev, "Failed to save region info: %d\n", rc); + return rc; + } + } + + start_region = mds->nr_dc_region; + + } while (mds->nr_dc_region < dc_resp->avail_region_count); + + mds->dynamic_cap = + mds->dc_region[mds->nr_dc_region - 1].base + + mds->dc_region[mds->nr_dc_region - 1].decode_len - + mds->dc_region[0].base; + dev_dbg(dev, "Total dynamic capacity: %#llx\n", mds->dynamic_cap); + + return 0; +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL); + static int add_dpa_res(struct device *dev, struct resource *parent, struct resource *res, resource_size_t start, resource_size_t size, const char *type) @@ -1260,8 +1429,12 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) { struct cxl_dev_state *cxlds = &mds->cxlds; struct device *dev = cxlds->dev; + size_t untenanted_mem; int rc; + untenanted_mem = mds->dc_region[0].base - mds->static_cap; + mds->total_bytes = mds->static_cap + untenanted_mem + mds->dynamic_cap; + if (!cxlds->media_ready) { cxlds->dpa_res = DEFINE_RES_MEM(0, 0); cxlds->ram_res = DEFINE_RES_MEM(0, 0); @@ -1271,6 +1444,15 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); + for (int i = 0; i < mds->nr_dc_region; i++) { + struct cxl_dc_region_info *dcr = &mds->dc_region[i]; + + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i], + dcr->base, dcr->decode_len, dcr->name); + if (rc) + return rc; + } + if (mds->partition_align_bytes == 0) { rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0, mds->volatile_only_bytes, "ram"); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 79a67cff9143..4624cf612c1e 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -402,6 +402,7 @@ enum cxl_devtype { CXL_DEVTYPE_CLASSMEM, }; +#define CXL_MAX_DC_REGION 8 /** * struct cxl_dpa_perf - DPA performance property entry * @dpa_range - range for DPA address @@ -431,6 
+432,8 @@ struct cxl_dpa_perf {
  * @dpa_res: Overall DPA resource tree for the device
  * @pmem_res: Active Persistent memory capacity configuration
  * @ram_res: Active Volatile memory capacity configuration
+ * @dc_res: Active Dynamic Capacity memory configuration for each possible
+ *	    region
  * @serial: PCIe Device Serial Number
  * @type: Generic Memory Class device or Vendor Specific Memory device
  */
@@ -445,10 +448,22 @@ struct cxl_dev_state {
 	struct resource dpa_res;
 	struct resource pmem_res;
 	struct resource ram_res;
+	struct resource dc_res[CXL_MAX_DC_REGION];
 	u64 serial;
 	enum cxl_devtype type;
 };
 
+#define CXL_DC_REGION_STRLEN 8
+struct cxl_dc_region_info {
+	u64 base;
+	u64 decode_len;
+	u64 len;
+	u64 blk_size;
+	u32 dsmad_handle;
+	u8 flags;
+	u8 name[CXL_DC_REGION_STRLEN];
+};
+
 /**
  * struct cxl_memdev_state - Generic Type-3 Memory Device Class driver data
  *
@@ -467,6 +482,8 @@ struct cxl_dev_state {
  * @enabled_cmds: Hardware commands found enabled in CEL.
  * @exclusive_cmds: Commands that are kernel-internal only
  * @total_bytes: sum of all possible capacities
+ * @static_cap: Sum of static RAM and PMEM capacities
+ * @dynamic_cap: Complete DPA range occupied by DC regions
  * @volatile_only_bytes: hard volatile capacity
  * @persistent_only_bytes: hard persistent capacity
  * @partition_align_bytes: alignment size for partition-able capacity
@@ -474,6 +491,8 @@ struct cxl_dev_state {
  * @active_persistent_bytes: sum of hard + soft persistent
  * @next_volatile_bytes: volatile capacity change pending device reset
 * @next_persistent_bytes: persistent capacity change pending device reset
+ * @nr_dc_region: number of DC regions implemented in the memory device
+ * @dc_region: array containing info about the DC regions
  * @event: event log driver state
  * @poison: poison driver state info
  * @security: security driver state info
@@ -494,7 +513,10 @@ struct cxl_memdev_state {
 	DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
 	DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
 	DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
+
 	u64 total_bytes;
+	u64 static_cap;
+	u64 dynamic_cap;
 	u64 volatile_only_bytes;
 	u64 persistent_only_bytes;
 	u64 partition_align_bytes;
@@ -506,6 +528,9 @@ struct cxl_memdev_state {
 	struct cxl_dpa_perf ram_perf;
 	struct cxl_dpa_perf pmem_perf;
 
+	u8 nr_dc_region;
+	struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION];
+
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
 	struct cxl_security_state security;
@@ -705,6 +730,29 @@ struct cxl_mbox_set_partition_info {
 
 #define  CXL_SET_PARTITION_IMMEDIATE_FLAG	BIT(0)
 
+struct cxl_mbox_get_dc_config_in {
+	u8 region_count;
+	u8 start_region_index;
+} __packed;
+
+/* See CXL 3.0 Table 125 get dynamic capacity config Output Payload */
+struct cxl_mbox_get_dc_config_out {
+	u8 avail_region_count;
+	u8 rsvd[7];
+	struct cxl_dc_region_config {
+		__le64 region_base;
+		__le64 region_decode_length;
+		__le64 region_length;
+		__le64 region_block_size;
+		__le32 region_dsmad_handle;
+		u8 flags;
+		u8 rsvd[3];
+	} __packed region[];
+} __packed;
+#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG	BIT(0)
+#define CXL_REGIONS_RETURNED(size_out)					\
+	((size_out - 8) / sizeof(struct cxl_dc_region_config))
+
 /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
 struct cxl_mbox_set_timestamp_in {
 	__le64 timestamp;
@@ -828,6 +876,7 @@ enum {
 int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
 			  struct cxl_mbox_cmd *cmd);
 int cxl_dev_state_identify(struct cxl_memdev_state *mds);
+int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds);
 int cxl_await_media_ready(struct cxl_dev_state *cxlds);
 int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
 int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 2ff361e756d6..216881455364 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -874,6 +874,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
+	rc =
cxl_dev_dynamic_capacity_identify(mds);
+	if (rc)
+		return rc;
+
 	rc = cxl_mem_create_range_info(mds);
 	if (rc)
 		return rc;

From patchwork Sun Mar 24 23:18:07 2024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:07 -0700
Subject: [PATCH 04/26] cxl/region: Add dynamic capacity decoder and region modes
Message-Id: <20240324-dcd-type2-upstream-v1-4-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
From: Navneet Singh

Region mode must reflect a general dynamic capacity type which is
associated with a specific Dynamic Capacity (DC) partition in each
device decoder within the region. DC partitions are also known as DC
regions per CXL 3.1.

Decoder mode reflects a specific DC partition.

Define the new modes to use in subsequent patches and the helper
functions required to make the association between these new modes.

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron
Reviewed-by: Fan Ni

---
Changes for v1
[iweiny: split out from: Add dynamic capacity cxl region support.]
---
 drivers/cxl/core/region.c |  4 ++++
 drivers/cxl/cxl.h         | 23 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 1723d17f121e..ec3b8c6948e9 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1690,6 +1690,8 @@ static bool cxl_modes_compatible(enum cxl_region_mode rmode,
 		return true;
 	if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
 		return true;
+	if (rmode == CXL_REGION_DC && cxl_decoder_mode_is_dc(dmode))
+		return true;
 
 	return false;
 }
@@ -2824,6 +2826,8 @@ cxl_decoder_to_region_mode(enum cxl_decoder_mode mode)
 		return CXL_REGION_RAM;
 	case CXL_DECODER_PMEM:
 		return CXL_REGION_PMEM;
+	case CXL_DECODER_DC0 ...
CXL_DECODER_DC7:
+		return CXL_REGION_DC;
 	case CXL_DECODER_MIXED:
 	default:
 		return CXL_REGION_MIXED;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9a0cce1e6fca..3b8935089c0c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -365,6 +365,14 @@ enum cxl_decoder_mode {
 	CXL_DECODER_NONE,
 	CXL_DECODER_RAM,
 	CXL_DECODER_PMEM,
+	CXL_DECODER_DC0,
+	CXL_DECODER_DC1,
+	CXL_DECODER_DC2,
+	CXL_DECODER_DC3,
+	CXL_DECODER_DC4,
+	CXL_DECODER_DC5,
+	CXL_DECODER_DC6,
+	CXL_DECODER_DC7,
 	CXL_DECODER_MIXED,
 	CXL_DECODER_DEAD,
 };
@@ -375,6 +383,14 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
 		[CXL_DECODER_NONE] = "none",
 		[CXL_DECODER_RAM] = "ram",
 		[CXL_DECODER_PMEM] = "pmem",
+		[CXL_DECODER_DC0] = "dc0",
+		[CXL_DECODER_DC1] = "dc1",
+		[CXL_DECODER_DC2] = "dc2",
+		[CXL_DECODER_DC3] = "dc3",
+		[CXL_DECODER_DC4] = "dc4",
+		[CXL_DECODER_DC5] = "dc5",
+		[CXL_DECODER_DC6] = "dc6",
+		[CXL_DECODER_DC7] = "dc7",
 		[CXL_DECODER_MIXED] = "mixed",
 	};
@@ -383,10 +399,16 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
 	return "mixed";
 }
 
+static inline bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode)
+{
+	return (mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7);
+}
+
 enum cxl_region_mode {
 	CXL_REGION_NONE,
 	CXL_REGION_RAM,
 	CXL_REGION_PMEM,
+	CXL_REGION_DC,
 	CXL_REGION_MIXED,
 };
@@ -396,6 +418,7 @@ static inline const char *cxl_region_mode_name(enum cxl_region_mode mode)
 		[CXL_REGION_NONE] = "none",
 		[CXL_REGION_RAM] = "ram",
 		[CXL_REGION_PMEM] = "pmem",
+		[CXL_REGION_DC] = "dc",
 		[CXL_REGION_MIXED] = "mixed",
 	};

From patchwork Sun Mar 24 23:18:08 2024
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:08 -0700
Subject: [PATCH 05/26] cxl/core: Simplify cxl_dpa_set_mode()
Message-Id: <20240324-dcd-type2-upstream-v1-5-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

cxl_dpa_set_mode() checks the mode for validity twice, once outside of
the DPA RW semaphore and again within. The function is not in a
critical path. Prior to Dynamic Capacity the extra check was not much
of an issue.
The addition of DC modes increases the complexity of the check.

Simplify the mode check before adding the more complex DC modes.

Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron
Reviewed-by: Davidlohr Bueso
Reviewed-by: Fan Ni
Reviewed-by: Dave Jiang

---
Changes for v1:
[iweiny: new patch]
[Jonathan: based on getting rid of the loop in cxl_dpa_set_mode]
[Jonathan: standardize on resource_size() == 0]
---
 drivers/cxl/core/hdm.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 7d97790b893d..66b8419fd0c3 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -411,44 +411,35 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct device *dev = &cxled->cxld.dev;
-	int rc;
+
+	guard(rwsem_write)(&cxl_dpa_rwsem);
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
+		return -EBUSY;
+
+	/*
+	 * Check that the mode is supported by the current partition
+	 * configuration
+	 */
 	switch (mode) {
 	case CXL_DECODER_RAM:
+		if (!resource_size(&cxlds->ram_res)) {
+			dev_dbg(dev, "no available ram capacity\n");
+			return -ENXIO;
+		}
+		break;
 	case CXL_DECODER_PMEM:
+		if (!resource_size(&cxlds->pmem_res)) {
+			dev_dbg(dev, "no available pmem capacity\n");
+			return -ENXIO;
+		}
 		break;
 	default:
 		dev_dbg(dev, "unsupported mode: %d\n", mode);
 		return -EINVAL;
 	}
 
-	down_write(&cxl_dpa_rwsem);
-	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
-		rc = -EBUSY;
-		goto out;
-	}
-
-	/*
-	 * Only allow modes that are supported by the current partition
-	 * configuration
-	 */
-	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
-		dev_dbg(dev, "no available pmem capacity\n");
-		rc = -ENXIO;
-		goto out;
-	}
-	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
-		dev_dbg(dev, "no available ram capacity\n");
-		rc = -ENXIO;
-		goto out;
-	}
-
 	cxled->mode
= mode;
-	rc = 0;
-out:
-	up_write(&cxl_dpa_rwsem);
-
-	return rc;
+	return 0;
 }
 
 int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)

From patchwork Sun Mar 24 23:18:09 2024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:09 -0700
Subject: [PATCH 06/26] cxl/port: Add Dynamic Capacity mode support to endpoint decoders
Message-Id: <20240324-dcd-type2-upstream-v1-6-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
From: Navneet Singh

Endpoint decoders which are used to map Dynamic Capacity must be
configured to point to the correct Dynamic Capacity (DC) Region. The
decoder mode currently represents the partition the decoder points to,
such as ram or pmem. Expand the mode to include DC regions
[partitions].

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny

---
Changes for v1:
[iweiny: eliminate added gotos]
[iweiny: Mark DC support for 6.10 kernel]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 21 +++++++++++----------
 drivers/cxl/core/hdm.c                  | 19 +++++++++++++++++++
 drivers/cxl/core/port.c                 | 16 ++++++++++++++++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index fff2581b8033..8b3efaf6563c 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -316,23 +316,24 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/mode
-Date:		May, 2022
-KernelVersion:	v6.0
+Date:		May, 2022, June 2024
+KernelVersion:	v6.0, v6.10 (dcY)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
 		translates from a host physical address range, to a device local
 		address range. Device-local address ranges are further split
-		into a 'ram' (volatile memory) range and 'pmem' (persistent
-		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
-		'mixed', or 'none'.
The 'mixed' indication is for error cases
-		when a decoder straddles the volatile/persistent partition
-		boundary, and 'none' indicates the decoder is not actively
-		decoding, or no DPA allocation policy has been set.
+		into a 'ram' (volatile memory) range, 'pmem' (persistent
+		memory) range, or Dynamic Capacity (DC) range. The 'mode'
+		attribute emits one of 'ram', 'pmem', 'dcY', 'mixed', or
+		'none'. The 'mixed' indication is for error cases when a
+		decoder straddles the volatile/persistent partition boundary,
+		and 'none' indicates the decoder is not actively decoding, or
+		no DPA allocation policy has been set.
 
 		'mode' can be written, when the decoder is in the 'disabled'
-		state, with either 'ram' or 'pmem' to set the boundaries for the
-		next allocation.
+		state, with 'ram', 'pmem', or 'dcY' to set the boundaries for
+		the next allocation.
 
 What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 66b8419fd0c3..e22b6f4f7145 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -255,6 +255,14 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
 	__cxl_dpa_release(cxled);
 }
 
+static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
+{
+	if (mode < CXL_DECODER_DC0 || CXL_DECODER_DC7 < mode)
+		return -EINVAL;
+
+	return mode - CXL_DECODER_DC0;
+}
+
 static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 			     resource_size_t base, resource_size_t len,
 			     resource_size_t skipped)
@@ -411,6 +419,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct device *dev = &cxled->cxld.dev;
+	int rc;
 
 	guard(rwsem_write)(&cxl_dpa_rwsem);
 	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
@@ -433,6 +442,16 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 			return -ENXIO;
 		}
 		break;
+	case CXL_DECODER_DC0 ...
CXL_DECODER_DC7:
+		rc = dc_mode_to_region_index(mode);
+		if (rc < 0)
+			return rc;
+
+		if (resource_size(&cxlds->dc_res[rc]) == 0) {
+			dev_dbg(dev, "no available dynamic capacity\n");
+			return -ENXIO;
+		}
+		break;
 	default:
 		dev_dbg(dev, "unsupported mode: %d\n", mode);
 		return -EINVAL;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e59d9d37aa65..80c0651794eb 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -208,6 +208,22 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
 		mode = CXL_DECODER_PMEM;
 	else if (sysfs_streq(buf, "ram"))
 		mode = CXL_DECODER_RAM;
+	else if (sysfs_streq(buf, "dc0"))
+		mode = CXL_DECODER_DC0;
+	else if (sysfs_streq(buf, "dc1"))
+		mode = CXL_DECODER_DC1;
+	else if (sysfs_streq(buf, "dc2"))
+		mode = CXL_DECODER_DC2;
+	else if (sysfs_streq(buf, "dc3"))
+		mode = CXL_DECODER_DC3;
+	else if (sysfs_streq(buf, "dc4"))
+		mode = CXL_DECODER_DC4;
+	else if (sysfs_streq(buf, "dc5"))
+		mode = CXL_DECODER_DC5;
+	else if (sysfs_streq(buf, "dc6"))
+		mode = CXL_DECODER_DC6;
+	else if (sysfs_streq(buf, "dc7"))
+		mode = CXL_DECODER_DC7;
 	else
 		return -EINVAL;

From patchwork Sun Mar 24 23:18:10 2024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:10 -0700
Subject: [PATCH 07/26] cxl/port: Add dynamic capacity size support to endpoint decoders
Message-Id: <20240324-dcd-type2-upstream-v1-7-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

To support Dynamic Capacity Devices (DCD), endpoint decoders will need
to map DC partitions (regions). In addition to assigning the size of
the DC partition, the decoder must assign any skip value from the
previous decoder. This must be done within a contiguous DPA space.

Two complications arise with Dynamic Capacity regions which did not
exist with RAM and PMEM partitions. First, gaps in the DPA space can
exist between and around the DC regions. Second, the Linux resource
tree does not allow a resource to be marked across existing nodes
within a tree.
For clarity, below is an example of a 60GB device with 10GB of RAM,
10GB of PMEM, and 10GB for each of 2 DC regions. The desired CXL
mapping is 5GB of RAM, 5GB of PMEM, and all 10GB of DC1.

                             DPA RANGE
                             (dpa_res)
0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|

   RAM        PMEM                   DC0                   DC1
(ram_res)  (pmem_res)            (dc_res[0])           (dc_res[1])
|----------|----------|          |----------|          |----------|

 RAM        PMEM                                        DC1
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
0GB   5GB  10GB 15GB  20GB       30GB       40GB      50GB       60GB

The previous skip resource between RAM and PMEM was always a child of
the RAM resource and fit nicely [see (S) below]. Because of this
simplicity this skip resource reference was not stored in any CXL
state. On release the skip range could be calculated based on the
endpoint decoder's stored values.

Now when DC1 is being mapped 4 skip resources must be created as
children. One for the PMEM resource (A), two of the parent DPA
resource (B,D), and one more child of the DC0 resource (C).

0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|
           |          |----------|----------|          |
           |----------|          |          |----------|
           |          |          |          |          |
     (S)        (A)        (B)        (C)        (D)
      v          v          v          v          v
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
      skip       skip  skip       skip       skip

Expand the calculation of DPA freespace and enhance the logic to
support mapping/unmapping DC DPA space. To track the potential of
multiple skip resources an xarray is attached to the endpoint decoder.
The existing algorithm between RAM and PMEM is consolidated within the
new one to streamline the code even though the result is the storage
of a single skip resource in the xarray.
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny

---
Changes for v1:
[iweiny: Update cover letter]
---
 drivers/cxl/core/hdm.c  | 192 +++++++++++++++++++++++++++++++++++++++++++-----
 drivers/cxl/core/port.c |   2 +
 drivers/cxl/cxl.h       |   2 +
 3 files changed, 179 insertions(+), 17 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index e22b6f4f7145..da7d58184490 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -210,6 +210,25 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL);
 
+static void cxl_skip_release(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds;
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct device *dev = &port->dev;
+	unsigned long index;
+	void *entry;
+
+	xa_for_each(&cxled->skip_res, index, entry) {
+		struct resource *res = entry;
+
+		dev_dbg(dev, "decoder%d.%d: releasing skipped space; %pr\n",
+			port->id, cxled->cxld.id, res);
+		__release_region(&cxlds->dpa_res, res->start,
+				 resource_size(res));
+		xa_erase(&cxled->skip_res, index);
+	}
+}
+
 /*
  * Must be called in a context that synchronizes against this decoder's
  * port ->remove() callback (like an endpoint decoder sysfs attribute)
@@ -220,15 +239,11 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
 	struct cxl_port *port = cxled_to_port(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct resource *res = cxled->dpa_res;
-	resource_size_t skip_start;
 
 	lockdep_assert_held_write(&cxl_dpa_rwsem);
 
-	/* save @skip_start, before @res is released */
-	skip_start = res->start - cxled->skip;
 	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
-	if (cxled->skip)
-		__release_region(&cxlds->dpa_res, skip_start, cxled->skip);
+	cxl_skip_release(cxled);
 	cxled->skip = 0;
 	cxled->dpa_res = NULL;
 	put_device(&cxled->cxld.dev);
@@ -263,6 +278,100 @@ static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
 	return mode - CXL_DECODER_DC0;
 }
 
+static int cxl_request_skip(struct cxl_endpoint_decoder *cxled,
+			    resource_size_t skip_base, resource_size_t skip_len)
+{
+	struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds;
+	const char *name = dev_name(&cxled->cxld.dev);
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct resource *dpa_res = &cxlds->dpa_res;
+	struct device *dev = &port->dev;
+	struct resource *res;
+	int rc;
+
+	res = __request_region(dpa_res, skip_base, skip_len, name, 0);
+	if (!res)
+		return -EBUSY;
+
+	rc = xa_insert(&cxled->skip_res, skip_base, res, GFP_KERNEL);
+	if (rc) {
+		__release_region(dpa_res, skip_base, skip_len);
+		return rc;
+	}
+
+	dev_dbg(dev, "decoder%d.%d: skipped space; %pr\n",
+		port->id, cxled->cxld.id, res);
+	return 0;
+}
+
+static int cxl_reserve_dpa_skip(struct cxl_endpoint_decoder *cxled,
+				resource_size_t base, resource_size_t skipped)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	resource_size_t skip_base = base - skipped;
+	struct device *dev = &port->dev;
+	resource_size_t skip_len = 0;
+	int rc, index;
+
+	if (resource_size(&cxlds->ram_res) && skip_base <= cxlds->ram_res.end) {
+		skip_len = cxlds->ram_res.end - skip_base + 1;
+		rc = cxl_request_skip(cxled, skip_base, skip_len);
+		if (rc)
+			return rc;
+		skip_base += skip_len;
+	}
+
+	if (skip_base == base) {
+		dev_dbg(dev, "skip done ram!\n");
+		return 0;
+	}
+
+	if (resource_size(&cxlds->pmem_res) &&
+	    skip_base <= cxlds->pmem_res.end) {
+		skip_len = cxlds->pmem_res.end - skip_base + 1;
+		rc = cxl_request_skip(cxled, skip_base, skip_len);
+		if (rc)
+			return rc;
+		skip_base += skip_len;
+	}
+
+	index = dc_mode_to_region_index(cxled->mode);
+	for (int i = 0; i <= index; i++) {
+		struct resource *dcr = &cxlds->dc_res[i];
+
+		if (skip_base < dcr->start) {
+			skip_len = dcr->start - skip_base;
+			rc = cxl_request_skip(cxled, skip_base, skip_len);
+			if (rc)
+				return rc;
+			skip_base += skip_len;
+		}
+
+		if (skip_base == base) {
+			dev_dbg(dev, "skip done DC region %d!\n", i);
+			break;
+		}
+
+		if (resource_size(dcr) && skip_base <= dcr->end) {
+			if (skip_base > base) {
+				dev_err(dev, "Skip error DC region %d; skip_base %pa; base %pa\n",
+					i, &skip_base, &base);
+				return -ENXIO;
+			}
+
+			skip_len = dcr->end - skip_base + 1;
+			rc = cxl_request_skip(cxled, skip_base, skip_len);
+			if (rc)
+				return rc;
+			skip_base += skip_len;
+		}
+	}
+
+	return 0;
+}
+
 static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 			     resource_size_t base, resource_size_t len,
 			     resource_size_t skipped)
@@ -300,13 +409,12 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	}
 
 	if (skipped) {
-		res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
-				       dev_name(&cxled->cxld.dev), 0);
-		if (!res) {
-			dev_dbg(dev,
-				"decoder%d.%d: failed to reserve skipped space\n",
-				port->id, cxled->cxld.id);
-			return -EBUSY;
+		int rc = cxl_reserve_dpa_skip(cxled, base, skipped);
+
+		if (rc) {
+			dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space; %pa - %pa\n",
+				port->id, cxled->cxld.id, &base, &skipped);
+			return rc;
 		}
 	}
 	res = __request_region(&cxlds->dpa_res, base, len,
@@ -314,14 +422,20 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	if (!res) {
 		dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
 			port->id, cxled->cxld.id);
-		if (skipped)
-			__release_region(&cxlds->dpa_res, base - skipped,
-					 skipped);
+		cxl_skip_release(cxled);
 		return -EBUSY;
 	}
 	cxled->dpa_res = res;
 	cxled->skip = skipped;
 
+	for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) {
+		int index = dc_mode_to_region_index(mode);
+
+		if (resource_contains(&cxlds->dc_res[index], res)) {
+			cxled->mode = mode;
+			goto success;
+		}
+	}
 	if (resource_contains(&cxlds->pmem_res, res))
 		cxled->mode = CXL_DECODER_PMEM;
 	else if (resource_contains(&cxlds->ram_res, res))
@@ -332,6 +446,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 		cxled->mode = CXL_DECODER_MIXED;
 	}
 
+success:
+	dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id, cxled->cxld.id,
+		cxled->dpa_res, cxled->mode);
 	port->hdm_end++;
 	get_device(&cxled->cxld.dev);
 	return 0;
@@ -463,14 +580,14 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 
 int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 {
+	resource_size_t free_ram_start, free_pmem_start, free_dc_start;
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	resource_size_t free_ram_start, free_pmem_start;
 	struct cxl_port *port = cxled_to_port(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct device *dev = &cxled->cxld.dev;
 	resource_size_t start, avail, skip;
 	struct resource *p, *last;
-	int rc;
+	int rc, dc_index;
 
 	down_write(&cxl_dpa_rwsem);
 	if (cxled->cxld.region) {
@@ -500,6 +617,21 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 	else
 		free_pmem_start = cxlds->pmem_res.start;
 
+	/*
+	 * Limit each decoder to a single DC region to map memory with
+	 * different DSMAS entry.
+	 */
+	dc_index = dc_mode_to_region_index(cxled->mode);
+	if (dc_index >= 0) {
+		if (cxlds->dc_res[dc_index].child) {
+			dev_err(dev, "Cannot allocate DPA from DC Region: %d\n",
+				dc_index);
+			rc = -EINVAL;
+			goto out;
+		}
+		free_dc_start = cxlds->dc_res[dc_index].start;
+	}
+
 	if (cxled->mode == CXL_DECODER_RAM) {
 		start = free_ram_start;
 		avail = cxlds->ram_res.end - start + 1;
@@ -521,12 +653,38 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 		else
 			skip_end = start - 1;
 		skip = skip_end - skip_start + 1;
+	} else if (cxl_decoder_mode_is_dc(cxled->mode)) {
+		resource_size_t skip_start, skip_end;
+
+		start = free_dc_start;
+		avail = cxlds->dc_res[dc_index].end - start + 1;
+		if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child)
+			skip_start = free_ram_start;
+		else
+			skip_start = free_pmem_start;
+		/*
+		 * If any dc region is already mapped, then that allocation
+		 * already handled the RAM and PMEM skip.  Check for DC region
+		 * skip.
+		 */
+		for (int i = dc_index - 1; i >= 0 ; i--) {
+			if (cxlds->dc_res[i].child) {
+				skip_start = cxlds->dc_res[i].child->end + 1;
+				break;
+			}
+		}
+
+		skip_end = start - 1;
+		skip = skip_end - skip_start + 1;
 	} else {
 		dev_dbg(dev, "mode not set\n");
 		rc = -EINVAL;
 		goto out;
 	}
 
+	dev_dbg(dev, "DPA Allocation start: %pa len: %#llx Skip: %pa\n",
+		&start, size, &skip);
+
 	if (size > avail) {
 		dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
 			cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem",
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 80c0651794eb..036b61cb3007 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -434,6 +434,7 @@ static void cxl_endpoint_decoder_release(struct device *dev)
 	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
 
 	__cxl_decoder_release(&cxled->cxld);
+	xa_destroy(&cxled->skip_res);
 	kfree(cxled);
 }
 
@@ -1896,6 +1897,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
 		return ERR_PTR(-ENOMEM);
 
 	cxled->pos = -1;
+	xa_init(&cxled->skip_res);
 	cxld = &cxled->cxld;
 	rc = cxl_decoder_init(port, cxld);
 	if (rc) {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3b8935089c0c..15d418b3bc9b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -441,6 +441,7 @@ enum cxl_decoder_state {
 * @cxld: base cxl_decoder_object
 * @dpa_res: actively claimed DPA span of this decoder
 * @skip: offset into @dpa_res where @cxld.hpa_range maps
+ * @skip_res: array of skipped resources from the previous decoder end
 * @mode: which memory type / access-mode-partition this decoder targets
 * @state: autodiscovery state
 * @pos: interleave position in @cxld.region
@@ -449,6 +450,7 @@ struct cxl_endpoint_decoder {
 	struct cxl_decoder cxld;
 	struct resource *dpa_res;
 	resource_size_t skip;
+	struct xarray skip_res;
 	enum cxl_decoder_mode mode;
 	enum cxl_decoder_state state;
 	int pos;

From patchwork Sun Mar 24 23:18:11 2024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:11 -0700
Subject: [PATCH 08/26] cxl/mem: Expose device dynamic capacity capabilities
Message-Id: <20240324-dcd-type2-upstream-v1-8-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

To properly configure CXL regions on Dynamic Capacity Devices (DCD),
user space will need to know the details of the DC regions available on
a device.

Expose driver dynamic capacity capabilities through sysfs attributes.
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron

---
Changes for v1:
[iweiny: remove review tags]
[iweiny: mark sysfs for 6.10 kernel]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 17 ++++++++
 drivers/cxl/core/memdev.c               | 76 +++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 8b3efaf6563c..8a4f572c8498 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -54,6 +54,23 @@ Description:
 		identically named field in the Identify Memory Device Output
 		Payload in the CXL-2.0 specification.
 
+What:		/sys/bus/cxl/devices/memX/dc/region_count
+Date:		June, 2024
+KernelVersion:	v6.10
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Number of Dynamic Capacity (DC) regions supported on the
+		device.  May be 0 if the device does not support Dynamic
+		Capacity.
+
+What:		/sys/bus/cxl/devices/memX/dc/regionY_size
+Date:		June, 2024
+KernelVersion:	v6.10
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Size of the Dynamic Capacity (DC) region Y.  Only
+		available on devices which support DC and only for those
+		region indexes supported by the device.
 
 What:		/sys/bus/cxl/devices/memX/pmem/qos_class
 Date:		May, 2023
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index d4e259f3a7e9..a7b880e33a7e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -101,6 +101,18 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 static struct device_attribute dev_attr_pmem_size =
 	__ATTR(size, 0444, pmem_size_show, NULL);
 
+static ssize_t region_count_show(struct device *dev, struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+	return sysfs_emit(buf, "%d\n", mds->nr_dc_region);
+}
+
+static struct device_attribute dev_attr_region_count =
+	__ATTR(region_count, 0444, region_count_show, NULL);
+
 static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -492,6 +504,63 @@ static struct attribute *cxl_memdev_security_attributes[] = {
 	NULL,
 };
 
+static ssize_t show_size_regionN(struct cxl_memdev *cxlmd, char *buf, int pos)
+{
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+	return sysfs_emit(buf, "%#llx\n", mds->dc_region[pos].decode_len);
+}
+
+#define REGION_SIZE_ATTR_RO(n)						\
+static ssize_t region##n##_size_show(struct device *dev,		\
+				     struct device_attribute *attr,	\
+				     char *buf)				\
+{									\
+	return show_size_regionN(to_cxl_memdev(dev), buf, (n));		\
+}									\
+static DEVICE_ATTR_RO(region##n##_size)
+REGION_SIZE_ATTR_RO(0);
+REGION_SIZE_ATTR_RO(1);
+REGION_SIZE_ATTR_RO(2);
+REGION_SIZE_ATTR_RO(3);
+REGION_SIZE_ATTR_RO(4);
+REGION_SIZE_ATTR_RO(5);
+REGION_SIZE_ATTR_RO(6);
+REGION_SIZE_ATTR_RO(7);
+
+static struct attribute *cxl_memdev_dc_attributes[] = {
+	&dev_attr_region0_size.attr,
+	&dev_attr_region1_size.attr,
+	&dev_attr_region2_size.attr,
+	&dev_attr_region3_size.attr,
+	&dev_attr_region4_size.attr,
+	&dev_attr_region5_size.attr,
+	&dev_attr_region6_size.attr,
+	&dev_attr_region7_size.attr,
+	&dev_attr_region_count.attr,
+	NULL,
+};
+
+static umode_t cxl_dc_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+	/* Not a memory device */
+	if (!mds)
+		return 0;
+
+	if (a == &dev_attr_region_count.attr)
+		return a->mode;
+
+	/* Show only the regions supported */
+	if (n < mds->nr_dc_region)
+		return a->mode;
+
+	return 0;
+}
+
 static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 				  int n)
 {
@@ -567,11 +636,18 @@ static struct attribute_group cxl_memdev_security_attribute_group = {
 	.is_visible = cxl_memdev_security_visible,
 };
 
+static struct attribute_group cxl_memdev_dc_attribute_group = {
+	.name = "dc",
+	.attrs = cxl_memdev_dc_attributes,
+	.is_visible = cxl_dc_visible,
+};
+
 static const struct attribute_group *cxl_memdev_attribute_groups[] = {
 	&cxl_memdev_attribute_group,
 	&cxl_memdev_ram_attribute_group,
 	&cxl_memdev_pmem_attribute_group,
 	&cxl_memdev_security_attribute_group,
+	&cxl_memdev_dc_attribute_group,
 	NULL,
 };

From patchwork Sun Mar 24 23:18:12 2024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:12 -0700
Subject: [PATCH 09/26] cxl/region: Add Dynamic Capacity CXL region support
Message-Id: <20240324-dcd-type2-upstream-v1-9-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

CXL devices optionally support dynamic capacity.  CXL regions must be
configured correctly to access this capacity.  Similar to ram and pmem
partitions, DC regions, as they are called in CXL 3.1, represent
different partitions of the DPA space.

Introduce the concept of a sparse DAX region.  Add the create_dc_region
sysfs entry to create sparse DC DAX regions.  Special case DC capable
regions to create a 0-sized seed DAX device to maintain backwards
compatibility with older software which needs a default DAX device to
hold the region reference.

Flag sparse DAX regions to indicate 0 capacity available until such
time as DC capacity is added.
Interleaving is deferred in this series.  Add an early check.

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron

---
Changes for v1:
[djiang: mark sysfs entries to be in 6.10 kernel including date]
[djbw: change dax region typing to be 'sparse' rather than 'dynamic']
[iweiny: rebase changes to master instead of type2 patches]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 22 +++++++++++-----------
 drivers/cxl/core/core.h                 |  1 +
 drivers/cxl/core/port.c                 |  1 +
 drivers/cxl/core/region.c               | 33 +++++++++++++++++++++++++++++++++
 drivers/dax/bus.c                       |  8 ++++++++
 drivers/dax/bus.h                       |  1 +
 drivers/dax/cxl.c                       | 15 +++++++++++++--
 7 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 8a4f572c8498..f0cf52fff9fa 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -411,20 +411,20 @@ Description:
 		interleave_granularity).
 
 
-What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
-Date:		May, 2022, January, 2023
-KernelVersion:	v6.0 (pmem), v6.3 (ram)
+What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region
+Date:		May, 2022, January, 2023, June 2024
+KernelVersion:	v6.0 (pmem), v6.3 (ram), v6.10 (dc)
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		(RW) Write a string in the form 'regionZ' to start the process
-		of defining a new persistent, or volatile memory region
-		(interleave-set) within the decode range bounded by root decoder
-		'decoderX.Y'. The value written must match the current value
-		returned from reading this attribute. An atomic compare exchange
-		operation is done on write to assign the requested id to a
-		region and allocate the region-id for the next creation attempt.
-		EBUSY is returned if the region name written does not match the
-		current cached value.
+		(RW) Write a string in the form 'regionZ' to start the process
+		of defining a new persistent, volatile, or Dynamic Capacity
+		(DC) memory region (interleave-set) within the decode range
+		bounded by root decoder 'decoderX.Y'.  The value written must
+		match the current value returned from reading this attribute.
+		An atomic compare exchange operation is done on write to assign
+		the requested id to a region and allocate the region-id for the
+		next creation attempt.  EBUSY is returned if the region name
+		written does not match the current cached value.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 3b64fb1b9ed0..91abeffbe985 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group;
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_create_ram_region;
+extern struct device_attribute dev_attr_create_dc_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 036b61cb3007..661177b575f7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -335,6 +335,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_qos_class.attr,
 	SET_CXL_REGION_ATTR(create_pmem_region)
 	SET_CXL_REGION_ATTR(create_ram_region)
+	SET_CXL_REGION_ATTR(create_dc_region)
 	SET_CXL_REGION_ATTR(delete_region)
 	NULL,
 };
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ec3b8c6948e9..0d7b09a49dcf 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2205,6 +2205,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	switch (mode) {
 	case CXL_REGION_RAM:
 	case CXL_REGION_PMEM:
+	case CXL_REGION_DC:
 		break;
 	default:
 		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
@@ -2314,6 +2315,32 @@ static ssize_t create_ram_region_store(struct device *dev,
 }
 DEVICE_ATTR_RW(create_ram_region);
 
+static ssize_t create_dc_region_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_dc_region_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+	struct cxl_region *cxlr;
+	int rc, id;
+
+	rc = sscanf(buf, "region%d\n", &id);
+	if (rc != 1)
+		return -EINVAL;
+
+	cxlr = __create_region(cxlrd, CXL_REGION_DC, id);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+
+	return len;
+}
+DEVICE_ATTR_RW(create_dc_region);
+
 static ssize_t region_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -2759,6 +2786,11 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	struct device *dev;
 	int rc;
 
+	if (cxlr->mode == CXL_REGION_DC && cxlr->params.interleave_ways != 1) {
+		dev_err(&cxlr->dev, "Interleaving DC not supported\n");
+		return -EINVAL;
+	}
+
 	cxlr_dax = cxl_dax_region_alloc(cxlr);
 	if (IS_ERR(cxlr_dax))
 		return PTR_ERR(cxlr_dax);
@@ -3040,6 +3072,7 @@ static int cxl_region_probe(struct device *dev)
 	case CXL_REGION_PMEM:
 		return devm_cxl_add_pmem_region(cxlr);
 	case CXL_REGION_RAM:
+	case CXL_REGION_DC:
 		/*
 		 * The region can not be manged by CXL if any portion of
 		 * it is already online as 'System RAM'
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index cb148f74ceda..903566aff5eb 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -181,6 +181,11 @@ static bool is_static(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
 }
 
+static bool is_sparse(struct dax_region *dax_region)
+{
+	return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0;
+}
+
 bool static_dev_dax(struct dev_dax *dev_dax)
 {
 	return is_static(dev_dax->region);
@@ -304,6 +309,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
 	WARN_ON_ONCE(!rwsem_is_locked(&dax_region_rwsem));
 
+	if (is_sparse(dax_region))
+		return 0;
+
 	for_each_dax_region_resource(dax_region, res)
 		size -= resource_size(res);
 	return size;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..783bfeef42cc 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -13,6 +13,7 @@ struct dax_region;
 /* dax bus specific ioresource flags */
 #define IORESOURCE_DAX_STATIC BIT(0)
 #define IORESOURCE_DAX_KMEM BIT(1)
+#define IORESOURCE_DAX_SPARSE_CAP BIT(2)
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index c696837ab23c..415d03fbf9b6 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -13,19 +13,30 @@ static int cxl_dax_region_probe(struct device *dev)
 	struct cxl_region *cxlr = cxlr_dax->cxlr;
 	struct dax_region *dax_region;
 	struct dev_dax_data data;
+	resource_size_t dev_size;
+	unsigned long flags;
 
 	if (nid == NUMA_NO_NODE)
 		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
 
+	flags = IORESOURCE_DAX_KMEM;
+	if (cxlr->mode == CXL_REGION_DC)
+		flags |= IORESOURCE_DAX_SPARSE_CAP;
+
 	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
-				      PMD_SIZE, IORESOURCE_DAX_KMEM);
+				      PMD_SIZE, flags);
 	if (!dax_region)
 		return -ENOMEM;
 
+	dev_size = range_len(&cxlr_dax->hpa_range);
+	/* Add empty seed dax device */
+	if (cxlr->mode == CXL_REGION_DC)
+		dev_size = 0;
+
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = -1,
-		.size = range_len(&cxlr_dax->hpa_range),
+		.size = dev_size,
 		.memmap_on_memory = true,
 	};

From patchwork Sun Mar 24 23:18:13 2024
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:13 -0700
Subject: [PATCH 10/26] cxl/events: Factor out event msgnum configuration
Message-Id: <20240324-dcd-type2-upstream-v1-10-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Dynamic Capacity Devices (DCD) require events to process extent
addition or removal.
BIOS may have control over memory event processing. Factor out cxl_event_config_msgnums() in preparation for setting up DCD event interrupts separately from memory event interrupts.

Signed-off-by: Ira Weiny
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
---
 drivers/cxl/pci.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 216881455364..cedd9b05f129 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -698,35 +698,31 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
 	return cxl_event_get_int_policy(mds, policy);
 }
 
-static int cxl_event_irqsetup(struct cxl_memdev_state *mds)
+static int cxl_event_irqsetup(struct cxl_memdev_state *mds,
+			      struct cxl_event_interrupt_policy *policy)
 {
 	struct cxl_dev_state *cxlds = &mds->cxlds;
-	struct cxl_event_interrupt_policy policy;
 	int rc;
 
-	rc = cxl_event_config_msgnums(mds, &policy);
-	if (rc)
-		return rc;
-
-	rc = cxl_event_req_irq(cxlds, policy.info_settings);
+	rc = cxl_event_req_irq(cxlds, policy->info_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Info log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.warn_settings);
+	rc = cxl_event_req_irq(cxlds, policy->warn_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Warn log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.failure_settings);
+	rc = cxl_event_req_irq(cxlds, policy->failure_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Failure log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.fatal_settings);
+	rc = cxl_event_req_irq(cxlds, policy->fatal_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Fatal log\n");
 		return rc;
@@ -745,7 +741,7 @@ static bool cxl_event_int_is_fw(u8 setting)
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds, bool irq_avail)
 {
-	struct cxl_event_interrupt_policy policy;
+	struct cxl_event_interrupt_policy policy = { 0 };
 	int rc;
 
 	/*
@@ -777,7 +773,11 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 		return -EBUSY;
 	}
 
-	rc = cxl_event_irqsetup(mds);
+	rc = cxl_event_config_msgnums(mds, &policy);
+	if (rc)
+		return rc;
+
+	rc = cxl_event_irqsetup(mds, &policy);
 	if (rc)
 		return rc;

From patchwork Sun Mar 24 23:18:14 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601022
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:14 -0700
Subject: [PATCH 11/26] cxl/pci: Delay event buffer allocation
Message-Id: <20240324-dcd-type2-upstream-v1-11-b7b00d623625@intel.com>

The event buffer does not need to be allocated if something has failed in setting up the event IRQs. In preparation for adjusting the event configuration for DCD events, move the buffer allocation to the end of the event configuration.

Signed-off-by: Ira Weiny
Reviewed-by: Davidlohr Bueso
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
---
 drivers/cxl/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index cedd9b05f129..ccaf4ad26a4f 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -756,10 +756,6 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 		return 0;
 	}
 
-	rc = cxl_mem_alloc_event_buf(mds);
-	if (rc)
-		return rc;
-
 	rc = cxl_event_get_int_policy(mds, &policy);
 	if (rc)
 		return rc;
@@ -777,6 +773,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
+	rc = cxl_mem_alloc_event_buf(mds);
+	if (rc)
+		return rc;
+
 	rc = cxl_event_irqsetup(mds, &policy);
 	if (rc)
 		return rc;

From patchwork Sun Mar 24 23:18:15 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601023
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:15 -0700
Subject: [PATCH 12/26] cxl/pci: Factor out interrupt policy check
Message-Id: <20240324-dcd-type2-upstream-v1-12-b7b00d623625@intel.com>

Dynamic capacity devices (DCD) require interrupts to notify the host of events in the DCD log. DCD interrupts may be supported even while firmware retains control of the memory event logs. Prepare to support DCD event interrupts separately from the other event interrupts by factoring out the check of the event interrupt settings.
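The factored-out check can be illustrated with a small userspace sketch. The struct, mask, and mode values below are simplified stand-ins for the kernel's definitions (the CXL spec encodes the interrupt mode in the low two bits of each settings byte); this shows the logic only, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins: the low two bits of each settings byte
 * select the interrupt mode for that event log. */
#define INT_MODE_MASK 0x3
#define INT_NONE      0
#define INT_MSI_MSIX  1
#define INT_FW        2

struct event_interrupt_policy {
	uint8_t info_settings;
	uint8_t warn_settings;
	uint8_t failure_settings;
	uint8_t fatal_settings;
};

static bool int_is_fw(uint8_t setting)
{
	return (setting & INT_MODE_MASK) == INT_FW;
}

/* Mirrors the intent of the factored-out check: the host may only
 * claim the memory event logs if firmware controls none of them. */
static bool validate_mem_policy(const struct event_interrupt_policy *p)
{
	return !(int_is_fw(p->info_settings) ||
		 int_is_fw(p->warn_settings) ||
		 int_is_fw(p->failure_settings) ||
		 int_is_fw(p->fatal_settings));
}
```

A caller would run this check once after reading the policy and bail out with a busy error when it fails, matching the flow in cxl_event_config().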
Signed-off-by: Ira Weiny
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
---
Changes for V3:
[iweiny: new patch]
---
 drivers/cxl/pci.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index ccaf4ad26a4f..12cd5d399230 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -738,6 +738,21 @@ static bool cxl_event_int_is_fw(u8 setting)
 	return mode == CXL_INT_FW;
 }
 
+static bool cxl_event_validate_mem_policy(struct cxl_memdev_state *mds,
+					  struct cxl_event_interrupt_policy *policy)
+{
+	if (cxl_event_int_is_fw(policy->info_settings) ||
+	    cxl_event_int_is_fw(policy->warn_settings) ||
+	    cxl_event_int_is_fw(policy->failure_settings) ||
+	    cxl_event_int_is_fw(policy->fatal_settings)) {
+		dev_err(mds->cxlds.dev,
+			"FW still in control of Event Logs despite _OSC settings\n");
+		return false;
+	}
+
+	return true;
+}
+
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds, bool irq_avail)
 {
@@ -760,14 +775,8 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	if (cxl_event_int_is_fw(policy.info_settings) ||
-	    cxl_event_int_is_fw(policy.warn_settings) ||
-	    cxl_event_int_is_fw(policy.failure_settings) ||
-	    cxl_event_int_is_fw(policy.fatal_settings)) {
-		dev_err(mds->cxlds.dev,
-			"FW still in control of Event Logs despite _OSC settings\n");
+	if (!cxl_event_validate_mem_policy(mds, &policy))
 		return -EBUSY;
-	}
 
 	rc = cxl_event_config_msgnums(mds, &policy);
 	if (rc)
 		return rc;

From patchwork Sun Mar 24 23:18:16 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601024
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:16 -0700
Subject: [PATCH 13/26] cxl/mem: Configure dynamic capacity interrupts
Message-Id: <20240324-dcd-type2-upstream-v1-13-b7b00d623625@intel.com>

From: Navneet Singh

Dynamic Capacity Devices (DCD) support extent change notifications through the event log mechanism. The interrupt mailbox commands were extended in CXL 3.1 to support these notifications.
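The extension is additive: the Set Event Interrupt Policy payload grows by one trailing settings byte for the DCD log. A userspace sketch of that sizing rule follows; the struct and macro names are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the grown interrupt policy payload: a DCD settings byte
 * is appended after the four legacy log settings. */
struct event_int_policy {
	uint8_t info_settings;
	uint8_t warn_settings;
	uint8_t failure_settings;
	uint8_t fatal_settings;
	uint8_t dcd_settings;	/* appended for DCD notifications */
} __attribute__((packed));

#define EVENT_INT_POLICY_BASE_SIZE 4	/* info, warn, failure, fatal */

/* Only send the fifth byte when the device supports DCD, so devices
 * without DCD still receive the 4-byte payload they expect. */
static size_t policy_payload_size(int dcd_supported)
{
	size_t size = EVENT_INT_POLICY_BASE_SIZE;

	if (dcd_supported)
		size += sizeof(((struct event_int_policy *)0)->dcd_settings);
	return size;
}
```

This mirrors how the mailbox command's `size_in` is computed from a base size plus the optional DCD byte.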
Firmware cannot configure DCD events to be firmware controlled, but it can retain control of the memory events. Split the IRQ configuration of memory events and DCD events to allow for firmware control of memory events while DCD remains host controlled.

Configure DCD event log interrupts on devices supporting dynamic capacity. Disable DCD if interrupts are not supported.

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Reviewed-by: Jonathan Cameron
---
Changes for v1:
[iweiny: rebase to upstream irq code]
[iweiny: disable DCD if irqs not supported]
---
 drivers/cxl/core/mbox.c |  9 ++++++-
 drivers/cxl/cxl.h       |  4 ++-
 drivers/cxl/cxlmem.h    |  4 +++
 drivers/cxl/pci.c       | 71 ++++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 14e8a7528a8b..58b31fa47b93 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1323,10 +1323,17 @@ static int cxl_get_dc_config(struct cxl_memdev_state *mds, u8 start_region,
 	return rc;
 }
 
-static bool cxl_dcd_supported(struct cxl_memdev_state *mds)
+bool cxl_dcd_supported(struct cxl_memdev_state *mds)
 {
 	return test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
 }
+EXPORT_SYMBOL_NS_GPL(cxl_dcd_supported, CXL);
+
+void cxl_disable_dcd(struct cxl_memdev_state *mds)
+{
+	clear_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_disable_dcd, CXL);
 
 /**
  * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 15d418b3bc9b..d585f5fdd3ae 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -164,11 +164,13 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define CXLDEV_EVENT_STATUS_WARN	BIT(1)
 #define CXLDEV_EVENT_STATUS_FAIL	BIT(2)
 #define CXLDEV_EVENT_STATUS_FATAL	BIT(3)
+#define CXLDEV_EVENT_STATUS_DCD		BIT(4)
 
 #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO |	\
 				 CXLDEV_EVENT_STATUS_WARN |	\
 				 CXLDEV_EVENT_STATUS_FAIL |	\
-				 CXLDEV_EVENT_STATUS_FATAL)
+				 CXLDEV_EVENT_STATUS_FATAL |	\
+				 CXLDEV_EVENT_STATUS_DCD)
 
 /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
 #define CXLDEV_EVENT_INT_MODE_MASK	GENMASK(1, 0)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 4624cf612c1e..01bee6eedff3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -225,7 +225,9 @@ struct cxl_event_interrupt_policy {
 	u8 warn_settings;
 	u8 failure_settings;
 	u8 fatal_settings;
+	u8 dcd_settings;
 } __packed;
+#define CXL_EVENT_INT_POLICY_BASE_SIZE 4 /* info, warn, failure, fatal */
 
 /**
  * struct cxl_event_state - Event log driver state
@@ -890,6 +892,8 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 			    enum cxl_event_log_type type,
 			    enum cxl_event_type event_type,
 			    const uuid_t *uuid, union cxl_event *evt);
+bool cxl_dcd_supported(struct cxl_memdev_state *mds);
+void cxl_disable_dcd(struct cxl_memdev_state *mds);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 12cd5d399230..ef482eae09e9 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -669,22 +669,33 @@ static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
 }
 
 static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
-				    struct cxl_event_interrupt_policy *policy)
+				    struct cxl_event_interrupt_policy *policy,
+				    bool native_cxl)
 {
 	struct cxl_mbox_cmd mbox_cmd;
+	size_t size_in;
 	int rc;
 
-	*policy = (struct cxl_event_interrupt_policy) {
-		.info_settings = CXL_INT_MSI_MSIX,
-		.warn_settings = CXL_INT_MSI_MSIX,
-		.failure_settings = CXL_INT_MSI_MSIX,
-		.fatal_settings = CXL_INT_MSI_MSIX,
-	};
+	if (native_cxl) {
+		*policy = (struct cxl_event_interrupt_policy) {
+			.info_settings = CXL_INT_MSI_MSIX,
+			.warn_settings = CXL_INT_MSI_MSIX,
+			.failure_settings = CXL_INT_MSI_MSIX,
+			.fatal_settings = CXL_INT_MSI_MSIX,
+			.dcd_settings = 0,
+		};
+	}
+	size_in = CXL_EVENT_INT_POLICY_BASE_SIZE;
+
+	if (cxl_dcd_supported(mds)) {
+		policy->dcd_settings = CXL_INT_MSI_MSIX;
+		size_in += sizeof(policy->dcd_settings);
+	}
 
 	mbox_cmd = (struct cxl_mbox_cmd) {
 		.opcode = CXL_MBOX_OP_SET_EVT_INT_POLICY,
 		.payload_in = policy,
-		.size_in = sizeof(*policy),
+		.size_in = size_in,
 	};
 
 	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
@@ -731,6 +742,31 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds,
 	return 0;
 }
 
+static int cxl_irqsetup(struct cxl_memdev_state *mds,
+			struct cxl_event_interrupt_policy *policy,
+			bool native_cxl)
+{
+	struct cxl_dev_state *cxlds = &mds->cxlds;
+	int rc;
+
+	if (native_cxl) {
+		rc = cxl_event_irqsetup(mds, policy);
+		if (rc)
+			return rc;
+	}
+
+	if (cxl_dcd_supported(mds)) {
+		rc = cxl_event_req_irq(cxlds, policy->dcd_settings);
+		if (rc) {
+			dev_err(cxlds->dev, "Failed to get interrupt for DCD event log\n");
+			cxl_disable_dcd(mds);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
 static bool cxl_event_int_is_fw(u8 setting)
 {
 	u8 mode = FIELD_GET(CXLDEV_EVENT_INT_MODE_MASK, setting);
@@ -757,17 +793,25 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds, bool irq_avail)
 {
 	struct cxl_event_interrupt_policy policy = { 0 };
+	bool native_cxl = host_bridge->native_cxl_error;
 	int rc;
 
 	/*
 	 * When BIOS maintains CXL error reporting control, it will process
 	 * event records. Only one agent can do so.
+	 *
+	 * If BIOS has control of events and DCD is not supported skip event
+	 * configuration.
 	 */
-	if (!host_bridge->native_cxl_error)
+	if (!native_cxl && !cxl_dcd_supported(mds))
 		return 0;
 
 	if (!irq_avail) {
 		dev_info(mds->cxlds.dev, "No interrupt support, disable event processing.\n");
+		if (cxl_dcd_supported(mds)) {
+			dev_info(mds->cxlds.dev, "DCD requires interrupts, disable DCD\n");
+			cxl_disable_dcd(mds);
+		}
 		return 0;
 	}
 
@@ -775,10 +819,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	if (!cxl_event_validate_mem_policy(mds, &policy))
+	if (native_cxl && !cxl_event_validate_mem_policy(mds, &policy))
 		return -EBUSY;
 
-	rc = cxl_event_config_msgnums(mds, &policy);
+	rc = cxl_event_config_msgnums(mds, &policy, native_cxl);
 	if (rc)
 		return rc;
 
@@ -786,12 +830,15 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	rc = cxl_event_irqsetup(mds, &policy);
+	rc = cxl_irqsetup(mds, &policy, native_cxl);
 	if (rc)
 		return rc;
 
 	cxl_mem_get_event_records(mds, CXLDEV_EVENT_STATUS_ALL);
 
+	dev_dbg(mds->cxlds.dev, "Event config : %d %d\n",
+		native_cxl, cxl_dcd_supported(mds));
+
 	return 0;
 }

From patchwork Sun Mar 24 23:18:17 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601117
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:17 -0700
Subject: [PATCH 14/26] cxl/region: Read existing extents on region creation
Message-Id: <20240324-dcd-type2-upstream-v1-14-b7b00d623625@intel.com>

From: Navneet Singh

Dynamic capacity device extents may be left in an accepted state on a device due to an unexpected host crash. In this case, creation of a new region on top of the DC partition (region) is expected to expose those extents for continued use. Once all endpoint decoders are part of a region and the region is being realized, read the device extent list.

For ease of review, this patch stops after reading the extent list and leaves realization of the region extents to a future patch.
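The containment test applied while walking the extent list can be sketched with plain ranges. `struct range` here is a minimal userspace stand-in for the kernel's (inclusive end), and the helper mirrors the intent of cxl_dc_extent_in_ed(); names and signatures are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct range: inclusive bounds. */
struct range {
	uint64_t start;
	uint64_t end;
};

static bool range_contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/* An extent of `len` bytes at DPA `start` belongs to an endpoint
 * decoder only if it lies entirely inside the decoder's DPA range. */
static bool extent_in_decoder(struct range ed_dpa, uint64_t start, uint64_t len)
{
	struct range ext = {
		.start = start,
		.end = start + len - 1,	/* inclusive end, as in the kernel */
	};

	return range_contains(&ed_dpa, &ext);
}
```

Extents that fail this test are simply skipped for that decoder; a crossing extent is rejected earlier by the per-DC-region validation.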
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
---
Changes for v1:
[iweiny: remove extent list xarray]
[iweiny: Update spec references to 3.1]
[iweiny: use struct range in extents]
[iweiny: remove all reference tracking and let regions track extents
         through the extent devices.]
[djbw/Jonathan/Fan: move extent tracking to endpoint decoders]
---
 drivers/cxl/core/core.h   |   9 +++
 drivers/cxl/core/mbox.c   | 192 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/region.c |  29 +++++++
 drivers/cxl/cxlmem.h      |  49 ++++++++++++
 4 files changed, 279 insertions(+)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 91abeffbe985..119b12362977 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -4,6 +4,8 @@
 #ifndef __CXL_CORE_H__
 #define __CXL_CORE_H__

+#include
+
 extern const struct device_type cxl_nvdimm_bridge_type;
 extern const struct device_type cxl_nvdimm_type;
 extern const struct device_type cxl_pmu_type;
@@ -28,6 +30,8 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 int cxl_region_init(void);
 void cxl_region_exit(void);
 int cxl_get_poison_by_endpoint(struct cxl_port *port);
+int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled,
+			  struct cxl_dc_extent *dc_extent);
 #else
 static inline int cxl_get_poison_by_endpoint(struct cxl_port *port)
 {
@@ -43,6 +47,11 @@ static inline int cxl_region_init(void)
 static inline void cxl_region_exit(void)
 {
 }
+static inline int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled,
+					struct cxl_dc_extent *dc_extent)
+{
+	return 0;
+}
 #define CXL_REGION_ATTR(x) NULL
 #define CXL_REGION_TYPE(x) NULL
 #define SET_CXL_REGION_ATTR(x)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 58b31fa47b93..9e33a0976828 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -870,6 +870,53 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);

+static int cxl_validate_extent(struct cxl_memdev_state *mds,
+			       struct cxl_dc_extent *dc_extent)
+{
+	struct device *dev = mds->cxlds.dev;
+	uint64_t start, len;
+
+	start = le64_to_cpu(dc_extent->start_dpa);
+	len = le64_to_cpu(dc_extent->length);
+
+	/* Extents must not cross region boundaries */
+	for (int i = 0; i < mds->nr_dc_region; i++) {
+		struct cxl_dc_region_info *dcr = &mds->dc_region[i];
+
+		if (dcr->base <= start &&
+		    (start + len) <= (dcr->base + dcr->decode_len)) {
+			dev_dbg(dev, "DC extent DPA %#llx - %#llx (DCR:%d:%#llx)\n",
+				start, start + len - 1, i, start - dcr->base);
+			return 0;
+		}
+	}
+
+	dev_err_ratelimited(dev,
+			    "DC extent DPA %#llx - %#llx is not in any DC region\n",
+			    start, start + len - 1);
+	return -EINVAL;
+}
+
+static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled,
+				struct cxl_dc_extent *extent)
+{
+	uint64_t start = le64_to_cpu(extent->start_dpa);
+	uint64_t length = le64_to_cpu(extent->length);
+	struct range ext_range = (struct range){
+		.start = start,
+		.end = start + length - 1,
+	};
+	struct range ed_range = (struct range) {
+		.start = cxled->dpa_res->start,
+		.end = cxled->dpa_res->end,
+	};
+
+	dev_dbg(&cxled->cxld.dev, "Checking ED (%pr) for extent DPA:%#llx LEN:%#llx\n",
+		cxled->dpa_res, start, length);
+
+	return range_contains(&ed_range, &ext_range);
+}
+
 void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 			    enum cxl_event_log_type type,
 			    enum cxl_event_type event_type,
@@ -973,6 +1020,15 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
 	return rc;
 }

+static struct cxl_memdev_state *
+cxled_to_mds(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+	return container_of(cxlds, struct cxl_memdev_state, cxlds);
+}
+
 static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 				    enum cxl_event_log_type type)
 {
@@ -1406,6 +1462,142 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);

+static int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds,
+				     unsigned int *extent_gen_num)
+{
+	struct cxl_mbox_get_dc_extent_in get_dc_extent;
+	struct cxl_mbox_get_dc_extent_out dc_extents;
+	struct cxl_mbox_cmd mbox_cmd;
+	unsigned int count;
+	int rc;
+
+	get_dc_extent = (struct cxl_mbox_get_dc_extent_in) {
+		.extent_cnt = cpu_to_le32(0),
+		.start_extent_index = cpu_to_le32(0),
+	};
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
+		.payload_in = &get_dc_extent,
+		.size_in = sizeof(get_dc_extent),
+		.size_out = sizeof(dc_extents),
+		.payload_out = &dc_extents,
+		.min_out = 1,
+	};
+
+	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+	if (rc < 0)
+		return rc;
+
+	count = le32_to_cpu(dc_extents.total_extent_cnt);
+	*extent_gen_num = le32_to_cpu(dc_extents.extent_list_num);
+
+	return count;
+}
+
+static int cxl_dev_get_dc_extents(struct cxl_endpoint_decoder *cxled,
+				  unsigned int start_gen_num,
+				  unsigned int exp_cnt)
+{
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	unsigned int start_index, total_read;
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_mbox_cmd mbox_cmd;
+
+	struct cxl_mbox_get_dc_extent_out *dc_extents __free(kfree) =
+		kvmalloc(mds->payload_size, GFP_KERNEL);
+	if (!dc_extents)
+		return -ENOMEM;
+
+	total_read = 0;
+	start_index = 0;
+	do {
+		unsigned int nr_ext, total_extent_cnt, gen_num;
+		struct cxl_mbox_get_dc_extent_in get_dc_extent;
+		int rc;
+
+		get_dc_extent = (struct cxl_mbox_get_dc_extent_in) {
+			.extent_cnt = cpu_to_le32(exp_cnt - start_index),
+			.start_extent_index = cpu_to_le32(start_index),
+		};
+
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
+			.payload_in = &get_dc_extent,
+			.size_in = sizeof(get_dc_extent),
+			.size_out = mds->payload_size,
+			.payload_out = dc_extents,
+			.min_out = 1,
+		};
+
+		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+		if (rc < 0)
+			return rc;
+
+		nr_ext = le32_to_cpu(dc_extents->ret_extent_cnt);
+		total_read += nr_ext;
+		total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt);
+		gen_num = le32_to_cpu(dc_extents->extent_list_num);
+
+		dev_dbg(dev, "Get extent list count:%d generation Num:%d\n",
+			total_extent_cnt, gen_num);
+
+		if (gen_num != start_gen_num || exp_cnt != total_extent_cnt) {
+			dev_err(dev, "Possible incomplete extent list; gen %u != %u : cnt %u != %u\n",
+				gen_num, start_gen_num, exp_cnt, total_extent_cnt);
+			return -EIO;
+		}
+
+		for (int i = 0; i < nr_ext; i++) {
+			dev_dbg(dev, "Processing extent %d/%d\n",
+				start_index + i, exp_cnt);
+			rc = cxl_validate_extent(mds, &dc_extents->extent[i]);
+			if (rc)
+				continue;
+			if (!cxl_dc_extent_in_ed(cxled, &dc_extents->extent[i]))
+				continue;
+			rc = cxl_ed_add_one_extent(cxled, &dc_extents->extent[i]);
+			if (rc)
+				return rc;
+		}
+
+		start_index += nr_ext;
+	} while (exp_cnt > total_read);
+
+	return 0;
+}
+
+/**
+ * cxl_read_dc_extents() - Read any existing extents
+ * @cxled: Endpoint decoder which is part of a region
+ *
+ * Issue the Get Dynamic Capacity Extent List command to the device
+ * and add any existing extents found which belong to this decoder.
+ *
+ * Return: 0 if command was executed successfully, -ERRNO on error.
+ */
+int cxl_read_dc_extents(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	struct device *dev = mds->cxlds.dev;
+	unsigned int extent_gen_num;
+	int rc;
+
+	if (!cxl_dcd_supported(mds)) {
+		dev_dbg(dev, "DCD unsupported\n");
+		return 0;
+	}
+
+	rc = cxl_dev_get_dc_extent_cnt(mds, &extent_gen_num);
+	dev_dbg(mds->cxlds.dev, "Extent count: %d Generation Num: %d\n",
+		rc, extent_gen_num);
+	if (rc <= 0) /* 0 == no records found */
+		return rc;
+
+	return cxl_dev_get_dc_extents(cxled, extent_gen_num, rc);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_read_dc_extents, CXL);
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
 		       struct resource *res, resource_size_t start,
 		       resource_size_t size, const char *type)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 0d7b09a49dcf..3e563ab29afe 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1450,6 +1450,13 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
 	return 0;
 }

+/* Callers are expected to ensure cxled has been attached to a region */
+int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled,
+			  struct cxl_dc_extent *dc_extent)
+{
+	return 0;
+}
+
 static int cxl_region_attach_position(struct cxl_region *cxlr,
 				      struct cxl_root_decoder *cxlrd,
 				      struct cxl_endpoint_decoder *cxled,
@@ -2773,6 +2780,22 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }

+static int cxl_region_read_extents(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		int rc;
+
+		rc = cxl_read_dc_extents(p->targets[i]);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static void cxlr_dax_unregister(void *_cxlr_dax)
 {
 	struct cxl_dax_region *cxlr_dax = _cxlr_dax;
@@ -2807,6 +2830,12 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
 		dev_name(dev));

+	if (cxlr->mode == CXL_REGION_DC) {
+		rc = cxl_region_read_extents(cxlr);
+		if (rc)
+			goto err;
+	}
+
 	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
 					cxlr_dax);
 err:

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 01bee6eedff3..8f2d8944d334 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -604,6 +604,54 @@ enum cxl_opcode {
 	UUID_INIT(0xe1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 0xd6, 0x07, 0x19, \
 		  0x40, 0x3d, 0x86)

+/*
+ * Add Dynamic Capacity Response
+ * CXL rev 3.1 section 8.2.9.9.9.3; Table 8-168 & Table 8-169
+ */
+struct cxl_mbox_dc_response {
+	__le32 extent_list_size;
+	u8 flags;
+	u8 reserved[3];
+	struct updated_extent_list {
+		__le64 dpa_start;
+		__le64 length;
+		u8 reserved[8];
+	} __packed extent_list[];
+} __packed;
+
+/*
+ * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-51
+ */
+#define CXL_DC_EXTENT_TAG_LEN 0x10
+struct cxl_dc_extent {
+	__le64 start_dpa;
+	__le64 length;
+	u8 tag[CXL_DC_EXTENT_TAG_LEN];
+	__le16 shared_extn_seq;
+	u8 reserved[6];
+} __packed;
+
+/*
+ * Get Dynamic Capacity Extent List; Input Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166
+ */
+struct cxl_mbox_get_dc_extent_in {
+	__le32 extent_cnt;
+	__le32 start_extent_index;
+} __packed;
+
+/*
+ * Get Dynamic Capacity Extent List; Output Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-167
+ */
+struct cxl_mbox_get_dc_extent_out {
+	__le32 ret_extent_cnt;
+	__le32 total_extent_cnt;
+	__le32 extent_list_num;
+	u8 rsvd[4];
+	struct cxl_dc_extent extent[];
+} __packed;
+
 struct cxl_mbox_get_supported_logs {
 	__le16 entries;
 	u8 rsvd[6];
@@ -879,6 +927,7 @@ int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
 			  struct cxl_mbox_cmd *cmd);
 int cxl_dev_state_identify(struct cxl_memdev_state *mds);
 int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds);
+int cxl_read_dc_extents(struct cxl_endpoint_decoder *cxled);
 int cxl_await_media_ready(struct cxl_dev_state *cxlds);
 int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
 int cxl_mem_create_range_info(struct cxl_memdev_state *mds);

From patchwork Sun Mar 24 23:18:18 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601016
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:18 -0700
Subject: [PATCH 15/26] range: Add range_overlaps()
Message-Id: <20240324-dcd-type2-upstream-v1-15-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny,
    linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    Chris Mason, Josef Bacik, David Sterba
Code to support CXL Dynamic Capacity devices will have extent ranges
which need to be compared for intersection, not the subset relationship
checked by range_contains().

range_overlaps() is defined in btrfs with a different meaning from what
is required in the standard range code. Dan Williams pointed this out
in [1]. Adjust the btrfs call according to his suggestion there, then
add a generic range_overlaps().

Cc: Dan Williams
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ira Weiny

[1] https://lore.kernel.org/all/65949f79ef908_8dc68294f2@dwillia2-xfh.jf.intel.com.notmuch/

Acked-by: David Sterba
Reviewed-by: Davidlohr Bueso
Reviewed-by: Johannes Thumshirn
Reviewed-by: Fan Ni
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
---
 fs/btrfs/ordered-data.c | 10 +++++-----
 include/linux/range.h   |  7 +++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 59850dc17b22..032d30a49edc 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -111,8 +111,8 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 file_offset,
 	return NULL;
 }

-static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
-			  u64 len)
+static int btrfs_range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
+				u64 len)
 {
 	if (file_offset + len <= entry->file_offset ||
 	    entry->file_offset + entry->num_bytes <= file_offset)
@@ -914,7 +914,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 	while (1) {
 		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			break;

 		if (entry->file_offset >= file_offset + len) {
@@ -1043,12 +1043,12 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 	}
 	if (prev) {
 		entry = rb_entry(prev, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	if (next) {
 		entry = rb_entry(next, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	/* No ordered extent in the range */

diff --git a/include/linux/range.h b/include/linux/range.h
index 6ad0b73cb7ad..9a46f3212965 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -13,11 +13,18 @@ static inline u64 range_len(const struct range *range)
 	return range->end - range->start + 1;
 }

+/* True if r1 completely contains r2 */
 static inline bool range_contains(struct range *r1, struct range *r2)
 {
 	return r1->start <= r2->start &&
 	       r1->end >= r2->end;
 }

+/* True if any part of r1 overlaps r2 */
+static inline bool range_overlaps(struct range *r1, struct range *r2)
+{
+	return r1->start <= r2->end &&
+	       r1->end >= r2->start;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 	      u64 start, u64 end);

From patchwork Sun Mar 24 23:18:19 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601014
From: ira.weiny@intel.com
Date: Sun, 24 Mar 2024 16:18:19 -0700
Subject: [PATCH 16/26] cxl/extent: Realize extent devices
Message-Id: <20240324-dcd-type2-upstream-v1-16-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny,
    linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

Once all extents of an interleave set are present, a region must
surface an extent to the region. Without interleaving, endpoint decoder
and region extents have a 1:1 relationship. Future support for
interleave ways > 1 will maintain an N:1 relationship between the
device extents and region extents.

Create a region extent device for every device extent found. Release of
the extent device triggers a response to the underlying hardware
extent.
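The create/release lifecycle described above follows the common "last reference dropped triggers a callback" device pattern: tearing down the extent device is what notifies the hardware. A minimal user-space model of that pattern is sketched below; `extent_obj`, `extent_create()`, and the counter `responses_sent` are hypothetical stand-ins (the kernel side uses `struct device` refcounting, with `region_extent_release()` sending the actual Release Dynamic Capacity response).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts release notifications; stand-in for mailbox responses sent. */
static int responses_sent;

/* Hypothetical refcounted extent object modeling struct region_extent. */
struct extent_obj {
	uint64_t start;
	uint64_t len;
	int refcount;
	void (*release)(struct extent_obj *ext);
};

/* Models region_extent_release(): respond to hardware, then free. */
static void extent_release(struct extent_obj *ext)
{
	responses_sent++;	/* stand-in for cxl_send_dc_cap_response() */
	free(ext);
}

static struct extent_obj *extent_get(struct extent_obj *ext)
{
	ext->refcount++;
	return ext;
}

/* Dropping the last reference fires the release callback. */
static void extent_put(struct extent_obj *ext)
{
	if (--ext->refcount == 0)
		ext->release(ext);
}

static struct extent_obj *extent_create(uint64_t start, uint64_t len)
{
	struct extent_obj *ext = calloc(1, sizeof(*ext));

	if (!ext)
		return NULL;
	ext->start = start;
	ext->len = len;
	ext->refcount = 1;
	ext->release = extent_release;
	return ext;
}
```

The design choice this models: the hardware response is tied to object teardown rather than to an explicit "release" call site, so every path that unwinds the extent device (error paths included) notifies the device exactly once.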
There is no strong use case to support the addition of extents which overlap previously accepted extent ranges. Reject such new extents until such time as a good use case emerges. Expose the necessary details of region extents by creating the following sysfs entries. /sys/bus/cxl/devices/dax_regionX/extentY /sys/bus/cxl/devices/dax_regionX/extentY/offset /sys/bus/cxl/devices/dax_regionX/extentY/length /sys/bus/cxl/devices/dax_regionX/extentY/label The use of the extent devices by the DAX layer is deferred to later patches. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes for v1 [iweiny: new patch] [iweiny: Rename 'dr_extent' to 'region_extent'] --- drivers/cxl/core/Makefile | 1 + drivers/cxl/core/extent.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/cxl/core/mbox.c | 43 +++++++++++++++ drivers/cxl/core/region.c | 76 +++++++++++++++++++++++++- drivers/cxl/cxl.h | 37 +++++++++++++ tools/testing/cxl/Kbuild | 1 + 6 files changed, 290 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 9259bcc6773c..35c5c76bfcf1 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -14,5 +14,6 @@ cxl_core-y += pci.o cxl_core-y += hdm.o cxl_core-y += pmu.o cxl_core-y += cdat.o +cxl_core-y += extent.o cxl_core-$(CONFIG_TRACING) += trace.o cxl_core-$(CONFIG_CXL_REGION) += region.o diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c new file mode 100644 index 000000000000..487c220f1c3c --- /dev/null +++ b/drivers/cxl/core/extent.c @@ -0,0 +1,133 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2024 Intel Corporation. All rights reserved. 
*/ + +#include +#include +#include + +static DEFINE_IDA(cxl_extent_ida); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *reg_ext = to_region_extent(dev); + + return sysfs_emit(buf, "%pa\n", ®_ext->hpa_range.start); +} +static DEVICE_ATTR_RO(offset); + +static ssize_t length_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *reg_ext = to_region_extent(dev); + u64 length = range_len(®_ext->hpa_range); + + return sysfs_emit(buf, "%pa\n", &length); +} +static DEVICE_ATTR_RO(length); + +static ssize_t label_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *reg_ext = to_region_extent(dev); + + return sysfs_emit(buf, "%s\n", reg_ext->label); +} +static DEVICE_ATTR_RO(label); + +static struct attribute *region_extent_attrs[] = { + &dev_attr_offset.attr, + &dev_attr_length.attr, + &dev_attr_label.attr, + NULL, +}; + +static const struct attribute_group region_extent_attribute_group = { + .attrs = region_extent_attrs, +}; + +static const struct attribute_group *region_extent_attribute_groups[] = { + ®ion_extent_attribute_group, + NULL, +}; + +static void region_extent_release(struct device *dev) +{ + struct region_extent *reg_ext = to_region_extent(dev); + + cxl_release_ed_extent(®_ext->ed_ext); + ida_free(&cxl_extent_ida, reg_ext->dev.id); + kfree(reg_ext); +} + +static const struct device_type region_extent_type = { + .name = "extent", + .release = region_extent_release, + .groups = region_extent_attribute_groups, +}; + +bool is_region_extent(struct device *dev) +{ + return dev->type == ®ion_extent_type; +} +EXPORT_SYMBOL_NS_GPL(is_region_extent, CXL); + +static void region_extent_unregister(void *ext) +{ + struct region_extent *reg_ext = ext; + + dev_dbg(®_ext->dev, "DAX region rm extent HPA %#llx - %#llx\n", + reg_ext->hpa_range.start, reg_ext->hpa_range.end); + device_unregister(®_ext->dev); +} + +int 
dax_region_create_ext(struct cxl_dax_region *cxlr_dax, + struct range *hpa_range, + const char *label, + struct range *dpa_range, + struct cxl_endpoint_decoder *cxled) +{ + struct region_extent *reg_ext; + struct device *dev; + int rc, id; + + id = ida_alloc(&cxl_extent_ida, GFP_KERNEL); + if (id < 0) + return -ENOMEM; + + reg_ext = kzalloc(sizeof(*reg_ext), GFP_KERNEL); + if (!reg_ext) + return -ENOMEM; + + reg_ext->hpa_range = *hpa_range; + reg_ext->ed_ext.dpa_range = *dpa_range; + reg_ext->ed_ext.cxled = cxled; + snprintf(reg_ext->label, DAX_EXTENT_LABEL_LEN, "%s", label); + + dev = ®_ext->dev; + device_initialize(dev); + dev->id = id; + device_set_pm_not_required(dev); + dev->parent = &cxlr_dax->dev; + dev->type = ®ion_extent_type; + rc = dev_set_name(dev, "extent%d", dev->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + dev_dbg(dev, "DAX region extent HPA %#llx - %#llx\n", + reg_ext->hpa_range.start, reg_ext->hpa_range.end); + + return devm_add_action_or_reset(&cxlr_dax->dev, region_extent_unregister, + reg_ext); + +err: + dev_err(&cxlr_dax->dev, "Failed to initialize DAX extent dev HPA %#llx - %#llx\n", + reg_ext->hpa_range.start, reg_ext->hpa_range.end); + + put_device(dev); + return rc; +} diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 9e33a0976828..6b00e717e42b 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1020,6 +1020,32 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds, return rc; } +static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds, + struct range *extent, int opcode) +{ + struct cxl_mbox_cmd mbox_cmd; + size_t size; + + struct cxl_mbox_dc_response *dc_res __free(kfree); + size = struct_size(dc_res, extent_list, 1); + dc_res = kzalloc(size, GFP_KERNEL); + if (!dc_res) + return -ENOMEM; + + dc_res->extent_list[0].dpa_start = cpu_to_le64(extent->start); + memset(dc_res->extent_list[0].reserved, 0, 8); + dc_res->extent_list[0].length = 
cpu_to_le64(range_len(extent)); + dc_res->extent_list_size = cpu_to_le32(1); + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = opcode, + .size_in = size, + .payload_in = dc_res, + }; + + return cxl_internal_send_cmd(mds, &mbox_cmd); +} + static struct cxl_memdev_state * cxled_to_mds(struct cxl_endpoint_decoder *cxled) { @@ -1029,6 +1055,23 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled) return container_of(cxlds, struct cxl_memdev_state, cxlds); } +void cxl_release_ed_extent(struct cxl_ed_extent *extent) +{ + struct cxl_endpoint_decoder *cxled = extent->cxled; + struct cxl_memdev_state *mds = cxled_to_mds(cxled); + struct device *dev = mds->cxlds.dev; + int rc; + + dev_dbg(dev, "Releasing DC extent DPA %#llx - %#llx\n", + extent->dpa_range.start, extent->dpa_range.end); + + rc = cxl_send_dc_cap_response(mds, &extent->dpa_range, CXL_MBOX_OP_RELEASE_DC); + if (rc) + dev_dbg(dev, "Failed to respond releasing extent DPA %#llx - %#llx; %d\n", + extent->dpa_range.start, extent->dpa_range.end, rc); +} +EXPORT_SYMBOL_NS_GPL(cxl_release_ed_extent, CXL); + static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, enum cxl_event_log_type type) { diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 3e563ab29afe..7635ff109578 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1450,11 +1450,81 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, return 0; } +static int extent_check_overlap(struct device *dev, void *arg) +{ + struct range *new_range = arg; + struct region_extent *ext; + + if (!is_region_extent(dev)) + return 0; + + ext = to_region_extent(dev); + return range_overlaps(&ext->hpa_range, new_range); +} + +static int extent_overlaps(struct cxl_dax_region *cxlr_dax, + struct range *hpa_range) +{ + struct device *dev __free(put_device) = + device_find_child(&cxlr_dax->dev, hpa_range, extent_check_overlap); + + if (dev) + return -EINVAL; + return 0; +} + /* Callers are expected to ensure cxled has 
been attached to a region */ int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, struct cxl_dc_extent *dc_extent) { - return 0; + struct cxl_region *cxlr = cxled->cxld.region; + struct range ext_dpa_range, ext_hpa_range; + struct device *dev = &cxlr->dev; + resource_size_t dpa_offset, hpa; + + /* + * Interleave ways == 1 means this coresponds to a 1:1 mapping between + * device extents and DAX region extents. Future implementations + * should hold DC region extents here until the full dax region extent + * can be realized. + */ + if (cxlr->params.interleave_ways != 1) { + dev_err(dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + + ext_dpa_range = (struct range) { + .start = le64_to_cpu(dc_extent->start_dpa), + .end = le64_to_cpu(dc_extent->start_dpa) + + le64_to_cpu(dc_extent->length) - 1, + }; + + dev_dbg(dev, "Adding DC extent DPA %#llx - %#llx\n", + ext_dpa_range.start, ext_dpa_range.end); + + /* + * Without interleave... + * HPA offset == DPA offset + * ... but do the math anyway + */ + dpa_offset = ext_dpa_range.start - cxled->dpa_res->start; + hpa = cxled->cxld.hpa_range.start + dpa_offset; + + ext_hpa_range = (struct range) { + .start = hpa - cxlr->cxlr_dax->hpa_range.start, + .end = ext_hpa_range.start + range_len(&ext_dpa_range) - 1, + }; + + if (extent_overlaps(cxlr->cxlr_dax, &ext_hpa_range)) + return -EINVAL; + + dev_dbg(dev, "Realizing region extent at HPA %#llx - %#llx\n", + ext_hpa_range.start, ext_hpa_range.end); + + return dax_region_create_ext(cxlr->cxlr_dax, &ext_hpa_range, + (char *)dc_extent->tag, + &ext_dpa_range, + cxled); } static int cxl_region_attach_position(struct cxl_region *cxlr, @@ -2684,6 +2754,7 @@ static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr) dev = &cxlr_dax->dev; cxlr_dax->cxlr = cxlr; + cxlr->cxlr_dax = cxlr_dax; device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_dax_region_key); device_set_pm_not_required(dev); @@ -2799,7 +2870,10 @@ static int 
cxl_region_read_extents(struct cxl_region *cxlr) static void cxlr_dax_unregister(void *_cxlr_dax) { struct cxl_dax_region *cxlr_dax = _cxlr_dax; + struct cxl_region *cxlr = cxlr_dax->cxlr; + cxlr->cxlr_dax = NULL; + cxlr_dax->cxlr = NULL; device_unregister(&cxlr_dax->dev); } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index d585f5fdd3ae..5379ad7f5852 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -564,6 +564,7 @@ struct cxl_region_params { * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge + * @cxlr_dax: (for DC regions) cached copy of CXL DAX bridge * @flags: Region state flags * @params: active + config params for the region */ @@ -574,6 +575,7 @@ struct cxl_region { enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; + struct cxl_dax_region *cxlr_dax; unsigned long flags; struct cxl_region_params params; }; @@ -617,6 +619,41 @@ struct cxl_dax_region { struct range hpa_range; }; +/** + * struct cxl_ed_extent - Extent within an endpoint decoder + * @dpa_range: DPA range this extent covers within the decoder + * @cxled: reference to the endpoint decoder + */ +struct cxl_ed_extent { + struct range dpa_range; + struct cxl_endpoint_decoder *cxled; +}; +void cxl_release_ed_extent(struct cxl_ed_extent *extent); + +/** + * struct region_extent - CXL DAX region extent + * @dev: device representing this extent + * @hpa_range: HPA range of this extent + * @label: label of the extent + * @ed_ext: Endpoint decoder extent which backs this extent + */ +#define DAX_EXTENT_LABEL_LEN 64 +struct region_extent { + struct device dev; + struct range hpa_range; + char label[DAX_EXTENT_LABEL_LEN]; + struct cxl_ed_extent ed_ext; +}; + +int dax_region_create_ext(struct cxl_dax_region *cxlr_dax, + struct range *hpa_range, + const char *label, + struct range *dpa_range, + struct cxl_endpoint_decoder 
*cxled); + +bool is_region_extent(struct device *dev); +#define to_region_extent(dev) container_of(dev, struct region_extent, dev) + /** * struct cxl_port - logical collection of upstream port devices and * downstream port devices to construct a CXL memory diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 030b388800f0..dc0cc1d5e6a0 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -60,6 +60,7 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o cxl_core-y += $(CXL_CORE_SRC)/hdm.o cxl_core-y += $(CXL_CORE_SRC)/pmu.o cxl_core-y += $(CXL_CORE_SRC)/cdat.o +cxl_core-y += $(CXL_CORE_SRC)/extent.o cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o cxl_core-y += config_check.o From patchwork Sun Mar 24 23:18:20 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601118 From: ira.weiny@intel.com Date: Sun, 24 Mar 2024 16:18:20 -0700 Subject: [PATCH 17/26] dax/region: Create extent resources on DAX region driver load Message-Id: <20240324-dcd-type2-upstream-v1-17-b7b00d623625@intel.com> References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org From: Navneet Singh DAX regions mapping dynamic capacity partitions introduce a requirement for the memory backing the region to come and go as required. This results in a DAX region with sparse areas of memory backing. To track the sparseness of the region, DAX extent objects need to track sub-resource information as a new layer between the DAX region resource and DAX device range resources. Recall that DCD extents may be accepted when a region is first created. Extend this support on region driver load. Scan existing extents and create DAX extent resources as a first step to DAX extent realization. The lifetime of a DAX extent is tricky to manage because the extent life may end in one of two ways. First, the device may request the extent be released. Second, the region may release the extent when it is destroyed without hardware involvement.
Support extent release without hardware involvement first. Subsequent patches will provide for hardware to request extent removal. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Reviewed-by: Jonathan Cameron --- Changes for v1 [iweiny: remove xarrays] [iweiny: remove as much of extra reference stuff as possible] [iweiny: Move extent resource handling to core DAX code] --- drivers/dax/bus.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++ drivers/dax/cxl.c | 43 ++++++++++++++++++++++++++++++++++-- drivers/dax/dax-private.h | 12 +++++++++++ 3 files changed, 108 insertions(+), 2 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 903566aff5eb..4d5ed7ab6537 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -186,6 +186,61 @@ static bool is_sparse(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0; } +static int dax_region_add_resource(struct dax_region *dax_region, + struct dax_extent *dax_ext, + resource_size_t start, + resource_size_t length) +{ + struct resource *ext_res; + + dev_dbg(dax_region->dev, "DAX region resource %pr\n", &dax_region->res); + ext_res = __request_region(&dax_region->res, start, length, "extent", 0); + if (!ext_res) { + dev_err(dax_region->dev, "Failed to add region s:%pa l:%pa\n", + &start, &length); + return -ENOSPC; + } + + dax_ext->region = dax_region; + dax_ext->res = ext_res; + dev_dbg(dax_region->dev, "Extent add resource %pr\n", ext_res); + + return 0; +} + +static void dax_region_release_extent(void *ext) +{ + struct dax_extent *dax_ext = ext; + struct dax_region *dax_region = dax_ext->region; + + dev_dbg(dax_region->dev, "Extent release resource %pr\n", dax_ext->res); + if (dax_ext->res) + __release_region(&dax_region->res, dax_ext->res->start, + resource_size(dax_ext->res)); + + kfree(dax_ext); +} + +int dax_region_add_extent(struct dax_region *dax_region, struct device *ext_dev, + resource_size_t start, resource_size_t length) 
+{ + int rc; + + struct dax_extent *dax_ext __free(kfree) = kzalloc(sizeof(*dax_ext), + GFP_KERNEL); + if (!dax_ext) + return -ENOMEM; + + guard(rwsem_write)(&dax_region_rwsem); + rc = dax_region_add_resource(dax_region, dax_ext, start, length); + if (rc) + return rc; + + return devm_add_action_or_reset(ext_dev, dax_region_release_extent, + no_free_ptr(dax_ext)); +} +EXPORT_SYMBOL_GPL(dax_region_add_extent); + bool static_dev_dax(struct dev_dax *dev_dax) { return is_static(dev_dax->region); diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 415d03fbf9b6..70bdc7a878ab 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -5,6 +5,42 @@ #include "../cxl/cxl.h" #include "bus.h" +#include "dax-private.h" + +static int __cxl_dax_region_add_extent(struct dax_region *dax_region, + struct region_extent *reg_ext) +{ + struct device *ext_dev = ®_ext->dev; + resource_size_t start, length; + + dev_dbg(dax_region->dev, "Adding extent HPA %#llx - %#llx\n", + reg_ext->hpa_range.start, reg_ext->hpa_range.end); + + start = dax_region->res.start + reg_ext->hpa_range.start; + length = reg_ext->hpa_range.end - reg_ext->hpa_range.start + 1; + + return dax_region_add_extent(dax_region, ext_dev, start, length); +} + +static int cxl_dax_region_add_extent(struct device *dev, void *data) +{ + struct dax_region *dax_region = data; + struct region_extent *reg_ext; + + if (!is_region_extent(dev)) + return 0; + + reg_ext = to_region_extent(dev); + + return __cxl_dax_region_add_extent(dax_region, reg_ext); +} + +static void cxl_dax_region_add_extents(struct cxl_dax_region *cxlr_dax, + struct dax_region *dax_region) +{ + dev_dbg(&cxlr_dax->dev, "Adding extents\n"); + device_for_each_child(&cxlr_dax->dev, dax_region, cxl_dax_region_add_extent); +} static int cxl_dax_region_probe(struct device *dev) { @@ -29,9 +65,12 @@ static int cxl_dax_region_probe(struct device *dev) return -ENOMEM; dev_size = range_len(&cxlr_dax->hpa_range); - /* Add empty seed dax device */ - if (cxlr->mode == 
CXL_REGION_DC) + if (cxlr->mode == CXL_REGION_DC) { + /* NOTE: Depends on dax_region being set in driver data */ + cxl_dax_region_add_extents(cxlr_dax, dax_region); + /* Add empty seed dax device */ + dev_size = 0; + } data = (struct dev_dax_data) { .dax_region = dax_region, diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 446617b73aea..c6319c6567fb 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -16,6 +16,18 @@ struct inode *dax_inode(struct dax_device *dax_dev); int dax_bus_init(void); void dax_bus_exit(void); +/** + * struct dax_extent - For sparse regions; an active extent + * @region: dax_region this resource is in + * @res: resource this extent covers + */ +struct dax_extent { + struct dax_region *region; + struct resource *res; +}; +int dax_region_add_extent(struct dax_region *dax_region, struct device *ext_dev, + resource_size_t start, resource_size_t length); + /** * struct dax_region - mapping infrastructure for dax devices * @id: kernel-wide unique region for a memory range From patchwork Sun Mar 24 23:18:21 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601027 From: ira.weiny@intel.com Date: Sun, 24 Mar 2024 16:18:21 -0700 Subject: [PATCH 18/26] cxl/mem: Handle DCD add & release capacity events. Message-Id: <20240324-dcd-type2-upstream-v1-18-b7b00d623625@intel.com> References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org From: Navneet Singh Dynamic capacity devices (DCD) send events to signal the host about changes in the availability of Dynamic Capacity (DC) memory. These events contain extents, the addition or removal of which may occur at any time. Adding memory is straightforward. If no region exists the extent is rejected. If a region does exist, a region extent is formed and surfaced. Removing memory requires checking if the memory is currently in use. Memory use tracking is added in a subsequent patch so here the memory is never in use and the removal occurs immediately. Most often extents will be offered to and accepted by the host in well-defined chunks. However, part of an extent may be requested for release.
Simplify extent tracking by signaling removal of any extent which overlaps the requested release range. Force removal is intended as a mechanism between the FM and the device, to be used only when the host is unresponsive or otherwise broken. Purposely ignore force removal events. Process DCD extents. Recall that all devices of an interleave set must offer a corresponding extent for the region extent to be realized. This patch limits interleave to 1. Thus the 1:1 mapping between device extent and DAX region extent allows immediate surfacing. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes for v1 [iweiny: remove all xarrays] [iweiny: entirely new architecture] --- drivers/cxl/core/extent.c | 4 ++ drivers/cxl/core/mbox.c | 142 +++++++++++++++++++++++++++++++++++++++++++--- drivers/cxl/core/region.c | 139 ++++++++++++++++++++++++++++++++++++++++----- drivers/cxl/cxl.h | 34 +++++++++++ drivers/cxl/cxlmem.h | 21 +++---- drivers/cxl/mem.c | 45 +++++++++++++++ drivers/dax/cxl.c | 22 +++++++ include/linux/cxl-event.h | 31 ++++++++++ 8 files changed, 405 insertions(+), 33 deletions(-) diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c index 487c220f1c3c..e98acd98ebe2 100644 --- a/drivers/cxl/core/extent.c +++ b/drivers/cxl/core/extent.c @@ -118,6 +118,10 @@ int dax_region_create_ext(struct cxl_dax_region *cxlr_dax, if (rc) goto err; + rc = cxl_region_notify_extent(cxled->cxld.region, DCD_ADD_CAPACITY, reg_ext); + if (rc) + goto err; + dev_dbg(dev, "DAX region extent HPA %#llx - %#llx\n", reg_ext->hpa_range.start, reg_ext->hpa_range.end); diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 6b00e717e42b..7babac2d1c95 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -870,6 +870,37 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); +static int cxl_notify_dc_extent(struct cxl_memdev_state *mds, + enum dc_event event, + struct
cxl_dc_extent *dc_extent) +{ + struct cxl_drv_nd nd = (struct cxl_drv_nd) { + .event = event, + .dc_extent = dc_extent + }; + struct device *dev; + int rc = -ENXIO; + + dev = &mds->cxlds.cxlmd->dev; + dev_dbg(dev, "Notify: type %d DPA:%#llx LEN:%#llx\n", + event, le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length)); + + device_lock(dev); + if (dev->driver) { + struct cxl_driver *mem_drv = to_cxl_drv(dev->driver); + + if (mem_drv->notify) { + dev_dbg(dev, "Notify driver type %d DPA:%#llx LEN:%#llx\n", + event, le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length)); + rc = mem_drv->notify(dev, &nd); + } + } + device_unlock(dev); + return rc; +} + static int cxl_validate_extent(struct cxl_memdev_state *mds, struct cxl_dc_extent *dc_extent) { @@ -897,8 +928,8 @@ static int cxl_validate_extent(struct cxl_memdev_state *mds, return -EINVAL; } -static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, - struct cxl_dc_extent *extent) +bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent *extent) { uint64_t start = le64_to_cpu(extent->start_dpa); uint64_t length = le64_to_cpu(extent->length); @@ -916,6 +947,7 @@ static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, return range_contains(&ed_range, &ext_range); } +EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_in_ed, CXL); void cxl_event_trace_record(const struct cxl_memdev *cxlmd, enum cxl_event_log_type type, @@ -1027,15 +1059,20 @@ static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds, size_t size; struct cxl_mbox_dc_response *dc_res __free(kfree); - size = struct_size(dc_res, extent_list, 1); + if (!extent) + size = struct_size(dc_res, extent_list, 0); + else + size = struct_size(dc_res, extent_list, 1); dc_res = kzalloc(size, GFP_KERNEL); if (!dc_res) return -ENOMEM; - dc_res->extent_list[0].dpa_start = cpu_to_le64(extent->start); - memset(dc_res->extent_list[0].reserved, 0, 8); - dc_res->extent_list[0].length = 
cpu_to_le64(range_len(extent)); - dc_res->extent_list_size = cpu_to_le32(1); + if (extent) { + dc_res->extent_list[0].dpa_start = cpu_to_le64(extent->start); + memset(dc_res->extent_list[0].reserved, 0, 8); + dc_res->extent_list[0].length = cpu_to_le64(range_len(extent)); + dc_res->extent_list_size = cpu_to_le32(1); + } mbox_cmd = (struct cxl_mbox_cmd) { .opcode = opcode, @@ -1072,6 +1109,85 @@ void cxl_release_ed_extent(struct cxl_ed_extent *extent) } EXPORT_SYMBOL_NS_GPL(cxl_release_ed_extent, CXL); +static int cxl_handle_dcd_release_event(struct cxl_memdev_state *mds, + struct cxl_dc_extent *dc_extent) +{ + return cxl_notify_dc_extent(mds, DCD_RELEASE_CAPACITY, dc_extent); +} + +static int cxl_handle_dcd_add_event(struct cxl_memdev_state *mds, + struct cxl_dc_extent *dc_extent) +{ + struct range alloc_range, *resp_range; + struct device *dev = mds->cxlds.dev; + int rc; + + alloc_range = (struct range){ + .start = le64_to_cpu(dc_extent->start_dpa), + .end = le64_to_cpu(dc_extent->start_dpa) + + le64_to_cpu(dc_extent->length) - 1, + }; + resp_range = &alloc_range; + + rc = cxl_notify_dc_extent(mds, DCD_ADD_CAPACITY, dc_extent); + if (rc) { + dev_dbg(dev, "unconsumed DC extent DPA:%#llx LEN:%#llx\n", + le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length)); + resp_range = NULL; + } + + return cxl_send_dc_cap_response(mds, resp_range, + CXL_MBOX_OP_ADD_DC_RESPONSE); +} + +static char *cxl_dcd_evt_type_str(u8 type) +{ + switch (type) { + case DCD_ADD_CAPACITY: + return "add"; + case DCD_RELEASE_CAPACITY: + return "release"; + case DCD_FORCED_CAPACITY_RELEASE: + return "force release"; + default: + break; + } + + return ""; +} + +static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds, + struct cxl_event_record_raw *raw_rec) +{ + struct cxl_event_dcd *event = &raw_rec->event.dcd; + struct cxl_dc_extent *dc_extent = &event->extent; + struct device *dev = mds->cxlds.dev; + uuid_t *id = &raw_rec->id; + + if (!uuid_equal(id, 
&CXL_EVENT_DC_EVENT_UUID)) + return -EINVAL; + + dev_dbg(dev, "DCD event %s : DPA:%#llx LEN:%#llx\n", + cxl_dcd_evt_type_str(event->event_type), + le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length)); + + switch (event->event_type) { + case DCD_ADD_CAPACITY: + return cxl_handle_dcd_add_event(mds, dc_extent); + case DCD_RELEASE_CAPACITY: + return cxl_handle_dcd_release_event(mds, dc_extent); + case DCD_FORCED_CAPACITY_RELEASE: + dev_err_ratelimited(dev, "Forced release event ignored.\n"); + return 0; + default: + return -EINVAL; + } + + return 0; +} + static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, enum cxl_event_log_type type) { @@ -1109,9 +1225,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, if (!nr_rec) break; - for (i = 0; i < nr_rec; i++) + for (i = 0; i < nr_rec; i++) { __cxl_event_trace_record(cxlmd, type, &payload->records[i]); + if (type == CXL_EVENT_TYPE_DCD) { + rc = cxl_handle_dcd_event_records(mds, + &payload->records[i]); + if (rc) + dev_err_ratelimited(dev, "dcd event failed: %d\n", + rc); + } + } if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW) trace_cxl_overflow(cxlmd, type, payload); @@ -1143,6 +1267,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status) { dev_dbg(mds->cxlds.dev, "Reading event logs: %x\n", status); + if (cxl_dcd_supported(mds) && (status & CXLDEV_EVENT_STATUS_DCD)) + cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD); if (status & CXLDEV_EVENT_STATUS_FATAL) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_FATAL); if (status & CXLDEV_EVENT_STATUS_FAIL) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 7635ff109578..a07d95136f0d 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1450,6 +1450,57 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, return 0; } +int cxl_region_notify_extent(struct cxl_region *cxlr, enum dc_event event, + struct region_extent *reg_ext) +{ + struct cxl_dax_region 
*cxlr_dax; + struct device *dev; + int rc = -ENXIO; + + cxlr_dax = cxlr->cxlr_dax; + dev = &cxlr_dax->dev; + dev_dbg(dev, "Trying notify: type %d HPA %#llx - %#llx\n", + event, reg_ext->hpa_range.start, reg_ext->hpa_range.end); + + device_lock(dev); + if (dev->driver) { + struct cxl_driver *reg_drv = to_cxl_drv(dev->driver); + struct cxl_drv_nd nd = (struct cxl_drv_nd) { + .event = event, + .reg_ext = reg_ext, + }; + + if (reg_drv->notify) { + dev_dbg(dev, "Notify: type %d HPA %#llx - %#llx\n", + event, reg_ext->hpa_range.start, + reg_ext->hpa_range.end); + rc = reg_drv->notify(dev, &nd); + } + } + device_unlock(dev); + return rc; +} + +static void calc_hpa_range(struct cxl_endpoint_decoder *cxled, + struct cxl_dax_region *cxlr_dax, + struct cxl_dc_extent *dc_extent, + struct range *dpa_range, + struct range *hpa_range) +{ + resource_size_t dpa_offset, hpa; + + /* + * Without interleave... + * HPA offset == DPA offset + * ... but do the math anyway + */ + dpa_offset = dpa_range->start - cxled->dpa_res->start; + hpa = cxled->cxld.hpa_range.start + dpa_offset; + + hpa_range->start = hpa - cxlr_dax->hpa_range.start; + hpa_range->end = hpa_range->start + range_len(dpa_range) - 1; +} + static int extent_check_overlap(struct device *dev, void *arg) { struct range *new_range = arg; @@ -1480,7 +1531,6 @@ int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, struct cxl_region *cxlr = cxled->cxld.region; struct range ext_dpa_range, ext_hpa_range; struct device *dev = &cxlr->dev; - resource_size_t dpa_offset, hpa; /* * Interleave ways == 1 means this corresponds to a 1:1 mapping between @@ -1502,18 +1552,7 @@ int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, dev_dbg(dev, "Adding DC extent DPA %#llx - %#llx\n", ext_dpa_range.start, ext_dpa_range.end); - /* - * Without interleave... - * HPA offset == DPA offset - * ...
but do the math anyway - */ - dpa_offset = ext_dpa_range.start - cxled->dpa_res->start; - hpa = cxled->cxld.hpa_range.start + dpa_offset; - - ext_hpa_range = (struct range) { - .start = hpa - cxlr->cxlr_dax->hpa_range.start, - .end = ext_hpa_range.start + range_len(&ext_dpa_range) - 1, - }; + calc_hpa_range(cxled, cxlr->cxlr_dax, dc_extent, &ext_dpa_range, &ext_hpa_range); if (extent_overlaps(cxlr->cxlr_dax, &ext_hpa_range)) return -EINVAL; @@ -1527,6 +1566,80 @@ int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, cxled); } +static void cxl_ed_rm_region_extent(struct cxl_region *cxlr, + struct region_extent *reg_ext) +{ + cxl_region_notify_extent(cxlr, DCD_RELEASE_CAPACITY, reg_ext); +} + +struct rm_data { + struct cxl_region *cxlr; + struct range *range; +}; + +static int cxl_rm_reg_ext_by_range(struct device *dev, void *data) +{ + struct rm_data *rm_data = data; + struct region_extent *reg_ext; + + if (!is_region_extent(dev)) + return 0; + reg_ext = to_region_extent(dev); + + /* + * Any extent which 'touches' the released range is notified + * for removal. No partials of the extent are released. 
+ */ + if (range_overlaps(rm_data->range, ®_ext->hpa_range)) { + struct cxl_region *cxlr = rm_data->cxlr; + + dev_dbg(dev, "Remove DAX region ext HPA %#llx - %#llx\n", + reg_ext->hpa_range.start, reg_ext->hpa_range.end); + cxl_ed_rm_region_extent(cxlr, reg_ext); + } + return 0; +} + +static int cxl_ed_rm_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent *dc_extent) +{ + struct cxl_region *cxlr = cxled->cxld.region; + struct range hpa_range; + + struct range rel_dpa_range = { + .start = le64_to_cpu(dc_extent->start_dpa), + .end = le64_to_cpu(dc_extent->start_dpa) + + le64_to_cpu(dc_extent->length) - 1, + }; + + calc_hpa_range(cxled, cxlr->cxlr_dax, dc_extent, &rel_dpa_range, &hpa_range); + + struct rm_data rm_data = { + .cxlr = cxlr, + .range = &hpa_range, + }; + + return device_for_each_child(&cxlr->cxlr_dax->dev, &rm_data, + cxl_rm_reg_ext_by_range); +} + +int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd) +{ + switch (nd->event) { + case DCD_ADD_CAPACITY: + return cxl_ed_add_one_extent(cxled, nd->dc_extent); + case DCD_RELEASE_CAPACITY: + return cxl_ed_rm_extent(cxled, nd->dc_extent); + case DCD_FORCED_CAPACITY_RELEASE: + default: + dev_err(&cxled->cxld.dev, "Unknown DC event %d\n", nd->event); + break; + } + return -ENXIO; +} +EXPORT_SYMBOL_NS_GPL(cxl_ed_notify_extent, CXL); + static int cxl_region_attach_position(struct cxl_region *cxlr, struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 5379ad7f5852..156d7c9a8de5 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -10,6 +10,7 @@ #include #include #include +#include /** * DOC: cxl objects @@ -613,6 +614,14 @@ struct cxl_pmem_region { struct cxl_pmem_region_mapping mapping[]; }; +/* See CXL 3.0 8.2.9.2.1.5 */ +enum dc_event { + DCD_ADD_CAPACITY, + DCD_RELEASE_CAPACITY, + DCD_FORCED_CAPACITY_RELEASE, + DCD_REGION_CONFIGURATION_UPDATED, +}; + struct cxl_dax_region { struct device dev; 
struct cxl_region *cxlr; @@ -891,10 +900,18 @@ bool is_cxl_region(struct device *dev); extern struct bus_type cxl_bus_type; +/* Driver Notifier Data */ +struct cxl_drv_nd { + enum dc_event event; + struct cxl_dc_extent *dc_extent; + struct region_extent *reg_ext; +}; + struct cxl_driver { const char *name; int (*probe)(struct device *dev); void (*remove)(struct device *dev); + int (*notify)(struct device *dev, struct cxl_drv_nd *nd); struct device_driver drv; int id; }; @@ -933,6 +950,8 @@ bool is_cxl_nvdimm(struct device *dev); bool is_cxl_nvdimm_bridge(struct device *dev); int devm_cxl_add_nvdimm(struct cxl_memdev *cxlmd); struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_memdev *cxlmd); +bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent *extent); #ifdef CONFIG_CXL_REGION bool is_cxl_pmem_region(struct device *dev); @@ -940,6 +959,10 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev); int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled); struct cxl_dax_region *to_cxl_dax_region(struct device *dev); +int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd); +int cxl_region_notify_extent(struct cxl_region *cxlr, enum dc_event event, + struct region_extent *reg_ext); #else static inline bool is_cxl_pmem_region(struct device *dev) { @@ -958,6 +981,17 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev) { return NULL; } +static inline int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd) +{ + return 0; +} +static inline int cxl_region_notify_extent(struct cxl_region *cxlr, + enum dc_event event, + struct region_extent *reg_ext) +{ + return 0; +} #endif void cxl_endpoint_parse_cdat(struct cxl_port *port); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 8f2d8944d334..eb10cae99ff0 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -619,18 +619,6 @@ struct 
cxl_mbox_dc_response { } __packed extent_list[]; } __packed; -/* - * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-51 - */ -#define CXL_DC_EXTENT_TAG_LEN 0x10 -struct cxl_dc_extent { - __le64 start_dpa; - __le64 length; - u8 tag[CXL_DC_EXTENT_TAG_LEN]; - __le16 shared_extn_seq; - u8 reserved[6]; -} __packed; - /* * Get Dynamic Capacity Extent List; Input Payload * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166 @@ -714,6 +702,14 @@ struct cxl_mbox_identify { UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 0xba, 0xb1, \ 0x13, 0xb7, 0x74) +/* + * Dynamic Capacity Event Record + * CXL rev 3.1 section 8.2.9.2.1; Table 8-43 + */ +#define CXL_EVENT_DC_EVENT_UUID \ + UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f, 0x95, 0x26, 0x8e, \ + 0x10, 0x1a, 0x2a) + /* * Get Event Records output payload * CXL rev 3.0 section 8.2.9.2.2; Table 8-50 @@ -739,6 +735,7 @@ enum cxl_event_log_type { CXL_EVENT_TYPE_WARN, CXL_EVENT_TYPE_FAIL, CXL_EVENT_TYPE_FATAL, + CXL_EVENT_TYPE_DCD, CXL_EVENT_TYPE_MAX }; diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 0c79d9ce877c..20832f09c40c 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -103,6 +103,50 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa) DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL, cxl_debugfs_poison_clear, "%llx\n"); +static int match_ep_decoder_by_range(struct device *dev, void *data) +{ + struct cxl_dc_extent *dc_extent = data; + struct cxl_endpoint_decoder *cxled; + + if (!is_endpoint_decoder(dev)) + return 0; + + cxled = to_cxl_endpoint_decoder(dev); + if (!cxled->cxld.region) + return 0; + + return cxl_dc_extent_in_ed(cxled, dc_extent); +} + +static int cxl_mem_notify(struct device *dev, struct cxl_drv_nd *nd) +{ + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_port *endpoint = cxlmd->endpoint; + struct cxl_endpoint_decoder *cxled; + struct cxl_dc_extent *dc_extent; + struct device *ep_dev; + int rc; + + dc_extent = nd->dc_extent; + dev_dbg(dev, "notify DC action %d DPA:%#llx 
LEN:%#llx\n", + nd->event, le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length)); + + ep_dev = device_find_child(&endpoint->dev, dc_extent, + match_ep_decoder_by_range); + if (!ep_dev) { + dev_dbg(dev, "Extent DPA:%#llx LEN:%#llx not mapped; evt %d\n", + le64_to_cpu(dc_extent->start_dpa), + le64_to_cpu(dc_extent->length), nd->event); + return -ENXIO; + } + + cxled = to_cxl_endpoint_decoder(ep_dev); + rc = cxl_ed_notify_extent(cxled, nd); + put_device(ep_dev); + return rc; +} + static int cxl_mem_probe(struct device *dev) { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); @@ -244,6 +288,7 @@ __ATTRIBUTE_GROUPS(cxl_mem); static struct cxl_driver cxl_mem_driver = { .name = "cxl_mem", .probe = cxl_mem_probe, + .notify = cxl_mem_notify, .id = CXL_DEVICE_MEMORY_EXPANDER, .drv = { .dev_groups = cxl_mem_groups, diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 70bdc7a878ab..83ee45aff69a 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -42,6 +42,27 @@ static void cxl_dax_region_add_extents(struct cxl_dax_region *cxlr_dax, device_for_each_child(&cxlr_dax->dev, dax_region, cxl_dax_region_add_extent); } +static int cxl_dax_region_notify(struct device *dev, + struct cxl_drv_nd *nd) +{ + struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); + struct dax_region *dax_region = dev_get_drvdata(dev); + struct region_extent *reg_ext = nd->reg_ext; + + switch (nd->event) { + case DCD_ADD_CAPACITY: + return __cxl_dax_region_add_extent(dax_region, reg_ext); + case DCD_RELEASE_CAPACITY: + return 0; + case DCD_FORCED_CAPACITY_RELEASE: + default: + dev_err(&cxlr_dax->dev, "Unknown DC event %d\n", nd->event); + break; + } + + return -ENXIO; +} + static int cxl_dax_region_probe(struct device *dev) { struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); @@ -85,6 +106,7 @@ static int cxl_dax_region_probe(struct device *dev) static struct cxl_driver cxl_dax_region_driver = { .name = "cxl_dax_region", .probe = cxl_dax_region_probe, + .notify = 
cxl_dax_region_notify, .id = CXL_DEVICE_DAX_REGION, .drv = { .suppress_bind_attrs = true, diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h index 03fa6d50d46f..6b745c913f96 100644 --- a/include/linux/cxl-event.h +++ b/include/linux/cxl-event.h @@ -91,11 +91,42 @@ struct cxl_event_mem_module { u8 reserved[0x3d]; } __packed; +/* + * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-51 + */ +#define CXL_DC_EXTENT_TAG_LEN 0x10 +struct cxl_dc_extent { + __le64 start_dpa; + __le64 length; + u8 tag[CXL_DC_EXTENT_TAG_LEN]; + __le16 shared_extn_seq; + u8 reserved[0x6]; +} __packed; + +/* + * Dynamic Capacity Event Record + * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-50 + */ +struct cxl_event_dcd { + struct cxl_event_record_hdr hdr; + u8 event_type; + u8 validity_flags; + __le16 host_id; + u8 region_index; + u8 flags; + u8 reserved1[0x2]; + struct cxl_dc_extent extent; + u8 reserved2[0x18]; + __le32 num_avail_extents; + __le32 num_avail_tags; +} __packed; + union cxl_event { struct cxl_event_generic generic; struct cxl_event_gen_media gen_media; struct cxl_event_dram dram; struct cxl_event_mem_module mem_module; + struct cxl_event_dcd dcd; } __packed; /* From patchwork Sun Mar 24 23:18:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601028 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4D9CA236D1B; Sun, 24 Mar 2024 23:18:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711322310; cv=none; b=iwbUOGHwJhr2khMhUe3ipanPnUyhPTvCYbzTEFWLEEolNSCOb6Ex1oOGdl7wZq+IZYmRz7X/mPB4Je9fOb+EVt13L7N5jsa4f7QV7f9jowACA/2XNswgkxk03GizFwgNAtX73p4b2oBAvZoG2BOvB+8Tnfpvn0S7eGnHZAH8DFs= 
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:22 -0700
Subject: [PATCH 19/26] dax/bus: Factor out dev dax resize logic
Message-Id: <20240324-dcd-type2-upstream-v1-19-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Dynamic Capacity regions must limit dev dax resources to those areas which have extents backing real memory. Such DAX regions are dubbed 'sparse' regions.

In order to manage where memory is available, four alternatives were considered:

1) Create a single region resource child on region creation which reserves the entire region. Then, as extents are added, punch holes in this reservation. This requires new resource manipulation to punch the holes and still requires an additional iteration over the extent areas which may already have existing dev dax resources in use.

2) Maintain an ordered xarray of extents which can be queried while processing the resize logic.
The issue is that existing region->res children may artificially limit the allocation size sent to alloc_dev_dax_range(); i.e., the resource children can't be directly used in the resize logic to find where space in the region is. This also poses the problem of managing the available size in two places.

3) Maintain a separate resource tree with extents. This option is the same as 2) but with a different data structure. Ideally there should be a unified representation of the resource tree, not two places to look for space.

4) Create region resource children for each extent. Manage the dev dax resize logic in the same way as before, but use a region child (extent) resource as the parent to find space within each extent.

Option 4 can leverage the existing resize algorithm to find space within the extents. It manages the available space in a single resource tree, which is less complicated for finding space.

In preparation for this change, factor out the dev_dax_resize logic. For static regions, use dax_region->res as the parent to find space for the dax ranges. Future patches will use the same algorithm with individual extent resources as the parent.
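The option 4 scan that dev_dax_resize_static() performs (space before the first child, then holes between siblings, then space after the last child) can be modeled outside the kernel. The sketch below uses a toy resource type standing in for struct resource and is simplified: it only reports a gap large enough for the whole request, whereas the kernel's resize loop allocates whatever partial space it finds and retries.

```c
#include <stddef.h>

/* Toy stand-in for struct resource: an inclusive [start, end] span with
 * a first-child pointer and a sorted, non-overlapping sibling list of
 * already-allocated children. */
struct toy_res {
	unsigned long long start, end;
	struct toy_res *child;
	struct toy_res *sibling;
};

#define NO_SPACE ((unsigned long long)-1)

/* Return the start of the first gap in @parent that can hold @want bytes,
 * scanning in the same order as dev_dax_resize_static(): region start,
 * holes between children, then the region tail.  NO_SPACE if none fits. */
unsigned long long first_fit(struct toy_res *parent, unsigned long long want)
{
	struct toy_res *first = parent->child, *res;

	if (!first)	/* empty region: the whole span is available */
		return (parent->end - parent->start + 1 >= want) ?
		       parent->start : NO_SPACE;

	/* space at the beginning of the region */
	if (first->start > parent->start &&
	    first->start - parent->start >= want)
		return parent->start;

	for (res = first; res; res = res->sibling) {
		struct toy_res *next = res->sibling;

		/* space between this allocation and the next */
		if (next && next->start - (res->end + 1) >= want)
			return res->end + 1;

		/* space at the end of the region */
		if (!next && res->end < parent->end &&
		    parent->end - res->end >= want)
			return res->end + 1;
	}
	return NO_SPACE;
}
```

Passing @parent explicitly, as this patch does for the kernel function, is what later allows the same scan to run inside an individual extent resource instead of dax_region->res.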
Signed-off-by: Ira Weiny --- Changes for V1 [iweiny: Rebase on new DAX region locking] [iweiny: Reword commit message] [iweiny: Drop reviews] --- drivers/dax/bus.c | 129 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 79 insertions(+), 50 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 4d5ed7ab6537..bab19fc578d0 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -928,11 +928,9 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) return 0; } -static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, - resource_size_t size) +static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, + u64 start, resource_size_t size) { - struct dax_region *dax_region = dev_dax->region; - struct resource *res = &dax_region->res; struct device *dev = &dev_dax->dev; struct dev_dax_range *ranges; unsigned long pgoff = 0; @@ -950,14 +948,14 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, return 0; } - alloc = __request_region(res, start, size, dev_name(dev), 0); + alloc = __request_region(parent, start, size, dev_name(dev), 0); if (!alloc) return -ENOMEM; ranges = krealloc(dev_dax->ranges, sizeof(*ranges) * (dev_dax->nr_range + 1), GFP_KERNEL); if (!ranges) { - __release_region(res, alloc->start, resource_size(alloc)); + __release_region(parent, alloc->start, resource_size(alloc)); return -ENOMEM; } @@ -1110,50 +1108,45 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) return true; } -static ssize_t dev_dax_resize(struct dax_region *dax_region, - struct dev_dax *dev_dax, resource_size_t size) +/** + * dev_dax_resize_static - Expand the device into the unused portion of the + * region. This may involve adjusting the end of an existing resource, or + * allocating a new resource. 
+ * + * @parent: parent resource to allocate this range in + * @dev_dax: DAX device to be expanded + * @to_alloc: amount of space to alloc; must be <= space available in @parent + * + * Return the amount of space allocated or -ERRNO on failure + */ +static ssize_t dev_dax_resize_static(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc) { - resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; - resource_size_t dev_size = dev_dax_size(dev_dax); - struct resource *region_res = &dax_region->res; - struct device *dev = &dev_dax->dev; struct resource *res, *first; - resource_size_t alloc = 0; int rc; - if (dev->driver) - return -EBUSY; - if (size == dev_size) - return 0; - if (size > dev_size && size - dev_size > avail) - return -ENOSPC; - if (size < dev_size) - return dev_dax_shrink(dev_dax, size); - - to_alloc = size - dev_size; - if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), - "resize of %pa misaligned\n", &to_alloc)) - return -ENXIO; - - /* - * Expand the device into the unused portion of the region. This - * may involve adjusting the end of an existing resource, or - * allocating a new resource. 
- */ -retry: - first = region_res->child; - if (!first) - return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc); + first = parent->child; + if (!first) { + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, to_alloc); + if (rc) + return rc; + return to_alloc; + } - rc = -ENOSPC; for (res = first; res; res = res->sibling) { struct resource *next = res->sibling; + resource_size_t alloc; /* space at the beginning of the region */ - if (res == first && res->start > dax_region->res.start) { - alloc = min(res->start - dax_region->res.start, to_alloc); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, alloc); - break; + if (res == first && res->start > parent->start) { + alloc = min(res->start - parent->start, to_alloc); + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, alloc); + if (rc) + return rc; + return alloc; } alloc = 0; @@ -1162,21 +1155,55 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, alloc = min(next->start - (res->end + 1), to_alloc); /* space at the end of the region */ - if (!alloc && !next && res->end < region_res->end) - alloc = min(region_res->end - res->end, to_alloc); + if (!alloc && !next && res->end < parent->end) + alloc = min(parent->end - res->end, to_alloc); if (!alloc) continue; if (adjust_ok(dev_dax, res)) { rc = adjust_dev_dax_range(dev_dax, res, resource_size(res) + alloc); - break; + if (rc) + return rc; + return alloc; } - rc = alloc_dev_dax_range(dev_dax, res->end + 1, alloc); - break; + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + if (rc) + return rc; + return alloc; } - if (rc) - return rc; + + /* available was already calculated and should never be an issue */ + dev_WARN_ONCE(&dev_dax->dev, 1, "space not found?"); + return 0; +} + +static ssize_t dev_dax_resize(struct dax_region *dax_region, + struct dev_dax *dev_dax, resource_size_t size) +{ + resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; + resource_size_t dev_size = 
dev_dax_size(dev_dax); + struct device *dev = &dev_dax->dev; + resource_size_t alloc = 0; + + if (dev->driver) + return -EBUSY; + if (size == dev_size) + return 0; + if (size > dev_size && size - dev_size > avail) + return -ENOSPC; + if (size < dev_size) + return dev_dax_shrink(dev_dax, size); + + to_alloc = size - dev_size; + if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), + "resize of %pa misaligned\n", &to_alloc)) + return -ENXIO; + +retry: + alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (alloc <= 0) + return alloc; to_alloc -= alloc; if (to_alloc) goto retry; @@ -1283,7 +1310,8 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) - rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, + to_alloc); up_write(&dax_dev_rwsem); up_write(&dax_region_rwsem); @@ -1506,7 +1534,8 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, + data->size); if (rc) goto err_range; From patchwork Sun Mar 24 23:18:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601119 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D0DB236D0A; Sun, 24 Mar 2024 23:18:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711322309; cv=none; 
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:23 -0700
Subject: [PATCH 20/26] dax: Document dax dev range tuple
Message-Id: <20240324-dcd-type2-upstream-v1-20-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

The device DAX structure is being enhanced to track additional DCD information. The current range tuple was not fully documented. Document it prior to adding information for DC.
Suggested-by: Jonathan Cameron
Signed-off-by: Ira Weiny
Reviewed-by: Dave Jiang
---
Changes for v1
[iweiny: new patch]
---
 drivers/dax/dax-private.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index c6319c6567fb..ac1ccf158650 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -70,7 +70,10 @@ struct dax_mapping {
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @nr_range: size of @ranges
- * @ranges: resource-span + pgoff tuples for the instance
+ * @ranges: range tuples of memory used
+ * @pgoff: page offset
+ * @range: resource-span
+ * @mapping: device to assist in interrogating the range layout
  */
 struct dev_dax {
 	struct dax_region *region;

From patchwork Sun Mar 24 23:18:24 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601029
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:24 -0700
Subject: [PATCH 21/26] dax/region: Prevent range mapping allocation on sparse regions
Message-Id: <20240324-dcd-type2-upstream-v1-21-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Sparse regions are not fully populated with memory, and this complicates range mapping of dax devices on those regions. There is no use case for range mapping on sparse regions, so avoid the complication by preventing range mapping of dax devices on sparse regions.
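The visibility rule this patch extends reduces to a small predicate: the "mapping" attribute is hidden (mode 0) for static regions, and now also for sparse regions, since neither supports user-driven range mapping. The sketch below is illustrative only; the enum and the writable mode value are stand-ins, not the driver's actual types.

```c
/* Illustrative region modes; the driver distinguishes these via
 * is_static() and is_sparse() on struct dax_region. */
enum region_mode { REGION_DYNAMIC, REGION_STATIC, REGION_SPARSE };

/* Return the sysfs mode for the mapping attribute: 0 hides it from
 * userspace, mirroring dev_dax_visible() returning 0 for the attr. */
unsigned int mapping_attr_mode(enum region_mode mode)
{
	if (mode == REGION_STATIC || mode == REGION_SPARSE)
		return 0;	/* no range mapping on these regions */
	return 0200;		/* stand-in writable mode otherwise */
}
```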
Signed-off-by: Ira Weiny
Reviewed-by: Dave Jiang
---
 drivers/dax/bus.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index bab19fc578d0..56dddaceeccb 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1452,6 +1452,8 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 		return 0;
 	if (a == &dev_attr_mapping.attr && is_static(dax_region))
 		return 0;
+	if (a == &dev_attr_mapping.attr && is_sparse(dax_region))
+		return 0;
 	if ((a == &dev_attr_align.attr || a == &dev_attr_size.attr) &&
 	    is_static(dax_region))
 		return 0444;

From patchwork Sun Mar 24 23:18:25 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13601030
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:25 -0700
Subject: [PATCH 22/26] dax/region: Support DAX device creation on sparse DAX regions
Message-Id: <20240324-dcd-type2-upstream-v1-22-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Previous patches introduced a new sparse DAX region type. This region type may have 0 or more bytes of backing memory. DAX devices already have the ability to reference sparse ranges of a DAX region. Leverage the range support of DAX devices to track memory across a sparse set of region extents.

Requests for extent removal can be received from the device at any time, but the host is not obliged to release that memory until it is finished with it. Introduce a use count to track how many DAX devices are using an extent. If an extent is in use, reject the removal of that extent.

Leverage the region RW semaphore to protect the extent data, as any change to the use of an extent requires DAX device, DAX region, and extent stability during those operations.
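The extent lifetime rule described above — removal succeeds immediately when no DAX device uses the extent; otherwise it is refused and the extent is only marked invalid so no new uses can begin — can be sketched as a small state machine. The names below are illustrative, not the kernel's, and the real code additionally holds dax_region_rwsem around every transition.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a region extent pinned by DAX device ranges. */
struct toy_extent {
	unsigned int use_cnt;	/* number of dax ranges mapping the extent */
	bool invalid;		/* device has asked for the capacity back */
};

/* Pin the extent for a new dax range; refused once removal is pending. */
int toy_extent_get(struct toy_extent *ext)
{
	if (ext->invalid)
		return -ENXIO;
	ext->use_cnt++;
	return 0;
}

/* Drop a pin when a dax range is trimmed. */
void toy_extent_put(struct toy_extent *ext)
{
	ext->use_cnt--;
}

/* Device requested removal: succeed only when unused, else mark the
 * extent invalid and report failure, as dax_region_rm_extent() does. */
int toy_extent_rm(struct toy_extent *ext)
{
	if (ext->use_cnt == 0)
		return 0;	/* safe to release the capacity */
	ext->invalid = true;
	return -EINVAL;
}
```

Once the last pin is dropped after a refused removal, a retry of the removal succeeds, which matches the host-controlled release described in the commit message.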
Signed-off-by: Ira Weiny Reviewed-by: Jonathan Cameron --- Changes for v3 [iweiny: simplify the extent objects] [iweiny: refactor based on the new extent objects created] [iweiny: remove xarray] [iweiny: use lock/invalidate/cnt rather than kref] --- drivers/cxl/core/extent.c | 8 ++ drivers/cxl/core/region.c | 6 +- drivers/cxl/cxl.h | 1 + drivers/dax/bus.c | 191 +++++++++++++++++++++++++++++++++++++++------- drivers/dax/bus.h | 3 +- drivers/dax/cxl.c | 55 ++++++++++++- drivers/dax/dax-private.h | 23 ++++++ drivers/dax/hmem/hmem.c | 2 +- drivers/dax/pmem.c | 2 +- 9 files changed, 258 insertions(+), 33 deletions(-) diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c index e98acd98ebe2..633397d62836 100644 --- a/drivers/cxl/core/extent.c +++ b/drivers/cxl/core/extent.c @@ -81,6 +81,14 @@ static void region_extent_unregister(void *ext) device_unregister(®_ext->dev); } +void dax_reg_ext_release(struct region_extent *reg_ext) +{ + struct device *region_dev = reg_ext->dev.parent; + + devm_release_action(region_dev, region_extent_unregister, reg_ext); +} +EXPORT_SYMBOL_NS_GPL(dax_reg_ext_release, CXL); + int dax_region_create_ext(struct cxl_dax_region *cxlr_dax, struct range *hpa_range, const char *label, diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index a07d95136f0d..7d75512a16bc 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1569,7 +1569,11 @@ int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, static void cxl_ed_rm_region_extent(struct cxl_region *cxlr, struct region_extent *reg_ext) { - cxl_region_notify_extent(cxlr, DCD_RELEASE_CAPACITY, reg_ext); + if (cxl_region_notify_extent(cxlr, DCD_RELEASE_CAPACITY, reg_ext)) + return; + + /* Extent not in use, release it */ + dax_reg_ext_release(reg_ext); } struct rm_data { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 156d7c9a8de5..e002c0ea3c2b 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -660,6 +660,7 @@ int 
dax_region_create_ext(struct cxl_dax_region *cxlr_dax, struct range *dpa_range, struct cxl_endpoint_decoder *cxled); +void dax_reg_ext_release(struct region_extent *dr_ext); bool is_region_extent(struct device *dev); #define to_region_extent(dev) container_of(dev, struct region_extent, dev) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 56dddaceeccb..70a559763e8c 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -236,11 +236,32 @@ int dax_region_add_extent(struct dax_region *dax_region, struct device *ext_dev, if (rc) return rc; - return devm_add_action_or_reset(ext_dev, dax_region_release_extent, + /* Assume the devm action will be configured without error */ + dev_set_drvdata(ext_dev, dax_ext); + rc = devm_add_action_or_reset(ext_dev, dax_region_release_extent, no_free_ptr(dax_ext)); + if (rc) + dev_set_drvdata(ext_dev, NULL); + return rc; } EXPORT_SYMBOL_GPL(dax_region_add_extent); +int dax_region_rm_extent(struct dax_region *dax_region, + struct device *ext_dev) +{ + struct dax_extent *dax_ext; + + guard(rwsem_write)(&dax_region_rwsem); + + dax_ext = dev_get_drvdata(ext_dev); + if (!dax_ext || dax_ext->use_cnt == 0) + return 0; /* extent not in use */ + + dax_ext->invalid = true; + return -EINVAL; +} +EXPORT_SYMBOL_GPL(dax_region_rm_extent); + bool static_dev_dax(struct dev_dax *dev_dax) { return is_static(dev_dax->region); @@ -354,19 +375,44 @@ static ssize_t region_align_show(struct device *dev, static struct device_attribute dev_attr_region_align = __ATTR(align, 0400, region_align_show, NULL); +#define for_each_extent_resource(extent, res) \ + for (res = (extent)->child; res; res = res->sibling) + +unsigned long long +dax_extent_avail_size(struct resource *ext_res) +{ + unsigned long long rc; + struct resource *used_res; + + rc = resource_size(ext_res); + for_each_extent_resource(ext_res, used_res) + rc -= resource_size(used_res); + return rc; +} +EXPORT_SYMBOL_GPL(dax_extent_avail_size); + #define for_each_dax_region_resource(dax_region, 
res) \ for (res = (dax_region)->res.child; res; res = res->sibling) static unsigned long long dax_region_avail_size(struct dax_region *dax_region) { - resource_size_t size = resource_size(&dax_region->res); + resource_size_t size; struct resource *res; WARN_ON_ONCE(!rwsem_is_locked(&dax_region_rwsem)); - if (is_sparse(dax_region)) - return 0; + if (is_sparse(dax_region)) { + /* + * Children of a sparse region represent available space not + * used space. + */ + size = 0; + for_each_dax_region_resource(dax_region, res) + size += dax_extent_avail_size(res); + return size; + } + size = resource_size(&dax_region->res); for_each_dax_region_resource(dax_region, res) size -= resource_size(res); return size; @@ -507,15 +553,26 @@ EXPORT_SYMBOL_GPL(kill_dev_dax); static void trim_dev_dax_range(struct dev_dax *dev_dax) { int i = dev_dax->nr_range - 1; - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range = &dev_range->range; struct dax_region *dax_region = dev_dax->region; + struct resource *res = &dax_region->res; WARN_ON_ONCE(!rwsem_is_locked(&dax_region_rwsem)); dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i, (unsigned long long)range->start, (unsigned long long)range->end); - __release_region(&dax_region->res, range->start, range_len(range)); + if (dev_range->dax_ext) { + res = dev_range->dax_ext->res; + dev_dbg(&dev_dax->dev, "Trim sparse extent %pr\n", res); + } + + __release_region(res, range->start, range_len(range)); + + if (dev_range->dax_ext) + dev_range->dax_ext->use_cnt--; + if (--dev_dax->nr_range == 0) { kfree(dev_dax->ranges); dev_dax->ranges = NULL; @@ -711,7 +768,7 @@ static void dax_region_unregister(void *region) struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, - unsigned long flags) + unsigned long flags, struct dax_reg_sparse_ops *sparse_ops) { struct dax_region *dax_region; @@ 
-729,12 +786,16 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, || !IS_ALIGNED(range_len(range), align)) return NULL; + if (!sparse_ops && (flags & IORESOURCE_DAX_SPARSE_CAP)) + return NULL; + dax_region = kzalloc(sizeof(*dax_region), GFP_KERNEL); if (!dax_region) return NULL; dev_set_drvdata(parent, dax_region); kref_init(&dax_region->kref); + dax_region->sparse_ops = sparse_ops; dax_region->id = region_id; dax_region->align = align; dax_region->dev = parent; @@ -929,7 +990,8 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) } static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, - u64 start, resource_size_t size) + u64 start, resource_size_t size, + struct dax_extent *dax_ext) { struct device *dev = &dev_dax->dev; struct dev_dax_range *ranges; @@ -968,6 +1030,7 @@ static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, .start = alloc->start, .end = alloc->end, }, + .dax_ext = dax_ext, }; dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1, @@ -1050,7 +1113,8 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) int i; for (i = dev_dax->nr_range - 1; i >= 0; i--) { - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range = &dev_range->range; struct dax_mapping *mapping = dev_dax->ranges[i].mapping; struct resource *adjust = NULL, *res; resource_size_t shrink; @@ -1066,12 +1130,21 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) continue; } - for_each_dax_region_resource(dax_region, res) - if (strcmp(res->name, dev_name(dev)) == 0 - && res->start == range->start) { - adjust = res; - break; - } + if (dev_range->dax_ext) { + for_each_extent_resource(dev_range->dax_ext->res, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + } else { + 
for_each_dax_region_resource(dax_region, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + } if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1, "failed to find matching resource\n")) @@ -1109,19 +1182,21 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) } /** - * dev_dax_resize_static - Expand the device into the unused portion of the - * region. This may involve adjusting the end of an existing resource, or - * allocating a new resource. + * __dev_dax_resize - Expand the device into the unused portion of the region. + * This may involve adjusting the end of an existing resource, or allocating a + * new resource. * * @parent: parent resource to allocate this range in * @dev_dax: DAX device to be expanded * @to_alloc: amount of space to alloc; must be <= space available in @parent + * @dax_ext: if sparse; the extent containing parent * * Return the amount of space allocated or -ERRNO on failure */ -static ssize_t dev_dax_resize_static(struct resource *parent, - struct dev_dax *dev_dax, - resource_size_t to_alloc) +static ssize_t __dev_dax_resize(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc, + struct dax_extent *dax_ext) { struct resource *res, *first; int rc; @@ -1129,7 +1204,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, first = parent->child; if (!first) { rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, to_alloc); + parent->start, to_alloc, + dax_ext); if (rc) return rc; return to_alloc; @@ -1143,7 +1219,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, if (res == first && res->start > parent->start) { alloc = min(res->start - parent->start, to_alloc); rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, alloc); + parent->start, alloc, + dax_ext); if (rc) return rc; return alloc; @@ -1167,7 +1244,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, return rc; return 
alloc; } - rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc, + dax_ext); if (rc) return rc; return alloc; @@ -1178,6 +1256,56 @@ static ssize_t dev_dax_resize_static(struct resource *parent, return 0; } +static ssize_t dev_dax_resize_static(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + return __dev_dax_resize(&dax_region->res, dev_dax, to_alloc, NULL); +} + +static int dax_ext_match_avail_size(struct device *dev, resource_size_t *size_avail) +{ + resource_size_t extent_max; + struct dax_extent *dax_ext; + + dax_ext = dev_get_drvdata(dev); + if (!dax_ext || dax_ext->invalid) + return 0; + + extent_max = dax_extent_avail_size(dax_ext->res); + if (!extent_max) + return 0; + + *size_avail = extent_max; + dax_ext->use_cnt++; + return 1; +} + +static ssize_t dev_dax_resize_sparse(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + struct dax_extent *dax_ext; + resource_size_t extent_max; + struct device *ext_dev; + ssize_t alloc; + + ext_dev = dax_region->sparse_ops->find_ext(dax_region, &extent_max, + dax_ext_match_avail_size); + if (!ext_dev) + return -ENOSPC; + + dax_ext = dev_get_drvdata(ext_dev); + if (!dax_ext) + return -ENOSPC; + + to_alloc = min(extent_max, to_alloc); + alloc = __dev_dax_resize(dax_ext->res, dev_dax, to_alloc, dax_ext); + if (alloc < 0) + dax_ext->use_cnt--; + return alloc; +} + static ssize_t dev_dax_resize(struct dax_region *dax_region, struct dev_dax *dev_dax, resource_size_t size) { @@ -1201,7 +1329,10 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, return -ENXIO; retry: - alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (is_sparse(dax_region)) + alloc = dev_dax_resize_sparse(dax_region, dev_dax, to_alloc); + else + alloc = dev_dax_resize_static(dax_region, dev_dax, to_alloc); if (alloc <= 0) return alloc; to_alloc -= alloc; @@ -1311,7 +1442,7 
@@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, - to_alloc); + to_alloc, NULL); up_write(&dax_dev_rwsem); up_write(&dax_region_rwsem); @@ -1536,8 +1667,14 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); + if (is_sparse(dax_region) && data->size) { + dev_err(parent, "Sparse DAX region devices are created initially with 0 size"); + rc = -EINVAL; + goto err_id; + } + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, - data->size); + data->size, NULL); if (rc) goto err_range; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 783bfeef42cc..4127eee1bd6d 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -9,6 +9,7 @@ struct dev_dax; struct resource; struct dax_device; struct dax_region; +struct dax_reg_sparse_ops; /* dax bus specific ioresource flags */ #define IORESOURCE_DAX_STATIC BIT(0) @@ -17,7 +18,7 @@ struct dax_region; struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, - unsigned long flags); + unsigned long flags, struct dax_reg_sparse_ops *sparse_ops); struct dev_dax_data { struct dax_region *dax_region; diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 83ee45aff69a..3cb95e5988ae 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -53,7 +53,7 @@ static int cxl_dax_region_notify(struct device *dev, case DCD_ADD_CAPACITY: return __cxl_dax_region_add_extent(dax_region, reg_ext); case DCD_RELEASE_CAPACITY: - return 0; + return dax_region_rm_extent(dax_region, &reg_ext->dev); case DCD_FORCED_CAPACITY_RELEASE: default: dev_err(&cxlr_dax->dev, "Unknown DC event %d\n", nd->event); @@ -63,6 +63,57 @@ static int cxl_dax_region_notify(struct device *dev, return -ENXIO; }
+struct match_data { + match_cb match_fn; + resource_size_t *size_avail; +}; + +static int cxl_dax_match_ext(struct device *dev, void *data) +{ + struct match_data *md = data; + + if (!is_region_extent(dev)) + return 0; + + return md->match_fn(dev, md->size_avail); +} + +/** + * find_ext - Match Extent callback + * @dax_region: region to search + * @size_avail: the available size if an extent is found + * @match_fn: match function + * + * Callback to iterate through the child devices of the DAX region calling + * match_fn only on those devices which are extents. + * + * If a match is found match_fn is responsible for locking or reference + * counting dax_ext as needed. + */ +static struct device *find_ext(struct dax_region *dax_region, + resource_size_t *size_avail, + match_cb match_fn) +{ + struct match_data md = { + .match_fn = match_fn, + .size_avail = size_avail, + }; + struct device *ext_dev; + + ext_dev = device_find_child(dax_region->dev, &md, cxl_dax_match_ext); + + if (!ext_dev) + return NULL; + + /* caller must hold a count on extent data */ + put_device(ext_dev); + return ext_dev; +} + +struct dax_reg_sparse_ops sparse_ops = { + .find_ext = find_ext, +}; + static int cxl_dax_region_probe(struct device *dev) { struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); @@ -81,7 +132,7 @@ static int cxl_dax_region_probe(struct device *dev) flags |= IORESOURCE_DAX_SPARSE_CAP; dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, - PMD_SIZE, flags); + PMD_SIZE, flags, &sparse_ops); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index ac1ccf158650..fe3b271e721c 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -20,13 +20,32 @@ void dax_bus_exit(void); * struct dax_extent - For sparse regions; an active extent * @region: dax_region this resources is in * @res: resource this extent covers + * @invalid: extent is invalid and going away + * @use_cnt: count the number of
uses of this extent */ struct dax_extent { struct dax_region *region; struct resource *res; + bool invalid; + unsigned int use_cnt; }; int dax_region_add_extent(struct dax_region *dax_region, struct device *ext_dev, resource_size_t start, resource_size_t length); +int dax_region_rm_extent(struct dax_region *dax_region, + struct device *ext_dev); +unsigned long long dax_extent_avail_size(struct resource *ext_res); + +typedef int (*match_cb)(struct device *dev, resource_size_t *size_avail); + +/** + * struct dax_reg_sparse_ops - Operations for sparse regions + * @find_ext: Find the extent matched with match_fn + */ +struct dax_reg_sparse_ops { + struct device *(*find_ext)(struct dax_region *dax_region, + resource_size_t *size_avail, + match_cb match_fn); +}; /** * struct dax_region - mapping infrastructure for dax devices @@ -39,6 +58,7 @@ int dax_region_add_extent(struct dax_region *dax_region, struct device *ext_dev, * @res: resource tree to track instance allocations * @seed: allow userspace to find the first unbound seed device * @youngest: allow userspace to find the most recently created device + * @sparse_ops: operations required for sparse regions */ struct dax_region { int id; @@ -50,6 +70,7 @@ struct dax_region { struct resource res; struct device *seed; struct device *youngest; + struct dax_reg_sparse_ops *sparse_ops; }; struct dax_mapping { @@ -74,6 +95,7 @@ struct dax_mapping { * @pgoff: page offset * @range: resource-span * @mapping: device to assist in interrogating the range layout + * @dax_ext: if not NULL; dax region extent referenced by this range */ struct dev_dax { struct dax_region *region; @@ -91,6 +113,7 @@ struct dev_dax { unsigned long pgoff; struct range range; struct dax_mapping *mapping; + struct dax_extent *dax_ext; } *ranges; }; diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index b9da69f92697..c5ddbcef532f 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -28,7 +28,7 @@ static int
dax_hmem_probe(struct platform_device *pdev) mri = dev->platform_data; dax_region = alloc_dax_region(dev, pdev->id, &mri->range, - mri->target_node, PMD_SIZE, flags); + mri->target_node, PMD_SIZE, flags, NULL); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index f3c6c67b8412..acb311539272 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -54,7 +54,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev) range.start += offset; dax_region = alloc_dax_region(dev, region_id, &range, nd_region->target_node, le32_to_cpu(pfn_sb->align), - IORESOURCE_DAX_STATIC); + IORESOURCE_DAX_STATIC, NULL); if (!dax_region) return ERR_PTR(-ENOMEM);

From patchwork Sun Mar 24 23:18:26 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601031 From: ira.weiny@intel.com Date: Sun, 24 Mar 2024 16:18:26 -0700 Subject: [PATCH 23/26] cxl/mem: Trace Dynamic capacity Event Record Message-Id: <20240324-dcd-type2-upstream-v1-23-b7b00d623625@intel.com> References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org From: Navneet Singh CXL rev 3.1 section 8.2.9.2.1 adds the Dynamic Capacity Event Records. Notify the host of extents being added or removed. User space has little use for these events other than for debugging. Add DC trace points to the trace log for debugging purposes.
Signed-off-by: Navneet Singh Signed-off-by: Ira Weiny Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron --- Changes for v1 [iweiny: Adjust to new trace code] --- drivers/cxl/core/mbox.c | 4 +++ drivers/cxl/core/trace.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 7babac2d1c95..cb4576890187 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -978,6 +978,10 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd, ev_type = CXL_CPER_EVENT_DRAM; else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID)) ev_type = CXL_CPER_EVENT_MEM_MODULE; + else if (uuid_equal(uuid, &CXL_EVENT_DC_EVENT_UUID)) { + trace_cxl_dynamic_capacity(cxlmd, type, &record->event.dcd); + return; + } cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event); } diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index bdf117a33744..7646fdd9aee3 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -707,6 +707,71 @@ TRACE_EVENT(cxl_poison, ) ); +/* + * DYNAMIC CAPACITY Event Record - DER + * + * CXL rev 3.0 section 8.2.9.2.1.5 Table 8-47 + */ + +#define CXL_DC_ADD_CAPACITY 0x00 +#define CXL_DC_REL_CAPACITY 0x01 +#define CXL_DC_FORCED_REL_CAPACITY 0x02 +#define CXL_DC_REG_CONF_UPDATED 0x03 +#define show_dc_evt_type(type) __print_symbolic(type, \ + { CXL_DC_ADD_CAPACITY, "Add capacity"}, \ + { CXL_DC_REL_CAPACITY, "Release capacity"}, \ + { CXL_DC_FORCED_REL_CAPACITY, "Forced capacity release"}, \ + { CXL_DC_REG_CONF_UPDATED, "Region Configuration Updated" } \ +) + +TRACE_EVENT(cxl_dynamic_capacity, + + TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log, + struct cxl_event_dcd *rec), + + TP_ARGS(cxlmd, log, rec), + + TP_STRUCT__entry( + CXL_EVT_TP_entry + + /* Dynamic capacity Event */ + __field(u8, event_type) + __field(u16, hostid) + __field(u8, region_id) + __field(u64, dpa_start) + __field(u64, 
length) + __array(u8, tag, CXL_DC_EXTENT_TAG_LEN) + __field(u16, sh_extent_seq) + ), + + TP_fast_assign( + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr); + + /* Dynamic_capacity Event */ + __entry->event_type = rec->event_type; + + /* DCD event record data */ + __entry->hostid = le16_to_cpu(rec->host_id); + __entry->region_id = rec->region_index; + __entry->dpa_start = le64_to_cpu(rec->extent.start_dpa); + __entry->length = le64_to_cpu(rec->extent.length); + memcpy(__entry->tag, &rec->extent.tag, CXL_DC_EXTENT_TAG_LEN); + __entry->sh_extent_seq = le16_to_cpu(rec->extent.shared_extn_seq); + ), + + CXL_EVT_TP_printk("event_type='%s' host_id='%d' region_id='%d' " \ + "starting_dpa=%llx length=%llx tag=%s " \ + "shared_extent_sequence=%d", + show_dc_evt_type(__entry->event_type), + __entry->hostid, + __entry->region_id, + __entry->dpa_start, + __entry->length, + __print_hex(__entry->tag, CXL_DC_EXTENT_TAG_LEN), + __entry->sh_extent_seq + ) +); + #endif /* _CXL_EVENTS_H */ #define TRACE_INCLUDE_FILE trace

From patchwork Sun Mar 24 23:18:27 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601032 From: Ira Weiny Date: Sun, 24 Mar 2024 16:18:27 -0700 Subject: [PATCH 24/26] tools/testing/cxl: Make event logs dynamic Message-Id: <20240324-dcd-type2-upstream-v1-24-b7b00d623625@intel.com> References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org The test event logs were created as static arrays as an easy way to mock events. Dynamic Capacity Device (DCD) test support requires events be generated dynamically when extents are created or destroyed. Modify the event log storage to be dynamically allocated. Reuse the static event data to create the dynamic events in the new logs without inventing complex event injection for the previous tests. Simplify the processing of the logs by using the event log array index as the handle.
Add a lock to manage the concurrency required when user space is allowed to control DCD extents. Signed-off-by: Ira Weiny --- Changes for v1 [iweiny: Adjust for new event code] --- tools/testing/cxl/test/mem.c | 281 ++++++++++++++++++++++++++----------------- 1 file changed, 172 insertions(+), 109 deletions(-) diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 35ee41e435ab..d8d62e6eeb18 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -124,18 +124,27 @@ static struct { #define PASS_TRY_LIMIT 3 -#define CXL_TEST_EVENT_CNT_MAX 15 +#define CXL_TEST_EVENT_CNT_MAX 17 /* Set a number of events to return at a time for simulation. */ #define CXL_TEST_EVENT_CNT 3 +/* + * @next_handle: next handle (index) to be stored to + * @cur_handle: current handle (index) to be returned to the user on get_event + * @nr_events: total events in this log + * @nr_overflow: number of events added past the log size + * @lock: protect these state variables + * @events: array of pending events to be returned.
+ */ struct mock_event_log { - u16 clear_idx; - u16 cur_idx; + u16 next_handle; + u16 cur_handle; u16 nr_events; u16 nr_overflow; - u16 overflow_reset; - struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX]; + rwlock_t lock; + /* 1 extra slot to accommodate that handles can't be 0 */ + struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX + 1]; }; struct mock_event_store { @@ -170,64 +179,76 @@ static struct mock_event_log *event_find_log(struct device *dev, int log_type) return &mdata->mes.mock_logs[log_type]; } -static struct cxl_event_record_raw *event_get_current(struct mock_event_log *log) -{ - return log->events[log->cur_idx]; -} - -static void event_reset_log(struct mock_event_log *log) -{ - log->cur_idx = 0; - log->clear_idx = 0; - log->nr_overflow = log->overflow_reset; -} - /* Handle can never be 0 use 1 based indexing for handle */ -static u16 event_get_clear_handle(struct mock_event_log *log) +static void event_inc_handle(u16 *handle) { - return log->clear_idx + 1; + *handle = (*handle + 1) % CXL_TEST_EVENT_CNT_MAX; + if (!*handle) + *handle = *handle + 1; } -/* Handle can never be 0 use 1 based indexing for handle */ -static __le16 event_get_cur_event_handle(struct mock_event_log *log) -{ - u16 cur_handle = log->cur_idx + 1; - - return cpu_to_le16(cur_handle); -} - -static bool event_log_empty(struct mock_event_log *log) -{ - return log->cur_idx == log->nr_events; -} - -static void mes_add_event(struct mock_event_store *mes, +/* Add the event or free it on 'overflow' */ +static void mes_add_event(struct cxl_mockmem_data *mdata, enum cxl_event_log_type log_type, struct cxl_event_record_raw *event) { + struct device *dev = mdata->mds->cxlds.dev; struct mock_event_log *log; + u16 handle; if (WARN_ON(log_type >= CXL_EVENT_TYPE_MAX)) return; - log = &mes->mock_logs[log_type]; + log = &mdata->mes.mock_logs[log_type]; - if ((log->nr_events + 1) > CXL_TEST_EVENT_CNT_MAX) { + write_lock(&log->lock); + + handle = log->next_handle; + if ((handle + 1) 
== log->cur_handle) { log->nr_overflow++; - log->overflow_reset = log->nr_overflow; - return; + dev_dbg(dev, "Overflowing %d\n", log_type); + devm_kfree(dev, event); + goto unlock; } - log->events[log->nr_events] = event; + dev_dbg(dev, "Log %d; handle %u\n", log_type, handle); + event->event.generic.hdr.handle = cpu_to_le16(handle); + log->events[handle] = event; + event_inc_handle(&log->next_handle); log->nr_events++; + +unlock: + write_unlock(&log->lock); +} + +static void mes_del_event(struct device *dev, + struct mock_event_log *log, + u16 handle) +{ + struct cxl_event_record_raw *cur; + + lockdep_assert(lockdep_is_held(&log->lock)); + + dev_dbg(dev, "Clearing event %u; cur %u\n", handle, log->cur_handle); + cur = log->events[handle]; + if (!cur) { + dev_err(dev, "Mock event index %u empty? nr_events %u", + handle, log->nr_events); + return; + } + log->events[handle] = NULL; + + event_inc_handle(&log->cur_handle); + log->nr_events--; + devm_kfree(dev, cur); } static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) { struct cxl_get_event_payload *pl; struct mock_event_log *log; - u16 nr_overflow; u8 log_type; + u16 handle; int i; if (cmd->size_in != sizeof(log_type)) @@ -240,31 +261,40 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) if (log_type >= CXL_EVENT_TYPE_MAX) return -EINVAL; - memset(cmd->payload_out, 0, cmd->size_out); - log = event_find_log(dev, log_type); - if (!log || event_log_empty(log)) + if (!log) return 0; + memset(cmd->payload_out, 0, cmd->size_out); pl = cmd->payload_out; - for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) { - memcpy(&pl->records[i], event_get_current(log), - sizeof(pl->records[i])); - pl->records[i].event.generic.hdr.handle = - event_get_cur_event_handle(log); - log->cur_idx++; + read_lock(&log->lock); + + handle = log->cur_handle; + dev_dbg(dev, "Get log %d handle %u next %u\n", + log_type, handle, log->next_handle); + for (i = 0; + i < CXL_TEST_EVENT_CNT && handle 
!= log->next_handle; + i++, event_inc_handle(&handle)) { + struct cxl_event_record_raw *cur; + + cur = log->events[handle]; + dev_dbg(dev, "Sending event log %d handle %d idx %u\n", + log_type, le16_to_cpu(cur->event.generic.hdr.handle), + handle); + memcpy(&pl->records[i], cur, sizeof(pl->records[i])); + pl->records[i].event.generic.hdr.handle = cpu_to_le16(handle); } pl->record_count = cpu_to_le16(i); - if (!event_log_empty(log)) + if (log->nr_events > i) pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS; if (log->nr_overflow) { u64 ns; pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW; - pl->overflow_err_count = cpu_to_le16(nr_overflow); + pl->overflow_err_count = cpu_to_le16(log->nr_overflow); ns = ktime_get_real_ns(); ns -= 5000000000; /* 5s ago */ pl->first_overflow_timestamp = cpu_to_le64(ns); @@ -273,16 +303,17 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) pl->last_overflow_timestamp = cpu_to_le64(ns); } + read_unlock(&log->lock); return 0; } static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) { struct cxl_mbox_clear_event_payload *pl = cmd->payload_in; - struct mock_event_log *log; u8 log_type = pl->event_log; + struct mock_event_log *log; + int nr, rc = 0; u16 handle; - int nr; if (log_type >= CXL_EVENT_TYPE_MAX) return -EINVAL; @@ -291,24 +322,23 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) if (!log) return 0; /* No mock data in this log */ - /* - * This check is technically not invalid per the specification AFAICS. - * (The host could 'guess' handles and clear them in order). - * However, this is not good behavior for the host so test it. 
- */ - if (log->clear_idx + pl->nr_recs > log->cur_idx) { - dev_err(dev, - "Attempting to clear more events than returned!\n"); - return -EINVAL; - } + write_lock(&log->lock); /* Check handle order prior to clearing events */ - for (nr = 0, handle = event_get_clear_handle(log); - nr < pl->nr_recs; - nr++, handle++) { + handle = log->cur_handle; + for (nr = 0; + nr < pl->nr_recs && handle != log->next_handle; + nr++, event_inc_handle(&handle)) { + + dev_dbg(dev, "Checking clear of %d handle %u plhandle %u\n", + log_type, handle, + le16_to_cpu(pl->handles[nr])); + if (handle != le16_to_cpu(pl->handles[nr])) { - dev_err(dev, "Clearing events out of order\n"); - return -EINVAL; + dev_err(dev, "Clearing events out of order %u %u\n", + handle, le16_to_cpu(pl->handles[nr])); + rc = -EINVAL; + goto unlock; } } @@ -316,25 +346,12 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) log->nr_overflow = 0; /* Clear events */ - log->clear_idx += pl->nr_recs; - return 0; -} + for (nr = 0; nr < pl->nr_recs; nr++) + mes_del_event(dev, log, le16_to_cpu(pl->handles[nr])); -static void cxl_mock_event_trigger(struct device *dev) -{ - struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); - struct mock_event_store *mes = &mdata->mes; - int i; - - for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) { - struct mock_event_log *log; - - log = event_find_log(dev, i); - if (log) - event_reset_log(log); - } - - cxl_mem_get_event_records(mdata->mds, mes->ev_status); +unlock: + write_unlock(&log->lock); + return rc; } struct cxl_event_record_raw maint_needed = { @@ -459,8 +476,27 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds, return 0; } -static void cxl_mock_add_event_logs(struct mock_event_store *mes) +/* Create a dynamically allocated event out of a statically defined event. 
*/ +static void add_event_from_static(struct cxl_mockmem_data *mdata, + enum cxl_event_log_type log_type, + struct cxl_event_record_raw *raw) { + struct device *dev = mdata->mds->cxlds.dev; + struct cxl_event_record_raw *rec; + + rec = devm_kmemdup(dev, raw, sizeof(*rec), GFP_KERNEL); + if (!rec) { + dev_err(dev, "Failed to alloc event for log\n"); + return; + } + mes_add_event(mdata, log_type, rec); +} + +static void cxl_mock_add_event_logs(struct cxl_mockmem_data *mdata) +{ + struct mock_event_store *mes = &mdata->mes; + struct device *dev = mdata->mds->cxlds.dev; + put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK, &gen_media.rec.validity_flags); @@ -468,43 +504,60 @@ static void cxl_mock_add_event_logs(struct mock_event_store *mes) CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN, &dram.rec.validity_flags); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, + dev_dbg(dev, "Generating fake event logs %d\n", + CXL_EVENT_TYPE_INFO); + add_event_from_static(mdata, CXL_EVENT_TYPE_INFO, &maint_needed); + add_event_from_static(mdata, CXL_EVENT_TYPE_INFO, (struct cxl_event_record_raw *)&gen_media); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, + add_event_from_static(mdata, CXL_EVENT_TYPE_INFO, (struct cxl_event_record_raw *)&mem_module); mes->ev_status |= CXLDEV_EVENT_STATUS_INFO; - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + dev_dbg(dev, "Generating fake event logs %d\n", + CXL_EVENT_TYPE_FAIL); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &maint_needed); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, + (struct cxl_event_record_raw *)&mem_module); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&dram); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mdata, 
CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&gen_media); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&mem_module); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&dram); /* Overflow this log */ - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace); mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL; - mes_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FATAL, + dev_dbg(dev, 
"Generating fake event logs %d\n", + CXL_EVENT_TYPE_FATAL); + add_event_from_static(mdata, CXL_EVENT_TYPE_FATAL, &hardware_replace); + add_event_from_static(mdata, CXL_EVENT_TYPE_FATAL, (struct cxl_event_record_raw *)&dram); mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL; } +static void cxl_mock_event_trigger(struct device *dev) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct mock_event_store *mes = &mdata->mes; + + cxl_mock_add_event_logs(mdata); + cxl_mem_get_event_records(mdata->mds, mes->ev_status); +} + static int mock_gsl(struct cxl_mbox_cmd *cmd) { if (cmd->size_out < sizeof(mock_gsl_payload)) @@ -1438,6 +1491,14 @@ static ssize_t event_trigger_store(struct device *dev, } static DEVICE_ATTR_WO(event_trigger); +static void init_event_log(struct mock_event_log *log) +{ + rwlock_init(&log->lock); + /* Handle can never be 0 use 1 based indexing for handle */ + log->cur_handle = 1; + log->next_handle = 1; +} + static int cxl_mock_mem_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -1504,7 +1565,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) if (rc) return rc; - cxl_mock_add_event_logs(&mdata->mes); + for (int i = 0; i < CXL_EVENT_TYPE_MAX; i++) + init_event_log(&mdata->mes.mock_logs[i]); + cxl_mock_add_event_logs(mdata); cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds); if (IS_ERR(cxlmd)) From patchwork Sun Mar 24 23:18:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601026
From: Ira Weiny Date: Sun, 24 Mar 2024 16:18:28 -0700 Subject: [PATCH 25/26] tools/testing/cxl: Add DC Regions to mock mem data MIME-Version: 1.0 Message-Id: <20240324-dcd-type2-upstream-v1-25-b7b00d623625@intel.com> References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 cxl_test provides a good way to ensure quick smoke and regression testing. The complexity of Dynamic Capacity (DC) devices and the new sparse DAX regions required to use them benefits greatly from a series of smoke tests. To test DC regions, the mock memory devices need mock DC information and must manage fake extent data. Define mock_dc_region information within the mock memory data. Add sysfs entries on the mock device to inject and delete extents.
The inject format is <start>:<length>:<tag>. The delete format is <start>:<length>. Add DC mailbox commands to the CEL and implement those commands. Signed-off-by: Ira Weiny --- Changes for v1 [iweiny: adjust to new events] [iweiny: remove most extent checks to allow negative testing] --- tools/testing/cxl/test/mem.c | 575 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 574 insertions(+), 1 deletion(-) diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index d8d62e6eeb18..7d1d897d9f2b 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -18,6 +18,7 @@ #define FW_SLOTS 3 #define DEV_SIZE SZ_2G #define EFFECT(x) (1U << x) +#define BASE_DYNAMIC_CAP_DPA DEV_SIZE #define MOCK_INJECT_DEV_MAX 8 #define MOCK_INJECT_TEST_MAX 128 @@ -95,6 +96,22 @@ static struct cxl_cel_entry mock_cel[] = { EFFECT(SECURITY_CHANGE_IMMEDIATE) | EFFECT(BACKGROUND_OP)), }, + { + .opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_CONFIG), + .effect = CXL_CMD_EFFECT_NONE, + }, + { + .opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_EXTENT_LIST), + .effect = CXL_CMD_EFFECT_NONE, + }, + { + .opcode = cpu_to_le16(CXL_MBOX_OP_ADD_DC_RESPONSE), + .effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)), + }, + { + .opcode = cpu_to_le16(CXL_MBOX_OP_RELEASE_DC), + .effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)), + }, }; /* See CXL 2.0 Table 181 Get Health Info Output Payload */ @@ -152,6 +169,7 @@ struct mock_event_store { u32 ev_status; }; +#define NUM_MOCK_DC_REGIONS 2 struct cxl_mockmem_data { void *lsa; void *fw; @@ -168,6 +186,11 @@ struct cxl_mockmem_data { u8 event_buf[SZ_4K]; u64 timestamp; unsigned long sanitize_timeout; + struct cxl_dc_region_config dc_regions[NUM_MOCK_DC_REGIONS]; + u32 dc_ext_generation; + struct mutex ext_lock; + struct xarray dc_extents; + struct xarray dc_accepted_exts; }; static struct mock_event_log *event_find_log(struct device *dev, int log_type) @@ -558,6 +581,200 @@ static void cxl_mock_event_trigger(struct device *dev) cxl_mem_get_event_records(mdata->mds,
mes->ev_status); } +struct cxl_dc_extent_data { + u64 dpa_start; + u64 length; + u8 tag[CXL_DC_EXTENT_TAG_LEN]; +}; + +static int __devm_add_extent(struct device *dev, struct xarray *array, + u64 start, u64 length, const char *tag) +{ + struct cxl_dc_extent_data *extent; + + extent = devm_kzalloc(dev, sizeof(*extent), GFP_KERNEL); + if (!extent) + return -ENOMEM; + + extent->dpa_start = start; + extent->length = length; + memcpy(extent->tag, tag, min(sizeof(extent->tag), strlen(tag))); + + if (xa_insert(array, start, extent, GFP_KERNEL)) { + devm_kfree(dev, extent); + dev_err(dev, "Failed xarry insert %#llx\n", start); + return -EINVAL; + } + + return 0; +} + +static int devm_add_extent(struct device *dev, u64 start, u64 length, + const char *tag) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + + guard(mutex)(&mdata->ext_lock); + return __devm_add_extent(dev, &mdata->dc_extents, start, length, tag); +} + +/* It is known that ext and the new range are not equal */ +static struct cxl_dc_extent_data * +split_ext(struct device *dev, struct xarray *array, + struct cxl_dc_extent_data *ext, u64 start, u64 length) +{ + u64 new_start, new_length; + + if (ext->dpa_start == start) { + new_start = start + length; + new_length = (ext->dpa_start + ext->length) - new_start; + + if (__devm_add_extent(dev, array, new_start, new_length, + ext->tag)) + return NULL; + + ext = xa_erase(array, ext->dpa_start); + if (__devm_add_extent(dev, array, start, length, ext->tag)) + return NULL; + + return xa_load(array, start); + } + + /* ext->dpa_start != start */ + + if (__devm_add_extent(dev, array, start, length, ext->tag)) + return NULL; + + new_start = ext->dpa_start; + new_length = start - ext->dpa_start; + + ext = xa_erase(array, ext->dpa_start); + if (__devm_add_extent(dev, array, new_start, new_length, ext->tag)) + return NULL; + + return xa_load(array, start); +} + +/* + * Do not handle extents which are not inside a single extent sent to + * the host. 
+ */ +static struct cxl_dc_extent_data * +find_create_ext(struct device *dev, struct xarray *array, u64 start, u64 length) +{ + struct cxl_dc_extent_data *ext; + unsigned long index; + + xa_for_each(array, index, ext) { + u64 end = start + length; + + /* start < [ext) <= start */ + if (start < ext->dpa_start || + (ext->dpa_start + ext->length) <= start) + continue; + + if (end <= ext->dpa_start || + (ext->dpa_start + ext->length) < end) { + dev_err(dev, "Invalid range %#llx-%#llx\n", start, + end); + return NULL; + } + + break; + } + + if (!ext) + return NULL; + + if (start == ext->dpa_start && length == ext->length) + return ext; + + return split_ext(dev, array, ext, start, length); +} + +static int dc_accept_extent(struct device *dev, u64 start, u64 length) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct cxl_dc_extent_data *ext; + + dev_dbg(dev, "Host accepting extent %#llx\n", start); + mdata->dc_ext_generation++; + + guard(mutex)(&mdata->ext_lock); + ext = find_create_ext(dev, &mdata->dc_extents, start, length); + if (!ext) { + dev_err(dev, "Extent %#llx-%#llx not found\n", + start, start + length); + return -ENOMEM; + } + ext = xa_erase(&mdata->dc_extents, ext->dpa_start); + return xa_insert(&mdata->dc_accepted_exts, start, ext, GFP_KERNEL); +} + +static void release_dc_ext(void *md) +{ + struct cxl_mockmem_data *mdata = md; + + xa_destroy(&mdata->dc_extents); + xa_destroy(&mdata->dc_accepted_exts); +} + +static int cxl_mock_dc_region_setup(struct device *dev) +{ +#define DUMMY_EXT_OFFSET SZ_256M +#define DUMMY_EXT_LENGTH SZ_256M + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + u64 base_dpa = BASE_DYNAMIC_CAP_DPA; + u32 dsmad_handle = 0xFADE; + u64 decode_length = SZ_1G; + u64 block_size = SZ_512; + /* For testing make this smaller than decode length */ + u64 length = SZ_1G; + int rc; + + mutex_init(&mdata->ext_lock); + xa_init(&mdata->dc_extents); + xa_init(&mdata->dc_accepted_exts); + + rc = devm_add_action_or_reset(dev, 
release_dc_ext, mdata); + if (rc) + return rc; + + for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) { + struct cxl_dc_region_config *conf = &mdata->dc_regions[i]; + + dev_dbg(dev, "Creating DC region DC%d DPA:%#llx LEN:%#llx\n", + i, base_dpa, length); + + conf->region_base = cpu_to_le64(base_dpa); + conf->region_decode_length = cpu_to_le64(decode_length / + CXL_CAPACITY_MULTIPLIER); + conf->region_length = cpu_to_le64(length); + conf->region_block_size = cpu_to_le64(block_size); + conf->region_dsmad_handle = cpu_to_le32(dsmad_handle); + dsmad_handle++; + + /* Pretend to have some previous accepted extents */ + rc = devm_add_extent(dev, base_dpa + DUMMY_EXT_OFFSET, + DUMMY_EXT_LENGTH, "CXL-TEST"); + if (rc) { + dev_err(dev, "Failed to add extent DC%d DPA:%#llx LEN:%#x; %d\n", + i, base_dpa + DUMMY_EXT_OFFSET, + DUMMY_EXT_LENGTH, rc); + return rc; + } + + rc = dc_accept_extent(dev, base_dpa + DUMMY_EXT_OFFSET, + DUMMY_EXT_LENGTH); + if (rc) + return rc; + + base_dpa += decode_length; + } + + return 0; +} + static int mock_gsl(struct cxl_mbox_cmd *cmd) { if (cmd->size_out < sizeof(mock_gsl_payload)) @@ -1371,6 +1588,177 @@ static int mock_activate_fw(struct cxl_mockmem_data *mdata, return -EINVAL; } +static int mock_get_dc_config(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mbox_get_dc_config_in *dc_config = cmd->payload_in; + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + u8 region_requested, region_start_idx, region_ret_cnt; + struct cxl_mbox_get_dc_config_out *resp; + + region_requested = dc_config->region_count; + if (region_requested > NUM_MOCK_DC_REGIONS) + region_requested = NUM_MOCK_DC_REGIONS; + + if (cmd->size_out < struct_size(resp, region, region_requested)) + return -EINVAL; + + memset(cmd->payload_out, 0, cmd->size_out); + resp = cmd->payload_out; + + region_start_idx = dc_config->start_region_index; + region_ret_cnt = 0; + for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) { + if (i >= region_start_idx) { + 
memcpy(&resp->region[region_ret_cnt], + &mdata->dc_regions[i], + sizeof(resp->region[region_ret_cnt])); + region_ret_cnt++; + } + } + resp->avail_region_count = region_ret_cnt; + + dev_dbg(dev, "Returning %d dc regions\n", region_ret_cnt); + return 0; +} + +static int mock_get_dc_extent_list(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct cxl_mbox_get_dc_extent_in *get = cmd->payload_in; + struct cxl_mbox_get_dc_extent_out *resp = cmd->payload_out; + u32 total_avail = 0, total_ret = 0; + struct cxl_dc_extent_data *ext; + u32 ext_count, start_idx; + unsigned long i; + + ext_count = le32_to_cpu(get->extent_cnt); + start_idx = le32_to_cpu(get->start_extent_index); + + memset(resp, 0, sizeof(*resp)); + + guard(mutex)(&mdata->ext_lock); + /* + * Total available needs to be calculated and returned regardless of + * how many can actually be returned. + */ + xa_for_each(&mdata->dc_accepted_exts, i, ext) + total_avail++; + + if (start_idx > total_avail) + return -EINVAL; + + xa_for_each(&mdata->dc_accepted_exts, i, ext) { + if (total_ret >= ext_count) + break; + + if (total_ret >= start_idx) { + resp->extent[total_ret].start_dpa = + cpu_to_le64(ext->dpa_start); + resp->extent[total_ret].length = + cpu_to_le64(ext->length); + memcpy(&resp->extent[total_ret].tag, ext->tag, + sizeof(resp->extent[total_ret].tag)); + total_ret++; + } + } + + resp->ret_extent_cnt = cpu_to_le32(total_ret); + resp->total_extent_cnt = cpu_to_le32(total_avail); + resp->extent_list_num = cpu_to_le32(mdata->dc_ext_generation); + + dev_dbg(dev, "Returning %d extents of %d total\n", + total_ret, total_avail); + + return 0; +} + +static int mock_add_dc_response(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mbox_dc_response *req = cmd->payload_in; + u32 list_size = le32_to_cpu(req->extent_list_size); + + for (int i = 0; i < list_size; i++) { + u64 start = le64_to_cpu(req->extent_list[i].dpa_start); + u64 length =
le64_to_cpu(req->extent_list[i].length); + int rc; + + rc = dc_accept_extent(dev, start, length); + if (rc) + return rc; + } + + return 0; +} + +static void dc_delete_extent(struct device *dev, unsigned long long start, + unsigned long long length) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + unsigned long long end = start + length; + struct cxl_dc_extent_data *ext; + unsigned long index; + + dev_dbg(dev, "Deleting extent at %#llx len:%#llx\n", start, length); + + guard(mutex)(&mdata->ext_lock); + xa_for_each(&mdata->dc_extents, index, ext) { + u64 extent_end = ext->dpa_start + ext->length; + + /* + * Any extent which 'touches' the released delete range will be + * removed. + */ + if ((start <= ext->dpa_start && ext->dpa_start < end) || + (start <= extent_end && extent_end < end)) { + xa_erase(&mdata->dc_extents, ext->dpa_start); + } + } + + /* + * If the extent was accepted let it be for the host to drop + * later. + */ +} + +static int release_accepted_extent(struct device *dev, + unsigned long long start, + unsigned long long length) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct cxl_dc_extent_data *ext; + + guard(mutex)(&mdata->ext_lock); + ext = find_create_ext(dev, &mdata->dc_accepted_exts, start, length); + if (!ext) { + dev_err(dev, "Extent %#llx not in accepted state\n", start); + return -EINVAL; + } + xa_erase(&mdata->dc_accepted_exts, ext->dpa_start); + mdata->dc_ext_generation++; + + return 0; +} + +static int mock_dc_release(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mbox_dc_response *req = cmd->payload_in; + u32 list_size = le32_to_cpu(req->extent_list_size); + + for (int i = 0; i < list_size; i++) { + u64 start = le64_to_cpu(req->extent_list[i].dpa_start); + u64 length = le64_to_cpu(req->extent_list[i].length); + + dev_dbg(dev, "Extent %#llx released by host\n", start); + release_accepted_extent(dev, start, length); + } + + return 0; +} + static int cxl_mock_mbox_send(struct cxl_memdev_state 
*mds, struct cxl_mbox_cmd *cmd) { @@ -1455,6 +1843,18 @@ static int cxl_mock_mbox_send(struct cxl_memdev_state *mds, case CXL_MBOX_OP_ACTIVATE_FW: rc = mock_activate_fw(mdata, cmd); break; + case CXL_MBOX_OP_GET_DC_CONFIG: + rc = mock_get_dc_config(dev, cmd); + break; + case CXL_MBOX_OP_GET_DC_EXTENT_LIST: + rc = mock_get_dc_extent_list(dev, cmd); + break; + case CXL_MBOX_OP_ADD_DC_RESPONSE: + rc = mock_add_dc_response(dev, cmd); + break; + case CXL_MBOX_OP_RELEASE_DC: + rc = mock_dc_release(dev, cmd); + break; default: break; } @@ -1499,6 +1899,14 @@ static void init_event_log(struct mock_event_log *log) log->next_handle = 1; } +static void cxl_mock_mem_remove(struct platform_device *pdev) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(&pdev->dev); + struct cxl_memdev_state *mds = mdata->mds; + + dev_dbg(mds->cxlds.dev, "Removing extents\n"); +} + static int cxl_mock_mem_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -1513,6 +1921,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) return -ENOMEM; dev_set_drvdata(dev, mdata); + rc = cxl_mock_dc_region_setup(dev); + if (rc) + return rc; + mdata->lsa = vmalloc(LSA_SIZE); if (!mdata->lsa) return -ENOMEM; @@ -1561,6 +1973,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) if (rc) return rc; + rc = cxl_dev_dynamic_capacity_identify(mds); + if (rc) + return rc; + rc = cxl_mem_create_range_info(mds); if (rc) return rc; @@ -1673,14 +2089,170 @@ static ssize_t sanitize_timeout_store(struct device *dev, return count; } - static DEVICE_ATTR_RW(sanitize_timeout); +/* Return if the proposed extent would break the test code */ +static bool new_extent_valid(struct device *dev, size_t new_start, + size_t new_len) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct cxl_dc_extent_data *extent; + size_t new_end, i; + + if (!new_len) + return false; + + new_end = new_start + new_len; + + dev_dbg(dev, "New extent %zx-%zx\n", new_start, new_end); + + 
guard(mutex)(&mdata->ext_lock); + dev_dbg(dev, "Checking extents starts...\n"); + xa_for_each(&mdata->dc_extents, i, extent) { + if (extent->dpa_start == new_start) + return false; + } + + dev_dbg(dev, "Checking accepted extents starts...\n"); + xa_for_each(&mdata->dc_accepted_exts, i, extent) { + if (extent->dpa_start == new_start) + return false; + } + + return true; +} + +/* + * Format <start>:<length>:<tag> + * + * start and length must be a multiple of the configured region block size. + * Tag can be any string up to 16 bytes. + * + * Extents must be exclusive of other extents + */ +static ssize_t dc_inject_extent_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + unsigned long long start, length; + char *len_str, *tag_str; + size_t buf_len = count; + int rc; + + char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL); + if (!start_str) + return -ENOMEM; + + len_str = strnchr(start_str, buf_len, ':'); + if (!len_str) { + dev_err(dev, "Extent failed to find len_str: %s\n", start_str); + return -EINVAL; + } + + *len_str = '\0'; + len_str += 1; + buf_len -= strlen(start_str); + + tag_str = strnchr(len_str, buf_len, ':'); + if (!tag_str) { + dev_err(dev, "Extent failed to find tag_str: %s\n", len_str); + return -EINVAL; + } + *tag_str = '\0'; + tag_str += 1; + + if (kstrtoull(start_str, 0, &start)) { + dev_err(dev, "Extent failed to parse start: %s\n", start_str); + return -EINVAL; + } + + if (kstrtoull(len_str, 0, &length)) { + dev_err(dev, "Extent failed to parse length: %s\n", len_str); + return -EINVAL; + } + + if (!new_extent_valid(dev, start, length)) + return -EINVAL; + + rc = devm_add_extent(dev, start, length, tag_str); + if (rc) { + dev_err(dev, "Failed to add extent DPA:%#llx LEN:%#llx; %d\n", + start, length, rc); + return rc; + } + + return count; +} +static DEVICE_ATTR_WO(dc_inject_extent); + +static ssize_t __dc_del_extent_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count, + enum
dc_event type) +{ + unsigned long long start, length; + char *len_str; + + char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL); + if (!start_str) + return -ENOMEM; + + len_str = strnchr(start_str, count, ':'); + if (!len_str) { + dev_err(dev, "Failed to find len_str: %s\n", start_str); + return -EINVAL; + } + *len_str = '\0'; + len_str += 1; + + if (kstrtoull(start_str, 0, &start)) { + dev_err(dev, "Failed to parse start: %s\n", start_str); + return -EINVAL; + } + + if (kstrtoull(len_str, 0, &length)) { + dev_err(dev, "Failed to parse length: %s\n", len_str); + return -EINVAL; + } + + dc_delete_extent(dev, start, length); + + if (type == DCD_FORCED_CAPACITY_RELEASE) + dev_dbg(dev, "Forcing delete of extent %#llx len:%#llx\n", + start, length); + + return count; +} + +/* + * Format <start>:<length> + */ +static ssize_t dc_del_extent_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return __dc_del_extent_store(dev, attr, buf, count, + DCD_RELEASE_CAPACITY); +} +static DEVICE_ATTR_WO(dc_del_extent); + +static ssize_t dc_force_del_extent_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return __dc_del_extent_store(dev, attr, buf, count, + DCD_FORCED_CAPACITY_RELEASE); +} +static DEVICE_ATTR_WO(dc_force_del_extent); + static struct attribute *cxl_mock_mem_attrs[] = { &dev_attr_security_lock.attr, &dev_attr_event_trigger.attr, &dev_attr_fw_buf_checksum.attr, &dev_attr_sanitize_timeout.attr, + &dev_attr_dc_inject_extent.attr, + &dev_attr_dc_del_extent.attr, + &dev_attr_dc_force_del_extent.attr, NULL }; ATTRIBUTE_GROUPS(cxl_mock_mem); @@ -1694,6 +2266,7 @@ MODULE_DEVICE_TABLE(platform, cxl_mock_mem_ids); static struct platform_driver cxl_mock_mem_driver = { .probe = cxl_mock_mem_probe, + .remove_new = cxl_mock_mem_remove, .id_table = cxl_mock_mem_ids, .driver = { .name = KBUILD_MODNAME, From patchwork Sun Mar 24 23:18:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13601025
X-Mailing-List: linux-btrfs@vger.kernel.org
From: Ira Weiny
Date: Sun, 24 Mar 2024 16:18:29 -0700
Subject: [PATCH 26/26] tools/testing/cxl: Add Dynamic Capacity events
Message-Id: <20240324-dcd-type2-upstream-v1-26-b7b00d623625@intel.com>
References: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
In-Reply-To: <20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com>
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma,
    Ira Weiny, linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org,
    linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
cxl_test provides a good way to ensure quick smoke and regression
testing. The complexity of DCD, and of the new sparse DAX regions
required to use it, benefits greatly from a series of smoke tests.

The only part of the kernel stack which must be bypassed is the actual
irq of DCD events. The event processing itself can still be exercised
by having cxl_test call the event processing function directly. In
this way the rest of the stack can be tested: management of sparse
regions, the extent device lifetimes, and the dax device operations.

Add Dynamic Capacity Device tests to cxl_test for kernels which have
DCD support. Add events on DCD extent injection. Directly call the
event irq callback to simulate irqs to process the test extents.

Signed-off-by: Ira Weiny
---
Changes for v1
[iweiny: Adjust to new events]
---
 tools/testing/cxl/test/mem.c | 58 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 7d1d897d9f2b..e7efb1d3e20f 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -2122,6 +2122,49 @@ static bool new_extent_valid(struct device *dev, size_t new_start,
 	return true;
 }
 
+struct cxl_test_dcd {
+	uuid_t id;
+	struct cxl_event_dcd rec;
+} __packed;
+
+struct cxl_test_dcd dcd_event_rec_template = {
+	.id = CXL_EVENT_DC_EVENT_UUID,
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_dcd),
+		},
+	},
+};
+
+static int log_dc_event(struct cxl_mockmem_data *mdata, enum dc_event type,
+			u64 start, u64 length, const char *tag_str)
+{
+	struct device *dev = mdata->mds->cxlds.dev;
+	struct cxl_test_dcd *dcd_event;
+
+	dev_dbg(dev, "mock device log event %d\n", type);
+
+	dcd_event = devm_kmemdup(dev, &dcd_event_rec_template,
+				 sizeof(*dcd_event), GFP_KERNEL);
+	if (!dcd_event)
+		return -ENOMEM;
+
+	dcd_event->rec.event_type = type;
+	dcd_event->rec.extent.start_dpa = cpu_to_le64(start);
+	dcd_event->rec.extent.length = cpu_to_le64(length);
+	memcpy(dcd_event->rec.extent.tag, tag_str,
+	       min(sizeof(dcd_event->rec.extent.tag),
+		   strlen(tag_str)));
+
+	mes_add_event(mdata, CXL_EVENT_TYPE_DCD,
+		      (struct cxl_event_record_raw *)dcd_event);
+
+	/* Fake the irq */
+	cxl_mem_get_event_records(mdata->mds, CXLDEV_EVENT_STATUS_DCD);
+
+	return 0;
+}
+
 /*
  * Format <start>:<length>:<tag>
  *
@@ -2134,6 +2177,7 @@ static ssize_t dc_inject_extent_store(struct device *dev,
 				      struct device_attribute *attr,
 				      const char *buf, size_t count)
 {
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
 	unsigned long long start, length;
 	char *len_str, *tag_str;
 	size_t buf_len = count;
@@ -2181,6 +2225,12 @@ static ssize_t dc_inject_extent_store(struct device *dev,
 		return rc;
 	}
 
+	rc = log_dc_event(mdata, DCD_ADD_CAPACITY, start, length, tag_str);
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
 	return count;
 }
 static DEVICE_ATTR_WO(dc_inject_extent);
@@ -2190,8 +2240,10 @@ static ssize_t __dc_del_extent_store(struct device *dev,
 				     const char *buf, size_t count,
 				     enum dc_event type)
 {
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
 	unsigned long long start, length;
 	char *len_str;
+	int rc;
 
 	char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL);
 	if (!start_str)
@@ -2221,6 +2273,12 @@ static ssize_t __dc_del_extent_store(struct device *dev,
 		dev_dbg(dev, "Forcing delete of extent %#llx len:%#llx\n",
 			start, length);
 
+	rc = log_dc_event(mdata, type, start, length, "");
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
 	return count;
 }
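[Not part of the patch: for readers unfamiliar with the sysfs extent
interface above, the string handling done by __dc_del_extent_store() can be
sketched in Python. parse_extent() is a hypothetical illustration only; it
splits on the first ':' (as the strnchr() call does) and parses each half
with base auto-detection, which is what kstrtoull() with base 0 provides.]

```python
def parse_extent(buf: str):
    """Mirror the mock driver's "<start>:<length>" parsing.

    Raises ValueError when no ':' separator is found or a field does
    not parse, analogous to the driver returning -EINVAL.
    """
    start_str, sep, len_str = buf.strip().partition(":")
    if not sep:
        raise ValueError(f"Failed to find len_str: {start_str}")
    # int(s, 0) auto-detects 0x-prefixed hex, 0o/0-prefixed octal, and
    # decimal, matching kstrtoull(s, 0, &val) semantics.
    start = int(start_str, 0)
    length = int(len_str, 0)
    return start, length
```

So a write of "0x40000000:0x1000" to dc_del_extent requests deletion of the
extent at DPA 0x40000000 with length 0x1000, exactly as this sketch parses it.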