From patchwork Wed Jan 22 08:59:16 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13947027
Subject: [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: dave.jiang@intel.com, Jonathan.Cameron@huawei.com
Date: Wed, 22 Jan 2025 00:59:16 -0800
Message-ID: <173753635601.3849855.5582594127330525596.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>
References: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>

CXL_DECODER_MIXED is a safety mechanism introduced for the case where
platform firmware has programmed an endpoint decoder that straddles a
DPA partition boundary. While the kernel is careful to only allocate
DPA capacity within a single partition, there is no guarantee that
platform firmware, or anything that touched the device before the
current kernel, got that right.

However, __cxl_dpa_reserve() never reaches the CXL_DECODER_MIXED
designation because of the way it tracks partition boundaries. A
request_resource() that spans ->ram_res and ->pmem_res fails with the
following signature:

    __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation

CXL_DECODER_MIXED is dead defensive programming after the driver has
already given up on the device. It has never offered any protection in
practice; just delete it.

Signed-off-by: Dan Williams
Reviewed-by: Ira Weiny
Reviewed-by: Jonathan Cameron
Reviewed-by: Alejandro Lucero
Reviewed-by: Dave Jiang
---
 drivers/cxl/core/hdm.c    |  6 +++---
 drivers/cxl/core/region.c | 12 ------------
 drivers/cxl/cxl.h         |  4 +---
 3 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 28edd5822486..2848d6991d45 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	else if (resource_contains(&cxlds->ram_res, res))
 		cxled->mode = CXL_DECODER_RAM;
 	else {
-		dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
-			 port->id, cxled->cxld.id, cxled->dpa_res);
-		cxled->mode = CXL_DECODER_MIXED;
+		dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
+			 port->id, cxled->cxld.id, res);
+		cxled->mode = CXL_DECODER_NONE;
 	}
 
 	port->hdm_end++;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d77899650798..e4885acac853 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2725,18 +2725,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
 	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
 		return rc;
 
-	/*
-	 * Regions are only created with single mode decoders: pmem or ram.
-	 * Linux does not support mixed mode decoders. This means that
-	 * reading poison per endpoint decoder adheres to the requirement
-	 * that poison reads of pmem and ram must be separated.
-	 * CXL 3.0 Spec 8.2.9.8.4.1
-	 */
-	if (cxled->mode == CXL_DECODER_MIXED) {
-		dev_dbg(dev, "poison list read unsupported in mixed mode\n");
-		return rc;
-	}
-
 	cxlmd = cxled_to_memdev(cxled);
 	if (cxled->skip) {
 		offset = cxled->dpa_res->start - cxled->skip;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f6015f24ad38..4d0550367042 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -379,7 +379,6 @@ enum cxl_decoder_mode {
 	CXL_DECODER_NONE,
 	CXL_DECODER_RAM,
 	CXL_DECODER_PMEM,
-	CXL_DECODER_MIXED,
 	CXL_DECODER_DEAD,
 };
 
@@ -389,10 +388,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
 		[CXL_DECODER_NONE] = "none",
 		[CXL_DECODER_RAM] = "ram",
 		[CXL_DECODER_PMEM] = "pmem",
-		[CXL_DECODER_MIXED] = "mixed",
 	};
 
-	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
+	if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
 		return names[mode];
 	return "mixed";
 }

From patchwork Wed Jan 22 08:59:21 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13947028
Subject: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang@intel.com, Jonathan.Cameron@huawei.com
Date: Wed, 22 Jan 2025 00:59:21 -0800
Message-ID: <173753636159.3849855.512949598685608224.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>

In preparation for consolidating all DPA partition information into an
array of DPA metadata, introduce helpers that hide the layout of the
current data. That is, make the eventual replacement of ->ram_res,
->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
no-op for code paths that consume that information, and reduce the
noise of follow-on patches.

The end goal is to consolidate all DPA information in 'struct
cxl_dev_state', but for now the helpers just make it appear that all
DPA metadata is relative to @cxlds.

Note that a follow-on patch also cleans up the temporary placeholders
of @ram_res and @pmem_res in the qos_class manipulation code,
cxl_dpa_alloc(), and cxl_mem_create_range_info().
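To make the layout-hiding concrete, a consumer written purely against
the helpers is insulated from where the partition data actually lives.
A minimal sketch (dump_partition_sizes() is a hypothetical caller, not
part of this patch):

	/* hypothetical: report partition sizes without touching
	 * cxlds->ram_res / cxlds->pmem_res directly
	 */
	static void dump_partition_sizes(struct cxl_dev_state *cxlds)
	{
		dev_dbg(cxlds->dev, "ram: %llu bytes, pmem: %llu bytes\n",
			(unsigned long long)cxl_ram_size(cxlds),
			(unsigned long long)cxl_pmem_size(cxlds));
	}

When a later patch moves the backing storage into a partition array,
callers like this recompile unchanged.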
Cc: Dave Jiang Cc: Alejandro Lucero Cc: Ira Weiny Signed-off-by: Dan Williams Reviewed-by: Ira Weiny Reviewed-by: Dave Jiang --- drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++----------------- drivers/cxl/core/hdm.c | 26 ++++++++-------- drivers/cxl/core/mbox.c | 18 ++++++----- drivers/cxl/core/memdev.c | 42 +++++++++++++------------ drivers/cxl/core/region.c | 10 ++++-- drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++----- drivers/cxl/mem.c | 2 + tools/testing/cxl/test/cxl.c | 25 ++++++++------- 8 files changed, 159 insertions(+), 92 deletions(-) diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 8153f8d83a16..b177a488e29b 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent, static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, struct xarray *dsmas_xa) { - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); struct device *dev = cxlds->dev; - struct range pmem_range = { - .start = cxlds->pmem_res.start, - .end = cxlds->pmem_res.end, - }; - struct range ram_range = { - .start = cxlds->ram_res.start, - .end = cxlds->ram_res.end, - }; struct dsmas_entry *dent; unsigned long index; + const struct resource *partition[] = { + to_ram_res(cxlds), + to_pmem_res(cxlds), + }; + struct cxl_dpa_perf *perf[] = { + to_ram_perf(cxlds), + to_pmem_perf(cxlds), + }; xa_for_each(dsmas_xa, index, dent) { - if (resource_size(&cxlds->ram_res) && - range_contains(&ram_range, &dent->dpa_range)) - update_perf_entry(dev, dent, &mds->ram_perf); - else if (resource_size(&cxlds->pmem_res) && - range_contains(&pmem_range, &dent->dpa_range)) - update_perf_entry(dev, dent, &mds->pmem_perf); - else - dev_dbg(dev, "no partition for dsmas dpa: %pra\n", - &dent->dpa_range); + for (int i = 0; i < ARRAY_SIZE(partition); i++) { + const struct resource *res = partition[i]; + struct range range = { + .start = res->start, + .end = res->end, + }; + + if (range_contains(&range, &dent->dpa_range)) + update_perf_entry(dev, dent, perf[i]); + else + dev_dbg(dev, + "no partition for dsmas dpa: %pra\n", + &dent->dpa_range); + } } } @@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data) static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf) { + if (!dpa_perf) + return; + *dpa_perf = (struct cxl_dpa_perf) { .qos_class = CXL_QOS_CLASS_INVALID, }; @@ -312,6 +319,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf) static bool cxl_qos_match(struct cxl_port *root_port, struct cxl_dpa_perf *dpa_perf) { + if (!dpa_perf) + return false; + if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID) return false; @@ -346,7 +356,8 @@ static int match_cxlrd_hb(struct device *dev, void *data) static int cxl_qos_class_verify(struct cxl_memdev *cxlmd) { struct cxl_dev_state *cxlds = cxlmd->cxlds; - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); + struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds), + *pmem_perf = to_pmem_perf(cxlds); struct cxl_port *root_port; int rc; @@ -359,17 +370,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd) root_port = &cxl_root->port; /* Check that the QTG IDs are all sane between end device and root decoders */ - if (!cxl_qos_match(root_port, &mds->ram_perf)) - reset_dpa_perf(&mds->ram_perf); - if (!cxl_qos_match(root_port, &mds->pmem_perf)) - reset_dpa_perf(&mds->pmem_perf); + if (!cxl_qos_match(root_port, ram_perf)) + reset_dpa_perf(ram_perf); + if (!cxl_qos_match(root_port, pmem_perf)) + 
reset_dpa_perf(pmem_perf); /* Check to make sure that the device's host bridge is under a root decoder */ rc = device_for_each_child(&root_port->dev, cxlmd->endpoint->host_bridge, match_cxlrd_hb); if (!rc) { - reset_dpa_perf(&mds->ram_perf); - reset_dpa_perf(&mds->pmem_perf); + reset_dpa_perf(ram_perf); + reset_dpa_perf(pmem_perf); } return rc; @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf, .end = dpa_res->end, }; + if (!perf) + return false; + return range_contains(&perf->dpa_range, &dpa); } @@ -574,15 +588,15 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle enum cxl_decoder_mode mode) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_dpa_perf *perf; switch (mode) { case CXL_DECODER_RAM: - perf = &mds->ram_perf; + perf = to_ram_perf(cxlds); break; case CXL_DECODER_PMEM: - perf = &mds->pmem_perf; + perf = to_pmem_perf(cxlds); break; default: return ERR_PTR(-EINVAL); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 2848d6991d45..7a85522294ad 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->dpa_res = res; cxled->skip = skipped; - if (resource_contains(&cxlds->pmem_res, res)) + if (resource_contains(to_pmem_res(cxlds), res)) cxled->mode = CXL_DECODER_PMEM; - else if (resource_contains(&cxlds->ram_res, res)) + else if (resource_contains(to_ram_res(cxlds), res)) cxled->mode = CXL_DECODER_RAM; else { dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n", @@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, * Only allow modes that are supported by the current partition * configuration */ - if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { + if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) { dev_dbg(dev, "no available pmem capacity\n"); return -ENXIO; } - if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { + if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) { dev_dbg(dev, "no available ram capacity\n"); return -ENXIO; } @@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) struct device *dev = &cxled->cxld.dev; resource_size_t start, avail, skip; struct resource *p, *last; + const struct resource *ram_res = to_ram_res(cxlds); + const struct resource *pmem_res = to_pmem_res(cxlds); int rc; down_write(&cxl_dpa_rwsem); @@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) goto out; } - for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling) + for (p = ram_res->child, last = NULL; p; p = p->sibling) last = p; if (last) free_ram_start = last->end + 1; else - free_ram_start = cxlds->ram_res.start; + free_ram_start = ram_res->start; - for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling) + for (p = pmem_res->child, last = NULL; p; p = p->sibling) last = p; if (last) free_pmem_start = last->end + 1; else - free_pmem_start = cxlds->pmem_res.start; + free_pmem_start = pmem_res->start; if (cxled->mode == CXL_DECODER_RAM) { start = free_ram_start; - avail = cxlds->ram_res.end - start + 1; + avail = ram_res->end - start + 1; skip = 0; } else if (cxled->mode == CXL_DECODER_PMEM) { resource_size_t skip_start, skip_end; start = free_pmem_start; - avail = cxlds->pmem_res.end - start + 1; + avail = pmem_res->end - start 
+ 1; skip_start = free_ram_start; /* * If some pmem is already allocated, then that allocation * already handled the skip. */ - if (cxlds->pmem_res.child && - skip_start == cxlds->pmem_res.child->start) + if (pmem_res->child && + skip_start == pmem_res->child->start) skip_end = skip_start - 1; else skip_end = start - 1; diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 548564c770c0..3502f1633ad2 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent, int cxl_mem_create_range_info(struct cxl_memdev_state *mds) { struct cxl_dev_state *cxlds = &mds->cxlds; + struct resource *ram_res = to_ram_res(cxlds); + struct resource *pmem_res = to_pmem_res(cxlds); struct device *dev = cxlds->dev; int rc; if (!cxlds->media_ready) { cxlds->dpa_res = DEFINE_RES_MEM(0, 0); - cxlds->ram_res = DEFINE_RES_MEM(0, 0); - cxlds->pmem_res = DEFINE_RES_MEM(0, 0); + *ram_res = DEFINE_RES_MEM(0, 0); + *pmem_res = DEFINE_RES_MEM(0, 0); return 0; } cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); if (mds->partition_align_bytes == 0) { - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0, + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, mds->volatile_only_bytes, "ram"); if (rc) return rc; - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res, + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, mds->volatile_only_bytes, mds->persistent_only_bytes, "pmem"); } @@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) return rc; } - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0, + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, mds->active_volatile_bytes, "ram"); if (rc) return rc; - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res, + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, mds->active_volatile_bytes, mds->active_persistent_bytes, "pmem"); } @@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mds->cxlds.reg_map.host = dev; mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; - mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID; - mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID; + to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; + to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; return mds; } diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index ae3dfcbe8938..c5f8320ed330 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr, { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; - unsigned long long len = resource_size(&cxlds->ram_res); + unsigned long long len = resource_size(to_ram_res(cxlds)); return sysfs_emit(buf, "%#llx\n", len); } @@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr, { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; - unsigned long long len = resource_size(&cxlds->pmem_res); + unsigned long long len = cxl_pmem_size(cxlds); return sysfs_emit(buf, "%#llx\n", len); } @@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd) int rc = 0; /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */ - if (resource_size(&cxlds->pmem_res)) { - offset = cxlds->pmem_res.start; - length = resource_size(&cxlds->pmem_res); + if 
(cxl_pmem_size(cxlds)) { + const struct resource *res = to_pmem_res(cxlds); + + offset = res->start; + length = resource_size(res); rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); if (rc) return rc; } - if (resource_size(&cxlds->ram_res)) { - offset = cxlds->ram_res.start; - length = resource_size(&cxlds->ram_res); + if (cxl_ram_size(cxlds)) { + const struct resource *res = to_ram_res(cxlds); + + offset = res->start; + length = resource_size(res); rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); /* * Invalid Physical Address is not an error for @@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev, { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); - return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class); + return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class); } static struct device_attribute dev_attr_pmem_qos_class = @@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev, { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); - return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class); + return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class); } static struct device_attribute dev_attr_ram_qos_class = @@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n) { struct device *dev = kobj_to_dev(kobj); struct cxl_memdev *cxlmd = to_cxl_memdev(dev); - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds); - if (a == &dev_attr_ram_qos_class.attr) - if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID) - return 0; + if (a == &dev_attr_ram_qos_class.attr && + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID)) + return 0; return a->mode; } @@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n { struct device *dev = kobj_to_dev(kobj); struct cxl_memdev *cxlmd = to_cxl_memdev(dev); - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds); - if (a == &dev_attr_pmem_qos_class.attr) - if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID) - return 0; + if (a == &dev_attr_pmem_qos_class.attr && + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID)) + return 0; return a->mode; } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e4885acac853..9f0f6fdbc841 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2688,7 +2688,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd, if (ctx->mode == CXL_DECODER_RAM) { offset = ctx->offset; - length = resource_size(&cxlds->ram_res) - offset; + length = cxl_ram_size(cxlds) - offset; rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); if (rc == -EFAULT) rc = 0; @@ -2700,9 +2700,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd, length = resource_size(&cxlds->dpa_res) - offset; if (!length) return 0; - } else if (resource_size(&cxlds->pmem_res)) { - offset = cxlds->pmem_res.start; - length = resource_size(&cxlds->pmem_res); + } else if (cxl_pmem_size(cxlds)) { + const struct resource *res = to_pmem_res(cxlds); + + offset = res->start; + length = resource_size(res); } else { return 0; } diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 2a25d1957ddb..78e92e24d7b5 100644 --- 
a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -423,8 +423,8 @@ struct cxl_dpa_perf { * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) * @media_ready: Indicate whether the device media is usable * @dpa_res: Overall DPA resource tree for the device - * @pmem_res: Active Persistent memory capacity configuration - * @ram_res: Active Volatile memory capacity configuration + * @_pmem_res: Active Persistent memory capacity configuration + * @_ram_res: Active Volatile memory capacity configuration * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device * @cxl_mbox: CXL mailbox context @@ -438,13 +438,41 @@ struct cxl_dev_state { bool rcd; bool media_ready; struct resource dpa_res; - struct resource pmem_res; - struct resource ram_res; + struct resource _pmem_res; + struct resource _ram_res; u64 serial; enum cxl_devtype type; struct cxl_mailbox cxl_mbox; }; +static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) +{ + return &cxlds->_ram_res; +} + +static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) +{ + return &cxlds->_pmem_res; +} + +static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds) +{ + const struct resource *res = to_ram_res(cxlds); + + if (!res) + return 0; + return resource_size(res); +} + +static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds) +{ + const struct resource *res = to_pmem_res(cxlds); + + if (!res) + return 0; + return resource_size(res); +} + static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) { return dev_get_drvdata(cxl_mbox->host); @@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset - * @ram_perf: performance data entry matched to RAM partition - * @pmem_perf: performance data entry matched to PMEM partition + * @_ram_perf: performance data entry matched to RAM partition + * @_pmem_perf: performance data entry matched to PMEM partition * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -496,8 +524,8 @@ struct cxl_memdev_state { u64 next_volatile_bytes; u64 next_persistent_bytes; - struct cxl_dpa_perf ram_perf; - struct cxl_dpa_perf pmem_perf; + struct cxl_dpa_perf _ram_perf; + struct cxl_dpa_perf _pmem_perf; struct cxl_event_state event; struct cxl_poison_state poison; @@ -505,6 +533,20 @@ struct cxl_memdev_state { struct cxl_fw_state fw; }; +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) +{ + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); + + return &mds->_ram_perf; +} + +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) +{ + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); + + return &mds->_pmem_perf; +} + static inline struct cxl_memdev_state * to_cxl_memdev_state(struct cxl_dev_state *cxlds) { diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 2f03a4d5606e..9675243bd05b 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev) return -ENXIO; } - if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) { + if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) { rc = 
devm_cxl_add_nvdimm(parent_port, cxlmd);
 		if (rc) {
 			if (rc == -ENODEV)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index d0337c11f9ee..7f1c5061307b 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 		find_cxl_root(port);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
-	struct range pmem_range = {
-		.start = cxlds->pmem_res.start,
-		.end = cxlds->pmem_res.end,
+	const struct resource *partition[] = {
+		to_ram_res(cxlds),
+		to_pmem_res(cxlds),
 	};
-	struct range ram_range = {
-		.start = cxlds->ram_res.start,
-		.end = cxlds->ram_res.end,
+	struct cxl_dpa_perf *perf[] = {
+		to_ram_perf(cxlds),
+		to_pmem_perf(cxlds),
 	};
 
 	if (!cxl_root)
 		return;
 
-	if (range_len(&ram_range))
-		dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+	for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+		const struct resource *res = partition[i];
+		struct range range = {
+			.start = res->start,
+			.end = res->end,
+		};
 
-	if (range_len(&pmem_range))
-		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+		dpa_perf_setup(port, &range, perf[i]);
+	}
 
 	cxl_memdev_update_perf(cxlmd);

From patchwork Wed Jan 22 08:59:27 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13947029
Subject: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang@intel.com, Jonathan.Cameron@huawei.com
Date: Wed, 22 Jan 2025 00:59:27 -0800
Message-ID: <173753636727.3849855.464861650807086965.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>

The pending efforts to add CXL Accelerator (type-2) device [1] and
Dynamic Capacity (DCD) support [2] tripped on the
no-longer-fit-for-purpose design in the CXL subsystem for tracking
device-physical-address (DPA) metadata. Trip hazards include:

- CXL Memory Devices need to consider a PMEM partition, but Accelerator
  devices with CXL.mem likely do not in the common case.

- CXL Memory Devices enumerate DPA through Memory Device mailbox
  commands like Partition Info; Accelerator devices do not.

- CXL Memory Devices that support DCD support more than 2 partitions.
  Some of the driver algorithms are awkward to expand to > 2 partition
  cases.

- DPA performance data is a general capability that can be shared with
  accelerators, so tracking it in 'struct cxl_memdev_state' is no
  longer suitable.

- Hardcoded assumptions around the PMEM partition always being index-1
  if RAM is zero-sized or PMEM is zero-sized.

- 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
  memory property; it should be phased out in favor of a partition id,
  with the memory property coming from the partition info.

Towards cleaning up those issues and allowing a smoother landing for
the aforementioned pending efforts, introduce a 'struct
cxl_dpa_partition' array to 'struct cxl_dev_state', and 'struct
cxl_range_info' as a shared way for Memory Devices and Accelerators to
initialize the DPA information in 'struct cxl_dev_state'.

For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info()
to get the new data structure initialized, and clean up some qos_class
init. Follow-on patches will go further to use the new data structure
to clean up algorithms that are better suited to loop over all possible
partitions.
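As an illustration of the intended Accelerator flow, a minimal sketch
of conveying a single volatile partition through the new entry point
(the 16GB size is an arbitrary example value, not from this series):

	struct cxl_dpa_info info = {
		.size = SZ_16G,
		.part = {
			{
				.range = { .start = 0, .end = SZ_16G - 1 },
				.mode = CXL_PARTMODE_RAM,
			},
		},
		.nr_partitions = 1,
	};
	int rc = cxl_dpa_setup(cxlds, &info);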
cxl_dpa_setup() follows the locking expectations of mutating the device DPA map, and is suitable for Accelerator drivers to use. Accelerators likely only have one hardcoded 'ram' partition to convey to the cxl_core. Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] Cc: Dave Jiang Cc: Alejandro Lucero Cc: Ira Weiny Signed-off-by: Dan Williams Reviewed-by: Ira Weiny Reviewed-by: Dave Jiang Reviewed-by: Alejandro Lucero --- drivers/cxl/core/cdat.c | 15 ++----- drivers/cxl/core/hdm.c | 75 +++++++++++++++++++++++++++++++++- drivers/cxl/core/mbox.c | 68 ++++++++++-------------------- drivers/cxl/core/memdev.c | 2 - drivers/cxl/cxlmem.h | 94 +++++++++++++++++++++++++++++------------- drivers/cxl/pci.c | 7 +++ tools/testing/cxl/test/cxl.c | 15 ++----- tools/testing/cxl/test/mem.c | 7 +++ 8 files changed, 183 insertions(+), 100 deletions(-) diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index b177a488e29b..5400a421ad30 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, struct device *dev = cxlds->dev; struct dsmas_entry *dent; unsigned long index; - const struct resource *partition[] = { - to_ram_res(cxlds), - to_pmem_res(cxlds), - }; - struct cxl_dpa_perf *perf[] = { - to_ram_perf(cxlds), - to_pmem_perf(cxlds), - }; xa_for_each(dsmas_xa, index, dent) { - for (int i = 0; i < ARRAY_SIZE(partition); i++) { - const struct resource *res = partition[i]; + for (int i = 0; i < cxlds->nr_partitions; i++) { + struct resource *res = &cxlds->part[i].res; struct range range = { .start = res->start, .end = res->end, }; if (range_contains(&range, &dent->dpa_range)) - update_perf_entry(dev, dent, perf[i]); + update_perf_entry(dev, dent, + &cxlds->part[i].perf); else dev_dbg(dev, "no partition for dsmas dpa: %pra\n", diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 7a85522294ad..3f8a54ca4624 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->dpa_res = res; cxled->skip = skipped; - if (resource_contains(to_pmem_res(cxlds), res)) + if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res)) cxled->mode = CXL_DECODER_PMEM; - else if (resource_contains(to_ram_res(cxlds), res)) + else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res)) cxled->mode = CXL_DECODER_RAM; else { dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n", @@ -342,6 +342,77 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return 0; } +static int add_dpa_res(struct device *dev, struct resource *parent, + struct resource *res, resource_size_t start, + resource_size_t size, const char *type) +{ + int rc; + + *res = (struct resource) { + .name = type, + .start = start, + .end = start + size - 1, + .flags = IORESOURCE_MEM, + }; + if (resource_size(res) == 0) { + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); + return 0; + } + rc = request_resource(parent, res); + if (rc) { + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, + res, rc); + return rc; + } + + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); + + return 0; +} + +/* if this fails the caller must destroy @cxlds, there is no recovery */ +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) +{ + struct device *dev = cxlds->dev; + + 
guard(rwsem_write)(&cxl_dpa_rwsem); + + if (cxlds->nr_partitions) + return -EBUSY; + + if (!info->size || !info->nr_partitions) { + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); + cxlds->nr_partitions = 0; + return 0; + } + + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); + + for (int i = 0; i < info->nr_partitions; i++) { + const struct cxl_dpa_part_info *part = &info->part[i]; + const char *desc; + int rc; + + if (part->mode == CXL_PARTMODE_RAM) + desc = "ram"; + else if (part->mode == CXL_PARTMODE_PMEM) + desc = "pmem"; + else + desc = ""; + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; + cxlds->part[i].mode = part->mode; + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, + part->range.start, range_len(&part->range), + desc); + if (rc) + return rc; + cxlds->nr_partitions++; + } + + return 0; +} +EXPORT_SYMBOL_GPL(cxl_dpa_setup); + int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 3502f1633ad2..62bb3653362f 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1241,57 +1241,39 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) return rc; } -static int add_dpa_res(struct device *dev, struct resource *parent, - struct resource *res, resource_size_t start, - resource_size_t size, const char *type) +static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode) { - int rc; + int i = info->nr_partitions; - res->name = type; - res->start = start; - res->end = start + size - 1; - res->flags = IORESOURCE_MEM; - if (resource_size(res) == 0) { - dev_dbg(dev, "DPA(%s): no capacity\n", res->name); - return 0; - } - rc = request_resource(parent, res); - if (rc) { - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, - res, rc); - return rc; - } - - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); + if (size == 0) + return; - return 0; + info->part[i].range = (struct range) { + .start = start, + .end = start + size - 1, + }; + info->part[i].mode = mode; + info->nr_partitions++; } -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) { struct cxl_dev_state *cxlds = &mds->cxlds; - struct resource *ram_res = to_ram_res(cxlds); - struct resource *pmem_res = to_pmem_res(cxlds); struct device *dev = cxlds->dev; int rc; if (!cxlds->media_ready) { - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); - *ram_res = DEFINE_RES_MEM(0, 0); - *pmem_res = DEFINE_RES_MEM(0, 0); + info->size = 0; return 0; } - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); + info->size = mds->total_bytes; if (mds->partition_align_bytes == 0) { - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->volatile_only_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, - mds->volatile_only_bytes, - mds->persistent_only_bytes, "pmem"); + add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM); + add_part(info, mds->volatile_only_bytes, + mds->persistent_only_bytes, CXL_PARTMODE_PMEM); + return 0; } rc = cxl_mem_get_partition_info(mds); @@ -1300,15 +1282,13 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) return rc; } - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->active_volatile_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, - mds->active_volatile_bytes, - mds->active_persistent_bytes, "pmem"); + add_part(info, 0, 
mds->active_volatile_bytes, CXL_PARTMODE_RAM); + add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes, + CXL_PARTMODE_PMEM); + + return 0; } -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); int cxl_set_timestamp(struct cxl_memdev_state *mds) { @@ -1452,8 +1432,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mds->cxlds.reg_map.host = dev; mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; return mds; } diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index c5f8320ed330..be0eb57086e1 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr, { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; - unsigned long long len = resource_size(to_ram_res(cxlds)); + unsigned long long len = cxl_ram_size(cxlds); return sysfs_emit(buf, "%#llx\n", len); } diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 78e92e24d7b5..15f549afab7c 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped); +enum cxl_partition_mode { + CXL_PARTMODE_NONE, + CXL_PARTMODE_RAM, + CXL_PARTMODE_PMEM, +}; + +#define CXL_NR_PARTITIONS_MAX 2 + +struct cxl_dpa_info { + u64 size; + struct cxl_dpa_part_info { + struct range range; + enum cxl_partition_mode mode; + } part[CXL_NR_PARTITIONS_MAX]; + int nr_partitions; +}; + +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); + static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, struct cxl_memdev *cxlmd) { @@ -408,6 +427,18 @@ struct cxl_dpa_perf { int qos_class; }; +/** + * struct cxl_dpa_partition - DPA partition descriptor + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res) + * @perf: performance attributes of the partition from CDAT + * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic... + */ +struct cxl_dpa_partition { + struct resource res; + struct cxl_dpa_perf perf; + enum cxl_partition_mode mode; +}; + /** * struct cxl_dev_state - The driver device state * @@ -423,8 +454,8 @@ struct cxl_dpa_perf { * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) * @media_ready: Indicate whether the device media is usable * @dpa_res: Overall DPA resource tree for the device - * @_pmem_res: Active Persistent memory capacity configuration - * @_ram_res: Active Volatile memory capacity configuration + * @part: DPA partition array + * @nr_partitions: Number of DPA partitions * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device * @cxl_mbox: CXL mailbox context @@ -438,21 +469,47 @@ struct cxl_dev_state { bool rcd; bool media_ready; struct resource dpa_res; - struct resource _pmem_res; - struct resource _ram_res; + struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX]; + unsigned int nr_partitions; u64 serial; enum cxl_devtype type; struct cxl_mailbox cxl_mbox; }; -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) + +/* Static RAM is only expected at partition 0. 
*/ +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) +{ + if (cxlds->part[0].mode != CXL_PARTMODE_RAM) + return NULL; + return &cxlds->part[0].res; +} + +/* + * Static PMEM may be at partition index 0 when there is no static RAM + * capacity. + */ +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) +{ + for (int i = 0; i < cxlds->nr_partitions; i++) + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM) + return &cxlds->part[i].res; + return NULL; +} + +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) { - return &cxlds->_ram_res; + if (cxlds->part[0].mode != CXL_PARTMODE_RAM) + return NULL; + return &cxlds->part[0].perf; } -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) { - return &cxlds->_pmem_res; + for (int i = 0; i < cxlds->nr_partitions; i++) + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM) + return &cxlds->part[i].perf; + return NULL; } static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds) @@ -499,8 +556,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset - * @_ram_perf: performance data entry matched to RAM partition - * @_pmem_perf: performance data entry matched to PMEM partition * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -524,29 +579,12 @@ struct cxl_memdev_state { u64 next_volatile_bytes; u64 next_persistent_bytes; - struct cxl_dpa_perf _ram_perf; - struct cxl_dpa_perf _pmem_perf; - struct cxl_event_state event; struct cxl_poison_state poison; struct cxl_security_state security; struct cxl_fw_state fw; }; -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_ram_perf; -} - -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_pmem_perf; -} - static inline struct cxl_memdev_state * to_cxl_memdev_state(struct cxl_dev_state *cxlds) { @@ -860,7 +898,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, int cxl_dev_state_identify(struct cxl_memdev_state *mds); int cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, unsigned long *cmds); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 0241d1d7133a..47dbfe406236 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd); static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); + struct cxl_dpa_info range_info = { 0 }; struct cxl_memdev_state *mds; struct cxl_dev_state *cxlds; struct cxl_register_map map; @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - rc = 
cxl_mem_create_range_info(mds);
+	rc = cxl_mem_dpa_fetch(mds, &range_info);
+	if (rc)
+		return rc;
+
+	rc = cxl_dpa_setup(cxlds, &range_info);
 	if (rc)
 		return rc;
 
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 7f1c5061307b..ba3d48b37de3 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
-	const struct resource *partition[] = {
-		to_ram_res(cxlds),
-		to_pmem_res(cxlds),
-	};
-	struct cxl_dpa_perf *perf[] = {
-		to_ram_perf(cxlds),
-		to_pmem_perf(cxlds),
-	};
 
 	if (!cxl_root)
 		return;
 
-	for (int i = 0; i < ARRAY_SIZE(partition); i++) {
-		const struct resource *res = partition[i];
+	for (int i = 0; i < cxlds->nr_partitions; i++) {
+		struct resource *res = &cxlds->part[i].res;
+		struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
 		struct range range = {
 			.start = res->start,
 			.end = res->end,
 		};
 
-		dpa_perf_setup(port, &range, perf[i]);
+		dpa_perf_setup(port, &range, perf);
 	}
 
 	cxl_memdev_update_perf(cxlmd);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 347c1e7b37bd..ed365e083c8f 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	struct cxl_dev_state *cxlds;
 	struct cxl_mockmem_data *mdata;
 	struct cxl_mailbox *cxl_mbox;
+	struct cxl_dpa_info range_info = { 0 };
 	int rc;
 
 	mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
@@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
-	rc = cxl_mem_create_range_info(mds);
+	rc = cxl_mem_dpa_fetch(mds, &range_info);
+	if (rc)
+		return rc;
+
+	rc = cxl_dpa_setup(cxlds, &range_info);
 	if (rc)
 		return rc;

From patchwork Wed Jan 22 08:59:33 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13947030
Subject: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang@intel.com, Jonathan.Cameron@huawei.com
Date: Wed, 22 Jan 2025 00:59:33 -0800
Message-ID: <173753637297.3849855.5217976225600372473.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>

cxl_dpa_alloc() is a hard-coded nest of assumptions around PMEM
allocations being distinct from RAM allocations in specific ways, when
in practice the allocation rules are only relative to DPA partition
index.

The rules for cxl_dpa_alloc() are:

- allocations can only come from 1 partition

- if allocating at partition-index-N, all free space in partitions less
  than partition-index-N must be skipped over

Use the new 'struct cxl_dpa_partition' array to support allocation with
an arbitrary number of DPA partitions on the device.

A follow-on patch can go further to clean up the 'enum
cxl_decoder_mode' concept and supersede it with looking up the memory
properties from partition metadata. Until then, cxl_part_mode()
temporarily bridges code that looks up partitions by @cxled->mode.
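The skip rule is the subtle part: one 'skip' range can break down into
several per-partition chunks. A simplified standalone sketch of that
walk (for_each_skip_chunk() is illustrative only; the real code is
request_skip()/release_skip() below):

	static void for_each_skip_chunk(const struct resource *part, int nr_parts,
					resource_size_t skip_base,
					resource_size_t skip_len)
	{
		resource_size_t start = skip_base, rem = skip_len;

		for (int i = 0; i < nr_parts && rem; i++) {
			resource_size_t end, size;

			/* only partitions that the skip range lands in */
			if (start < part[i].start || start > part[i].end)
				continue;
			end = min(part[i].end, start + rem - 1);
			size = end - start + 1;
			/* reserve/release the chunk [start, end] here */
			start += size;
			rem -= size;
		}
	}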
Cc: Dave Jiang Cc: Alejandro Lucero Cc: Ira Weiny Signed-off-by: Dan Williams Reviewed-by: Ira Weiny Reviewed-by: Alejandro Lucero Reviewed-by: Dave Jiang --- drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++------------- drivers/cxl/cxlmem.h | 14 +++ 2 files changed, 172 insertions(+), 57 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3f8a54ca4624..591aeb26c9e1 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) } EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL"); +/* See request_skip() kernel-doc */ +static void release_skip(struct cxl_dev_state *cxlds, + const resource_size_t skip_base, + const resource_size_t skip_len) +{ + resource_size_t skip_start = skip_base, skip_rem = skip_len; + + for (int i = 0; i < cxlds->nr_partitions; i++) { + const struct resource *part_res = &cxlds->part[i].res; + resource_size_t skip_end, skip_size; + + if (skip_start < part_res->start || skip_start > part_res->end) + continue; + + skip_end = min(part_res->end, skip_start + skip_rem - 1); + skip_size = skip_end - skip_start + 1; + __release_region(&cxlds->dpa_res, skip_start, skip_size); + skip_start += skip_size; + skip_rem -= skip_size; + + if (!skip_rem) + break; + } +} + /* * Must be called in a context that synchronizes against this decoder's * port ->remove() callback (like an endpoint decoder sysfs attribute) @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) skip_start = res->start - cxled->skip; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); if (cxled->skip) - __release_region(&cxlds->dpa_res, skip_start, cxled->skip); + release_skip(cxlds, skip_start, cxled->skip); cxled->skip = 0; cxled->dpa_res = NULL; put_device(&cxled->cxld.dev); @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) __cxl_dpa_release(cxled); } +/** + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree + * @cxlds: CXL.mem device context that parents @cxled + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA + * @skip_base: DPA < start of new DPA allocation (DPAnew) + * @skip_len: @skip_base + @skip_len == DPAnew + * + * DPA 'skip' arises from out-of-sequence DPA allocation events relative + * to free capacity across multiple partitions. It is a wasteful event + * as usable DPA gets thrown away, but if a deployment has, for example, + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM. + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder + * Protection" for more details. + * + * A 'skip' always covers the last allocated DPA in a previous partition + * to the start of the current partition to allocate. Allocations never + * start in the middle of a partition, and allocations are always + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm + * unwind order from forced in-order allocation). + * + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip' + * would always be contained to a single partition. Given + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip' + * might span "tail capacity of partition[0], all of partition[1], ..., + * all of partition[N-1]" to support allocating from partition[N]. 
That + * in turn interacts with the partition 'struct resource' boundaries + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by + * partition. I.e. this is a quirk of using a 'struct resource' tree to + * detect range conflicts while also tracking partition boundaries in + * @cxlds->dpa_res. + */ +static int request_skip(struct cxl_dev_state *cxlds, + struct cxl_endpoint_decoder *cxled, + const resource_size_t skip_base, + const resource_size_t skip_len) +{ + resource_size_t skip_start = skip_base, skip_rem = skip_len; + + for (int i = 0; i < cxlds->nr_partitions; i++) { + const struct resource *part_res = &cxlds->part[i].res; + struct cxl_port *port = cxled_to_port(cxled); + resource_size_t skip_end, skip_size; + struct resource *res; + + if (skip_start < part_res->start || skip_start > part_res->end) + continue; + + skip_end = min(part_res->end, skip_start + skip_rem - 1); + skip_size = skip_end - skip_start + 1; + + res = __request_region(&cxlds->dpa_res, skip_start, skip_size, + dev_name(&cxled->cxld.dev), 0); + if (!res) { + dev_dbg(cxlds->dev, + "decoder%d.%d: failed to reserve skipped space\n", + port->id, cxled->cxld.id); + break; + } + skip_start += skip_size; + skip_rem -= skip_size; + if (!skip_rem) + break; + } + + if (skip_rem == 0) + return 0; + + release_skip(cxlds, skip_base, skip_len - skip_rem); + + return -EBUSY; +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) @@ -276,7 +374,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &port->dev; + enum cxl_decoder_mode mode; struct resource *res; + int rc; lockdep_assert_held_write(&cxl_dpa_rwsem); @@ -305,14 +405,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, } if (skipped) { - res = __request_region(&cxlds->dpa_res, base - skipped, skipped, - dev_name(&cxled->cxld.dev), 0); - if (!res) { - dev_dbg(dev, - "decoder%d.%d: failed to reserve skipped space\n", - port->id, cxled->cxld.id); - return -EBUSY; - } + rc = request_skip(cxlds, cxled, base - skipped, skipped); + if (rc) + return rc; } res = __request_region(&cxlds->dpa_res, base, len, dev_name(&cxled->cxld.dev), 0); @@ -320,22 +415,23 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", port->id, cxled->cxld.id); if (skipped) - __release_region(&cxlds->dpa_res, base - skipped, - skipped); + release_skip(cxlds, base - skipped, skipped); return -EBUSY; } cxled->dpa_res = res; cxled->skip = skipped; - if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res)) - cxled->mode = CXL_DECODER_PMEM; - else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res)) - cxled->mode = CXL_DECODER_RAM; - else { + mode = CXL_DECODER_NONE; + for (int i = 0; i < cxlds->nr_partitions; i++) + if (resource_contains(&cxlds->part[i].res, res)) { + mode = cxl_part_mode(cxlds->part[i].mode); + break; + } + + if (mode == CXL_DECODER_NONE) dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n", port->id, cxled->cxld.id, res); - cxled->mode = CXL_DECODER_NONE; - } + cxled->mode = mode; port->hdm_end++; get_device(&cxled->cxld.dev); return 0; @@ -529,15 +625,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); -
resource_size_t free_ram_start, free_pmem_start; struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; - resource_size_t start, avail, skip; + struct resource *res, *prev = NULL; + resource_size_t start, avail, skip, skip_start; struct resource *p, *last; - const struct resource *ram_res = to_ram_res(cxlds); - const struct resource *pmem_res = to_pmem_res(cxlds); - int rc; + int part, rc; down_write(&cxl_dpa_rwsem); if (cxled->cxld.region) { @@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) goto out; } - for (p = ram_res->child, last = NULL; p; p = p->sibling) - last = p; - if (last) - free_ram_start = last->end + 1; - else - free_ram_start = ram_res->start; + part = -1; + for (int i = 0; i < cxlds->nr_partitions; i++) { + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) { + part = i; + break; + } + } - for (p = pmem_res->child, last = NULL; p; p = p->sibling) + if (part < 0) { + dev_dbg(dev, "partition %d not found\n", part); + rc = -EBUSY; + goto out; + } + + res = &cxlds->part[part].res; + for (p = res->child, last = NULL; p; p = p->sibling) last = p; if (last) - free_pmem_start = last->end + 1; + start = last->end + 1; else - free_pmem_start = pmem_res->start; - - if (cxled->mode == CXL_DECODER_RAM) { - start = free_ram_start; - avail = ram_res->end - start + 1; - skip = 0; - } else if (cxled->mode == CXL_DECODER_PMEM) { - resource_size_t skip_start, skip_end; + start = res->start; - start = free_pmem_start; - avail = pmem_res->end - start + 1; - skip_start = free_ram_start; - - /* - * If some pmem is already allocated, then that allocation - * already handled the skip. - */ - if (pmem_res->child && - skip_start == pmem_res->child->start) - skip_end = skip_start - 1; - else - skip_end = start - 1; - skip = skip_end - skip_start + 1; - } else { - dev_dbg(dev, "mode not set\n"); - rc = -EINVAL; - goto out; + /* + * To allocate at partition N, a skip needs to be calculated for all + * unallocated space at lower partition indices. + * + * If a partition has any allocations, the search can end because a + * previous cxl_dpa_alloc() invocation is assumed to have accounted for + * all previous partitions.
+ */ + skip_start = CXL_RESOURCE_NONE; + for (int i = part; i; i--) { + prev = &cxlds->part[i - 1].res; + for (p = prev->child, last = NULL; p; p = p->sibling) + last = p; + if (last) { + skip_start = last->end + 1; + break; + } + skip_start = prev->start; } + avail = res->end - start + 1; + if (skip_start == CXL_RESOURCE_NONE) + skip = 0; + else + skip = res->start - skip_start; + if (size > avail) { dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, cxl_decoder_mode_name(cxled->mode), &avail); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 15f549afab7c..bad99456e901 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -530,6 +530,20 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds) return resource_size(res); } +/* + * Translate the operational mode of memory capacity to the + * operational mode of a decoder + * TODO: kill 'enum cxl_decoder_mode' to obviate this helper + */ +static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode) +{ + if (mode == CXL_PARTMODE_RAM) + return CXL_DECODER_RAM; + if (mode == CXL_PARTMODE_PMEM) + return CXL_DECODER_PMEM; + return CXL_DECODER_NONE; +} + static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) { return dev_get_drvdata(cxl_mbox->host);

From patchwork Wed Jan 22 08:59:38 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13947031
Subject: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang , Alejandro Lucero , Ira Weiny , dave.jiang@intel.com, Jonathan.Cameron@huawei.com
Date: Wed, 22 Jan 2025 00:59:38 -0800
Message-ID: <173753637863.3849855.16067432468334597297.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>
References: <173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c
Precedence: bulk
X-Mailing-List: linux-cxl@vger.kernel.org

Now that the operational mode of DPA capacity (ram vs pmem, etc.) is tracked in the partition, and no code paths depend on the mode implying the partition index, the ambiguous 'enum cxl_decoder_mode' can be cleaned up. Specifically, this resolves the ambiguity over whether the operational mode implied anything about the partition order. Endpoint decoders simply reference their assigned partition, and the operational mode is retrieved as that partition's mode.

With this in place, PMEM can now be partition0, which happens today when the RAM capacity size is zero. Dynamic RAM can appear above PMEM when DCD arrives, etc. Code sequences that hard-coded the "PMEM after RAM" assumption can now just iterate partitions and consult the partition mode after the fact.
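As a rough illustration of the end state, deriving a decoder's mode from its partition index might look like the following standalone sketch. The scaffolding types are simplified stand-ins invented here, not the kernel's:

/* Not the kernel source: a minimal model of partition-indexed mode lookup */
#include <stdio.h>

enum cxl_partition_mode { CXL_PARTMODE_NONE, CXL_PARTMODE_RAM, CXL_PARTMODE_PMEM };

struct partition { enum cxl_partition_mode mode; };
struct dev_state { struct partition part[2]; int nr_partitions; };
struct endpoint_decoder { int part; /* partition index, or -1 for none/dead */ };

static const char *decoder_mode_name(const struct dev_state *cxlds,
				     const struct endpoint_decoder *cxled)
{
	if (cxled->part < 0)
		return "none";
	switch (cxlds->part[cxled->part].mode) {
	case CXL_PARTMODE_RAM:  return "ram";
	case CXL_PARTMODE_PMEM: return "pmem";
	default:                return "none";
	}
}

int main(void)
{
	/* a pmem-only device: pmem capacity lands at partition0 */
	struct dev_state cxlds = {
		.part = { { .mode = CXL_PARTMODE_PMEM } },
		.nr_partitions = 1,
	};
	struct endpoint_decoder cxled = { .part = 0 };

	printf("%s\n", decoder_mode_name(&cxlds, &cxled)); /* prints "pmem" */
	return 0;
}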
Cc: Dave Jiang Cc: Alejandro Lucero Cc: Ira Weiny Signed-off-by: Dan Williams Reviewed-by: Ira Weiny Reviewed-by: Alejandro Lucero Reviewed-by: Dave Jiang --- drivers/cxl/core/cdat.c | 21 ++----- drivers/cxl/core/core.h | 4 + drivers/cxl/core/hdm.c | 64 +++++++---------------- drivers/cxl/core/memdev.c | 15 +---- drivers/cxl/core/port.c | 20 +++++-- drivers/cxl/core/region.c | 128 +++++++++++++++++++++++++-------------------- drivers/cxl/cxl.h | 38 ++++--------- drivers/cxl/cxlmem.h | 20 ------- 8 files changed, 127 insertions(+), 183 deletions(-) diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 5400a421ad30..ca7fb2b182ed 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -571,29 +571,18 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf, .end = dpa_res->end, }; - if (!perf) - return false; - return range_contains(&perf->dpa_range, &dpa); } -static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled, - enum cxl_decoder_mode mode) +static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_dpa_perf *perf; - switch (mode) { - case CXL_DECODER_RAM: - perf = to_ram_perf(cxlds); - break; - case CXL_DECODER_PMEM: - perf = to_pmem_perf(cxlds); - break; - default: + if (cxled->part < 0) return ERR_PTR(-EINVAL); - } + perf = &cxlds->part[cxled->part].perf; if (!dpa_perf_contains(perf, cxled->dpa_res)) return ERR_PTR(-EINVAL); @@ -654,7 +643,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr, if (cxlds->rcd) return -ENODEV; - perf = cxled_get_dpa_perf(cxled, cxlr->mode); + perf = cxled_get_dpa_perf(cxled); if (IS_ERR(perf)) return PTR_ERR(perf); @@ -1060,7 +1049,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr, lockdep_assert_held(&cxl_dpa_rwsem); - perf = cxled_get_dpa_perf(cxled, cxlr->mode); + perf = cxled_get_dpa_perf(cxled); if (IS_ERR(perf)) return; diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 800466f96a68..22dac79c5192 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -72,8 +72,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, resource_size_t length); struct dentry *cxl_debugfs_create_dir(const char *dir); -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, - enum cxl_decoder_mode mode); +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled, + enum cxl_partition_mode mode); int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 591aeb26c9e1..bb478e7b12f6 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -374,7 +374,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &port->dev; - enum cxl_decoder_mode mode; struct resource *res; int rc; @@ -421,18 +420,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->dpa_res = res; cxled->skip = skipped; - mode = CXL_DECODER_NONE; - for (int i = 0; i < cxlds->nr_partitions; i++) - if (resource_contains(&cxlds->part[i].res, res)) { - mode = cxl_part_mode(cxlds->part[i].mode); - break; - } - - if (mode == CXL_DECODER_NONE) - dev_warn(dev, "decoder%d.%d: %pr does not map
any partition\n", - port->id, cxled->cxld.id, res); - cxled->mode = mode; - port->hdm_end++; get_device(&cxled->cxld.dev); return 0; @@ -585,40 +572,36 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) return rc; } -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, - enum cxl_decoder_mode mode) +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled, + enum cxl_partition_mode mode) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; - - switch (mode) { - case CXL_DECODER_RAM: - case CXL_DECODER_PMEM: - break; - default: - dev_dbg(dev, "unsupported mode: %d\n", mode); - return -EINVAL; - } + int part; guard(rwsem_write)(&cxl_dpa_rwsem); if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) return -EBUSY; - /* - * Only allow modes that are supported by the current partition - * configuration - */ - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) { - dev_dbg(dev, "no available pmem capacity\n"); - return -ENXIO; + part = -1; + for (int i = 0; i < cxlds->nr_partitions; i++) + if (cxlds->part[i].mode == mode) { + part = i; + break; + } + + if (part < 0) { + dev_dbg(dev, "unsupported mode: %d\n", mode); + return -EINVAL; } - if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) { - dev_dbg(dev, "no available ram capacity\n"); + + if (!resource_size(&cxlds->part[part].res)) { + dev_dbg(dev, "no available capacity for mode: %d\n", mode); return -ENXIO; } - cxled->mode = mode; + cxled->part = part; return 0; } @@ -647,16 +630,9 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) goto out; } - part = -1; - for (int i = 0; i < cxlds->nr_partitions; i++) { - if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) { - part = i; - break; - } - } - + part = cxled->part; if (part < 0) { - dev_dbg(dev, "partition %d not found\n", part); + dev_dbg(dev, "partition not set\n"); rc = -EBUSY; goto out; } @@ -697,7 +673,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) if (size > avail) { dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, - cxl_decoder_mode_name(cxled->mode), &avail); + res->name, &avail); rc = -ENOSPC; goto out; } diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index be0eb57086e1..615cbd861f66 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -198,17 +198,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd) int rc = 0; /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */ - if (cxl_pmem_size(cxlds)) { - const struct resource *res = to_pmem_res(cxlds); - - offset = res->start; - length = resource_size(res); - rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); - if (rc) - return rc; - } - if (cxl_ram_size(cxlds)) { - const struct resource *res = to_ram_res(cxlds); + for (int i = 0; i < cxlds->nr_partitions; i++) { + const struct resource *res = &cxlds->part[i].res; offset = res->start; length = resource_size(res); @@ -217,7 +208,7 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd) * Invalid Physical Address is not an error for * volatile addresses. Device support is optional. 
*/ - if (rc == -EFAULT) + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM) rc = 0; } return rc; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 78a5c2c25982..f5f2701c8771 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -194,25 +194,35 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, char *buf) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + /* without @cxl_dpa_rwsem, make sure @part is not reloaded */ + int part = READ_ONCE(cxled->part); + const char *desc; + + if (part < 0) + desc = "none"; + else + desc = cxlds->part[part].res.name; - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode)); + return sysfs_emit(buf, "%s\n", desc); } static ssize_t mode_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - enum cxl_decoder_mode mode; + enum cxl_partition_mode mode; ssize_t rc; if (sysfs_streq(buf, "pmem")) - mode = CXL_DECODER_PMEM; + mode = CXL_PARTMODE_PMEM; else if (sysfs_streq(buf, "ram")) - mode = CXL_DECODER_RAM; + mode = CXL_PARTMODE_RAM; else return -EINVAL; - rc = cxl_dpa_set_mode(cxled, mode); + rc = cxl_dpa_set_part(cxled, mode); if (rc) return rc; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 9f0f6fdbc841..83b985d2ba76 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, rc = down_read_interruptible(&cxl_region_rwsem); if (rc) return rc; - if (cxlr->mode != CXL_DECODER_PMEM) + if (cxlr->mode != CXL_PARTMODE_PMEM) rc = sysfs_emit(buf, "\n"); else rc = sysfs_emit(buf, "%pUb\n", &p->uuid); @@ -441,7 +441,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a, * Support tooling that expects to find a 'uuid' attribute for all * regions regardless of mode. 
*/ - if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM) + if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM) return 0444; return a->mode; } @@ -603,8 +603,16 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, char *buf) { struct cxl_region *cxlr = to_cxl_region(dev); + const char *desc; - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode)); + if (cxlr->mode == CXL_PARTMODE_RAM) + desc = "ram"; + else if (cxlr->mode == CXL_PARTMODE_PMEM) + desc = "pmem"; + else + desc = ""; + + return sysfs_emit(buf, "%s\n", desc); } static DEVICE_ATTR_RO(mode); @@ -630,7 +638,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) /* ways, granularity and uuid (if PMEM) need to be set before HPA */ if (!p->interleave_ways || !p->interleave_granularity || - (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid))) + (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid))) return -ENXIO; div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder); @@ -1875,6 +1883,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_region_params *p = &cxlr->params; struct cxl_port *ep_port, *root_port; struct cxl_dport *dport; @@ -1889,17 +1898,17 @@ static int cxl_region_attach(struct cxl_region *cxlr, return rc; } - if (cxled->mode != cxlr->mode) { - dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", - dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); - return -EINVAL; - } - - if (cxled->mode == CXL_DECODER_DEAD) { + if (cxled->part < 0) { dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); return -ENODEV; } + if (cxlds->part[cxled->part].mode != cxlr->mode) { + dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n", + dev_name(&cxled->cxld.dev), cxlr->mode); + return -EINVAL; + } + /* all full of members, or interleave config not established? 
*/ if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) { dev_dbg(&cxlr->dev, "region already active\n"); @@ -2102,7 +2111,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) { down_write(&cxl_region_rwsem); - cxled->mode = CXL_DECODER_DEAD; + cxled->part = -1; cxl_region_detach(cxled); up_write(&cxl_region_rwsem); } @@ -2458,7 +2467,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb, */ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, int id, - enum cxl_decoder_mode mode, + enum cxl_partition_mode mode, enum cxl_decoder_type type) { struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); @@ -2512,13 +2521,13 @@ static ssize_t create_ram_region_show(struct device *dev, } static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, - enum cxl_decoder_mode mode, int id) + enum cxl_partition_mode mode, int id) { int rc; switch (mode) { - case CXL_DECODER_RAM: - case CXL_DECODER_PMEM: + case CXL_PARTMODE_RAM: + case CXL_PARTMODE_PMEM: break; default: dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); @@ -2538,7 +2547,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, } static ssize_t create_region_store(struct device *dev, const char *buf, - size_t len, enum cxl_decoder_mode mode) + size_t len, enum cxl_partition_mode mode) { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); struct cxl_region *cxlr; @@ -2559,7 +2568,7 @@ static ssize_t create_pmem_region_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - return create_region_store(dev, buf, len, CXL_DECODER_PMEM); + return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM); } DEVICE_ATTR_RW(create_pmem_region); @@ -2567,7 +2576,7 @@ static ssize_t create_ram_region_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - return create_region_store(dev, buf, len, CXL_DECODER_RAM); + return create_region_store(dev, buf, len, CXL_PARTMODE_RAM); } DEVICE_ATTR_RW(create_ram_region); @@ -2665,7 +2674,7 @@ EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL"); struct cxl_poison_context { struct cxl_port *port; - enum cxl_decoder_mode mode; + int part; u64 offset; }; @@ -2673,49 +2682,45 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd, struct cxl_poison_context *ctx) { struct cxl_dev_state *cxlds = cxlmd->cxlds; + const struct resource *res; + struct resource *p, *last; u64 offset, length; int rc = 0; + if (ctx->part < 0) + return 0; + /* - * Collect poison for the remaining unmapped resources - * after poison is collected by committed endpoints. - * - * Knowing that PMEM must always follow RAM, get poison - * for unmapped resources based on the last decoder's mode: - * ram: scan remains of ram range, then any pmem range - * pmem: scan remains of pmem range + * Collect poison for the remaining unmapped resources after + * poison is collected by committed endpoint decoders.
*/ - - if (ctx->mode == CXL_DECODER_RAM) { - offset = ctx->offset; - length = cxl_ram_size(cxlds) - offset; + for (int i = ctx->part; i < cxlds->nr_partitions; i++) { + res = &cxlds->part[i].res; + for (p = res->child, last = NULL; p; p = p->sibling) + last = p; + if (last) + offset = last->end + 1; + else + offset = res->start; + length = res->end - offset + 1; + if (!length) + break; rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); - if (rc == -EFAULT) - rc = 0; + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM) + continue; if (rc) - return rc; - } - if (ctx->mode == CXL_DECODER_PMEM) { - offset = ctx->offset; - length = resource_size(&cxlds->dpa_res) - offset; - if (!length) - return 0; - } else if (cxl_pmem_size(cxlds)) { - const struct resource *res = to_pmem_res(cxlds); - - offset = res->start; - length = resource_size(res); - } else { - return 0; + break; } - return cxl_mem_get_poison(cxlmd, offset, length, NULL); + return rc; } static int poison_by_decoder(struct device *dev, void *arg) { struct cxl_poison_context *ctx = arg; struct cxl_endpoint_decoder *cxled; + enum cxl_partition_mode mode; + struct cxl_dev_state *cxlds; struct cxl_memdev *cxlmd; u64 offset, length; int rc = 0; @@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg) return rc; cxlmd = cxled_to_memdev(cxled); + cxlds = cxlmd->cxlds; + if (cxled->part < 0) + mode = CXL_PARTMODE_NONE; + else + mode = cxlds->part[cxled->part].mode; + if (cxled->skip) { offset = cxled->dpa_res->start - cxled->skip; length = cxled->skip; rc = cxl_mem_get_poison(cxlmd, offset, length, NULL); - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM) + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM) rc = 0; if (rc) return rc; @@ -2741,7 +2752,7 @@ static int poison_by_decoder(struct device *dev, void *arg) offset = cxled->dpa_res->start; length = cxled->dpa_res->end - offset + 1; rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region); - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM) + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM) rc = 0; if (rc) return rc; @@ -2749,7 +2760,7 @@ static int poison_by_decoder(struct device *dev, void *arg) /* Iterate until commit_end is reached */ if (cxled->cxld.id == ctx->port->commit_end) { ctx->offset = cxled->dpa_res->end + 1; - ctx->mode = cxled->mode; + ctx->part = cxled->part; return 1; } @@ -2762,7 +2773,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port) int rc = 0; ctx = (struct cxl_poison_context) { - .port = port + .port = port, + .part = -1, }; rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder); @@ -3206,14 +3218,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxlrd_to_port(cxlrd); + struct cxl_dev_state *cxlds = cxlmd->cxlds; struct range *hpa = &cxled->cxld.hpa_range; + int rc, part = READ_ONCE(cxled->part); struct cxl_region_params *p; struct cxl_region *cxlr; struct resource *res; - int rc; + + if (part < 0) + return ERR_PTR(-EBUSY); do { - cxlr = __create_region(cxlrd, cxled->mode, + cxlr = __create_region(cxlrd, cxlds->part[part].mode, atomic_read(&cxlrd->region_id)); } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY); @@ -3416,9 +3432,9 @@ static int cxl_region_probe(struct device *dev) return rc; switch (cxlr->mode) { - case CXL_DECODER_PMEM: + case CXL_PARTMODE_PMEM: return devm_cxl_add_pmem_region(cxlr); - case CXL_DECODER_RAM: + case CXL_PARTMODE_RAM: /* * The region can not be manged by CXL if 
any portion of * it is already online as 'System RAM' diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 4d0550367042..cb6f0b761b24 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -371,30 +371,6 @@ struct cxl_decoder { void (*reset)(struct cxl_decoder *cxld); }; -/* - * CXL_DECODER_DEAD prevents endpoints from being reattached to regions - * while cxld_unregister() is running - */ -enum cxl_decoder_mode { - CXL_DECODER_NONE, - CXL_DECODER_RAM, - CXL_DECODER_PMEM, - CXL_DECODER_DEAD, -}; - -static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) -{ - static const char * const names[] = { - [CXL_DECODER_NONE] = "none", - [CXL_DECODER_RAM] = "ram", - [CXL_DECODER_PMEM] = "pmem", - }; - - if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD) - return names[mode]; - return "mixed"; -} - /* * Track whether this decoder is reserved for region autodiscovery, or * free for userspace provisioning. @@ -409,16 +385,16 @@ enum cxl_decoder_state { * @cxld: base cxl_decoder_object * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps - * @mode: which memory type / access-mode-partition this decoder targets * @state: autodiscovery state + * @part: partition index this decoder maps * @pos: interleave position in @cxld.region */ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; - enum cxl_decoder_mode mode; enum cxl_decoder_state state; + int part; int pos; }; @@ -503,6 +479,12 @@ struct cxl_region_params { int nr_targets; }; +enum cxl_partition_mode { + CXL_PARTMODE_NONE, + CXL_PARTMODE_RAM, + CXL_PARTMODE_PMEM, +}; + /* * Indicate whether this region has been assembled by autodetection or * userspace assembly. Prevent endpoint decoders outside of automatic @@ -522,7 +504,7 @@ struct cxl_region_params { * struct cxl_region - CXL region * @dev: This region's device * @id: This region's id. 
Id is globally unique across all regions - * @mode: Endpoint decoder allocation / access mode + * @mode: Operational mode of the mapped capacity * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge @@ -535,7 +517,7 @@ struct cxl_region_params { struct cxl_region { struct device dev; int id; - enum cxl_decoder_mode mode; + enum cxl_partition_mode mode; enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index bad99456e901..f218d43dec9f 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -97,12 +97,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped); -enum cxl_partition_mode { - CXL_PARTMODE_NONE, - CXL_PARTMODE_RAM, - CXL_PARTMODE_PMEM, -}; - #define CXL_NR_PARTITIONS_MAX 2 struct cxl_dpa_info { @@ -530,20 +524,6 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds) return resource_size(res); } -/* - * Translate the operational mode of memory capacity with the - * operational mode of a decoder - * TODO: kill 'enum cxl_decoder_mode' to obviate this helper - */ -static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode) -{ - if (mode == CXL_PARTMODE_RAM) - return CXL_DECODER_RAM; - if (mode == CXL_PARTMODE_PMEM) - return CXL_DECODER_PMEM; - return CXL_DECODER_NONE; -} - static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) { return dev_get_drvdata(cxl_mbox->host);
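The lookup that replaces mode validation in cxl_dpa_set_part() reduces to a simple search over the partition array. As one more hedged sketch with simplified stand-in types (not the kernel's):

/* Not kernel code: return the first partition index operating in @mode, or -1 */
#include <stdio.h>

enum cxl_partition_mode { CXL_PARTMODE_NONE, CXL_PARTMODE_RAM, CXL_PARTMODE_PMEM };

struct dev_state {
	enum cxl_partition_mode mode[2];
	int nr_partitions;
};

static int find_part(const struct dev_state *cxlds, enum cxl_partition_mode mode)
{
	for (int i = 0; i < cxlds->nr_partitions; i++)
		if (cxlds->mode[i] == mode)
			return i;
	return -1;
}

int main(void)
{
	/* dual-partition device: ram at partition0, pmem at partition1 */
	struct dev_state cxlds = {
		.mode = { CXL_PARTMODE_RAM, CXL_PARTMODE_PMEM },
		.nr_partitions = 2,
	};

	printf("ram=%d pmem=%d none=%d\n",
	       find_part(&cxlds, CXL_PARTMODE_RAM),   /* 0 */
	       find_part(&cxlds, CXL_PARTMODE_PMEM),  /* 1 */
	       find_part(&cxlds, CXL_PARTMODE_NONE)); /* -1 */
	return 0;
}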