diff mbox series

[v5,4/4] cxl: Add checks to access_coordinate calculation to fail missing data

Message ID 20240325230234.1847525-5-dave.jiang@intel.com
State Superseded
Headers show
Series cxl: access_coordinate validity fixes for 6.9 | expand

Commit Message

Dave Jiang March 25, 2024, 11 p.m. UTC
Jonathan noted that when the coordinates for host bridge and switches
can be 0s if no actual data are retrieved and the calculation continues.
The resulting number would be inaccurate. Add checks to ensure that the
calculation would complete only if the numbers are valid.

While not seen in the wild, issue may show up with a BIOS that reported
CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).

Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v5:
- Adjust for multiple access classes.
---
 drivers/cxl/core/port.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

Comments

Dan Williams April 1, 2024, 7:37 p.m. UTC | #1
Dave Jiang wrote:
> Jonathan noted that when the coordinates for host bridge and switches
> can be 0s if no actual data are retrieved and the calculation continues.
> The resulting number would be inaccurate. Add checks to ensure that the
> calculation would complete only if the numbers are valid.
> 
> While not seen in the wild, issue may show up with a BIOS that reported
> CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).
> 
> Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> v5:
> - Adjust for multiple access classes.
> ---
>  drivers/cxl/core/port.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index e388f50675f8..ca5bc9ed6d2f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2141,6 +2141,18 @@ static void add_latency(struct access_coordinate *c, long latency)
>  	}
>  }
>  
> +static bool coordinates_valid(struct access_coordinate *c)
> +{
> +	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
> +		if (c[i].read_bandwidth && c[i].write_bandwidth &&
> +		    c[i].read_latency && c[i].write_latency)
> +			continue;
> +		return false;

With the observation that a coordinate tracks nanoseconds while the CDAT
reports picoseconds I wonder what this check does for sub-nanosecond
latency values. Does that need a min_not_zero(1, latency) somewhere...
and if that is in place does it mean latency can never be zero?
diff mbox series

Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e388f50675f8..ca5bc9ed6d2f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2141,6 +2141,18 @@  static void add_latency(struct access_coordinate *c, long latency)
 	}
 }
 
+static bool coordinates_valid(struct access_coordinate *c)
+{
+	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+		if (c[i].read_bandwidth && c[i].write_bandwidth &&
+		    c[i].read_latency && c[i].write_latency)
+			continue;
+		return false;
+	}
+
+	return true;
+}
+
 static void set_min_bandwidth(struct access_coordinate *c, unsigned int bw)
 {
 	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
@@ -2156,6 +2168,11 @@  static void set_access_coordinates(struct access_coordinate *out,
 		out[i] = in[i];
 }
 
+static bool parent_port_is_cxl_root(struct cxl_port *port)
+{
+	return is_cxl_root(to_cxl_port(port->dev.parent));
+}
+
 /**
  * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
  *				   of CXL path
@@ -2185,21 +2202,20 @@  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 	if (!is_cxl_endpoint(port))
 		return -EINVAL;
 
-	dport = iter->parent_dport;
-
 	/*
-	 * Exit the loop when the parent port of the current port is cxl root.
-	 * The iterative loop starts at the endpoint and gathers the
-	 * latency of the CXL link from the current iter to the next downstream
-	 * port each iteration. If the parent is cxl root then there is
-	 * nothing to gather.
+	 * Exit the loop when the parent port of the current iter port is cxl
+	 * root. The iterative loop starts at the endpoint and gathers the
+	 * latency of the CXL link from the current device/port to the connected
+	 * downstream port each iteration.
 	 */
-	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
-		iter = to_cxl_port(iter->dev.parent);
+	do {
+		dport = iter->parent_dport;
+		if (!coordinates_valid(dport->coord))
+			return -EINVAL;
 		cxl_coordinates_combine(c, c, dport->coord);
 		add_latency(c, dport->link_latency);
-		dport = iter->parent_dport;
-	}
+		iter = to_cxl_port(iter->dev.parent);
+	} while (!parent_port_is_cxl_root(iter));
 
 	/* Get the calculated PCI paths bandwidth */
 	pdev = to_pci_dev(port->uport_dev->parent);