Message ID | 20240229002542.634982-2-dave.jiang@intel.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates() | expand |
Dave Jiang wrote: > Jonathan noted that when the coordinates for host bridge and switches > can be 0s if no actual data are retrieved and the calculation continues. > The resulting number would be inaccurate. Add checks to ensure that the > calculation would complete only if the numbers are valid. Similar comment as patch1. This smells like a fix, is this an urgent thing to get into v6.8, i.e. most configurations are busted without this, or is a nice to have fixup for a QEMU effect that may or may not show up in physical systems?
On 2/28/24 5:35 PM, Dan Williams wrote: > Dave Jiang wrote: >> Jonathan noted that when the coordinates for host bridge and switches >> can be 0s if no actual data are retrieved and the calculation continues. >> The resulting number would be inaccurate. Add checks to ensure that the >> calculation would complete only if the numbers are valid. > > Similar comment as patch1. This smells like a fix, is this an urgent > thing to get into v6.8, i.e. most configurations are busted without > this, or is a nice to have fixup for a QEMU effect that may or may not > show up in physical systems? This would only be an issue if the switches do not supply a proper CDAT and/or if BIOS does not provide proper ACPI tables. I think it is only experienced on QEMU currently. I'll add a fix tag if you think it should be a fix.
Dave Jiang wrote: > > > On 2/28/24 5:35 PM, Dan Williams wrote: > > Dave Jiang wrote: > >> Jonathan noted that when the coordinates for host bridge and switches > >> can be 0s if no actual data are retrieved and the calculation continues. > >> The resulting number would be inaccurate. Add checks to ensure that the > >> calculation would complete only if the numbers are valid. > > > > Similar comment as patch1. This smells like a fix, is this an urgent > > thing to get into v6.8, i.e. most configurations are busted without > > this, or is a nice to have fixup for a QEMU effect that may or may not > > show up in physical systems? > > This would only be an issue if the switches do not supply a proper > CDAT and/or if BIOS does not provide proper ACPI tables. I think it is > only experienced on QEMU currently. I'll add a fix tag if you think it > should be a fix. No, but please add that relevant clarification of impact to the changelog. It would have made it clear this is a nice to have thing for v6.9, nothing to worry about for v6.8.
On Wed, 28 Feb 2024 16:44:29 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > Dave Jiang wrote: > > > > > > On 2/28/24 5:35 PM, Dan Williams wrote: > > > Dave Jiang wrote: > > >> Jonathan noted that when the coordinates for host bridge and switches > > >> can be 0s if no actual data are retrieved and the calculation continues. > > >> The resulting number would be inaccurate. Add checks to ensure that the > > >> calculation would complete only if the numbers are valid. > > > > > > Similar comment as patch1. This smells like a fix, is this an urgent > > > thing to get into v6.8, i.e. most configurations are busted without > > > this, or is a nice to have fixup for a QEMU effect that may or may not > > > show up in physical systems? > > > > This would only be an issue if the switches do not supply a proper > > CDAT and/or if BIOS does not provide proper ACPI tables. I think it is > > only experienced on QEMU currently. I'll add a fix tag if you think it > > should be a fix. > > No, but please add that relevant clarification of impact to the > changelog. It would have made it clear this is a nice to have thing for > v6.9, nothing to worry about for v6.8. Not seen in wild, but it might show up with a BIOS that reported CXL root ports via Generic Ports (via a PCI handle in the SRAT entry) rather than CXL Host Bridge (via an ACPI Handle int he SRAT entry). The code doesn't yet support this case (which is fine) but it should cleanly not support it rather than giving wrong numbers. There are other fun corners like embedded CXL switches in the host where the reporting will also be more 'creative'. I expect we will see these eventually but not in first generation or two. I 'think' this would be a reasonable system choice as there is nothing to say that a CXL hostbridge doesn't have a bunch of root ports (potentially on different dies) that have different host initiator to port exit latency and bandwidth characteristics. Whilst no known hardware. I don't think this is a FW bug case only. Not a rush job though, fixes tag + 6.9 will be fine. Jonathan
On Wed, 28 Feb 2024 17:25:42 -0700 Dave Jiang <dave.jiang@intel.com> wrote: > Jonathan noted that when the coordinates for host bridge and switches > can be 0s if no actual data are retrieved and the calculation continues. > The resulting number would be inaccurate. Add checks to ensure that the > calculation would complete only if the numbers are valid. > > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Signed-off-by: Dave Jiang <dave.jiang@intel.com> Hi Dave, Whilst I think the fix is right, it is getting hard to read. Maybe a rethink is needed on how that iteration works? > --- > drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++---- > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index e1d30a885700..2c82fe24b789 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1, > c1->read_latency += c2->read_latency; > } > > +static bool coordinates_invalid(struct access_coordinate *c) > +{ > + if (!c->read_bandwidth && !c->write_bandwidth && > + !c->read_latency && !c->write_latency) > + return true; > + > + return false; > +} > + > +static bool parent_port_is_cxl_root(struct cxl_port *port) > +{ > + return is_cxl_root(to_cxl_port(port->dev.parent)); > +} > + > /** > * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports > * of CXL path > @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, > * port each iteration. If the parent is cxl root then there is > * nothing to gather. > */ > - while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { > - combine_coordinates(&c, &dport->sw_coord); > + while (!parent_port_is_cxl_root(iter)) { > + iter = to_cxl_port(iter->dev.parent); > + > + /* There's no CDAT for the host bridge, so skip if so. */ Comment refers to skipping whereas code is 'doing more' for the other case so this is confusing to me. The inverse of this only occurs on the last iteration I think. Possibly a do / while instead of a while will do it. I'm far from confident though as all the levels of look up have me too confused. do { if (coordinates_invalid(&dport->sw_coord)) return -EINVAL; combine_coordinates(&c, &dport->sw_coord); iter = to_cxl_port(iter->dev.parent); dport = iter->parent_dport; } while (!parent_port_is_cxl_root(iter)); /* Do final link updates */ c.write_latency += dport->link_latency; c.read_latency += dport->link_latency; > + if (!parent_port_is_cxl_root(iter)) { > + if (coordinates_invalid(&dport->sw_coord)) > + return -EINVAL; > + > + combine_coordinates(&c, &dport->sw_coord); > + } > + > c.write_latency += dport->link_latency; > c.read_latency += dport->link_latency; > - > - iter = to_cxl_port(iter->dev.parent); > dport = iter->parent_dport; > } > > /* Augment with the generic port (host bridge) perf data */ > + if (coordinates_invalid(&dport->hb_coord)) > + return -EINVAL; > combine_coordinates(&c, &dport->hb_coord); > > /* Get the calculated PCI paths bandwidth */
On 2/29/24 10:44 AM, Jonathan Cameron wrote: > On Wed, 28 Feb 2024 17:25:42 -0700 > Dave Jiang <dave.jiang@intel.com> wrote: > >> Jonathan noted that when the coordinates for host bridge and switches >> can be 0s if no actual data are retrieved and the calculation continues. >> The resulting number would be inaccurate. Add checks to ensure that the >> calculation would complete only if the numbers are valid. >> >> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > Hi Dave, > > Whilst I think the fix is right, it is getting hard to read. Maybe > a rethink is needed on how that iteration works? > >> --- >> drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++---- >> 1 file changed, 27 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >> index e1d30a885700..2c82fe24b789 100644 >> --- a/drivers/cxl/core/port.c >> +++ b/drivers/cxl/core/port.c >> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1, >> c1->read_latency += c2->read_latency; >> } >> >> +static bool coordinates_invalid(struct access_coordinate *c) >> +{ >> + if (!c->read_bandwidth && !c->write_bandwidth && >> + !c->read_latency && !c->write_latency) >> + return true; >> + >> + return false; >> +} >> + >> +static bool parent_port_is_cxl_root(struct cxl_port *port) >> +{ >> + return is_cxl_root(to_cxl_port(port->dev.parent)); >> +} >> + >> /** >> * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports >> * of CXL path >> @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, >> * port each iteration. If the parent is cxl root then there is >> * nothing to gather. >> */ >> - while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { >> - combine_coordinates(&c, &dport->sw_coord); >> + while (!parent_port_is_cxl_root(iter)) { >> + iter = to_cxl_port(iter->dev.parent); >> + >> + /* There's no CDAT for the host bridge, so skip if so. */ > > Comment refers to skipping whereas code is 'doing more' for the other case > so this is confusing to me. > > The inverse of this only occurs on the last iteration I think. > > Possibly a do / while instead of a while will do it. > I'm far from confident though as all the levels of look up have > me too confused. So this is somewhat tricky. For example: devices/pci0000:35/0000:35:01.0/0000:37:00.0/mem5 In this case the endpoint is attached to the host bridge without any switches. The endpoint is 0000:37:00.0 and the host bridge down stream port is 0000:35:01.0. In this instance there is no switch and therefore switch CDAT, but there is a valid dport. So we would skip the dport->sw_coord. However, we do need to pick up the link_latency between endpoint and downstream port. So we spend 1 iteration in the loop and skips the dport->sw_coord. devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/0000:c1:00.0/0000:c2:00.0/mem8 Now in this case there's a CXL switch in between. So in first iteration, we pick up the dport->sw_coord. And in second iteration, we skip the dport->sw_coord. However, we would be adding two link_latency. For the link between 0000:c2:00.0 (endpoint) and 0000:c1:00.0 (switch downstream port), and the link between 0000:c0:00.0 (switch upstream port) and 0000:bf:00.0 (host bridge down stream port). So therefore we can't put the sum of link_latency outside of the loop. Not sure how much better this is: do { struct cxl_port *parent_port = to_cxl_port(iter->dev.parent); dport = iter->parent_dport; if (!parent_port_is_cxl_root(parent_port)) { if (coordinates_invalid(&dport->sw_coord)) return -EINVAL; combine_coordinates(&c, &dport->sw_coord); } c.write_latency += dport->link_latency; c.read_latency += dport->link_latency; iter = to_cxl_port(iter->dev.parent); } while (!parent_port_is_cxl_root(iter)); or: do { struct cxl_port *parent_port = to_cxl_port(iter->dev.parent); dport = iter->parent_dport; c.write_latency += dport->link_latency; c.read_latency += dport->link_latency; if (parent_port_is_cxl_root(parent_port)) break; if (coordinates_invalid(&dport->sw_coord)) return -EINVAL; combine_coordinates(&c, &dport->sw_coord); iter = to_cxl_port(iter->dev.parent); } while (!parent_port_is_cxl_root(iter)); > > > do { > if (coordinates_invalid(&dport->sw_coord)) > return -EINVAL; > > combine_coordinates(&c, &dport->sw_coord); > iter = to_cxl_port(iter->dev.parent); > dport = iter->parent_dport; > } while (!parent_port_is_cxl_root(iter)); > /* Do final link updates */ > c.write_latency += dport->link_latency; > c.read_latency += dport->link_latency; > >> + if (!parent_port_is_cxl_root(iter)) { >> + if (coordinates_invalid(&dport->sw_coord)) >> + return -EINVAL; >> + >> + combine_coordinates(&c, &dport->sw_coord); >> + } >> + >> c.write_latency += dport->link_latency; >> c.read_latency += dport->link_latency; >> - >> - iter = to_cxl_port(iter->dev.parent); >> dport = iter->parent_dport; >> } >> >> /* Augment with the generic port (host bridge) perf data */ >> + if (coordinates_invalid(&dport->hb_coord)) >> + return -EINVAL; >> combine_coordinates(&c, &dport->hb_coord); >> >> /* Get the calculated PCI paths bandwidth */ >
On 3/5/24 3:36 PM, Dave Jiang wrote: > > > On 2/29/24 10:44 AM, Jonathan Cameron wrote: >> On Wed, 28 Feb 2024 17:25:42 -0700 >> Dave Jiang <dave.jiang@intel.com> wrote: >> >>> Jonathan noted that when the coordinates for host bridge and switches >>> can be 0s if no actual data are retrieved and the calculation continues. >>> The resulting number would be inaccurate. Add checks to ensure that the >>> calculation would complete only if the numbers are valid. >>> >>> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>> Signed-off-by: Dave Jiang <dave.jiang@intel.com> >> >> Hi Dave, >> >> Whilst I think the fix is right, it is getting hard to read. Maybe >> a rethink is needed on how that iteration works? >> >>> --- >>> drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++---- >>> 1 file changed, 27 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >>> index e1d30a885700..2c82fe24b789 100644 >>> --- a/drivers/cxl/core/port.c >>> +++ b/drivers/cxl/core/port.c >>> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1, >>> c1->read_latency += c2->read_latency; >>> } >>> >>> +static bool coordinates_invalid(struct access_coordinate *c) >>> +{ >>> + if (!c->read_bandwidth && !c->write_bandwidth && >>> + !c->read_latency && !c->write_latency) >>> + return true; >>> + >>> + return false; >>> +} >>> + >>> +static bool parent_port_is_cxl_root(struct cxl_port *port) >>> +{ >>> + return is_cxl_root(to_cxl_port(port->dev.parent)); >>> +} >>> + >>> /** >>> * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports >>> * of CXL path >>> @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, >>> * port each iteration. If the parent is cxl root then there is >>> * nothing to gather. >>> */ >>> - while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { >>> - combine_coordinates(&c, &dport->sw_coord); >>> + while (!parent_port_is_cxl_root(iter)) { >>> + iter = to_cxl_port(iter->dev.parent); >>> + >>> + /* There's no CDAT for the host bridge, so skip if so. */ >> >> Comment refers to skipping whereas code is 'doing more' for the other case >> so this is confusing to me. >> >> The inverse of this only occurs on the last iteration I think. >> >> Possibly a do / while instead of a while will do it. >> I'm far from confident though as all the levels of look up have >> me too confused. > > So this is somewhat tricky. For example: > devices/pci0000:35/0000:35:01.0/0000:37:00.0/mem5 > > In this case the endpoint is attached to the host bridge without any switches. The endpoint is 0000:37:00.0 and the host bridge down stream port is 0000:35:01.0. In this instance there is no switch and therefore switch CDAT, but there is a valid dport. So we would skip the dport->sw_coord. However, we do need to pick up the link_latency between endpoint and downstream port. So we spend 1 iteration in the loop and skips the dport->sw_coord. > > devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/0000:c1:00.0/0000:c2:00.0/mem8 > > Now in this case there's a CXL switch in between. So in first iteration, we pick up the dport->sw_coord. And in second iteration, we skip the dport->sw_coord. However, we would be adding two link_latency. For the link between 0000:c2:00.0 (endpoint) and 0000:c1:00.0 (switch downstream port), and the link between 0000:c0:00.0 (switch upstream port) and 0000:bf:00.0 (host bridge down stream port). So therefore we can't put the sum of link_latency outside of the loop. > > Not sure how much better this is: > > do { > struct cxl_port *parent_port = to_cxl_port(iter->dev.parent); > > dport = iter->parent_dport; > if (!parent_port_is_cxl_root(parent_port)) { > if (coordinates_invalid(&dport->sw_coord)) > return -EINVAL; > > combine_coordinates(&c, &dport->sw_coord); > } > > c.write_latency += dport->link_latency; > c.read_latency += dport->link_latency; > iter = to_cxl_port(iter->dev.parent); > } while (!parent_port_is_cxl_root(iter)); > > or: > > do { > struct cxl_port *parent_port = to_cxl_port(iter->dev.parent); > > dport = iter->parent_dport; > c.write_latency += dport->link_latency; > c.read_latency += dport->link_latency; > > if (parent_port_is_cxl_root(parent_port)) > break; > > if (coordinates_invalid(&dport->sw_coord)) > return -EINVAL; > > combine_coordinates(&c, &dport->sw_coord); > iter = to_cxl_port(iter->dev.parent); > } while (!parent_port_is_cxl_root(iter)); Actually, I think if we just make it dport->coord instead of dport->sw_coord and dport->hb_coord, we can remove the check and everything should work out properly. > > >> >> >> do { >> if (coordinates_invalid(&dport->sw_coord)) >> return -EINVAL; >> >> combine_coordinates(&c, &dport->sw_coord); >> iter = to_cxl_port(iter->dev.parent); >> dport = iter->parent_dport; >> } while (!parent_port_is_cxl_root(iter)); >> /* Do final link updates */ >> c.write_latency += dport->link_latency; >> c.read_latency += dport->link_latency; >> >>> + if (!parent_port_is_cxl_root(iter)) { >>> + if (coordinates_invalid(&dport->sw_coord)) >>> + return -EINVAL; >>> + >>> + combine_coordinates(&c, &dport->sw_coord); >>> + } >>> + >>> c.write_latency += dport->link_latency; >>> c.read_latency += dport->link_latency; >>> - >>> - iter = to_cxl_port(iter->dev.parent); >>> dport = iter->parent_dport; >>> } >>> >>> /* Augment with the generic port (host bridge) perf data */ >>> + if (coordinates_invalid(&dport->hb_coord)) >>> + return -EINVAL; >>> combine_coordinates(&c, &dport->hb_coord); >>> >>> /* Get the calculated PCI paths bandwidth */ >> >
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index e1d30a885700..2c82fe24b789 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1, c1->read_latency += c2->read_latency; } +static bool coordinates_invalid(struct access_coordinate *c) +{ + if (!c->read_bandwidth && !c->write_bandwidth && + !c->read_latency && !c->write_latency) + return true; + + return false; +} + +static bool parent_port_is_cxl_root(struct cxl_port *port) +{ + return is_cxl_root(to_cxl_port(port->dev.parent)); +} + /** * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports * of CXL path @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, * port each iteration. If the parent is cxl root then there is * nothing to gather. */ - while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { - combine_coordinates(&c, &dport->sw_coord); + while (!parent_port_is_cxl_root(iter)) { + iter = to_cxl_port(iter->dev.parent); + + /* There's no CDAT for the host bridge, so skip if so. */ + if (!parent_port_is_cxl_root(iter)) { + if (coordinates_invalid(&dport->sw_coord)) + return -EINVAL; + + combine_coordinates(&c, &dport->sw_coord); + } + c.write_latency += dport->link_latency; c.read_latency += dport->link_latency; - - iter = to_cxl_port(iter->dev.parent); dport = iter->parent_dport; } /* Augment with the generic port (host bridge) perf data */ + if (coordinates_invalid(&dport->hb_coord)) + return -EINVAL; combine_coordinates(&c, &dport->hb_coord); /* Get the calculated PCI paths bandwidth */
Jonathan noted that when the coordinates for host bridge and switches can be 0s if no actual data are retrieved and the calculation continues. The resulting number would be inaccurate. Add checks to ensure that the calculation would complete only if the numbers are valid. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-)