Message ID | 20240220231402.3156281-1-dave.jiang@intel.com (mailing list archive) |
---|---|
Series | cxl: Add support to report region access coordinates to numa nodes |
On Tue, 20 Feb 2024 16:12:29 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Hi Rafael,
> Please review patches 1-4, 10, 11 and ack if they look ok to you. Thank you!
>
> Hi Greg,
> Please review patches 2 and 11 and ack the numa node bits if they look ok to you. Thank you!

Whilst currently a bit lightweight, I poked this along with the QEMU
Generic Port emulation on gitlab.com/jic23/qemu cxl-2024-03-05 and some
pathological cases from the host side. It works, so

Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

>
> v6:
> - Enhance macros used to reduce code for cxl access coordinates sysfs attrs (Jonathan)
> - Various minor updates and fixes, see per commit details. (Jonathan)
> - Added review tags from Jonathan.
>
> v5:
> - Fix various 0-day issues
> - Remove EXPORT_SYMBOL for cxl_coords_combine() (Dan)
> - Rebased against fixes series for qos_class [1].
>
> v4:
> - Introduce access class 0 and 1 for CXL access coordinates.
> - See individual patches for detailed change log if applicable.
>
> v3:
> - Make attributes not visible if no data. (Jonathan)
> - Fix documentation verbiage. (Jonathan)
> - Check against read bandwidth instead of write bandwidth due to future RO devices. (Jonathan)
> - Export node_set_perf_attrs() to all namespaces. (Jonathan)
> - Remove setting of coordinate access level 1. (Jonathan)
>
> v2:
> - Move calculation function to core/cdat.c due to QTG series changes
> - Make cxlr->coord static (Dan)
> - Move calculation to cxl_region_attach to be under cxl_dpa_rwsem (Dan)
> - Normalize perf latency numbers to nanoseconds (Brice)
> - Update documentation with units and initiator details (Brice, Dan)
> - Fix notifier return values (Dan)
> - Use devm_add_action_or_reset() to unregister memory notifier (Dan)
>
> This series adds support for computing the performance data of a CXL region
> and also updates the performance data to the NUMA node. This series depends
> on the CXL QOS class series that's pending the 6.8 pull request.
>
> CXL memory devices already attached before boot are enumerated by the BIOS.
> The SRAT and HMAT tables are properly set up to include memory regions
> enumerated from those CXL memory devices. For regions not programmed or a
> hot-plugged CXL memory device, the BIOS does not have the relevant
> information and the performance data has to be calculated by the driver
> post region assembly.
>
> According to the numaperf documentation [2] there are 2 access classes defined
> for performance between an initiator node and a memory target node. Access
> class "0" describes attributes between a memory target and the highest
> performing initiator local to the target. In this case the initiator can be
> a CPU or an I/O initiator such as a GPU or NIC. Access class "1" describes
> attributes between a memory target and the nearest CPU node. Both access
> classes are calculated for the CXL memory target and updated for NUMA nodes
> through the HMAT_REPORTING code or directly, depending on whether the NUMA
> node is described by the ACPI SRAT table.
>
> Recall from the qos_class series (v6.8) that the performance data for the
> ranges of a CXL memory device is computed and cached. A CXL memory region can
> be backed by one or more devices. Thus the performance data would be the
> aggregated bandwidth of all devices that back a region and the worst
> latency out of all devices backing the region.
>
> See kernel git branch [3] for convenience.
>
> [1]: https://lore.kernel.org/linux-cxl/20240206190431.1810289-1-dave.jiang@intel.com/T/#t
> [2]: https://www.kernel.org/doc/Documentation/admin-guide/mm/numaperf.rst
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/djiang/linux.git/log/?h=cxl-hmem-report
>
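For readers skimming the thread, the region-level rule stated in the cover
letter (bandwidth aggregated across the backing devices, latency taken as the
worst of them) can be sketched roughly as below. This is only an illustration
of that rule, not the code from the series: struct perf_coord and
region_coord_aggregate() are hypothetical names made up for this example, and
the sketch ignores everything the real calculation would also have to account
for along the path to the device.

```c
#include <stddef.h>

/*
 * Hypothetical per-device performance record (not the kernel's type).
 * Latency is assumed to be in nanoseconds, as the cover letter says the
 * numbers are normalized; bandwidth uses whatever unit the caller supplies.
 */
struct perf_coord {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/*
 * Combine the records of all devices backing one region:
 * bandwidths are summed, latencies take the worst (largest) value.
 */
static struct perf_coord region_coord_aggregate(const struct perf_coord *devs,
						size_t ndevs)
{
	struct perf_coord region = { 0 };
	size_t i;

	for (i = 0; i < ndevs; i++) {
		region.read_bandwidth += devs[i].read_bandwidth;
		region.write_bandwidth += devs[i].write_bandwidth;

		if (devs[i].read_latency > region.read_latency)
			region.read_latency = devs[i].read_latency;
		if (devs[i].write_latency > region.write_latency)
			region.write_latency = devs[i].write_latency;
	}

	return region;
}
```

With two backing devices, the region would report the sum of their read and
write bandwidths and the larger of their read and write latencies.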