[ndctl,v12,0/8] Support poison list retrieval

Message ID cover.1711519822.git.alison.schofield@intel.com

Alison Schofield March 27, 2024, 7:52 p.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Changes since v11:
- Remove needless rc init (DaveJ)
- Update man page examples (Wonjae, Dan)
- Replace fprintf() with err() in libcxl.c (Fan)
- Update stale --poison comment in unit test (Wonjae)
- Move ndctl/cxl/event_trace.c/.h to ndctl/util/ (Dan)
- Constify the pointer in poison_source array declaration
- Move the json flags from the poison_ctx to event_ctx
- Move intro of poison_ctx to parsing patch, Patch 6
- Add unsupported feature err() message in libcxl.c 
- v11: https://lore.kernel.org/cover.1710386468.git.alison.schofield@intel.com/


Begin cover letter:
Add an option to include a memory device's poison list in the cxl-list
JSON output. Offer the option by memdev and by region.

From the man page cxl-list:

       -L, --media-errors
           Include media-error information. The poison list is retrieved from
           the device(s) and media_error records are added to the listing.
           Apply this option to memdevs and regions where devices support the
           poison list capability. "offset:" is relative to the region
           resource when listing by region and is the absolute device DPA when
           listing by memdev. "source:" is one of: External, Internal,
           Injected, Vendor Specific, or Unknown, as defined in CXL
           Specification v3.1 Table 8-140.

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

           More complex cxl list queries can be created by using cxl list object
           and filtering options. The first example below emits all the endpoint
           ports with their decoders and memdevs with media-errors. The second
           example filters that output further by a single memdev.

              # cxl list -DEM -p endpoint --media-errors
              # cxl list -DEM -p mem9 --media-errors
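
Because "offset:" is relative to the region resource when listing by
region, a consumer of the JSON may want to add the two to get an address
within the region's resource range. The following is a minimal sketch of
that arithmetic, not part of this series. The helper names (to_int,
region_error_addresses) are hypothetical, and it assumes the output is
either a single object or an array of objects shaped like the region
example above, with hex values possibly emitted as strings (as with -u).

```python
import json


def to_int(value):
    """Accept a JSON number or a hex/decimal string such as "0x1000"."""
    return value if isinstance(value, int) else int(value, 0)


def region_error_addresses(listing):
    """Yield (region, address, length, source) per media_error record.

    'address' is the region resource plus the record offset. This helper
    is illustrative only; it is not an ndctl or libcxl interface.
    """
    objs = listing if isinstance(listing, list) else [listing]
    for obj in objs:
        if "region" not in obj:
            continue  # memdev offsets are already absolute DPAs
        base = to_int(obj["resource"])
        for rec in obj.get("media_errors", []):
            yield (
                obj["region"],
                base + to_int(rec["offset"]),
                to_int(rec["length"]),
                rec["source"],
            )


# Sample input trimmed from the region5 example in the man page text.
sample = json.loads("""
{
  "region":"region5",
  "resource":"0xf110000000",
  "media_errors":[
    {"offset":"0x1000", "length":64, "source":"Injected"},
    {"offset":"0x2000", "length":64, "source":"Injected"}
  ]
}
""")

rows = list(region_error_addresses(sample))
for region, addr, length, source in rows:
    print(f"{region}: {addr:#x} len {length} ({source})")
```

In practice the input would come from saved "cxl list -r <region>
--media-errors" output rather than an inline string. Memdev listings are
skipped because their offsets are already absolute device DPAs.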


Alison Schofield (8):
  util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/
  util/trace: add an optional pid check to event parsing
  util/trace: pass an event_ctx to its own parse_event method
  util/trace: add helpers to retrieve tep fields by type
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  64 ++++++++++-
 cxl/event_trace.h              |  27 -----
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 195 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  53 +++++++++
 cxl/lib/libcxl.sym             |   2 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   3 +
 cxl/meson.build                |   2 +-
 cxl/monitor.c                  |  11 +-
 test/cxl-poison.sh             | 137 +++++++++++++++++++++++
 test/meson.build               |   2 +
 {cxl => util}/event_trace.c    |  68 +++++++++---
 util/event_trace.h             |  42 +++++++
 14 files changed, 562 insertions(+), 49 deletions(-)
 delete mode 100644 cxl/event_trace.h
 create mode 100644 test/cxl-poison.sh
 rename {cxl => util}/event_trace.c (76%)
 create mode 100644 util/event_trace.h


base-commit: 5e9157d6721a878757f0fe8a3c51f06f9e94934a