[ndctl,v13,0/8] Support poison list retrieval

Message ID cover.1720241079.git.alison.schofield@intel.com

Alison Schofield July 6, 2024, 6:24 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>


Patches 1 & 3 need review.

Changes since v12:
- There were no responses on the mailing list to v12.
- Remove add-on query suggestions from the --media-errors section of
  the man page. Developers continue to debate what the user needs in
  regard to cxl list queries beyond the basic list by memdev and by
  region. The only direct user feedback is that they want the poison
  list capability added to ndctl. Since the command line query and the
  json output are solid, move ahead and get this into the hands of 
  users. Let the user experience drive enhanced queries.
Link to v12:
https://lore.kernel.org/cover.1711519822.git.alison.schofield@intel.com/


Begin cover letter:

Add an option to include a memory device's poison list in the cxl-list
json output. Offer the option by memdev and by region.

From the man page cxl-list:

       -L, --media-errors
           Include media-error information. The poison list is retrieved from
           the device(s) and media_error records are added to the listing.
           Apply this option to memdevs and regions where devices support the
           poison list capability. "offset:" is relative to the region
           resource when listing by region and is the absolute device DPA when
           listing by memdev. "source:" is one of: External, Internal,
           Injected, Vendor Specific, or Unknown, as defined in CXL
           Specification v3.1 Table 8-140.

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }
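
For illustration only (not part of this series): a minimal Python sketch of
how a consumer might turn the region-relative "offset:" values above into
addresses in the same space as the region "resource:". It assumes that
"resource" is the region's base address and that adding "offset" to it is
meaningful for the caller. The helper names to_int() and
region_error_ranges() are hypothetical, and the sample is a trimmed copy of
the region5 listing shown above.

```python
import json

# Trimmed copy of the `cxl list -r region5 --media-errors -u` output above.
SAMPLE = """
{
  "region":"region5",
  "resource":"0xf110000000",
  "media_errors":[
    {"offset":"0x1000","length":64,"source":"Injected"},
    {"offset":"0x2000","length":64,"source":"Injected"}
  ]
}
"""


def to_int(value):
    """Accept a JSON number or a numeric string such as "0x1000"."""
    if isinstance(value, int):
        return value
    return int(value, 0)  # base 0 honors the "0x" prefix


def region_error_ranges(region):
    """Return (start, length, source) tuples, start = resource + offset.

    Assumption: "offset" is relative to the region "resource", as the
    man page text states for listings by region.
    """
    base = to_int(region["resource"])
    ranges = []
    for rec in region.get("media_errors", []):
        start = base + to_int(rec["offset"])
        ranges.append((start, to_int(rec["length"]), rec["source"]))
    return ranges


region = json.loads(SAMPLE)
ranges = region_error_ranges(region)
for start, length, source in ranges:
    print(f"{start:#x} +{length} ({source})")
```

When listing by memdev, "offset:" is already the absolute device DPA, so no
base would be added in that case.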



Alison Schofield (8):
  util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/
  util/trace: add an optional pid check to event parsing
  util/trace: pass an event_ctx to its own parse_event method
  util/trace: add helpers to retrieve tep fields by type
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  56 +++++++++-
 cxl/event_trace.h              |  27 -----
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 195 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  53 +++++++++
 cxl/lib/libcxl.sym             |   6 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   3 +
 cxl/meson.build                |   2 +-
 cxl/monitor.c                  |  11 +-
 test/cxl-poison.sh             | 137 +++++++++++++++++++++++
 test/meson.build               |   2 +
 {cxl => util}/event_trace.c    |  68 +++++++++---
 util/event_trace.h             |  42 +++++++
 14 files changed, 558 insertions(+), 49 deletions(-)
 delete mode 100644 cxl/event_trace.h
 create mode 100644 test/cxl-poison.sh
 rename {cxl => util}/event_trace.c (76%)
 create mode 100644 util/event_trace.h


base-commit: 16f45755f991f4fb6d76fec70a42992426c84234