mbox series

[v2,0/2] Add log related mailbox commands

Message ID 20240222172350.512-1-sthanneeru.opensrc@micron.com
Headers show
Series Add log related mailbox commands | expand

Message

Srinivasulu Opensrc Feb. 22, 2024, 5:23 p.m. UTC
From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>

Add support to expose following mailbox commands to userspace
for clearing and populating the Vendor debug log and
Component State dump log in certain scenarios,
allowing for the aggregation of results over time.

1. CXL r3.1 8.2.9.5.3 Get Log Capabilities.
2. CXL r3.1 8.2.9.5.4 Clear Log commands.
3. CXL r3.1 8.2.9.5.6 Get Supported Logs Sub-List.

---
Changes in v2:
- Add descption for exposing these mailbox log commands to ioctl.
- Create seperate patch for 'Clear log'.
- Restrict the ‘Clear log’ action to only apply to
Vendor debug logs and Component state dump logs.
- Rename get log sublist to get supported log sublist.
- Link to v1: https://lore.kernel.org/linux-mm/20240207103634.199-1-sthanneeru.opensrc@micron.com/
---

Srinivasulu Thanneeru (2):
  cxl/mbox: Add Get Log Capabilities and Get Supported Logs Sub-List
    commands
  cxl/mbox: Add Clear Log mailbox command

 drivers/cxl/core/mbox.c      | 13 +++++++++++++
 drivers/cxl/cxlmem.h         |  6 ++++++
 include/uapi/linux/cxl_mem.h |  3 +++
 3 files changed, 22 insertions(+)

Comments

Dan Williams March 4, 2024, 8:39 p.m. UTC | #1
Any specific reason to include linux-mm on this enabling? I suspect this
topic is only of interest to linux-cxl. Lets drop linux-mm from a v3
posting.

sthanneeru.opensrc@ wrote:
> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> 
> Add support to expose following mailbox commands to userspace
> for clearing and populating the Vendor debug log and
> Component State dump log in certain scenarios,
> allowing for the aggregation of results over time.

What is missing for me here is why the ioctl() ABI is suitable for both
of these. The Vendor Debug Log seems straightforward to enable via
ioctl() since there is no background operation entanglement, no
population complexity, and no reference to other payloads.

The Component State Dump has several caveats in my mind for ioctl() not
being a suitable ABI:

1/ Populate Log has an unreasonable expectation that the submitter can
monopolize the mailbox indefinitely. At least that can be solved by
Linux only supporting automatically populated Component State Dump logs.

2/ Automatic log population requires the caller to handle races with
auto-population. The kernel need not export that complexity to
userspace. This is not fatal to the proposal, but it has interactions
with caveat 3.

3/ The Component State Dump format wants to reference events in the
Event Log, if the Event Log has been cleared then, per the spec, the
Component State Dump must not reference the Event Handle. To me that
implies that the code that clears event records must be careful to not
destroy linkage to component state information. That suggests that the
proper place to dump the component state log is an addendum to the
current trace events, before that code clears the event record.

Something roughly like this:

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..498b2a0b3e76 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -941,7 +941,8 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
 }
 
 static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
-                                   enum cxl_event_log_type type)
+                                   enum cxl_event_log_type type,
+                                   struct cxl_component_state_dump *csd)
 {
        struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
        struct device *dev = mds->cxlds.dev;
@@ -977,9 +978,12 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
                if (!nr_rec)
                        break;
 
-               for (i = 0; i < nr_rec; i++)
+               for (i = 0; i < nr_rec; i++) {
                        __cxl_event_trace_record(cxlmd, type,
                                                 &payload->records[i]);
+                       if (is_event_referenced(csd, type, &payload->records[i]))
+                               trace_csd(csd, ...);
+               }
 
                if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
                        trace_cxl_overflow(cxlmd, type, payload);

...but in general this cover letter needs to comment on the long term
suitability of the ABI.
Srinivasulu Opensrc March 5, 2024, 10:16 a.m. UTC | #2
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Tuesday, March 5, 2024 2:10 AM
> To: Srinivasulu Opensrc <sthanneeru.opensrc@micron.com>; linux-
> cxl@vger.kernel.org; linux-mm@kvack.org
> Cc: Jonathan.Cameron@huawei.com; dan.j.williams@intel.com;
> john@jagalactic.com; Eishan Mirakhur <emirakhur@micron.com>; Ajay Joshi
> <ajayjoshi@micron.com>; Ravis OpenSrc <Ravis.OpenSrc@micron.com>;
> Srinivasulu Thanneeru <sthanneeru@micron.com>
> Subject: [EXT] Re: [PATCH v2 0/2] Add log related mailbox commands
> 
> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
> you recognize the sender and were expecting this message.
> 
> 
> Any specific reason to include linux-mm on this enabling? I suspect this
> topic is only of interest to linux-cxl. Lets drop linux-mm from a v3
> posting.

Sorry, it's a copy paste mistake.
Agreed, will drop linux-mm from V3. 
> 
> sthanneeru.opensrc@ wrote:
> > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> >
> > Add support to expose following mailbox commands to userspace
> > for clearing and populating the Vendor debug log and
> > Component State dump log in certain scenarios,
> > allowing for the aggregation of results over time.
> 
> What is missing for me here is why the ioctl() ABI is suitable for both
> of these. The Vendor Debug Log seems straightforward to enable via
> ioctl() since there is no background operation entanglement, no
> population complexity, and no reference to other payloads.
> 
> The Component State Dump has several caveats in my mind for ioctl() not
> being a suitable ABI:
> 
> 1/ Populate Log has an unreasonable expectation that the submitter can
> monopolize the mailbox indefinitely. At least that can be solved by
> Linux only supporting automatically populated Component State Dump logs.
> 
> 2/ Automatic log population requires the caller to handle races with
> auto-population. The kernel need not export that complexity to
> userspace. This is not fatal to the proposal, but it has interactions
> with caveat 3.
> 
> 3/ The Component State Dump format wants to reference events in the
> Event Log, if the Event Log has been cleared then, per the spec, the
> Component State Dump must not reference the Event Handle. To me that
> implies that the code that clears event records must be careful to not
> destroy linkage to component state information. That suggests that the
> proper place to dump the component state log is an addendum to the
> current trace events, before that code clears the event record.
> 
> Something roughly like this:

Thank you for suggesting this approach.
Next version V3, will add "clear log" for Component State Dump separately,
as suggested bellow. Probably as a separate patch.

> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..498b2a0b3e76 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -941,7 +941,8 @@ static int cxl_clear_event_record(struct
> cxl_memdev_state *mds,
>  }
> 
>  static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> -                                   enum cxl_event_log_type type)
> +                                   enum cxl_event_log_type type,
> +                                   struct cxl_component_state_dump *csd)
>  {
>         struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
>         struct device *dev = mds->cxlds.dev;
> @@ -977,9 +978,12 @@ static void cxl_mem_get_records_log(struct
> cxl_memdev_state *mds,
>                 if (!nr_rec)
>                         break;
> 
> -               for (i = 0; i < nr_rec; i++)
> +               for (i = 0; i < nr_rec; i++) {
>                         __cxl_event_trace_record(cxlmd, type,
>                                                  &payload->records[i]);
> +                       if (is_event_referenced(csd, type, &payload->records[i]))
> +                               trace_csd(csd, ...);
> +               }
> 
>                 if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>                         trace_cxl_overflow(cxlmd, type, payload);
> 
> ...but in general this cover letter needs to comment on the long term
> suitability of the ABI.
Will add more details to cover letter in V3.