
[v5,0/6] Update Event Records to CXL spec rev 3.1

Message ID 20250110115556.1654-1-shiju.jose@huawei.com
Series: Update Event Records to CXL spec rev 3.1

Message

Shiju Jose Jan. 10, 2025, 11:55 a.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Update the CXL event records and CXL trace event implementations for the
changes in CXL spec rev 3.1.

Shiju Jose (6):
  cxl/events: Update Common Event Record to CXL spec rev 3.1
  cxl/events: Add Component Identifier formatting for CXL spec rev 3.1
  cxl/events: Update General Media Event Record to CXL spec rev 3.1
  cxl/events: Update DRAM Event Record to CXL spec rev 3.1
  cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
  cxl/test: Update test code for event records to CXL spec rev 3.1

Changes:
V4 -> V5
1. Reverted the changes made in v4 to work around the parsing error seen
when libtraceevent in userspace parses the CXL trace events for rasdaemon.
The error occurred because the trace event's format file is larger than
PAGE_SIZE, and the kernel does not support reading the complete format file
in one go; this is therefore fixed in rasdaemon instead.
2. Rebased to v6.13-rc5.
3. Tested with rasdaemon and ras-mc-ctl tools updated for CXL spec rev 3.1
   event record changes.
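Item 1 above attributes the libtraceevent parsing failure to a format file larger than PAGE_SIZE, with the fix made in rasdaemon rather than in the kernel. The sketch below is illustrative only and is not rasdaemon's actual code: it assumes the userspace side of such a fix amounts to calling read() in a loop until EOF instead of trusting a single read() to return the whole file. The helper name read_whole_file() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file into a NUL-terminated heap buffer,
 * calling read() until it returns 0 (EOF). A single read() is not guaranteed
 * to return the whole file (per the cover letter, format files larger than
 * PAGE_SIZE were not returned in one go). Returns NULL on error; on success
 * stores the length in *len and the caller must free() the buffer.
 */
static char *read_whole_file(const char *path, size_t *len)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	size_t cap = 4096, used = 0;
	char *buf = malloc(cap);
	if (!buf)
		goto err_close;

	for (;;) {
		/* Grow the buffer, always keeping one byte spare for the NUL. */
		if (used + 1 >= cap) {
			char *tmp = realloc(buf, cap * 2);
			if (!tmp)
				goto err_free;
			buf = tmp;
			cap *= 2;
		}
		ssize_t n = read(fd, buf + used, cap - used - 1);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			goto err_free;
		}
		if (n == 0)
			break;		/* EOF: the whole file has been read */
		used += (size_t)n;
	}

	buf[used] = '\0';
	close(fd);
	*len = used;
	return buf;

err_free:
	free(buf);
err_close:
	close(fd);
	return NULL;
}
```

As a usage example, it could be pointed at a tracefs path such as /sys/kernel/tracing/events/cxl/cxl_general_media/format (assuming tracefs is mounted at /sys/kernel/tracing); the returned buffer is NUL-terminated and must be released with free().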

V3 -> V4
1. Changes for the parsing error seen when libtraceevent in userspace
parses the CXL trace events for rasdaemon.
The long decoded strings of field values in TP_printk() were found to
cause the issue, apparently due to a buffer overflow/corruption.
Increasing the known buffer sizes in userspace and the kernel did not help.
As a solution, decoding of some fields in TP_printk() was removed
to accommodate the new fields.
Decoding of these fields is added to the userspace tool rasdaemon.

V2 -> V3
1. Changes for feedback from Jonathan.
 - Added printing of the component ID format bit in show_valid_flags().
 - Modified parsing of the component ID in patch [2] and added logging
   of the raw comp-id, comp_id_pldm_flags, PLDM entity ID and
   PLDM resource ID in patches 3 to 4.
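The V2 -> V3 notes mention logging the raw comp-id alongside comp_id_pldm_flags, the PLDM entity ID and the PLDM resource ID. As an illustration only, the sketch below decodes a 16-byte Component Identifier, assuming this PLDM layout: byte 0 holds the flags (bit 0: entity ID valid, bit 1: resource ID valid), bytes 1-6 hold the entity ID and bytes 7-10 the resource ID. That layout, the struct and macro names, and the pldm_format argument (standing in for the Component Identifier format bit in the validity flags) are assumptions, not taken from the patches; check them against the CXL spec rev 3.1 text before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define COMP_ID_LEN 16

/* Hypothetical names; bit positions and byte offsets are assumptions. */
#define PLDM_ENTITY_ID_VALID   (1u << 0)
#define PLDM_RESOURCE_ID_VALID (1u << 1)

struct pldm_comp_id {
	uint8_t flags;          /* raw byte 0, i.e. comp_id_pldm_flags */
	bool entity_valid;
	bool resource_valid;
	uint8_t entity_id[6];   /* assumed bytes 1..6 */
	uint8_t resource_id[4]; /* assumed bytes 7..10 */
};

/*
 * Decode a 16-byte Component Identifier. pldm_format stands in for the
 * Component Identifier format bit of the record's validity flags. Returns
 * false, with *out zeroed, if the ID is not in PLDM format; the caller
 * should then log only the raw bytes. IDs whose valid bit is clear are
 * left as zeros.
 */
static bool decode_pldm_comp_id(const uint8_t comp_id[COMP_ID_LEN],
				bool pldm_format, struct pldm_comp_id *out)
{
	memset(out, 0, sizeof(*out));
	if (!pldm_format)
		return false;

	out->flags = comp_id[0];
	out->entity_valid = comp_id[0] & PLDM_ENTITY_ID_VALID;
	out->resource_valid = comp_id[0] & PLDM_RESOURCE_ID_VALID;
	if (out->entity_valid)
		memcpy(out->entity_id, &comp_id[1], sizeof(out->entity_id));
	if (out->resource_valid)
		memcpy(out->resource_id, &comp_id[7], sizeof(out->resource_id));
	return true;
}
```

Keeping the raw 16 bytes alongside the decoded fields, as the changelog describes, means nothing is lost if a device uses the flags differently from the layout assumed here.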
 
V1 -> V2
1. Changes for feedback from Jonathan.
  - Separate patch for Component Identifier formatting.
  - Moved printing of event sub type after event type.
  - For memory module event, rename sub_type to event_sub_type.
2. Changes for feedback from Alison.
  - Updated the patch subject.
  - Updated CXL test code for CXL spec rev 3.1 event records.
3. Changed logic for Component Identifier formatting and other improvements.

 drivers/cxl/core/trace.h     | 258 +++++++++++++++++++++++++++++------
 include/cxl/event.h          |  28 ++--
 tools/testing/cxl/test/mem.c |  23 +++-
 3 files changed, 256 insertions(+), 53 deletions(-)

Comments

Jonathan Cameron Jan. 10, 2025, 4:06 p.m. UTC | #1
On Fri, 10 Jan 2025 11:55:50 +0000
<shiju.jose@huawei.com> wrote:

> [...]
> 
> Changes:
> V4 -> V5
> 1. Reverted the changes made in v4 to work around the parsing error seen
> when libtraceevent in userspace parses the CXL trace events for rasdaemon.
> The error occurred because the trace event's format file is larger than
> PAGE_SIZE, and the kernel does not support reading the complete format file
> in one go; this is therefore fixed in rasdaemon instead.

Great to see that resolved.

> 2. Rebased to v6.13-rc5.

Should probably say why when doing a rebase to something other than rc1.
In this case this is what cxl.git/next is based on after some fixes earlier
in the cycle so a sensible choice for this set.

As far as I'm concerned this set is ready to go, but more eyes are always good
if anyone has time! The same goes for the rasdaemon series once this is queued
for the kernel.

Jonathan

> 3. Tested with rasdaemon and ras-mc-ctl tools updated for CXL spec rev 3.1
>    event record changes.
Shiju Jose Jan. 10, 2025, 4:46 p.m. UTC | #2
>-----Original Message-----
>From: Jonathan Cameron <jonathan.cameron@huawei.com>
>Sent: 10 January 2025 16:07
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v5 0/6] Update Event Records to CXL spec rev 3.1
>
>On Fri, 10 Jan 2025 11:55:50 +0000
><shiju.jose@huawei.com> wrote:
>
>> [...]
>
>> 2. Rebased to v6.13-rc5.
>
>Should probably say why when doing a rebase to something other than rc1.
>In this case this is what cxl.git/next is based on after some fixes earlier in the
>cycle so a sensible choice for this set.

I checked. These patches apply cleanly on cxl.git/next and build okay.

Thanks,
Shiju
>
>[...]
Dave Jiang Jan. 10, 2025, 8:18 p.m. UTC | #3
On 1/10/25 9:46 AM, Shiju Jose wrote:
>> [...]
>>
>>> 2. Rebased to v6.13-rc5.
>>
>> Should probably say why when doing a rebase to something other than rc1.
>> In this case this is what cxl.git/next is based on after some fixes earlier in the
>> cycle so a sensible choice for this set.
> 
> I checked. These patches apply cleanly on cxl.git/next and build okay.

Hi Shiju,
Can you please apply Ira's suggestions and respin a v6? Thanks!

> 
> Thanks,
> Shiju
>> [...]
>
Shiju Jose Jan. 11, 2025, 9:22 a.m. UTC | #4
>-----Original Message-----
>From: Dave Jiang <dave.jiang@intel.com>
>Sent: 10 January 2025 20:18
>To: Shiju Jose <shiju.jose@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>
>Cc: dan.j.williams@intel.com; alison.schofield@intel.com; nifan.cxl@gmail.com;
>vishal.l.verma@intel.com; ira.weiny@intel.com; dave@stgolabs.net; linux-
>cxl@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm
><linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v5 0/6] Update Event Records to CXL spec rev 3.1
>
>
>
>On 1/10/25 9:46 AM, Shiju Jose wrote:
>> [...]
>
>Hi Shiju,
>Can you please apply Ira's suggestions and respin a v6? Thanks!

Hi Dave,

Sure. I have applied Ira's suggestions; please find v6 of the series here:
https://lore.kernel.org/all/20250111091756.1682-1-shiju.jose@huawei.com/

Thanks,
Shiju

>
[...]
>