diff mbox series

tracing: Include PPIN in mce_record tracepoint

Message ID 20240123235150.3744089-1-avadhut.naik@amd.com (mailing list archive)
State Handled Elsewhere
Headers show
Series tracing: Include PPIN in mce_record tracepoint | expand

Commit Message

Avadhut Naik Jan. 23, 2024, 11:51 p.m. UTC
Machine Check Error information from struct mce is exported to userspace
through the mce_record tracepoint.

Currently, however, the PPIN (Protected Processor Inventory Number) field
of struct mce is not exported through the tracepoint.

Export PPIN through the tracepoint as it may provide useful information
for debug and analysis.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
 include/trace/events/mce.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


base-commit: 451b2bc29430fa147e36a48348f8b6b615fd6820

Comments

Tony Luck Jan. 24, 2024, 12:12 a.m. UTC | #1
On Tue, Jan 23, 2024 at 05:51:50PM -0600, Avadhut Naik wrote:
> Machine Check Error information from struct mce is exported to userspace
> through the mce_record tracepoint.
> 
> Currently, however, the PPIN (Protected Processor Inventory Number) field
> of struct mce is not exported through the tracepoint.
> 
> Export PPIN through the tracepoint as it may provide useful information
> for debug and analysis.

Awesome. I've been meaning to update the tracepoint for ages, but
it never gets to the top of the queue.

But some questions:

1) Are tracepoints a user visible ABI? Adding a new field in the middle
feels like it might be problematic. I asked this question many years
ago and Steven Rostedt said there was some tracing library in the works
that would make this OK for appplications using that library.

2) While you are adding to the tracepoint, should we batch up all
the useful changes that have been made to "struct mce". I think the
new fields that might be of use are:

        __u64 synd;             /* MCA_SYND MSR: only valid on SMCA systems */
        __u64 ipid;             /* MCA_IPID MSR: only valid on SMCA systems */
        __u64 ppin;             /* Protected Processor Inventory Number */
        __u32 microcode;        /* Microcode revision */

> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
>  include/trace/events/mce.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
> index 1391ada0da3b..657b93ec8176 100644
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record,
>  		__field(	u64,		ipid		)
>  		__field(	u64,		ip		)
>  		__field(	u64,		tsc		)
> +		__field(	u64,		ppin	)
>  		__field(	u64,		walltime	)
>  		__field(	u32,		cpu		)
>  		__field(	u32,		cpuid		)
> @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record,
>  		__entry->ipid		= m->ipid;
>  		__entry->ip		= m->ip;
>  		__entry->tsc		= m->tsc;
> +		__entry->ppin		= m->ppin;
>  		__entry->walltime	= m->time;
>  		__entry->cpu		= m->extcpu;
>  		__entry->cpuid		= m->cpuid;
> @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record,
>  		__entry->cpuvendor	= m->cpuvendor;
>  	),

... rest of patch trimmed.

-Tony
Naik, Avadhut Jan. 24, 2024, 1:29 a.m. UTC | #2
Hi,

On 1/23/2024 6:12 PM, Tony Luck wrote:
> On Tue, Jan 23, 2024 at 05:51:50PM -0600, Avadhut Naik wrote:
>> Machine Check Error information from struct mce is exported to userspace
>> through the mce_record tracepoint.
>>
>> Currently, however, the PPIN (Protected Processor Inventory Number) field
>> of struct mce is not exported through the tracepoint.
>>
>> Export PPIN through the tracepoint as it may provide useful information
>> for debug and analysis.
> 
> Awesome. I've been meaning to update the tracepoint for ages, but
> it never gets to the top of the queue.
> 
> But some questions:
> 
> 1) Are tracepoints a user visible ABI? Adding a new field in the middle
> feels like it might be problematic. I asked this question many years
> ago and Steven Rostedt said there was some tracing library in the works
> that would make this OK for appplications using that library.
> 

I think they can be user visible through the "trace" and "trace_pipe" in
/sys/kernel/debug/tracing. But you will have to enable the events you want
to trace through /sys/kernel/debug/tracing/events/<event-name>/enable.

AFAIK, this (adding field in the middle) shouldn't be problematic as we
have the tracepoint format available in debugfs. For e.g. with this patch,
the format is as follows:

[root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format 
name: mce_record
ID: 113
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 mcgcap;       offset:8;       size:8; signed:0;
        field:u64 mcgstatus;    offset:16;      size:8; signed:0;
        field:u64 status;       offset:24;      size:8; signed:0;
        field:u64 addr; offset:32;      size:8; signed:0;
        field:u64 misc; offset:40;      size:8; signed:0;
        field:u64 synd; offset:48;      size:8; signed:0;
        field:u64 ipid; offset:56;      size:8; signed:0;
        field:u64 ip;   offset:64;      size:8; signed:0;
        field:u64 tsc;  offset:72;      size:8; signed:0;
        field:u64 ppin; offset:80;      size:8; signed:0;
        field:u64 walltime;     offset:88;      size:8; signed:0;
        field:u32 cpu;  offset:96;      size:4; signed:0;
        field:u32 cpuid;        offset:100;     size:4; signed:0;
        field:u32 apicid;       offset:104;     size:4; signed:0;
        field:u32 socketid;     offset:108;     size:4; signed:0;
        field:u8 cs;    offset:112;     size:1; signed:0;
        field:u8 bank;  offset:113;     size:1; signed:0;
        field:u8 cpuvendor;     offset:114;     size:1; signed:0;

print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid


Just quickly tried with rasdaemon and things seem to be okay.

Also, not a cent percent sure, but the library you are mentioning of, I think
its the libtraceevent library and IIUC, it utilizes the above tracepoint format.

> 2) While you are adding to the tracepoint, should we batch up all
> the useful changes that have been made to "struct mce". I think the
> new fields that might be of use are:
> 
>         __u64 synd;             /* MCA_SYND MSR: only valid on SMCA systems */
>         __u64 ipid;             /* MCA_IPID MSR: only valid on SMCA systems */
>         __u64 ppin;             /* Protected Processor Inventory Number */
>         __u32 microcode;        /* Microcode revision */
> 

synd and ipid are already a part of mce_record tracepoint. (They too have been
added in the middle). Will add the microcode field in the next version.

>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>>  include/trace/events/mce.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
>> index 1391ada0da3b..657b93ec8176 100644
>> --- a/include/trace/events/mce.h
>> +++ b/include/trace/events/mce.h
>> @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record,
>>  		__field(	u64,		ipid		)
>>  		__field(	u64,		ip		)
>>  		__field(	u64,		tsc		)
>> +		__field(	u64,		ppin	)
>>  		__field(	u64,		walltime	)
>>  		__field(	u32,		cpu		)
>>  		__field(	u32,		cpuid		)
>> @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record,
>>  		__entry->ipid		= m->ipid;
>>  		__entry->ip		= m->ip;
>>  		__entry->tsc		= m->tsc;
>> +		__entry->ppin		= m->ppin;
>>  		__entry->walltime	= m->time;
>>  		__entry->cpu		= m->extcpu;
>>  		__entry->cpuid		= m->cpuid;
>> @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record,
>>  		__entry->cpuvendor	= m->cpuvendor;
>>  	),
> 
> ... rest of patch trimmed.
> 
> -Tony
Steven Rostedt Jan. 24, 2024, 1:38 a.m. UTC | #3
On Tue, 23 Jan 2024 19:29:52 -0600
"Naik, Avadhut" <avadnaik@amd.com> wrote:

> > But some questions:
> > 
> > 1) Are tracepoints a user visible ABI? Adding a new field in the middle
> > feels like it might be problematic. I asked this question many years
> > ago and Steven Rostedt said there was some tracing library in the works
> > that would make this OK for appplications using that library.
> >   
> 
> I think they can be user visible through the "trace" and "trace_pipe" in
> /sys/kernel/debug/tracing. But you will have to enable the events you want
> to trace through /sys/kernel/debug/tracing/events/<event-name>/enable.
> 
> AFAIK, this (adding field in the middle) shouldn't be problematic as we
> have the tracepoint format available in debugfs. For e.g. with this patch,
> the format is as follows:
> 
> [root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format 
> name: mce_record
> ID: 113
> format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
> 
>         field:u64 mcgcap;       offset:8;       size:8; signed:0;
>         field:u64 mcgstatus;    offset:16;      size:8; signed:0;
>         field:u64 status;       offset:24;      size:8; signed:0;
>         field:u64 addr; offset:32;      size:8; signed:0;
>         field:u64 misc; offset:40;      size:8; signed:0;
>         field:u64 synd; offset:48;      size:8; signed:0;
>         field:u64 ipid; offset:56;      size:8; signed:0;
>         field:u64 ip;   offset:64;      size:8; signed:0;
>         field:u64 tsc;  offset:72;      size:8; signed:0;
>         field:u64 ppin; offset:80;      size:8; signed:0;
>         field:u64 walltime;     offset:88;      size:8; signed:0;
>         field:u32 cpu;  offset:96;      size:4; signed:0;
>         field:u32 cpuid;        offset:100;     size:4; signed:0;
>         field:u32 apicid;       offset:104;     size:4; signed:0;
>         field:u32 socketid;     offset:108;     size:4; signed:0;
>         field:u8 cs;    offset:112;     size:1; signed:0;
>         field:u8 bank;  offset:113;     size:1; signed:0;
>         field:u8 cpuvendor;     offset:114;     size:1; signed:0;
> 
> print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid
> 
> 
> Just quickly tried with rasdaemon and things seem to be okay.
> 
> Also, not a cent percent sure, but the library you are mentioning of, I think
> its the libtraceevent library and IIUC, it utilizes the above tracepoint format.
> 

Yes, rasdaemon uses libtraceevent (or a copy of it internally) that
reads the format file to find fields. You can safely add fields to the
middle of the event structure and the parsing will be just fine.

-- Steve
Borislav Petkov Jan. 24, 2024, 9:57 a.m. UTC | #4
On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote:
> Yes, rasdaemon uses libtraceevent (or a copy of it internally) that
> reads the format file to find fields. You can safely add fields to the
> middle of the event structure and the parsing will be just fine.

Should we worry about tools who consume the event "blindly", without the
lib?

I guess no until we break some use case and then we will have to revert.
At least this is what we've done in the past...
Steven Rostedt Jan. 24, 2024, 2:09 p.m. UTC | #5
On Wed, 24 Jan 2024 10:57:08 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote:
> > Yes, rasdaemon uses libtraceevent (or a copy of it internally) that
> > reads the format file to find fields. You can safely add fields to the
> > middle of the event structure and the parsing will be just fine.  
> 
> Should we worry about tools who consume the event "blindly", without the
> lib?

I don't think that's a worry anymore. The offsets can change based on
kernel config. PowerTop needed to have the library ported to it because
it use to hardcode the offsets but then it broke when running the 32bit
version on a 64bit kernel.

> 
> I guess no until we break some use case and then we will have to revert.
> At least this is what we've done in the past...
> 

But that revert was reverted when we converted PowerTop to use libtraceevent.

-- Steve
Borislav Petkov Jan. 25, 2024, 6:54 p.m. UTC | #6
On Wed, Jan 24, 2024 at 09:09:08AM -0500, Steven Rostedt wrote:
> I don't think that's a worry anymore. The offsets can change based on
> kernel config. PowerTop needed to have the library ported to it because
> it use to hardcode the offsets but then it broke when running the 32bit
> version on a 64bit kernel.
> 
> > 
> > I guess no until we break some use case and then we will have to revert.
> > At least this is what we've done in the past...
> > 
> 
> But that revert was reverted when we converted PowerTop to use libtraceevent.

Ok, sounds like a good plan.

/me makes a mental note for the future.

Thx.
diff mbox series

Patch

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 1391ada0da3b..657b93ec8176 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -25,6 +25,7 @@  TRACE_EVENT(mce_record,
 		__field(	u64,		ipid		)
 		__field(	u64,		ip		)
 		__field(	u64,		tsc		)
+		__field(	u64,		ppin	)
 		__field(	u64,		walltime	)
 		__field(	u32,		cpu		)
 		__field(	u32,		cpuid		)
@@ -45,6 +46,7 @@  TRACE_EVENT(mce_record,
 		__entry->ipid		= m->ipid;
 		__entry->ip		= m->ip;
 		__entry->tsc		= m->tsc;
+		__entry->ppin		= m->ppin;
 		__entry->walltime	= m->time;
 		__entry->cpu		= m->extcpu;
 		__entry->cpuid		= m->cpuid;
@@ -55,7 +57,7 @@  TRACE_EVENT(mce_record,
 		__entry->cpuvendor	= m->cpuvendor;
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -63,6 +65,7 @@  TRACE_EVENT(mce_record,
 		__entry->addr, __entry->misc, __entry->synd,
 		__entry->cs, __entry->ip,
 		__entry->tsc,
+		__entry->ppin,
 		__entry->cpuvendor, __entry->cpuid,
 		__entry->walltime,
 		__entry->socketid,