Message ID | 20240123235150.3744089-1-avadhut.naik@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | tracing: Include PPIN in mce_record tracepoint | expand |
On Tue, Jan 23, 2024 at 05:51:50PM -0600, Avadhut Naik wrote: > Machine Check Error information from struct mce is exported to userspace > through the mce_record tracepoint. > > Currently, however, the PPIN (Protected Processor Inventory Number) field > of struct mce is not exported through the tracepoint. > > Export PPIN through the tracepoint as it may provide useful information > for debug and analysis. Awesome. I've been meaning to update the tracepoint for ages, but it never gets to the top of the queue. But some questions: 1) Are tracepoints a user visible ABI? Adding a new field in the middle feels like it might be problematic. I asked this question many years ago and Steven Rostedt said there was some tracing library in the works that would make this OK for appplications using that library. 2) While you are adding to the tracepoint, should we batch up all the useful changes that have been made to "struct mce". I think the new fields that might be of use are: __u64 synd; /* MCA_SYND MSR: only valid on SMCA systems */ __u64 ipid; /* MCA_IPID MSR: only valid on SMCA systems */ __u64 ppin; /* Protected Processor Inventory Number */ __u32 microcode; /* Microcode revision */ > > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- > include/trace/events/mce.h | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h > index 1391ada0da3b..657b93ec8176 100644 > --- a/include/trace/events/mce.h > +++ b/include/trace/events/mce.h > @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record, > __field( u64, ipid ) > __field( u64, ip ) > __field( u64, tsc ) > + __field( u64, ppin ) > __field( u64, walltime ) > __field( u32, cpu ) > __field( u32, cpuid ) > @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record, > __entry->ipid = m->ipid; > __entry->ip = m->ip; > __entry->tsc = m->tsc; > + __entry->ppin = m->ppin; > __entry->walltime = m->time; > __entry->cpu = m->extcpu; > __entry->cpuid = m->cpuid; > @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record, > __entry->cpuvendor = m->cpuvendor; > ), ... rest of patch trimmed. -Tony
Hi, On 1/23/2024 6:12 PM, Tony Luck wrote: > On Tue, Jan 23, 2024 at 05:51:50PM -0600, Avadhut Naik wrote: >> Machine Check Error information from struct mce is exported to userspace >> through the mce_record tracepoint. >> >> Currently, however, the PPIN (Protected Processor Inventory Number) field >> of struct mce is not exported through the tracepoint. >> >> Export PPIN through the tracepoint as it may provide useful information >> for debug and analysis. > > Awesome. I've been meaning to update the tracepoint for ages, but > it never gets to the top of the queue. > > But some questions: > > 1) Are tracepoints a user visible ABI? Adding a new field in the middle > feels like it might be problematic. I asked this question many years > ago and Steven Rostedt said there was some tracing library in the works > that would make this OK for appplications using that library. > I think they can be user visible through the "trace" and "trace_pipe" in /sys/kernel/debug/tracing. But you will have to enable the events you want to trace through /sys/kernel/debug/tracing/events/<event-name>/enable. AFAIK, this (adding field in the middle) shouldn't be problematic as we have the tracepoint format available in debugfs. For e.g. with this patch, the format is as follows: [root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format name: mce_record ID: 113 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:u64 mcgcap; offset:8; size:8; signed:0; field:u64 mcgstatus; offset:16; size:8; signed:0; field:u64 status; offset:24; size:8; signed:0; field:u64 addr; offset:32; size:8; signed:0; field:u64 misc; offset:40; size:8; signed:0; field:u64 synd; offset:48; size:8; signed:0; field:u64 ipid; offset:56; size:8; signed:0; field:u64 ip; offset:64; size:8; signed:0; field:u64 tsc; offset:72; size:8; signed:0; field:u64 ppin; offset:80; size:8; signed:0; field:u64 walltime; offset:88; size:8; signed:0; field:u32 cpu; offset:96; size:4; signed:0; field:u32 cpuid; offset:100; size:4; signed:0; field:u32 apicid; offset:104; size:4; signed:0; field:u32 socketid; offset:108; size:4; signed:0; field:u8 cs; offset:112; size:1; signed:0; field:u8 bank; offset:113; size:1; signed:0; field:u8 cpuvendor; offset:114; size:1; signed:0; print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid Just quickly tried with rasdaemon and things seem to be okay. Also, not a cent percent sure, but the library you are mentioning of, I think its the libtraceevent library and IIUC, it utilizes the above tracepoint format. > 2) While you are adding to the tracepoint, should we batch up all > the useful changes that have been made to "struct mce". I think the > new fields that might be of use are: > > __u64 synd; /* MCA_SYND MSR: only valid on SMCA systems */ > __u64 ipid; /* MCA_IPID MSR: only valid on SMCA systems */ > __u64 ppin; /* Protected Processor Inventory Number */ > __u32 microcode; /* Microcode revision */ > synd and ipid are already a part of mce_record tracepoint. (They too have been added in the middle). Will add the microcode field in the next version. >> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> >> --- >> include/trace/events/mce.h | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h >> index 1391ada0da3b..657b93ec8176 100644 >> --- a/include/trace/events/mce.h >> +++ b/include/trace/events/mce.h >> @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record, >> __field( u64, ipid ) >> __field( u64, ip ) >> __field( u64, tsc ) >> + __field( u64, ppin ) >> __field( u64, walltime ) >> __field( u32, cpu ) >> __field( u32, cpuid ) >> @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record, >> __entry->ipid = m->ipid; >> __entry->ip = m->ip; >> __entry->tsc = m->tsc; >> + __entry->ppin = m->ppin; >> __entry->walltime = m->time; >> __entry->cpu = m->extcpu; >> __entry->cpuid = m->cpuid; >> @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record, >> __entry->cpuvendor = m->cpuvendor; >> ), > > ... rest of patch trimmed. > > -Tony
On Tue, 23 Jan 2024 19:29:52 -0600 "Naik, Avadhut" <avadnaik@amd.com> wrote: > > But some questions: > > > > 1) Are tracepoints a user visible ABI? Adding a new field in the middle > > feels like it might be problematic. I asked this question many years > > ago and Steven Rostedt said there was some tracing library in the works > > that would make this OK for appplications using that library. > > > > I think they can be user visible through the "trace" and "trace_pipe" in > /sys/kernel/debug/tracing. But you will have to enable the events you want > to trace through /sys/kernel/debug/tracing/events/<event-name>/enable. > > AFAIK, this (adding field in the middle) shouldn't be problematic as we > have the tracepoint format available in debugfs. For e.g. with this patch, > the format is as follows: > > [root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format > name: mce_record > ID: 113 > format: > field:unsigned short common_type; offset:0; size:2; signed:0; > field:unsigned char common_flags; offset:2; size:1; signed:0; > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > field:int common_pid; offset:4; size:4; signed:1; > > field:u64 mcgcap; offset:8; size:8; signed:0; > field:u64 mcgstatus; offset:16; size:8; signed:0; > field:u64 status; offset:24; size:8; signed:0; > field:u64 addr; offset:32; size:8; signed:0; > field:u64 misc; offset:40; size:8; signed:0; > field:u64 synd; offset:48; size:8; signed:0; > field:u64 ipid; offset:56; size:8; signed:0; > field:u64 ip; offset:64; size:8; signed:0; > field:u64 tsc; offset:72; size:8; signed:0; > field:u64 ppin; offset:80; size:8; signed:0; > field:u64 walltime; offset:88; size:8; signed:0; > field:u32 cpu; offset:96; size:4; signed:0; > field:u32 cpuid; offset:100; size:4; signed:0; > field:u32 apicid; offset:104; size:4; signed:0; > field:u32 socketid; offset:108; size:4; signed:0; > field:u8 cs; offset:112; size:1; signed:0; > field:u8 bank; offset:113; size:1; signed:0; > field:u8 cpuvendor; offset:114; size:1; signed:0; > > print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid > > > Just quickly tried with rasdaemon and things seem to be okay. > > Also, not a cent percent sure, but the library you are mentioning of, I think > its the libtraceevent library and IIUC, it utilizes the above tracepoint format. > Yes, rasdaemon uses libtraceevent (or a copy of it internally) that reads the format file to find fields. You can safely add fields to the middle of the event structure and the parsing will be just fine. -- Steve
On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote: > Yes, rasdaemon uses libtraceevent (or a copy of it internally) that > reads the format file to find fields. You can safely add fields to the > middle of the event structure and the parsing will be just fine. Should we worry about tools who consume the event "blindly", without the lib? I guess no until we break some use case and then we will have to revert. At least this is what we've done in the past...
On Wed, 24 Jan 2024 10:57:08 +0100 Borislav Petkov <bp@alien8.de> wrote: > On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote: > > Yes, rasdaemon uses libtraceevent (or a copy of it internally) that > > reads the format file to find fields. You can safely add fields to the > > middle of the event structure and the parsing will be just fine. > > Should we worry about tools who consume the event "blindly", without the > lib? I don't think that's a worry anymore. The offsets can change based on kernel config. PowerTop needed to have the library ported to it because it use to hardcode the offsets but then it broke when running the 32bit version on a 64bit kernel. > > I guess no until we break some use case and then we will have to revert. > At least this is what we've done in the past... > But that revert was reverted when we converted PowerTop to use libtraceevent. -- Steve
On Wed, Jan 24, 2024 at 09:09:08AM -0500, Steven Rostedt wrote: > I don't think that's a worry anymore. The offsets can change based on > kernel config. PowerTop needed to have the library ported to it because > it use to hardcode the offsets but then it broke when running the 32bit > version on a 64bit kernel. > > > > > I guess no until we break some use case and then we will have to revert. > > At least this is what we've done in the past... > > > > But that revert was reverted when we converted PowerTop to use libtraceevent. Ok, sounds like a good plan. /me makes a mental note for the future. Thx.
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h index 1391ada0da3b..657b93ec8176 100644 --- a/include/trace/events/mce.h +++ b/include/trace/events/mce.h @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record, __field( u64, ipid ) __field( u64, ip ) __field( u64, tsc ) + __field( u64, ppin ) __field( u64, walltime ) __field( u32, cpu ) __field( u32, cpuid ) @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record, __entry->ipid = m->ipid; __entry->ip = m->ip; __entry->tsc = m->tsc; + __entry->ppin = m->ppin; __entry->walltime = m->time; __entry->cpu = m->extcpu; __entry->cpuid = m->cpuid; @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record, __entry->cpuvendor = m->cpuvendor; ), - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", __entry->cpu, __entry->mcgcap, __entry->mcgstatus, __entry->bank, __entry->status, @@ -63,6 +65,7 @@ TRACE_EVENT(mce_record, __entry->addr, __entry->misc, __entry->synd, __entry->cs, __entry->ip, __entry->tsc, + __entry->ppin, __entry->cpuvendor, __entry->cpuid, __entry->walltime, __entry->socketid,
Machine Check Error information from struct mce is exported to userspace through the mce_record tracepoint. Currently, however, the PPIN (Protected Processor Inventory Number) field of struct mce is not exported through the tracepoint. Export PPIN through the tracepoint as it may provide useful information for debug and analysis. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> --- include/trace/events/mce.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) base-commit: 451b2bc29430fa147e36a48348f8b6b615fd6820