Message ID | 20230214234426.344960-5-jithu.joseph@intel.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | Add Array BIST test support to IFS | expand |
On 2/14/23 15:44, Jithu Joseph wrote: > +/* MSR_ARRAY_BIST bit fields */ > +union ifs_array { > + u64 data; > + struct { > + u32 array_bitmask :32; > + u32 array_bank :16; > + u32 rsvd :15; > + u32 ctrl_result :1; > + }; > +}; Wow, after all that, the bitfield remains! Even the totally unnecessary parts, like the 32-bit wide bitfield in the 32-bit value. The *LEAST* you can do is this: struct ifs_array { u32 array_bitmask; u16 array_bank u16 rsvd :15; u16 ctrl_result :1; }; to at least minimize the *ENTIRELY* unnecessary bitfields. I'm also not quite sure why I'm going to the trouble of writing another one of these things since the last set of things I suggested were not incorporated. Or, what the obsession is with u32. I also think the union is a bit silly. You could literally just do: msrvals[0] = (u64 *)&activate; or memcpy(&msrvals[0], &activate, sizeof(activate)); and probably end up with exactly the same instructions. There's no type safety here either way. The cast, for instance, at least makes the lack of type safety obvious. > +static void ifs_array_test_core(int cpu, struct device *dev) > +{ > + union ifs_array activate, status = {0}; So, 'status' here is initialized to 0. But, 'activate'... hmmm Here's 1 of the 4 fields getting initialized: > + activate.array_bitmask = ~0U; > + timeout = jiffies + HZ / 2; > + > + do { > + if (time_after(jiffies, timeout)) { > + timed_out = true; > + break; > + } > + > + msrvals[0] = activate.data; and then the *WHOLE* union is read here. What *is* the uninitialized member behavior of a bitfield? I actually haven't the foggiest idea since I never use them. Is there some subtly C voodoo that initializes the other 3 fields? BTW, ifs_test_core() seems to be doing basically the same thing with ifs_status. > + atomic_set(&array_cpus_out, 0); > + stop_core_cpuslocked(cpu, do_array_test, msrvals); > + status.data = msrvals[1]; > + > + if (status.ctrl_result) > + break; > + > + activate.array_bitmask = status.array_bitmask; > + activate.array_bank = status.array_bank; > + > + } while (status.array_bitmask); > + > + ifsd->scan_details = status.data; Beautiful. It passes raw MSR values back out to userspace in sysfs. OK, so all this union.data nonsense is to, in the end, pass two MSR values through an array over to the do_array_test() function. That function, in the end, just does: + if (cpu == first) { + wrmsrl(MSR_ARRAY_BIST, msrs[0]); + /* Pass back the result of the test */ + rdmsrl(MSR_ARRAY_BIST, msrs[1]); + } with them. It doesn't even reuse msrs[0]. It also doesn't bother to even give them symbolic names, like command and result. The msrs[] values are also totally hard coded. I'd probably do something like the attached patch. It gets rid of 'data' and uses sane types for the bitfield. It does away with separate variables and munging into/out of the msr[] array and just passes a single command struct to the work function. It doesn't have any uninitialized structure/bitfield fields. Oh, and it's less code. > + if (status.ctrl_result) > + ifsd->status = SCAN_TEST_FAIL; > + else if (timed_out || status.array_bitmask) > + ifsd->status = SCAN_NOT_TESTED; > + else > + ifsd->status = SCAN_TEST_PASS; > +} > + > /* > * Initiate per core test. It wakes up work queue threads on the target cpu and > * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and > @@ -253,6 +341,8 @@ int do_core_test(int cpu, struct device *dev) > ifs_test_core(cpu, dev); > break; > case IFS_ARRAY: > + ifs_array_test_core(cpu, dev); > + break; > default: > return -EINVAL; > }
On 2/15/23 08:58, Dave Hansen wrote: > On 2/14/23 15:44, Jithu Joseph wrote: > I'd probably do something like the attached patch. It gets rid of > 'data' and uses sane types for the bitfield. It does away with separate > variables and munging into/out of the msr[] array and just passes a > single command struct to the work function. It doesn't have any > uninitialized structure/bitfield fields. Real patch attached now.
On 2/15/2023 8:58 AM, Dave Hansen wrote: > On 2/14/23 15:44, Jithu Joseph wrote: ... > >> +static void ifs_array_test_core(int cpu, struct device *dev) >> +{ >> + union ifs_array activate, status = {0}; > > So, 'status' here is initialized to 0. But, 'activate'... hmmm > > Here's 1 of the 4 fields getting initialized: > >> + activate.array_bitmask = ~0U; >> + timeout = jiffies + HZ / 2; >> + >> + do { >> + if (time_after(jiffies, timeout)) { >> + timed_out = true; >> + break; >> + } >> + >> + msrvals[0] = activate.data; > > and then the *WHOLE* union is read here. What *is* the uninitialized > member behavior of a bitfield? I actually haven't the foggiest idea > since I never use them. Is there some subtly C voodoo that initializes > the other 3 fields? Thanks for pointing the mistake Dave. I see the bug w.r.t not initializing activate to zero. Thanks Dave for the proposed patch . Let me get back after taking a detailed look Jithu
On 2/15/2023 9:11 AM, Dave Hansen wrote: > On 2/15/23 08:58, Dave Hansen wrote: >> On 2/14/23 15:44, Jithu Joseph wrote: >> I'd probably do something like the attached patch. It gets rid of >> 'data' and uses sane types for the bitfield. It does away with separate >> variables and munging into/out of the msr[] array and just passes a >> single command struct to the work function. It doesn't have any >> uninitialized structure/bitfield fields. > > Real patch attached now. I did try out Dave's suggested patch. With few additional type castings, I could get it to work. We can go with this. I will incorporate this while posting v3. (Will await a few day for additional comments before posting v3) Jithu For context, the functions under discussion incorporating Dave's changes would look as below: static int do_array_test(void *data) { struct ifs_array *command = data; int cpu = smp_processor_id(); int first; /* * Only one logical CPU on a core needs to trigger the Array test via MSR write. */ first = cpumask_first(cpu_smt_mask(cpu)); if (cpu == first) { wrmsrl(MSR_ARRAY_BIST, *((u64 *)command)); /* Pass back the result of the test */ rdmsrl(MSR_ARRAY_BIST, *((u64 *)command)); } /* Tests complete faster if the sibling is spinning here */ wait_for_sibling_cpu(&array_cpus_out, NSEC_PER_SEC); return 0; } static void ifs_array_test_core(int cpu, struct device *dev) { struct ifs_array command = {}; bool timed_out = false; struct ifs_data *ifsd; unsigned long timeout; ifsd = ifs_get_data(dev); command.array_bitmask = ~0U; timeout = jiffies + HZ / 2; do { struct ifs_array before = command; if (time_after(jiffies, timeout)) { timed_out = true; break; } atomic_set(&array_cpus_out, 0); stop_core_cpuslocked(cpu, do_array_test, &command); trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command)); if (command.ctrl_result) break; } while (command.array_bitmask); ifsd->scan_details = *((u64 *)&command); if (command.ctrl_result) ifsd->status = SCAN_TEST_FAIL; else if (timed_out || command.array_bitmask) ifsd->status = SCAN_NOT_TESTED; else ifsd->status = SCAN_TEST_PASS; }
On 2/15/23 12:22, Joseph, Jithu wrote:
> trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command));
Uhh, you control the types in the tracepoint. Just make them compatible
so you don't need casts.
On 2/15/2023 12:26 PM, Dave Hansen wrote: > On 2/15/23 12:22, Joseph, Jithu wrote: >> trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command)); > > Uhh, you control the types in the tracepoint. Just make them compatible > so you don't need casts. will change it to: trace_ifs_array(cpu, before.array_bitmask, before.array_bank, *((u64 *)&command)); i.e will pass compatible types for array_list and array_bank. And for the last argument, we need to dump the whole 64 bits within "command" into trace output . Since the suggested change replaced the union with a struct, it is simplest to cast it to u64 needed by traceoutput. So I would prefer to keep the cast for the last argument alone. sample trace output ... <snip> bash-4562 [001] ..... 761.563749: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0101, status: 0000000200007fe0 bash-4562 [001] ..... 761.563772: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0002, status: 0000010200007fe0 </snip>
On 2/15/23 13:13, Joseph, Jithu wrote: > > On 2/15/2023 12:26 PM, Dave Hansen wrote: >> On 2/15/23 12:22, Joseph, Jithu wrote: >>> trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command)); >> Uhh, you control the types in the tracepoint. Just make them compatible >> so you don't need casts. > will change it to: > trace_ifs_array(cpu, before.array_bitmask, before.array_bank, *((u64 *)&command)); > > i.e will pass compatible types for array_list and array_bank. And for the last argument, we need to dump the whole 64 bits within "command" > into trace output . Since the suggested change replaced the union with a struct, it is simplest to cast it to u64 needed by traceoutput. > So I would prefer to keep the cast for the last argument alone. <sigh> Your trace even can literally be: + TP_STRUCT__entry( + __field(struct ifs_foo, before ) + __field(struct ifs_foo, after ) + __field( int, cpu ) + ), and then you can just use structure assignment or a memcpy. *That* is what I mean by compatible types. But, also, I'm not sure these tracepoints even make any sense. You're passing raw MSR contents back and forth. Why not just use the MSR tracepoints? They'll give you the same data.
On 2/15/23 13:13, Joseph, Jithu wrote: > And for the last argument, we need to dump the whole 64 bits within "command" > into trace output . No, no you don't. Just split it up in to human-readable logical pieces. You don't need to dump the whole 'command'. If you want to trace the completion mask, then extract the mask and print *that* with a proper field name. I actually really think this tracing should go away, *ESPECIALLY* if you insist on dumping out raw, non-human-readable MSR values. I say NAK on the tracing. You haven't convinced me that it's necessary on top of what we have today (MSR tracing).
On 2/22/2023 12:12 PM, Dave Hansen wrote: > > I actually really think this tracing should go away, *ESPECIALLY* if you > insist on dumping out raw, non-human-readable MSR values. > > I say NAK on the tracing. You haven't convinced me that it's necessary > on top of what we have today (MSR tracing). Thanks for the pointer on MSR trace. I agree that setting the appropriate filter ("msr == 0x105") would generate the read and write values (snip2 below). The primary value addition for the custom trace added by this driver was readability / convenience (snip1 below). The format it is dumping today is most convenient for tracking the progress of this test from start to completion. (Two formatted input fields (wrmsr) and one output field (rdmsr) on the same line ). Another convenience is that, by enabling the high-level intel_ifs trace, we can see all activity related to all IFS tests (and not having to explicitly remember and enable the MSRs corresponding to each of the IFS tests) Since the trace has to be explicitly enabled (I thought it is okay to add a more convenient custom one only to be enabled for detailed analysis when required). <Snip 1 added by driver> bash-9411 [123] ..... 423555.355664: ifs_array: cpu: 10, array_list: ffffffff, array_bank: 0000, status: 0000010000007ff2 bash-9411 [123] ..... 423555.355858: ifs_array: cpu: 10, array_list: 00007ff2, array_bank: 0100, status: 0000000000007fe0 bash-9411 [123] ..... 423555.355891: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0000, status: 0000010000007fe0 bash-9411 [123] ..... 423555.355923: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0100, status: 0000000100007fe0 </snip1> <snip2 using msr trace> migration/10-76 [010] d..1. 423672.955522: write_msr: 105, value ffffffff migration/10-76 [010] d..1. 423672.955525: read_msr: 105, value 10000007ff2 migration/10-76 [010] d..1. 423672.955733: write_msr: 105, value 10000007ff2 migration/10-76 [010] d..1. 423672.955736: read_msr: 105, value 7fe0 migration/10-76 [010] d..1. 423672.955783: write_msr: 105, value 7fe0 migration/10-76 [010] d..1. 423672.955786: read_msr: 105, value 10000007fe0 migration/10-76 [010] d..1. 423672.955819: write_msr: 105, value 10000007fe0 migration/10-76 [010] d..1. 423672.955822: read_msr: 105, value 100007fe0 </snip2 msr trace> Jithu
On 2/22/23 14:07, Joseph, Jithu wrote: > Since the trace has to be explicitly enabled (I thought it is okay to > add a more convenient custom one only to be enabled for detailed > analysis when required). man perf-script You should be able to write one to do exactly what you need in about 10 minutes. No new tracepoints, no new kernel code to maintain. Knock yourself out with whatever conveniences you want. Just do it in userspace. I still think this patch should go away.
On Wed, 22 Feb 2023 14:28:55 -0800 Dave Hansen <dave.hansen@intel.com> wrote: > On 2/22/23 14:07, Joseph, Jithu wrote: > > Since the trace has to be explicitly enabled (I thought it is okay to > > add a more convenient custom one only to be enabled for detailed > > analysis when required). > > man perf-script Or do it in C with tracefs :-) https://www.trace-cmd.org/Documentation/libtracefs/ -- Steve > > You should be able to write one to do exactly what you need in about 10 > minutes. No new tracepoints, no new kernel code to maintain. Knock > yourself out with whatever conveniences you want. Just do it in userspace. > > I still think this patch should go away.
On 2/22/2023 2:28 PM, Dave Hansen wrote: > On 2/22/23 14:07, Joseph, Jithu wrote: >> Since the trace has to be explicitly enabled (I thought it is okay to >> add a more convenient custom one only to be enabled for detailed >> analysis when required). > > man perf-script > > You should be able to write one to do exactly what you need in about 10 > minutes. No new tracepoints, no new kernel code to maintain. Knock Thanks for the pointers. I gather that user level tools might be able to format the msr trace output in a more convenient way (perhaps as generated by the custom driver trace) There is still the other convenience of not having to look-up the MSRs addresses corresponding to all the individual tests and setting the filter on them to analyze IFS test activity (rather than enabling it via echo 1 > /sys/kernel/debug/tracing/events/intel_ifs/enable) Agreed that for array test it is a single MSR. For other test, it is 2 separate MSRs ... there could be additional tests in the future. Jithu
On 2/22/23 15:32, Joseph, Jithu wrote: > On 2/22/2023 2:28 PM, Dave Hansen wrote: >> On 2/22/23 14:07, Joseph, Jithu wrote: >>> Since the trace has to be explicitly enabled (I thought it is okay to >>> add a more convenient custom one only to be enabled for detailed >>> analysis when required). >> man perf-script >> >> You should be able to write one to do exactly what you need in about 10 >> minutes. No new tracepoints, no new kernel code to maintain. Knock > Thanks for the pointers. I gather that user level tools might be able to format the > msr trace output in a more convenient way (perhaps as generated by the custom driver trace) > > There is still the other convenience of not having to look-up the > MSRs addresses corresponding to all the individual tests and setting the filter on them to > analyze IFS test activity (rather than enabling it via echo 1 > /sys/kernel/debug/tracing/events/intel_ifs/enable) Jithu, the perf-script tooling that I suggested wraps can abstract away the sysfs interface. I suspect you might also have discovered this in the process of exploring my (and others') suggestions. Please go invest some time there. Feel free to come back to this thread if you get stuck or have done your due diligence exhausted the userspace-only path.
diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h index a0cb2696c1d9..e5375191b56d 100644 --- a/drivers/platform/x86/intel/ifs/ifs.h +++ b/drivers/platform/x86/intel/ifs/ifs.h @@ -127,6 +127,7 @@ #include <linux/device.h> #include <linux/miscdevice.h> +#define MSR_ARRAY_BIST 0x00000105 #define MSR_COPY_SCAN_HASHES 0x000002c2 #define MSR_SCAN_HASHES_STATUS 0x000002c3 #define MSR_AUTHENTICATE_AND_COPY_CHUNK 0x000002c4 @@ -194,6 +195,17 @@ union ifs_status { }; }; +/* MSR_ARRAY_BIST bit fields */ +union ifs_array { + u64 data; + struct { + u32 array_bitmask :32; + u32 array_bank :16; + u32 rsvd :15; + u32 ctrl_result :1; + }; +}; + /* * Driver populated error-codes * 0xFD: Test timed out before completing all the chunks. diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c index 65e08af70994..12880fca0aa8 100644 --- a/drivers/platform/x86/intel/ifs/runtest.c +++ b/drivers/platform/x86/intel/ifs/runtest.c @@ -229,6 +229,94 @@ static void ifs_test_core(int cpu, struct device *dev) } } +#define SPINUNIT 100 /* 100 nsec */ +static atomic_t array_cpus_out; + +/* + * Simplified cpu sibling rendezvous loop based on microcode loader __wait_for_cpus() + */ +static void wait_for_sibling_cpu(atomic_t *t, long long timeout) +{ + int cpu = smp_processor_id(); + const struct cpumask *smt_mask = cpu_smt_mask(cpu); + int all_cpus = cpumask_weight(smt_mask); + + atomic_inc(t); + while (atomic_read(t) < all_cpus) { + if (timeout < SPINUNIT) + return; + ndelay(SPINUNIT); + timeout -= SPINUNIT; + touch_nmi_watchdog(); + } +} + +static int do_array_test(void *data) +{ + int cpu = smp_processor_id(); + u64 *msrs = data; + int first; + + /* + * Only one logical CPU on a core needs to trigger the Array test via MSR write. + */ + first = cpumask_first(cpu_smt_mask(cpu)); + + if (cpu == first) { + wrmsrl(MSR_ARRAY_BIST, msrs[0]); + /* Pass back the result of the test */ + rdmsrl(MSR_ARRAY_BIST, msrs[1]); + } + + /* Tests complete faster if the sibling is spinning here */ + wait_for_sibling_cpu(&array_cpus_out, NSEC_PER_SEC); + + return 0; +} + +static void ifs_array_test_core(int cpu, struct device *dev) +{ + union ifs_array activate, status = {0}; + bool timed_out = false; + struct ifs_data *ifsd; + unsigned long timeout; + u64 msrvals[2]; + + ifsd = ifs_get_data(dev); + + activate.array_bitmask = ~0U; + timeout = jiffies + HZ / 2; + + do { + if (time_after(jiffies, timeout)) { + timed_out = true; + break; + } + + msrvals[0] = activate.data; + + atomic_set(&array_cpus_out, 0); + stop_core_cpuslocked(cpu, do_array_test, msrvals); + status.data = msrvals[1]; + + if (status.ctrl_result) + break; + + activate.array_bitmask = status.array_bitmask; + activate.array_bank = status.array_bank; + + } while (status.array_bitmask); + + ifsd->scan_details = status.data; + + if (status.ctrl_result) + ifsd->status = SCAN_TEST_FAIL; + else if (timed_out || status.array_bitmask) + ifsd->status = SCAN_NOT_TESTED; + else + ifsd->status = SCAN_TEST_PASS; +} + /* * Initiate per core test. It wakes up work queue threads on the target cpu and * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and @@ -253,6 +341,8 @@ int do_core_test(int cpu, struct device *dev) ifs_test_core(cpu, dev); break; case IFS_ARRAY: + ifs_array_test_core(cpu, dev); + break; default: return -EINVAL; }