diff mbox series

[v2,4/7] platform/x86/intel/ifs: Implement Array BIST test

Message ID 20230214234426.344960-5-jithu.joseph@intel.com (mailing list archive)
State Changes Requested, archived
Headers show
Series Add Array BIST test support to IFS | expand

Commit Message

Joseph, Jithu Feb. 14, 2023, 11:44 p.m. UTC
Array BIST test (for a particlular core) is triggered by writing
to MSR_ARRAY_BIST from one sibling of the core.

This will initiate a test for all supported arrays on that
CPU. Array BIST test may be aborted before completing all the
arrays in the event of an interrupt or other reasons.
In this case, kernel will restart the test from that point
onwards. Array test will also be aborted when the test fails,
in which case the test is stopped immediately without further
retry.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 drivers/platform/x86/intel/ifs/ifs.h     | 12 ++++
 drivers/platform/x86/intel/ifs/runtest.c | 90 ++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

Comments

Dave Hansen Feb. 15, 2023, 4:58 p.m. UTC | #1
On 2/14/23 15:44, Jithu Joseph wrote:
> +/* MSR_ARRAY_BIST bit fields */
> +union ifs_array {
> +	u64	data;
> +	struct {
> +		u32	array_bitmask		:32;
> +		u32	array_bank		:16;
> +		u32	rsvd			:15;
> +		u32	ctrl_result		:1;
> +	};
> +};

Wow, after all that, the bitfield remains!  Even the totally unnecessary
parts, like the 32-bit wide bitfield in the 32-bit value.  The *LEAST*
you can do is this:

	struct ifs_array {
		u32	array_bitmask;
		u16	array_bank
		u16	rsvd			:15;
		u16	ctrl_result		:1;
	};

to at least minimize the *ENTIRELY* unnecessary bitfields.  I'm also not
quite sure why I'm going to the trouble of writing another one of these
things since the last set of things I suggested were not incorporated.
Or, what the obsession is with u32.

I also think the union is a bit silly.  You could literally just do:

	msrvals[0] = (u64 *)&activate;
or
	memcpy(&msrvals[0], &activate, sizeof(activate));

and probably end up with exactly the same instructions.  There's no type
safety here either way.  The cast, for instance, at least makes the lack
of type safety obvious.

> +static void ifs_array_test_core(int cpu, struct device *dev)
> +{
> +	union ifs_array activate, status = {0};

So, 'status' here is initialized to 0.  But, 'activate'... hmmm

Here's 1 of the 4 fields getting initialized:

> +	activate.array_bitmask = ~0U;
> +	timeout = jiffies + HZ / 2;
> +
> +	do {
> +		if (time_after(jiffies, timeout)) {
> +			timed_out = true;
> +			break;
> +		}
> +
> +		msrvals[0] = activate.data;

and then the *WHOLE* union is read here.  What *is* the uninitialized
member behavior of a bitfield?  I actually haven't the foggiest idea
since I never use them.  Is there some subtly C voodoo that initializes
the other 3 fields?

BTW, ifs_test_core() seems to be doing basically the same thing with
ifs_status.

> +		atomic_set(&array_cpus_out, 0);
> +		stop_core_cpuslocked(cpu, do_array_test, msrvals);
> +		status.data = msrvals[1];
> +
> +		if (status.ctrl_result)
> +			break;
> +
> +		activate.array_bitmask = status.array_bitmask;
> +		activate.array_bank = status.array_bank;
> +
> +	} while (status.array_bitmask);
> +
> +	ifsd->scan_details = status.data;

Beautiful.  It passes raw MSR values back out to userspace in sysfs.

OK, so all this union.data nonsense is to, in the end, pass two MSR
values through an array over to the do_array_test() function. That
function, in the end, just does:

+	if (cpu == first) {
+		wrmsrl(MSR_ARRAY_BIST, msrs[0]);
+		/* Pass back the result of the test */
+		rdmsrl(MSR_ARRAY_BIST, msrs[1]);
+	}

with them.  It doesn't even reuse msrs[0].  It also doesn't bother to
even give them symbolic names, like command and result.  The msrs[]
values are also totally hard coded.

I'd probably do something like the attached patch.  It gets rid of
'data' and uses sane types for the bitfield.  It does away with separate
variables and munging into/out of the msr[] array and just passes a
single command struct to the work function.  It doesn't have any
uninitialized structure/bitfield fields.

Oh, and it's less code.

> +	if (status.ctrl_result)
> +		ifsd->status = SCAN_TEST_FAIL;
> +	else if (timed_out || status.array_bitmask)
> +		ifsd->status = SCAN_NOT_TESTED;
> +	else
> +		ifsd->status = SCAN_TEST_PASS;
> +}
> +
>  /*
>   * Initiate per core test. It wakes up work queue threads on the target cpu and
>   * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and
> @@ -253,6 +341,8 @@ int do_core_test(int cpu, struct device *dev)
>  		ifs_test_core(cpu, dev);
>  		break;
>  	case IFS_ARRAY:
> +		ifs_array_test_core(cpu, dev);
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
Dave Hansen Feb. 15, 2023, 5:11 p.m. UTC | #2
On 2/15/23 08:58, Dave Hansen wrote:
> On 2/14/23 15:44, Jithu Joseph wrote:
> I'd probably do something like the attached patch.  It gets rid of
> 'data' and uses sane types for the bitfield.  It does away with separate
> variables and munging into/out of the msr[] array and just passes a
> single command struct to the work function.  It doesn't have any
> uninitialized structure/bitfield fields.

Real patch attached now.
Joseph, Jithu Feb. 15, 2023, 5:44 p.m. UTC | #3
On 2/15/2023 8:58 AM, Dave Hansen wrote:
> On 2/14/23 15:44, Jithu Joseph wrote:
...
> 
>> +static void ifs_array_test_core(int cpu, struct device *dev)
>> +{
>> +	union ifs_array activate, status = {0};
> 
> So, 'status' here is initialized to 0.  But, 'activate'... hmmm
> 
> Here's 1 of the 4 fields getting initialized:
> 
>> +	activate.array_bitmask = ~0U;
>> +	timeout = jiffies + HZ / 2;
>> +
>> +	do {
>> +		if (time_after(jiffies, timeout)) {
>> +			timed_out = true;
>> +			break;
>> +		}
>> +
>> +		msrvals[0] = activate.data;
> 
> and then the *WHOLE* union is read here.  What *is* the uninitialized
> member behavior of a bitfield?  I actually haven't the foggiest idea
> since I never use them.  Is there some subtly C voodoo that initializes
> the other 3 fields?

Thanks for pointing the mistake Dave. I see the bug w.r.t not initializing activate to
zero.

Thanks Dave for the proposed patch . Let me get back after taking a detailed look

Jithu
Joseph, Jithu Feb. 15, 2023, 8:22 p.m. UTC | #4
On 2/15/2023 9:11 AM, Dave Hansen wrote:
> On 2/15/23 08:58, Dave Hansen wrote:
>> On 2/14/23 15:44, Jithu Joseph wrote:
>> I'd probably do something like the attached patch.  It gets rid of
>> 'data' and uses sane types for the bitfield.  It does away with separate
>> variables and munging into/out of the msr[] array and just passes a
>> single command struct to the work function.  It doesn't have any
>> uninitialized structure/bitfield fields.
> 
> Real patch attached now.

I did try out Dave's suggested patch. With few additional type castings, I could get it to work.
We can go with this. I will incorporate this while posting v3. 

(Will await a few day for additional comments before posting v3)

Jithu

For context, the functions under discussion incorporating Dave's changes would look as below:
static int do_array_test(void *data)
{
	struct ifs_array *command = data;
	int cpu = smp_processor_id();

	int first;

	/*
	 * Only one logical CPU on a core needs to trigger the Array test via MSR write.
	 */
	first = cpumask_first(cpu_smt_mask(cpu));

	if (cpu == first) {
		wrmsrl(MSR_ARRAY_BIST, *((u64 *)command));
		/* Pass back the result of the test */
		rdmsrl(MSR_ARRAY_BIST, *((u64 *)command));
	}

	/* Tests complete faster if the sibling is spinning here */
	wait_for_sibling_cpu(&array_cpus_out, NSEC_PER_SEC);

	return 0;
}

static void ifs_array_test_core(int cpu, struct device *dev)
{
	struct ifs_array command = {};
	bool timed_out = false;
	struct ifs_data *ifsd;
	unsigned long timeout;

	ifsd = ifs_get_data(dev);

	command.array_bitmask = ~0U;
	timeout = jiffies + HZ / 2;

	do {
		struct ifs_array before = command;

		if (time_after(jiffies, timeout)) {
			timed_out = true;
			break;
		}
		atomic_set(&array_cpus_out, 0);
		stop_core_cpuslocked(cpu, do_array_test, &command);

		trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command));
		if (command.ctrl_result)
			break;
	} while (command.array_bitmask);

	ifsd->scan_details = *((u64 *)&command);

	if (command.ctrl_result)
		ifsd->status = SCAN_TEST_FAIL;
	else if (timed_out || command.array_bitmask)
		ifsd->status = SCAN_NOT_TESTED;
	else
		ifsd->status = SCAN_TEST_PASS;
}
Dave Hansen Feb. 15, 2023, 8:26 p.m. UTC | #5
On 2/15/23 12:22, Joseph, Jithu wrote:
> 		trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command));

Uhh, you control the types in the tracepoint.  Just make them compatible
so you don't need casts.
Joseph, Jithu Feb. 15, 2023, 9:13 p.m. UTC | #6
On 2/15/2023 12:26 PM, Dave Hansen wrote:
> On 2/15/23 12:22, Joseph, Jithu wrote:
>> 		trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command));
> 
> Uhh, you control the types in the tracepoint.  Just make them compatible
> so you don't need casts.

will change it to:
trace_ifs_array(cpu, before.array_bitmask, before.array_bank,  *((u64 *)&command));

i.e will pass compatible types for array_list and array_bank. And for the last argument, we need to dump the whole 64 bits within "command"
into trace output . Since the suggested change replaced the union with a struct, it is simplest to cast it to u64 needed by traceoutput.
So I would prefer to keep the cast for the last argument alone.

sample trace output ... 

<snip>
  bash-4562    [001] .....   761.563749: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0101, status: 0000000200007fe0
  bash-4562    [001] .....   761.563772: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0002, status: 0000010200007fe0
</snip>
Dave Hansen Feb. 15, 2023, 9:18 p.m. UTC | #7
On 2/15/23 13:13, Joseph, Jithu wrote:
> 
> On 2/15/2023 12:26 PM, Dave Hansen wrote:
>> On 2/15/23 12:22, Joseph, Jithu wrote:
>>> 		trace_ifs_array(cpu, *((u64 *)&before), *((u64 *)&command));
>> Uhh, you control the types in the tracepoint.  Just make them compatible
>> so you don't need casts.
> will change it to:
> trace_ifs_array(cpu, before.array_bitmask, before.array_bank,  *((u64 *)&command));
> 
> i.e will pass compatible types for array_list and array_bank. And for the last argument, we need to dump the whole 64 bits within "command"
> into trace output . Since the suggested change replaced the union with a struct, it is simplest to cast it to u64 needed by traceoutput.
> So I would prefer to keep the cast for the last argument alone.

<sigh>

Your trace even can literally be:

+	TP_STRUCT__entry(
+		__field(struct ifs_foo,	before	)
+		__field(struct ifs_foo,	after	)
+		__field(	int,	cpu	)
+	),

and then you can just use structure assignment or a memcpy.  *That* is
what I mean by compatible types.

But, also, I'm not sure these tracepoints even make any sense.  You're
passing raw MSR contents back and forth.  Why not just use the MSR
tracepoints?  They'll give you the same data.
Dave Hansen Feb. 22, 2023, 8:12 p.m. UTC | #8
On 2/15/23 13:13, Joseph, Jithu wrote:
>  And for the last argument, we need to dump the whole 64 bits within "command"
> into trace output . 

No, no you don't.

Just split it up in to human-readable logical pieces.  You don't need to
dump the whole 'command'.  If you want to trace the completion mask,
then extract the mask and print *that* with a proper field name.

I actually really think this tracing should go away, *ESPECIALLY* if you
insist on dumping out raw, non-human-readable MSR values.

I say NAK on the tracing.  You haven't convinced me that it's necessary
on top of what we have today (MSR tracing).
Joseph, Jithu Feb. 22, 2023, 10:07 p.m. UTC | #9
On 2/22/2023 12:12 PM, Dave Hansen wrote:
> 
> I actually really think this tracing should go away, *ESPECIALLY* if you
> insist on dumping out raw, non-human-readable MSR values.
> 
> I say NAK on the tracing.  You haven't convinced me that it's necessary
> on top of what we have today (MSR tracing).


Thanks for the pointer on MSR trace. I agree that setting the appropriate filter ("msr == 0x105") would generate the read and write values (snip2 below).

The primary value addition for the custom trace added by this driver was readability / convenience (snip1 below).  The format it is dumping today is most convenient for tracking  the progress of this test  from start to completion. (Two formatted input fields (wrmsr) and one output field (rdmsr) on the same line ).
Another convenience is that, by enabling the high-level intel_ifs trace,  we can see all  activity related to all IFS tests (and not having to explicitly remember and enable the MSRs corresponding to each of the IFS tests)

Since the trace has to be explicitly enabled (I thought it is okay to add a more convenient custom one only to be enabled for detailed analysis when required).

<Snip 1 added by driver>
     bash-9411    [123] ..... 423555.355664: ifs_array: cpu: 10, array_list: ffffffff, array_bank: 0000, status: 0000010000007ff2
     bash-9411    [123] ..... 423555.355858: ifs_array: cpu: 10, array_list: 00007ff2, array_bank: 0100, status: 0000000000007fe0
     bash-9411    [123] ..... 423555.355891: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0000, status: 0000010000007fe0
     bash-9411    [123] ..... 423555.355923: ifs_array: cpu: 10, array_list: 00007fe0, array_bank: 0100, status: 0000000100007fe0
</snip1>


<snip2 using msr trace>
   migration/10-76      [010] d..1. 423672.955522: write_msr: 105, value ffffffff
   migration/10-76      [010] d..1. 423672.955525: read_msr: 105, value 10000007ff2
   migration/10-76      [010] d..1. 423672.955733: write_msr: 105, value 10000007ff2
   migration/10-76      [010] d..1. 423672.955736: read_msr: 105, value 7fe0
   migration/10-76      [010] d..1. 423672.955783: write_msr: 105, value 7fe0
   migration/10-76      [010] d..1. 423672.955786: read_msr: 105, value 10000007fe0
   migration/10-76      [010] d..1. 423672.955819: write_msr: 105, value 10000007fe0
   migration/10-76      [010] d..1. 423672.955822: read_msr: 105, value 100007fe0
</snip2 msr trace>


Jithu
Dave Hansen Feb. 22, 2023, 10:28 p.m. UTC | #10
On 2/22/23 14:07, Joseph, Jithu wrote:
> Since the trace has to be explicitly enabled (I thought it is okay to
> add a more convenient custom one only to be enabled for detailed
> analysis when required).

	man perf-script

You should be able to write one to do exactly what you need in about 10
minutes.  No new tracepoints, no new kernel code to maintain.  Knock
yourself out with whatever conveniences you want.  Just do it in userspace.

I still think this patch should go away.
Steven Rostedt Feb. 22, 2023, 10:36 p.m. UTC | #11
On Wed, 22 Feb 2023 14:28:55 -0800
Dave Hansen <dave.hansen@intel.com> wrote:

> On 2/22/23 14:07, Joseph, Jithu wrote:
> > Since the trace has to be explicitly enabled (I thought it is okay to
> > add a more convenient custom one only to be enabled for detailed
> > analysis when required).  
> 
> 	man perf-script

Or do it in C with tracefs :-)

  https://www.trace-cmd.org/Documentation/libtracefs/

-- Steve


> 
> You should be able to write one to do exactly what you need in about 10
> minutes.  No new tracepoints, no new kernel code to maintain.  Knock
> yourself out with whatever conveniences you want.  Just do it in userspace.
> 
> I still think this patch should go away.
Joseph, Jithu Feb. 22, 2023, 11:32 p.m. UTC | #12
On 2/22/2023 2:28 PM, Dave Hansen wrote:
> On 2/22/23 14:07, Joseph, Jithu wrote:
>> Since the trace has to be explicitly enabled (I thought it is okay to
>> add a more convenient custom one only to be enabled for detailed
>> analysis when required).
> 
> 	man perf-script
> 
> You should be able to write one to do exactly what you need in about 10
> minutes.  No new tracepoints, no new kernel code to maintain.  Knock

Thanks for the pointers. I gather that user level tools might be able to format the
msr trace output in a more convenient way (perhaps as generated by the custom driver trace)

There is still the other convenience of not having to look-up the
MSRs addresses corresponding to all the individual tests and setting the filter on them to
analyze IFS test activity (rather than enabling it via echo 1 > /sys/kernel/debug/tracing/events/intel_ifs/enable)

Agreed that for array test it is a single MSR. For other test, it is 2 separate MSRs ... there could be additional
tests in the future. 


Jithu
Dave Hansen Feb. 22, 2023, 11:59 p.m. UTC | #13
On 2/22/23 15:32, Joseph, Jithu wrote:
> On 2/22/2023 2:28 PM, Dave Hansen wrote:
>> On 2/22/23 14:07, Joseph, Jithu wrote:
>>> Since the trace has to be explicitly enabled (I thought it is okay to
>>> add a more convenient custom one only to be enabled for detailed
>>> analysis when required).
>> 	man perf-script
>>
>> You should be able to write one to do exactly what you need in about 10
>> minutes.  No new tracepoints, no new kernel code to maintain.  Knock
> Thanks for the pointers. I gather that user level tools might be able to format the
> msr trace output in a more convenient way (perhaps as generated by the custom driver trace)
> 
> There is still the other convenience of not having to look-up the
> MSRs addresses corresponding to all the individual tests and setting the filter on them to
> analyze IFS test activity (rather than enabling it via echo 1 > /sys/kernel/debug/tracing/events/intel_ifs/enable)

Jithu, the perf-script tooling that I suggested wraps can abstract away
the sysfs interface.  I suspect you might also have discovered this in
the process of exploring my (and others') suggestions.

Please go invest some time there.  Feel free to come back to this thread
if you get stuck or have done your due diligence exhausted the
userspace-only path.
diff mbox series

Patch

diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
index a0cb2696c1d9..e5375191b56d 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -127,6 +127,7 @@ 
 #include <linux/device.h>
 #include <linux/miscdevice.h>
 
+#define MSR_ARRAY_BIST				0x00000105
 #define MSR_COPY_SCAN_HASHES			0x000002c2
 #define MSR_SCAN_HASHES_STATUS			0x000002c3
 #define MSR_AUTHENTICATE_AND_COPY_CHUNK		0x000002c4
@@ -194,6 +195,17 @@  union ifs_status {
 	};
 };
 
+/* MSR_ARRAY_BIST bit fields */
+union ifs_array {
+	u64	data;
+	struct {
+		u32	array_bitmask		:32;
+		u32	array_bank		:16;
+		u32	rsvd			:15;
+		u32	ctrl_result		:1;
+	};
+};
+
 /*
  * Driver populated error-codes
  * 0xFD: Test timed out before completing all the chunks.
diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
index 65e08af70994..12880fca0aa8 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -229,6 +229,94 @@  static void ifs_test_core(int cpu, struct device *dev)
 	}
 }
 
+#define SPINUNIT 100 /* 100 nsec */
+static atomic_t array_cpus_out;
+
+/*
+ * Simplified cpu sibling rendezvous loop based on microcode loader __wait_for_cpus()
+ */
+static void wait_for_sibling_cpu(atomic_t *t, long long timeout)
+{
+	int cpu = smp_processor_id();
+	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
+	int all_cpus = cpumask_weight(smt_mask);
+
+	atomic_inc(t);
+	while (atomic_read(t) < all_cpus) {
+		if (timeout < SPINUNIT)
+			return;
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+		touch_nmi_watchdog();
+	}
+}
+
+static int do_array_test(void *data)
+{
+	int cpu = smp_processor_id();
+	u64 *msrs = data;
+	int first;
+
+	/*
+	 * Only one logical CPU on a core needs to trigger the Array test via MSR write.
+	 */
+	first = cpumask_first(cpu_smt_mask(cpu));
+
+	if (cpu == first) {
+		wrmsrl(MSR_ARRAY_BIST, msrs[0]);
+		/* Pass back the result of the test */
+		rdmsrl(MSR_ARRAY_BIST, msrs[1]);
+	}
+
+	/* Tests complete faster if the sibling is spinning here */
+	wait_for_sibling_cpu(&array_cpus_out, NSEC_PER_SEC);
+
+	return 0;
+}
+
+static void ifs_array_test_core(int cpu, struct device *dev)
+{
+	union ifs_array activate, status = {0};
+	bool timed_out = false;
+	struct ifs_data *ifsd;
+	unsigned long timeout;
+	u64 msrvals[2];
+
+	ifsd = ifs_get_data(dev);
+
+	activate.array_bitmask = ~0U;
+	timeout = jiffies + HZ / 2;
+
+	do {
+		if (time_after(jiffies, timeout)) {
+			timed_out = true;
+			break;
+		}
+
+		msrvals[0] = activate.data;
+
+		atomic_set(&array_cpus_out, 0);
+		stop_core_cpuslocked(cpu, do_array_test, msrvals);
+		status.data = msrvals[1];
+
+		if (status.ctrl_result)
+			break;
+
+		activate.array_bitmask = status.array_bitmask;
+		activate.array_bank = status.array_bank;
+
+	} while (status.array_bitmask);
+
+	ifsd->scan_details = status.data;
+
+	if (status.ctrl_result)
+		ifsd->status = SCAN_TEST_FAIL;
+	else if (timed_out || status.array_bitmask)
+		ifsd->status = SCAN_NOT_TESTED;
+	else
+		ifsd->status = SCAN_TEST_PASS;
+}
+
 /*
  * Initiate per core test. It wakes up work queue threads on the target cpu and
  * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and
@@ -253,6 +341,8 @@  int do_core_test(int cpu, struct device *dev)
 		ifs_test_core(cpu, dev);
 		break;
 	case IFS_ARRAY:
+		ifs_array_test_core(cpu, dev);
+		break;
 	default:
 		return -EINVAL;
 	}