[1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation

Message ID 20210915232739.6367-2-Smita.KoralahalliChannabasappa@amd.com (mailing list archive)
State New, archived
Series x86/mce: Handle error simulation failures in mce-inject module

Commit Message

Smita Koralahalli Sept. 15, 2021, 11:27 p.m. UTC
The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
(SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
will read as zero and writes to it will be ignored. Check the value of
this register before trying to simulate the error.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Borislav Petkov Sept. 24, 2021, 8:26 a.m. UTC | #1
On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
> will read as zero and writes to it will be ignored. Check the value of
> this register before trying to simulate the error.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
> index 0bfc14041bbb..51ac575c4605 100644
> --- a/arch/x86/kernel/cpu/mce/inject.c
> +++ b/arch/x86/kernel/cpu/mce/inject.c
> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>  	}
>  
>  	m->bank = val;
> +
> +	/* Read IPID value to determine if a bank is unpopulated on the target
> +	 * CPU.
> +	 */

Kernel comments style format is:

	/*
	 * A sentence ending with a full-stop.
	 * Another sentence. ...
	 * More sentences. ...
	 */

> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {

This whole thing belongs into inj_ipid_set() where you should verify
whether the bank is set when you try to set the IPID for that bank.

> +
> +		/* Check for user provided IPID value. */
> +		if (!m->ipid) {
> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
> +				      &m->ipid);

Oh well, one IPI per ipid write. We're doing injection so we can't be on
a production machine so who cares about IPIs there.

> +			if (!m->ipid) {
> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",


"Cannot set IPID for bank... - bank %d unpopulated\n"

Also, in all your text, use "injection" instead of "simulation" so that
there's no confusion.

Thx.
Koralahalli Channabasappa, Smita Sept. 27, 2021, 7:51 p.m. UTC | #2
Hi Boris,

On 9/24/21 3:26 AM, Borislav Petkov wrote:

> On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
>> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
>> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
>> will read as zero and writes to it will be ignored. Check the value of
>> this register before trying to simulate the error.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
>> index 0bfc14041bbb..51ac575c4605 100644
>> --- a/arch/x86/kernel/cpu/mce/inject.c
>> +++ b/arch/x86/kernel/cpu/mce/inject.c
>> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>>   	}
>>
>> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> This whole thing belongs into inj_ipid_set() where you should verify
> whether the bank is set when you try to set the IPID for that bank.

Can you please elaborate on this? I'm not sure if I understood this
right. Should I read the ipid file to verify that the user has input
proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?

Thanks,
Smita

>> +
>> +		/* Check for user provided IPID value. */
>> +		if (!m->ipid) {
>> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
>> +				      &m->ipid);
> Oh well, one IPI per ipid write. We're doing injection so we can't be on
> a production machine so who cares about IPIs there.
>
>> +			if (!m->ipid) {
>> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",
Borislav Petkov Sept. 27, 2021, 8:15 p.m. UTC | #3
On Mon, Sep 27, 2021 at 02:51:56PM -0500, Smita Koralahalli Channabasappa wrote:
> Can you please elaborate on this? I'm not sure if I understood this
> right. Should I read the ipid file to verify that the user has input
> proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?

No, on a write to the ipid file you should do that checking and write if
the bank is populated or fail the write otherwise. And you should put
all that code in inj_bank_set() - that's why I say "on a write to the
ipid file".

And instead of boot_cpu_has() you should use cpu_feature_enabled().

Makes sense?
Koralahalli Channabasappa, Smita Sept. 27, 2021, 9:56 p.m. UTC | #4
On 9/27/21 3:15 PM, Borislav Petkov wrote:

> On Mon, Sep 27, 2021 at 02:51:56PM -0500, Smita Koralahalli Channabasappa wrote:
>> Can you please elaborate on this? I'm not sure if I understood this
>> right. Should I read the ipid file to verify that the user has input
>> proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?
> No, on a write to the ipid file you should do that checking and write if
> the bank is populated or fail the write otherwise. And you should put
> all that code in inj_bank_set() - that's why I say "on a write to the
> ipid file".
>
> And instead of boot_cpu_has() you should use cpu_feature_enabled().
>
> Makes sense?

Yes, this makes sense to me now. But you meant to say inj_ipid_set()
instead of inj_bank_set()..?

Something like this:

-MCE_INJECT_SET(ipid)

+static int inj_ipid_set(void *data, u64 val)
+{
+	struct mce *m = (struct mce*)data;

+	if (cpu_feature_enabled(X86_FEATURE_SMCA)) {

+		rdmsrl_on_cpu(..
		..
		..
+	m->ipid = val;
+	..
+}

Thanks,
Borislav Petkov Sept. 27, 2021, 10:05 p.m. UTC | #5
On Mon, Sep 27, 2021 at 04:56:17PM -0500, Smita Koralahalli Channabasappa wrote:
> Yes, this makes sense to me now. But you meant to say inj_ipid_set()
> instead of inj_bank_set()..?

Yeah, I had it correct before:

"This whole thing belongs into inj_ipid_set() where you should verify... "

> 
> Something like this:
> 
> -MCE_INJECT_SET(ipid)
> 
> +static int inj_ipid_set(void *data, u64 val)
> +{
> +	struct mce *m = (struct mce*)data;
> 
> +	if (cpu_feature_enabled(X86_FEATURE_SMCA)) {
> 
> +		rdmsrl_on_cpu(..
> 		..
> 		..
> +	m->ipid = val;
> +	..
> +}

Yes, and return proper error codes.

Thx.
Koralahalli Channabasappa, Smita Oct. 11, 2021, 9:12 p.m. UTC | #6
Hi Boris,

Sorry for the delayed response.

When I was coding this up, I came across a few issues, described below.

On 9/24/21 3:26 AM, Borislav Petkov wrote:

> On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
>> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
>> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
>> will read as zero and writes to it will be ignored. Check the value of
>> this register before trying to simulate the error.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
>> index 0bfc14041bbb..51ac575c4605 100644
>> --- a/arch/x86/kernel/cpu/mce/inject.c
>> +++ b/arch/x86/kernel/cpu/mce/inject.c
>> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>>   	}
>>   
>>   	m->bank = val;
>> +
>> +	/* Read IPID value to determine if a bank is unpopulated on the target
>> +	 * CPU.
>> +	 */
> Kernel comments style format is:
>
> 	/*
> 	 * A sentence ending with a full-stop.
> 	 * Another sentence. ...
> 	 * More sentences. ...
> 	 */
>
>> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> This whole thing belongs into inj_ipid_set() where you should verify
> whether the bank is set when you try to set the IPID for that bank.

I do not have the bank number in order to look up the IPID for that bank.
I couldn't know the bank number because mce-inject files are synchronized
in a way that once the bank number is written the injection starts.
Can you please suggest what needs to be done here?
  
Also, the IPID register is read only from the OS, hence the user provided
IPID values could be useful for "sw" error injection types. For "hw" error
injection types we need to read from the registers to determine the IPID
value.

Should there be two cases where on a "sw" injection use the user provided
IPID value whereas on "hw" injection read from registers?

I'm pasting the code snippet after rework on the comments.

static int inj_ipid_set(void *data, u64 val)
{
         struct mce *m = (struct mce *)data;

         if (cpu_feature_enabled(X86_FEATURE_SMCA)) {
                 if (val && inj_type == SW_INJ)
                         m->ipid = val;
                 else {
                         rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(?),
                                       &m->ipid); // Requires bank number here.
                         if (!m->ipid) {
                                 pr_err("Cannot set IPID - unpopulated bank\n");
                                 return -ENODEV;
                         }
                 }
         }

         return 0;
}

Please let me know what you think.
Thanks,

>> +
>> +		/* Check for user provided IPID value. */
>> +		if (!m->ipid) {
>> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
>> +				      &m->ipid);
> Oh well, one IPI per ipid write. We're doing injection so we can't be on
> a production machine so who cares about IPIs there.
>
>> +			if (!m->ipid) {
>> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",
> "Cannot set IPID for bank... - bank %d unpopulated\n"
>
> Also, in all your text, use "injection" instead of "simulation" so that
> there's no confusion.
>
> Thx.
>
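For reference, the bank number is needed because each bank has its own IPID MSR. A minimal userspace sketch of the address computation follows, assuming the layout defined in the kernel's msr-index.h (MC0_IPID at 0xc0002005, with a stride of 0x10 per bank). The smca_ipid_msr() helper name is hypothetical and only wraps the macro for illustration.

```c
#include <stdint.h>

/* Assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_AMD64_SMCA_MC0_IPID		0xc0002005u
#define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10u * (x))

/* Hypothetical helper: address of the IPID MSR for the given bank. */
static uint32_t smca_ipid_msr(unsigned int bank)
{
	return MSR_AMD64_SMCA_MCx_IPID(bank);
}
```

Without a bank number there is no MSR address to read, which is why the lookup cannot be done at the time the ipid file is written.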
Borislav Petkov Oct. 14, 2021, 6:22 p.m. UTC | #7
On Mon, Oct 11, 2021 at 04:12:14PM -0500, Koralahalli Channabasappa, Smita wrote:
> I do not have the bank number in order to look up the IPID for that bank.
> I couldn't know the bank number because mce-inject files are synchronized
> in a way that once the bank number is written the injection starts.
> Can you please suggest what needs to be done here?
>
> Also, the IPID register is read only from the OS, hence the user provided
> IPID values could be useful for "sw" error injection types. For "hw" error
> injection types we need to read from the registers to determine the IPID
> value.
> 
> Should there be two cases where on a "sw" injection use the user provided
> IPID value whereas on "hw" injection read from registers?

Right, that's a good point. So the way I see it is, we need to decide
what is allowed for sw injection and what for hw injection, wrt the IPID
value.

I think for sw injection, we probably should say that since this is
sw only and its purpose is to test the code only, there should not be
any limitations imposed by the underlying machine. Like using the bank
number, for example.

So what you do now for sw injection:

		if (val && inj_type == SW_INJ)
			m->ipid = val;

should be good enough. User simply sets some IPID value and that value
will be used for the bank which is written when injecting.

Now, for hw injection, you have two cases:

1. The bank is unpopulated so setting the IPID there doesn't make any sense.

2. The bank *is* populated and the respective IPID MSR has a value
describing what that bank is.

And in that case, does it even make sense to set the IPID? I don't think
so because that IP block's type - aka IPID - has been set already by
hardware/firmware.

So the way I see it, it makes no sense whatsoever to set the IPID of a
bank during hw injection.

Right?
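The sw-only rule discussed above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the final patch: struct mce_sketch, smca_enabled and inj_ipid_set_sketch() are hypothetical stand-ins for struct mce, cpu_feature_enabled(X86_FEATURE_SMCA) and inj_ipid_set(), and the enum is reduced to the two cases named in the thread. The thread does not say whether a hw-injection write should fail or be ignored; this sketch ignores it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the injection types named in the thread. */
enum injection_type { SW_INJ, HW_INJ };

/* Hypothetical stand-in for the one struct mce field used here. */
struct mce_sketch {
	uint64_t ipid;
};

static enum injection_type inj_type = SW_INJ;
/* Mocks cpu_feature_enabled(X86_FEATURE_SMCA). */
static bool smca_enabled = true;

/*
 * Store a user-provided IPID only for sw injection. For any other
 * injection type the value is left untouched.
 */
static int inj_ipid_set_sketch(struct mce_sketch *m, uint64_t val)
{
	if (smca_enabled) {
		if (val && inj_type == SW_INJ)
			m->ipid = val;
	}

	return 0;
}
```

With inj_type set to SW_INJ, writing a nonzero value updates m->ipid; with HW_INJ the same call returns 0 and leaves m->ipid as it was.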
Koralahalli Channabasappa, Smita Oct. 14, 2021, 8:26 p.m. UTC | #8
On 10/14/21 1:22 PM, Borislav Petkov wrote:

> On Mon, Oct 11, 2021 at 04:12:14PM -0500, Koralahalli Channabasappa, Smita wrote:
>> I do not have the bank number in order to look up the IPID for that bank.
>> I couldn't know the bank number because mce-inject files are synchronized
>> in a way that once the bank number is written the injection starts.
>> Can you please suggest what needs to be done here?
>>
>> Also, the IPID register is read only from the OS, hence the user provided
>> IPID values could be useful for "sw" error injection types. For "hw" error
>> injection types we need to read from the registers to determine the IPID
>> value.
>>
>> Should there be two cases where on a "sw" injection use the user provided
>> IPID value whereas on "hw" injection read from registers?
> Right, that's a good point. So the way I see it is, we need to decide
> what is allowed for sw injection and what for hw injection, wrt the IPID
> value.
>
> I think for sw injection, we probably should say that since this is
> sw only and its purpose is to test the code only, there should not be
> any limitations imposed by the underlying machine. Like using the bank
> number, for example.
>
> So what you do now for sw injection:
>
> 		if (val && inj_type == SW_INJ)
> 			m->ipid = val;
>
> should be good enough. User simply sets some IPID value and that value
> will be used for the bank which is written when injecting.
>
> Now, for hw injection, you have two cases:
>
> 1. The bank is unpopulated so setting the IPID there doesn't make any sense.
>
> 2. The bank *is* populated and the respective IPID MSR has a value
> describing what that bank is.
>
> And in that case, does it even make sense to set the IPID? I don't think
> so because that IP block's type - aka IPID - has been set already by
> hardware/firmware.
>
> So the way I see it, it makes no sense whatsoever to set the IPID of a
> bank during hw injection.
>
> Right?

Yes, I agree. inj_ipid_set() can be used to serve the purpose of setting
user provided IPID on a sw injection only.

My concern was, we need to determine whether the bank is unpopulated or
populated before trying to inject the errors on a hw injection, for which
we need to read the IPID MSR of that bank.

We cannot do that inside inj_ipid_set() as we do not know the bank number
until inj_bank_set() executes which is called after inj_ipid_set().
mce-inject files are synchronized in a way that once the bank number is
written in inj_bank_set(), injection starts.

So this snippet of code:

if (inj_type != SW_INJ) {
	rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val), &m->ipid);
	if (!m->ipid) {
		pr_err("Error injection not possible - bank %llu unpopulated\n", val);
		return -ENODEV;
	}
}

should be retained inside inj_bank_set() ?

And inj_ipid_set() should just set m->ipid = val on a SW_INJ as you mentioned
above?

Thanks,
Borislav Petkov Oct. 14, 2021, 8:57 p.m. UTC | #9
On Thu, Oct 14, 2021 at 03:26:13PM -0500, Koralahalli Channabasappa, Smita wrote:
> My concern was, we need to determine whether the bank is unpopulated or
> populated before trying to inject the errors on a hw injection, for which
> we need to read the IPID MSR of that bank.

Ah, that. Look at the smca_banks[] array in .../mce/amd.c and how
smca_configure() prepares all banks in there. You could use that array
to query which SMCA bank on which CPU is initialized, before injecting
into it.

> should be retained inside inj_bank_set() ?

And yes, I guess you'll have to do it there because then you know which
bank and which CPU the hw injection is supposed to happen on.

> And inj_ipid_set() should just set m->ipid = val on a SW_INJ as you mentioned
> above?

Yap.

Thx.
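The agreed split (populated-bank check at bank-write time, for hw injection only) can be sketched in userspace like this. It follows the MSR-read approach from the snippet in the thread rather than the smca_banks[] lookup. Everything here is a mock: mock_ipid[], mock_read_ipid() and check_bank_sketch() are hypothetical names, the IPID values are made up, and fprintf() stands in for pr_err().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS_SKETCH	2
#define NR_BANKS_SKETCH	4

/* Reduced stand-in for the injection types named in the thread. */
enum injection_type { SW_INJ, HW_INJ };

/* Made-up IPID values per CPU and bank; zero means unpopulated. */
static const uint64_t mock_ipid[NR_CPUS_SKETCH][NR_BANKS_SKETCH] = {
	{ 0x000000b000000000ULL, 0x0ULL, 0x0000002e00000000ULL, 0x0ULL },
	{ 0x000000b000000000ULL, 0x0ULL, 0x0ULL, 0x0ULL },
};

/* Mock of rdmsrl_on_cpu(): returns 0 on success, nonzero on failure. */
static int mock_read_ipid(unsigned int cpu, uint64_t bank, uint64_t *ipid)
{
	if (cpu >= NR_CPUS_SKETCH || bank >= NR_BANKS_SKETCH)
		return -1;

	*ipid = mock_ipid[cpu][bank];
	return 0;
}

/*
 * Check performed before injecting. sw injection skips the hardware
 * lookup entirely; hw injection fails on an unpopulated bank.
 */
static int check_bank_sketch(enum injection_type type, unsigned int cpu,
			     uint64_t bank)
{
	uint64_t ipid;

	if (type == SW_INJ)
		return 0;

	if (mock_read_ipid(cpu, bank, &ipid))
		return -EINVAL;

	if (!ipid) {
		fprintf(stderr, "Cannot inject into unpopulated bank %llu\n",
			(unsigned long long)bank);
		return -ENODEV;
	}

	return 0;
}
```

Reading into a local variable instead of m->ipid keeps a user-provided IPID from being overwritten by the check, which is a design choice of this sketch and not something the thread specifies.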
Patch

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 0bfc14041bbb..51ac575c4605 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -577,6 +577,24 @@  static int inj_bank_set(void *data, u64 val)
 	}
 
 	m->bank = val;
+
+	/* Read IPID value to determine if a bank is unpopulated on the target
+	 * CPU.
+	 */
+	if (boot_cpu_has(X86_FEATURE_SMCA)) {
+
+		/* Check for user provided IPID value. */
+		if (!m->ipid) {
+			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
+				      &m->ipid);
+			if (!m->ipid) {
+				pr_err("Error simulation not possible: Bank %llu unpopulated\n",
+					val);
+				return -ENODEV;
+			}
+		}
+	}
+
 	do_inject();
 
 	/* Reset injection struct */