diff mbox series

[v4,5/9] misc: amd-sbi: Add support for mailbox error codes

Message ID 20240912070810.1644621-6-akshay.gupta@amd.com (mailing list archive)
State Handled Elsewhere
Series misc: Add AMD side band interface(SBI) functionality

Commit Message

Gupta, Akshay Sept. 12, 2024, 7:08 a.m. UTC
The APML mailbox protocol returns additional error codes, written by
SMU firmware to the out-bound register 0x37. These errors include
invalid core, message not supported on the platform, and others.
These additional error codes can be used to provide more detail to
userspace.

Signed-off-by: Akshay Gupta <akshay.gupta@amd.com>
Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
---
Changes since v3:
- update ioctl comment description
 
Changes since v1:
- bifurcated from previous patch 5

 drivers/misc/amd-sbi/rmi-core.c | 12 +++++++++++-
 include/uapi/misc/amd-apml.h    |  5 +++++
 2 files changed, 16 insertions(+), 1 deletion(-)
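
The flow described in the commit message can be sketched as a standalone
mock. This is not the driver code: `mock_mailbox`, `mock_message` and
`mock_mailbox_xfer()` are hypothetical stand-ins (the real function goes
through regmap and holds a mutex). Only the ordering of the steps and the
-EPROTOTYPE return value are taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mock of the firmware-visible state. */
struct mock_mailbox {
	uint32_t error_code;	/* stands in for out-bound register 0x37 */
	uint32_t data_out;	/* stands in for the response data registers */
	int alert_cleared;	/* set once the software alert is acknowledged */
};

/* Hypothetical, reduced version of the message handed back to the caller. */
struct mock_message {
	uint32_t data_out;
	uint32_t fw_ret_code;
};

int mock_mailbox_xfer(struct mock_mailbox *mb, struct mock_message *msg)
{
	uint32_t ec = mb->error_code;

	msg->fw_ret_code = 0;

	/* Response data is only read when firmware reported no error. */
	if (!ec)
		msg->data_out = mb->data_out;

	/* The alert is acknowledged on both the success and error paths. */
	mb->alert_cleared = 1;

	if (ec) {
		msg->fw_ret_code = ec;
		return -EPROTOTYPE;
	}
	return 0;
}
```

With `error_code` set to zero the call returns 0 and copies the data; with
a non-zero value it returns -EPROTOTYPE and passes the code through in
`fw_ret_code`.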

Comments

Greg Kroah-Hartman Oct. 13, 2024, 3:19 p.m. UTC | #1
On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
> --- a/include/uapi/misc/amd-apml.h
> +++ b/include/uapi/misc/amd-apml.h
> @@ -38,6 +38,10 @@ struct apml_message {
>  		__u32 mb_in[2];
>  		__u8 reg_in[8];
>  	} data_in;
> +	/*
> +	 * Error code is returned in case of soft mailbox
> +	 */
> +	__u32 fw_ret_code;
>  } __attribute__((packed));

You can not just randomly change the size of a user/kernel structure
like this, what just broke because of this?

confused,

greg k-h
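
To illustrate the concern raised above: the `_IOWR()` macro encodes the
`sizeof` of its argument type into the request number, so appending a
field to the structure also changes the number user space has to pass. A
small sketch, assuming simplified stand-in structures and a placeholder
magic value (neither is the real `struct apml_message` nor the driver's
`SB_BASE_IOCTL_NR`):

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical stand-ins; the real struct apml_message has more fields. */
struct msg_before {
	uint32_t cmd;
	uint32_t data_in[2];
} __attribute__((packed));

struct msg_after {
	uint32_t cmd;
	uint32_t data_in[2];
	uint32_t fw_ret_code;	/* the appended field */
} __attribute__((packed));

/* 0xF9 is an arbitrary placeholder, not the driver's magic number. */
#define DEMO_IOCTL_BEFORE	_IOWR(0xF9, 0, struct msg_before)
#define DEMO_IOCTL_AFTER	_IOWR(0xF9, 0, struct msg_after)

/* Number of bytes the appended field adds to the structure. */
size_t size_growth(void)
{
	return sizeof(struct msg_after) - sizeof(struct msg_before);
}

/* Returns 1 when the two request numbers differ, 0 otherwise. */
int request_number_changed(void)
{
	return (unsigned long)DEMO_IOCTL_BEFORE !=
	       (unsigned long)DEMO_IOCTL_AFTER;
}
```

A binary built against the smaller layout would therefore issue a request
number that a kernel built with the larger layout does not recognise, and
the reverse also holds.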
Gupta, Akshay Oct. 15, 2024, 9:12 a.m. UTC | #2
On 10/13/2024 8:49 PM, Greg KH wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
>> --- a/include/uapi/misc/amd-apml.h
>> +++ b/include/uapi/misc/amd-apml.h
>> @@ -38,6 +38,10 @@ struct apml_message {
>>                __u32 mb_in[2];
>>                __u8 reg_in[8];
>>        } data_in;
>> +     /*
>> +      * Error code is returned in case of soft mailbox
>> +      */
>> +     __u32 fw_ret_code;
>>   } __attribute__((packed));
> You can not just randomly change the size of a user/kernel structure
> like this, what just broke because of this?
>
> confused,

The changes are not because of anything is broken, we support 3 
different protocol under 1 IOCTL using the same structure. I split the 
patch to make it easy to review.
Modification in patch 4, is only for the existing code. This patch 
(patch 5) has additional functionality, so we do not want add multiple 
changes in single patch (patch 4).

The changes done in patches are as follows:

Patch 4:

- Adding basic structure as per current protocol in upstream kernel

Patch 5:

- Adding additional error code from PMFW.

Patch 6:

- Add changes required to support CPUID protocol

Patch 7:

- Comments modification for MCAMSR protocol (structure remains same as 
CPUID)

> greg k-h
>
Greg Kroah-Hartman Oct. 15, 2024, 10:04 a.m. UTC | #3
On Tue, Oct 15, 2024 at 02:42:08PM +0530, Gupta, Akshay wrote:
> On 10/13/2024 8:49 PM, Greg KH wrote:
> > On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
> > > --- a/include/uapi/misc/amd-apml.h
> > > +++ b/include/uapi/misc/amd-apml.h
> > > @@ -38,6 +38,10 @@ struct apml_message {
> > >                __u32 mb_in[2];
> > >                __u8 reg_in[8];
> > >        } data_in;
> > > +     /*
> > > +      * Error code is returned in case of soft mailbox
> > > +      */
> > > +     __u32 fw_ret_code;
> > >   } __attribute__((packed));
> > You can not just randomly change the size of a user/kernel structure
> > like this, what just broke because of this?
> > 
> > confused,
> 
> The changes are not because of anything is broken, we support 3 different
> protocol under 1 IOCTL using the same structure. I split the patch to make
> it easy to review.
> Modification in patch 4, is only for the existing code. This patch (patch 5)
> has additional functionality, so we do not want add multiple changes in
> single patch (patch 4).
> 
> The changes done in patches are as follows:
> 
> Patch 4:
> 
> - Adding basic structure as per current protocol in upstream kernel

So what if we only take the first 4 patches?  Now any changes after that
would change the user/kernel api and break things.

Please don't write changes and then "fix them up" later on, that's not
how to do stuff as it makes it very difficult to review.  What would you
want to see if _you_ had to review this patch series?

thanks,

greg k-h
Gupta, Akshay Oct. 18, 2024, 9:23 a.m. UTC | #4
On 10/15/2024 3:34 PM, Greg KH wrote:
> On Tue, Oct 15, 2024 at 02:42:08PM +0530, Gupta, Akshay wrote:
>> On 10/13/2024 8:49 PM, Greg KH wrote:
>>> On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
>>>> --- a/include/uapi/misc/amd-apml.h
>>>> +++ b/include/uapi/misc/amd-apml.h
>>>> @@ -38,6 +38,10 @@ struct apml_message {
>>>>                 __u32 mb_in[2];
>>>>                 __u8 reg_in[8];
>>>>         } data_in;
>>>> +     /*
>>>> +      * Error code is returned in case of soft mailbox
>>>> +      */
>>>> +     __u32 fw_ret_code;
>>>>    } __attribute__((packed));
>>> You can not just randomly change the size of a user/kernel structure
>>> like this, what just broke because of this?
>>>
>>> confused,
>> The changes are not because of anything is broken, we support 3 different
>> protocol under 1 IOCTL using the same structure. I split the patch to make
>> it easy to review.
>> Modification in patch 4, is only for the existing code. This patch (patch 5)
>> has additional functionality, so we do not want add multiple changes in
>> single patch (patch 4).
>>
>> The changes done in patches are as follows:
>>
>> Patch 4:
>>
>> - Adding basic structure as per current protocol in upstream kernel
> So what if we only take the first 4 patches?  Now any changes after that
> would change the user/kernel api and break things.

Yes, it will break. We need all the patches to go.

>
> Please don't write changes and then "fix them up" later on, that's not
> how to do stuff as it makes it very difficult to review.  What would you
> want to see if _you_ had to review this patch series?

We submitted a single patch in v1, later split the patch based on each 
functionality for ease of review.

I will squash and submit along with other review comments addressed.

>
> thanks,
>
> greg k-h
Greg Kroah-Hartman Oct. 18, 2024, 9:35 a.m. UTC | #5
On Fri, Oct 18, 2024 at 02:53:26PM +0530, Gupta, Akshay wrote:
> 
> On 10/15/2024 3:34 PM, Greg KH wrote:
> > On Tue, Oct 15, 2024 at 02:42:08PM +0530, Gupta, Akshay wrote:
> > > On 10/13/2024 8:49 PM, Greg KH wrote:
> > > > On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
> > > > > --- a/include/uapi/misc/amd-apml.h
> > > > > +++ b/include/uapi/misc/amd-apml.h
> > > > > @@ -38,6 +38,10 @@ struct apml_message {
> > > > >                 __u32 mb_in[2];
> > > > >                 __u8 reg_in[8];
> > > > >         } data_in;
> > > > > +     /*
> > > > > +      * Error code is returned in case of soft mailbox
> > > > > +      */
> > > > > +     __u32 fw_ret_code;
> > > > >    } __attribute__((packed));
> > > > You can not just randomly change the size of a user/kernel structure
> > > > like this, what just broke because of this?
> > > > 
> > > > confused,
> > > The changes are not because of anything is broken, we support 3 different
> > > protocol under 1 IOCTL using the same structure. I split the patch to make
> > > it easy to review.
> > > Modification in patch 4, is only for the existing code. This patch (patch 5)
> > > has additional functionality, so we do not want add multiple changes in
> > > single patch (patch 4).
> > > 
> > > The changes done in patches are as follows:
> > > 
> > > Patch 4:
> > > 
> > > - Adding basic structure as per current protocol in upstream kernel
> > So what if we only take the first 4 patches?  Now any changes after that
> > would change the user/kernel api and break things.
> 
> Yes, it will break. We need all the patches to go.

That's not how to submit a patch series.  Please work with the other
kernel developers at your company to do this right before resubmitting.
You shouldn't rely on the community to point out basic engineering
problems like this.  Would you want to review a series like this?

> > Please don't write changes and then "fix them up" later on, that's not
> > how to do stuff as it makes it very difficult to review.  What would you
> > want to see if _you_ had to review this patch series?
> 
> We submitted a single patch in v1, later split the patch based on each
> functionality for ease of review.
> 
> I will squash and submit along with other review comments addressed.

No, don't squash, do it in a patch series, one at a time properly such
that if we were to take any moment in time of the series, all would
still work correctly.  That's the proper way to do any sort of software
engineering, this isn't unique to us at all.

thanks,

greg k-h
Gupta, Akshay Oct. 21, 2024, 4:07 p.m. UTC | #6
On 10/18/2024 3:05 PM, Greg KH wrote:
> On Fri, Oct 18, 2024 at 02:53:26PM +0530, Gupta, Akshay wrote:
>> On 10/15/2024 3:34 PM, Greg KH wrote:
>>> On Tue, Oct 15, 2024 at 02:42:08PM +0530, Gupta, Akshay wrote:
>>>> On 10/13/2024 8:49 PM, Greg KH wrote:
>>>>> On Thu, Sep 12, 2024 at 07:08:06AM +0000, Akshay Gupta wrote:
>>>>>> --- a/include/uapi/misc/amd-apml.h
>>>>>> +++ b/include/uapi/misc/amd-apml.h
>>>>>> @@ -38,6 +38,10 @@ struct apml_message {
>>>>>>                  __u32 mb_in[2];
>>>>>>                  __u8 reg_in[8];
>>>>>>          } data_in;
>>>>>> +     /*
>>>>>> +      * Error code is returned in case of soft mailbox
>>>>>> +      */
>>>>>> +     __u32 fw_ret_code;
>>>>>>     } __attribute__((packed));
>>>>> You can not just randomly change the size of a user/kernel structure
>>>>> like this, what just broke because of this?
>>>>>
>>>>> confused,
>>>> The changes are not because of anything is broken, we support 3 different
>>>> protocol under 1 IOCTL using the same structure. I split the patch to make
>>>> it easy to review.
>>>> Modification in patch 4, is only for the existing code. This patch (patch 5)
>>>> has additional functionality, so we do not want add multiple changes in
>>>> single patch (patch 4).
>>>>
>>>> The changes done in patches are as follows:
>>>>
>>>> Patch 4:
>>>>
>>>> - Adding basic structure as per current protocol in upstream kernel
>>> So what if we only take the first 4 patches?  Now any changes after that
>>> would change the user/kernel api and break things.
>> Yes, it will break. We need all the patches to go.
> That's not how to submit a patch series.  Please work with the other
> kernel developers at your company to do this right before resubmitting.
> You shouldn't rely on the community to point out basic engineering
> problems like this.  Would you want to review a series like this?
>
>>> Please don't write changes and then "fix them up" later on, that's not
>>> how to do stuff as it makes it very difficult to review.  What would you
>>> want to see if _you_ had to review this patch series?
>> We submitted a single patch in v1, later split the patch based on each
>> functionality for ease of review.
>>
>> I will squash and submit along with other review comments addressed.
> No, don't squash, do it in a patch series, one at a time properly such
> that if we were to take any moment in time of the series, all would
> still work correctly.  That's the proper way to do any sort of software
> engineering, this isn't unique to us at all.
>
> thanks,
>
> greg k-h

Hi Greg,

We have compiled and verified each individual patch in the patch-set on
reference BMC platforms.

We have an open-sourced user space library,
https://github.com/amd/esmi_oob_library/, which depends on the
out-of-tree kernel modules open-sourced at
https://github.com/amd/apml_modules.

This patch-set is an effort to upstream the out-of-tree kernel modules
open-sourced at https://github.com/amd/apml_modules.

After all the patches are accepted into Linux, we want to update the
user-space consumers to move to the drivers from the Linux kernel and
deprecate the out-of-tree modules.

Thanks,

Akshay

Patch

diff --git a/drivers/misc/amd-sbi/rmi-core.c b/drivers/misc/amd-sbi/rmi-core.c
index 92d33d589bdc..b4f292303ed4 100644
--- a/drivers/misc/amd-sbi/rmi-core.c
+++ b/drivers/misc/amd-sbi/rmi-core.c
@@ -27,13 +27,15 @@ 
 int rmi_mailbox_xfer(struct sbrmi_data *data,
 		     struct apml_message *msg)
 {
-	unsigned int bytes;
+	unsigned int bytes, ec;
 	int i, ret;
 	int sw_status;
 	u8 byte;
 
 	mutex_lock(&data->lock);
 
+	msg->fw_ret_code = 0;
+
 	/* Indicate firmware a command is to be serviced */
 	ret = regmap_write(data->regmap, SBRMI_INBNDMSG7, START_CMD);
 	if (ret < 0)
@@ -74,6 +76,9 @@  int rmi_mailbox_xfer(struct sbrmi_data *data,
 	if (ret)
 		goto exit_unlock;
 
+	ret = regmap_read(data->regmap, SBRMI_OUTBNDMSG7, &ec);
+	if (ret || ec)
+		goto exit_clear_alert;
 	/*
 	 * For a read operation, the initiator (BMC) reads the firmware
 	 * response Command Data Out[31:0] from SBRMI::OutBndMsg_inst[4:1]
@@ -89,12 +94,17 @@  int rmi_mailbox_xfer(struct sbrmi_data *data,
 		}
 	}
 
+exit_clear_alert:
 	/*
 	 * BMC must write 1'b1 to SBRMI::Status[SwAlertSts] to clear the
 	 * ALERT to initiator
 	 */
 	ret = regmap_write(data->regmap, SBRMI_STATUS,
 			   sw_status | SW_ALERT_MASK);
+	if (ec) {
+		ret = -EPROTOTYPE;
+		msg->fw_ret_code = ec;
+	}
 exit_unlock:
 	mutex_unlock(&data->lock);
 	return ret;
diff --git a/include/uapi/misc/amd-apml.h b/include/uapi/misc/amd-apml.h
index dc926327629d..4207aa08b660 100644
--- a/include/uapi/misc/amd-apml.h
+++ b/include/uapi/misc/amd-apml.h
@@ -38,6 +38,10 @@  struct apml_message {
 		__u32 mb_in[2];
 		__u8 reg_in[8];
 	} data_in;
+	/*
+	 * Error code is returned in case of soft mailbox
+	 */
+	__u32 fw_ret_code;
 } __attribute__((packed));
 
 /**
@@ -60,6 +64,7 @@  struct apml_message {
  * The APML RMI module checks whether the cmd is
  *  - Mailbox message read/write(0x0~0x999)
  *  - returning "-EFAULT" if none of the above
+ * "-EPROTOTYPE" error is returned to provide additional error details
  */
 #define SBRMI_IOCTL_CMD		_IOWR(SB_BASE_IOCTL_NR, 0, struct apml_message)
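
A caller of this ioctl might separate firmware-reported errors from
ordinary failures roughly as follows. This is a sketch only:
`classify_xfer()` and `enum xfer_outcome` are hypothetical helpers, not
part of the patch, and it assumes the driver copies `fw_ret_code` back to
user space even when the ioctl fails, which the hunks above do not show.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical classification of one mailbox transfer. */
enum xfer_outcome {
	XFER_OK,	/* ioctl succeeded */
	XFER_FW_ERROR,	/* firmware rejected the request, see fw_ret_code */
	XFER_OS_ERROR	/* any other failure, see errno */
};

/*
 * rc is the return value of ioctl(fd, SBRMI_IOCTL_CMD, &msg) and
 * saved_errno is errno captured immediately afterwards.
 * fw_ret_code is msg.fw_ret_code after the call; it is only meaningful
 * when the outcome is XFER_FW_ERROR.
 */
enum xfer_outcome classify_xfer(int rc, int saved_errno, uint32_t fw_ret_code)
{
	(void)fw_ret_code;	/* reported to the user by the caller */

	if (rc == 0)
		return XFER_OK;
	if (saved_errno == EPROTOTYPE)
		return XFER_FW_ERROR;
	return XFER_OS_ERROR;
}
```

On `XFER_FW_ERROR` the caller would log or decode `fw_ret_code`; on
`XFER_OS_ERROR` the usual `strerror(saved_errno)` reporting applies.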