[v2] firmware: arm_scmi: Queue in scmi layer for mailbox implementation

Message ID 20241009192637.1090238-1-justin.chen@broadcom.com (mailing list archive)
State New, archived

Commit Message

Justin Chen Oct. 9, 2024, 7:26 p.m. UTC
send_message() does not block in the MBOX implementation. This is
because the mailbox layer has its own queue. However, this confuses
the per xfer timeouts as they all start their timeout ticks in
parallel.

Consider a case where the xfer timeout is 30ms and a SCMI transaction
takes 25ms.

0ms: Message #0 is queued in mailbox layer and sent out, then sits
     at scmi_wait_for_message_response() with a timeout of 30ms
1ms: Message #1 is queued in mailbox layer but not sent out yet.
     Since send_message() doesn't block, it also sits at
     scmi_wait_for_message_response() with a timeout of 30ms
...
25ms: Message #0 is completed, txdone is called and Message #1 is
      sent out
31ms: Message #1 times out since the count started at 1ms, even
      though it has only been inflight for 6ms.

Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
---

Changes in v2:

- Added Fixes tag
- Improved commit message to better capture the issue

 .../firmware/arm_scmi/transports/mailbox.c    | 21 +++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Cristian Marussi Oct. 11, 2024, 1:43 p.m. UTC | #1
On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote:
> send_message() does not block in the MBOX implementation. This is
> because the mailbox layer has its own queue. However, this confuses
> the per xfer timeouts as they all start their timeout ticks in
> parallel.
> 
> Consider a case where the xfer timeout is 30ms and a SCMI transaction
> takes 25ms.
> 
> 0ms: Message #0 is queued in mailbox layer and sent out, then sits
> at scmi_wait_for_message_response() with a timeout of 30ms
> 1ms: Message #1 is queued in mailbox layer but not sent out yet.
> Since send_message() doesn't block, it also sits at
> scmi_wait_for_message_response() with a timeout of 30ms
> ...
> 25ms: Message #0 is completed, txdone is called and Message #1 is
> sent out
> 31ms: Message #1 times out since the count started at 1ms. Even
> though it has only been inflight for 6ms.
> 
> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> ---
> 
> Changes in v2:

Hi Justin,

thanks.

A few nitpicks and one remark down below.

> 
> - Added Fixes tag
> - Improved commit message to better capture the issue
> 
>  .../firmware/arm_scmi/transports/mailbox.c    | 21 +++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
> index 1a754dee24f7..30bc2865582f 100644
> --- a/drivers/firmware/arm_scmi/transports/mailbox.c
> +++ b/drivers/firmware/arm_scmi/transports/mailbox.c
> @@ -33,6 +33,7 @@ struct scmi_mailbox {
>  	struct mbox_chan *chan_platform_receiver;
>  	struct scmi_chan_info *cinfo;
>  	struct scmi_shared_mem __iomem *shmem;
> +	struct mutex chan_lock;

Missing Doxygen comment....

arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox

>  };
>  
>  #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
> @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
>  	cl->rx_callback = rx_callback;
>  	cl->tx_block = false;
>  	cl->knows_txdone = tx;
> +	mutex_init(&smbox->chan_lock);

This could be moved to the end of this function, after the channels are
requested and it is no longer possible to fail and bail out....messages
won't flow and the lock won't be used anyway until this chan_setup completes...
...BUT I have NO strong opinion about this....you can leave it here
too...up to you
>  
>  	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
>  	if (IS_ERR(smbox->chan)) {
> @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo,
>  	struct scmi_mailbox *smbox = cinfo->transport_info;
>  	int ret;
>  
> +	/*
> +	 * The mailbox layer has it's own queue. However the mailbox queue confuses
 				  its own queue

> +	 * the per message SCMI timeouts since the clock starts when the message is
> +	 * submitted into the mailbox queue. So when multiple messages are queued up
> +	 * the clock starts on all messages instead of only the one inflight.
> +	 */
> +	mutex_lock(&smbox->chan_lock);
> +
>  	ret = mbox_send_message(smbox->chan, xfer);
>  
>  	/* mbox_send_message returns non-negative value on success, so reset */
>  	if (ret > 0)
>  		ret = 0;
> +	else
> +		mutex_unlock(&smbox->chan_lock);

I think this should be

	else if (ret < 0)
		mutex_unlock(&smbox->chan_lock);

...since looking at mbox_send_message() and its implementation it returns
NON-Negative integers on Success...so 0 from mbox_send_message() also means
SUCCESS and we should not release the mutex (I think the 'ret' returned
here is the idx from add_to_rbuf...so it will become zero periodically
on normal successful operation)

>  
>  	return ret;
>  }
> @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
>  {
>  	struct scmi_mailbox *smbox = cinfo->transport_info;
>  
> -	/*
> -	 * NOTE: we might prefer not to need the mailbox ticker to manage the
> -	 * transfer queueing since the protocol layer queues things by itself.
> -	 * Unfortunately, we have to kick the mailbox framework after we have
> -	 * received our message.
> -	 */
>  	mbox_client_txdone(smbox->chan, ret);
> +
> +	/* Release channel */
> +	mutex_unlock(&smbox->chan_lock);
>  }
>  
>  static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
> -- 
> 2.34.1
> 

I gave it a go on a couple of JUNO, without any issues.

Other than the above, LGTM.

Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>

Thanks,
Cristian
Florian Fainelli Oct. 11, 2024, 4:58 p.m. UTC | #2
On 10/11/24 06:43, Cristian Marussi wrote:
> On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote:
>> send_message() does not block in the MBOX implementation. This is
>> because the mailbox layer has its own queue. However, this confuses
>> the per xfer timeouts as they all start their timeout ticks in
>> parallel.
>>
>> Consider a case where the xfer timeout is 30ms and a SCMI transaction
>> takes 25ms.
>>
>> 0ms: Message #0 is queued in mailbox layer and sent out, then sits
>> at scmi_wait_for_message_response() with a timeout of 30ms
>> 1ms: Message #1 is queued in mailbox layer but not sent out yet.
>> Since send_message() doesn't block, it also sits at
>> scmi_wait_for_message_response() with a timeout of 30ms
>> ...
>> 25ms: Message #0 is completed, txdone is called and Message #1 is
>> sent out
>> 31ms: Message #1 times out since the count started at 1ms. Even
>> though it has only been inflight for 6ms.
>>
>> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
>> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
>> ---
>>
>> Changes in v2:
> 
> Hi Justin,
> 
> thanks.
> 
> A few nitpicks and one remark down below.

Since there will likely be a v3, the Fixes tag should IMHO go way back
to when this problem was introduced, maybe:

Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type")
Justin Chen Oct. 11, 2024, 7:15 p.m. UTC | #3
On 10/11/24 6:43 AM, Cristian Marussi wrote:
> On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote:
>> send_message() does not block in the MBOX implementation. This is
>> because the mailbox layer has its own queue. However, this confuses
>> the per xfer timeouts as they all start their timeout ticks in
>> parallel.
>>
>> Consider a case where the xfer timeout is 30ms and a SCMI transaction
>> takes 25ms.
>>
>> 0ms: Message #0 is queued in mailbox layer and sent out, then sits
>> at scmi_wait_for_message_response() with a timeout of 30ms
>> 1ms: Message #1 is queued in mailbox layer but not sent out yet.
>> Since send_message() doesn't block, it also sits at
>> scmi_wait_for_message_response() with a timeout of 30ms
>> ...
>> 25ms: Message #0 is completed, txdone is called and Message #1 is
>> sent out
>> 31ms: Message #1 times out since the count started at 1ms. Even
>> though it has only been inflight for 6ms.
>>
>> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
>> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
>> ---
>>
>> Changes in v2:
> 
> Hi Justin,
> 
> thanks.
> 
> A few nitpicks and one remark down below.
> 
>>
>> - Added Fixes tag
>> - Improved commit message to better capture the issue
>>
>>   .../firmware/arm_scmi/transports/mailbox.c    | 21 +++++++++++++------
>>   1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
>> index 1a754dee24f7..30bc2865582f 100644
>> --- a/drivers/firmware/arm_scmi/transports/mailbox.c
>> +++ b/drivers/firmware/arm_scmi/transports/mailbox.c
>> @@ -33,6 +33,7 @@ struct scmi_mailbox {
>>   	struct mbox_chan *chan_platform_receiver;
>>   	struct scmi_chan_info *cinfo;
>>   	struct scmi_shared_mem __iomem *shmem;
>> +	struct mutex chan_lock;
> 
> Missing Doxygen comment....
> 
> arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox
> 
>>   };
>>   
>>   #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
>> @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
>>   	cl->rx_callback = rx_callback;
>>   	cl->tx_block = false;
>>   	cl->knows_txdone = tx;
>> +	mutex_init(&smbox->chan_lock);
> 
> This could be moved to the end of this function, after the channels are
> requested and it is no longer possible to fail and bail out....messages
> won't flow and the lock won't be used anyway until this chan_setup completes...
> ...BUT I have NO strong opinion about this....you can leave it here
> too...up to you
>>   
>>   	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
>>   	if (IS_ERR(smbox->chan)) {
>> @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo,
>>   	struct scmi_mailbox *smbox = cinfo->transport_info;
>>   	int ret;
>>   
>> +	/*
>> +	 * The mailbox layer has it's own queue. However the mailbox queue confuses
>   				  its own queue
> 
>> +	 * the per message SCMI timeouts since the clock starts when the message is
>> +	 * submitted into the mailbox queue. So when multiple messages are queued up
>> +	 * the clock starts on all messages instead of only the one inflight.
>> +	 */
>> +	mutex_lock(&smbox->chan_lock);
>> +
>>   	ret = mbox_send_message(smbox->chan, xfer);
>>   
>>   	/* mbox_send_message returns non-negative value on success, so reset */
>>   	if (ret > 0)
>>   		ret = 0;
>> +	else
>> +		mutex_unlock(&smbox->chan_lock);
> 
> I think this should be
> 
> 	else if (ret < 0)
> 		mutex_unlock(&smbox->chan_lock);
> 
> ...since looking at mbox_send_message() and its implementation it returns
> NON-Negative integers on Success...so 0 from mbox_send_message() also means
> SUCCESS and we should not release the mutex (I think the 'ret' returned
> here is the idx from add_to_rbuf...so it will become zero periodically
> on normal successful operation)
> 

Yes, I see the implementation. Looks like it returns the position in the 
ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a 
warning.

What about this?
if (ret >= 0)
	ret = 0;
else
	mutex_unlock(&smbox->chan_lock);

A bit easier to read IMO.

Thanks,
Justin

>>   
>>   	return ret;
>>   }
>> @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
>>   {
>>   	struct scmi_mailbox *smbox = cinfo->transport_info;
>>   
>> -	/*
>> -	 * NOTE: we might prefer not to need the mailbox ticker to manage the
>> -	 * transfer queueing since the protocol layer queues things by itself.
>> -	 * Unfortunately, we have to kick the mailbox framework after we have
>> -	 * received our message.
>> -	 */
>>   	mbox_client_txdone(smbox->chan, ret);
>> +
>> +	/* Release channel */
>> +	mutex_unlock(&smbox->chan_lock);
>>   }
>>   
>>   static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
>> -- 
>> 2.34.1
>>
> 
> I gave it a go on a couple of JUNO, without any issues.
> 
> Other than the above, LGTM.
> 
> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
> Tested-by: Cristian Marussi <cristian.marussi@arm.com>
> 
> Thanks,
> Cristian
>
Cristian Marussi Oct. 13, 2024, 9:26 a.m. UTC | #4
On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote:
> 
> 
> On 10/11/24 6:43 AM, Cristian Marussi wrote:
> > On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote:
> > > send_message() does not block in the MBOX implementation. This is
> > > because the mailbox layer has its own queue. However, this confuses
> > > the per xfer timeouts as they all start their timeout ticks in
> > > parallel.
> > > 
> > > Consider a case where the xfer timeout is 30ms and a SCMI transaction
> > > takes 25ms.
> > > 
> > > 0ms: Message #0 is queued in mailbox layer and sent out, then sits
> > > at scmi_wait_for_message_response() with a timeout of 30ms
> > > 1ms: Message #1 is queued in mailbox layer but not sent out yet.
> > > Since send_message() doesn't block, it also sits at
> > > scmi_wait_for_message_response() with a timeout of 30ms
> > > ...
> > > 25ms: Message #0 is completed, txdone is called and Message #1 is
> > > sent out
> > > 31ms: Message #1 times out since the count started at 1ms. Even
> > > though it has only been inflight for 6ms.
> > > 
> > > Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
> > > Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> > > ---
> > > 
> > > Changes in v2:
> > 
> > Hi Justin,
> > 
> > thanks.
> > 
> > A few nitpicks and one remark down below.
> > 
> > > 
> > > - Added Fixes tag
> > > - Improved commit message to better capture the issue
> > > 
> > >   .../firmware/arm_scmi/transports/mailbox.c    | 21 +++++++++++++------
> > >   1 file changed, 15 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
> > > index 1a754dee24f7..30bc2865582f 100644
> > > --- a/drivers/firmware/arm_scmi/transports/mailbox.c
> > > +++ b/drivers/firmware/arm_scmi/transports/mailbox.c
> > > @@ -33,6 +33,7 @@ struct scmi_mailbox {
> > >   	struct mbox_chan *chan_platform_receiver;
> > >   	struct scmi_chan_info *cinfo;
> > >   	struct scmi_shared_mem __iomem *shmem;
> > > +	struct mutex chan_lock;
> > 
> > Missing Doxygen comment....
> > 
> > arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox
> > 
> > >   };
> > >   #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
> > > @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
> > >   	cl->rx_callback = rx_callback;
> > >   	cl->tx_block = false;
> > >   	cl->knows_txdone = tx;
> > > +	mutex_init(&smbox->chan_lock);
> > 
> > This could be moved to the end of this function, after the channels are
> > requested and it is no longer possible to fail and bail out....messages
> > won't flow and the lock won't be used anyway until this chan_setup completes...
> > ...BUT I have NO strong opinion about this....you can leave it here
> > too...up to you
> > >   	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
> > >   	if (IS_ERR(smbox->chan)) {
> > > @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo,
> > >   	struct scmi_mailbox *smbox = cinfo->transport_info;
> > >   	int ret;
> > > +	/*
> > > +	 * The mailbox layer has it's own queue. However the mailbox queue confuses
> >   				  its own queue
> > 
> > > +	 * the per message SCMI timeouts since the clock starts when the message is
> > > +	 * submitted into the mailbox queue. So when multiple messages are queued up
> > > +	 * the clock starts on all messages instead of only the one inflight.
> > > +	 */
> > > +	mutex_lock(&smbox->chan_lock);
> > > +
> > >   	ret = mbox_send_message(smbox->chan, xfer);
> > >   	/* mbox_send_message returns non-negative value on success, so reset */
> > >   	if (ret > 0)
> > >   		ret = 0;
> > > +	else
> > > +		mutex_unlock(&smbox->chan_lock);
> > 
> > I think this should be
> > 
> > 	else if (ret < 0)
> > 		mutex_unlock(&smbox->chan_lock);
> > 
> > ...since looking at mbox_send_message() and its implementation it returns
> > NON-Negative integers on Success...so 0 from mbox_send_message() also means
> > SUCCESS and we should not release the mutex (I think the 'ret' returned
> > here is the idx from add_to_rbuf...so it will become zero periodically
> > on normal successful operation)
> > 
> 
> Yes, I see the implementation. Looks like it returns the position in the
> ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a
> warning.
> 
> What about this?
> if (ret >= 0)
> 	ret = 0;
> else
> 	mutex_unlock(&smbox->chan_lock);
> 
> A bit easier to read IMO.

Oh yes, much better definitely...or, maybe, even simpler to read:

 ...

	mutex_lock(&smbox->chan_lock);
	ret = mbox_send_message(smbox->chan, xfer);
	if (ret < 0) {
		mutex_unlock(&smbox->chan_lock);
		return ret;
	}

	return 0;
}

.... up to You...not sure what Sudeep prefers...

Thanks,
Cristian
Sudeep Holla Oct. 14, 2024, 9:55 a.m. UTC | #5
On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote:
> On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote:
> > 
> > Yes, I see the implementation. Looks like it returns the position in the
> > ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a
> > warning.
> > 
> > What about this?
> > if (ret >= 0)
> > 	ret = 0;
> > else
> > 	mutex_unlock(&smbox->chan_lock);
> > 
> > A bit easier to read IMO.
> 
> Oh yes, much better definitely...or, maybe, even simpler to read:
> 
>  ...
> 
>   mutex_lock(&smbox->chan_lock);
>   ret = mbox_send_message(smbox->chan, xfer);
>   if (ret < 0) {
>  	mutex_unlock(&smbox->chan_lock);
>         return ret;
>   }
> 
>   return 0;
> }
> 
> .... up to You...not sure what Sudeep prefers...
>

I like this better. Also, I was hoping Justin would send v3 soonish, as I want
to send the PR for fixes soon. So I have gone ahead and fixed all your comments
myself. I had already seen the doxygen one from the builder and had fixed it up
last week when I added this to -next.
Florian Fainelli Oct. 14, 2024, 5:21 p.m. UTC | #6
On 10/14/24 02:55, Sudeep Holla wrote:
> On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote:
>> On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote:
>>>
>>> Yes, I see the implementation. Looks like it returns the position in the
>>> ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a
>>> warning.
>>>
>>> What about this?
>>> if (ret >= 0)
>>> 	ret = 0;
>>> else
>>> 	mutex_unlock(&smbox->chan_lock);
>>>
>>> A bit easier to read IMO.
>>
>> Oh yes, much better definitely...or, maybe, even simpler to read:
>>
>>   ...
>>
>>    mutex_lock(&smbox->chan_lock);
>>    ret = mbox_send_message(smbox->chan, xfer);
>>    if (ret < 0) {
>>   	mutex_unlock(&smbox->chan_lock);
>>          return ret;
>>    }
>>
>>    return 0;
>> }
>>
>> .... up to You...not sure what Sudeep prefers...
>>
> 
> I like this better. Also I was hoping Justin would send v3 soonish, I want
> to send PR for fixes soon. So I have gone ahead and fixed all your comments
> myself. I had seen the doxygen one from builder already and had fixed it up
> last week when I added this to -next.
> 

Here's his v3:

https://lore.kernel.org/all/20241014160717.1678953-1-justin.chen@broadcom.com/

we are on the other side of the pond ;)
Sudeep Holla Oct. 14, 2024, 8:29 p.m. UTC | #7
On Mon, Oct 14, 2024 at 10:21:28AM -0700, Florian Fainelli wrote:
> On 10/14/24 02:55, Sudeep Holla wrote:
> > On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote:
> > > On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote:
> > > > 
> > > > Yes, I see the implementation. Looks like it returns the position in the
> > > > ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a
> > > > warning.
> > > > 
> > > > What about this?
> > > > if (ret >= 0)
> > > > 	ret = 0;
> > > > else
> > > > 	mutex_unlock(&smbox->chan_lock);
> > > > 
> > > > A bit easier to read IMO.
> > > 
> > > Oh yes, much better definitely...or, maybe, even simpler to read:
> > > 
> > >   ...
> > > 
> > >    mutex_lock(&smbox->chan_lock);
> > >    ret = mbox_send_message(smbox->chan, xfer);
> > >    if (ret < 0) {
> > >   	mutex_unlock(&smbox->chan_lock);
> > >          return ret;
> > >    }
> > > 
> > >    return 0;
> > > }
> > > 
> > > .... up to You...not sure what Sudeep prefers...
> > > 
> > 
> > I like this better. Also I was hoping Justin would send v3 soonish, I want
> > to send PR for fixes soon. So I have gone ahead and fixed all your comments
> > myself. I had seen the doxygen one from builder already and had fixed it up
> > last week when I added this to -next.
> > 
> 
> Here's his v3:
> 
> https://lore.kernel.org/all/20241014160717.1678953-1-justin.chen@broadcom.com/
> 
> we are on the other side of the pond ;)

I do understand 

Patch

diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
index 1a754dee24f7..30bc2865582f 100644
--- a/drivers/firmware/arm_scmi/transports/mailbox.c
+++ b/drivers/firmware/arm_scmi/transports/mailbox.c
@@ -33,6 +33,7 @@  struct scmi_mailbox {
 	struct mbox_chan *chan_platform_receiver;
 	struct scmi_chan_info *cinfo;
 	struct scmi_shared_mem __iomem *shmem;
+	struct mutex chan_lock;
 };
 
 #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
@@ -205,6 +206,7 @@  static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
 	cl->rx_callback = rx_callback;
 	cl->tx_block = false;
 	cl->knows_txdone = tx;
+	mutex_init(&smbox->chan_lock);
 
 	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
 	if (IS_ERR(smbox->chan)) {
@@ -267,11 +269,21 @@  static int mailbox_send_message(struct scmi_chan_info *cinfo,
 	struct scmi_mailbox *smbox = cinfo->transport_info;
 	int ret;
 
+	/*
+	 * The mailbox layer has it's own queue. However the mailbox queue confuses
+	 * the per message SCMI timeouts since the clock starts when the message is
+	 * submitted into the mailbox queue. So when multiple messages are queued up
+	 * the clock starts on all messages instead of only the one inflight.
+	 */
+	mutex_lock(&smbox->chan_lock);
+
 	ret = mbox_send_message(smbox->chan, xfer);
 
 	/* mbox_send_message returns non-negative value on success, so reset */
 	if (ret > 0)
 		ret = 0;
+	else
+		mutex_unlock(&smbox->chan_lock);
 
 	return ret;
 }
@@ -281,13 +293,10 @@  static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
 {
 	struct scmi_mailbox *smbox = cinfo->transport_info;
 
-	/*
-	 * NOTE: we might prefer not to need the mailbox ticker to manage the
-	 * transfer queueing since the protocol layer queues things by itself.
-	 * Unfortunately, we have to kick the mailbox framework after we have
-	 * received our message.
-	 */
 	mbox_client_txdone(smbox->chan, ret);
+
+	/* Release channel */
+	mutex_unlock(&smbox->chan_lock);
 }
 
 static void mailbox_fetch_response(struct scmi_chan_info *cinfo,