Message ID | 20241004221257.2888603-1-justin.chen@broadcom.com (mailing list archive)
---|---
State | New, archived
Series | firmware: arm_scmi: Queue in scmi layer for mailbox implementation
On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
> The mailbox layer has its own queue. However, this confuses the per-message
> timeouts, since the clock starts ticking the moment the messages get queued
> up. So all messages in the queue have their timeout clocks ticking, instead
> of only the message inflight. To fix this, let's move the queue back into
> the SCMI layer.

Hi Justin,

thanks for reporting this. I'll try to understand it better in the following,
since until now we have always avoided a lock in the mailbox driver: it seemed
not to be needed, due to the interaction and inner workings of the mailbox
subsystem and its queues...so I just want to understand better...not saying
that this patch is wrong by itself for now...

When you say "this confuses the per-message timeouts since the clock starts
ticking the moment the message gets queued up", you are referring to the core
mailbox subsystem queues and the code flow that derives from
mbox_send_message(), right? So, roughly, the mailbox subsystem callchain
involved would be this:

mbox_send_message()
	add_to_rbuf()
	msg_submit()

Am I right? Is this the code flow you are referring to?

In this case I am not sure what you mean by "the timer starts ticking as soon
as the message is queued"...which timer? ...because:

- the SCMI mailbox client is registered as "tx_block = false", so the sent
  messages are never waited for with wait_for_completion, and on the other
  side

- once we have 1 message in-flight already, the next attempt to send a message
  will result in a call to add_to_rbuf(), BUT the following msg_submit() will
  bail out immediately at line 63

	if (!chan->msg_count || chan->active_req)
		goto exit;

  after the first message is queued, since there is an active_req inflight,
  and so no more messages will be submitted till the next tx_tick(), which is
  invoked only by us from the SCMI stack when the reply to the inflight
  message has been read back. (This also means tx_prepare is NOT called until
  we kick the mailbox framework from the SCMI mailbox driver with
  mbox_client_txdone().)

What am I missing here?

Thanks,
Cristian
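The gating Cristian describes can be sketched as a tiny userspace model. This
is a toy: the names mirror drivers/mailbox/mailbox.c, but the bodies are
simplified for illustration and are not the kernel source.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the mailbox-core gating, not kernel code. */
struct chan {
	int msg_count;		/* messages sitting in the ring buffer */
	bool active_req;	/* a transfer is currently on the wire */
};

static void msg_submit(struct chan *c)
{
	/* The bailout discussed above: nothing queued, or one in flight. */
	if (!c->msg_count || c->active_req)
		return;
	c->msg_count--;
	c->active_req = true;
	printf("submitted, %d still queued\n", c->msg_count);
}

static void mbox_send_message(struct chan *c)
{
	c->msg_count++;		/* add_to_rbuf() always queues here */
	msg_submit(c);
}

static void tx_tick(struct chan *c)	/* kicked via mbox_client_txdone() */
{
	c->active_req = false;
	msg_submit(c);		/* only now does the next message go out */
}

int main(void)
{
	struct chan c = { 0, false };

	mbox_send_message(&c);	/* message 0: submitted immediately */
	mbox_send_message(&c);	/* message 1: queued, NOT submitted */
	tx_tick(&c);		/* reply read back: message 1 goes out */
	return 0;
}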
On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
> The mailbox layer has its own queue. However, this confuses the per-message
> timeouts, since the clock starts ticking the moment the messages get queued
> up. So all messages in the queue have their timeout clocks ticking, instead
> of only the message inflight. To fix this, let's move the queue back into
> the SCMI layer.

I think this has come up in the past. We have avoided adding additional
locking here as the mailbox layer takes care of it. Has anything changed
recently?

--
Regards,
Sudeep
On Mon, Oct 07, 2024 at 02:04:10PM +0100, Sudeep Holla wrote:
> I think this has come up in the past. We have avoided adding additional
> locking here as the mailbox layer takes care of it. Has anything changed
> recently?

I asked for an explanation in my reply (our mails probably crossed each
other), since this has already come up a few times in the past and central
locking seemed not to be needed. Here the difference is the reason: Justin
talks about message timeouts related to the queueing process, so I asked him
to explain the details (and the anomaly observed) better, since it still does
not seem to me that the lock is needed even in this case...anyway, I can
definitely be wrong, of course :D

Thanks,
Cristian
On 10/7/24 6:10 AM, Cristian Marussi wrote:
> I asked for an explanation in my reply (our mails probably crossed each
> other), since this has already come up a few times in the past and central
> locking seemed not to be needed...
<snip>

Hello Cristian,

Thanks for the response. I'll try to elaborate.

When comparing the SMC and mailbox transports, we noticed the mailbox
transport times out much quicker under load. Originally we thought this was
the latency of the mailbox implementation, but after debugging we noticed a
weird behavior: we saw SCMI transactions timing out before the mailbox had
even transmitted the message.

The issue lies in the SCMI layer, in the do_xfer() function of
drivers/firmware/arm_scmi/driver.c.

The fundamental issue is that send_message() blocks for the SMC transport,
but doesn't block for the mailbox transport. So if send_message() doesn't
block, we can have multiple messages waiting at
scmi_wait_for_message_response().

SMC looks like this:
CPU #0 SCMI message 0 -> calls send_message(), then calls
scmi_wait_for_message_response(), times out after 30ms.
CPU #1 SCMI message 1 -> blocks at send_message(), waiting for SCMI message 0
to complete.

Mailbox looks like this:
CPU #0 SCMI message 0 -> calls send_message(), the mailbox layer queues up
the message, sees no message is outgoing, and sends it. The CPU waits at
scmi_wait_for_message_response(), times out after 30ms.
CPU #1 SCMI message 1 -> calls send_message(), the mailbox layer queues up
the message, sees a message pending, and holds the message in the queue. The
CPU waits at scmi_wait_for_message_response(), times out after 30ms.

Let's say the transport takes 25ms. The first message would succeed; the
second message would time out after 5ms.

Hopefully this makes sense.

Justin
> Subject: Re: [PATCH] firmware: arm_scmi: Queue in scmi layer for
> mailbox implementation
>
> The fundamental issue is that send_message() blocks for the SMC transport,
> but doesn't block for the mailbox transport. So if send_message() doesn't
> block, we can have multiple messages waiting at
> scmi_wait_for_message_response().
>
<snip>
>
> Let's say the transport takes 25ms. The first message would succeed; the
> second message would time out after 5ms.

Each xfer has its own completion; how does the 2nd impact the 1st?

Regards,
Peng.
On 10/7/2024 7:43 PM, Peng Fan wrote:
>> Let's say the transport takes 25ms. The first message would succeed; the
>> second message would time out after 5ms.
>
> Each xfer has its own completion; how does the 2nd impact the 1st?
>
> Regards,
> Peng.

Hello Peng,

The mailbox layer queues messages and doesn't block when send_message() is
called, so we can have both xfers waiting for completion at the same time.
Let's assume a message takes 25ms to complete, with a 30ms timeout:

0ms  Message #0 is queued in the mailbox layer and sent out, then sits at
     scmi_wait_for_message_response() with a timeout of 30ms.
1ms  Message #1 is queued in the mailbox layer but not sent out yet. Since
     send_message() doesn't block, it also sits at
     scmi_wait_for_message_response() with a timeout of 30ms.
...
25ms Message #0 completes, txdone is called, and Message #1 is sent out.
31ms Message #1 times out, since its clock started at 1ms, even though it has
     only been inflight for 6ms.

Thanks,
Justin
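The drift in Justin's timeline is plain arithmetic; here is a quick
self-contained check. The 25ms transport time and 30ms timeout are the
hypothetical figures from the example above, not measured values.

#include <stdio.h>

/* Reproduces the timeline above: timeout clocks start at send_message(),
 * but the single-entry channel serializes the actual transmissions. */
int main(void)
{
	const int transport_ms = 25, timeout_ms = 30;
	const int queued_at[] = { 0, 1 };	/* ms at which each xfer is queued */

	for (int i = 0; i < 2; i++) {
		int completes = (i + 1) * transport_ms;	/* serialized on the wire */
		int expires = queued_at[i] + timeout_ms;

		printf("msg %d: completes at %dms, timer expires at %dms -> %s\n",
		       i, completes, expires,
		       completes <= expires ? "ok" : "TIMEOUT");
	}
	return 0;
}

Running it prints "msg 0: ... ok" and "msg 1: completes at 50ms, timer
expires at 31ms -> TIMEOUT", matching the 31ms expiry in the timeline.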
On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
> Hello Cristian,

Hi Justin,

> Thanks for the response. I'll try to elaborate.
>
> When comparing the SMC and mailbox transports, we noticed the mailbox
> transport times out much quicker under load.
<snip>
>
> The fundamental issue is that send_message() blocks for the SMC transport,
> but doesn't block for the mailbox transport. So if send_message() doesn't
> block, we can have multiple messages waiting at
> scmi_wait_for_message_response().

oh...yes...now I can see it. tx_prepare is really never called, given how the
mailbox subsystem de-queues messages one at a time, so we end up waiting for a
reply to a message that is still to be sent. The message inflight is really
NOT corrupted, because the next one remains pending until the reply in the
shmem has been read back, BUT the timeout will drift away if multiple
inflights are pending to be sent.

> Let's say the transport takes 25ms. The first message would succeed; the
> second message would time out after 5ms.
>
> Hopefully this makes sense.

Yes, of course. Thanks for reporting this and taking the time to explain...

...in general the patch LGTM...I think your patch is also good because it
could easily be backported as a fix....can you add a Fixes tag in your next
version?

Also, can you explain the issue and the solution in more detail in the commit
message? That will help getting it merged as a Fix in stables...

...for the future (definitely NOT in this series) we could probably think to
get rid of the sleeping mutex in favour of some other non-sleeping form of
mutual exclusion around the channel (like in the SMC transport) and enable
(optionally) Atomic transmission support, AND also review whether the shmem
layer busy-waiting in tx_prepare is still needed at all...

Thanks,
Cristian
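For concreteness, a non-sleeping claim of the channel in the style Cristian
alludes to (the SMC transport's atomic mode) might look like the sketch
below. The names chan_busy and scmi_chan_acquire are invented for
illustration; this is not code from this series or from any existing
transport.

#include <linux/atomic.h>
#include <linux/processor.h>

/* Hypothetical sketch only: invented names, not code from this series.
 * A busy-wait claim never sleeps, so it could also back an atomic TX path. */
static atomic_t chan_busy = ATOMIC_INIT(0);

static void scmi_chan_acquire(void)
{
	while (atomic_cmpxchg(&chan_busy, 0, 1) != 0)
		cpu_relax();	/* spin instead of sleeping on a mutex */
}

static void scmi_chan_release(void)
{
	atomic_set(&chan_busy, 0);
}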
On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
> The fundamental issue is that send_message() blocks for the SMC transport,
> but doesn't block for the mailbox transport. So if send_message() doesn't
> block, we can have multiple messages waiting at
> scmi_wait_for_message_response().
>
<snip>
>
> Let's say the transport takes 25ms. The first message would succeed; the
> second message would time out after 5ms.

I understand this issue and was aware of it. I had just assumed it wouldn't
have such an impact, as I was assuming the transport takes a couple of ms for
any synchronous command. Typically I would expect the transport to take much
less than 25ms (1-4ms IMO) to complete any synchronous command. If it takes
25ms for a sync command, then it could be some serious issue. Anyway, I am
not against the idea. More details in my response to Cristian.
On Tue, Oct 08, 2024 at 01:10:39PM +0100, Cristian Marussi wrote:
> oh...yes...now I can see it. tx_prepare is really never called, given how
> the mailbox subsystem de-queues messages one at a time, so we end up waiting
> for a reply to a message that is still to be sent. The message inflight is
> really NOT corrupted, because the next one remains pending until the reply
> in the shmem has been read back, BUT the timeout will drift away if multiple
> inflights are pending to be sent.

Indeed.

> ...in general the patch LGTM...I think your patch is also good because it
> could easily be backported as a fix....can you add a Fixes tag in your next
> version?

Are you seeing this issue a lot? IOW, do we need this to be backported?

> ...for the future (definitely NOT in this series) we could probably think to
> get rid of the sleeping mutex in favour of some other non-sleeping form of
> mutual exclusion around the channel (like in the SMC transport) and enable
> (optionally) Atomic transmission support, AND also review whether the shmem
> layer busy-waiting in tx_prepare is still needed at all...

Agreed. If we are locking the channel in SCMI, we can drop the busy-waiting
in tx_prepare and the associated details in the comment, as this locking
voids that. It is better to have both changes in the same patch to indicate
the relation between them.

--
Regards,
Sudeep
On Tue, Oct 08, 2024 at 02:23:00PM +0100, Sudeep Holla wrote:
> Agreed. If we are locking the channel in SCMI, we can drop the busy-waiting
> in tx_prepare and the associated details in the comment, as this locking
> voids that. It is better to have both changes in the same patch to indicate
> the relation between them.

Actually, scratch that last point. The waiting in tx_prepare, until the
platform marks the shmem free for the agent to use, is still needed. One
usecase is when the agent/OS times out but the platform continues to process
the command and eventually releases the shmem. Sorry, I completely forgot
about that.
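For reference, the busy-wait being discussed is the channel-free poll at the
top of the shmem TX path. The fragment below is condensed and paraphrased
from shmem_tx_prepare() in drivers/firmware/arm_scmi/shmem.c (error handling
and surrounding code elided, field names approximate), so treat it as a
sketch of the mechanism rather than the exact source.

/* Paraphrase of the shmem_tx_prepare() poll, not verbatim kernel code:
 * before writing a new command, spin until the platform marks the channel
 * free again (or a generous deadline expires). This is what stops a late
 * reply to a timed-out command from corrupting the next message. */
stop = ktime_add_ms(ktime_get(), 2 * cinfo->rx_timeout_ms);
spin_until_cond((ioread32(&shmem->channel_status) &
		 SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
		ktime_after(ktime_get(), stop));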
On Tue, Oct 08, 2024 at 02:34:59PM +0100, Sudeep Holla wrote:
> Actually, scratch that last point. The waiting in tx_prepare, until the
> platform marks the shmem free for the agent to use, is still needed. One
> usecase is when the agent/OS times out but the platform continues to process
> the command and eventually releases the shmem. Sorry, I completely forgot
> about that.

Yes, indeed, it is the mechanism by which we avoid forcibly reclaiming the
shmem when a transmission times out...and we should keep it, to avoid
corruption of newer messages by late replies to earlier ones that have timed
out.

Thanks,
Cristian
On 10/8/24 6:37 AM, Cristian Marussi wrote:
> On Tue, Oct 08, 2024 at 02:34:59PM +0100, Sudeep Holla wrote:
>> On Tue, Oct 08, 2024 at 02:23:00PM +0100, Sudeep Holla wrote:
>>> Are you seeing this issue a lot? IOW, do we need this to be backported?

I wouldn't say a lot. But we are seeing it with standard use of our devices
running over an extended amount of time. Yes, we would like this backported.

> Yes, indeed, it is the mechanism by which we avoid forcibly reclaiming the
> shmem when a transmission times out...and we should keep it, to avoid
> corruption of newer messages by late replies to earlier ones that have
> timed out.

Yup. I saw an interesting interaction from this. Since modifying the shmem
and ringing the doorbell are often two different tasks, the modification of
the shmem can race with the platform's processing of timed-out messages. This
usually leads to an early ACK and a spurious interrupt, which is mostly
harmless. We did see lockups when multiple timeouts occurred, but it was
unclear whether this was an issue with the SCMI transport layer or with our
driver/platform.

Thanks,
Justin
On 10/8/24 12:23 PM, Justin Chen wrote:
>>>> Are you seeing this issue a lot? IOW, do we need this to be backported?
>
> I wouldn't say a lot. But we are seeing it with standard use of our devices
> running over an extended amount of time. Yes, we would like this backported.

Any advice on what Fixes tag to add here? Since the driver was moved to the
transports folder, I assume this is going to be a manual backport. I'm not
sure whether the Fixes tag should point to when the SCMI mailbox transport
was moved to a standalone driver in the transports folder, or to the very
first SCMI commits, since the mailbox transport was added then.

Thanks,
Justin

<snip>
On Tue, Oct 08, 2024 at 12:23:28PM -0700, Justin Chen wrote:
> Yup. I saw an interesting interaction from this. Since modifying the shmem
> and ringing the doorbell are often two different tasks, the modification of
> the shmem can race with the platform's processing of timed-out messages.
> This usually leads to an early ACK and a spurious interrupt, which is mostly
> harmless. We did see lockups when multiple timeouts occurred, but it was
> unclear whether this was an issue with the SCMI transport layer or with our
> driver/platform.

Are you talking about something similar to this:

https://lore.kernel.org/all/20231220172112.763539-1-cristian.marussi@arm.com/

...reported as a side effect of a spurious IRQ on late timed-out replies? It
should have been fixed with the above commit in v6.8.

Thanks,
Cristian
On 10/9/24 1:32 AM, Cristian Marussi wrote:
> Are you talking about something similar to this:
>
> https://lore.kernel.org/all/20231220172112.763539-1-cristian.marussi@arm.com/
>
> ...reported as a side effect of a spurious IRQ on late timed-out replies? It
> should have been fixed with the above commit in v6.8.

I had those fixes and saw those warning messages. With these locking changes,
I no longer see the lockup, so I guess they were related somehow. Either way,
thanks for the review. v2 incoming.

Justin
diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
index 1a754dee24f7..30bc2865582f 100644
--- a/drivers/firmware/arm_scmi/transports/mailbox.c
+++ b/drivers/firmware/arm_scmi/transports/mailbox.c
@@ -33,6 +33,7 @@ struct scmi_mailbox {
 	struct mbox_chan *chan_platform_receiver;
 	struct scmi_chan_info *cinfo;
 	struct scmi_shared_mem __iomem *shmem;
+	struct mutex chan_lock;
 };
 
 #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl)
@@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
 	cl->rx_callback = rx_callback;
 	cl->tx_block = false;
 	cl->knows_txdone = tx;
+	mutex_init(&smbox->chan_lock);
 
 	smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan);
 	if (IS_ERR(smbox->chan)) {
@@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo,
 	struct scmi_mailbox *smbox = cinfo->transport_info;
 	int ret;
 
+	/*
+	 * The mailbox layer has its own queue. However, the mailbox queue confuses
+	 * the per-message SCMI timeouts, since the clock starts when the message is
+	 * submitted into the mailbox queue. So when multiple messages are queued up,
+	 * the clock starts on all messages instead of only the one inflight.
+	 */
+	mutex_lock(&smbox->chan_lock);
+
 	ret = mbox_send_message(smbox->chan, xfer);
 
 	/* mbox_send_message returns non-negative value on success, so reset */
 	if (ret > 0)
 		ret = 0;
+	else
+		mutex_unlock(&smbox->chan_lock);
 
 	return ret;
 }
@@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
 {
 	struct scmi_mailbox *smbox = cinfo->transport_info;
 
-	/*
-	 * NOTE: we might prefer not to need the mailbox ticker to manage the
-	 * transfer queueing since the protocol layer queues things by itself.
-	 * Unfortunately, we have to kick the mailbox framework after we have
-	 * received our message.
-	 */
 	mbox_client_txdone(smbox->chan, ret);
+
+	/* Release channel */
+	mutex_unlock(&smbox->chan_lock);
 }
 
 static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
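Note the locking asymmetry in the patch: mailbox_send_message() releases
chan_lock only on the mbox_send_message() failure path. On success the mutex
stays held across the whole transaction and is released in
mailbox_mark_txdone(), i.e. only after the SCMI core has read the reply back
from the shared memory. That is what serializes senders at the SCMI layer, so
each per-message timeout clock only starts once its message is actually on
the wire.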
The mailbox layer has its own queue. However, this confuses the per-message
timeouts, since the clock starts ticking the moment the messages get queued
up. So all messages in the queue have their timeout clocks ticking, instead
of only the message inflight. To fix this, let's move the queue back into the
SCMI layer.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
---
 .../firmware/arm_scmi/transports/mailbox.c | 21 +++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)