Message ID | 20241009192637.1090238-1-justin.chen@broadcom.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] firmware: arm_scmi: Queue in scmi layer for mailbox implementation | expand |
On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote: > send_message() does not block in the MBOX implementation. This is > because the mailbox layer has its own queue. However, this confuses > the per xfer timeouts as they all start their timeout ticks in > parallel. > > Consider a case where the xfer timeout is 30ms and a SCMI transaction > takes 25ms. > > 0ms: Message #0 is queued in mailbox layer and sent out, then sits > at scmi_wait_for_message_response() with a timeout of 30ms > 1ms: Message #1 is queued in mailbox layer but not sent out yet. > Since send_message() doesn't block, it also sits at > scmi_wait_for_message_response() with a timeout of 30ms > ... > 25ms: Message #0 is completed, txdone is called and Message #1 is > sent out > 31ms: Message #1 times out since the count started at 1ms. Even > though it has only been inflight for 6ms. > > Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver") > Signed-off-by: Justin Chen <justin.chen@broadcom.com> > --- > > Changes in v2: Hi Justin, thanks. A few nitpicks and one remark down below. > > - Added Fixes tag > - Improved commit message to better capture the issue > > .../firmware/arm_scmi/transports/mailbox.c | 21 +++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c > index 1a754dee24f7..30bc2865582f 100644 > --- a/drivers/firmware/arm_scmi/transports/mailbox.c > +++ b/drivers/firmware/arm_scmi/transports/mailbox.c > @@ -33,6 +33,7 @@ struct scmi_mailbox { > struct mbox_chan *chan_platform_receiver; > struct scmi_chan_info *cinfo; > struct scmi_shared_mem __iomem *shmem; > + struct mutex chan_lock; Missing Doxygen comment.... arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox > }; > > #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl) > @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, > cl->rx_callback = rx_callback; > cl->tx_block = false; > cl->knows_txdone = tx; > + mutex_init(&smbox->chan_lock); This could be move at the end of this function after the channels are requested and it is no more possible to fail and bail out....messages wont flow and lock wont be used anyway until this chan_setup completes... ...BUT I have NOT string opinion about this....you can leave it here too...up to you > > smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan); > if (IS_ERR(smbox->chan)) { > @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo, > struct scmi_mailbox *smbox = cinfo->transport_info; > int ret; > > + /* > + * The mailbox layer has it's own queue. However the mailbox queue confuses its own queue > + * the per message SCMI timeouts since the clock starts when the message is > + * submitted into the mailbox queue. So when multiple messages are queued up > + * the clock starts on all messages instead of only the one inflight. > + */ > + mutex_lock(&smbox->chan_lock); > + > ret = mbox_send_message(smbox->chan, xfer); > > /* mbox_send_message returns non-negative value on success, so reset */ > if (ret > 0) > ret = 0; > + else > + mutex_unlock(&smbox->chan_lock); I think this should be else if (ret < 0) mutex_unlock(&smbox->chan_lock); ...since looking at mbox_send_message() and its implementation it returns NON-Negative integers on Success...so 0 from mbox_send_mmessage() also means SUCCESS and we should not release the mutex (I think the 'ret' returned here is the idx from add_to_rbuf...so it will become zero peridiocally on normal successfull operation) > > return ret; > } > @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret, > { > struct scmi_mailbox *smbox = cinfo->transport_info; > > - /* > - * NOTE: we might prefer not to need the mailbox ticker to manage the > - * transfer queueing since the protocol layer queues things by itself. > - * Unfortunately, we have to kick the mailbox framework after we have > - * received our message. > - */ > mbox_client_txdone(smbox->chan, ret); > + > + /* Release channel */ > + mutex_unlock(&smbox->chan_lock); > } > > static void mailbox_fetch_response(struct scmi_chan_info *cinfo, > -- > 2.34.1 > I gave it a go on a couple of JUNO, without any issues. Other than the above, LGTM. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Thanks, Cristian
On 10/11/24 06:43, Cristian Marussi wrote: > On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote: >> send_message() does not block in the MBOX implementation. This is >> because the mailbox layer has its own queue. However, this confuses >> the per xfer timeouts as they all start their timeout ticks in >> parallel. >> >> Consider a case where the xfer timeout is 30ms and a SCMI transaction >> takes 25ms. >> >> 0ms: Message #0 is queued in mailbox layer and sent out, then sits >> at scmi_wait_for_message_response() with a timeout of 30ms >> 1ms: Message #1 is queued in mailbox layer but not sent out yet. >> Since send_message() doesn't block, it also sits at >> scmi_wait_for_message_response() with a timeout of 30ms >> ... >> 25ms: Message #0 is completed, txdone is called and Message #1 is >> sent out >> 31ms: Message #1 times out since the count started at 1ms. Even >> though it has only been inflight for 6ms. >> >> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver") >> Signed-off-by: Justin Chen <justin.chen@broadcom.com> >> --- >> >> Changes in v2: > > Hi Justin, > > thanks. > > A few nitpicks and one remark down below. Since there will likely be a v3, the Fixes tag should IMHO go way back to when this problem has been in existence, maybe: Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type")
On 10/11/24 6:43 AM, Cristian Marussi wrote: > On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote: >> send_message() does not block in the MBOX implementation. This is >> because the mailbox layer has its own queue. However, this confuses >> the per xfer timeouts as they all start their timeout ticks in >> parallel. >> >> Consider a case where the xfer timeout is 30ms and a SCMI transaction >> takes 25ms. >> >> 0ms: Message #0 is queued in mailbox layer and sent out, then sits >> at scmi_wait_for_message_response() with a timeout of 30ms >> 1ms: Message #1 is queued in mailbox layer but not sent out yet. >> Since send_message() doesn't block, it also sits at >> scmi_wait_for_message_response() with a timeout of 30ms >> ... >> 25ms: Message #0 is completed, txdone is called and Message #1 is >> sent out >> 31ms: Message #1 times out since the count started at 1ms. Even >> though it has only been inflight for 6ms. >> >> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver") >> Signed-off-by: Justin Chen <justin.chen@broadcom.com> >> --- >> >> Changes in v2: > > Hi Justin, > > thanks. > > A few nitpicks and one remark down below. > >> >> - Added Fixes tag >> - Improved commit message to better capture the issue >> >> .../firmware/arm_scmi/transports/mailbox.c | 21 +++++++++++++------ >> 1 file changed, 15 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c >> index 1a754dee24f7..30bc2865582f 100644 >> --- a/drivers/firmware/arm_scmi/transports/mailbox.c >> +++ b/drivers/firmware/arm_scmi/transports/mailbox.c >> @@ -33,6 +33,7 @@ struct scmi_mailbox { >> struct mbox_chan *chan_platform_receiver; >> struct scmi_chan_info *cinfo; >> struct scmi_shared_mem __iomem *shmem; >> + struct mutex chan_lock; > > Missing Doxygen comment.... > > arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox > >> }; >> >> #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl) >> @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, >> cl->rx_callback = rx_callback; >> cl->tx_block = false; >> cl->knows_txdone = tx; >> + mutex_init(&smbox->chan_lock); > > This could be move at the end of this function after the channels are > requested and it is no more possible to fail and bail out....messages > wont flow and lock wont be used anyway until this chan_setup completes... > ...BUT I have NOT string opinion about this....you can leave it here > too...up to you >> >> smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan); >> if (IS_ERR(smbox->chan)) { >> @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo, >> struct scmi_mailbox *smbox = cinfo->transport_info; >> int ret; >> >> + /* >> + * The mailbox layer has it's own queue. However the mailbox queue confuses > its own queue > >> + * the per message SCMI timeouts since the clock starts when the message is >> + * submitted into the mailbox queue. So when multiple messages are queued up >> + * the clock starts on all messages instead of only the one inflight. >> + */ >> + mutex_lock(&smbox->chan_lock); >> + >> ret = mbox_send_message(smbox->chan, xfer); >> >> /* mbox_send_message returns non-negative value on success, so reset */ >> if (ret > 0) >> ret = 0; >> + else >> + mutex_unlock(&smbox->chan_lock); > > I think this should be > > else if (ret < 0) > mutex_unlock(&smbox->chan_lock); > > ...since looking at mbox_send_message() and its implementation it returns > NON-Negative integers on Success...so 0 from mbox_send_mmessage() also means > SUCCESS and we should not release the mutex (I think the 'ret' returned > here is the idx from add_to_rbuf...so it will become zero peridiocally > on normal successfull operation) > Yes, I see the implementation. Looks like it returns the position in the ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a warning. What about this? if (ret >= 0) ret = 0 else mutex_unlock(&smbox->chan_lock); A bit easier to read IMO. Thanks, Justin >> >> return ret; >> } >> @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret, >> { >> struct scmi_mailbox *smbox = cinfo->transport_info; >> >> - /* >> - * NOTE: we might prefer not to need the mailbox ticker to manage the >> - * transfer queueing since the protocol layer queues things by itself. >> - * Unfortunately, we have to kick the mailbox framework after we have >> - * received our message. >> - */ >> mbox_client_txdone(smbox->chan, ret); >> + >> + /* Release channel */ >> + mutex_unlock(&smbox->chan_lock); >> } >> >> static void mailbox_fetch_response(struct scmi_chan_info *cinfo, >> -- >> 2.34.1 >> > > I gave it a go on a couple of JUNO, without any issues. > > Other than the above, LGTM. > > Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> > Tested-by: Cristian Marussi <cristian.marussi@arm.com> > > Thanks, > Cristian >
On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote: > > > On 10/11/24 6:43 AM, Cristian Marussi wrote: > > On Wed, Oct 09, 2024 at 12:26:37PM -0700, Justin Chen wrote: > > > send_message() does not block in the MBOX implementation. This is > > > because the mailbox layer has its own queue. However, this confuses > > > the per xfer timeouts as they all start their timeout ticks in > > > parallel. > > > > > > Consider a case where the xfer timeout is 30ms and a SCMI transaction > > > takes 25ms. > > > > > > 0ms: Message #0 is queued in mailbox layer and sent out, then sits > > > at scmi_wait_for_message_response() with a timeout of 30ms > > > 1ms: Message #1 is queued in mailbox layer but not sent out yet. > > > Since send_message() doesn't block, it also sits at > > > scmi_wait_for_message_response() with a timeout of 30ms > > > ... > > > 25ms: Message #0 is completed, txdone is called and Message #1 is > > > sent out > > > 31ms: Message #1 times out since the count started at 1ms. Even > > > though it has only been inflight for 6ms. > > > > > > Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver") > > > Signed-off-by: Justin Chen <justin.chen@broadcom.com> > > > --- > > > > > > Changes in v2: > > > > Hi Justin, > > > > thanks. > > > > A few nitpicks and one remark down below. > > > > > > > > - Added Fixes tag > > > - Improved commit message to better capture the issue > > > > > > .../firmware/arm_scmi/transports/mailbox.c | 21 +++++++++++++------ > > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c > > > index 1a754dee24f7..30bc2865582f 100644 > > > --- a/drivers/firmware/arm_scmi/transports/mailbox.c > > > +++ b/drivers/firmware/arm_scmi/transports/mailbox.c > > > @@ -33,6 +33,7 @@ struct scmi_mailbox { > > > struct mbox_chan *chan_platform_receiver; > > > struct scmi_chan_info *cinfo; > > > struct scmi_shared_mem __iomem *shmem; > > > + struct mutex chan_lock; > > > > Missing Doxygen comment.... > > > > arm_scmi/transports/mailbox.c:39: warning: Function parameter or struct member 'chan_lock' not described in 'scmi_mailbox > > > > > }; > > > #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl) > > > @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, > > > cl->rx_callback = rx_callback; > > > cl->tx_block = false; > > > cl->knows_txdone = tx; > > > + mutex_init(&smbox->chan_lock); > > > > This could be move at the end of this function after the channels are > > requested and it is no more possible to fail and bail out....messages > > wont flow and lock wont be used anyway until this chan_setup completes... > > ...BUT I have NOT string opinion about this....you can leave it here > > too...up to you > > > smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan); > > > if (IS_ERR(smbox->chan)) { > > > @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo, > > > struct scmi_mailbox *smbox = cinfo->transport_info; > > > int ret; > > > + /* > > > + * The mailbox layer has it's own queue. However the mailbox queue confuses > > its own queue > > > > > + * the per message SCMI timeouts since the clock starts when the message is > > > + * submitted into the mailbox queue. So when multiple messages are queued up > > > + * the clock starts on all messages instead of only the one inflight. > > > + */ > > > + mutex_lock(&smbox->chan_lock); > > > + > > > ret = mbox_send_message(smbox->chan, xfer); > > > /* mbox_send_message returns non-negative value on success, so reset */ > > > if (ret > 0) > > > ret = 0; > > > + else > > > + mutex_unlock(&smbox->chan_lock); > > > > I think this should be > > > > else if (ret < 0) > > mutex_unlock(&smbox->chan_lock); > > > > ...since looking at mbox_send_message() and its implementation it returns > > NON-Negative integers on Success...so 0 from mbox_send_mmessage() also means > > SUCCESS and we should not release the mutex (I think the 'ret' returned > > here is the idx from add_to_rbuf...so it will become zero peridiocally > > on normal successfull operation) > > > > Yes, I see the implementation. Looks like it returns the position in the > ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a > warning. > > What about this? > if (ret >= 0) > ret = 0 > else > mutex_unlock(&smbox->chan_lock); > > A bit easier to read IMO. Oh yes much better definitely...or, maybe, even more simply to read: ... mutex_lock(&smbox->chan_lock); ret = mbox_send_message(smbox->chan, xfer); if (ret < 0) { mutex_unlock(&smbox->chan_lock); return ret; } return 0; } .... up to You...not sure what Sudeep prefers... Thanks, Cristian
On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote: > On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote: > > > > Yes, I see the implementation. Looks like it returns the position in the > > ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a > > warning. > > > > What about this? > > if (ret >= 0) > > ret = 0 > > else > > mutex_unlock(&smbox->chan_lock); > > > > A bit easier to read IMO. > > Oh yes much better definitely...or, maybe, even more simply to read: > > ... > > mutex_lock(&smbox->chan_lock); > ret = mbox_send_message(smbox->chan, xfer); > if (ret < 0) { > mutex_unlock(&smbox->chan_lock); > return ret; > } > > return 0; > } > > .... up to You...not sure what Sudeep prefers... > I like this better. Also I was hoping Justin would send v3 soonish, I want to send PR for fixes soon. So I have gone ahead and fixed all your comments myself. I had seen the doxygen one from builder already and had fixed it up last week when I added this to -next.
On 10/14/24 02:55, Sudeep Holla wrote: > On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote: >> On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote: >>> >>> Yes, I see the implementation. Looks like it returns the position in the >>> ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a >>> warning. >>> >>> What about this? >>> if (ret >= 0) >>> ret = 0 >>> else >>> mutex_unlock(&smbox->chan_lock); >>> >>> A bit easier to read IMO. >> >> Oh yes much better definitely...or, maybe, even more simply to read: >> >> ... >> >> mutex_lock(&smbox->chan_lock); >> ret = mbox_send_message(smbox->chan, xfer); >> if (ret < 0) { >> mutex_unlock(&smbox->chan_lock); >> return ret; >> } >> >> return 0; >> } >> >> .... up to You...not sure what Sudeep prefers... >> > > I like this better. Also I was hoping Justin would send v3 soonish, I want > to send PR for fixes soon. So I have gone ahead and fixed all your comments > myself. I had seen the doxygen one from builder already and had fixed it up > last week when I added this to -next. > Here's his v3: https://lore.kernel.org/all/20241014160717.1678953-1-justin.chen@broadcom.com/ we are on the other side of the pond ;)
On Mon, Oct 14, 2024 at 10:21:28AM -0700, Florian Fainelli wrote: > On 10/14/24 02:55, Sudeep Holla wrote: > > On Sun, Oct 13, 2024 at 10:26:49AM +0100, Cristian Marussi wrote: > > > On Fri, Oct 11, 2024 at 12:15:07PM -0700, Justin Chen wrote: > > > > > > > > Yes, I see the implementation. Looks like it returns the position in the > > > > ring buffer. I also confirmed with CONFIG_DEBUG_MUTEXES which triggers a > > > > warning. > > > > > > > > What about this? > > > > if (ret >= 0) > > > > ret = 0 > > > > else > > > > mutex_unlock(&smbox->chan_lock); > > > > > > > > A bit easier to read IMO. > > > > > > Oh yes much better definitely...or, maybe, even more simply to read: > > > > > > ... > > > > > > mutex_lock(&smbox->chan_lock); > > > ret = mbox_send_message(smbox->chan, xfer); > > > if (ret < 0) { > > > mutex_unlock(&smbox->chan_lock); > > > return ret; > > > } > > > > > > return 0; > > > } > > > > > > .... up to You...not sure what Sudeep prefers... > > > > > > > I like this better. Also I was hoping Justin would send v3 soonish, I want > > to send PR for fixes soon. So I have gone ahead and fixed all your comments > > myself. I had seen the doxygen one from builder already and had fixed it up > > last week when I added this to -next. > > > > Here's his v3: > > https://lore.kernel.org/all/20241014160717.1678953-1-justin.chen@broadcom.com/ > > we are on the other side of the pond ;) I do understand
diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c index 1a754dee24f7..30bc2865582f 100644 --- a/drivers/firmware/arm_scmi/transports/mailbox.c +++ b/drivers/firmware/arm_scmi/transports/mailbox.c @@ -33,6 +33,7 @@ struct scmi_mailbox { struct mbox_chan *chan_platform_receiver; struct scmi_chan_info *cinfo; struct scmi_shared_mem __iomem *shmem; + struct mutex chan_lock; }; #define client_to_scmi_mailbox(c) container_of(c, struct scmi_mailbox, cl) @@ -205,6 +206,7 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, cl->rx_callback = rx_callback; cl->tx_block = false; cl->knows_txdone = tx; + mutex_init(&smbox->chan_lock); smbox->chan = mbox_request_channel(cl, tx ? 0 : p2a_chan); if (IS_ERR(smbox->chan)) { @@ -267,11 +269,21 @@ static int mailbox_send_message(struct scmi_chan_info *cinfo, struct scmi_mailbox *smbox = cinfo->transport_info; int ret; + /* + * The mailbox layer has it's own queue. However the mailbox queue confuses + * the per message SCMI timeouts since the clock starts when the message is + * submitted into the mailbox queue. So when multiple messages are queued up + * the clock starts on all messages instead of only the one inflight. + */ + mutex_lock(&smbox->chan_lock); + ret = mbox_send_message(smbox->chan, xfer); /* mbox_send_message returns non-negative value on success, so reset */ if (ret > 0) ret = 0; + else + mutex_unlock(&smbox->chan_lock); return ret; } @@ -281,13 +293,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret, { struct scmi_mailbox *smbox = cinfo->transport_info; - /* - * NOTE: we might prefer not to need the mailbox ticker to manage the - * transfer queueing since the protocol layer queues things by itself. - * Unfortunately, we have to kick the mailbox framework after we have - * received our message. - */ mbox_client_txdone(smbox->chan, ret); + + /* Release channel */ + mutex_unlock(&smbox->chan_lock); } static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
send_message() does not block in the MBOX implementation. This is because the mailbox layer has its own queue. However, this confuses the per xfer timeouts as they all start their timeout ticks in parallel. Consider a case where the xfer timeout is 30ms and a SCMI transaction takes 25ms. 0ms: Message #0 is queued in mailbox layer and sent out, then sits at scmi_wait_for_message_response() with a timeout of 30ms 1ms: Message #1 is queued in mailbox layer but not sent out yet. Since send_message() doesn't block, it also sits at scmi_wait_for_message_response() with a timeout of 30ms ... 25ms: Message #0 is completed, txdone is called and Message #1 is sent out 31ms: Message #1 times out since the count started at 1ms. Even though it has only been inflight for 6ms. Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver") Signed-off-by: Justin Chen <justin.chen@broadcom.com> --- Changes in v2: - Added Fixes tag - Improved commit message to better capture the issue .../firmware/arm_scmi/transports/mailbox.c | 21 +++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)