Message ID | 20241022-pmic-glink-ecancelled-v1-1-9e26fc74e0a3@oss.qualcomm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | soc: qcom: pmic_glink: Resolve failures to bring up pmic_glink | expand |
On Tue, Oct 22, 2024 at 04:17:11AM +0000, Bjorn Andersson wrote: > The initial implementation of request intent response handling dealt > with two outcomes; granted allocations, and all other cases being > considered -ECANCELLED (likely from "cancelling the operation as the > remote is going down"). For the benefit of casual reviewers and contributors, could you add introductory comment about what "intents" are? > But on some channels intent allocation is not supported, instead the > remote will pre-allocate and announce a fixed number of intents for the > sender to use. If for such channels an rpmsg_send() is being invoked > before any channels have been announced, an intent request will be > issued and as this comes back rejected the call is failed with > -ECANCELLED. It's actually the one L -ECANCELED s/is failed/fails/ ? > Given that this is reported in the same way as the remote being shut > down, there's no way for the client to differentiate the two cases. > > In line with the original GLINK design, change the return value to > -EAGAIN for the case where the remote rejects an intent allocation > request. > > It's tempting to handle this case in the GLINK core, as we expect > intents to show up in this case. But there's no way to distinguish > between this case and a rejection for a too big allocation, nor is it > possible to predict if a currently used (and seeminly suitable) intent seemingly > will be returned for reuse or not. As such, returning the error to the > client and allow it to react seems to be the only sensible solution. s/allow/allowing/ ? > In addition to this, commit 'c05dfce0b89e ("rpmsg: glink: Wait for > intent, not just request ack")' changed the logic such that the code > always wait for an intent request response and an intent. This works out > in most cases, but in the event that a intent request is rejected and no an intent > further intent arrives (e.g. client asks for a too big intent), the code > will stall for 10 seconds and then return -ETIMEDOUT; instead of a more > suitable error. > > This change also resulted in intent requests racing with the shutdown of > the remote would be exposed to this same problem, unless some intent > happens to arrive. A patch for this was developed and posted by Sarannya > S [1], and has been incorporated here. > > To summarize, the intent request can end in 4 ways: > - Timeout, no response arrived => return -ETIMEDOUT > - Abort TX, the edge is going away => return -ECANCELLED > - Intent request was rejected => return -EAGAIN > - Intent request was accepted, and an intent arrived => return 0 > > This patch was developed with input from Sarannya S, Deepak Kumar Singh, > and Chris Lew. > > [1] https://lore.kernel.org/all/20240925072328.1163183-1-quic_deesin@quicinc.com/ > > Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack") > Cc: stable@vger.kernel.org > Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Nit picks aside, this was all nice and clear. Tested-by: Johan Hovold <johan+linaro@kernel.org>
diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c index 0b2f290069080638581a13b3a580054d31e176c2..d3af1dfa3c7d71b95dda911dfc7ad844679359d6 100644 --- a/drivers/rpmsg/qcom_glink_native.c +++ b/drivers/rpmsg/qcom_glink_native.c @@ -1440,14 +1440,18 @@ static int qcom_glink_request_intent(struct qcom_glink *glink, goto unlock; ret = wait_event_timeout(channel->intent_req_wq, - READ_ONCE(channel->intent_req_result) >= 0 && - READ_ONCE(channel->intent_received), + READ_ONCE(channel->intent_req_result) == 0 || + (READ_ONCE(channel->intent_req_result) > 0 && + READ_ONCE(channel->intent_received)) || + glink->abort_tx, 10 * HZ); if (!ret) { dev_err(glink->dev, "intent request timed out\n"); ret = -ETIMEDOUT; + } else if (glink->abort_tx) { + ret = -ECANCELED; } else { - ret = READ_ONCE(channel->intent_req_result) ? 0 : -ECANCELED; + ret = READ_ONCE(channel->intent_req_result) ? 0 : -EAGAIN; } unlock:
The initial implementation of request intent response handling dealt with two outcomes; granted allocations, and all other cases being considered -ECANCELLED (likely from "cancelling the operation as the remote is going down"). But on some channels intent allocation is not supported, instead the remote will pre-allocate and announce a fixed number of intents for the sender to use. If for such channels an rpmsg_send() is being invoked before any channels have been announced, an intent request will be issued and as this comes back rejected the call is failed with -ECANCELLED. Given that this is reported in the same way as the remote being shut down, there's no way for the client to differentiate the two cases. In line with the original GLINK design, change the return value to -EAGAIN for the case where the remote rejects an intent allocation request. It's tempting to handle this case in the GLINK core, as we expect intents to show up in this case. But there's no way to distinguish between this case and a rejection for a too big allocation, nor is it possible to predict if a currently used (and seeminly suitable) intent will be returned for reuse or not. As such, returning the error to the client and allow it to react seems to be the only sensible solution. In addition to this, commit 'c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")' changed the logic such that the code always wait for an intent request response and an intent. This works out in most cases, but in the event that a intent request is rejected and no further intent arrives (e.g. client asks for a too big intent), the code will stall for 10 seconds and then return -ETIMEDOUT; instead of a more suitable error. This change also resulted in intent requests racing with the shutdown of the remote would be exposed to this same problem, unless some intent happens to arrive. A patch for this was developed and posted by Sarannya S [1], and has been incorporated here. To summarize, the intent request can end in 4 ways: - Timeout, no response arrived => return -ETIMEDOUT - Abort TX, the edge is going away => return -ECANCELLED - Intent request was rejected => return -EAGAIN - Intent request was accepted, and an intent arrived => return 0 This patch was developed with input from Sarannya S, Deepak Kumar Singh, and Chris Lew. [1] https://lore.kernel.org/all/20240925072328.1163183-1-quic_deesin@quicinc.com/ Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> --- drivers/rpmsg/qcom_glink_native.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)