diff mbox series

[v2,1/2] rpmsg: glink: Handle rejected intent request better

Message ID 20241023-pmic-glink-ecancelled-v2-1-ebc268129407@oss.qualcomm.com (mailing list archive)
State Accepted
Commit a387e73fedd6307c0e194deaa53c42b153ff0bd6
Headers show
Series soc: qcom: pmic_glink: Resolve failures to bring up pmic_glink | expand

Commit Message

Bjorn Andersson Oct. 23, 2024, 5:24 p.m. UTC
GLINK operates using pre-allocated buffers, aka intents, where incoming
messages are aggregated before being passed up the stack. In the case
that no suitable intents have been announced by the receiver, the sender
can request an intent to be allocated.

The initial implementation of the response to such request dealt
with two outcomes; granted allocations, and all other cases being
considered -ECANCELLED (likely from "cancelling the operation as the
remote is going down").

But on some channels intent allocation is not supported, instead the
remote will pre-allocate and announce a fixed number of intents for the
sender to use. If for such channels an rpmsg_send() is being invoked
before any channels have been announced, an intent request will be
issued and as this comes back rejected the call fails with -ECANCELED.

Given that this is reported in the same way as the remote being shut
down, there's no way for the client to differentiate the two cases.

In line with the original GLINK design, change the return value to
-EAGAIN for the case where the remote rejects an intent allocation
request.

It's tempting to handle this case in the GLINK core, as we expect
intents to show up in this case. But there's no way to distinguish
between this case and a rejection for a too big allocation, nor is it
possible to predict if a currently used (and seemingly suitable) intent
will be returned for reuse or not. As such, returning the error to the
client and allow it to react seems to be the only sensible solution.

In addition to this, commit 'c05dfce0b89e ("rpmsg: glink: Wait for
intent, not just request ack")' changed the logic such that the code
always wait for an intent request response and an intent. This works out
in most cases, but in the event that an intent request is rejected and no
further intent arrives (e.g. client asks for a too big intent), the code
will stall for 10 seconds and then return -ETIMEDOUT; instead of a more
suitable error.

This change also resulted in intent requests racing with the shutdown of
the remote would be exposed to this same problem, unless some intent
happens to arrive. A patch for this was developed and posted by Sarannya
S [1], and has been incorporated here.

To summarize, the intent request can end in 4 ways:
- Timeout, no response arrived => return -ETIMEDOUT
- Abort TX, the edge is going away => return -ECANCELLED
- Intent request was rejected => return -EAGAIN
- Intent request was accepted, and an intent arrived => return 0

This patch was developed with input from Sarannya S, Deepak Kumar Singh,
and Chris Lew.

[1] https://lore.kernel.org/all/20240925072328.1163183-1-quic_deesin@quicinc.com/

Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")
Cc: stable@vger.kernel.org
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
---
 drivers/rpmsg/qcom_glink_native.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Chris Lew Oct. 23, 2024, 10:21 p.m. UTC | #1
On 10/23/2024 10:24 AM, Bjorn Andersson wrote:
> GLINK operates using pre-allocated buffers, aka intents, where incoming
> messages are aggregated before being passed up the stack. In the case
> that no suitable intents have been announced by the receiver, the sender
> can request an intent to be allocated.
> 
> The initial implementation of the response to such request dealt
> with two outcomes; granted allocations, and all other cases being
> considered -ECANCELLED (likely from "cancelling the operation as the
> remote is going down").
> 
> But on some channels intent allocation is not supported, instead the
> remote will pre-allocate and announce a fixed number of intents for the
> sender to use. If for such channels an rpmsg_send() is being invoked
> before any channels have been announced, an intent request will be
> issued and as this comes back rejected the call fails with -ECANCELED.
> 
> Given that this is reported in the same way as the remote being shut
> down, there's no way for the client to differentiate the two cases.
> 
> In line with the original GLINK design, change the return value to
> -EAGAIN for the case where the remote rejects an intent allocation
> request.
> 
> It's tempting to handle this case in the GLINK core, as we expect
> intents to show up in this case. But there's no way to distinguish
> between this case and a rejection for a too big allocation, nor is it
> possible to predict if a currently used (and seemingly suitable) intent
> will be returned for reuse or not. As such, returning the error to the
> client and allow it to react seems to be the only sensible solution.
> 
> In addition to this, commit 'c05dfce0b89e ("rpmsg: glink: Wait for
> intent, not just request ack")' changed the logic such that the code
> always wait for an intent request response and an intent. This works out
> in most cases, but in the event that an intent request is rejected and no
> further intent arrives (e.g. client asks for a too big intent), the code
> will stall for 10 seconds and then return -ETIMEDOUT; instead of a more
> suitable error.
> 
> This change also resulted in intent requests racing with the shutdown of
> the remote would be exposed to this same problem, unless some intent
> happens to arrive. A patch for this was developed and posted by Sarannya
> S [1], and has been incorporated here.
> 
> To summarize, the intent request can end in 4 ways:
> - Timeout, no response arrived => return -ETIMEDOUT
> - Abort TX, the edge is going away => return -ECANCELLED
> - Intent request was rejected => return -EAGAIN
> - Intent request was accepted, and an intent arrived => return 0
> 
> This patch was developed with input from Sarannya S, Deepak Kumar Singh,
> and Chris Lew.
> 
> [1] https://lore.kernel.org/all/20240925072328.1163183-1-quic_deesin@quicinc.com/
> 
> Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")
> Cc: stable@vger.kernel.org
> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> ---

Reviewed-by: Chris Lew <quic_clew@quicinc.com>
diff mbox series

Patch

diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index 0b2f290069080638581a13b3a580054d31e176c2..d3af1dfa3c7d71b95dda911dfc7ad844679359d6 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -1440,14 +1440,18 @@  static int qcom_glink_request_intent(struct qcom_glink *glink,
 		goto unlock;
 
 	ret = wait_event_timeout(channel->intent_req_wq,
-				 READ_ONCE(channel->intent_req_result) >= 0 &&
-				 READ_ONCE(channel->intent_received),
+				 READ_ONCE(channel->intent_req_result) == 0 ||
+				 (READ_ONCE(channel->intent_req_result) > 0 &&
+				  READ_ONCE(channel->intent_received)) ||
+				 glink->abort_tx,
 				 10 * HZ);
 	if (!ret) {
 		dev_err(glink->dev, "intent request timed out\n");
 		ret = -ETIMEDOUT;
+	} else if (glink->abort_tx) {
+		ret = -ECANCELED;
 	} else {
-		ret = READ_ONCE(channel->intent_req_result) ? 0 : -ECANCELED;
+		ret = READ_ONCE(channel->intent_req_result) ? 0 : -EAGAIN;
 	}
 
 unlock: