diff mbox series

[5/5] cxl/mbox: Retry sending mbox command upon RETRY return_code

Message ID 20220317234049.69323-6-dave@stgolabs.net
State Superseded
Headers show
Series cxl/mbox: Robustify handling of mbox_cmd.return_code | expand

Commit Message

Davidlohr Bueso March 17, 2022, 11:40 p.m. UTC
As mentioned in CXL 2.0 specs: the hardware can set a Retry
Required if the the command was not completed due to a
temporary error. An optional single retry may resolve the
issue. This is the only retry case as in general they are
not recommended for commands that return an error.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 drivers/cxl/core/mbox.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Comments

Adam Manzanares March 22, 2022, 9:10 p.m. UTC | #1
On Thu, Mar 17, 2022 at 04:40:49PM -0700, Davidlohr Bueso wrote:
> As mentioned in CXL 2.0 specs: the hardware can set a Retry
> Required if the the command was not completed due to a
> temporary error. An optional single retry may resolve the
> issue. This is the only retry case as in general they are
> not recommended for commands that return an error.
> 
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/core/mbox.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index fa9e7043f158..567987052b68 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -160,20 +160,25 @@ int cxl_mbox_send_cmd(struct cxl_dev_state *cxlds, u16 opcode, void *in,

Can you update the documentation cxl_mbox_send_cmd that the function now 
retries commands if the RC indicates it should be done?

>  		.size_out = out_size,
>  		.payload_out = out,
>  	};
> -	int rc;
> +	int rc, retry = 1;
>  
>  	if (out_size > cxlds->payload_size)
>  		return -E2BIG;
>  
> -	/* acquire and releases the mbox_mutex */
> -	rc = cxlds->mbox_send(cxlds, &mbox_cmd);
> -	if (rc)
> -		return rc;
> +	do {
> +		/* acquire and releases the mbox_mutex */
> +		rc = cxlds->mbox_send(cxlds, &mbox_cmd);
> +		if (rc)
> +			return rc;
>  
> -	if (mbox_cmd.return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> -		int err = cxl_mbox_cmd_rc2errno(&mbox_cmd);
> -		return -err;
> -	}
> +		if (mbox_cmd.return_code == CXL_MBOX_CMD_RC_SUCCESS)
> +			break;
> +
> +		if (mbox_cmd.return_code != CXL_MBOX_CMD_RC_RETRY || !retry) {
> +			int err = cxl_mbox_cmd_rc2errno(&mbox_cmd);
> +			return -err;
> +		}
> +	} while (retry--);
>  
>  	/*
>  	 * Variable sized commands can't be validated and so it's up to the
> 
> -- 
> 2.26.2
>

Other than the doc change for the cxl_mbox_send_cmd function.
LGTM

Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
Dan Williams March 30, 2022, 12:42 a.m. UTC | #2
On Thu, Mar 17, 2022 at 4:41 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> As mentioned in CXL 2.0 specs: the hardware can set a Retry
> Required if the the command was not completed due to a
> temporary error. An optional single retry may resolve the
> issue. This is the only retry case as in general they are
> not recommended for commands that return an error.

So I disagree that the driver should be deploying this behaviour by
default. This was also tried for the ACPI NVDIMM commands and it was
helpful for Linux to not play along and get the firmware implementers
to get their act together and detail when a retry is appropriate. So,
this policy is up to the submitter. If that agent has information that
the kernel does not have about a retry being effective, it is free to
attempt the retry after the failure. Otherwise, the spec is overly
hopeful in expecting an implementation to attempt a speculative /
optional retry. Hopefully, a default Linux policy to fail commands
that end in a retry encourages fixes to devices / firmwares that want
to use optional retry as a crutch... or it encourages the discipline
to come back to the specification working group with explicit
per-command updates to define the scenarios when a retry is mandatory
and not optional.
diff mbox series

Patch

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index fa9e7043f158..567987052b68 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -160,20 +160,25 @@  int cxl_mbox_send_cmd(struct cxl_dev_state *cxlds, u16 opcode, void *in,
 		.size_out = out_size,
 		.payload_out = out,
 	};
-	int rc;
+	int rc, retry = 1;
 
 	if (out_size > cxlds->payload_size)
 		return -E2BIG;
 
-	/* acquire and releases the mbox_mutex */
-	rc = cxlds->mbox_send(cxlds, &mbox_cmd);
-	if (rc)
-		return rc;
+	do {
+		/* acquire and releases the mbox_mutex */
+		rc = cxlds->mbox_send(cxlds, &mbox_cmd);
+		if (rc)
+			return rc;
 
-	if (mbox_cmd.return_code != CXL_MBOX_CMD_RC_SUCCESS) {
-		int err = cxl_mbox_cmd_rc2errno(&mbox_cmd);
-		return -err;
-	}
+		if (mbox_cmd.return_code == CXL_MBOX_CMD_RC_SUCCESS)
+			break;
+
+		if (mbox_cmd.return_code != CXL_MBOX_CMD_RC_RETRY || !retry) {
+			int err = cxl_mbox_cmd_rc2errno(&mbox_cmd);
+			return -err;
+		}
+	} while (retry--);
 
 	/*
 	 * Variable sized commands can't be validated and so it's up to the