diff mbox series

[net,v2] net/smc: avoid data corruption caused by decline

Message ID 1700197181-83136-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] net/smc: avoid data corruption caused by decline | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5482 this patch: 5482
netdev/cc_maintainers success CCed 11 of 12 maintainers
netdev/build_clang success Errors and warnings before: 1386 this patch: 1386
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 5820 this patch: 5820
netdev/checkpatch warning WARNING: line length of 85 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

D. Wythe Nov. 17, 2023, 4:59 a.m. UTC
From: "D. Wythe" <alibuda@linux.alibaba.com>

We found a data corruption issue during testing of SMC-R on Redis
applications.

The benchmark has a low probability of reporting a strange error as
shown below.

"Error: Protocol error, got "\xe2" as reply type byte"

Finally, we found that the retrieved error data was as follows:

0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2

It is quite obvious that this is a SMC DECLINE message, which means that
the applications received SMC protocol message.
We found that this was caused by the following situations:

client			server
	   proposal
	------------->
	   accept
	<-------------
	   confirm
	------------->
wait confirm

	 failed llc confirm
	    x------
(after 2s)timeout
			wait rsp

wait decline

(after 1s) timeout
			(after 2s) timeout
	    decline
	-------------->
	    decline
	<--------------

As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.

This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).

This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.

Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 include/net/netns/smc.h |  2 ++
 net/smc/af_smc.c        |  3 ++-
 net/smc/smc_sysctl.c    | 12 ++++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Wen Gu Nov. 17, 2023, 6:47 a.m. UTC | #1
On 2023/11/17 12:59, D. Wythe wrote:

> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> We found a data corruption issue during testing of SMC-R on Redis
> applications.
> 
> The benchmark has a low probability of reporting a strange error as
> shown below.
> 
> "Error: Protocol error, got "\xe2" as reply type byte"
> 
> Finally, we found that the retrieved error data was as follows:
> 
> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
> 
> It is quite obvious that this is a SMC DECLINE message, which means that
> the applications received SMC protocol message.
> We found that this was caused by the following situations:
> 
> client			server
> 	   proposal
> 	------------->
> 	   accept
> 	<-------------
> 	   confirm
> 	------------->
> wait confirm
> 
> 	 failed llc confirm
> 	    x------
> (after 2s)timeout
> 			wait rsp
> 
> wait decline
> 
> (after 1s) timeout
> 			(after 2s) timeout
> 	    decline
> 	-------------->
> 	    decline
> 	<--------------
> 
> As a result, a decline message was sent in the implementation, and this
> message was read from TCP by the already-fallback connection.
> 
> This patch double the client timeout as 2x of the server value,

Is the client's timeout doubled?

 From the code below, it is server's timeout that has been doubled.

> With this simple change, the Decline messages should never cross or
> collide (during Confirm link timeout).
> 
> This issue requires an immediate solution, since the protocol updates
> involve a more long-term solution.
> 
> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   include/net/netns/smc.h |  2 ++
>   net/smc/af_smc.c        |  3 ++-
>   net/smc/smc_sysctl.c    | 12 ++++++++++++
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
> index 582212a..5198896 100644
> --- a/include/net/netns/smc.h
> +++ b/include/net/netns/smc.h
> @@ -22,5 +22,7 @@ struct netns_smc {
>   	int				sysctl_smcr_testlink_time;
>   	int				sysctl_wmem;
>   	int				sysctl_rmem;
> +	/* server's Confirm Link timeout in seconds */
> +	int				sysctl_smcr_srv_confirm_link_timeout;
>   };
>   #endif
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index abd2667..b86ad30 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -1870,7 +1870,8 @@ static int smcr_serv_conf_first_link(struct smc_sock *smc)
>   		return SMC_CLC_DECL_TIMEOUT_CL;
>   
>   	/* receive CONFIRM LINK response from client over the RoCE fabric */
> -	qentry = smc_llc_wait(link->lgr, link, SMC_LLC_WAIT_TIME,
> +	qentry = smc_llc_wait(link->lgr, link,
> +			      sock_net(&smc->sk)->smc.sysctl_smcr_srv_confirm_link_timeout,
>   			      SMC_LLC_CONFIRM_LINK);
>   	if (!qentry) {
>   		struct smc_clc_msg_decline dclc;
> diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
> index 5cbc18c..919f3f7 100644
> --- a/net/smc/smc_sysctl.c
> +++ b/net/smc/smc_sysctl.c
> @@ -51,6 +51,13 @@
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
>   	{
> +		.procname	= "smcr_srv_confirm_link_timeout",
> +		.data		= &init_net.smc.sysctl_smcr_srv_confirm_link_timeout,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_jiffies,
> +	},
> +	{
>   		.procname	= "wmem",
>   		.data		= &init_net.smc.sysctl_wmem,
>   		.maxlen		= sizeof(int),
> @@ -95,6 +102,11 @@ int __net_init smc_sysctl_net_init(struct net *net)
>   	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
>   	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
>   	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
> +	/* Increasing the server's timeout by twice as much as the client's
> +	 * timeout by default can temporarily avoid decline messages of
> +	 * both side been crossed or collided.

'both sides' or maybe better for

'..avoid decline messages of both sides crossing or colliding.'



Thanks,
Wen Gu

> +	 */
> +	net->smc.sysctl_smcr_srv_confirm_link_timeout = 2 * SMC_LLC_WAIT_TIME;
>   	WRITE_ONCE(net->smc.sysctl_wmem, net_smc_wmem_init);
>   	WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init);
>
D. Wythe Nov. 17, 2023, 6:53 a.m. UTC | #2
On 11/17/23 2:47 PM, Wen Gu wrote:
>
>
> On 2023/11/17 12:59, D. Wythe wrote:
>
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> We found a data corruption issue during testing of SMC-R on Redis
>> applications.
>>
>> The benchmark has a low probability of reporting a strange error as
>> shown below.
>>
>> "Error: Protocol error, got "\xe2" as reply type byte"
>>
>> Finally, we found that the retrieved error data was as follows:
>>
>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>
>> It is quite obvious that this is a SMC DECLINE message, which means that
>> the applications received SMC protocol message.
>> We found that this was caused by the following situations:
>>
>> client            server
>>        proposal
>>     ------------->
>>        accept
>>     <-------------
>>        confirm
>>     ------------->
>> wait confirm
>>
>>      failed llc confirm
>>         x------
>> (after 2s)timeout
>>             wait rsp
>>
>> wait decline
>>
>> (after 1s) timeout
>>             (after 2s) timeout
>>         decline
>>     -------------->
>>         decline
>>     <--------------
>>
>> As a result, a decline message was sent in the implementation, and this
>> message was read from TCP by the already-fallback connection.
>>
>> This patch double the client timeout as 2x of the server value,
>
> Is the client's timeout doubled?
>
> From the code below, it is server's timeout that has been doubled.
>

Forget to fix description, i'll fix that in next revision.

>> With this simple change, the Decline messages should never cross or
>> collide (during Confirm link timeout).
>>
>> This issue requires an immediate solution, since the protocol updates
>> involve a more long-term solution.
>>
>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC 
>> flow")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>>   include/net/netns/smc.h |  2 ++
>>   net/smc/af_smc.c        |  3 ++-
>>   net/smc/smc_sysctl.c    | 12 ++++++++++++
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
>> index 582212a..5198896 100644
>> --- a/include/net/netns/smc.h
>> +++ b/include/net/netns/smc.h
>> @@ -22,5 +22,7 @@ struct netns_smc {
>>       int                sysctl_smcr_testlink_time;
>>       int                sysctl_wmem;
>>       int                sysctl_rmem;
>> +    /* server's Confirm Link timeout in seconds */
>> +    int                sysctl_smcr_srv_confirm_link_timeout;
>>   };
>>   #endif
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index abd2667..b86ad30 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -1870,7 +1870,8 @@ static int smcr_serv_conf_first_link(struct 
>> smc_sock *smc)
>>           return SMC_CLC_DECL_TIMEOUT_CL;
>>         /* receive CONFIRM LINK response from client over the RoCE 
>> fabric */
>> -    qentry = smc_llc_wait(link->lgr, link, SMC_LLC_WAIT_TIME,
>> +    qentry = smc_llc_wait(link->lgr, link,
>> + sock_net(&smc->sk)->smc.sysctl_smcr_srv_confirm_link_timeout,
>>                     SMC_LLC_CONFIRM_LINK);
>>       if (!qentry) {
>>           struct smc_clc_msg_decline dclc;
>> diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
>> index 5cbc18c..919f3f7 100644
>> --- a/net/smc/smc_sysctl.c
>> +++ b/net/smc/smc_sysctl.c
>> @@ -51,6 +51,13 @@
>>           .proc_handler    = proc_dointvec_jiffies,
>>       },
>>       {
>> +        .procname    = "smcr_srv_confirm_link_timeout",
>> +        .data        = 
>> &init_net.smc.sysctl_smcr_srv_confirm_link_timeout,
>> +        .maxlen        = sizeof(int),
>> +        .mode        = 0644,
>> +        .proc_handler    = proc_dointvec_jiffies,
>> +    },
>> +    {
>>           .procname    = "wmem",
>>           .data        = &init_net.smc.sysctl_wmem,
>>           .maxlen        = sizeof(int),
>> @@ -95,6 +102,11 @@ int __net_init smc_sysctl_net_init(struct net *net)
>>       net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
>>       net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
>>       net->smc.sysctl_smcr_testlink_time = 
>> SMC_LLC_TESTLINK_DEFAULT_TIME;
>> +    /* Increasing the server's timeout by twice as much as the client's
>> +     * timeout by default can temporarily avoid decline messages of
>> +     * both side been crossed or collided.
>
> 'both sides' or maybe better for
>
> '..avoid decline messages of both sides crossing or colliding.'
>
>
Look nice. I'll adopt that.
>
> Thanks,
> Wen Gu
>
>> +     */
>> +    net->smc.sysctl_smcr_srv_confirm_link_timeout = 2 * 
>> SMC_LLC_WAIT_TIME;
>>       WRITE_ONCE(net->smc.sysctl_wmem, net_smc_wmem_init);
>>       WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init);
Wenjia Zhang Nov. 17, 2023, 12:35 p.m. UTC | #3
On 17.11.23 05:59, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> We found a data corruption issue during testing of SMC-R on Redis
> applications.
> 
> The benchmark has a low probability of reporting a strange error as
> shown below.
> 
> "Error: Protocol error, got "\xe2" as reply type byte"
> 
> Finally, we found that the retrieved error data was as follows:
> 
> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
> 
> It is quite obvious that this is a SMC DECLINE message, which means that
> the applications received SMC protocol message.
> We found that this was caused by the following situations:
> 
> client			server
> 	   proposal
> 	------------->
> 	   accept
> 	<-------------
> 	   confirm
> 	------------->
> wait confirm
> 
> 	 failed llc confirm
> 	    x------
> (after 2s)timeout
> 			wait rsp
> 
> wait decline
> 
> (after 1s) timeout
> 			(after 2s) timeout
> 	    decline
> 	-------------->
> 	    decline
> 	<--------------
> 
> As a result, a decline message was sent in the implementation, and this
> message was read from TCP by the already-fallback connection.
> 
> This patch double the client timeout as 2x of the server value,
> With this simple change, the Decline messages should never cross or
> collide (during Confirm link timeout).
> 
> This issue requires an immediate solution, since the protocol updates
> involve a more long-term solution.
> 

Hi D.Wythe,

I think you understood me wrong. I mean we don't need sysctl. I like the 
first version more, where you just need to add some comments in the code.

Thanks,
Wenjia
D. Wythe Nov. 17, 2023, 12:43 p.m. UTC | #4
On 11/17/23 8:35 PM, Wenjia Zhang wrote:
>
>
> On 17.11.23 05:59, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> We found a data corruption issue during testing of SMC-R on Redis
>> applications.
>>
>> The benchmark has a low probability of reporting a strange error as
>> shown below.
>>
>> "Error: Protocol error, got "\xe2" as reply type byte"
>>
>> Finally, we found that the retrieved error data was as follows:
>>
>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>
>> It is quite obvious that this is a SMC DECLINE message, which means that
>> the applications received SMC protocol message.
>> We found that this was caused by the following situations:
>>
>> client            server
>>        proposal
>>     ------------->
>>        accept
>>     <-------------
>>        confirm
>>     ------------->
>> wait confirm
>>
>>      failed llc confirm
>>         x------
>> (after 2s)timeout
>>             wait rsp
>>
>> wait decline
>>
>> (after 1s) timeout
>>             (after 2s) timeout
>>         decline
>>     -------------->
>>         decline
>>     <--------------
>>
>> As a result, a decline message was sent in the implementation, and this
>> message was read from TCP by the already-fallback connection.
>>
>> This patch double the client timeout as 2x of the server value,
>> With this simple change, the Decline messages should never cross or
>> collide (during Confirm link timeout).
>>
>> This issue requires an immediate solution, since the protocol updates
>> involve a more long-term solution.
>>
>
> Hi D.Wythe,
>
> I think you understood me wrong. I mean we don't need sysctl. I like 
> the first version more, where you just need to add some comments in 
> the code.
>


diff mbox series

Patch

diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index 582212a..5198896 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -22,5 +22,7 @@  struct netns_smc {
 	int				sysctl_smcr_testlink_time;
 	int				sysctl_wmem;
 	int				sysctl_rmem;
+	/* server's Confirm Link timeout in seconds */
+	int				sysctl_smcr_srv_confirm_link_timeout;
 };
 #endif
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index abd2667..b86ad30 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1870,7 +1870,8 @@  static int smcr_serv_conf_first_link(struct smc_sock *smc)
 		return SMC_CLC_DECL_TIMEOUT_CL;
 
 	/* receive CONFIRM LINK response from client over the RoCE fabric */
-	qentry = smc_llc_wait(link->lgr, link, SMC_LLC_WAIT_TIME,
+	qentry = smc_llc_wait(link->lgr, link,
+			      sock_net(&smc->sk)->smc.sysctl_smcr_srv_confirm_link_timeout,
 			      SMC_LLC_CONFIRM_LINK);
 	if (!qentry) {
 		struct smc_clc_msg_decline dclc;
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 5cbc18c..919f3f7 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -51,6 +51,13 @@ 
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
+		.procname	= "smcr_srv_confirm_link_timeout",
+		.data		= &init_net.smc.sysctl_smcr_srv_confirm_link_timeout,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{
 		.procname	= "wmem",
 		.data		= &init_net.smc.sysctl_wmem,
 		.maxlen		= sizeof(int),
@@ -95,6 +102,11 @@  int __net_init smc_sysctl_net_init(struct net *net)
 	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
 	net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS;
 	net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME;
+	/* Increasing the server's timeout by twice as much as the client's
+	 * timeout by default can temporarily avoid decline messages of
+	 * both side been crossed or collided.
+	 */
+	net->smc.sysctl_smcr_srv_confirm_link_timeout = 2 * SMC_LLC_WAIT_TIME;
 	WRITE_ONCE(net->smc.sysctl_wmem, net_smc_wmem_init);
 	WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init);