diff mbox series

[net] net/smc: Transfer remaining wait queue entries during fallback

Message ID 1636687839-38962-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] net/smc: Transfer remaining wait queue entries during fallback | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Wen Gu Nov. 12, 2021, 3:30 a.m. UTC
The SMC fallback is incomplete currently. There may be some
wait queue entries remaining in smc socket->wq, which should
be removed to clcsocket->wq during the fallback.

For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
     1 threads and 1 connections
     Thread Stats   Avg      Stdev     Max   ± Stdev
     	Latency     0.00us    0.00us   0.00us    -nan%
	Req/Sec     0.00      0.00     0.00      -nan%
	0 requests in 5.00s, 0.00B read
     Requests/sec:      0.00
     Transfer/sec:       0.00B

The reason for this all-zeros result is that when wrk used SMC
to replace TCP, it added an eppoll_entry into smc socket->wq
and expected to be notified if epoll events like EPOLL_IN/
EPOLL_OUT occurred on the smc socket.

However, once a fallback occurred, wrk switches to use clcsocket.
Now it is clcsocket->wq instead of smc socket->wq which will
be woken up. The eppoll_entry remaining in smc socket->wq does
not work anymore and wrk stops the test.

This patch fixes this issue by removing remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.

Link: https://www.spinics.net/lists/netdev/msg779769.html
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/af_smc.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Heiko Carstens Nov. 12, 2021, 2:02 p.m. UTC | #1
On Fri, Nov 12, 2021 at 11:30:39AM +0800, Wen Gu wrote:
...
> +	wait_queue_head_t *smc_wait = sk_sleep(&smc->sk);
> +	wait_queue_head_t *clc_wait = sk_sleep(smc->clcsock->sk);
> +	unsigned long flags;
> +
>  	smc->use_fallback = true;
>  	smc->fallback_rsn = reason_code;
>  	smc_stat_fallback(smc);
> @@ -571,6 +575,16 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
>  		smc->clcsock->file->private_data = smc->clcsock;
>  		smc->clcsock->wq.fasync_list =
>  			smc->sk.sk_socket->wq.fasync_list;
> +
> +		/* There might be some wait queue entries remaining
> +		 * in smc socket->wq, which should be removed to
> +		 * clcsocket->wq during the fallback.
> +		 */
> +		spin_lock_irqsave(&smc_wait->lock, flags);
> +		spin_lock_irqsave(&clc_wait->lock, flags);
> +		list_splice_init(&smc_wait->head, &clc_wait->head);
> +		spin_unlock_irqrestore(&clc_wait->lock, flags);
> +		spin_unlock_irqrestore(&smc_wait->lock, flags);

No idea if the rest of the patch makes sense, however this usage of
spin_lock_irqsave() is not correct. The second spin_lock_irqsave()
would always save a state with irqs disabled into "flags", and
therefore this path would always be left with irqs disabled,
regardless if irqs were enabled or disabled when entering.

You need to change it to something like

> +		spin_lock_irqsave(&smc_wait->lock, flags);
> +		spin_lock(&clc_wait->lock);
> +		list_splice_init(&smc_wait->head, &clc_wait->head);
> +		spin_unlock(&clc_wait->lock);
> +		spin_unlock_irqrestore(&smc_wait->lock, flags);
Wen Gu Nov. 12, 2021, 3:28 p.m. UTC | #2
On 2021/11/12 10:02 pm, Heiko Carstens wrote:
> On Fri, Nov 12, 2021 at 11:30:39AM +0800, Wen Gu wrote:
> ...
>> +	wait_queue_head_t *smc_wait = sk_sleep(&smc->sk);
>> +	wait_queue_head_t *clc_wait = sk_sleep(smc->clcsock->sk);
>> +	unsigned long flags;
>> +
>>   	smc->use_fallback = true;
>>   	smc->fallback_rsn = reason_code;
>>   	smc_stat_fallback(smc);
>> @@ -571,6 +575,16 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
>>   		smc->clcsock->file->private_data = smc->clcsock;
>>   		smc->clcsock->wq.fasync_list =
>>   			smc->sk.sk_socket->wq.fasync_list;
>> +
>> +		/* There might be some wait queue entries remaining
>> +		 * in smc socket->wq, which should be removed to
>> +		 * clcsocket->wq during the fallback.
>> +		 */
>> +		spin_lock_irqsave(&smc_wait->lock, flags);
>> +		spin_lock_irqsave(&clc_wait->lock, flags);
>> +		list_splice_init(&smc_wait->head, &clc_wait->head);
>> +		spin_unlock_irqrestore(&clc_wait->lock, flags);
>> +		spin_unlock_irqrestore(&smc_wait->lock, flags);
> 
> No idea if the rest of the patch makes sense, however this usage of
> spin_lock_irqsave() is not correct. The second spin_lock_irqsave()
> would always save a state with irqs disabled into "flags", and
> therefore this path would always be left with irqs disabled,
> regardless if irqs were enabled or disabled when entering.
> 
> You need to change it to something like
> 
>> +		spin_lock_irqsave(&smc_wait->lock, flags);
>> +		spin_lock(&clc_wait->lock);
>> +		list_splice_init(&smc_wait->head, &clc_wait->head);
>> +		spin_unlock(&clc_wait->lock);
>> +		spin_unlock_irqrestore(&smc_wait->lock, flags);

Sorry for the mistake... I will correct this.

Thanks,
Wen Gu
diff mbox series

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..11a966a 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -562,6 +562,10 @@  static void smc_stat_fallback(struct smc_sock *smc)
 
 static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 {
+	wait_queue_head_t *smc_wait = sk_sleep(&smc->sk);
+	wait_queue_head_t *clc_wait = sk_sleep(smc->clcsock->sk);
+	unsigned long flags;
+
 	smc->use_fallback = true;
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
@@ -571,6 +575,16 @@  static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 		smc->clcsock->file->private_data = smc->clcsock;
 		smc->clcsock->wq.fasync_list =
 			smc->sk.sk_socket->wq.fasync_list;
+
+		/* There might be some wait queue entries remaining
+		 * in smc socket->wq, which should be removed to
+		 * clcsocket->wq during the fallback.
+		 */
+		spin_lock_irqsave(&smc_wait->lock, flags);
+		spin_lock_irqsave(&clc_wait->lock, flags);
+		list_splice_init(&smc_wait->head, &clc_wait->head);
+		spin_unlock_irqrestore(&clc_wait->lock, flags);
+		spin_unlock_irqrestore(&smc_wait->lock, flags);
 	}
 }