diff mbox series

[net] net/smc: Avoid overwriting the copies of clcsock callback functions

Message ID 1644415853-46641-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State Accepted
Commit 1de9770d121ee9294794cca0e0be8fbfa0134ee8
Delegated to: Netdev Maintainers
Series [net] net/smc: Avoid overwriting the copies of clcsock callback functions

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Wen Gu Feb. 9, 2022, 2:10 p.m. UTC
The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:

clcsk->sk_error_report
 |- smc_fback_error_report() <------------------------------|
     |- smc_fback_forward_wakeup()                          | (loop)
         |- clcsock_callback()  (incorrectly overwritten)   |
             |- smc->clcsk_error_report() ------------------|

So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.

Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/af_smc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
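The save-once guard at the heart of the fix can be illustrated with a small user-space sketch. All names below (fake_smc, smc_report, the two switch functions) are hypothetical stand-ins for the kernel's smc_sock, smc_fback_error_report() and smc_switch_to_fallback(), not the real symbols:

```c
#include <stddef.h>

/* Hypothetical miniature of the clcsock callback save/replace pattern. */
typedef void (*report_fn)(void);

static void original_report(void) { }   /* plays the role of TCP's sk_error_report */

struct fake_smc {
	report_fn sk_error_report;      /* live callback on the clcsock */
	report_fn clcsk_error_report;   /* saved copy of the original */
	int use_fallback;
};

static struct fake_smc smc;

/* Replacement callback: forwards the event to the saved original. */
static void smc_report(void)
{
	smc.clcsk_error_report();
}

/* Buggy shape: a second fallback saves smc_report over the original,
 * so smc_report() would end up calling itself -- the reported loop.
 */
static void switch_to_fallback_buggy(void)
{
	smc.clcsk_error_report = smc.sk_error_report;
	smc.sk_error_report = smc_report;
}

/* Fixed shape: save the pointers only on the first fallback. */
static void switch_to_fallback_fixed(void)
{
	if (smc.use_fallback)
		return;
	smc.use_fallback = 1;
	smc.clcsk_error_report = smc.sk_error_report;
	smc.sk_error_report = smc_report;
}
```

Calling the buggy variant twice leaves the saved copy pointing at smc_report itself, so invoking sk_error_report recurses forever; the fixed variant preserves the original pointer no matter how many times fallback is attempted.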

Comments

Tony Lu Feb. 10, 2022, 2:50 a.m. UTC | #1
On Wed, Feb 09, 2022 at 10:10:53PM +0800, Wen Gu wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
> 
> clcsk->sk_error_report
>  |- smc_fback_error_report() <------------------------------|
>      |- smc_fback_forward_wakeup()                          | (loop)
>          |- clcsock_callback()  (incorrectly overwritten)   |
>              |- smc->clcsk_error_report() ------------------|
> 
> So this patch fixes the issue by saving these function pointers only
> once in the fallback and avoiding overwriting.
> 
> Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
> Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
> Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/af_smc.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 8c89d0b..306d9e8c 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -667,14 +667,17 @@ static void smc_fback_error_report(struct sock *clcsk)
>  static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
>  {
>  	struct sock *clcsk;
> +	int rc = 0;
>  
>  	mutex_lock(&smc->clcsock_release_lock);
>  	if (!smc->clcsock) {
> -		mutex_unlock(&smc->clcsock_release_lock);
> -		return -EBADF;
> +		rc = -EBADF;
> +		goto out;
>  	}
>  	clcsk = smc->clcsock->sk;
>  
> +	if (smc->use_fallback)
> +		goto out;
>  	smc->use_fallback = true;

I am wondering whether there is a potential race. If ->use_fallback is
set to true but the rest of the replacement process is still in flight,
anyone who tests ->use_fallback and sees it set would still get the old
callbacks before the replacement completes.

Thanks,
Tony Lu
Wen Gu Feb. 10, 2022, 8:56 a.m. UTC | #2
On 2022/2/10 10:50 am, Tony Lu wrote:

> I am wondering whether there is a potential race. If ->use_fallback is
> set to true but the rest of the replacement process is still in flight,
> anyone who tests ->use_fallback and sees it set would still get the old
> callbacks before the replacement completes.
> 

Thanks for your comments.

I understand your concern, but when I went through all the places that
check smc->use_fallback, I could not find an actual race point. Please
point out anything I missed. Thank you.

In my humble opinion, most of the operations guarded by the
smc->use_fallback check have no direct relationship with what
smc_switch_to_fallback() does (the replacement of the clcsock callback
functions), except for those in smc_sendmsg(), smc_recvmsg() and
smc_sendpage():

smc_sendmsg():

	if (smc->use_fallback) {
		rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
	}

smc_recvmsg():

	if (smc->use_fallback) {
		rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
	}

smc_sendpage():

	if (smc->use_fallback) {
		rc = kernel_sendpage(smc->clcsock, page, offset,
				     size, flags);
	}

If smc->use_fallback is set to true but the clcsock callbacks
(sk_data_ready, ...) have not been replaced yet at that moment, there
could be a race as you described.

But that cannot happen, because the fallback must already be complete
before sending and receiving start.

What do you think about it?

Thanks,
Wen Gu
Tony Lu Feb. 11, 2022, 2:32 a.m. UTC | #3
On Thu, Feb 10, 2022 at 04:56:00PM +0800, Wen Gu wrote:
> 
> 
> On 2022/2/10 10:50 am, Tony Lu wrote:
> 
> > I am wondering whether there is a potential race. If ->use_fallback is
> > set to true but the rest of the replacement process is still in flight,
> > anyone who tests ->use_fallback and sees it set would still get the old
> > callbacks before the replacement completes.
> > 
> 
> Thanks for your comments.
> 
> I understand your concern, but when I went through all the places that
> check smc->use_fallback, I could not find an actual race point. Please
> point out anything I missed. Thank you.
> 
> In my humble opinion, most of the operations guarded by the
> smc->use_fallback check have no direct relationship with what
> smc_switch_to_fallback() does (the replacement of the clcsock callback
> functions), except for those in smc_sendmsg(), smc_recvmsg() and
> smc_sendpage():
> 
> smc_sendmsg():
> 
> 	if (smc->use_fallback) {
> 		rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
> 	}
> 
> smc_recvmsg():
> 
> 	if (smc->use_fallback) {
> 		rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
> 	}
> 
> smc_sendpage():
> 
> 	if (smc->use_fallback) {
> 		rc = kernel_sendpage(smc->clcsock, page, offset,
> 				     size, flags);
> 	}
> 
> If smc->use_fallback is set to true but the clcsock callbacks
> (sk_data_ready, ...) have not been replaced yet at that moment, there
> could be a race as you described.
> 
> But that cannot happen, because the fallback must already be complete
> before sending and receiving start.
> 
> What do you think about it?
> 
I was concerned about the non-blocking work in the workqueue. If we can
make sure the ordering of the fallback is deterministic, it is safe.
From your analysis, I think it is safe for now.

Let's get back to the patch: the original version of
smc_switch_to_fallback() was implicitly expected to be reentrant. This
fix should work, thanks.

Thanks for your detailed investigation.

Best regards,
Tony Lu
patchwork-bot+netdevbpf@kernel.org Feb. 11, 2022, 12:10 p.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed,  9 Feb 2022 22:10:53 +0800 you wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
> 
> clcsk->sk_error_report
>  |- smc_fback_error_report() <------------------------------|
>      |- smc_fback_forward_wakeup()                          | (loop)
>          |- clcsock_callback()  (incorrectly overwritten)   |
>              |- smc->clcsk_error_report() ------------------|
> 
> [...]

Here is the summary with links:
  - [net] net/smc: Avoid overwriting the copies of clcsock callback functions
    https://git.kernel.org/netdev/net/c/1de9770d121e

You are awesome, thank you!

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8c89d0b..306d9e8c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -667,14 +667,17 @@  static void smc_fback_error_report(struct sock *clcsk)
 static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 {
 	struct sock *clcsk;
+	int rc = 0;
 
 	mutex_lock(&smc->clcsock_release_lock);
 	if (!smc->clcsock) {
-		mutex_unlock(&smc->clcsock_release_lock);
-		return -EBADF;
+		rc = -EBADF;
+		goto out;
 	}
 	clcsk = smc->clcsock->sk;
 
+	if (smc->use_fallback)
+		goto out;
 	smc->use_fallback = true;
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
@@ -702,8 +705,9 @@  static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 		smc->clcsock->sk->sk_user_data =
 			(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
 	}
+out:
 	mutex_unlock(&smc->clcsock_release_lock);
-	return 0;
+	return rc;
 }
 
 /* fall back during connect */
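Besides the save-once guard, the patch converts the function to a single-exit shape so the mutex is unlocked in exactly one place. A standalone sketch of that pattern, with a pthread mutex standing in for the kernel mutex and all names as illustrative stand-ins for smc->clcsock and smc->use_fallback:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Single-exit locking shape used by the patched smc_switch_to_fallback():
 * every early-error and already-done path funnels through one unlock site.
 */
static pthread_mutex_t release_lock = PTHREAD_MUTEX_INITIALIZER;
static void *clcsock_stub;   /* stands in for smc->clcsock */
static int use_fallback;     /* stands in for smc->use_fallback */

int do_switch_to_fallback(void)
{
	int rc = 0;

	pthread_mutex_lock(&release_lock);
	if (!clcsock_stub) {
		rc = -EBADF;        /* socket already released */
		goto out;
	}
	if (use_fallback)
		goto out;           /* second call: reentrant no-op */
	use_fallback = 1;
	/* ... save and replace the callbacks exactly once here ... */
out:
	pthread_mutex_unlock(&release_lock);
	return rc;
}
```

With the `goto out` labels, a second invocation returns 0 without touching the saved pointers, and the release path needs no duplicated unlock-and-return.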