Message ID:   1644415853-46641-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State:        Accepted
Commit:       1de9770d121ee9294794cca0e0be8fbfa0134ee8
Delegated to: Netdev Maintainers
Series:       [net] net/smc: Avoid overwriting the copies of clcsock callback functions
On Wed, Feb 09, 2022 at 10:10:53PM +0800, Wen Gu wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
>
>  clcsk->sk_error_report
>   |- smc_fback_error_report() <------------------------------|
>       |- smc_fback_forward_wakeup()                          | (loop)
>           |- clcsock_callback() (incorrectly overwritten)    |
>               |- smc->clcsk_error_report() ------------------|
>
> So this patch fixes the issue by saving these function pointers only
> once in the fallback and avoiding overwriting.
>
> Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
> Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
> Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/af_smc.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 8c89d0b..306d9e8c 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -667,14 +667,17 @@ static void smc_fback_error_report(struct sock *clcsk)
>  static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
>  {
>  	struct sock *clcsk;
> +	int rc = 0;
>
>  	mutex_lock(&smc->clcsock_release_lock);
>  	if (!smc->clcsock) {
> -		mutex_unlock(&smc->clcsock_release_lock);
> -		return -EBADF;
> +		rc = -EBADF;
> +		goto out;
>  	}
>  	clcsk = smc->clcsock->sk;
>
> +	if (smc->use_fallback)
> +		goto out;
>  	smc->use_fallback = true;

I am wondering whether there is a potential race here. If ->use_fallback
is set to true but the rest of the replacement process is still in
progress, others who test ->use_fallback and pass the check could observe
the old callbacks before the replacement completes.

Thanks,
Tony Lu
On 2022/2/10 10:50 am, Tony Lu wrote:

> I am wondering whether there is a potential race here. If ->use_fallback
> is set to true but the rest of the replacement process is still in
> progress, others who test ->use_fallback and pass the check could observe
> the old callbacks before the replacement completes.

Thanks for your comments.

I understand your concern. But when I went through all the places that
check smc->use_fallback, I did not find an actual racing point. Please
point out if I missed something. Thank you.

In my humble opinion, most of the operations after the smc->use_fallback
check have no direct relationship with what is done in
smc_switch_to_fallback() (the replacement of the clcsock callback
functions), except for those in smc_sendmsg(), smc_recvmsg() and
smc_sendpage():

smc_sendmsg():

	if (smc->use_fallback) {
		rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
	}

smc_recvmsg():

	if (smc->use_fallback) {
		rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
	}

smc_sendpage():

	if (smc->use_fallback) {
		rc = kernel_sendpage(smc->clcsock, page, offset,
				     size, flags);
	}

If smc->use_fallback is set to true but the callback functions
(sk_data_ready ...) of clcsock have not been replaced yet at that moment,
there could be a race as you described.

But it won't happen, because the fallback must already be complete before
sending and receiving.

What do you think?

Thanks,
Wen Gu
On Thu, Feb 10, 2022 at 04:56:00PM +0800, Wen Gu wrote:
>
> On 2022/2/10 10:50 am, Tony Lu wrote:
>
> > I am wondering whether there is a potential race here. If
> > ->use_fallback is set to true but the rest of the replacement process
> > is still in progress, others who test ->use_fallback and pass the
> > check could observe the old callbacks before the replacement
> > completes.
>
> Thanks for your comments.
>
> I understand your concern. But when I went through all the places that
> check smc->use_fallback, I did not find an actual racing point. Please
> point out if I missed something. Thank you.
>
> In my humble opinion, most of the operations after the smc->use_fallback
> check have no direct relationship with what is done in
> smc_switch_to_fallback() (the replacement of the clcsock callback
> functions), except for those in smc_sendmsg(), smc_recvmsg() and
> smc_sendpage():
>
> smc_sendmsg():
>
> 	if (smc->use_fallback) {
> 		rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
> 	}
>
> smc_recvmsg():
>
> 	if (smc->use_fallback) {
> 		rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
> 	}
>
> smc_sendpage():
>
> 	if (smc->use_fallback) {
> 		rc = kernel_sendpage(smc->clcsock, page, offset,
> 				     size, flags);
> 	}
>
> If smc->use_fallback is set to true but the callback functions
> (sk_data_ready ...) of clcsock have not been replaced yet at that
> moment, there could be a race as you described.
>
> But it won't happen, because the fallback must already be complete
> before sending and receiving.
>
> What do you think?

I am concerned about the non-blocking work in the workqueue. If we can
make sure the ordering of the fallback is deterministic, it is safe.
From your analysis, I think it is safe for now.

Back to the patch: the original version of smc_switch_to_fallback() had
implicit reentrant semantics. This fix should work, thanks.

Thanks for your detailed investigation.

Best regards,
Tony Lu
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 9 Feb 2022 22:10:53 +0800 you wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
>
>  clcsk->sk_error_report
>   |- smc_fback_error_report() <------------------------------|
>       |- smc_fback_forward_wakeup()                          | (loop)
>           |- clcsock_callback() (incorrectly overwritten)    |
>               |- smc->clcsk_error_report() ------------------|
>
> [...]

Here is the summary with links:
  - [net] net/smc: Avoid overwriting the copies of clcsock callback functions
    https://git.kernel.org/netdev/net/c/1de9770d121e

You are awesome, thank you!
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8c89d0b..306d9e8c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -667,14 +667,17 @@ static void smc_fback_error_report(struct sock *clcsk)
 static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 {
 	struct sock *clcsk;
+	int rc = 0;
 
 	mutex_lock(&smc->clcsock_release_lock);
 	if (!smc->clcsock) {
-		mutex_unlock(&smc->clcsock_release_lock);
-		return -EBADF;
+		rc = -EBADF;
+		goto out;
 	}
 	clcsk = smc->clcsock->sk;
 
+	if (smc->use_fallback)
+		goto out;
 	smc->use_fallback = true;
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
@@ -702,8 +705,9 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 		smc->clcsock->sk->sk_user_data =
 			(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
 	}
+out:
 	mutex_unlock(&smc->clcsock_release_lock);
-	return 0;
+	return rc;
 }
 
 /* fall back during connect */
The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:

 clcsk->sk_error_report
  |- smc_fback_error_report() <------------------------------|
      |- smc_fback_forward_wakeup()                          | (loop)
          |- clcsock_callback() (incorrectly overwritten)    |
              |- smc->clcsk_error_report() ------------------|

So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.

Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/af_smc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)