Message ID | 1636548651-44649-1-git-send-email-guwen@linux.alibaba.com (mailing list archive) |
---|---|
Series | Two RFC patches for the same SMC socket wait queue mismatch issue |
On 10/11/2021 13:50, Wen Gu wrote:
> Hi, Karsten
>
> Thanks for your reply. The previous discussion about the issue of socket
> wait queue mismatch in SMC fallback can be found at:
> https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
>
> This set includes two RFC patches. Both aim to fix the same issue, the
> mismatch of socket wait queue in SMC fallback.
>
> In your last reply, you suggested that I add a complete description of
> the intention of the initial patch so that readers can understand the
> idea behind it. This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix
> socket wait queue mismatch issue caused by fallback" of this mail.
>
> Unfortunately, I later found a defect in the solution of the initial patch
> or the v2 patch mentioned above. The defect is about fasync_list and related
> to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").
>
> When user applications use sock_fasync() to insert entries into fasync_list,
> the wait queue they operate on is smc socket->wq. But in the initial patch or
> the v2 patch, I swapped sk->sk_wq of smc socket and clcsocket in smc_create(),
> thus the sk_data_ready / sk_write_space.. of smc will wake up clcsocket->wq
> finally. So the entries added into smc socket->wq.fasync_list won't be woken
> up at all before fallback.
>
> So the solution in the initial patch or the v2 patch of this mail, swapping
> sk->sk_wq of smc socket and clcsocket, seems a bad way to fix this issue.
>
> Therefore, I tried another solution that moves the wait queue entries from
> smc socket->wq to clcsocket->wq during the fallback, which is described in
> "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
> mail. In our test environment, this patch fixes the fallback issue well.

Still running final tests but overall it's working well here, too.
Until we maybe find a 'cleaner' solution for this I would like to go with your
current fixes. But I would like to improve the wording of the commit message and
the comments a little bit if you are okay with that.

If you send a new series with the 2 patches then I would take them and post them
to the list again with my changes.

What do you think?
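
The fasync_list defect described in the quoted message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` types and functions are hypothetical stand-ins, not the kernel's `struct socket`, `struct sock` or `sock_fasync()`. The only behaviour taken from the thread is that registration goes through socket->wq while wake-ups go through sk->sk_wq.

```c
#include <stddef.h>

/* Hypothetical user-space model, NOT kernel code. */
struct model_wq {
    int fasync_entries;          /* number of registered fasync entries */
};

struct model_socket {
    struct model_wq wq;          /* models socket->wq (used for registration) */
    struct model_wq *sk_wq;      /* models sk->sk_wq (used for wake-ups) */
};

/* Point sk_wq at the socket's own wq, as in the unpatched case. */
static void model_init(struct model_socket *s)
{
    s->wq.fasync_entries = 0;
    s->sk_wq = &s->wq;
}

/* Models registration: always operates on socket->wq. */
static void model_fasync_add(struct model_socket *s)
{
    s->wq.fasync_entries++;
}

/* Models sk_data_ready: returns how many entries the wake-up reaches. */
static int model_data_ready(const struct model_socket *s)
{
    return s->sk_wq->fasync_entries;
}

/* Models the swap of sk->sk_wq between the two sockets. */
static void model_swap_sk_wq(struct model_socket *a, struct model_socket *b)
{
    struct model_wq *tmp = a->sk_wq;

    a->sk_wq = b->sk_wq;
    b->sk_wq = tmp;
}
```

In this model, an entry registered on the smc socket is reachable by a wake-up before the swap, and is no longer reachable after it, which mirrors the defect described above.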
On 2021/11/11 10:21 pm, Karsten Graul wrote:
> On 10/11/2021 13:50, Wen Gu wrote:
>> Hi, Karsten
>>
>> Thanks for your reply. The previous discussion about the issue of socket
>> wait queue mismatch in SMC fallback can be found at:
>> https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
>>
>> This set includes two RFC patches. Both aim to fix the same issue, the
>> mismatch of socket wait queue in SMC fallback.
>>
>> In your last reply, you suggested that I add a complete description of
>> the intention of the initial patch so that readers can understand the
>> idea behind it. This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix
>> socket wait queue mismatch issue caused by fallback" of this mail.
>>
>> Unfortunately, I later found a defect in the solution of the initial patch
>> or the v2 patch mentioned above. The defect is about fasync_list and related
>> to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").
>>
>> When user applications use sock_fasync() to insert entries into fasync_list,
>> the wait queue they operate on is smc socket->wq. But in the initial patch or
>> the v2 patch, I swapped sk->sk_wq of smc socket and clcsocket in smc_create(),
>> thus the sk_data_ready / sk_write_space.. of smc will wake up clcsocket->wq
>> finally. So the entries added into smc socket->wq.fasync_list won't be woken
>> up at all before fallback.
>>
>> So the solution in the initial patch or the v2 patch of this mail, swapping
>> sk->sk_wq of smc socket and clcsocket, seems a bad way to fix this issue.
>>
>> Therefore, I tried another solution that moves the wait queue entries from
>> smc socket->wq to clcsocket->wq during the fallback, which is described in
>> "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
>> mail. In our test environment, this patch fixes the fallback issue well.
>
> Still running final tests but overall it's working well here, too.
> Until we maybe find a 'cleaner' solution for this I would like to go with your
> current fixes. But I would like to improve the wording of the commit message and
> the comments a little bit if you are okay with that.
>
> If you send a new series with the 2 patches then I would take them and post them
> to the list again with my changes.

Seems just the second patch alone will fix the issue.

> What do you think?
>

Thanks for your reply. I am glad that the second patch works well.

To avoid any misunderstanding between us, I want to explain that the second
patch "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries"
alone will fix the issue. It transfers the remaining entries in smc socket->wq
to clcsocket->wq during the fallback, so that the entries added into smc
socket->wq before fallback will still work after fallback, even though user
applications start to use clcsocket.

The first patch "[RFC PATCH net v2 0/2] net/smc: Fix socket wait queue mismatch
issue caused by fallback" should be abandoned. I sent it only to better explain
the defect I found in my initial patch or this v2 patch. Hope it didn't bother
you. Swapping the sk->sk_wq seems a bad way to fix the issue because it cannot
handle the fasync_list well. Unfortunately I did not find this defect until I
had almost finished it :(

So, I think it is fine to just send the second patch "[RFC PATCH net 2/2]
net/smc: Transfer remaining wait queue entries" again. I will send it later.

And, it is okay for me if you want to improve the commit messages or comments.

Thank you.

Cheers,
Wen Gu
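
The idea of transferring the remaining wait queue entries during fallback can be sketched as a user-space model. This is a minimal illustration only, not the patch itself: `struct wait_entry`, `struct wait_queue`, `wq_add()`, `wq_wake_all()` and `wq_transfer()` are hypothetical names, and a plain singly linked list stands in for the kernel's wait queue.

```c
#include <stddef.h>

/* Hypothetical user-space model, NOT the kernel's wait queue types. */
struct wait_entry {
    int woken;                   /* set to 1 once the entry has been woken */
    struct wait_entry *next;
};

struct wait_queue {
    struct wait_entry *head;
};

/* Register an entry on a queue (models a waiter added before fallback). */
static void wq_add(struct wait_queue *wq, struct wait_entry *e)
{
    e->woken = 0;
    e->next = wq->head;
    wq->head = e;
}

/* Wake every entry on the queue; returns the number of entries woken. */
static int wq_wake_all(struct wait_queue *wq)
{
    int n = 0;

    for (struct wait_entry *e = wq->head; e != NULL; e = e->next) {
        e->woken = 1;
        n++;
    }
    return n;
}

/* Move all entries from src to dst; returns the number of entries moved. */
static int wq_transfer(struct wait_queue *src, struct wait_queue *dst)
{
    int moved = 0;

    while (src->head != NULL) {
        struct wait_entry *e = src->head;

        src->head = e->next;     /* unlink from the source queue */
        e->next = dst->head;     /* link into the destination queue */
        dst->head = e;
        moved++;
    }
    return moved;
}
```

In this model, a wake-up on the destination queue reaches nothing until `wq_transfer()` has run. After the transfer, entries that were registered on the source queue are woken through the destination queue, which is the behaviour the second patch is described as restoring for clcsocket->wq.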