[RFC,0/2] Two RFC patches for the same SMC socket wait queue mismatch issue

Message ID 1636548651-44649-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)

Message

Wen Gu Nov. 10, 2021, 12:50 p.m. UTC
Hi, Karsten

Thanks for your reply. The previous discussion of the socket wait queue
mismatch in SMC fallback can be found at:
https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/

This series contains two RFC patches. Both aim to fix the same issue, the
mismatch of socket wait queues in SMC fallback.

In your last reply, you suggested that I add a complete description of the
intention behind the initial patch so that readers can understand the idea.
This has been done in "[RFC PATCH net v2 0/2] net/smc: Fix socket wait queue
mismatch issue caused by fallback" of this mail.

Unfortunately, I later found a defect in the approach shared by the initial
patch and the v2 patch mentioned above. The defect concerns fasync_list and
is related to 67f562e3e14 ("net/smc: transfer fasync_list in case of fallback").

When user applications use sock_fasync() to insert entries into fasync_list,
the wait queue they operate on is smc socket->wq. But in the initial patch and
the v2 patch, I swapped sk->sk_wq of the smc socket and the clcsocket in
smc_create(), so the sk_data_ready / sk_write_space.. callbacks of smc end up
waking clcsocket->wq. As a result, the entries added to smc
socket->wq.fasync_list won't be woken up at all before fallback.

So swapping sk->sk_wq of the smc socket and the clcsocket, as done in the
initial patch and the v2 patch of this mail, seems to be a bad way to fix
this issue.
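
The defect described above can be illustrated with a small user-space model.
This is only a sketch, not kernel code: the model_* types and functions are
hypothetical stand-ins, assuming that fasync registration goes to the wait
queue embedded in the socket while wakeups go to whatever sk->sk_wq points to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a socket wait queue. */
struct model_wq {
    int fasync_entries; /* registered async notifiers */
    int notified;       /* notifications delivered so far */
};

/* Stand-in for struct socket: the wait queue is embedded (socket->wq). */
struct model_socket {
    struct model_wq wq;
};

/* Stand-in for struct sock: wakeups follow this pointer (sk->sk_wq). */
struct model_sock {
    struct model_wq *sk_wq;
};

/* Models sock_fasync(): registers on the socket's own embedded wait queue. */
static void model_fasync_add(struct model_socket *sock)
{
    sock->wq.fasync_entries++;
}

/* Models sk_data_ready(): notifies the queue that sk->sk_wq points to. */
static void model_data_ready(struct model_sock *sk)
{
    sk->sk_wq->notified += sk->sk_wq->fasync_entries;
}

/*
 * Returns how many notifications reach the entry that the application
 * registered on the smc socket before fallback.
 */
static int model_run(bool swap_wq)
{
    struct model_socket smc_socket = { { 0, 0 } };
    struct model_socket clc_socket = { { 0, 0 } };
    struct model_sock smc_sk = { &smc_socket.wq };
    struct model_sock clc_sk = { &clc_socket.wq };

    if (swap_wq) {
        /* The swap done in smc_create() by the initial and v2 patch. */
        struct model_wq *tmp = smc_sk.sk_wq;
        smc_sk.sk_wq = clc_sk.sk_wq;
        clc_sk.sk_wq = tmp;
    }

    model_fasync_add(&smc_socket); /* application registers on smc socket */
    model_data_ready(&smc_sk);     /* smc signals that data is ready */

    return smc_socket.wq.notified;
}
```

In this model, model_run(false) delivers the notification, while
model_run(true) does not, because the wakeup goes to the clcsocket queue,
which has no fasync entries.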

Therefore, I tried another solution: moving the wait queue entries from
smc socket->wq to clcsocket->wq during the fallback, as described in
"[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
mail. In our test environment, this patch fixes the fallback issue.

I look forward to hearing your opinions. Thank you.

Cheers,
Wen Gu

Wen Gu (2):
  net/smc: Fix socket wait queue mismatch issue caused by fallback
  net/smc: Transfer remaining wait queue entries

Comments

Karsten Graul Nov. 11, 2021, 2:21 p.m. UTC | #1
On 10/11/2021 13:50, Wen Gu wrote:
> Therefore, I tried another solution: moving the wait queue entries from
> smc socket->wq to clcsocket->wq during the fallback, as described in
> "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue entries" of this
> mail. In our test environment, this patch fixes the fallback issue.

Still running final tests, but overall it's working well here, too.
Until we maybe find a 'cleaner' solution for this, I would like to go with your
current fixes. But I would like to improve the wording of the commit message and
the comments a little bit, if you are okay with that.

If you send a new series with the 2 patches then I would take them and post them
to the list again with my changes.

What do you think?

Wen Gu Nov. 12, 2021, 3:09 a.m. UTC | #2
On 2021/11/11 10:21 pm, Karsten Graul wrote:
> Still running final tests, but overall it's working well here, too.
> Until we maybe find a 'cleaner' solution for this, I would like to go with your
> current fixes. But I would like to improve the wording of the commit message and
> the comments a little bit, if you are okay with that.
> 
> If you send a new series with the 2 patches then I would take them and post them
> to the list again with my changes.

It seems that the second patch alone will fix the issue.

> 
> What do you think?
> 

Thanks for your reply. I am glad that the second patch works well.

To avoid any misunderstanding between us, I want to explain that the
second patch "[RFC PATCH net 2/2] net/smc: Transfer remaining wait queue
entries" alone will fix the issue.

It transfers the remaining entries in smc socket->wq to clcsocket->wq
during the fallback, so that the entries added to smc socket->wq before
fallback will still work after fallback, even though user applications
start to use the clcsocket.
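
The idea of transferring entries can be sketched in user space as follows.
This is a minimal model under stated assumptions, not the actual patch: the
wq_* names are hypothetical, the wait queue is reduced to a singly linked
list, and locking is omitted entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified wait queue entry and head. */
struct wq_entry {
    struct wq_entry *next;
    int woken; /* set to 1 once the entry has been woken up */
};

struct wq_head {
    struct wq_entry *first;
};

/* Adds one entry to the front of a wait queue. */
static void wq_add(struct wq_head *head, struct wq_entry *e)
{
    e->next = head->first;
    head->first = e;
}

/* Moves every entry from one queue to another; returns the number moved. */
static int wq_transfer(struct wq_head *from, struct wq_head *to)
{
    int moved = 0;

    while (from->first) {
        struct wq_entry *e = from->first;

        from->first = e->next;
        wq_add(to, e);
        moved++;
    }
    return moved;
}

/* Wakes every entry on a queue; returns the number of entries woken. */
static int wq_wake_all(struct wq_head *head)
{
    int n = 0;

    for (struct wq_entry *e = head->first; e; e = e->next) {
        e->woken = 1;
        n++;
    }
    return n;
}

/*
 * An entry is added to the smc queue before fallback. After fallback,
 * events only arrive on the clcsocket queue. Returns 1 if the entry
 * was woken, 0 if it was missed.
 */
static int fallback_demo(int do_transfer)
{
    struct wq_head smc_wq = { NULL };
    struct wq_head clc_wq = { NULL };
    struct wq_entry waiter = { NULL, 0 };

    wq_add(&smc_wq, &waiter);          /* registered before fallback */
    if (do_transfer)
        wq_transfer(&smc_wq, &clc_wq); /* done during the fallback */
    wq_wake_all(&clc_wq);              /* wakeup after fallback */

    return waiter.woken;
}
```

Without the transfer, the waiter stays on the smc queue and misses the
wakeup; with it, the waiter is found on the clcsocket queue and woken.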


The first patch "[RFC PATCH net v2 0/2] net/smc: Fix socket wait queue 
mismatch issue caused by fallback" should be abandoned.

I sent it only to better explain the defect I found in my initial patch
and this v2 patch. I hope it didn't bother you. Swapping sk->sk_wq seems
to be a bad way to fix the issue because it cannot handle fasync_list
well. Unfortunately, I did not find this defect until I had almost
finished it :(

So I think it is fine to send just the second patch "[RFC PATCH net 2/2]
net/smc: Transfer remaining wait queue entries" again. I will send it
later.

And it is fine with me if you want to improve the commit messages or
comments.

Thank you.

Cheers,
Wen Gu