
[v2,0/5] cachefiles: some bugfixes for clean object/send req/poll

Message ID 20240515125136.3714580-1-libaokun@huaweicloud.com (mailing list archive)

Message

Baokun Li May 15, 2024, 12:51 p.m. UTC
From: Baokun Li <libaokun1@huawei.com>

Hi all!

This is the second version of this patch series. Thank you, Jia Zhu and
Gao Xiang, for the feedback in the previous version.

We've been testing ondemand mode for cachefiles since January, and we're
almost done. We hit a lot of issues during the testing period, and this
patch set fixes some of the issues related to reopen worker/send req/poll.
The patches have passed internal testing without regression.

Patches 1-3: A read request waiting for a reopen could be closed
maliciously before the reopen worker executes or while it is waiting to
be scheduled. As a result, ondemand_object_worker() may be called after
the info, the object, and even the cache have been freed, triggering a
use-after-free. So use cancel_work_sync() in
cachefiles_ondemand_clean_object() to cancel the reopen worker or wait
for it to finish. Since it makes no sense to wait for the daemon to
complete the reopen request, and to keep that pointless wait from
blocking cancel_work_sync(), Patch 1 uses the DROPPING state to prevent
new requests from being generated when the request has not yet been
sent, and Patch 2 flushes the requests of the current object before
calling cancel_work_sync().
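
The ordering the fix enforces -- make sure the worker can no longer run
before the object's memory is freed -- can be sketched in user-space C,
with pthread_join() standing in for cancel_work_sync(). All names here
(demo_object, demo_clean_object, ...) are made up for illustration; the
real kernel code uses workqueues, not raw threads:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object/worker relationship: the worker
 * must never run after the object is freed, so the clean-up path waits
 * for it (the kernel fix uses cancel_work_sync()) before teardown. */
struct demo_object {
	pthread_t worker;
	bool worker_started;
	int state;		/* stands in for the object's info */
};

static void *demo_worker(void *arg)
{
	struct demo_object *obj = arg;	/* safe: freed only after join */

	obj->state = 1;			/* the "reopen" work */
	return NULL;
}

/* Analogue of queueing ondemand_object_worker(). */
static void demo_queue_work(struct demo_object *obj)
{
	obj->worker_started =
		pthread_create(&obj->worker, NULL, demo_worker, obj) == 0;
}

/* Analogue of cachefiles_ondemand_clean_object(): wait for the worker
 * to finish before freeing, so it cannot touch freed memory. */
static int demo_clean_object(struct demo_object *obj)
{
	int state;

	if (obj->worker_started)
		pthread_join(obj->worker, NULL);
	state = obj->state;
	free(obj);		/* no use-after-free: worker is done */
	return state;
}

static int demo_run(void)
{
	struct demo_object *obj = calloc(1, sizeof(*obj));

	demo_queue_work(obj);
	return demo_clean_object(obj);
}
```

Without the join/cancel step, demo_worker() could dereference obj after
free(), which is exactly the window the series closes.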

Patch 4: Allocate msg_id cyclically so that a reused msg_id cannot
mislead the daemon and cause a hang.
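
The idea behind the cyclic allocation can be shown with a toy ID pool
in plain C (the kernel side would use something like xa_alloc_cyclic();
id_pool, ID_LIMIT and the helpers below are invented for illustration):
instead of always handing out the lowest free ID -- which lets a
just-released msg_id be reused immediately -- the search continues from
one past the last allocation and wraps at the limit:

```c
#include <stdbool.h>

#define ID_LIMIT 8

struct id_pool {
	bool used[ID_LIMIT];
	int next;		/* where the next search starts */
};

/* Returns the allocated ID, or -1 if the pool is full. */
static int id_alloc_cyclic(struct id_pool *pool)
{
	for (int i = 0; i < ID_LIMIT; i++) {
		int id = (pool->next + i) % ID_LIMIT;

		if (!pool->used[id]) {
			pool->used[id] = true;
			/* resume after this ID, not from 0 */
			pool->next = (id + 1) % ID_LIMIT;
			return id;
		}
	}
	return -1;
}

static void id_free(struct id_pool *pool, int id)
{
	pool->used[id] = false;
}
```

With this scheme, freeing ID 0 and then allocating again yields a fresh
ID rather than 0, so a stale reply carrying the old msg_id cannot be
confused with a new request.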

Patch 5: Hold xas_lock during polling to avoid a use-after-free when
dereferencing requests. This issue was triggered frequently in our
tests, and we found that Anolis 5.10 had already fixed it, so this
patch is pushed upstream as well to avoid failing the test.
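
The locking rule the patch restores can be sketched in user-space C
with a mutex-protected list (the kernel code walks an xarray under
xas_lock; demo_queue, demo_req and the helpers are made-up names): any
walk that dereferences entries must hold the same lock the side that
frees them takes:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_req {
	int opcode;
	struct demo_req *next;
};

struct demo_queue {
	pthread_mutex_t lock;
	struct demo_req *head;
};

/* Enqueue a request under the lock. */
static void demo_req_add(struct demo_queue *q, int opcode)
{
	struct demo_req *req = malloc(sizeof(*req));

	req->opcode = opcode;
	pthread_mutex_lock(&q->lock);
	req->next = q->head;
	q->head = req;
	pthread_mutex_unlock(&q->lock);
}

/* Completion side: unlink and free the head request under the lock. */
static void demo_req_complete(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	struct demo_req *req = q->head;

	if (req) {
		q->head = req->next;
		free(req);
	}
	pthread_mutex_unlock(&q->lock);
}

/* Poll side: dereferencing req->opcode is only safe while the lock is
 * held, since demo_req_complete() may free entries concurrently --
 * scanning without the lock is the use-after-free the patch fixes. */
static int demo_poll_count(struct demo_queue *q, int opcode)
{
	int n = 0;

	pthread_mutex_lock(&q->lock);
	for (struct demo_req *req = q->head; req; req = req->next)
		if (req->opcode == opcode)
			n++;
	pthread_mutex_unlock(&q->lock);
	return n;
}
```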

Comments and questions are, as always, welcome.
Please let me know what you think.

Thanks,
Baokun

Changes since v1:
  * Collected RVB tags from Jia Zhu and Gao Xiang. (Thanks for your reviews!)
  * Patches 1 and 2: Added more detail to the commit messages.
  * Patch 3: Added a Fixes tag as suggested by Jia Zhu.
  * Patch 4: No longer changes "do...while" to "retry", to keep the
    changes focused; optimised the commit messages.
  * Patch 5: Dropped the internal RVB tag.

[V1]: https://lore.kernel.org/all/20240424033409.2735257-1-libaokun@huaweicloud.com

Baokun Li (3):
  cachefiles: stop sending new request when dropping object
  cachefiles: flush all requests for the object that is being dropped
  cachefiles: cyclic allocation of msg_id to avoid reuse

Hou Tao (1):
  cachefiles: flush ondemand_object_worker during clean object

Jingbo Xu (1):
  cachefiles: add missing lock protection when polling

 fs/cachefiles/daemon.c   |  4 ++--
 fs/cachefiles/internal.h |  3 +++
 fs/cachefiles/ondemand.c | 52 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 8 deletions(-)

Comments

Baokun Li June 26, 2024, 3:04 a.m. UTC | #1
A gentle ping.

On 2024/5/15 20:51, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
>
> [...]
Gao Xiang June 26, 2024, 3:28 a.m. UTC | #2
On 2024/6/26 11:04, Baokun Li wrote:
> A gentle ping.

Since it's been a long time, I guess you could just resend
a new patchset with the collected new tags instead of just
pinging for the next round of review?

Thanks,
Gao Xiang
Baokun Li June 27, 2024, 1:49 a.m. UTC | #3
On 2024/6/26 11:28, Gao Xiang wrote:
>
>
> On 2024/6/26 11:04, Baokun Li wrote:
>> A gentle ping.
>
> Since it's been a long time, I guess you could just resend
> a new patchset with the collected new tags instead of just
> pinging for the next round of review?
>
> Thanks,
> Gao Xiang

Okay, if there's still no feedback this week, I'll resend this patch series.

Since both patch sets under review are now 5 patches and have similar
titles, it would be more confusing if they both had RESEND. So when I
resend it I will merge the two patch sets into one patch series.
Gao Xiang June 27, 2024, 2:08 a.m. UTC | #4
On 2024/6/27 09:49, Baokun Li wrote:
> [...]
> 
> Okay, if there's still no feedback this week, I'll resend this patch series.
> 
> Since both patch sets under review are now 5 patches and have similar
> titles, it would be more confusing if they both had RESEND. So when I
> resend it I will merge the two patch sets into one patch series.

Sounds fine. I think you could rearrange the RESEND patchset in
the following order:
cachefiles: some bugfixes for withdraw and xattr
cachefiles: some bugfixes for clean object/send req/poll

Jingbo is currently working on the internal stuff; I will try to
review this work myself too.

Thanks,
Gao Xiang

Baokun Li June 27, 2024, 2:18 a.m. UTC | #5
On 2024/6/27 10:08, Gao Xiang wrote:
> [...]
>
> Sounds fine. I think you could rearrange the RESEND patchset in
> the following order:
> cachefiles: some bugfixes for withdraw and xattr
> cachefiles: some bugfixes for clean object/send req/poll

Okay, I'll arrange the patches in that order.

Thank you for your suggestion!

>
> Jingbo is currently working on the internal stuff; I will try to
> review this work myself too.
>
> Thanks,
> Gao Xiang

Thank you so much for helping to review these patches.
Jeff Layton June 27, 2024, 11:03 a.m. UTC | #6
On Wed, 2024-05-15 at 20:51 +0800, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> [...]

The set itself looks fairly straightforward, but I don't know this code
well enough to give it a proper review.

Acked-by: Jeff Layton <jlayton@kernel.org>