[0/2] mm/zsmalloc: simplify synchronization between zs_page_migrate() and free_zspage()

Message ID: 20240226-zsmalloc-zspage-rcu-v1-0-456b0ef1a89d@bytedance.com

Message

Chengming Zhou Feb. 27, 2024, 3:02 a.m. UTC
Hello,

free_zspage() has to hold the locks of all pages, because the
zs_page_migrate() path relies on the page lock to protect against the
race with zs_free(), so that it can safely get the zspage from
page->private.

But this approach is neither good nor simple enough:

1. Since zs_free() cannot sleep, it can only trylock the pages, or it
   has to call kick_deferred_free() to defer the free to a worker.

2. Even in the worker context, async_free_zspage() can't simply
   lock all pages in lock_zspage(); it still has to trylock because of
   the race between zs_free() and zs_page_migrate(). Please see
   commit 2505a981114d ("zsmalloc: fix races between asynchronous
   zspage free and page migration") for details.
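
For readers unfamiliar with the pattern criticized above, here is a minimal
userspace sketch of "trylock every page, otherwise defer". It is not the
zsmalloc code: all demo_* names, the page count, and the use of a C11
atomic_flag in place of the kernel page lock are assumptions made purely
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPAGES 4	/* hypothetical number of pages per zspage */

struct demo_page {
	atomic_flag locked;	/* stands in for the kernel page lock */
};

struct demo_zspage {
	struct demo_page pages[DEMO_NPAGES];
	bool freed;		/* set when the free happened immediately */
	bool deferred;		/* set when the free was pushed to a worker */
};

static void demo_zspage_init(struct demo_zspage *zs)
{
	for (size_t i = 0; i < DEMO_NPAGES; i++)
		atomic_flag_clear(&zs->pages[i].locked);
	zs->freed = false;
	zs->deferred = false;
}

/* Non-sleeping lock attempt: true if we took the lock. */
static bool demo_trylock_page(struct demo_page *p)
{
	return !atomic_flag_test_and_set(&p->locked);
}

static void demo_unlock_page(struct demo_page *p)
{
	atomic_flag_clear(&p->locked);
}

/* Take every page lock without sleeping; roll back on the first failure. */
static bool demo_trylock_zspage(struct demo_zspage *zs)
{
	for (size_t i = 0; i < DEMO_NPAGES; i++) {
		if (!demo_trylock_page(&zs->pages[i])) {
			while (i-- > 0)
				demo_unlock_page(&zs->pages[i]);
			return false;
		}
	}
	return true;
}

/* Free path that must not sleep: free now, or mark for a later retry. */
static void demo_free_zspage(struct demo_zspage *zs)
{
	if (!demo_trylock_zspage(zs)) {
		zs->deferred = true;	/* a worker would retry later */
		return;
	}
	zs->freed = true;
	for (size_t i = 0; i < DEMO_NPAGES; i++)
		demo_unlock_page(&zs->pages[i]);
}
```

If any single page is locked by someone else (as migration would do), the
whole free is deferred, which is the complexity the cover letter wants to
remove.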

Actually, all free_zspage() needs is to get the zspage from the page
safely, and we can use RCU to achieve that easily. Then free_zspage()
doesn't need to hold the locks of all pages, so the deferred free
mechanism isn't needed at all. This patchset implements that and
removes all of the deferred-free related code.
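
The general RCU idea referred to above (unpublish the pointer first, free
the object only after pre-existing readers are done) can be modelled in
userspace roughly as follows. This is a toy sketch, not the patch and not
kernel RCU: the toy_* names are hypothetical, and a global reader counter
stands in for a real grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_zspage {
	int magic;
};

struct toy_page {
	_Atomic(struct toy_zspage *) zspage;	/* stands in for page->private */
};

/* Number of readers currently inside a read-side section (toy model). */
static atomic_int toy_readers;

static void toy_read_lock(void)
{
	atomic_fetch_add(&toy_readers, 1);
}

static void toy_read_unlock(void)
{
	atomic_fetch_sub(&toy_readers, 1);
}

/* Reader side: call only between toy_read_lock() and toy_read_unlock(). */
static struct toy_zspage *toy_get_zspage(struct toy_page *page)
{
	return atomic_load(&page->zspage);
}

/* Free side, step 1: unpublish so that new readers see NULL. */
static struct toy_zspage *toy_unpublish(struct toy_page *page)
{
	return atomic_exchange(&page->zspage, (struct toy_zspage *)NULL);
}

/*
 * Free side, step 2: reclaim only when no reader is active.
 * Returns true once the object has been freed, false if the caller
 * has to try again later.
 */
static bool toy_try_reclaim(struct toy_zspage *old)
{
	if (atomic_load(&toy_readers) != 0)
		return false;
	free(old);
	return true;
}
```

The point of the model is that the free side never takes a per-page lock:
a reader that found the pointer before it was unpublished keeps a valid
object until it leaves its read-side section, and a reader that arrives
later simply sees NULL.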

Thanks for review and comments!

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
Chengming Zhou (2):
      mm/zsmalloc: don't hold locks of all pages when free_zspage()
      mm/zsmalloc: remove the deferred free mechanism

 mm/zsmalloc.c | 206 ++++++++++++++++------------------------------------------
 1 file changed, 56 insertions(+), 150 deletions(-)
---
base-commit: ccbd06e764bac9bbf6b4e91c700fe6dd28f08fb3
change-id: 20240226-zsmalloc-zspage-rcu-b2c12f054fb4

Best regards,

Comments

Sergey Senozhatsky Feb. 28, 2024, 1:57 a.m. UTC | #1
On (24/02/27 03:02), Chengming Zhou wrote:
> [...]

JFI, recovered from the SPAM folder
"The sender hasn't authenticated this message"

Chengming Zhou Feb. 28, 2024, 2:22 a.m. UTC | #2
On 2024/2/28 09:57, Sergey Senozhatsky wrote:
> On (24/02/27 03:02), Chengming Zhou wrote:
>> [...]
> 
> JFI, recovered from the SPAM folder
> "The sender hasn't authenticated this message"

Sorry about this. I thought the problem had been fixed after testing with
my own Gmail account last time, but it turns out my corporate email still
sometimes has this problem.

I will always use my linux.dev email in the future to avoid these problems.

Thanks for your time!

Sergey Senozhatsky Feb. 28, 2024, 3:54 a.m. UTC | #3
On (24/02/27 03:02), Chengming Zhou wrote:
> [...]
> ---
> Chengming Zhou (2):
>       mm/zsmalloc: don't hold locks of all pages when free_zspage()

That seems to be crashing on me:

[   28.123867] ==================================================================
[   28.125303] BUG: KASAN: null-ptr-deref in obj_malloc+0xa9/0x1f0
[   28.126289] Read of size 8 at addr 0000000000000028 by task mkfs.ext2/432
[   28.127414] 
[   28.127684] CPU: 8 PID: 432 Comm: mkfs.ext2 Tainted: G                 N 6.8.0-rc5+ #309
[   28.129015] Call Trace:
[   28.129442]  <TASK>
[   28.129805]  dump_stack_lvl+0x6f/0xab
[   28.130437]  print_report+0xe0/0x5e0
[   28.131050]  ? _printk+0x59/0x7b
[   28.131602]  ? kasan_report+0x96/0x120
[   28.132233]  ? obj_malloc+0xa9/0x1f0
[   28.132837]  kasan_report+0xe7/0x120
[   28.133441]  ? obj_malloc+0xa9/0x1f0
[   28.134046]  obj_malloc+0xa9/0x1f0
[   28.134633]  zs_malloc+0x22c/0x3e0
[   28.135211]  zram_submit_bio+0x44e/0xee0
[   28.135871]  ? lock_release+0x50c/0x700
[   28.136520]  submit_bio_noacct_nocheck+0x22a/0x650
[   28.137327]  __block_write_full_folio+0x48b/0x710
[   28.138119]  ? __cfi_blkdev_get_block+0x10/0x10
[   28.138885]  ? __cfi_block_write_full_folio+0x10/0x10
[   28.139737]  write_cache_pages+0x83/0xf0
[   28.140397]  ? __cfi_blkdev_get_block+0x10/0x10
[   28.141152]  blkdev_writepages+0x46/0x80
[   28.141810]  do_writepages+0x1be/0x400
[   28.142443]  file_write_and_wait_range+0x104/0x170
[   28.143254]  blkdev_fsync+0x4a/0x70
[   28.143846]  __x64_sys_fsync+0xe9/0x120
[   28.144491]  do_syscall_64+0x8d/0x130
[   28.145106]  entry_SYSCALL_64_after_hwframe+0x46/0x4e