Message ID: 202301131736452546903@zte.com.cn (mailing list archive)
State: New
Series: [linux-next,v3] swap_state: update shadow_nodes for anonymous page
On Fri, Jan 13, 2023 at 05:36:45PM +0800, yang.yang29@zte.com.cn wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
>
> Shadow_nodes is used by workingset handling to reclaim shadow nodes.
> It has been updated on page cache add and delete ever since workingset
> only supported the page cache. But when workingset gained anonymous
> page detection, we missed updating shadow nodes for it. As a result,
> shadow nodes of anonymous pages are never reclaimed by
> scan_shadow_nodes(), even when they consume a lot of memory and
> system memory is tight.
>
> So update shadow_nodes when swap cache entries are added or deleted,
> by calling xas_set_update(.., workingset_update_node).

What testing did you do of this?  I have this crash in today's testing:

04304 BUG: kernel NULL pointer dereference, address: 0000000000000080
04304 #PF: supervisor read access in kernel mode
04304 #PF: error_code(0x0000) - not-present page
04304 PGD 0 P4D 0
04304 Oops: 0000 [#1] PREEMPT SMP NOPTI
04304 CPU: 4 PID: 3219629 Comm: sh Kdump: loaded Not tainted 6.2.0-rc4-next-20230116-00016-gd289d3de8ce5-dirty #69
04304 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
04304 RIP: 0010:_raw_spin_trylock+0x12/0x50
04304 Code: e0 41 5c 5d c3 89 c6 48 89 df e8 89 06 00 00 4c 89 e0 5b 41 5c 5d c3 90 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 be 5b 71 ff <8b> 03 85 c0 75 16 ba 01 00 00 00 f0 0f b1 13 b8 01 00 00 00 75 06
04304 RSP: 0018:ffff888059afbbb8 EFLAGS: 00010093
04304 RAX: 0000000000000003 RBX: 0000000000000080 RCX: 0000000000000000
04304 RDX: 0000000000000000 RSI: ffff8880033e24c8 RDI: 0000000000000001
04304 RBP: ffff888059afbbc0 R08: 0000000000000000 R09: ffff888059afbd68
04304 R10: ffff88807d9db868 R11: 0000000000000000 R12: ffff8880033e24c0
04304 R13: ffff88800a1d8008 R14: ffff8880033e24c8 R15: ffff8880033e24c0
04304 FS:  00007feeeabc6740(0000) GS:ffff88807d900000(0000) knlGS:0000000000000000
04304 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
04304 CR2: 0000000000000080 CR3: 0000000059830003 CR4: 0000000000770ea0
04304 PKRU: 55555554
04304 Call Trace:
04304  <TASK>
04304  shadow_lru_isolate+0x3a/0x120
04304  __list_lru_walk_one+0xa3/0x190
04304  ? memcg_list_lru_alloc+0x330/0x330
04304  ? memcg_list_lru_alloc+0x330/0x330
04304  list_lru_walk_one_irq+0x59/0x80
04304  scan_shadow_nodes+0x27/0x30
04304  do_shrink_slab+0x13b/0x2e0
04304  shrink_slab+0x92/0x250
04304  drop_slab+0x41/0x90
04304  drop_caches_sysctl_handler+0x70/0x80
04304  proc_sys_call_handler+0x162/0x210
04304  proc_sys_write+0xe/0x10
04304  vfs_write+0x1c7/0x3a0
04304  ksys_write+0x57/0xd0
04304  __x64_sys_write+0x14/0x20
04304  do_syscall_64+0x34/0x80
04304  entry_SYSCALL_64_after_hwframe+0x63/0xcd
04304 RIP: 0033:0x7feeeacc1190

Decoding it, shadow_lru_isolate+0x3a/0x120 maps back to this line:

	if (!spin_trylock(&mapping->host->i_lock)) {

i_lock is at offset 128 of struct inode, so that matches the dump.
I believe that swapper_spaces never have ->host set, so I don't
believe you've tested this patch since 51b8c1fe250d went in
back in 2021.
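[For readers following the decode: the failing code is shadow_lru_isolate()
in mm/workingset.c. A trimmed excerpt of that function as of roughly 6.2-rc,
reproduced here for context; exact code may differ across versions:]

/*
 * Trimmed excerpt of shadow_lru_isolate() (mm/workingset.c, ~6.2-rc).
 * The mapping is recovered from the xa_node; for swapper_spaces,
 * mapping->host is NULL, so &mapping->host->i_lock evaluates to
 * offset 0x80 from NULL -- matching CR2: 0000000000000080 above.
 */
static enum lru_status shadow_lru_isolate(struct list_head *item,
					  struct list_lru_one *lru,
					  spinlock_t *lru_lock,
					  void *arg)
{
	struct xa_node *node = container_of(item, struct xa_node, private_list);
	struct address_space *mapping;
	int ret;

	mapping = container_of(node->array, struct address_space, i_pages);

	/* Coming from the list, invert the lock order */
	if (!spin_trylock(&mapping->host->i_lock)) {	/* NULL deref here */
		spin_unlock_irq(lru_lock);
		ret = LRU_RETRY;
		goto out;
	}
	/* ... shrinks the node and releases the locks ... */
}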
> What testing did you do of this? I have this crash in today's testing:
My test was this:
1. Configure zram for swap.
2. Run some programs that malloc and access a large amount of memory,
making sure they cause swapping.
3. Add printk() in count_shadow_nodes() and shadow_lru_isolate() to
verify that shadow nodes are really being shrunk (see the sketch below).
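In case it helps, the instrumentation in step 3 looked something like
this (a hypothetical sketch, not the exact debug patch I used; the real
count_shadow_nodes() also applies a max_nodes heuristic after counting):

/*
 * Hypothetical debug instrumentation in mm/workingset.c -- a sketch,
 * not the exact patch used for testing.
 */
static unsigned long count_shadow_nodes(struct shrinker *shrinker,
					struct shrink_control *sc)
{
	/* shadow_nodes is the file-local list_lru of nodes holding shadows */
	unsigned long nodes = list_lru_shrink_count(&shadow_nodes, sc);

	pr_info("count_shadow_nodes: %lu shadow nodes on the LRU\n", nodes);

	/* ... the real function then scales this against a max_nodes limit ... */
	return nodes;
}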
Really sorry for the inadequate testing; I will try more tests,
including drop_caches via sysctl.
> i_lock is at offset 128 of struct inode, so that matches the dump.
> I believe that swapper_spaces never have ->host set, so I don't
> believe you've tested this patch since 51b8c1fe250d went in
> back in 2021.

You are totally right. I reproduced the panic in linux-next and fixed
it in patch v4. I should have been more careful: I tested the patch on
Linux 5.14, which was a mistake. Many apologies for the time wasted.
Thanks.
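[For context on why ->host is NULL here: swap cache address_spaces are
allocated zeroed and are never attached to an inode. An excerpt of
init_swap_address_space() from mm/swap_state.c, trimmed; fields may
vary by kernel version:]

int init_swap_address_space(unsigned int type, unsigned long nr_pages)
{
	struct address_space *spaces, *space;
	unsigned int i, nr;

	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
	/* kvcalloc() zero-fills, so space->host stays NULL for every entry */
	spaces = kvcalloc(nr, sizeof(struct address_space), GFP_KERNEL);
	if (!spaces)
		return -ENOMEM;
	for (i = 0; i < nr; i++) {
		space = spaces + i;
		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
		atomic_set(&space->i_mmap_writable, 0);
		space->a_ops = &swap_aops;
		/* swap cache doesn't use writeback related tags */
		mapping_set_no_writeback_tags(space);
	}
	nr_swapper_spaces[type] = nr;
	swapper_spaces[type] = spaces;

	return 0;
}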
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 44dd6d6e01bc..5cc1f718fec9 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1643,7 +1643,8 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
  * @update: Function to call when updating a node.
  *
  * The XArray can notify a caller after it has updated an xa_node.
- * This is advanced functionality and is only needed by the page cache.
+ * This is advanced functionality and is only needed by the page cache
+ * and swap cache.
  */
 static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index cb9aaa00951d..7a003d8abb37 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -94,6 +94,8 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 	unsigned long i, nr = folio_nr_pages(folio);
 	void *old;
 
+	xas_set_update(&xas, workingset_update_node);
+
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
@@ -145,6 +147,8 @@ void __delete_from_swap_cache(struct folio *folio,
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);
 
+	xas_set_update(&xas, workingset_update_node);
+
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
@@ -252,6 +256,8 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
 		struct address_space *address_space = swap_address_space(entry);
 		XA_STATE(xas, &address_space->i_pages, curr);
 
+		xas_set_update(&xas, workingset_update_node);
+
 		xa_lock_irq(&address_space->i_pages);
 		xas_for_each(&xas, old, end) {
 			if (!xa_is_value(old))
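[For context on what the new xas_set_update() calls register:
workingset_update_node() moves an xa_node on or off the shadow_nodes
LRU depending on whether it holds only shadow entries. An excerpt from
mm/workingset.c of roughly this era, trimmed; details may differ by
version:]

void workingset_update_node(struct xa_node *node)
{
	struct address_space *mapping;

	/*
	 * Track non-empty nodes that contain only shadow entries;
	 * unlink those that contain pages or are being freed.
	 */
	mapping = container_of(node->array, struct address_space, i_pages);
	lockdep_assert_held(&mapping->i_pages.xa_lock);

	if (node->count && node->count == node->nr_values) {
		/* Node holds nothing but shadow entries: make it reclaimable */
		if (list_empty(&node->private_list)) {
			list_lru_add(&shadow_nodes, &node->private_list);
			__inc_lruvec_kmem_state(node, WORKINGSET_NODES);
		}
	} else {
		if (!list_empty(&node->private_list)) {
			list_lru_del(&shadow_nodes, &node->private_list);
			__dec_lruvec_kmem_state(node, WORKINGSET_NODES);
		}
	}
}

[With the patch, swap cache xa_nodes pass through this path too, which
is exactly why shadow_lru_isolate() then encounters mappings whose
->host is NULL.]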