[108/192] mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep

Message ID	20210701015258.BrxjIzdE1%akpm@linux-foundation.org (mailing list archive)
State	New
Headers	Return-Path: <SRS0=qc1w=LZ=kvack.org=owner-linux-mm@kernel.org>
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE4856105A
Date: Wed, 30 Jun 2021 18:52:58 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, linux-mm@kvack.org, minchan@kernel.org, mm-commits@vger.kernel.org, senozhatsky@chromium.org, torvalds@linux-foundation.org, zhaoyang.huang@unisoc.com
Subject: [patch 108/192] mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep
Message-ID: <20210701015258.BrxjIzdE1%akpm@linux-foundation.org>
In-Reply-To: <20210630184624.9ca1937310b0dd5ce66b30e7@linux-foundation.org>
User-Agent: s-nail v14.8.16
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Series	[001/192] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c

On Fri, Jul 02, 2021 at 02:20:42PM +0800, Zhaoyang Huang wrote:
> On Fri, Jul 2, 2021 at 1:47 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Fri, Jul 02, 2021 at 10:45:09AM +0800, Zhaoyang Huang wrote:
> > > On Thu, Jul 1, 2021 at 10:56 PM Minchan Kim <minchan@kernel.org> wrote:
> > > >
> > > > On Wed, Jun 30, 2021 at 06:52:58PM -0700, Andrew Morton wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > Subject: mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep
> > > > >
> > > > > Zspage_cachep is found to be merged with other kmem caches during test, which
> > > > > is not good for debugging (zs_pool->zspage_cachep appears as another
> > > > > kmem cache in the memory dumpfile). It is also necessary to do so as a
> > > > > shrinker has been registered for zspage.
> > > > >
> > > > > Amending this flag can help the kernel calculate SLAB_RECLAIMABLE correctly.
> > > > >
> > > > > Link: https://lkml.kernel.org/r/1623137297-29685-1-git-send-email-huangzhaoyang@gmail.com
> > > > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > Cc: Minchan Kim <minchan@kernel.org>
> > > > > Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > >
> > > > Sorry for the late reply. I don't think this is correct.
> > > >
> > > > It's true that "struct zspage" can be freed by zsmalloc's compaction, which is
> > > > registered as a slab shrinker, so it is tempting to make it SLAB_RECLAIM_ACCOUNT.
> > > > However, that works only when objects in the zspage are heavily fragmented.
> > > > Once the compaction is done, zspages are never discardable until objects are
> > > > fragmented again. It means it could hurt reclaim of other reclaimable slab pages
> > > > since the zspage slab object pins the page.
> > > IMHO, kmem cache's reclaiming is NOT affected by SLAB_RECLAIM_ACCOUNT.
> > > This flag just affects kmem cache merging[1], the slab page's migrate
> > > type[2] and the page's statistics. Actually, zspage's cache DOES merge
> > > with others even without SLAB_RECLAIM_ACCOUNT currently, which may
> > > cause zspage's objects to NEVER be discarded. (SLAB_MERGE_SAME
> > > introduces confusion, as people believe the cache will merge with
> > > others when it is set and vice versa.)
> > >
> > > [1]
> > > struct kmem_cache *find_mergeable(size_t size, size_t align, unsigned
> > > long flags, const char *name, void (*ctor)(void *))
> > > ...
> > > if ((flags & SLAB_MERGE_SAME) != (s->flags & SLAB_MERGE_SAME))
> > >         continue;
> > >
> > > [2]
> > > if (s->flags & SLAB_RECLAIM_ACCOUNT)
> > >         s->allocflags |= __GFP_RECLAIMABLE;
> >
> > That's the point here. With SLAB_RECLAIM_ACCOUNT, the page allocator
> > tries to allocate pages from MIGRATE_RECLAIMABLE in the belief that those
> > objects are easily reclaimable. Say a page has objects A, B, C, D
> > and E. A-D are easily reclaimable but E is hard. What happens is that the
> > VM couldn't reclaim the page in the end due to E even though it
> > already reclaimed A-D. And such fragmentation could spread
> > across entire MIGRATE_RECLAIMABLE pageblocks over time.
> > That's why I'd like to put zspage into MIGRATE_UNMOVABLE from the
> > beginning since I don't think it's easily reclaimable once compaction
> > is done.
> The slab page could fall back to any migrate type even when allocating with

It's true, but that can't be a justification for allocating objects from any
migration type. We should try to select the right type. Please see below.

> __GFP_RECLAIMABLE, and there is only one page per slab within zspage's
> cache, which will not be affected by compaction, so I think that
> doesn't make sense.

You shouldn't rely on how many pages the slab has, since that is an internal
implementation detail and the zspage size could also change in the future.
And please think about external fragmentation as well as internal
fragmentation. What we want from the allocation type is to group objects with
similar lifetimes together in a pageblock, to reduce external fragmentation
for high-order allocations. Think about what happens if an unreclaimable
object is located in a reclaimable pageblock. The block can't be merged into a
high-order page in the end, so it causes more compaction and fewer available
high-order pages in the system.
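
The pageblock argument can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `struct pageblock`, `free_blocks_after_reclaim()` and the block and slot counts are hypothetical names and sizes chosen for illustration, and "reclaim" here simply drops every object marked reclaimable. It compares mixing a few pinned (hard to reclaim) objects into every block against grouping them in a single block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS          4   /* hypothetical number of pageblocks */
#define SLOTS_PER_BLOCK 8   /* hypothetical objects per pageblock */

enum lifetime { FREE_SLOT = 0, RECLAIMABLE, PINNED };

struct pageblock {
    enum lifetime slot[SLOTS_PER_BLOCK];
};

/* A block can become one high-order page only if every slot is free. */
static bool block_is_free(const struct pageblock *blk)
{
    for (size_t s = 0; s < SLOTS_PER_BLOCK; s++)
        if (blk->slot[s] != FREE_SLOT)
            return false;
    return true;
}

/* Toy reclaim pass: reclaimable objects go away, pinned ones stay. */
static void reclaim_all(struct pageblock *blocks, size_t n)
{
    for (size_t b = 0; b < n; b++)
        for (size_t s = 0; s < SLOTS_PER_BLOCK; s++)
            if (blocks[b].slot[s] == RECLAIMABLE)
                blocks[b].slot[s] = FREE_SLOT;
}

/*
 * Place BLOCKS pinned objects among reclaimable ones, run reclaim, and
 * return how many blocks ended up completely free.
 * mix_lifetimes == true:  one pinned object in every block.
 * mix_lifetimes == false: all pinned objects grouped in block 0.
 */
size_t free_blocks_after_reclaim(bool mix_lifetimes)
{
    struct pageblock blocks[BLOCKS];
    size_t free_blocks = 0;

    /* The grouped case needs block 0 to hold all pinned objects. */
    assert(BLOCKS <= SLOTS_PER_BLOCK);

    for (size_t b = 0; b < BLOCKS; b++)
        for (size_t s = 0; s < SLOTS_PER_BLOCK; s++)
            blocks[b].slot[s] = RECLAIMABLE;

    for (size_t i = 0; i < BLOCKS; i++) {
        if (mix_lifetimes)
            blocks[i].slot[0] = PINNED;   /* spread across all blocks */
        else
            blocks[0].slot[i] = PINNED;   /* confined to one block */
    }

    reclaim_all(blocks, BLOCKS);

    for (size_t b = 0; b < BLOCKS; b++)
        if (block_is_free(&blocks[b]))
            free_blocks++;
    return free_blocks;
}
```

In this model, `free_blocks_after_reclaim(true)` leaves no block fully free, because each block still holds one pinned object, while `free_blocks_after_reclaim(false)` frees every block except the one holding the pinned objects. The same number of objects was reclaimed in both cases; only their placement differs.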
