Message ID | 20220321142128.2471199-5-zi.yan@sent.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Split a huge page to any lower order pages | expand |
On Mon, Mar 21, 2022 at 10:21:27AM -0400, Zi Yan wrote: > From: Zi Yan <ziy@nvidia.com> > > To minimize the number of pages after a huge page truncation, we do not > need to split it all the way down to order-0. The huge page has at most > three parts, the part before offset, the part to be truncated, the part > remaining at the end. Find the greatest common power of two multiplier of > the non-zero values of them as the new order, so we can split the huge > page to this order and keep the remaining pages as large and as few as > possible. Would you mind please to describe the algorithm in more details? Thanks!
On 21 Mar 2022, at 18:32, Roman Gushchin wrote: > On Mon, Mar 21, 2022 at 10:21:27AM -0400, Zi Yan wrote: >> From: Zi Yan <ziy@nvidia.com> >> >> To minimize the number of pages after a huge page truncation, we do not >> need to split it all the way down to order-0. The huge page has at most >> three parts, the part before offset, the part to be truncated, the part >> remaining at the end. Find the greatest common power of two multiplier of >> the non-zero values of them as the new order, so we can split the huge >> page to this order and keep the remaining pages as large and as few as >> possible. > > Would you mind please to describe the algorithm in more details? Sure. During truncation, there can be three parts in a huge page: 1. the _offset_ from the beginning of the huge page, 2. the _length_ of the to-be-truncated part, 3. the _remaining_ part after the to-be-truncated part. the size of the split huge page need to be the greatest common divisor of the non-zero ones of three after being rounded down to power of two. OK, I actually find there is a gcd function. I think the algorithm can simplified to new_order = ilog2(gcd(gcd(offset, length), remaining)) - PAGE_SHIFT; I will update the code, the commit message, and the comment in the next version. Thank you for the comment. -- Best Regards, Yan, Zi
Greeting, FYI, we noticed the following commit (built with gcc-9): commit: 2757cee2d6c6c76f672ec6566ade2dcb8c1605dd ("[RFC PATCH 4/5] mm: truncate: split huge page cache page to a non-zero order if possible.") url: https://github.com/0day-ci/linux/commits/Zi-Yan/Split-a-huge-page-to-any-lower-order-pages/20220321-222304 base: https://github.com/hnaz/linux-mm master patch link: https://lore.kernel.org/linux-mm/20220321142128.2471199-5-zi.yan@sent.com in testcase: boot on test machine: qemu-system-x86_64 -enable-kvm -cpu Icelake-Server -smp 4 -m 16G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): If you fix the issue, kindly add following tag Reported-by: kernel test robot <oliver.sang@intel.com> [ 25.633813][ T275] UBSAN: shift-out-of-bounds in include/linux/log2.h:67:13 [ 25.635404][ T275] shift exponent 4294967295 is too large for 32-bit type 'long unsigned int' [ 25.636781][ T275] CPU: 3 PID: 275 Comm: cron Not tainted 5.17.0-rc8-mm1-00485-g2757cee2d6c6 #1 [ 25.638246][ T275] Call Trace: [ 25.638768][ T275] dump_stack_lvl (lib/dump_stack.c:107) [ 25.651568][ T275] dump_stack (lib/dump_stack.c:114) [ 25.652209][ T275] ubsan_epilogue (lib/ubsan.c:152) [ 25.652926][ T275] __ubsan_handle_shift_out_of_bounds.cold (arch/x86/include/asm/smap.h:85) [ 25.654034][ T275] ? lock_release (kernel/locking/lockdep.c:5348 kernel/locking/lockdep.c:5692) [ 25.654781][ T275] ? __kmap_local_pfn_prot (mm/highmem.c:532) [ 25.655650][ T275] ? kunmap_local_indexed (mm/highmem.c:600 (discriminator 3)) [ 25.656382][ T275] ? zero_user_segments (mm/highmem.c:408) [ 25.657110][ T275] greatest_pow_of_two_multiplier.cold (include/linux/log2.h:67 mm/truncate.c:204) [ 25.658134][ T275] truncate_inode_partial_folio (mm/truncate.c:255) g System Logging[ 25.659902][ T275] shmem_undo_range (mm/shmem.c:966) Service... [ 25.660789][ T275] ? zero_user_segments (mm/highmem.c:408) [ 25.661744][ T275] ? folio_mark_dirty (mm/page-writeback.c:2717) [ 25.662480][ T275] ? unlock_page (mm/folio-compat.c:21) [ 25.663179][ T275] ? __lock_acquire (kernel/locking/lockdep.c:5060) [ 25.668235][ T275] shmem_truncate_range (mm/shmem.c:1045) [ 25.668251][ T275] ? setattr_prepare (fs/attr.c:108) [ 25.668256][ T275] ? mark_held_locks (kernel/locking/lockdep.c:4239) Startin[ 25.668262][ T275] shmem_setattr (mm/shmem.c:1109) g /etc/rc.local [ 25.675802][ T275] ? shmem_setattr (mm/shmem.c:1109) Compatibility...[ 25.676621][ T275] ? current_time (fs/inode.c:2406) [ 25.677351][ T275] notify_change (fs/attr.c:414) [ 25.677989][ T275] ? shmem_evict_inode (mm/shmem.c:1077) [ 25.678735][ T275] ? notify_change (fs/attr.c:414) [ 25.679481][ T275] ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5674) [ 25.680186][ T275] do_truncate (fs/open.c:67) [ 25.680843][ T275] ? do_truncate (fs/open.c:67) [ 25.681520][ T275] ? do_truncate (fs/open.c:67) [ 25.682230][ T275] do_sys_ftruncate (fs/open.c:197) [ 25.683001][ T275] __ia32_sys_ftruncate (fs/open.c:204) [ 25.684027][ T275] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178) [ 25.684801][ T275] ? irqentry_exit_to_user_mode (kernel/entry/common.c:324) [ 25.685720][ T275] do_fast_syscall_32 (arch/x86/entry/common.c:203) [ 25.686464][ T275] do_SYSENTER_32 (arch/x86/entry/common.c:247) [ 25.687172][ T275] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:869) [ 25.687180][ T275] EIP: 0xa7f00549 [ 25.687184][ T275] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 All code ======== 0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi 4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d a: 10 06 adc %al,(%rsi) c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi 10: 10 07 adc %al,(%rdi) 12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi 16: 10 08 adc %cl,(%rax) 18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi 1c: 00 00 add %al,(%rax) 1e: 00 00 add %al,(%rax) 20: 00 51 52 add %dl,0x52(%rcx) 23: 55 push %rbp 24: 89 e5 mov %esp,%ebp 26: 0f 34 sysenter 28: cd 80 int $0x80 2a:* 5d pop %rbp <-- trapping instruction 2b: 5a pop %rdx 2c: 59 pop %rcx 2d: c3 retq 2e: 90 nop 2f: 90 nop 30: 90 nop 31: 90 nop 32: 8d 76 00 lea 0x0(%rsi),%esi 35: 58 pop %rax 36: b8 77 00 00 00 mov $0x77,%eax 3b: cd 80 int $0x80 3d: 90 nop 3e: 8d .byte 0x8d 3f: 76 .byte 0x76 Code starting with the faulting instruction =========================================== 0: 5d pop %rbp 1: 5a pop %rdx 2: 59 pop %rcx 3: c3 retq 4: 90 nop 5: 90 nop 6: 90 nop 7: 90 nop 8: 8d 76 00 lea 0x0(%rsi),%esi b: 58 pop %rax c: b8 77 00 00 00 mov $0x77,%eax 11: cd 80 int $0x80 13: 90 nop 14: 8d .byte 0x8d 15: 76 .byte 0x76 [ 25.687187][ T275] EAX: ffffffda EBX: 00000003 ECX: 00000004 EDX: 00453000 [ 25.687190][ T275] ESI: 00000004 EDI: 00000002 EBP: 0207e168 ESP: aff7174c [ 25.687193][ T275] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292 [ 25.687280][ T275] ================================================================================ Starting LKP bootstrap... [ OK ] Started D-Bus System Message Bus. [ 25.719619] rc.local[280]: mkdir: cannot create directory '/var/lock/lkp-bootstrap.lock': File exists [ OK ] Started Daily Cleanup of Temporary Directories. Starting Login Service... [ OK ] Started Daily apt upgrade and clean activities. [ OK ] Reached target Timers. Starting LSB: Execute the kexec -e command to reboot system... Starting OpenBSD Secure Shell server... Starting Permit User Sessions... [ OK ] Started LKP bootstrap. [ OK ] Started Permit User Sessions. [ OK ] Started LSB: Start and stop bmc-watchdog. [ OK ] Started OpenBSD Secure Shell server. [ OK ] Started LSB: Execute the kexec -e command to reboot system. [ OK ] Started Login Service. Starting LSB: Load kernel image with kexec... LKP: ttyS0: 290: Kernel tests: Boot OK! LKP: ttyS0: 290: HOSTNAME vm-icl-74, MAC 52:54:00:12:34:56, kernel 5.17.0-rc8-mm1-00485-g2757cee2d6c6 1 [ OK ] Reached target Sound Card. [ OK ] Reached target Printer. [ OK ] Started LSB: Load kernel image with kexec. LKP: ttyS0: 290: /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-icl-74/boot-1-debian-i386-20191205.cgz-2757cee2d6c6c76f672ec6566ade2dcb8c1605dd-20220322-39641-na3mp9-2.yaml [ 32.682503][ T516] wget (516) used greatest stack depth: 5812 bytes left [ OK ] Started System Logging Service. [ 36.389049][ T556] rsync (556) used greatest stack depth: 5772 bytes left [ 37.653297][ T315] LKP: stdout: 290: Kernel tests: Boot OK! [ 37.653316][ T315] [ 40.012523][ T605] wget (605) used greatest stack depth: 5732 bytes left LKP: ttyS0: 290: LKP: rebooting forcely [ 42.275963][ T315] LKP: stdout: 290: HOSTNAME vm-icl-74, MAC 52:54:00:12:34:56, kernel 5.17.0-rc8-mm1-00485-g2757cee2d6c6 1 [ 42.275991][ T315] [ 42.291539][ T315] install debs round one: dpkg -i --force-confdef --force-depends /opt/deb/gawk_1%3a4.1.4+dfsg-1_i386.deb [ 42.291562][ T315] [ 42.297409][ T315] Selecting previously unselected package gawk. [ 42.297426][ T315] [ 42.314957][ T315] (Reading database ... 16210 files and directories currently installed.) [ 42.314974][ T315] [ 42.319689][ T315] Preparing to unpack .../gawk_1%3a4.1.4+dfsg-1_i386.deb ... [ 42.319710][ T315] [ 42.323192][ T315] Unpacking gawk (1:4.1.4+dfsg-1) ... [ 42.323208][ T315] [ 42.326270][ T315] Setting up gawk (1:4.1.4+dfsg-1) ... [ 42.326285][ T315] [ 42.941438][ T290] sysrq: Emergency Sync [ 42.942418][ T36] Emergency Sync complete [ 42.943081][ T290] sysrq: Resetting Kboot worker: lkp-worker04 Elapsed time: 60 kvm=( qemu-system-x86_64 -enable-kvm To reproduce: # build kernel cd linux cp config-5.17.0-rc8-mm1-00485-g2757cee2d6c6 .config make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install cd <mod-install-dir> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3617aa3ad0b1..76db0092a1e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2349,6 +2349,7 @@ static void __split_huge_page_tail(struct page *head, int tail, prep_compound_page(page_tail, new_order); prep_transhuge_page(page_tail); } + VM_BUG_ON_PAGE(PageTail(page_tail), page_tail); /* Finally unfreeze refcount. Additional reference from page cache. */ page_ref_unfreeze(page_tail, 1 + ((!PageAnon(head) || diff --git a/mm/truncate.c b/mm/truncate.c index ab50d0d59a2a..4f71e67dec09 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -197,6 +197,14 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) return 0; } +static unsigned int greatest_pow_of_two_multiplier(unsigned int num) +{ + if (num & 1) + return 0; + return min_t(unsigned int, ilog2(num), + ilog2(num - rounddown_pow_of_two(num))); +} + /* * Handle partial folios. The folio may be entirely within the * range if a split has raced with us. If not, we zero the part of the @@ -211,7 +219,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) { loff_t pos = folio_pos(folio); - unsigned int offset, length; + unsigned int offset, length, remaining; + unsigned int new_order = folio_order(folio); if (pos < start) offset = start - pos; @@ -222,6 +231,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) length = length - offset; else length = end + 1 - pos - offset; + remaining = folio_size(folio) - offset - length; folio_wait_writeback(folio); if (length == folio_size(folio)) { @@ -236,11 +246,30 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) */ folio_zero_range(folio, offset, length); + /* + * Find the greatest common power of two multiplier of the non-zero + * offset, length, and remaining as the new order. So we can truncate + * a subpage as large as possible. + */ + if (offset) + new_order = greatest_pow_of_two_multiplier(offset / PAGE_SIZE); + if (length) + new_order = min_t(unsigned int, new_order, + greatest_pow_of_two_multiplier(length / PAGE_SIZE)); + if (remaining) + new_order = min_t(unsigned int, new_order, + greatest_pow_of_two_multiplier(remaining / PAGE_SIZE)); + + /* order-1 THP not supported, downgrade to order-0 */ + if (new_order == 1) + new_order = 0; + + if (folio_has_private(folio)) folio_invalidate(folio, offset, length); if (!folio_test_large(folio)) return true; - if (split_huge_page(&folio->page) == 0) + if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0) return true; if (folio_test_dirty(folio)) return false;