Message ID | 20200211213128.73302-1-almasrymina@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v12,1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter | expand |
On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > These counters will track hugetlb reservations rather than hugetlb > memory faulted in. This patch only adds the counter, following patches > add the charging and uncharging of the counter. We're still pretty thin on review here, but as it's v12 and Mike appears to be signed up to look at this work, I'll add them to -next to help move things forward.
On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > These counters will track hugetlb reservations rather than hugetlb > > memory faulted in. This patch only adds the counter, following patches > > add the charging and uncharging of the counter. > > We're still pretty thin on review here, but as it's v12 and Mike > appears to be signed up to look at this work, I'll add them to -next to > help move things forward. > Reverted the whole series on the top of next-20200217 fixed a crash below (I don't see anything in next-20200218 would make any differences). [ 7933.691114][T35046] LTP: starting hugemmap06 [ 7933.806377][T14355] ------------[ cut here ]------------ [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! VM_BUG_ON(t - f <= 1); [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] [ 7933.806573][T14355] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV [ 7933.806594][T14355] Modules linked in: kvm_hv kvm brd ext4 crc16 mbcache jbd2 loop ip_tables x_tables xfs sd_mod bnx2x ahci mdio libahci tg3 libata libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc] [ 7933.806651][T14355] CPU: 54 PID: 14355 Comm: hugemmap06 Tainted: G O 5.6.0-rc2-next-20200217 #1 [ 7933.806674][T14355] NIP: c00000000040d22c LR: c00000000040d210 CTR: 0000000000000000 [ 7933.806696][T14355] REGS: c0000014b71ef660 TRAP: 0700 Tainted: G O (5.6.0-rc2-next-20200217) [ 7933.806727][T14355] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 22022228 XER: 00000000 [ 7933.806772][T14355] CFAR: c00000000040cbec IRQMASK: 0 [ 7933.806772][T14355] GPR00: c00000000040d210 c0000014b71ef8f0 c000000001657000 0000000000000001 [ 7933.806772][T14355] GPR04: 0000000000000012 0000000000000013 0000000000000000 0000000000000000 [ 7933.806772][T14355] GPR08: 0000000000000002 0000000000000002 0000000000000001 0000000000000036 [ 7933.806772][T14355] GPR12: 0000000022022222 c000001ffffd3d00 00007fffad670000 00007fffa4bc0000 [ 7933.806772][T14355] GPR16: 0000000000000000 c000000001567178 c0000014b71efa50 0000000000000000 [ 7933.806772][T14355] GPR20: 0000000000000000 0000000000000013 0000000000000012 0000000000000001 [ 7933.806772][T14355] GPR24: c0000019f74cd270 5deadbeef0000100 5deadbeef0000122 c0000019f74cd2c0 [ 7933.806772][T14355] GPR28: 0000000000000001 c0000019f74cd268 c0000014b71ef918 0000000000000001 [ 7933.806961][T14355] NIP [c00000000040d22c] region_add+0x11c/0x3a0 [ 7933.806980][T14355] LR [c00000000040d210] region_add+0x100/0x3a0 [ 7933.807008][T14355] Call Trace: [ 7933.807024][T14355] [c0000014b71ef8f0] [c00000000040d210] region_add+0x100/0x3a0 (unreliable) [ 7933.807056][T14355] [c0000014b71ef9b0] [c00000000040e0c8] __vma_reservation_common+0x148/0x210 __vma_reservation_common at mm/hugetlb.c:2150 [ 7933.807087][T14355] [c0000014b71efa20] [c0000000004132a0] alloc_huge_page+0x350/0x830 alloc_huge_page at mm/hugetlb.c:2359 [ 7933.807100][T14355] [c0000014b71efad0] [c0000000004168f8] hugetlb_no_page+0x158/0xcb0 [ 7933.807113][T14355] [c0000014b71efc20] [c000000000417bc8] hugetlb_fault+0x678/0xb30 [ 7933.807136][T14355] [c0000014b71efcd0] [c0000000003b1de4] handle_mm_fault+0x444/0x450 [ 7933.807158][T14355] [c0000014b71efd20] [c000000000070b1c] __do_page_fault+0x2bc/0xfd0 [ 7933.807181][T14355] [c0000014b71efe20] [c00000000000aa88] handle_page_fault+0x10/0x30 [ 7933.807201][T14355] Instruction dump: [ 7933.807209][T14355] 38c00000 7ea5ab78 7ec4b378 7fa3eb78 4bfff80d e9210020 e91d0050 e95d0068 [ 7933.807232][T14355] 7d3c4850 7d294214 7faa4800 409c0238 <0b170000> 7f03c378 4858c005 60000000 [ 7933.807267][T14355] ---[ end trace 7560275de5f409f8 ]--- [ 7933.905258][T14355] [ 7934.905339][T14355] Kernel panic - not syncing: Fatal exception
On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: > > On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > These counters will track hugetlb reservations rather than hugetlb > > > memory faulted in. This patch only adds the counter, following patches > > > add the charging and uncharging of the counter. > > > > We're still pretty thin on review here, but as it's v12 and Mike > > appears to be signed up to look at this work, I'll add them to -next to > > help move things forward. > > > > Reverted the whole series on the top of next-20200217 fixed a crash below (I > don't see anything in next-20200218 would make any differences). > > [ 7933.691114][T35046] LTP: starting hugemmap06 > [ 7933.806377][T14355] ------------[ cut here ]------------ > [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! > VM_BUG_ON(t - f <= 1); > [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] > [ 7933.806573][T14355] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 > DEBUG_PAGEALLOC NUMA PowerNV > [ 7933.806594][T14355] Modules linked in: kvm_hv kvm brd ext4 crc16 mbcache jbd2 > loop ip_tables x_tables xfs sd_mod bnx2x ahci mdio libahci tg3 libata libphy > firmware_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: > binfmt_misc] > [ 7933.806651][T14355] CPU: 54 PID: 14355 Comm: hugemmap06 Tainted: > G O 5.6.0-rc2-next-20200217 #1 > [ 7933.806674][T14355] NIP: c00000000040d22c LR: c00000000040d210 CTR: > 0000000000000000 > [ 7933.806696][T14355] REGS: c0000014b71ef660 TRAP: 0700 Tainted: > G O (5.6.0-rc2-next-20200217) > [ 7933.806727][T14355] MSR: 900000000282b033 > <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 22022228 XER: 00000000 > [ 7933.806772][T14355] CFAR: c00000000040cbec IRQMASK: 0 > [ 7933.806772][T14355] GPR00: c00000000040d210 c0000014b71ef8f0 c000000001657000 > 0000000000000001 > [ 7933.806772][T14355] GPR04: 0000000000000012 0000000000000013 0000000000000000 > 0000000000000000 > [ 7933.806772][T14355] GPR08: 0000000000000002 0000000000000002 0000000000000001 > 0000000000000036 > [ 7933.806772][T14355] GPR12: 0000000022022222 c000001ffffd3d00 00007fffad670000 > 00007fffa4bc0000 > [ 7933.806772][T14355] GPR16: 0000000000000000 c000000001567178 c0000014b71efa50 > 0000000000000000 > [ 7933.806772][T14355] GPR20: 0000000000000000 0000000000000013 0000000000000012 > 0000000000000001 > [ 7933.806772][T14355] GPR24: c0000019f74cd270 5deadbeef0000100 5deadbeef0000122 > c0000019f74cd2c0 > [ 7933.806772][T14355] GPR28: 0000000000000001 c0000019f74cd268 c0000014b71ef918 > 0000000000000001 > [ 7933.806961][T14355] NIP [c00000000040d22c] region_add+0x11c/0x3a0 > [ 7933.806980][T14355] LR [c00000000040d210] region_add+0x100/0x3a0 > [ 7933.807008][T14355] Call Trace: > [ 7933.807024][T14355] [c0000014b71ef8f0] [c00000000040d210] > region_add+0x100/0x3a0 (unreliable) > [ 7933.807056][T14355] [c0000014b71ef9b0] [c00000000040e0c8] > __vma_reservation_common+0x148/0x210 > __vma_reservation_common at mm/hugetlb.c:2150 > [ 7933.807087][T14355] [c0000014b71efa20] [c0000000004132a0] > alloc_huge_page+0x350/0x830 > alloc_huge_page at mm/hugetlb.c:2359 > [ 7933.807100][T14355] [c0000014b71efad0] [c0000000004168f8] > hugetlb_no_page+0x158/0xcb0 > [ 7933.807113][T14355] [c0000014b71efc20] [c000000000417bc8] > hugetlb_fault+0x678/0xb30 > [ 7933.807136][T14355] [c0000014b71efcd0] [c0000000003b1de4] > handle_mm_fault+0x444/0x450 > [ 7933.807158][T14355] [c0000014b71efd20] [c000000000070b1c] > __do_page_fault+0x2bc/0xfd0 > [ 7933.807181][T14355] [c0000014b71efe20] [c00000000000aa88] > handle_page_fault+0x10/0x30 > [ 7933.807201][T14355] Instruction dump: > [ 7933.807209][T14355] 38c00000 7ea5ab78 7ec4b378 7fa3eb78 4bfff80d e9210020 > e91d0050 e95d0068 > [ 7933.807232][T14355] 7d3c4850 7d294214 7faa4800 409c0238 <0b170000> 7f03c378 > 4858c005 60000000 > [ 7933.807267][T14355] ---[ end trace 7560275de5f409f8 ]--- > [ 7933.905258][T14355] > [ 7934.905339][T14355] Kernel panic - not syncing: Fatal exception Hi Qian, Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: disable region_add file_region coalescing") so it's definitely related to the series. I'm taking a look at why this VM_BUG_ON fires. Can you confirm you reproduce this by running hugemmap06 from the ltp on a powerpc machine? Can I maybe have your config? Thanks!
On Tue, 2020-02-18 at 10:35 -0800, Mina Almasry wrote: > On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: > > > > On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > > > These counters will track hugetlb reservations rather than hugetlb > > > > memory faulted in. This patch only adds the counter, following patches > > > > add the charging and uncharging of the counter. > > > > > > We're still pretty thin on review here, but as it's v12 and Mike > > > appears to be signed up to look at this work, I'll add them to -next to > > > help move things forward. > > > > > > > Reverted the whole series on the top of next-20200217 fixed a crash below (I > > don't see anything in next-20200218 would make any differences). > > > > [ 7933.691114][T35046] LTP: starting hugemmap06 > > [ 7933.806377][T14355] ------------[ cut here ]------------ > > [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! > > VM_BUG_ON(t - f <= 1); > > [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] > > [ 7933.806573][T14355] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 > > DEBUG_PAGEALLOC NUMA PowerNV > > [ 7933.806594][T14355] Modules linked in: kvm_hv kvm brd ext4 crc16 mbcache jbd2 > > loop ip_tables x_tables xfs sd_mod bnx2x ahci mdio libahci tg3 libata libphy > > firmware_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: > > binfmt_misc] > > [ 7933.806651][T14355] CPU: 54 PID: 14355 Comm: hugemmap06 Tainted: > > G O 5.6.0-rc2-next-20200217 #1 > > [ 7933.806674][T14355] NIP: c00000000040d22c LR: c00000000040d210 CTR: > > 0000000000000000 > > [ 7933.806696][T14355] REGS: c0000014b71ef660 TRAP: 0700 Tainted: > > G O (5.6.0-rc2-next-20200217) > > [ 7933.806727][T14355] MSR: 900000000282b033 > > <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 22022228 XER: 00000000 > > [ 7933.806772][T14355] CFAR: c00000000040cbec IRQMASK: 0 > > [ 7933.806772][T14355] GPR00: c00000000040d210 c0000014b71ef8f0 c000000001657000 > > 0000000000000001 > > [ 7933.806772][T14355] GPR04: 0000000000000012 0000000000000013 0000000000000000 > > 0000000000000000 > > [ 7933.806772][T14355] GPR08: 0000000000000002 0000000000000002 0000000000000001 > > 0000000000000036 > > [ 7933.806772][T14355] GPR12: 0000000022022222 c000001ffffd3d00 00007fffad670000 > > 00007fffa4bc0000 > > [ 7933.806772][T14355] GPR16: 0000000000000000 c000000001567178 c0000014b71efa50 > > 0000000000000000 > > [ 7933.806772][T14355] GPR20: 0000000000000000 0000000000000013 0000000000000012 > > 0000000000000001 > > [ 7933.806772][T14355] GPR24: c0000019f74cd270 5deadbeef0000100 5deadbeef0000122 > > c0000019f74cd2c0 > > [ 7933.806772][T14355] GPR28: 0000000000000001 c0000019f74cd268 c0000014b71ef918 > > 0000000000000001 > > [ 7933.806961][T14355] NIP [c00000000040d22c] region_add+0x11c/0x3a0 > > [ 7933.806980][T14355] LR [c00000000040d210] region_add+0x100/0x3a0 > > [ 7933.807008][T14355] Call Trace: > > [ 7933.807024][T14355] [c0000014b71ef8f0] [c00000000040d210] > > region_add+0x100/0x3a0 (unreliable) > > [ 7933.807056][T14355] [c0000014b71ef9b0] [c00000000040e0c8] > > __vma_reservation_common+0x148/0x210 > > __vma_reservation_common at mm/hugetlb.c:2150 > > [ 7933.807087][T14355] [c0000014b71efa20] [c0000000004132a0] > > alloc_huge_page+0x350/0x830 > > alloc_huge_page at mm/hugetlb.c:2359 > > [ 7933.807100][T14355] [c0000014b71efad0] [c0000000004168f8] > > hugetlb_no_page+0x158/0xcb0 > > [ 7933.807113][T14355] [c0000014b71efc20] [c000000000417bc8] > > hugetlb_fault+0x678/0xb30 > > [ 7933.807136][T14355] [c0000014b71efcd0] [c0000000003b1de4] > > handle_mm_fault+0x444/0x450 > > [ 7933.807158][T14355] [c0000014b71efd20] [c000000000070b1c] > > __do_page_fault+0x2bc/0xfd0 > > [ 7933.807181][T14355] [c0000014b71efe20] [c00000000000aa88] > > handle_page_fault+0x10/0x30 > > [ 7933.807201][T14355] Instruction dump: > > [ 7933.807209][T14355] 38c00000 7ea5ab78 7ec4b378 7fa3eb78 4bfff80d e9210020 > > e91d0050 e95d0068 > > [ 7933.807232][T14355] 7d3c4850 7d294214 7faa4800 409c0238 <0b170000> 7f03c378 > > 4858c005 60000000 > > [ 7933.807267][T14355] ---[ end trace 7560275de5f409f8 ]--- > > [ 7933.905258][T14355] > > [ 7934.905339][T14355] Kernel panic - not syncing: Fatal exception > > Hi Qian, > > Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: > disable region_add file_region coalescing") so it's definitely related > to the series. I'm taking a look at why this VM_BUG_ON fires. Can you > confirm you reproduce this by running hugemmap06 from the ltp on a > powerpc machine? Can I maybe have your config? Yes, reproduced on both powerpc and x86. Configs are in, https://github.com/cailca/linux-mm
On 2/18/20 10:35 AM, Mina Almasry wrote: > On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: >> >> On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: >>> On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: >>> >> [ 7933.806377][T14355] ------------[ cut here ]------------ >> [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! >> VM_BUG_ON(t - f <= 1); >> [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] <snip> > Hi Qian, > > Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: > disable region_add file_region coalescing") so it's definitely related > to the series. I'm taking a look at why this VM_BUG_ON fires. Can you > confirm you reproduce this by running hugemmap06 from the ltp on a > powerpc machine? Can I maybe have your config? > > Thanks! Hi Mina, Looking at the region_chg code again, we do a resv->adds_in_progress += *out_regions_needed; and then potentially drop the lock to allocate the needed entries. Could anopther thread (only adding reservation for a single page) then come in and notice that there are not enough entries in the cache and hit the VM_BUG_ON()?
On Tue, Feb 18, 2020 at 11:14 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 2/18/20 10:35 AM, Mina Almasry wrote: > > On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: > >> > >> On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > >>> On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > >>> > >> [ 7933.806377][T14355] ------------[ cut here ]------------ > >> [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! > >> VM_BUG_ON(t - f <= 1); > >> [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] > <snip> > > Hi Qian, > > > > Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: > > disable region_add file_region coalescing") so it's definitely related > > to the series. I'm taking a look at why this VM_BUG_ON fires. Can you > > confirm you reproduce this by running hugemmap06 from the ltp on a > > powerpc machine? Can I maybe have your config? > > > > Thanks! > > Hi Mina, > > Looking at the region_chg code again, we do a > > resv->adds_in_progress += *out_regions_needed; > > and then potentially drop the lock to allocate the needed entries. Could > anopther thread (only adding reservation for a single page) then come in > and notice that there are not enough entries in the cache and hit the > VM_BUG_ON()? Maybe. Also I'm thinking the code thinks actual_regions_needed >= in_regions_needed, but that doesn't seem like a guarantee. I think this call sequence with the same t->f range would violate that: region_chg (regions_needed=1) region_chg (regions_needed=1) region_add (fills in the range) region_add (in_regions_needed = 1, actual_regions_needed = 0, so assumptions in the code break). Luckily it seems the ltp readily reproduces this, so I'm working on reproducing it. I should have a fix soon, at least if I can reproduce it as well.
On Tue, Feb 18, 2020 at 11:25 AM Mina Almasry <almasrymina@google.com> wrote: > > On Tue, Feb 18, 2020 at 11:14 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > > > On 2/18/20 10:35 AM, Mina Almasry wrote: > > > On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: > > >> > > >> On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > > >>> On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > >>> > > >> [ 7933.806377][T14355] ------------[ cut here ]------------ > > >> [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! > > >> VM_BUG_ON(t - f <= 1); > > >> [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] > > <snip> > > > Hi Qian, > > > > > > Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: > > > disable region_add file_region coalescing") so it's definitely related > > > to the series. I'm taking a look at why this VM_BUG_ON fires. Can you > > > confirm you reproduce this by running hugemmap06 from the ltp on a > > > powerpc machine? Can I maybe have your config? > > > > > > Thanks! > > > > Hi Mina, > > > > Looking at the region_chg code again, we do a > > > > resv->adds_in_progress += *out_regions_needed; > > > > and then potentially drop the lock to allocate the needed entries. Could > > anopther thread (only adding reservation for a single page) then come in > > and notice that there are not enough entries in the cache and hit the > > VM_BUG_ON()? > > Maybe. Also I'm thinking the code thinks actual_regions_needed >= > in_regions_needed, but that doesn't seem like a guarantee. I think > this call sequence with the same t->f range would violate that: > > region_chg (regions_needed=1) > region_chg (regions_needed=1) > region_add (fills in the range) > region_add (in_regions_needed = 1, actual_regions_needed = 0, so > assumptions in the code break). > > Luckily it seems the ltp readily reproduces this, so I'm working on > reproducing it. I should have a fix soon, at least if I can reproduce > it as well. I had a bit of trouble reproducing this but I got it just now. Makes sense I've never run into this even though others can readily reproduce it. I happen to run my kernels on a pretty beefy 36 core machine and in that setup things seem to execute fast and there is never a queue of pending file_region inserts into the resv_map. Once I limited qemu to only use 2 cores I ran into the issue right away. Looking into a fix now.
On 2/18/20 1:36 PM, Mina Almasry wrote: > On Tue, Feb 18, 2020 at 11:25 AM Mina Almasry <almasrymina@google.com> wrote: >> >> On Tue, Feb 18, 2020 at 11:14 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >>> >>> On 2/18/20 10:35 AM, Mina Almasry wrote: >>>> On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: >>>>> >>>>> On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: >>>>>> On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: >>>>>> >>>>> [ 7933.806377][T14355] ------------[ cut here ]------------ >>>>> [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! >>>>> VM_BUG_ON(t - f <= 1); >>>>> [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] >>> <snip> >>>> Hi Qian, >>>> >>>> Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: >>>> disable region_add file_region coalescing") so it's definitely related >>>> to the series. I'm taking a look at why this VM_BUG_ON fires. Can you >>>> confirm you reproduce this by running hugemmap06 from the ltp on a >>>> powerpc machine? Can I maybe have your config? >>>> >>>> Thanks! >>> >>> Hi Mina, >>> >>> Looking at the region_chg code again, we do a >>> >>> resv->adds_in_progress += *out_regions_needed; >>> >>> and then potentially drop the lock to allocate the needed entries. Could >>> anopther thread (only adding reservation for a single page) then come in >>> and notice that there are not enough entries in the cache and hit the >>> VM_BUG_ON()? >> >> Maybe. Also I'm thinking the code thinks actual_regions_needed >= >> in_regions_needed, but that doesn't seem like a guarantee. I think >> this call sequence with the same t->f range would violate that: >> >> region_chg (regions_needed=1) >> region_chg (regions_needed=1) >> region_add (fills in the range) >> region_add (in_regions_needed = 1, actual_regions_needed = 0, so >> assumptions in the code break). >> >> Luckily it seems the ltp readily reproduces this, so I'm working on >> reproducing it. I should have a fix soon, at least if I can reproduce >> it as well. > > I had a bit of trouble reproducing this but I got it just now. > > Makes sense I've never run into this even though others can readily > reproduce it. I happen to run my kernels on a pretty beefy 36 core > machine and in that setup things seem to execute fast and there is > never a queue of pending file_region inserts into the resv_map. Once I > limited qemu to only use 2 cores I ran into the issue right away. > Looking into a fix now. This may not be optimal, but it resolves the issue for me. I just put it together to test the theory that the region_chg code was at fault.
On Tue, Feb 18, 2020 at 1:41 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 2/18/20 1:36 PM, Mina Almasry wrote: > > On Tue, Feb 18, 2020 at 11:25 AM Mina Almasry <almasrymina@google.com> wrote: > >> > >> On Tue, Feb 18, 2020 at 11:14 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > >>> > >>> On 2/18/20 10:35 AM, Mina Almasry wrote: > >>>> On Tue, Feb 18, 2020 at 6:21 AM Qian Cai <cai@lca.pw> wrote: > >>>>> > >>>>> On Tue, 2020-02-11 at 15:19 -0800, Andrew Morton wrote: > >>>>>> On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > >>>>>> > >>>>> [ 7933.806377][T14355] ------------[ cut here ]------------ > >>>>> [ 7933.806541][T14355] kernel BUG at mm/hugetlb.c:490! > >>>>> VM_BUG_ON(t - f <= 1); > >>>>> [ 7933.806562][T14355] Oops: Exception in kernel mode, sig: 5 [#1] > >>> <snip> > >>>> Hi Qian, > >>>> > >>>> Yes this VM_BUG_ON was added by a patch in the series ("hugetlb: > >>>> disable region_add file_region coalescing") so it's definitely related > >>>> to the series. I'm taking a look at why this VM_BUG_ON fires. Can you > >>>> confirm you reproduce this by running hugemmap06 from the ltp on a > >>>> powerpc machine? Can I maybe have your config? > >>>> > >>>> Thanks! > >>> > >>> Hi Mina, > >>> > >>> Looking at the region_chg code again, we do a > >>> > >>> resv->adds_in_progress += *out_regions_needed; > >>> > >>> and then potentially drop the lock to allocate the needed entries. Could > >>> anopther thread (only adding reservation for a single page) then come in > >>> and notice that there are not enough entries in the cache and hit the > >>> VM_BUG_ON()? > >> > >> Maybe. Also I'm thinking the code thinks actual_regions_needed >= > >> in_regions_needed, but that doesn't seem like a guarantee. I think > >> this call sequence with the same t->f range would violate that: > >> > >> region_chg (regions_needed=1) > >> region_chg (regions_needed=1) > >> region_add (fills in the range) > >> region_add (in_regions_needed = 1, actual_regions_needed = 0, so > >> assumptions in the code break). > >> > >> Luckily it seems the ltp readily reproduces this, so I'm working on > >> reproducing it. I should have a fix soon, at least if I can reproduce > >> it as well. > > > > I had a bit of trouble reproducing this but I got it just now. > > > > Makes sense I've never run into this even though others can readily > > reproduce it. I happen to run my kernels on a pretty beefy 36 core > > machine and in that setup things seem to execute fast and there is > > never a queue of pending file_region inserts into the resv_map. Once I > > limited qemu to only use 2 cores I ran into the issue right away. > > Looking into a fix now. > > This may not be optimal, but it resolves the issue for me. I just put it > together to test the theory that the region_chg code was at fault. Thanks! Just sent out a similar patch "[PATCH -next] mm/hugetlb: Fix file_region entry allocations"
On Tue, Feb 11, 2020 at 3:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > These counters will track hugetlb reservations rather than hugetlb > > memory faulted in. This patch only adds the counter, following patches > > add the charging and uncharging of the counter. > > We're still pretty thin on review here, but as it's v12 and Mike > appears to be signed up to look at this work, I'll add them to -next to > help move things forward. > Hi Andrew, Since the patches were merged into -next there have been build fixes and test fixes and some review comments. Would you like me to submit *new* patches to address these, or would you like me to squash the fixes into my existing patch series and submit another iteration of the patch series?
On Wed, 19 Feb 2020 11:05:41 -0800 Mina Almasry <almasrymina@google.com> wrote: > On Tue, Feb 11, 2020 at 3:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > These counters will track hugetlb reservations rather than hugetlb > > > memory faulted in. This patch only adds the counter, following patches > > > add the charging and uncharging of the counter. > > > > We're still pretty thin on review here, but as it's v12 and Mike > > appears to be signed up to look at this work, I'll add them to -next to > > help move things forward. > > > > Hi Andrew, > > Since the patches were merged into -next there have been build fixes > and test fixes and some review comments. Would you like me to submit > *new* patches to address these, or would you like me to squash the > fixes into my existing patch series and submit another iteration of > the patch series? What you did worked OK ;) Please check the end result next time I release a kernel.
On Wed, Feb 19, 2020 at 1:06 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Wed, 19 Feb 2020 11:05:41 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > On Tue, Feb 11, 2020 at 3:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > > > These counters will track hugetlb reservations rather than hugetlb > > > > memory faulted in. This patch only adds the counter, following patches > > > > add the charging and uncharging of the counter. > > > > > > We're still pretty thin on review here, but as it's v12 and Mike > > > appears to be signed up to look at this work, I'll add them to -next to > > > help move things forward. > > > > > > > Hi Andrew, > > > > Since the patches were merged into -next there have been build fixes > > and test fixes and some review comments. Would you like me to submit > > *new* patches to address these, or would you like me to squash the > > fixes into my existing patch series and submit another iteration of > > the patch series? > > What you did worked OK ;) > > Please check the end result next time I release a kernel. Thanks Andrew! Things definitely moved along after the patchseries got into -next :D By my count I think all my patches outside of the tests patch have been acked or reviewed. When you have a chance I have a couple of questions: 1. For the non-tests patch, anything pending on those preventing eventual submission to linus's tree? 2. For the tests patch, I only have a Tested-by from Sandipan. Is that good enough? If the worst comes to worst and I don't get a review on that patch I would rather (if possible) that 'tests' patch can be dropped while I nag folks for a review, rather than block submission of the entire patch series. I ask because it's been out for review for some time and it's the one I got least discussion on so I'm not sure I'll have a review by the time it's needed. Thanks again!
On Thu, 20 Feb 2020 11:22:58 -0800 Mina Almasry <almasrymina@google.com> wrote: > On Wed, Feb 19, 2020 at 1:06 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Wed, 19 Feb 2020 11:05:41 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > On Tue, Feb 11, 2020 at 3:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > > > > > These counters will track hugetlb reservations rather than hugetlb > > > > > memory faulted in. This patch only adds the counter, following patches > > > > > add the charging and uncharging of the counter. > > > > > > > > We're still pretty thin on review here, but as it's v12 and Mike > > > > appears to be signed up to look at this work, I'll add them to -next to > > > > help move things forward. > > > > > > > > > > Hi Andrew, > > > > > > Since the patches were merged into -next there have been build fixes > > > and test fixes and some review comments. Would you like me to submit > > > *new* patches to address these, or would you like me to squash the > > > fixes into my existing patch series and submit another iteration of > > > the patch series? > > > > What you did worked OK ;) > > > > Please check the end result next time I release a kernel. > > Thanks Andrew! Things definitely moved along after the patchseries got > into -next :D > > By my count I think all my patches outside of the tests patch have > been acked or reviewed. When you have a chance I have a couple of > questions: > > 1. For the non-tests patch, anything pending on those preventing > eventual submission to linus's tree? > 2. For the tests patch, I only have a Tested-by from Sandipan. Is that > good enough? If the worst comes to worst and I don't get a review on > that patch I would rather (if possible) that 'tests' patch can be > dropped while I nag folks for a review, rather than block submission > of the entire patch series. I ask because it's been out for review for > some time and it's the one I got least discussion on so I'm not sure > I'll have a review by the time it's needed. > It all looks pretty good and I expect we can get everything into 5.7-rc1, unless some issues pop up. It's unclear to me whether http://lkml.kernel.org/r/CAHS8izOTipknnYaKz=FdzL-7yW-Z61ck1yPnYWixyMSJuTUYLQ@mail.gmail.com was going to result in an update?
On 2/20/20 4:28 PM, Andrew Morton wrote: > It all looks pretty good and I expect we can get everything into > 5.7-rc1, unless some issues pop up. > > It's unclear to me whether > http://lkml.kernel.org/r/CAHS8izOTipknnYaKz=FdzL-7yW-Z61ck1yPnYWixyMSJuTUYLQ@mail.gmail.com > was going to result in an update? Yes there was, https://lore.kernel.org/linux-mm/20200219233610.13808-1-almasrymina@google.com/ BTW, I've been through the selftest code at a high level. Not at the level of detail Shuah may provide. I am just happy that Mina provided a relatively comprehensive test for this new functionality. Will give it an ACK shortly.
On Thu, Feb 20, 2020 at 4:41 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 2/20/20 4:28 PM, Andrew Morton wrote: > > It all looks pretty good and I expect we can get everything into > > 5.7-rc1, unless some issues pop up. > > Awesome! I'll be on the lookout for issues. Folks that reviewed (and especially Mike), thank you so much for the patient reviews! > > It's unclear to me whether > > http://lkml.kernel.org/r/CAHS8izOTipknnYaKz=FdzL-7yW-Z61ck1yPnYWixyMSJuTUYLQ@mail.gmail.com > > was going to result in an update? > > Yes there was, > > https://lore.kernel.org/linux-mm/20200219233610.13808-1-almasrymina@google.com/ > > BTW, I've been through the selftest code at a high level. Not at the level > of detail Shuah may provide. I am just happy that Mina provided a relatively > comprehensive test for this new functionality. Will give it an ACK shortly. > -- > Mike Kravetz
On Wed, Feb 19, 2020 at 1:06 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Wed, 19 Feb 2020 11:05:41 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > On Tue, Feb 11, 2020 at 3:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Tue, 11 Feb 2020 13:31:20 -0800 Mina Almasry <almasrymina@google.com> wrote: > > > > > > > These counters will track hugetlb reservations rather than hugetlb > > > > memory faulted in. This patch only adds the counter, following patches > > > > add the charging and uncharging of the counter. > > > > > > We're still pretty thin on review here, but as it's v12 and Mike > > > appears to be signed up to look at this work, I'll add them to -next to > > > help move things forward. > > > > > > > Hi Andrew, > > > > Since the patches were merged into -next there have been build fixes > > and test fixes and some review comments. Would you like me to submit > > *new* patches to address these, or would you like me to squash the > > fixes into my existing patch series and submit another iteration of > > the patch series? > > What you did worked OK ;) > > Please check the end result next time I release a kernel. Hey Andrew, Thanks for taking in the patset and fixes. Only pending change in the latest -next tree is this one: https://lore.kernel.org/linux-mm/20200219233610.13808-1-almasrymina@google.com/ It's reviewed by Mike here: https://lore.kernel.org/linux-mm/a0d7b8e1-cb43-3b43-68c3-55631f2ce199@oracle.com/
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 1e897e4168ac1..dea6143aa0685 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -432,8 +432,8 @@ struct hstate { unsigned int surplus_huge_pages_node[MAX_NUMNODES]; #ifdef CONFIG_CGROUP_HUGETLB /* cgroup control files */ - struct cftype cgroup_files_dfl[5]; - struct cftype cgroup_files_legacy[5]; + struct cftype cgroup_files_dfl[7]; + struct cftype cgroup_files_legacy[9]; #endif char name[HSTATE_NAME_LEN]; }; diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index e434b05416c68..08b2adcdb5c1c 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -36,6 +36,11 @@ struct hugetlb_cgroup { */ struct page_counter hugepage[HUGE_MAX_HSTATE]; + /* + * the counter to account for hugepage reservations from hugetlb. + */ + struct page_counter rsvd_hugepage[HUGE_MAX_HSTATE]; + atomic_long_t events[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; atomic_long_t events_local[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; @@ -55,6 +60,15 @@ struct hugetlb_cgroup { static struct hugetlb_cgroup *root_h_cgroup __read_mostly; +static inline struct page_counter * +hugetlb_cgroup_counter_from_cgroup(struct hugetlb_cgroup *h_cg, int idx, + bool rsvd) +{ + if (rsvd) + return &h_cg->rsvd_hugepage[idx]; + return &h_cg->hugepage[idx]; +} + static inline struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) { @@ -295,28 +309,42 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, enum { RES_USAGE, + RES_RSVD_USAGE, RES_LIMIT, + RES_RSVD_LIMIT, RES_MAX_USAGE, + RES_RSVD_MAX_USAGE, RES_FAILCNT, + RES_RSVD_FAILCNT, }; static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css, struct cftype *cft) { struct page_counter *counter; + struct page_counter *rsvd_counter; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css); counter = &h_cg->hugepage[MEMFILE_IDX(cft->private)]; + rsvd_counter = &h_cg->rsvd_hugepage[MEMFILE_IDX(cft->private)]; switch (MEMFILE_ATTR(cft->private)) { case RES_USAGE: return (u64)page_counter_read(counter) * PAGE_SIZE; + case RES_RSVD_USAGE: + return (u64)page_counter_read(rsvd_counter) * PAGE_SIZE; case RES_LIMIT: return (u64)counter->max * PAGE_SIZE; + case RES_RSVD_LIMIT: + return (u64)rsvd_counter->max * PAGE_SIZE; case RES_MAX_USAGE: return (u64)counter->watermark * PAGE_SIZE; + case RES_RSVD_MAX_USAGE: + return (u64)rsvd_counter->watermark * PAGE_SIZE; case RES_FAILCNT: return counter->failcnt; + case RES_RSVD_FAILCNT: + return rsvd_counter->failcnt; default: BUG(); } @@ -338,10 +366,16 @@ static int hugetlb_cgroup_read_u64_max(struct seq_file *seq, void *v) 1 << huge_page_order(&hstates[idx])); switch (MEMFILE_ATTR(cft->private)) { + case RES_RSVD_USAGE: + counter = &h_cg->rsvd_hugepage[idx]; + /* Fall through. */ case RES_USAGE: val = (u64)page_counter_read(counter); seq_printf(seq, "%llu\n", val * PAGE_SIZE); break; + case RES_RSVD_LIMIT: + counter = &h_cg->rsvd_hugepage[idx]; + /* Fall through. */ case RES_LIMIT: val = (u64)counter->max; if (val == limit) @@ -365,6 +399,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of, int ret, idx; unsigned long nr_pages; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of)); + bool rsvd = false; if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */ return -EINVAL; @@ -378,9 +413,14 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of, nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx])); switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_RSVD_LIMIT: + rsvd = true; + /* Fall through. */ case RES_LIMIT: mutex_lock(&hugetlb_limit_mutex); - ret = page_counter_set_max(&h_cg->hugepage[idx], nr_pages); + ret = page_counter_set_max( + hugetlb_cgroup_counter_from_cgroup(h_cg, idx, rsvd), + nr_pages); mutex_unlock(&hugetlb_limit_mutex); break; default: @@ -406,18 +446,25 @@ static ssize_t hugetlb_cgroup_reset(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { int ret = 0; - struct page_counter *counter; + struct page_counter *counter, *rsvd_counter; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of)); counter = &h_cg->hugepage[MEMFILE_IDX(of_cft(of)->private)]; + rsvd_counter = &h_cg->rsvd_hugepage[MEMFILE_IDX(of_cft(of)->private)]; switch (MEMFILE_ATTR(of_cft(of)->private)) { case RES_MAX_USAGE: page_counter_reset_watermark(counter); break; + case RES_RSVD_MAX_USAGE: + page_counter_reset_watermark(rsvd_counter); + break; case RES_FAILCNT: counter->failcnt = 0; break; + case RES_RSVD_FAILCNT: + rsvd_counter->failcnt = 0; + break; default: ret = -EINVAL; break; @@ -472,7 +519,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx) struct hstate *h = &hstates[idx]; /* format the size */ - mem_fmt(buf, 32, huge_page_size(h)); + mem_fmt(buf, sizeof(buf), huge_page_size(h)); /* Add the limit file */ cft = &h->cgroup_files_dfl[0]; @@ -482,15 +529,30 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx) cft->write = hugetlb_cgroup_write_dfl; cft->flags = CFTYPE_NOT_ON_ROOT; - /* Add the current usage file */ + /* Add the reservation limit file */ cft = &h->cgroup_files_dfl[1]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.max", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_LIMIT); + cft->seq_show = hugetlb_cgroup_read_u64_max; + cft->write = hugetlb_cgroup_write_dfl; + cft->flags = CFTYPE_NOT_ON_ROOT; + + /* Add the current usage file */ + cft = &h->cgroup_files_dfl[2]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.current", buf); cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); cft->seq_show = hugetlb_cgroup_read_u64_max; cft->flags = CFTYPE_NOT_ON_ROOT; + /* Add the current reservation usage file */ + cft = &h->cgroup_files_dfl[3]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.current", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_USAGE); + cft->seq_show = hugetlb_cgroup_read_u64_max; + cft->flags = CFTYPE_NOT_ON_ROOT; + /* Add the events file */ - cft = &h->cgroup_files_dfl[2]; + cft = &h->cgroup_files_dfl[4]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.events", buf); cft->private = MEMFILE_PRIVATE(idx, 0); cft->seq_show = hugetlb_events_show; @@ -498,7 +560,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx) cft->flags = CFTYPE_NOT_ON_ROOT; /* Add the events.local file */ - cft = &h->cgroup_files_dfl[3]; + cft = &h->cgroup_files_dfl[5]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.events.local", buf); cft->private = MEMFILE_PRIVATE(idx, 0); cft->seq_show = hugetlb_events_local_show; @@ -507,7 +569,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx) cft->flags = CFTYPE_NOT_ON_ROOT; /* NULL terminate the last cft */ - cft = &h->cgroup_files_dfl[4]; + cft = &h->cgroup_files_dfl[6]; memset(cft, 0, sizeof(*cft)); WARN_ON(cgroup_add_dfl_cftypes(&hugetlb_cgrp_subsys, @@ -521,7 +583,7 @@ static void __init __hugetlb_cgroup_file_legacy_init(int idx) struct hstate *h = &hstates[idx]; /* format the size */ - mem_fmt(buf, 32, huge_page_size(h)); + mem_fmt(buf, sizeof(buf), huge_page_size(h)); /* Add the limit file */ cft = &h->cgroup_files_legacy[0]; @@ -530,28 +592,55 @@ static void __init __hugetlb_cgroup_file_legacy_init(int idx) cft->read_u64 = hugetlb_cgroup_read_u64; cft->write = hugetlb_cgroup_write_legacy; - /* Add the usage file */ + /* Add the reservation limit file */ cft = &h->cgroup_files_legacy[1]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.limit_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_LIMIT); + cft->read_u64 = hugetlb_cgroup_read_u64; + cft->write = hugetlb_cgroup_write_legacy; + + /* Add the usage file */ + cft = &h->cgroup_files_legacy[2]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the reservation usage file */ + cft = &h->cgroup_files_legacy[3]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_USAGE); + cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the MAX usage file */ - cft = &h->cgroup_files_legacy[2]; + cft = &h->cgroup_files_legacy[4]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); cft->write = hugetlb_cgroup_reset; cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the MAX reservation usage file */ + cft = &h->cgroup_files_legacy[5]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.max_usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_MAX_USAGE); + cft->write = hugetlb_cgroup_reset; + cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the failcntfile */ - cft = &h->cgroup_files_legacy[3]; + cft = &h->cgroup_files_legacy[6]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf); - cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); + cft->write = hugetlb_cgroup_reset; + cft->read_u64 = hugetlb_cgroup_read_u64; + + /* Add the reservation failcntfile */ + cft = &h->cgroup_files_legacy[7]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.rsvd.failcnt", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RSVD_FAILCNT); cft->write = hugetlb_cgroup_reset; cft->read_u64 = hugetlb_cgroup_read_u64; /* NULL terminate the last cft */ - cft = &h->cgroup_files_legacy[4]; + cft = &h->cgroup_files_legacy[8]; memset(cft, 0, sizeof(*cft)); WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,