
[v9,02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

Message ID 20231207161211.2374093-3-ryan.roberts@arm.com (mailing list archive)
State New, archived
Series Multi-size THP for anonymous memory

Commit Message

Ryan Roberts Dec. 7, 2023, 4:12 p.m. UTC
In preparation for supporting anonymous multi-size THP, improve
folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
passed to it. In this case, all contained pages are accounted using the
order-0 folio (or base page) scheme.

Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/rmap.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

Comments

Jiri Olsa Jan. 13, 2024, 10:42 p.m. UTC | #1
On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> In preparation for supporting anonymous multi-size THP, improve
> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> passed to it. In this case, all contained pages are accounted using the
> order-0 folio (or base page) scheme.
> 
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/rmap.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2a1e45e6419f..846fc79f3ca9 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>   * This means the inc-and-test can be bypassed.
>   * The folio does not have to be locked.
>   *
> - * If the folio is large, it is accounted as a THP.  As the folio
> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>   * is new, it's assumed to be mapped exclusively by a single process.
>   */
>  void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>  		unsigned long address)
>  {
> -	int nr;
> +	int nr = folio_nr_pages(folio);
>  
> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> +	VM_BUG_ON_VMA(address < vma->vm_start ||
> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);

hi,
I'm hitting this bug (console output below) when adding a uprobe
to a simple program like:

  $ cat up.c 
  int main(void)
  { 
     return 0;
  }

  # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'

  $ ./up

it's on top of current linus tree master:
  052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

before this patch it seems to work, I can send my .config if needed

thanks,
jirka


---
[  147.562264][  T719] vma ffff888166134e68 start 0000000000401000 end 0000000000402000 mm ffff88817cf2a840
[  147.562264][  T719] prot 25 anon_vma ffff88817b6818e0 vm_ops ffffffff83475ec0
[  147.562264][  T719] pgoff 1 file ffff888168d01240 private_data 0000000000000000
[  147.562264][  T719] flags: 0x75(read|exec|mayread|maywrite|mayexec)
[  147.571660][  T719] ------------[ cut here ]------------
[  147.572319][  T719] kernel BUG at mm/rmap.c:1412!
[  147.572825][  T719] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
[  147.573792][  T719] CPU: 3 PID: 719 Comm: up Not tainted 6.7.0+ #273 faf755a6fc44b54f4ff1c207411fbd9df5a3968d
[  147.574831][  T719] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[  147.575652][  T719] RIP: 0010:folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.576164][  T719] Code: c7 c6 20 d2 38 83 48 89 df e8 c0 4a fb ff 0f 0b 48 89 ef e8 16 ab 08 00 4c 3b 65 00 0f 83 cd fd ff ff 48 89 ef e8 34 44 fb ff  f7 c3 ff 0f 00 00 0f 85 de fe ff ff be 08 00 00 00 48 89 df
[  147.577609][  T719] RSP: 0018:ffff88815759f568 EFLAGS: 00010286
[  147.578140][  T719] RAX: 00000000000000fa RBX: ffffea00053eef40 RCX: 0000000000000000
[  147.578825][  T719] RDX: 0000000000000000 RSI: ffffffff81289b44 RDI: ffffffff872ff1a0
[  147.579513][  T719] RBP: ffff888166134e68 R08: 0000000000000001 R09: ffffed102aeb3e5f
[  147.580198][  T719] R10: ffff88815759f2ff R11: 0000000000000000 R12: 0000000000401020
[  147.580886][  T719] R13: 0000000000000001 R14: ffffea00053eef40 R15: ffffea00053eef40
[  147.581566][  T719] FS:  0000000000000000(0000) GS:ffff88842ce00000(0000) knlGS:0000000000000000
[  147.582263][  T719] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  147.582724][  T719] CR2: 00005634f0ffe880 CR3: 000000010c0f8002 CR4: 0000000000770ef0
[  147.583304][  T719] PKRU: 55555554
[  147.583586][  T719] Call Trace:
[  147.583869][  T719]  <TASK>
[  147.584122][  T719]  ? die+0x32/0x80
[  147.584422][  T719]  ? do_trap+0x12f/0x220
[  147.584800][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.585411][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.585891][  T719]  ? do_error_trap+0xa7/0x160
[  147.586349][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.586879][  T719]  ? handle_invalid_op+0x2c/0x40
[  147.587354][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.587892][  T719]  ? exc_invalid_op+0x29/0x40
[  147.588352][  T719]  ? asm_exc_invalid_op+0x16/0x20
[  147.588847][  T719]  ? preempt_count_sub+0x14/0xc0
[  147.589335][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.589899][  T719]  ? folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.590437][  T719]  __replace_page+0x364/0xb40
[  147.590918][  T719]  ? __pfx___replace_page+0x10/0x10
[  147.591412][  T719]  ? __pfx_lock_release+0x10/0x10
[  147.591910][  T719]  ? do_raw_spin_trylock+0xcd/0x120
[  147.592555][  T719]  ? __pfx_vma_alloc_folio+0x10/0x10
[  147.593095][  T719]  ? preempt_count_add+0x6e/0xc0
[  147.593612][  T719]  ? preempt_count_sub+0x14/0xc0
[  147.594143][  T719]  uprobe_write_opcode+0x3f6/0x820
[  147.594616][  T719]  ? __pfx_uprobe_write_opcode+0x10/0x10
[  147.595125][  T719]  ? preempt_count_sub+0x14/0xc0
[  147.595551][  T719]  ? up_write+0x125/0x2f0
[  147.596014][  T719]  install_breakpoint.isra.0+0xe5/0x470
[  147.596635][  T719]  uprobe_mmap+0x37b/0x8d0
[  147.598111][  T719]  ? __pfx_uprobe_mmap+0x10/0x10
[  147.598561][  T719]  mmap_region+0xa02/0x1220
[  147.599013][  T719]  ? rcu_is_watching+0x34/0x60
[  147.599602][  T719]  ? lock_acquired+0xbf/0x670
[  147.600024][  T719]  ? __pfx_mmap_region+0x10/0x10
[  147.600458][  T719]  ? security_mmap_addr+0x20/0x60
[  147.600909][  T719]  ? get_unmapped_area+0x169/0x1f0
[  147.601353][  T719]  do_mmap+0x425/0x660
[  147.601739][  T719]  vm_mmap_pgoff+0x15e/0x2b0
[  147.602156][  T719]  ? __pfx_vm_mmap_pgoff+0x10/0x10
[  147.602597][  T719]  ? __pfx_get_random_u64+0x10/0x10
[  147.603059][  T719]  elf_load+0xdc/0x3a0
[  147.603433][  T719]  load_elf_binary+0x6f6/0x22b0
[  147.603889][  T719]  ? __pfx_load_elf_binary+0x10/0x10
[  147.604385][  T719]  ? __pfx_lock_acquired+0x10/0x10
[  147.604952][  T719]  bprm_execve+0x494/0xc80
[  147.605379][  T719]  ? __pfx_bprm_execve+0x10/0x10
[  147.605843][  T719]  do_execveat_common.isra.0+0x24f/0x330
[  147.606358][  T719]  __x64_sys_execve+0x52/0x60
[  147.606797][  T719]  do_syscall_64+0x87/0x1b0
[  147.607148][  T719]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  147.607630][  T719] RIP: 0033:0x7faa9b0bdb4b
[  147.608732][  T719] Code: Unable to access opcode bytes at 0x7faa9b0bdb21.
[  147.609318][  T719] RSP: 002b:00007ffff9921708 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  147.610046][  T719] RAX: ffffffffffffffda RBX: 00005634f1964990 RCX: 00007faa9b0bdb4b
[  147.610727][  T719] RDX: 00005634f1966d20 RSI: 00005634f19612c0 RDI: 00005634f1964990
[  147.611528][  T719] RBP: 00007ffff9921800 R08: 0000000000000001 R09: 0000000000000001
[  147.612192][  T719] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
[  147.612829][  T719] R13: 00005634f1964990 R14: 00005634f19612c0 R15: 00005634f1966d20
[  147.613479][  T719]  </TASK>
[  147.613787][  T719] Modules linked in: intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel kvm_intel rapl iTCO_wdt iTCO_vendor_support i2c_i801 i2c_smbus lpc_ich drm loop drm_panel_orientation_quirks zram
[  147.615630][  T719] ---[ end trace 0000000000000000 ]---
[  147.616253][  T719] RIP: 0010:folio_add_new_anon_rmap+0x2cc/0x8f0
[  147.616714][  T719] Code: c7 c6 20 d2 38 83 48 89 df e8 c0 4a fb ff 0f 0b 48 89 ef e8 16 ab 08 00 4c 3b 65 00 0f 83 cd fd ff ff 48 89 ef e8 34 44 fb ff  f7 c3 ff 0f 00 00 0f 85 de fe ff ff be 08 00 00 00 48 89 df
[  147.618160][  T719] RSP: 0018:ffff88815759f568 EFLAGS: 00010286
[  147.618594][  T719] RAX: 00000000000000fa RBX: ffffea00053eef40 RCX: 0000000000000000
[  147.619318][  T719] RDX: 0000000000000000 RSI: ffffffff81289b44 RDI: ffffffff872ff1a0
[  147.619930][  T719] RBP: ffff888166134e68 R08: 0000000000000001 R09: ffffed102aeb3e5f
[  147.620577][  T719] R10: ffff88815759f2ff R11: 0000000000000000 R12: 0000000000401020
[  147.621236][  T719] R13: 0000000000000001 R14: ffffea00053eef40 R15: ffffea00053eef40
[  147.621894][  T719] FS:  0000000000000000(0000) GS:ffff88842ce00000(0000) knlGS:0000000000000000
[  147.622596][  T719] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  147.623186][  T719] CR2: 00007faa9b0bdb21 CR3: 000000010c0f8002 CR4: 0000000000770ef0
[  147.623960][  T719] PKRU: 55555554
[  147.624331][  T719] note: up[719] exited with preempt_count 1
[  147.624953][  T719] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[  147.625898][  T719] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 719, name: up
[  147.626672][  T719] preempt_count: 0, expected: 0
[  147.627945][  T719] RCU nest depth: 1, expected: 0
[  147.628410][  T719] INFO: lockdep is turned off.
[  147.628898][  T719] CPU: 3 PID: 719 Comm: up Tainted: G      D            6.7.0+ #273 faf755a6fc44b54f4ff1c207411fbd9df5a3968d
[  147.629954][  T719] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[  147.630838][  T719] Call Trace:
[  147.631185][  T719]  <TASK>
[  147.631514][  T719]  dump_stack_lvl+0x15d/0x180
[  147.631973][  T719]  __might_resched+0x270/0x3b0
[  147.636533][  T719]  exit_signals+0x1d/0x460
[  147.636947][  T719]  do_exit+0x27f/0x13b0
[  147.637368][  T719]  ? __pfx__printk+0x10/0x10
[  147.637827][  T719]  ? __pfx_do_exit+0x10/0x10
[  147.638238][  T719]  make_task_dead+0xd9/0x240
[  147.638610][  T719]  rewind_stack_and_make_dead+0x17/0x20
[  147.639064][  T719] RIP: 0033:0x7faa9b0bdb4b
[  147.639445][  T719] Code: Unable to access opcode bytes at 0x7faa9b0bdb21.
[  147.640015][  T719] RSP: 002b:00007ffff9921708 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  147.640694][  T719] RAX: ffffffffffffffda RBX: 00005634f1964990 RCX: 00007faa9b0bdb4b
[  147.641407][  T719] RDX: 00005634f1966d20 RSI: 00005634f19612c0 RDI: 00005634f1964990
[  147.642133][  T719] RBP: 00007ffff9921800 R08: 0000000000000001 R09: 0000000000000001
[  147.642911][  T719] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
[  147.643685][  T719] R13: 00005634f1964990 R14: 00005634f19612c0 R15: 00005634f1966d20
[  147.644454][  T719]  </TASK>
[  147.644819][  T719] ------------[ cut here ]------------
David Hildenbrand Jan. 14, 2024, 5:33 p.m. UTC | #2
On 13.01.24 23:42, Jiri Olsa wrote:
> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>> In preparation for supporting anonymous multi-size THP, improve
>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>> passed to it. In this case, all contained pages are accounted using the
>> order-0 folio (or base page) scheme.
>>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>   mm/rmap.c | 28 ++++++++++++++++++++--------
>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2a1e45e6419f..846fc79f3ca9 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>    * This means the inc-and-test can be bypassed.
>>    * The folio does not have to be locked.
>>    *
>> - * If the folio is large, it is accounted as a THP.  As the folio
>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>    * is new, it's assumed to be mapped exclusively by a single process.
>>    */
>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>   		unsigned long address)
>>   {
>> -	int nr;
>> +	int nr = folio_nr_pages(folio);
>>   
>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> 
> hi,
> I'm hitting this bug (console output below) with adding uprobe
> on simple program like:
> 
>    $ cat up.c
>    int main(void)
>    {
>       return 0;
>    }
> 
>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> 
>    $ ./up
> 
> it's on top of current linus tree master:
>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> 
> before this patch it seems to work, I can send my .config if needed

bpf only inserts a small folio, so no magic there.

It was:
	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
And now it is
	VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);

I think this change is sane, as long as the address is aligned to full pages
(which it really should be).

Staring at uprobe_write_opcode, likely vaddr isn't aligned ...

Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
will mask these bits off.
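
To make the failure mode concrete, here is a small stand-alone sketch of the
arithmetic (plain user-space C; the VMA bounds come from the report above,
0x401020 is assumed to be the unaligned probe address based on the register
dump, and a 4 KiB page size is also assumed):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

int main(void)
{
	unsigned long vm_start = 0x401000, vm_end = 0x402000; /* 1-page VMA */
	unsigned long vaddr = 0x401020;	/* unaligned uprobe address */
	unsigned long nr = 1;		/* small folio */

	/* Old check: only the start address had to fall inside the VMA. */
	assert(!(vaddr < vm_start || vaddr >= vm_end));

	/*
	 * New check: the whole folio range must fit. The sub-page offset
	 * (0x20) pushes the end to 0x402020, past vm_end, which is what
	 * trips the VM_BUG_ON_VMA() in the report.
	 */
	printf("unaligned end %#lx > vm_end %#lx\n",
	       vaddr + (nr << PAGE_SHIFT), vm_end);

	/* With the address masked to a page boundary, the range fits. */
	unsigned long aligned = vaddr & PAGE_MASK;

	assert(!(aligned < vm_start ||
		 aligned + (nr << PAGE_SHIFT) > vm_end));
	return 0;
}

In other words, the new check is only stricter about the tail of the range,
so handing in a page-aligned address (as the fix below does with
vaddr & PAGE_MASK) is enough to satisfy it.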


Would the following change fix it for you?

 From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Sun, 14 Jan 2024 18:32:57 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  kernel/events/uprobes.c | 2 +-
  mm/rmap.c               | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 485bb0389b488..929e98c629652 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
  		}
  	}
  
-	ret = __replace_page(vma, vaddr, old_page, new_page);
+	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
  	if (new_page)
  		put_page(new_page);
  put_old:
diff --git a/mm/rmap.c b/mm/rmap.c
index f5d43edad529a..a903db4df6b97 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
  {
  	int nr = folio_nr_pages(folio);
  
+	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
  	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
  	VM_BUG_ON_VMA(address < vma->vm_start ||
  			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
Jiri Olsa Jan. 14, 2024, 8:55 p.m. UTC | #3
On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
> On 13.01.24 23:42, Jiri Olsa wrote:
> > On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> > > In preparation for supporting anonymous multi-size THP, improve
> > > folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> > > passed to it. In this case, all contained pages are accounted using the
> > > order-0 folio (or base page) scheme.
> > > 
> > > Reviewed-by: Yu Zhao <yuzhao@google.com>
> > > Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > Reviewed-by: Barry Song <v-songbaohua@oppo.com>
> > > Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> > > Tested-by: John Hubbard <jhubbard@nvidia.com>
> > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > > ---
> > >   mm/rmap.c | 28 ++++++++++++++++++++--------
> > >   1 file changed, 20 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 2a1e45e6419f..846fc79f3ca9 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> > >    * This means the inc-and-test can be bypassed.
> > >    * The folio does not have to be locked.
> > >    *
> > > - * If the folio is large, it is accounted as a THP.  As the folio
> > > + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
> > >    * is new, it's assumed to be mapped exclusively by a single process.
> > >    */
> > >   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> > >   		unsigned long address)
> > >   {
> > > -	int nr;
> > > +	int nr = folio_nr_pages(folio);
> > > -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> > > +	VM_BUG_ON_VMA(address < vma->vm_start ||
> > > +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> > 
> > hi,
> > I'm hitting this bug (console output below) with adding uprobe
> > on simple program like:
> > 
> >    $ cat up.c
> >    int main(void)
> >    {
> >       return 0;
> >    }
> > 
> >    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> > 
> >    $ ./up
> > 
> > it's on top of current linus tree master:
> >    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> > 
> > before this patch it seems to work, I can send my .config if needed
> 
> bpf only inserts a small folio, so no magic there.
> 
> It was:
> 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> And now it is
> 	VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> 
> I think this change is sane. As long as the address is aligned to full pages
> (which it better should be)
> 
> Staring at uprobe_write_opcode, likely vaddr isn't aligned ...
> 
> Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
> will mask these bits off.
> 
> 
> Would the following change fix it for you?

great, that fixes it for me, you can add my

Tested-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> 
> From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Sun, 14 Jan 2024 18:32:57 +0100
> Subject: [PATCH] tmp
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/events/uprobes.c | 2 +-
>  mm/rmap.c               | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 485bb0389b488..929e98c629652 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  		}
>  	}
> -	ret = __replace_page(vma, vaddr, old_page, new_page);
> +	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>  	if (new_page)
>  		put_page(new_page);
>  put_old:
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f5d43edad529a..a903db4df6b97 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>  {
>  	int nr = folio_nr_pages(folio);
> +	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
>  	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>  	VM_BUG_ON_VMA(address < vma->vm_start ||
>  			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> -- 
> 2.43.0
> 
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
Ryan Roberts Jan. 15, 2024, 8:50 a.m. UTC | #4
On 14/01/2024 20:55, Jiri Olsa wrote:
> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>> On 13.01.24 23:42, Jiri Olsa wrote:
>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>> In preparation for supporting anonymous multi-size THP, improve
>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>> passed to it. In this case, all contained pages are accounted using the
>>>> order-0 folio (or base page) scheme.
>>>>
>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>>    * This means the inc-and-test can be bypassed.
>>>>    * The folio does not have to be locked.
>>>>    *
>>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>>    * is new, it's assumed to be mapped exclusively by a single process.
>>>>    */
>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>   		unsigned long address)
>>>>   {
>>>> -	int nr;
>>>> +	int nr = folio_nr_pages(folio);
>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>
>>> hi,
>>> I'm hitting this bug (console output below) with adding uprobe
>>> on simple program like:
>>>
>>>    $ cat up.c
>>>    int main(void)
>>>    {
>>>       return 0;
>>>    }
>>>
>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>
>>>    $ ./up
>>>
>>> it's on top of current linus tree master:
>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>
>>> before this patch it seems to work, I can send my .config if needed

Thanks for the bug report!

>>
>> bpf only inserts a small folio, so no magic there.
>>
>> It was:
>> 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> And now it is
>> 	VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>
>> I think this change is sane. As long as the address is aligned to full pages
>> (which it better should be)
>>
>> Staring at uprobe_write_opcode, likely vaddr isn't aligned ...
>>
>> Likely (hopefully) that is not an issue for __folio_set_anon(), because linear_page_index()
>> will mask these bits off.
>>
>>
>> Would the following change fix it for you?

And thanks for fixing my mess so quickly, David.

FWIW, I agree with your diagnosis. One small suggestion below.

> 
> great, that fixes it for me, you can add my
> 
> Tested-by: Jiri Olsa <jolsa@kernel.org>
> 
> thanks,
> jirka
> 
>>
>> From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@redhat.com>
>> Date: Sun, 14 Jan 2024 18:32:57 +0100
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  kernel/events/uprobes.c | 2 +-
>>  mm/rmap.c               | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 485bb0389b488..929e98c629652 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>  		}
>>  	}
>> -	ret = __replace_page(vma, vaddr, old_page, new_page);
>> +	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>>  	if (new_page)
>>  		put_page(new_page);
>>  put_old:
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index f5d43edad529a..a903db4df6b97 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>  {
>>  	int nr = folio_nr_pages(folio);
>> +	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);

nit: Is it worth also adding this to __folio_add_anon_rmap() so that
folio_add_anon_rmap_ptes() and folio_add_anon_rmap_pmd() also benefit?

Regardless:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

>>  	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>>  	VM_BUG_ON_VMA(address < vma->vm_start ||
>>  			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>> -- 
>> 2.43.0
>>
>>
>>
>> -- 
>> Cheers,
>>
>> David / dhildenb
>>
David Hildenbrand Jan. 15, 2024, 9:38 a.m. UTC | #5
>>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>>> index 485bb0389b488..929e98c629652 100644
>>> --- a/kernel/events/uprobes.c
>>> +++ b/kernel/events/uprobes.c
>>> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>>   		}
>>>   	}
>>> -	ret = __replace_page(vma, vaddr, old_page, new_page);
>>> +	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>>>   	if (new_page)
>>>   		put_page(new_page);
>>>   put_old:
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index f5d43edad529a..a903db4df6b97 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>   {
>>>   	int nr = folio_nr_pages(folio);
>>> +	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
> 
> nit: Is it worth also adding this to __folio_add_anon_rmap() so that
> folio_add_anon_rmap_ptes() and folio_add_anon_rmap_pmd() also benefit?
> 

Yes, same thoughts. I just included it so we would catch it if something
still goes wrong here.

I'll split that change out either way.


> Regardless:
> 
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks!
Sven Schnelle Jan. 24, 2024, 11:15 a.m. UTC | #6
Ryan Roberts <ryan.roberts@arm.com> writes:

> On 14/01/2024 20:55, Jiri Olsa wrote:
>> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>>> On 13.01.24 23:42, Jiri Olsa wrote:
>>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>>> In preparation for supporting anonymous multi-size THP, improve
>>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>>> passed to it. In this case, all contained pages are accounted using the
>>>>> order-0 folio (or base page) scheme.
>>>>>
>>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>> ---
>>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
>>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>    * This means the inc-and-test can be bypassed.
>>>>>    * The folio does not have to be locked.
>>>>>    *
>>>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>>>    * is new, it's assumed to be mapped exclusively by a single process.
>>>>>    */
>>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>>   		unsigned long address)
>>>>>   {
>>>>> -	int nr;
>>>>> +	int nr = folio_nr_pages(folio);
>>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>>
>>>> hi,
>>>> I'm hitting this bug (console output below) with adding uprobe
>>>> on simple program like:
>>>>
>>>>    $ cat up.c
>>>>    int main(void)
>>>>    {
>>>>       return 0;
>>>>    }
>>>>
>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>
>>>>    $ ./up
>>>>
>>>> it's on top of current linus tree master:
>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>
>>>> before this patch it seems to work, I can send my .config if needed
>
> Thanks for the bug report!

I just hit the same bug in our CI, but can't find the fix in -next. Is
this in the queue somewhere?

Thanks
Sven
Jiri Olsa Jan. 24, 2024, 11:19 a.m. UTC | #7
On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
> > On 14/01/2024 20:55, Jiri Olsa wrote:
> >> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
> >>> On 13.01.24 23:42, Jiri Olsa wrote:
> >>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> >>>>> In preparation for supporting anonymous multi-size THP, improve
> >>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> >>>>> passed to it. In this case, all contained pages are accounted using the
> >>>>> order-0 folio (or base page) scheme.
> >>>>>
> >>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
> >>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> >>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
> >>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
> >>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
> >>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>>> ---
> >>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
> >>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>> index 2a1e45e6419f..846fc79f3ca9 100644
> >>>>> --- a/mm/rmap.c
> >>>>> +++ b/mm/rmap.c
> >>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> >>>>>    * This means the inc-and-test can be bypassed.
> >>>>>    * The folio does not have to be locked.
> >>>>>    *
> >>>>> - * If the folio is large, it is accounted as a THP.  As the folio
> >>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
> >>>>>    * is new, it's assumed to be mapped exclusively by a single process.
> >>>>>    */
> >>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> >>>>>   		unsigned long address)
> >>>>>   {
> >>>>> -	int nr;
> >>>>> +	int nr = folio_nr_pages(folio);
> >>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> >>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
> >>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> >>>>
> >>>> hi,
> >>>> I'm hitting this bug (console output below) with adding uprobe
> >>>> on simple program like:
> >>>>
> >>>>    $ cat up.c
> >>>>    int main(void)
> >>>>    {
> >>>>       return 0;
> >>>>    }
> >>>>
> >>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> >>>>
> >>>>    $ ./up
> >>>>
> >>>> it's on top of current linus tree master:
> >>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> >>>>
> >>>> before this patch it seems to work, I can send my .config if needed
> >
> > Thanks for the bug report!
> 
> I just hit the same bug in our CI, but can't find the fix in -next. Is
> this in the queue somewhere?

we hit it as well, but I can see the fix in linux-next/master

  4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages

jirka
Ryan Roberts Jan. 24, 2024, 12:02 p.m. UTC | #8
On 24/01/2024 11:19, Jiri Olsa wrote:
> On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>
>>> On 14/01/2024 20:55, Jiri Olsa wrote:
>>>> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>>>>> On 13.01.24 23:42, Jiri Olsa wrote:
>>>>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>>>>> In preparation for supporting anonymous multi-size THP, improve
>>>>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>>>>> passed to it. In this case, all contained pages are accounted using the
>>>>>>> order-0 folio (or base page) scheme.
>>>>>>>
>>>>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>>>>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>> ---
>>>>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
>>>>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>>>>> --- a/mm/rmap.c
>>>>>>> +++ b/mm/rmap.c
>>>>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>>    * This means the inc-and-test can be bypassed.
>>>>>>>    * The folio does not have to be locked.
>>>>>>>    *
>>>>>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>>>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>>>>>    * is new, it's assumed to be mapped exclusively by a single process.
>>>>>>>    */
>>>>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>   		unsigned long address)
>>>>>>>   {
>>>>>>> -	int nr;
>>>>>>> +	int nr = folio_nr_pages(folio);
>>>>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>>>>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>>>>
>>>>>> hi,
>>>>>> I'm hitting this bug (console output below) with adding uprobe
>>>>>> on simple program like:
>>>>>>
>>>>>>    $ cat up.c
>>>>>>    int main(void)
>>>>>>    {
>>>>>>       return 0;
>>>>>>    }
>>>>>>
>>>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>>>
>>>>>>    $ ./up
>>>>>>
>>>>>> it's on top of current linus tree master:
>>>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>>>
>>>>>> before this patch it seems to work, I can send my .config if needed
>>>
>>> Thanks for the bug report!
>>
>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>> this in the queue somewhere?
> 
> we hit it as well, but I can see the fix in linux-next/master
> 
>   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages

Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
having this change in your kernel? Could you please send over the full bug log?

> 
> jirka
Jiri Olsa Jan. 24, 2024, 12:06 p.m. UTC | #9
On Wed, Jan 24, 2024 at 12:02:53PM +0000, Ryan Roberts wrote:
> On 24/01/2024 11:19, Jiri Olsa wrote:
> > On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
> >> Ryan Roberts <ryan.roberts@arm.com> writes:
> >>
> >>> On 14/01/2024 20:55, Jiri Olsa wrote:
> >>>> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
> >>>>> On 13.01.24 23:42, Jiri Olsa wrote:
> >>>>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
> >>>>>>> In preparation for supporting anonymous multi-size THP, improve
> >>>>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> >>>>>>> passed to it. In this case, all contained pages are accounted using the
> >>>>>>> order-0 folio (or base page) scheme.
> >>>>>>>
> >>>>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
> >>>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> >>>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
> >>>>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
> >>>>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
> >>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>>>>> ---
> >>>>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
> >>>>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>>> index 2a1e45e6419f..846fc79f3ca9 100644
> >>>>>>> --- a/mm/rmap.c
> >>>>>>> +++ b/mm/rmap.c
> >>>>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> >>>>>>>    * This means the inc-and-test can be bypassed.
> >>>>>>>    * The folio does not have to be locked.
> >>>>>>>    *
> >>>>>>> - * If the folio is large, it is accounted as a THP.  As the folio
> >>>>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
> >>>>>>>    * is new, it's assumed to be mapped exclusively by a single process.
> >>>>>>>    */
> >>>>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>>   		unsigned long address)
> >>>>>>>   {
> >>>>>>> -	int nr;
> >>>>>>> +	int nr = folio_nr_pages(folio);
> >>>>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> >>>>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
> >>>>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> >>>>>>
> >>>>>> hi,
> >>>>>> I'm hitting this bug (console output below) with adding uprobe
> >>>>>> on simple program like:
> >>>>>>
> >>>>>>    $ cat up.c
> >>>>>>    int main(void)
> >>>>>>    {
> >>>>>>       return 0;
> >>>>>>    }
> >>>>>>
> >>>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
> >>>>>>
> >>>>>>    $ ./up
> >>>>>>
> >>>>>> it's on top of current linus tree master:
> >>>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
> >>>>>>
> >>>>>> before this patch it seems to work, I can send my .config if needed
> >>>
> >>> Thanks for the bug report!
> >>
> >> I just hit the same bug in our CI, but can't find the fix in -next. Is
> >> this in the queue somewhere?
> > 
> > we hit it as well, but I can see the fix in linux-next/master
> > 
> >   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
> 
> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
> having this change in your kernel? Could you please send over the full bug log?

ah sorry.. I meant the change fixes the problem for us, it just did not
yet propagate through the merge cycle into bpf trees.. but I can see it
in linux-next tree, so it's probably just a matter of time

jirka
Ryan Roberts Jan. 24, 2024, 12:17 p.m. UTC | #10
On 24/01/2024 12:06, Jiri Olsa wrote:
> On Wed, Jan 24, 2024 at 12:02:53PM +0000, Ryan Roberts wrote:
>> On 24/01/2024 11:19, Jiri Olsa wrote:
>>> On Wed, Jan 24, 2024 at 12:15:52PM +0100, Sven Schnelle wrote:
>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>
>>>>> On 14/01/2024 20:55, Jiri Olsa wrote:
>>>>>> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>>>>>>> On 13.01.24 23:42, Jiri Olsa wrote:
>>>>>>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>>>>>>> In preparation for supporting anonymous multi-size THP, improve
>>>>>>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>>>>>>> passed to it. In this case, all contained pages are accounted using the
>>>>>>>>> order-0 folio (or base page) scheme.
>>>>>>>>>
>>>>>>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>>>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>>>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>>>>>>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>>> ---
>>>>>>>>>   mm/rmap.c | 28 ++++++++++++++++++++--------
>>>>>>>>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>>>>>>> --- a/mm/rmap.c
>>>>>>>>> +++ b/mm/rmap.c
>>>>>>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>>>>    * This means the inc-and-test can be bypassed.
>>>>>>>>>    * The folio does not have to be locked.
>>>>>>>>>    *
>>>>>>>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>>>>>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>>>>>>>    * is new, it's assumed to be mapped exclusively by a single process.
>>>>>>>>>    */
>>>>>>>>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>>>   		unsigned long address)
>>>>>>>>>   {
>>>>>>>>> -	int nr;
>>>>>>>>> +	int nr = folio_nr_pages(folio);
>>>>>>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>>>>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>>>>>>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> I'm hitting this bug (console output below) with adding uprobe
>>>>>>>> on simple program like:
>>>>>>>>
>>>>>>>>    $ cat up.c
>>>>>>>>    int main(void)
>>>>>>>>    {
>>>>>>>>       return 0;
>>>>>>>>    }
>>>>>>>>
>>>>>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>>>>>
>>>>>>>>    $ ./up
>>>>>>>>
>>>>>>>> it's on top of current linus tree master:
>>>>>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>>>>>
>>>>>>>> before this patch it seems to work, I can send my .config if needed
>>>>>
>>>>> Thanks for the bug report!
>>>>
>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>> this in the queue somewhere?
>>>
>>> we hit it as well, but I can see the fix in linux-next/master
>>>
>>>   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>
>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>> having this change in your kernel? Could you please send over the full bug log?
> 
> ah sorry.. I meant the change fixes the problem for us, it just did not
> yet propagate through the merge cycle into bpf trees.. but I can see it
> in linux-next tree, so it's probably just matter of time

OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
it should fix your problem.

> 
> jirka
Sven Schnelle Jan. 24, 2024, 12:28 p.m. UTC | #11
Hi Ryan,

Ryan Roberts <ryan.roberts@arm.com> writes:

>>>>>>>>> I'm hitting this bug (console output below) with adding uprobe
>>>>>>>>> on simple program like:
>>>>>>>>>
>>>>>>>>>    $ cat up.c
>>>>>>>>>    int main(void)
>>>>>>>>>    {
>>>>>>>>>       return 0;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>>>>>>
>>>>>>>>>    $ ./up
>>>>>>>>>
>>>>>>>>> it's on top of current linus tree master:
>>>>>>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>>>>>>
>>>>>>>>> before this patch it seems to work, I can send my .config if needed
>>>>>>
>>>>>> Thanks for the bug report!
>>>>>
>>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>>> this in the queue somewhere?
>>>>
>>>> we hit it as well, but I can see the fix in linux-next/master
>>>>
>>>>   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>>
>>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>>> having this change in your kernel? Could you please send over the full bug log?
>> 
>> ah sorry.. I meant the change fixes the problem for us, it just did not
>> yet propagate through the merge cycle into bpf trees.. but I can see it
>> in linux-next tree, so it's probably just matter of time
>
> OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
> it should fix your problem.

Same here - the fix makes uprobes work again, I just didn't see it in
torvalds-master nor in today's linux-next. But Jiri is right, it's in
linux-next/master; I just missed it there. So everything should be ok.
Ryan Roberts Jan. 24, 2024, 12:42 p.m. UTC | #12
On 24/01/2024 12:28, Sven Schnelle wrote:
> Hi Ryan,
> 
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>>>>>>>>>> I'm hitting this bug (console output below) with adding uprobe
>>>>>>>>>> on simple program like:
>>>>>>>>>>
>>>>>>>>>>    $ cat up.c
>>>>>>>>>>    int main(void)
>>>>>>>>>>    {
>>>>>>>>>>       return 0;
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>>    # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>>>>>>>
>>>>>>>>>>    $ ./up
>>>>>>>>>>
>>>>>>>>>> it's on top of current linus tree master:
>>>>>>>>>>    052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>>>>>>>
>>>>>>>>>> before this patch it seems to work, I can send my .config if needed
>>>>>>>
>>>>>>> Thanks for the bug report!
>>>>>>
>>>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>>>> this in the queue somewhere?
>>>>>
>>>>> we hit it as well, but I can see the fix in linux-next/master
>>>>>
>>>>>   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>>>
>>>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON despite
>>>> having this change in your kernel? Could you please send over the full bug log?
>>>
>>> ah sorry.. I meant the change fixes the problem for us, it just did not
>>> yet propagate through the merge cycle into bpf trees.. but I can see it
>>> in linux-next tree, so it's probably just matter of time
>>
>> OK great! How about you, Sven? Do you have this change in your kernel? Hopefully
>> it should fix your problem.
> 
> Same here - the fix makes uprobes work again, i just didn't see it in
> torvalds-master and neither in todays linux-next. But Jiri is right,
> it's in linux-next/master. I just missed to find it there. So everything
> should be ok.

Great!

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index 2a1e45e6419f..846fc79f3ca9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1335,32 +1335,44 @@  void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
  * This means the inc-and-test can be bypassed.
  * The folio does not have to be locked.
  *
- * If the folio is large, it is accounted as a THP.  As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
  * is new, it's assumed to be mapped exclusively by a single process.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long address)
 {
-	int nr;
+	int nr = folio_nr_pages(folio);
 
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(address < vma->vm_start ||
+			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);
+	__folio_set_anon(folio, vma, address, true);
 
-	if (likely(!folio_test_pmd_mappable(folio))) {
+	if (likely(!folio_test_large(folio))) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		nr = 1;
+		SetPageAnonExclusive(&folio->page);
+	} else if (!folio_test_pmd_mappable(folio)) {
+		int i;
+
+		for (i = 0; i < nr; i++) {
+			struct page *page = folio_page(folio, i);
+
+			/* increment count (starts at -1) */
+			atomic_set(&page->_mapcount, 0);
+			SetPageAnonExclusive(page);
+		}
+
+		atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
-		nr = folio_nr_pages(folio);
+		SetPageAnonExclusive(&folio->page);
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
-	__folio_set_anon(folio, vma, address, true);
-	SetPageAnonExclusive(&folio->page);
 }
 
 /**
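
For orientation, here is a minimal caller-side sketch of the contract the new
check enforces: the address must be page aligned and the whole folio, whatever
its order, must fit inside the VMA. The helper below is hypothetical and not
part of this series (folio_add_new_anon_rmap(), folio_add_lru_vma(),
folio_size(), IS_ALIGNED() and VM_WARN_ON_ONCE() are existing kernel
interfaces); it is an illustration, not compilable on its own.

/*
 * Illustration only: map a freshly allocated anonymous folio (order-0
 * or a non-pmd-mappable large folio) at addr. The caller must pass a
 * PAGE_SIZE-aligned address and a range that fits the VMA, otherwise
 * the VM_BUG_ON_VMA() above fires.
 */
static void example_add_new_anon_folio(struct folio *folio,
				       struct vm_area_struct *vma,
				       unsigned long addr)
{
	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, PAGE_SIZE));
	VM_WARN_ON_ONCE(addr < vma->vm_start ||
			addr + folio_size(folio) > vma->vm_end);

	folio_add_new_anon_rmap(folio, vma, addr);
	folio_add_lru_vma(folio, vma);
}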