
[v8,2/4] uprobe: use original page when all uprobes are removed

Message ID 20190724083600.832091-3-songliubraving@fb.com
State New, archived
Series THP aware uprobe

Commit Message

Song Liu July 24, 2019, 8:35 a.m. UTC
Currently, uprobe swaps the target page with an anonymous page in both
install_breakpoint() and remove_breakpoint(). When all uprobes on a page
are removed, the given mm is still using an anonymous page (not the
original page).

This patch allows uprobe to use the original page when possible (when all
uprobes on the page have been removed).

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 46 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 8 deletions(-)
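
The heart of the change, lifted from the patch at the bottom of this
page: on unregister, look up the original page in the file's page cache,
and install it instead of an anonymous copy when it is uptodate and
byte-identical to the copy we just prepared:

	if (!is_register) {
		struct page *orig_page;
		pgoff_t index;

		/* file offset backing the probed address */
		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
		orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
					  index);

		if (orig_page) {
			if (PageUptodate(orig_page) &&
			    pages_identical(new_page, orig_page)) {
				/* all uprobes gone: reuse the file page */
				put_page(new_page);
				new_page = orig_page;
			} else
				put_page(orig_page);
		}
	}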

Comments

Oleg Nesterov July 24, 2019, 9:07 a.m. UTC | #1
On 07/24, Song Liu wrote:
>
> This patch allows uprobe to use the original page when possible (when
> all uprobes on the page have been removed).

and only if the original page is already in the page cache and uptodate,
right?

another reason why I think unmap makes more sense... but I won't argue.

Oleg.
Oleg Nesterov July 24, 2019, 9:17 a.m. UTC | #2
On 07/24, Oleg Nesterov wrote:
>
> On 07/24, Song Liu wrote:
> >
> > This patch allows uprobe to use the original page when possible (when
> > all uprobes on the page have been removed).
>
> and only if the original page is already in the page cache and uptodate,
> right?
>
> another reason why I think unmap makes more sense... but I won't argue.

but somehow I forgot we need to read the original page anyway to check
pages_identical(), so unmap is not really better, please forget.

Oleg.
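
For reference, pages_identical() (made generally available earlier in
this series) boils down to mapping both pages and comparing their bytes,
roughly:

	int memcmp_pages(struct page *page1, struct page *page2)
	{
		char *addr1, *addr2;
		int ret;

		addr1 = kmap_atomic(page1);
		addr2 = kmap_atomic(page2);
		ret = memcmp(addr1, addr2, PAGE_SIZE);
		kunmap_atomic(addr2);
		kunmap_atomic(addr1);
		return ret;
	}

	static inline bool pages_identical(struct page *page1, struct page *page2)
	{
		return !memcmp_pages(page1, page2);
	}

so the original page has to be read either way, which is the point above.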
Song Liu July 24, 2019, 9:20 a.m. UTC | #3
> On Jul 24, 2019, at 2:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Oleg Nesterov wrote:
>> 
>> On 07/24, Song Liu wrote:
>>> 
>>> This patch allows uprobe to use the original page when possible (when
>>> all uprobes on the page have been removed).
>> 
>> and only if the original page is already in the page cache and uptodate,
>> right?
>> 
>> another reason why I think unmap makes more sense... but I won't argue.
> 
> but somehow I forgot we need to read the original page anyway to check
> pages_identical(), so unmap is not really better, please forget.
> 

Yeah, I was trying to explain this. :)

Thanks for your feedback.
Song
Oleg Nesterov July 24, 2019, 11:37 a.m. UTC | #4
On 07/24, Song Liu wrote:
>
>  	lock_page(old_page);
> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	mmu_notifier_invalidate_range_start(&range);
>  	err = -EAGAIN;
>  	if (!page_vma_mapped_walk(&pvmw)) {
> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> +		if (!orig)
> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>  		goto unlock;
>  	}
>  	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>  
>  	get_page(new_page);
> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> -	lru_cache_add_active_or_unevictable(new_page, vma);
> +	if (orig) {
> +		lock_page(new_page);  /* for page_add_file_rmap() */
> +		page_add_file_rmap(new_page, false);


Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
race with truncate?


and I am worried this code can try to lock the same page twice...
Say, the probed application does MADV_DONTNEED and then writes "int3"
into vma->vm_file at the same address to fool verify_opcode().

Oleg.
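
For context, the re-check Oleg is asking about is the usual
lock-then-revalidate pattern; a sketch (not part of the patch) of what it
could look like before the page_add_file_rmap() call:

	lock_page(new_page);
	/* a truncate may have stripped the page from its mapping while
	 * we waited for the lock; revalidate before adding the rmap */
	if (!new_page->mapping) {
		unlock_page(new_page);
		err = -EAGAIN;
		goto unlock;
	}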
Song Liu July 24, 2019, 6:52 p.m. UTC | #5
> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Song Liu wrote:
>> 
>> 	lock_page(old_page);
>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>> 	mmu_notifier_invalidate_range_start(&range);
>> 	err = -EAGAIN;
>> 	if (!page_vma_mapped_walk(&pvmw)) {
>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>> +		if (!orig)
>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>> 		goto unlock;
>> 	}
>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>> 
>> 	get_page(new_page);
>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>> +	if (orig) {
>> +		lock_page(new_page);  /* for page_add_file_rmap() */
>> +		page_add_file_rmap(new_page, false);
> 
> 
> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> race with truncate?

We can't race with truncate, because the file is opened as a binary and
protected with DENYWRITE (ETXTBSY).

> 
> 
> and I am worried this code can try to lock the same page twice...
> Say, the probed application does MADV_DONTNEED and then writes "int3"
> into vma->vm_file at the same address to fool verify_opcode().
> 

Do you mean the case where old_page == new_page? I think this won't
happen, because in uprobe_write_opcode() we only use orig_page in the
!is_register case.

Thanks,
Song
Oleg Nesterov July 25, 2019, 8:14 a.m. UTC | #6
On 07/24, Song Liu wrote:
>
>
> > On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 07/24, Song Liu wrote:
> >>
> >> 	lock_page(old_page);
> >> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >> 	mmu_notifier_invalidate_range_start(&range);
> >> 	err = -EAGAIN;
> >> 	if (!page_vma_mapped_walk(&pvmw)) {
> >> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> >> +		if (!orig)
> >> +			mem_cgroup_cancel_charge(new_page, memcg, false);
> >> 		goto unlock;
> >> 	}
> >> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
> >>
> >> 	get_page(new_page);
> >> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> >> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> >> -	lru_cache_add_active_or_unevictable(new_page, vma);
> >> +	if (orig) {
> >> +		lock_page(new_page);  /* for page_add_file_rmap() */
> >> +		page_add_file_rmap(new_page, false);
> >
> >
> > Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> > race with truncate?
>
> We can't race with truncate, because the file is opened as a binary and
> protected with DENYWRITE (ETXTBSY).

No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
libraries or other files which can be mmaped.

> > and I am worried this code can try to lock the same page twice...
> > Say, the probed application does MADV_DONTNEED and then writes "int3"
> > into vma->vm_file at the same address to fool verify_opcode().
> >
>
> Do you mean the case where old_page == new_page?

Yes,

> I think this won't
> happen, because in uprobe_write_opcode() we only use orig_page in the
> !is_register case.

See above.

!is_register doesn't necessarily mean the original page was previously cow'ed.
And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

Oleg.
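
A sketch of the sequence Oleg describes, from the probed process's side
(illustrative only; the path, probe_file_offset and probe_page_addr are
hypothetical):

	unsigned char int3 = 0xcc;
	long pagesz = sysconf(_SC_PAGESIZE);

	/* drop the COW'ed anon page; the next fault maps the original
	 * file page back in */
	madvise(probe_page_addr, pagesz, MADV_DONTNEED);
	/* a mapped library is not mm->exe_file, so opening it for write
	 * is not blocked by ETXTBSY */
	int fd = open("/path/to/mapped/lib.so", O_RDWR);
	/* plant a breakpoint byte in the file to fool verify_opcode() */
	pwrite(fd, &int3, 1, probe_file_offset);

After this, old_page in uprobe_write_opcode() is a file page again, and
the find_get_page() lookup can hand back the very page that is already
mapped, i.e. the old_page == new_page case discussed below.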
Song Liu July 25, 2019, 6:17 p.m. UTC | #7
> On Jul 25, 2019, at 1:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Song Liu wrote:
>> 
>> 
>>> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> 
>>> On 07/24, Song Liu wrote:
>>>> 
>>>> 	lock_page(old_page);
>>>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>>>> 	mmu_notifier_invalidate_range_start(&range);
>>>> 	err = -EAGAIN;
>>>> 	if (!page_vma_mapped_walk(&pvmw)) {
>>>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> +		if (!orig)
>>>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> 		goto unlock;
>>>> 	}
>>>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>>>> 
>>>> 	get_page(new_page);
>>>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>>>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>>>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>>>> +	if (orig) {
>>>> +		lock_page(new_page);  /* for page_add_file_rmap() */
>>>> +		page_add_file_rmap(new_page, false);
>>> 
>>> 
>>> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
>>> race with truncate?
>> 
>> We can't race with truncate, because the file is opened as a binary and
>> protected with DENYWRITE (ETXTBSY).
> 
> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
> libraries or other files which can be mmaped.

I see. Let me see how we can cover this. 

> 
>>> and I am worried this code can try to lock the same page twice...
>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>> into vma->vm_file at the same address to fool verify_opcode().
>>> 
>> 
>> Do you mean the case where old_page == new_page?
> 
> Yes,
> 
>> I think this won't
>> happen, because in uprobe_write_opcode() we only use orig_page in the
>> !is_register case.
> 
> See above.
> 
> !is_register doesn't necessarily mean the original page was previously cow'ed.
> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

I guess I know the case now. We can probably avoid this with a simple
check for old_page == new_page?

Thanks,
Song
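
A sketch of the guard Song suggests (illustrative; v9 ended up using
Oleg's PageAnon() check plus unmapping instead, see below):

	/* in uprobe_write_opcode(), after the page-cache lookup: if the
	 * original file page is the one already mapped, there is nothing
	 * to replace, and __replace_page() would try to lock the same
	 * page twice */
	if (new_page == old_page) {
		put_page(new_page);
		ret = 0;
		goto put_old;
	}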
Song Liu July 26, 2019, 6:07 a.m. UTC | #8
Hi Oleg, 

>> 
>> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
>> libraries or other files which can be mmaped.
> 
> I see. Let me see how we can cover this. 
> 
>> 
>>>> and I am worried this code can try to lock the same page twice...
>>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>>> into vma->vm_file at the same address to fool verify_opcode().
>>>> 
>>> 
>>> Do you mean the case where old_page == new_page?
>> 
>> Yes,
>> 
>>> I think this won't
>>> happen, because in uprobe_write_opcode() we only use orig_page in the
>>> !is_register case.
>> 
>> See above.
>> 
>> !is_register doesn't necessarily mean the original page was previously cow'ed.
>> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.
> 
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

I decided to follow your suggestion of "unmap old_page; fault in orig_page". 
Please see v9 of the set. 

Thanks,
Song
Oleg Nesterov July 26, 2019, 8:44 a.m. UTC | #9
On 07/25, Song Liu wrote:
>
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
__replace_page() in this case. See the patch below.

Anyway, why does __replace_page() need to lock both pages? This doesn't look
nice even if it were correct. I think it can do lock_page(old_page) later.

Oleg.


--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -488,6 +488,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		ref_ctr_updated = 1;
 	}
 
+	ret = 0;
+	if (!is_register && !PageAnon(old_page))
+		goto put_old;
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;
Song Liu July 26, 2019, 9:19 p.m. UTC | #10
> On Jul 26, 2019, at 1:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/25, Song Liu wrote:
>> 
>> I guess I know the case now. We can probably avoid this with a simple
>> check for old_page == new_page?
> 
> better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
> __replace_page() in this case. See the patch below.

I added the PageAnon(old_page) check in v9 of the patch.

> 
> Anyway, why does __replace_page() need to lock both pages? This doesn't
> look nice even if it were correct. I think it can do lock_page(old_page) later.
> 

Agreed. I have changed v9 to only unmap old_page, so it should be cleaner.

Thanks again for these good suggestions,
Song

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84fa00497c49..6b217bd031ef 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -160,16 +160,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	int err;
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
+	bool orig = new_page->mapping != NULL;  /* new_page == orig_page */
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
 
-	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
-			false);
-	if (err)
-		return err;
+	if (!orig) {
+		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
+					    &memcg, false);
+		if (err)
+			return err;
+	}
 
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(old_page);
@@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_invalidate_range_start(&range);
 	err = -EAGAIN;
 	if (!page_vma_mapped_walk(&pvmw)) {
-		mem_cgroup_cancel_charge(new_page, memcg, false);
+		if (!orig)
+			mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
 
 	get_page(new_page);
-	page_add_new_anon_rmap(new_page, vma, addr, false);
-	mem_cgroup_commit_charge(new_page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
+	if (orig) {
+		lock_page(new_page);  /* for page_add_file_rmap() */
+		page_add_file_rmap(new_page, false);
+		unlock_page(new_page);
+		inc_mm_counter(mm, mm_counter_file(new_page));
+		dec_mm_counter(mm, MM_ANONPAGES);
+	} else {
+		page_add_new_anon_rmap(new_page, vma, addr, false);
+		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+	}
 
 	if (!PageAnon(old_page)) {
 		dec_mm_counter(mm, mm_counter_file(old_page));
@@ -501,6 +513,24 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_highpage(new_page, old_page);
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
+	if (!is_register) {
+		struct page *orig_page;
+		pgoff_t index;
+
+		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
+		orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
+					  index);
+
+		if (orig_page) {
+			if (PageUptodate(orig_page) &&
+			    pages_identical(new_page, orig_page)) {
+				put_page(new_page);
+				new_page = orig_page;
+			} else
+				put_page(orig_page);
+		}
+	}
+
 	ret = __replace_page(vma, vaddr, old_page, new_page);
 	put_page(new_page);
 put_old: