Message ID | 20190724083600.832091-3-songliubraving@fb.com (mailing list archive)
State      | New, archived
Series     | THP aware uprobe
On 07/24, Song Liu wrote:
>
> This patch allows uprobe to use original page when possible (all uprobes
> on the page are already removed).

and only if the original page is already in the page cache and uptodate,
right?

another reason why I think unmap makes more sense... but I won't argue.

Oleg.
On 07/24, Oleg Nesterov wrote:
>
> On 07/24, Song Liu wrote:
> >
> > This patch allows uprobe to use original page when possible (all uprobes
> > on the page are already removed).
>
> and only if the original page is already in the page cache and uptodate,
> right?
>
> another reason why I think unmap makes more sense... but I won't argue.

but somehow I forgot we need to read the original page anyway to check
pages_identical(), so unmap is not really better, please forget.

Oleg.
> On Jul 24, 2019, at 2:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/24, Oleg Nesterov wrote:
>>
>> On 07/24, Song Liu wrote:
>>>
>>> This patch allows uprobe to use original page when possible (all uprobes
>>> on the page are already removed).
>>
>> and only if the original page is already in the page cache and uptodate,
>> right?
>>
>> another reason why I think unmap makes more sense... but I won't argue.
>
> but somehow I forgot we need to read the original page anyway to check
> pages_identical(), so unmap is not really better, please forget.
>

Yeah, I was trying to explain this. :)

Thanks for your feedback.

Song
On 07/24, Song Liu wrote:
>
> 	lock_page(old_page);
> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> 	mmu_notifier_invalidate_range_start(&range);
> 	err = -EAGAIN;
> 	if (!page_vma_mapped_walk(&pvmw)) {
> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> +		if (!orig)
> +			mem_cgroup_cancel_charge(new_page, memcg, false);
> 		goto unlock;
> 	}
> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>
> 	get_page(new_page);
> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> -	lru_cache_add_active_or_unevictable(new_page, vma);
> +	if (orig) {
> +		lock_page(new_page); /* for page_add_file_rmap() */
> +		page_add_file_rmap(new_page, false);

Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
race with truncate?

and I am worried this code can try to lock the same page twice...
Say, the probed application does MADV_DONTNEED and then writes "int3"
into vma->vm_file at the same address to fool verify_opcode().

Oleg.
> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/24, Song Liu wrote:
>>
>> 	lock_page(old_page);
>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>> 	mmu_notifier_invalidate_range_start(&range);
>> 	err = -EAGAIN;
>> 	if (!page_vma_mapped_walk(&pvmw)) {
>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>> +		if (!orig)
>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>> 		goto unlock;
>> 	}
>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>>
>> 	get_page(new_page);
>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>> +	if (orig) {
>> +		lock_page(new_page); /* for page_add_file_rmap() */
>> +		page_add_file_rmap(new_page, false);
>
>
> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> race with truncate?

We can't race with truncate, because the file is open as binary and
protected with DENYWRITE (ETXTBSY).

>
> and I am worried this code can try to lock the same page twice...
> Say, the probed application does MADV_DONTNEED and then writes "int3"
> into vma->vm_file at the same address to fool verify_opcode().
>

Do you mean the case where old_page == new_page? I think this won't
happen, because in uprobe_write_opcode() we only do orig_page for the
!is_register case.

Thanks,
Song
On 07/24, Song Liu wrote:
>
> > On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 07/24, Song Liu wrote:
> >>
> >> 	lock_page(old_page);
> >> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >> 	mmu_notifier_invalidate_range_start(&range);
> >> 	err = -EAGAIN;
> >> 	if (!page_vma_mapped_walk(&pvmw)) {
> >> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> >> +		if (!orig)
> >> +			mem_cgroup_cancel_charge(new_page, memcg, false);
> >> 		goto unlock;
> >> 	}
> >> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
> >>
> >> 	get_page(new_page);
> >> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> >> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> >> -	lru_cache_add_active_or_unevictable(new_page, vma);
> >> +	if (orig) {
> >> +		lock_page(new_page); /* for page_add_file_rmap() */
> >> +		page_add_file_rmap(new_page, false);
> >
> > Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> > race with truncate?
>
> We can't race with truncate, because the file is open as binary and
> protected with DENYWRITE (ETXTBSY).

No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
libraries or other files which can be mmaped.

> > and I am worried this code can try to lock the same page twice...
> > Say, the probed application does MADV_DONTNEED and then writes "int3"
> > into vma->vm_file at the same address to fool verify_opcode().
> >
>
> Do you mean the case where old_page == new_page?

Yes,

> I think this won't
> happen, because in uprobe_write_opcode() we only do orig_page for
> !is_register case.

See above.

!is_register doesn't necessarily mean the original page was previously cow'ed.
And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

Oleg.
> On Jul 25, 2019, at 1:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/24, Song Liu wrote:
>>
>>> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>
>>> On 07/24, Song Liu wrote:
>>>>
>>>> 	lock_page(old_page);
>>>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>>>> 	mmu_notifier_invalidate_range_start(&range);
>>>> 	err = -EAGAIN;
>>>> 	if (!page_vma_mapped_walk(&pvmw)) {
>>>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> +		if (!orig)
>>>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> 		goto unlock;
>>>> 	}
>>>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>>>>
>>>> 	get_page(new_page);
>>>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>>>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>>>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>>>> +	if (orig) {
>>>> +		lock_page(new_page); /* for page_add_file_rmap() */
>>>> +		page_add_file_rmap(new_page, false);
>>>
>>> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
>>> race with truncate?
>>
>> We can't race with truncate, because the file is open as binary and
>> protected with DENYWRITE (ETXTBSY).
>
> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
> libraries or other files which can be mmaped.

I see. Let me see how we can cover this.

>
>>> and I am worried this code can try to lock the same page twice...
>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>> into vma->vm_file at the same address to fool verify_opcode().
>>>
>>
>> Do you mean the case where old_page == new_page?
>
> Yes,
>
>> I think this won't
>> happen, because in uprobe_write_opcode() we only do orig_page for
>> !is_register case.
>
> See above.
>
> !is_register doesn't necessarily mean the original page was previously cow'ed.
> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

I guess I know the case now. We can probably avoid this with a simple
check for old_page == new_page?

Thanks,
Song
Hi Oleg,

>>
>> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
>> libraries or other files which can be mmaped.
>
> I see. Let me see how we can cover this.
>
>>
>>>> and I am worried this code can try to lock the same page twice...
>>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>>> into vma->vm_file at the same address to fool verify_opcode().
>>>>
>>>
>>> Do you mean the case where old_page == new_page?
>>
>> Yes,
>>
>>> I think this won't
>>> happen, because in uprobe_write_opcode() we only do orig_page for
>>> !is_register case.
>>
>> See above.
>>
>> !is_register doesn't necessarily mean the original page was previously cow'ed.
>> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.
>
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

I decided to follow your suggestion of "unmap old_page; fault in orig_page".
Please see v9 of the set.

Thanks,
Song
On 07/25, Song Liu wrote:
>
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
__replace_page() in this case. See the patch below.

Anyway, why does __replace_page() need to lock both pages? This doesn't look
nice even if it were correct. I think it can do lock_page(old_page) later.

Oleg.

--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -488,6 +488,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		ref_ctr_updated = 1;
 	}
 
+	ret = 0;
+	if (!is_register && !PageAnon(old_page))
+		goto put_old;
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;
> On Jul 26, 2019, at 1:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/25, Song Liu wrote:
>>
>> I guess I know the case now. We can probably avoid this with a simple
>> check for old_page == new_page?
>
> better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
> __replace_page() in this case. See the patch below.

I added a PageAnon(old_page) check in v9 of the patch.

>
> Anyway, why __replace_page() needs to lock both pages? This doesn't look nice
> even if it were correct. I think it can do lock_page(old_page) later.
>

Agreed. I have changed v9 to only unmap old_page, so it should be cleaner.

Thanks again for these good suggestions,
Song
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84fa00497c49..6b217bd031ef 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -160,16 +160,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	int err;
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
+	bool orig = new_page->mapping != NULL; /* new_page == orig_page */
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
 
-	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
-			false);
-	if (err)
-		return err;
+	if (!orig) {
+		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
+					    &memcg, false);
+		if (err)
+			return err;
+	}
 
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(old_page);
@@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_invalidate_range_start(&range);
 	err = -EAGAIN;
 	if (!page_vma_mapped_walk(&pvmw)) {
-		mem_cgroup_cancel_charge(new_page, memcg, false);
+		if (!orig)
+			mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
 
 	get_page(new_page);
-	page_add_new_anon_rmap(new_page, vma, addr, false);
-	mem_cgroup_commit_charge(new_page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
+	if (orig) {
+		lock_page(new_page); /* for page_add_file_rmap() */
+		page_add_file_rmap(new_page, false);
+		unlock_page(new_page);
+		inc_mm_counter(mm, mm_counter_file(new_page));
+		dec_mm_counter(mm, MM_ANONPAGES);
+	} else {
+		page_add_new_anon_rmap(new_page, vma, addr, false);
+		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+	}
 
 	if (!PageAnon(old_page)) {
 		dec_mm_counter(mm, mm_counter_file(old_page));
@@ -501,6 +513,24 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_highpage(new_page, old_page);
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
+	if (!is_register) {
+		struct page *orig_page;
+		pgoff_t index;
+
+		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
+		orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
+					  index);
+
+		if (orig_page) {
+			if (PageUptodate(orig_page) &&
+			    pages_identical(new_page, orig_page)) {
+				put_page(new_page);
+				new_page = orig_page;
+			} else
+				put_page(orig_page);
+		}
+	}
+
 	ret = __replace_page(vma, vaddr, old_page, new_page);
 	put_page(new_page);
 put_old: