Message ID: 1491397622-16665-1-git-send-email-sironi@amazon.de (mailing list archive)
State: New, archived
2017-04-05 15:07+0200, Filippo Sironi:
> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
> pages and the respective struct pages for mapping in the kernel virtual
> address space.
> This doesn't work if get_user_pages_fast() is invoked with a userspace
> virtual address that's backed by PFNs outside of kernel reach (e.g.,
> when limiting the kernel memory with mem= in the command line and using
> /dev/mem to map memory).
>
> If get_user_pages_fast() fails, look up the VMA that backs the userspace
> virtual address, compute the PFN and the physical address, and map it in
> the kernel virtual address space with memremap().

What is the reason for a configuration that voluntarily restricts access
to memory that it needs?

> Signed-off-by: Filippo Sironi <sironi@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>          struct page *page;
>
>          npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
> -        /* Check if the user is doing something meaningless. */
> -        if (unlikely(npages != 1))
> -                return -EFAULT;
> -
> -        table = kmap_atomic(page);
> -        ret = CMPXCHG(&table[index], orig_pte, new_pte);
> -        kunmap_atomic(table);
> -
> -        kvm_release_page_dirty(page);
> +        if (likely(npages == 1)) {
> +                table = kmap_atomic(page);
> +                ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +                kunmap_atomic(table);
> +
> +                kvm_release_page_dirty(page);
> +        } else {
> +                struct vm_area_struct *vma;
> +                unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
> +                unsigned long pfn;
> +                unsigned long paddr;
> +
> +                down_read(&current->mm->mmap_sem);
> +                vma = find_vma_intersection(current->mm, vaddr,
> +                                            vaddr + PAGE_SIZE);

Hm, with the argument order like this, we check that

  vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start

but shouldn't we actually check that the whole page is there, i.e.

  vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start

?

Thanks.

> +                if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
> +                        up_read(&current->mm->mmap_sem);
> +                        return -EFAULT;
> +                }
> +                pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +                paddr = pfn << PAGE_SHIFT;
> +                table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);

(I don't understand why there isn't a wrapper for this ...
Looks like we're doing something unexpected.)

> +                if (!table) {
> +                        up_read(&current->mm->mmap_sem);
> +                        return -EFAULT;
> +                }
> +                ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +                memunmap(table);
> +                up_read(&current->mm->mmap_sem);
> +        }
>
>          return (ret != orig_pte);
>  }
> --
> 2.7.4
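For reference, find_vma_intersection() is a thin inline wrapper around
find_vma(); in include/linux/mm.h of that era it reads roughly as follows
(quoted from memory, so treat the exact body as an approximation):

        static inline struct vm_area_struct *
        find_vma_intersection(struct mm_struct *mm,
                              unsigned long start_addr, unsigned long end_addr)
        {
                /* find_vma() returns the first VMA with vm_end > start_addr */
                struct vm_area_struct *vma = find_vma(mm, start_addr);

                /* Keep it only if the range overlaps it,
                 * i.e. end_addr > vma->vm_start.
                 */
                if (vma && end_addr <= vma->vm_start)
                        vma = NULL;
                return vma;
        }

This is where the "vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start"
condition quoted above comes from.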
Thanks for taking the time and sorry for the delay.

> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2017-04-05 15:07+0200, Filippo Sironi:
>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>> pages and the respective struct pages for mapping in the kernel virtual
>> address space.
>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>> when limiting the kernel memory with mem= in the command line and using
>> /dev/mem to map memory).
>>
>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>> virtual address, compute the PFN and the physical address, and map it in
>> the kernel virtual address space with memremap().
>
> What is the reason for a configuration that voluntarily restricts access
> to memory that it needs?

By using /dev/mem to provide VM memory, one can avoid the overhead of
allocating struct page(s) for the whole memory, which is wasteful when
using a server entirely for hosting VMs.

>> Signed-off-by: Filippo Sironi <sironi@amazon.de>
>> Cc: Anthony Liguori <aliguori@amazon.com>
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>>          struct page *page;
>>
>>          npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
>> -        /* Check if the user is doing something meaningless. */
>> -        if (unlikely(npages != 1))
>> -                return -EFAULT;
>> -
>> -        table = kmap_atomic(page);
>> -        ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> -        kunmap_atomic(table);
>> -
>> -        kvm_release_page_dirty(page);
>> +        if (likely(npages == 1)) {
>> +                table = kmap_atomic(page);
>> +                ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> +                kunmap_atomic(table);
>> +
>> +                kvm_release_page_dirty(page);
>> +        } else {
>> +                struct vm_area_struct *vma;
>> +                unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
>> +                unsigned long pfn;
>> +                unsigned long paddr;
>> +
>> +                down_read(&current->mm->mmap_sem);
>> +                vma = find_vma_intersection(current->mm, vaddr,
>> +                                            vaddr + PAGE_SIZE);
>
> Hm, with the argument order like this, we check that
>
>   vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start
>
> but shouldn't we actually check that the whole page is there, i.e.
>
>   vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start
>
> ?
>
> Thanks.

Hm, right now we check for the following:

  vaddr >= vma->vm_start && vaddr < vma->vm_end &&
  vaddr + PAGE_SIZE > vma->vm_start

Given that vaddr is PAGE_SIZE aligned, we're guaranteed that
vaddr + PAGE_SIZE <= vma->vm_end.

This seems more complex than necessary. I believe that:

  vma = find_vma(current->mm, vaddr);

should be enough.

>> +                if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
>> +                        up_read(&current->mm->mmap_sem);
>> +                        return -EFAULT;
>> +                }
>> +                pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> +                paddr = pfn << PAGE_SHIFT;
>> +                table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
>
> (I don't understand why there isn't a wrapper for this ...
> Looks like we're doing something unexpected.)

Do you mean a wrapper for getting the pfn/paddr?

>> +                if (!table) {
>> +                        up_read(&current->mm->mmap_sem);
>> +                        return -EFAULT;
>> +                }
>> +                ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> +                memunmap(table);
>> +                up_read(&current->mm->mmap_sem);
>> +        }
>>
>>          return (ret != orig_pte);
>>  }
>> --
>> 2.7.4

I'll submit a v2 version soon.
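As a sketch of that simplification (hypothetical v2 code, not what was
posted; note that find_vma() only guarantees vaddr < vma->vm_end, so the
vm_start side still needs an explicit check):

        down_read(&current->mm->mmap_sem);
        /* find_vma() returns the first VMA with vm_end > vaddr, which
         * may still start above vaddr; check vm_start explicitly.
         */
        vma = find_vma(current->mm, vaddr);
        if (!vma || vaddr < vma->vm_start || !(vma->vm_flags & VM_PFNMAP)) {
                up_read(&current->mm->mmap_sem);
                return -EFAULT;
        }
        /* vaddr and vm_end are both page-aligned, so vaddr < vm_end
         * already implies vaddr + PAGE_SIZE <= vm_end.
         */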
On 04/12/2017 09:16 PM, Sironi, Filippo wrote:
> Thanks for taking the time and sorry for the delay.
>
>> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
>>
>> 2017-04-05 15:07+0200, Filippo Sironi:
>>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>>> pages and the respective struct pages for mapping in the kernel virtual
>>> address space.
>>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>>> when limiting the kernel memory with mem= in the command line and using
>>> /dev/mem to map memory).
>>>
>>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>>> virtual address, compute the PFN and the physical address, and map it in
>>> the kernel virtual address space with memremap().
>>
>> What is the reason for a configuration that voluntarily restricts access
>> to memory that it needs?
>
> By using /dev/mem to provide VM memory, one can avoid the overhead of
> allocating struct page(s) for the whole memory, which is wasteful when
> using a server entirely for hosting VMs.

Sounds reasonable; however, it is incomplete so far, as there are some
code paths that still do not support non-page backend memory, e.g.,
emulator_cmpxchg_emulated().

I would suggest unifying the code introduced in this patch with the
existing hva_to_pfn(); we could also introduce a common API, maybe named
kvm_map_hva(), to improve the caller sides.

BTW, I do not know why we used kmap_atomic() rather than kmap(); the
path of cmpxchg_gpte() is sleepable anyway.

Thanks!
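To make the suggestion concrete, a rough sketch of what such a helper
could look like (kvm_map_hva() is only a proposed name from this thread;
the signature, locking, and fallback details here are assumptions, not an
existing KVM API):

        /* Hypothetical helper: map one page given its host virtual
         * address, falling back to the VM_PFNMAP path when the memory
         * has no struct page (e.g., /dev/mem beyond mem=).
         * On success, *pagep is non-NULL for the struct-page case
         * (unmap with kunmap(), release with kvm_release_page_dirty())
         * and NULL for the memremap() case (unmap with memunmap()).
         */
        static void *kvm_map_hva(unsigned long hva, struct page **pagep)
        {
                struct vm_area_struct *vma;
                unsigned long vaddr = hva & PAGE_MASK;
                unsigned long paddr;
                struct page *page;
                void *va;

                if (get_user_pages_fast(vaddr, 1, 1, &page) == 1) {
                        *pagep = page;
                        return kmap(page);
                }

                *pagep = NULL;
                down_read(&current->mm->mmap_sem);
                vma = find_vma(current->mm, vaddr);
                if (!vma || vaddr < vma->vm_start ||
                    !(vma->vm_flags & VM_PFNMAP)) {
                        up_read(&current->mm->mmap_sem);
                        return NULL;
                }
                paddr = (((vaddr - vma->vm_start) >> PAGE_SHIFT) +
                         vma->vm_pgoff) << PAGE_SHIFT;
                va = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
                up_read(&current->mm->mmap_sem);
                return va;
        }

A matching unmap helper would pick kunmap()/kvm_release_page_dirty() or
memunmap() based on whether *pagep was set, so callers like cmpxchg_gpte()
and emulator_cmpxchg_emulated() would no longer open-code the two paths.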
On 17/04/2017 13:26, Xiao Guangrong wrote:
> Sounds reasonable; however, it is incomplete so far, as there are some
> code paths that still do not support non-page backend memory, e.g.,
> emulator_cmpxchg_emulated().
>
> I would suggest unifying the code introduced in this patch with the
> existing hva_to_pfn(); we could also introduce a common API, maybe named
> kvm_map_hva(), to improve the caller sides.

I agree with Guangrong's suggestion.

Paolo

> BTW, I do not know why we used kmap_atomic() rather than kmap(); the
> path of cmpxchg_gpte() is sleepable anyway.
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..ab4d6617238c 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
         struct page *page;

         npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
-        /* Check if the user is doing something meaningless. */
-        if (unlikely(npages != 1))
-                return -EFAULT;
-
-        table = kmap_atomic(page);
-        ret = CMPXCHG(&table[index], orig_pte, new_pte);
-        kunmap_atomic(table);
-
-        kvm_release_page_dirty(page);
+        if (likely(npages == 1)) {
+                table = kmap_atomic(page);
+                ret = CMPXCHG(&table[index], orig_pte, new_pte);
+                kunmap_atomic(table);
+
+                kvm_release_page_dirty(page);
+        } else {
+                struct vm_area_struct *vma;
+                unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
+                unsigned long pfn;
+                unsigned long paddr;
+
+                down_read(&current->mm->mmap_sem);
+                vma = find_vma_intersection(current->mm, vaddr,
+                                            vaddr + PAGE_SIZE);
+                if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
+                        up_read(&current->mm->mmap_sem);
+                        return -EFAULT;
+                }
+                pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+                paddr = pfn << PAGE_SHIFT;
+                table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
+                if (!table) {
+                        up_read(&current->mm->mmap_sem);
+                        return -EFAULT;
+                }
+                ret = CMPXCHG(&table[index], orig_pte, new_pte);
+                memunmap(table);
+                up_read(&current->mm->mmap_sem);
+        }

         return (ret != orig_pte);
 }
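The pfn computation relies on the convention that, for a VM_PFNMAP mapping
created with remap_pfn_range() (as /dev/mem mappings are), vma->vm_pgoff
holds the base PFN of the mapping. A small worked example (all numbers
made up, PAGE_SHIFT = 12):

        /* Assume the /dev/mem mapping:
         *   vma->vm_start = 0x7f0000000000
         *   vma->vm_pgoff = 0x100000        base PFN -> phys 0x100000000 (4 GiB)
         * and a guest page table at:
         *   vaddr         = 0x7f0000002000  vm_start + 2 pages
         *
         * Then:
         *   pfn   = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff
         *         = 2 + 0x100000 = 0x100002
         *   paddr = pfn << PAGE_SHIFT = 0x100002000
         * which is the physical address handed to memremap().
         */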
cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
pages and the respective struct pages for mapping in the kernel virtual
address space.
This doesn't work if get_user_pages_fast() is invoked with a userspace
virtual address that's backed by PFNs outside of kernel reach (e.g.,
when limiting the kernel memory with mem= in the command line and using
/dev/mem to map memory).

If get_user_pages_fast() fails, look up the VMA that backs the userspace
virtual address, compute the PFN and the physical address, and map it in
the kernel virtual address space with memremap().

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/paging_tmpl.h | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)