Message ID | 1435932447-84377-1-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On 07/03/2015 05:07 PM, Kirill A. Shutemov wrote:
> Reading page fault handler code I've noticed that under right
> circumstances kernel would map anonymous pages into file mappings:
> if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
> populated on ->mmap(), kernel would handle page fault to not populated
> pte with do_anonymous_page().
>
> There's chance that it was done intentionally, but I don't see good
> justification for this. We just hide bugs in broken drivers.
>

Have you done a preliminary audit for these broken drivers? If they actually
exist in-tree then this patch is a regression for them.

We need to look for vm_ops without an .fault = . Perhaps define a
map_annonimous() for those to revert to the old behavior, if any
actually exist.

> Let's change page fault handler to use do_anonymous_page() only on
> anonymous VMA (->vm_ops == NULL).
>
> For file mappings without vm_ops->fault() page fault on pte_none() entry
> would lead to SIGBUS.
>

Again that could mean a theoretical regression for some in-tree driver,
do you know of any such driver?

Thanks
Boaz

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8a2fc9945b46..f3ee782059e3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3115,6 +3115,9 @@ static int do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>
>  	pte_unmap(page_table);
> +
> +	if (unlikely(!vma->vm_ops->fault))
> +		return VM_FAULT_SIGBUS;
>  	if (!(flags & FAULT_FLAG_WRITE))
>  		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
>  				orig_pte);
> @@ -3260,13 +3263,13 @@ static int handle_pte_fault(struct mm_struct *mm,
>  	barrier();
>  	if (!pte_present(entry)) {
>  		if (pte_none(entry)) {
> -			if (vma->vm_ops) {
> -				if (likely(vma->vm_ops->fault))
> -					return do_fault(mm, vma, address, pte,
> -							pmd, flags, entry);
> +			if (!vma->vm_ops) {
> +				return do_anonymous_page(mm, vma, address, pte,
> +						pmd, flags);
> +			} else {
> +				return do_fault(mm, vma, address, pte, pmd,
> +						flags, entry);
>  			}
> -			return do_anonymous_page(mm, vma, address,
> -						 pte, pmd, flags);
>  		}
>  		return do_swap_page(mm, vma, address,
>  					pte, pmd, flags, entry);
>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jul 05, 2015 at 06:15:20PM +0300, Boaz Harrosh wrote:
> On 07/03/2015 05:07 PM, Kirill A. Shutemov wrote:
> > Reading page fault handler code I've noticed that under right
> > circumstances kernel would map anonymous pages into file mappings:
> > if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
> > populated on ->mmap(), kernel would handle page fault to not populated
> > pte with do_anonymous_page().
> >
> > There's chance that it was done intentionally, but I don't see good
> > justification for this. We just hide bugs in broken drivers.
> >
>
> Have you done a preliminary audit for these broken drivers? If they actually
> exist in-tree then this patch is a regression for them.

No, I didn't check drivers.

On the other hand, if such a driver exists it has a security issue. If you're
able to set up the zero page in a file mapping, you can make it writable,
with security implications.

> We need to look for vm_ops without an .fault = . Perhaps define a
> map_annonimous() for those to revert to the old behavior, if any
> actually exist.

No. Drivers should be fixed properly.

> > Let's change page fault handler to use do_anonymous_page() only on
> > anonymous VMA (->vm_ops == NULL).
> >
> > For file mappings without vm_ops->fault() page fault on pte_none() entry
> > would lead to SIGBUS.
> >
>
> Again that could mean a theoretical regression for some in-tree driver,
> do you know of any such driver?

I did very little testing with the patch: boot kvm with Fedora and run
trinity there for a while. More testing is required.
On 07/05/2015 06:44 PM, Kirill A. Shutemov wrote:
>> Again that could mean a theoretical regression for some in-tree driver,
>> do you know of any such driver?
>
> I did very little testing with the patch: boot kvm with Fedora and run
> trinity there for a while. More testing is required.
>

It seems more likely to be a bug in some obscure real HW driver than
anything virtualized.

Let me run a quick search and see if I can see any obvious candidates
for this ...

<arch/x86/kernel/vsyscall_64.c>
static struct vm_operations_struct gate_vma_ops = {
	.name = gate_vma_name,
};

Perhaps it was done for this one
</arch/x86/kernel/vsyscall_64.c>

<arch/x86/mm/mpx.c>
static struct vm_operations_struct mpx_vma_ops = {
	.name = mpx_mapping_name,
};

Or this
</arch/x86/mm/mpx.c>

<more>
static const struct vm_operations_struct pci_mmap_ops = {
static const struct vm_operations_struct mmap_mem_ops = {
...
</more>

I was looking in-tree for any vm_operations_struct declaration without a
.fault member. There are the ones above and a slew of HW drivers that only
have an .open and .close, so those might populate at open time and never
actually fault.

Please have a quick look, I did not. I agree about the possible security
badness.

Thanks
Boaz
diff --git a/mm/memory.c b/mm/memory.c
index 8a2fc9945b46..f3ee782059e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3115,6 +3115,9 @@ static int do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
 	pte_unmap(page_table);
+
+	if (unlikely(!vma->vm_ops->fault))
+		return VM_FAULT_SIGBUS;
 	if (!(flags & FAULT_FLAG_WRITE))
 		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
 				orig_pte);
@@ -3260,13 +3263,13 @@ static int handle_pte_fault(struct mm_struct *mm,
 	barrier();
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops) {
-				if (likely(vma->vm_ops->fault))
-					return do_fault(mm, vma, address, pte,
-							pmd, flags, entry);
+			if (!vma->vm_ops) {
+				return do_anonymous_page(mm, vma, address, pte,
+						pmd, flags);
+			} else {
+				return do_fault(mm, vma, address, pte, pmd,
+						flags, entry);
 			}
-			return do_anonymous_page(mm, vma, address,
-						 pte, pmd, flags);
 		}
 		return do_swap_page(mm, vma, address,
 					pte, pmd, flags, entry);
Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings:
if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
populated on ->mmap(), kernel would handle page fault to not populated
pte with do_anonymous_page().

There's chance that it was done intentionally, but I don't see good
justification for this. We just hide bugs in broken drivers.

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL).

For file mappings without vm_ops->fault() page fault on pte_none() entry
would lead to SIGBUS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
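
To make the behaviour change easier to see, here is a rough user-space sketch of the dispatch decision for a pte_none() fault, before and after the patch. It is not kernel code: struct model_vma, struct model_vm_ops, enum handler, dispatch_before(), dispatch_after() and model_fault() are all hypothetical names invented for this illustration, and the model ignores everything except whether vm_ops and vm_ops->fault are set.

```c
#include <stddef.h>

/* Hypothetical model types; they only mimic the kernel's field names. */
enum handler {
	HANDLER_ANON,    /* stands in for do_anonymous_page() */
	HANDLER_FAULT,   /* stands in for the read/write paths of do_fault() */
	HANDLER_SIGBUS   /* stands in for returning VM_FAULT_SIGBUS */
};

struct model_vm_ops {
	int (*fault)(void);  /* NULL when the driver provides no ->fault() */
};

struct model_vma {
	const struct model_vm_ops *vm_ops;  /* NULL for an anonymous VMA */
};

/* Placeholder fault callback so a "proper" file mapping can be modelled. */
int model_fault(void)
{
	return 0;
}

/* Before the patch: anything without a usable ->fault() falls through
 * to the anonymous path, including file mappings. */
enum handler dispatch_before(const struct model_vma *vma)
{
	if (vma->vm_ops && vma->vm_ops->fault)
		return HANDLER_FAULT;
	return HANDLER_ANON;
}

/* After the patch: only vm_ops == NULL takes the anonymous path; a file
 * mapping without ->fault() gets SIGBUS from the do_fault() check. */
enum handler dispatch_after(const struct model_vma *vma)
{
	if (!vma->vm_ops)
		return HANDLER_ANON;
	if (!vma->vm_ops->fault)
		return HANDLER_SIGBUS;
	return HANDLER_FAULT;
}
```

In this model the only case whose result differs is a VMA with vm_ops set but no fault callback, which is the case the thread is asking to have audited in-tree.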