Message ID | 20180926031858.9692-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | [V2] mm: Recheck page table entry with page table lock held |
On Wed, Sep 26, 2018 at 08:48:58AM +0530, Aneesh Kumar K.V wrote:
> We clear the pte temporarily during read/modify/write update of the pte. If we
> take a page fault while the pte is cleared, the application can get SIGBUS. One
> such case is with remap_pfn_range without a backing vm_ops->fault callback.
> do_fault will return SIGBUS in that case.
>
> cpu 0                                   cpu1
> mprotect()
> ptep_modify_prot_start()/pte cleared.
> .
> .                                       page fault.
> .
> .
> ptep_modify_prot_commit()
>
> Fix this by taking page table lock and rechecking for pte_none.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> V1:
> * update commit message.

You chose to stick with VM_FAULT_NOPAGE, that's fine.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Should it be in stable?
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote on Wed, Sep 26, 2018 at 11:19 AM:
> We clear the pte temporarily during read/modify/write update of the pte.
> If we take a page fault while the pte is cleared, the application can get
> SIGBUS. One such case is with remap_pfn_range without a backing
> vm_ops->fault callback. do_fault will return SIGBUS in that case.

What is "remap_pfn_range without a backing vm_ops->fault callback"? Could you elaborate on the scenario? Is it the case of using remap_pfn_range() in a driver's mmap() file operation? If so, why would it trap into do_fault?

> cpu 0                                   cpu1
> mprotect()
> ptep_modify_prot_start()/pte cleared.
> .
> .                                       page fault.
> .
> .
> ptep_modify_prot_commit()

I am confused by this scenario. When CPU0 calls change_pte_range()->ptep_modify_prot_start() to clear the pte content, on the other thread, in handle_pte_fault(), pte_offset_map() can still get the pte. The pte is not invalid; it is valid, but its content is all zero. So why would it call into do_fault?

In handle_pte_fault():
vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
if (!vmf->pte) {
	return do_fault(vmf);
}

> Fix this by taking page table lock and rechecking for pte_none.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> V1:
> * update commit message.
On Fri, Oct 25, 2019 at 11:13:58AM +0800, Figo.zhang wrote:
> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote on Wed, Sep 26, 2018 at 11:19 AM:
>
> > We clear the pte temporarily during read/modify/write update of the pte.
> > If we take a page fault while the pte is cleared, the application can get
> > SIGBUS. One such case is with remap_pfn_range without a backing
> > vm_ops->fault callback. do_fault will return SIGBUS in that case.
>
> What is "remap_pfn_range without a backing vm_ops->fault callback"? Could
> you elaborate on the scenario? Is it the case of using remap_pfn_range() in
> a driver's mmap() file operation? If so, why would it trap into do_fault?

Because there's no page mapped there during the race.

> > cpu 0                                   cpu1
> > mprotect()
> > ptep_modify_prot_start()/pte cleared.
> > .
> > .                                       page fault.
> > .
> > .
> > ptep_modify_prot_commit()
>
> I am confused by this scenario. When CPU0 calls
> change_pte_range()->ptep_modify_prot_start() to clear the pte content, on
> the other thread, in handle_pte_fault(), pte_offset_map() can still get the
> pte. The pte is not invalid; it is valid, but its content is all zero. So
> why would it call into do_fault?
>
> In handle_pte_fault():
> vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> if (!vmf->pte) {
> 	return do_fault(vmf);
> }

This case handles the situation when the pte is none (cleared) or the page table is not allocated at all.
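Kirill's answer can be illustrated with a small userspace model (an illustrative assumption, not kernel code): the fault path only asks whether there is a usable entry, so a page table that was never allocated and a slot that is zero, even temporarily, both fall into the same "no pte" branch. The type `model_pte_t`, the `fault_path` enum and `model_handle_pte_fault()` are hypothetical names, and treating 0 as "none" is a simplification of pte_none().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an entry value of 0 plays the role of a "none" pte. */
typedef uint64_t model_pte_t;

enum fault_path {
	PATH_DO_ANONYMOUS_PAGE, /* anonymous VMA, no usable pte */
	PATH_DO_FAULT,          /* file/driver VMA, no usable pte */
	PATH_PTE_PRESENT,       /* an entry exists; other handlers, not modeled */
};

/*
 * Loosely modeled on the shape of handle_pte_fault(): "no pte" covers both
 * a page table that was never allocated (table == NULL) and a slot whose
 * content is none (zero). A slot cleared for a read/modify/write update is
 * indistinguishable from the second case at this point.
 */
static enum fault_path model_handle_pte_fault(const model_pte_t *table,
					      size_t index,
					      bool vma_is_anonymous)
{
	const model_pte_t *pte = NULL;

	if (table != NULL && table[index] != 0)
		pte = &table[index];

	if (pte == NULL)
		return vma_is_anonymous ? PATH_DO_ANONYMOUS_PAGE : PATH_DO_FAULT;

	assert(*pte != 0);
	return PATH_PTE_PRESENT;
}
```

For example, `model_handle_pte_fault(NULL, 0, false)` and a lookup of a zeroed slot in a non-anonymous VMA both return `PATH_DO_FAULT`, which is why a VMA set up with remap_pfn_range() and no ->fault handler ends up in do_fault() during the race.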
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..c2f933184303 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
-	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
-	if (!vma->vm_ops->fault)
-		ret = VM_FAULT_SIGBUS;
-	else if (!(vmf->flags & FAULT_FLAG_WRITE))
+	/*
+	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
+	 */
+	if (!vma->vm_ops->fault) {
+
+		/*
+		 * pmd entries won't be marked none during a R/M/W cycle.
+		 */
+		if (unlikely(pmd_none(*vmf->pmd)))
+			ret = VM_FAULT_SIGBUS;
+		else {
+			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+			/*
+			 * Make sure this is not a temporary clearing of pte
+			 * by holding ptl and checking again. A R/M/W update
+			 * of pte involves: take ptl, clearing the pte so that
+			 * we don't have concurrent modification by hardware
+			 * followed by an update.
+			 */
+			spin_lock(vmf->ptl);
+			if (unlikely(pte_none(*vmf->pte)))
+				ret = VM_FAULT_SIGBUS;
+			else
+				ret = VM_FAULT_NOPAGE;
+			spin_unlock(vmf->ptl);
+		}
+	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
 		ret = do_read_fault(vmf);
 	else if (!(vma->vm_flags & VM_SHARED))
 		ret = do_cow_fault(vmf);
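A minimal userspace sketch of the decision made by the new `!vma->vm_ops->fault` branch, with a pthread mutex standing in for the pte lock. All names here (`struct model_pt`, `model_do_fault_no_handler()`, the `MODEL_VM_FAULT_*` values) are hypothetical, and a zero entry stands in for pte_none(). The property being illustrated is that the entry is read only after the lock is taken, so an updater that cleared it under the same lock has already committed the new value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

enum model_vm_fault {
	MODEL_VM_FAULT_SIGBUS,
	MODEL_VM_FAULT_NOPAGE, /* nothing installed here; the access is retried */
};

/* Hypothetical stand-in for one page table and its page table lock. */
struct model_pt {
	pthread_mutex_t ptl;
	bool pmd_none;   /* page table not allocated at all */
	model_pte_t pte; /* 0 == none */
};

/* Models the !vma->vm_ops->fault branch of the patched do_fault(). */
static enum model_vm_fault model_do_fault_no_handler(struct model_pt *pt)
{
	enum model_vm_fault ret;

	assert(pt != NULL);

	/* pmd entries are not cleared during a pte R/M/W cycle. */
	if (pt->pmd_none)
		return MODEL_VM_FAULT_SIGBUS;

	pthread_mutex_lock(&pt->ptl);
	/*
	 * Updaters clear and rewrite the entry while holding this same lock,
	 * so once we hold it no R/M/W update can be half-done.
	 */
	ret = (pt->pte == 0) ? MODEL_VM_FAULT_SIGBUS : MODEL_VM_FAULT_NOPAGE;
	pthread_mutex_unlock(&pt->ptl);
	return ret;
}
```

Returning the NOPAGE value for a populated entry mirrors the patch's choice of VM_FAULT_NOPAGE: the handler installs nothing, and the faulting access is simply retried against the committed pte. SIGBUS is kept only for an entry that is still none with the lock held.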
We clear the pte temporarily during read/modify/write update of the pte. If we
take a page fault while the pte is cleared, the application can get SIGBUS. One
such case is with remap_pfn_range without a backing vm_ops->fault callback.
do_fault will return SIGBUS in that case.

cpu 0                                   cpu1
mprotect()
ptep_modify_prot_start()/pte cleared.
.
.                                       page fault.
.
.
ptep_modify_prot_commit()

Fix this by taking page table lock and rechecking for pte_none.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
V1:
* update commit message.

 mm/memory.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)
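The cpu0/cpu1 diagram can be replayed deterministically in userspace with two threads. This is a sketch under stated assumptions: a pthread mutex plays the page table lock, an atomic 64-bit word plays the pte (0 meaning none, with hypothetical "present" and "write" bits), and two POSIX semaphores, which are test scaffolding with no kernel counterpart, force cpu1's lockless look to land inside the window where cpu0 has cleared the entry. `run_race()`, `cpu0_mprotect()` and `cpu1_fault()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one pte and its page table lock. */
struct race_model {
	pthread_mutex_t ptl;
	_Atomic uint64_t pte; /* 0 == none; bit 0 "present", bit 1 "write" */
	sem_t cleared;        /* posted by cpu0 once the pte is cleared */
	sem_t observed;       /* posted by cpu1 after its lockless read */
	bool recheck;         /* cpu1: recheck under the ptl (the fix)? */
	bool sigbus;          /* cpu1's verdict */
};

/* cpu0: mprotect()-style read/modify/write of the pte under the ptl. */
static void *cpu0_mprotect(void *arg)
{
	struct race_model *m = arg;

	pthread_mutex_lock(&m->ptl);
	uint64_t old = atomic_exchange(&m->pte, 0); /* like ..._prot_start() */
	sem_post(&m->cleared);
	sem_wait(&m->observed); /* keep the window open until cpu1 has looked */
	atomic_store(&m->pte, old & ~(uint64_t)0x2); /* commit: drop "write" */
	pthread_mutex_unlock(&m->ptl);
	return NULL;
}

/* cpu1: page fault on the same address. */
static void *cpu1_fault(void *arg)
{
	struct race_model *m = arg;

	sem_wait(&m->cleared);
	bool none = atomic_load(&m->pte) == 0; /* lockless look */
	sem_post(&m->observed);

	if (none && m->recheck) {
		pthread_mutex_lock(&m->ptl); /* blocks until cpu0 commits */
		none = atomic_load(&m->pte) == 0;
		pthread_mutex_unlock(&m->ptl);
	}
	m->sigbus = none; /* otherwise: NOPAGE, retry the access */
	return NULL;
}

/* Runs the race once; returns true if cpu1 would have raised SIGBUS. */
static bool run_race(bool recheck)
{
	struct race_model m = { .recheck = recheck, .sigbus = false };
	pthread_t t0, t1;

	pthread_mutex_init(&m.ptl, NULL);
	atomic_init(&m.pte, 0x1000 | 0x1 | 0x2); /* pfn | present | write */
	sem_init(&m.cleared, 0, 0);
	sem_init(&m.observed, 0, 0);

	pthread_create(&t0, NULL, cpu0_mprotect, &m);
	pthread_create(&t1, NULL, cpu1_fault, &m);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);

	sem_destroy(&m.cleared);
	sem_destroy(&m.observed);
	pthread_mutex_destroy(&m.ptl);
	assert(atomic_load(&m.pte) != 0); /* the commit always lands */
	return m.sigbus;
}
```

With these semaphores the interleaving is fixed: `run_race(false)` returns true, reproducing the SIGBUS described above, while `run_race(true)` returns false because the locked recheck waits for the commit and then sees the populated entry.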