Message ID | 20180920092408.9128-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: Recheck page table entry with page table lock held |
On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
> We clear the pte temporarily during read/modify/write update of the pte. If we
> take a page fault while the pte is cleared, the application can get SIGBUS. One
> such case is with remap_pfn_range without a backing vm_ops->fault callback.
> do_fault will return SIGBUS in that case.

It would be nice to show the path that clears pte temporarily.

> Fix this by taking page table lock and rechecking for pte_none.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/memory.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c467102a5cbc..c2f933184303 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>  	struct vm_area_struct *vma = vmf->vma;
>  	vm_fault_t ret;
>
> -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
> -	if (!vma->vm_ops->fault)
> -		ret = VM_FAULT_SIGBUS;
> -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
> +	/*
> +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
> +	 */
> +	if (!vma->vm_ops->fault) {
> +
> +		/*
> +		 * pmd entries won't be marked none during a R/M/W cycle.
> +		 */
> +		if (unlikely(pmd_none(*vmf->pmd)))
> +			ret = VM_FAULT_SIGBUS;
> +		else {
> +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> +			/*
> +			 * Make sure this is not a temporary clearing of pte
> +			 * by holding ptl and checking again. A R/M/W update
> +			 * of pte involves: take ptl, clearing the pte so that
> +			 * we don't have concurrent modification by hardware
> +			 * followed by an update.
> +			 */
> +			spin_lock(vmf->ptl);
> +			if (unlikely(pte_none(*vmf->pte)))
> +				ret = VM_FAULT_SIGBUS;
> +			else
> +				ret = VM_FAULT_NOPAGE;

We return 0 if we did nothing in fault path.
On 9/20/18 4:35 PM, Kirill A. Shutemov wrote:
> On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
>> We clear the pte temporarily during read/modify/write update of the pte. If we
>> take a page fault while the pte is cleared, the application can get SIGBUS. One
>> such case is with remap_pfn_range without a backing vm_ops->fault callback.
>> do_fault will return SIGBUS in that case.
>
> It would be nice to show the path that clears pte temporarily.
>
>> Fix this by taking page table lock and rechecking for pte_none.

We do that in ptep_modify_prot_start/ptep_modify_prot_commit, and also in
hugetlb_change_protection. The hugetlb case may not be relevant because
that cannot be backed by a vma without vma->vm_ops.

What will hit this is an mprotect of a remap_pfn_range address?

>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  mm/memory.c | 31 +++++++++++++++++++++++++++----
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index c467102a5cbc..c2f933184303 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>>  	struct vm_area_struct *vma = vmf->vma;
>>  	vm_fault_t ret;
>>
>> -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
>> -	if (!vma->vm_ops->fault)
>> -		ret = VM_FAULT_SIGBUS;
>> -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
>> +	/*
>> +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
>> +	 */
>> +	if (!vma->vm_ops->fault) {
>> +
>> +		/*
>> +		 * pmd entries won't be marked none during a R/M/W cycle.
>> +		 */
>> +		if (unlikely(pmd_none(*vmf->pmd)))
>> +			ret = VM_FAULT_SIGBUS;
>> +		else {
>> +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +			/*
>> +			 * Make sure this is not a temporary clearing of pte
>> +			 * by holding ptl and checking again. A R/M/W update
>> +			 * of pte involves: take ptl, clearing the pte so that
>> +			 * we don't have concurrent modification by hardware
>> +			 * followed by an update.
>> +			 */
>> +			spin_lock(vmf->ptl);
>> +			if (unlikely(pte_none(*vmf->pte)))
>> +				ret = VM_FAULT_SIGBUS;
>> +			else
>> +				ret = VM_FAULT_NOPAGE;
>
> We return 0 if we did nothing in fault path.
>

I didn't get that. If we find the pte not none, we return so that we retry
the access. Are you suggesting VM_FAULT_NOPAGE is not the right return for
that?

-aneesh
On Thu, Sep 20, 2018 at 04:41:59PM +0530, Aneesh Kumar K.V wrote:
> On 9/20/18 4:35 PM, Kirill A. Shutemov wrote:
> > On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
> > > We clear the pte temporarily during read/modify/write update of the pte. If we
> > > take a page fault while the pte is cleared, the application can get SIGBUS. One
> > > such case is with remap_pfn_range without a backing vm_ops->fault callback.
> > > do_fault will return SIGBUS in that case.
> >
> > It would be nice to show the path that clears pte temporarily.
> >
> > > Fix this by taking page table lock and rechecking for pte_none.
>
> We do that in ptep_modify_prot_start/ptep_modify_prot_commit, and also in
> hugetlb_change_protection. The hugetlb case may not be relevant because
> that cannot be backed by a vma without vma->vm_ops.
>
> What will hit this is an mprotect of a remap_pfn_range address?

Sounds right. Please update commit message.

> > >
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >  mm/memory.c | 31 +++++++++++++++++++++++++++----
> > >  1 file changed, 27 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index c467102a5cbc..c2f933184303 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
> > >  	struct vm_area_struct *vma = vmf->vma;
> > >  	vm_fault_t ret;
> > > -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
> > > -	if (!vma->vm_ops->fault)
> > > -		ret = VM_FAULT_SIGBUS;
> > > -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
> > > +	/*
> > > +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
> > > +	 */
> > > +	if (!vma->vm_ops->fault) {
> > > +
> > > +		/*
> > > +		 * pmd entries won't be marked none during a R/M/W cycle.
> > > +		 */
> > > +		if (unlikely(pmd_none(*vmf->pmd)))
> > > +			ret = VM_FAULT_SIGBUS;
> > > +		else {
> > > +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> > > +			/*
> > > +			 * Make sure this is not a temporary clearing of pte
> > > +			 * by holding ptl and checking again. A R/M/W update
> > > +			 * of pte involves: take ptl, clearing the pte so that
> > > +			 * we don't have concurrent modification by hardware
> > > +			 * followed by an update.
> > > +			 */
> > > +			spin_lock(vmf->ptl);
> > > +			if (unlikely(pte_none(*vmf->pte)))
> > > +				ret = VM_FAULT_SIGBUS;
> > > +			else
> > > +				ret = VM_FAULT_NOPAGE;
> >
> > We return 0 if we did nothing in fault path.
> >
>
> I didn't get that. If we find the pte not none, we return so that we retry
> the access. Are you suggesting VM_FAULT_NOPAGE is not the right return for
> that?

We usually use VM_FAULT_NOPAGE to indicate that ->fault() installed the pte
and we don't need to do anything. We don't touch the pte in this page fault.

It doesn't make a difference in this particular case; nobody up the stack
cares. Just a nitpick.
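The nitpick above can be illustrated with a small standalone sketch. It assumes, as a simplification, that the caller only tests a set of error bits in the returned value. Every name and bit value below is hypothetical and is not copied from any kernel header; the point is only that returning 0 and returning a "no page needed" status look the same to such a caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for fault return bits; values are illustrative. */
typedef unsigned int toy_vm_fault_t;

#define TOY_FAULT_SIGBUS 0x0002u
#define TOY_FAULT_NOPAGE 0x0100u
/* Assumed set of bits that make the caller deliver a signal. */
#define TOY_FAULT_ERROR  (TOY_FAULT_SIGBUS)

/* In this model the "nothing to do" status must never count as an error. */
static_assert((TOY_FAULT_NOPAGE & TOY_FAULT_ERROR) == 0,
	      "NOPAGE must not overlap the error bits");

/* Models a caller that only inspects the error bits of the result. */
static bool toy_caller_raises_signal(toy_vm_fault_t ret)
{
	return (ret & TOY_FAULT_ERROR) != 0;
}
```

Under that assumption, `toy_caller_raises_signal(0)` and `toy_caller_raises_signal(TOY_FAULT_NOPAGE)` are both false, so the choice between the two is a matter of convention rather than behaviour.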
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..c2f933184303 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
-	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
-	if (!vma->vm_ops->fault)
-		ret = VM_FAULT_SIGBUS;
-	else if (!(vmf->flags & FAULT_FLAG_WRITE))
+	/*
+	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
+	 */
+	if (!vma->vm_ops->fault) {
+
+		/*
+		 * pmd entries won't be marked none during a R/M/W cycle.
+		 */
+		if (unlikely(pmd_none(*vmf->pmd)))
+			ret = VM_FAULT_SIGBUS;
+		else {
+			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+			/*
+			 * Make sure this is not a temporary clearing of pte
+			 * by holding ptl and checking again. A R/M/W update
+			 * of pte involves: take ptl, clearing the pte so that
+			 * we don't have concurrent modification by hardware
+			 * followed by an update.
+			 */
+			spin_lock(vmf->ptl);
+			if (unlikely(pte_none(*vmf->pte)))
+				ret = VM_FAULT_SIGBUS;
+			else
+				ret = VM_FAULT_NOPAGE;
+			spin_unlock(vmf->ptl);
+		}
+	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
 		ret = do_read_fault(vmf);
 	else if (!(vma->vm_flags & VM_SHARED))
 		ret = do_cow_fault(vmf);
We clear the pte temporarily during read/modify/write update of the pte. If we
take a page fault while the pte is cleared, the application can get SIGBUS. One
such case is with remap_pfn_range without a backing vm_ops->fault callback.
do_fault will return SIGBUS in that case.

Fix this by taking page table lock and rechecking for pte_none.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)
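
The race described in the commit message can be sketched as a single-threaded toy model. This is not kernel code: the `toy_*` names, the struct layout, and the convention that a zero entry means "none" are all assumptions made for illustration. Waiting on the lock is modeled by letting the in-flight update finish first, which is what a real spinning waiter would observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
enum toy_fault { TOY_FAULT_RETRY, TOY_FAULT_SIGBUS };

struct toy_pte_slot {
	uint64_t pte;      /* 0 models an entry for which pte_none() is true */
	uint64_t pending;  /* value the in-flight update will install */
	bool ptl_held;     /* models the page table lock */
};

/* Start of a R/M/W update: take the lock and clear the entry. */
static void toy_update_begin(struct toy_pte_slot *s, uint64_t new_pte)
{
	assert(!s->ptl_held);
	s->ptl_held = true;
	s->pending = new_pte;
	s->pte = 0;
}

/* End of a R/M/W update: install the new value and drop the lock. */
static void toy_update_finish(struct toy_pte_slot *s)
{
	assert(s->ptl_held);
	s->pte = s->pending;
	s->ptl_held = false;
}

/* Stand-in for taking the lock: a waiter only proceeds after the update. */
static void toy_lock(struct toy_pte_slot *s)
{
	if (s->ptl_held)
		toy_update_finish(s);
	s->ptl_held = true;
}

static void toy_unlock(struct toy_pte_slot *s)
{
	assert(s->ptl_held);
	s->ptl_held = false;
}

/* Behaviour being fixed: decide from a lockless look at the entry. */
static enum toy_fault toy_fault_unlocked(const struct toy_pte_slot *s)
{
	return s->pte == 0 ? TOY_FAULT_SIGBUS : TOY_FAULT_RETRY;
}

/* Behaviour after the fix: take the lock, then look again. */
static enum toy_fault toy_fault_rechecked(struct toy_pte_slot *s)
{
	enum toy_fault ret;

	toy_lock(s);
	ret = s->pte == 0 ? TOY_FAULT_SIGBUS : TOY_FAULT_RETRY;
	toy_unlock(s);
	return ret;
}
```

In this model, a fault that arrives between `toy_update_begin()` and `toy_update_finish()` gets `TOY_FAULT_SIGBUS` from the lockless check but `TOY_FAULT_RETRY` from the rechecked one, while an entry that is empty with no update in flight still yields `TOY_FAULT_SIGBUS` either way.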