[2/2] mm/page_vma_mapped: page table boundary is already guaranteed

Message ID 20191128010321.21730-2-richardw.yang@linux.intel.com (mailing list archive)
State New, archived
Series [1/2] mm/page_vma_mapped: use PMD_SIZE instead of calculating it

Commit Message

Wei Yang Nov. 28, 2019, 1:03 a.m. UTC
The check here guarantees that the pvmw->address iteration stays within one
page table boundary. Specifically, the address range should stay within
one PMD_SIZE.

If my understanding is correct, this is already ensured by the check
above:

    address >= __vma_address(page, vma) + PMD_SIZE

so the boundary check here seems unnecessary.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

---
Test:
   A kernel build test running for more than 48 hours never reached this code.
---
 mm/page_vma_mapped.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

Comments

Kirill A. Shutemov Nov. 28, 2019, 8:31 a.m. UTC | #1
On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
> The check here is to guarantee pvmw->address iteration is limited in one
> page table boundary. To be specific, here the address range should be in
> one PMD_SIZE.
> 
> If my understanding is correct, this check is already done in the above
> check:
> 
>     address >= __vma_address(page, vma) + PMD_SIZE
> 
> The boundary check here seems not necessary.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

NAK.

THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().

> Test:
>    more than 48 hours kernel build test shows this code is not touched.

Not an argument. I doubt mremap(2) is ever called in a kernel build
workload.
Wei Yang Nov. 28, 2019, 9:09 p.m. UTC | #2
On Thu, Nov 28, 2019 at 11:31:43AM +0300, Kirill A. Shutemov wrote:
>On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
>> The check here is to guarantee pvmw->address iteration is limited in one
>> page table boundary. To be specific, here the address range should be in
>> one PMD_SIZE.
>> 
>> If my understanding is correct, this check is already done in the above
>> check:
>> 
>>     address >= __vma_address(page, vma) + PMD_SIZE
>> 
>> The boundary check here seems not necessary.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>
>NAK.
>
>THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().
>

Hi, Kirill

Thanks for your comment during Thanksgiving Day. Happy holiday :-)

I hadn't thought about this case before, thanks for the reminder. I then
tried to understand your concern.

mremap() can expand or shrink a memory mapping. In this case, shrinking is
probably the concern. Since pvmw->page and pvmw->vma do not change in the
loop, the case you mention may be that pvmw->page is the head of a THP, but
part of the THP is unmapped.

This means the following conditions hold:

    vma->vm_start <= vma_address(page)
    vma->vm_end <=   vma_address(page) + page_size(page)

Since we have already checked address against vm_end, do you think this case
is also guarded?

I am not sure my understanding is correct and look forward to your comments.

>> Test:
>>    more than 48 hours kernel build test shows this code is not touched.
>
>Not an argument. I doubt mremap(2) is ever called in kernel build
>workload.
>
>-- 
> Kirill A. Shutemov
Matthew Wilcox (Oracle) Nov. 28, 2019, 10:39 p.m. UTC | #3
On Thu, Nov 28, 2019 at 09:09:45PM +0000, Wei Yang wrote:
> On Thu, Nov 28, 2019 at 11:31:43AM +0300, Kirill A. Shutemov wrote:
> >On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
> >> The check here is to guarantee pvmw->address iteration is limited in one
> >> page table boundary. To be specific, here the address range should be in
> >> one PMD_SIZE.
> >> 
> >> If my understanding is correct, this check is already done in the above
> >> check:
> >> 
> >>     address >= __vma_address(page, vma) + PMD_SIZE
> >> 
> >> The boundary check here seems not necessary.
> >> 
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >
> >NAK.
> >
> >THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().
> >
> 
> Hi, Kirill
> 
> Thanks for your comment during Thanks Giving Day. Happy holiday:-)
> 
> I didn't think about this case before, thanks for reminding. Then I tried to
> understand your concern.
> 
> mremap() would expand/shrink a memory mapping. In this case, probably shrink
> is in concern. Since pvmw->page and pvmw->vma are not changed in the loop, the
> case you mentioned maybe pvmw->page is the head of a THP but part of it is
> unmapped.

mremap() can also move a mapping, see MREMAP_FIXED.

> This means the following condition stands:
> 
>     vma->vm_start <= vma_address(page) 
>     vma->vm_end <=   vma_address(page) + page_size(page)
> 
> Since we have checked address with vm_end, do you think this case is also
> guarded?
> 
> Not sure my understanding is correct, look forward your comments.
> 
> >> Test:
> >>    more than 48 hours kernel build test shows this code is not touched.
> >
> >Not an argument. I doubt mremap(2) is ever called in kernel build
> >workload.
> >
> >-- 
> > Kirill A. Shutemov
> 
> -- 
> Wei Yang
> Help you, Help me
>
Wei Yang Nov. 29, 2019, 8:30 a.m. UTC | #4
On Thu, Nov 28, 2019 at 02:39:04PM -0800, Matthew Wilcox wrote:
>On Thu, Nov 28, 2019 at 09:09:45PM +0000, Wei Yang wrote:
>> On Thu, Nov 28, 2019 at 11:31:43AM +0300, Kirill A. Shutemov wrote:
>> >On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
>> >> The check here is to guarantee pvmw->address iteration is limited in one
>> >> page table boundary. To be specific, here the address range should be in
>> >> one PMD_SIZE.
>> >> 
>> >> If my understanding is correct, this check is already done in the above
>> >> check:
>> >> 
>> >>     address >= __vma_address(page, vma) + PMD_SIZE
>> >> 
>> >> The boundary check here seems not necessary.
>> >> 
>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >
>> >NAK.
>> >
>> >THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().
>> >
>> 
>> Hi, Kirill
>> 
>> Thanks for your comment during Thanks Giving Day. Happy holiday:-)
>> 
>> I didn't think about this case before, thanks for reminding. Then I tried to
>> understand your concern.
>> 
>> mremap() would expand/shrink a memory mapping. In this case, probably shrink
>> is in concern. Since pvmw->page and pvmw->vma are not changed in the loop, the
>> case you mentioned maybe pvmw->page is the head of a THP but part of it is
>> unmapped.
>
>mremap() can also move a mapping, see MREMAP_FIXED.

Hi, Matthew

Thanks for your comment.

I took a look at the MREMAP_FIXED case, but it is still not clear to me in
which case it leads to the situation Kirill mentioned.

As I understand it, moving a mapping is done in two steps:

    * unmap some range in the old vma if old_len >= new_len
    * move the vma

If the length doesn't change, we expect to get a "copy" of the old vma. This
doesn't change the THP PMD mapping.

So the change still happens in the unmap step, if I am correct.

Would you mind giving me more of a hint about when we would end up in the
situation Kirill mentioned?

>
>> This means the following condition stands:
>> 
>>     vma->vm_start <= vma_address(page) 
>>     vma->vm_end <=   vma_address(page) + page_size(page)
>> 
>> Since we have checked address with vm_end, do you think this case is also
>> guarded?
>> 
>> Not sure my understanding is correct, look forward your comments.
>> 
>> >> Test:
>> >>    more than 48 hours kernel build test shows this code is not touched.
>> >
>> >Not an argument. I doubt mremap(2) is ever called in kernel build
>> >workload.
>> >
>> >-- 
>> > Kirill A. Shutemov
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>>
Matthew Wilcox (Oracle) Nov. 29, 2019, 11:18 a.m. UTC | #5
On Fri, Nov 29, 2019 at 04:30:02PM +0800, Wei Yang wrote:
> On Thu, Nov 28, 2019 at 02:39:04PM -0800, Matthew Wilcox wrote:
> >On Thu, Nov 28, 2019 at 09:09:45PM +0000, Wei Yang wrote:
> >> On Thu, Nov 28, 2019 at 11:31:43AM +0300, Kirill A. Shutemov wrote:
> >> >On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
> >> >> The check here is to guarantee pvmw->address iteration is limited in one
> >> >> page table boundary. To be specific, here the address range should be in
> >> >> one PMD_SIZE.
> >> >> 
> >> >> If my understanding is correct, this check is already done in the above
> >> >> check:
> >> >> 
> >> >>     address >= __vma_address(page, vma) + PMD_SIZE
> >> >> 
> >> >> The boundary check here seems not necessary.
> >> >> 
> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >
> >> >NAK.
> >> >
> >> >THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().
> >> >
> >> 
> >> Hi, Kirill
> >> 
> >> Thanks for your comment during Thanks Giving Day. Happy holiday:-)
> >> 
> >> I didn't think about this case before, thanks for reminding. Then I tried to
> >> understand your concern.
> >> 
> >> mremap() would expand/shrink a memory mapping. In this case, probably shrink
> >> is in concern. Since pvmw->page and pvmw->vma are not changed in the loop, the
> >> case you mentioned maybe pvmw->page is the head of a THP but part of it is
> >> unmapped.
> >
> >mremap() can also move a mapping, see MREMAP_FIXED.
> 
> Hi, Matthew
> 
> Thanks for your comment.
> 
> I took a look into the MREMAP_FIXED case, but still not clear in which case it
> fall into the situation Kirill mentioned.
> 
> Per my understanding, move mapping is achieved in two steps:
> 
>     * unmap some range in old vma if old_len >= new_len
>     * move vma
> 
> If the length doesn't change, we are expecting to have the "copy" of old
> vma. This doesn't change the THP PMD mapping.
> 
> So the change still happens in the unmap step, if I am correct.
> 
> Would you mind giving me more hint on the case when we would have the
> situation as Kirill mentioned?

Set up a THP mapping.
Move it to an address which is no longer 2MB aligned.
Unmap it.
Wei Yang Dec. 2, 2019, 6:53 a.m. UTC | #6
On Fri, Nov 29, 2019 at 03:18:01AM -0800, Matthew Wilcox wrote:
>On Fri, Nov 29, 2019 at 04:30:02PM +0800, Wei Yang wrote:
>> On Thu, Nov 28, 2019 at 02:39:04PM -0800, Matthew Wilcox wrote:
>> >On Thu, Nov 28, 2019 at 09:09:45PM +0000, Wei Yang wrote:
>> >> On Thu, Nov 28, 2019 at 11:31:43AM +0300, Kirill A. Shutemov wrote:
>> >> >On Thu, Nov 28, 2019 at 09:03:21AM +0800, Wei Yang wrote:
>> >> >> The check here is to guarantee pvmw->address iteration is limited in one
>> >> >> page table boundary. To be specific, here the address range should be in
>> >> >> one PMD_SIZE.
>> >> >> 
>> >> >> If my understanding is correct, this check is already done in the above
>> >> >> check:
>> >> >> 
>> >> >>     address >= __vma_address(page, vma) + PMD_SIZE
>> >> >> 
>> >> >> The boundary check here seems not necessary.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >
>> >> >NAK.
>> >> >
>> >> >THP can be mapped with PTE not aligned to PMD_SIZE. Consider mremap().
>> >> >
>> >> 
>> >> Hi, Kirill
>> >> 
>> >> Thanks for your comment during Thanks Giving Day. Happy holiday:-)
>> >> 
>> >> I didn't think about this case before, thanks for reminding. Then I tried to
>> >> understand your concern.
>> >> 
>> >> mremap() would expand/shrink a memory mapping. In this case, probably shrink
>> >> is in concern. Since pvmw->page and pvmw->vma are not changed in the loop, the
>> >> case you mentioned maybe pvmw->page is the head of a THP but part of it is
>> >> unmapped.
>> >
>> >mremap() can also move a mapping, see MREMAP_FIXED.
>> 
>> Hi, Matthew
>> 
>> Thanks for your comment.
>> 
>> I took a look into the MREMAP_FIXED case, but still not clear in which case it
>> fall into the situation Kirill mentioned.
>> 
>> Per my understanding, move mapping is achieved in two steps:
>> 
>>     * unmap some range in old vma if old_len >= new_len
>>     * move vma
>> 
>> If the length doesn't change, we are expecting to have the "copy" of old
>> vma. This doesn't change the THP PMD mapping.
>> 
>> So the change still happens in the unmap step, if I am correct.
>> 
>> Would you mind giving me more hint on the case when we would have the
>> situation as Kirill mentioned?
>
>Set up a THP mapping.
>Move it to an address which is no longer 2MB aligned.
>Unmap it.

Thanks Matthew

I got the point, thanks a lot :-)

Patch

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 76e03650a3ab..25aada8a1271 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -163,7 +163,6 @@  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			return not_found(pvmw);
 		return true;
 	}
-restart:
 	pgd = pgd_offset(mm, pvmw->address);
 	if (!pgd_present(*pgd))
 		return false;
@@ -225,17 +224,7 @@  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 					__vma_address(pvmw->page, pvmw->vma) +
 					PMD_SIZE)
 				return not_found(pvmw);
-			/* Did we cross page table boundary? */
-			if (pvmw->address % PMD_SIZE == 0) {
-				pte_unmap(pvmw->pte);
-				if (pvmw->ptl) {
-					spin_unlock(pvmw->ptl);
-					pvmw->ptl = NULL;
-				}
-				goto restart;
-			} else {
-				pvmw->pte++;
-			}
+			pvmw->pte++;
 		} while (pte_none(*pvmw->pte));
 
 		if (!pvmw->ptl) {