Message ID: 20201229042125.2663029-1-lixinhai.lxh@gmail.com (mailing list archive)
State: New, archived
Series: mm/hugetlb.c: fix unnecessary address expansion of pmd sharing
On Tue, 29 Dec 2020 12:21:25 +0800 Li Xinhai <lixinhai.lxh@gmail.com> wrote:

> The current code would unnecessarily expand the address range. Consider
> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
> sharing.
>
> After this patch, if pud aligned *start across vm_start, then we know the
> *start and vm_start are in same pud_index, and vm_start is not pud
> aligned, so don't adjust *start. Same logic applied to *end.

Thanks. What are the user-visible runtime effects of this issue?
On 12/28/20 8:21 PM, Li Xinhai wrote:
> The current code would unnecessarily expand the address range. Consider
> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
> sharing.
>
> After this patch, if pud aligned *start across vm_start, then we know the
> *start and vm_start are in same pud_index, and vm_start is not pud
> aligned, so don't adjust *start. Same logic applied to *end.
>
> Fixes: commit 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>

Thank you. That does indeed fix an issue in the current code.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Andrew,
You asked about user-visible runtime effects. The 'adjusted range' is used
for calls to mmu notifiers and cache(tlb) flushing. Since the current code
unnecessarily expands the range in some cases, more entries than necessary
would be flushed. This would/could result in performance degradation.
However, this is highly dependent on the user runtime. Is there a
combination of vma layout and calls to actually hit this issue? If the
issue is hit, will those entries unnecessarily flushed be used again and
need to be unnecessarily reloaded?
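For reference, a stand-alone user-space sketch (not kernel code; it assumes
PUD_SIZE = 1G, as on x86_64 with 4K pages) of how much extra range the old
intersection logic would flush in the commit-message example:

#include <stdio.h>

#define MiB (1UL << 20)
#define GiB (1UL << 30)

int main(void)
{
        unsigned long start = 1 * GiB - 2 * MiB, end = 3 * GiB + 2 * MiB;
        unsigned long vm_start = 1 * GiB - 4 * MiB, vm_end = 3 * GiB + 4 * MiB;

        /* In this example the old code expands (start, end) all the way to
         * (vm_start, vm_end), so the extra span on each side is flushed for
         * nothing.
         */
        unsigned long extra = (start - vm_start) + (vm_end - end);

        printf("extra range flushed: %lu MiB (%lu x 2M huge page entries)\n",
               extra / MiB, extra / (2 * MiB));
        return 0;
}

This prints 4 MiB, i.e. two extra 2M huge page entries invalidated beyond
what the caller asked for.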
On Tue, Dec 29, 2020 at 12:21:25PM +0800, Li Xinhai wrote:
> The current code would unnecessarily expand the address range. Consider
> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
> sharing.
>
> After this patch, if pud aligned *start across vm_start, then we know the
> *start and vm_start are in same pud_index, and vm_start is not pud
> aligned, so don't adjust *start. Same logic applied to *end.
>
> Fixes: commit 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> ---
>  mm/hugetlb.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index cbf32d2824fd..d1e9ea55b7e6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5249,11 +5249,16 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
>          a_end = ALIGN(*end, PUD_SIZE);
>  
>          /*
> -         * Intersect the range with the vma range, since pmd sharing won't be
> -         * across vma after all
> +         * If the PUD aligned address across vma range, then it means the
> +         * vm_start/vm_end is not PUD aligned. In that case, we must don't
> +         * adjust range because pmd sharing is not possbile at the start and/or
> +         * end part of vma.
>           */
> -        *start = max(vma->vm_start, a_start);
> -        *end = min(vma->vm_end, a_end);
> +        if (a_start >= vma->vm_start)
> +                *start = a_start;
> +
> +        if (a_end <= vma->vm_end)
> +                *end = a_end;
>  }

Looks correct, thanks.

Reviewed-by: Peter Xu <peterx@redhat.com>
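As a cross-check, here is a stand-alone user-space sketch (not kernel code;
it assumes PUD_SIZE = 1G and redefines the ALIGN helpers locally) that runs
both the old intersection logic and the patched conditional logic on the
example from the commit message:

#include <stdio.h>

#define MiB               (1UL << 20)
#define GiB               (1UL << 30)
#define PUD_SIZE          GiB
#define ALIGN_DOWN(x, a)  ((x) & ~((a) - 1))
#define ALIGN(x, a)       (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long vm_start = 1 * GiB - 4 * MiB, vm_end = 3 * GiB + 4 * MiB;
        unsigned long start = 1 * GiB - 2 * MiB, end = 3 * GiB + 2 * MiB;
        unsigned long a_start = ALIGN_DOWN(start, PUD_SIZE);    /* 0G */
        unsigned long a_end = ALIGN(end, PUD_SIZE);             /* 4G */

        /* Old code: clamp the PUD-aligned range to the vma boundaries. */
        unsigned long old_start = a_start > vm_start ? a_start : vm_start;
        unsigned long old_end = a_end < vm_end ? a_end : vm_end;

        /* Patched code: only take the aligned value if it stays in the vma. */
        unsigned long new_start = a_start >= vm_start ? a_start : start;
        unsigned long new_end = a_end <= vm_end ? a_end : end;

        printf("old: (%lu MiB, %lu MiB)\n", old_start / MiB, old_end / MiB);
        printf("new: (%lu MiB, %lu MiB)\n", new_start / MiB, new_end / MiB);
        return 0;
}

The old code prints (1020 MiB, 3076 MiB), i.e. it expands to (1G-4M, 3G+4M);
the patched code prints (1022 MiB, 3074 MiB), i.e. it leaves (1G-2M, 3G+2M)
untouched.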
On 12/29/20 1:20 PM, Mike Kravetz wrote:
> On 12/28/20 8:21 PM, Li Xinhai wrote:
>> The current code would unnecessarily expand the address range. Consider
>> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
>> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
>> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
>> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
>> sharing.
>>
>> After this patch, if pud aligned *start across vm_start, then we know the
>> *start and vm_start are in same pud_index, and vm_start is not pud
>> aligned, so don't adjust *start. Same logic applied to *end.
>>
>> Fixes: commit 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>
> Thank you. That does indeed fix an issue in the current code.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Upon further thought, this patch also expands the passed range when not
necessary. Consider the example (start, end) = (1G-6M, 1G-4M), and
(vm_start, vm_end) = (1G, 1G-2M). This patch would adjust the range to
(1G, 1G-4M). However, no adjustment should be performed as no sharing
is possible.

Below is proposed code to address the issue. I'm not sending a formal
patch yet as I would like comments on the code first. It is not a critical
issue and any fix can wait a bit.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e89f31d7ef8..41ccec617f74 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5264,16 +5264,19 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
         if (!(vma->vm_flags & VM_MAYSHARE))
                 return;
 
-        /* Extend the range to be PUD aligned for a worst case scenario */
-        a_start = ALIGN_DOWN(*start, PUD_SIZE);
-        a_end = ALIGN(*end, PUD_SIZE);
-
         /*
-         * Intersect the range with the vma range, since pmd sharing won't be
-         * across vma after all
+         * Check if start and end are within a PUD aligned range of the
+         * vma. If they are, then adjust to PUD alignment.
          */
-        *start = max(vma->vm_start, a_start);
-        *end = min(vma->vm_end, a_end);
+        a_start = ALIGN_DOWN(*start, PUD_SIZE);
+        a_end = ALIGN(*start, PUD_SIZE);
+        if (range_in_vma(vma, a_start, a_end))
+                *start = a_start;
+
+        a_start = ALIGN_DOWN(*end, PUD_SIZE);
+        a_end = ALIGN(*end, PUD_SIZE);
+        if (range_in_vma(vma, a_start, a_end))
+                *end = a_end;
 }
 
 /*
On 1/1/21 1:56 AM, Mike Kravetz wrote:
> On 12/29/20 1:20 PM, Mike Kravetz wrote:
>> On 12/28/20 8:21 PM, Li Xinhai wrote:
>>> The current code would unnecessarily expand the address range. Consider
>>> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
>>> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
>>> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
>>> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
>>> sharing.
>>>
>>> After this patch, if pud aligned *start across vm_start, then we know the
>>> *start and vm_start are in same pud_index, and vm_start is not pud
>>> aligned, so don't adjust *start. Same logic applied to *end.
>>>
>>> Fixes: commit 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> Cc: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>
>> Thank you. That does indeed fix an issue in the current code.
>>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Upon further thought, this patch also expands the passed range when not
> necessary. Consider the example (start, end) = (1G-6M, 1G-4M), and
> (vm_start, vm_end) = (1G, 1G-2M). This patch would adjust the range to
> (1G, 1G-4M). However, no adjustment should be performed as no sharing
> is possible.
>
correct, my previous patch did not fully fix the issue.

Above example maybe typo for vm_start, vm_end. The issue didn't fixed by my
patch would be with another example, (vm_start, vm_end) = (1G-8M, 1G+2M),
(start, end) = (1G-6M, 1G-4M), end should not be adjusted to 1G, although
after adjust it still below vm_end.

> Below is proposed code to address the issue. I'm not sending a formal
> patch yet as I would like comments on the code first. It is not a critical
> issue and any fix can wait a bit.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7e89f31d7ef8..41ccec617f74 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5264,16 +5264,19 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
>          if (!(vma->vm_flags & VM_MAYSHARE))
>                  return;
>  
> -        /* Extend the range to be PUD aligned for a worst case scenario */
> -        a_start = ALIGN_DOWN(*start, PUD_SIZE);
> -        a_end = ALIGN(*end, PUD_SIZE);
> -
>          /*
> -         * Intersect the range with the vma range, since pmd sharing won't be
> -         * across vma after all
> +         * Check if start and end are within a PUD aligned range of the
> +         * vma. If they are, then adjust to PUD alignment.
>           */
> -        *start = max(vma->vm_start, a_start);
> -        *end = min(vma->vm_end, a_end);
> +        a_start = ALIGN_DOWN(*start, PUD_SIZE);
> +        a_end = ALIGN(*start, PUD_SIZE);
> +        if (range_in_vma(vma, a_start, a_end))
> +                *start = a_start;
> +
> +        a_start = ALIGN_DOWN(*end, PUD_SIZE);
> +        a_end = ALIGN(*end, PUD_SIZE);
> +        if (range_in_vma(vma, a_start, a_end))
> +                *end = a_end;
>  }
>  
>  /*

Now, this fully fixed the issue.

One thing to be sure is that the (start, end) as input parameter must
already within vma's range, although the range_in_vma test() can cover the
out of range cases.

Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>
On 1/2/21 3:56 AM, Li Xinhai wrote:
>
>
> On 1/1/21 1:56 AM, Mike Kravetz wrote:
>> On 12/29/20 1:20 PM, Mike Kravetz wrote:
>>> On 12/28/20 8:21 PM, Li Xinhai wrote:
>>>> The current code would unnecessarily expand the address range. Consider
>>>> one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) =
>>>> (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M)
>>>> without expand. But the current result will be (1G-4M, 3G+4M). Actually,
>>>> the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd
>>>> sharing.
>>>>
>>>> After this patch, if pud aligned *start across vm_start, then we know the
>>>> *start and vm_start are in same pud_index, and vm_start is not pud
>>>> aligned, so don't adjust *start. Same logic applied to *end.
>>>>
>>>> Fixes: commit 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>
>>> Thank you. That does indeed fix an issue in the current code.
>>>
>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Upon further thought, this patch also expands the passed range when not
>> necessary. Consider the example (start, end) = (1G-6M, 1G-4M), and
>> (vm_start, vm_end) = (1G, 1G-2M). This patch would adjust the range to
>> (1G, 1G-4M). However, no adjustment should be performed as no sharing
>> is possible.
>>
> correct, my previous patch did not fully fix the issue.
>
> Above example maybe typo for vm_start, vm_end. The issue didn't fixed by my
> patch would be with another example, (vm_start, vm_end) = (1G-8M, 1G+2M),
> (start, end) = (1G-6M, 1G-4M), end should not be adjusted to 1G, although
> after adjust it still below vm_end.
>

Sorry, I did incorrectly write that example. It should have read:
Consider the example (start, end) = (2G-6M, 2G-4M), and (vm_start, vm_end) =
(2G, 2G-2M). This patch would adjust the range to (2G, 2G-4M). However, no
adjustment should be performed as no sharing is possible.

>> Below is proposed code to address the issue. I'm not sending a formal
>> patch yet as I would like comments on the code first. It is not a critical
>> issue and any fix can wait a bit.
> Now, this fully fixed the issue.
>
> One thing to be sure is that the (start, end) as input parameter must
> already within vma's range, although the range_in_vma test() can cover the
> out of range cases.
>
> Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>

Thanks for taking a look.

I believe the only case where your patch produced incorrect results is
when the range was within a vma that was smaller than PUD_SIZE. Do you
agree?

If that is the case, then how about just adding the following to your patch?
I think this is simpler and faster than the 'range_in_vma' checking I proposed.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 49990c0a02a3..716d1e58a7ae 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5261,7 +5261,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 {
         unsigned long a_start, a_end;
 
-        if (!(vma->vm_flags & VM_MAYSHARE))
+        /* Quick check for vma capable of pmd sharing */
+        if (!(vma->vm_flags & VM_MAYSHARE) ||
+            (vma->vm_start - vma->vm_end) < PUD_SIZE)
                 return;
 
         /* Extend the range to be PUD aligned for a worst case scenario */
On 1/4/21 11:55 AM, Mike Kravetz wrote:
> I believe the only case where your patch produced incorrect results is
> when the range was within a vma that was smaller than PUD_SIZE. Do you
> agree?
>
Not exactly. We need to consider for vma which span at least one
PUD_SIZE after align its vm_start and vm_end.

> If that is the case, then how about just adding the following to your patch?
> I think this is simpler and faster than the 'range_in_vma' checking I proposed.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 49990c0a02a3..716d1e58a7ae 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5261,7 +5261,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
>  {
>          unsigned long a_start, a_end;
>  
> -        if (!(vma->vm_flags & VM_MAYSHARE))
> +        /* Quick check for vma capable of pmd sharing */
> +        if (!(vma->vm_flags & VM_MAYSHARE) ||
> +            (vma->vm_start - vma->vm_end) < PUD_SIZE)
>                  return;
>  
>          /* Extend the range to be PUD aligned for a worst case scenario */

Your suggestion is good, we are able to simplify the code a bit, with less
comparison, and maybe easier to follow. I will send a new patch to review.
On 1/3/21 11:10 PM, Li Xinhai wrote:
>
>
> On 1/4/21 11:55 AM, Mike Kravetz wrote:
>> I believe the only case where your patch produced incorrect results is
>> when the range was within a vma that was smaller than PUD_SIZE. Do you
>> agree?
>>
> Not exactly. We need to consider for vma which span at least one
> PUD_SIZE after align its vm_start and vm_end.
>

I know that I provided an incorrect example again. Sorry (again)!

Can you provide an example where adding the simple check for vma size less
than PUD_SIZE to your original patch will not work. The logic in your V2
patch is correct. However, I am having a hard time finding a problem with
this simpler approach.
On 1/5/21 2:59 AM, Mike Kravetz wrote:
> On 1/3/21 11:10 PM, Li Xinhai wrote:
>>
>>
>> On 1/4/21 11:55 AM, Mike Kravetz wrote:
>>> I believe the only case where your patch produced incorrect results is
>>> when the range was within a vma that was smaller than PUD_SIZE. Do you
>>> agree?
>>>
>> Not exactly. We need to consider for vma which span at least one
>> PUD_SIZE after align its vm_start and vm_end.
>>
>
> I know that I provided an incorrect example again. Sorry (again)!
>
> Can you provide an example where adding the simple check for vma size less
> than PUD_SIZE to your original patch will not work. The logic in your V2
> patch is correct. However, I am having a hard time finding a problem with
> this simpler approach.
>

Thanks for checking. An example like this:
(vm_start, vm_end) = (2G - 6M, 2G+8M), so this vma is bigger than
PUD_SIZE, and the check for vma size bigger than PUD_SIZE will pass.

With (start, end) = (2G-4M, 2G-2M), the previous patch will not adjust
start(because adjust it from (2G-4M) to 1G will exceeding vm_start), but
it will adjust end from (2G-2M) to 2G (because 2G still below vm_end).

The adjustment of end is incorrect, because (2G-6M, 2G) range of vma
is not allowed for PMD sharing(i.e., that range do not fully occupy
PUD_SIZE). If we make the wrong adjustment, then we will unnecessarily
impact to range (2G-2M, 2G).
On 1/5/21 10:10 AM, Li Xinhai wrote:
>
>
> On 1/5/21 2:59 AM, Mike Kravetz wrote:
>> On 1/3/21 11:10 PM, Li Xinhai wrote:
>>>
>>>
>>> On 1/4/21 11:55 AM, Mike Kravetz wrote:
>>>> I believe the only case where your patch produced incorrect results is
>>>> when the range was within a vma that was smaller than PUD_SIZE. Do you
>>>> agree?
>>>>
>>> Not exactly. We need to consider for vma which span at least one
>>> PUD_SIZE after align its vm_start and vm_end.
>>>
>>
>> I know that I provided an incorrect example again. Sorry (again)!
>>
>> Can you provide an example where adding the simple check for vma size less
>> than PUD_SIZE to your original patch will not work. The logic in your V2
>> patch is correct. However, I am having a hard time finding a problem with
>> this simpler approach.
>>
>
>
> Thanks for checking. An example like this:
> (vm_start, vm_end) = (2G - 6M, 2G+8M), so this vma is bigger than
> PUD_SIZE, and the check for vma size bigger than PUD_SIZE will pass.
>
> With (start, end) = (2G-4M, 2G-2M), the previous patch will not adjust
> start(because adjust it from (2G-4M) to 1G will exceeding vm_start), but
> it will adjust end from (2G-2M) to 2G (because 2G still below vm_end).
>
> The adjustment of end is incorrect, because (2G-6M, 2G) range of vma
> is not allowed for PMD sharing(i.e., that range do not fully occupy
> PUD_SIZE). If we make the wrong adjustment, then we will unnecessarily
> impact to range (2G-2M, 2G).
>
>
sorry, the above example is not correct for vma size, update it as
below:

(vm_start, vm_end) = (2G - 600M, 2G+800M), so this vma is bigger than
PUD_SIZE, and the check for vma size bigger than PUD_SIZE will pass.

With (start, end) = (2G-4M, 2G-2M), the previous patch will not adjust
start(because adjust it from (2G-4M) to 1G will exceeding vm_start), but
it will adjust end from (2G-2M) to 2G (because 2G still below vm_end).

The adjustment of end is incorrect, because (2G-600M, 2G) range of vma
is not allowed for PMD sharing(i.e., that range do not fully occupy
PUD_SIZE). If we make the wrong adjustment, then we will unnecessarily
impact to range (2G-2M, 2G).
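The arithmetic of this corrected counter-example can be checked with a small
stand-alone user-space sketch (not kernel code; it assumes PUD_SIZE = 1G and
redefines ALIGN locally):

#include <stdio.h>

#define MiB          (1UL << 20)
#define GiB          (1UL << 30)
#define PUD_SIZE     GiB
#define ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long vm_start = 2 * GiB - 600 * MiB, vm_end = 2 * GiB + 800 * MiB;
        unsigned long end = 2 * GiB - 2 * MiB;

        /* The proposed quick check passes: this vma is larger than PUD_SIZE. */
        printf("vma size: %lu MiB (>= 1024 MiB)\n", (vm_end - vm_start) / MiB);

        /* The v1 patch logic still aligns *end up to 2G because 2G <= vm_end,
         * even though the PUD [1G, 2G) is not fully covered by the vma, so no
         * pmd sharing is possible there and the adjustment is unnecessary.
         */
        unsigned long a_end = ALIGN(end, PUD_SIZE);
        if (a_end <= vm_end)
                end = a_end;
        printf("end after v1 adjustment: %lu MiB (should stay at 2046 MiB)\n",
               end / MiB);
        return 0;
}

This prints a vma size of 1400 MiB (so the size check passes) and an end of
2048 MiB, i.e. the end is still wrongly rounded up to 2G.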
On 1/4/21 6:38 PM, Li Xinhai wrote:
>
>
> On 1/5/21 10:10 AM, Li Xinhai wrote:
>>
>>
>> On 1/5/21 2:59 AM, Mike Kravetz wrote:
>>> On 1/3/21 11:10 PM, Li Xinhai wrote:
>>>>
>>>>
>>>> On 1/4/21 11:55 AM, Mike Kravetz wrote:
>>>>> I believe the only case where your patch produced incorrect results is
>>>>> when the range was within a vma that was smaller than PUD_SIZE. Do you
>>>>> agree?
>>>>>
>>>> Not exactly. We need to consider for vma which span at least one
>>>> PUD_SIZE after align its vm_start and vm_end.
>>>>
>>>
>>> I know that I provided an incorrect example again. Sorry (again)!
>>>
>>> Can you provide an example where adding the simple check for vma size less
>>> than PUD_SIZE to your original patch will not work. The logic in your V2
>>> patch is correct. However, I am having a hard time finding a problem with
>>> this simpler approach.
>>>
>>
>>
>> Thanks for checking. An example like this:
>> (vm_start, vm_end) = (2G - 6M, 2G+8M), so this vma is bigger than
>> PUD_SIZE, and the check for vma size bigger than PUD_SIZE will pass.
>>
>> With (start, end) = (2G-4M, 2G-2M), the previous patch will not adjust
>> start(because adjust it from (2G-4M) to 1G will exceeding vm_start), but
>> it will adjust end from (2G-2M) to 2G (because 2G still below vm_end).
>>
>> The adjustment of end is incorrect, because (2G-6M, 2G) range of vma
>> is not allowed for PMD sharing(i.e., that range do not fully occupy
>> PUD_SIZE). If we make the wrong adjustment, then we will unnecessarily
>> impact to range (2G-2M, 2G).
>>
>>
> sorry, the above example is not correct for vma size, update it as
> below:
>
> (vm_start, vm_end) = (2G - 600M, 2G+800M), so this vma is bigger than
> PUD_SIZE, and the check for vma size bigger than PUD_SIZE will pass.
>
> With (start, end) = (2G-4M, 2G-2M), the previous patch will not adjust
> start(because adjust it from (2G-4M) to 1G will exceeding vm_start), but
> it will adjust end from (2G-2M) to 2G (because 2G still below vm_end).
>
> The adjustment of end is incorrect, because (2G-600M, 2G) range of vma
> is not allowed for PMD sharing(i.e., that range do not fully occupy
> PUD_SIZE). If we make the wrong adjustment, then we will unnecessarily
> impact to range (2G-2M, 2G).

Thank you for the example! This routine is conceptually simple, but for some
reason there have been multiple problems in the implementation.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cbf32d2824fd..d1e9ea55b7e6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5249,11 +5249,16 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
         a_end = ALIGN(*end, PUD_SIZE);
 
         /*
-         * Intersect the range with the vma range, since pmd sharing won't be
-         * across vma after all
+         * If the PUD aligned address across vma range, then it means the
+         * vm_start/vm_end is not PUD aligned. In that case, we must don't
+         * adjust range because pmd sharing is not possbile at the start and/or
+         * end part of vma.
          */
-        *start = max(vma->vm_start, a_start);
-        *end = min(vma->vm_end, a_end);
+        if (a_start >= vma->vm_start)
+                *start = a_start;
+
+        if (a_end <= vma->vm_end)
+                *end = a_end;
 }
 
 /*
The current code would unnecessarily expand the address range. Consider
one example: with (start, end) = (1G-2M, 3G+2M) and (vm_start, vm_end) =
(1G-4M, 3G+4M), the expected adjustment is to keep (1G-2M, 3G+2M) without
expansion, but the current result is (1G-4M, 3G+4M). The ranges (1G-4M, 1G)
and (3G, 3G+4M) could never be involved in pmd sharing.

After this patch, if the pud-aligned *start crosses vm_start, then we know
*start and vm_start are in the same pud_index and vm_start is not pud
aligned, so *start is not adjusted. The same logic applies to *end.

Fixes: 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible")
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
---
 mm/hugetlb.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)