Message ID | 1639ac32194f2b2590852f410fd3ce3595eb730b.1730360798.git.zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | synchronously scan and reclaim empty user PTE pages | expand |
On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote: > This commit introduces do_zap_pte_range() to actually zap the PTEs, which > will help improve code readability and facilitate secondary checking of > the processed PTEs in the future. > > No functional change. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Jann Horn <jannh@google.com>
On 31.10.24 09:13, Qi Zheng wrote: > This commit introduces do_zap_pte_range() to actually zap the PTEs, which > will help improve code readability and facilitate secondary checking of > the processed PTEs in the future. > > No functional change. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > mm/memory.c | 45 ++++++++++++++++++++++++++------------------- > 1 file changed, 26 insertions(+), 19 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index bd9ebe0f4471f..c1150e62dd073 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, > return nr; > } > > +static inline int do_zap_pte_range(struct mmu_gather *tlb, > + struct vm_area_struct *vma, pte_t *pte, > + unsigned long addr, unsigned long end, > + struct zap_details *details, int *rss, > + bool *force_flush, bool *force_break) > +{ > + pte_t ptent = ptep_get(pte); > + int max_nr = (end - addr) / PAGE_SIZE; > + > + if (pte_none(ptent)) > + return 1; Maybe we should just skip all applicable pte_none() here directly. Acked-by: David Hildenbrand <david@redhat.com>
On 2024/11/13 01:00, David Hildenbrand wrote: > On 31.10.24 09:13, Qi Zheng wrote: >> This commit introduces do_zap_pte_range() to actually zap the PTEs, which >> will help improve code readability and facilitate secondary checking of >> the processed PTEs in the future. >> >> No functional change. >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> mm/memory.c | 45 ++++++++++++++++++++++++++------------------- >> 1 file changed, 26 insertions(+), 19 deletions(-) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index bd9ebe0f4471f..c1150e62dd073 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct >> mmu_gather *tlb, >> return nr; >> } >> +static inline int do_zap_pte_range(struct mmu_gather *tlb, >> + struct vm_area_struct *vma, pte_t *pte, >> + unsigned long addr, unsigned long end, >> + struct zap_details *details, int *rss, >> + bool *force_flush, bool *force_break) >> +{ >> + pte_t ptent = ptep_get(pte); >> + int max_nr = (end - addr) / PAGE_SIZE; >> + >> + if (pte_none(ptent)) >> + return 1; > > Maybe we should just skip all applicable pte_none() here directly. Do you mean we should keep pte_none() case in zap_pte_range()? Like below: diff --git a/mm/memory.c b/mm/memory.c index 002aa4f454fa0..2ccdcf37b7a46 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1666,9 +1666,6 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb, pte_t ptent = ptep_get(pte); int max_nr = (end - addr) / PAGE_SIZE; - if (pte_none(ptent)) - return 1; - if (pte_present(ptent)) return zap_present_ptes(tlb, vma, pte, ptent, max_nr, addr, details, rss, force_flush, @@ -1704,11 +1701,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (need_resched()) break; - nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss, - &force_flush, &force_break); - if (unlikely(force_break)) { - addr += nr * PAGE_SIZE; - break; + if (pte_none(ptep_get(pte))) { + nr = 1; + } else { + nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, + rss, &force_flush, &force_break); + if (unlikely(force_break)) { + addr += nr * PAGE_SIZE; + break; + } } } while (pte += nr, addr += PAGE_SIZE * nr, addr != end); This avoids repeated checks for pte_none() later. Both are fine for me, will change to this in the next version. > > Acked-by: David Hildenbrand <david@redhat.com> Thanks! >
On 13.11.24 03:40, Qi Zheng wrote: > > > On 2024/11/13 01:00, David Hildenbrand wrote: >> On 31.10.24 09:13, Qi Zheng wrote: >>> This commit introduces do_zap_pte_range() to actually zap the PTEs, which >>> will help improve code readability and facilitate secondary checking of >>> the processed PTEs in the future. >>> >>> No functional change. >>> >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> --- >>> mm/memory.c | 45 ++++++++++++++++++++++++++------------------- >>> 1 file changed, 26 insertions(+), 19 deletions(-) >>> >>> diff --git a/mm/memory.c b/mm/memory.c >>> index bd9ebe0f4471f..c1150e62dd073 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct >>> mmu_gather *tlb, >>> return nr; >>> } >>> +static inline int do_zap_pte_range(struct mmu_gather *tlb, >>> + struct vm_area_struct *vma, pte_t *pte, >>> + unsigned long addr, unsigned long end, >>> + struct zap_details *details, int *rss, >>> + bool *force_flush, bool *force_break) >>> +{ >>> + pte_t ptent = ptep_get(pte); >>> + int max_nr = (end - addr) / PAGE_SIZE; >>> + >>> + if (pte_none(ptent)) >>> + return 1; >> >> Maybe we should just skip all applicable pte_none() here directly. > > Do you mean we should keep pte_none() case in zap_pte_range()? Like > below: > No rather an addon patch that will simply skip over all consecutive pte_none, like: if (pte_none(ptent)) { int nr; for (nr = 1; nr < max_nr; nr++) { ptent = ptep_get(pte + nr); if (pte_none(ptent)) continue; } max_nr -= nr; if (!max_nr) return nr; addr += nr * PAGE_SIZE; pte += nr; } Assuming that it's likely more common to have larger pte_none() holes that single ones, optimizing out the need_resched()+force_break+incremental pte/addr increments etc.
On 2024/11/13 19:43, David Hildenbrand wrote: > On 13.11.24 03:40, Qi Zheng wrote: >> >> >> On 2024/11/13 01:00, David Hildenbrand wrote: >>> On 31.10.24 09:13, Qi Zheng wrote: >>>> This commit introduces do_zap_pte_range() to actually zap the PTEs, >>>> which >>>> will help improve code readability and facilitate secondary checking of >>>> the processed PTEs in the future. >>>> >>>> No functional change. >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> --- >>>> mm/memory.c | 45 ++++++++++++++++++++++++++------------------- >>>> 1 file changed, 26 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/mm/memory.c b/mm/memory.c >>>> index bd9ebe0f4471f..c1150e62dd073 100644 >>>> --- a/mm/memory.c >>>> +++ b/mm/memory.c >>>> @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct >>>> mmu_gather *tlb, >>>> return nr; >>>> } >>>> +static inline int do_zap_pte_range(struct mmu_gather *tlb, >>>> + struct vm_area_struct *vma, pte_t *pte, >>>> + unsigned long addr, unsigned long end, >>>> + struct zap_details *details, int *rss, >>>> + bool *force_flush, bool *force_break) >>>> +{ >>>> + pte_t ptent = ptep_get(pte); >>>> + int max_nr = (end - addr) / PAGE_SIZE; >>>> + >>>> + if (pte_none(ptent)) >>>> + return 1; >>> >>> Maybe we should just skip all applicable pte_none() here directly. >> >> Do you mean we should keep pte_none() case in zap_pte_range()? Like >> below: >> > > No rather an addon patch that will simply skip over all > consecutive pte_none, like: > > if (pte_none(ptent)) { > int nr; > > for (nr = 1; nr < max_nr; nr++) { > ptent = ptep_get(pte + nr); > if (pte_none(ptent)) > continue; > } > > max_nr -= nr; > if (!max_nr) > return nr; > addr += nr * PAGE_SIZE; > pte += nr; > } > > Assuming that it's likely more common to have larger pte_none() holes > that single ones, optimizing out the > need_resched()+force_break+incremental pte/addr increments etc. Ah, got it. And I agree with you, will change to it in the next version. Thanks! >
On 2024/11/13 19:43, David Hildenbrand wrote: > On 13.11.24 03:40, Qi Zheng wrote: >> >> >> On 2024/11/13 01:00, David Hildenbrand wrote: >>> On 31.10.24 09:13, Qi Zheng wrote: >>>> This commit introduces do_zap_pte_range() to actually zap the PTEs, >>>> which >>>> will help improve code readability and facilitate secondary checking of >>>> the processed PTEs in the future. >>>> >>>> No functional change. >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> --- >>>> mm/memory.c | 45 ++++++++++++++++++++++++++------------------- >>>> 1 file changed, 26 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/mm/memory.c b/mm/memory.c >>>> index bd9ebe0f4471f..c1150e62dd073 100644 >>>> --- a/mm/memory.c >>>> +++ b/mm/memory.c >>>> @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct >>>> mmu_gather *tlb, >>>> return nr; >>>> } >>>> +static inline int do_zap_pte_range(struct mmu_gather *tlb, >>>> + struct vm_area_struct *vma, pte_t *pte, >>>> + unsigned long addr, unsigned long end, >>>> + struct zap_details *details, int *rss, >>>> + bool *force_flush, bool *force_break) >>>> +{ >>>> + pte_t ptent = ptep_get(pte); >>>> + int max_nr = (end - addr) / PAGE_SIZE; >>>> + >>>> + if (pte_none(ptent)) >>>> + return 1; >>> >>> Maybe we should just skip all applicable pte_none() here directly. >> >> Do you mean we should keep pte_none() case in zap_pte_range()? Like >> below: >> > > No rather an addon patch that will simply skip over all > consecutive pte_none, like: > > if (pte_none(ptent)) { > int nr; > > for (nr = 1; nr < max_nr; nr++) { > ptent = ptep_get(pte + nr); > if (pte_none(ptent)) > continue; > } > > max_nr -= nr; > if (!max_nr) > return nr; > addr += nr * PAGE_SIZE; > pte += nr; > } I tend to hand over the pte/addr increments here to the loop outside do_zap_pte_range(), like this: diff --git a/mm/memory.c b/mm/memory.c index bd9ebe0f4471f..2367a1c19edd6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1657,6 +1657,36 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, return nr; } +static inline int do_zap_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pte_t *pte, + unsigned long addr, unsigned long end, + struct zap_details *details, int *rss, + bool *force_flush, bool *force_break) +{ + pte_t ptent = ptep_get(pte); + int max_nr = (end - addr) / PAGE_SIZE; + + if (pte_none(ptent)) { + int nr = 1; + + for (; nr < max_nr; nr++) { + ptent = ptep_get(pte + nr); + if (!pte_none(ptent)) + break; + } + + return nr; + } + + if (pte_present(ptent)) + return zap_present_ptes(tlb, vma, pte, ptent, max_nr, + addr, details, rss, force_flush, + force_break); + + return zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr, + details, rss); +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1679,28 +1709,14 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { - pte_t ptent = ptep_get(pte); - int max_nr; - - nr = 1; - if (pte_none(ptent)) - continue; - if (need_resched()) break; - max_nr = (end - addr) / PAGE_SIZE; - if (pte_present(ptent)) { - nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr, - addr, details, rss, &force_flush, - &force_break); - if (unlikely(force_break)) { - addr += nr * PAGE_SIZE; - break; - } - } else { - nr = zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, - addr, details, rss); + nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, + rss, &force_flush, &force_break); + if (unlikely(force_break)) { + addr += nr * PAGE_SIZE; + break; } } while (pte += nr, addr += PAGE_SIZE * nr, addr != end); > > Assuming that it's likely more common to have larger pte_none() holes > that single ones, optimizing out the > need_resched()+force_break+incremental pte/addr increments etc. >
On 2024/11/14 11:09, Qi Zheng wrote: > > > On 2024/11/13 19:43, David Hildenbrand wrote: >> On 13.11.24 03:40, Qi Zheng wrote: >>> >>> >>> On 2024/11/13 01:00, David Hildenbrand wrote: >>>> On 31.10.24 09:13, Qi Zheng wrote: >>>>> This commit introduces do_zap_pte_range() to actually zap the PTEs, >>>>> which >>>>> will help improve code readability and facilitate secondary >>>>> checking of >>>>> the processed PTEs in the future. >>>>> >>>>> No functional change. >>>>> >>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>> --- >>>>> mm/memory.c | 45 ++++++++++++++++++++++++++------------------- >>>>> 1 file changed, 26 insertions(+), 19 deletions(-) >>>>> >>>>> diff --git a/mm/memory.c b/mm/memory.c >>>>> index bd9ebe0f4471f..c1150e62dd073 100644 >>>>> --- a/mm/memory.c >>>>> +++ b/mm/memory.c >>>>> @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct >>>>> mmu_gather *tlb, >>>>> return nr; >>>>> } >>>>> +static inline int do_zap_pte_range(struct mmu_gather *tlb, >>>>> + struct vm_area_struct *vma, pte_t *pte, >>>>> + unsigned long addr, unsigned long end, >>>>> + struct zap_details *details, int *rss, >>>>> + bool *force_flush, bool *force_break) >>>>> +{ >>>>> + pte_t ptent = ptep_get(pte); >>>>> + int max_nr = (end - addr) / PAGE_SIZE; >>>>> + >>>>> + if (pte_none(ptent)) >>>>> + return 1; >>>> >>>> Maybe we should just skip all applicable pte_none() here directly. >>> >>> Do you mean we should keep pte_none() case in zap_pte_range()? Like >>> below: >>> >> >> No rather an addon patch that will simply skip over all >> consecutive pte_none, like: >> >> if (pte_none(ptent)) { >> int nr; >> >> for (nr = 1; nr < max_nr; nr++) { >> ptent = ptep_get(pte + nr); >> if (pte_none(ptent)) >> continue; >> } >> >> max_nr -= nr; >> if (!max_nr) >> return nr; >> addr += nr * PAGE_SIZE; >> pte += nr; >> } > > I tend to hand over the pte/addr increments here to the loop > outside do_zap_pte_range(), like this: > Finally, I choose to introduce skip_none_ptes() to do this: diff --git a/mm/memory.c b/mm/memory.c index bd9ebe0f4471f..24633d0e1445a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1657,6 +1657,28 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, return nr; } +static inline int skip_none_ptes(pte_t *pte, unsigned long addr, + unsigned long end) +{ + pte_t ptent = ptep_get(pte); + int max_nr; + int nr; + + if (!pte_none(ptent)) + return 0; + + max_nr = (end - addr) / PAGE_SIZE; + nr = 1; + + for (; nr < max_nr; nr++) { + ptent = ptep_get(pte + nr); + if (!pte_none(ptent)) + break; + } + + return nr; +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, pte_t ptent = ptep_get(pte); int max_nr; - nr = 1; - if (pte_none(ptent)) - continue; - if (need_resched()) break; + nr = skip_none_ptes(pte, addr, end); + if (nr) { + addr += PAGE_SIZE * nr; + if (addr == end) + break; + pte += nr; + } + max_nr = (end - addr) / PAGE_SIZE; if (pte_present(ptent)) { nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr, Thanks!
diff --git a/mm/memory.c b/mm/memory.c index bd9ebe0f4471f..c1150e62dd073 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, return nr; } +static inline int do_zap_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pte_t *pte, + unsigned long addr, unsigned long end, + struct zap_details *details, int *rss, + bool *force_flush, bool *force_break) +{ + pte_t ptent = ptep_get(pte); + int max_nr = (end - addr) / PAGE_SIZE; + + if (pte_none(ptent)) + return 1; + + if (pte_present(ptent)) + return zap_present_ptes(tlb, vma, pte, ptent, max_nr, + addr, details, rss, force_flush, + force_break); + + return zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr, + details, rss); +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1679,28 +1700,14 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { - pte_t ptent = ptep_get(pte); - int max_nr; - - nr = 1; - if (pte_none(ptent)) - continue; - if (need_resched()) break; - max_nr = (end - addr) / PAGE_SIZE; - if (pte_present(ptent)) { - nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr, - addr, details, rss, &force_flush, - &force_break); - if (unlikely(force_break)) { - addr += nr * PAGE_SIZE; - break; - } - } else { - nr = zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, - addr, details, rss); + nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss, + &force_flush, &force_break); + if (unlikely(force_break)) { + addr += nr * PAGE_SIZE; + break; } } while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
This commit introduces do_zap_pte_range() to actually zap the PTEs, which will help improve code readability and facilitate secondary checking of the processed PTEs in the future. No functional change. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- mm/memory.c | 45 ++++++++++++++++++++++++++------------------- 1 file changed, 26 insertions(+), 19 deletions(-)