Message ID | 20250205151003.88959-11-ryan.roberts@arm.com (mailing list archive) |
---|---|
State | New |
Series | hugetlb and vmalloc fixes and perf improvements |
On 2/5/25 20:39, Ryan Roberts wrote:
> A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
> pud level. But it is possible to subsquently call vunmap_range() on a

s/subsquently/subsequently

> sub-range of the mapped memory, which partially overlaps a pmd or pud.
> In this case, vmalloc unmaps the entire pmd or pud so that the
> no-overlapping portion is also unmapped. Clearly that would have a bad
> outcome, but it's not something that any callers do today as far as I
> can tell. So I guess it's jsut expected that callers will not do this.

s/jsut/just

>
> However, it would be useful to know if this happened in future; let's
> add a warning to cover the eventuality.

This is a reasonable check to prevent bad outcomes later.

>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/vmalloc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a6e7acebe9ad..fcdf67d5177a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>  		if (cleared || pmd_bad(*pmd))
>  			*mask |= PGTBL_PMD_MODIFIED;
>
> -		if (cleared)
> +		if (cleared) {
> +			WARN_ON(next - addr < PMD_SIZE);
>  			continue;
> +		}
>  		if (pmd_none_or_clear_bad(pmd))
>  			continue;
>  		vunmap_pte_range(pmd, addr, next, mask);
> @@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>  		if (cleared || pud_bad(*pud))
>  			*mask |= PGTBL_PUD_MODIFIED;
>
> -		if (cleared)
> +		if (cleared) {
> +			WARN_ON(next - addr < PUD_SIZE);
>  			continue;
> +		}
>  		if (pud_none_or_clear_bad(pud))
>  			continue;
>  		vunmap_pmd_range(pud, addr, next, mask);

Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
for corresponding P4D and PGD levels as well?
On 07/02/2025 08:41, Anshuman Khandual wrote:
> On 2/5/25 20:39, Ryan Roberts wrote:

[...]

> Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
> for corresponding P4D and PGD levels as well ?

The kernel does not support p4d or pgd leaf entries so there is nothing to check.

Although vunmap_p4d_range() does call p4d_clear_huge(). The function is a stub
and returns void (unlike p[mu]d_clear_huge()). I suspect we could just remove
p4d_clear_huge() entirely. But that would be a separate patch to mm tree I think.

For pgd, there isn't even an equivalent looking function.

Basically at those 2 levels, it's always a table.

Thanks,
Ryan
On 2/7/25 16:29, Ryan Roberts wrote:
> On 07/02/2025 08:41, Anshuman Khandual wrote:

[...]

>> Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
>> for corresponding P4D and PGD levels as well ?
>
> The kernel does not support p4d or pgd leaf entries so there is nothing to check.
>
> Although vunmap_p4d_range() does call p4d_clear_huge(). The function is a stub
> and returns void (unlike p[mu]d_clear_huge()). I suspect we could just remove
> p4d_clear_huge() entirely. But that would be a separate patch to mm tree I think.
>
> For pgd, there isn't even an equivalent looking function.
>
> Basically at those 2 levels, it's always a table.

Understood, thanks!
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a6e7acebe9ad..fcdf67d5177a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		if (cleared || pmd_bad(*pmd))
 			*mask |= PGTBL_PMD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PMD_SIZE);
 			continue;
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
@@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (cleared || pud_bad(*pud))
 			*mask |= PGTBL_PUD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PUD_SIZE);
 			continue;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next, mask);
A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
pud level. But it is possible to subsequently call vunmap_range() on a
sub-range of the mapped memory, which partially overlaps a pmd or pud.
In this case, vmalloc unmaps the entire pmd or pud so that the
non-overlapping portion is also unmapped. Clearly that would have a bad
outcome, but it's not something that any callers do today as far as I
can tell. So I guess it's just expected that callers will not do this.

However, it would be useful to know if this happened in future; let's
add a warning to cover the eventuality.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/vmalloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
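
To make the condition behind the new warning concrete, here is a minimal userspace sketch of the `next - addr < PMD_SIZE` test. It is not kernel code. The `MODEL_*` constants assume a 2 MiB pmd (4 KiB base pages), and `model_pmd_addr_end()` and `count_partial_pmd_steps()` are hypothetical names invented for this illustration. The model also pretends every pmd in the range is a huge leaf, whereas the patch only warns when a huge entry was actually cleared.

```c
#include <assert.h>

/* Assumption: 2 MiB pmd leaves, as with 4 KiB base pages. */
#define MODEL_PMD_SHIFT 21
#define MODEL_PMD_SIZE  (1UL << MODEL_PMD_SHIFT)
#define MODEL_PMD_MASK  (~(MODEL_PMD_SIZE - 1))

/*
 * Same idea as the kernel's pmd_addr_end(): return the next pmd boundary
 * after addr, capped at end.
 */
static unsigned long model_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + MODEL_PMD_SIZE) & MODEL_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Hypothetical helper: walk [addr, end) one pmd at a time and count the
 * steps that cover less than a whole pmd. In this model each such step
 * corresponds to a point where the patched code would hit WARN_ON(),
 * because clearing the huge entry also unmaps memory outside the range.
 */
static unsigned int count_partial_pmd_steps(unsigned long addr, unsigned long end)
{
	unsigned int partial = 0;
	unsigned long next;

	assert(addr < end);

	do {
		next = model_pmd_addr_end(addr, end);
		if (next - addr < MODEL_PMD_SIZE)
			partial++;
	} while (addr = next, addr != end);

	return partial;
}
```

With a 2 MiB aligned base, a range covering exactly two pmds yields no partial steps, while a range that starts one page into a pmd yields one: the step covers `PMD_SIZE - 0x1000` bytes, which is smaller than `PMD_SIZE`.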