[v1,10/16] mm/vmalloc: Warn on improper use of vunmap_range()

Message ID 20250205151003.88959-11-ryan.roberts@arm.com
State New
Series hugetlb and vmalloc fixes and perf improvements

Commit Message

Ryan Roberts Feb. 5, 2025, 3:09 p.m. UTC
A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
pud level. But it is possible to subsequently call vunmap_range() on a
sub-range of the mapped memory, which partially overlaps a pmd or pud.
In this case, vmalloc unmaps the entire pmd or pud so that the
non-overlapping portion is also unmapped. Clearly that would have a bad
outcome, but it's not something that any callers do today as far as I
can tell. So I guess it's just expected that callers will not do this.

However, it would be useful to know if this happened in future; let's
add a warning to cover the eventuality.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/vmalloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
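
The condition the new warning tests can be tried out in user space. The following is a minimal sketch, not kernel code: it assumes 4 KiB pages and a 2 MiB PMD, treats every PMD in the range as a huge leaf mapping, and uses hypothetical `sketch_`-prefixed names that do not exist in mm/vmalloc.c. It walks a range the way vunmap_pmd_range() does and counts the entries that would be cleared whole while only partly covered by the request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry for illustration only: 4 KiB pages, 2 MiB PMD entries. */
#define SKETCH_PAGE_SIZE 0x1000UL
#define SKETCH_PMD_SIZE  0x200000UL
#define SKETCH_PMD_MASK  (~(SKETCH_PMD_SIZE - 1))

/*
 * Hypothetical stand-in for the kernel's pmd_addr_end(): returns the end of
 * the PMD containing addr, or end, whichever comes first.
 */
static unsigned long sketch_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;

	/* The "- 1" keeps the comparison correct if boundary wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Walk [addr, end) one PMD at a time, assuming every PMD is a huge mapping
 * (the "cleared" case in the patch). Count the entries for which the check
 * "next - addr < PMD_SIZE" would fire. Requires addr < end.
 */
static size_t sketch_count_partial_huge_unmaps(unsigned long addr,
					       unsigned long end)
{
	size_t warnings = 0;
	unsigned long next;

	do {
		next = sketch_pmd_addr_end(addr, end);
		if (next - addr < SKETCH_PMD_SIZE)
			warnings++;	/* the whole PMD goes, caller asked for less */
		addr = next;
	} while (addr != end);

	return warnings;
}
```

With these assumptions, a range that starts and ends on 2 MiB boundaries never trips the check, while a request for a single page inside a huge mapping trips it once, and a range that starts and ends mid-PMD trips it at both ends.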

Comments

Anshuman Khandual Feb. 7, 2025, 8:41 a.m. UTC | #1
On 2/5/25 20:39, Ryan Roberts wrote:
> A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
> pud level. But it is possible to subsquently call vunmap_range() on a

s/subsquently/subsequently

> sub-range of the mapped memory, which partially overlaps a pmd or pud.
> In this case, vmalloc unmaps the entire pmd or pud so that the
> no-overlapping portion is also unmapped. Clearly that would have a bad
> outcome, but it's not something that any callers do today as far as I
> can tell. So I guess it's jsut expected that callers will not do this.

s/jsut/just

> 
> However, it would be useful to know if this happened in future; let's
> add a warning to cover the eventuality.

This is a reasonable check to prevent bad outcomes later.

> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/vmalloc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a6e7acebe9ad..fcdf67d5177a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>  		if (cleared || pmd_bad(*pmd))
>  			*mask |= PGTBL_PMD_MODIFIED;
>  
> -		if (cleared)
> +		if (cleared) {
> +			WARN_ON(next - addr < PMD_SIZE);
>  			continue;
> +		}
>  		if (pmd_none_or_clear_bad(pmd))
>  			continue;
>  		vunmap_pte_range(pmd, addr, next, mask);
> @@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>  		if (cleared || pud_bad(*pud))
>  			*mask |= PGTBL_PUD_MODIFIED;
>  
> -		if (cleared)
> +		if (cleared) {
> +			WARN_ON(next - addr < PUD_SIZE);
>  			continue;
> +		}
>  		if (pud_none_or_clear_bad(pud))
>  			continue;
>  		vunmap_pmd_range(pud, addr, next, mask);

Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
for corresponding P4D and PGD levels as well ?
Ryan Roberts Feb. 7, 2025, 10:59 a.m. UTC | #2
On 07/02/2025 08:41, Anshuman Khandual wrote:
> Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
> for corresponding P4D and PGD levels as well ?

The kernel does not support p4d or pgd leaf entries so there is nothing to check.

Although vunmap_p4d_range() does call p4d_clear_huge(). The function is a stub
and returns void (unlike p[mu]d_clear_huge()). I suspect we could just remove
p4d_clear_huge() entirely. But that would be a separate patch to mm tree I think.

For pgd, there isn't even an equivalent looking function.

Basically at those 2 levels, it's always a table.

Thanks,
Ryan
Anshuman Khandual Feb. 13, 2025, 6:36 a.m. UTC | #3
On 2/7/25 16:29, Ryan Roberts wrote:
> On 07/02/2025 08:41, Anshuman Khandual wrote:
>> Why not also include such checks in vunmap_p4d_range() and __vunmap_range_noflush()
>> for corresponding P4D and PGD levels as well ?
> 
> The kernel does not support p4d or pgd leaf entries so there is nothing to check.
> 
> Although vunmap_p4d_range() does call p4d_clear_huge(). The function is a stub
> and returns void (unlike p[mu]d_clear_huge()). I suspect we could just remove
> p4d_clear_huge() entirely. But that would be a separate patch to mm tree I think.
> 
> For pgd, there isn't even an equivalent looking function.
> 
> Basically at those 2 levels, it's always a table.

Understood, thanks !
Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a6e7acebe9ad..fcdf67d5177a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		if (cleared || pmd_bad(*pmd))
 			*mask |= PGTBL_PMD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PMD_SIZE);
 			continue;
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
@@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (cleared || pud_bad(*pud))
 			*mask |= PGTBL_PUD_MODIFIED;
 
-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PUD_SIZE);
 			continue;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next, mask);