diff mbox series

[v2] arm64/kernel: Always use level 2 or higher for early mappings

Message ID 20250311073043.96795-2-ardb+git@google.com (mailing list archive)
State New
Headers show
Series [v2] arm64/kernel: Always use level 2 or higher for early mappings | expand

Commit Message

Ard Biesheuvel March 11, 2025, 7:30 a.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

The page table population code in map_range() uses a recursive algorithm
to create the early mappings of the kernel, the DTB and the ID mapped
text and data pages, and this fails to take into account that the way
these page tables may be constructed is not precisely the same at each
level. In particular, block mappings are not permitted at every level,
and the code as it exists today might inadvertently create such a
forbidden block mapping if it were used to map a region of the
appropriate size and alignment.

This never happens in practice, given the limited size of the assets
being mapped by the early boot code. Nonetheless, it would be better if
this code would behave correctly in all circumstances.

So only permit block mappings at level 2, and page mappings at level 3,
for any page size, and use table mappings exclusively at all other
levels. This change should have no impact in practice, but it makes the
code more robust.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
v2: take the assignment of protval out of the loop again, so that
    clearing a mapping works as expected wrt the PTE_CONT bit

 arch/arm64/kernel/pi/map_range.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Anshuman Khandual March 11, 2025, 8:42 a.m. UTC | #1
On 3/11/25 13:00, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The page table population code in map_range() uses a recursive algorithm
> to create the early mappings of the kernel, the DTB and the ID mapped
> text and data pages, and this fails to take into account that the way
> these page tables may be constructed is not precisely the same at each
> level. In particular, block mappings are not permitted at every level,
> and the code as it exists today might inadvertently create such a
> forbidden block mapping if it were used to map a region of the
> appropriate size and alignment.
> 
> This never happens in practice, given the limited size of the assets
> being mapped by the early boot code. Nonetheless, it would be better if
> this code would behave correctly in all circumstances.
> 
> So only permit block mappings at level 2, and page mappings at level 3,
> for any page size, and use table mappings exclusively at all other
> levels. This change should have no impact in practice, but it makes the
> code more robust.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: take the assignment of protval out of the loop again, so that
>     clearing a mapping works as expected wrt the PTE_CONT bit
> 
>  arch/arm64/kernel/pi/map_range.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> index 2b69e3beeef8..5778697f3062 100644
> --- a/arch/arm64/kernel/pi/map_range.c
> +++ b/arch/arm64/kernel/pi/map_range.c
> @@ -45,12 +45,12 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
>  	 * clearing the mapping
>  	 */
>  	if (protval)
> -		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> +		protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
>  
>  	while (start < end) {
>  		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>  
> -		if (level < 3 && (start | next | pa) & lmask) {
> +		if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
>  			/*
>  			 * This chunk needs a finer grained mapping. Create a
>  			 * table mapping if necessary and recurse.

Again did not see any problem running this on various 4K/16K/64K page configs.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Ryan Roberts March 11, 2025, 11:31 a.m. UTC | #2
On 11/03/2025 07:30, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The page table population code in map_range() uses a recursive algorithm
> to create the early mappings of the kernel, the DTB and the ID mapped
> text and data pages, and this fails to take into account that the way
> these page tables may be constructed is not precisely the same at each
> level. In particular, block mappings are not permitted at every level,
> and the code as it exists today might inadvertently create such a
> forbidden block mapping if it were used to map a region of the
> appropriate size and alignment.
> 
> This never happens in practice, given the limited size of the assets
> being mapped by the early boot code. Nonetheless, it would be better if
> this code would behave correctly in all circumstances.
> 
> So only permit block mappings at level 2, and page mappings at level 3,
> for any page size, and use table mappings exclusively at all other
> levels. This change should have no impact in practice, but it makes the
> code more robust.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: take the assignment of protval out of the loop again, so that
>     clearing a mapping works as expected wrt the PTE_CONT bit

Ouch, totally missed that. Now that I've looked at the code properly, I wonder
if it is worth supporting CONT_PMD mappings at level 2? That would be 32M for 4K
pages, so is likely to actually get used. I think it would just be something like:

u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 :
	   ((level == 2) ? CONT_PMD_SIZE - 1 : U64_MAX);

Anyway, probably not worth the complexity for these boot-time tables, so regardless:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> 
>  arch/arm64/kernel/pi/map_range.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> index 2b69e3beeef8..5778697f3062 100644
> --- a/arch/arm64/kernel/pi/map_range.c
> +++ b/arch/arm64/kernel/pi/map_range.c
> @@ -45,12 +45,12 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
>  	 * clearing the mapping
>  	 */
>  	if (protval)
> -		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> +		protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
>  
>  	while (start < end) {
>  		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>  
> -		if (level < 3 && (start | next | pa) & lmask) {
> +		if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
>  			/*
>  			 * This chunk needs a finer grained mapping. Create a
>  			 * table mapping if necessary and recurse.
Ard Biesheuvel March 11, 2025, 6:35 p.m. UTC | #3
On Tue, 11 Mar 2025 at 12:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 11/03/2025 07:30, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The page table population code in map_range() uses a recursive algorithm
> > to create the early mappings of the kernel, the DTB and the ID mapped
> > text and data pages, and this fails to take into account that the way
> > these page tables may be constructed is not precisely the same at each
> > level. In particular, block mappings are not permitted at every level,
> > and the code as it exists today might inadvertently create such a
> > forbidden block mapping if it were used to map a region of the
> > appropriate size and alignment.
> >
> > This never happens in practice, given the limited size of the assets
> > being mapped by the early boot code. Nonetheless, it would be better if
> > this code would behave correctly in all circumstances.
> >
> > So only permit block mappings at level 2, and page mappings at level 3,
> > for any page size, and use table mappings exclusively at all other
> > levels. This change should have no impact in practice, but it makes the
> > code more robust.
> >
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Reported-by: Ryan Roberts <ryan.roberts@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > v2: take the assignment of protval out of the loop again, so that
> >     clearing a mapping works as expected wrt the PTE_CONT bit
>
> Ouch, totally missed that. Now that I've looked at the code properly, I wonder
> if it is worth supporting CONT_PMD mappings at level 2? That would be 32M for 4K
> pages, so is likely to actually get used. I think it would just be something like:
>
> u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 :
>            ((level == 2) ? CONT_PMD_SIZE - 1 : U64_MAX);
>

With this patch applied, it is guaranteed that cmask is only
referenced if level >= 2, so we can simplify this as

  u64 cmask = (level < 3) ? CONT_PMD_SIZE - 1 : CONT_PTE_SIZE - 1;

(using the level < 3 for consistency with the protval assignment)

> Anyway, probably not worth the complexity for these boot-time tables, so regardless:
>

They are created at boot time but these are not 'boot-time tables' -
they are the ones that are used by the kernel at runtime. So I think
it makes sense to add this.

> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>

Thanks.

Patch

diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index 2b69e3beeef8..5778697f3062 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -45,12 +45,12 @@  void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
 	 * clearing the mapping
 	 */
 	if (protval)
-		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+		protval |= (level == 2) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
 
 	while (start < end) {
 		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
 
-		if (level < 3 && (start | next | pa) & lmask) {
+		if (level < 2 || (level == 2 && (start | next | pa) & lmask)) {
 			/*
 			 * This chunk needs a finer grained mapping. Create a
 			 * table mapping if necessary and recurse.