Message ID | 20250310062812.216031-1-anshuman.khandual@arm.com (mailing list archive)
---|---
State | New
Series | [V2] arm64/mm: Create level specific section mappings in map_range()
On Mon, Mar 10, 2025 at 11:58:12AM +0530, Anshuman Khandual wrote:
> Currently PMD section mapping mask i.e PMD_TYPE_SECT is used while creating
> section mapping at all page table levels except the last level. This works
> fine as the section mapping masks are exactly the same (0x1UL) for all page
> table levels.
>
> This will change in the future with D128 page tables that have unique skip
> level values (SKL) required for creating section mapping at different page
> table levels. Hence use page table level specific section mapping macros
> instead of the common PMD_TYPE_SECT. While here also ensure that a section
> mapping is only created on page table levels which could support that on a
> given page size configuration otherwise fall back to create table entries.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This patch applies on v6.14-rc6
>
> Changes in V2:
>
> - Dropped PGD_TYPE_SECT macro and its instance from map_range()
> - Create table entries on levels where section mapping is not possible

On the last version, there was talk of an extant bug:

  https://lore.kernel.org/all/5f1b36e2-6455-44d9-97b0-253aefd5024f@arm.com/

... did that turn out to not be the case?

> Changes in V1:
>
> https://lore.kernel.org/all/20250303041834.2796751-1-anshuman.khandual@arm.com/
>
>  arch/arm64/kernel/pi/map_range.c | 38 +++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> index 2b69e3beeef8..25a70c675c4d 100644
> --- a/arch/arm64/kernel/pi/map_range.c
> +++ b/arch/arm64/kernel/pi/map_range.c
> @@ -11,6 +11,22 @@
>
>  #include "pi.h"
>
> +static bool sect_supported(int level)

What exactly is this function supposed to indicate?

Due to the 'default' case, it'll return true for level-3 tables where
sections/blocks aren't possible, but pages are. So maybe this is just
misnamed, and wants to be something like "leaf_supported()" ?

> +{
> +	switch (level) {
> +	case -1:
> +		return false;
> +	case 0:
> +		if (IS_ENABLED(CONFIG_ARM64_16K_PAGES) ||
> +		    IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> +			return false;
> +		else
> +			return true;

Surely:

	case 0:
		return IS_ENABLED(CONFIG_ARM64_4K_PAGES);

... though that's only the case when LPA2 is supported and TCR_ELx.DS is
set.

> +	default:
> +		return true;

As above, what is this supposed to return for level-3 tables where
sections/blocks aren't possible?

Can we BUILD_BUG() for bogus levels?

> +	}
> +}
> +
>  /**
>   * map_range - Map a contiguous range of physical pages into virtual memory
>   *
> @@ -44,13 +60,29 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
>  	 * Set the right block/page bits for this level unless we are
>  	 * clearing the mapping
>  	 */
> -	if (protval)
> -		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> +	if (protval && sect_supported(level)) {

The existing code is trying to find the leaf prot for a given level of
table, so as-is checking sect_supported() is at best misleading.

If we're going to do something different below based on
sect_supported(), I reckon this should be moved into the path where
sect_supported() returned true, maybe factored out into a helper for
clarity.

Mark.

> +		switch (level) {
> +		case 3:
> +			protval |= PTE_TYPE_PAGE;
> +			break;
> +		case 2:
> +			protval |= PMD_TYPE_SECT;
> +			break;
> +		case 1:
> +			protval |= PUD_TYPE_SECT;
> +			break;
> +		case 0:
> +			protval |= P4D_TYPE_SECT;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
>
>  	while (start < end) {
>  		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>
> -		if (level < 3 && (start | next | pa) & lmask) {
> +		if ((level < 3 && (start | next | pa) & lmask) || !sect_supported(level)) {
>  			/*
>  			 * This chunk needs a finer grained mapping. Create a
>  			 * table mapping if necessary and recurse.
> --
> 2.25.1
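Mark's suggestion above, to factor the per-level leaf selection out of map_range() into a helper, could look something like the sketch below. This is an illustrative userspace rendering, not the eventual patch: `leaf_type()` is a hypothetical name, and the `*_TYPE_*` values are hard-coded stand-ins for the arm64 definitions in pgtable-hwdef.h (where, today, all the section types happen to encode to the same 0x1 value, which is exactly what the patch is preparing to change for D128).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the arm64 descriptor-type bits (assumed values). */
#define PTE_TYPE_PAGE	0x3UL
#define PMD_TYPE_SECT	0x1UL
#define PUD_TYPE_SECT	0x1UL
#define P4D_TYPE_SECT	0x1UL

/*
 * Hypothetical helper: return the leaf descriptor-type bits for a given
 * page table level, or 0 where no leaf mapping is possible at all.
 */
static uint64_t leaf_type(int level)
{
	switch (level) {
	case 3:
		return PTE_TYPE_PAGE;	/* last level: page descriptor */
	case 2:
		return PMD_TYPE_SECT;
	case 1:
		return PUD_TYPE_SECT;
	case 0:
		return P4D_TYPE_SECT;
	default:
		return 0;		/* e.g. level -1: tables only */
	}
}
```

The caller would then do `protval |= leaf_type(level)` only on the path where a leaf mapping is actually being created, keeping the descend-to-table path free of section logic.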
On 10/03/2025 10:55, Mark Rutland wrote:
> On Mon, Mar 10, 2025 at 11:58:12AM +0530, Anshuman Khandual wrote:
>> Currently PMD section mapping mask i.e PMD_TYPE_SECT is used while creating
>> section mapping at all page table levels except the last level. This works
>> fine as the section mapping masks are exactly the same (0x1UL) for all page
>> table levels.
>>
>> This will change in the future with D128 page tables that have unique skip
>> level values (SKL) required for creating section mapping at different page
>> table levels. Hence use page table level specific section mapping macros
>> instead of the common PMD_TYPE_SECT. While here also ensure that a section
>> mapping is only created on page table levels which could support that on a
>> given page size configuration otherwise fall back to create table entries.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> This patch applies on v6.14-rc6
>>
>> Changes in V2:
>>
>> - Dropped PGD_TYPE_SECT macro and its instance from map_range()
>> - Create table entries on levels where section mapping is not possible
>
> On the last version, there was talk of an extant bug:
>
>   https://lore.kernel.org/all/5f1b36e2-6455-44d9-97b0-253aefd5024f@arm.com/
>
> ... did that turn out to not be the case?

In the absence of Ard (or anyone else) telling me it's not a bug, then I'll
continue to assert that it is a bug, and should have a Fixes tag, and Cc stable.

>
>> Changes in V1:
>>
>> https://lore.kernel.org/all/20250303041834.2796751-1-anshuman.khandual@arm.com/
>>
>>  arch/arm64/kernel/pi/map_range.c | 38 +++++++++++++++++++++++++++++---
>>  1 file changed, 35 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
>> index 2b69e3beeef8..25a70c675c4d 100644
>> --- a/arch/arm64/kernel/pi/map_range.c
>> +++ b/arch/arm64/kernel/pi/map_range.c
>> @@ -11,6 +11,22 @@
>>
>>  #include "pi.h"
>>
>> +static bool sect_supported(int level)
>
> What exactly is this function supposed to indicate?
>
> Due to the 'default' case, it'll return true for level-3 tables where
> sections/blocks aren't possible, but pages are. So maybe this is just
> misnamed, and wants to be something like "leaf_supported()" ?

I always thought a "section" was (outdated?) linux terminology for a leaf
mapping at any level. For my understanding, I think you're saying it
specifically means a leaf mapping at any level other than the last level, so it
is actually equivalent to a "block" mapping in the Arm ARM?

Either way, the intended semantic for this function is "is it valid to create a
leaf mapping at the specified level?" where leaf = block or page.

Thanks,
Ryan

>
>> +{
>> +	switch (level) {
>> +	case -1:
>> +		return false;
>> +	case 0:
>> +		if (IS_ENABLED(CONFIG_ARM64_16K_PAGES) ||
>> +		    IS_ENABLED(CONFIG_ARM64_64K_PAGES))
>> +			return false;
>> +		else
>> +			return true;
>
> Surely:
>
> 	case 0:
> 		return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
>
> ... though that's only the case when LPA2 is supported and TCR_ELx.DS is
> set.
>
>> +	default:
>> +		return true;
>
> As above, what is this supposed to return for level-3 tables where
> sections/blocks aren't possible?
>
> Can we BUILD_BUG() for bogus levels?
>
>> +	}
>> +}
>> +
>>  /**
>>   * map_range - Map a contiguous range of physical pages into virtual memory
>>   *
>> @@ -44,13 +60,29 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
>>  	 * Set the right block/page bits for this level unless we are
>>  	 * clearing the mapping
>>  	 */
>> -	if (protval)
>> -		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
>> +	if (protval && sect_supported(level)) {
>
> The existing code is trying to find the leaf prot for a given level of
> table, so as-is checking sect_supported() is at best misleading.
>
> If we're going to do something different below based on
> sect_supported(), I reckon this should be moved into the path where
> sect_supported() returned true, maybe factored out into a helper for
> clarity.
>
> Mark.
>
>> +		switch (level) {
>> +		case 3:
>> +			protval |= PTE_TYPE_PAGE;
>> +			break;
>> +		case 2:
>> +			protval |= PMD_TYPE_SECT;
>> +			break;
>> +		case 1:
>> +			protval |= PUD_TYPE_SECT;
>> +			break;
>> +		case 0:
>> +			protval |= P4D_TYPE_SECT;
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>>
>>  	while (start < end) {
>>  		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
>>
>> -		if (level < 3 && (start | next | pa) & lmask) {
>> +		if ((level < 3 && (start | next | pa) & lmask) || !sect_supported(level)) {
>>  			/*
>>  			 * This chunk needs a finer grained mapping. Create a
>>  			 * table mapping if necessary and recurse.
>> --
>> 2.25.1
On Mon, Mar 10, 2025 at 01:48:53PM +0000, Ryan Roberts wrote:
> On 10/03/2025 10:55, Mark Rutland wrote:
> > On Mon, Mar 10, 2025 at 11:58:12AM +0530, Anshuman Khandual wrote:
> >> Currently PMD section mapping mask i.e PMD_TYPE_SECT is used while creating
> >> section mapping at all page table levels except the last level. This works
> >> fine as the section mapping masks are exactly the same (0x1UL) for all page
> >> table levels.
> >>
> >> This will change in the future with D128 page tables that have unique skip
> >> level values (SKL) required for creating section mapping at different page
> >> table levels. Hence use page table level specific section mapping macros
> >> instead of the common PMD_TYPE_SECT. While here also ensure that a section
> >> mapping is only created on page table levels which could support that on a
> >> given page size configuration otherwise fall back to create table entries.
> >>
> >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >> Cc: Will Deacon <will@kernel.org>
> >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> >> Cc: Ard Biesheuvel <ardb@kernel.org>
> >> Cc: linux-kernel@vger.kernel.org
> >> Cc: linux-arm-kernel@lists.infradead.org
> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> >> ---
> >> This patch applies on v6.14-rc6
> >>
> >> Changes in V2:
> >>
> >> - Dropped PGD_TYPE_SECT macro and its instance from map_range()
> >> - Create table entries on levels where section mapping is not possible
> >
> > On the last version, there was talk of an extant bug:
> >
> >   https://lore.kernel.org/all/5f1b36e2-6455-44d9-97b0-253aefd5024f@arm.com/
> >
> > ... did that turn out to not be the case?
>
> In the absence of Ard (or anyone else) telling me it's not a bug, then I'll
> continue to assert that it is a bug, and should have a Fixes tag, and Cc stable.

Ok -- can we please have a description of the bug, and a fix
specifically for that, before we try to reshape this for D128?

IIUC the problem is that for a sufficiently large (and aligned) extent
of physical memory, we might try to create a block mapping larger than
the HW supports?

With that in mind, my earlier comment about LPA2 and blocks at level-0
is particularly relevant...

> >> Changes in V1:
> >>
> >> https://lore.kernel.org/all/20250303041834.2796751-1-anshuman.khandual@arm.com/
> >>
> >>  arch/arm64/kernel/pi/map_range.c | 38 +++++++++++++++++++++++++++++---
> >>  1 file changed, 35 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
> >> index 2b69e3beeef8..25a70c675c4d 100644
> >> --- a/arch/arm64/kernel/pi/map_range.c
> >> +++ b/arch/arm64/kernel/pi/map_range.c
> >> @@ -11,6 +11,22 @@
> >>
> >>  #include "pi.h"
> >>
> >> +static bool sect_supported(int level)
> >
> > What exactly is this function supposed to indicate?
> >
> > Due to the 'default' case, it'll return true for level-3 tables where
> > sections/blocks aren't possible, but pages are. So maybe this is just
> > misnamed, and wants to be something like "leaf_supported()" ?
>
> I always thought a "section" was (outdated?) linux terminology for a leaf
> mapping at any level. For my understanding, I think you're saying it
> specifically means a leaf mapping at any level other than the last level, so it
> is actually equivalent to a "block" mapping in the Arm ARM?

Yep -- "section" is old style terminology that's (roughly) equivalent to
"block" at level 2. When the arm64 port got written, the "section"/"sect"
terminology was inherited to mean "block" at any level. Ideally we'd
clean that up.

Historically ARM had "small pages", "large pages", "sections", and
"supersections". See the ARMv7-A ARM (ARM DDI 0406C.d), page B3-1321.

> Either way, the intended semantic for this function is "is it valid to create a
> leaf mapping at the specified level?" where leaf = block or page.

Cool. With that in mind, having "leaf" in the name would be better.

Though TBH, maybe it makes more sense to split this up into separate
levels (as with the code in arch/arm64/mm/mmu.c) and handle each level
explicitly with a separate function. That way each level's check can be
handled within that level's function, and we can use the right types,
etc. It feels like that might be better for D128 in future?

Mark.

> Thanks,
> Ryan
>
> >
> >> +{
> >> +	switch (level) {
> >> +	case -1:
> >> +		return false;
> >> +	case 0:
> >> +		if (IS_ENABLED(CONFIG_ARM64_16K_PAGES) ||
> >> +		    IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> >> +			return false;
> >> +		else
> >> +			return true;
> >
> > Surely:
> >
> > 	case 0:
> > 		return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
> >
> > ... though that's only the case when LPA2 is supported and TCR_ELx.DS is
> > set.
> >
> >> +	default:
> >> +		return true;
> >
> > As above, what is this supposed to return for level-3 tables where
> > sections/blocks aren't possible?
> >
> > Can we BUILD_BUG() for bogus levels?
> >
> >> +	}
> >> +}
> >> +
> >>  /**
> >>   * map_range - Map a contiguous range of physical pages into virtual memory
> >>   *
> >> @@ -44,13 +60,29 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
> >>  	 * Set the right block/page bits for this level unless we are
> >>  	 * clearing the mapping
> >>  	 */
> >> -	if (protval)
> >> -		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
> >> +	if (protval && sect_supported(level)) {
> >
> > The existing code is trying to find the leaf prot for a given level of
> > table, so as-is checking sect_supported() is at best misleading.
> >
> > If we're going to do something different below based on
> > sect_supported(), I reckon this should be moved into the path where
> > sect_supported() returned true, maybe factored out into a helper for
> > clarity.
> >
> > Mark.
> >
> >> +		switch (level) {
> >> +		case 3:
> >> +			protval |= PTE_TYPE_PAGE;
> >> +			break;
> >> +		case 2:
> >> +			protval |= PMD_TYPE_SECT;
> >> +			break;
> >> +		case 1:
> >> +			protval |= PUD_TYPE_SECT;
> >> +			break;
> >> +		case 0:
> >> +			protval |= P4D_TYPE_SECT;
> >> +			break;
> >> +		default:
> >> +			break;
> >> +		}
> >> +	}
> >>
> >>  	while (start < end) {
> >>  		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
> >>
> >> -		if (level < 3 && (start | next | pa) & lmask) {
> >> +		if ((level < 3 && (start | next | pa) & lmask) || !sect_supported(level)) {
> >>  			/*
> >>  			 * This chunk needs a finer grained mapping. Create a
> >>  			 * table mapping if necessary and recurse.
> >> --
> >> 2.25.1
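Putting Mark's two suggestions from this exchange together (rename the helper around "leaf", and collapse the level-0 check to a single IS_ENABLED()), the helper might look like the sketch below. This is illustrative only: IS_ENABLED() is stubbed for a userspace build, a 4K-granule configuration is assumed, and the LPA2/TCR_ELx.DS dependency Mark notes is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's IS_ENABLED(); assume a 4K-granule build. */
#define CONFIG_ARM64_4K_PAGES	1
#define IS_ENABLED(option)	(option)

/*
 * "Is it valid to create a leaf mapping (block or page) at this level?"
 * Level -1 can only hold table entries; level-0 blocks exist only with
 * 4K pages (and, strictly, only with LPA2 and TCR_ELx.DS=1, which this
 * sketch ignores); every other level supports some form of leaf.
 */
static bool leaf_supported(int level)
{
	switch (level) {
	case -1:
		return false;
	case 0:
		return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
	default:
		return true;
	}
}
```

Mark's BUILD_BUG() question would apply to the `default` arm: in the kernel proper, a compile-time check could reject levels outside [-1, 3] rather than silently returning true for them.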
On Mon, 10 Mar 2025 at 15:42, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Mar 10, 2025 at 01:48:53PM +0000, Ryan Roberts wrote:
> > On 10/03/2025 10:55, Mark Rutland wrote:
> > > On Mon, Mar 10, 2025 at 11:58:12AM +0530, Anshuman Khandual wrote:
> > >> Currently PMD section mapping mask i.e PMD_TYPE_SECT is used while creating
> > >> section mapping at all page table levels except the last level. This works
> > >> fine as the section mapping masks are exactly the same (0x1UL) for all page
> > >> table levels.
> > >>
> > >> This will change in the future with D128 page tables that have unique skip
> > >> level values (SKL) required for creating section mapping at different page
> > >> table levels. Hence use page table level specific section mapping macros
> > >> instead of the common PMD_TYPE_SECT. While here also ensure that a section
> > >> mapping is only created on page table levels which could support that on a
> > >> given page size configuration otherwise fall back to create table entries.
> > >>
> > >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> > >> Cc: Will Deacon <will@kernel.org>
> > >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> > >> Cc: Ard Biesheuvel <ardb@kernel.org>
> > >> Cc: linux-kernel@vger.kernel.org
> > >> Cc: linux-arm-kernel@lists.infradead.org
> > >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > >> ---
> > >> This patch applies on v6.14-rc6
> > >>
> > >> Changes in V2:
> > >>
> > >> - Dropped PGD_TYPE_SECT macro and its instance from map_range()
> > >> - Create table entries on levels where section mapping is not possible
> > >
> > > On the last version, there was talk of an extant bug:
> > >
> > >   https://lore.kernel.org/all/5f1b36e2-6455-44d9-97b0-253aefd5024f@arm.com/
> > >
> > > ... did that turn out to not be the case?
> >
> > In the absence of Ard (or anyone else) telling me it's not a bug, then I'll
> > continue to assert that it is a bug, and should have a Fixes tag, and Cc stable.
>
> Ok -- can we please have a description of the bug, and a fix
> specifically for that, before we try to reshape this for D128?
>
> IIUC the problem is that for a sufficiently large (and aligned) extent
> of physical memory, we might try to create a block mapping larger than
> the HW supports?
>

Yes. However, this code is only called to map the kernel, the DTB and
the ID map code and data pages, and so it seems unlikely that we'll
hit this bug in practice.

Nonetheless, it is a shortcoming that we should address to avoid
future surprises.

> With that in mind, my earlier comment about LPA2 and blocks at level-0
> is particularly relevant...
>

Indeed - at the higher levels, there is quite some variation between
page sizes, LPA2 vs !LPA2 etc in which level is the highest at which
block mappings are supported.

But given the context where this code is used, I'd prefer to simply
map everything down to level 2 or higher (for 64-bit descriptors)
rather than add elaborate rules for all these combinations that are
never exercised in practice. The net result should be identical.

...

> Though TBH, maybe it makes more sense to split this up into separate
> levels (as with the code in arch/arm64/mm/mmu.c) and handle each level
> explicitly with a separate function. That way each level's check can be
> handled within that level's function, and we can use the right types,
> etc. It feels like that might be better for D128 in future?
>

I have a slight inclination towards having a separate routine for
D128, but I can easily be persuaded otherwise.
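Ard's simplification — never attempt a leaf mapping above level 2, regardless of page size or LPA2 — would reduce the descend-or-map decision to something like the following sketch. Names (`leaf_allowed`, `needs_table`) are illustrative, not the kernel code; `lmask` is the block-size-minus-one mask as in map_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Leaves only at level 2 (blocks) and level 3 (pages). */
static bool leaf_allowed(int level)
{
	return level >= 2;
}

/*
 * Descend to a finer-grained table when this level cannot hold a leaf
 * at all, or when the chunk is not aligned to this level's block size.
 */
static bool needs_table(int level, uint64_t start, uint64_t next,
			uint64_t pa, uint64_t lmask)
{
	if (!leaf_allowed(level))
		return true;
	return level < 3 && ((start | next | pa) & lmask) != 0;
}
```

With this rule there is no per-configuration table of "highest block level": levels above 2 always recurse, which matches the cases the early PI mapping code actually exercises.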
On Mon, Mar 10, 2025 at 03:53:33PM +0100, Ard Biesheuvel wrote:
> On Mon, 10 Mar 2025 at 15:42, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Mon, Mar 10, 2025 at 01:48:53PM +0000, Ryan Roberts wrote:
> > > On 10/03/2025 10:55, Mark Rutland wrote:
> >
> > IIUC the problem is that for a sufficiently large (and aligned) extent
> > of physical memory, we might try to create a block mapping larger than
> > the HW supports?
>
> Yes. However, this code is only called to map the kernel, the DTB and
> the ID map code and data pages, and so it seems unlikely that we'll
> hit this bug in practice.
>
> Nonetheless, it is a shortcoming that we should address to avoid
> future surprises.
>
> > With that in mind, my earlier comment about LPA2 and blocks at level-0
> > is particularly relevant...
>
> Indeed - at the higher levels, there is quite some variation between
> page sizes, LPA2 vs !LPA2 etc in which level is the highest at which
> block mappings are supported.
>
> But given the context where this code is used, I'd prefer to simply
> map everything down to level 2 or higher (for 64-bit descriptors)
> rather than add elaborate rules for all these combinations that are
> never exercised in practice. The net result should be identical.

That sounds much better to me!

> > Though TBH, maybe it makes more sense to split this up into separate
> > levels (as with the code in arch/arm64/mm/mmu.c) and handle each level
> > explicitly with a separate function. That way each level's check can be
> > handled within that level's function, and we can use the right types,
> > etc. It feels like that might be better for D128 in future?
>
> I have a slight inclination towards having a separate routine for
> D128, but I can easily be persuaded otherwise.

FWIW, I don't have strong feelings either way w.r.t. sharing the code
with D128. If it's cleaner for that to be separate, sure, but I'd have
to see the code.

My concern here was that the current recursive nature of the PI
map_range() function makes it hard to have level-specific behaviour, and
means that it's structurally different to most other code in the kernel,
e.g. the alloc_init_pXX() functions in arch/arm64/mm/mmu.c, where we
have separate functions for each of the levels that handle the
level-specific details.

I think that structure of separate functions per level is clearer, even
if it's a bit more code.

Mark.
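As a rough illustration of the structure Mark prefers, here is a toy two-level version in the style of the alloc_init_pXX() functions: each level gets its own routine with its own constants, so level-specific checks live inside that level's function rather than behind a `level` parameter. This is not the proposed kernel code — it only counts the descriptors such a walk would write for a 4K granule (2M blocks at the PMD level, 4K pages at the PTE level).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	0x1000ULL	/* 4K pages at the PTE level */
#define PMD_SIZE	0x200000ULL	/* 2M blocks at the PMD level */

/* Last level: one page descriptor per 4K page, no further recursion. */
static unsigned long init_pte_range(uint64_t start, uint64_t end)
{
	return (end - start) / PAGE_SIZE;
}

/*
 * PMD level: write one 2M block descriptor where the chunk is fully
 * aligned, otherwise fall through to a next-level table of pages.
 */
static unsigned long init_pmd_range(uint64_t start, uint64_t end)
{
	unsigned long count = 0;

	while (start < end) {
		uint64_t next = (start | (PMD_SIZE - 1)) + 1;

		if (next > end)
			next = end;
		if (((start | next) & (PMD_SIZE - 1)) == 0)
			count += 1;			/* block mapping */
		else
			count += init_pte_range(start, next);
		start = next;
	}
	return count;
}
```

The trade-off is exactly as described in the thread: a few more functions than the single recursive map_range(), but each one can use the right types and make its own leaf-vs-table decision.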
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index 2b69e3beeef8..25a70c675c4d 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -11,6 +11,22 @@
 
 #include "pi.h"
 
+static bool sect_supported(int level)
+{
+	switch (level) {
+	case -1:
+		return false;
+	case 0:
+		if (IS_ENABLED(CONFIG_ARM64_16K_PAGES) ||
+		    IS_ENABLED(CONFIG_ARM64_64K_PAGES))
+			return false;
+		else
+			return true;
+	default:
+		return true;
+	}
+}
+
 /**
  * map_range - Map a contiguous range of physical pages into virtual memory
  *
@@ -44,13 +60,29 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
 	 * Set the right block/page bits for this level unless we are
 	 * clearing the mapping
 	 */
-	if (protval)
-		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+	if (protval && sect_supported(level)) {
+		switch (level) {
+		case 3:
+			protval |= PTE_TYPE_PAGE;
+			break;
+		case 2:
+			protval |= PMD_TYPE_SECT;
+			break;
+		case 1:
+			protval |= PUD_TYPE_SECT;
+			break;
+		case 0:
+			protval |= P4D_TYPE_SECT;
+			break;
+		default:
+			break;
+		}
+	}
 
 	while (start < end) {
 		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
 
-		if (level < 3 && (start | next | pa) & lmask) {
+		if ((level < 3 && (start | next | pa) & lmask) || !sect_supported(level)) {
 			/*
 			 * This chunk needs a finer grained mapping. Create a
 			 * table mapping if necessary and recurse.
Currently PMD section mapping mask i.e. PMD_TYPE_SECT is used while creating
section mappings at all page table levels except the last level. This works
fine as the section mapping masks are exactly the same (0x1UL) for all page
table levels.

This will change in the future with D128 page tables that have unique skip
level values (SKL) required for creating section mappings at different page
table levels. Hence use page table level specific section mapping macros
instead of the common PMD_TYPE_SECT. While here, also ensure that a section
mapping is only created on page table levels which could support it on a
given page size configuration, otherwise fall back to creating table entries.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This patch applies on v6.14-rc6

Changes in V2:

- Dropped PGD_TYPE_SECT macro and its instance from map_range()
- Create table entries on levels where section mapping is not possible

Changes in V1:

https://lore.kernel.org/all/20250303041834.2796751-1-anshuman.khandual@arm.com/

 arch/arm64/kernel/pi/map_range.c | 38 +++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)