
arm64: mm: Create gigabyte kernel logical mappings where possible

Message ID: 1398857782-1525-1-git-send-email-steve.capper@linaro.org

Commit Message

Steve Capper April 30, 2014, 11:36 a.m. UTC
We have the capability to map 1GB level 1 blocks when using a 4K
granule.

This patch adjusts the create_mapping logic such that, when mapping
physical memory on boot, we attempt to use a 1GB block if both the VA
and PA start and end are 1GB aligned. This both reduces the levels of
lookup required to resolve a kernel logical address and reduces TLB
pressure on cores that support 1GB TLB entries.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
Hello,
This patch has been tested on the FastModel for 4K and 64K pages.
Also, this has been tested with Jungseok's 4 level patch.

I put in the explicit check for PAGE_SHIFT, as I am anticipating a
three-level 64KB configuration at some point.

With two-level 64K, a PUD is equivalent to a PMD, which is equivalent
to a PGD, and these are all level 2 descriptors.

Under three-level 64K, a PUD would be equivalent to a PGD, which would
be a level 1 descriptor and thus may not be a block.
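
As an aside, here is a minimal stand-alone sketch of the alignment test
(my own names and constants: PUD_SHIFT of 30 matches the 4K-granule
configuration, and can_use_pud_block is a made-up helper; the kernel
performs this check inline in alloc_init_pud, as in the patch below):

#include <stdio.h>
#include <stdint.h>

#define PUD_SHIFT	30	/* 4K granule: a level 1 block covers 1GB */
#define PUD_SIZE	(1UL << PUD_SHIFT)
#define PUD_MASK	(~(PUD_SIZE - 1))

/* All three addresses must be 1GB aligned for a level 1 block. */
static int can_use_pud_block(uint64_t addr, uint64_t next, uint64_t phys)
{
	return ((addr | next | phys) & ~PUD_MASK) == 0;
}

int main(void)
{
	/* 1GB-aligned VA and PA: a 1GB block mapping is possible */
	printf("%d\n", can_use_pud_block(0xffffffc000000000UL,
					 0xffffffc040000000UL,
					 0x80000000UL));	/* prints 1 */
	/* PA misaligned by 2MB: fall back to alloc_init_pmd() */
	printf("%d\n", can_use_pud_block(0xffffffc000000000UL,
					 0xffffffc040000000UL,
					 0x80200000UL));	/* prints 0 */
	return 0;
}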

Comments/critique/testers welcome.

Cheers,

Comments

Arnd Bergmann April 30, 2014, 6:11 p.m. UTC | #1
On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K
> granule.
> 
> This patch adjusts the create_mapping logic s.t. when mapping physical
> memory on boot, we attempt to use a 1GB block if both the VA and PA
> start and end are 1GB aligned. This both reduces the levels of lookup
> required to resolve a kernel logical address, as well as reduces TLB
> pressure on cores that support 1GB TLB entries.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
> 
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> three level 64KB configuration at some point.
> 
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> a PGD, and these are all level 2 descriptors.
> 
> Under three level 64K, a PUD would be equivalent to a PGD which would
> be a level 1 descriptor thus may not be a block.
> 
> Comments/critique/testers welcome.

It seems like a great idea. I have to admit that I don't understand
the existing code, but what are the page sizes used here?

Does the code always use the largest possible page size, or does
it just use either small pages or 1G pages?

In combination with the contiguous page hint, we should be able
to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
combination for boot-time mappings on a 4K page size kernel,
or 64KB/1M/512M/8G on a 64KB page size kernel.

	Arnd
Steve Capper May 1, 2014, 8:54 a.m. UTC | #2
On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K
> > granule.
> > 
> > This patch adjusts the create_mapping logic s.t. when mapping physical
> > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > start and end are 1GB aligned. This both reduces the levels of lookup
> > required to resolve a kernel logical address, as well as reduces TLB
> > pressure on cores that support 1GB TLB entries.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> > 
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > three level 64KB configuration at some point.
> > 
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > a PGD, and these are all level 2 descriptors.
> > 
> > Under three level 64K, a PUD would be equivalent to a PGD which would
> > be a level 1 descriptor thus may not be a block.
> > 
> > Comments/critique/testers welcome.
> 
> It seems like a great idea. I have to admit that I don't understand
> the existing code, but what are the page sizes used here?

Actually, I think it was your idea ;-). I remember you talking about
increasing the mapping size when 4-level page tables were being
discussed. (I think I should have added a Reported-by; I'd be happy to
if you want.)

With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
And with a 4KB granule, the original code will map 2MB blocks if
possible, and 4KB otherwise.

The patch will make the 4KB granule case also map 1GB blocks if
possible.
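
As a quick sanity check of those numbers, a stand-alone sketch
(assuming 8-byte descriptors, so each table level resolves
PAGE_SHIFT - 3 bits of the address):

#include <stdio.h>

int main(void)
{
	int shifts[] = { 12, 16 };	/* 4KB and 64KB granules */

	for (int i = 0; i < 2; i++) {
		int ps = shifts[i];
		/*
		 * A block one level above the page covers the page offset
		 * plus one level of table index: 2 * PAGE_SHIFT - 3 bits.
		 */
		printf("%2dKB granule: block size %luMB\n",
		       1 << (ps - 10), (1UL << (2 * ps - 3)) >> 20);
	}
	/*
	 * One level above that, for the 4KB granule: 3 * 12 - 6 = 30 bits,
	 * i.e. the 1GB blocks this patch puts down.
	 */
	return 0;
}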

> 
> Does the code always use the largest possible page size, or does
> it just use either small pages or 1G pages?

The code will put down the largest mappings it can. As the physical
memory sizes/addresses are very likely to be aligned to whatever block
size we use, we are likely to achieve the maximum size for our
mappings.

> 
> In combination with the contiguous page hint, we should be able
> to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> combination for boot-time mappings on a 4K page size kernel,
> or 64KB/1M/512M/8G on a 64KB page size kernel.
> 

A contiguous hint could be applied to these mappings. The logic would
be a bit more complicated though when we consider different granules.
For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
adopt a 16KB granule, we use 32 entries for a level 2 lookup and
128 entries for a level 3 lookup...

The largest TLB entry sizes that I am aware of in play are the block
sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
the moment from adding the contiguous logic.
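
For reference, a small sketch tabulating what those entry counts buy at
level 3 (the 16KB figures are the prospective ones quoted above):

#include <stdio.h>

int main(void)
{
	/* level 3 contiguous-hint entry counts from the discussion above */
	struct { unsigned long page; int contig; } t[] = {
		{ 1UL << 12,  16 },	/* 4KB granule */
		{ 1UL << 14, 128 },	/* 16KB granule (prospective) */
		{ 1UL << 16,  32 },	/* 64KB granule */
	};

	for (int i = 0; i < 3; i++)
		printf("%3luKB pages x %3d entries = %4luKB contiguous span\n",
		       t[i].page >> 10, t[i].contig,
		       (t[i].page * t[i].contig) >> 10);
	return 0;
}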

Cheers,
Arnd Bergmann May 1, 2014, 1:36 p.m. UTC | #3
On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > granule.
> > > 
> > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > required to resolve a kernel logical address, as well as reduces TLB
> > > pressure on cores that support 1GB TLB entries.
> > > 
> > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > ---
> > > Hello,
> > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > Also, this has been tested with Jungseok's 4 level patch.
> > > 
> > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > three level 64KB configuration at some point.
> > > 
> > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > a PGD, and these are all level 2 descriptors.
> > > 
> > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > be a level 1 descriptor thus may not be a block.
> > > 
> > > Comments/critique/testers welcome.
> > 
> > It seems like a great idea. I have to admit that I don't understand
> > the existing code, but what are the page sizes used here?
> 
> Actually, I think it was your idea ;-). I remember you talking about
> increasing the mapping size when 4-level page tables were being
> discussed. (I think I should have added a Reported-by, would be happy
> to if you want?).

I completely forgot we had talked about this.

> With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> And with a 4KB granule, the original code will map 2MB blocks if
> possible, and 4KB otherwise.
> 
> The patch will make the 4KB granule case also map 1GB blocks if
> possible.

Ok.

> > In combination with the contiguous page hint, we should be able
> > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > combination for boot-time mappings on a 4K page size kernel,
> > or 64KB/1M/512M/8G on a 64KB page size kernel.
> 
> A contiguous hint could be applied to these mappings. The logic would
> be a bit more complicated though when we consider different granules.
> For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> 128 entries for a level 3 lookup...
> 
> The largest TLB entry sizes that I am aware of in play are the block
> sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> the moment for adding the contiguous logic.

Is that an architecture limit, or specific to the Cortex-A53/A57
implementations?

	Arnd
Steve Capper May 1, 2014, 4:20 p.m. UTC | #4
On Thu, May 01, 2014 at 03:36:05PM +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> > On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > > granule.
> > > > 
> > > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > > required to resolve a kernel logical address, as well as reduces TLB
> > > > pressure on cores that support 1GB TLB entries.
> > > > 
> > > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > > ---
> > > > Hello,
> > > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > > Also, this has been tested with Jungseok's 4 level patch.
> > > > 
> > > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > > three level 64KB configuration at some point.
> > > > 
> > > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > > a PGD, and these are all level 2 descriptors.
> > > > 
> > > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > > be a level 1 descriptor thus may not be a block.
> > > > 
> > > > Comments/critique/testers welcome.
> > > 
> > > It seems like a great idea. I have to admit that I don't understand
> > > the existing code, but what are the page sizes used here?
> > 
> > Actually, I think it was your idea ;-). I remember you talking about
> > increasing the mapping size when 4-level page tables were being
> > discussed. (I think I should have added a Reported-by, would be happy
> > to if you want?).
> 
> I completely forgot we had talked about this.
> 
> > With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> > And with a 4KB granule, the original code will map 2MB blocks if
> > possible, and 4KB otherwise.
> > 
> > The patch will make the 4KB granule case also map 1GB blocks if
> > possible.
> 
> Ok.
> 
> > > In combination with the contiguous page hint, we should be able
> > > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > > combination for boot-time mappings on a 4K page size kernel,
> > > or 64KB/1M/512M/8G on a 64KB page size kernel.
> > 
> > A contiguous hint could be applied to these mappings. The logic would
> > be a bit more complicated though when we consider different granules.
> > For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> > adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> > 128 entries for a level 3 lookup...
> > 
> > The largest TLB entry sizes that I am aware of in play are the block
> > sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> > the moment for adding the contiguous logic.
> 
> Is that an architecture limit, or specific to the Cortex-A53/A57
> implementations?

Those are the TLBs that are documented for the Cortex-A53 and
Cortex-A57. I have an idea of what the architectural limit is, but I
will need to seek confirmation on it.

Cheers,
Jungseok Lee May 2, 2014, 1:03 a.m. UTC | #5
On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K granule.
> 
> This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> support 1GB TLB entries.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
> 
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> some point.
> 
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> 2 descriptors.
> 
> Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> not be a block.
> 
> Comments/critique/testers welcome.

Hi, Steve

I've tested on my platform, and it works well.

If an SoC design follows the "Principles of ARM Memory Maps", the PA
should be 1GB aligned. Thus, I think this patch will be effective on
such systems.

Best Regards
Jungseok Lee
Catalin Marinas May 2, 2014, 8:51 a.m. UTC | #6
On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4d29332..867e979 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>  	pud = pud_offset(pgd, addr);
>  	do {
>  		next = pud_addr_end(addr, end);
> -		alloc_init_pmd(pud, addr, next, phys);
> +
> +		/*
> +		 * For 4K granule only, attempt to put down a 1GB block
> +		 */
> +		if ((PAGE_SHIFT == 12) &&
> +			((addr | next | phys) & ~PUD_MASK) == 0) {
> +			pud_t old_pud = *pud;
> +			set_pud(pud, __pud(phys | prot_sect_kernel));
> +
> +			if (!pud_none(old_pud))
> +				flush_tlb_all();

We could even free the original pmd here. I think a
memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
(untested, and you need to define pud_pfn).
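
For reference, one possible shape of that follow-up (an untested sketch
on top of the patch below; pud_pfn is the hypothetical helper Catalin
mentions and would still need defining, here modelled loosely on the
existing pmd helpers):

#define pud_pfn(pud)	((pud_val(pud) & PHYS_MASK & PAGE_MASK) >> PAGE_SHIFT)

	if ((PAGE_SHIFT == 12) &&
		((addr | next | phys) & ~PUD_MASK) == 0) {
		pud_t old_pud = *pud;
		set_pud(pud, __pud(phys | prot_sect_kernel));

		/*
		 * If we replaced an existing table entry, the pmd page it
		 * pointed at is now unreachable: flush the stale TLB
		 * entries and hand the page back to memblock.
		 */
		if (!pud_none(old_pud)) {
			flush_tlb_all();
			memblock_free(pud_pfn(old_pud) << PAGE_SHIFT,
				      PAGE_SIZE);
		}
	}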
Steve Capper May 2, 2014, 9:11 a.m. UTC | #7
On Fri, May 02, 2014 at 10:03:02AM +0900, Jungseok Lee wrote:
> On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K granule.
> > 
> > This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> > use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> > lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> > support 1GB TLB entries.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> > 
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> > some point.
> > 
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> > 2 descriptors.
> > 
> > Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> > not be a block.
> > 
> > Comments/critique/testers welcome.
> 
> Hi, Steve
> 
> I've tested on my platform, and it works well.
> 

Thanks for giving this a go!

> If SoC design follows "Principles of ARM Memory Maps",
> PA should be supposed to be 1GB aligned. Thus, I think
> this patch is effective against them.
> 
> Best Regards
> Jungseok Lee
>
Steve Capper May 2, 2014, 9:21 a.m. UTC | #8
On Fri, May 02, 2014 at 09:51:21AM +0100, Catalin Marinas wrote:
> On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 4d29332..867e979 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
> >  	pud = pud_offset(pgd, addr);
> >  	do {
> >  		next = pud_addr_end(addr, end);
> > -		alloc_init_pmd(pud, addr, next, phys);
> > +
> > +		/*
> > +		 * For 4K granule only, attempt to put down a 1GB block
> > +		 */
> > +		if ((PAGE_SHIFT == 12) &&
> > +			((addr | next | phys) & ~PUD_MASK) == 0) {
> > +			pud_t old_pud = *pud;
> > +			set_pud(pud, __pud(phys | prot_sect_kernel));
> > +
> > +			if (!pud_none(old_pud))
> > +				flush_tlb_all();
> 
> We could even free the original pmd here. I think a
> memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
> (untested, and you need to define pud_pfn).

I see what you mean; we will potentially have an unused page in our
swapper_pg_dir array.

I'll have a think, and add some logic to remove the redundant page.

Cheers,

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4d29332..867e979 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		alloc_init_pmd(pud, addr, next, phys);
+
+		/*
+		 * For 4K granule only, attempt to put down a 1GB block
+		 */
+		if ((PAGE_SHIFT == 12) &&
+			((addr | next | phys) & ~PUD_MASK) == 0) {
+			pud_t old_pud = *pud;
+			set_pud(pud, __pud(phys | prot_sect_kernel));
+
+			if (!pud_none(old_pud))
+				flush_tlb_all();
+		} else {
+			alloc_init_pmd(pud, addr, next, phys);
+		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
 }