Message ID: CAN_5kQBszi=hV1RVjyKO6gOhOuymGjsMwLk6ORaWpkaL-4USxA@mail.gmail.com (mailing list archive)
State: New, archived
On Wed, Jul 06, 2011 at 04:14:57AM +0100, heechul Yun wrote:
> I found a few other places which, I believe, are not necessary for Cortex-A9.
>
> diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
> index bdba6c6..6d5a847 100644
> --- a/arch/arm/mm/copypage-v6.c
> +++ b/arch/arm/mm/copypage-v6.c
> @@ -41,7 +41,9 @@ static void v6_copy_user_highpage_nonaliasing(struct page *to,
> 	kfrom = kmap_atomic(from, KM_USER0);
> 	kto = kmap_atomic(to, KM_USER1);
> 	copy_page(kto, kfrom);
> +#ifndef CONFIG_CPU_CACHE_V7
> 	__cpuc_flush_dcache_area(kto, PAGE_SIZE);
> +#endif
> 	kunmap_atomic(kto, KM_USER1);
> 	kunmap_atomic(kfrom, KM_USER0);
> }
>
> On handling a COW page fault, the above function is called to copy the
> page content of the parent to a newly allocated page frame for the
> child. Again, since the D-cache of the A9 is PIPT, we do not need to
> flush the page as on x86. This modification improves lmbench
> (fork/exec/shell) performance by 4-6%.

See commit 115b2247 introducing this. We indeed have a PIPT-like cache
on the A9, but it is a Harvard architecture with separate I- and
D-caches. It happened in the past that we got a COW on a text page and
the I- and D-caches became incoherent. Since then, the dynamic linker
has been fixed and no longer causes this. We could add a check for
VM_EXEC in vma->vm_flags.

But I wonder whether we still need this flush after commit c0177800,
where we assume that a new page cache page has a dirty D-cache (and we
later flush the caches via set_pte_at).

> I think the above two patches work at least for Cortex-A9, although I
> am not sure the use of CONFIG_CPU_CACHE_V7 is appropriate.

We need to check the ID_MMFR1 register, as there are other ARMv7 cores
that cannot do page table walks in the L1 cache.
On Wed, Jul 06, 2011 at 09:56:56AM +0100, Catalin Marinas wrote:
> On Wed, Jul 06, 2011 at 04:14:57AM +0100, heechul Yun wrote:
> > I found a few other places which, I believe, are not necessary for Cortex-A9.
> >
> > diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
> > index bdba6c6..6d5a847 100644
> > --- a/arch/arm/mm/copypage-v6.c
> > +++ b/arch/arm/mm/copypage-v6.c
> > @@ -41,7 +41,9 @@ static void v6_copy_user_highpage_nonaliasing(struct page *to,
> > 	kfrom = kmap_atomic(from, KM_USER0);
> > 	kto = kmap_atomic(to, KM_USER1);
> > 	copy_page(kto, kfrom);
> > +#ifndef CONFIG_CPU_CACHE_V7
> > 	__cpuc_flush_dcache_area(kto, PAGE_SIZE);
> > +#endif
> > 	kunmap_atomic(kto, KM_USER1);
> > 	kunmap_atomic(kfrom, KM_USER0);
> > }
> >
> > On handling a COW page fault, the above function is called to copy the
> > page content of the parent to a newly allocated page frame for the
> > child. Again, since the D-cache of the A9 is PIPT, we do not need to
> > flush the page as on x86. This modification improves lmbench
> > (fork/exec/shell) performance by 4-6%.
>
> See commit 115b2247 introducing this. We indeed have a PIPT-like cache
> on the A9, but it is a Harvard architecture with separate I- and
> D-caches. It happened in the past that we got a COW on a text page and
> the I- and D-caches became incoherent. Since then, the dynamic linker
> has been fixed and no longer causes this. We could add a check for
> VM_EXEC in vma->vm_flags.
>
> But I wonder whether we still need this flush after commit c0177800,
> where we assume that a new page cache page has a dirty D-cache (and we
> later flush the caches via set_pte_at).

I don't think we need that flush there after c0177800 either. I/D
coherency implies that pte_exec() is set, which will get us through to
the check of PG_arch_1 in __sync_icache_dcache(), where we'll call
__flush_dcache_page() for this page.

We don't need this flush anymore, so let's simply kill it outright.

Heechul (sorry, is that the correct way of addressing you?), could you
please submit a patch removing the __cpuc_flush_dcache_area() call from
v6_copy_user_highpage_nonaliasing() entirely? Thanks.
> We don't need this flush anymore, so let's simply kill it outright.
>
> Heechul (sorry, is that the correct way of addressing you?), could
> you please submit a patch removing the __cpuc_flush_dcache_area()
> call from v6_copy_user_highpage_nonaliasing() entirely?

I sent the patch.

Thanks,
Heechul
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index bdba6c6..6d5a847 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -41,7 +41,9 @@ static void v6_copy_user_highpage_nonaliasing(struct page *to,
 	kfrom = kmap_atomic(from, KM_USER0);
 	kto = kmap_atomic(to, KM_USER1);
 	copy_page(kto, kfrom);
+#ifndef CONFIG_CPU_CACHE_V7
 	__cpuc_flush_dcache_area(kto, PAGE_SIZE);
+#endif
 	kunmap_atomic(kto, KM_USER1);
 	kunmap_atomic(kfrom, KM_USER0);
 }

On handling a COW page fault, the above function is called to copy the
page content of the parent to a newly allocated page frame for the
child. Again, since the D-cache of the A9 is PIPT, we do not need to
flush the page as on x86. This modification improves lmbench
(fork/exec/shell) performance by 4-6%.

diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index b12cc98..bff9858 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -61,7 +61,9 @@ pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
 	pte = (pte_t *)__get_free_page(PGALLOC_GFP);
 	if (pte) {
+#if !CONFIG_CPU_CACHE_V7
 		clean_dcache_area(pte, sizeof(pte_t) * PTRS_PER_PTE);
+#endif
 		pte += PTRS_PER_PTE;
 	}
@@ -81,7 +83,9 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 	if (pte) {
 		if (!PageHighMem(pte)) {
 			void *page = page_address(pte);
+#if !CONFIG_CPU_CACHE_V7
 			clean_dcache_area(page, sizeof(pte_t) * PTRS_PER_PTE);
+#endif
 		}
 		pgtable_page_ctor(pte);
 	}

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index be5f58e..343df1b 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -41,8 +41,9 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
 	memcpy(new_pgd + FIRST_KERNEL_PGD_NR, init_pgd + FIRST_KERNEL_PGD_NR,
 	       (PTRS_PER_PGD - FIRST_KERNEL_PGD_NR) * sizeof(pgd_t));
+#if !CONFIG_CPU_CACHE_V7
 	clean_dcache_area(new_pgd, PTRS_PER_PGD * sizeof(pgd_t));
-
+#endif
 	if (!vectors_high()) {
 		/*
		 * On ARM, first page must always be allocated since it