diff mbox

arm64: mm: check length in sync_icache_aliases for performance

Message ID 1494490772-203184-1-git-send-email-zhangshaokun@hisilicon.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shaokun Zhang May 11, 2017, 8:19 a.m. UTC
sync_icache_aliases calls flush_icache_range if icache is non-aliasing
policy[see 0a28714 ("arm64: Use PoU cache instr for I/D coherency")].
  
If icache uses non-aliasing and page size is 64K, it will broadcast 1K
DVMs(IC IVAU) to other cpu cores per page. In multi-cores system, so many
DVMs would degenerate performance. Even if page size is 4K, 64 DVMs will
be broadcasted and executed.
 
This patch fixes this issue using invalidation icache all instread of by
VA when length is one or multiple PAGE_SIZE, especailly for
__sync_icache_dcache.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 arch/arm64/mm/flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mark Rutland May 11, 2017, 9:16 a.m. UTC | #1
Hi,

On Thu, May 11, 2017 at 04:19:32PM +0800, Shaokun Zhang wrote:
> sync_icache_aliases calls flush_icache_range if icache is non-aliasing
> policy[see 0a28714 ("arm64: Use PoU cache instr for I/D coherency")].
>   
> If icache uses non-aliasing and page size is 64K, it will broadcast 1K
> DVMs(IC IVAU) to other cpu cores per page. In multi-cores system, so many
> DVMs would degenerate performance. Even if page size is 4K, 64 DVMs will
> be broadcasted and executed.

Please note that this depends on the I-cache and D-cache line sizes,
which are not necessarily 64 bytes.

This is also dependent on system integration. DVMs are not an
architectural concept, and the interconnect may optimize this (e.g. with
snoop filters).

> This patch fixes this issue using invalidation icache all instread of by
> VA when length is one or multiple PAGE_SIZE, especailly for
> __sync_icache_dcache.

This means that we'll over-invalidate the I-caches all the time,
potentially harming the performance of unrelated tasks. So this is not
necessarily an improvement.

Do you have a particular workload which is affected by this?

Thanks,
Mark.

> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> ---
>  arch/arm64/mm/flush.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 21a8d82..f71da2d 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -29,7 +29,7 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
>  {
>  	unsigned long addr = (unsigned long)kaddr;
>  
> -	if (icache_is_aliasing()) {
> +	if ((len >= PAGE_SIZE) || icache_is_aliasing()) {
>  		__clean_dcache_area_pou(kaddr, len);
>  		__flush_icache_all();
>  	} else {
> -- 
> 1.9.1
>
Shaokun Zhang May 11, 2017, 2:42 p.m. UTC | #2
Hi Mark

Thanks for your reply.

On 2017/5/11 17:16, Mark Rutland wrote:
> Hi,
> 
> On Thu, May 11, 2017 at 04:19:32PM +0800, Shaokun Zhang wrote:
>> sync_icache_aliases calls flush_icache_range if icache is non-aliasing
>> policy[see 0a28714 ("arm64: Use PoU cache instr for I/D coherency")].
>>   
>> If icache uses non-aliasing and page size is 64K, it will broadcast 1K
>> DVMs(IC IVAU) to other cpu cores per page. In multi-cores system, so many
>> DVMs would degenerate performance. Even if page size is 4K, 64 DVMs will
>> be broadcasted and executed.
> 
> Please note that this depends on the I-cache and D-cache line sizes,
> which are not necessarily 64 bytes.

Right. I am sorry that maybe i should explain I-cache line size is 64 bytes
in my case.

> 
> This is also dependent on system integration. DVMs are not an
> architectural concept, and the interconnect may optimize this (e.g. with
> snoop filters).

Hmm, SF is a good choice, However it may be not suitable for IC IVAU broadcast,
perhaps i am limited about this.

> 
>> This patch fixes this issue using invalidation icache all instread of by
>> VA when length is one or multiple PAGE_SIZE, especailly for
>> __sync_icache_dcache.
> 
> This means that we'll over-invalidate the I-caches all the time,
> potentially harming the performance of unrelated tasks. So this is not
> necessarily an improvement.

Agree its harm, therefore only under the condition that one or more pages
would be used IC IVAU, using invalidate the I-cache replaces it.

> 
> Do you have a particular workload which is affected by this?

I write self-modifying code that i want to simulate JVM, it uses mmap to
allocate large memory holding executing code. In the test procedure, i
found that __sync_icache_dcache would be called many times and lots of
DVMs occur. It is mainly used to handle page fault and memory migration.
When i add this check, it decreases number of DVMs. Because of much OOM
printing information, i couldn't give the result between the two scenes.
Maybe i need to optimize this test model.

Thanks
Shaokun

> 
> Thanks,
> Mark.
> 
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>> ---
>>  arch/arm64/mm/flush.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>> index 21a8d82..f71da2d 100644
>> --- a/arch/arm64/mm/flush.c
>> +++ b/arch/arm64/mm/flush.c
>> @@ -29,7 +29,7 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
>>  {
>>  	unsigned long addr = (unsigned long)kaddr;
>>  
>> -	if (icache_is_aliasing()) {
>> +	if ((len >= PAGE_SIZE) || icache_is_aliasing()) {
>>  		__clean_dcache_area_pou(kaddr, len);
>>  		__flush_icache_all();
>>  	} else {
>> -- 
>> 1.9.1
>>
> 
> .
>
diff mbox

Patch

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 21a8d82..f71da2d 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -29,7 +29,7 @@  void sync_icache_aliases(void *kaddr, unsigned long len)
 {
 	unsigned long addr = (unsigned long)kaddr;
 
-	if (icache_is_aliasing()) {
+	if ((len >= PAGE_SIZE) || icache_is_aliasing()) {
 		__clean_dcache_area_pou(kaddr, len);
 		__flush_icache_all();
 	} else {