diff mbox series

[v4,04/12] riscv: Only send remote fences when some other CPU is online

Message ID 20240102220134.3229156-5-samuel.holland@sifive.com (mailing list archive)
State Superseded
Headers show
Series riscv: ASID-related and UP-related TLB flush enhancements | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/vmtest-fixes-PR success PR summary
conchuod/patch-4-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-4-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-4-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-4-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-4-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-4-test-6 success .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-4-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-4-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-4-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-4-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-4-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-4-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Samuel Holland Jan. 2, 2024, 10 p.m. UTC
If no other CPU is online, a local cache or TLB flush is sufficient.
These checks can be constant-folded when SMP is disabled.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v4:
 - New patch for v4

 arch/riscv/mm/cacheflush.c | 4 +++-
 arch/riscv/mm/tlbflush.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

Comments

Jisheng Zhang Jan. 3, 2024, 2:57 p.m. UTC | #1
On Tue, Jan 02, 2024 at 02:00:41PM -0800, Samuel Holland wrote:
> If no other CPU is online, a local cache or TLB flush is sufficient.
> These checks can be constant-folded when SMP is disabled.
> 
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
> 
> Changes in v4:
>  - New patch for v4
> 
>  arch/riscv/mm/cacheflush.c | 4 +++-
>  arch/riscv/mm/tlbflush.c   | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 47c485bc7df0..f7933ae88a55 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>  {
>  	local_flush_icache_all();
>  
> -	if (riscv_use_sbi_for_rfence())
> +	if (num_online_cpus() < 2)

with patch5, I think it's better to short cut for !SMP, I.E
if (!IS_ENABLED(CONFIG_SMP) || num_online_cpus()) < 2)

so that the UP case can avoid a atomic read and check

> +		return;
> +	else if (riscv_use_sbi_for_rfence())
>  		sbi_remote_fence_i(NULL);
>  	else
>  		on_each_cpu(ipi_remote_fence_i, NULL, 1);
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 2f18fe6fc4f3..37b3c93e3c30 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -73,7 +73,9 @@ static void __ipi_flush_tlb_all(void *info)
>  
>  void flush_tlb_all(void)
>  {
> -	if (riscv_use_sbi_for_rfence())
> +	if (num_online_cpus() < 2)

ditto

> +		local_flush_tlb_all();
> +	else if (riscv_use_sbi_for_rfence())
>  		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>  	else
>  		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> -- 
> 2.42.0
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Jisheng Zhang Jan. 3, 2024, 3:04 p.m. UTC | #2
On Wed, Jan 03, 2024 at 10:58:01PM +0800, Jisheng Zhang wrote:
> On Tue, Jan 02, 2024 at 02:00:41PM -0800, Samuel Holland wrote:
> > If no other CPU is online, a local cache or TLB flush is sufficient.
> > These checks can be constant-folded when SMP is disabled.
> > 
> > Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> > ---
> > 
> > Changes in v4:
> >  - New patch for v4
> > 
> >  arch/riscv/mm/cacheflush.c | 4 +++-
> >  arch/riscv/mm/tlbflush.c   | 4 +++-
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 47c485bc7df0..f7933ae88a55 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -21,7 +21,9 @@ void flush_icache_all(void)
> >  {
> >  	local_flush_icache_all();
> >  
> > -	if (riscv_use_sbi_for_rfence())
> > +	if (num_online_cpus() < 2)
> 
> with patch5, I think it's better to short cut for !SMP, I.E
> if (!IS_ENABLED(CONFIG_SMP) || num_online_cpus()) < 2)

aha, plz ignore this comment, I see the num_online_cpus() is defined as 1U for
UP.

> 
> so that the UP case can avoid a atomic read and check
> 
> > +		return;
> > +	else if (riscv_use_sbi_for_rfence())
> >  		sbi_remote_fence_i(NULL);
> >  	else
> >  		on_each_cpu(ipi_remote_fence_i, NULL, 1);
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 2f18fe6fc4f3..37b3c93e3c30 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -73,7 +73,9 @@ static void __ipi_flush_tlb_all(void *info)
> >  
> >  void flush_tlb_all(void)
> >  {
> > -	if (riscv_use_sbi_for_rfence())
> > +	if (num_online_cpus() < 2)
> 
> ditto
> 
> > +		local_flush_tlb_all();
> > +	else if (riscv_use_sbi_for_rfence())
> >  		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
> >  	else
> >  		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> > -- 
> > 2.42.0
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti Jan. 4, 2024, 12:33 p.m. UTC | #3
On Tue, Jan 2, 2024 at 11:01 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> If no other CPU is online, a local cache or TLB flush is sufficient.
> These checks can be constant-folded when SMP is disabled.
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> Changes in v4:
>  - New patch for v4
>
>  arch/riscv/mm/cacheflush.c | 4 +++-
>  arch/riscv/mm/tlbflush.c   | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 47c485bc7df0..f7933ae88a55 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>  {
>         local_flush_icache_all();
>
> -       if (riscv_use_sbi_for_rfence())
> +       if (num_online_cpus() < 2)
> +               return;
> +       else if (riscv_use_sbi_for_rfence())
>                 sbi_remote_fence_i(NULL);
>         else
>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 2f18fe6fc4f3..37b3c93e3c30 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -73,7 +73,9 @@ static void __ipi_flush_tlb_all(void *info)
>
>  void flush_tlb_all(void)
>  {
> -       if (riscv_use_sbi_for_rfence())
> +       if (num_online_cpus() < 2)
> +               local_flush_tlb_all();
> +       else if (riscv_use_sbi_for_rfence())
>                 sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>         else
>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> --
> 2.42.0
>

on_each_cpu() already deals correctly with a single online cpu, the
only thing to optimize here is the SBI rfence. So I'd move this new
test in sbi_remote_sfence_vma_asid() and sbi_remote_fence_i() to avoid
the superfluous M-mode entry when only one cpu is online by checking
the cpumask. And since sbi_remote_fence_i() is used in another
function (flush_icache_mm()), we could also take advantage of this
optimization when only the local cpu must be flushed.
Samuel Holland Jan. 4, 2024, 3:33 p.m. UTC | #4
Hi Alex,

On 2024-01-04 6:33 AM, Alexandre Ghiti wrote:
> On Tue, Jan 2, 2024 at 11:01 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
>>
>> If no other CPU is online, a local cache or TLB flush is sufficient.
>> These checks can be constant-folded when SMP is disabled.
>>
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>> Changes in v4:
>>  - New patch for v4
>>
>>  arch/riscv/mm/cacheflush.c | 4 +++-
>>  arch/riscv/mm/tlbflush.c   | 4 +++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> index 47c485bc7df0..f7933ae88a55 100644
>> --- a/arch/riscv/mm/cacheflush.c
>> +++ b/arch/riscv/mm/cacheflush.c
>> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>>  {
>>         local_flush_icache_all();
>>
>> -       if (riscv_use_sbi_for_rfence())
>> +       if (num_online_cpus() < 2)
>> +               return;
>> +       else if (riscv_use_sbi_for_rfence())
>>                 sbi_remote_fence_i(NULL);
>>         else
>>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
>> index 2f18fe6fc4f3..37b3c93e3c30 100644
>> --- a/arch/riscv/mm/tlbflush.c
>> +++ b/arch/riscv/mm/tlbflush.c
>> @@ -73,7 +73,9 @@ static void __ipi_flush_tlb_all(void *info)
>>
>>  void flush_tlb_all(void)
>>  {
>> -       if (riscv_use_sbi_for_rfence())
>> +       if (num_online_cpus() < 2)
>> +               local_flush_tlb_all();
>> +       else if (riscv_use_sbi_for_rfence())
>>                 sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>>         else
>>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
>> --
>> 2.42.0
>>
> 
> on_each_cpu() already deals correctly with a single online cpu, the
> only thing to optimize here is the SBI rfence. So I'd move this new
> test in sbi_remote_sfence_vma_asid() and sbi_remote_fence_i() to avoid
> the superfluous M-mode entry when only one cpu is online by checking
> the cpumask. And since sbi_remote_fence_i() is used in another

What specific cpumask check are you suggesting? In sbi_remote_sfence_vma_asid()
I don't think we can assume the local cpu is always included in the cpumask
(which we _can_ assume here), so it would need to construct/compare the whole
bitmap. That's much more expensive than the atomic load here.

> function (flush_icache_mm()), we could also take advantage of this
> optimization when only the local cpu must be flushed.

flush_icache_mm() already has a "local" variable which it uses to skip the call
to sbi_remote_fence_i(). Same with the broadcast check in __flush_tlb_range().
So no additional check is needed there. Those two functions, plus the two
changed in this patch, are the only call sites of the sbi_*() functions. I think
it makes more sense to optimize the four call sites than adding an additional
check in the sbi_*() functions.

Regards,
Samuel
diff mbox series

Patch

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 47c485bc7df0..f7933ae88a55 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,9 @@  void flush_icache_all(void)
 {
 	local_flush_icache_all();
 
-	if (riscv_use_sbi_for_rfence())
+	if (num_online_cpus() < 2)
+		return;
+	else if (riscv_use_sbi_for_rfence())
 		sbi_remote_fence_i(NULL);
 	else
 		on_each_cpu(ipi_remote_fence_i, NULL, 1);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 2f18fe6fc4f3..37b3c93e3c30 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -73,7 +73,9 @@  static void __ipi_flush_tlb_all(void *info)
 
 void flush_tlb_all(void)
 {
-	if (riscv_use_sbi_for_rfence())
+	if (num_online_cpus() < 2)
+		local_flush_tlb_all();
+	else if (riscv_use_sbi_for_rfence())
 		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
 	else
 		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);