
[v8,01/14] riscv: prevent null-pointer dereference with sbi_remote_fence_i

Message ID 20220324000710.575331-2-heiko@sntech.de (mailing list archive)
State New, archived
Series riscv: support for Svpbmt and D1 memory types

Commit Message

Heiko Stuebner March 24, 2022, 12:06 a.m. UTC
The __sbi_rfence callback used inside sbi_remote_fence_i() is set to the
needed variant at SBI probe time. Before that, it is a NULL pointer.

Some users, like the flush_icache_*() functions, suggest generic
functionality that doesn't depend on a specific boot stage, but they
use sbi_remote_fence_i() as one option to flush other CPU cores.

So they definitely shouldn't run into a null-pointer dereference
when called "too early" during boot.

So introduce a stub function, returning -EOPNOTSUPP, as the default
for the __sbi_rfence function pointer until sbi_init() has run.

Users of sbi_remote_fence_i() have separate code for the local
CPU, and sbi_init() is called before the other CPUs are brought up,
so no other CPUs are present at the time the issue might happen.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/kernel/sbi.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
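The effect of the change can be modeled in plain userspace C. This is a sketch under stated assumptions, not kernel code: `rfence_none()` mirrors `__sbi_rfence_none()` from the patch, while `rfence_real()`, `model_rfence_init()`, `model_remote_fence_i()` and `MODEL_FID_FENCE_I` are hypothetical stand-ins for the probed SBI variant, `sbi_init()`, `sbi_remote_fence_i()` and the RFENCE function ID.

```c
#include <errno.h>
#include <stddef.h>

struct cpumask; /* opaque in this model: only pointers are passed around */

#define MODEL_FID_FENCE_I 0 /* hypothetical function ID for the model */

typedef int (*rfence_fn)(int fid, const struct cpumask *cpu_mask,
			 unsigned long start, unsigned long size,
			 unsigned long arg4, unsigned long arg5);

static int real_calls; /* how often the "probed" variant has run */

/* Default stub, mirroring __sbi_rfence_none() in the patch. */
static int rfence_none(int fid, const struct cpumask *cpu_mask,
		       unsigned long start, unsigned long size,
		       unsigned long arg4, unsigned long arg5)
{
	(void)fid; (void)cpu_mask; (void)start;
	(void)size; (void)arg4; (void)arg5;
	return -EOPNOTSUPP;
}

/* Hypothetical variant that init installs after probing. */
static int rfence_real(int fid, const struct cpumask *cpu_mask,
		       unsigned long start, unsigned long size,
		       unsigned long arg4, unsigned long arg5)
{
	(void)fid; (void)cpu_mask; (void)start;
	(void)size; (void)arg4; (void)arg5;
	real_calls++;
	return 0;
}

/* Never NULL: the pointer starts out aimed at the stub. */
static rfence_fn rfence = rfence_none;

/* Hypothetical stand-in for sbi_init() selecting the variant. */
static void model_rfence_init(void)
{
	rfence = rfence_real;
}

/* Hypothetical stand-in for sbi_remote_fence_i(). */
static int model_remote_fence_i(const struct cpumask *cpu_mask)
{
	return rfence(MODEL_FID_FENCE_I, cpu_mask, 0, 0, 0, 0);
}
```

Calling `model_remote_fence_i(NULL)` before `model_rfence_init()` returns `-EOPNOTSUPP` instead of jumping through a NULL pointer; afterwards the same call reaches `rfence_real()`.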

Comments

Christoph Hellwig March 31, 2022, 9:51 a.m. UTC | #1
On Thu, Mar 24, 2022 at 01:06:57AM +0100, Heiko Stuebner wrote:
> The callback used inside sbi_remote_fence_i is set at sbi probe time
> to the needed variant. Before that it is a NULL pointer.
> 
> Some users like the flush_icache_*() functions suggest a generic
> functionality, that doesn't depend on a specific boot-stage but
> uses sbi_remote_fence_i as one option to flush other cpu cores.
> 
> So they definitely shouldn't run into null-pointer dereference
> issues when called "too early" during boot.
> 
> So introduce an empty function to be the standard for the __sbi_rfence
> function pointer until sbi_init has run.
> 
> Users of sbi_remote_fence_i will have separate code for the local
> cpu and sbi_init() is called before other cpus are brought up.
> So there are no other cpus present at the time when the issue
> might happen.

I don't really understand this changelog.  If flush_icache_* or
other routines using SBI calls are called too early they won't
do what they are asked to do, which implies a bug in the code.

So crashing absolutely is the right thing to do here as we don't
really have any other error reporting method available.

So unless I'm totally misunderstanding what you are saying here:

Nacked-by: Christoph Hellwig <hch@lst.de>
Heiko Stuebner March 31, 2022, 12:28 p.m. UTC | #2
Hi,

On Thursday, March 31, 2022, 11:51:55 CEST, Christoph Hellwig wrote:
> On Thu, Mar 24, 2022 at 01:06:57AM +0100, Heiko Stuebner wrote:
> > The callback used inside sbi_remote_fence_i is set at sbi probe time
> > to the needed variant. Before that it is a NULL pointer.
> > 
> > Some users like the flush_icache_*() functions suggest a generic
> > functionality, that doesn't depend on a specific boot-stage but
> > uses sbi_remote_fence_i as one option to flush other cpu cores.
> > 
> > So they definitely shouldn't run into null-pointer dereference
> > issues when called "too early" during boot.
> > 
> > So introduce an empty function to be the standard for the __sbi_rfence
> > function pointer until sbi_init has run.
> > 
> > Users of sbi_remote_fence_i will have separate code for the local
> > cpu and sbi_init() is called before other cpus are brought up.
> > So there are no other cpus present at the time when the issue
> > might happen.
> 
> I don't really understand this changelog.  If flush_icache_* or
> other routines using SBI calls are called too early they won't
> do what they are asked to do, which implies a bug in the code.
> 
> So crashing absolutely is the right thing to do here as we don't
> really have any other error reporting method available.
> 
> So unless I'm totally misunderstanding what you are saying here:
> 
> Nacked-by: Christoph Hellwig <hch@lst.de>

The function is defined as

void flush_icache_all(void)
{
	local_flush_icache_all();

	if (IS_ENABLED(CONFIG_RISCV_SBI))
		sbi_remote_fence_i(NULL);
	else
		on_each_cpu(ipi_remote_fence_i, NULL, 1);
}

so it essentially flushes the _local_ icache first and then tries to flush
the caches on other cores, either via an IPI or via SBI.

The remote-fence callback is set correctly during sbi_init(),
and the other cores are only brought up after sbi_init() is done.

So it's not really about error reporting, but about making sure that
flush_icache_all() does something sane even while still running on the
first core. I assume "all" means all available cores, which at that
point is only the core the system booted on.

Does this make it clearer what this tries to solve?
Heiko
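Heiko's argument rests on an ordering: the local flush runs first, and secondary cores only come online after `sbi_init()`. A rough userspace model of that ordering is sketched below. All names are hypothetical, the CPU count is an arbitrary assumption, and the remote fence is simplified to touch only the other online CPUs. It shows why a no-op remote fence before init still leaves every online core flushed.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4 /* arbitrary CPU count for the model */

static bool cpu_online[MODEL_NR_CPUS] = { true }; /* only boot CPU 0 at first */
static int icache_flushes[MODEL_NR_CPUS];          /* flushes seen per CPU */
static bool sbi_ready;                             /* set by model_sbi_init() */
static const int this_cpu = 0;                     /* we run on the boot CPU */

static void model_local_flush_icache_all(void)
{
	icache_flushes[this_cpu]++;
}

/* Before init this behaves like the stub: there is nobody else to flush. */
static void model_sbi_remote_fence_i(void)
{
	if (!sbi_ready)
		return;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu_online[cpu] && cpu != this_cpu)
			icache_flushes[cpu]++;
}

/* Same shape as flush_icache_all(): local flush first, then remote. */
static void model_flush_icache_all(void)
{
	model_local_flush_icache_all();
	model_sbi_remote_fence_i();
}

static void model_sbi_init(void)
{
	sbi_ready = true;
}

static void model_bring_up_secondaries(void)
{
	/* Ordering from the thread: other cores start only after sbi_init(). */
	assert(sbi_ready);
	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
		cpu_online[cpu] = true;
}

/* True if every CPU that is currently online has flushed at least once. */
static bool every_online_cpu_flushed(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu_online[cpu] && icache_flushes[cpu] == 0)
			return false;
	return true;
}
```

Calling `model_flush_icache_all()` before `model_sbi_init()` flushes only CPU 0, which is also the only online CPU, so "all" is still satisfied in this model.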
Christoph Hellwig March 31, 2022, 12:33 p.m. UTC | #3
On Thu, Mar 31, 2022 at 02:28:06PM +0200, Heiko Stübner wrote:
> so essentially flushes the _local_ icache first and then tries to flush
> caches on other cores, either via an ipi or via sbi.
> 
> The remote-fence callback is set correctly during sbi_init().
> The other cores are only brought up after sbi-init is done.
> 
> So it's not really about error reporting but making sure that flush_icache_all()
> does something sane even when still running on the first core.
> As I assume the "all" means on all available cores (which would be the
> core the system booted on).
> 
> Does this make it clearer what this tries to solve?

A little. Whatever code calls this early still seems broken to me, though,
and should just do a local flush.
Heiko Stuebner April 7, 2022, 6:50 p.m. UTC | #4
On Thursday, March 31, 2022, 14:33:22 CEST, Christoph Hellwig wrote:
> On Thu, Mar 31, 2022 at 02:28:06PM +0200, Heiko Stübner wrote:
> > so essentially flushes the _local_ icache first and then tries to flush
> > caches on other cores, either via an ipi or via sbi.
> > 
> > The remote-fence callback is set correctly during sbi_init().
> > The other cores are only brought up after sbi-init is done.
> > 
> > So it's not really about error reporting but making sure that flush_icache_all()
> > does something sane even when still running on the first core.
> > As I assume the "all" means on all available cores (which would be the
> > core the system booted on).
> > 
> > Does this make it clearer what this tries to solve?
> 
> A little. Whatever code calls this early still seems broken to me, though,
> and should just do a local flush.

Looking at this again made me realize that we don't need this patch
anymore at all.

In a previous revision, the "main" alternatives were applied earlier,
which triggered the issue when the generic patching functions wanted to
flush the cache. With the alternatives now applied where they are, the
issue is no longer triggered.

And the "early" alternatives use a local flush anyway.
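The resolution of the thread (the early path flushes locally, the main path uses the full flush once SBI is set up) can be sketched the same way. Again, this is a hypothetical userspace model rather than the kernel's alternatives code: `model_flush_after_patching()` and the stage enum are invented for illustration. The `assert()` in `model_flush_icache_all()` stands in for Christoph's point that a too-early call is a bug that should fail loudly rather than silently do nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boot stages at which instructions get patched. */
enum model_stage { MODEL_STAGE_EARLY, MODEL_STAGE_MAIN };

static bool model_sbi_initialized; /* set once sbi_init() would have run */
static int local_flushes;          /* flushes of the boot CPU only */
static int global_flushes;         /* flushes that also reach other CPUs */

static void model_local_flush_icache_all(void)
{
	local_flushes++;
}

static void model_flush_icache_all(void)
{
	/* A too-early call is a bug: fail loudly instead of doing nothing. */
	assert(model_sbi_initialized);
	global_flushes++;
}

/* Hypothetical helper: pick the flush that is valid at this stage. */
static void model_flush_after_patching(enum model_stage stage)
{
	if (stage == MODEL_STAGE_EARLY)
		model_local_flush_icache_all(); /* only the boot CPU is running */
	else
		model_flush_icache_all();
}
```

The design choice here is that the caller, which knows its boot stage, selects the local flush explicitly, so the full flush never needs a "safe before init" fallback.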

Patch

diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 775d3322b422..5a60a458c0b7 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -16,11 +16,19 @@ 
 unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT;
 EXPORT_SYMBOL(sbi_spec_version);
 
+static int __sbi_rfence_none(int fid, const struct cpumask *cpu_mask,
+			     unsigned long start, unsigned long size,
+			     unsigned long arg4, unsigned long arg5)
+{
+	return -EOPNOTSUPP;
+}
+
 static void (*__sbi_set_timer)(uint64_t stime) __ro_after_init;
 static int (*__sbi_send_ipi)(const struct cpumask *cpu_mask) __ro_after_init;
 static int (*__sbi_rfence)(int fid, const struct cpumask *cpu_mask,
 			   unsigned long start, unsigned long size,
-			   unsigned long arg4, unsigned long arg5) __ro_after_init;
+			   unsigned long arg4, unsigned long arg5)
+			   __ro_after_init = __sbi_rfence_none;
 
 struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
 			unsigned long arg1, unsigned long arg2,