| Message ID | 20230330204217.47666-2-prabhakar.mahadev-lad.rj@bp.renesas.com |
|---|---|
| State | Superseded |
| Series | RISC-V non-coherent function pointer based CMO + non-coherent DMA support for AX45MP |

| Context | Check | Description |
|---|---|---|
| conchuod/tree_selection | fail | Failed to apply to next/pending-fixes or riscv/for-next |
On Thu, Mar 30, 2023, at 22:42, Prabhakar wrote:
> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>
> Currently, selecting which CMOs to use on a given platform is done using
> and ALTERNATIVE_X() macro. This was manageable when there were just two
> CMO implementations, but now that there are more and more platforms coming
> needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
>
> To avoid such issues this patch switches to use of function pointers
> instead of ALTERNATIVE_X() macro for cache management (the only drawback
> being performance over the previous approach).
>
> void (*clean_range)(unsigned long addr, unsigned long size);
> void (*inv_range)(unsigned long addr, unsigned long size);
> void (*flush_range)(unsigned long addr, unsigned long size);
>
> The above function pointers are provided to be overridden for platforms
> needing CMO.
>
> Convert ZICBOM and T-HEAD CMO to use function pointers.
>
> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

This is looking pretty good. There are a few things that I still see
sticking out, and I think I've mentioned some of them before, but don't
remember if there was a reason for doing it like this:

> +#ifdef CONFIG_ERRATA_THEAD_CMO

I would rename this to not call this an 'ERRATA' but just make it a
driver. Not important though, and there was probably a reason you did
it like this.

> +extern struct riscv_cache_ops noncoherent_cache_ops;
> +
> +void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops);
> +
> +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t size)
> +{
> +	if (noncoherent_cache_ops.clean_range) {
> +		unsigned long addr = (unsigned long)vaddr;
> +
> +		noncoherent_cache_ops.clean_range(addr, size);
> +	}
> +}

The type cast should not be necessary here. Instead I would make the
callbacks use 'void *' as well, not 'unsigned long'.

It's possible that some future cache controller driver requires passing
physical addresses, as is common for last level cache, but as long as
all drivers pass virtual addresses, it's easier to do the phys_to_virt()
in common code.

It also seems wrong to have the fallback be to do nothing when the
pointer is NULL, since that cannot actually work when a device is not
cache coherent. I would either initialize the function pointer to the
zicbom version and remove the NULL check, or keep the pointer NULL and
have an explicit 'else zicbom_clean_range()' fallback.

> +static void zicbom_register_cmo_ops(void)
> +{
> +	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
> +}
> +#else
> +static void zicbom_register_cmo_ops(void) {}
> +#endif

As far as I recall, the #else path here was needed previously to work
around a binutils dependency, but with the current code, it should be
possible to just always enable CONFIG_RISCV_ISA_ZICBOM when
RISCV_DMA_NONCOHERENT is used.

     Arnd
Hi Prabhakar, Thanks for your patch! On Thu, Mar 30, 2023 at 10:42 PM Prabhakar <prabhakar.csengg@gmail.com> wrote: > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > Currently, selecting which CMOs to use on a given platform is done using > and ALTERNATIVE_X() macro. This was manageable when there were just two the ALTERNATIVE_X() > CMO implementations, but now that there are more and more platforms coming > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable. > > To avoid such issues this patch switches to use of function pointers "the use" or "using" > instead of ALTERNATIVE_X() macro for cache management (the only drawback the ALTERNATIVE_X() > being performance over the previous approach). > > void (*clean_range)(unsigned long addr, unsigned long size); > void (*inv_range)(unsigned long addr, unsigned long size); > void (*flush_range)(unsigned long addr, unsigned long size); > > The above function pointers are provided to be overridden for platforms > needing CMO. > > Convert ZICBOM and T-HEAD CMO to use function pointers. > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > --- a/arch/riscv/errata/thead/errata.c > +++ b/arch/riscv/errata/thead/errata.c > +#ifdef CONFIG_ERRATA_THEAD_CMO > +static void thead_register_cmo_ops(void) > +{ > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); > +} > +#else > +static void thead_register_cmo_ops(void) {} > +#endif > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -75,3 +83,12 @@ void riscv_noncoherent_supported(void) > "Non-coherent DMA support enabled without a block size\n"); > noncoherent_supported = true; > } > + > +void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops) > +{ > + if (!ops) > + return; This is never true. I guess originally you wanted to call riscv_noncoherent_register_cache_ops() unconditionally from common code, instead of the various *register_cmo_ops()? 
But that would have required something like

#ifdef CONFIG_ERRATA_THEAD_CMO
#define THEAD_CMO_OPS_PTR	(&thead_cmo_ops)
#else
#define THEAD_CMO_OPS_PTR	NULL
#endif

Or can we come up with some macro like pm_ptr(), but that also takes
care of the "&", so we can do "#define thead_cmo_ops NULL"?

> +
> +	noncoherent_cache_ops = *ops;
> +}
> +EXPORT_SYMBOL_GPL(riscv_noncoherent_register_cache_ops);

Gr{oetje,eeting}s,

						Geert
On Thu, Mar 30, 2023 at 11:34:02PM +0200, Arnd Bergmann wrote: > On Thu, Mar 30, 2023, at 22:42, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > +#ifdef CONFIG_ERRATA_THEAD_CMO > > I would rename this to not call this an 'ERRATA' but > just make it a driver. Not important though, and there > was probably a reason you did it like this. I think what was discussed in a prior iteration was that we'd leave refactoring the T-HEAD bits into a driver for a subsequent work.
On Fri, Mar 31, 2023, at 09:54, Conor Dooley wrote: > On Thu, Mar 30, 2023 at 11:34:02PM +0200, Arnd Bergmann wrote: >> On Thu, Mar 30, 2023, at 22:42, Prabhakar wrote: >> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > >> > +#ifdef CONFIG_ERRATA_THEAD_CMO >> >> I would rename this to not call this an 'ERRATA' but >> just make it a driver. Not important though, and there >> was probably a reason you did it like this. > > I think what was discussed in a prior iteration was that we'd leave > refactoring the T-HEAD bits into a driver for a subsequent work. Ok, makes sense. Arnd
Hi Arnd, Thank you for the review. On Thu, Mar 30, 2023 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Mar 30, 2023, at 22:42, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > > Currently, selecting which CMOs to use on a given platform is done using > > and ALTERNATIVE_X() macro. This was manageable when there were just two > > CMO implementations, but now that there are more and more platforms coming > > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable. > > > > To avoid such issues this patch switches to use of function pointers > > instead of ALTERNATIVE_X() macro for cache management (the only drawback > > being performance over the previous approach). > > > > void (*clean_range)(unsigned long addr, unsigned long size); > > void (*inv_range)(unsigned long addr, unsigned long size); > > void (*flush_range)(unsigned long addr, unsigned long size); > > > > The above function pointers are provided to be overridden for platforms > > needing CMO. > > > > Convert ZICBOM and T-HEAD CMO to use function pointers. > > > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > This is looking pretty good. There are a few things that I > still see sticking out, and I think I've mentioned some of > them before, but don't remember if there was a reason for > doing it like this: > > > +#ifdef CONFIG_ERRATA_THEAD_CMO > > I would rename this to not call this an 'ERRATA' but > just make it a driver. Not important though, and there > was probably a reason you did it like this. > As agreed, we will keep this as is. 
> > +extern struct riscv_cache_ops noncoherent_cache_ops;
> > +
> > +void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops);
> > +
> > +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t size)
> > +{
> > +	if (noncoherent_cache_ops.clean_range) {
> > +		unsigned long addr = (unsigned long)vaddr;
> > +
> > +		noncoherent_cache_ops.clean_range(addr, size);
> > +	}
> > +}
>
> The type cast should not be necessary here. Instead I would
> make the callbacks use 'void *' as well, not 'unsigned long'.
>
Ok, I'll update this to use void *.

> It's possible that some future cache controller driver requires
> passing physical addresses, as is common for last level cache,
> but as long as all drivers pass virtual addresses, it's easier
> to do the phys_to_virt() in common code.
>
Agreed.

> It also seems wrong to have the fallback be to do nothing
> when the pointer is NULL, since that cannot actually work
> when a device is not cache coherent.
>
If the device is non cache coherent and if it doesn't support ZICBOM
ISA extension the device won't work anyway. So non-cache coherent
devices until they have their CMO config enabled won't work anyway. So
I didn't see any benefit in enabling ZICBOM by default. Please let me
know if I am misunderstanding.

> I would either initialize the function pointer to the
> zicbom version and remove the NULL check, or keep the
> pointer NULL and have an explicit
> 'else zicbom_clean_range()' fallback.
>
> > +static void zicbom_register_cmo_ops(void)
> > +{
> > +	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
> > +}
> > +#else
> > +static void zicbom_register_cmo_ops(void) {}
> > +#endif
>
> As far as I recall, the #else path here was needed previously
> to work around a binutils dependency, but with the current
> code, it should be possible to just always enable
> CONFIG_RISCV_ISA_ZICBOM when RISCV_DMA_NONCOHERENT is used.
>
Are you suggesting something like below?
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 4dadf35ac721..a55dee98ccf8 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -242,6 +242,7 @@ config RISCV_DMA_NONCOHERENT
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select DMA_DIRECT_REMAP
+	select RISCV_ISA_ZICBOM

 config AS_HAS_INSN
 	def_bool $(as-instr,.insn r 51$(comma) 0$(comma) 0$(comma) t0$(comma) t0$(comma) zero)
@@ -465,7 +466,6 @@ config RISCV_ISA_ZICBOM
 	depends on MMU
 	depends on RISCV_ALTERNATIVE
 	default y
-	select RISCV_DMA_NONCOHERENT
 	help
 	  Adds support to dynamically detect the presence of the ZICBOM
 	  extension (Cache Block Management Operations) and enable its

But what if the platform doesn't have the ZICBOM ISA extension?

Cheers,
Prabhakar
On Fri, Mar 31, 2023, at 12:37, Lad, Prabhakar wrote:
> On Thu, Mar 30, 2023 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
>> It also seems wrong to have the fallback be to do nothing
>> when the pointer is NULL, since that cannot actually work
>> when a device is not cache coherent.
>>
> If the device is non cache coherent and if it doesn't support ZICBOM
> ISA extension the device won't work anyway. So non-cache coherent
> devices until they have their CMO config enabled won't work anyway. So
> I didn't see any benefit in enabling ZICBOM by default. Please let me
> know if I am misunderstanding.

Two things:

- Having a broken machine crash with an invalid instruction
  exception is better than having it run into silent data
  corruption.

- A correctly predicted branch is typically faster than an
  indirect function call, so the fallback to zicbom makes the
  expected (at least in the future) case the fast one.

> @@ -465,7 +466,6 @@ config RISCV_ISA_ZICBOM
>  	depends on MMU
>  	depends on RISCV_ALTERNATIVE
>  	default y
> -	select RISCV_DMA_NONCOHERENT
>  	help
>  	  Adds support to dynamically detect the presence of the ZICBOM
>  	  extension (Cache Block Management Operations) and enable its
>
> But what if the platform doesn't have the ZICBOM ISA extension?

Then it needs to register its cache operations before the first
DMA, which is something that it should do anyway. With your
current code, it may work by accident depending on the state of
the cache, but with the version I suggested, it will either work
correctly all the time or crash in an obvious way when misconfigured.

     Arnd
Hi Geert, Thank you for the review. On Fri, Mar 31, 2023 at 8:31 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Prabhakar, > > Thanks for your patch! > > On Thu, Mar 30, 2023 at 10:42 PM Prabhakar <prabhakar.csengg@gmail.com> wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > > Currently, selecting which CMOs to use on a given platform is done using > > and ALTERNATIVE_X() macro. This was manageable when there were just two > > the ALTERNATIVE_X() > OK. > > CMO implementations, but now that there are more and more platforms coming > > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable. > > > > To avoid such issues this patch switches to use of function pointers > > "the use" or "using" > OK. > > instead of ALTERNATIVE_X() macro for cache management (the only drawback > > the ALTERNATIVE_X() > OK. > > being performance over the previous approach). > > > > void (*clean_range)(unsigned long addr, unsigned long size); > > void (*inv_range)(unsigned long addr, unsigned long size); > > void (*flush_range)(unsigned long addr, unsigned long size); > > > > The above function pointers are provided to be overridden for platforms > > needing CMO. > > > > Convert ZICBOM and T-HEAD CMO to use function pointers. 
> > > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > --- a/arch/riscv/errata/thead/errata.c > > +++ b/arch/riscv/errata/thead/errata.c > > > +#ifdef CONFIG_ERRATA_THEAD_CMO > > > +static void thead_register_cmo_ops(void) > > +{ > > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); > > +} > > +#else > > +static void thead_register_cmo_ops(void) {} > > +#endif > > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > > @@ -75,3 +83,12 @@ void riscv_noncoherent_supported(void) > > "Non-coherent DMA support enabled without a block size\n"); > > noncoherent_supported = true; > > } > > + > > +void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops) > > +{ > > + if (!ops) > > + return; > > This is never true. I just wanted to add a sanity check. > I guess originally you wanted to call riscv_noncoherent_register_cache_ops() > unconditionally from common code, instead of the various *register_cmo_ops()? > But that would have required something like > > #ifdef CONFIG_ERRATA_THEAD_CMO > #define THEAD_CMO_OPS_PTR (&thead_cmo_ops) > #else > #define THEAD_CMO_OPS_PTR NULL > #endif > riscv_noncoherent_register_cache_ops() will only be called if the ISA/Errata needs to be applied so I'll drop this check. Cheers, Prabhakar
On Fri, Mar 31, 2023 at 11:37:30AM +0100, Lad, Prabhakar wrote:
> > As far as I recall, the #else path here was needed previously
> > to work around a binutils dependency, but with the current
> > code, it should be possible to just always enable
> > CONFIG_RISCV_ISA_ZICBOM when RISCV_DMA_NONCOHERENT is used.
> >
> Are you suggesting something like below?
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 4dadf35ac721..a55dee98ccf8 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -242,6 +242,7 @@ config RISCV_DMA_NONCOHERENT
>  	select ARCH_HAS_SYNC_DMA_FOR_CPU
>  	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>  	select DMA_DIRECT_REMAP
> +	select RISCV_ISA_ZICBOM
>
>  config AS_HAS_INSN
>  	def_bool $(as-instr,.insn r 51$(comma) 0$(comma) 0$(comma) t0$(comma) t0$(comma) zero)
> @@ -465,7 +466,6 @@ config RISCV_ISA_ZICBOM
>  	depends on MMU
>  	depends on RISCV_ALTERNATIVE
>  	default y
> -	select RISCV_DMA_NONCOHERENT
>  	help
>  	  Adds support to dynamically detect the presence of the ZICBOM
>  	  extension (Cache Block Management Operations) and enable its
>

Does that actually work? I don't think it does.
If you try to enable RISCV_ISA_ZICBOM then you won't get
RISCV_DMA_NONCOHERENT turned on. Run menuconfig and disable support for
Renesas, SiFive and T-Head SoCs & you can replicate.

I think one of RISCV_ISA_ZICBOM and RISCV_DMA_NONCOHERENT should just be
dropped, although I don't know which one to pick!
Making RISCV_DMA_NONCOHERENT user selectable probably makes the most
sense.
On Fri, Mar 31, 2023, at 12:55, Conor Dooley wrote: > On Fri, Mar 31, 2023 at 11:37:30AM +0100, Lad, Prabhakar wrote: >> > > Does that actually work? I don't think it does. > If you try to enable RISCV_ISA_ZICBOM then you won't get > RISC_DMA_NONCOHERENT turned on. Run menuconfig and disable support for > Renesas, SiFive and T-Head SoCs & you can replicate. Right, the circular dependency has to be broken in some form. > I think one of RISCV_ISA_ZICBOM and RISCV_DMA_NONCOHERENT should just be > dropped, although I don't know which one to pick! > Making RISCV_DMA_NONCOHERENT user selectable probably makes the most > sense. That sounds good to me. Arnd
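One way the circular dependency could be broken along the lines Conor suggests is sketched below. This is an editor's illustration, not the final upstream form: only the `select` lines shown in the quoted diff are reproduced, and the prompt strings are invented.

```kconfig
# Sketch only: make RISCV_DMA_NONCOHERENT user-selectable and have it
# pull in Zicbom, instead of Zicbom selecting it. SoC configs that need
# non-coherent DMA (Renesas, T-Head, ...) would then select this symbol.
config RISCV_DMA_NONCOHERENT
	bool "Support non-coherent DMA"
	select ARCH_HAS_SYNC_DMA_FOR_CPU
	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
	select DMA_DIRECT_REMAP
	select RISCV_ISA_ZICBOM

config RISCV_ISA_ZICBOM
	bool "Zicbom extension support"
	depends on MMU
	depends on RISCV_ALTERNATIVE
	default y
```

With this shape, enabling RISCV_DMA_NONCOHERENT always brings in the Zicbom fallback, and RISCV_ISA_ZICBOM no longer drags in the non-coherent DMA machinery on coherent platforms.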
On Fri, Mar 31, 2023 at 11:45 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Mar 31, 2023, at 12:37, Lad, Prabhakar wrote: > > On Thu, Mar 30, 2023 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote: > > > >> It also seems wrong to have the fallback be to do nothing > >> when the pointer is NULL, since that cannot actually work > >> when a device is not cache coherent. > >> > > If the device is non cache coherent and if it doesn't support ZICBOM > > ISA extension the device won't work anyway. So non-cache coherent > > devices until they have their CMO config enabled won't work anyway. So > > I didn't see any benefit in enabling ZICBOM by default. Please let me > > know if I am misunderstanding. > > Two things: > > - Having a broken machine crash with in invalid instruction > exception is better than having it run into silent data > corruption. > > - a correctly predicted branch is typically faster than an > indirect function call, so the fallback to zicbom makes the > expected (at least in the future) case the fast one. > Ok, thank you for the clarification. I'll default to zicbom. > > @@ -465,7 +466,6 @@ config RISCV_ISA_ZICBOM > > depends on MMU > > depends on RISCV_ALTERNATIVE > > default y > > - select RISCV_DMA_NONCOHERENT > > help > > Adds support to dynamically detect the presence of the ZICBOM > > extension (Cache Block Management Operations) and enable its > > > > But what if the platform doesn't have the ZICBOM ISA extension? > > Then it needs to register its cache operations before the first > DMA, which is something that it should do anyway. With your > current code, it may work by accident depending on the state of > the cache, but with the version I suggested, it will either work > correctly all the time or crash in an obvious way when misconfigured. > Okay, agreed. Cheers, Prabhakar
Hey,

I think most of what I wanted to talk about has been raised by Arnd or
Geert, so I kinda only have a couple of small comments for you here.

On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>
> Currently, selecting which CMOs to use on a given platform is done using
> and ALTERNATIVE_X() macro. This was manageable when there were just two
> CMO implementations, but now that there are more and more platforms coming
> needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
>
> To avoid such issues this patch switches to use of function pointers
> instead of ALTERNATIVE_X() macro for cache management (the only drawback
> being performance over the previous approach).
>
> void (*clean_range)(unsigned long addr, unsigned long size);
> void (*inv_range)(unsigned long addr, unsigned long size);
> void (*flush_range)(unsigned long addr, unsigned long size);

So, given Arnd has renamed the generic helpers, should we use the
writeback/invalidate/writeback & invalidate terminology here too?
I think it'd be nice to make the "driver" functions match the generic
implementations' names (even though that means not matching the
instruction naming).

> The above function pointers are provided to be overridden for platforms
> needing CMO.
>
> Convert ZICBOM and T-HEAD CMO to use function pointers.
>
> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> ---
> v6->v7
> * Updated commit description
> * Fixed build issues when CONFIG_ERRATA_THEAD_CMO=n
> * Used static const struct ptr to register CMO ops
> * Dropped riscv_dma_noncoherent_cmo_ops
> * Moved ZICBOM CMO setup to setup.c

Why'd you do that? What is the reason that the Zicbom stuff cannot live
in dma-noncoherent.[ch] and only expose, say:

void riscv_cbom_register_cmo_ops(void)
{
	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
}

which you then call from setup.c?

> v5->v6
> * New patch
> ---
>  arch/riscv/errata/thead/errata.c         | 76 ++++++++++++++++++++++++
>  arch/riscv/include/asm/dma-noncoherent.h | 73 +++++++++++++++++++++++
>  arch/riscv/include/asm/errata_list.h     | 53 -----------------
>  arch/riscv/kernel/setup.c                | 49 ++++++++++++++-
>  arch/riscv/mm/dma-noncoherent.c          | 25 ++++++--
>  arch/riscv/mm/pmem.c                     |  6 +-
>  6 files changed, 221 insertions(+), 61 deletions(-)
>  create mode 100644 arch/riscv/include/asm/dma-noncoherent.h

> +#ifdef CONFIG_RISCV_ISA_ZICBOM
> +
> +#define ZICBOM_CMO_OP(_op, _start, _size, _cachesize)		\
> +	asm volatile("mv a0, %1\n\t"				\
> +		"j 2f\n\t"					\
> +		"3:\n\t"					\
> +		CBO_##_op(a0)					\
> +		"add a0, a0, %0\n\t"				\
> +		"2:\n\t"					\
> +		"bltu a0, %2, 3b\n\t"				\
> +		: : "r"(_cachesize),				\
> +		"r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),	\
> +		"r"((unsigned long)(_start) + (_size))		\
> +		: "a0")
> +
> +static void zicbom_cmo_clean_range(unsigned long addr, unsigned long size)
> +{
> +	ZICBOM_CMO_OP(clean, addr, size, riscv_cbom_block_size);
> +}
> +
> +static void zicbom_cmo_flush_range(unsigned long addr, unsigned long size)
> +{
> +	ZICBOM_CMO_OP(flush, addr, size, riscv_cbom_block_size);
> +}
> +
> +static void zicbom_cmo_inval_range(unsigned long addr, unsigned long size)
> +{
> +	ZICBOM_CMO_OP(inval, addr, size, riscv_cbom_block_size);
> +}
> +
> +const struct riscv_cache_ops zicbom_cmo_ops = {
> +	.clean_range = &zicbom_cmo_clean_range,
> +	.inv_range = &zicbom_cmo_inval_range,
> +	.flush_range = &zicbom_cmo_flush_range,
> +};
> +
> +static void zicbom_register_cmo_ops(void)
> +{
> +	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
> +}
> +#else
> +static void zicbom_register_cmo_ops(void) {}
> +#endif

I think all of the above should be prefixed with riscv_cbom to match the
other riscv_cbom stuff we already have. Although, given the discussion
elsewhere about just falling back to zicbom in the absence of specific
ops, most of these functions will probably disappear (if not all of
them).

Cheers,
Conor.
Hi Arnd,

On Fri, Mar 31, 2023 at 11:45 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, Mar 31, 2023, at 12:37, Lad, Prabhakar wrote:
> > On Thu, Mar 30, 2023 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> >> It also seems wrong to have the fallback be to do nothing
> >> when the pointer is NULL, since that cannot actually work
> >> when a device is not cache coherent.
> >>
> > If the device is non cache coherent and if it doesn't support ZICBOM
> > ISA extension the device won't work anyway. So non-cache coherent
> > devices until they have their CMO config enabled won't work anyway. So
> > I didn't see any benefit in enabling ZICBOM by default. Please let me
> > know if I am misunderstanding.
>
> Two things:
>
> - Having a broken machine crash with an invalid instruction
>   exception is better than having it run into silent data
>   corruption.
>
> - A correctly predicted branch is typically faster than an
>   indirect function call, so the fallback to zicbom makes the
>   expected (at least in the future) case the fast one.
>
> > @@ -465,7 +466,6 @@ config RISCV_ISA_ZICBOM
> >  	depends on MMU
> >  	depends on RISCV_ALTERNATIVE
> >  	default y
> > -	select RISCV_DMA_NONCOHERENT
> >  	help
> >  	  Adds support to dynamically detect the presence of the ZICBOM
> >  	  extension (Cache Block Management Operations) and enable its
> >
> > But what if the platform doesn't have the ZICBOM ISA extension?
>
> Then it needs to register its cache operations before the first
> DMA, which is something that it should do anyway. With your
> current code, it may work by accident depending on the state of
> the cache, but with the version I suggested, it will either work
> correctly all the time or crash in an obvious way when misconfigured.
>
You were right, defaulting to ZICBOM is giving me a crash. So I need to
switch to something else rather than using arch_initcall(). I tried
early_initcall(), but this doesn't let me register a platform driver.

I should be able to access the resources and DT from the init callback
and then register the CMO callbacks (I haven't tried this yet), but it
won't be a platform driver. Should this be OK?

Cheers,
Prabhakar
Hi Conor, Thank you for the review. On Fri, Mar 31, 2023 at 1:24 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > Hey, > > I think most of what I wanted to talk about has been raised by Arnd or > Geert, so I kinda only have a couple of small comments for you here. > > On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > > Currently, selecting which CMOs to use on a given platform is done using > > and ALTERNATIVE_X() macro. This was manageable when there were just two > > CMO implementations, but now that there are more and more platforms coming > > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable. > > > > To avoid such issues this patch switches to use of function pointers > > instead of ALTERNATIVE_X() macro for cache management (the only drawback > > being performance over the previous approach). > > > > void (*clean_range)(unsigned long addr, unsigned long size); > > void (*inv_range)(unsigned long addr, unsigned long size); > > void (*flush_range)(unsigned long addr, unsigned long size); > > So, given Arnd has renamed the generic helpers, should we use the > writeback/invalidate/writeback & invalidate terminology here too? > I think it'd be nice to make the "driver" functions match the generic > implementations's names (even though that means not making the > instruction naming). > > > The above function pointers are provided to be overridden for platforms > > needing CMO. > > > > Convert ZICBOM and T-HEAD CMO to use function pointers. > > > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > --- > > v6->v7 > > * Updated commit description > > * Fixed build issues when CONFIG_ERRATA_THEAD_CMO=n > > * Used static const struct ptr to register CMO ops > > * Dropped riscv_dma_noncoherent_cmo_ops > > > * Moved ZICBOM CMO setup to setup.c > > Why'd you do that? 
> What is the reason that the Zicbom stuff cannot live in
> dma-noncoherent.[ch] and only expose, say:
>
> void riscv_cbom_register_cmo_ops(void)
> {
>	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
> }
>
> which you then call from setup.c?
>
Commit abcc445acd ("riscv: move riscv_noncoherent_supported() out of
ZICBOM probe") moved the zicbom setup to setup.c, hence I moved the CMO
stuff there. That said, now I am defaulting to the zicbom ops, so I have
moved the functions to dma-noncoherent.c.

> > v5->v6
> > * New patch
> > ---
> >  arch/riscv/errata/thead/errata.c         | 76 ++++++++++++++++++++++++
> >  arch/riscv/include/asm/dma-noncoherent.h | 73 +++++++++++++++++++++++
> >  arch/riscv/include/asm/errata_list.h     | 53 -----------------
> >  arch/riscv/kernel/setup.c                | 49 ++++++++++++++-
> >  arch/riscv/mm/dma-noncoherent.c          | 25 ++++++--
> >  arch/riscv/mm/pmem.c                     |  6 +-
> >  6 files changed, 221 insertions(+), 61 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/dma-noncoherent.h
>
> > +#ifdef CONFIG_RISCV_ISA_ZICBOM
> > +
> > +#define ZICBOM_CMO_OP(_op, _start, _size, _cachesize)		\
> > +	asm volatile("mv a0, %1\n\t"				\
> > +		"j 2f\n\t"					\
> > +		"3:\n\t"					\
> > +		CBO_##_op(a0)					\
> > +		"add a0, a0, %0\n\t"				\
> > +		"2:\n\t"					\
> > +		"bltu a0, %2, 3b\n\t"				\
> > +		: : "r"(_cachesize),				\
> > +		"r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),	\
> > +		"r"((unsigned long)(_start) + (_size))		\
> > +		: "a0")
> > +
> > +static void zicbom_cmo_clean_range(unsigned long addr, unsigned long size)
> > +{
> > +	ZICBOM_CMO_OP(clean, addr, size, riscv_cbom_block_size);
> > +}
> > +
> > +static void zicbom_cmo_flush_range(unsigned long addr, unsigned long size)
> > +{
> > +	ZICBOM_CMO_OP(flush, addr, size, riscv_cbom_block_size);
> > +}
> > +
> > +static void zicbom_cmo_inval_range(unsigned long addr, unsigned long size)
> > +{
> > +	ZICBOM_CMO_OP(inval, addr, size, riscv_cbom_block_size);
> > +}
> > +
> > +const struct riscv_cache_ops zicbom_cmo_ops = {
> > +	.clean_range = &zicbom_cmo_clean_range,
> > +	.inv_range = &zicbom_cmo_inval_range,
> > +	.flush_range = &zicbom_cmo_flush_range,
> > +};
> > +
> > +static void zicbom_register_cmo_ops(void)
> > +{
> > +	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
> > +}
> > +#else
> > +static void zicbom_register_cmo_ops(void) {}
> > +#endif
>
> I think all of the above should be prefixed with riscv_cbom to match the
> other riscv_cbom stuff we already have.
>
Just to clarify, the riscv_cbom prefix should just be applied to the
ZICBOM functions and not to T-HEAD?

Cheers,
Prabhakar
On Mon, Apr 03, 2023 at 07:23:37PM +0100, Lad, Prabhakar wrote: > > I think all of the above should be prefixed with riscv_cbom to match the > > other riscv_cbom stuff we already have. > Just to clarify, the riscv_cbom prefix should just be applied to the > ZICOM functions and not to T-HEAD? Yah, can leave the T-HEAD stuff as is IMO. Cheers, Conor.
On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>
> Currently, selecting which CMOs to use on a given platform is done using
> and ALTERNATIVE_X() macro. This was manageable when there were just two
> CMO implementations, but now that there are more and more platforms coming
> needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
>
> To avoid such issues this patch switches to use of function pointers
> instead of ALTERNATIVE_X() macro for cache management (the only drawback
> being performance over the previous approach).
>
> void (*clean_range)(unsigned long addr, unsigned long size);
> void (*inv_range)(unsigned long addr, unsigned long size);
> void (*flush_range)(unsigned long addr, unsigned long size);
>
NAK. Function pointers for something as performance-critical as cache
maintenance are a complete no-go.
Hi Christoph Hellwig,

> -----Original Message-----
> From: Christoph Hellwig <hch@infradead.org>
> Sent: Tuesday, April 4, 2023 6:29 AM
> To: Prabhakar <prabhakar.csengg@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>; Conor Dooley <conor.dooley@microchip.com>;
> Geert Uytterhoeven <geert+renesas@glider.be>; Heiko Stuebner <heiko@sntech.de>;
> Guo Ren <guoren@kernel.org>; Andrew Jones <ajones@ventanamicro.com>;
> Paul Walmsley <paul.walmsley@sifive.com>; Palmer Dabbelt <palmer@dabbelt.com>;
> Albert Ou <aou@eecs.berkeley.edu>; Samuel Holland <samuel@sholland.org>;
> linux-riscv@lists.infradead.org; Rob Herring <robh+dt@kernel.org>;
> Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>;
> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-renesas-soc@vger.kernel.org; Biju Das <biju.das.jz@bp.renesas.com>;
> Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@bp.renesas.com>
> Subject: Re: [PATCH v7 1/6] riscv: mm: dma-noncoherent: Switch using
> function pointers for cache management
>
> On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >
> > Currently, selecting which CMOs to use on a given platform is done
> > using and ALTERNATIVE_X() macro. This was manageable when there were
> > just two CMO implementations, but now that there are more and more
> > platforms coming needing custom CMOs, the use of the ALTERNATIVE_X()
> > macro is unmanageable.
> >
> > To avoid such issues this patch switches to use of function pointers
> > instead of ALTERNATIVE_X() macro for cache management (the only
> > drawback being performance over the previous approach).
> >
> > void (*clean_range)(unsigned long addr, unsigned long size);
> > void (*inv_range)(unsigned long addr, unsigned long size);
> > void (*flush_range)(unsigned long addr, unsigned long size);
> >
> NAK. Function pointers for something as performance-critical as cache
> maintenance are a complete no-go.
Just a question, how does function pointer makes a performance difference compared to ALTERNATIVE_X() macros? On both cases, we are pushing function parameters to stack, jumping to the actual routine And then on return pop the variables from stack. Am I missing something here? Benchmark results by [1], shows that there is no performance degradation. I am not sure What type of benchmarking used in this case and How accurate is this benchmark? https://lore.kernel.org/linux-renesas-soc/40cdea465fef49a8a337b48e2a748138c66a08eb.camel@icenowy.me/T/#m093c1f3d8f7f0e15bd2909900bf137d5714c553c Cheers, Biju
On Tue, Apr 4, 2023, at 07:29, Christoph Hellwig wrote:
> On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
>> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>>
>> Currently, selecting which CMOs to use on a given platform is done using
>> an ALTERNATIVE_X() macro. This was manageable when there were just two
>> CMO implementations, but now that there are more and more platforms coming
>> needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
>>
>> To avoid such issues this patch switches to use of function pointers
>> instead of the ALTERNATIVE_X() macro for cache management (the only drawback
>> being performance over the previous approach).
>>
>> void (*clean_range)(unsigned long addr, unsigned long size);
>> void (*inv_range)(unsigned long addr, unsigned long size);
>> void (*flush_range)(unsigned long addr, unsigned long size);
>>
>
> NAK. Function pointers for something as performance-sensitive as cache
> maintenance are a complete no-go.

As we already discussed, this is now planned to use a direct branch to
the Zicbom version when the function pointer is NULL, which should be a
fast, well-predicted branch on all standard implementations and only be
slow on the nonstandard ones, which I think is a reasonable compromise.

I'm also not sure I'd actually consider this a fast path, since there is
already a significant cost in accessing the invalidated cache lines
afterwards, which is likely going to be much higher than the cost of an
indirect branch.

I suppose an alternative would be to use the linux/static_call.h
infrastructure to avoid the overhead of indirect branches altogether.
Right now this has no riscv-specific implementation though, so using it
just turns into a regular indirect branch until someone implements
static_call.

     Arnd
On Tue, Apr 04, 2023 at 08:50:16AM +0200, Arnd Bergmann wrote:
> On Tue, Apr 4, 2023, at 07:29, Christoph Hellwig wrote:
> > On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
> >> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >>
> >> Currently, selecting which CMOs to use on a given platform is done using
> >> an ALTERNATIVE_X() macro. This was manageable when there were just two
> >> CMO implementations, but now that there are more and more platforms coming
> >> needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
> >>
> >> To avoid such issues this patch switches to use of function pointers
> >> instead of the ALTERNATIVE_X() macro for cache management (the only drawback
> >> being performance over the previous approach).
> >>
> >> void (*clean_range)(unsigned long addr, unsigned long size);
> >> void (*inv_range)(unsigned long addr, unsigned long size);
> >> void (*flush_range)(unsigned long addr, unsigned long size);
> >>
> >
> > NAK. Function pointers for something as performance-sensitive as cache
> > maintenance are a complete no-go.
>
> As we already discussed, this is now planned to use a direct
> branch to the Zicbom version when the function pointer is NULL,
> which should be a fast, well-predicted branch on all standard
> implementations and only be slow on the nonstandard ones, which I
> think is a reasonable compromise.
>
> I'm also not sure I'd actually consider this a fast path, since
> there is already a significant cost in accessing the invalidated
> cache lines afterwards, which is likely going to be much higher than
> the cost of an indirect branch.
>
> I suppose an alternative would be to use the linux/static_call.h
> infrastructure to avoid the overhead of indirect branches altogether.
> Right now, this has no riscv specific implementation though, so
> using it just turns into a regular indirect branch until someone
> implements static_call.

Someone actually posted an implementation for that the other day:
https://patchwork.kernel.org/project/linux-riscv/patch/tencent_A8A256967B654625AEE1DB222514B0613B07@qq.com/

I haven't looked at it at all, other than noticing the build issues for
!rv64 || !mmu configurations, so I have no idea what state it is actually
in.
On Tue, Apr 04, 2023 at 06:24:16AM +0000, Biju Das wrote:
> Just a question: how does a function pointer make a performance
> difference compared to the ALTERNATIVE_X() macros?
>
> In both cases we are pushing function parameters onto the stack, jumping
> to the actual routine, and then on return popping the variables from the
> stack. Am I missing something here?

Indirect calls have always been more expensive, and with the hard- and
software mitigations for spectre-like attacks they are becoming even more
expensive.

But the other point is that adding more cache flushing variants should not
be easy. Everyone should be using the standardized version. If it's not
implemented in hardware despite having ratified extensions, you can fake
it up in SBI. Yes, that's way more expensive than indirect calls, but
that's what you get for taping out new hardware that ignores the actual
architecture specification and just does its own made-up shit.
Hi Christoph Hellwig,

Thanks for the feedback.

> Subject: Re: [PATCH v7 1/6] riscv: mm: dma-noncoherent: Switch using
> function pointers for cache management
>
> On Tue, Apr 04, 2023 at 06:24:16AM +0000, Biju Das wrote:
> > Just a question: how does a function pointer make a performance
> > difference compared to the ALTERNATIVE_X() macros?
> >
> > In both cases we are pushing function parameters onto the stack, jumping
> > to the actual routine, and then on return popping the variables from the
> > stack. Am I missing something here?
>
> Indirect calls have always been more expensive, and with the hard- and
> software mitigations for spectre-like attacks they are becoming even more
> expensive.

Thanks for the info. I agree, it will be more expensive with the software
mitigations for spectre-like attacks.

Cheers,
Biju
Hi Christoph,

On Tue, Apr 4, 2023 at 6:29 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Mar 30, 2023 at 09:42:12PM +0100, Prabhakar wrote:
> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >
> > Currently, selecting which CMOs to use on a given platform is done using
> > an ALTERNATIVE_X() macro. This was manageable when there were just two
> > CMO implementations, but now that there are more and more platforms coming
> > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable.
> >
> > To avoid such issues this patch switches to use of function pointers
> > instead of the ALTERNATIVE_X() macro for cache management (the only drawback
> > being performance over the previous approach).
> >
> > void (*clean_range)(unsigned long addr, unsigned long size);
> > void (*inv_range)(unsigned long addr, unsigned long size);
> > void (*flush_range)(unsigned long addr, unsigned long size);
> >
>
> NAK. Function pointers for something as performance-sensitive as cache
> maintenance are a complete no-go.
>
OK, I will take the ALTERNATIVE() macro route.

Cheers,
Prabhakar
> But the other point is that adding more cache flushing variants should not
> be easy. Everyone should be using the standardized version. If it's not
> implemented in hardware despite having ratified extensions, you can fake
> it up in SBI. Yes, that's way more expensive than indirect calls, but
> that's what you get for taping out new hardware that ignores the actual
> architecture specification and just does its own made-up shit.

FWIW, ALTERNATIVE_X() for "three instructions with (what should be a)
crystal-clear semantics" already smells like "we're doing it wrong" to
me; function pointers would be closer to "we're looking for trouble".

Thanks,
  Andrea
On Fri, Apr 07, 2023 at 02:03:36AM +0200, Andrea Parri wrote:
> > But the other point is that adding more cache flushing variants should not
> > be easy. Everyone should be using the standardized version. If it's not
> > implemented in hardware despite having ratified extensions, you can fake
> > it up in SBI. Yes, that's way more expensive than indirect calls, but
> > that's what you get for taping out new hardware that ignores the actual
> > architecture specification and just does its own made-up shit.
>
> FWIW, ALTERNATIVE_X() for "three instructions with (what should be a)
> crystal-clear semantics" already smells like "we're doing it wrong" to
> me; function pointers would be closer to "we're looking for trouble".

Thanks for putting my feelings into such concise words.
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index 7e8d50ebb71a..4bb3f2baee03 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -11,10 +11,83 @@
 #include <linux/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/cacheflush.h>
+#include <asm/dma-noncoherent.h>
 #include <asm/errata_list.h>
 #include <asm/patch.h>
 #include <asm/vendorid_list.h>
 
+#ifdef CONFIG_ERRATA_THEAD_CMO
+/*
+ * dcache.ipa rs1 (invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01010      rs1       000      00000  0001011
+ * dache.iva rs1 (invalida, virtual address)
+ *   0000001    00110      rs1       000      00000  0001011
+ *
+ * dcache.cpa rs1 (clean, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01001      rs1       000      00000  0001011
+ * dcache.cva rs1 (clean, virtual address)
+ *   0000001    00100      rs1       000      00000  0001011
+ *
+ * dcache.cipa rs1 (clean then invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01011      rs1       000      00000  0001011
+ * dcache.civa rs1 (... virtual address)
+ *   0000001    00111      rs1       000      00000  0001011
+ *
+ * sync.s (make sure all cache operations finished)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000000    11001     00000      000      00000  0001011
+ */
+#define THEAD_inval_A0	".long 0x0265000b"
+#define THEAD_clean_A0	".long 0x0245000b"
+#define THEAD_flush_A0	".long 0x0275000b"
+#define THEAD_SYNC_S	".long 0x0190000b"
+
+#define THEAD_CMO_OP(_op, _start, _size, _cachesize)			\
+	asm volatile("mv a0, %1\n\t"					\
+		     "j 2f\n\t"						\
+		     "3:\n\t"						\
+		     THEAD_##_op##_A0 "\n\t"				\
+		     "add a0, a0, %0\n\t"				\
+		     "2:\n\t"						\
+		     "bltu a0, %2, 3b\n\t"				\
+		     THEAD_SYNC_S					\
+		     : : "r"(_cachesize),				\
+		     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
+		     "r"((unsigned long)(_start) + (_size))		\
+		     : "a0")
+
+static void thead_cmo_clean_range(unsigned long addr, unsigned long size)
+{
+	THEAD_CMO_OP(clean, addr, size, riscv_cbom_block_size);
+}
+
+static void thead_cmo_flush_range(unsigned long addr, unsigned long size)
+{
+	THEAD_CMO_OP(flush, addr, size, riscv_cbom_block_size);
+}
+
+static void thead_cmo_inval_range(unsigned long addr, unsigned long size)
+{
+	THEAD_CMO_OP(inval, addr, size, riscv_cbom_block_size);
+}
+
+static const struct riscv_cache_ops thead_cmo_ops = {
+	.clean_range = &thead_cmo_clean_range,
+	.inv_range = &thead_cmo_inval_range,
+	.flush_range = &thead_cmo_flush_range,
+};
+
+static void thead_register_cmo_ops(void)
+{
+	riscv_noncoherent_register_cache_ops(&thead_cmo_ops);
+}
+#else
+static void thead_register_cmo_ops(void) {}
+#endif
+
 static bool errata_probe_pbmt(unsigned int stage,
 			      unsigned long arch_id, unsigned long impid)
 {
@@ -45,6 +118,9 @@ static bool errata_probe_cmo(unsigned int stage,
 
 	riscv_cbom_block_size = L1_CACHE_BYTES;
 	riscv_noncoherent_supported();
+
+	thead_register_cmo_ops();
+
 	return true;
 }
 
diff --git a/arch/riscv/include/asm/dma-noncoherent.h b/arch/riscv/include/asm/dma-noncoherent.h
new file mode 100644
index 000000000000..fc8d16c89f01
--- /dev/null
+++ b/arch/riscv/include/asm/dma-noncoherent.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 Renesas Electronics Corp.
+ */
+
+#ifndef __ASM_DMA_NONCOHERENT_H
+#define __ASM_DMA_NONCOHERENT_H
+
+#include <linux/dma-direct.h>
+
+#ifdef CONFIG_RISCV_DMA_NONCOHERENT
+
+/*
+ * struct riscv_cache_ops - Structure for CMO function pointers
+ *
+ * @clean_range:	Function pointer for clean cache
+ * @inv_range:		Function pointer for invalidate cache
+ * @flush_range:	Function pointer for flushing the cache
+ */
+struct riscv_cache_ops {
+	void (*clean_range)(unsigned long addr, unsigned long size);
+	void (*inv_range)(unsigned long addr, unsigned long size);
+	void (*flush_range)(unsigned long addr, unsigned long size);
+};
+
+extern struct riscv_cache_ops noncoherent_cache_ops;
+
+void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops);
+
+static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t size)
+{
+	if (noncoherent_cache_ops.clean_range) {
+		unsigned long addr = (unsigned long)vaddr;
+
+		noncoherent_cache_ops.clean_range(addr, size);
+	}
+}
+
+static inline void riscv_dma_noncoherent_flush(void *vaddr, size_t size)
+{
+	if (noncoherent_cache_ops.flush_range) {
+		unsigned long addr = (unsigned long)vaddr;
+
+		noncoherent_cache_ops.flush_range(addr, size);
+	}
+}
+
+static inline void riscv_dma_noncoherent_inval(void *vaddr, size_t size)
+{
+	if (noncoherent_cache_ops.inv_range) {
+		unsigned long addr = (unsigned long)vaddr;
+
+		noncoherent_cache_ops.inv_range(addr, size);
+	}
+}
+
+static inline void riscv_dma_noncoherent_pmem_clean(void *vaddr, size_t size)
+{
+	riscv_dma_noncoherent_clean(vaddr, size);
+}
+
+static inline void riscv_dma_noncoherent_pmem_inval(void *vaddr, size_t size)
+{
+	riscv_dma_noncoherent_inval(vaddr, size);
+}
+#else
+
+static inline void riscv_dma_noncoherent_pmem_clean(void *vaddr, size_t size) {}
+static inline void riscv_dma_noncoherent_pmem_inval(void *vaddr, size_t size) {}
+
+#endif /* CONFIG_RISCV_DMA_NONCOHERENT */
+
+#endif /* __ASM_DMA_NONCOHERENT_H */
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index fb1a810f3d8c..112429910ee6 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -89,59 +89,6 @@ asm volatile(ALTERNATIVE(					\
 #define ALT_THEAD_PMA(_val)
 #endif
 
-/*
- * dcache.ipa rs1 (invalidate, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- *   0000001    01010      rs1       000      00000  0001011
- * dache.iva rs1 (invalida, virtual address)
- *   0000001    00110      rs1       000      00000  0001011
- *
- * dcache.cpa rs1 (clean, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- *   0000001    01001      rs1       000      00000  0001011
- * dcache.cva rs1 (clean, virtual address)
- *   0000001    00100      rs1       000      00000  0001011
- *
- * dcache.cipa rs1 (clean then invalidate, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- *   0000001    01011      rs1       000      00000  0001011
- * dcache.civa rs1 (... virtual address)
- *   0000001    00111      rs1       000      00000  0001011
- *
- * sync.s (make sure all cache operations finished)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- *   0000000    11001     00000      000      00000  0001011
- */
-#define THEAD_inval_A0	".long 0x0265000b"
-#define THEAD_clean_A0	".long 0x0245000b"
-#define THEAD_flush_A0	".long 0x0275000b"
-#define THEAD_SYNC_S	".long 0x0190000b"
-
-#define ALT_CMO_OP(_op, _start, _size, _cachesize)			\
-asm volatile(ALTERNATIVE_2(						\
-	__nops(6),							\
-	"mv a0, %1\n\t"							\
-	"j 2f\n\t"							\
-	"3:\n\t"							\
-	CBO_##_op(a0)							\
-	"add a0, a0, %0\n\t"						\
-	"2:\n\t"							\
-	"bltu a0, %2, 3b\n\t"						\
-	"nop", 0, RISCV_ISA_EXT_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,	\
-	"mv a0, %1\n\t"							\
-	"j 2f\n\t"							\
-	"3:\n\t"							\
-	THEAD_##_op##_A0 "\n\t"						\
-	"add a0, a0, %0\n\t"						\
-	"2:\n\t"							\
-	"bltu a0, %2, 3b\n\t"						\
-	THEAD_SYNC_S, THEAD_VENDOR_ID,					\
-	ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)			\
-	: : "r"(_cachesize),						\
-	    "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),	\
-	    "r"((unsigned long)(_start) + (_size))			\
-	: "a0")
-
 #define THEAD_C9XX_RV_IRQ_PMU			17
 #define THEAD_C9XX_CSR_SCOUNTEROF		0x5c5
 
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 5d3184cbf518..b2b69d1dec23 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -24,6 +24,7 @@
 #include <asm/alternative.h>
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
+#include <asm/dma-noncoherent.h>
 #include <asm/early_ioremap.h>
 #include <asm/pgtable.h>
 #include <asm/setup.h>
@@ -262,6 +263,50 @@ static void __init parse_dtb(void)
 #endif
 }
 
+#ifdef CONFIG_RISCV_ISA_ZICBOM
+
+#define ZICBOM_CMO_OP(_op, _start, _size, _cachesize)			\
+	asm volatile("mv a0, %1\n\t"					\
+		     "j 2f\n\t"						\
+		     "3:\n\t"						\
+		     CBO_##_op(a0)					\
+		     "add a0, a0, %0\n\t"				\
+		     "2:\n\t"						\
+		     "bltu a0, %2, 3b\n\t"				\
+		     : : "r"(_cachesize),				\
+		     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
+		     "r"((unsigned long)(_start) + (_size))		\
+		     : "a0")
+
+static void zicbom_cmo_clean_range(unsigned long addr, unsigned long size)
+{
+	ZICBOM_CMO_OP(clean, addr, size, riscv_cbom_block_size);
+}
+
+static void zicbom_cmo_flush_range(unsigned long addr, unsigned long size)
+{
+	ZICBOM_CMO_OP(flush, addr, size, riscv_cbom_block_size);
+}
+
+static void zicbom_cmo_inval_range(unsigned long addr, unsigned long size)
+{
+	ZICBOM_CMO_OP(inval, addr, size, riscv_cbom_block_size);
+}
+
+const struct riscv_cache_ops zicbom_cmo_ops = {
+	.clean_range = &zicbom_cmo_clean_range,
+	.inv_range = &zicbom_cmo_inval_range,
+	.flush_range = &zicbom_cmo_flush_range,
+};
+
+static void zicbom_register_cmo_ops(void)
+{
+	riscv_noncoherent_register_cache_ops(&zicbom_cmo_ops);
+}
+#else
+static void zicbom_register_cmo_ops(void) {}
+#endif
+
 void __init setup_arch(char **cmdline_p)
 {
 	parse_dtb();
@@ -301,8 +346,10 @@ void __init setup_arch(char **cmdline_p)
 	riscv_fill_hwcap();
 	apply_boot_alternatives();
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
-	    riscv_isa_extension_available(NULL, ZICBOM))
+	    riscv_isa_extension_available(NULL, ZICBOM)) {
 		riscv_noncoherent_supported();
+		zicbom_register_cmo_ops();
+	}
 }
 
 static int __init topology_init(void)
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index b9a9f57e02be..ab8f6dc67914 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -9,28 +9,36 @@
 #include <linux/dma-map-ops.h>
 #include <linux/mm.h>
 #include <asm/cacheflush.h>
+#include <asm/dma-noncoherent.h>
 
 static bool noncoherent_supported;
 
+struct riscv_cache_ops noncoherent_cache_ops = {
+	.clean_range = NULL,
+	.inv_range = NULL,
+	.flush_range = NULL,
+};
+EXPORT_SYMBOL_GPL(noncoherent_cache_ops);
+
 static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_clean(vaddr, size);
 }
 
 static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_inval(vaddr, size);
 }
 
 static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
 {
 	void *vaddr = phys_to_virt(paddr);
 
-	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_flush(vaddr, size);
 }
 
 static inline bool arch_sync_dma_clean_before_fromdevice(void)
@@ -50,7 +58,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	void *flush_addr = page_address(page);
 
-	ALT_CMO_OP(flush, flush_addr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_flush(flush_addr, size);
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -75,3 +83,12 @@ void riscv_noncoherent_supported(void)
 		"Non-coherent DMA support enabled without a block size\n");
 	noncoherent_supported = true;
 }
+
+void riscv_noncoherent_register_cache_ops(const struct riscv_cache_ops *ops)
+{
+	if (!ops)
+		return;
+
+	noncoherent_cache_ops = *ops;
+}
+EXPORT_SYMBOL_GPL(riscv_noncoherent_register_cache_ops);
diff --git a/arch/riscv/mm/pmem.c b/arch/riscv/mm/pmem.c
index 089df92ae876..aad7c2209eca 100644
--- a/arch/riscv/mm/pmem.c
+++ b/arch/riscv/mm/pmem.c
@@ -6,16 +6,16 @@
 #include <linux/export.h>
 #include <linux/libnvdimm.h>
 
-#include <asm/cacheflush.h>
+#include <asm/dma-noncoherent.h>
 
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
-	ALT_CMO_OP(clean, addr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_pmem_clean(addr, size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
-	ALT_CMO_OP(inval, addr, size, riscv_cbom_block_size);
+	riscv_dma_noncoherent_pmem_inval(addr, size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);