[v3,0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO

Message ID 20231012141456.4078-1-jszhang@kernel.org (mailing list archive)

Message

Jisheng Zhang Oct. 12, 2023, 2:14 p.m. UTC
Previously, we used the alternatives mechanism to dynamically patch
the CMO operations for THEAD C906/C910 during boot for performance
reasons. But as pointed out by Arnd, "there is already a significant
cost in accessing the invalidated cache lines afterwards, which is
likely going to be much higher than the cost of an indirect branch".
And indeed, I measured no performance difference with GMAC and eMMC
in my tests on the Sipeed Lichee Pi 4A board.

Use riscv_nonstd_cache_ops for THEAD C906/C910 CMO to simplify
the alternative code, and to achieve Arnd's goal -- "I think
moving the THEAD ops at the same level as all nonstandard operations
makes sense, but I'd still leave CMO as an explicit fast path that
avoids the indirect branch. This seems like the right thing to do both
for readability and for platforms on which the indirect branch has a
noticeable overhead."
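
As a rough illustration of the dispatch described above, here is a
minimal userspace sketch in C. It is not the kernel code: the struct
name mock_cache_ops, the helpers mock_register_cache_ops() and
mock_cache_wback(), and the call counters are all hypothetical names
made up for this example. It assumes that registered nonstandard ops
are tried first through a function pointer (the indirect branch) and
that the standard CMO fast path is used otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's phys_addr_t (assumption). */
typedef uint64_t phys_addr_t;

/* Hypothetical, simplified ops table with one hook per cache operation. */
struct mock_cache_ops {
	void (*wback)(phys_addr_t paddr, size_t size);
	void (*inv)(phys_addr_t paddr, size_t size);
	void (*wback_inv)(phys_addr_t paddr, size_t size);
};

/* Call counters so the chosen path can be observed from outside. */
unsigned int fast_path_calls;
unsigned int indirect_calls;

/* All-NULL table means no nonstandard ops have been registered. */
static struct mock_cache_ops registered_ops;

/* Stand-in for a vendor-specific cache operation; it only counts calls. */
static void mock_vendor_op(phys_addr_t paddr, size_t size)
{
	(void)paddr;
	(void)size;
	indirect_calls++;
}

/* Hypothetical vendor ops table, filled in once at errata setup time. */
const struct mock_cache_ops mock_thead_ops = {
	.wback = mock_vendor_op,
	.inv = mock_vendor_op,
	.wback_inv = mock_vendor_op,
};

/* Copy the vendor table into the global slot. */
void mock_register_cache_ops(const struct mock_cache_ops *ops)
{
	if (ops)
		registered_ops = *ops;
}

/* Write back a range: indirect call if ops are registered, else fast path. */
void mock_cache_wback(phys_addr_t paddr, size_t size)
{
	if (registered_ops.wback) {
		registered_ops.wback(paddr, size);
		return;
	}
	/* The standard CMO instructions would be issued inline here. */
	fast_path_calls++;
}
```

With no ops registered, mock_cache_wback() takes the inline path; after
mock_register_cache_ops(&mock_thead_ops) the same call goes through the
function pointer instead.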

To make bisecting easy, I use two patches here: patch1 does the
conversion and just mimics the current CMO behavior via
riscv_nonstd_cache_ops; I expect no functional changes. patch2 uses
T-HEAD PA-based CMO instructions so that we don't need to convert PA
to VA.
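
To sketch what operating on physical addresses directly can look like,
the following userspace C example walks a physical range one cache line
at a time and hands each line-aligned PA to a callback. Everything here
is an assumption for illustration: the 64-byte MOCK_CACHE_LINE_SIZE,
the line_op_t callback type and for_each_cache_line() are hypothetical
names, and the callback merely stands in for whatever per-line
instruction a real implementation would issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's phys_addr_t (assumption). */
typedef uint64_t phys_addr_t;

/* Assumed cache line size for this sketch only. */
#define MOCK_CACHE_LINE_SIZE 64u

/* Hypothetical per-line operation; receives a line-aligned physical address. */
typedef void (*line_op_t)(phys_addr_t line_paddr);

/*
 * Apply op to every cache line overlapping [paddr, paddr + size).
 * No phys-to-virt translation happens anywhere in the loop.
 * Returns the number of lines visited.
 */
size_t for_each_cache_line(phys_addr_t paddr, size_t size, line_op_t op)
{
	phys_addr_t cur, end;
	size_t lines = 0;

	if (size == 0)
		return 0;

	/* Round the start down to a line boundary. */
	cur = paddr & ~((phys_addr_t)MOCK_CACHE_LINE_SIZE - 1);
	end = paddr + size;

	for (; cur < end; cur += MOCK_CACHE_LINE_SIZE) {
		if (op)
			op(cur);
		lines++;
	}
	return lines;
}
```

An unaligned range can touch one more line than its size suggests,
which is why the start is rounded down before the walk begins.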

Hi Guo,

I didn't use wback_inv for wback as you suggested during the v1 review;
this can be left as a future optimization.

Thanks

since v2:
  - collect Reviewed-by tag
  - fix typo

since v1:
  - collect Tested-by tag
  - add patch2 to use T-HEAD PA based CMO instructions.

Jisheng Zhang (2):
  riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
  riscv: errata: thead: use pa based instructions for CMO

 arch/riscv/Kconfig.errata            |  1 +
 arch/riscv/errata/thead/errata.c     | 69 +++++++++++++++++++++++++++-
 arch/riscv/include/asm/errata_list.h | 50 +++-----------------
 3 files changed, 74 insertions(+), 46 deletions(-)

Comments

Jisheng Zhang Oct. 12, 2023, 2:21 p.m. UTC | #1
On Thu, Oct 12, 2023 at 10:14:54PM +0800, Jisheng Zhang wrote:
> since v2:
>   - collect Reviewed-by tag

Oh, I missed the tag collection, but I know maintainers are using b4, which can
collect and apply tags automatically ;). Let me know if you want a new
version.

Conor Dooley Oct. 12, 2023, 2:36 p.m. UTC | #2
On Thu, Oct 12, 2023 at 10:21:08PM +0800, Jisheng Zhang wrote:
> On Thu, Oct 12, 2023 at 10:14:54PM +0800, Jisheng Zhang wrote:
> > since v2:
> >   - collect Reviewed-by tag
> 
> Oh, I missed the tag collection, but I know maintainers are using b4, which can
> collect and apply tags automatically ;). Let me know if you want a new
> version.

It doesn't collect tags (AFAIU) from earlier revisions though.
Jisheng Zhang Oct. 12, 2023, 2:40 p.m. UTC | #3
On Thu, Oct 12, 2023 at 03:36:28PM +0100, Conor Dooley wrote:
> On Thu, Oct 12, 2023 at 10:21:08PM +0800, Jisheng Zhang wrote:
> > On Thu, Oct 12, 2023 at 10:14:54PM +0800, Jisheng Zhang wrote:
> > > since v2:
> > >   - collect Reviewed-by tag
> > 
> > Oh, I missed the tag collection, but I know maintainers are using b4, which can
> > collect and apply tags automatically ;). Let me know if you want a new
> > version.
> 
> It doesn't collect tags (AFAIU) from earlier revisions though.

Oops, I didn't know this before. I just sent out v4 with the tags actually
collected to make the merging process smooth.