Message ID | cover.1553647082.git.gary@garyguo.net (mailing list archive)
---|---
Series | TLB/I$ flush cleanups and improvements
On Wed, Mar 27, 2019 at 12:41:11AM +0000, Gary Guo wrote:
> From: Gary Guo <gary@garyguo.net>
>
> This is v4 of the general TLB/I$ flush improvement series.
> I still have tlbi_method=ipi as the default, as opposed to what
> Atish suggests, because:
> * BBL is still used in the wild.
> * OpenSBI's support hasn't made it into a stable release yet.
> * OpenSBI's support on the dev branch has some racing issues yet to be resolved.

Can you clarify the races?  I know Anup had some FIFO-order commits
in opensbi about a week ago; did they address your concerns?

Anup, do you have performance numbers for the old opensbi vs your
implementation of the optimized TLB flushing vs this patch?
On Wed, Apr 10, 2019 at 12:34 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Mar 27, 2019 at 12:41:11AM +0000, Gary Guo wrote:
> > From: Gary Guo <gary@garyguo.net>
> >
> > This is v4 of the general TLB/I$ flush improvement series.
> > I still have tlbi_method=ipi as the default, as opposed to what
> > Atish suggests, because:
> > * BBL is still used in the wild.
> > * OpenSBI's support hasn't made it into a stable release yet.
> > * OpenSBI's support on the dev branch has some racing issues yet to be resolved.
>
> Can you clarify the races?  I know Anup had some FIFO-order commits
> in opensbi about a week ago; did they address your concerns?
>
> Anup, do you have performance numbers for the old opensbi vs your
> implementation of the optimized TLB flushing vs this patch?

Atish had posted performance numbers on his GitHub PR at:
https://github.com/riscv/opensbi/pull/111

These performance numbers are as follows:

Benchmark used: a microbenchmark that mmaps a ramdisk (1G), with
multiple threads accessing 50MB of memory randomly.

https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c

The result is averaged over 25 iterations for 8 threads on a HiFive
Unleashed board. In both cases ~1M remote TLB flushes are triggered.

                         IPI        SBI       Gain
Average Write Time    2.53183    2.43263    +4.34%
Average Read Time     1.32198    1.24643    +6.09%
Total Time            97.7589    92.859     +5.01%

I believe he has more optimizations in the pipeline for OpenSBI, so we
might see even better numbers.

Regards,
Anup
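[Editor's note: as a rough illustration of the kind of microbenchmark described above, here is a minimal C sketch. This is not the actual hmmap benchmark linked in the message; the file path, sizes, and thread count are illustrative assumptions, and the per-phase timing done by the real benchmark is omitted for brevity.]

/*
 * Sketch: mmap a 1G ramdisk-backed file and let 8 threads each touch
 * 50MB of it at random page-aligned offsets, generating page faults
 * and remote TLB shootdowns. NOT the hmmap benchmark; path, sizes
 * and thread count are assumptions.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE   (1UL << 30)   /* 1 GiB mapping */
#define TOUCH_SIZE (50UL << 20)  /* 50 MiB touched per thread */
#define NTHREADS   8

static char *region;

static void *worker(void *arg)
{
	unsigned int seed = (unsigned int)(unsigned long)arg;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = MAP_SIZE / page;

	/* Random page-sized read-modify-writes create TLB pressure. */
	for (size_t done = 0; done < TOUCH_SIZE; done += page)
		region[((size_t)rand_r(&seed) % npages) * page]++;
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	/* Assumed path to a file on a ramdisk (e.g. tmpfs or brd). */
	int fd = open("/mnt/ramdisk/data", O_RDWR);
	if (fd < 0) { perror("open"); return 1; }

	region = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		      MAP_SHARED, fd, 0);
	if (region == MAP_FAILED) { perror("mmap"); return 1; }

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)(i + 1));
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	munmap(region, MAP_SIZE);
	close(fd);
	return 0;
}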
On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
> > Can you clarify the races?  I know Anup had some FIFO-order commits
> > in opensbi about a week ago; did they address your concerns?
> >
> > Anup, do you have performance numbers for the old opensbi vs your
> > implementation of the optimized TLB flushing vs this patch?
>
> Atish had posted performance numbers on his GitHub PR at:
> https://github.com/riscv/opensbi/pull/111
>
> These performance numbers are as follows:
>
> Benchmark used: a microbenchmark that mmaps a ramdisk (1G), with
> multiple threads accessing 50MB of memory randomly.
>
> https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
>
> The result is averaged over 25 iterations for 8 threads on a HiFive
> Unleashed board. In both cases ~1M remote TLB flushes are triggered.
>
>                          IPI        SBI       Gain
> Average Write Time    2.53183    2.43263    +4.34%
> Average Read Time     1.32198    1.24643    +6.09%
> Total Time            97.7589    92.859     +5.01%

So what does this mean?  I assume the codebases are latest(-ish)
opensbi and latest(-ish) kernel with the patches from Gary, and
IPI is with the kernel-based code enabled, and SBI is with the SBI
calls?
On Wed, Apr 10, 2019 at 3:41 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
> > > Can you clarify the races?  I know Anup had some FIFO-order commits
> > > in opensbi about a week ago; did they address your concerns?
> > >
> > > Anup, do you have performance numbers for the old opensbi vs your
> > > implementation of the optimized TLB flushing vs this patch?
> >
> > Atish had posted performance numbers on his GitHub PR at:
> > https://github.com/riscv/opensbi/pull/111
> >
> > These performance numbers are as follows:
> >
> > Benchmark used: a microbenchmark that mmaps a ramdisk (1G), with
> > multiple threads accessing 50MB of memory randomly.
> >
> > https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
> >
> > The result is averaged over 25 iterations for 8 threads on a HiFive
> > Unleashed board. In both cases ~1M remote TLB flushes are triggered.
> >
> >                          IPI        SBI       Gain
> > Average Write Time    2.53183    2.43263    +4.34%
> > Average Read Time     1.32198    1.24643    +6.09%
> > Total Time            97.7589    92.859     +5.01%
>
> So what does this mean?  I assume the codebases are latest(-ish)
> opensbi and latest(-ish) kernel with the patches from Gary, and
> IPI is with the kernel-based code enabled, and SBI is with the SBI
> calls?

Yes, this is measured using Gary's v4 patches. The IPI numbers are
with the in-kernel remote TLB flush, whereas the SBI numbers are with
the SBI-based remote TLB flush.

Atish's changes for remote TLB flushes are available in the latest OpenSBI.

Regards,
Anup
On 4/10/19 3:22 AM, Anup Patel wrote:
> On Wed, Apr 10, 2019 at 3:41 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
>>>> Can you clarify the races?  I know Anup had some FIFO-order commits
>>>> in opensbi about a week ago; did they address your concerns?
>>>>
>>>> Anup, do you have performance numbers for the old opensbi vs your
>>>> implementation of the optimized TLB flushing vs this patch?
>>>
>>> Atish had posted performance numbers on his GitHub PR at:
>>> https://github.com/riscv/opensbi/pull/111
>>>
>>> These performance numbers are as follows:
>>>
>>> Benchmark used: a microbenchmark that mmaps a ramdisk (1G), with
>>> multiple threads accessing 50MB of memory randomly.
>>>
>>> https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
>>>
>>> The result is averaged over 25 iterations for 8 threads on a HiFive
>>> Unleashed board. In both cases ~1M remote TLB flushes are triggered.
>>>
>>>                          IPI        SBI       Gain
>>> Average Write Time    2.53183    2.43263    +4.34%
>>> Average Read Time     1.32198    1.24643    +6.09%
>>> Total Time            97.7589    92.859     +5.01%
>>
>> So what does this mean?  I assume the codebases are latest(-ish)
>> opensbi and latest(-ish) kernel with the patches from Gary, and
>> IPI is with the kernel-based code enabled, and SBI is with the SBI
>> calls?
>
> Yes, this is measured using Gary's v4 patches. The IPI numbers are
> with the in-kernel remote TLB flush, whereas the SBI numbers are with
> the SBI-based remote TLB flush.
>
> Atish's changes for remote TLB flushes are available in the latest OpenSBI.
>
> Regards,
> Anup

The patch to access the TLB statistics in vmstat can be found here:

https://patchwork.kernel.org/project/linux-riscv/list/?series=103939

Regards,
Atish
From: Gary Guo <gary@garyguo.net>

This is v4 of the general TLB/I$ flush improvement series.
I still have tlbi_method=ipi as the default, as opposed to what
Atish suggests, because:
* BBL is still used in the wild.
* OpenSBI's support hasn't made it into a stable release yet.
* OpenSBI's support on the dev branch has some racing issues yet to be resolved.

Once most SBIs used in the wild can handle remote shootdown properly,
we can then submit another patch to change the default value back to
tlbi_method=sbi.

This series does:
1. Move long and expensive functions away from header files.
2. Fix missing arguments for SBI calls.
3. Improve TLB flush performance.
4. Implement IPI-based remote shootdown for the case where the SBI
   ignores the ASID and vaddr operands (a sketch of this idea follows
   the diffstat below).

Changes since v3:
- Document tlbi_max_ops and tlbi_method in kernel-parameters.txt
- Split the IPI-based shootdown implementation into its own commit

Changes since v2:
- Replace __setup with early_param (also sketched below)
- Rebase on top of for-next

Changes since v1:
- Use kernel boot parameters instead of Kconfig
- Style fixes

Gary Guo (5):
  riscv: move flush_icache_{all,mm} to cacheflush.c
  riscv: move switch_mm to its own file
  riscv: fix sbi_remote_sfence_vma{,_asid}.
  riscv: rewrite tlb flush for performance
  riscv: implement IPI-based remote TLB shootdown

 .../admin-guide/kernel-parameters.rst |   1 +
 .../admin-guide/kernel-parameters.txt |  13 ++
 arch/riscv/include/asm/cacheflush.h   |   2 +-
 arch/riscv/include/asm/mmu_context.h  |  59 +----
 arch/riscv/include/asm/pgtable.h      |   2 +-
 arch/riscv/include/asm/sbi.h          |  19 +-
 arch/riscv/include/asm/tlbflush.h     |  76 +++----
 arch/riscv/kernel/smp.c               |  49 ----
 arch/riscv/mm/Makefile                |   2 +
 arch/riscv/mm/cacheflush.c            |  61 +++++
 arch/riscv/mm/context.c               |  77 +++++++
 arch/riscv/mm/init.c                  |   2 +-
 arch/riscv/mm/tlbflush.c              | 215 ++++++++++++++++++
 13 files changed, 417 insertions(+), 161 deletions(-)
 create mode 100644 arch/riscv/mm/context.c
 create mode 100644 arch/riscv/mm/tlbflush.c
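[Editor's note: since patch 5 appears above only as a diffstat entry, here is a minimal sketch of the IPI-based remote shootdown idea: send an IPI to each hart in the target cpumask and run sfence.vma locally on every one of them. The function and struct names are illustrative assumptions, not the identifiers used in the actual series.]

/*
 * Sketch of IPI-based remote TLB shootdown. Names are assumptions
 * for illustration, not the series' actual code.
 */
#include <linux/smp.h>
#include <linux/mm.h>

struct tlbi_range {
	unsigned long start;
	unsigned long size;
};

/* Executed on each target hart, in IPI context. */
static void ipi_flush_tlb_range(void *info)
{
	struct tlbi_range *r = info;
	unsigned long addr;

	/*
	 * sfence.vma with a register operand invalidates translations
	 * for that virtual address only, so flush page by page.
	 */
	for (addr = r->start; addr < r->start + r->size; addr += PAGE_SIZE)
		__asm__ __volatile__ ("sfence.vma %0"
				      : : "r" (addr) : "memory");
}

static void remote_sfence_vma_ipi(const struct cpumask *cmask,
				  unsigned long start, unsigned long size)
{
	struct tlbi_range r = { .start = start, .size = size };

	/*
	 * wait=1: do not return until every hart has flushed,
	 * mirroring the synchronous semantics of the SBI call.
	 */
	on_each_cpu_mask(cmask, ipi_flush_tlb_range, &r, 1);
}

[A real implementation would presumably also fall back to a full sfence.vma once size / PAGE_SIZE exceeds a threshold such as the tlbi_max_ops parameter documented above, since per-page flushes stop paying off for large ranges.]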
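[Editor's note: the v2 changelog mentions switching from __setup to early_param for the boot parameters. A sketch of how the tlbi_method= parameter might be parsed follows; the enum and variable names are assumptions, not necessarily those in the patches.]

/*
 * Sketch of early_param() parsing for tlbi_method=. Names are
 * illustrative assumptions.
 */
#include <linux/init.h>
#include <linux/string.h>
#include <linux/errno.h>

enum tlbi_method {
	TLBI_METHOD_IPI,	/* in-kernel IPI-based shootdown */
	TLBI_METHOD_SBI,	/* delegate to the SBI firmware */
};

/* Default is ipi, matching the rationale in the cover letter. */
static enum tlbi_method tlbi_method __ro_after_init = TLBI_METHOD_IPI;

static int __init setup_tlbi_method(char *str)
{
	if (!str)
		return -EINVAL;
	if (!strcmp(str, "ipi"))
		tlbi_method = TLBI_METHOD_IPI;
	else if (!strcmp(str, "sbi"))
		tlbi_method = TLBI_METHOD_SBI;
	else
		return -EINVAL;
	return 0;
}
early_param("tlbi_method", setup_tlbi_method);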