Message ID | 20230417060618.639395-1-vincent.chen@sifive.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | [v3] riscv: mm: execute local TLB flush after populating vmemmap | expand |
Context | Check | Description |
---|---|---|
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be fixes at HEAD 1b50f956c8fe |
conchuod/fixes_present | success | Fixes tag present in non-next series |
conchuod/maintainers_pattern | success | MAINTAINERS pattern errors before the patch: 1 and now 1 |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/build_rv64_clang_allmodconfig | success | Errors and warnings before: 3288 this patch: 3288 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv64_gcc_allmodconfig | success | Errors and warnings before: 17609 this patch: 17609 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 3 this patch: 3 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 28 lines checked |
conchuod/source_inline | success | Was 0 now: 0 |
conchuod/build_rv64_nommu_k210_defconfig | success | Build OK |
conchuod/verify_fixes | success | Fixes tag looks correct |
conchuod/build_rv64_nommu_virt_defconfig | success | Build OK |
On Sun, 16 Apr 2023 23:06:18 PDT (-0700), vincent.chen@sifive.com wrote:
> sparse_init() calls vmemmap_populate() many times to create the VA to PA
> mappings for the VMEMMAP area, where all "struct page" structures are
> located once CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page"
> structures are later initialized in the zone_sizes_init() function.
> However, during this process, no sfence.vma instruction is executed for
> this VMEMMAP area. This omission may cause the hart to fail to perform a
> page table walk because some data related to the address translation is
> invisible to the hart. To solve this issue, local_flush_tlb_kernel_range()
> is called right after sparse_init() to execute a sfence.vma instruction
> for the VMEMMAP area, ensuring that all data related to the address
> translation is visible to the hart.
>
> Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem")
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/tlbflush.h | 7 +++++++
>  arch/riscv/mm/init.c              | 5 +++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index a09196f8de68..f9d3712bd93b 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -61,4 +61,11 @@ static inline void flush_tlb_kernel_range(unsigned long start,
>  		flush_tlb_all();
>  }
>
> +/* Flush a range of kernel pages without broadcasting */
> +static inline void local_flush_tlb_kernel_range(unsigned long start,
> +						unsigned long end)
> +{
> +	local_flush_tlb_all();

Why is this a global flush, instead of a range-based one? We should
probably make the same change for flush_tlb_kernel_range(), though
that's not a fix.

> +}
> +
>  #endif /* _ASM_RISCV_TLBFLUSH_H */
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 0f14f4a8d179..bcf365cbbcc1 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1221,6 +1221,10 @@ void __init misc_mem_init(void)
>  	early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
>  	arch_numa_init();
>  	sparse_init();
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +	/* The entire VMEMMAP region has been populated. Flush TLB for this region */
> +	local_flush_tlb_kernel_range(VMEMMAP_START, VMEMMAP_END);
> +#endif
>  	zone_sizes_init();
>  	reserve_crashkernel();
>  	memblock_dump_all();
> @@ -1230,6 +1234,7 @@ void __init misc_mem_init(void)
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  			       struct vmem_altmap *altmap)
>  {
> +	/* Defer the required TLB flush until the entire VMEMMAP region has been populated */
>  	return vmemmap_populate_basepages(start, end, node, NULL);
>  }
>  #endif
On Tue, May 2, 2023 at 2:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Sun, 16 Apr 2023 23:06:18 PDT (-0700), vincent.chen@sifive.com wrote:
> > sparse_init() calls vmemmap_populate() many times to create the VA to PA
> > mappings for the VMEMMAP area, where all "struct page" structures are
> > located once CONFIG_SPARSEMEM_VMEMMAP is defined.
> > [...]
> >
> > +/* Flush a range of kernel pages without broadcasting */
> > +static inline void local_flush_tlb_kernel_range(unsigned long start,
> > +						unsigned long end)
> > +{
> > +	local_flush_tlb_all();
>
> Why is this a global flush, instead of a range-based one? We should
> probably make the same change for flush_tlb_kernel_range(), though
> that's not a fix.

You're right, but the current implementation of flush_tlb_range will
ultimately call flush_tlb_all if the size is larger than PAGE_SIZE, which
is very likely in this case. Anyway, I have a patchset almost ready that
improves the TLB flush routines, and I noted that I had to fix this
function once merged.

> > +}
> > [...]
On Tue, May 2, 2023 at 8:16 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> On Tue, May 2, 2023 at 2:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > On Sun, 16 Apr 2023 23:06:18 PDT (-0700), vincent.chen@sifive.com wrote:
> > > sparse_init() calls vmemmap_populate() many times to create the VA to PA
> > > mappings for the VMEMMAP area, where all "struct page" structures are
> > > located once CONFIG_SPARSEMEM_VMEMMAP is defined.
> > > [...]
> > >
> > > +/* Flush a range of kernel pages without broadcasting */
> > > +static inline void local_flush_tlb_kernel_range(unsigned long start,
> > > +						unsigned long end)
> > > +{
> > > +	local_flush_tlb_all();
> >
> > Why is this a global flush, instead of a range-based one? We should
> > probably make the same change for flush_tlb_kernel_range(), though
> > that's not a fix.
>
> You're right, but the current implementation of flush_tlb_range will
> ultimately call flush_tlb_all if the size is larger than PAGE_SIZE, which
> is very likely in this case. Anyway, I have a patchset almost ready that
> improves the TLB flush routines, and I noted that I had to fix this
> function once merged.

I agree with Alexandre that it's highly probable that the size will be
greater than PAGE_SIZE in this case. However, I must admit that this
implementation is not suitable for the case where the size is less than
or equal to PAGE_SIZE. I am willing to wait for Alexandre's patch and
rebase my patch on top of his if you believe it is a better approach.
Hi Vincent,

On 04/05/2023 11:13, Vincent Chen wrote:
> On Tue, May 2, 2023 at 8:16 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>> On Tue, May 2, 2023 at 2:18 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>> On Sun, 16 Apr 2023 23:06:18 PDT (-0700), vincent.chen@sifive.com wrote:
>>>> sparse_init() calls vmemmap_populate() many times to create the VA to PA
>>>> mappings for the VMEMMAP area, where all "struct page" structures are
>>>> located once CONFIG_SPARSEMEM_VMEMMAP is defined.
>>>> [...]
>>> Why is this a global flush, instead of a range-based one? We should
>>> probably make the same change for flush_tlb_kernel_range(), though
>>> that's not a fix.
>> You're right, but the current implementation of flush_tlb_range will
>> ultimately call flush_tlb_all if the size is larger than PAGE_SIZE, which
>> is very likely in this case. Anyway, I have a patchset almost ready that
>> improves the TLB flush routines, and I noted that I had to fix this
>> function once merged.
> I agree with Alexandre that it's highly probable that the size will be
> greater than PAGE_SIZE in this case. However, I must admit that this
> implementation is not suitable for the case where the size is less than
> or equal to PAGE_SIZE. I am willing to wait for Alexandre's patch and
> rebase my patch on top of his if you believe it is a better approach.

My patchset was merged in 6.6, so you can now rebase this :) Let me know
if I can help.

Thanks,

Alex

> [...]
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
On 12/12/2023 06:42, Vincent Chen wrote:
>
> On Mon, Dec 11, 2023 at 4:57 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>     Hi Vincent,
>
>     [...]
>
>     My patchset was merged in 6.6, so you can now rebase this :) Let me
>     know if I can help.
>
> Thank you for the notification. I will rebase my patch onto Linux 6.6.

Sorry, it actually landed in 6.7, not 6.6. Thanks!

Alex

> Thanks,
> Vincent
>
> [...]
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index a09196f8de68..f9d3712bd93b 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -61,4 +61,11 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 		flush_tlb_all();
 }
 
+/* Flush a range of kernel pages without broadcasting */
+static inline void local_flush_tlb_kernel_range(unsigned long start,
+						unsigned long end)
+{
+	local_flush_tlb_all();
+}
+
 #endif /* _ASM_RISCV_TLBFLUSH_H */
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 0f14f4a8d179..bcf365cbbcc1 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1221,6 +1221,10 @@ void __init misc_mem_init(void)
 	early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
 	arch_numa_init();
 	sparse_init();
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	/* The entire VMEMMAP region has been populated. Flush TLB for this region */
+	local_flush_tlb_kernel_range(VMEMMAP_START, VMEMMAP_END);
+#endif
 	zone_sizes_init();
 	reserve_crashkernel();
 	memblock_dump_all();
@@ -1230,6 +1234,7 @@ void __init misc_mem_init(void)
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 			       struct vmem_altmap *altmap)
 {
+	/* Defer the required TLB flush until the entire VMEMMAP region has been populated */
 	return vmemmap_populate_basepages(start, end, node, NULL);
 }
 #endif