Message ID | 20250206044346.3810242-2-riel@surriel.com (mailing list archive)
---|---
State | New |
Series | AMD broadcast TLB invalidation
Hi Rik,

On Thu, 6 Feb 2025 at 05:45, Rik van Riel <riel@surriel.com> wrote:
> This is done because some paravirt TLB flush implementations
> handle the TLB flush in the hypervisor, and will do the flush
> even when the target CPU has interrupts disabled.
>
> Always handle page table freeing with MMU_GATHER_RCU_TABLE_FREE.
> Using RCU synchronization between page table freeing and get_user_pages_fast()
> allows bare metal to also do TLB flushing while interrupts are disabled.
>
> Various places in the mm do still block IRQs or disable preemption
> as an implicit way to block RCU frees.
>
> That makes it safe to use INVLPGB on AMD CPUs.

It would be nice to update the CONFIG_MMU_GATHER_RCU_TABLE_FREE
comment in mm/mmu_gather.c to mention INVLPGB alongside "Architectures
that do not have this (PPC)" - and while that's being updated it would
also be useful to note down the paravirt thing you explained above.
IMO it's pretty helpful to have more examples of the concrete use cases
for this logic.
On Fri, Feb 07, 2025 at 03:28:31PM +0100, Brendan Jackman wrote:
> Hi Rik,
>
> On Thu, 6 Feb 2025 at 05:45, Rik van Riel <riel@surriel.com> wrote:
> > This is done because some paravirt TLB flush implementations
> > handle the TLB flush in the hypervisor, and will do the flush
> > even when the target CPU has interrupts disabled.
> >
> > Always handle page table freeing with MMU_GATHER_RCU_TABLE_FREE.
> > Using RCU synchronization between page table freeing and get_user_pages_fast()
> > allows bare metal to also do TLB flushing while interrupts are disabled.
> >
> > Various places in the mm do still block IRQs or disable preemption
> > as an implicit way to block RCU frees.
> >
> > That makes it safe to use INVLPGB on AMD CPUs.
>
> It would be nice to update the CONFIG_MMU_GATHER_RCU_TABLE_FREE
> comment in mm/mmu_gather.c to mention INVLPGB alongside "Architectures
> that do not have this (PPC)"

Why? This is just one more architecture that does broadcast.

> - and while that's being updated it would
> also be useful to note down the paravirt thing you explained above,
> IMO it's pretty helpful to have more examples of the concrete use cases
> for this logic.

Look at kvm_flush_tlb_multi() if you're interested. The notable detail
is that it avoids flushing the TLB for vCPUs that are preempted, and
instead lets the vCPU resume code do the invalidate.
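[For readers following along: the scheme Peter points at looks roughly
like the sketch below. This is a simplified rendering of
kvm_flush_tlb_multi() from arch/x86/kernel/kvm.c, not a verbatim copy,
and the details vary between kernel versions.]

/*
 * Simplified sketch: preempted vCPUs are not sent a flush IPI.
 * Instead, a flag is set in the shared steal-time area so the
 * hypervisor flushes that vCPU's TLB when it is scheduled back in.
 */
static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
				const struct flush_tlb_info *info)
{
	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
	int cpu;

	cpumask_copy(flushmask, cpumask);
	for_each_cpu(cpu, flushmask) {
		struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
		u8 state = READ_ONCE(src->preempted);

		/* Queue the flush for vCPU re-entry instead of IPI-ing. */
		if ((state & KVM_VCPU_PREEMPTED) &&
		    try_cmpxchg(&src->preempted, &state,
				state | KVM_VCPU_FLUSH_TLB))
			__cpumask_clear_cpu(cpu, flushmask);
	}

	/* IPI-based flush only for the vCPUs that are actually running. */
	native_flush_tlb_multi(flushmask, info);
}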
On Tue, 11 Feb 2025 at 12:07, Peter Zijlstra <peterz@infradead.org> wrote:
> > It would be nice to update the CONFIG_MMU_GATHER_RCU_TABLE_FREE
> > comment in mm/mmu_gather.c to mention INVLPGB alongside "Architectures
> > that do not have this (PPC)"
>
> Why? This is just one more architecture that does broadcast.

Hmm yeah, I didn't really make the leap from "doesn't do an IPI" to
"that just means it uses broadcast TLB flush". In that light I can see
how it's "just another architecture". I do think it would make sense
to be more explicit about that, even though it seems obvious now you
point it out. But it's not really relevant to this patchset.

> > - and while that's being updated it would
> > also be useful to note down the paravirt thing you explained above,
> > IMO it's pretty helpful to have more examples of the concrete use cases
> > for this logic.
>
> Look at kvm_flush_tlb_multi() if you're interested. The notable detail
> is that it avoids flushing the TLB for vCPUs that are preempted, and
> instead lets the vCPU resume code do the invalidate.

Oh, that actually looks like a slightly different case from what Rik
mentioned:

> some paravirt TLB flush implementations
> handle the TLB flush in the hypervisor, and will do the flush
> even when the target CPU has interrupts disabled.

Do we have:

a) Systems where the flush gets completely pushed into the hypervisor
   - xen_flush_tlb_multi()?

b) Systems where the guest coordinates with the hypervisor to avoid
   IPI-ing preempted vCPUs?

Maybe I can send a separate patch to note this in the commentary, it's
interesting and useful to know.
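[Case (a) corresponds to the Xen path: the guest makes one hypercall
and Xen invalidates the target vCPUs' TLBs itself, so no IPIs are sent
and the targets' interrupt state does not matter. A heavily simplified
sketch follows; the real xen_flush_tlb_multi() in
arch/x86/xen/mmu_pv.c batches via multicalls, filters offline CPUs,
and uses MMUEXT_INVLPG_MULTI for single-page flushes, and the exact
field usage here should be treated as approximate.]

/* Heavily simplified sketch of the hypercall-based flush (case (a)). */
static void xen_flush_tlb_multi_sketch(const struct cpumask *cpus,
				       const struct flush_tlb_info *info)
{
	struct mmuext_op op = {
		.cmd = MMUEXT_TLB_FLUSH_MULTI,
		/* Xen reads the vCPU bitmap directly from guest memory. */
		.arg2.vcpumask = (void *)cpumask_bits(cpus),
	};

	/* One hypercall; the hypervisor performs the invalidation. */
	HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}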
On Tue, 2025-02-11 at 13:10 +0100, Brendan Jackman wrote:
> Maybe I can send a separate patch to note this in the commentary, it's
> interesting and useful to know.

I added a similar comment to this patch series for v10 yesterday.
I'll send it out after a few more tests have finished running.

I'm not particularly invested in which version of the comment gets
merged :)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9d7bd0ae48c4..e8743f8c9fd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,7 +274,7 @@ config X86
 	select HAVE_PCI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select MMU_GATHER_RCU_TABLE_FREE	if PARAVIRT
+	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_MERGE_VMAS
 	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 	select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..2b78a6b466ed 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -59,11 +59,6 @@ void __init native_pv_lock_init(void)
 		static_branch_enable(&virt_spin_lock_key);
 }
 
-static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	tlb_remove_page(tlb, table);
-}
-
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
@@ -191,7 +186,7 @@ struct paravirt_patch_template pv_ops = {
 	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
 	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
 	.mmu.flush_tlb_multi	= native_flush_tlb_multi,
-	.mmu.tlb_remove_table	= native_tlb_remove_table,
+	.mmu.tlb_remove_table	= tlb_remove_table,
 
 	.mmu.exit_mmap		= paravirt_nop,
 	.mmu.notify_page_enc_status_changed	= paravirt_nop,
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5745a354a241..3dc4af1f7868 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -18,14 +18,6 @@ EXPORT_SYMBOL(physical_mask);
 #define PGTABLE_HIGHMEM 0
 #endif
 
-#ifndef CONFIG_PARAVIRT
-static inline
-void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	tlb_remove_page(tlb, table);
-}
-#endif
-
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)
@@ -54,7 +46,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
 	pagetable_pte_dtor(page_ptdesc(pte));
 	paravirt_release_pte(page_to_pfn(pte));
-	paravirt_tlb_remove_table(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
@@ -70,7 +62,7 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 	tlb->need_flush_all = 1;
 #endif
 	pagetable_pmd_dtor(ptdesc);
-	paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
+	tlb_remove_table(tlb, ptdesc_page(ptdesc));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -80,14 +72,14 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 
 	pagetable_pud_dtor(ptdesc);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	tlb_remove_table(tlb, virt_to_page(pud));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
+	tlb_remove_table(tlb, virt_to_page(p4d));
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 4 */
 #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
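[For context on what the patch enables unconditionally: under
CONFIG_MMU_GATHER_RCU_TABLE_FREE, tlb_remove_table() batches freed
page tables and only releases them after an RCU grace period, which is
what lets get_user_pages_fast() walk page tables safely with just
interrupts disabled. Abridged from mm/mmu_gather.c; details vary
between kernel versions.]

/*
 * Freed page tables are batched and released via call_rcu().
 * GUP-fast runs with interrupts disabled, which holds off the grace
 * period, so the tables it is walking cannot be freed (or reused)
 * under it - no flush IPI required.
 */
static void tlb_remove_table_rcu(struct rcu_head *head)
{
	__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
}

static void tlb_table_flush(struct mmu_gather *tlb)
{
	struct mmu_table_batch **batch = &tlb->batch;

	if (*batch) {
		tlb_table_invalidate(tlb);
		call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
		*batch = NULL;
	}
}

void tlb_remove_table(struct mmu_gather *tlb, void *table)
{
	struct mmu_table_batch **batch = &tlb->batch;

	if (*batch == NULL) {
		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
		if (*batch == NULL) {
			/* Allocation failed: fall back to a synchronous free. */
			tlb_table_invalidate(tlb);
			tlb_remove_table_one(table);
			return;
		}
		(*batch)->nr = 0;
	}

	(*batch)->tables[(*batch)->nr++] = table;
	if ((*batch)->nr == MAX_TABLE_BATCH)
		tlb_table_flush(tlb);
}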