diff mbox series

[v5,2/4] mm/memory.c: Update local TLB if PTE entry exists

Message ID 1590031837-9582-2-git-send-email-maobibo@loongson.cn (mailing list archive)
State Superseded
Headers show
Series [v5,1/4] MIPS: Do not flush tlb page when updating PTE entry | expand

Commit Message

bibo mao May 21, 2020, 3:30 a.m. UTC
If two threads concurrently fault at the same address, the thread that
won the race updates the PTE and its local TLB. For now, the other
thread gives up, simply does nothing, and continues.

It could happen that this second thread triggers another fault, whereby
it only updates its local TLB while handling the fault. Instead of
triggering another fault, let's directly update the local TLB of the
second thread.

It is only useful to architectures where software can update TLB, it may
bring out some negative effect if update_mmu_cache is used for other
purpose also. It seldom happens where multiple threads access the same
page at the same time, so the negative effect is limited on other arches.

With specjvm2008 workload, smp-race pgfault counts is about 3% to 4%
of the total pgfault counts by watching /proc/vmstats information

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/mips/include/asm/pgtable.h | 20 ++++++++++++++++++++
 mm/memory.c                     | 27 +++++++++++++++++++--------
 2 files changed, 39 insertions(+), 8 deletions(-)

Comments

Andrew Morton May 21, 2020, 7:22 p.m. UTC | #1
On Thu, 21 May 2020 11:30:35 +0800 Bibo Mao <maobibo@loongson.cn> wrote:

> If two threads concurrently fault at the same address, the thread that
> won the race updates the PTE and its local TLB. For now, the other
> thread gives up, simply does nothing, and continues.
> 
> It could happen that this second thread triggers another fault, whereby
> it only updates its local TLB while handling the fault. Instead of
> triggering another fault, let's directly update the local TLB of the
> second thread.
> 
> It is only useful to architectures where software can update TLB, it may
> bring out some negative effect if update_mmu_cache is used for other
> purpose also. It seldom happens where multiple threads access the same
> page at the same time, so the negative effect is limited on other arches.
> 
> With specjvm2008 workload, smp-race pgfault counts is about 3% to 4%
> of the total pgfault counts by watching /proc/vmstats information
> 

I'm sorry to keep thrashing this for so long, but I'd really prefer not
to add any overhead to architectures which don't need it.  However,
we're getting somewhere!

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2436,10 +2436,9 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
>  		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>  			/*
>  			 * Other thread has already handled the fault
> -			 * and we don't need to do anything. If it's
> -			 * not the case, the fault will be triggered
> -			 * again on the same address.
> +			 * and update local tlb only
>  			 */
> +			update_mmu_cache(vma, addr, vmf->pte);

Now, all the patch does is to add new calls to update_mmu_cache().

So can we replace all these with a call to a new
update_mmu_cache_sw_tlb() (or whatever) which is a no-op on
architectures which don't need the additional call?

Also, I wonder about the long-term maintainability.  People who
regularly work on this code won't be thinking of this MIPS peculiarity
and it's likely that any new calls to update_mmu_cache_sw_tlb() won't
be added where they should have been.  Hopefully copy-and-paste from
the existing code will serve us well.  Please do ensure that the
update_mmu_cache_sw_tlb() implementation is carefully commented so
that people can understand where they should (and shouldn't) include
this call.
bibo mao May 22, 2020, 8:48 a.m. UTC | #2
On 05/22/2020 03:22 AM, Andrew Morton wrote:
> On Thu, 21 May 2020 11:30:35 +0800 Bibo Mao <maobibo@loongson.cn> wrote:
> 
>> If two threads concurrently fault at the same address, the thread that
>> won the race updates the PTE and its local TLB. For now, the other
>> thread gives up, simply does nothing, and continues.
>>
>> It could happen that this second thread triggers another fault, whereby
>> it only updates its local TLB while handling the fault. Instead of
>> triggering another fault, let's directly update the local TLB of the
>> second thread.
>>
>> It is only useful to architectures where software can update TLB, it may
>> bring out some negative effect if update_mmu_cache is used for other
>> purpose also. It seldom happens where multiple threads access the same
>> page at the same time, so the negative effect is limited on other arches.
>>
>> With specjvm2008 workload, smp-race pgfault counts is about 3% to 4%
>> of the total pgfault counts by watching /proc/vmstats information
>>
> 
> I'm sorry to keep thrashing this for so long, but I'd really prefer not
> to add any overhead to architectures which don't need it.  However,
> we're getting somewhere!
> 
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2436,10 +2436,9 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
>>  		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>>  			/*
>>  			 * Other thread has already handled the fault
>> -			 * and we don't need to do anything. If it's
>> -			 * not the case, the fault will be triggered
>> -			 * again on the same address.
>> +			 * and update local tlb only
>>  			 */
>> +			update_mmu_cache(vma, addr, vmf->pte);
> 
> Now, all the patch does is to add new calls to update_mmu_cache().
> 
> So can we replace all these with a call to a new
> update_mmu_cache_sw_tlb() (or whatever) which is a no-op on
> architectures which don't need the additional call?
> 
> Also, I wonder about the long-term maintainability.  People who
> regularly work on this code won't be thinking of this MIPS peculiarity
> and it's likely that any new calls to update_mmu_cache_sw_tlb() won't
> be added where they should have been.  Hopefully copy-and-paste from
> the existing code will serve us well.  Please do ensure that the
> update_mmu_cache_sw_tlb() implementation is carefully commented so
> that people can understand where they should (and shouldn't) include
> this call.
Well, I will do that. MIPS is actually somewhat different with generic
architectures, and old MIPS system does not support hardware page walk,
it requires software to update TLB entry.

regards
bibo, mao
diff mbox series

Patch

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 0d625c2..5f610ec 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -480,6 +480,26 @@  static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
 
 #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
 
+#define __HAVE_ARCH_PTE_SAME
+static inline int pte_same(pte_t pte_a, pte_t pte_b)
+{
+	return pte_val(pte_a) == pte_val(pte_b);
+}
+
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+static inline int ptep_set_access_flags(struct vm_area_struct *vma,
+					unsigned long address, pte_t *ptep,
+					pte_t entry, int dirty)
+{
+	if (!pte_same(*ptep, entry))
+		set_pte_at(vma->vm_mm, address, ptep, entry);
+	/*
+	 * update_mmu_cache will unconditionally execute, handling both
+	 * the case that the PTE changed and the spurious fault case.
+	 */
+	return true;
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
diff --git a/mm/memory.c b/mm/memory.c
index f703fe8..9e2be4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2436,10 +2436,9 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 			/*
 			 * Other thread has already handled the fault
-			 * and we don't need to do anything. If it's
-			 * not the case, the fault will be triggered
-			 * again on the same address.
+			 * and update local tlb only
 			 */
+			update_mmu_cache(vma, addr, vmf->pte);
 			ret = false;
 			goto pte_unlock;
 		}
@@ -2463,7 +2462,8 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
 		locked = true;
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
-			/* The PTE changed under us. Retry page fault. */
+			/* The PTE changed under us, update local tlb */
+			update_mmu_cache(vma, addr, vmf->pte);
 			ret = false;
 			goto pte_unlock;
 		}
@@ -2752,6 +2752,7 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		new_page = old_page;
 		page_copied = 1;
 	} else {
+		update_mmu_cache(vma, vmf->address, vmf->pte);
 		mem_cgroup_cancel_charge(new_page, memcg, false);
 	}
 
@@ -2812,6 +2813,7 @@  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf)
 	 * pte_offset_map_lock.
 	 */
 	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
+		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		return VM_FAULT_NOPAGE;
 	}
@@ -2936,6 +2938,7 @@  static vm_fault_t do_wp_page(struct vm_fault *vmf)
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
 			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
+				update_mmu_cache(vma, vmf->address, vmf->pte);
 				unlock_page(vmf->page);
 				pte_unmap_unlock(vmf->pte, vmf->ptl);
 				put_page(vmf->page);
@@ -3341,8 +3344,10 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 						vma->vm_page_prot));
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				vmf->address, &vmf->ptl);
-		if (!pte_none(*vmf->pte))
+		if (!pte_none(*vmf->pte)) {
+			update_mmu_cache(vma, vmf->address, vmf->pte);
 			goto unlock;
+		}
 		ret = check_stable_address_space(vma->vm_mm);
 		if (ret)
 			goto unlock;
@@ -3378,8 +3383,10 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
-	if (!pte_none(*vmf->pte))
+	if (!pte_none(*vmf->pte)) {
+		update_mmu_cache(vma, vmf->address, vmf->pte);
 		goto release;
+	}
 
 	ret = check_stable_address_space(vma->vm_mm);
 	if (ret)
@@ -3646,8 +3653,10 @@  vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 	}
 
 	/* Re-check under ptl */
-	if (unlikely(!pte_none(*vmf->pte)))
+	if (unlikely(!pte_none(*vmf->pte))) {
+		update_mmu_cache(vma, vmf->address, vmf->pte);
 		return VM_FAULT_NOPAGE;
+	}
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
@@ -4224,8 +4233,10 @@  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 	spin_lock(vmf->ptl);
 	entry = vmf->orig_pte;
-	if (unlikely(!pte_same(*vmf->pte, entry)))
+	if (unlikely(!pte_same(*vmf->pte, entry))) {
+		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
 		goto unlock;
+	}
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);