From patchwork Wed Feb 26 12:01:27 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992191
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
 mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
 david@redhat.com, peterz@infradead.org, luto@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 20/25] mm,
 fs: skip tlb flushes for luf'd filemap that already has been done
Date: Wed, 26 Feb 2025 21:01:27 +0900
Message-Id: <20250226120132.28469-20-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>
For a luf'd filemap, tlb shootdown is currently performed whenever the page
cache is updated, no matter whether the required tlb flushes have already
been done.  By storing luf meta data in struct address_space and keeping it
up to date, we can skip those unnecessary tlb flushes.
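To make the skip concrete, the core of the new flush path can be restated
as the sketch below.  It is only an illustrative condensation of the
luf_flush_mapping() added by this patch (the read lock and intermediate
locals are dropped; all names come from the diff, except the _sketch
suffix, which is not real code in the tree):

	/*
	 * Illustrative sketch: fold the mapping's pending luf batch into
	 * the current task's batch and flush only if that work has not
	 * already been covered by an earlier tlb flush.
	 */
	void luf_flush_mapping_sketch(struct address_space *mapping)
	{
		struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
		unsigned long lb_ugen;

		if (!mapping)
			return;

		/* accumulate what this mapping still considers pending */
		fold_batch(tlb_ubc, &mapping->luf_batch.batch, false);
		lb_ugen = mapping->luf_batch.ugen;

		/* nothing left to do once already-flushed work is trimmed */
		if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen))
			return;

		try_to_unmap_flush();
	}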
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 fs/inode.c               |  1 +
 include/linux/fs.h       |  4 ++-
 include/linux/mm_types.h |  2 ++
 mm/memory.c              |  4 +--
 mm/rmap.c                | 59 +++++++++++++++++++++++++---------------
 mm/truncate.c            | 14 +++++-----
 mm/vmscan.c              |  2 +-
 7 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 5587aabdaa5ee..752fb2df6f3b3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -475,6 +475,7 @@ static void __address_space_init_once(struct address_space *mapping)
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->i_private_list);
 	spin_lock_init(&mapping->i_private_lock);
+	luf_batch_init(&mapping->luf_batch);
 	mapping->i_mmap = RB_ROOT_CACHED;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 78aaf769d32d1..a2f014b31028f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -498,6 +498,7 @@ extern const struct address_space_operations empty_aops;
  * @i_private_lock: For use by the owner of the address_space.
  * @i_private_list: For use by the owner of the address_space.
  * @i_private_data: For use by the owner of the address_space.
+ * @luf_batch: Data to track need of tlb flush by luf.
  */
 struct address_space {
 	struct inode		*host;
@@ -519,6 +520,7 @@ struct address_space {
 	struct list_head	i_private_list;
 	struct rw_semaphore	i_mmap_rwsem;
 	void *			i_private_data;
+	struct luf_batch	luf_batch;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -545,7 +547,7 @@ static inline int mapping_write_begin(struct file *file,
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
 	if (!ret)
-		luf_flush(0);
+		luf_flush_mapping(mapping);
 
 	return ret;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c32ef19a25056..d73a3eb0f7b21 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1362,10 +1362,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb);
 void luf_flush(unsigned short luf_key);
 void luf_flush_mm(struct mm_struct *mm);
 void luf_flush_vma(struct vm_area_struct *vma);
+void luf_flush_mapping(struct address_space *mapping);
 #else
 static inline void luf_flush(unsigned short luf_key) {}
 static inline void luf_flush_mm(struct mm_struct *mm) {}
 static inline void luf_flush_vma(struct vm_area_struct *vma) {}
+static inline void luf_flush_mapping(struct address_space *mapping) {}
 #endif
 
 struct vm_fault;
diff --git a/mm/memory.c b/mm/memory.c
index 93e5879583b07..62137ab258d2c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6296,10 +6296,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	if (flush) {
 		/*
 		 * If it has a VM_SHARED mapping, all the mms involved
-		 * should be luf_flush'ed.
+		 * in the struct address_space should be luf_flush'ed.
 		 */
 		if (mapping)
-			luf_flush(0);
+			luf_flush_mapping(mapping);
 		luf_flush_mm(mm);
 	}
 
diff --git a/mm/rmap.c b/mm/rmap.c
index fe9c4606ae542..f5c5190be24e0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -691,7 +691,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst,
 #define NR_LUF_BATCH	(1 << (sizeof(short) * 8))
 
 /*
- * Use 0th entry as accumulated batch.
+ * XXX: Reserve the 0th entry for later use.
  */
 struct luf_batch luf_batch[NR_LUF_BATCH];
 
@@ -936,7 +936,7 @@ void luf_flush_vma(struct vm_area_struct *vma)
 
 	mapping = vma->vm_file->f_mapping;
 	if (mapping)
-		luf_flush(0);
+		luf_flush_mapping(mapping);
 	luf_flush_mm(mm);
 }
 
@@ -962,6 +962,29 @@ void luf_flush_mm(struct mm_struct *mm)
 	try_to_unmap_flush();
 }
 
+void luf_flush_mapping(struct address_space *mapping)
+{
+	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
+	struct luf_batch *lb;
+	unsigned long flags;
+	unsigned long lb_ugen;
+
+	if (!mapping)
+		return;
+
+	lb = &mapping->luf_batch;
+	read_lock_irqsave(&lb->lock, flags);
+	fold_batch(tlb_ubc, &lb->batch, false);
+	lb_ugen = lb->ugen;
+	read_unlock_irqrestore(&lb->lock, flags);
+
+	if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen))
+		return;
+
+	try_to_unmap_flush();
+}
+EXPORT_SYMBOL(luf_flush_mapping);
+
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
  * important if a PTE was dirty when it was unmapped that it's flushed
@@ -1010,7 +1033,8 @@ void try_to_unmap_flush_dirty(void)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 				       unsigned long start, unsigned long end,
-				       struct vm_area_struct *vma)
+				       struct vm_area_struct *vma,
+				       struct address_space *mapping)
 {
 	struct tlbflush_unmap_batch *tlb_ubc;
 	int batch;
@@ -1032,27 +1056,15 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 		tlb_ubc = &current->tlb_ubc;
 	else {
 		tlb_ubc = &current->tlb_ubc_ro;
+		fold_luf_batch_mm(&mm->luf_batch, mm);
+		if (mapping)
+			fold_luf_batch_mm(&mapping->luf_batch, mm);
 	}
 
 	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end);
 	tlb_ubc->flush_required = true;
 
-	if (can_luf_test()) {
-		struct luf_batch *lb;
-		unsigned long flags;
-
-		/*
-		 * Accumulate to the 0th entry right away so that
-		 * luf_flush(0) can be uesed to properly perform pending
-		 * TLB flush once this unmapping is observed.
-		 */
-		lb = &luf_batch[0];
-		write_lock_irqsave(&lb->lock, flags);
-		__fold_luf_batch(lb, tlb_ubc, new_luf_ugen());
-		write_unlock_irqrestore(&lb->lock, flags);
-	}
-
 	/*
 	 * Ensure compiler does not re-order the setting of tlb_flush_batched
 	 * before the PTE is cleared.
@@ -1134,7 +1146,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 				       unsigned long start, unsigned long end,
-				       struct vm_area_struct *vma)
+				       struct vm_area_struct *vma,
+				       struct address_space *mapping)
 {
 }
 
@@ -1511,7 +1524,7 @@ int folio_mkclean(struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return cleaned;
 }
@@ -2198,6 +2211,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	unsigned long nr_pages = 1, end_addr;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	struct address_space *mapping = folio_mapping(folio);
 
 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -2359,7 +2373,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * and traps if the PTE is unmapped.
 			 */
 			if (should_defer_flush(mm, flags))
-				set_tlb_ubc_flush_pending(mm, pteval, address, end_addr, vma);
+				set_tlb_ubc_flush_pending(mm, pteval, address, end_addr, vma, mapping);
 			else
 				flush_tlb_range(vma, address, end_addr);
 			if (pte_dirty(pteval))
@@ -2611,6 +2625,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	struct address_space *mapping = folio_mapping(folio);
 
 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -2758,7 +2773,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE, vma);
+			set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE, vma, mapping);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
diff --git a/mm/truncate.c b/mm/truncate.c
index 68c9ded2f789b..8c133b93cefe8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -142,7 +142,7 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(folio->mapping);
 }
 EXPORT_SYMBOL_GPL(folio_invalidate);
 
@@ -183,7 +183,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return 0;
 }
@@ -234,7 +234,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(folio->mapping);
 
 	if (!folio_test_large(folio))
 		return true;
@@ -324,7 +324,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return ret;
 }
@@ -459,7 +459,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
 
@@ -579,7 +579,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return count;
 }
@@ -749,7 +749,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 422b9a03a6753..f145c09629b97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -853,7 +853,7 @@ long remove_mapping(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return ret;
 }