From patchwork Sun Dec 12 11:31:58 2021
X-Patchwork-Submitter: Baolin Wang <baolin.wang@linux.alibaba.com>
X-Patchwork-Id: 12672219
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, ying.huang@intel.com, dave.hansen@linux.intel.com
Cc: ziy@nvidia.com, shy828301@gmail.com,
 baolin.wang@linux.alibaba.com, zhongjiang-ali@linux.alibaba.com,
 xlpang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 2/4] mm: Add a debug interface to control the range of speculative numa fault
Date: Sun, 12 Dec 2021 19:31:58 +0800
Message-Id: <913a8a5282d265dc771309ca552c9c62c247c2b0.1639306956.git.baolin.wang@linux.alibaba.com>

Add a debug interface to control the range of the speculative numa fault, which can be used to tune the performance, or even to close the speculative numa fault window entirely for some workloads.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/memory.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2c9ed63e4e23..a0f4a2a008cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4052,7 +4052,29 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 static unsigned long fault_around_bytes __read_mostly =
 	rounddown_pow_of_two(65536);
 
+static unsigned long numa_around_bytes __read_mostly;
+
 #ifdef CONFIG_DEBUG_FS
+static int numa_around_bytes_get(void *data, u64 *val)
+{
+	*val = numa_around_bytes;
+	return 0;
+}
+
+static int numa_around_bytes_set(void *data, u64 val)
+{
+	if (val / PAGE_SIZE > PTRS_PER_PTE)
+		return -EINVAL;
+	if (val > PAGE_SIZE)
+		numa_around_bytes = rounddown_pow_of_two(val);
+	else
+		numa_around_bytes = 0; /* rounddown_pow_of_two(0) is undefined */
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(numa_around_bytes_fops,
+			numa_around_bytes_get,
+			numa_around_bytes_set, "%llu\n");
+
 static int fault_around_bytes_get(void *data, u64 *val)
 {
 	*val = fault_around_bytes;
@@ -4080,6 +4102,8 @@ static int __init fault_around_debugfs(void)
 {
 	debugfs_create_file_unsafe("fault_around_bytes", 0644, NULL, NULL,
 				   &fault_around_bytes_fops);
+	debugfs_create_file_unsafe("numa_around_bytes", 0644, NULL, NULL,
+				   &numa_around_bytes_fops);
 	return 0;
 }
 late_initcall(fault_around_debugfs);
@@ -4348,10 +4372,13 @@ static bool try_next_numa_page(struct vm_fault *vmf, unsigned int win_pages,
 	 ((win) & NUMA_FAULT_WINDOW_SIZE_MASK))
 
 static inline unsigned int numa_fault_max_pages(struct vm_area_struct *vma,
-						unsigned long fault_address)
+						unsigned long fault_address,
+						unsigned long numa_around_size)
 {
+	unsigned long numa_around_addr =
+		(fault_address + numa_around_size) & PAGE_MASK;
 	unsigned long pmd_end_addr = (fault_address & PMD_MASK) + PMD_SIZE;
-	unsigned long max_fault_addr = min_t(unsigned long, pmd_end_addr,
+	unsigned long max_fault_addr = min3(numa_around_addr, pmd_end_addr,
 					    vma->vm_end);
 
 	return (max_fault_addr - fault_address - 1) >> PAGE_SHIFT;
@@ -4360,12 +4387,24 @@ static inline unsigned int numa_fault_max_pages(struct vm_area_struct *vma,
 static unsigned int adjust_numa_fault_window(struct vm_area_struct *vma,
 					     unsigned long fault_address)
 {
+	unsigned long numa_around_size = READ_ONCE(numa_around_bytes);
 	unsigned long numafault_ahead = GET_NUMA_FAULT_INFO(vma);
 	unsigned long prev_start = NUMA_FAULT_WINDOW_START(numafault_ahead);
 	unsigned int prev_pages = NUMA_FAULT_WINDOW_SIZE(numafault_ahead);
 	unsigned long win_start;
 	unsigned int win_pages, max_fault_pages;
 
+	/*
+	 * Shut down the proactive numa fault if the numa_around_bytes
+	 * is set to 0.
+	 */
+	if (!numa_around_size) {
+		if (numafault_ahead)
+			atomic_long_set(&vma->numafault_ahead_info,
+					NUMA_FAULT_INFO(0, 0));
+		return 0;
+	}
+
 	win_start = fault_address + PAGE_SIZE;
 
 	/*
@@ -4437,7 +4476,8 @@ static unsigned int adjust_numa_fault_window(struct vm_area_struct *vma,
 	 * Make sure the size of ahead numa fault address is less than the
 	 * size of current VMA or PMD.
 	 */
-	max_fault_pages = numa_fault_max_pages(vma, fault_address);
+	max_fault_pages = numa_fault_max_pages(vma, fault_address,
+					       numa_around_size);
 	if (win_pages > max_fault_pages)
 		win_pages = max_fault_pages;