From patchwork Sat Mar 23 04:44:26 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866769
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
 fan.du@intel.com, ying.huang@intel.com
Cc:
 yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 01/10] mm: control memory placement by nodemask for two tier main memory
Date: Sat, 23 Mar 2019 12:44:26 +0800
Message-Id: <1553316275-21985-2-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

When running applications on a machine that exposes NVDIMM as a NUMA node,
memory allocations may end up on the NVDIMM node. This may cause silent
performance degradation and regressions because of the different hardware
properties.

A DRAM-first rule should be obeyed to prevent surprising regressions: any
non-DRAM node should be excluded from default allocation. Use a nodemask to
control memory placement. Introduce def_alloc_nodemask, which has only DRAM
nodes set. Any non-DRAM allocation should be requested explicitly through
NUMA policy.

In the future we may be able to extract the memory characteristics from
HMAT or another source to build the default allocation nodemask. For the
time being, just distinguish DRAM and PMEM (non-DRAM) nodes by the SRAT
flag.
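
As a rough illustration of the placement rule described above, the nodemask
selection that this patch adds to prepare_alloc_pages() can be modelled in
user space. This is only a sketch under stated assumptions:
nodemask_model_t, model_node_isset() and model_pick_nodemask() are
hypothetical names that do not exist in the kernel, a nodemask is reduced to
a 64-bit bitmap, and the cpuset handling is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model: bit N set means NUMA node N is in the mask. */
typedef uint64_t nodemask_model_t;

/* Returns true if node @nid is set in @mask; out-of-range ids are never set. */
static inline bool model_node_isset(int nid, nodemask_model_t mask)
{
	if (nid < 0 || nid >= 64)
		return false;
	return (mask >> nid) & 1;
}

/*
 * Models the selection done for ac->nodemask in the patch (cpusets ignored):
 *  - an explicit policy nodemask is used as given;
 *  - no policy nodemask, preferred node is DRAM: only DRAM nodes are allowed;
 *  - no policy nodemask, preferred node is not DRAM: all memory nodes are
 *    allowed, so the allocation can fall back to any node.
 */
static inline nodemask_model_t
model_pick_nodemask(const nodemask_model_t *policy_mask, int preferred_nid,
		    nodemask_model_t dram_nodes, nodemask_model_t memory_nodes)
{
	if (policy_mask)
		return *policy_mask;

	if (!model_node_isset(preferred_nid, dram_nodes))
		return memory_nodes;

	return dram_nodes;
}
```

With DRAM nodes {0,1} and PMEM nodes {2,3}, for instance, a default
allocation that prefers node 0 is limited to nodes 0-1 in this model, while
one that prefers node 2 may use nodes 0-3.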

Signed-off-by: Yang Shi
---
 arch/x86/mm/numa.c     |  1 +
 drivers/acpi/numa.c    |  8 ++++++++
 include/linux/mmzone.h |  3 +++
 mm/page_alloc.c        | 18 ++++++++++++++++--
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index dfb6c4d..d9e0ca4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
+	nodes_clear(def_alloc_nodemask);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
 				  MAX_NUMNODES));
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 867f6e3..79dfedf 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -296,6 +296,14 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 		goto out_err_bad_srat;
 	}
 
+	/*
+	 * Non volatile memory is excluded from zonelist by default.
+	 * Only regular DRAM nodes are set in default allocation node
+	 * mask.
+	 */
+	if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
+		node_set(node, def_alloc_nodemask);
+
 	node_set(node, numa_nodes_parsed);
 
 	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s%s\n",
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fba7741..063c3b4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -927,6 +927,9 @@ extern int numa_zonelist_order_handler(struct ctl_table *, int,
 extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
 extern struct zone *next_zone(struct zone *zone);
 
+/* Regular DRAM nodes */
+extern nodemask_t def_alloc_nodemask;
+
 /**
  * for_each_online_pgdat - helper macro to iterate over all online nodes
  * @pgdat - pointer to a pg_data_t variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03fcf73..68ad8c6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -134,6 +134,8 @@ struct pcpu_drain {
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 
+nodemask_t def_alloc_nodemask __read_mostly;
+
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
  * put on a pcplist. Used to avoid the pageblock migratetype lookup when
@@ -4524,12 +4526,24 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 {
 	ac->high_zoneidx = gfp_zone(gfp_mask);
 	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
-	ac->nodemask = nodemask;
 	ac->migratetype = gfpflags_to_migratetype(gfp_mask);
 
+	if (!nodemask) {
+		/* Non-DRAM node is preferred node */
+		if (!node_isset(preferred_nid, def_alloc_nodemask))
+			/*
+			 * With MPOL_PREFERRED policy, once PMEM is allowed,
+			 * can fall back to all memory nodes.
+			 */
+			ac->nodemask = &node_states[N_MEMORY];
+		else
+			ac->nodemask = &def_alloc_nodemask;
+	} else
+		ac->nodemask = nodemask;
+
 	if (cpusets_enabled()) {
 		*alloc_mask |= __GFP_HARDWALL;
-		if (!ac->nodemask)
+		if (nodes_equal(*ac->nodemask, def_alloc_nodemask))
 			ac->nodemask = &cpuset_current_mems_allowed;
 		else
 			*alloc_flags |= ALLOC_CPUSET;

From patchwork Sat Mar 23 04:44:27 2019
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866777
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
 fan.du@intel.com, ying.huang@intel.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 02/10] mm: mempolicy: introduce MPOL_HYBRID policy
Date: Sat, 23 Mar 2019 12:44:27 +0800
Message-Id: <1553316275-21985-3-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>

Introduce a new NUMA policy, MPOL_HYBRID. It behaves like MPOL_BIND, but
since we need to migrate pages from a non-DRAM node (i.e.
PMEM node) to a DRAM node on demand, MPOL_HYBRID does page migration on
NUMA fault, so it has MPOL_F_MOF set by default.

The NUMA balancing part is enabled in the following patch.

Signed-off-by: Yang Shi
---
 include/uapi/linux/mempolicy.h |  1 +
 mm/mempolicy.c                 | 56 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774..0fdc73d 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -22,6 +22,7 @@ enum {
 	MPOL_BIND,
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
+	MPOL_HYBRID,
 	MPOL_MAX,	/* always last member of enum */
 };
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index af171cc..7d0a432 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -31,6 +31,10 @@
  *                but useful to set in a VMA when you have a non default
  *                process policy.
  *
+ * hybrid         Only allocate memory on specific set of nodes. If the set of
+ *                nodes includes non-DRAM nodes, NUMA balancing would promote
+ *                the page to DRAM node.
+ *
  * default        Allocate on the local node first, or when on a VMA
  *                use the process policy. This is what Linux always did
  *                in a NUMA aware kernel and still does by, ahem, default.
@@ -191,6 +195,17 @@ static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 	return 0;
 }
 
+static int mpol_new_hybrid(struct mempolicy *pol, const nodemask_t *nodes)
+{
+	if (nodes_empty(*nodes))
+		return -EINVAL;
+
+	/* Hybrid policy would promote pages in page fault */
+	pol->flags |= MPOL_F_MOF;
+	pol->v.nodes = *nodes;
+	return 0;
+}
+
 /*
  * mpol_set_nodemask is called after mpol_new() to set up the nodemask, if
  * any, for the new policy. mpol_new() has already validated the nodes
@@ -401,6 +416,10 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 		.create = mpol_new_bind,
 		.rebind = mpol_rebind_nodemask,
 	},
+	[MPOL_HYBRID] = {
+		.create = mpol_new_hybrid,
+		.rebind = mpol_rebind_nodemask,
+	},
 };
 
 static void migrate_page_add(struct page *page, struct list_head *pagelist,
@@ -782,6 +801,8 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 		return;
 
 	switch (p->mode) {
+	case MPOL_HYBRID:
+		/* Fall through */
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
@@ -1721,8 +1742,12 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
  */
 static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
-	/* Lower zones don't get a nodemask applied for MPOL_BIND */
-	if (unlikely(policy->mode == MPOL_BIND) &&
+	/*
+	 * Lower zones don't get a nodemask applied for MPOL_BIND
+	 * or MPOL_HYBRID.
+	 */
+	if (unlikely((policy->mode == MPOL_BIND) ||
+			(policy->mode == MPOL_HYBRID)) &&
 			apply_policy_zone(policy, gfp_zone(gfp)) &&
 			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
 		return &policy->v.nodes;
@@ -1742,7 +1767,9 @@ static int policy_node(gfp_t gfp, struct mempolicy *policy,
 		 * because we might easily break the expectation to stay on the
 		 * requested node and not break the policy.
 		 */
-		WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
+		WARN_ON_ONCE((policy->mode == MPOL_BIND ||
+			     policy->mode == MPOL_HYBRID) &&
+			     (gfp & __GFP_THISNODE));
 	}
 
 	return nd;
@@ -1786,6 +1813,8 @@ unsigned int mempolicy_slab_node(void)
 	case MPOL_INTERLEAVE:
 		return interleave_nodes(policy);
 
+	case MPOL_HYBRID:
+		/* Fall through */
 	case MPOL_BIND: {
 		struct zoneref *z;
 
@@ -1856,7 +1885,8 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
  * @addr: address in @vma for shared policy lookup and interleave policy
  * @gfp_flags: for requested zone
  * @mpol: pointer to mempolicy pointer for reference counted mempolicy
- * @nodemask: pointer to nodemask pointer for MPOL_BIND nodemask
+ * @nodemask: pointer to nodemask pointer for MPOL_BIND or MPOL_HYBRID
+ * nodemask
  *
  * Returns a nid suitable for a huge page allocation and a pointer
  * to the struct mempolicy for conditional unref after allocation.
@@ -1871,14 +1901,16 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 	int nid;
 
 	*mpol = get_vma_policy(vma, addr);
-	*nodemask = NULL;	/* assume !MPOL_BIND */
+	/* assume !MPOL_BIND || !MPOL_HYBRID */
+	*nodemask = NULL;
 
 	if (unlikely((*mpol)->mode == MPOL_INTERLEAVE)) {
 		nid = interleave_nid(*mpol, vma, addr,
 					huge_page_shift(hstate_vma(vma)));
 	} else {
 		nid = policy_node(gfp_flags, *mpol, numa_node_id());
-		if ((*mpol)->mode == MPOL_BIND)
+		if ((*mpol)->mode == MPOL_BIND ||
+		    (*mpol)->mode == MPOL_HYBRID)
 			*nodemask = &(*mpol)->v.nodes;
 	}
 	return nid;
@@ -1919,6 +1951,8 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 		init_nodemask_of_node(mask, nid);
 		break;
 
+	case MPOL_HYBRID:
+		/* Fall through */
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
@@ -1966,6 +2000,7 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 		 * nodes in mask.
 		 */
 		break;
+	case MPOL_HYBRID:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		ret = nodes_intersects(mempolicy->v.nodes, *mask);
@@ -2170,6 +2205,8 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 		return false;
 
 	switch (a->mode) {
+	case MPOL_HYBRID:
+		/* Fall through */
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
@@ -2325,6 +2362,9 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 			polnid = pol->v.preferred_node;
 		break;
 
+	case MPOL_HYBRID:
+		/* Fall through */
+
 	case MPOL_BIND:
 
 		/*
@@ -2693,6 +2733,7 @@ void numa_default_policy(void)
 	[MPOL_BIND] = "bind",
 	[MPOL_INTERLEAVE] = "interleave",
 	[MPOL_LOCAL] = "local",
+	[MPOL_HYBRID] = "hybrid",
 };
 
 
@@ -2768,6 +2809,8 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 		if (!nodelist)
 			err = 0;
 		goto out;
+	case MPOL_HYBRID:
+		/* Fall through */
 	case MPOL_BIND:
 		/*
 		 * Insist on a nodelist
@@ -2856,6 +2899,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		else
 			node_set(pol->v.preferred_node, nodes);
 		break;
+	case MPOL_HYBRID:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		nodes = pol->v.nodes;

From patchwork Sat Mar 23 04:44:28 2019
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866771
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
 fan.du@intel.com, ying.huang@intel.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 03/10] mm: mempolicy: promote page to DRAM for MPOL_HYBRID
Date: Sat, 23 Mar 2019 12:44:28 +0800
Message-Id: <1553316275-21985-4-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>

With MPOL_HYBRID, memory allocation may end up on a non-DRAM node, which may
not be optimal for performance.
Promote pages to DRAM with NUMA balancing for MPOL_HYBRID. If DRAM nodes are
specified, migrate to the specified nodes. If no DRAM node is specified,
migrate to the local DRAM node.

Signed-off-by: Yang Shi
---
 mm/mempolicy.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7d0a432..87bc691 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2339,6 +2339,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 	struct zoneref *z;
 	int curnid = page_to_nid(page);
 	unsigned long pgoff;
+	nodemask_t nmask;
 	int thiscpu = raw_smp_processor_id();
 	int thisnid = cpu_to_node(thiscpu);
 	int polnid = NUMA_NO_NODE;
@@ -2363,7 +2364,24 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_HYBRID:
-		/* Fall through */
+		if (node_isset(curnid, pol->v.nodes) &&
+		    node_isset(curnid, def_alloc_nodemask))
+			/* The page is already on DRAM node */
+			goto out;
+
+		/*
+		 * Promote to the DRAM node specified by the policy, or
+		 * the local DRAM node if no DRAM node is specified.
+		 */
+		nodes_and(nmask, pol->v.nodes, def_alloc_nodemask);
+
+		z = first_zones_zonelist(
+			node_zonelist(numa_node_id(), GFP_HIGHUSER),
+			gfp_zone(GFP_HIGHUSER),
+			nodes_empty(nmask) ? &def_alloc_nodemask : &nmask);
+		polnid = z->zone->node;
+
+		break;
 
 	case MPOL_BIND:
 

From patchwork Sat Mar 23 04:44:29 2019
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866781
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
fan.du@intel.com, ying.huang@intel.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 04/10] mm: numa: promote pages to DRAM when it is accessed twice Date: Sat, 23 Mar 2019 12:44:29 +0800 Message-Id: <1553316275-21985-5-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP NUMA balancing would promote the pages to DRAM once it is accessed, but it might be just one off access. To reduce migration thrashing and memory bandwidth pressure, introduce PG_promote flag to mark promote candidate. The page will be promoted to DRAM when it is accessed twice. This might be a good way to filter out those one-off access pages. PG_promote flag will be inherited by tail pages when THP gets split. But, it will not be copied to the new page once the migration is done. This approach is not definitely the optimal one to distinguish the hot or cold pages. It may need much more sophisticated algorithm to distinguish hot or cold pages accurately. Kernel may be not the good place to implement such algorithm considering the complexity and potential overhead. But, kernel may still need such capability. With NUMA balancing the whole workingset of the process may end up being promoted to DRAM finally. It depends on the page reclaim to demote inactive pages to PMEM implemented by the following patch. 
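
For illustration only, the two-access filter described above can be
modelled in user space roughly as follows. This is a sketch, not kernel
code: struct model_page and pmem_fault_should_promote() are hypothetical
names, the bool field merely stands in for PG_promote, and only pages
that currently sit on a non-DRAM node are modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; models only the candidate bit. */
struct model_page {
	bool promote;	/* stands in for PG_promote */
};

/*
 * Model of one NUMA hinting fault on a page that lives on a non-DRAM
 * node. Returns true when this fault should trigger promotion to DRAM.
 */
static bool pmem_fault_should_promote(struct model_page *page)
{
	if (!page->promote) {
		/* First access: only mark the page as a candidate. */
		page->promote = true;
		return false;
	}

	/* Second access: the page is no longer treated as one-off. */
	return true;
}
```

The first fault on such a page only sets the bit and returns; migration
is attempted from the second fault onwards.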
Signed-off-by: Yang Shi
---
 include/linux/page-flags.h     |  4 ++++
 include/trace/events/mmflags.h |  3 ++-
 mm/huge_memory.c               | 10 ++++++++++
 mm/memory.c                    |  8 ++++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9f8712a..2d53166 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,6 +131,7 @@ enum pageflags {
 	PG_young,
 	PG_idle,
 #endif
+	PG_promote,		/* Promote candidate for NUMA balancing */
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -348,6 +349,9 @@ static inline void page_init_poison(struct page *page, size_t size)
 PAGEFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
 	TESTCLEARFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
 
+PAGEFLAG(Promote, promote, PF_ANY) __SETPAGEFLAG(Promote, promote, PF_ANY)
+	__CLEARPAGEFLAG(Promote, promote, PF_ANY)
+
 /*
  * Only test-and-set exist for PG_writeback. The unconditional operators are
  * risky: they bypass page accounting.
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d4..f13c2a1 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -100,7 +100,8 @@
 	{1UL << PG_mappedtodisk,	"mappedtodisk"	},		\
 	{1UL << PG_reclaim,		"reclaim"	},		\
 	{1UL << PG_swapbacked,		"swapbacked"	},		\
-	{1UL << PG_unevictable,		"unevictable"	}		\
+	{1UL << PG_unevictable,		"unevictable"	},		\
+	{1UL << PG_promote,		"promote"	}		\
 IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
 IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdc..8268a3c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1589,6 +1589,15 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 				haddr + HPAGE_PMD_SIZE);
 	}
 
+	/* Promote page to DRAM when referenced twice */
+	if (!(node_isset(page_nid, def_alloc_nodemask)) &&
+	    !PagePromote(page)) {
+		SetPagePromote(page);
+		put_page(page);
+		page_nid = -1;
+		goto clear_pmdnuma;
+	}
+
 	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
 	 * and access rights restored.
@@ -2396,6 +2405,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
+			 (1L << PG_promote) |
 			 (1L << PG_dirty)));
 
 	/* ->mapping in first tail page is compound_mapcount */
diff --git a/mm/memory.c b/mm/memory.c
index 47fe250..2494c11 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3680,6 +3680,14 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	/* Promote the non-DRAM page when it is referenced twice */
+	if (!(node_isset(page_nid, def_alloc_nodemask)) &&
+	    !PagePromote(page)) {
+		SetPagePromote(page);
+		put_page(page);
+		goto out;
+	}
+
 	/* Migrate to the requested node */
 	migrated = migrate_misplaced_page(page, vma, target_nid);
 	if (migrated) {

From patchwork Sat Mar 23 04:44:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866779
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
 fan.du@intel.com, ying.huang@intel.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 05/10] mm: page_alloc: make find_next_best_node could skip DRAM node
Date: Sat, 23 Mar 2019 12:44:30 +0800
Message-Id: <1553316275-21985-6-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>

We need to find the closest non-DRAM node in order to demote DRAM pages.
Add a "skip_ram_node" parameter to find_next_best_node() so that DRAM
nodes can be skipped on demand.

Signed-off-by: Yang Shi
---
 mm/internal.h   | 11 +++++++++++
 mm/page_alloc.c | 15 +++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b..46ad0d8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -292,6 +292,17 @@ static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
+#ifdef CONFIG_NUMA
+extern int find_next_best_node(int node, nodemask_t *used_node_mask,
+			       bool skip_ram_node);
+#else
+static inline int find_next_best_node(int node, nodemask_t *used_node_mask,
+				      bool skip_ram_node)
+{
+	return 0;
+}
+#endif
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct rb_node *rb_parent);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 68ad8c6..07d767b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5375,6 +5375,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  * find_next_best_node - find the next node that should appear in a given node's fallback list
  * @node: node whose fallback list we're appending
  * @used_node_mask: nodemask_t of already used nodes
+ * @skip_ram_node: find next best non-DRAM node
  *
  * We use a number of factors to determine which is the next node that should
  * appear on a given node's fallback list. The node should not have appeared
@@ -5386,7 +5387,8 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
  */
-static int find_next_best_node(int node, nodemask_t *used_node_mask)
+int find_next_best_node(int node, nodemask_t *used_node_mask,
+			bool skip_ram_node)
 {
 	int n, val;
 	int min_val = INT_MAX;
@@ -5394,13 +5396,19 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	const struct cpumask *tmp = cpumask_of_node(0);
 
 	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	if (!node_isset(node, *used_node_mask) &&
+	    !skip_ram_node) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
 
 	for_each_node_state(n, N_MEMORY) {
 
+		/* Find next best non-DRAM node */
+		if (skip_ram_node &&
+		    (node_isset(n, def_alloc_nodemask)))
+			continue;
+
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
 			continue;
@@ -5432,7 +5440,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	return best_node;
 }
 
-
 /*
  * Build zonelists ordered by node and zones within node.
  * This results in maximum locality--normal zone overflows into local
@@ -5494,7 +5501,7 @@ static void build_zonelists(pg_data_t *pgdat)
 	nodes_clear(used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+	while ((node = find_next_best_node(local_node, &used_mask, false)) >= 0) {
 		/*
 		 * We don't want to pressure a particular node.
 		 * So adding penalty to the first node in same

From patchwork Sat Mar 23 04:44:31 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10866787
From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com,
 keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
 fan.du@intel.com, ying.huang@intel.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node
Date: Sat, 23 Mar 2019 12:44:31 +0800
Message-Id: <1553316275-21985-7-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>

PMEM provides larger capacity than DRAM and has much lower access
latency than disk, so it is a good choice as a middle tier between DRAM
and disk in the page reclaim path.

With PMEM nodes, the demotion path of anonymous pages could be:

DRAM -> PMEM -> swap device

This patch demotes only anonymous pages for the time being, and demotes
a THP to PMEM as a whole. However, this may cause expensive page reclaim
and/or compaction on the PMEM node if there is memory pressure on it.
But, considering the capacity of PMEM, and that allocation only happens
on PMEM when PMEM is specified explicitly, such cases should not be that
frequent. So it seems worth keeping the THP whole instead of splitting
it.

Demote pages to the closest non-DRAM node even when the system is
swapless. The current page reclaim logic only scans the anon LRU when
swap is on and swappiness is set properly. Demoting to PMEM does not
need to care whether swap is available or not. But reclaiming from PMEM
still skips the anon LRU if swap is not available.

Demotion only happens between a DRAM node and its closest PMEM node.
Demoting to a remote PMEM node is not allowed for now.

Also define a new migration reason for demotion, called MR_DEMOTE.
Pages are demoted via async migration to avoid blocking.
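
As a rough illustration of the policy described above, the choice made
for an anonymous page during reclaim can be modelled in user space as
below. This is only a sketch under simplifying assumptions: the enum
names and anon_reclaim_action() are hypothetical, memcg swap limits and
swappiness are ignored, and "pmem_online" stands for "at least one
non-DRAM node is online".

```c
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum tier { TIER_DRAM, TIER_PMEM };
enum reclaim_action { ACT_DEMOTE_TO_PMEM, ACT_SWAP_OUT, ACT_SKIP_ANON };

/*
 * Decide what reclaim does with an anonymous page on a node of the
 * given tier: DRAM pages are demoted when a PMEM node is online,
 * regardless of swap; otherwise anon pages are reclaimed only when
 * swap is available.
 */
static enum reclaim_action anon_reclaim_action(enum tier node_tier,
					       bool pmem_online,
					       bool swap_available)
{
	if (node_tier == TIER_DRAM && pmem_online)
		return ACT_DEMOTE_TO_PMEM;

	if (swap_available)
		return ACT_SWAP_OUT;

	/* No demotion target and no swap: leave the anon LRU alone. */
	return ACT_SKIP_ANON;
}
```

In this model a swapless system with PMEM online still demotes from
DRAM, while a PMEM node without swap skips its anon LRU.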
Signed-off-by: Yang Shi --- include/linux/migrate.h | 1 + include/trace/events/migrate.h | 3 +- mm/debug.c | 1 + mm/internal.h | 22 ++++++++++ mm/vmscan.c | 99 ++++++++++++++++++++++++++++++++++-------- 5 files changed, 107 insertions(+), 19 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index e13d9bf..78c8dda 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -25,6 +25,7 @@ enum migrate_reason { MR_MEMPOLICY_MBIND, MR_NUMA_MISPLACED, MR_CONTIG_RANGE, + MR_DEMOTE, MR_TYPES }; diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h index 705b33d..c1d5b36 100644 --- a/include/trace/events/migrate.h +++ b/include/trace/events/migrate.h @@ -20,7 +20,8 @@ EM( MR_SYSCALL, "syscall_or_cpuset") \ EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind") \ EM( MR_NUMA_MISPLACED, "numa_misplaced") \ - EMe(MR_CONTIG_RANGE, "contig_range") + EM( MR_CONTIG_RANGE, "contig_range") \ + EMe(MR_DEMOTE, "demote") /* * First define the enums in the above macros to be exported to userspace diff --git a/mm/debug.c b/mm/debug.c index c0b31b6..cc0d7df 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -25,6 +25,7 @@ "mempolicy_mbind", "numa_misplaced", "cma", + "demote", }; const struct trace_print_flags pageflag_names[] = { diff --git a/mm/internal.h b/mm/internal.h index 46ad0d8..0152300 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -303,6 +303,19 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask, } #endif +static inline bool has_nonram_online(void) +{ + int i = 0; + + for_each_online_node(i) { + /* Have PMEM node online? 
*/ + if (!node_isset(i, def_alloc_nodemask)) + return true; + } + + return false; +} + /* mm/util.c */ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev, struct rb_node *rb_parent); @@ -565,5 +578,14 @@ static inline bool is_migrate_highatomic_page(struct page *page) } void setup_zone_pageset(struct zone *zone); + +#ifdef CONFIG_NUMA extern struct page *alloc_new_node_page(struct page *page, unsigned long node); +#else +static inline struct page *alloc_new_node_page(struct page *page, + unsigned long node) +{ + return NULL; +} +#endif #endif /* __MM_INTERNAL_H */ diff --git a/mm/vmscan.c b/mm/vmscan.c index a5ad0b3..bdcab6b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1094,6 +1094,19 @@ static void page_check_dirty_writeback(struct page *page, mapping->a_ops->is_dirty_writeback(page, dirty, writeback); } +static inline bool is_demote_ok(struct pglist_data *pgdat) +{ + /* Current node is not DRAM node */ + if (!node_isset(pgdat->node_id, def_alloc_nodemask)) + return false; + + /* No online PMEM node */ + if (!has_nonram_online()) + return false; + + return true; +} + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1106,6 +1119,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); + LIST_HEAD(demote_pages); unsigned nr_reclaimed = 0; memset(stat, 0, sizeof(*stat)); @@ -1262,6 +1276,22 @@ static unsigned long shrink_page_list(struct list_head *page_list, } /* + * Demote DRAM pages regardless the mempolicy. + * Demot anonymous pages only for now and skip MADV_FREE + * pages. + */ + if (PageAnon(page) && !PageSwapCache(page) && + (node_isset(page_to_nid(page), def_alloc_nodemask)) && + PageSwapBacked(page)) { + + if (has_nonram_online()) { + list_add(&page->lru, &demote_pages); + unlock_page(page); + continue; + } + } + + /* * Anonymous process memory has backing store? * Try to allocate it some swap space here. 
* Lazyfree page could be freed directly @@ -1477,6 +1507,25 @@ static unsigned long shrink_page_list(struct list_head *page_list, VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); } + /* Demote pages to PMEM */ + if (!list_empty(&demote_pages)) { + int err, target_nid; + nodemask_t used_mask; + + nodes_clear(used_mask); + target_nid = find_next_best_node(pgdat->node_id, &used_mask, + true); + + err = migrate_pages(&demote_pages, alloc_new_node_page, NULL, + target_nid, MIGRATE_ASYNC, MR_DEMOTE); + + if (err) { + putback_movable_pages(&demote_pages); + + list_splice(&ret_pages, &demote_pages); + } + } + mem_cgroup_uncharge_list(&free_pages); try_to_unmap_flush(); free_unref_page_list(&free_pages); @@ -2188,10 +2237,11 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file, unsigned long gb; /* - * If we don't have swap space, anonymous page deactivation - * is pointless. + * If we don't have swap space or PMEM online, anonymous page + * deactivation is pointless. */ - if (!file && !total_swap_pages) + if (!file && !total_swap_pages && + !is_demote_ok(pgdat)) return false; inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx); @@ -2271,22 +2321,34 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, unsigned long ap, fp; enum lru_list lru; - /* If we have no swap space, do not bother scanning anon pages. */ - if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) { - scan_balance = SCAN_FILE; - goto out; - } - /* - * Global reclaim will swap to prevent OOM even with no - * swappiness, but memcg users want to use this knob to - * disable swapping for individual groups completely when - * using the memory controller's swap limit feature would be - * too expensive. + * Anon pages can be demoted to PMEM. If there is PMEM node online, + * still scan anonymous LRU even though the systme is swapless or + * swapping is disabled by memcg. 
+ * + * If current node is already PMEM node, demotion is not applicable. */ - if (!global_reclaim(sc) && !swappiness) { - scan_balance = SCAN_FILE; - goto out; + if (!is_demote_ok(pgdat)) { + /* + * If we have no swap space, do not bother scanning + * anon pages. + */ + if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) { + scan_balance = SCAN_FILE; + goto out; + } + + /* + * Global reclaim will swap to prevent OOM even with no + * swappiness, but memcg users want to use this knob to + * disable swapping for individual groups completely when + * using the memory controller's swap limit feature would be + * too expensive. + */ + if (!global_reclaim(sc) && !swappiness) { + scan_balance = SCAN_FILE; + goto out; + } } /* @@ -3332,7 +3394,8 @@ static void age_active_anon(struct pglist_data *pgdat, { struct mem_cgroup *memcg; - if (!total_swap_pages) + /* Aging anon page as long as demotion is fine */ + if (!total_swap_pages && !is_demote_ok(pgdat)) return; memcg = mem_cgroup_iter(NULL, NULL, NULL); From patchwork Sat Mar 23 04:44:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10866785 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83DAB922 for ; Sat, 23 Mar 2019 04:46:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 697FA2A9FB for ; Sat, 23 Mar 2019 04:46:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5DF452AA0B; Sat, 23 Mar 2019 04:46:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from 
out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com.
[115.124.30.54]) by mx.google.com with ESMTPS id e9si8100867pgs.450.2019.03.22.21.45.57 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 21:45:58 -0700 (PDT) Received-SPF: pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.54 as permitted sender) client-ip=115.124.30.54; Authentication-Results: mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.54 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R981e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04455;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0TNPuxAM_1553316293; Received: from e19h19392.et15sqa.tbsite.net(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0TNPuxAM_1553316293) by smtp.aliyun-inc.com(127.0.0.1); Sat, 23 Mar 2019 12:45:03 +0800 From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 07/10] mm: vmscan: add page demotion counter Date: Sat, 23 Mar 2019 12:44:32 +0800 Message-Id: <1553316275-21985-8-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Demoted pages are counted into reclaim_stat->nr_demoted instead of nr_reclaimed since they are not actually reclaimed.
They are still in memory, just migrated to PMEM. Add the pgdemote_kswapd and pgdemote_direct VM counters, shown in /proc/vmstat. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 2 ++ include/linux/vmstat.h | 1 + mm/vmscan.c | 14 ++++++++++++++ mm/vmstat.c | 2 ++ 4 files changed, 19 insertions(+) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 47a3441..499a3aa 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -32,6 +32,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGREFILL, PGSTEAL_KSWAPD, PGSTEAL_DIRECT, + PGDEMOTE_KSWAPD, + PGDEMOTE_DIRECT, PGSCAN_KSWAPD, PGSCAN_DIRECT, PGSCAN_DIRECT_THROTTLE, diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 2db8d60..eb5d21c 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -29,6 +29,7 @@ struct reclaim_stat { unsigned nr_activate; unsigned nr_ref_keep; unsigned nr_unmap_fail; + unsigned nr_demoted; }; #ifdef CONFIG_VM_EVENT_COUNTERS diff --git a/mm/vmscan.c b/mm/vmscan.c index bdcab6b..3c7ba7e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1286,6 +1286,10 @@ static unsigned long shrink_page_list(struct list_head *page_list, if (has_nonram_online()) { list_add(&page->lru, &demote_pages); + if (PageTransHuge(page)) + stat->nr_demoted += HPAGE_PMD_NR; + else + stat->nr_demoted++; unlock_page(page); continue; } @@ -1523,7 +1527,17 @@ static unsigned long shrink_page_list(struct list_head *page_list, putback_movable_pages(&demote_pages); list_splice(&ret_pages, &demote_pages); + + if (err > 0) + stat->nr_demoted -= err; + else + stat->nr_demoted = 0; } + + if (current_is_kswapd()) + __count_vm_events(PGDEMOTE_KSWAPD, stat->nr_demoted); + else + __count_vm_events(PGDEMOTE_DIRECT, stat->nr_demoted); } mem_cgroup_uncharge_list(&free_pages); diff --git a/mm/vmstat.c b/mm/vmstat.c index 36b56f8..0e863e7 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1192,6 +1192,8 @@ int fragmentation_index(struct zone *zone,
unsigned int order) "pgrefill", "pgsteal_kswapd", "pgsteal_direct", + "pgdemote_kswapd", + "pgdemote_direct", "pgscan_kswapd", "pgscan_direct", "pgscan_direct_throttle", From patchwork Sat Mar 23 04:44:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10866783 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 78EDB922 for ; Sat, 23 Mar 2019 04:45:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5D84C2A9FB for ; Sat, 23 Mar 2019 04:45:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4FDE92AA0B; Sat, 23 Mar 2019 04:45:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E0C32A9FB for ; Sat, 23 Mar 2019 04:45:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 05D656B026E; Sat, 23 Mar 2019 00:45:45 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F263A6B0270; Sat, 23 Mar 2019 00:45:44 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DEEA96B0271; Sat, 23 Mar 2019 00:45:44 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f199.google.com (mail-pg1-f199.google.com [209.85.215.199]) by kanga.kvack.org (Postfix) with ESMTP id 9DFD66B026E for ; Sat, 23 Mar 2019 00:45:44 -0400 (EDT) 
ARC-Message-Signature: i=1;
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=cL/AGuSDoqRTGwGzMJZQWG4Buhnh3vg9z83nZE9E16I=; b=zNzFhK3uLhYFwFTzQjwSW2OVP3+rUGUFXzcDT1JMoZrERbE7B/hnkADszKDU1q22WJ ErH+/D4Yqjv+C9/n4EGAtZDq02UgqJt92n8yZyXrneuWLGgsPU5Y6Xy5cjPhg3MXcNTh kq8E8jDAR3R+aIDCe9XltW/LbFsmXsWQMsklEQ3g7daUgQtNJFJ9SZTlxUwDRgp73FgA Caf5rN54OzVm8I7BEOEhfXLNpyo55s+Iasaj3mHV9sRhX6gS2AoRdW/a22k7AtUudq+3 rrHq9Szh/o6yYTV2RolBgzogPcKZx3UU11V412fVjqOU6vDat+XGHfh4Inx04/w4TCir RtRw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.43 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: from out30-43.freemail.mail.aliyun.com (out30-43.freemail.mail.aliyun.com. [115.124.30.43]) by mx.google.com with ESMTPS id d17si6746521pgk.479.2019.03.22.21.45.42 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 21:45:42 -0700 (PDT) Received-SPF: pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.43 as permitted sender) client-ip=115.124.30.43; Authentication-Results: mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.43 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04428;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0TNPuxAM_1553316293; Received: from e19h19392.et15sqa.tbsite.net(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0TNPuxAM_1553316293) by smtp.aliyun-inc.com(127.0.0.1); Sat, 23 Mar 2019 12:45:03 +0800 From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, 
keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 08/10] mm: numa: add page promotion counter Date: Sat, 23 Mar 2019 12:44:33 +0800 Message-Id: <1553316275-21985-9-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Add counter for page promotion for NUMA balancing. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 1 + mm/huge_memory.c | 4 ++++ mm/memory.c | 4 ++++ mm/vmstat.c | 1 + 4 files changed, 10 insertions(+) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 499a3aa..9f52a62 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, NUMA_HINT_FAULTS, NUMA_HINT_FAULTS_LOCAL, NUMA_PAGE_MIGRATE, + NUMA_PAGE_PROMOTE, #endif #ifdef CONFIG_MIGRATION PGMIGRATE_SUCCESS, PGMIGRATE_FAIL, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8268a3c..9d5f5ce 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1607,6 +1607,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma, vmf->pmd, pmd, vmf->address, page, target_nid); if (migrated) { + if (!node_isset(page_nid, def_alloc_nodemask) && + node_isset(target_nid, def_alloc_nodemask)) + count_vm_numa_events(NUMA_PAGE_PROMOTE, HPAGE_PMD_NR); + flags |= TNF_MIGRATED; page_nid = target_nid; } else diff --git a/mm/memory.c b/mm/memory.c index 2494c11..554191b 100644 --- a/mm/memory.c 
+++ b/mm/memory.c @@ -3691,6 +3691,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) /* Migrate to the requested node */ migrated = migrate_misplaced_page(page, vma, target_nid); if (migrated) { + if (!node_isset(page_nid, def_alloc_nodemask) && + node_isset(target_nid, def_alloc_nodemask)) + count_vm_numa_event(NUMA_PAGE_PROMOTE); + page_nid = target_nid; flags |= TNF_MIGRATED; } else diff --git a/mm/vmstat.c b/mm/vmstat.c index 0e863e7..4b44fc8 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1220,6 +1220,7 @@ int fragmentation_index(struct zone *zone, unsigned int order) "numa_hint_faults", "numa_hint_faults_local", "numa_pages_migrated", + "numa_pages_promoted", #endif #ifdef CONFIG_MIGRATION "pgmigrate_success", From patchwork Sat Mar 23 04:44:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10866773 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DFF8E139A for ; Sat, 23 Mar 2019 04:45:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C72EC2A83B for ; Sat, 23 Mar 2019 04:45:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BB8412A8A1; Sat, 23 Mar 2019 04:45:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 623632A83B for ; Sat, 23 Mar 2019 04:45:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 827BD6B000A; Sat, 23 Mar 2019 00:45:07 -0400 (EDT) Delivered-To: 
linux-mm-outgoing@kvack.org Received: from out30-56.freemail.mail.aliyun.com (out30-56.freemail.mail.aliyun.com.
[115.124.30.56]) by mx.google.com with ESMTPS id b12si8167296plr.285.2019.03.22.21.45.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 21:45:05 -0700 (PDT) Received-SPF: pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.56 as permitted sender) client-ip=115.124.30.56; Authentication-Results: mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.56 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04446;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0TNPuxAM_1553316293; Received: from e19h19392.et15sqa.tbsite.net(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0TNPuxAM_1553316293) by smtp.aliyun-inc.com(127.0.0.1); Sat, 23 Mar 2019 12:45:04 +0800 From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 09/10] doc: add description for MPOL_HYBRID mode Date: Sat, 23 Mar 2019 12:44:34 +0800 Message-Id: <1553316275-21985-10-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Add description for MPOL_HYBRID mode in kernel documentation. 
Signed-off-by: Yang Shi --- Documentation/admin-guide/mm/numa_memory_policy.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst index d78c5b3..3db8257 100644 --- a/Documentation/admin-guide/mm/numa_memory_policy.rst +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -198,6 +198,16 @@ MPOL_BIND the node in the set with sufficient free memory that is closest to the node where the allocation takes place. +MPOL_HYBRID + This mode specifies that the page allocation must happen on the + nodes specified by the policy. If both DRAM and non-DRAM nodes + are specified, NUMA balancing may promote the pages from non-DRAM + nodes to the specified DRAM nodes. If only non-DRAM nodes are + specified, NUMA balancing may promote the pages to any available + DRAM nodes. Any other policy doesn't do such page promotion. The + default mode may do NUMA balancing, but non-DRAM nodes are masked + off for default mode. + MPOL_PREFERRED This mode specifies that the allocation should be attempted from the single node specified in the policy. 
If that From patchwork Sat Mar 23 04:44:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10866775 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99CE4139A for ; Sat, 23 Mar 2019 04:45:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 80B1C2A83B for ; Sat, 23 Mar 2019 04:45:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 74C672A8A1; Sat, 23 Mar 2019 04:45:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0AA432A83B for ; Sat, 23 Mar 2019 04:45:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 25B756B000C; Sat, 23 Mar 2019 00:45:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 208976B000D; Sat, 23 Mar 2019 00:45:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A8476B000E; Sat, 23 Mar 2019 00:45:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id BB4B66B000C for ; Sat, 23 Mar 2019 00:45:07 -0400 (EDT) Received: by mail-pg1-f197.google.com with SMTP id f12so3981609pgs.2 for ; Fri, 22 Mar 2019 21:45:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=3MssjOYGLVJvWySvWCRwamZGM4wpNkW74+k4V2c5XL8=; b=IP8a/JKT0I3F+5ydwJiTSNfps4mjpBmPS6GPlr2Thb1Q7T8gh7lPkYC0oeUNheCEsj UHajAPNPoetJ3mV5yhXd7zKfioR0HcU+ovX+zk8iOnwdnk/by8jOnYC5O1A9SBcjvrja 6qEeCj0eneJFcyRug0tpm8Dn+BE5GjbcjUk6b2/NVfcNtsdfvZYMthlFq8vYZuBiapmm oYm2iLFhDCRvQjVpMFwjHLYmF8wZo32hXe3m3eVAMurlOwhZlh0eDb1h8Uxf8sCs7uCU XdKFX6VYyPbPPqsjw6/Efm8SWiousZDDRc7RxJeT4FDq8hH68XDke97jZYA1HlMbRjWn K8Xw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.45 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com X-Gm-Message-State: APjAAAU8lEeAIg5JONOU9gxK1mxWFoHga8h7OSmV3nsbtr8oAK7Hva1s boq4MntXBLFF5raw5iq4SkcPfpTQs07YLOsvo3zrrNTdzh45elSMSghKwlavvAB0zN0wgvuKEzF q57p20LGhTIrzmneP6csnzxlolTTgPpVGQ/FcKINQTgzrz82NDpFRnMwVFILHlvcKuQ== X-Received: by 2002:a65:4549:: with SMTP id x9mr12776135pgr.3.1553316307441; Fri, 22 Mar 2019 21:45:07 -0700 (PDT) X-Google-Smtp-Source: APXvYqwdHLUgMWcBfTLL4LAypomLuX/j/GiLGMBEphIrEfeweIWdvPZJVpVUQ8rzhm0RIksEvpFv X-Received: by 2002:a65:4549:: with SMTP id x9mr12776076pgr.3.1553316306293; Fri, 22 Mar 2019 21:45:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553316306; cv=none; d=google.com; s=arc-20160816; b=ApNagy01IGN4kAnD5Hz9a0zlvm+ThVo7OZ8z6DG4/bTvbpNYBD/W2BETdskwSyTtaL Q//LnOs5kBtnBZ/cmFZhz3yeK/MLhENsmVr6YK1XSlZ8HVMUtLzJLDb+qpQUflETprG1 4k3Fi9P9hURl5KVF6szbCfMxylc4eMDp7xeB0fydpx45l3H+KOG8W5EZyeuFn7w455H7 dYf6cNO9eNsMKUNoj2HDq/oz5lopSLMG+/p2IqtcAjDF6bkxVBNeJ4KGhqPqS+bjb7ha YrlShS0gg02YNZZBUyQ1QmcSlPfD0/nxJCogxCxL/s/B4TRRdsthVc4rIaHaYbBxas6h rnpw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; 
bh=3MssjOYGLVJvWySvWCRwamZGM4wpNkW74+k4V2c5XL8=; b=mVEw7jpZQFRwVc6dJJ64SgRew+skETDaeKGVdxR/UaLxjdAyG/DbjUEzM9RQ0M87F+ vLyYUjErcVGHXlN3cRniBa0l2RDfEY7iMfWWNMVP1FlPaSqvLi4+Dkk+Oop8VVfSER/B 9rg/nv2Z397eAdU1zj3P+LSHD/v82GO/o6Au8JKCcJrd9d6wfvOuqbzv1OOgSe1SXgR1 gmCB0hrvznpVtoW2cU76MlqjRAZwR4cHAYtE3fY5vz5nS/jvdhRCfxp1hwtMMhzgR5xy BbdNhnw4kV2PeebR5y8+dQjDDWabwpdKo1FyTvFjvw+lAUOf1njSm/l2j59L7bv5VRQZ 3KHw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.45 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com. [115.124.30.45]) by mx.google.com with ESMTPS id g5si8166563pgc.122.2019.03.22.21.45.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 21:45:06 -0700 (PDT) Received-SPF: pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.45 as permitted sender) client-ip=115.124.30.45; Authentication-Results: mx.google.com; spf=pass (google.com: domain of yang.shi@linux.alibaba.com designates 115.124.30.45 as permitted sender) smtp.mailfrom=yang.shi@linux.alibaba.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R231e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e07488;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0TNPuxAM_1553316293; Received: from e19h19392.et15sqa.tbsite.net(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0TNPuxAM_1553316293) by smtp.aliyun-inc.com(127.0.0.1); Sat, 23 Mar 2019 12:45:04 +0800 From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com Cc: 
yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 10/10] doc: elaborate the PMEM allocation rule Date: Sat, 23 Mar 2019 12:44:35 +0800 Message-Id: <1553316275-21985-11-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> References: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Non-DRAM nodes are excluded from the default allocation node mask; elaborate the rules. Signed-off-by: Yang Shi --- Documentation/vm/numa.rst | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/vm/numa.rst b/Documentation/vm/numa.rst index 185d8a5..8c2fd5c 100644 --- a/Documentation/vm/numa.rst +++ b/Documentation/vm/numa.rst @@ -133,7 +133,7 @@ a subsystem allocates per CPU memory resources, for example. A typical model for making such an allocation is to obtain the node id of the node to which the "current CPU" is attached using one of the kernel's -numa_node_id() or CPU_to_node() functions and then request memory from only +numa_node_id() or cpu_to_node() functions and then request memory from only the node id returned. When such an allocation fails, the requesting subsystem may revert to its own fallback path. The slab kernel memory allocator is an example of this. Or, the subsystem may choose to disable or not to enable @@ -148,3 +148,8 @@ architectures transparently, kernel subsystems can use the numa_mem_id() or cpu_to_mem() function to locate the "local memory node" for the calling or specified CPU. Again, this is the same node from which default, local page allocations will be attempted. + +If the architecture supports non-regular DRAM nodes, e.g.
NVDIMM on x86, the +non-DRAM nodes are hidden from default mode. In other words, the default allocation +would not end up on non-DRAM nodes unless those nodes are specified +explicitly by mempolicy. [see Documentation/admin-guide/mm/numa_memory_policy.rst.]
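
To observe the counters this series adds, the following is a minimal userspace sketch. It is not part of the patch series. It extracts a named counter from /proc/vmstat-style text, one "name value" pair per line. The helper name vmstat_lookup is hypothetical, and the counters pgdemote_kswapd, pgdemote_direct and numa_pages_promoted are assumed to be present only on a kernel carrying patches 07/10 and 08/10; on any other kernel the lookup just reports the counter as missing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch series): look up "name" in
 * /proc/vmstat-style text, where each line is "name value".
 * Returns 0 and stores the value in *out when the counter is found,
 * or -1 when it is absent (for example, on a kernel without the series).
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *out)
{
	size_t len;
	const char *line = text;

	assert(text && name && out);
	len = strlen(name);

	while (line && *line) {
		/* Require the full counter name followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*out = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would read /proc/vmstat into a NUL-terminated buffer and pass it as text. Sampling twice and subtracting the two values gives the number of pages demoted or promoted over the interval.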