From patchwork Wed Dec 26 13:14:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743067 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8290891E for ; Wed, 26 Dec 2018 13:37:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6EA3B28495 for ; Wed, 26 Dec 2018 13:37:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6228C28938; Wed, 26 Dec 2018 13:37:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 485A128495 for ; Wed, 26 Dec 2018 13:37:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D21BE8E0004; Wed, 26 Dec 2018 08:37:06 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id CA7B98E0001; Wed, 26 Dec 2018 08:37:06 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B60DB8E0004; Wed, 26 Dec 2018 08:37:06 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id 6B5CC8E0001 for ; Wed, 26 Dec 2018 08:37:06 -0500 (EST) Received: by mail-pf1-f199.google.com with SMTP id u20so17813346pfa.1 for ; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=R9KC6ubzg8F3Mp0v0KkgZ134RXbUulmpW2HiMrvB4P4=; b=DBwEWiW/quyFhddexUKVpvjN32gbakBMwgDoI780AH6XOMS2rmoRrYfEWCM4pysjhP SiibwqlLzwBEqouJoW15g60flNc9nTB2qkSAgt7TW7owWm/Urbyx5903BUNeoFSJs5H7 hCpT5rhT0DJ1CqC8v4SgiQjJlyuEWP1IvTQGknqgpYPXATK+pFXSsaChHb3UkDjQ+ucr vKSCMckKhbSkelWUJJBluHnYaff9u3W4Lafa+JxnwLgvc7D5N7kiH3cQ0n9BttvsvCLc Ca1fjuFVwu86M+67JDDvm7xB7cUqNPelzhIYd+R9edH7gwxXzY6ngSPQk4OSphYr7CHw pxWA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukfRK8prjtFWAIsH+J8mePs3Q//svvPSso03TsGYaCZXZE1S5Yxh fHSZPxMq3+ds2L7TofNloK7We81l4Dg+XIh4N1bqOvRyRImEkrAtEFozJKyD5L7fhPr8MZkWOAq 3oCMT4T1K7/3YyHDD8wbjv3r6IYlNo9esjJrffs7nvLXbVFPpCv0bAM5CTH16wsp7uA== X-Received: by 2002:a63:170c:: with SMTP id x12mr18382384pgl.364.1545831426106; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-Smtp-Source: ALg8bN48XhIm5jcIjUeEA/TMABCJ2XGs/RXrrQcDTJFhjoEOfKble9GtJY18l7oXuhH47v+b40SJ X-Received: by 2002:a63:170c:: with SMTP id x12mr18382348pgl.364.1545831425562; Wed, 26 Dec 2018 05:37:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831425; cv=none; d=google.com; s=arc-20160816; b=Bl17Gs1l+CCZDwKma+jqj6faPDwBl377pm2T82xuPMDniu8Rpg4jlkrgCDOaZXvkOd xOoFq1ZoserJhq7HP5oSu0x/fUkMPeFMBkTUtHK7Mh1JhQs5rpUkas8jQE0f1gN2HfmX SsygvACRRQzk+Y/vVxrsGbV0S+2koG2fkZFTUxWJfeTtuhIwkrS8aFoJPX1zLnpT0v+H h8C8iRtcc7CgbVaCu6XgLMMTtg1klE5DP29EpL/8N96rgbuvIhcJ95FIo/0V0qQ8kczW YJsoHpDi4N1GomIqMc+SnH23HIrXJurGglFiX5MlaTVVzo1hKjpRXYV+oSwTnXSJbMcF KcrQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=R9KC6ubzg8F3Mp0v0KkgZ134RXbUulmpW2HiMrvB4P4=; b=IMXXNA8BIHabHTdBdh0otQ+z+WBxmKhh9ALpNmDowMU4+lAOylxoI9UcyAUi210snt BazGnb22Iddv8uVITjxCbH+wYCmWFIsJ4yxOTS1HMBMzbju+kkl3D7T2X2JkKT/SOtBr JTpzTFzhKjHCngP+V5xrPkijv8nJh3DsQrPsEnYtqSA84SE236ZL5bl8At9ijwZljI8R y0QBJBaJ0ypQ7h3Rz32dXgwMuhcvDfxn6tTxQfZIN7qf/yqUkV5Wno6FiYlp8X8ruPm8 7j9KEWcmEyy5C7jYx79SX5MZK3xeIW7kztdDjD2TmVmEutsuMRpGKhcco0WdDh6c8q33 EbGg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga14.intel.com (mga14.intel.com. [192.55.52.115]) by mx.google.com with ESMTPS id e68si15371744pfb.101.2018.12.26.05.37.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:05 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) client-ip=192.55.52.115; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="121185455" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by FMSMGA003.fm.intel.com with ESMTP; 26 Dec 2018 05:37:01 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005Nv-69; Wed, 26 Dec 
2018 21:37:01 +0800
Message-Id: <20181226133351.106676005@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:47 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fan Du , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0001-e820-Force-PMEM-entry-as-RAM-type-to-enumerate-NUMA-.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

From: Fan Du

This is a hack to enumerate PMEM as NUMA nodes. It is necessary for
current BIOSes that do not yet fill in the ACPI HMAT table.

WARNING: back up your data first. This hack is mutually exclusive with
the libnvdimm subsystem and can destroy ndctl-managed namespaces.

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 arch/x86/kernel/e820.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/kernel/e820.c	2018-12-23 19:20:34.587078783 +0800
+++ linux/arch/x86/kernel/e820.c	2018-12-23 19:20:34.587078783 +0800
@@ -403,7 +403,8 @@ static int __init __append_e820_table(st
 		/* Ignore the entry on 64-bit overflow: */
 		if (start > end && likely(size))
 			return -1;
-
+		if (type == E820_TYPE_PMEM)
+			type = E820_TYPE_RAM;
 		e820__range_add(start, size, type);
 
 		entry++;

From patchwork Wed Dec 26 13:14:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743077
Message-Id: <20181226133351.164047705@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:48 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fan Du , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 02/21] acpi/numa: memorize NUMA node type from SRAT table
References: <20181226131446.330864849@intel.com>
Content-Disposition: inline; filename=0002-acpi-Memorize-numa-node-type-from-SRAT-table.patch
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Fan Du

Mark each NUMA node as DRAM or PMEM.

This can happen at boot (see the e820 PMEM type override patch), or on
the fly when a devdax device is bound to the kmem driver.

It depends on the BIOS supplying PMEM NUMA proximity information in the
SRAT table, which current production BIOSes do.

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 arch/x86/include/asm/numa.h | 2 ++
 arch/x86/mm/numa.c          | 2 ++
 drivers/acpi/numa.c         | 5 +++++
 3 files changed, 9 insertions(+)

--- linux.orig/arch/x86/include/asm/numa.h	2018-12-23 19:20:39.890947888 +0800
+++ linux/arch/x86/include/asm/numa.h	2018-12-23 19:20:39.890947888 +0800
@@ -30,6 +30,8 @@ extern int numa_off;
  */
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
+extern nodemask_t numa_nodes_pmem;
+extern nodemask_t numa_nodes_dram;
 
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 extern void __init numa_set_distance(int from, int to, int distance);
--- linux.orig/arch/x86/mm/numa.c	2018-12-23 19:20:39.890947888 +0800
+++ linux/arch/x86/mm/numa.c	2018-12-23 19:20:39.890947888 +0800
@@ -20,6 +20,8 @@
 
 int numa_off;
 nodemask_t numa_nodes_parsed __initdata;
+nodemask_t numa_nodes_pmem;
+nodemask_t numa_nodes_dram;
 
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
--- linux.orig/drivers/acpi/numa.c	2018-12-23 19:20:39.890947888 +0800
+++ linux/drivers/acpi/numa.c	2018-12-23 19:20:39.890947888 +0800
@@ -297,6 +297,11 @@ acpi_numa_memory_affinity_init(struct ac
 
 	node_set(node, numa_nodes_parsed);
 
+	if (ma->flags & ACPI_SRAT_MEM_NON_VOLATILE)
+		node_set(node, numa_nodes_pmem);
+	else
+		node_set(node, numa_nodes_dram);
+
 	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s%s\n",
 		node, pxm,
 		(unsigned long long) start, (unsigned long long) end - 1,

From patchwork Wed Dec 26 13:14:49 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743071
Return-Path:
Message-Id: <20181226133351.229014333@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:49 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fan Du
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
cc: Fengguang Wu
Subject: [RFC][PATCH v2 03/21] x86/numa_emulation: fix fake NUMA in uniform case
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=fix-fake-numa.patch
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Fan Du

The index into numa_meminfo.blk[] is expected to be the same as the nid
of the block stored at that index, and numa_remove_memblk_from breaks
this expectation.

A 2S system does not break, because:

before numa_remove_memblk_from
index  nid
0      0
1      1

after numa_remove_memblk_from
index  nid
0      1
1      1

If you try to configure uniform fake nodes on a 4S system:

index  nid
0      0
1      1
2      2
3      3

node 3 will be removed by numa_remove_memblk_from when iterating over
index 2, so fake nodes are created for only 3 physical nodes and a
portion of memory is wasted, enough to trip the lost-pages check in
numa_meminfo_cover_memory.

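The index/nid drift described above can be reproduced outside the kernel.
The following is only a minimal user-space sketch, not the kernel code:
the demo_* types and helpers are hypothetical stand-ins, and the removal
helper assumes that numa_remove_memblk_from closes the gap by shifting the
tail of the array down by one slot. Under that assumption, removing entry 0
leaves a 2-node table looking correct at index 1, while a 4-node table ends
up with nid 3 stored at index 2.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_BLKS 8

/* Hypothetical, simplified stand-ins for numa_memblk / numa_meminfo. */
struct demo_memblk {
	unsigned long long start;
	unsigned long long end;
	int nid;
};

struct demo_meminfo {
	int nr_blks;
	struct demo_memblk blk[DEMO_MAX_BLKS];
};

/* Fill the table with one block per node, so index == nid initially. */
static void demo_fill(struct demo_meminfo *mi, int nr_nodes)
{
	assert(nr_nodes > 0 && nr_nodes <= DEMO_MAX_BLKS);
	memset(mi, 0, sizeof(*mi));
	mi->nr_blks = nr_nodes;
	for (int i = 0; i < nr_nodes; i++) {
		mi->blk[i].nid = i;
		mi->blk[i].start = (unsigned long long)i << 30;
		mi->blk[i].end = (unsigned long long)(i + 1) << 30;
	}
}

/*
 * Assumed removal behaviour: drop entry idx by shifting the tail down.
 * Every later entry moves to a lower index; the old last slot keeps a
 * stale copy of the last entry.
 */
static void demo_remove_memblk_from(int idx, struct demo_meminfo *mi)
{
	mi->nr_blks--;
	memmove(&mi->blk[idx], &mi->blk[idx + 1],
		(size_t)(mi->nr_blks - idx) * sizeof(mi->blk[0]));
}

/* Return the nid found at index idx after entry 0 has been removed. */
static int demo_nid_at_after_removing_first(int nr_nodes, int idx)
{
	struct demo_meminfo mi;

	demo_fill(&mi, nr_nodes);
	demo_remove_memblk_from(0, &mi);
	return mi.blk[idx].nid;
}
```

Calling demo_nid_at_after_removing_first(2, 1) models the 2S case and
demo_nid_at_after_removing_first(4, 2) models the 4S case; any code that
keeps using the physical nid as an array index after a removal reads the
wrong block in the second case.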
Signed-off-by: Fan Du
---
 arch/x86/mm/numa_emulation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/mm/numa_emulation.c	2018-12-23 19:20:51.570664269 +0800
+++ linux/arch/x86/mm/numa_emulation.c	2018-12-23 19:20:51.566664364 +0800
@@ -381,7 +381,21 @@ void __init numa_emulation(struct numa_m
 		goto no_emu;
 
 	memset(&ei, 0, sizeof(ei));
-	pi = *numa_meminfo;
+
+	{
+		/* Make sure the index is identical with nid */
+		struct numa_meminfo *mi = numa_meminfo;
+		int nid;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			nid = mi->blk[i].nid;
+			pi.blk[nid].nid = nid;
+			pi.blk[nid].start = mi->blk[i].start;
+			pi.blk[nid].end = mi->blk[i].end;
+		}
+		pi.nr_blks = mi->nr_blks;
+
+	}
 
 	for (i = 0; i < MAX_NUMNODES; i++)
 		emu_nid_to_phys[i] = NUMA_NO_NODE;

From patchwork Wed Dec 26 13:14:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743087
Message-Id: <20181226133351.287359389@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:50 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fan Du
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
cc: Fengguang Wu
Subject: [RFC][PATCH v2 04/21] x86/numa_emulation: pass numa node type to fake nodes
References: <20181226131446.330864849@intel.com>
Content-Disposition: inline; filename=0021-x86-numa-Fix-fake-numa-in-uniform-case.patch
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Fan Du

Signed-off-by: Fan Du
---
 arch/x86/mm/numa_emulation.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- linux.orig/arch/x86/mm/numa_emulation.c	2018-12-23 19:21:11.002206144 +0800
+++ linux/arch/x86/mm/numa_emulation.c	2018-12-23 19:21:10.998206236 +0800
@@ -12,6 +12,8 @@
 
 static int emu_nid_to_phys[MAX_NUMNODES];
 static char *emu_cmdline __initdata;
+static nodemask_t emu_numa_nodes_pmem;
+static nodemask_t emu_numa_nodes_dram;
 
 void __init numa_emu_cmdline(char *str)
 {
@@ -311,6 +313,12 @@ static int __init split_nodes_size_inter
 					       min(end, limit) - start);
 			if (ret < 0)
 				return ret;
+
+			/* Update numa node type for fake numa node */
+			if (node_isset(i, emu_numa_nodes_pmem))
+				node_set(nid - 1, numa_nodes_pmem);
+			else
+				node_set(nid - 1, numa_nodes_dram);
 		}
 	}
 	return nid;
@@ -410,6 +418,12 @@ void __init numa_emulation(struct numa_m
 		unsigned long n;
 		int nid = 0;
 
+		emu_numa_nodes_pmem = numa_nodes_pmem;
+		emu_numa_nodes_dram = numa_nodes_dram;
+
+		nodes_clear(numa_nodes_pmem);
+		nodes_clear(numa_nodes_dram);
+
 		n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
 		ret = -1;
 		for_each_node_mask(i, physnode_mask) {

From patchwork Wed Dec 26 13:14:51 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743075
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8A03E91E for ; Wed, 26 Dec 2018 13:37:16 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75EE528495 for ; Wed, 26 Dec 2018 13:37:16 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 69CD828938; Wed, 26 Dec 2018 13:37:16 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0
tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F124628495 for ; Wed, 26 Dec 2018 13:37:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 831CE8E0007; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5AE7F8E0009; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2BB698E0001; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) by kanga.kvack.org (Postfix) with ESMTP id DA73C8E0005 for ; Wed, 26 Dec 2018 08:37:06 -0500 (EST) Received: by mail-pl1-f198.google.com with SMTP id bj3so13962427plb.17 for ; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=c+luLYTm7+SGMH9nIfj/WGMlXDxTm0Th3M4ATYUWmYU=; b=JxkGvE51lkX7uiNGatOx1Pax1t5cnAlRoTJ9ePwInaOQ10i2dNpqOc9Vs51neB4r3I spDiMs9+c6f7XMfM+DPjuAag/paXwI+PyZujRMW5qZeLVBHfrY2/r+7+3XMT7oIOMy/C Uyzrdi4YKeLmXPGTwZO+BORR40hwFtMQzJzh09PhASJwxemjs1o24aj/a2FXntGNe1wr ng6CWp9P7pp17Vg3eCJ29WI/C7MO3lJ10AXuBEhPt77TRZeRFugGIpHGeGAbJt7QPnhA HGUM1wwQEa0SnGYuhyPNNubND+pRooTe9Fi4D9UXIIn40VwNfdvGHSn9HMiGfYFSAFwh DyTw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com 
X-Gm-Message-State: AJcUukc+wSGW/4iQrsBk7vAVSu8G/78m1+2lZC2a/GQtE+OyUkaPiJpr f3Hi0j+wz/hZDfPj1SchkiQNOSBwsUUXhnZLc5ethgc3oOd3qmEGWEEBGx/nY2oGyW56ydtTOVJ MG+wv3E4OXP4IpdpED+bP/O4eMM8Rdyp7DCbD20fJ42wMLlQ4Kri6c+G3X5PIxqNYXw== X-Received: by 2002:a63:2507:: with SMTP id l7mr18218225pgl.22.1545831426590; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-Smtp-Source: ALg8bN7vX3YI1BqvnbR/LiYn2RBWA7tUmfku6awpxon3tbxaUZ8pUdg9P8y10BdR8EzSjh23tgeo X-Received: by 2002:a63:2507:: with SMTP id l7mr18218194pgl.22.1545831426102; Wed, 26 Dec 2018 05:37:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831426; cv=none; d=google.com; s=arc-20160816; b=VTPx/xdo7DY1FdPAEl2ojtcfDXC+biDQaw3tVtpy05xfiNyQ+jRRTHGKAV4mMPb3W0 SYDC2wAQBJemshkfdz1IB1KI72iniw260GX9P3UcgNFs/349xmlxna7mR5cACFan+1Ic ubmGUo9X8bIMtuOLlLH+UIyJcXDzos5uCvFH5HCpgRkCFYctNkNHCGtUAgURtIYRFjCi 6zxOs1Ypa6qbQKtIElN49zoAttQAv7VTDJi67PJ5I10TXNWuK5MM3wYKq3iTsNr+qIN8 4jIWEhbnYx+auMvy/tCdzyyDWl3+yqA556L2ce8DMsKPKVQ6TlKRPtNSPWK5nPHFvz5B BYAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=c+luLYTm7+SGMH9nIfj/WGMlXDxTm0Th3M4ATYUWmYU=; b=bkA8ZYobR6SPgaLj7YLhh7PISAyV0G7tWYeQfPK4ooTt39gK5p8rhV7YRle+9zY1y3 /n9Q00D0EuvXOV7K5a9fT4DhC0SfMEXuVeZh8lWjHGG3H12EizW9E14mHS2Sa6gUEqan CT21a+9c2Cm0Ii8l1QdkX1YwXJR9+0Md/BLgIKYa9N3+xTaI8asy27c1WFt3nBeObIaI 7yH5bFpf94H7Fo+mU/xV4ZCWU3mVSFDRgfXSqGeYd3iiN9v8aEZeoKUqnREDjGCeBOg3 dI4Ce8EvGTFMYbmNaZ4Kv+Hkle3qMI2TqZNj0lMxjDIpVWC4nkTUg7eCpAvz2okYiIhN vhMg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga01.intel.com (mga01.intel.com. 
[192.55.52.88]) by mx.google.com with ESMTPS id r12si1487152plo.59.2018.12.26.05.37.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358927" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:01 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005OB-9m; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.348801665@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:14:51 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Fan Du , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Yao Yuan cc: Peng Dong cc: Huang Ying CC: Liu Jingqi cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 05/21] mmzone: new pgdat flags for DRAM and PMEM References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0003-mmzone-Introduce-new-flag-to-tag-pgdat-type.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP
From: Fan Du

On a system with DRAM and PMEM, we need a new flag to tag whether a pgdat
is made of DRAM or persistent memory.

This patch serves as preparation for a follow-up patch.

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 include/linux/mmzone.h |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- linux.orig/include/linux/mmzone.h	2018-12-23 19:29:42.430602202 +0800
+++ linux/include/linux/mmzone.h	2018-12-23 19:29:42.430602202 +0800
@@ -522,6 +522,8 @@ enum pgdat_flags {
 					 * many pages under writeback
 					 */
 	PGDAT_RECLAIM_LOCKED,		/* prevents concurrent reclaim */
+	PGDAT_DRAM,			/* Volatile DRAM memory node */
+	PGDAT_PMEM,			/* Persistent memory node */
 };

 static inline unsigned long zone_end_pfn(const struct zone *zone)
@@ -919,6 +921,30 @@ extern struct pglist_data contig_page_da

 #endif /* !CONFIG_NEED_MULTIPLE_NODES */

+static inline int is_node_pmem(int nid)
+{
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	return test_bit(PGDAT_PMEM, &pgdat->flags);
+}
+
+static inline int is_node_dram(int nid)
+{
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	return test_bit(PGDAT_DRAM, &pgdat->flags);
+}
+
+static inline void set_node_type(int nid)
+{
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	if (node_isset(nid, numa_nodes_pmem))
+		set_bit(PGDAT_PMEM, &pgdat->flags);
+	else
+		set_bit(PGDAT_DRAM, &pgdat->flags);
+}
+
 extern struct pglist_data *first_online_pgdat(void);
 extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
 extern struct zone *next_zone(struct zone *zone);

From patchwork Wed Dec 26 13:14:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743091 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 009DE924 for ; Wed, 26 Dec 2018 13:37:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org 
(Postfix) with ESMTP id E0A1E28495 for ; Wed, 26 Dec 2018 13:37:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D527128938; Wed, 26 Dec 2018 13:37:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 79CE628495 for ; Wed, 26 Dec 2018 13:37:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 936AF8E0003; Wed, 26 Dec 2018 08:37:09 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1D5698E000C; Wed, 26 Dec 2018 08:37:09 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BB90D8E000C; Wed, 26 Dec 2018 08:37:08 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by kanga.kvack.org (Postfix) with ESMTP id ADE7B8E000E for ; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Received: by mail-pf1-f198.google.com with SMTP id p9so17807808pfj.3 for ; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=AU3CNRHe4j8llGakV2TMFwGlbQws6UPjl/4IO2w5qSU=; b=l79K4X+5PzdklSXzQAUlKKn8taeIrePHtwaiMpOcC57IVR4qbC52ZcK0UI3eFK3JMI lmxkpztbm7jnxGp1UAiYTBF9iKGyzHHGWMY27L9bN/+2PWlmS16t+D0GYPm9bMP2V0UP DQ5cebg16pqR5c+8ZLM62UKcuGXqgM3LxO28IA8Tp990z1wO+XXTSvQMK5jJbz3PwAqj 
cWsPb8uibM5oNJVkeYXtynCydOImHHTBaEHmAqAUmCYW9MdTzKKSUiMC0RRqJdoAoG9/ 3lBuMWEpCPo8Mn7cmAILQEPic6DRKP/jQg3nvHGKptpvRekDXxv+J5aLiGgiZ5/Sz+tV wU0w== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukfbh/0a3cnDcDE066xp03mer3ZyZzloSTt7y9QDJKNFHjWKKtbw bWru3WMJVtEVbA1rgUk7W5uYIyPhkSZSbNk8gEO5FxYe70XR0URs45OObsKKT3SmViFASKr0tdJ 3mVkHR6EMRuWutaeHPO65p1FIqJFJBGmLvFzA02bp18HgEPSP0gfL7XUW8bPXewwxig== X-Received: by 2002:a63:c42:: with SMTP id 2mr19079772pgm.372.1545831427410; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-Smtp-Source: ALg8bN42dKw6qvjrCcjVjH1+RLdTT70fyLQuyOfxZ1nIdqCuEyDyyUAcG8kFtb0ZbqkMFJUUskAD X-Received: by 2002:a63:c42:: with SMTP id 2mr19079748pgm.372.1545831426901; Wed, 26 Dec 2018 05:37:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831426; cv=none; d=google.com; s=arc-20160816; b=jM6eHZ6wmL7MUXPv//ebEhDmEmofxM1bvkOl2IZL2DSPEBeudsV2LSLflqv7Clu/mo wpV9ydpl55AXAGESXLfzf3HHnPUjPx2fLPorNXKqlZBknfYefF4+DmHzLUR7htAsIEwP AHAKYgG9J42RilXkXZfeZ8jz7LuGZ1+pLTwCM+RWgLCJxUv5n8xXFvZUrLro3PTggZqZ iMoPgIr2RWcrtLTq9T6yf1aZGpezmAU7uf0jfhDnLwc4YOZtZZ9SCWJE3BajheNXi9Zn FEVRkWy4tSxT0XBuuqAjy3+fqvYehH5Zsb4idbX7l1vZh9CnPfl3qzuDg6U/0ih/kPyU 0B9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=AU3CNRHe4j8llGakV2TMFwGlbQws6UPjl/4IO2w5qSU=; b=YTIJYwNp7daKlH/2i8BXIOyGb9kHVkSFEnvdmE8lzAtO54YqroCnbu84qNIvD41mVH rdUvKf2KApelYr8W//6zZrcIjPISziE3T92jumz2v4g5yf390Ok3jK2NgtQGPsHh0PqN U+fWMxrIZ66iT4PdmaYnsp92wboyyCXynovsFyGEubQgx0RLWtVwW72H+dLfiYpNomk0 b2hpWd7Tlow9MJW4YKwO+su8fb4ctca7kkaCfvcK+Pkfh+/XcywPTeICiCr3nSfkD00t aXB6ZHRmmSoTK4Hp08JK98u9qv3MHYLoqXtt6Wmno178wgv9lZ/hi2yk+utgLFm1Hza/ 3j0w== 
ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTPS id r12si1487152plo.59.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358935" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005OH-Ae; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.410639437@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:14:52 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Fan Du , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Yao Yuan cc: Peng Dong cc: Huang Ying CC: Liu Jingqi cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 06/21] x86,numa: update numa node type References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; 
filename=0004-x86-numa-Update-numa-node-type.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP
From: Fan Du

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 arch/x86/mm/numa.c |    1 +
 1 file changed, 1 insertion(+)

--- linux.orig/arch/x86/mm/numa.c	2018-12-23 19:38:17.363582512 +0800
+++ linux/arch/x86/mm/numa.c	2018-12-23 19:38:17.363582512 +0800
@@ -594,6 +594,7 @@ static int __init numa_register_memblks(
 			continue;

 		alloc_node_data(nid);
+		set_node_type(nid);
 	}

 	/* Dump memblock with node info and return. */

From patchwork Wed Dec 26 13:14:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743073 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C0A5924 for ; Wed, 26 Dec 2018 13:37:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8689D28495 for ; Wed, 26 Dec 2018 13:37:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7A90C28938; Wed, 26 Dec 2018 13:37:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6415D28495 for ; Wed, 26 Dec 2018 13:37:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 616008E0001; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 
46C868E0007; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 15D5B8E0002; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f199.google.com (mail-pg1-f199.google.com [209.85.215.199]) by kanga.kvack.org (Postfix) with ESMTP id B0ECC8E0003 for ; Wed, 26 Dec 2018 08:37:06 -0500 (EST) Received: by mail-pg1-f199.google.com with SMTP id s22so15223468pgv.8 for ; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=MbMZBCdGS9vCEzO7ug9k7Oryi7M/ui4d6Yz4YrcaGK4=; b=EvSwkfgu4jd5asNg8PfI7l6gmciEvKbsdsz7fcn52FSw9kcaZlpzl6q71M+PjjMXcK vqzymaSapE7FHHPNu9XTWaQM7Sf9IFCe3OAFudB0yHZ5Tlw4JJH9tNhqUxsZuC6HjoUl L4/uJIqgFJeevMXIJVttkSxDabduyn9NJOGaO3iGZ8SvMXp9+vqgp7UWBPxED/1q03zQ Mn/kkVokP6KtVwRImUl97uAL/gNRmJas0x9iM8ukREQ7sUWi5sK+4UBbNHfLUACBo/eS eQ67zAFbSN4jj9PTxzssiWS8RWN0nIBxEISt1MwSl0TCv72FcFP+PIU8V8xZZs6hztBX po5g== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukfy8ycToCIEf77fqXzflXE8kZPYNd2ZRCxU5KJXdOd6BCRdah8E GZkPR/GgRxChsPu8vvOin6cSq1YhM7lDZcxWCIdjdykH4at1zqNyds+QOMq2jA/FnYqqWQMrk0G fgXYf/wRGlvOKNEg/iHZsxk9eWBn2dQH4ECHedzE5fQAiIcSX0MWnz7sQ4vJSAyIJFw== X-Received: by 2002:a63:4926:: with SMTP id w38mr18228264pga.353.1545831426409; Wed, 26 Dec 2018 05:37:06 -0800 (PST) X-Google-Smtp-Source: ALg8bN4codTF3hEdwVYFfmniwG2c4uN7ge+istjc7etBiL/IfxyoGS2C9mfhPwdK9PgAu0gQRQRp X-Received: by 
2002:a63:4926:: with SMTP id w38mr18228227pga.353.1545831425841; Wed, 26 Dec 2018 05:37:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831425; cv=none; d=google.com; s=arc-20160816; b=K3tlKPpNhqmPkuiAM1PqyR7DqrByUESkqVRaBEvspM6xn+PptTZuV+S4nj2VfgML4/ 7YJqgNi1bdNrQvOw50pzByLVUy0nN8hLvHQLoyveKrYAQSodYEZ4Rp+QbDTFURJiylAa nvTCrXCMVOnfZHqSnHHwtoD9jt3Bick8/Ua9RPTeV/uUK6i9ESRYKs6gRsyMJ/udkWHk c0fJZmaO57Uz2/F2RM65M3TgWst2CAa7ypf/4s47VTV3EwsXM2iMJmcjnDcPujvQleb1 /SmcuX4tL8JvXcmYiopd8yiWyTIkK4wB8D+PgdWJxskMCS8Pwi6QJTgieu+VvTeBiIQD 2K+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=MbMZBCdGS9vCEzO7ug9k7Oryi7M/ui4d6Yz4YrcaGK4=; b=Fd5Qxn2GZ9c7dZvebrOqaiDYxNx8/bkvrerAj3Es7T/Ce/9Sq/VYk28xhubu4tuuGs XeHEfGHssvnqGW9KV4/G1YUjU7+xusovijpN4/x5PdGud0+QIZpRLnUBXKqS1IgnJt3b h0x5fBifwOS4jsBtSQKhKSSdA1bjt0SRW8hfAis8baIDX40GQzz7TfxWCrhnKBQgXuUe J8TO7d8v7RUfrdXOSeMn2CMxUPb4anoSQseiY8BR8MOJVyAwXn/9pkXoQRhrgJsdE+/t pZ+iEGXhERX52UCPsiy1RMu+064gVds21be+vg9J7SlVbBbwdNiFhy6SdplcvsvJ1rsE rHLA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga14.intel.com (mga14.intel.com. 
[192.55.52.115]) by mx.google.com with ESMTPS id e68si15371744pfb.101.2018.12.26.05.37.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:05 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) client-ip=192.55.52.115; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="121185457" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by FMSMGA003.fm.intel.com with ESMTP; 26 Dec 2018 05:37:01 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005ON-BS; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.463947436@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:14:53 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Fan Du , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Yao Yuan cc: Peng Dong cc: Huang Ying CC: Liu Jingqi cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 07/21] mm: export node type {pmem|dram} under /sys/bus/node References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0005-Export-node-type-pmem-ram-in-sys-bus-node.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: 
Fan Du

A user-space migration daemon can check /sys/bus/node/devices/nodeX/type
to learn the node type.

Software can query each node's memory type and distance to pick a suitable
target node for migration.

	grep -r . /sys/devices/system/node/*/type
	/sys/devices/system/node/node0/type:dram
	/sys/devices/system/node/node1/type:dram
	/sys/devices/system/node/node2/type:pmem
	/sys/devices/system/node/node3/type:pmem

Along with the next patch, which exports `peer_node`, a migration daemon can
easily find the memory type of the current node and the target node for
migration.

	grep -r . /sys/devices/system/node/*/peer_node
	/sys/devices/system/node/node0/peer_node:2
	/sys/devices/system/node/node1/peer_node:3
	/sys/devices/system/node/node2/peer_node:0
	/sys/devices/system/node/node3/peer_node:1

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 drivers/base/node.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- linux.orig/drivers/base/node.c	2018-12-23 19:39:04.763414931 +0800
+++ linux/drivers/base/node.c	2018-12-23 19:39:04.763414931 +0800
@@ -233,6 +233,15 @@ static ssize_t node_read_distance(struct
 }
 static DEVICE_ATTR(distance, S_IRUGO, node_read_distance, NULL);

+static ssize_t type_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+
+	return sprintf(buf, is_node_pmem(nid) ? "pmem\n" : "dram\n");
+}
+static DEVICE_ATTR(type, S_IRUGO, type_show, NULL);
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_cpumap.attr,
 	&dev_attr_cpulist.attr,
@@ -240,6 +249,7 @@ static struct attribute *node_dev_attrs[
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+	&dev_attr_type.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);

From patchwork Wed Dec 26 13:14:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743079 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 674F5924 for ; Wed, 26 Dec 2018 13:37:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 51CBA28495 for ; Wed, 26 Dec 2018 13:37:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 45DB728938; Wed, 26 Dec 2018 13:37:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 931B428495 for ; Wed, 26 Dec 2018 13:37:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 405CD8E000A; Wed, 26 Dec 2018 08:37:08 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 189248E0008; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B52CD8E000F; Wed, 26 Dec 2018 08:37:07 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org 
Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by kanga.kvack.org (Postfix) with ESMTP id 631098E0003 for ; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Received: by mail-pl1-f199.google.com with SMTP id 89so13992674ple.19 for ; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=YdkPR857z2QJHU+MTdQBAeYrLi5HPVsYANWKLt6w3ss=; b=G54udu2YyIp5vDKKwGj7Jdn2oeUSXeHf6ldcMBY6pdbGF4m4qH54r4PfhDH6VShhL4 RAIvJ8mGaGBefvWl8b8rJZ5PnQfZ+scVQc9M1LJisZKqEH1KLvRctMvCAVe5gnj3zHSW 1ksu0JIiD5ps5OV61K9L5XZf/rI8IK6UEo8dYNA4KdZnv38ocaY0QeRdAeC6cOgQZwxf 5vhRfzYVL6ME6YbHqT6/+yE5C7KSetSHTg9T4vJYy5IENEz0IQB2g6N+q33SPEuTBvCQ LOt6Ej/BFgB5cpqWDMd7xH685EBS2RSsRoT8zix+Upfw5RyYRH9ykyqB+JSfj9f1Xtbw pEHQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukc13LA1IyWvwhnYagEl01D5mAW4p/Nd44hUGD3d+N5O21JcqPVy XXdoJxGgo8OKuX/BblnWF/B35JWKn6P06hlBlBI2kx5gsFCuNyF+tl02+R2LGc8fLZHWoihz2IO le4BEf+XybPIs0s6UOHJx3+gxawo1OJf0OrWl2pjTod98H07V+vXCrHmwql5zFjM3CQ== X-Received: by 2002:a63:4101:: with SMTP id o1mr18777694pga.447.1545831427100; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-Smtp-Source: ALg8bN7WMQyJsGwJiGqWnZtHzwEaOER8RGzX96D4MxG23CjFqXoGuSE7aHEaUZ+Gy+VnYTzKo+G3 X-Received: by 2002:a63:4101:: with SMTP id o1mr18777656pga.447.1545831426513; Wed, 26 Dec 2018 05:37:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831426; cv=none; d=google.com; s=arc-20160816; b=j2jauWEOVwWfGoRPtNDwgIo3tgtIIWN+FCM8JH/Vm9X2YXHi0AnHbZvosJ3ihDKT0T uzE0iWellzxIE6giSHC58TUKXob3GEuB5t+gBBF/Dqc/CKb2jJvGM0ZLGTt1rSpHVZDW 
dO3CqSZ2VqWtE2s3UH57d5vyiQweuwuoZ6SFGEYz1MsSGattV7wiAYIfWkii2+VB8gcD VXiI3NyMWO2E04jwDHWM6z+AR88dBq5NcKcaAV7yiShhkc73jFxKqGgiP10jROF+veKf JOsdlzVw7vSJcQNXhiNrd1CzEos0VvFcyWbrAAJiLNgluK/mDXTBvhOzh+hNUWG/uIja VPTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=YdkPR857z2QJHU+MTdQBAeYrLi5HPVsYANWKLt6w3ss=; b=W4WuyoP/5PUGym5RG4MRfTAzG6BrhKAYpkp6MQMY6kBQiBZRKrz81tleEsks4cl1wc ugpKH5HWm5AN+mwyxNEzn/u0y5xgJk8rxfA/3OcZr5Rlp/DzD6KRP7mZqWO/M58ns+bx 3+vI+eoZHBczxCkkMkqin447zMzBK0T6Heb4xp4NtySEB2/gZY9xnSXcECDbVD+3GHZ5 SHLh85BYB+WVYh8gfStWlHuRJ3j0ad/VW2ZyV00QeFheVbUpOPABVmRJo91POcZtORYo ty9JdTjD9HyLBvT8QbJ5YVk30pUxb/qeXGlKha+6wdxiVHO13KHod5jFrvVI6AjtZX3r ggvA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga14.intel.com (mga14.intel.com. 
[192.55.52.115]) by mx.google.com with ESMTPS id c7si33395890pgg.339.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) client-ip=192.55.52.115; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="121185462" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by FMSMGA003.fm.intel.com with ESMTP; 26 Dec 2018 05:37:01 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005OT-CD; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.521151384@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:14:54 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Fan Du , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Yao Yuan cc: Peng Dong cc: Huang Ying CC: Liu Jingqi cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 08/21] mm: introduce and export pgdat peer_node References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0019-mm-Introduce-and-export-peer_node-for-pgdat.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Fan Du 
Each CPU socket can have one DRAM node and one PMEM node; we call them
"peer nodes". By default, migration between DRAM and PMEM happens between
peer nodes.

This is a temporary solution. With multiple memory layers, a node can have
both promotion and demotion targets instead of a single peer node. User space
may also be able to infer promotion/demotion targets from future HMAT info.

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 drivers/base/node.c    |   11 +++++++++++
 include/linux/mmzone.h |   12 ++++++++++++
 mm/page_alloc.c        |   29 +++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

--- linux.orig/drivers/base/node.c	2018-12-23 19:39:51.647261099 +0800
+++ linux/drivers/base/node.c	2018-12-23 19:39:51.643261112 +0800
@@ -242,6 +242,16 @@ static ssize_t type_show(struct device *
 }
 static DEVICE_ATTR(type, S_IRUGO, type_show, NULL);

+static ssize_t peer_node_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+
+	return sprintf(buf, "%d\n", pgdat->peer_node);
+}
+static DEVICE_ATTR(peer_node, S_IRUGO, peer_node_show, NULL);
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_cpumap.attr,
 	&dev_attr_cpulist.attr,
@@ -250,6 +260,7 @@ static struct attribute *node_dev_attrs[
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
 	&dev_attr_type.attr,
+	&dev_attr_peer_node.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);
--- linux.orig/include/linux/mmzone.h	2018-12-23 19:39:51.647261099 +0800
+++ linux/include/linux/mmzone.h	2018-12-23 19:39:51.643261112 +0800
@@ -713,6 +713,18 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+
+	/*
+	 * Points to the nearest node in terms of latency
+	 * E.g. peer of node 0 is node 2 per SLIT
+	 * node distances:
+	 * node   0   1   2   3
+	 *   0:  10  21  17  28
+	 *   1:  21  10  28  17
+	 *   2:  17  28  10  28
+	 *   3:  28  17  28  10
+	 */
+	int	peer_node;
 } pg_data_t;

 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
--- linux.orig/mm/page_alloc.c	2018-12-23 19:39:51.647261099 +0800
+++ linux/mm/page_alloc.c	2018-12-23 19:39:51.643261112 +0800
@@ -6926,6 +6926,34 @@ static void check_for_memory(pg_data_t *
 	}
 }

+/*
+ * Return the nearest peer node in terms of *locality*
+ * E.g. peer of node 0 is node 2 per SLIT
+ * node distances:
+ * node   0   1   2   3
+ *   0:  10  21  17  28
+ *   1:  21  10  28  17
+ *   2:  17  28  10  28
+ *   3:  28  17  28  10
+ */
+static int find_best_peer_node(int nid)
+{
+	int n, val;
+	int min_val = INT_MAX;
+	int peer = NUMA_NO_NODE;
+
+	for_each_online_node(n) {
+		if (n == nid)
+			continue;
+		val = node_distance(nid, n);
+		if (val < min_val) {
+			min_val = val;
+			peer = n;
+		}
+	}
+	return peer;
+}
+
 /**
  * free_area_init_nodes - Initialise all pg_data_t and zone data
  * @max_zone_pfn: an array of max PFNs for each zone
@@ -7012,6 +7040,7 @@ void __init free_area_init_nodes(unsigne
 		if (pgdat->node_present_pages)
 			node_set_state(nid, N_MEMORY);
 		check_for_memory(pgdat, nid);
+		pgdat->peer_node = find_best_peer_node(nid);
 	}
 }

From patchwork Wed Dec 26 13:14:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743089 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A37391E for ; Wed, 26 Dec 2018 13:37:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1606028495 for ; Wed, 26 Dec 2018 13:37:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 098D828938; Wed, 26 Dec 2018 13:37:37 +0000 (UTC) 
mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTPS id p11si31508288plk.191.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST)
Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358933"
Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800
Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005OY-D7; Wed, 26 Dec 2018 21:37:01 +0800
Message-Id: <20181226133351.579378360@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:55 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 09/21] mm: avoid duplicate peer target node
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0020-page_alloc-avoid-duplicate-peer-target-node.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

To ensure 1:1 peer node mapping on broken BIOS

node distances:
node   0   1   2   3
  0:  10  21  20  20
  1:  21  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

or with numa=fake=4U

node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  10  10  10  21  21  21  21  17  17  17  17  28  28  28  28
  1:  10  10  10  10  21  21  21  21  17  17  17  17  28  28  28  28
  2:  10  10  10  10  21  21  21  21  17  17  17  17  28  28  28  28
  3:  10  10  10  10  21  21  21  21  17  17  17  17  28  28  28  28
  4:  21  21  21  21  10  10  10  10  28  28  28  28  17  17  17  17
  5:  21  21  21  21  10  10  10  10  28  28  28  28  17  17  17  17
  6:  21  21  21  21  10  10  10  10  28  28  28  28  17  17  17  17
  7:  21  21  21  21  10  10  10  10  28  28  28  28  17  17  17  17
  8:  17  17  17  17  28  28  28  28  10  10  10  10  28  28  28  28
  9:  17  17  17  17  28  28  28  28  10  10  10  10  28  28  28  28
 10:  17  17  17  17  28  28  28  28  10  10  10  10  28  28  28  28
 11:  17  17  17  17  28  28  28  28  10  10  10  10  28  28  28  28
 12:  28  28  28  28  17  17  17  17  28  28  28  28  10  10  10  10
 13:  28  28  28  28  17  17  17  17  28  28  28  28  10  10  10  10
 14:  28  28  28  28  17  17  17  17  28  28  28  28  10  10  10  10
 15:  28  28  28  28  17  17  17  17  28  28  28  28  10  10  10  10

Signed-off-by: Fengguang Wu
---
 mm/page_alloc.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- linux.orig/mm/page_alloc.c	2018-12-23 19:48:27.366110325 +0800
+++ linux/mm/page_alloc.c	2018-12-23 19:48:27.362110332 +0800
@@ -6941,16 +6941,22 @@ static int find_best_peer_node(int nid)
 	int n, val;
 	int min_val = INT_MAX;
 	int peer = NUMA_NO_NODE;
+	static nodemask_t target_nodes = NODE_MASK_NONE;
 
 	for_each_online_node(n) {
 		if (n == nid)
 			continue;
 		val = node_distance(nid, n);
+		if (val == LOCAL_DISTANCE)
+			continue;
+		if (node_isset(n, target_nodes))
+			continue;
 		if (val < min_val) {
 			min_val = val;
 			peer = n;
 		}
 	}
+	node_set(peer, target_nodes);
 	return peer;
 }

From patchwork Wed Dec 26 13:14:56 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743103
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
nudu3HlYyJiNi5Ofw5yclIpUOpwa8lZCH3tqhbQpjWIAm+J+Fbp0L+x2jJDd1UY8CG+b p62P6wUKg2DY41ktfKMMyBOAkLUU10navu8GDICRHR8RSibrVjUL2Ujs94t9WRgSf4jX Nguwq94aqVCisniOnmOByU7cUtUuIGG45Z1I0lj3SDq2TxYsF7YTFg8ETvuJeFW6c+kK pwrg==
ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTPS id r12si1487152plo.59.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:07 -0800 (PST)
Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358937"
Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800
Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005Oe-Dr; Wed, 26 Dec 2018 21:37:01 +0800
Message-Id: <20181226133351.644607371@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:56 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fan Du , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 10/21] mm: build separate zonelist for PMEM and DRAM node
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0016-page-alloc-Build-separate-zonelist-for-PMEM-and-RAM-.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

From: Fan Du

When allocating pages, DRAM and PMEM nodes should not fall back to each
other. This allows migration code to explicitly control which type of node
to allocate pages from.

With this patch, a PMEM NUMA node can only be used in 2 ways:
- migrate in and out
- numactl

That guarantees a PMEM NUMA node will only hold anon pages. We don't detect
hotness for other types of pages for now, so we need to prevent a PMEM page
from going hot while we are unable to detect it and move it to DRAM.

Another implication is that new page allocations will by default go to DRAM
nodes, which is normally a good choice: since DRAM writes are cheaper than
PMEM writes, it's often beneficial to watch new pages in DRAM for some time
and only move the likely cold pages to PMEM.

However, there can be exceptions. For example, if the PMEM:DRAM ratio is
very high, some page allocations may better go to PMEM nodes directly. In
the long term, we may create more kinds of fallback zonelists and make them
configurable by NUMA policy.
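
As an illustration only (not part of the patch), the fallback rule described
above can be sketched in user-space C. The names build_fallback_order,
enum node_type and MAX_NODES are hypothetical, and the sketch leaves out the
round-robin node_load penalty that the kernel's build_zonelists() applies to
DRAM nodes. It assumes nodes are tagged as DRAM or PMEM by the caller.

```c
#include <stddef.h>

#define MAX_NODES 64	/* hypothetical upper bound for this sketch */

/* Hypothetical node classification; not the kernel's data structures. */
enum node_type { NODE_DRAM, NODE_PMEM };

/*
 * Fill order[] with the allocation fallback order for node `local` and
 * return the number of entries written. `dist` is an nr x nr row-major
 * distance matrix. Only nodes of the same type as `local` are listed,
 * so DRAM never falls back to PMEM and vice versa.
 */
size_t build_fallback_order(int local, const enum node_type *type,
			    const int *dist, int nr, int *order)
{
	unsigned char used[MAX_NODES] = { 0 };
	size_t count = 0;
	int n;

	if (nr <= 0 || nr > MAX_NODES || local < 0 || local >= nr)
		return 0;

	if (type[local] == NODE_PMEM) {
		/* PMEM: local node first, then the other PMEM nodes by id. */
		order[count++] = local;
		for (n = 0; n < nr; n++)
			if (n != local && type[n] == NODE_PMEM)
				order[count++] = n;
		return count;
	}

	/* DRAM: repeatedly pick the nearest unused DRAM node, skipping PMEM. */
	for (;;) {
		int best = -1;

		for (n = 0; n < nr; n++) {
			if (used[n] || type[n] != NODE_DRAM)
				continue;
			if (best < 0 ||
			    dist[local * nr + n] < dist[local * nr + best])
				best = n;
		}
		if (best < 0)
			break;
		used[best] = 1;
		order[count++] = best;
	}
	return count;
}
```

With the 4-node SLIT example from the peer-node patch (nodes 0 and 1 taken
to be DRAM, nodes 2 and 3 PMEM), node 0 would get the order {0, 1} and
node 3 would get {3, 2}.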

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 mm/mempolicy.c  |   14 ++++++++++++++
 mm/page_alloc.c |   42 +++++++++++++++++++++++++++++-------------
 2 files changed, 43 insertions(+), 13 deletions(-)

--- linux.orig/mm/mempolicy.c	2018-12-26 20:03:49.821417489 +0800
+++ linux/mm/mempolicy.c	2018-12-26 20:29:24.597884301 +0800
@@ -1745,6 +1745,20 @@ static int policy_node(gfp_t gfp, struct
 		WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
 	}
 
+	if (policy->mode == MPOL_BIND) {
+		nodemask_t nodes = policy->v.nodes;
+
+		/*
+		 * The rule is if we run on DRAM node and mbind to PMEM node,
+		 * preferred node id is the peer node, vice versa.
+		 * if we run on DRAM node and mbind to DRAM node, #PF node is
+		 * the preferred node, vice versa, so just fall back.
+		 */
+		if ((is_node_dram(nd) && nodes_subset(nodes, numa_nodes_pmem)) ||
+		    (is_node_pmem(nd) && nodes_subset(nodes, numa_nodes_dram)))
+			nd = NODE_DATA(nd)->peer_node;
+	}
+
 	return nd;
 }
 
--- linux.orig/mm/page_alloc.c	2018-12-26 20:03:49.821417489 +0800
+++ linux/mm/page_alloc.c	2018-12-26 20:03:49.817417321 +0800
@@ -5153,6 +5153,10 @@ static int find_next_best_node(int node,
 		if (node_isset(n, *used_node_mask))
 			continue;
 
+		/* DRAM node doesn't fallback to pmem node */
+		if (is_node_pmem(n))
+			continue;
+
 		/* Use the distance array to find the distance */
 		val = node_distance(node, n);
 
@@ -5242,19 +5246,31 @@ static void build_zonelists(pg_data_t *p
 	nodes_clear(used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
-		/*
-		 * We don't want to pressure a particular node.
-		 * So adding penalty to the first node in same
-		 * distance group to make it round-robin.
-		 */
-		if (node_distance(local_node, node) !=
-		    node_distance(local_node, prev_node))
-			node_load[node] = load;
-
-		node_order[nr_nodes++] = node;
-		prev_node = node;
-		load--;
+	/* Pmem node doesn't fallback to DRAM node */
+	if (is_node_pmem(local_node)) {
+		int n;
+
+		/* Pmem nodes should fallback to each other */
+		node_order[nr_nodes++] = local_node;
+		for_each_node_state(n, N_MEMORY) {
+			if ((n != local_node) && is_node_pmem(n))
+				node_order[nr_nodes++] = n;
+		}
+	} else {
+		while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+			/*
+			 * We don't want to pressure a particular node.
+			 * So adding penalty to the first node in same
+			 * distance group to make it round-robin.
+			 */
+			if (node_distance(local_node, node) !=
+			    node_distance(local_node, prev_node))
+				node_load[node] = load;
+
+			node_order[nr_nodes++] = node;
+			prev_node = node;
+			load--;
+		}
 	}
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);

From patchwork Wed Dec 26 13:14:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743083
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2169C924 for ; Wed, 26 Dec 2018 13:37:28 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0D6F228495 for ; Wed, 26 Dec 2018 13:37:28 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 01C9F28938; Wed, 26 Dec 2018 13:37:27 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with
[192.55.52.88]) by mx.google.com with ESMTPS id r12si1487152plo.59.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST)
Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358929"
Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800
Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005Oj-Em; Wed, 26 Dec 2018 21:37:01 +0800
Message-Id: <20181226133351.703380444@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:57 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Yao Yuan , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0001-kvm-allocate-page-table-pages-from-DRAM.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

From: Yao Yuan

Signed-off-by: Yao Yuan
Signed-off-by: Fengguang Wu
---
 arch/x86/kvm/mmu.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.846720344 +0800
+++ linux/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.842719614 +0800
@@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
 		kmem_cache_free(cache, mc->objects[--mc->nobjs]);
 }
 
+static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
+	if (!page)
+		return 0;
+	return (unsigned long) page_address(page);
+}
+
 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
 				       int min)
 {
@@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
 	if (cache->nobjs >= min)
 		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+		page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
 		if (!page)
 			return cache->nobjs >= min ? 0 : -ENOMEM;
 		cache->objects[cache->nobjs++] = page;

From patchwork Wed Dec 26 13:14:58 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743081
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A75F91E for ; Wed, 26 Dec 2018 13:37:25 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 36B5228495 for ; Wed, 26 Dec 2018 13:37:25 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2B0B628938; Wed, 26 Dec 2018 13:37:25 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9EDA428495 for ; Wed, 26 Dec 2018 13:37:24 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 86AE98E0013; Wed, 26 Dec 2018 08:37:08 -0500 (EST)
Delivered-To: linux-mm-outgoing@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 40) id 75A258E000C; Wed, 26 Dec 2018 08:37:08 -0500 (EST)
X-Original-To: int-list-linux-mm@kvack.org
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id EF4958E0002; Wed, 26 Dec 2018 08:37:07 -0500 (EST)
X-Original-To: linux-mm@kvack.org
X-Delivered-To: linux-mm@kvack.org
Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 4E2F08E0002 for ; Wed, 26 Dec 2018 08:37:07 -0500 (EST)
Received: by mail-pg1-f198.google.com with SMTP id m16so15290285pgd.0 for ; Wed, 26 Dec 2018 05:37:07 -0800 (PST)
1gc9Mr-0005Oo-FY; Wed, 26 Dec 2018 21:37:01 +0800
Message-Id: <20181226133351.770245668@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:58 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List , Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 12/21] x86/pgtable: allocate page table pages from DRAM
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0018-pgtable-force-pgtable-allocation-from-DRAM-node-0.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

On random reads/writes over large data sets, we find that nearly half of the
memory accesses are caused by TLB misses and hence hit the page table pages.
So it is better to keep page table pages on the faster DRAM nodes.
Signed-off-by: Fengguang Wu --- arch/x86/include/asm/pgalloc.h | 10 +++++++--- arch/x86/mm/pgtable.c | 22 ++++++++++++++++++---- 2 files changed, 25 insertions(+), 7 deletions(-) --- linux.orig/arch/x86/mm/pgtable.c 2018-12-26 19:41:57.494900885 +0800 +++ linux/arch/x86/mm/pgtable.c 2018-12-26 19:42:35.531621035 +0800 @@ -22,17 +22,30 @@ EXPORT_SYMBOL(physical_mask); #endif gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP; +nodemask_t all_node_mask = NODE_MASK_ALL; + +unsigned long __get_free_pgtable_pages(gfp_t gfp_mask, + unsigned int order) +{ + struct page *page; + + page = __alloc_pages_nodemask(gfp_mask, order, numa_node_id(), &all_node_mask); + if (!page) + return 0; + return (unsigned long) page_address(page); +} +EXPORT_SYMBOL(__get_free_pgtable_pages); pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { - return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT); + return (pte_t *)__get_free_pgtable_pages(PGALLOC_GFP & ~__GFP_ACCOUNT, 0); } pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address) { struct page *pte; - pte = alloc_pages(__userpte_alloc_gfp, 0); + pte = __alloc_pages_nodemask(__userpte_alloc_gfp, 0, numa_node_id(), &all_node_mask); if (!pte) return NULL; if (!pgtable_page_ctor(pte)) { @@ -241,7 +254,7 @@ static int preallocate_pmds(struct mm_st gfp &= ~__GFP_ACCOUNT; for (i = 0; i < count; i++) { - pmd_t *pmd = (pmd_t *)__get_free_page(gfp); + pmd_t *pmd = (pmd_t *)__get_free_pgtable_pages(gfp, 0); if (!pmd) failed = true; if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) { @@ -422,7 +435,8 @@ static inline void _pgd_free(pgd_t *pgd) static inline pgd_t *_pgd_alloc(void) { - return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER); + return (pgd_t *)__get_free_pgtable_pages(PGALLOC_GFP, + PGD_ALLOCATION_ORDER); } static inline void _pgd_free(pgd_t *pgd) --- linux.orig/arch/x86/include/asm/pgalloc.h 2018-12-26 19:40:12.992251270 +0800 +++ linux/arch/x86/include/asm/pgalloc.h 
2018-12-26 19:42:35.531621035 +0800 @@ -96,10 +96,11 @@ static inline pmd_t *pmd_alloc_one(struc { struct page *page; gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO; + nodemask_t all_node_mask = NODE_MASK_ALL; if (mm == &init_mm) gfp &= ~__GFP_ACCOUNT; - page = alloc_pages(gfp, 0); + page = __alloc_pages_nodemask(gfp, 0, numa_node_id(), &all_node_mask); if (!page) return NULL; if (!pgtable_pmd_page_ctor(page)) { @@ -141,13 +142,16 @@ static inline void p4d_populate(struct m set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud))); } +extern unsigned long __get_free_pgtable_pages(gfp_t gfp_mask, + unsigned int order); + static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) { gfp_t gfp = GFP_KERNEL_ACCOUNT; if (mm == &init_mm) gfp &= ~__GFP_ACCOUNT; - return (pud_t *)get_zeroed_page(gfp); + return (pud_t *)__get_free_pgtable_pages(gfp | __GFP_ZERO, 0); } static inline void pud_free(struct mm_struct *mm, pud_t *pud) @@ -179,7 +183,7 @@ static inline p4d_t *p4d_alloc_one(struc if (mm == &init_mm) gfp &= ~__GFP_ACCOUNT; - return (p4d_t *)get_zeroed_page(gfp); + return (p4d_t *)__get_free_pgtable_pages(gfp | __GFP_ZERO, 0); } static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d) From patchwork Wed Dec 26 13:14:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743107 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E9D9691E for ; Wed, 26 Dec 2018 13:37:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D667E28495 for ; Wed, 26 Dec 2018 13:37:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA7152893D; Wed, 26 Dec 2018 13:37:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B49DB28495 for ; Wed, 26 Dec 2018 13:37:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7557A8E0011; Wed, 26 Dec 2018 08:37:10 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 890708E0014; Wed, 26 Dec 2018 08:37:09 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 276788E0015; Wed, 26 Dec 2018 08:37:09 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 1D08A8E0003 for ; Wed, 26 Dec 2018 08:37:08 -0500 (EST) Received: by mail-pg1-f198.google.com with SMTP id f9so15242490pgs.13 for ; Wed, 26 Dec 2018 05:37:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=nOmb6v+nXizyBHW4WE4KtqyuH+NWX1h+e022jm/zANE=; b=JTKCEzfzxYEbxwU+gDZwyVlSmcwgsHuxG/v9TXK9bC+rywh74Jf22CZPLU4XEksKZx VdaiG+QS+VHxcrnP5ensPC1cnjM7OJjqv/pI7kmJ1kjVKmE3Bgr/q+ugjae/qRxx0Gzh 8RNJtIEs3kqIBVUykNn3e0xFvigTNmxTJ7hJHjEpFhg4rXSWUe4sqXdimFEaaG6hN4Y6 hl5ves6CThAktk86sWNWRUNOOgUJaR2b5PMFLuCdEmb5nG9AA4BOqTSGrqNuI2ALjpWA 1H/K+G4A2QvG2MIvPxo9UlZrpxizsoSFxa/KsS3EfFZWc/y6rq21P9tQJIvcbHDKTLsW Hc4Q== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) 
smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukfvID5AiPkkJxOFp/w/R9cUU9fZznHs5nTDAb93yZb21S08JyJI eOmVRJax+6I92KuldJOqSJ/mganSJFU18hVR3uwYmQ1v8PVX02eVi1I9JqoyeThlLf+e8gefA+E abxShRr/FDfynbkKcU53YOYQ9PswuOrTuqXwhqufaxaphteMi5IQXU+zSG1yBwrL0ZA== X-Received: by 2002:a63:1c61:: with SMTP id c33mr18519270pgm.354.1545831427816; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-Smtp-Source: ALg8bN56kcypRc7Jo8+yRCU7wkc/n3cvMDNdtcUZIa1l8Wwvo8mWhkDQ6Ad6c1hx+mt0qZoM3Xce X-Received: by 2002:a63:1c61:: with SMTP id c33mr18519232pgm.354.1545831427338; Wed, 26 Dec 2018 05:37:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831427; cv=none; d=google.com; s=arc-20160816; b=tH/Y5qSzxcmFz9IAoB8WTFcDKo9fyIr51QMhGZqFhlFI1kAVd9+oHF6Yi9wV156Du9 djhPGCL+Lyfd6xhVb5tPF5mxJzlWiltmyKM6ebIVbGLkexpSjqYN3CXWeZcghpr5OViS EEbjSewyAOE5FhvrzCsPBOEVb9e0eTZYVSKIbx8ISvZwcsQdeAEWNCZLWXFAygwuwU3U caFLtLmoO3dAWD5oHmjtaU+vnoHOyjYkZ/SOsC/TaynUIVbMd5rgdDxGxLUIORFFeau1 GxjzRFVy20Aw4Igk+QczYkXp6U7Edgp1cvgQiS+U9YtuWvA6PnHv+BfqbBrNmM2Ypr9N uKTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=nOmb6v+nXizyBHW4WE4KtqyuH+NWX1h+e022jm/zANE=; b=DG+Hi5cw1CMRbbwJ6VVaS8/PdAZyBSbOWQd0omH1TtVTE7OhFjxEf4+9IrPoVcgzfc g6Do4hjxuz4HFOGBehRl5QgFteJXVV1qri+FT5Urqpp8g8jHdMeZfPMc88jpqZ3eXt8b cEDo6eBbfw7qu2mY0hV4WpedFGXxAa1beCd1YlfHE58/RvepDf4AQsWrD7WIYdlvuBX9 6UXZU2bTJdO7MqJIf8tgbenwEBacQTwEQLJi44r+9U/C28jSFfmsH2Zw2INXt1DpK/2H 94BTOfLjE7Jm1DX0wwcclbP+YghoJl/POGRCKc9j90/CjNBwBVy0PKkgHQ9OAMfOhVH4 XVmg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga01.intel.com (mga01.intel.com. 
[192.55.52.88]) by mx.google.com with ESMTPS id p11si31508288plk.191.2018.12.26.05.37.07 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:07 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) client-ip=192.55.52.88; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="113358941" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by orsmga003.jf.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005Ot-GK; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.828074959@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:14:59 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Jingqi Liu , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Fan Du cc: Yao Yuan cc: Peng Dong cc: Huang Ying cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 13/21] x86/pgtable: dont check PMD accessed bit References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0006-pgtable-don-t-check-the-page-accessed-bit.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jingqi Liu 
ept-idle will clear the PMD accessed bit to speed up the PTE scan -- if the bit remains unset in the next scan, all 512 PTEs can be skipped. So don't complain about !_PAGE_ACCESSED in pmd_bad(). Note that clearing the PMD accessed bit has its own cost; the optimization may only be worthwhile for - large idle areas - sparsely populated areas Signed-off-by: Jingqi Liu Signed-off-by: Fengguang Wu --- arch/x86/include/asm/pgtable.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- linux.orig/arch/x86/include/asm/pgtable.h 2018-12-23 19:50:50.917902600 +0800 +++ linux/arch/x86/include/asm/pgtable.h 2018-12-23 19:50:50.913902605 +0800 @@ -821,7 +821,8 @@ static inline pte_t *pte_offset_kernel(p static inline int pmd_bad(pmd_t pmd) { - return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE; + return (pmd_flags(pmd) & ~(_PAGE_USER | _PAGE_ACCESSED)) != + (_KERNPG_TABLE & ~_PAGE_ACCESSED); } static inline unsigned long pages_to_mb(unsigned long npg) From patchwork Wed Dec 26 13:15:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743101 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CA5291E for ; Wed, 26 Dec 2018 13:37:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8783128495 for ; Wed, 26 Dec 2018 13:37:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B17D28938; Wed, 26 Dec 2018 13:37:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org 
(Postfix) with ESMTP id 14D0728495 for ; Wed, 26 Dec 2018 13:37:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F3588E0002; Wed, 26 Dec 2018 08:37:10 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 4C2AD8E0016; Wed, 26 Dec 2018 08:37:09 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E845B8E0006; Wed, 26 Dec 2018 08:37:08 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id A72FD8E0006 for ; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Received: by mail-pg1-f197.google.com with SMTP id y8so15205734pgq.12 for ; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=YZLzxQiwGlvXOZFkL97HKAur/8XWKtAC5MIphih9KC4=; b=cCEbp1weIE4hIiygOqaIWg/IfbKzzTJ/uqlEcvG3if4SX2waHIJ7hQHYBeLWwiYcf4 M+KFDmwbtjX4nRtix5e5iTlJ2HeZHy1iK4k3a+k0KGsIKn9hLOR00pmUwbcdgLEt5FCL 3vHWGYDqUJBLzQotvKigPsi1Z/Z86VvglQxZIRK464eH2OCKK4mSfi4M8y0XOD6yKKZ5 3r+1zanrAa/w9hVe7LJozA9PU9JlfUZTqFdPKUG2O+aVURXkyjb1bOq4BPIhORLfqr0s E4fUYCRaFR98xLWyDBqrNbag1f6K+lMZbzvtw7oUQ7dOpiIN0gLZeINaJKZbvrgd9eev JEcw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AJcUukedibMPAX7ds6UQE0YVYZaXu5hOLnokD2m7H+kG+EnQsA1XL6lk 2VNgfmKFg1J5LwfSBlEsCPdeCYvRZ08cZkkytmMf2wUzL37H81Pf6Z/jB1saXgmoV9gMozjCsqi 
2bDGgWXsxX4FWKrDQKtVfwBinmsV7qFYdvKd4sFUc3/qYXS/dtu4irN8eMtCvpelRsg== X-Received: by 2002:a17:902:541:: with SMTP id 59mr20142456plf.88.1545831427377; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-Smtp-Source: ALg8bN6onIXpHro1JDeoRiXbTIHg3VJhDU6CIMxy+ftIooY8o8n3BX6vFGBfvFoBQ21dWVNUBS7z X-Received: by 2002:a17:902:541:: with SMTP id 59mr20142429plf.88.1545831426902; Wed, 26 Dec 2018 05:37:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831426; cv=none; d=google.com; s=arc-20160816; b=fjj+3MhZAw5gel/J7oiogSaoscHyaZGSqfJlSCO/oMFtluOjH35QwfpmH5plb5D0qa Uk9h9P4equlTSZpY9zcZ/MjdMrI0BKRVuy8f9tP8piOSmo80n45FfmBIKg7EikYqF8IT in6gUnNNoJMagijOfM6aKzuM8l/PtYrQZOzRgAKCzJtuhpFRbg8ssq2/99k+ZH1w6Nir xm2dwo1j2gz1rGd9c90qlHa1JJ2Vl13aPuQTdEKxMf+JTfrxfHsrMxNvOhcADkPoWylD NZRk3lQdNeDmoDpsPCvRElNS2nwHNgiJhrzgRYyc1OAor+WyHcOJHlTlucLsGslz+vSq IZag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=YZLzxQiwGlvXOZFkL97HKAur/8XWKtAC5MIphih9KC4=; b=uLDReHjzfhXelifCaJhX/LmK7j5dkpHGnVFffZIIaGfIqpwkrfifXrLK8+R7ZPo8UA VWWBTVFnZD7/In54oWximfAou4o7t4E+rVFr83dK6sKidZWZPAWzv4veeNM6HscezoeA 1zWetytZw8Xwq0GmUTi7g2WU/VcNSVWe4r0OZf4X2/6QbVG9Zz/CpSTzNHUOcZ13KLBp FouahY+ALU+1RYVONdJIzs6RIogJ/J+HRKBfsxNTxrMrL5Jrj+XNadOfI4OBdP66ZVDa fiBDFIcklJ7TKmrGL/0VBDulXy0tUf8i+m9O96puuS2cBoK/7H86MNxRKsWiUkMgzI9B 2t9g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga14.intel.com (mga14.intel.com. 
[192.55.52.115]) by mx.google.com with ESMTPS id e68si15371744pfb.101.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:06 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) client-ip=192.55.52.115; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="121185473" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by FMSMGA003.fm.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005Oy-IC; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.894160986@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:15:00 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Nikita Leshenko , Christian Borntraeger , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Fan Du cc: Yao Yuan cc: Peng Dong cc: Huang Ying CC: Liu Jingqi cc: Dong Eddie cc: Dave Hansen cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 14/21] kvm: register in mm_struct References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0009-kvm-register-in-mm_struct.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP VM 
is associated with an address space and not a specific thread. From Documentation/virtual/kvm/api.txt: Only run VM ioctls from the same process (address space) that was used to create the VM. CC: Nikita Leshenko CC: Christian Borntraeger Signed-off-by: Fengguang Wu --- include/linux/mm_types.h | 11 +++++++++++ virt/kvm/kvm_main.c | 3 +++ 2 files changed, 14 insertions(+) --- linux.orig/include/linux/mm_types.h 2018-12-23 19:58:06.993417137 +0800 +++ linux/include/linux/mm_types.h 2018-12-23 19:58:06.993417137 +0800 @@ -27,6 +27,7 @@ typedef int vm_fault_t; struct address_space; struct mem_cgroup; struct hmm; +struct kvm; /* * Each physical page in the system has a struct page associated with @@ -496,6 +497,10 @@ struct mm_struct { /* HMM needs to track a few things per mm */ struct hmm *hmm; #endif + +#if IS_ENABLED(CONFIG_KVM) + struct kvm *kvm; +#endif } __randomize_layout; /* @@ -507,6 +512,12 @@ struct mm_struct { extern struct mm_struct init_mm; +#if IS_ENABLED(CONFIG_KVM) +static inline struct kvm *mm_kvm(struct mm_struct *mm) { return mm->kvm; } +#else +static inline struct kvm *mm_kvm(struct mm_struct *mm) { return NULL; } +#endif + /* Pointer magic because the dynamic array size confuses some compilers. 
*/ static inline void mm_init_cpumask(struct mm_struct *mm) { --- linux.orig/virt/kvm/kvm_main.c 2018-12-23 19:58:06.993417137 +0800 +++ linux/virt/kvm/kvm_main.c 2018-12-23 19:58:06.993417137 +0800 @@ -727,6 +727,7 @@ static void kvm_destroy_vm(struct kvm *k struct mm_struct *mm = kvm->mm; kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm); + mm->kvm = NULL; kvm_destroy_vm_debugfs(kvm); kvm_arch_sync_events(kvm); spin_lock(&kvm_lock); @@ -3224,6 +3225,8 @@ static int kvm_dev_ioctl_create_vm(unsig fput(file); return -ENOMEM; } + + kvm->mm->kvm = kvm; kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm); fd_install(r, file); From patchwork Wed Dec 26 13:15:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743111 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55244924 for ; Wed, 26 Dec 2018 13:37:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3F01D28495 for ; Wed, 26 Dec 2018 13:37:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 327DE28938; Wed, 26 Dec 2018 13:37:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0224B28495 for ; Wed, 26 Dec 2018 13:37:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 96E538E0014; Wed, 26 Dec 2018 08:37:10 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D989B8E000B; Wed, 26 Dec 2018 08:37:09 -0500 
(EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 331328E0012; Wed, 26 Dec 2018 08:37:09 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) by kanga.kvack.org (Postfix) with ESMTP id F060C8E000F for ; Wed, 26 Dec 2018 08:37:07 -0500 (EST) Received: by mail-pl1-f198.google.com with SMTP id j8so14043785plb.1 for ; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:message-id :user-agent:date:from:to:cc:cc:cc:cc:cc:cc:cc:cc:cc:subject :references:mime-version:content-disposition; bh=6JH+5134M+Wd4TDdow5m8EKsrfFes0kHVggXCeZOfv0=; b=I3N/EAx180/+wPSbW+Zrehm9CyrS6NmzOZCC9uzzOOPjsLrRHIbMleMPORu1mmu6ev b5/vZX6puEk1SuOZyHF+LpoNVWNazKPxEgSCOtGPEp/wYxJRZzFv6lcomvlVbKvx7JMe yk3AP8ceSVUG8jPaCMJCPAW+A5yUcHNujPtD9htJ4XCRu/p+gVY4vRidB2BDGo6OVLo3 TgQAce6MgONdf/SKai+J6wLA64sEDqxQQh8evhnuwSx9Ju7Kfn+KPymltydt8QHU9FIm fipi3/PwsyYZPZMsXcpOIn8RgzPnzmZ5wCu0EVFdYlHLBFqlK1h5Ufy4DVuTkZ8hfEsD q8KA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AA+aEWZxio+8XHxqGv089KdmA4+jz4hitEXsJr2KnfgQKRxxzIy2awOM lclUQVjet1s35FLcbBKXQ5eVfsVDcu5hLp43/JbqTB52x4q5HtvwGl68DCyhtBSEDxJlbL3961U y9Zbw7WJWgQAI+0If7525rzqtseyaytSRySZKSUGFPPHGfepJlDWJQeCR0iZtXnC5WA== X-Received: by 2002:a62:8dd9:: with SMTP id p86mr19985990pfk.143.1545831427660; Wed, 26 Dec 2018 05:37:07 -0800 (PST) X-Google-Smtp-Source: AFSGD/WfAIrtd9BaOkQrMtJIFLpT4mJ5+LIWkSNtPQmGicS4oNPDbPrH3gchuEXzsqb981Uu2ZFy X-Received: by 2002:a62:8dd9:: with SMTP id p86mr19985958pfk.143.1545831427092; 
Wed, 26 Dec 2018 05:37:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545831427; cv=none; d=google.com; s=arc-20160816; b=EKxwVVGHfuEtwAjiunUz68FOWSr3pWUnMVatEpnsszkV3YQJhgY3E2pYjPS+4Z8qT0 QW5SNakTPA8syQt+UwcLq7XckVcyylRtUwECewodRa/n3W5dflY7jYYBuIdHNDSIxBta Lgd2xC3O6N4Yepr75TREhoUC5ElIPhgL8ToMFRKCkL1y9zzZG2FLgzS0b+bJwBIj5nHd GWmDEaXtI/pSXQHm4XYV2vQl7sHne8P5SHurS6//ZK8vCG60ymxK87KL83dbJb2kYxpX JsOkS4Q5OVWZ88tW+f0c6nsr2d7H1IvmUhg48bHS0ArbYE5lnt9xLH+auT+x5XPfm9ju GVFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-disposition:mime-version:references:subject:cc:cc:cc:cc:cc :cc:cc:cc:cc:to:from:date:user-agent:message-id; bh=6JH+5134M+Wd4TDdow5m8EKsrfFes0kHVggXCeZOfv0=; b=XE/nkR7Q71/VqXvZ26AZ4ONZZPPr8GljfJ2lGr8thYGj9iRpD0CcZrjvB8nAsD4879 AsXOVI2CpQqt/see+p75dlatMGkLVKGh42kHaPnhEn9FfedDGZGVPxknnU2Jbbsopi0h Wy5ut4eptNo6GJgdV8Ec/fn0fEhgAl7mpQipz8qMPLcrrCn3+mEEMMvGriqZqKwIraQQ Z/CYRKcYqd5A831VaiJUNTQpEDS3ShT9SxH5AHelKsz/mwFHX/YsfQfmgaPr+4G4Fqfs RR26oDKnAEGzdpmuVYc2uAAqsbB5fBqA/5w3ih/ABFVxIo2HiZM+Um6KIfLY09v0ML6h k+qQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga14.intel.com (mga14.intel.com. 
[192.55.52.115]) by mx.google.com with ESMTPS id c7si33395890pgg.339.2018.12.26.05.37.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Dec 2018 05:37:07 -0800 (PST) Received-SPF: pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) client-ip=192.55.52.115; Authentication-Results: mx.google.com; spf=pass (google.com: domain of fengguang.wu@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=fengguang.wu@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Dec 2018 05:37:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,400,1539673200"; d="scan'208";a="121185476" Received: from wangdan1-mobl1.ccr.corp.intel.com (HELO wfg-t570.sh.intel.com) ([10.254.210.154]) by FMSMGA003.fm.intel.com with ESMTP; 26 Dec 2018 05:37:02 -0800 Received: from wfg by wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005P3-JE; Wed, 26 Dec 2018 21:37:01 +0800 Message-Id: <20181226133351.956098465@intel.com> User-Agent: quilt/0.65 Date: Wed, 26 Dec 2018 21:15:01 +0800 From: Fengguang Wu To: Andrew Morton cc: Linux Memory Management List , Dave Hansen , Peng Dong , Liu Jingqi , Fengguang Wu cc: kvm@vger.kernel.org Cc: LKML cc: Fan Du cc: Yao Yuan cc: Huang Ying cc: Dong Eddie cc: Zhang Yi cc: Dan Williams Subject: [RFC][PATCH v2 15/21] ept-idle: EPT walk for virtual machine References: <20181226131446.330864849@intel.com> MIME-Version: 1.0 Content-Disposition: inline; filename=0014-kvm-ept-idle-EPT-page-table-walk-for-A-bits.patch X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For virtual machines, 
"accessed" bits will be set in guest page tables and EPT/NPT. So for the qemu-kvm process, convert HVA to GFN to GPA, then do EPT/NPT walks. This borrows the host page table walk macros/functions to do the EPT/NPT walk, so it depends on both using the same number of page table levels. As proposed by Dave Hansen, invalidate the TLB after finishing one round of scanning, in order to ensure that HW will set the accessed bit for super-hot pages. V2: convert idle_bitmap to idle_pages to be more efficient on - huge pages - sparse page tables - ranges of similar pages The new idle_pages file contains a series of records of different sizes, reporting ranges of different page sizes to user space. That interface has a major downside: it breaks the read() assumption that range_to_read == read_buffer_size. For now we work around this problem by deducing range_to_read from read_buffer_size, and letting read() return when either read_buffer_size is filled or range_to_read is fully scanned. To make the interface more precise, we may need to switch to ioctl() later. CC: Dave Hansen Signed-off-by: Peng Dong Signed-off-by: Liu Jingqi Signed-off-by: Fengguang Wu --- arch/x86/kvm/ept_idle.c | 637 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/ept_idle.h | 116 ++++++ 2 files changed, 753 insertions(+) create mode 100644 arch/x86/kvm/ept_idle.c create mode 100644 arch/x86/kvm/ept_idle.h --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux/arch/x86/kvm/ept_idle.c 2018-12-26 20:38:07.298994533 +0800 @@ -0,0 +1,637 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ept_idle.h" + +/* #define DEBUG 1 */ + +#ifdef DEBUG + +#define debug_printk trace_printk + +#define set_restart_gpa(val, note) ({ \ + unsigned long old_val = eic->restart_gpa; \ + eic->restart_gpa = (val); \ + trace_printk("restart_gpa=%lx %luK %s %s %d\n", \ + (val), (eic->restart_gpa - old_val) >> 10, \ + note, __func__, __LINE__); \ +}) + +#define set_next_hva(val, note) 
({ \ + unsigned long old_val = eic->next_hva; \ + eic->next_hva = (val); \ + trace_printk(" next_hva=%lx %luK %s %s %d\n", \ + (val), (eic->next_hva - old_val) >> 10, \ + note, __func__, __LINE__); \ +}) + +#else + +#define debug_printk(...) + +#define set_restart_gpa(val, note) ({ \ + eic->restart_gpa = (val); \ +}) + +#define set_next_hva(val, note) ({ \ + eic->next_hva = (val); \ +}) + +#endif + +static unsigned long pagetype_size[16] = { + [PTE_ACCESSED] = PAGE_SIZE, /* 4k page */ + [PMD_ACCESSED] = PMD_SIZE, /* 2M page */ + [PUD_PRESENT] = PUD_SIZE, /* 1G page */ + + [PTE_DIRTY] = PAGE_SIZE, + [PMD_DIRTY] = PMD_SIZE, + + [PTE_IDLE] = PAGE_SIZE, + [PMD_IDLE] = PMD_SIZE, + [PMD_IDLE_PTES] = PMD_SIZE, + + [PTE_HOLE] = PAGE_SIZE, + [PMD_HOLE] = PMD_SIZE, +}; + +static void u64_to_u8(uint64_t n, uint8_t *p) +{ + p += sizeof(uint64_t) - 1; + + *p-- = n; n >>= 8; + *p-- = n; n >>= 8; + *p-- = n; n >>= 8; + *p-- = n; n >>= 8; + + *p-- = n; n >>= 8; + *p-- = n; n >>= 8; + *p-- = n; n >>= 8; + *p = n; +} + +static void dump_eic(struct ept_idle_ctrl *eic) +{ + debug_printk("ept_idle_ctrl: pie_read=%d pie_read_max=%d buf_size=%d " + "bytes_copied=%d next_hva=%lx restart_gpa=%lx " + "gpa_to_hva=%lx\n", + eic->pie_read, + eic->pie_read_max, + eic->buf_size, + eic->bytes_copied, + eic->next_hva, + eic->restart_gpa, + eic->gpa_to_hva); +} + +static void eic_report_addr(struct ept_idle_ctrl *eic, unsigned long addr) +{ + unsigned long hva; + eic->kpie[eic->pie_read++] = PIP_CMD_SET_HVA; + hva = addr; + u64_to_u8(hva, &eic->kpie[eic->pie_read]); + eic->pie_read += sizeof(uint64_t); + debug_printk("eic_report_addr %lx\n", addr); + dump_eic(eic); +} + +static int eic_add_page(struct ept_idle_ctrl *eic, + unsigned long addr, + unsigned long next, + enum ProcIdlePageType page_type) +{ + int page_size = pagetype_size[page_type]; + + debug_printk("eic_add_page addr=%lx next=%lx " + "page_type=%d pagesize=%dK\n", + addr, next, (int)page_type, (int)page_size >> 10); + dump_eic(eic); + 
+	/* align kernel/user vision of cursor position */
+	next = round_up(next, page_size);
+
+	if (!eic->pie_read ||
+	    addr + eic->gpa_to_hva != eic->next_hva) {
+		/* merge hole */
+		if (page_type == PTE_HOLE ||
+		    page_type == PMD_HOLE) {
+			set_restart_gpa(next, "PTE_HOLE|PMD_HOLE");
+			return 0;
+		}
+
+		if (addr + eic->gpa_to_hva < eic->next_hva) {
+			debug_printk("ept_idle: addr moves backwards\n");
+			WARN_ONCE(1, "ept_idle: addr moves backwards");
+		}
+
+		if (eic->pie_read + sizeof(uint64_t) + 2 >= eic->pie_read_max) {
+			set_restart_gpa(addr, "EPT_IDLE_KBUF_FULL");
+			return EPT_IDLE_KBUF_FULL;
+		}
+
+		eic_report_addr(eic, round_down(addr, page_size) +
+					eic->gpa_to_hva);
+	} else {
+		if (PIP_TYPE(eic->kpie[eic->pie_read - 1]) == page_type &&
+		    PIP_SIZE(eic->kpie[eic->pie_read - 1]) < 0xF) {
+			set_next_hva(next + eic->gpa_to_hva, "IN-PLACE INC");
+			set_restart_gpa(next, "IN-PLACE INC");
+			eic->kpie[eic->pie_read - 1]++;
+			WARN_ONCE(page_size < next-addr, "next-addr too large");
+			return 0;
+		}
+		if (eic->pie_read >= eic->pie_read_max) {
+			set_restart_gpa(addr, "EPT_IDLE_KBUF_FULL");
+			return EPT_IDLE_KBUF_FULL;
+		}
+	}
+
+	set_next_hva(next + eic->gpa_to_hva, "NEW-ITEM");
+	set_restart_gpa(next, "NEW-ITEM");
+	eic->kpie[eic->pie_read] = PIP_COMPOSE(page_type, 1);
+	eic->pie_read++;
+
+	return 0;
+}
+
+static int ept_pte_range(struct ept_idle_ctrl *eic,
+			 pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+	pte_t *pte;
+	enum ProcIdlePageType page_type;
+	int err = 0;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!ept_pte_present(*pte))
+			page_type = PTE_HOLE;
+		else if (!test_and_clear_bit(_PAGE_BIT_EPT_ACCESSED,
+					     (unsigned long *) &pte->pte))
+			page_type = PTE_IDLE;
+		else {
+			page_type = PTE_ACCESSED;
+		}
+
+		err = eic_add_page(eic, addr, addr + PAGE_SIZE, page_type);
+		if (err)
+			break;
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	return err;
+}
+
+static int ept_pmd_range(struct ept_idle_ctrl *eic,
+			 pud_t *pud, unsigned long addr, unsigned long end)
+{
+	pmd_t *pmd;
+	unsigned long next;
+	enum ProcIdlePageType page_type;
+	enum ProcIdlePageType pte_page_type;
+	int err = 0;
+
+	if (eic->flags & SCAN_HUGE_PAGE)
+		pte_page_type = PMD_IDLE_PTES;
+	else
+		pte_page_type = IDLE_PAGE_TYPE_MAX;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+
+		if (!ept_pmd_present(*pmd))
+			page_type = PMD_HOLE; /* likely won't hit here */
+		else if (!test_and_clear_bit(_PAGE_BIT_EPT_ACCESSED,
+					     (unsigned long *)pmd)) {
+			if (pmd_large(*pmd))
+				page_type = PMD_IDLE;
+			else if (eic->flags & SCAN_SKIM_IDLE)
+				page_type = PMD_IDLE_PTES;
+			else
+				page_type = pte_page_type;
+		} else if (pmd_large(*pmd)) {
+			page_type = PMD_ACCESSED;
+		} else
+			page_type = pte_page_type;
+
+		if (page_type != IDLE_PAGE_TYPE_MAX)
+			err = eic_add_page(eic, addr, next, page_type);
+		else
+			err = ept_pte_range(eic, pmd, addr, next);
+		if (err)
+			break;
+	} while (pmd++, addr = next, addr != end);
+
+	return err;
+}
+
+static int ept_pud_range(struct ept_idle_ctrl *eic,
+			 p4d_t *p4d, unsigned long addr, unsigned long end)
+{
+	pud_t *pud;
+	unsigned long next;
+	int err = 0;
+
+	pud = pud_offset(p4d, addr);
+	do {
+		next = pud_addr_end(addr, end);
+
+		if (!ept_pud_present(*pud)) {
+			set_restart_gpa(next, "PUD_HOLE");
+			continue;
+		}
+
+		if (pud_large(*pud))
+			err = eic_add_page(eic, addr, next, PUD_PRESENT);
+		else
+			err = ept_pmd_range(eic, pud, addr, next);
+
+		if (err)
+			break;
+	} while (pud++, addr = next, addr != end);
+
+	return err;
+}
+
+static int ept_p4d_range(struct ept_idle_ctrl *eic,
+			 pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	p4d_t *p4d;
+	unsigned long next;
+	int err = 0;
+
+	p4d = p4d_offset(pgd, addr);
+	do {
+		next = p4d_addr_end(addr, end);
+		if (!ept_p4d_present(*p4d)) {
+			set_restart_gpa(next, "P4D_HOLE");
+			continue;
+		}
+
+		err = ept_pud_range(eic, p4d, addr, next);
+		if (err)
+			break;
+	} while (p4d++, addr = next, addr != end);
+
+	return err;
+}
+
+static int ept_page_range(struct ept_idle_ctrl *eic,
+			  unsigned long addr,
+			  unsigned long end)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_mmu *mmu;
+	pgd_t *ept_root;
+	pgd_t *pgd;
+	unsigned long next;
+	int err = 0;
+
+	BUG_ON(addr >= end);
+
+	spin_lock(&eic->kvm->mmu_lock);
+
+	vcpu = kvm_get_vcpu(eic->kvm, 0);
+	if (!vcpu) {
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
+	mmu = vcpu->arch.mmu;
+	if (!VALID_PAGE(mmu->root_hpa)) {
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
+	ept_root = __va(mmu->root_hpa);
+
+	local_irq_disable();
+	pgd = pgd_offset_pgd(ept_root, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (!ept_pgd_present(*pgd)) {
+			set_restart_gpa(next, "PGD_HOLE");
+			continue;
+		}
+
+		err = ept_p4d_range(eic, pgd, addr, next);
+		if (err)
+			break;
+	} while (pgd++, addr = next, addr != end);
+	local_irq_enable();
+out_unlock:
+	spin_unlock(&eic->kvm->mmu_lock);
+	return err;
+}
+
+static void init_ept_idle_ctrl_buffer(struct ept_idle_ctrl *eic)
+{
+	eic->pie_read = 0;
+	eic->pie_read_max = min(EPT_IDLE_KBUF_SIZE,
+				eic->buf_size - eic->bytes_copied);
+	/* reserve space for PIP_CMD_SET_HVA in the end */
+	eic->pie_read_max -= sizeof(uint64_t) + 1;
+	memset(eic->kpie, 0, sizeof(eic->kpie));
+}
+
+static int ept_idle_copy_user(struct ept_idle_ctrl *eic,
+			      unsigned long start, unsigned long end)
+{
+	int bytes_read;
+	int lc = 0; /* last copy? */
+	int ret;
+
+	debug_printk("ept_idle_copy_user %lx %lx\n", start, end);
+	dump_eic(eic);
+
+	/* Break out of loop on no more progress. */
+	if (!eic->pie_read) {
+		lc = 1;
+		if (start < end)
+			start = end;
+	}
+
+	if (start >= end && start > eic->next_hva) {
+		set_next_hva(start, "TAIL-HOLE");
+		eic_report_addr(eic, start);
+	}
+
+	bytes_read = eic->pie_read;
+	if (!bytes_read)
+		return 1;
+
+	ret = copy_to_user(eic->buf, eic->kpie, bytes_read);
+	if (ret)
+		return -EFAULT;
+
+	eic->buf += bytes_read;
+	eic->bytes_copied += bytes_read;
+	if (eic->bytes_copied >= eic->buf_size)
+		return EPT_IDLE_BUF_FULL;
+	if (lc)
+		return lc;
+
+	init_ept_idle_ctrl_buffer(eic);
+	cond_resched();
+	return 0;
+}
+
+/*
+ * Depending on whether hva falls in a memslot:
+ *
+ * 1) found => return gpa and remaining memslot size in *addr_range
+ *
+ *                 |<----- addr_range --------->|
+ *         [               mem slot             ]
+ *                 ^hva
+ *
+ * 2) not found => return hole size in *addr_range
+ *
+ *                 |<----- addr_range --------->|
+ *                                              [ first mem slot above hva ]
+ *                 ^hva
+ *
+ * If hva is above all mem slots, *addr_range will be ~0UL. We can finish read(2).
+ */
+static unsigned long ept_idle_find_gpa(struct ept_idle_ctrl *eic,
+				       unsigned long hva,
+				       unsigned long *addr_range)
+{
+	struct kvm *kvm = eic->kvm;
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	unsigned long hva_end;
+	gfn_t gfn;
+
+	*addr_range = ~0UL;
+	mutex_lock(&kvm->slots_lock);
+	slots = kvm_memslots(eic->kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		hva_end = memslot->userspace_addr +
+			  (memslot->npages << PAGE_SHIFT);
+
+		if (hva >= memslot->userspace_addr && hva < hva_end) {
+			gpa_t gpa;
+			gfn = hva_to_gfn_memslot(hva, memslot);
+			*addr_range = hva_end - hva;
+			gpa = gfn_to_gpa(gfn);
+			debug_printk("ept_idle_find_gpa slot %lx=>%llx %lx=>%llx "
+				     "delta %llx size %lx\n",
+				     memslot->userspace_addr,
+				     gfn_to_gpa(memslot->base_gfn),
+				     hva, gpa,
+				     hva - gpa,
+				     memslot->npages << PAGE_SHIFT);
+			mutex_unlock(&kvm->slots_lock);
+			return gpa;
+		}
+
+		if (memslot->userspace_addr > hva)
+			*addr_range = min(*addr_range,
+					  memslot->userspace_addr - hva);
+	}
+	mutex_unlock(&kvm->slots_lock);
+	return INVALID_PAGE;
+}
+
+static int ept_idle_supports_cpu(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_mmu *mmu;
+	int ret;
+
+	vcpu = kvm_get_vcpu(kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	spin_lock(&kvm->mmu_lock);
+	mmu = vcpu->arch.mmu;
+	if (mmu->mmu_role.base.ad_disabled) {
+		printk(KERN_NOTICE
+		       "CPU does not support EPT A/D bits tracking\n");
+		ret = -EINVAL;
+	} else if (mmu->shadow_root_level != 4 + (! !pgtable_l5_enabled())) {
+		printk(KERN_NOTICE "Unsupported EPT level %d\n",
+		       mmu->shadow_root_level);
+		ret = -EINVAL;
+	} else
+		ret = 0;
+	spin_unlock(&kvm->mmu_lock);
+
+	return ret;
+}
+
+static int ept_idle_walk_hva_range(struct ept_idle_ctrl *eic,
+				   unsigned long start, unsigned long end)
+{
+	unsigned long gpa_addr;
+	unsigned long addr_range;
+	int ret;
+
+	ret = ept_idle_supports_cpu(eic->kvm);
+	if (ret)
+		return ret;
+
+	init_ept_idle_ctrl_buffer(eic);
+
+	for (; start < end;) {
+		gpa_addr = ept_idle_find_gpa(eic, start, &addr_range);
+
+		if (gpa_addr == INVALID_PAGE) {
+			eic->gpa_to_hva = 0;
+			if (addr_range == ~0UL) /* beyond max virtual address */
+				set_restart_gpa(TASK_SIZE, "EOF");
+			else {
+				start += addr_range;
+				set_restart_gpa(start, "OUT-OF-SLOT");
+			}
+		} else {
+			eic->gpa_to_hva = start - gpa_addr;
+			ept_page_range(eic, gpa_addr, gpa_addr + addr_range);
+		}
+
+		start = eic->restart_gpa + eic->gpa_to_hva;
+		ret = ept_idle_copy_user(eic, start, end);
+		if (ret)
+			break;
+	}
+
+	if (eic->bytes_copied)
+		ret = 0;
+	return ret;
+}
+
+static ssize_t ept_idle_read(struct file *file, char *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct mm_struct *mm = file->private_data;
+	struct ept_idle_ctrl *eic;
+	unsigned long hva_start = *ppos;
+	unsigned long hva_end = hva_start + (count << (3 + PAGE_SHIFT));
+	int ret;
+
+	if (hva_start >= TASK_SIZE) {
+		debug_printk("ept_idle_read past TASK_SIZE: %lx %lx\n",
+			     hva_start, TASK_SIZE);
+		return 0;
+	}
+
+	if (!mm_kvm(mm))
+		return mm_idle_read(file, buf, count, ppos);
+
+	if (hva_end <= hva_start) {
+		debug_printk("ept_idle_read past EOF: %lx %lx\n",
+			     hva_start, hva_end);
+		return 0;
+	}
+	if (*ppos & (PAGE_SIZE - 1)) {
+		debug_printk("ept_idle_read unaligned ppos: %lx\n",
+			     hva_start);
+		return -EINVAL;
+	}
+	if (count < EPT_IDLE_BUF_MIN) {
+		debug_printk("ept_idle_read small count: %lx\n",
+			     (unsigned long)count);
+		return -EINVAL;
+	}
+
+	eic = kzalloc(sizeof(*eic), GFP_KERNEL);
+	if (!eic)
+		return -ENOMEM;
+
+	if (!mm || !mmget_not_zero(mm)) {
+		ret = -ESRCH;
+		goto out_free_eic;
+	}
+
+	eic->buf = buf;
+	eic->buf_size = count;
+	eic->mm = mm;
+	eic->kvm = mm_kvm(mm);
+	if (!eic->kvm) {
+		ret = -EINVAL;
+		goto out_mm;
+	}
+
+	kvm_get_kvm(eic->kvm);
+
+	ret = ept_idle_walk_hva_range(eic, hva_start, hva_end);
+	if (ret)
+		goto out_kvm;
+
+	ret = eic->bytes_copied;
+	*ppos = eic->next_hva;
+	debug_printk("ppos=%lx bytes_copied=%d\n",
+		     eic->next_hva, ret);
+out_kvm:
+	kvm_put_kvm(eic->kvm);
+out_mm:
+	mmput(mm);
+out_free_eic:
+	kfree(eic);
+	return ret;
+}
+
+static int ept_idle_open(struct inode *inode, struct file *file)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -EBUSY;
+
+	return 0;
+}
+
+static int ept_idle_release(struct inode *inode, struct file *file)
+{
+	struct mm_struct *mm = file->private_data;
+	struct kvm *kvm;
+	int ret = 0;
+
+	if (!mm) {
+		ret = -EBADF;
+		goto out;
+	}
+
+	kvm = mm_kvm(mm);
+	if (!kvm) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	spin_lock(&kvm->mmu_lock);
+	kvm_flush_remote_tlbs(kvm);
+	spin_unlock(&kvm->mmu_lock);
+
+out:
+	module_put(THIS_MODULE);
+	return ret;
+}
+
+extern struct file_operations proc_ept_idle_operations;
+
+static int ept_idle_entry(void)
+{
+	proc_ept_idle_operations.owner = THIS_MODULE;
+	proc_ept_idle_operations.read = ept_idle_read;
+	proc_ept_idle_operations.open = ept_idle_open;
+	proc_ept_idle_operations.release = ept_idle_release;
+
+	return 0;
+}
+
+static void ept_idle_exit(void)
+{
+	memset(&proc_ept_idle_operations, 0, sizeof(proc_ept_idle_operations));
+}
+
+MODULE_LICENSE("GPL");
+module_init(ept_idle_entry);
+module_exit(ept_idle_exit);
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/arch/x86/kvm/ept_idle.h	2018-12-26 20:32:09.775444685 +0800
@@ -0,0 +1,116 @@
+#ifndef _EPT_IDLE_H
+#define _EPT_IDLE_H
+
+#define SCAN_HUGE_PAGE O_NONBLOCK /* only huge page */
+#define SCAN_SKIM_IDLE O_NOFOLLOW /* stop on PMD_IDLE_PTES */
+
+enum ProcIdlePageType {
+	PTE_ACCESSED, /* 4k page */
+	PMD_ACCESSED, /* 2M page */
+	PUD_PRESENT, /* 1G page */
+
+	PTE_DIRTY,
+	PMD_DIRTY,
+
+	PTE_IDLE,
+	PMD_IDLE,
+	PMD_IDLE_PTES, /* all PTE idle */
+
+	PTE_HOLE,
+	PMD_HOLE,
+
+	PIP_CMD,
+
+	IDLE_PAGE_TYPE_MAX
+};
+
+#define PIP_TYPE(a) (0xf & (a >> 4))
+#define PIP_SIZE(a) (0xf & a)
+#define PIP_COMPOSE(type, nr) ((type << 4) | nr)
+
+#define PIP_CMD_SET_HVA PIP_COMPOSE(PIP_CMD, 0)
+
+#define _PAGE_BIT_EPT_ACCESSED 8
+#define _PAGE_EPT_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_EPT_ACCESSED)
+
+#define _PAGE_EPT_PRESENT (_AT(pteval_t, 7))
+
+static inline int ept_pte_present(pte_t a)
+{
+	return pte_flags(a) & _PAGE_EPT_PRESENT;
+}
+
+static inline int ept_pmd_present(pmd_t a)
+{
+	return pmd_flags(a) & _PAGE_EPT_PRESENT;
+}
+
+static inline int ept_pud_present(pud_t a)
+{
+	return pud_flags(a) & _PAGE_EPT_PRESENT;
+}
+
+static inline int ept_p4d_present(p4d_t a)
+{
+	return p4d_flags(a) & _PAGE_EPT_PRESENT;
+}
+
+static inline int ept_pgd_present(pgd_t a)
+{
+	return pgd_flags(a) & _PAGE_EPT_PRESENT;
+}
+
+static inline int ept_pte_accessed(pte_t a)
+{
+	return pte_flags(a) & _PAGE_EPT_ACCESSED;
+}
+
+static inline int ept_pmd_accessed(pmd_t a)
+{
+	return pmd_flags(a) & _PAGE_EPT_ACCESSED;
+}
+
+static inline int ept_pud_accessed(pud_t a)
+{
+	return pud_flags(a) & _PAGE_EPT_ACCESSED;
+}
+
+static inline int ept_p4d_accessed(p4d_t a)
+{
+	return p4d_flags(a) & _PAGE_EPT_ACCESSED;
+}
+
+static inline int ept_pgd_accessed(pgd_t a)
+{
+	return pgd_flags(a) & _PAGE_EPT_ACCESSED;
+}
+
+extern
struct file_operations proc_ept_idle_operations;
+
+#define EPT_IDLE_KBUF_FULL 1
+#define EPT_IDLE_BUF_FULL 2
+#define EPT_IDLE_BUF_MIN (sizeof(uint64_t) * 2 + 3)
+
+#define EPT_IDLE_KBUF_SIZE 8000
+
+struct ept_idle_ctrl {
+	struct mm_struct *mm;
+	struct kvm *kvm;
+
+	uint8_t kpie[EPT_IDLE_KBUF_SIZE];
+	int pie_read;
+	int pie_read_max;
+
+	void __user *buf;
+	int buf_size;
+	int bytes_copied;
+
+	unsigned long next_hva; /* GPA for EPT; VA for PT */
+	unsigned long gpa_to_hva;
+	unsigned long restart_gpa;
+	unsigned long last_va;
+
+	unsigned int flags;
+};
+
+#endif

From patchwork Wed Dec 26 13:15:02 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743097
Message-Id: <20181226133352.012352050@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:02 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Zhang Yi, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Dan Williams
Subject: [RFC][PATCH v2 16/21] mm-idle: mm_walk for normal task
References: <20181226131446.330864849@intel.com>
Content-Disposition: inline; filename=0015-page-idle-Added-mmu-idle-page-walk.patch

From: Zhang Yi

File pages are skipped for now. In general they are not guaranteed to be
mapped, which means that when they become hot there is no guarantee that
they can be found and moved to DRAM nodes.

Signed-off-by: Zhang Yi
Signed-off-by: Fengguang Wu
---
 arch/x86/kvm/ept_idle.c |  204 ++++++++++++++++++++++++++++++++++++++
 mm/pagewalk.c           |    1
 2 files changed, 205 insertions(+)

--- linux.orig/arch/x86/kvm/ept_idle.c	2018-12-26 19:58:30.576894801 +0800
+++ linux/arch/x86/kvm/ept_idle.c	2018-12-26 19:58:39.840936072 +0800
@@ -510,6 +510,9 @@ static int ept_idle_walk_hva_range(struc
 	return ret;
 }
 
+static ssize_t mm_idle_read(struct file *file, char *buf,
+			    size_t count, loff_t *ppos);
+
 static ssize_t ept_idle_read(struct file *file, char *buf,
 			     size_t count, loff_t *ppos)
 {
@@ -615,6 +618,207 @@ out:
 	return ret;
 }
 
+static int mm_idle_pte_range(struct ept_idle_ctrl *eic, pmd_t *pmd,
+			     unsigned long addr, unsigned long next)
+{
+	enum ProcIdlePageType page_type;
+	pte_t *pte;
+	int err = 0;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!pte_present(*pte))
+			page_type = PTE_HOLE;
+		else if (!test_and_clear_bit(_PAGE_BIT_ACCESSED,
+					     (unsigned long *) &pte->pte))
+			page_type = PTE_IDLE;
+		else {
+			page_type = PTE_ACCESSED;
+		}
+
+		err = eic_add_page(eic, addr, addr + PAGE_SIZE, page_type);
+		if (err)
+			break;
+	} while (pte++, addr += PAGE_SIZE, addr != next);
+
+	return err;
+}
+
+static int mm_idle_pmd_entry(pmd_t *pmd, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	struct ept_idle_ctrl *eic = walk->private;
+	enum ProcIdlePageType page_type;
+	enum ProcIdlePageType pte_page_type;
+	int err;
+
+	/*
+	 * Skip duplicate PMD_IDLE_PTES: when the PMD crosses VMA boundary,
+	 * walk_page_range() can call on the same PMD twice.
+	 */
+	if ((addr & PMD_MASK) == (eic->last_va & PMD_MASK)) {
+		debug_printk("ignore duplicate addr %lx %lx\n",
+			     addr, eic->last_va);
+		return 0;
+	}
+	eic->last_va = addr;
+
+	if (eic->flags & SCAN_HUGE_PAGE)
+		pte_page_type = PMD_IDLE_PTES;
+	else
+		pte_page_type = IDLE_PAGE_TYPE_MAX;
+
+	if (!pmd_present(*pmd))
+		page_type = PMD_HOLE;
+	else if (!test_and_clear_bit(_PAGE_BIT_ACCESSED, (unsigned long *)pmd)) {
+		if (pmd_large(*pmd))
+			page_type = PMD_IDLE;
+		else if (eic->flags & SCAN_SKIM_IDLE)
+			page_type = PMD_IDLE_PTES;
+		else
+			page_type = pte_page_type;
+	} else if (pmd_large(*pmd)) {
+		page_type = PMD_ACCESSED;
+	} else
+		page_type = pte_page_type;
+
+	if (page_type != IDLE_PAGE_TYPE_MAX)
+		err = eic_add_page(eic, addr, next, page_type);
+	else
+		err = mm_idle_pte_range(eic, pmd, addr, next);
+
+	return err;
+}
+
+static int mm_idle_pud_entry(pud_t *pud, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	struct ept_idle_ctrl *eic = walk->private;
+
+	if ((addr & PUD_MASK) != (eic->last_va & PUD_MASK)) {
+		eic_add_page(eic, addr, next, PUD_PRESENT);
+		eic->last_va = addr;
+	}
+	return 1;
+}
+
+static int mm_idle_test_walk(unsigned long start, unsigned long end,
+			     struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+
+	if (vma->vm_file) {
+		if ((vma->vm_flags & (VM_WRITE|VM_MAYSHARE)) == VM_WRITE)
+			return 0;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int mm_idle_walk_range(struct ept_idle_ctrl *eic,
+			      unsigned long start,
+			      unsigned long end,
+			      struct mm_walk *walk)
+{
+	struct vm_area_struct *vma;
+	int ret;
+
+	init_ept_idle_ctrl_buffer(eic);
+
+	for (; start < end;)
+	{
+		down_read(&walk->mm->mmap_sem);
+		vma = find_vma(walk->mm, start);
+		if (vma) {
+			if (end > vma->vm_start) {
+				local_irq_disable();
+				ret = walk_page_range(start, end, walk);
+				local_irq_enable();
+			} else
+				set_restart_gpa(vma->vm_start, "VMA-HOLE");
+		} else
+			set_restart_gpa(TASK_SIZE, "EOF");
+		up_read(&walk->mm->mmap_sem);
+
+		WARN_ONCE(eic->gpa_to_hva, "non-zero gpa_to_hva");
+		start = eic->restart_gpa;
+		ret = ept_idle_copy_user(eic, start, end);
+		if (ret)
+			break;
+	}
+
+	if (eic->bytes_copied) {
+		if (ret != EPT_IDLE_BUF_FULL && eic->next_hva < end)
+			debug_printk("partial scan: next_hva=%lx end=%lx\n",
+				     eic->next_hva, end);
+		ret = 0;
+	} else
+		WARN_ONCE(1, "nothing read");
+	return ret;
+}
+
+static ssize_t mm_idle_read(struct file *file, char *buf,
+			    size_t count, loff_t *ppos)
+{
+	struct mm_struct *mm = file->private_data;
+	struct mm_walk mm_walk = {};
+	struct ept_idle_ctrl *eic;
+	unsigned long va_start = *ppos;
+	unsigned long va_end = va_start + (count << (3 + PAGE_SHIFT));
+	int ret;
+
+	if (va_end <= va_start) {
+		debug_printk("mm_idle_read past EOF: %lx %lx\n",
+			     va_start, va_end);
+		return 0;
+	}
+	if (*ppos & (PAGE_SIZE - 1)) {
+		debug_printk("mm_idle_read unaligned ppos: %lx\n",
+			     va_start);
+		return -EINVAL;
+	}
+	if (count < EPT_IDLE_BUF_MIN) {
+		debug_printk("mm_idle_read small count: %lx\n",
+			     (unsigned long)count);
+		return -EINVAL;
+	}
+
+	eic = kzalloc(sizeof(*eic), GFP_KERNEL);
+	if (!eic)
+		return -ENOMEM;
+
+	if (!mm || !mmget_not_zero(mm)) {
+		ret = -ESRCH;
+		goto out_free;
+	}
+
+	eic->buf = buf;
+	eic->buf_size = count;
+	eic->mm = mm;
+	eic->flags = file->f_flags;
+
+	mm_walk.mm = mm;
+	mm_walk.pmd_entry = mm_idle_pmd_entry;
+	mm_walk.pud_entry = mm_idle_pud_entry;
+	mm_walk.test_walk = mm_idle_test_walk;
+	mm_walk.private = eic;
+
+	ret = mm_idle_walk_range(eic, va_start, va_end, &mm_walk);
+	if (ret)
+		goto out_mm;
+
+	ret = eic->bytes_copied;
+	*ppos = eic->next_hva;
+	debug_printk("ppos=%lx bytes_copied=%d\n",
+		     eic->next_hva, ret);
+out_mm:
+	mmput(mm);
+out_free:
+	kfree(eic);
+	return ret;
+}
+
 extern struct file_operations proc_ept_idle_operations;
 
 static int ept_idle_entry(void)
--- linux.orig/mm/pagewalk.c	2018-12-26 19:58:30.576894801 +0800
+++ linux/mm/pagewalk.c	2018-12-26 19:58:30.576894801 +0800
@@ -338,6 +338,7 @@ int
walk_page_range(unsigned long start,
 	} while (start = next, start < end);
 	return err;
 }
+EXPORT_SYMBOL(walk_page_range);
 
 int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
 {

From patchwork Wed Dec 26 13:15:03 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743131
Message-Id: <20181226133352.076749877@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:03 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Huang Ying, Brendan Gregg, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 17/21] proc: introduce /proc/PID/idle_pages
References: <20181226131446.330864849@intel.com>
Content-Disposition: inline; filename=0008-proc-introduce-proc-PID-idle_pages.patch

This will be similar to /sys/kernel/mm/page_idle/bitmap, documented in
Documentation/admin-guide/mm/idle_page_tracking.rst, but indexed by
process virtual address.

When using the global PFN-indexed idle bitmap, we find two kinds of
overhead:

- To track a task's working set, Brendan Gregg ended up writing wss-v1
  for small tasks and wss-v2 for large tasks:

  https://github.com/brendangregg/wss

  That's because VAs may point to random PAs throughout the physical
  address space. So we either query /proc/pid/pagemap first and then
  access lots of random PFNs (with lots of syscalls) in the bitmap, or
  write+read the whole system idle bitmap beforehand.

- Page table walking by PFN has much more overhead than walking a page
  table in its natural order:
  - rmap queries
  - more locking
  - random memory reads/writes

This interface provides a cheap path for the majority of non-shared
mapped pages. To walk 1TB of memory made of 4k active pages, it costs 2s
of system time to scan the per-task idle bitmap vs 15s for the global one.
Which means ~7x speedup. The gap will be enlarged if consider - the extra /proc/pid/pagemap walk - natural page table walks can skip the whole 512 PTEs if PMD is idle OTOH, the per-task idle bitmap is not suitable in some situations: - not accurate for shared pages - don't work with non-mapped file pages - don't perform well for sparse page tables (pointed out by Huang Ying) So it's more about complementing the existing global idle bitmap. CC: Huang Ying CC: Brendan Gregg Signed-off-by: Fengguang Wu --- fs/proc/base.c | 2 + fs/proc/internal.h | 1 fs/proc/task_mmu.c | 54 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 57 insertions(+) --- linux.orig/fs/proc/base.c 2018-12-23 20:08:14.228919325 +0800 +++ linux/fs/proc/base.c 2018-12-23 20:08:14.224919327 +0800 @@ -2969,6 +2969,7 @@ static const struct pid_entry tgid_base_ REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations), REG("pagemap", S_IRUSR, proc_pagemap_operations), + REG("idle_pages", S_IRUSR|S_IWUSR, proc_mm_idle_operations), #endif #ifdef CONFIG_SECURITY DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations), @@ -3357,6 +3358,7 @@ static const struct pid_entry tid_base_s REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations), REG("pagemap", S_IRUSR, proc_pagemap_operations), + REG("idle_pages", S_IRUSR|S_IWUSR, proc_mm_idle_operations), #endif #ifdef CONFIG_SECURITY DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations), --- linux.orig/fs/proc/internal.h 2018-12-23 20:08:14.228919325 +0800 +++ linux/fs/proc/internal.h 2018-12-23 20:08:14.224919327 +0800 @@ -298,6 +298,7 @@ extern const struct file_operations proc extern const struct file_operations proc_pid_smaps_rollup_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; +extern 
const struct file_operations proc_mm_idle_operations; extern unsigned long task_vsize(struct mm_struct *); extern unsigned long task_statm(struct mm_struct *, --- linux.orig/fs/proc/task_mmu.c 2018-12-23 20:08:14.228919325 +0800 +++ linux/fs/proc/task_mmu.c 2018-12-23 20:08:14.224919327 +0800 @@ -1559,6 +1559,60 @@ const struct file_operations proc_pagema .open = pagemap_open, .release = pagemap_release, }; + +/* will be filled when kvm_ept_idle module loads */ +struct file_operations proc_ept_idle_operations = { +}; +EXPORT_SYMBOL_GPL(proc_ept_idle_operations); + +static ssize_t mm_idle_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + if (proc_ept_idle_operations.read) + return proc_ept_idle_operations.read(file, buf, count, ppos); + + return 0; +} + + +static int mm_idle_open(struct inode *inode, struct file *file) +{ + struct mm_struct *mm = proc_mem_open(inode, PTRACE_MODE_READ); + + if (IS_ERR(mm)) + return PTR_ERR(mm); + + file->private_data = mm; + + if (proc_ept_idle_operations.open) + return proc_ept_idle_operations.open(inode, file); + + return 0; +} + +static int mm_idle_release(struct inode *inode, struct file *file) +{ + struct mm_struct *mm = file->private_data; + + if (mm) { + if (!mm_kvm(mm)) + flush_tlb_mm(mm); + mmdrop(mm); + } + + if (proc_ept_idle_operations.release) + return proc_ept_idle_operations.release(inode, file); + + return 0; +} + +const struct file_operations proc_mm_idle_operations = { + .llseek = mem_lseek, /* borrow this */ + .read = mm_idle_read, + .open = mm_idle_open, + .release = mm_idle_release, +}; + #endif /* CONFIG_PROC_PAGE_MONITOR */ #ifdef CONFIG_NUMA From patchwork Wed Dec 26 13:15:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fengguang Wu X-Patchwork-Id: 10743085 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org 
Message-Id: <20181226133352.133164898@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:04 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 18/21] kvm-ept-idle: enable module
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0007-kvm-ept-idle-enable-module.patch

Signed-off-by: Fengguang Wu
---
 arch/x86/kvm/Kconfig  |   11 +++++++++++
 arch/x86/kvm/Makefile |    4 ++++
 2 files changed, 15 insertions(+)

--- linux.orig/arch/x86/kvm/Kconfig	2018-12-23 20:09:04.628882396 +0800
+++ linux/arch/x86/kvm/Kconfig	2018-12-23 20:09:04.628882396 +0800
@@ -96,6 +96,17 @@ config KVM_MMU_AUDIT
 	  This option adds a R/W kVM module parameter 'mmu_audit', which allows
 	  auditing of KVM MMU events at runtime.
 
+config KVM_EPT_IDLE
+	tristate "KVM EPT idle page tracking"
+	depends on KVM_INTEL
+	depends on PROC_PAGE_MONITOR
+	---help---
+	  Provides support for walking EPT to get the A bits on Intel
+	  processors equipped with the VT extensions.
+
+	  To compile this as a module, choose M here: the module
+	  will be called kvm-ept-idle.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
--- linux.orig/arch/x86/kvm/Makefile	2018-12-23 20:09:04.628882396 +0800
+++ linux/arch/x86/kvm/Makefile	2018-12-23 20:09:04.628882396 +0800
@@ -19,6 +19,10 @@ kvm-y += x86.o mmu.o emulate.o i8259.o
 kvm-intel-y += vmx.o pmu_intel.o
 kvm-amd-y += svm.o pmu_amd.o
 
+kvm-ept-idle-y += ept_idle.o
+
 obj-$(CONFIG_KVM) += kvm.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+
+obj-$(CONFIG_KVM_EPT_IDLE) += kvm-ept-idle.o

From patchwork Wed Dec 26 13:15:05 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743117
Message-Id: <20181226133352.189896494@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:05 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Liu Jingqi, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0010-migrate-check-if-the-page-is-software-young-when-mov.patch

From: Liu Jingqi

Introduce the MPOL_MF_SW_YOUNG flag to move_pages(). When it is set,
pages that are already in DRAM will be marked PG_referenced.

Background:
The user space migration daemon will frequently scan page tables and
read-clear accessed bits to detect hot/cold pages, then migrate hot
pages from PMEM to DRAM nodes. When doing so, it also tells the kernel
that these are the hot page set. This maintains a persistent view of
hot/cold pages between the kernel and the user space daemon.

The more concrete steps are
1) do multiple scans of the page table, counting accessed bits
2) highest accessed count => hot pages
3) call move_pages(hot pages, DRAM nodes, MPOL_MF_SW_YOUNG)

(1) regularly clears PTE young, which makes the kernel lose access to
    PTE young information

(2) for anonymous pages, the user space daemon defines which is hot and
    which is cold

(3) conveys the user space view of hot/cold pages to the kernel through
    PG_referenced

In the long run, most hot pages could already be in DRAM.
move_pages(MPOL_MF_SW_YOUNG) sets PG_referenced for those hot pages that
are already in DRAM, but not for newly migrated hot pages, since they
are expected to be put at the end of the LRU and thus have long enough
time in the LRU to gather accessed/PG_referenced bits and prove to the
kernel that they are really hot.

The daemon may select only DRAM/2 pages as hot, for 2 purposes:
- avoid thrashing, e.g. some warm pages get promoted and then demoted soon
- make sure enough DRAM LRU pages look "cold" to the kernel, so that
  vmscan won't run into trouble busy scanning LRU lists

Signed-off-by: Liu Jingqi
Signed-off-by: Fengguang Wu
---
 mm/migrate.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- linux.orig/mm/migrate.c	2018-12-23 20:37:12.604621319 +0800
+++ linux/mm/migrate.c	2018-12-23 20:37:12.604621319 +0800
@@ -55,6 +55,8 @@
 
 #include "internal.h"
 
+#define MPOL_MF_SW_YOUNG (1<<7)
+
 /*
  * migrate_prep() needs to be called before we start compiling a list of pages
  * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -1484,12 +1486,13 @@ static int do_move_pages_to_node(struct
  * the target node
  */
 static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
-		int node, struct list_head *pagelist, bool migrate_all)
+		int node, struct list_head *pagelist, int flags)
 {
 	struct vm_area_struct *vma;
 	struct page *page;
 	unsigned int follflags;
 	int err;
+	bool migrate_all = flags & MPOL_MF_MOVE_ALL;
 
 	down_read(&mm->mmap_sem);
 	err = -EFAULT;
@@ -1519,6 +1522,8 @@ static int add_page_for_migration(struct
 
 	if (PageHuge(page)) {
 		if (PageHead(page)) {
+			if (flags & MPOL_MF_SW_YOUNG)
+				SetPageReferenced(page);
 			isolate_huge_page(page, pagelist);
 			err = 0;
 		}
@@ -1531,6 +1536,8 @@ static int add_page_for_migration(struct
 			goto out_putpage;
 
 		err = 0;
+		if (flags & MPOL_MF_SW_YOUNG)
+			SetPageReferenced(head);
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_cache(head),
@@ -1606,7 +1613,7 @@ static int do_pages_move(struct mm_struc
 		 * report them via status
 		 */
 		err = add_page_for_migration(mm, addr, current_node,
-				&pagelist, flags & MPOL_MF_MOVE_ALL);
+				&pagelist, flags);
 		if (!err)
 			continue;
 
@@ -1725,7 +1732,7 @@ static int kernel_move_pages(pid_t pid,
 	nodemask_t task_nodes;
 
 	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_SW_YOUNG))
 		return -EINVAL;
 
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))

From patchwork Wed Dec 26 13:15:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743139
Message-Id: <20181226133352.246320288@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:06 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Fan Du, Jingqi Liu, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 20/21] mm/vmscan.c: migrate anon DRAM pages to PMEM node
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0012-vmscan-migrate-anonymous-pages-to-pmem-node-before-s.patch

From: Jingqi Liu

With PMEM nodes, the demotion path could be

1) DRAM pages: migrate to PMEM node
2) PMEM pages: swap out

This patch does (1) for anonymous pages only, since we cannot detect
the hotness of (unmapped) page cache pages for now.

The user space daemon can do migration in both directions:
- PMEM=>DRAM hot page migration
- DRAM=>PMEM cold page migration
However, it's more natural for user space to do hot page migration and
for the kernel to do cold page migration. In particular, only the kernel
can guarantee on-demand migration when there is memory pressure.

So the big picture will look like this: the user space daemon does
regular hot page migration to DRAM, creating memory pressure on DRAM
nodes, which triggers kernel cold page migration to PMEM nodes.

Du Fan:
- Support multiple NUMA nodes.
- Don't migrate clean MADV_FREE pages to PMEM node.

With the madvise(MADV_FREE) syscall, both the vma structure and its
corresponding page entries still live, but we get a MADV_FREE page:
anonymous but WITHOUT SwapBacked.

In case of page reclaim, clean MADV_FREE pages will be freed and
returned to the buddy system; the dirty ones then turn into canonical
anonymous pages with PageSwapBacked(page) set, and are put into the
LRU_INACTIVE_FILE list, falling into the standard aging routine.

The point is that clean MADV_FREE pages should not be migrated: they
hold stale (useless) user data once madvise(MADV_FREE) has been called,
so guard against such scenarios.

P.S. MADV_FREE is heavily used by the jemalloc engine and by workloads
like redis; refer to [1] for detailed background, use cases, and
benchmark results.

[1]
https://lore.kernel.org/patchwork/patch/622179/

Fengguang:
- detect migrate thp and hugetlb
- avoid moving pages to a non-existent node

Signed-off-by: Fan Du
Signed-off-by: Jingqi Liu
Signed-off-by: Fengguang Wu
---
 mm/vmscan.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

--- linux.orig/mm/vmscan.c	2018-12-23 20:37:58.305551976 +0800
+++ linux/mm/vmscan.c	2018-12-23 20:37:58.305551976 +0800
@@ -1112,6 +1112,7 @@ static unsigned long shrink_page_list(st
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
+	LIST_HEAD(move_pages);
 	int pgactivate = 0;
 	unsigned nr_unqueued_dirty = 0;
 	unsigned nr_dirty = 0;
@@ -1121,6 +1122,7 @@ static unsigned long shrink_page_list(st
 	unsigned nr_immediate = 0;
 	unsigned nr_ref_keep = 0;
 	unsigned nr_unmap_fail = 0;
+	int page_on_dram = is_node_dram(pgdat->node_id);
 
 	cond_resched();
 
@@ -1275,6 +1277,21 @@ static unsigned long shrink_page_list(st
 		}
 
 		/*
+		 * Check if the page is in DRAM numa node.
+		 * Skip MADV_FREE pages as it might be freed
+		 * immediately to buddy system if it's clean.
+		 */
+		if (node_online(pgdat->peer_node) &&
+			PageAnon(page) && (PageSwapBacked(page) || PageTransHuge(page))) {
+			if (page_on_dram) {
+				/* Add to the page list which will be moved to pmem numa node. */
+				list_add(&page->lru, &move_pages);
+				unlock_page(page);
+				continue;
+			}
+		}
+
+		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
 		 * Lazyfree page could be freed directly
@@ -1496,6 +1513,22 @@ keep:
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
 
+	/* Move the anonymous pages to PMEM numa node. */
+	if (!list_empty(&move_pages)) {
+		int err;
+
+		/* Could not block. */
+		err = migrate_pages(&move_pages, alloc_new_node_page, NULL,
+					pgdat->peer_node,
+					MIGRATE_ASYNC, MR_NUMA_MISPLACED);
+		if (err) {
+			putback_movable_pages(&move_pages);
+
+			/* Join the pages which were not migrated. */
+			list_splice(&ret_pages, &move_pages);
+		}
+	}
+
 	mem_cgroup_uncharge_list(&free_pages);
 	try_to_unmap_flush();
 	free_unref_page_list(&free_pages);

From patchwork Wed Dec 26 13:15:07 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10743119
wfg-t570.sh.intel.com with local (Exim 4.89) (envelope-from ) id 1gc9Mr-0005PX-NY; Wed, 26 Dec 2018 21:37:01 +0800
Message-Id: <20181226133352.303666865@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:15:07 +0800
From: Fengguang Wu
To: Andrew Morton
cc: Linux Memory Management List, Fengguang Wu
cc: kvm@vger.kernel.org
Cc: LKML
cc: Fan Du
cc: Yao Yuan
cc: Peng Dong
cc: Huang Ying
CC: Liu Jingqi
cc: Dong Eddie
cc: Dave Hansen
cc: Zhang Yi
cc: Dan Williams
Subject: [RFC][PATCH v2 21/21] mm/vmscan.c: shrink anon list if can migrate to PMEM
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Disposition: inline; filename=0013-vmscan-disable-0-swap-space-optimization.patch
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

Fix OOM by making in-kernel DRAM=>PMEM migration reachable.

Without swap space, reclaim skips the anonymous LRU lists, so the
DRAM=>PMEM migration in shrink_page_list() is never reached. Keep that
shortcut for PMEM nodes only.

Here we assume these 2 possible demotion paths:
- DRAM migrate to PMEM
- PMEM to swap device

Signed-off-by: Fengguang Wu
---
 mm/vmscan.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- linux.orig/mm/vmscan.c	2018-12-23 20:38:44.310446223 +0800
+++ linux/mm/vmscan.c	2018-12-23 20:38:44.306446146 +0800
@@ -2259,7 +2259,7 @@ static bool inactive_list_is_low(struct
 	 * If we don't have swap space, anonymous page deactivation
 	 * is pointless.
 	 */
-	if (!file && !total_swap_pages)
+	if (!file && (is_node_pmem(pgdat->node_id) && !total_swap_pages))
 		return false;
 
 	inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx);
@@ -2340,7 +2340,8 @@ static void get_scan_count(struct lruvec
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
+	if (is_node_pmem(pgdat->node_id) &&
+	    (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
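
For illustration only, the effect of the get_scan_count() change above can
be modeled in user space as below. struct scan_model and the two
skip_anon_scan_*() helpers are hypothetical names; the fields stand in for
is_node_pmem(pgdat->node_id), sc->may_swap and
mem_cgroup_get_nr_swap_pages(memcg):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state consulted by get_scan_count(). */
struct scan_model {
	bool node_is_pmem;	/* is_node_pmem(pgdat->node_id) */
	bool may_swap;		/* sc->may_swap */
	long nr_swap_pages;	/* mem_cgroup_get_nr_swap_pages(memcg) */
};

/* Before the patch: without usable swap, anon pages are never scanned. */
bool skip_anon_scan_old(const struct scan_model *s)
{
	return !s->may_swap || s->nr_swap_pages <= 0;
}

/* After the patch: the shortcut is taken on PMEM nodes only. */
bool skip_anon_scan_new(const struct scan_model *s)
{
	return s->node_is_pmem &&
	       (!s->may_swap || s->nr_swap_pages <= 0);
}

/* Swapless system: compare a DRAM node with a PMEM node. */
void check_swapless_policy(void)
{
	struct scan_model dram = { .node_is_pmem = false, .may_swap = true,
				   .nr_swap_pages = 0 };
	struct scan_model pmem = { .node_is_pmem = true, .may_swap = true,
				   .nr_swap_pages = 0 };

	assert(skip_anon_scan_old(&dram));	/* old: DRAM anon list skipped */
	assert(!skip_anon_scan_new(&dram));	/* new: DRAM anon list scanned */
	assert(skip_anon_scan_new(&pmem));	/* PMEM: still skipped without swap */
}
```

In this model a swapless DRAM node still scans its anon list, which is
what lets reclaim reach the DRAM=>PMEM migration, while a swapless PMEM
node keeps the old behaviour.
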