From patchwork Tue Jun 25 07:52:23 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11014949
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, pasha.tatashin@soleen.com,
 Jonathan.Cameron@huawei.com, david@redhat.com, anshuman.khandual@arm.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v2 1/5] drivers/base/memory: Remove unneeded check in
 remove_memory_block_devices
Date: Tue, 25 Jun 2019 09:52:23 +0200
Message-Id: <20190625075227.15193-2-osalvador@suse.de>
In-Reply-To: <20190625075227.15193-1-osalvador@suse.de>
References: <20190625075227.15193-1-osalvador@suse.de>

remove_memory_block_devices() checks that the range is aligned to
memory_block_size_bytes(), our current memory block size, and WARNs and
bails out if it is not. This is the right thing to do, but we already do
that in try_remove_memory(), where remove_memory_block_devices() gets
called from, and we are even stricter there, since we BUG_ON directly in
case the range is not properly aligned.

Since remove_memory_block_devices() is only called from
try_remove_memory(), we can safely drop the check here.

To be honest, I am not sure we should kill the system in case we cannot
remove memory. I tend to think that a WARN_ON plus returning an error is
better.
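
For reference, a minimal userspace sketch of the alignment invariant the
dropped WARN_ON_ONCE enforced. IS_ALIGNED is rewritten here so the sketch
builds standalone, and the 128MB block size is just an example value, not
necessarily what memory_block_size_bytes() returns:

#include <stdio.h>
#include <stdint.h>

/* Same test as the kernel's IS_ALIGNED(): valid for power-of-two 'a'. */
#define IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

int main(void)
{
	uint64_t block_size = 128ULL << 20;	/* example: 128MB blocks */
	uint64_t start = 4ULL << 30;		/* 4GB, block aligned */
	uint64_t size = 1ULL << 30;		/* 1GB, block aligned */

	/* Both must hold before memory block devices can be removed. */
	printf("start aligned: %d\n", (int)IS_ALIGNED(start, block_size));
	printf("size  aligned: %d\n", (int)IS_ALIGNED(size, block_size));
	return 0;
}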
Signed-off-by: Oscar Salvador
---
 drivers/base/memory.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 826dd76f662e..07ba731beb42 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -771,10 +771,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 	struct memory_block *mem;
 	int block_id;
 
-	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
-			 !IS_ALIGNED(size, memory_block_size_bytes())))
-		return;
-
 	mutex_lock(&mem_sysfs_mutex);
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
 		mem = find_memory_block_by_id(block_id, NULL);

From patchwork Tue Jun 25 07:52:24 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11014951
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, pasha.tatashin@soleen.com,
 Jonathan.Cameron@huawei.com, david@redhat.com, anshuman.khandual@arm.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v2 2/5] mm,memory_hotplug: Introduce MHP_VMEMMAP_FLAGS
Date: Tue, 25 Jun 2019 09:52:24 +0200
Message-Id: <20190625075227.15193-3-osalvador@suse.de>
In-Reply-To: <20190625075227.15193-1-osalvador@suse.de>
References: <20190625075227.15193-1-osalvador@suse.de>

This patch introduces the MHP_MEMMAP_DEVICE and MHP_MEMMAP_MEMBLOCK
flags, and prepares the callers that add memory to take a "flags"
parameter. This "flags" parameter will be evaluated later on in Patch#3
to init the mhp_restrictions struct.
The callers are:

 add_memory
 __add_memory
 add_memory_resource

Unfortunately, we do not have a single entry point to add memory: since
callers want to hook in at different places depending on their
requirements (e.g: Xen reserve_additional_memory()), we have to spread
the parameter across the three callers.

The flags are either MHP_MEMMAP_DEVICE or MHP_MEMMAP_MEMBLOCK, and they
only differ in the way the vmemmap pages are allocated within the memory
blocks.

MHP_MEMMAP_MEMBLOCK:
        - With this flag, we allocate vmemmap pages in each memory block.
          This means that if we hot-add a range that spans multiple
          memory blocks, we use the beginning of each memory block for
          the vmemmap pages. This strategy is good for cases where the
          caller wants the flexibility to hot-remove memory in a
          different granularity than it was added with.

          E.g: We add a range (x,y] that spans 3 memory blocks, with a
          memory block size of 128MB:

          [memblock#0 ]
          [0 - 511 pfns      ] - vmemmaps for section#0
          [512 - 32767 pfns  ] - normal memory

          [memblock#1 ]
          [32768 - 33279 pfns] - vmemmaps for section#1
          [33280 - 65535 pfns] - normal memory

          [memblock#2 ]
          [65536 - 66047 pfns] - vmemmap for section#2
          [66048 - 98304 pfns] - normal memory

MHP_MEMMAP_DEVICE:
        - With this flag, we store all vmemmap pages at the beginning of
          the hot-added memory.

          E.g: We add a range (x,y] that spans 3 memory blocks, with a
          memory block size of 128MB:

          [memblock #0 ]
          [0 - 1533 pfns    ] - vmemmap for section#{0-2}
          [1534 - 98304 pfns] - normal memory

When using larger memory blocks (1GB or 2GB), the principle is the same.

Of course, MHP_MEMMAP_DEVICE is nicer when it comes to having a large
contiguous area, while MHP_MEMMAP_MEMBLOCK gives us more flexibility
when removing the memory.
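
As a quick illustration of how these bits are meant to be used (a
standalone sketch, not kernel code; the constants mirror the ones added
below, and the caller and values are made up):

#include <stdio.h>

#define MHP_MEMMAP_DEVICE	(1UL << 0)
#define MHP_MEMMAP_MEMBLOCK	(1UL << 1)
#define MHP_VMEMMAP_FLAGS	(MHP_MEMMAP_DEVICE | MHP_MEMMAP_MEMBLOCK)

/* A caller picks at most one strategy; both bits set is invalid. */
static int vmemmap_flags_valid(unsigned long flags)
{
	return (flags & MHP_VMEMMAP_FLAGS) != MHP_VMEMMAP_FLAGS;
}

int main(void)
{
	unsigned long flags = MHP_MEMMAP_MEMBLOCK;	/* hypothetical caller */

	printf("valid: %d\n", vmemmap_flags_valid(flags));		/* 1 */
	printf("valid: %d\n", vmemmap_flags_valid(MHP_VMEMMAP_FLAGS));	/* 0 */
	return 0;
}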
Signed-off-by: Oscar Salvador
Reviewed-by: David Hildenbrand
Reviewed-by: Dan Williams
---
 drivers/acpi/acpi_memhotplug.c |  2 +-
 drivers/base/memory.c          |  2 +-
 drivers/dax/kmem.c             |  2 +-
 drivers/hv/hv_balloon.c        |  2 +-
 drivers/s390/char/sclp_cmd.c   |  2 +-
 drivers/xen/balloon.c          |  2 +-
 include/linux/memory_hotplug.h | 22 +++++++++++++++++++---
 mm/memory_hotplug.c            | 10 +++++-----
 8 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index db013dc21c02..860f84e82dd0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -218,7 +218,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	if (node < 0)
 		node = memory_add_physaddr_to_nid(info->start_addr);
 
-	result = __add_memory(node, info->start_addr, info->length);
+	result = __add_memory(node, info->start_addr, info->length, 0);
 
 	/*
 	 * If the memory block has been used by the kernel, add_memory()

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 07ba731beb42..ad9834b8b7f7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -516,7 +516,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
 
 	if (ret)
 		goto out;

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..e159184e0ba0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 	new_res->name = dev_name(dev);
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 6fb4ea5f0304..beb92bc56186 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -731,7 +731,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT), 0);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);

diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..f61026c7db7e 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size, 0);
 skip_add:
 	first_rn = rn;
 	num = 1;

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 37a36c6b9f93..33814b3513ca 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -349,7 +349,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, 0);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 0b8a5e5ef2da..6fdbce9d04f9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -54,6 +54,22 @@ enum {
 };
 
 /*
+ * We want the memmap (struct page array) to be allocated from the hotadded
+ * range. To do so, there are two possible ways, depending on what the caller
+ * wants:
+ * 1) Allocate memmap pages per device (whole hot-added range)
+ * 2) Allocate memmap pages per memblock
+ * The former implies that we will use the beginning of the hot-added range
+ * to store the memmap pages of the whole range, while the latter implies
+ * that we will use the beginning of each memblock to store its own memmap
+ * pages.
+ * Please note that only SPARSE_VMEMMAP implements this feature and some
+ * architectures might not support it even for that memory model (e.g. s390)
+ */
+#define MHP_MEMMAP_DEVICE	(1UL<<0)
+#define MHP_MEMMAP_MEMBLOCK	(1UL<<1)
+#define MHP_VMEMMAP_FLAGS	(MHP_MEMMAP_DEVICE|MHP_MEMMAP_MEMBLOCK)
+
+/*
  * Restrictions for the memory hotplug:
  * flags: MHP_ flags
  * altmap: alternative allocator for memmap array
@@ -342,9 +358,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory_resource(int nid, struct resource *resource, unsigned long flags);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4e8e65954f31..e4e3baa6eaa7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1057,7 +1057,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 {
 	struct mhp_restrictions restrictions = {};
 	u64 start, size;
@@ -1135,7 +1135,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	struct resource *res;
 	int ret;
@@ -1144,18 +1144,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, flags);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, flags);
 	unlock_device_hotplug();
 
 	return rc;

From patchwork Tue Jun 25 07:52:25 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11014953
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, pasha.tatashin@soleen.com,
 Jonathan.Cameron@huawei.com, david@redhat.com, anshuman.khandual@arm.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v2 3/5] mm,memory_hotplug: Introduce Vmemmap page helpers
Date: Tue, 25 Jun 2019 09:52:25 +0200
Message-Id: <20190625075227.15193-4-osalvador@suse.de>
In-Reply-To: <20190625075227.15193-1-osalvador@suse.de>
References: <20190625075227.15193-1-osalvador@suse.de>

Introduce a set of functions for Vmemmap pages:

 - {Set,Clear,Check} the Vmemmap flag
 - Given a vmemmap page, get its vmemmap-head
 - Get the number of vmemmap pages, taking into account the current
   position of the page within its range

These functions will be used by the code handling Vmemmap pages.
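
To see the idea behind the flag, here is a minimal standalone sketch of
the same trick (a mocked-up struct page and sentinel value; the real
helpers below additionally piggyback on PageReserved and the kernel's
PAGE_MAPPING_FLAGS):

#include <stdio.h>

#define PAGE_MAPPING_FLAGS	0x3UL
#define VMEMMAP_PAGE		(~PAGE_MAPPING_FLAGS)	/* sentinel value */

struct mock_page {
	void *mapping;
};

/* Tag the page as vmemmap by storing a sentinel in ->mapping. */
static void set_vmemmap(struct mock_page *page)
{
	page->mapping = (void *)VMEMMAP_PAGE;
}

static int is_vmemmap(struct mock_page *page)
{
	return (unsigned long)page->mapping == VMEMMAP_PAGE;
}

static void clear_vmemmap(struct mock_page *page)
{
	page->mapping = NULL;
}

int main(void)
{
	struct mock_page page = { 0 };

	set_vmemmap(&page);
	printf("vmemmap: %d\n", is_vmemmap(&page));	/* 1 */
	clear_vmemmap(&page);
	printf("vmemmap: %d\n", is_vmemmap(&page));	/* 0 */
	return 0;
}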
Signed-off-by: Oscar Salvador
---
 include/linux/page-flags.h | 34 ++++++++++++++++++++++++++++++++++
 mm/util.c                  |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b848517da64c..a8b9b57162b3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -466,6 +466,40 @@ static __always_inline int __PageMovable(struct page *page)
 				PAGE_MAPPING_MOVABLE;
 }
 
+#define VMEMMAP_PAGE		(~PAGE_MAPPING_FLAGS)
+static __always_inline int PageVmemmap(struct page *page)
+{
+	return PageReserved(page) && (unsigned long)page->mapping == VMEMMAP_PAGE;
+}
+
+static __always_inline int __PageVmemmap(struct page *page)
+{
+	return (unsigned long)page->mapping == VMEMMAP_PAGE;
+}
+
+static __always_inline void __ClearPageVmemmap(struct page *page)
+{
+	__ClearPageReserved(page);
+	page->mapping = NULL;
+}
+
+static __always_inline void __SetPageVmemmap(struct page *page)
+{
+	__SetPageReserved(page);
+	page->mapping = (void *)VMEMMAP_PAGE;
+}
+
+static __always_inline struct page *vmemmap_get_head(struct page *page)
+{
+	return (struct page *)page->freelist;
+}
+
+static __always_inline unsigned long get_nr_vmemmap_pages(struct page *page)
+{
+	struct page *head = vmemmap_get_head(page);
+
+	return head->private - (page - head);
+}
+
 #ifdef CONFIG_KSM
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"

diff --git a/mm/util.c b/mm/util.c
index 021648a8a3a3..5e20563cdef6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -607,6 +607,8 @@ struct address_space *page_mapping(struct page *page)
 	mapping = page->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
+	if ((unsigned long)mapping == VMEMMAP_PAGE)
+		return NULL;
 
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }

From patchwork Tue Jun 25 07:52:26 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11014957
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, pasha.tatashin@soleen.com,
 Jonathan.Cameron@huawei.com, david@redhat.com, anshuman.khandual@arm.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v2 4/5] mm,memory_hotplug: allocate memmap from the added
 memory range for sparse-vmemmap
Date: Tue, 25 Jun 2019 09:52:26 +0200
Message-Id: <20190625075227.15193-5-osalvador@suse.de>
In-Reply-To: <20190625075227.15193-1-osalvador@suse.de>
References: <20190625075227.15193-1-osalvador@suse.de>

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) existing memory is consumed for that purpose
    (~2MB per 128MB memory section on x86_64)
 b) if the whole node is movable then we have off-node struct pages,
    which has performance drawbacks.

a) has turned out to be a problem for memory hotplug based ballooning,
because the userspace might not react in time to online memory while the
memory consumed during physical hotadd consumes enough memory to push
the system to OOM. Commit 31bc3858ea3e ("memory-hotplug: add automatic
onlining policy for the newly added memory") was added to work around
that problem.

I have also seen hot-add operations failing on powerpc due to the fact
that we try to use order-8 pages. If the base page size is 64KB, this
gives us 16MB, and if we run out of those, we simply fail. One could
argue that we can fall back to basepages as we do on x86_64, but we can
do better when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory. That means that we can
simply use the beginning of each memory section and map struct pages
there. The struct pages which back the allocated space then just need to
be treated carefully.

Implementation-wise, we reuse the vmem_altmap infrastructure to override
the default allocator used by __vmemmap_populate. Once the memmap has
been allocated, we need a way to mark the altmap pfns used for it. If the
MHP_MEMMAP_{DEVICE,MEMBLOCK} flag was passed, we set up the layout of the
altmap structure at the beginning of __add_pages(), and then we call
mark_vmemmap_pages(). Depending on which flag was passed
(MHP_MEMMAP_DEVICE or MHP_MEMMAP_MEMBLOCK), mark_vmemmap_pages() gets
called at a different stage.
With MHP_MEMMAP_MEMBLOCK, we call it once we have populated the sections
fitting in a single memblock, while with MHP_MEMMAP_DEVICE we wait until
all sections have been populated.

mark_vmemmap_pages() marks the pages as vmemmap and sets some metadata.
The current layout of the Vmemmap pages is:

        [Head->refcount] : Nr of sections used by this altmap
        [Head->private]  : Nr of vmemmap pages
        [Tail->freelist] : Pointer to the head page

This is done to ease the computations we need in some places.

Example 1) We hot-add 1GB on x86_64 (memory block 128MB) using
MHP_MEMMAP_DEVICE:

        head->_refcount = 8 sections
        head->private = 4096 vmemmap pages
        tail's->freelist = head

Example 2) We hot-add 1GB on x86_64 using MHP_MEMMAP_MEMBLOCK:

        [at the beginning of each memblock]
        head->_refcount = 1 section
        head->private = 512 vmemmap pages
        tail's->freelist = head

We need the refcount because, when using MHP_MEMMAP_DEVICE, we have to
know how long to defer the call to vmemmap_free(). The thing is that the
first pages of the hot-added range are used to create the memmap mapping,
so we cannot remove those first, otherwise we would blow up when
accessing the other pages. So, since sections are removed sequentially
when we hot-remove a memory range, we wait until we hit the last section,
and then we free the whole range with vmemmap_free(), working backwards.
We know that it is the last section because in every pass we decrease
head->_refcount, and when it reaches 0, we have got to our last section.

We also have to be careful about those pages during online and offline
operations. They are simply skipped, so online keeps them reserved, and
so unusable for any other purpose, and offline ignores them, so they do
not block the offline operation.

In the offline operation we only have to check for one particularity:
depending on how large the hot-added range was, when using
MHP_MEMMAP_DEVICE it can be that one or more memory blocks are filled
with only vmemmap pages. We just need to check for this case and skip
1) isolating and 2) migrating, because those pages do not need to be
migrated anywhere; they are self-hosted.
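
A toy model of that head/tail bookkeeping (a standalone mock, not kernel
code; the field names mirror the layout above, and the sizes match
Example 1):

#include <stdio.h>

struct mock_page {
	unsigned long refcount;		/* head: nr of sections in this altmap */
	unsigned long private;		/* head: nr of vmemmap pages */
	struct mock_page *freelist;	/* tail: pointer to the head */
};

/* Remaining vmemmap pages from 'page' to the end of the range. */
static unsigned long nr_vmemmap_pages(struct mock_page *page)
{
	struct mock_page *head = page->freelist;

	return head->private - (page - head);
}

int main(void)
{
	struct mock_page range[4096];	/* 1GB via MHP_MEMMAP_DEVICE */
	unsigned long i;

	range[0].refcount = 8;		/* 8 x 128MB sections */
	range[0].private = 4096;	/* 4096 vmemmap pages */
	for (i = 0; i < 4096; i++)
		range[i].freelist = &range[0];

	printf("%lu\n", nr_vmemmap_pages(&range[0]));	/* 4096 */
	printf("%lu\n", nr_vmemmap_pages(&range[100]));	/* 3996 */
	return 0;
}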
Signed-off-by: Oscar Salvador
---
 arch/arm64/mm/mmu.c            |   5 +-
 arch/powerpc/mm/init_64.c      |   7 +++
 arch/s390/mm/init.c            |   6 ++
 arch/x86/mm/init_64.c          |  10 +++
 drivers/acpi/acpi_memhotplug.c |   2 +-
 drivers/base/memory.c          |   2 +-
 include/linux/memory_hotplug.h |   6 ++
 include/linux/memremap.h       |   2 +-
 mm/compaction.c                |   7 +++
 mm/memory_hotplug.c            | 138 +++++++++++++++++++++++++++++++++++------
 mm/page_alloc.c                |  22 ++++++-
 mm/page_isolation.c            |  14 ++++-
 mm/sparse.c                    |  93 +++++++++++++++++++++++++++
 13 files changed, 289 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 93ed0df4df79..d4b5661fa6b6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -765,7 +765,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (pmd_none(READ_ONCE(*pmdp))) {
 			void *p = NULL;
 
-			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			if (altmap)
+				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
+			else
+				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
 			if (!p)
 				return -ENOMEM;

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a4e17a979e45..ff9d2c245321 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -289,6 +289,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 
 		if (base_pfn >= alt_start && base_pfn < alt_end) {
 			vmem_altmap_free(altmap, nr_pages);
+		} else if (PageVmemmap(page)) {
+			/*
+			 * runtime vmemmap pages are residing inside the memory
+			 * section so they do not have to be freed anywhere.
+			 */
+			while (PageVmemmap(page))
+				__ClearPageVmemmap(page++);
 		} else if (PageReserved(page)) {
 			/* allocated from bootmem */
 			if (page_size < PAGE_SIZE) {

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ffb81fe95c77..c045411552a3 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -226,6 +226,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	/*
+	 * Physical memory is added only later during the memory online so we
+	 * cannot use the added range at this stage unfortunately.
+	 */
+	restrictions->flags &= ~restrictions->flags;
+
 	if (WARN_ON_ONCE(restrictions->altmap))
 		return -EINVAL;

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 688fb0687e55..00d17b666337 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -874,6 +874,16 @@ static void __meminit free_pagetable(struct page *page, int order)
 	unsigned long magic;
 	unsigned int nr_pages = 1 << order;
 
+	/*
+	 * Runtime vmemmap pages are residing inside the memory section so
+	 * they do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		while (nr_pages--)
+			__ClearPageVmemmap(page++);
+		return;
+	}
+
 	/* bootmem page has reserved flag */
 	if (PageReserved(page)) {
 		__ClearPageReserved(page);

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 860f84e82dd0..3257edb98d90 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -218,7 +218,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	if (node < 0)
 		node = memory_add_physaddr_to_nid(info->start_addr);
 
-	result = __add_memory(node, info->start_addr, info->length, 0);
+	result = __add_memory(node, info->start_addr, info->length, MHP_MEMMAP_DEVICE);
 
 	/*
 	 * If the memory block has been used by the kernel, add_memory()

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index ad9834b8b7f7..e0ac9a3b66f8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -32,7 +32,7 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
 
 #define to_memory_block(dev) container_of(dev, struct memory_block, dev)
 
-static int sections_per_block;
+int sections_per_block;
 
 static inline int base_memory_block_id(int section_nr)
 {

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 6fdbce9d04f9..e28e226c9a20 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -375,4 +375,10 @@ extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_
 		int online_type);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern void mark_vmemmap_pages(struct vmem_altmap *self);
+#else
+static inline void mark_vmemmap_pages(struct vmem_altmap *self) {}
+#endif
 #endif /* __LINUX_MEMORY_HOTPLUG_H */

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1732dea030b2..6de37e168f57 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -16,7 +16,7 @@ struct device;
  * @alloc: track pages consumed, private to vmemmap_populate()
  */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..40697f74b8b4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -855,6 +855,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		nr_scanned++;
 
 		page = pfn_to_page(low_pfn);
+		/*
+		 * Vmemmap pages do not need to be isolated.
+		 */
+		if (PageVmemmap(page)) {
+			low_pfn += get_nr_vmemmap_pages(page) - 1;
+			continue;
+		}
 
 		/*
 		 * Check if the pageblock has already been marked skipped.

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e4e3baa6eaa7..b5106cb75795 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,6 +42,8 @@
 #include "internal.h"
 #include "shuffle.h"
 
+extern int sections_per_block;
+
 /*
  * online_page_callback contains pointer to current page onlining function.
  * Initially it is generic_online_page(). If it is required it could be
@@ -279,6 +281,24 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
+static void mhp_reset_altmap(unsigned long next_pfn,
+			     struct vmem_altmap *altmap)
+{
+	altmap->base_pfn = next_pfn;
+	altmap->alloc = 0;
+}
+
+static void mhp_init_altmap(unsigned long pfn, unsigned long nr_pages,
+			    unsigned long mhp_flags,
+			    struct vmem_altmap *altmap)
+{
+	if (mhp_flags & MHP_MEMMAP_DEVICE)
+		altmap->free = nr_pages;
+	else
+		altmap->free = PAGES_PER_SECTION * sections_per_block;
+	altmap->base_pfn = pfn;
+}
+
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -290,8 +310,17 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	unsigned long i;
 	int start_sec, end_sec, err;
-	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap *altmap;
+	struct vmem_altmap __memblk_altmap = {};
+	unsigned long mhp_flags = restrictions->flags;
+	unsigned long sections_added;
+
+	if (mhp_flags & MHP_VMEMMAP_FLAGS) {
+		mhp_init_altmap(pfn, nr_pages, mhp_flags, &__memblk_altmap);
+		restrictions->altmap = &__memblk_altmap;
+	}
 
+	altmap = restrictions->altmap;
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -308,9 +337,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (err)
 		return err;
 
+	sections_added = 1;
 	start_sec = pfn_to_section_nr(pfn);
 	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
-	for (i = start_sec; i <= end_sec; i++) {
+	for (i = start_sec; i <= end_sec; i++, sections_added++) {
 		unsigned long pfns;
 
 		pfns = min(nr_pages, PAGES_PER_SECTION
@@ -320,9 +350,19 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 			break;
 		pfn += pfns;
 		nr_pages -= pfns;
+
+		if (mhp_flags & MHP_MEMMAP_MEMBLOCK &&
+		    !(sections_added % sections_per_block)) {
+			mark_vmemmap_pages(altmap);
+			mhp_reset_altmap(pfn, altmap);
+		}
 		cond_resched();
 	}
 	vmemmap_populate_print_last();
+
+	if (mhp_flags & MHP_MEMMAP_DEVICE)
+		mark_vmemmap_pages(altmap);
+
 	return err;
 }
 
@@ -642,6 +682,14 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
 
 	while (start < end) {
 		order = min(MAX_ORDER - 1,
 			get_order(PFN_PHYS(end) - PFN_PHYS(start)));
+		/*
+		 * Check if the pfn is aligned to its order.
+		 * If not, we decrement the order until it is,
+		 * otherwise __free_one_page will bug us.
+		 */
+		while (start & ((1 << order) - 1))
+			order--;
+
 		(*online_page_callback)(pfn_to_page(start), order);
 
 		onlined_pages += (1UL << order);
@@ -654,13 +702,30 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	void *arg)
 {
 	unsigned long onlined_pages = *(unsigned long *)arg;
+	unsigned long pfn = start_pfn;
+	unsigned long nr_vmemmap_pages = 0;
 
-	if (PageReserved(pfn_to_page(start_pfn)))
-		onlined_pages += online_pages_blocks(start_pfn, nr_pages);
+	if (PageVmemmap(pfn_to_page(pfn))) {
+		/*
+		 * Do not send vmemmap pages to the page allocator.
+		 */
+		nr_vmemmap_pages = get_nr_vmemmap_pages(pfn_to_page(start_pfn));
+		nr_vmemmap_pages = min(nr_vmemmap_pages, nr_pages);
+		pfn += nr_vmemmap_pages;
+		if (nr_vmemmap_pages == nr_pages)
+			/*
+			 * If the entire range contains only vmemmap pages,
+			 * there are no pages left for the page allocator.
+			 */
+			goto skip_online;
+	}
+	if (PageReserved(pfn_to_page(pfn)))
+		onlined_pages += online_pages_blocks(pfn, nr_pages - nr_vmemmap_pages);
+skip_online:
 	online_mem_sections(start_pfn, start_pfn + nr_pages);
 
-	*(unsigned long *)arg = onlined_pages;
+	*(unsigned long *)arg = onlined_pages + nr_vmemmap_pages;
 	return 0;
 }
 
@@ -1051,6 +1116,23 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+static bool mhp_check_correct_flags(unsigned long flags)
+{
+	if (flags & MHP_VMEMMAP_FLAGS) {
+		if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) {
+			WARN(1, "Vmemmap capability can only be used on "
+				"CONFIG_SPARSEMEM_VMEMMAP. Ignoring flags.\n");
+			return false;
+		}
+		if ((flags & MHP_VMEMMAP_FLAGS) == MHP_VMEMMAP_FLAGS) {
+			WARN(1, "Both MHP_MEMMAP_DEVICE and MHP_MEMMAP_MEMBLOCK "
+				"were passed. Ignoring flags.\n");
+			return false;
+		}
+	}
+	return true;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1086,6 +1168,9 @@ int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 		goto error;
 	new_node = ret;
 
+	if (mhp_check_correct_flags(flags))
+		restrictions.flags = flags;
+
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &restrictions);
 	if (ret < 0)
@@ -1518,12 +1603,14 @@ static int __ref __offline_pages(unsigned long start_pfn,
 {
 	unsigned long pfn, nr_pages;
 	unsigned long offlined_pages = 0;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
 	struct zone *zone;
 	struct memory_notify arg;
 	char *reason;
+	bool skip = false;
 
 	mem_hotplug_begin();
 
@@ -1540,15 +1627,24 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	node = zone_to_nid(zone);
 	nr_pages = end_pfn - start_pfn;
 
-	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
-				       MIGRATE_MOVABLE,
-				       SKIP_HWPOISON | REPORT_FAILURE);
-	if (ret < 0) {
-		reason = "failure to isolate range";
-		goto failed_removal;
+	if (PageVmemmap(pfn_to_page(start_pfn))) {
+		nr_vmemmap_pages = get_nr_vmemmap_pages(pfn_to_page(start_pfn));
+		nr_vmemmap_pages = min(nr_vmemmap_pages, nr_pages);
+		if (nr_vmemmap_pages == nr_pages)
+			skip = true;
+	}
+
+	if (!skip) {
+		/* set above range as isolated */
+		ret = start_isolate_page_range(start_pfn, end_pfn,
+					       MIGRATE_MOVABLE,
+					       SKIP_HWPOISON | REPORT_FAILURE);
+		if (ret < 0) {
+			reason = "failure to isolate range";
+			goto failed_removal;
+		}
+		nr_isolate_pageblock = ret;
 	}
-	nr_isolate_pageblock = ret;
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1561,6 +1657,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
+	if (skip)
+		goto skip_migration;
+
 	do {
 		for (pfn = start_pfn; pfn;) {
 			if (signal_pending(current)) {
@@ -1601,7 +1700,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	   We cannot do rollback at this point.
 	 */
 	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
 			      &offlined_pages, offline_isolated_pages_cb);
-	pr_info("Offlined Pages %ld\n", offlined_pages);
+
+skip_migration:
+	pr_info("Offlined Pages %ld\n", offlined_pages + nr_vmemmap_pages);
 	/*
 	 * Onlining will reset pagetype flags and makes migrate type
 	 * MOVABLE, so just need to decrease the number of isolated
@@ -1612,11 +1713,12 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
-	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
-	zone->present_pages -= offlined_pages;
+	if (offlined_pages)
+		adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+	zone->present_pages -= offlined_pages + nr_vmemmap_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages -= offlined_pages;
+	zone->zone_pgdat->node_present_pages -= offlined_pages + nr_vmemmap_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
 	init_per_zone_wmark_min();
@@ -1645,7 +1747,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 failed_removal:
 	pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n",
-		 (unsigned long long) start_pfn << PAGE_SHIFT,
+		 (unsigned long long) (start_pfn - nr_vmemmap_pages) << PAGE_SHIFT,
 		 ((unsigned long long) end_pfn << PAGE_SHIFT) - 1, reason);
 	/* pushback to free area */

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b3266d63521..7a73a06c5730 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1282,9 +1282,14 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
-	mm_zero_struct_page(page);
+	if (!__PageVmemmap(page)) {
+		/*
+		 * Vmemmap pages need to preserve their state.
+		 */
+		mm_zero_struct_page(page);
+		init_page_count(page);
+	}
 	set_page_links(page, zone, nid, pfn);
-	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
@@ -8143,6 +8148,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		/*
+		 * Vmemmap pages are not needed to be moved around.
+		 */
+		if (PageVmemmap(page)) {
+			iter += get_nr_vmemmap_pages(page) - 1;
+			continue;
+		}
+
 		if (PageReserved(page))
 			goto unmovable;
 
@@ -8510,6 +8523,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
+
+		if (PageVmemmap(page)) {
+			pfn += get_nr_vmemmap_pages(page);
+			continue;
+		}
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index e3638a5bafff..128c47a27925 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -146,7 +146,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 static inline struct page *
 __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 {
-	int i;
+	unsigned long i;
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -154,6 +154,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 		page = pfn_to_online_page(pfn + i);
 		if (!page)
 			continue;
+		if (PageVmemmap(page)) {
+			i += get_nr_vmemmap_pages(page) - 1;
+			continue;
+		}
 		return page;
 	}
 	return NULL;
@@ -268,6 +272,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			continue;
 		}
 		page = pfn_to_page(pfn);
+		/*
+		 * Vmemmap pages are not isolated. Skip them.
+		 */
+		if (PageVmemmap(page)) {
+			pfn += get_nr_vmemmap_pages(page);
+			continue;
+		}
+
 		if (PageBuddy(page))
 			/*
 			 * If the page is on a free list, it has to be on

diff --git a/mm/sparse.c b/mm/sparse.c
index b77ca21a27a4..04b395fb4463 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -635,6 +635,94 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
+void mark_vmemmap_pages(struct vmem_altmap *self)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
+	unsigned long i;
+	struct page *head;
+
+	if (!nr_pages)
+		return;
+
+	pr_debug("%s: marking %px - %px as Vmemmap (%ld pages)\n",
+		 __func__,
+		 pfn_to_page(pfn),
+		 pfn_to_page(pfn + nr_pages - 1),
+		 nr_pages);
+
+	/*
+	 * All allocations for the memory hotplug are the same sized so align
+	 * should be 0.
+	 */
+	WARN_ON(self->align);
+
+	/*
+	 * Layout of vmemmap pages:
+	 * [Head->refcount] : Nr of sections used by this altmap
+	 * [Head->private]  : Nr of vmemmap pages
+	 * [Tail->freelist] : Pointer to the head page
+	 */
+
+	/*
+	 * Head, first vmemmap page
+	 */
+	head = pfn_to_page(pfn);
+	for (i = 0; i < nr_pages; i++, pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		mm_zero_struct_page(page);
+		__SetPageVmemmap(page);
+		page->freelist = head;
+		init_page_count(page);
+	}
+	set_page_count(head, (int)nr_sects);
+	set_page_private(head, nr_pages);
+}
+
+/*
+ * If the range we are trying to remove was hot-added with vmemmap pages
+ * using MHP_MEMMAP_DEVICE, we need to keep track of it to know how long
+ * we have to defer the freeing.
+ * Since sections are removed sequentially in __remove_pages()->
+ * __remove_section(), we just wait until we hit the last section.
+ * Once that happens, we can trigger free_deferred_vmemmap_range() to
+ * actually free the whole memory range.
+
+/*
+ * If the range we are trying to remove was hot-added with vmemmap pages
+ * using MHP_MEMMAP_DEVICE, we need to keep track of it so we know how
+ * much of the free has to be deferred.
+ * Since sections are removed sequentially in __remove_pages()->
+ * __remove_section(), we just wait until we hit the last section.
+ * Once that happens, we can trigger free_deferred_vmemmap_range() to
+ * actually free the whole memory range.
+ */
+static struct page *head_vmemmap_page;
+static bool freeing_vmemmap_range;
+
+static inline bool vmemmap_dec_and_test(void)
+{
+	return page_ref_dec_and_test(head_vmemmap_page);
+}
+
+static void free_deferred_vmemmap_range(unsigned long start,
+					unsigned long end)
+{
+	unsigned long nr_pages = end - start;
+	unsigned long first_section = (unsigned long)head_vmemmap_page;
+
+	while (start >= first_section) {
+		vmemmap_free(start, end, NULL);
+		end = start;
+		start -= nr_pages;
+	}
+	head_vmemmap_page = NULL;
+	freeing_vmemmap_range = false;
+}
+
+static void deferred_vmemmap_free(unsigned long start, unsigned long end)
+{
+	if (!freeing_vmemmap_range) {
+		freeing_vmemmap_range = true;
+		head_vmemmap_page = (struct page *)start;
+	}
+
+	if (vmemmap_dec_and_test())
+		free_deferred_vmemmap_range(start, end);
+}
+
 static struct page *populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
@@ -647,6 +735,11 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);

+	if (PageVmemmap((struct page *)start) || freeing_vmemmap_range) {
+		deferred_vmemmap_free(start, end);
+		return;
+	}
+
 	vmemmap_free(start, end, altmap);
 }

 static void free_map_bootmem(struct page *memmap)
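To see how the deferred free fires: depopulate_section_memmap() runs once per
removed section, each call decrements the head page's refcount (set to the
number of sections in mark_vmemmap_pages()), and only the last call walks the
whole range backwards in section-sized steps. Note that start and end are
virtual addresses of the memmap, so the nr_pages local in
free_deferred_vmemmap_range() is really a size in bytes. A schematic timeline
for a three-section range:

/*
 * Schematic only. S0..S2 are the memmap chunks of three sections,
 * removed in order; the head refcount starts at 3.
 *
 * depopulate(S0): freeing_vmemmap_range = true, head = start of S0,
 *                 refcount 3 -> 2, nothing freed yet.
 * depopulate(S1): refcount 2 -> 1, nothing freed yet.
 * depopulate(S2): refcount 1 -> 0, free_deferred_vmemmap_range() runs:
 *                 vmemmap_free(S2), vmemmap_free(S1), vmemmap_free(S0),
 *                 stepping start back one chunk per iteration until it
 *                 falls below the recorded head address.
 */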
From patchwork Tue Jun 25 07:52:27 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11014955
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, pasha.tatashin@soleen.com,
 Jonathan.Cameron@huawei.com, david@redhat.com, anshuman.khandual@arm.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v2 5/5] mm,memory_hotplug: Allow userspace to enable/disable
 vmemmap
Date: Tue, 25 Jun 2019 09:52:27 +0200
Message-Id: <20190625075227.15193-6-osalvador@suse.de>
In-Reply-To: <20190625075227.15193-1-osalvador@suse.de>
References: <20190625075227.15193-1-osalvador@suse.de>

It seems there are users out there who want to expose all hotpluggable
memory to userspace, so this implements a toggling mechanism for those
who want to disable the use of vmemmap pages. By default, the vmemmap
pages mechanism is enabled.

Signed-off-by: Oscar Salvador
---
 drivers/base/memory.c          | 33 +++++++++++++++++++++++++++++++++
 include/linux/memory_hotplug.h |  3 +++
 mm/memory_hotplug.c            |  6 +++++-
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index e0ac9a3b66f8..6fca2c96cc08 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -573,6 +573,35 @@ static DEVICE_ATTR_WO(soft_offline_page);
 static DEVICE_ATTR_WO(hard_offline_page);
 #endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static ssize_t vmemmap_hotplug_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	if (vmemmap_enabled)
+		return sprintf(buf, "enabled\n");
+	else
+		return sprintf(buf, "disabled\n");
+}
+
+static ssize_t vmemmap_hotplug_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (sysfs_streq(buf, "enable"))
+		vmemmap_enabled = true;
+	else if (sysfs_streq(buf, "disable"))
+		vmemmap_enabled = false;
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(vmemmap_hotplug);
+#endif
+
 /*
  * Note that phys_device is optional.  It is here to allow for
  * differentiation between which *physical* devices each
@@ -799,6 +828,10 @@ static struct attribute *memory_root_attrs[] = {
 	&dev_attr_hard_offline_page.attr,
 #endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	&dev_attr_vmemmap_hotplug.attr,
+#endif
+
 	&dev_attr_block_size_bytes.attr,
 	&dev_attr_auto_online_blocks.attr,
 	NULL
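Since the attribute is added to memory_root_attrs, it should surface under
the memory subsystem root; the path below assumes
/sys/devices/system/memory/, which is where that root normally lives (an
assumption of this sketch, not stated in the patch). A small user-space
sketch; CAP_SYS_ADMIN is required for the write:

/* Hypothetical usage sketch for the vmemmap_hotplug knob. */
#include <stdio.h>

#define VMEMMAP_KNOB "/sys/devices/system/memory/vmemmap_hotplug"

int main(void)
{
	char state[16] = "";
	FILE *f = fopen(VMEMMAP_KNOB, "w");

	if (f) {
		/* The store handler only accepts "enable" or "disable". */
		fputs("disable", f);
		fclose(f);
	}

	f = fopen(VMEMMAP_KNOB, "r");
	if (f && fgets(state, sizeof(state), f))
		printf("vmemmap_hotplug: %s", state); /* "disabled\n" */
	if (f)
		fclose(f);
	return 0;
}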
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e28e226c9a20..94b4adc1a0ba 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -131,6 +131,9 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions);
 extern u64 max_mem_size;

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern bool vmemmap_enabled;
+#endif
 extern bool memhp_auto_online;
 /* If movable_node boot option specified */
 extern bool movable_node_enabled;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b5106cb75795..32ee6fb7d3bf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -70,6 +70,10 @@ void put_online_mems(void)

 bool movable_node_enabled = false;

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+bool vmemmap_enabled __read_mostly = true;
+#endif
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 bool memhp_auto_online;
 #else
@@ -1168,7 +1172,7 @@ int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 		goto error;
 	new_node = ret;

-	if (mhp_check_correct_flags(flags))
+	if (vmemmap_enabled && mhp_check_correct_flags(flags))
 		restrictions.flags = flags;

 	/* call arch's memory hotadd */
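Summing up the gating in the last hunk, in comment form; the
MHP_MEMMAP_DEVICE naming is taken from the mm/sparse.c comment earlier in
the series, and mhp_check_correct_flags() is assumed to validate it:

/*
 * Effective policy in add_memory_resource(), schematically:
 *
 *   vmemmap_enabled   caller passes flags    memmap placement
 *   ---------------   -------------------    ----------------------------
 *   true              MHP_MEMMAP_DEVICE      vmemmap from hot-added range
 *   false             MHP_MEMMAP_DEVICE      flags dropped, regular alloc
 *   any               none                   regular allocation
 */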