From patchwork Tue Oct 2 15:00:26 2018
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10623821
From: Oscar Salvador
To: linux-mm@kvack.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, yasu.isimatu@gmail.com,
 rppt@linux.vnet.ibm.com, malat@debian.org, linux-kernel@vger.kernel.org,
 pavel.tatashin@microsoft.com, jglisse@redhat.com, Jonathan.Cameron@huawei.com,
 rafael@kernel.org, david@redhat.com, dave.jiang@intel.com, Oscar Salvador
Subject: [RFC PATCH v3 2/5] mm/memory_hotplug: Create add/del_device_memory functions
Date: Tue, 2 Oct 2018 17:00:26 +0200
Message-Id: <20181002150029.23461-3-osalvador@techadventures.net>
In-Reply-To: <20181002150029.23461-1-osalvador@techadventures.net>
References: <20181002150029.23461-1-osalvador@techadventures.net>

From: Oscar Salvador

HMM/devm handle memory-hotplug in a particular way: they do not go through
the common path, so they call neither offline_pages() nor online_pages().
The operations they perform are the following:

1) Create the linear mapping, in case the memory is not private
2) Initialize the pages and add the sections
3) Move the pages to ZONE_DEVICE, in case the memory is not private

Due to this particular handling of hot-add/remove memory in HMM/devm, it
would be nice to provide helper functions in order to make this cleaner,
and not populate other regions with code that belongs to memory-hotplug.

The helpers are named:

del_device_memory
add_device_memory

add_device_memory is in charge of:

a) calling either arch_add_memory() or add_pages(), depending on whether
   we want a linear mapping
b) onlining the memory sections that correspond to the pfn range
c) calling move_pfn_range_to_zone() with ZONE_DEVICE as the zone, to
   expand zone/pgdat spanned pages and initialize its pages

del_device_memory, on the other hand, is in charge of:

a) offlining the memory sections that correspond to the pfn range
b) calling shrink_zone_pgdat_pages(), which shrinks node/zone spanned
   pages
c) calling either arch_remove_memory() or __remove_pages(), depending on
   whether we need to tear down the linear mapping or not

To split up the patches and ease the review, this patch only implements
case a) for add_device_memory() and case c) for del_device_memory(); the
other cases will be added in the next patch.

Since [1], HMM uses devm_memremap_pages() and devm_memremap_pages_release()
instead of its own functions, so these two helpers only have to be called
from devm code:

add_device_memory:
        - devm_memremap_pages()

del_device_memory:
        - devm_memremap_pages_release()

This patch also moves init_currently_empty_zone() under the protection of
span_seqlock. The zone locking rules state the following:

 * Locking rules:
 *
 * zone_start_pfn and spanned_pages are protected by span_seqlock.
 * It is a seqlock because it has to be read outside of zone->lock,
 * and it is done in the main allocator path.  But, it is written
 * quite infrequently.

Since init_currently_empty_zone() changes zone_start_pfn, it makes sense
to have that change protected by the same lock (see the reader-side sketch
below).

[1] https://patchwork.kernel.org/patch/10598657/
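For reference, the reader side that this write-side locking pairs with looks
roughly like the sketch below, modeled on page_outside_zone_boundaries() in
mm/page_alloc.c. pfn_in_zone_span() is a made-up name for illustration;
zone_span_seqbegin(), zone_span_seqretry() and zone_spans_pfn() are the
existing helpers:

/*
 * Sketch, not part of this patch: readers of zone_start_pfn/spanned_pages
 * retry if a writer holding zone_span_writelock() (such as
 * move_pfn_range_to_zone() after this change) updated the span meanwhile.
 */
static bool pfn_in_zone_span(struct zone *zone, unsigned long pfn)
{
	unsigned int seq;
	bool in_span;

	do {
		seq = zone_span_seqbegin(zone);
		in_span = zone_spans_pfn(zone, pfn);
	} while (zone_span_seqretry(zone, seq));

	return in_span;
}

With init_currently_empty_zone() outside the writelock, such a reader could
observe a half-initialized zone_start_pfn/spanned_pages pair without ever
retrying.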
Signed-off-by: Oscar Salvador
---
 include/linux/memory_hotplug.h |  7 ++++++
 kernel/memremap.c              | 48 ++++++++++++++------------------------
 mm/memory_hotplug.c            | 53 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 74 insertions(+), 34 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f9fc35819e65..2f7b8eb4cddb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -117,6 +117,13 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap, bool want_memblock);
 
+#ifdef CONFIG_ZONE_DEVICE
+extern int del_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool private_mem);
+extern int add_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool private_mem);
+#endif
+
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
 			    unsigned long nr_pages, struct vmem_altmap *altmap,
diff --git a/kernel/memremap.c b/kernel/memremap.c
index fe54bba2d7e2..0f168a75c5b0 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -120,8 +120,11 @@ static void devm_memremap_pages_release(void *data)
 	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
 	resource_size_t align_start, align_size;
+	struct vmem_altmap *altmap = pgmap->altmap_valid ?
+					&pgmap->altmap : NULL;
 	unsigned long pfn;
 	int nid;
+	bool private_mem;
 
 	pgmap->kill(pgmap->ref);
 	for_each_device_pfn(pfn, pgmap)
@@ -133,17 +136,14 @@ static void devm_memremap_pages_release(void *data)
 		- align_start;
 	nid = dev_to_node(dev);
 
-	mem_hotplug_begin();
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = align_start >> PAGE_SHIFT;
-		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				align_size >> PAGE_SHIFT, NULL);
-	} else {
-		arch_remove_memory(nid, align_start, align_size,
-				pgmap->altmap_valid ? &pgmap->altmap : NULL);
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+		private_mem = true;
+	else
+		private_mem = false;
+
+	del_device_memory(nid, align_start, align_size, altmap, private_mem);
+	if (!private_mem)
 		kasan_remove_zero_shadow(__va(align_start), align_size);
-	}
-	mem_hotplug_done();
 
 	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
 	pgmap_radix_release(res, -1);
@@ -180,6 +180,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	pgprot_t pgprot = PAGE_KERNEL;
 	int error, nid, is_ram;
 	struct dev_pagemap *conflict_pgmap;
+	bool private_mem;
 
 	if (!pgmap->ref || !pgmap->kill)
 		return ERR_PTR(-EINVAL);
@@ -239,8 +240,6 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (error)
 		goto err_pfn_remap;
 
-	mem_hotplug_begin();
-
 	/*
 	 * For device private memory we call add_pages() as we only need to
 	 * allocate and initialize struct page for the device memory. More-
@@ -252,29 +251,16 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 * the CPU, we do want the linear mapping and thus use
 	 * arch_add_memory().
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		error = add_pages(nid, align_start >> PAGE_SHIFT,
-			align_size >> PAGE_SHIFT, NULL, false);
-	} else {
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+		private_mem = true;
+	else {
 		error = kasan_add_zero_shadow(__va(align_start), align_size);
-		if (error) {
-			mem_hotplug_done();
+		if (error)
 			goto err_kasan;
-		}
-
-		error = arch_add_memory(nid, align_start, align_size, altmap,
-				false);
-	}
-
-	if (!error) {
-		struct zone *zone;
-
-		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
-		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, altmap);
+		private_mem = false;
 	}
 
-	mem_hotplug_done();
+	error = add_device_memory(nid, align_start, align_size, altmap, private_mem);
 
 	if (error)
 		goto err_add_memory;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 11b7dcf83323..72928808c5e9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
@@ -1904,4 +1903,52 @@ void remove_memory(int nid, u64 start, u64 size)
 	unlock_device_hotplug();
 }
 EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_ZONE_DEVICE
+int del_device_memory(int nid, unsigned long start, unsigned long size,
+			struct vmem_altmap *altmap, bool private_mem)
+{
+	int ret;
+	unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone = page_zone(pfn_to_page(start_pfn));
+
+	mem_hotplug_begin();
+
+	if (private_mem)
+		ret = __remove_pages(zone, start_pfn, nr_pages, NULL);
+	else
+		ret = arch_remove_memory(nid, start, size, altmap);
+
+	mem_hotplug_done();
+
+	return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTREMOVE */
+
+#ifdef CONFIG_ZONE_DEVICE
+int add_device_memory(int nid, unsigned long start, unsigned long size,
+			struct vmem_altmap *altmap, bool private_mem)
+{
+	int ret;
+	unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	mem_hotplug_begin();
+
+	if (private_mem)
+		ret = add_pages(nid, start_pfn, nr_pages, NULL, false);
+	else
+		ret = arch_add_memory(nid, start, size, altmap, false);
+
+	mem_hotplug_done();
+
+	if (!ret) {
+		struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+
+		move_pfn_range_to_zone(zone, start_pfn, nr_pages, altmap);
+	}
+
+	return ret;
+}
+#endif
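
For illustration, a rough sketch of how a devm user ends up in the new
helpers after this series. Everything prefixed example_ is made up for the
sketch; devm_memremap_pages() and the dev_pagemap fields are the ones used
in the hunks above, and edev->ref is assumed to have been set up with
percpu_ref_init() beforehand:

#include <linux/memremap.h>
#include <linux/percpu-refcount.h>

struct example_dev {
	struct dev_pagemap pgmap;
	struct percpu_ref ref;
	struct resource res;		/* physical range to hot-add */
};

static void example_kill_ref(struct percpu_ref *ref)
{
	percpu_ref_kill(ref);
}

static int example_probe(struct device *dev, struct example_dev *edev)
{
	void *addr;

	/*
	 * MEMORY_DEVICE_PRIVATE means private_mem == true in
	 * add_device_memory(), so add_pages() is used and no linear
	 * mapping is created.
	 */
	edev->pgmap.type = MEMORY_DEVICE_PRIVATE;
	edev->pgmap.res = edev->res;
	edev->pgmap.ref = &edev->ref;
	edev->pgmap.kill = example_kill_ref;

	/*
	 * devm_memremap_pages() now calls add_device_memory(), which takes
	 * mem_hotplug_begin()/mem_hotplug_done() internally; on teardown,
	 * devm_memremap_pages_release() calls del_device_memory().
	 */
	addr = devm_memremap_pages(dev, &edev->pgmap);

	return PTR_ERR_OR_ZERO(addr);
}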