From patchwork Mon Aug 17 21:45:56 2015
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 7027281
Date: Mon, 17 Aug 2015 17:45:56 -0400
From: Jerome Glisse
To: Dan Williams
Subject: Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory"
Message-ID: <20150817214554.GA5976@gmail.com>
References: <20150813031253.36913.29580.stgit@otcpl-skl-sds-2.jf.intel.com>
 <20150813035005.36913.77364.stgit@otcpl-skl-sds-2.jf.intel.com>
 <20150814213714.GA3265@gmail.com> <20150814220605.GB3265@gmail.com>
Cc: Rik van Riel, "linux-nvdimm@lists.01.org", Dave Hansen, david,
 "linux-kernel@vger.kernel.org", Christoph Hellwig, Linux MM,
 Ingo Molnar, Mel Gorman, "H. Peter Anvin",
 "torvalds@linux-foundation.org", Ingo Molnar
List-Id: "Linux-nvdimm developer list."

On Fri, Aug 14, 2015 at 07:11:27PM -0700, Dan Williams wrote:
> On Fri, Aug 14, 2015 at 3:33 PM, Dan Williams wrote:
> > On Fri, Aug 14, 2015 at 3:06 PM, Jerome Glisse wrote:
> >> On Fri, Aug 14, 2015 at 02:52:15PM -0700, Dan Williams wrote:
> >>> On Fri, Aug 14, 2015 at 2:37 PM, Jerome Glisse wrote:
> >>> > On Wed, Aug 12, 2015 at 11:50:05PM -0400, Dan Williams wrote:
> > [..]
> >>> > What is the rationale for not updating max_pfn, max_low_pfn, ... ?
> >>> >
> >>>
> >>> The idea is that this memory is not meant to be available to the page
> >>> allocator and should not count as new memory capacity. We're only
> >>> hotplugging it to get struct page coverage.
> >>
> >> But relying on max_pfn staying smaller than first_dev_pfn sounds bogus
> >> to me. For instance, you might plug in a device that registers device
> >> memory, and then some regular memory might be hotplugged, effectively
> >> updating max_pfn to a value bigger than first_dev_pfn.
> >>
> >
> > True.
> >
> >> Also, I do not think that the buddy allocator uses max_pfn or
> >> max_low_pfn to decide whether a page/zone is considered for allocation.
> >
> > Yes, I took it out with no effect. I'll investigate further whether
> > we should be touching those variables or not for this new usage.
>
> Although it does not offer perfect protection if device memory is at a
> physically lower address than RAM, skipping the update of these
> variables does seem to be what we want.
> For example, /dev/mem would
> fail to allow write access to persistent memory if it fails a
> valid_phys_addr_range() check. Since /dev/mem does not know how to
> write to PMEM in a reliably persistent way, it should not treat a
> PMEM-pfn like RAM.

Attached is a patch that should keep ZONE_DEVICE out of consideration
by the buddy allocator. You might also want to keep the pages reserved
rather than freeing them into the zone; you could replace
generic_online_page() using set_online_page_callback() while
hotplugging device memory.

Regarding /dev/mem, I would not worry about highmem, as /dev/mem is
already broken with respect to memory holes that might exist (at least
that is my understanding). Alternatively, if you really care about
/dev/mem, you could add an arch-specific valid_phys_addr_range() that
checks for a valid zone.

Cheers,
Jérôme

From 45976e1186eee45ecb277fe5293a7cfa7466d740 Mon Sep 17 00:00:00 2001
From: Jérôme Glisse
Date: Mon, 17 Aug 2015 17:31:27 -0400
Subject: [PATCH] mm/ZONE_DEVICE: Keep ZONE_DEVICE out of allocation zonelist.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Memory inside a ZONE_DEVICE zone should never be considered by the buddy
allocator, and thus such a zone should never be added to any zonelist.
This patch does just that.

Signed-off-by: Jérôme Glisse
---
 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef19f22..f3e26de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,6 +3834,13 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
+		/*
+		 * The device zone is special memory and should never be
+		 * considered for regular allocation. Pages in the device
+		 * zone are expected to be allocated by other means.
+		 */
+		if (is_dev_zone(zone))
+			continue;
 		if (populated_zone(zone)) {
 			zoneref_set_zone(zone,
 				&zonelist->_zonerefs[nr_zones++]);
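
The zonelist change in the patch above can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct zone`, `struct node`, the three-entry `enum zone_type`, and `build_zonelist_model()` are hypothetical simplifications, and the model's `is_dev_zone()` reads an `idx` field instead of using the kernel's zone_idx(). It keeps the same do/while shape as build_zonelists_node(), which shows why `continue` is safe there: in a do/while it jumps to the `while (zone_type)` test, and zone_type has already been decremented, so the walk still terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's zone types. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_DEVICE, MAX_NR_ZONES };

struct zone {
	enum zone_type idx;           /* position of this zone in its node */
	unsigned long present_pages;  /* 0 means the zone is unpopulated */
};

struct node {
	struct zone node_zones[MAX_NR_ZONES];
};

static bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

static bool is_dev_zone(const struct zone *zone)
{
	return zone->idx == ZONE_DEVICE;
}

/*
 * Models the loop in build_zonelists_node(): walk zones from the highest
 * index to the lowest, append populated ones, and skip ZONE_DEVICE as the
 * patch does. Returns the updated number of zonelist entries.
 */
static int build_zonelist_model(struct node *pgdat,
				const struct zone **zonelist, int nr_zones)
{
	enum zone_type zone_type = MAX_NR_ZONES;
	struct zone *zone;

	assert(pgdat != NULL && zonelist != NULL);

	do {
		zone_type--;
		zone = pgdat->node_zones + zone_type;
		/* In a do/while, 'continue' jumps straight to the condition. */
		if (is_dev_zone(zone))
			continue;
		if (populated_zone(zone))
			zonelist[nr_zones++] = zone;
	} while (zone_type);

	return nr_zones;
}
```

With a node whose ZONE_DEVICE is populated, the model returns only the ZONE_NORMAL and ZONE_DMA entries, highest first; the device zone is never linked in even though populated_zone() alone would accept it.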
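
The suggestion to keep device pages reserved by swapping the online-page callback can be sketched as another userspace model. The `_model` functions, `online_page_cb`, and `struct page_model` are hypothetical. The model assumes the semantics of set_online_page_callback() and restore_online_page_callback() in mm/memory_hotplug.c of this period: the setter only replaces the default generic_online_page(), and the restorer only undoes a matching replacement. The real generic_online_page() also updates memory counters, which the model omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page: only the two states relevant to onlining. */
struct page_model {
	bool reserved;      /* still reserved, not owned by the allocator */
	bool on_free_list;  /* released to the (modelled) buddy allocator */
};

typedef void (*online_page_cb)(struct page_model *page);

/* Models the default path: clear the reserved state and free the page. */
static void generic_online_page_model(struct page_model *page)
{
	page->reserved = false;
	page->on_free_list = true;
}

/* Device-memory path: keep the page reserved and out of the allocator. */
static void device_online_page_model(struct page_model *page)
{
	page->reserved = true;
	page->on_free_list = false;
}

static online_page_cb online_page_callback = generic_online_page_model;

/* Only the default callback may be replaced; otherwise fail with -EINVAL. */
static int set_online_page_callback_model(online_page_cb cb)
{
	if (online_page_callback != generic_online_page_model)
		return -EINVAL;
	online_page_callback = cb;
	return 0;
}

/* Restore the default only if 'cb' is the callback currently installed. */
static int restore_online_page_callback_model(online_page_cb cb)
{
	if (online_page_callback != cb)
		return -EINVAL;
	online_page_callback = generic_online_page_model;
	return 0;
}

/* Onlines 'nr' pages through whichever callback is currently installed. */
static void online_pages_model(struct page_model *pages, size_t nr)
{
	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		online_page_callback(&pages[i]);
}
```

In this model a driver installs its callback just before hotplugging device memory and restores the default right afterwards, so ordinary RAM onlined later still reaches the allocator; a second, competing replacement is refused while the first is active.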
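
The /dev/mem discussion above contrasts an upper-bound check with a zone-aware one. The model below shows why a max_pfn bound alone is fragile once regular memory is hotplugged above device memory, the scenario raised earlier in the thread. `struct pfn_range`, `below_max_pfn()`, and `range_is_ordinary_ram()` are hypothetical names for illustration; this is not the arch valid_phys_addr_range() hook, which takes a physical address and a byte count rather than pfns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one physical range; not a kernel type. */
struct pfn_range {
	unsigned long start_pfn;  /* first pfn in the range */
	unsigned long end_pfn;    /* one past the last pfn */
	bool is_device;           /* true if the range is ZONE_DEVICE memory */
};

/* Bound-style check: treats every pfn below max_pfn as ordinary RAM. */
static bool below_max_pfn(unsigned long pfn, unsigned long max_pfn)
{
	return pfn < max_pfn;
}

/*
 * Zone-aware check: [pfn, pfn + nr) is accepted only if every pfn lies in
 * a known, non-device range. Holes and device memory are both rejected.
 */
static bool range_is_ordinary_ram(const struct pfn_range *ranges, size_t n,
				  unsigned long pfn, unsigned long nr)
{
	assert(ranges != NULL || n == 0);

	for (unsigned long p = pfn; p < pfn + nr; p++) {
		bool found = false;

		for (size_t i = 0; i < n; i++) {
			if (p >= ranges[i].start_pfn && p < ranges[i].end_pfn) {
				if (ranges[i].is_device)
					return false;
				found = true;
				break;
			}
		}
		if (!found)
			return false;  /* a hole: no range backs this pfn */
	}
	return true;
}
```

With RAM at pfns [0, 0x100000), device memory at [0x100000, 0x140000), and RAM hotplugged at [0x140000, 0x180000), max_pfn ends up at 0x180000. The bound check then accepts the first device pfn, while the zone-aware check rejects it and any request that straddles into device memory.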