From patchwork Fri Nov 30 17:59:19 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 10706955
From: David Hildenbrand
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-sh@vger.kernel.org, linux-acpi@vger.kernel.org,
    devel@linuxdriverproject.org,
    xen-devel@lists.xenproject.org, x86@kernel.org, David Hildenbrand,
    Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Ingo Molnar,
    Pavel Tatashin, Stephen Rothwell, Andrew Banman, "mike.travis@hpe.com",
    Oscar Salvador, Dave Hansen, Michal Hocko, Michal Suchánek,
    Vitaly Kuznetsov, Dan Williams, Pavel Tatashin, Martin Schwidefsky,
    Heiko Carstens
Subject: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
Date: Fri, 30 Nov 2018 18:59:19 +0100
Message-Id: <20181130175922.10425-2-david@redhat.com>
In-Reply-To: <20181130175922.10425-1-david@redhat.com>
References: <20181130175922.10425-1-david@redhat.com>

Memory onlining should always be handled by user space, because only user
space knows which use cases it wants to satisfy. For example, memory might
be onlined to the MOVABLE zone even if it can never be removed from the
system, e.g. to make the use of huge pages more reliable.

However, to implement such rules (especially default rules in
distributions), user space needs more information about the memory that
was added. E.g. on x86 we want to online memory provided by balloon devices
(e.g. XEN, Hyper-V) differently (-> will not be unplugged by offlining the
whole block) than ordinary DIMMs (-> might eventually be unplugged by
offlining the whole block). This might also become relevant for other
architectures.

Also, udev rules right now check if running on s390x and treat all added
memory blocks as standby memory (-> don't online automatically). As soon as
we support other memory hotplug mechanisms (e.g.
virtio-mem), these checks would have to get more involved (e.g. also check
if running under KVM) but would eventually also be wrong (e.g. if KVM ever
supports standby memory, we are doomed).

I decided to allow specifying the type of memory that is getting added to
the system. Let's start with two types, BOOT and UNSPECIFIED, to get the
basic infrastructure running. We'll introduce and use further types in
follow-up patches. For now, we temporarily classify any hotplugged memory
as UNSPECIFIED (a type that will eventually be dropped).

Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Pavel Tatashin
Cc: Stephen Rothwell
Cc: Andrew Banman
Cc: "mike.travis@hpe.com"
Cc: Oscar Salvador
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Michal Suchánek
Cc: Vitaly Kuznetsov
Cc: Dan Williams
Cc: Pavel Tatashin
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: David Hildenbrand
---
 drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
 include/linux/memory.h | 27 +++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0c290f86ab20..17f2985c07c5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	ssize_t len = 0;
+
+	switch (mem->type) {
+	case MEMORY_BLOCK_UNSPECIFIED:
+		len = sprintf(buf, "unspecified\n");
+		break;
+	case MEMORY_BLOCK_BOOT:
+		len = sprintf(buf, "boot\n");
+		break;
+	default:
+		len = sprintf(buf, "ERROR-UNKNOWN-%d\n",
+				mem->type);
+		WARN_ON(1);
+		break;
+	}
+
+	return len;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 		unsigned long nr_pages, int online_type,
@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+static DEVICE_ATTR_RO(type);
 
 /*
  * Block size attribute stuff
@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_phys_device.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_type.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
 }
 
 static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+			     struct mem_section *section, unsigned long state,
+			     int type)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
 	int scn_nr;
 	int ret = 0;
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
+	mem->type = type;
 
 	ret = register_memory(mem);
 
@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
 	if (section_count == 0)
 		return 0;
 
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
+				MEMORY_BLOCK_BOOT);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 		mem->section_count++;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
+					MEMORY_BLOCK_UNSPECIFIED);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index d75ec88ca09d..06268e96e0da 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -34,12 +34,39 @@ struct memory_block {
 	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
+	int type;			/* type of this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
 unsigned long memory_block_size_bytes(void);
 int set_memory_block_size_order(unsigned int order);
 
+/*
+ * Memory block types allow user space to formulate rules if and how to
+ * online memory blocks. The types are exposed to user space as text
+ * strings in sysfs.
+ *
+ * MEMORY_BLOCK_NONE:
+ *  No memory block is to be created (e.g. device memory). Not exposed to
+ *  user space.
+ *
+ * MEMORY_BLOCK_UNSPECIFIED:
+ *  The type of memory block was not further specified when adding the
+ *  memory block.
+ *
+ * MEMORY_BLOCK_BOOT:
+ *  This memory block was added during boot by the basic system. No
+ *  specific device driver takes care of this memory block. This memory
+ *  block type is onlined automatically by the kernel during boot and might
+ *  later be managed by a different device driver, in which case the type
+ *  might change.
+ */
+enum {
+	MEMORY_BLOCK_NONE = 0,
+	MEMORY_BLOCK_UNSPECIFIED,
+	MEMORY_BLOCK_BOOT,
+};
+
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
 #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
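
To illustrate how user space might consume the new attribute, here is a
minimal sketch that is not part of the patch. It assumes the attribute
appears as /sys/devices/system/memory/memoryN/type and yields the strings
"boot" or "unspecified" followed by a newline, as type_show() above prints
them. The enum online_policy, both helper names, and the policy itself
(leave boot memory alone, online unspecified memory) are hypothetical and
only show the shape such a rule could take.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy outcomes; these names are not part of the patch. */
enum online_policy {
	POLICY_LEAVE_ALONE,	/* do nothing, e.g. block is already online */
	POLICY_ONLINE,		/* caller would write "online" to the state file */
	POLICY_UNKNOWN,		/* unrecognized type string, skip the block */
};

/*
 * Map the string read from the proposed "type" attribute to a policy.
 * Only "boot" and "unspecified" are defined by this patch; the mapping
 * chosen here is an assumption made for illustration.
 */
static enum online_policy policy_for_type(const char *type)
{
	/* sysfs strings end with a newline; compare without it. */
	size_t len = strcspn(type, "\n");

	if (len == strlen("boot") && !strncmp(type, "boot", len))
		return POLICY_LEAVE_ALONE;
	if (len == strlen("unspecified") && !strncmp(type, "unspecified", len))
		return POLICY_ONLINE;
	return POLICY_UNKNOWN;
}

/*
 * Read the type attribute of one memory block, e.g. with path set to
 * "/sys/devices/system/memory/memory32/type". Returns POLICY_UNKNOWN if
 * the file is missing (kernel without this patch) or cannot be read.
 */
static enum online_policy policy_for_block(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	enum online_policy policy = POLICY_UNKNOWN;

	if (!f)
		return POLICY_UNKNOWN;
	if (fgets(buf, sizeof(buf), f))
		policy = policy_for_type(buf);
	fclose(f);
	return policy;
}
```

A udev helper could call policy_for_block() for each newly added block and
write to the block's state file only for POLICY_ONLINE. Treating unknown
strings as "skip" keeps such a helper safe when follow-up patches introduce
further types.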