From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org, catalin.marinas@arm.com, will.deacon@arm.com
Cc: mark.rutland@arm.com, mhocko@suse.com, ira.weiny@intel.com, david@redhat.com, cai@lca.pw, logang@deltatee.com, james.morse@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com, mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com
Subject: [PATCH V6 0/3] arm64/mm: Enable memory hot remove
Date: Wed, 19 Jun 2019 09:47:37 +0530

This series enables memory hot remove on arm64 after fixing a memblock removal ordering problem in the generic try_remove_memory() and a possible arm64-specific kernel page table race condition. The series is based on linux-next (next-20190613).

Concurrent vmalloc() and hot-remove conflict:

As pointed out earlier on the V5 thread [2], a concurrent vmalloc() and memory hot-remove operation can conflict. The problem is caused by inadequate locking in vmalloc(): it protects the installation of a page table page, but not the page table walk or the modification of leaf entries. There are several ways to solve, or at least avoid, this conflict.

Option 1: Make the locking in vmalloc() adequate

The current locking scheme protects the installation of page table pages, but not the page table walk or leaf entry creation, both of which can conflict with hot-remove. The scheme is sufficient today because concurrent vmalloc() calls work on mutually exclusive ranges, which can proceed in parallel as long as any page table pages they share are created while holding the lock. That parallelism is a performance benefit which would be lost if the entire vmalloc() operation (even with some optimization) had to complete under a lock.

Option 2: Make sure hot-remove does not happen during vmalloc()

Take mem_hotplug_lock in read mode through the [get|put]_online_mems() constructs for the entire duration of vmalloc(). This protects against a concurrent memory hot-remove operation and adds no significant overhead for other concurrent vmalloc() threads, since read-mode holders do not exclude each other. It solves the problem the right way, provided we are willing to extend the usage of mem_hotplug_lock in generic MM.
Option 3: Memory hot-remove does not free (conflicting) page table pages

Do not free page table pages (if any) for vmemmap mappings after unmapping their virtual range. The only downside is that some page table pages might remain empty and unused until the next memory hot-add operation on the same memory range.

Option 4: Don't let vmalloc and vmemmap share intermediate page table pages

The conflict does not arise if the vmalloc and vmemmap ranges do not share kernel page table pages to begin with. If such placement can be ensured in the platform's kernel virtual address layout, the problem is avoided entirely.

There are two generic solutions (Options 1 and 2) and two platform-specific solutions (Options 3 and 4). This series goes with Option 3, which requires minimal changes and is self-contained within the hot-remove functionality.

Testing:

Memory hot remove has been tested on arm64 for the 4K, 16K and 64K page size config options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. It has only been build tested on non-arm64 platforms.
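For Option 4, whether two kernel virtual ranges can share an intermediate page table page reduces to index arithmetic: a table page at a given level maps a naturally aligned block of address space, so two ranges share a table page exactly when they touch a common block. The sketch below assumes arm64 with 4K pages and 4 levels (48-bit VA), where a PTE table page maps 2 MiB, a PMD table page 1 GiB and a PUD table page 512 GiB; the helper names and test ranges are hypothetical and do not reflect the real arm64 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 geometry: 4K pages, 48-bit VA, 4 levels (illustrative). */
#define PMD_SHIFT   21   /* one PTE table page maps 2 MiB   */
#define PUD_SHIFT   30   /* one PMD table page maps 1 GiB   */
#define PGDIR_SHIFT 39   /* one PUD table page maps 512 GiB */

/* Half-open kernel virtual range [start, end). */
struct va_range {
    uint64_t start;
    uint64_t end;
};

/*
 * A table page at a given level maps a naturally aligned block of
 * (1 << span_shift) bytes. Two ranges can share that table page only if
 * some such block intersects both of them.
 */
static bool share_table_page(struct va_range a, struct va_range b,
                             unsigned int span_shift)
{
    assert(a.end > a.start && b.end > b.start);

    uint64_t a_first = a.start >> span_shift;
    uint64_t a_last  = (a.end - 1) >> span_shift;
    uint64_t b_first = b.start >> span_shift;
    uint64_t b_last  = (b.end - 1) >> span_shift;

    /* Overlap test on the block-index intervals [first, last]. */
    return a_first <= b_last && b_first <= a_last;
}

/*
 * Blocks nest, so sharing a PTE or PMD table page implies sharing the
 * enclosing PUD table page. Checking the coarsest intermediate level is
 * therefore enough; the root PGD page is always shared and never freed.
 */
static bool shares_intermediate_tables(struct va_range a, struct va_range b)
{
    return share_table_page(a, b, PGDIR_SHIFT);
}
```

For example, a 1 GiB range starting at 0 and another starting at 1 GiB use different PMD table pages but the same PUD table page, so under this criterion they would still need to be placed further apart.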
Changes in V6:

- Implemented most of the suggestions from Mark Rutland
- Added in ptdump
- remove_pagetable() now has two distinct passes over the kernel page table
  - First pass unmap_hotplug_range() removes leaf level entries at all levels
  - Second pass free_empty_tables() removes empty page table pages
- Kernel page table lock has been dropped completely
- vmemmap_free() does not call free_empty_tables() to avoid conflict with vmalloc()
- All address range scans are converted to do {} while() loops
- Added 'unsigned long end' in __remove_pgd_mapping()
- Callers need not provide a starting pointer argument to free_[pte|pmd|pud]_table()
- Dropped the starting pointer argument from the free_[pte|pmd|pud]_table() functions
- Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped in READ_ONCE()
- free_[pte|pmd|pud]_table() now computes the starting pointer inside the function
- Fixed TLB handling while freeing huge page section mappings at PMD or PUD level
- Added WARN_ON(!page) in free_hotplug_page_range()
- Added WARN_ON(![pmd|pud]_table(pmd|pud)) when there is no section mapping
- [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
  - Requested earlier for a separate merge (https://patchwork.kernel.org/patch/10986599/)
  - s/__remove_memory/try_remove_memory in the subject line
  - s/arch_remove_memory/memblock_[free|remove] in the subject line
  - Small change in the commit message, as the reordering now applies to the memblock removal functions rather than arch_remove_memory()

Changes in V5: (https://lkml.org/lkml/2019/5/29/218)

- Reached some agreement [1] on using mem_hotplug_lock for arm64 ptdump
- Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") has already been merged
- Dropped the above patch from this series
- Fixed indentation problem in arch_[add|remove]_memory() as per David
- Collected all new Acked-by tags

Changes in V4: (https://lkml.org/lkml/2019/5/20/19)

- Implemented most of the suggestions from Mark Rutland
- Interchanged [PATCH 2/4] <---> [PATCH 3/4] and updated the commit message
- Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table()
- Used READ_ONCE() in missing instances while accessing page table entries
- s/p???_present()/p???_none() for checking valid kernel page table entries
- WARN_ON() when an entry is !p???_none() and !p???_present() at the same time
- Updated memory hot-remove commit message with additional details as suggested
- Rebased the series on 5.2-rc1 with hotplug changes from David and Michal Hocko
- Collected all new Acked-by tags

Changes in V3: (https://lkml.org/lkml/2019/5/14/197)

- Implemented most of the suggestions from Mark Rutland for remove_pagetable()
- Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing functions
- Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted polarity
- Changed pointer names ('p' at end) and removed tmp from iterations
- Perform intermediate TLB invalidation while clearing pgtable entries
- Dropped flush_tlb_kernel_range() in remove_pagetable()
- Added flush_tlb_kernel_range() in remove_pte_table() instead
- Renamed page freeing functions for pgtable pages and mapped pages
- Used page range size instead of order while freeing mapped or pgtable pages
- Removed all PageReserved() handling while freeing mapped or pgtable pages
- Replaced XXX_index() with XXX_offset() while walking the kernel page table
- Used READ_ONCE() while fetching individual pgtable entries
- Took init_mm.page_table_lock for the whole operation instead of only while changing an entry
- Dropped previously added [pmd|pud]_index() which are not required anymore
- Added a new patch to protect against a kernel page table race condition for ptdump
- Added a new patch from Mark Rutland to prevent huge-vmap with ptdump

Changes in V2: (https://lkml.org/lkml/2019/4/14/5)

- Added all received review and ack tags
- Split the series from ZONE_DEVICE enablement for better review
- Moved memblock re-order patch to the front as per Robin Murphy
- Updated commit message on memblock re-order patch per Michal Hocko
- Dropped [pmd|pud]_large() definitions
- Used existing [pmd|pud]_sect() instead of earlier [pmd|pud]_large()
- Removed __meminit and __ref tags as per Oscar Salvador
- Dropped unnecessary 'ret' init in arch_add_memory() per Robin Murphy
- Skipped calling into pgtable_page_dtor() for linear mapping page table pages and updated all relevant functions

Changes in V1: (https://lkml.org/lkml/2019/4/3/28)

References:

[1] https://lkml.org/lkml/2019/5/28/584
[2] https://lkml.org/lkml/2019/6/11/709

Anshuman Khandual (3):
  mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
  arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
  arm64/mm: Enable memory hot remove

 arch/arm64/Kconfig             |   3 +
 arch/arm64/mm/mmu.c            | 290 +++++++++++++++++++++++++++++++++++++++--
 arch/arm64/mm/ptdump_debugfs.c |   4 +
 mm/memory_hotplug.c            |   4 +-
 4 files changed, 290 insertions(+), 11 deletions(-)
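The two-pass teardown from the V6 changes can be sketched in userspace. The first pass clears leaf entries, and the second frees page table pages that have become empty. Option 3 skips that second pass for vmemmap, and both passes walk their range in a do {} while() loop. This is a hypothetical toy model, not the patch's code. It uses a two-level table with made-up sizes. unmap_range(), free_empty_range() and remove_range() merely stand in for unmap_hotplug_range(), free_empty_tables() and their callers, and toy_pmd_addr_end() copies the boundary logic of the kernel's pmd_addr_end(). Addresses are assumed to be page aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy geometry (assumption): 4 PMD entries, each pointing to an 8-entry PTE table. */
#define PAGE_SHIFT   12
#define PTRS_PER_PTE 8
#define PTRS_PER_PMD 4
#define PMD_SHIFT    (PAGE_SHIFT + 3)            /* 8 pages of 4 KiB per PTE table */
#define PMD_SIZE     (1UL << PMD_SHIFT)
#define PMD_MASK     (~(PMD_SIZE - 1))

typedef uint64_t pte_t;                          /* 0 means "none" */
static pte_t *pmd_table[PTRS_PER_PMD];           /* NULL means "none" */

/* Same boundary logic as the kernel's pmd_addr_end(). */
static unsigned long toy_pmd_addr_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Fill leaf entries, allocating PTE table pages on demand. */
static int map_range(unsigned long addr, unsigned long end)
{
    for (; addr != end; addr += 1UL << PAGE_SHIFT) {
        unsigned long idx = addr >> PMD_SHIFT;

        if (!pmd_table[idx]) {
            pmd_table[idx] = calloc(PTRS_PER_PTE, sizeof(pte_t));
            if (!pmd_table[idx])
                return -1;
        }
        pmd_table[idx][(addr >> PAGE_SHIFT) % PTRS_PER_PTE] = addr | 1;
    }
    return 0;
}

/* Pass 1: clear leaf entries only; table pages stay in place. */
static void unmap_range(unsigned long addr, unsigned long end)
{
    unsigned long next;

    do {
        next = toy_pmd_addr_end(addr, end);
        pte_t *ptep = pmd_table[addr >> PMD_SHIFT];

        for (unsigned long a = addr; ptep && a != next; a += 1UL << PAGE_SHIFT)
            ptep[(a >> PAGE_SHIFT) % PTRS_PER_PTE] = 0;
    } while (addr = next, addr != end);
}

/* Pass 2: free a PTE table page only when every entry in it is none. */
static void free_empty_range(unsigned long addr, unsigned long end)
{
    unsigned long next;

    do {
        next = toy_pmd_addr_end(addr, end);
        unsigned long idx = addr >> PMD_SHIFT;
        pte_t *ptep = pmd_table[idx];
        bool empty = ptep != NULL;

        for (int i = 0; empty && i < PTRS_PER_PTE; i++)
            if (ptep[i])
                empty = false;
        if (empty) {
            pmd_table[idx] = NULL;   /* clear the PMD entry before freeing the page */
            free(ptep);
        }
    } while (addr = next, addr != end);
}

/* Linear-map style teardown frees tables; vmemmap style (Option 3) does not. */
static void remove_range(unsigned long start, unsigned long end, bool free_tables)
{
    assert(start < end && end <= PTRS_PER_PMD * PMD_SIZE);
    unmap_range(start, end);
    if (free_tables)
        free_empty_range(start, end);
}
```

Calling remove_range() with free_tables set to false leaves the emptied PTE table pages allocated. This is the cost Option 3 accepts: those pages stay unused until the same range is hot-added again. A partial removal frees only the table pages that have actually become empty.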