From patchwork Wed Aug 15 20:30:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 10566803 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 05B7314E1 for ; Wed, 15 Aug 2018 20:34:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA8A12ACA9 for ; Wed, 15 Aug 2018 20:34:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AEDB42B006; Wed, 15 Aug 2018 20:34:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DA572ACA9 for ; Wed, 15 Aug 2018 20:34:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 81C1D6B026D; Wed, 15 Aug 2018 16:34:22 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7DA656B0270; Wed, 15 Aug 2018 16:34:22 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 645476B0272; Wed, 15 Aug 2018 16:34:22 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl0-f71.google.com (mail-pl0-f71.google.com [209.85.160.71]) by kanga.kvack.org (Postfix) with ESMTP id 143156B026D for ; Wed, 15 Aug 2018 16:34:22 -0400 (EDT) Received: by mail-pl0-f71.google.com with SMTP id q2-v6so1265920plh.12 for ; Wed, 15 Aug 2018 13:34:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=ahVot9+x7CJ53bz7+tDWa2qrE+1TmmU52RkD8nINRws=; b=QiZHnUhoJUWt/k5CGvyjPI65AogubVTwWetXMNqQGM7yUt/9fQH7YLipFbN+iBl4dA zgAFW2bHzmrj6GnjeRH9UzjEmIwmgdBdwLpnxnL8RrUeGrO+soQLKCIKfrzZf+RMsPTU kZdEP5Jg9PWybVoGTQbXQSMhvIdjh2ewY7YSE5FB/9AJdx/HDFH42lxinrv+AymZVIW3 42qZ+zRdgOduqaAB0/zYTMfenw0cHCEdH6BXLU1ty+WsgmBWEDR8HlF5MvdBVmjWJeVe 3gCStljcfBsv3FpVjkrMdlWv8PEHbm0/E2J1nTkLNK1iFwEKXDgcQTI2HYiKVH5Q0HFj HG3Q== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AOUpUlGbZnzZxg0xxQvxp/sblTEy8/ECzABRBg9pJBSKFVKTrqMyLklP 8U1ZqIv9SPXLPUj17fMDomWJ6eiUwysA8sNRqQfozT85skAwLq6C+tDCJQAAL4zAigCZmoDCIPn 6Vf5wv9kgBeOrkQpiZu9KRf08FaJFiM3HrblIyk/Ltgdm6Rw/3DoxSVGbUllcOKJhQQ== X-Received: by 2002:a17:902:40d:: with SMTP id 13-v6mr11323095ple.170.1534365261744; Wed, 15 Aug 2018 13:34:21 -0700 (PDT) X-Google-Smtp-Source: AA+uWPwlVTOk+aV2ejvyqv9XZV5ESCfEehcxIDgO8LM3yaJQvWkSPUoTOSV+8vbSw4l4zSLE6iLO X-Received: by 2002:a17:902:40d:: with SMTP id 13-v6mr11323053ple.170.1534365260814; Wed, 15 Aug 2018 13:34:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1534365260; cv=none; d=google.com; s=arc-20160816; b=svTs6pirs7ckkYPZxmQMJUk6I87xNA7B0Sz4MOcInHmt7YKydvT0eD1kvoqjNVMkjU egsC82kLY5PZ/MA5BcMRUtITCiG4eeK2dYPrqRrs8cfujNfuSQBO3rE7vih7pcffM7NU 0sqBoriyb5tNJrCL7A+gWOi0LwGOHVTy91MJi6F1EGEH9APPAUx4xf+bgAuvAbdcvxjo WWtyHBREGtU9G/6msigK42ltMxvq9ZJuSQtlqnqmM6+Fea+c5OPa5MlsCy92FDU01j8l 7i/gzs6RZBsHKCiSknyTo3fhcRppRvG1fr6u2/X/WMNXsPKXYdzsaRmEc9cGgQQZ8Yl1 IkNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; 
bh=ahVot9+x7CJ53bz7+tDWa2qrE+1TmmU52RkD8nINRws=; b=dzfL6yT+Rfd0SpX6vfyQVvAr4zWhCQV6Um1vfNsFvaV4bizIqbWrPPn+6f0zOj9peL jCrxCmRyDsQ+jU6jRVYs6tdtJ8gNjIco+ub0r7zJId06DQkH9lkpcbequnAq7nCV4b0K YlwStKquUQRfCxHgc7zLNSmCfB/KpKw85WabRnEqOOLThEbhzKS884VgHihkJU4eOiy3 maCvAVQGVULdL7TQdIUCyWuv8jAkEhJIj7CyJ0XcE3aXfAlIUkJEM4+0tvk54lpY5oZb 7i367cDaDTsihr12uJMSXIdmlLW1om0YmVwC5EbmIhBcrXUhOOOm/bWkWHjHC5Ad/S9r k6wg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga03.intel.com (mga03.intel.com. [134.134.136.65]) by mx.google.com with ESMTPS id n3-v6si18917408pld.146.2018.08.15.13.34.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 15 Aug 2018 13:34:20 -0700 (PDT) Received-SPF: pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) client-ip=134.134.136.65; Authentication-Results: mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Aug 2018 13:34:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,244,1531810800"; d="scan'208";a="224930394" Received: from rpedgeco-hp-z240-tower-workstation.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.168]) by orsmga004.jf.intel.com with ESMTP; 15 Aug 2018 13:34:18 -0700 From: Rick Edgecombe To: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, daniel@iogearbox.net, 
jannh@google.com, keescook@chromium.org Cc: kristen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com, Rick Edgecombe Subject: [PATCH v3 1/3] vmalloc: Add __vmalloc_node_try_addr function Date: Wed, 15 Aug 2018 13:30:17 -0700 Message-Id: <1534365020-18943-2-git-send-email-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com> References: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Create __vmalloc_node_try_addr function that tries to allocate at a specific address and supports caller specified behavior for whether any lazy purging happens if there is a collision. Signed-off-by: Rick Edgecombe --- include/linux/vmalloc.h | 3 + mm/vmalloc.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 180 insertions(+) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 398e9c9..c7712c8 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -82,6 +82,9 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align, unsigned long start, unsigned long end, gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, int node, const void *caller); +extern void *__vmalloc_node_try_addr(unsigned long addr, unsigned long size, + gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, + int node, int try_purge, const void *caller); #ifndef CONFIG_MMU extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags); static inline void *__vmalloc_node_flags_caller(unsigned long size, int node, diff --git a/mm/vmalloc.c b/mm/vmalloc.c index cfea25b..fb85ec9 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -709,6 +709,7 @@ static void purge_vmap_area_lazy(void) __purge_vmap_area_lazy(ULONG_MAX, 0); 
mutex_unlock(&vmap_purge_lock); } +EXPORT_SYMBOL(purge_vmap_area_lazy); /* * Free a vmap area, caller ensuring that the area has been unmapped @@ -1709,6 +1710,182 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, return NULL; } +static bool pvm_find_next_prev(unsigned long end, + struct vmap_area **pnext, + struct vmap_area **pprev); + +/* Try to allocate a region of KVA at the specified address with the specified size. */ +static struct vmap_area *try_alloc_vmap_area(unsigned long addr, + unsigned long size, int node, gfp_t gfp_mask, + int try_purge) +{ + struct vmap_area *va; + struct vmap_area *cur_va = NULL; + struct vmap_area *first_before = NULL; + int need_purge = 0; + int blocked = 0; + int purged = 0; + unsigned long addr_end; + + WARN_ON(!size); + WARN_ON(offset_in_page(size)); + + addr_end = addr + size; + if (addr > addr_end) + return ERR_PTR(-EOVERFLOW); + + might_sleep(); + + va = kmalloc_node(sizeof(struct vmap_area), + gfp_mask & GFP_RECLAIM_MASK, node); + if (unlikely(!va)) + return ERR_PTR(-ENOMEM); + + /* + * Only scan the relevant parts containing pointers to other objects + * to avoid false negatives. + */ + kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK); + +retry: + spin_lock(&vmap_area_lock); + + pvm_find_next_prev(addr, &cur_va, &first_before); + + if (!cur_va) + goto found; + + /* + * If the closest VA before the target address extends past it, start + * the check from that VA so that an overlap with its end is still + * caught.
+ */ + if (first_before && addr < first_before->va_end) + cur_va = first_before; + + /* Linearly search through to make sure there is a hole */ + while (cur_va->va_start < addr_end) { + if (cur_va->va_end > addr) { + if (cur_va->flags & VM_LAZY_FREE) { + need_purge = 1; + } else { + blocked = 1; + break; + } + } + + if (list_is_last(&cur_va->list, &vmap_area_list)) + break; + + cur_va = list_next_entry(cur_va, list); + } + + /* + * If a non-lazy-free va blocks the allocation, or a + * purge is needed but we are not allowed to purge, + * the allocation fails. + */ + if (blocked || (need_purge && !try_purge)) + goto fail; + + if (try_purge && need_purge) { + /* if purged once before, give up */ + if (purged) + goto fail; + + /* + * If the va blocking the allocation is set to + * be purged then purge all vmap_areas that are + * set to be purged since this will flush the TLBs + * anyway. + */ + spin_unlock(&vmap_area_lock); + purge_vmap_area_lazy(); + need_purge = 0; + purged = 1; + goto retry; + } + +found: + va->va_start = addr; + va->va_end = addr_end; + va->flags = 0; + __insert_vmap_area(va); + spin_unlock(&vmap_area_lock); + + return va; +fail: + spin_unlock(&vmap_area_lock); + kfree(va); + if (need_purge && !blocked) + return ERR_PTR(-EUCLEAN); + return ERR_PTR(-EBUSY); +} + +/** + * __vmalloc_node_try_addr - try to alloc at a specific address + * @addr: address to try + * @size: size to try + * @gfp_mask: flags for the page level allocator + * @prot: protection mask for the allocated pages + * @vm_flags: additional vm area flags (e.g. %VM_NO_GUARD) + * @node: node to use for allocation or NUMA_NO_NODE + * @try_purge: try to purge if needed to fulfill an allocation + * @caller: caller's return address + * + * Try to allocate at the specific address. If it succeeds the address is + * returned. If it fails an EBUSY ERR_PTR is returned. If try_purge is + * zero and the allocation would have succeeded had purging been + * allowed, an EUCLEAN ERR_PTR is returned instead.
It may trigger TLB flushes if a purge is needed, + * and try_purge is set. + */ +void *__vmalloc_node_try_addr(unsigned long addr, unsigned long size, + gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, + int node, int try_purge, const void *caller) +{ + struct vmap_area *va; + struct vm_struct *area; + void *alloc_addr; + unsigned long real_size = size; + + size = PAGE_ALIGN(size); + if (!size || (size >> PAGE_SHIFT) > totalram_pages) + return NULL; + + WARN_ON(in_interrupt()); + + area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node); + if (unlikely(!area)) { + warn_alloc(gfp_mask, NULL, "kmalloc: allocation failure"); + return ERR_PTR(-ENOMEM); + } + + if (!(vm_flags & VM_NO_GUARD)) + size += PAGE_SIZE; + + va = try_alloc_vmap_area(addr, size, node, gfp_mask, try_purge); + if (IS_ERR(va)) + goto fail; + + setup_vmalloc_vm(area, va, vm_flags, caller); + + alloc_addr = __vmalloc_area_node(area, gfp_mask, prot, node); + if (!alloc_addr) { + warn_alloc(gfp_mask, NULL, + "vmalloc: allocation failure: %lu bytes", real_size); + return ERR_PTR(-ENOMEM); + } + + clear_vm_uninitialized_flag(area); + + kmemleak_vmalloc(area, size, gfp_mask); + + return alloc_addr; +fail: + kfree(area); + return va; +} + /** * __vmalloc_node_range - allocate virtually contiguous memory * @size: allocation size
From patchwork Wed Aug 15 20:30:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 10566805 From: Rick Edgecombe To: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, daniel@iogearbox.net, jannh@google.com, keescook@chromium.org Cc: kristen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com, Rick Edgecombe Subject: [PATCH v3 2/3] x86/modules: Increase randomization for modules Date: Wed, 15 Aug 2018 13:30:18 -0700 Message-Id: <1534365020-18943-3-git-send-email-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com> References: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using
ClamSMTP This changes the behavior of the KASLR logic for allocating memory for the text sections of loadable modules. It randomizes the location of each module text section with about 17 bits of entropy in typical use. This is enabled on X86_64 only. For 32-bit, the behavior is unchanged. It refactors existing code around module randomization in order to cleanly implement this alongside the unchanged 32-bit behavior. The algorithm breaks the module space in two, a random area and a backup area. It first tries to allocate at up to 5000 randomly located starting pages inside the random area without purging any lazily freed vmap areas, which avoids the associated TLB flush. If this fails, it tries up to 5000 more random locations, this time allowing purges if needed. Finally, if both passes fail to find a position, it allocates in the backup area. The backup area base is offset in the same way as the current algorithm offsets the module base, giving 1024 possible locations. Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/pgtable_64_types.h | 7 ++ arch/x86/kernel/module.c | 163 +++++++++++++++++++++++++++----- 2 files changed, 147 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 054765a..c320b7f 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -142,6 +142,13 @@ extern unsigned int ptrs_per_p4d; #define MODULES_END _AC(0xffffffffff000000, UL) #define MODULES_LEN (MODULES_END - MODULES_VADDR) +/* + * Dedicate the first part of the module space to a randomized area when KASLR + * is in use. Leave the remaining part for a fallback if we are unable to + * allocate in the random area.
+ */ +#define MODULES_RAND_LEN PAGE_ALIGN((MODULES_LEN/3)*2) + #define ESPFIX_PGD_ENTRY _AC(-2, UL) #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index f58336a..1cb8efa 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -48,34 +48,149 @@ do { \ } while (0) #endif -#ifdef CONFIG_RANDOMIZE_BASE +static inline int kaslr_randomize_each_module(void) +{ + return kaslr_enabled() + && IS_ENABLED(CONFIG_RANDOMIZE_BASE) + && IS_ENABLED(CONFIG_X86_64); +} + +static inline int kaslr_randomize_base(void) +{ + return kaslr_enabled() + && IS_ENABLED(CONFIG_RANDOMIZE_BASE) + && !IS_ENABLED(CONFIG_X86_64); +} + +#ifdef CONFIG_X86_64 +static inline const unsigned long get_modules_rand_len(void) +{ + return MODULES_RAND_LEN; +} +#else +static inline const unsigned long get_modules_rand_len(void) +{ + BUILD_BUG(); + return 0; +} +#endif + static unsigned long module_load_offset; +static const unsigned long NR_NO_PURGE = 5000; +static const unsigned long NR_TRY_PURGE = 5000; /* Mutex protects the module_load_offset. */ static DEFINE_MUTEX(module_kaslr_mutex); static unsigned long int get_module_load_offset(void) { - if (kaslr_enabled()) { - mutex_lock(&module_kaslr_mutex); - /* - * Calculate the module_load_offset the first time this - * code is called. Once calculated it stays the same until - * reboot. - */ - if (module_load_offset == 0) - module_load_offset = - (get_random_int() % 1024 + 1) * PAGE_SIZE; - mutex_unlock(&module_kaslr_mutex); - } + mutex_lock(&module_kaslr_mutex); + /* + * Calculate the module_load_offset the first time this + * code is called. Once calculated it stays the same until + * reboot. 
+ */ + if (module_load_offset == 0) + module_load_offset = (get_random_int() % 1024 + 1) * PAGE_SIZE; + mutex_unlock(&module_kaslr_mutex); + return module_load_offset; } -#else -static unsigned long int get_module_load_offset(void) + +static unsigned long get_module_vmalloc_start(void) { - return 0; + if (kaslr_randomize_each_module()) + return MODULES_VADDR + get_modules_rand_len() + + get_module_load_offset(); + else if (kaslr_randomize_base()) + return MODULES_VADDR + get_module_load_offset(); + + return MODULES_VADDR; +} + +static void *try_module_alloc(unsigned long addr, unsigned long size, + int try_purge) +{ + const unsigned long vm_flags = 0; + + return __vmalloc_node_try_addr(addr, size, GFP_KERNEL, PAGE_KERNEL_EXEC, + vm_flags, NUMA_NO_NODE, try_purge, + __builtin_return_address(0)); +} + +/* + * Find a random address to try that won't obviously fail to fit. Allocations + * in the random area are allowed to overflow into the backup area. + */ +static unsigned long get_rand_module_addr(unsigned long size) +{ + unsigned long nr_max_pos = (MODULES_LEN - size) / MODULE_ALIGN + 1; + unsigned long nr_rnd_pos = get_modules_rand_len() / MODULE_ALIGN; + unsigned long nr_pos = min(nr_max_pos, nr_rnd_pos); + + unsigned long module_position_nr = get_random_long() % nr_pos; + unsigned long offset = module_position_nr * MODULE_ALIGN; + + return MODULES_VADDR + offset; +} + +/* + * Try to allocate in the random area. First 5000 times without purging, then + * 5000 times with purging. If these fail, return NULL. + */ +static void *try_module_randomize_each(unsigned long size) +{ + void *p = NULL; + unsigned int i; + unsigned long last_lazy_free_blocked = 0; + + /* This will have a guard page */ + unsigned long real_size = PAGE_ALIGN(size) + PAGE_SIZE; + + if (!kaslr_randomize_each_module()) + return NULL; + + /* Make sure there is at least one address that might fit.
*/ + if (real_size < PAGE_ALIGN(size) || real_size > MODULES_LEN) + return NULL; + + /* Try to find a spot that doesn't need a lazy purge */ + for (i = 0; i < NR_NO_PURGE; i++) { + unsigned long addr = get_rand_module_addr(real_size); + + /* First try to avoid having to purge */ + p = try_module_alloc(addr, real_size, 0); + + /* + * Save the last value that was blocked by a + * lazy purge area. + */ + if (IS_ERR(p) && PTR_ERR(p) == -EUCLEAN) + last_lazy_free_blocked = addr; + else if (!IS_ERR(p)) + return p; + } + + /* Try the most recent spot that could be used after a lazy purge */ + if (last_lazy_free_blocked) { + p = try_module_alloc(last_lazy_free_blocked, real_size, 1); + + if (!IS_ERR(p)) + return p; + } + + /* Look for more spots and allow lazy purges */ + for (i = 0; i < NR_TRY_PURGE; i++) { + unsigned long addr = get_rand_module_addr(real_size); + + /* Give up and allow for purges */ + p = try_module_alloc(addr, real_size, 1); + + if (!IS_ERR(p)) + return p; + } + return NULL; } -#endif void *module_alloc(unsigned long size) { @@ -84,16 +199,18 @@ void *module_alloc(unsigned long size) if (PAGE_ALIGN(size) > MODULES_LEN) return NULL; - p = __vmalloc_node_range(size, MODULE_ALIGN, - MODULES_VADDR + get_module_load_offset(), - MODULES_END, GFP_KERNEL, - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, - __builtin_return_address(0)); + p = try_module_randomize_each(size); + + if (!p) + p = __vmalloc_node_range(size, MODULE_ALIGN, + get_module_vmalloc_start(), MODULES_END, + GFP_KERNEL, PAGE_KERNEL_EXEC, 0, + NUMA_NO_NODE, __builtin_return_address(0)); + if (p && (kasan_module_alloc(p, size) < 0)) { vfree(p); return NULL; } - return p; }
From patchwork Wed Aug 15 20:30:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 10566801 From: Rick Edgecombe To: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, daniel@iogearbox.net, jannh@google.com, keescook@chromium.org Cc: kristen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com, Rick Edgecombe Subject: [PATCH v3 3/3] vmalloc: Add debugfs modfraginfo Date: Wed, 15 Aug 2018 13:30:19 -0700 Message-Id:
 <1534365020-18943-4-git-send-email-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com>
References: <1534365020-18943-1-git-send-email-rick.p.edgecombe@intel.com>

Add debugfs file "modfraginfo" for providing info on module space
fragmentation. This can be used for determining if loadable module
randomization is causing any problems for extreme module loading
situations, like huge numbers of modules or extremely large modules.

Sample output when KASLR is enabled and X86_64 is configured:
	Largest free space:	897912 kB
	 Total free space:	1025424 kB
Allocations in backup area:	0

Sample output when just X86_64:
	Largest free space:	897912 kB
	 Total free space:	1025424 kB

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 mm/vmalloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fb85ec9..cb55138 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -33,6 +34,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -2925,7 +2927,92 @@ static int __init proc_vmalloc_init(void)
 		proc_create_seq("vmallocinfo", 0400, NULL, &vmalloc_op);
 	return 0;
 }
-module_init(proc_vmalloc_init);
+#else
+static int __init proc_vmalloc_init(void)
+{
+	return 0;
+}
+#endif
+
+#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_64)
+static int modulefraginfo_debug_show(struct seq_file *m, void *v)
+{
+	unsigned long last_end = MODULES_VADDR;
+	unsigned long total_free = 0;
+	unsigned long largest_free = 0;
+	unsigned long backup_cnt = 0;
+	unsigned long gap;
+	struct vmap_area *prev, *cur = NULL;
+
+	spin_lock(&vmap_area_lock);
+
+	if (!pvm_find_next_prev(MODULES_VADDR, &cur, &prev) || !cur)
+		goto done;
+
+	for (; cur->va_end <= MODULES_END; cur = list_next_entry(cur, list)) {
+		/* Don't count areas that are marked to be lazily freed */
+		if ((cur->flags & VM_LAZY_FREE))
+			continue;
+
+		if (cur->va_start >= MODULES_VADDR + MODULES_RAND_LEN)
+			backup_cnt++;
+		gap = cur->va_start - last_end;
+		if (gap > largest_free)
+			largest_free = gap;
+		total_free += gap;
+		last_end = cur->va_end;
+
+		if (list_is_last(&cur->list, &vmap_area_list))
+			break;
+	}
+
+done:
+	gap = (MODULES_END - last_end);
+	if (gap > largest_free)
+		largest_free = gap;
+	total_free += gap;
+
+	spin_unlock(&vmap_area_lock);
+	seq_printf(m, "\tLargest free space:\t%lu kB\n", largest_free / 1024);
+	seq_printf(m, "\t Total free space:\t%lu kB\n", total_free / 1024);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_enabled())
+		seq_printf(m, "Allocations in backup area:\t%lu\n", backup_cnt);
+
+	return 0;
+}
+
+static int proc_module_frag_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, modulefraginfo_debug_show, NULL);
+}
+
+static const struct file_operations debug_module_frag_operations = {
+	.open = proc_module_frag_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static void __init debug_modfrag_init(void)
+{
+	debugfs_create_file("modfraginfo", 0400, NULL, NULL,
+		&debug_module_frag_operations);
+}
+#else /* defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_64) */
+static void __init debug_modfrag_init(void)
+{
+}
 
 #endif
 
+#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_PROC_FS)
+static int __init info_vmalloc_init(void)
+{
+	proc_vmalloc_init();
+	debug_modfrag_init();
+	return 0;
+}
+
+module_init(info_vmalloc_init);
+#endif
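
For readers who want to see the gap accounting in isolation, here is a minimal userspace sketch of the same idea: walk busy ranges in address order and track the largest and total free space, including the tail gap up to the end of the region. It is not part of the patch. struct range, struct frag_info and scan_gaps() are hypothetical names made up for this illustration, and the sketch assumes the input is sorted and non-overlapping, which the kernel list provides but this code does not check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a busy vmap area: [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical result type; sizes are in bytes. */
struct frag_info {
	unsigned long largest_free;
	unsigned long total_free;
};

/*
 * Accumulate the free gaps between busy ranges inside [lo, hi).
 * Assumption: r[] is sorted by start address, ranges do not overlap,
 * and the first range starts at or after lo.
 */
struct frag_info scan_gaps(const struct range *r, size_t n,
			   unsigned long lo, unsigned long hi)
{
	struct frag_info fi = { 0, 0 };
	unsigned long last_end = lo;
	unsigned long gap;
	size_t i;

	assert(lo <= hi);

	/* Stop at the first range that extends past the region. */
	for (i = 0; i < n && r[i].end <= hi; i++) {
		gap = r[i].start - last_end;
		if (gap > fi.largest_free)
			fi.largest_free = gap;
		fi.total_free += gap;
		last_end = r[i].end;
	}

	/* Tail gap between the last busy range and the region end. */
	gap = hi - last_end;
	if (gap > fi.largest_free)
		fi.largest_free = gap;
	fi.total_free += gap;

	return fi;
}
```

With busy ranges [0x1000, 0x3000), [0x4000, 0x5000) and [0x9000, 0xa000) in a region [0x0, 0x10000), the gaps are 0x1000, 0x1000, 0x4000 and a tail of 0x6000, so the sketch reports 0x6000 bytes as the largest free space and 0xc000 bytes in total.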