From patchwork Fri Oct 11 11:11:59 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Capper X-Patchwork-Id: 3023451 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 95C54BF924 for ; Fri, 11 Oct 2013 11:36:34 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 2D4BB2021C for ; Fri, 11 Oct 2013 11:36:33 +0000 (UTC) Received: from casper.infradead.org (casper.infradead.org [85.118.1.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 50121201F2 for ; Fri, 11 Oct 2013 11:36:28 +0000 (UTC) Received: from merlin.infradead.org ([2001:4978:20e::2]) by casper.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1VUb12-000569-85; Fri, 11 Oct 2013 11:36:20 +0000 Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1VUb0z-0005t7-Ro; Fri, 11 Oct 2013 11:36:17 +0000 Received: from mail-wg0-f45.google.com ([74.125.82.45]) by merlin.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1VUb0x-0005sP-BY for linux-arm-kernel@lists.infradead.org; Fri, 11 Oct 2013 11:36:16 +0000 Received: by mail-wg0-f45.google.com with SMTP id z12so2433103wgg.0 for ; Fri, 11 Oct 2013 04:35:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=s75RDppKbk6sIC3rWmXntqu4q5Nvae3Z2MpWXXtIVpE=; b=aRiW+2q7BUVmNE/5FWNFwfmeWsJVKmygCTpwxIo3ZAKeHZlEgidkf4oGhh+OCqMDfk RaMygrtOfr15RG04SU+cDNiAgKlHoPzrD+O2IB527HbvqBoyWNvmxbWqorUrXDIZd9pI VPaQrz/RzK1g4C+RxNFyHNeyuygm1Xffbqoj4WdGbQzWDPwFFpFX0yaO48ViZd93KYb5 vS/i7094b9JoCFQftfQX1RpQ9r1xwQkXoPIoKAvIP6YrnliKCBQqdXCaHYijqYJ5Itqy 7XQ+C7RkBxtQowYlr9MkffYSuSYy7xOCul2xScO25Yi3zWgK9NkNlLOMaK8I0/By9l+T AGAw== X-Gm-Message-State: ALoCoQnYnIZhn95FLz2ZAC54I8dpbWK+7zw+Uq675+FGx6N9tVyKBwojuad4TYIR4FcJn5zNy/OX X-Received: by 10.194.120.226 with SMTP id lf2mr1355560wjb.47.1381489927960; Fri, 11 Oct 2013 04:12:07 -0700 (PDT) Received: from marmot.wormnet.eu (marmot.wormnet.eu. [188.246.204.87]) by mx.google.com with ESMTPSA id e1sm4674455wij.6.2013.10.11.04.12.07 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Oct 2013 04:12:07 -0700 (PDT) From: Steve Capper To: linux-arm-kernel@lists.infradead.org Subject: [PATCH V3] ARM: mm: make UACCESS_WITH_MEMCPY huge page aware Date: Fri, 11 Oct 2013 12:11:59 +0100 Message-Id: <1381489919-2508-1-git-send-email-steve.capper@linaro.org> X-Mailer: git-send-email 1.7.10.4 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20131011_073615_614294_8AB6E376 X-CRM114-Status: GOOD ( 20.15 ) X-Spam-Score: -0.1 (/) Cc: Steve Capper , linaro-kernel@lists.linaro.org, linux@arm.linux.org.uk, patches@linaro.org, nico@linaro.org X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD,SUSPICIOUS_RECIPS,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The memory pinning code in uaccess_with_memcpy.c does not check for HugeTLB or THP pmds, and will enter an infinite loop should a __copy_to_user or __clear_user occur against a huge page. This patch adds detection code for huge pages to pin_page_for_write. As this code can be executed in a fast path it refers to the actual pmds rather than the vma. If a HugeTLB or THP is found (they have the same pmd representation on ARM), the page table spinlock is taken to prevent modification whilst the page is pinned. On ARM, huge pages are only represented as pmds, thus no huge pud checks are performed. (For huge puds one would lock the page table in a similar manner as in the pmd case). Two helper functions are introduced; pmd_thp_or_huge will check whether or not a page is huge or transparent huge (which have the same pmd layout on ARM), and pmd_hugewillfault will detect whether or not a page fault will occur on write to the page. Running the following test (with the chunking from read_zero removed): $ dd if=/dev/zero of=/dev/null bs=10M count=1024 Gave: 2.3 GB/s backed by normal pages, 2.9 GB/s backed by huge pages, 5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK. After some discussion, it was decided not to adopt the HPAGE_MASK, as this would have a significant detrimental effect on the overall system latency due to page_table_lock being held for too long. This could be revisited if split huge page locks are adopted. Signed-off-by: Steve Capper --- Changes in V3: pmd_hugewillfault and pmd_thp_or_huge were erroneously ommitted from pgtable-2level.h causing build problems. Empty stubs are added for now as we don't have huge page support for short descriptors. Nicolas, can I please re-add your Reviewed-by to this? --- arch/arm/include/asm/pgtable-2level.h | 7 ++++++ arch/arm/include/asm/pgtable-3level.h | 3 +++ arch/arm/lib/uaccess_with_memcpy.c | 41 ++++++++++++++++++++++++++++++++--- 3 files changed, 48 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h index f97ee02..86a659a 100644 --- a/arch/arm/include/asm/pgtable-2level.h +++ b/arch/arm/include/asm/pgtable-2level.h @@ -181,6 +181,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext) +/* + * We don't have huge page support for short descriptors, for the moment + * define empty stubs for use by pin_page_for_write. + */ +#define pmd_hugewillfault(pmd) (0) +#define pmd_thp_or_huge(pmd) (0) + #endif /* __ASSEMBLY__ */ #endif /* _ASM_PGTABLE_2LEVEL_H */ diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 5689c18..39c54cf 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -206,6 +206,9 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define __HAVE_ARCH_PMD_WRITE #define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY)) +#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd)) +#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd)) + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING) diff --git a/arch/arm/lib/uaccess_with_memcpy.c b/arch/arm/lib/uaccess_with_memcpy.c index 025f742..3e58d71 100644 --- a/arch/arm/lib/uaccess_with_memcpy.c +++ b/arch/arm/lib/uaccess_with_memcpy.c @@ -18,6 +18,7 @@ #include /* for in_atomic() */ #include #include +#include #include #include @@ -40,7 +41,35 @@ pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp) return 0; pmd = pmd_offset(pud, addr); - if (unlikely(pmd_none(*pmd) || pmd_bad(*pmd))) + if (unlikely(pmd_none(*pmd))) + return 0; + + /* + * A pmd can be bad if it refers to a HugeTLB or THP page. + * + * Both THP and HugeTLB pages have the same pmd layout + * and should not be manipulated by the pte functions. + * + * Lock the page table for the destination and check + * to see that it's still huge and whether or not we will + * need to fault on write, or if we have a splitting THP. + */ + if (unlikely(pmd_thp_or_huge(*pmd))) { + ptl = ¤t->mm->page_table_lock; + spin_lock(ptl); + if (unlikely(!pmd_thp_or_huge(*pmd) + || pmd_hugewillfault(*pmd) + || pmd_trans_splitting(*pmd))) { + spin_unlock(ptl); + return 0; + } + + *ptep = NULL; + *ptlp = ptl; + return 1; + } + + if (unlikely(pmd_bad(*pmd))) return 0; pte = pte_offset_map_lock(current->mm, pmd, addr, &ptl); @@ -94,7 +123,10 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n) from += tocopy; n -= tocopy; - pte_unmap_unlock(pte, ptl); + if (pte) + pte_unmap_unlock(pte, ptl); + else + spin_unlock(ptl); } if (!atomic) up_read(¤t->mm->mmap_sem); @@ -147,7 +179,10 @@ __clear_user_memset(void __user *addr, unsigned long n) addr += tocopy; n -= tocopy; - pte_unmap_unlock(pte, ptl); + if (pte) + pte_unmap_unlock(pte, ptl); + else + spin_unlock(ptl); } up_read(¤t->mm->mmap_sem);