From patchwork Fri Aug 28 14:03:13 2020
X-Patchwork-Submitter: Gerald Schaefer
X-Patchwork-Id: 11742905
From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
To: Linus Torvalds, Andrew Morton
Cc: linux-mm, LKML, Vasily Gorbik, Alexander Gordeev,
    linux-s390@vger.kernel.org, Heiko Carstens, Claudio Imbrenda,
    Christian Borntraeger
Subject: [RFC PATCH 1/2] mm/gup: fix gup_fast with dynamic page table folding
Date: Fri, 28 Aug 2020 16:03:13 +0200
Message-Id: <20200828140314.8556-2-gerald.schaefer@linux.ibm.com>
In-Reply-To: <20200828140314.8556-1-gerald.schaefer@linux.ibm.com>
References: <20200828140314.8556-1-gerald.schaefer@linux.ibm.com>

From: Vasily Gorbik

Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
code") introduced a subtle but severe bug on s390 with gup_fast, due to
dynamic page table folding.
The question "What would it require for the generic code to work for s390" has already been discussed here https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1 and ended with a promising approach here https://lkml.kernel.org/r/20190419153307.4f2911b5@mschwideX1 which in the end unfortunately didn't quite work completely. We tried to mimic static level folding by changing pgd_offset to always calculate top level page table offset, and do nothing in folded pXd_offset. What has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end do not reflect this dynamic behaviour, and still act like static 5-level page tables. Here is an example of what happens with gup_fast on s390, for a task with 3-levels paging, crossing a 2 GB pud boundary: // addr = 0x1007ffff000, end = 0x10080001000 static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { unsigned long next; pud_t *pudp; // pud_offset returns &p4d itself (a pointer to a value on stack) pudp = pud_offset(&p4d, addr); do { // on second iteratation reading "random" stack value pud_t pud = READ_ONCE(*pudp); // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390 next = pud_addr_end(addr, end); ... } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack return 1; } pud_addr_end = 0x10080000000 is correct, but the previous pgd/p4d_addr_end should also have returned that limit, instead of the 5-level static pgd/p4d limits with PUD_SIZE/MASK != PGDIR_SIZE/MASK. Then the "end" parameter for gup_pud_range would also have been 0x10080000000, and we would not iterate further in gup_pud_range, but rather go back and (correctly) do it in gup_pgd_range. So, for the second iteration in gup_pud_range, we will increase pudp, which pointed to a stack value and not the real pud table. This new pudp will then point to whatever lies behind the p4d stack value. In general, this happens to be the previously read pgd, but it probably could also be something different, depending on compiler decisions. Most unfortunately, if it happens to be the pgd value, which is the same as the p4d / pud due to folding, it is a valid and present entry. So after the increment, we would still point to the same pud entry. The addr however has been increased in the second iteration, so that we now have different pmd/pte_index values, which will result in very wrong behaviour for the remaining gup_pmd/pte_range calls. We will effectively operate on an address minus 2 GB, due to missing pudp increase. In the "good case", if nothing is mapped there, we will fall back to the slow gup path. But if something is mapped there, and valid for gup_fast, we will end up (silently) getting references on the wrong pages and also add the wrong pages to the **pages result array. This can cause data corruption. Fix this with an approach that has already been discussed in https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1 It will additionally pass along pXd pointers in gup_pXd_range, and also introduce pXd_offset_orig for s390, which takes an additional pXd entry value parameter. This allows returning correct pointers while still preseving the READ_ONCE logic for gup_fast. No change for other architectures introduced. 
Cc: <stable@vger.kernel.org> # 5.2+
Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
Reviewed-by: Gerald Schaefer
Reviewed-by: Alexander Gordeev
Signed-off-by: Vasily Gorbik
---
 arch/s390/include/asm/pgtable.h | 42 +++++++++++++++++++++++----------
 include/linux/pgtable.h         | 10 ++++++++
 mm/gup.c                        | 18 +++++++-------
 3 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7eb01a5459cd..69a92e39d7b8 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1260,26 +1260,44 @@ static inline pgd_t *pgd_offset_raw(pgd_t *pgd, unsigned long address)
 
 #define pgd_offset(mm, address) pgd_offset_raw(READ_ONCE((mm)->pgd), address)
 
-static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
+static inline p4d_t *p4d_offset_orig(pgd_t *pgdp, pgd_t pgd, unsigned long address)
 {
-	if ((pgd_val(*pgd) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R1)
-		return (p4d_t *) pgd_deref(*pgd) + p4d_index(address);
-	return (p4d_t *) pgd;
+	if ((pgd_val(pgd) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R1)
+		return (p4d_t *) pgd_deref(pgd) + p4d_index(address);
+	return (p4d_t *) pgdp;
 }
+#define p4d_offset_orig p4d_offset_orig
 
-static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
+static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long address)
 {
-	if ((p4d_val(*p4d) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R2)
-		return (pud_t *) p4d_deref(*p4d) + pud_index(address);
-	return (pud_t *) p4d;
+	return p4d_offset_orig(pgdp, *pgdp, address);
+}
+
+static inline pud_t *pud_offset_orig(p4d_t *p4dp, p4d_t p4d, unsigned long address)
+{
+	if ((p4d_val(p4d) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R2)
+		return (pud_t *) p4d_deref(p4d) + pud_index(address);
+	return (pud_t *) p4dp;
+}
+#define pud_offset_orig pud_offset_orig
+
+static inline pud_t *pud_offset(p4d_t *p4dp, unsigned long address)
+{
+	return pud_offset_orig(p4dp, *p4dp, address);
 }
 #define pud_offset pud_offset
 
-static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
+static inline pmd_t *pmd_offset_orig(pud_t *pudp, pud_t pud, unsigned long address)
+{
+	if ((pud_val(pud) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R3)
+		return (pmd_t *) pud_deref(pud) + pmd_index(address);
+	return (pmd_t *) pudp;
+}
+#define pmd_offset_orig pmd_offset_orig
+
+static inline pmd_t *pmd_offset(pud_t *pudp, unsigned long address)
 {
-	if ((pud_val(*pud) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R3)
-		return (pmd_t *) pud_deref(*pud) + pmd_index(address);
-	return (pmd_t *) pud;
+	return pmd_offset_orig(pudp, *pudp, address);
 }
 #define pmd_offset pmd_offset

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a124c21e3204..02f93358126e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1425,6 +1425,16 @@ typedef unsigned int pgtbl_mod_mask;
 #define mm_pmd_folded(mm)	__is_defined(__PAGETABLE_PMD_FOLDED)
 #endif
 
+#ifndef p4d_offset_orig
+#define p4d_offset_orig(pgdp, pgd, address)	p4d_offset(&pgd, address)
+#endif
+#ifndef pud_offset_orig
+#define pud_offset_orig(p4dp, p4d, address)	pud_offset(&p4d, address)
+#endif
+#ifndef pmd_offset_orig
+#define pmd_offset_orig(pudp, pud, address)	pmd_offset(&pud, address)
+#endif
+
 /*
  * p?d_leaf() - true if this entry is a final mapping to a physical address.
 * This differs from p?d_huge() by the fact that they are always available (if

diff --git a/mm/gup.c b/mm/gup.c
index ae096ea7583f..fbdd9f0bf219 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2500,13 +2500,13 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	return 1;
 }
 
-static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned long end,
 			 unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pmd_t *pmdp;
 
-	pmdp = pmd_offset(&pud, addr);
+	pmdp = pmd_offset_orig(pudp, pud, addr);
 	do {
 		pmd_t pmd = READ_ONCE(*pmdp);
 
@@ -2543,13 +2543,13 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 	return 1;
 }
 
-static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
+static int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, unsigned long end,
 			 unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pud_t *pudp;
 
-	pudp = pud_offset(&p4d, addr);
+	pudp = pud_offset_orig(p4dp, p4d, addr);
 	do {
 		pud_t pud = READ_ONCE(*pudp);
 
@@ -2564,20 +2564,20 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 			if (!gup_huge_pd(__hugepd(pud_val(pud)), addr,
 					 PUD_SHIFT, next, flags, pages, nr))
 				return 0;
-		} else if (!gup_pmd_range(pud, addr, next, flags, pages, nr))
+		} else if (!gup_pmd_range(pudp, pud, addr, next, flags, pages, nr))
 			return 0;
 	} while (pudp++, addr = next, addr != end);
 
 	return 1;
 }
 
-static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
+static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned long end,
 			 unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	p4d_t *p4dp;
 
-	p4dp = p4d_offset(&pgd, addr);
+	p4dp = p4d_offset_orig(pgdp, pgd, addr);
 	do {
 		p4d_t p4d = READ_ONCE(*p4dp);
 
@@ -2589,7 +2589,7 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
 		if (!gup_huge_pd(__hugepd(p4d_val(p4d)), addr,
 				 P4D_SHIFT, next, flags, pages, nr))
 			return 0;
-	} else if (!gup_pud_range(p4d, addr, next, flags, pages, nr))
+	} else if (!gup_pud_range(p4dp, p4d, addr, next, flags, pages, nr))
 		return 0;
 	} while (p4dp++, addr = next, addr != end);
 
@@ -2617,7 +2617,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
 		if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
 				 PGDIR_SHIFT, next, flags, pages, nr))
 			return;
-	} else if (!gup_p4d_range(pgd, addr, next, flags, pages, nr))
+	} else if (!gup_p4d_range(pgdp, pgd, addr, next, flags, pages, nr))
 		return;
 	} while (pgdp++, addr = next, addr != end);
 }
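To make the pointer semantics of pXd_offset_orig concrete, here is a
hypothetical toy model (not kernel code, all names invented): with the old
interface, a folded level hands back the address of the caller's local
copy, so the pudp++ in the gup loop walks the stack; with the new
interface, the original table pointer is handed back, so the increment
walks the real table while the entry value still comes from the READ_ONCE
copy.

#include <stdio.h>

typedef unsigned long entry_t;	/* stands in for p4d_t / pud_t */

/* old style: a folded level returns its input pointer, i.e. the
 * address of the caller's on-stack READ_ONCE copy */
static entry_t *offset_old(entry_t *entry, unsigned long addr)
{
	return entry;
}

/* pXd_offset_orig style: the value is passed for the level decision,
 * but the returned pointer is the original table pointer */
static entry_t *offset_orig(entry_t *entryp, entry_t entry, unsigned long addr)
{
	return entryp;
}

int main(void)
{
	entry_t table[2] = { 0x1000, 0x2000 };	/* the "real" table */
	entry_t copy = table[0];		/* READ_ONCE-style copy */

	entry_t *oldp = offset_old(&copy, 0);
	entry_t *newp = offset_orig(table, copy, 0);

	/* oldp + 1 points at whatever lies behind "copy" on the stack;
	 * newp + 1 points at the next real table entry */
	printf("oldp+1 == &copy+1: %d\n", oldp + 1 == &copy + 1);
	printf("*(newp+1) = %#lx\n", *(newp + 1));
	return 0;
}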
From patchwork Fri Aug 28 14:03:14 2020
X-Patchwork-Submitter: Gerald Schaefer
X-Patchwork-Id: 11742907
From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
To: Linus Torvalds, Andrew Morton
Cc: linux-mm, LKML, Vasily Gorbik, Alexander Gordeev,
    linux-s390@vger.kernel.org, Heiko Carstens, Claudio Imbrenda,
    Christian Borntraeger
Subject: [RFC PATCH 2/2] mm/gup: fix gup_fast with dynamic page table folding
Date: Fri, 28 Aug 2020 16:03:14 +0200
Message-Id: <20200828140314.8556-3-gerald.schaefer@linux.ibm.com>
In-Reply-To: <20200828140314.8556-1-gerald.schaefer@linux.ibm.com>
References: <20200828140314.8556-1-gerald.schaefer@linux.ibm.com>

From: Alexander Gordeev

Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
code") introduced a subtle but severe bug on s390 with gup_fast, due to
dynamic page table folding.

The question "What would it require for the generic code to work for s390?"
has already been discussed here
https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
and ended with a promising approach here
https://lkml.kernel.org/r/20190419153307.4f2911b5@mschwideX1
which unfortunately did not quite work out in the end.

We tried to mimic static level folding by changing pgd_offset to always
calculate the top-level page table offset, and to do nothing in the folded
pXd_offset helpers. What was overlooked is that PxD_SIZE/MASK, and thus
pXd_addr_end, do not reflect this dynamic behaviour; they still act like
static 5-level page tables.
Here is an example of what happens with gup_fast on s390, for a task with
3-level paging, crossing a 2 GB pud boundary:

// addr = 0x1007ffff000, end = 0x10080001000
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
			 unsigned int flags, struct page **pages, int *nr)
{
	unsigned long next;
	pud_t *pudp;

	// pud_offset returns &p4d itself (a pointer to a value on the stack)
	pudp = pud_offset(&p4d, addr);
	do {
		// on the second iteration, this reads a "random" stack value
		pud_t pud = READ_ONCE(*pudp);

		// next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
		next = pud_addr_end(addr, end);
		...
	} while (pudp++, addr = next, addr != end); // pudp++ iterating over the stack

	return 1;
}

pud_addr_end = 0x10080000000 is correct, but the previous pgd/p4d_addr_end
should also have returned that limit, instead of the 5-level static
pgd/p4d limits with PUD_SIZE/MASK != PGDIR_SIZE/MASK. The "end" parameter
for gup_pud_range would then also have been 0x10080000000, and we would
not iterate further in gup_pud_range, but rather go back and (correctly)
do it in gup_pgd_range.

So, in the second iteration of gup_pud_range, we increment pudp, which
pointed to a stack value and not the real pud table. This new pudp then
points to whatever lies behind the p4d stack value. In general, this
happens to be the previously read pgd, but it could also be something
different, depending on compiler decisions.

Most unfortunately, if it happens to be the pgd value, which is the same
as the p4d / pud due to folding, it is a valid and present entry. So
after the increment, we would still point to the same pud entry. The addr
however has been increased in the second iteration, so that we now have
different pmd/pte_index values, which will result in very wrong behaviour
for the remaining gup_pmd/pte_range calls. We will effectively operate on
an address minus 2 GB, due to the missing pudp increment.

In the "good" case, if nothing is mapped there, we will fall back to the
slow gup path. But if something is mapped there, and valid for gup_fast,
we will end up (silently) getting references on the wrong pages, and also
adding the wrong pages to the **pages result array. This can cause data
corruption.

Fix this by introducing new gup_pXd_addr_end helpers, which take an
additional pXd entry value parameter. On s390, this entry value can be
used to determine the correct page table level and return the
corresponding end / boundary. With that, the pointer iteration will
always happen in gup_pgd_range for s390.

No changes are introduced for other architectures.
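For reference, a minimal userspace sketch (not part of this patch) of the
effect of the new helper for the 3-level example above; the region-type
encoding is a placeholder and the shifts are the assumed s390 values, only
the size selection mirrors gup_folded_addr_end:

#include <stdio.h>

enum rste_type { R1, R2, R3 };	/* placeholder for _REGION_ENTRY_TYPE_* */

static unsigned long folded_addr_end(enum rste_type type, unsigned long addr,
				     unsigned long end)
{
	unsigned long size, boundary;

	switch (type) {
	case R1: size = 1UL << 53; break;	/* PGDIR_SIZE */
	case R2: size = 1UL << 42; break;	/* P4D_SIZE */
	default: size = 1UL << 31; break;	/* R3: PUD_SIZE */
	}
	boundary = (addr + size) & ~(size - 1);

	return boundary - 1 < end - 1 ? boundary : end;
}

int main(void)
{
	unsigned long addr = 0x1007ffff000UL, end = 0x10080001000UL;

	/* 3-level task: the top-level entry is a region-third entry */
	printf("gup_pgd_addr_end: %#lx\n", folded_addr_end(R3, addr, end));
	return 0;
}

This prints 0x10080000000: the 2 GB boundary is already enforced at the
pgd level, so gup_pgd_range walks the real top-level table entry by
entry, and gup_pud_range never iterates over a stack copy.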
Cc: <stable@vger.kernel.org> # 5.2+
Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
Reviewed-by: Gerald Schaefer
Signed-off-by: Alexander Gordeev
---
 arch/s390/include/asm/pgtable.h | 49 +++++++++++++++++++++++++++++++++
 include/linux/pgtable.h         | 16 +++++++++++
 mm/gup.c                        |  8 +++---
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7eb01a5459cd..1b8f461f5783 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -512,6 +512,55 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
 }
 #define mm_pmd_folded(mm) mm_pmd_folded(mm)
 
+static inline unsigned long gup_folded_addr_end(unsigned long rste,
+		unsigned long addr, unsigned long end)
+{
+	unsigned int type = rste & _REGION_ENTRY_TYPE_MASK;
+	unsigned long size, mask, boundary;
+
+	switch (type) {
+	case _REGION_ENTRY_TYPE_R1:
+		size = PGDIR_SIZE;
+		mask = PGDIR_MASK;
+		break;
+	case _REGION_ENTRY_TYPE_R2:
+		size = P4D_SIZE;
+		mask = P4D_MASK;
+		break;
+	case _REGION_ENTRY_TYPE_R3:
+		size = PUD_SIZE;
+		mask = PUD_MASK;
+		break;
+	default:
+		BUG();
+	};
+
+	boundary = (addr + size) & mask;
+
+	return (boundary - 1) < (end - 1) ? boundary : end;
+}
+
+#define gup_pgd_addr_end gup_pgd_addr_end
+static inline unsigned long gup_pgd_addr_end(pgd_t pgd,
+		unsigned long addr, unsigned long end)
+{
+	return gup_folded_addr_end(pgd_val(pgd), addr, end);
+}
+
+#define gup_p4d_addr_end gup_p4d_addr_end
+static inline unsigned long gup_p4d_addr_end(p4d_t p4d,
+		unsigned long addr, unsigned long end)
+{
+	return gup_folded_addr_end(p4d_val(p4d), addr, end);
+}
+
+#define gup_pud_addr_end gup_pud_addr_end
+static inline unsigned long gup_pud_addr_end(pud_t pud,
+		unsigned long addr, unsigned long end)
+{
+	return gup_folded_addr_end(pud_val(pud), addr, end);
+}
+
 static inline int mm_has_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e8cbc2e795d5..620a83c774c7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -681,6 +681,22 @@ static inline int arch_unmap_one(struct mm_struct *mm,
 })
 #endif
 
+#ifndef gup_pgd_addr_end
+#define gup_pgd_addr_end(pgd, addr, end)	pgd_addr_end(addr, end)
+#endif
+
+#ifndef gup_p4d_addr_end
+#define gup_p4d_addr_end(p4d, addr, end)	p4d_addr_end(addr, end)
+#endif
+
+#ifndef gup_pud_addr_end
+#define gup_pud_addr_end(pud, addr, end)	pud_addr_end(addr, end)
+#endif
+
+#ifndef gup_pmd_addr_end
+#define gup_pmd_addr_end(pmd, addr, end)	pmd_addr_end(addr, end)
+#endif
+
 /*
  * When walking page tables, we usually want to skip any p?d_none entries;
  * and any p?d_bad entries - reporting the error before resetting to none.
diff --git a/mm/gup.c b/mm/gup.c
index ae096ea7583f..149ef3d71457 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2510,7 +2510,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 	do {
 		pmd_t pmd = READ_ONCE(*pmdp);
 
-		next = pmd_addr_end(addr, end);
+		next = gup_pmd_addr_end(pmd, addr, end);
 		if (!pmd_present(pmd))
 			return 0;
 
@@ -2553,7 +2553,7 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 	do {
 		pud_t pud = READ_ONCE(*pudp);
 
-		next = pud_addr_end(addr, end);
+		next = gup_pud_addr_end(pud, addr, end);
 		if (unlikely(!pud_present(pud)))
 			return 0;
 		if (unlikely(pud_huge(pud))) {
@@ -2581,7 +2581,7 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
 	do {
 		p4d_t p4d = READ_ONCE(*p4dp);
 
-		next = p4d_addr_end(addr, end);
+		next = gup_p4d_addr_end(p4d, addr, end);
 		if (p4d_none(p4d))
 			return 0;
 		BUILD_BUG_ON(p4d_huge(p4d));
@@ -2606,7 +2606,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
 	do {
 		pgd_t pgd = READ_ONCE(*pgdp);
 
-		next = pgd_addr_end(addr, end);
+		next = gup_pgd_addr_end(pgd, addr, end);
 		if (pgd_none(pgd))
 			return;
 		if (unlikely(pgd_huge(pgd))) {