From patchwork Wed Apr 20 22:37:48 2022
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand,
 "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov",
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 1/6] hugetlbfs: revert use i_mmap_rwsem to address
 page fault/truncate race
Date: Wed, 20 Apr 2022 15:37:48 -0700
Message-Id: <20220420223753.386645-2-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing.  However, this has been shown to cause
performance/scaling issues, so that code will be reverted.  Since the use
of i_mmap_rwsem to address page fault/truncate races depends on it, that
use must also be reverted.

In a subsequent patch, code will be added to detect the fault/truncate
race and back out operations as required.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 30 +++++++++---------------------
 mm/hugetlb.c         | 23 ++++++++++++-----------
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8b5b9df2be7d..1ad76a7ae1cc 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -451,9 +451,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  * In this case, we first scan the range and release found pages.
  * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
  * maps and global counts.  Page faults can not race with truncation
- * in this routine.  hugetlb_no_page() holds i_mmap_rwsem and prevents
- * page faults in the truncated range by checking i_size.  i_size is
- * modified while holding i_mmap_rwsem.
+ * in this routine.  hugetlb_no_page() prevents page faults in the
+ * truncated range.  It checks i_size before allocation, and again after
+ * with the page table lock for the page held.  The same lock must be
+ * acquired to unmap a page.
  * hole punch is indicated if end is not LLONG_MAX
  * In the hole punch case we scan the range and release found pages.
 * Only when releasing a page is the associated region/reserve map
@@ -489,16 +490,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 		u32 hash = 0;

 		index = page->index;
-		if (!truncate_op) {
-			/*
-			 * Only need to hold the fault mutex in the
-			 * hole punch case.  This prevents races with
-			 * page faults.  Races are not possible in the
-			 * case of truncation.
-			 */
-			hash = hugetlb_fault_mutex_hash(mapping, index);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
-		}
+		hash = hugetlb_fault_mutex_hash(mapping, index);
+		mutex_lock(&hugetlb_fault_mutex_table[hash]);

 		/*
 		 * If page is mapped, it was faulted in after being
@@ -542,8 +535,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 		}
 		unlock_page(page);
-		if (!truncate_op)
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 	}
 	huge_pagevec_release(&pvec);
 	cond_resched();
@@ -581,8 +573,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;

-	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
+	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
@@ -703,11 +695,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;

-		/*
-		 * fault mutex taken here, protects against fault path
-		 * and hole punch.  inode_lock previously taken protects
-		 * against truncation.
-		 */
+		/* mutex taken here, fault path and hole punch */
 		hash = hugetlb_fault_mutex_hash(mapping, index);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index daa4bdd6c26c..9421d2aeddc0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5477,18 +5477,17 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}

 	/*
-	 * We can not race with truncation due to holding i_mmap_rwsem.
-	 * i_size is modified when holding i_mmap_rwsem, so check here
-	 * once for faults beyond end of file.
+	 * Use page lock to guard against racing truncation
+	 * before we get page_table_lock.
 	 */
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto out;
-
retry:
 	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
+		size = i_size_read(mapping->host) >> huge_page_shift(h);
+		if (idx >= size)
+			goto out;
+
 		/* Check for page in userfault range */
 		if (userfaultfd_missing(vma)) {
 			ret = hugetlb_handle_userfault(vma, mapping, idx,
@@ -5578,6 +5577,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}

 	ptl = huge_pte_lock(h, mm, ptep);
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto backout;
+
 	ret = 0;
 	/* If pte changed from under us, retry */
 	if (!pte_same(huge_ptep_get(ptep), old_pte))
@@ -5686,10 +5689,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep.  This serves two purposes:
-	 * 1) It prevents huge_pmd_unshare from being called elsewhere
-	 *    and making the ptep no longer valid.
-	 * 2) It synchronizes us with i_size modifications during truncation.
+	 * until finished with ptep.  This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the ptep no longer valid.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset.
 	 * That is OK, as huge_pte_alloc will return the same value unless
 	 * something has changed.

From patchwork Wed Apr 20 22:37:49 2022
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand,
 "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov",
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 2/6] hugetlbfs: revert use i_mmap_rwsem for more
 pmd sharing synchronization
Date: Wed, 20 Apr 2022 15:37:49 -0700
Message-Id: <20220420223753.386645-3-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing.  However, this has been shown to cause
performance/scaling issues.  Revert that code and go back to only taking
the semaphore in huge_pmd_share.

Keep the code that takes i_mmap_rwsem in write mode before calling
try_to_unmap, as this is required when huge_pmd_unshare is called.

In a subsequent patch, code will be added to detect when a pmd was
'unshared' during fault processing and deal with that.

FIXME - Check locking in move_huge_pte and caller

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c |  2 --
 mm/hugetlb.c         | 76 +++++++------------------------------
 mm/rmap.c            |  8 +----
 mm/userfaultfd.c     | 11 ++-----
 4 files changed, 15 insertions(+), 82 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 1ad76a7ae1cc..80573f0e8d9f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -505,9 +505,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 		if (unlikely(page_mapped(page))) {
 			BUG_ON(truncate_op);

-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_lock_write(mapping);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 				index * pages_per_huge_page(h),
 				(index + 1) * pages_per_huge_page(h),
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9421d2aeddc0..562ecac0168f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4717,7 +4717,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	struct hstate *h = hstate_vma(src_vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = src_vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
 	int ret = 0;

@@ -4728,14 +4727,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		mmu_notifier_invalidate_range_start(&range);
 		mmap_assert_write_locked(src);
 		raw_write_seqcount_begin(&src->write_protect_seq);
-	} else {
-		/*
-		 * For shared mappings i_mmap_rwsem must be held to call
-		 * huge_pte_alloc, otherwise the returned ptep could go
-		 * away if part of a shared pmd and another thread calls
-		 * huge_pmd_unshare.
-		 */
-		i_mmap_lock_read(mapping);
 	}

 	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
@@ -4878,8 +4869,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	if (cow) {
 		raw_write_seqcount_end(&src->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
-	} else {
-		i_mmap_unlock_read(mapping);
 	}

 	return ret;
@@ -5255,30 +5244,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * may get SIGKILLed if it later faults.
 	 */
 	if (outside_reserve) {
-		struct address_space *mapping = vma->vm_file->f_mapping;
-		pgoff_t idx;
-		u32 hash;
-
 		put_page(old_page);
 		BUG_ON(huge_pte_none(pte));
-		/*
-		 * Drop hugetlb_fault_mutex and i_mmap_rwsem before
-		 * unmapping.  unmapping needs to hold i_mmap_rwsem
-		 * in write mode.  Dropping i_mmap_rwsem in read mode
-		 * here is OK as COW mappings do not interact with
-		 * PMD sharing.
-		 *
-		 * Reacquire both after unmap operation.
-		 */
-		idx = vma_hugecache_offset(h, vma, haddr);
-		hash = hugetlb_fault_mutex_hash(mapping, idx);
-		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
-
 		unmap_ref_private(mm, vma, old_page, haddr);
-
-		i_mmap_lock_read(mapping);
-		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		spin_lock(ptl);
 		ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 		if (likely(ptep &&
@@ -5440,9 +5408,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 	 */
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	ret = handle_userfault(&vmf, reason);
-	i_mmap_lock_read(mapping);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);

 	return ret;
@@ -5673,11 +5639,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale.  That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, mm, ptep);
@@ -5685,31 +5646,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
+	} else {
+		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
+		if (!ptep)
+			return VM_FAULT_OOM;
 	}

-	/*
-	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep.  This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset.  That
-	 * is OK, as huge_pte_alloc will return the same value unless
- */ mapping = vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); - ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) { - i_mmap_unlock_read(mapping); - return VM_FAULT_OOM; - } + idx = vma_hugecache_offset(h, vma, haddr); /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ - idx = vma_hugecache_offset(h, vma, haddr); hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -5821,7 +5771,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } out_mutex: mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); /* * Generally it's safe to hold refcount during waiting page lock. But * here we just wait to defer the next page fault to avoid busy loop and @@ -6659,12 +6608,10 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc() * and returns the corresponding pte. While this is not necessary for the * !shared pmd case because we can allocate the pmd later as well, it makes the - * code much cleaner. - * - * This routine must be called with i_mmap_rwsem held in at least read mode if - * sharing is possible. For hugetlbfs, this prevents removal of any page - * table entries associated with the address space. This is important as we - * are setting up sharing based on existing page table entries (mappings). + * code much cleaner. pmd allocation is essential for the shared case because + * pud has to be populated inside the same i_mmap_rwsem section - otherwise + * racing tasks could either miss the sharing (see huge_pte_offset) or select a + * bad pmd for sharing. 
*/ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud) @@ -6678,7 +6625,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *pte; spinlock_t *ptl; - i_mmap_assert_locked(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -6708,6 +6655,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, spin_unlock(ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); + i_mmap_unlock_read(mapping); return pte; } @@ -6718,7 +6666,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, * indicated by page_count > 1, unmap is achieved by clearing pud and * decrementing the ref count. If count == 1, the pte page is not shared. * - * Called with page table lock held and i_mmap_rwsem held in write mode. + * Called with page table lock held. * * returns: 1 successfully unmapped a shared pte page * 0 the underlying pte page is not shared, or it is the last user diff --git a/mm/rmap.c b/mm/rmap.c index edfe61f95a7f..33c717163112 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -23,10 +23,9 @@ * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock * mapping->invalidate_lock (in filemap_fault) - * page->flags PG_locked (lock_page) * (see hugetlbfs below) + * page->flags PG_locked (lock_page) * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) * anon_vma->rwsem * mm->page_table_lock or pte_lock * swap_lock (in swap_duplicate, swap_info_get) @@ -45,11 +44,6 @@ * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock * pte map lock - * - * * hugetlbfs PageHuge() pages take locks in this order: - * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * page->flags PG_locked (lock_page) */ #include diff --git 
a/mm/userfaultfd.c b/mm/userfaultfd.c index 4f4892a5f767..1a2cdac18ad7 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -374,14 +374,10 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(dst_addr >= dst_start + len); /* - * Serialize via i_mmap_rwsem and hugetlb_fault_mutex. - * i_mmap_rwsem ensures the dst_pte remains valid even - * in the case of shared pmds. fault mutex prevents - * races with other faulting threads. + * Serialize via hugetlb_fault_mutex. */ - mapping = dst_vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); idx = linear_page_index(dst_vma, dst_addr); + mapping = dst_vma->vm_file->f_mapping; hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -389,7 +385,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -397,7 +392,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err = -EEXIST; mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -406,7 +400,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, wp_copy); mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); cond_resched();

From patchwork Wed Apr 20 22:37:50 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12820896
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, "Aneesh Kumar K. V", Andrea Arcangeli, "Kirill A. Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 3/6] hugetlbfs: move routine remove_huge_page to hugetlb.c
Date: Wed, 20 Apr 2022 15:37:50 -0700
Message-Id: <20220420223753.386645-4-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>
In preparation for code in hugetlb.c removing pages from the page cache, move remove_huge_page to hugetlb.c. For a more descriptive global name, rename to hugetlb_delete_from_page_cache. Also, rename huge_add_to_page_cache to be consistent.

Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 24 ++++++++---------------- include/linux/hugetlb.h | 3 ++- mm/hugetlb.c | 15 +++++++++++---- 3 files changed, 21 insertions(+), 21 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 80573f0e8d9f..5e4bd2f1705f 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -396,13 +396,6 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping, return -EINVAL; } -static void remove_huge_page(struct page *page) -{ - ClearPageDirty(page); - ClearPageUptodate(page); - delete_from_page_cache(page); -} - static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, unsigned long zap_flags) @@ -516,15 +509,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, lock_page(page); /* * We must free the huge page and remove from page - * cache (remove_huge_page) BEFORE removing the - * region/reserve map (hugetlb_unreserve_pages). In - * rare out of memory conditions, removal of the - * region/reserve map could fail. Correspondingly, - * the subpool and global reserve usage count can need - * to be adjusted. + * cache BEFORE removing the region/reserve map + * (hugetlb_unreserve_pages). In rare out of memory + * conditions, removal of the region/reserve map could + * fail. Correspondingly, the subpool and global + * reserve usage count can need to be adjusted.
*/ VM_BUG_ON(HPageRestoreReserve(page)); - remove_huge_page(page); + hugetlb_delete_from_page_cache(page); freed++; if (!truncate_op) { if (unlikely(hugetlb_unreserve_pages(inode, @@ -723,7 +715,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, } clear_huge_page(page, addr, pages_per_huge_page(h)); __SetPageUptodate(page); - error = huge_add_to_page_cache(page, mapping, index); + error = hugetlb_add_to_page_cache(page, mapping, index); if (unlikely(error)) { restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); @@ -975,7 +967,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping, struct inode *inode = mapping->host; pgoff_t index = page->index; - remove_huge_page(page); + hugetlb_delete_from_page_cache(page); if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1))) hugetlb_fix_reserve_counts(inode); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index b5f4a2f69dd3..75f4ff481538 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -655,8 +655,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void hugetlb_delete_from_page_cache(struct page *page); void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct page *page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 562ecac0168f..d60997462df8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5353,7 +5353,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h, return page != NULL; } -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct 
address_space *mapping, pgoff_t idx) { struct inode *inode = mapping->host; @@ -5376,6 +5376,13 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping, return 0; } +void hugetlb_delete_from_page_cache(struct page *page) +{ + ClearPageDirty(page); + ClearPageUptodate(page); + delete_from_page_cache(page); +} + static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, @@ -5488,7 +5495,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, new_page = true; if (vma->vm_flags & VM_MAYSHARE) { - int err = huge_add_to_page_cache(page, mapping, idx); + int err = hugetlb_add_to_page_cache(page, mapping, idx); if (err) { put_page(page); if (err == -EEXIST) @@ -5897,11 +5904,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* * Serialization between remove_inode_hugepages() and - * huge_add_to_page_cache() below happens through the + * hugetlb_add_to_page_cache() below happens through the * hugetlb_fault_mutex_table that here must be hold by * the caller. 
*/ - ret = huge_add_to_page_cache(page, mapping, idx); + ret = hugetlb_add_to_page_cache(page, mapping, idx); if (ret) goto out_release_nounlock; page_in_pagecache = true;

From patchwork Wed Apr 20 22:37:51 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12820897
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, "Aneesh Kumar K. V", Andrea Arcangeli, "Kirill A. Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 4/6] hugetlbfs: catch and handle truncate racing with page faults
Date: Wed, 20 Apr 2022 15:37:51 -0700
Message-Id: <20220420223753.386645-5-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>
Most hugetlb fault handling code checks for faults beyond i_size. While there are early checks in the code paths, the most difficult to handle are those discovered after taking the page table lock. At this point, we have possibly allocated a page and consumed associated reservations, and possibly added the page to the page cache.

When discovering a fault beyond i_size, be sure to:
- Remove the page from the page cache, else it will sit there until the file is removed.
- Do not restore any reservation for the page consumed. Otherwise there will be an outstanding reservation for an offset beyond the end of file.

The 'truncation' code in remove_inode_hugepages must deal with fault code potentially removing a page from the cache after the page was returned by pagevec_lookup and before locking the page. This can be discovered by a change in page_mapping() after taking the page lock. In addition, this code must deal with fault code potentially consuming and returning reservations. To synchronize this, remove_inode_hugepages will now take the fault mutex for ALL indices in the hole or truncated range. In this way, it KNOWS fault code has finished or will see the updated file size.

Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 104 +++++++++++++++++++++++++++++++------------ mm/hugetlb.c | 39 ++++++++++++---- 2 files changed, 105 insertions(+), 38 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 5e4bd2f1705f..d239646fa85d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -443,11 +443,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * truncation is indicated by end of range being LLONG_MAX * In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserve - * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() prevents page faults in the - * truncated range. It checks i_size before allocation, and again after - * with the page table lock for the page held. The same lock must be - * acquired to unmap a page. + * maps and global counts. Page faults can race with truncation. + * During faults, hugetlb_no_page() checks i_size before page allocation, + * and again after obtaining page table lock. It will 'back out' + * allocations in the truncated range. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserve map @@ -456,14 +455,26 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * This is indicated if we find a mapped page. * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. + * + * Since page faults can race with this routine, care must be taken as both + * modify huge page reservation data. To somewhat synchronize these operations + * the hugetlb fault mutex is taken for EVERY index in the range to be hole + * punched or truncated. In this way, we KNOW fault code will either have + * completed backout operations under the mutex, or fault code will see the + * updated file size and not allocate a page for offsets beyond truncated size. + * The parameter 'lm_end' indicates the offset of the end of hole or file + * before truncation. For hole punch lm_end == lend.
*/ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, - loff_t lend) + loff_t lend, loff_t lm_end) { + u32 hash; struct hstate *h = hstate_inode(inode); struct address_space *mapping = &inode->i_data; const pgoff_t start = lstart >> huge_page_shift(h); const pgoff_t end = lend >> huge_page_shift(h); + pgoff_t m_end = lm_end >> huge_page_shift(h); + pgoff_t m_start, m_index; struct pagevec pvec; pgoff_t next, index; int i, freed = 0; @@ -475,14 +486,33 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, /* * When no more pages are found, we are done. */ - if (!pagevec_lookup_range(&pvec, mapping, &next, end - 1)) + m_start = next; + if (!pagevec_lookup_range(&pvec, mapping, &next, end - 1)) { + /* + * To synchronize with faults, take fault mutex for + * each index in range. + */ + for (m_index = m_start; m_index < m_end; m_index++) { + hash = hugetlb_fault_mutex_hash(mapping, + m_index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + } break; + } for (i = 0; i < pagevec_count(&pvec); ++i) { struct page *page = pvec.pages[i]; - u32 hash = 0; index = page->index; + /* Take fault mutex for missing pages before index */ + for (m_index = m_start; m_index < index; m_index++) { + hash = hugetlb_fault_mutex_hash(mapping, + m_index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + } + m_start = index + 1; hash = hugetlb_fault_mutex_hash(mapping, index); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -491,13 +521,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, * unmapped in caller. Unmap (again) now after taking * the fault mutex. The mutex will prevent faults * until we finish removing the page. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. 
*/ if (unlikely(page_mapped(page))) { - BUG_ON(truncate_op); - i_mmap_lock_write(mapping); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), @@ -508,27 +533,46 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, lock_page(page); /* - * We must free the huge page and remove from page - * cache BEFORE removing the region/reserve map - * (hugetlb_unreserve_pages). In rare out of memory - * conditions, removal of the region/reserve map could - * fail. Correspondingly, the subpool and global - * reserve usage count can need to be adjusted. + * After locking page, make sure mapping is the same. + * We could have raced with page fault populate and + * backout code. */ - VM_BUG_ON(HPageRestoreReserve(page)); - hugetlb_delete_from_page_cache(page); - freed++; - if (!truncate_op) { - if (unlikely(hugetlb_unreserve_pages(inode, + if (page_mapping(page) == mapping) { + /* + * We must free the huge page and remove from + * page cache BEFORE removing the region/ + * reserve map (hugetlb_unreserve_pages). In + * rare out of memory conditions, removal of + * the region/reserve map could fail. + * Correspondingly, the subpool and global + * reserve usage count can need to be adjusted. 
+ */ + VM_BUG_ON(HPageRestoreReserve(page)); + hugetlb_delete_from_page_cache(page); + freed++; + if (!truncate_op) { + if (unlikely( + hugetlb_unreserve_pages(inode, index, index + 1, 1))) - hugetlb_fix_reserve_counts(inode); + hugetlb_fix_reserve_counts( + inode); + } } - unlock_page(page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); } huge_pagevec_release(&pvec); cond_resched(); + + if (!(next < end)) { + /* Will exit loop, take mutex for indices up to end */ + for (m_index = m_start; m_index < m_end; m_index++) { + hash = hugetlb_fault_mutex_hash(mapping, + m_index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + } + } } if (truncate_op) @@ -538,8 +582,9 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, static void hugetlbfs_evict_inode(struct inode *inode) { struct resv_map *resv_map; + loff_t prev_size = i_size_read(inode); - remove_inode_hugepages(inode, 0, LLONG_MAX); + remove_inode_hugepages(inode, 0, LLONG_MAX, prev_size); /* * Get the resv_map from the address space embedded in the inode.
@@ -559,6 +604,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) pgoff_t pgoff; struct address_space *mapping = inode->i_mapping; struct hstate *h = hstate_inode(inode); + loff_t prev_size = i_size_read(inode); BUG_ON(offset & ~huge_page_mask(h)); pgoff = offset >> PAGE_SHIFT; @@ -569,7 +615,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, ZAP_FLAG_DROP_MARKER); i_mmap_unlock_write(mapping); - remove_inode_hugepages(inode, offset, LLONG_MAX); + remove_inode_hugepages(inode, offset, LLONG_MAX, prev_size); } static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) @@ -603,7 +649,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) hole_start >> PAGE_SHIFT, hole_end >> PAGE_SHIFT, 0); i_mmap_unlock_write(mapping); - remove_inode_hugepages(inode, hole_start, hole_end); + remove_inode_hugepages(inode, hole_start, hole_end, hole_end); inode_unlock(inode); } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d60997462df8..e02df3527a9c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5436,6 +5436,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, spinlock_t *ptl; unsigned long haddr = address & huge_page_mask(h); bool new_page, new_pagecache_page = false; + bool beyond_i_size = false; + bool reserve_alloc = false; /* * Currently, we are forced to kill the process in the event the @@ -5493,6 +5495,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, clear_huge_page(page, address, pages_per_huge_page(h)); __SetPageUptodate(page); new_page = true; + if (HPageRestoreReserve(page)) + reserve_alloc = true; if (vma->vm_flags & VM_MAYSHARE) { int err = hugetlb_add_to_page_cache(page, mapping, idx); @@ -5551,8 +5555,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, ptl = huge_pte_lock(h, mm, ptep); size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) + if (idx >= size) { + 
beyond_i_size = true; goto backout; + } ret = 0; /* If pte changed from under us, retry */ @@ -5597,10 +5603,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, backout: spin_unlock(ptl); backout_unlocked: + if (new_page) { + if (new_pagecache_page) + hugetlb_delete_from_page_cache(page); + + /* + * If reserve was consumed, make sure flag is set so that it + * will be restored in free_huge_page(). + */ + if (reserve_alloc) + SetHPageRestoreReserve(page); + + /* + * Do not restore reserve map entries beyond i_size. + * Otherwise, there will be leaks when the file is removed. + */ + if (!beyond_i_size) + restore_reserve_on_error(h, vma, haddr, page); + } unlock_page(page); - /* restore reserve for newly allocated pages not in page cache */ - if (new_page && !new_pagecache_page) - restore_reserve_on_error(h, vma, haddr, page); put_page(page); goto out; } @@ -5921,15 +5942,15 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * Recheck the i_size after holding PT lock to make sure not * to leave any page mapped (as page_mapped()) beyond the end * of the i_size (remove_inode_hugepages() is strict about - * enforcing that). If we bail out here, we'll also leave a - * page in the radix tree in the vm_shared case beyond the end - * of the i_size, but remove_inode_hugepages() will take care - * of it as soon as we drop the hugetlb_fault_mutex_table. + * enforcing that). If we bail out here, remove the page + * added to the radix tree. 
*/ size = i_size_read(mapping->host) >> huge_page_shift(h); ret = -EFAULT; - if (idx >= size) + if (idx >= size) { + hugetlb_delete_from_page_cache(page); goto out_release_unlock; + } ret = -EEXIST; /*
From patchwork Wed Apr 20 22:37:52 2022
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, Aneesh Kumar K. V, Andrea Arcangeli, Kirill A. Shutemov, Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 5/6] hugetlbfs: Do not use pmd locks if hugetlb sharing possible
Date: Wed, 20 Apr 2022 15:37:52 -0700
Message-Id: <20220420223753.386645-6-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>
In hugetlbfs, split pmd page table locks are generally used if huge_page_size is equal to PMD_SIZE. These locks are located in the struct page of the corresponding pmd page. A pmd pointer is used to locate the page.

In the case of pmd sharing, pmd pointers can become invalid unless one holds the page table lock. This creates a chicken/egg problem, as we need to use the pointer to locate the lock. To address this issue, use the page_table_lock in the mm_struct if the pmd pointer is associated with a sharable vma.

The routines dealing with huge pte locks (huge_pte_lockptr and huge_pte_lock) are modified to take a vma pointer instead of an mm pointer. The vma is then checked to determine if sharing is possible. If it is, then the page table lock in the mm_struct is used. Otherwise, the lock in the pmd page's struct page is used.

Note that the code uses the mm_struct if any part of the vma is sharable. This could be optimized by passing in the virtual address associated with the pte pointer to determine if that specific address is sharable.
Signed-off-by: Mike Kravetz --- arch/powerpc/mm/pgtable.c | 2 +- include/linux/hugetlb.h | 27 ++++-------- mm/damon/vaddr.c | 4 +- mm/hmm.c | 2 +- mm/hugetlb.c | 92 +++++++++++++++++++++++++++++---------- mm/mempolicy.c | 2 +- mm/migrate.c | 2 +- mm/page_vma_mapped.c | 2 +- 8 files changed, 85 insertions(+), 48 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 6ec5a7dd7913..02f76e8b735a 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -261,7 +261,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, psize = hstate_get_psize(h); #ifdef CONFIG_DEBUG_VM - assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep)); + assert_spin_locked(huge_pte_lockptr(h, vma, ptep)); #endif #else diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 75f4ff481538..c37611eb8571 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -864,15 +864,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return modified_mask; } -static inline spinlock_t *huge_pte_lockptr(struct hstate *h, - struct mm_struct *mm, pte_t *pte) -{ - if (huge_page_size(h) == PMD_SIZE) - return pmd_lockptr(mm, (pmd_t *) pte); - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); - return &mm->page_table_lock; -} - +spinlock_t *huge_pte_lockptr(struct hstate *h, struct vm_area_struct *vma, + pte_t *pte); #ifndef hugepages_supported /* * Some platform decide whether they support huge pages at boot @@ -1073,8 +1066,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) } static inline spinlock_t *huge_pte_lockptr(struct hstate *h, - struct mm_struct *mm, pte_t *pte) + struct vm_area_struct *vma, + pte_t *pte) { + struct mm_struct *mm = vma->vm_mm; + return &mm->page_table_lock; } @@ -1096,15 +1092,8 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr } #endif /* CONFIG_HUGETLB_PAGE */ -static inline spinlock_t *huge_pte_lock(struct hstate *h, 
- struct mm_struct *mm, pte_t *pte) -{ - spinlock_t *ptl; - - ptl = huge_pte_lockptr(h, mm, pte); - spin_lock(ptl); - return ptl; -} +spinlock_t *huge_pte_lock(struct hstate *h, struct vm_area_struct *vma, + pte_t *pte); #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA) extern void __init hugetlb_cma_reserve(int order); diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index b2ec0aa1ff45..125439fc88b6 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -432,7 +432,7 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); + ptl = huge_pte_lock(h, walk->vma, pte); entry = huge_ptep_get(pte); if (!pte_present(entry)) goto out; @@ -555,7 +555,7 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); + ptl = huge_pte_lock(h, walk->vma, pte); entry = huge_ptep_get(pte); if (!pte_present(entry)) goto out; diff --git a/mm/hmm.c b/mm/hmm.c index 3fd3242c5e50..95b443f2e48e 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -486,7 +486,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte); + ptl = huge_pte_lock(hstate_vma(vma), vma, pte); entry = huge_ptep_get(pte); i = (start - range->start) >> PAGE_SHIFT; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e02df3527a9c..c1352ab7f941 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -94,8 +94,32 @@ DEFINE_SPINLOCK(hugetlb_lock); static int num_fault_mutexes; struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; -/* Forward declaration */ +/* Forward declarations */ static int hugetlb_acct_memory(struct hstate *h, long delta); +static bool vma_range_shareable(struct vm_area_struct *vma, + unsigned long start, unsigned long end); + +spinlock_t *huge_pte_lockptr(struct hstate *h, struct vm_area_struct *vma, + pte_t *pte) +{ + struct mm_struct 
*mm = vma->vm_mm;
+
+	if (huge_page_size(h) == PMD_SIZE &&
+	    !vma_range_shareable(vma, vma->vm_start, vma->vm_end))
+		return pmd_lockptr(mm, (pmd_t *) pte);
+	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
+	return &mm->page_table_lock;
+}
+
+spinlock_t *huge_pte_lock(struct hstate *h, struct vm_area_struct *vma,
+			  pte_t *pte)
+{
+	spinlock_t *ptl;
+
+	ptl = huge_pte_lockptr(h, vma, pte);
+	spin_lock(ptl);
+	return ptl;
+}
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -4753,8 +4777,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
 			continue;
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		dst_ptl = huge_pte_lock(h, dst_vma, dst_pte);
+		src_ptl = huge_pte_lockptr(h, src_vma, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
@@ -4830,8 +4854,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			put_page(ptepage);
 
 			/* Install the new huge page if src pte stable */
-			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(h, src, src_pte);
+			dst_ptl = huge_pte_lock(h, dst_vma, dst_pte);
+			src_ptl = huge_pte_lockptr(h, src_vma, src_pte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -4882,8 +4906,8 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	spinlock_t *src_ptl, *dst_ptl;
 	pte_t pte;
 
-	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	dst_ptl = huge_pte_lock(h, vma, dst_pte);
+	src_ptl = huge_pte_lockptr(h, vma, src_pte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -4988,7 +5012,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		if (!ptep)
 			continue;
 
-		ptl = huge_pte_lock(h, mm, ptep);
+		ptl = huge_pte_lock(h, vma, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
@@ -5485,7 +5509,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * here.  Before returning error, get ptl and make
 			 * sure there really is no pte entry.
 			 */
-			ptl = huge_pte_lock(h, mm, ptep);
+			ptl = huge_pte_lock(h, vma, ptep);
 			ret = 0;
 			if (huge_pte_none(huge_ptep_get(ptep)))
 				ret = vmf_error(PTR_ERR(page));
@@ -5553,7 +5577,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		vma_end_reservation(h, vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = huge_pte_lock(h, vma, ptep);
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
 	if (idx >= size) {
 		beyond_i_size = true;
@@ -5733,7 +5757,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 						vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = huge_pte_lock(h, vma, ptep);
 
 	/* Check for a racing update before calling hugetlb_wp() */
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
@@ -5935,7 +5959,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
+	ptl = huge_pte_lockptr(h, dst_vma, dst_pte);
 	spin_lock(ptl);
 
 	/*
@@ -6089,7 +6113,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
 				      huge_page_size(h));
 		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
+			ptl = huge_pte_lock(h, vma, pte);
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
 
 		/*
@@ -6267,7 +6291,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		ptep = huge_pte_offset(mm, address, psize);
 		if (!ptep)
 			continue;
-		ptl = huge_pte_lock(h, mm, ptep);
+		ptl = huge_pte_lock(h, vma, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
@@ -6583,26 +6607,44 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }
 
-static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
+static bool __vma_aligned_range_shareable(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end)
 {
-	unsigned long base = addr & PUD_MASK;
-	unsigned long end = base + PUD_SIZE;
-
 	/*
 	 * check on proper vm_flags and page table alignment
 	 */
-	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end))
+	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end))
 		return true;
 	return false;
 }
 
+static bool vma_range_shareable(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end)
+{
+	unsigned long v_start = ALIGN(vma->vm_start, PUD_SIZE),
+		      v_end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (v_start >= v_end)
+		return false;
+
+	return __vma_aligned_range_shareable(vma, v_start, v_end);
+}
+
+static bool vma_addr_shareable(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+
+	return __vma_aligned_range_shareable(vma, start, end);
+}
+
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 {
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
 #endif
-	return vma_shareable(vma, addr);
+	return vma_addr_shareable(vma, addr);
 }
 
 /*
@@ -6672,7 +6714,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!spte)
 		goto out;
 
-	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
+	ptl = huge_pte_lock(hstate_vma(vma), vma, spte);
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
@@ -6719,6 +6761,12 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
+static bool vma_range_shareable(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end)
+{
+	return false;
+}
+
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
 {
@@ -7034,7 +7082,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
 		ptep = huge_pte_offset(mm, address, sz);
 		if (!ptep)
 			continue;
-		ptl = huge_pte_lock(h, mm, ptep);
+		ptl = huge_pte_lock(h, vma, ptep);
 		/* We don't want 'address' to be changed */
 		huge_pmd_unshare(mm, vma, &tmp, ptep);
 		spin_unlock(ptl);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 58af432a39b2..4692640847eb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -577,7 +577,7 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
+	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->vma, pte);
 	entry = huge_ptep_get(pte);
 	if (!pte_present(entry))
 		goto unlock;
diff --git a/mm/migrate.c b/mm/migrate.c
index b2678279eb43..3d765ee101ad 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -318,7 +318,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 void migration_entry_wait_huge(struct vm_area_struct *vma,
 		struct mm_struct *mm, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma, pte);
 	__migration_entry_wait(mm, pte, ptl);
 }
 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..f09eaef2a828 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -174,7 +174,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		if (!pvmw->pte)
 			return false;
 
-		pvmw->ptl = huge_pte_lockptr(hstate, mm, pvmw->pte);
+		pvmw->ptl = huge_pte_lockptr(hstate, vma, pvmw->pte);
 		spin_lock(pvmw->ptl);
 		if (!check_pte(pvmw))
 			return not_found(pvmw);

From patchwork Wed Apr 20 22:37:53 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12820899
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand,
 "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov",
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v2 6/6] hugetlb: Check for pmd unshare and fault/lookup races
Date: Wed, 20 Apr 2022 15:37:53 -0700
Message-Id: <20220420223753.386645-7-mike.kravetz@oracle.com>
In-Reply-To: <20220420223753.386645-1-mike.kravetz@oracle.com>
References: <20220420223753.386645-1-mike.kravetz@oracle.com>

When a pmd is 'unshared', it effectively deletes part of a process's
page tables.  The routine huge_pmd_unshare must be called with
i_mmap_rwsem held in write mode and the page table locked.  However,
consider a page fault happening within that same process.  We could
have the following race:

Faulting thread                         Unsharing thread
...                                     ...
ptep = huge_pte_offset()
      or
ptep = huge_pte_alloc()
...
                                        i_mmap_unlock_write
                                        lock_page table
ptep invalid <------------------------  huge_pmd_unshare
Could be in a previously                unlock_page_table
sharing process or worse                ...
...
ptl = huge_pte_lock(ptep)
get/update pte
set_pte_at(pte, ptep)

If the above race happens, we can update the pte of another process.
Catch this situation by doing another huge_pte_offset/page table walk
after obtaining the page table lock and comparing pointers.  If the
pointers are different, then we know a race happened and we can bail
and cleanup.  In the fault code, make sure to check for this race AFTER
checking for faults beyond i_size so the page cache can be cleaned up
properly.
Signed-off-by: Mike Kravetz
---
 mm/hugetlb.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c1352ab7f941..804a8d0a2cb8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4735,6 +4735,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
+	pte_t *src_pte2;
 	struct page *ptepage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
@@ -4783,7 +4784,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+
+		src_pte2 = huge_pte_offset(src, addr, sz);
+		if (unlikely(src_pte2 != src_pte)) {
+			/*
+			 * Another thread could have unshared src_pte.
+			 * Just skip.
+			 */
+			;
+		} else if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
 			/*
 			 * Skip if src entry none.  Also, skip in the
 			 * unlikely case dst entry !none as this implies
@@ -5462,6 +5471,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	bool new_page, new_pagecache_page = false;
 	bool beyond_i_size = false;
 	bool reserve_alloc = false;
+	pte_t *ptep2;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5510,8 +5520,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * sure there really is no pte entry.
 			 */
 			ptl = huge_pte_lock(h, vma, ptep);
+			/* ptep2 checks for racing unshare page tables */
+			ptep2 = huge_pte_offset(mm, haddr, huge_page_size(h));
 			ret = 0;
-			if (huge_pte_none(huge_ptep_get(ptep)))
+			if (ptep2 == ptep && huge_pte_none(huge_ptep_get(ptep)))
 				ret = vmf_error(PTR_ERR(page));
 			spin_unlock(ptl);
 			goto out;
@@ -5584,6 +5596,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		goto backout;
 	}
 
+	/* Check for racing unshare page tables */
+	ptep2 = huge_pte_offset(mm, haddr, huge_page_size(h));
+	if (ptep2 != ptep)
+		goto backout;
+
 	ret = 0;
 	/* If pte changed from under us, retry */
 	if (!pte_same(huge_ptep_get(ptep), old_pte))
@@ -5677,7 +5694,7 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags)
 {
-	pte_t *ptep, entry;
+	pte_t *ptep, *ptep2, entry;
 	spinlock_t *ptl;
 	vm_fault_t ret;
 	u32 hash;
@@ -5759,8 +5776,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	ptl = huge_pte_lock(h, vma, ptep);
 
-	/* Check for a racing update before calling hugetlb_wp() */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	/* Check for a racing update or unshare before calling hugetlb_wp() */
+	ptep2 = huge_pte_offset(mm, haddr, huge_page_size(h));
+	if (unlikely(ptep2 != ptep || !pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
 	/* Handle userfault-wp first, before trying to lock more pages */
@@ -5861,6 +5879,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	struct page *page;
 	int writable;
 	bool page_in_pagecache = false;
+	pte_t *ptep2;
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -5976,6 +5995,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 	}
 
+	/* Check for racing unshare page tables */
+	ptep2 = huge_pte_offset(dst_mm, dst_addr, huge_page_size(h));
+	if (unlikely(ptep2 != dst_pte))
+		goto out_release_unlock;
+
 	ret = -EEXIST;
 	/*
 	 * We allow to overwrite a pte marker: consider when both MISSING|WP