From patchwork Wed Apr 6 20:48:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12804250
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo,
 Andrew Morton, Mike Kravetz
Subject: [RFC PATCH 1/5] hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race
Date: Wed, 6 Apr 2022 13:48:19 -0700
Message-Id: <20220406204823.46548-2-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220406204823.46548-1-mike.kravetz@oracle.com>
References: <20220406204823.46548-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing. The use of i_mmap_rwsem to prevent
fault/truncate races depends on this. However, this has been shown to
cause performance/scaling issues. As a result, that code will be
reverted. Since the use of i_mmap_rwsem to address page fault/truncate
races depends on this, it must also be reverted.

In a subsequent patch, code will be added to detect the fault/truncate
race and back out operations as required.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 30 +++++++++---------------------
 mm/hugetlb.c         | 23 ++++++++++++-----------
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7c6c7498be0..e50de48c7707 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -450,9 +450,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
  * In this case, we first scan the range and release found pages.
  * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
  * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() holds i_mmap_rwsem and prevents
- * page faults in the truncated range by checking i_size. i_size is
- * modified while holding i_mmap_rwsem.
+ * in this routine. hugetlb_no_page() prevents page faults in the
+ * truncated range. It checks i_size before allocation, and again after
+ * with the page table lock for the page held. The same lock must be
+ * acquired to unmap a page.
  * hole punch is indicated if end is not LLONG_MAX
  * In the hole punch case we scan the range and release found pages.
  * Only when releasing a page is the associated region/reserve map
@@ -488,16 +489,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			u32 hash = 0;

 			index = page->index;
-			if (!truncate_op) {
-				/*
-				 * Only need to hold the fault mutex in the
-				 * hole punch case. This prevents races with
-				 * page faults. Races are not possible in the
-				 * case of truncation.
-				 */
-				hash = hugetlb_fault_mutex_hash(mapping, index);
-				mutex_lock(&hugetlb_fault_mutex_table[hash]);
-			}
+			hash = hugetlb_fault_mutex_hash(mapping, index);
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);

 			/*
 			 * If page is mapped, it was faulted in after being
@@ -540,8 +533,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			}

 			unlock_page(page);
-			if (!truncate_op)
-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		huge_pagevec_release(&pvec);
 		cond_resched();
@@ -579,8 +571,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;

-	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
+	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
 	i_mmap_unlock_write(mapping);
@@ -700,11 +692,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;

-		/*
-		 * fault mutex taken here, protects against fault path
-		 * and hole punch. inode_lock previously taken protects
-		 * against truncation.
-		 */
+		/* mutex taken here, fault path and hole punch */
 		hash = hugetlb_fault_mutex_hash(mapping, index);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f294db835f4b..398b7742cc63 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5401,18 +5401,17 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}

 	/*
-	 * We can not race with truncation due to holding i_mmap_rwsem.
-	 * i_size is modified when holding i_mmap_rwsem, so check here
-	 * once for faults beyond end of file.
+	 * Use page lock to guard against racing truncation
+	 * before we get page_table_lock.
 	 */
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto out;
-
 retry:
 	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
+		size = i_size_read(mapping->host) >> huge_page_shift(h);
+		if (idx >= size)
+			goto out;
+
 		/* Check for page in userfault range */
 		if (userfaultfd_missing(vma)) {
 			ret = hugetlb_handle_userfault(vma, mapping, idx,
@@ -5502,6 +5501,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}

 	ptl = huge_pte_lock(h, mm, ptep);
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto backout;
+
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
 		goto backout;
@@ -5603,10 +5606,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This serves two purposes:
-	 * 1) It prevents huge_pmd_unshare from being called elsewhere
-	 *    and making the ptep no longer valid.
-	 * 2) It synchronizes us with i_size modifications during truncation.
+	 * until finished with ptep. This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the ptep no longer valid.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset. That
 	 * is OK, as huge_pte_alloc will return the same value unless

From patchwork Wed Apr 6 20:48:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12804046
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo,
 Andrew Morton, Mike Kravetz
Subject: [RFC PATCH 2/5] hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization
Date: Wed, 6 Apr 2022 13:48:20 -0700
Message-Id: <20220406204823.46548-3-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220406204823.46548-1-mike.kravetz@oracle.com>
References: <20220406204823.46548-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 added code to take i_mmap_rwsem in read mode for the
duration of fault processing. However, this has been shown to cause
performance/scaling issues. Revert the code and go back to the method of
only taking the semaphore in huge_pmd_share. Keep the code that takes
i_mmap_rwsem in write mode before calling try_to_unmap as this is required
if huge_pmd_unshare is called.

In a subsequent patch, code will be added to detect when a pmd was
'unshared' during fault processing and deal with that.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c |  2 --
 mm/hugetlb.c         | 76 +++++++-------------------------------------
 mm/rmap.c            | 14 +-------
 mm/userfaultfd.c     | 11 ++-----
 4 files changed, 15 insertions(+), 88 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e50de48c7707..56cd75b6cfc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -504,9 +504,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			if (unlikely(page_mapped(page))) {
 				BUG_ON(truncate_op);

-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 				i_mmap_lock_write(mapping);
-				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
 					(index + 1) * pages_per_huge_page(h));
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 398b7742cc63..8fa2386bf7c0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4701,7 +4701,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
 	int ret = 0;

@@ -4710,14 +4709,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 					vma->vm_start,
 					vma->vm_end);
 		mmu_notifier_invalidate_range_start(&range);
-	} else {
-		/*
-		 * For shared mappings i_mmap_rwsem must be held to call
-		 * huge_pte_alloc, otherwise the returned ptep could go
-		 * away if part of a shared pmd and another thread calls
-		 * huge_pmd_unshare.
-		 */
-		i_mmap_lock_read(mapping);
 	}

 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
@@ -4844,8 +4835,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,

 	if (cow)
 		mmu_notifier_invalidate_range_end(&range);
-	else
-		i_mmap_unlock_read(mapping);

 	return ret;
 }
@@ -5189,30 +5178,9 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * may get SIGKILLed if it later faults.
 	 */
 	if (outside_reserve) {
-		struct address_space *mapping = vma->vm_file->f_mapping;
-		pgoff_t idx;
-		u32 hash;
-
 		put_page(old_page);
 		BUG_ON(huge_pte_none(pte));
-		/*
-		 * Drop hugetlb_fault_mutex and i_mmap_rwsem before
-		 * unmapping. unmapping needs to hold i_mmap_rwsem
-		 * in write mode. Dropping i_mmap_rwsem in read mode
-		 * here is OK as COW mappings do not interact with
-		 * PMD sharing.
-		 *
-		 * Reacquire both after unmap operation.
-		 */
-		idx = vma_hugecache_offset(h, vma, haddr);
-		hash = hugetlb_fault_mutex_hash(mapping, idx);
-		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
-
 		unmap_ref_private(mm, vma, old_page, haddr);
-
-		i_mmap_lock_read(mapping);
-		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		spin_lock(ptl);
 		ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 		if (likely(ptep &&
@@ -5366,9 +5334,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 	 */
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	ret = handle_userfault(&vmf, reason);
-	i_mmap_lock_read(mapping);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);

 	return ret;
@@ -5590,11 +5556,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale. That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, mm, ptep);
@@ -5602,31 +5563,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
+	} else {
+		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
+		if (!ptep)
+			return VM_FAULT_OOM;
 	}

-	/*
-	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset. That
-	 * is OK, as huge_pte_alloc will return the same value unless
-	 * something has changed.
- */ mapping = vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); - ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) { - i_mmap_unlock_read(mapping); - return VM_FAULT_OOM; - } + idx = vma_hugecache_offset(h, vma, haddr); /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ - idx = vma_hugecache_offset(h, vma, haddr); hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -5714,7 +5664,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } out_mutex: mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); /* * Generally it's safe to hold refcount during waiting page lock. But * here we just wait to defer the next page fault to avoid busy loop and @@ -6475,12 +6424,10 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc() * and returns the corresponding pte. While this is not necessary for the * !shared pmd case because we can allocate the pmd later as well, it makes the - * code much cleaner. - * - * This routine must be called with i_mmap_rwsem held in at least read mode if - * sharing is possible. For hugetlbfs, this prevents removal of any page - * table entries associated with the address space. This is important as we - * are setting up sharing based on existing page table entries (mappings). + * code much cleaner. pmd allocation is essential for the shared case because + * pud has to be populated inside the same i_mmap_rwsem section - otherwise + * racing tasks could either miss the sharing (see huge_pte_offset) or select a + * bad pmd for sharing. 
*/ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud) @@ -6494,7 +6441,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *pte; spinlock_t *ptl; - i_mmap_assert_locked(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -6524,6 +6471,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, spin_unlock(ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); + i_mmap_unlock_read(mapping); return pte; } @@ -6534,7 +6482,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, * indicated by page_count > 1, unmap is achieved by clearing pud and * decrementing the ref count. If count == 1, the pte page is not shared. * - * Called with page table lock held and i_mmap_rwsem held in write mode. + * Called with page table lock held. * * returns: 1 successfully unmapped a shared pte page * 0 the underlying pte page is not shared, or it is the last user diff --git a/mm/rmap.c b/mm/rmap.c index 6a1e8c7f6213..206e7d3efdb1 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -23,10 +23,9 @@ * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock * mapping->invalidate_lock (in filemap_fault) - * page->flags PG_locked (lock_page) * (see hugetlbfs below) + * page->flags PG_locked (lock_page) * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) * anon_vma->rwsem * mm->page_table_lock or pte_lock * swap_lock (in swap_duplicate, swap_info_get) @@ -45,11 +44,6 @@ * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock * pte map lock - * - * * hugetlbfs PageHuge() pages take locks in this order: - * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * page->flags PG_locked (lock_page) */ #include @@ -1495,12 +1489,6 
@@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, address = pvmw.address; if (PageHuge(page) && !PageAnon(page)) { - /* - * To call huge_pmd_unshare, i_mmap_rwsem must be - * held in write mode. Caller needs to explicitly - * do this outside rmap routines. - */ - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { /* * huge_pmd_unshare unmapped an entire PMD diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 0780c2a57ff1..81e299edbc1a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -351,14 +351,10 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(dst_addr >= dst_start + len); /* - * Serialize via i_mmap_rwsem and hugetlb_fault_mutex. - * i_mmap_rwsem ensures the dst_pte remains valid even - * in the case of shared pmds. fault mutex prevents - * races with other faulting threads. + * Serialize via hugetlb_fault_mutex. */ - mapping = dst_vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); idx = linear_page_index(dst_vma, dst_addr); + mapping = dst_vma->vm_file->f_mapping; hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -366,7 +362,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -374,7 +369,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, !huge_pte_none(huge_ptep_get(dst_pte))) { err = -EEXIST; mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -382,7 +376,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, dst_addr, src_addr, mode, &page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); cond_resched(); From patchwork Wed Apr 6 
20:48:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12804048
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH 3/5] hugetlbfs: move routine remove_huge_page to hugetlb.c
Date: Wed, 6 Apr 2022 13:48:21 -0700
Message-Id: <20220406204823.46548-4-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220406204823.46548-1-mike.kravetz@oracle.com>
References: <20220406204823.46548-1-mike.kravetz@oracle.com>

In preparation for code in hugetlb.c removing pages from the page cache, move remove_huge_page to hugetlb.c. For a more descriptive global name, rename it to hugetlb_delete_from_page_cache. Also, rename huge_add_to_page_cache to hugetlb_add_to_page_cache to be consistent.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 24 ++++++++----------------
 include/linux/hugetlb.h | 3 ++-
 mm/hugetlb.c | 15 +++++++++++----
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 56cd75b6cfc0..0cf352555354 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -396,13 +396,6 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping, return -EINVAL; } -static void remove_huge_page(struct page *page) -{ - ClearPageDirty(page); - ClearPageUptodate(page); - delete_from_page_cache(page); -} - static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) {
@@ -514,15 +507,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, lock_page(page); /* * We must free the huge page and remove from page - * cache (remove_huge_page) BEFORE removing the - * region/reserve map (hugetlb_unreserve_pages). In - * rare out of memory conditions, removal of the - * region/reserve map could fail. Correspondingly, - * the subpool and global reserve usage count can need - * to be adjusted. + * cache BEFORE removing the region/reserve map + * (hugetlb_unreserve_pages). In rare out of memory + * conditions, removal of the region/reserve map could + * fail. Correspondingly, the subpool and global + * reserve usage count can need to be adjusted.
*/ VM_BUG_ON(HPageRestoreReserve(page)); - remove_huge_page(page); + hugetlb_delete_from_page_cache(page); freed++; if (!truncate_op) { if (unlikely(hugetlb_unreserve_pages(inode, @@ -720,7 +712,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, } clear_huge_page(page, addr, pages_per_huge_page(h)); __SetPageUptodate(page); - error = huge_add_to_page_cache(page, mapping, index); + error = hugetlb_add_to_page_cache(page, mapping, index); if (unlikely(error)) { restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); @@ -972,7 +964,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping, struct inode *inode = mapping->host; pgoff_t index = page->index; - remove_huge_page(page); + hugetlb_delete_from_page_cache(page); if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1))) hugetlb_fix_reserve_counts(inode); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d1897a69c540..2cf99d769f61 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -640,8 +640,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void hugetlb_delete_from_page_cache(struct page *page); void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct page *page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8fa2386bf7c0..c6d76f61de98 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5281,7 +5281,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h, return page != NULL; } -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct 
address_space *mapping, pgoff_t idx) { struct inode *inode = mapping->host; @@ -5304,6 +5304,13 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping, return 0; } +void hugetlb_delete_from_page_cache(struct page *page) +{ + ClearPageDirty(page); + ClearPageUptodate(page); + delete_from_page_cache(page); +} + static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, @@ -5412,7 +5419,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, new_page = true; if (vma->vm_flags & VM_MAYSHARE) { - int err = huge_add_to_page_cache(page, mapping, idx); + int err = hugetlb_add_to_page_cache(page, mapping, idx); if (err) { put_page(page); if (err == -EEXIST) @@ -5788,11 +5795,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* * Serialization between remove_inode_hugepages() and - * huge_add_to_page_cache() below happens through the + * hugetlb_add_to_page_cache() below happens through the * hugetlb_fault_mutex_table that here must be hold by * the caller. 
*/ - ret = huge_add_to_page_cache(page, mapping, idx); + ret = hugetlb_add_to_page_cache(page, mapping, idx); if (ret) goto out_release_nounlock; page_in_pagecache = true;

From patchwork Wed Apr 6 20:48:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12804047
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH 4/5] hugetlbfs: catch and handle truncate racing with page faults
Date: Wed, 6 Apr 2022 13:48:22 -0700
Message-Id: <20220406204823.46548-5-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220406204823.46548-1-mike.kravetz@oracle.com>
References: <20220406204823.46548-1-mike.kravetz@oracle.com>
ace6i9s6tarwh1kcq54f5rrhxhea61py X-HE-Tag: 1649278131-82651 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Most hugetlb fault handling code checks for faults beyond i_size.
While there are early checks in the code paths, the most difficult
to handle are those discovered after taking the page table lock.
At this point, we have possibly allocated a page and consumed
associated reservations and possibly added the page to the page cache.

When discovering a fault beyond i_size, be sure to:
- Remove the page from page cache, else it will sit there until the
  file is removed.
- Do not restore any reservation for the page consumed. Otherwise
  there will be an outstanding reservation for an offset beyond the
  end of file.

The 'truncation' code in remove_inode_hugepages must deal with fault
code potentially removing a page from the cache after the page was
returned by pagevec_lookup and before locking the page. This can be
discovered by a change in page_mapping().

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 40 ++++++++++++++++++++++------------------
 mm/hugetlb.c         | 28 ++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf352555354..341156c2a7d0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -490,13 +490,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			 * unmapped in caller. Unmap (again) now after taking
 			 * the fault mutex. The mutex will prevent faults
 			 * until we finish removing the page.
-			 *
-			 * This race can only happen in the hole punch case.
-			 * Getting here in a truncate operation is a bug.
 			 */
 			if (unlikely(page_mapped(page))) {
-				BUG_ON(truncate_op);
-
 				i_mmap_lock_write(mapping);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
@@ -506,22 +501,31 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 
 			lock_page(page);
 			/*
-			 * We must free the huge page and remove from page
-			 * cache BEFORE removing the region/reserve map
-			 * (hugetlb_unreserve_pages). In rare out of memory
-			 * conditions, removal of the region/reserve map could
-			 * fail. Correspondingly, the subpool and global
-			 * reserve usage count can need to be adjusted.
+			 * After locking page, make sure mapping is the same.
+			 * We could have raced with page fault populate and
+			 * backout code.
 			 */
-			VM_BUG_ON(HPageRestoreReserve(page));
-			hugetlb_delete_from_page_cache(page);
-			freed++;
-			if (!truncate_op) {
-				if (unlikely(hugetlb_unreserve_pages(inode,
+			if (page_mapping(page) == mapping) {
+				/*
+				 * We must free the huge page and remove from
+				 * page cache BEFORE removing the region/
+				 * reserve map (hugetlb_unreserve_pages). In
+				 * rare out of memory conditions, removal of
+				 * the region/reserve map could fail.
+				 * Correspondingly, the subpool and global
+				 * reserve usage count can need to be adjusted.
+				 */
+				VM_BUG_ON(HPageRestoreReserve(page));
+				hugetlb_delete_from_page_cache(page);
+				freed++;
+				if (!truncate_op) {
+					if (unlikely(
+						hugetlb_unreserve_pages(inode,
 							index, index + 1, 1)))
-					hugetlb_fix_reserve_counts(inode);
+						hugetlb_fix_reserve_counts(
+							inode);
+				}
 			}
-
 			unlock_page(page);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c6d76f61de98..b8f994961a68 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5361,6 +5361,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
 	bool new_page, new_pagecache_page = false;
+	bool beyond_i_size = false;
+	bool reserve_alloc = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5417,6 +5419,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		clear_huge_page(page, address, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 		new_page = true;
+		if (HPageRestoreReserve(page))
+			reserve_alloc = true;
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = hugetlb_add_to_page_cache(page, mapping, idx);
@@ -5475,8 +5479,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
+	if (idx >= size) {
+		beyond_i_size = true;
 		goto backout;
+	}
 
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -5514,10 +5520,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 backout:
 	spin_unlock(ptl);
 backout_unlocked:
+	if (new_pagecache_page && beyond_i_size)
+		hugetlb_delete_from_page_cache(page);
 	unlock_page(page);
 	/* restore reserve for newly allocated pages not in page cache */
-	if (new_page && !new_pagecache_page)
-		restore_reserve_on_error(h, vma, haddr, page);
+	if (!new_pagecache_page) {
+		if (reserve_alloc)
+			SetHPageRestoreReserve(page);
+		if (new_page)
+			restore_reserve_on_error(h, vma, haddr, page);
+	}
 	put_page(page);
 	goto out;
 }
@@ -5812,15 +5824,15 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * Recheck the i_size after holding PT lock to make sure not
 	 * to leave any page mapped (as page_mapped()) beyond the end
 	 * of the i_size (remove_inode_hugepages() is strict about
-	 * enforcing that). If we bail out here, we'll also leave a
-	 * page in the radix tree in the vm_shared case beyond the end
-	 * of the i_size, but remove_inode_hugepages() will take care
-	 * of it as soon as we drop the hugetlb_fault_mutex_table.
+	 * enforcing that). If we bail out here, remove the page
+	 * added to the radix tree.
 	 */
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
 	ret = -EFAULT;
-	if (idx >= size)
+	if (idx >= size) {
+		hugetlb_delete_from_page_cache(page);
 		goto out_release_unlock;
+	}
 
 	ret = -EEXIST;
 	if (!huge_pte_none(huge_ptep_get(dst_pte)))

From patchwork Wed Apr 6 20:48:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12804260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E33BC433EF for ; Thu, 7 Apr 2022 00:15:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 76C9A6B0071; Wed, 6 Apr 2022 20:15:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 71D6A8D0001; Wed, 6 Apr 2022 20:15:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 56F976B0074; Wed, 6 Apr 2022 20:15:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 4846A6B0071 for ; Wed, 6 Apr 2022 20:15:24 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 110952690 for ;
Thu, 7 Apr 2022 00:15:14 +0000 (UTC) X-FDA: 79328163348.10.F60AFB3 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf03.hostedemail.com (Postfix) with ESMTP id 1068320002 for ; Thu, 7 Apr 2022 00:15:12 +0000 (UTC) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 236KXxDb024447; Wed, 6 Apr 2022 20:48:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2021-07-09; bh=CbB8HYoTtQrkgpUR1SAECZ5vTD+acpv+582ig97c9e8=; b=ZlbV0oXjsniRk7/ZCh5tJT4PS15tSHsI3ksRHGj66kS+2ya3r2ciKu0kkyMBQ+gpinc4 zKrzAgflUL/d4LVMf8ykVIC060byAJhzVLYw/hcKG6OA6vHxAL1tioxDaPnQxvdSvsqY Xj532z0rc1LTpZAqyAbMtF9j3CCyU5+FwLTQJl1kJHcNKIjiFrxvKnwQ0e1jyqF7vyyR M8zAKBm0Dtf8iUeoCClfJU6kTTS+p5FMYDM1lz5KYDiGVDxzO2jX74amgEGhiFVo9oXF b3WeWnWMSQ5km+5Y4c4utfl9PsbHfMZ8a7suAQzx1NXW1kYvIp0fUWmSYTaQBTs+2eZF Ww== Received: from iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta01.appoci.oracle.com [130.35.100.223]) by mx0b-00069f02.pphosted.com with ESMTP id 3f6f1ta8ty-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Apr 2022 20:48:50 +0000 Received: from pps.filterd (iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 236KkxEl025441; Wed, 6 Apr 2022 20:48:48 GMT Received: from nam11-co1-obe.outbound.protection.outlook.com (mail-co1nam11lp2172.outbound.protection.outlook.com [104.47.56.172]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com with ESMTP id 3f974dbg4j-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Apr 2022 20:48:48 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=cRnPVY6bqQMpIlt8y1YANZEmXMja21KCeY4hqd+QMv0xDo0yg8aeEC8SWPOuJJT/9RtPWULezKg8PbEAe+g7WDAlonbEjhMqDPsQ7Gp9I/BQTkQ5h7wKhdSIMD0K7FqGpvy0/HsOqVyJChSsdC8GpZlOl6hpTfWy6p56O5C9PEb+Bvn6TV/3qN4NFIiL6g93dUcyAxvyhEVdLCooqVt64cqRrEUtzbj9oHRi8CEW45PGuk6ZMFouIh+p9HCpU8I4f5brwxu/+19b+PxeIeLOUuTHgqQDtzJCbaSQYEuDuK/wDkuo106MDCHiPHtrfOcEVqdvT124a3S2iWaeE7vbAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CbB8HYoTtQrkgpUR1SAECZ5vTD+acpv+582ig97c9e8=; b=dau7AFnpNVuDKDqeFtsAmFkVDI6f8g61hbwLME7i/jnL9Z6NzVVcTPxvYlkY3B63Mhw0yNN1LP8zjeAFCOFYMco5B/onJnVfoLEdaJ1+7I1cIDR3nCK9qpRJXxcgX++WXaSo+Z8zebr8RI24a9qaWD7QNLpVIcf8yVOxVgaawepEtTdWJ4ZmPyuxOIaFlDLSUu8NMBWCsZa+8hA6cXLNFCnJB0iw+qTriQ97TwPjC5IJSxBq9K8REdskS/fL7S//fFMGrSE5Pt2OvUDErCqdPxFJVTWfk/D2vwCupYnYfwv26VurMV32wmveNFoJxtUjHwBxuDXuCVSKobJRuTED+g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=CbB8HYoTtQrkgpUR1SAECZ5vTD+acpv+582ig97c9e8=; b=Zbp/GEdXjPGTrYK8y2iqyc673CB6WLFMrgpr1s0ljy/RhmVROXtp1rarTjN0ODNdXRTYnBcLUUvn3HaCWlQqTnOtB6GnxgmPtU4pMO7EdQ+mRZp7zLtsoPtRVWvNEnmJ0M8Zvuq1Gg7oDM0kS6XmzCjkzLv5a9TzvDV4+bzqeEc= Received: from BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by BN8PR10MB3362.namprd10.prod.outlook.com (2603:10b6:408:cf::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5123.31; Wed, 6 Apr 2022 20:48:45 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::245f:e3b1:35fd:43c5]) by 
BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::245f:e3b1:35fd:43c5%8]) with mapi id 15.20.5144.019; Wed, 6 Apr 2022 20:48:45 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [RFC PATCH 5/5] hugetlb: Check for pmd unshare and fault/lookup races Date: Wed, 6 Apr 2022 13:48:23 -0700 Message-Id: <20220406204823.46548-6-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220406204823.46548-1-mike.kravetz@oracle.com> References: <20220406204823.46548-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MW4PR03CA0012.namprd03.prod.outlook.com (2603:10b6:303:8f::17) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: a5a9d5f0-4c18-4cfe-7860-08da180ed7b4 X-MS-TrafficTypeDiagnostic: BN8PR10MB3362:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: k1IjBBjmZsaFjrcwHuxWintP10fmGp1MCB4747kd9OYze8+N3dMB1OXF2UE3IXaS9ga7wu8NaOWMQcgEjLkXe/zBnUwo070gzKLsBR5y55WgITk1eSPVz2vpmKY+Z4YIrmSF72p34jF4SLFKa6aRSqi57HSHe5Pxs50Or3HtuqZOvgRYxi+qdmbHGfT5z9lv56+dQ+t+CVtsYPXUwrzea9IXQLEAhCwicQYYMZwLEVisTiCMx4mKj4Dapw+MLptwHymu038DhAakABV3JNvYoLcorfwZVgcla2vXISQ+iAACj+s2OHLKs+1QH0VVQyecfOJdAch44NSe0d1wKD7xoZhAhbwRJI5EaOqvbwiMOh072F4+QqWAqJnR/OB7hZNiJdBC0508sj3vHDQ+ar4QToTiQHD64FTnIZM7CZyYa5sZg1IluwL+E7qC57iwZW9I/Pi/lGpM8I2b19Aw8UK/QVSqCllFVhRBZ1V1S/XNW+OY5FnntRdMxQK0h9p7Pl4eAvhpNyhPHltW/ZNMGgZEF413yk3ADQxdmbhtUHMbeHQ3b1NzxF1nfGP1eBh8OC0lfXan9mqQON7zT4nwtG4nrV7m6897D5Sl5g3qxgKVPpw1+26iNBkHccieC/rsFqTA2WA+TV3+J7WZHZK0xuQIKh4kfmFScXHkc/ZZ4QClNnelwwuQ0PO62WijAQSSTeHGXSqapKKSdBA0ZR58jjAd5w== X-Forefront-Antispam-Report: 
CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(6666004)(38100700002)(66476007)(86362001)(66556008)(66946007)(83380400001)(8676002)(6486002)(38350700002)(508600001)(186003)(26005)(4326008)(6506007)(54906003)(316002)(1076003)(6512007)(44832011)(2616005)(107886003)(5660300002)(36756003)(8936002)(7416002)(2906002)(52116002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: AoaUxBLevg1/bWL20/wgc6zPa8S9VjGBg8ylupUVwdKBYq9FD/ypWB+DXmcolcgNTjC4/1xxc8wcYTblfFshBgJfnp4pyZADxQIDw16CdQ/OiTGSXiuQ4AZBmRxmFBEykcN7SRfPxzV3AQhQL/mw/s7jfDGrftiPrMb6sppsK6HBsoOlu2gz8V1bBbUqcHkBi092TjIRA9jbOKaLNZ7Dl+sBn0ZY1nCPmM1JLcVa/gJD3rDoNohGppe4o//c+HC3oOrYRUA0VUKeiY3Zb/cTOCS2X5WA/fgZsPl+QiuX3JRBd9U0D0aCqK8YwK/LOKkiu8+DNMDEuLRp7wJ4H/RvaQW8vXSOy6pfl9QqG1hlEPMAwJFH+8SlSt/0hZ27JprmFMvnznm6Kw8JosL8yTaR+UERdW2LgHKA/GBxV89SbD6b8uFg+Jsgrqqk4bDRmxlK3hB63NDl6tP6BFTbUMAEaMwbb+WKONtYFHv2SYUqdYWLV/EPJtxj/5HpD1SAZaMGKodunFVAKcMaRYvaxdc/z4tg/z0ost4v3F7/eVHYAFT3RHxYWHbuYPWxXMjnp2dRews/JgDENBYTwNBLqmyrvJzlkLHvdG5vauinVXOKW8KKKs1yiLX0JYImz6qiqCRMr+qGyP9YRFYZ1nv2hi4t5f/pF+c5jWFTNit843+HhI4hS0rhCjXiuoE5Fdd3Jk4lMhjERiV9LFIxUy5uvS+f/539ETWbKu3pP09kpC+YrLVgIzrxnfMw8C2vxht5f5odR8ZAzT0Kd2gEZ2Pty8B2aLIGUEOC3Z8XVbl7tHzg3qX0+BJNHF/IhVuJ5AC/P9KxpUhEO5mkemYWrdSW10XmF0kS/cSjpifEMmqeXWFFsX1UXKSP9Dtd63+4d6IC24L6qAdIjOdOYRlm3eii5ZuxyMIneebKBYC+HQlWajxG1Az50VohpQzIIGXic14C2zNn8WBQ5oiGNqhDA1b4Fv7t3sZBjebkH0fL/IHCZiVIXIOIxqTvlK3AO8jbkG0k6u9i6Xepp+pXziIh3ujmxTUvu5cIdQp4HZQC1FveHfsUhbwdomS0v4wDKkvkI1KIaQtYbB8WDkh4uEuoJ+OyH6N/B+OfburoYywIyzQW7F6wPeqV1b+QEeF0TR25uv9P0iOGslGEhobJzx/kf1FlVSVNwaDU3oBEMJjD6PzYhsrkfo75YCcgBDuGg7DIoa+MSm9+yzM+52xx0x+PsdpjSesVx5HRUSSTXLSiLY5zwUI88U9EYovegu07rA6q3/WeCuZnoZu8RJCCCYsUUEruY5ibtlh10hNdbrHmDeNxD6q/LoXgyBIOzCRR/cdfHHcKe9F53IIuveDD8HwYXmb4/Ovfgb/HVRp4hmHindPaMLJp2WBFs3dPvi2q4WiscZaoyVQdElsgvvuZloP+zSjbh3CG7rg2NXKOyWGJni4KPkeAX/xah2Gr7SYYZ8t6cdtIchtqjz9qGmo
jpnTXw+/KNEwt1UOyumazT4mnYOl/dZhuhFQiIKMf5Kg3T5uVJL2z418ywZ5Bn/NQ4kS8Ec8dIBu46sIS290GjsZskrH6WspHqR4HoLPzJKaKJTeX8EEYkzHCxOtYgvHBk1lnvNlmLxoW3VTXLs8BcZBIwSzj5khYQwyss5ZWOxzp/421fBw9hSBE4pEYhzOSeqJ/t+GokpZQezmhFwpGHfwgycpXn5/1pRG+GC9PHbNr1zrbcnm6uA7gyzmpCVHhyx86RO1KhQS3EHB8mWrp8rk3de3CmV0rIaU= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: a5a9d5f0-4c18-4cfe-7860-08da180ed7b4 X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Apr 2022 20:48:45.6071 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: R/Nw4MaczzTjfOX3/bmUXeDdzo0BSaPFsbyx8QYQq+nPlY0+3oGgGqpuxY5PJMab1dzbjISA0sHv5le3kEvRNw== X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN8PR10MB3362 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.425,18.0.850 definitions=2022-04-06_12:2022-04-06,2022-04-06 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 phishscore=0 malwarescore=0 mlxlogscore=999 suspectscore=0 adultscore=0 bulkscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2204060103 X-Proofpoint-ORIG-GUID: ZEYI-vJqnT0ULAbMIX8ko_MaygG70-EO X-Proofpoint-GUID: ZEYI-vJqnT0ULAbMIX8ko_MaygG70-EO X-Stat-Signature: bmuo3qmo1hzam1af3pcr4ruam59mufaw Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2021-07-09 header.b=ZlbV0oXj; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b="Zbp/GEdX"; spf=none (imf03.hostedemail.com: domain of mike.kravetz@oracle.com has no SPF policy when checking 205.220.165.32) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com X-Rspam-User: X-Rspamd-Server: 
rspam02 X-Rspamd-Queue-Id: 1068320002 X-HE-Tag: 1649290512-565539 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

When a pmd is 'unshared' it effectively deletes part of a process's page
tables. The routine huge_pmd_unshare must be called with i_mmap_rwsem
held in write mode and the page table locked. However, consider a page
fault happening within that same process. We could have the following
race:

Faulting thread                                 Unsharing thread
...                                                  ...
ptep = huge_pte_offset()
      or
ptep = huge_pte_alloc()
...
                                                i_mmap_lock_write
                                                lock_page_table
ptep invalid   <------------------------        huge_pmd_unshare
Could be in a previously                        unlock_page_table
sharing process or worse
...
ptl = huge_pte_lock(ptep)
get/update pte
set_pte_at(pte, ptep)

If the above race happens, we can update the pte of another process.

Catch this situation by doing another huge_pte_offset/page table walk
after obtaining the page table lock and comparing pointers. If the
pointers are different, then we know a race happened and we can bail
and clean up.

In fault code, make sure to check for this race AFTER checking for
faults beyond i_size so page cache can be cleaned up properly.

Do note that even this is not perfect. The page table lock is in the
page struct of the pmd page. We need the pmd pointer (ptep) to get the
page table lock. As shown above, we cannot even be certain ptep is
still valid when getting/locking the page table. The other option is
to always use 'mm->page_table_lock' for hugetlb page tables.
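
For illustration only, the recheck described above can be modelled in
userspace roughly as follows. This is a sketch under stated assumptions,
not kernel code: struct toy_mm, toy_lookup(), toy_unshare() and
toy_fault() are hypothetical names standing in for the mm, the
huge_pte_offset() walk, huge_pmd_unshare() and the fault path. The toy
also takes one lock per mm, which resembles the 'mm->page_table_lock'
option more than the per-pmd-page lock, so it does not reproduce the
"ptep may already be invalid when locking" caveat.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ENTRIES 4

/* Hypothetical stand-in for one process's view of a page table. */
struct toy_mm {
	pthread_mutex_t lock;	/* plays the role of the page table lock */
	long *table;		/* current entry block; swapped on unshare */
};

long shared_block[TOY_ENTRIES];		/* block shared with other "processes" */
long private_block[TOY_ENTRIES];	/* block used after unsharing */

static void toy_mm_init(struct toy_mm *mm)
{
	pthread_mutex_init(&mm->lock, NULL);
	mm->table = shared_block;
}

/* Analogue of the page table walk: map an index to its entry pointer. */
static long *toy_lookup(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_ENTRIES);
	return &mm->table[idx];
}

/* Analogue of unsharing: the mm stops pointing at the shared block. */
static void toy_unshare(struct toy_mm *mm)
{
	pthread_mutex_lock(&mm->lock);
	mm->table = private_block;
	pthread_mutex_unlock(&mm->lock);
}

/*
 * Fault path: walk without the lock, take the lock, walk again and
 * compare.  Returns false (and writes nothing) if the pointers differ.
 * inject_unshare simulates the unsharing thread running in the window
 * between the first walk and taking the lock.
 */
static bool toy_fault(struct toy_mm *mm, size_t idx, long val,
		      bool inject_unshare)
{
	long *ptep = toy_lookup(mm, idx);
	bool updated = false;

	if (inject_unshare)
		toy_unshare(mm);

	pthread_mutex_lock(&mm->lock);
	if (toy_lookup(mm, idx) == ptep) {
		*ptep = val;
		updated = true;
	}
	pthread_mutex_unlock(&mm->lock);
	return updated;
}
```

With no interference toy_fault() updates the entry; with the unshare
injected it notices the pointer mismatch and leaves both the shared and
the private block untouched, which is the "bail and clean up" case.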

Signed-off-by: Mike Kravetz
---
 mm/hugetlb.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b8f994961a68..e5196f0fa09c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4695,6 +4695,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
+	pte_t *src_pte2;
 	struct page *ptepage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(vma->vm_flags);
@@ -4741,7 +4742,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
 again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+
+		src_pte2 = huge_pte_offset(src, addr, sz);
+		if (unlikely(src_pte2 != src_pte)) {
+			/*
+			 * Another thread could have unshared src_pte.
+			 * Just skip.
+			 */
+			;
+		} else if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
 			/*
 			 * Skip if src entry none. Also, skip in the
 			 * unlikely case dst entry !none as this implies
@@ -5363,6 +5372,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	bool new_page, new_pagecache_page = false;
 	bool beyond_i_size = false;
 	bool reserve_alloc = false;
+	pte_t *ptep2;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5410,8 +5420,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * sure there really is no pte entry.
 			 */
 			ptl = huge_pte_lock(h, mm, ptep);
+			/* ptep2 checks for racing unshare page tables */
+			ptep2 = huge_pte_offset(mm, haddr, huge_page_size(h));
 			ret = 0;
-			if (huge_pte_none(huge_ptep_get(ptep)))
+			if (ptep2 == ptep && huge_pte_none(huge_ptep_get(ptep)))
 				ret = vmf_error(PTR_ERR(page));
 			spin_unlock(ptl);
 			goto out;
@@ -5484,6 +5496,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		goto backout;
 	}
 
+	/* Check for racing unshare page tables */
+	ptep2 = huge_pte_offset(mm, haddr, huge_page_size(h));
+	if (ptep2 != ptep)
+		goto backout;
+
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
 		goto backout;
@@ -5561,7 +5578,7 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags)
 {
-	pte_t *ptep, entry;
+	pte_t *ptep, *ptep2, entry;
 	spinlock_t *ptl;
 	vm_fault_t ret;
 	u32 hash;
@@ -5640,8 +5657,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 
-	/* Check for a racing update before calling hugetlb_cow */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	/* Check for a racing update or unshare before calling hugetlb_cow */
+	if (unlikely(ptep2 != ptep || !pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
 	/*
@@ -5720,6 +5737,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	struct page *page;
 	int writable;
 	bool page_in_pagecache = false;
+	pte_t *ptep2;
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -5834,6 +5852,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 	}
 
+	/* Check for racing unshare page tables */
+	ptep2 = huge_pte_offset(dst_mm, dst_addr, huge_page_size(h));
+	if (unlikely(ptep2 != dst_pte))
+		goto out_release_unlock;
+
 	ret = -EEXIST;
 	if (!huge_pte_none(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;