From patchwork Wed Aug 24 17:57:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12953867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D191AC00140 for ; Wed, 24 Aug 2022 17:58:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6FE49940007; Wed, 24 Aug 2022 13:58:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6AD476B0078; Wed, 24 Aug 2022 13:58:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4FF8B940007; Wed, 24 Aug 2022 13:58:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 403986B0075 for ; Wed, 24 Aug 2022 13:58:29 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 15A8914028E for ; Wed, 24 Aug 2022 17:58:29 +0000 (UTC) X-FDA: 79835245938.17.316A5A3 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf06.hostedemail.com (Postfix) with ESMTP id 4488F180047 for ; Wed, 24 Aug 2022 17:58:28 +0000 (UTC) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27OHkK8Q019890; Wed, 24 Aug 2022 17:58:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=zx9J43R47QH1zrykkR9tfPv7uSBXWgGXVu8WR9/YCuM=; b=EYRpbRa3z9C46CC+oN76I4dDQGuqYRCGx7pLG45LUFCdYLWRlRdxABYQL43JKPoU4bMu +LelB8XT+ONQ7CNuQl+/LhncFXCyT/Y4c09CLOScAY+aPSJC3tkaIVbMGmW1luntpYzx 67LJU6zAzDxO9FybbWmxofoXFIbMwPnpdsE9gBicfSlqLnYG46AhHzGPaRXDz9beIdAL +wAn7tYTIVP6RcdBWboLfqOMS1Kb4p3O0tPn/OpGxcYp/FtmxebRNBbi4fgnvcJEf80m VKBlMB8Ylg763BX61u9buQMvTgncnb1EJtDDaY1qIa2DamwYWO+rbuGMGvrotFgWH0+N zg== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3j4w23vbum-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:11 +0000 Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 27OGujfG008185; Wed, 24 Aug 2022 17:58:10 GMT Received: from nam02-dm3-obe.outbound.protection.outlook.com (mail-dm3nam02lp2044.outbound.protection.outlook.com [104.47.56.44]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3j5n7akm44-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:10 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=KswQyKbGwQvLJZh7JOZb+CjRVsTQH5Oq6RLuwDa+LJwE8C5aaHLSJycV6ugzNfy73L1ROoJbsN4Zg4e+oUwM9gqkS+v/RfWitePUOXmfKp3q/oHErmR3QuNwq5AnGYrLEVNIeD9oGlNqNHbTwS05V8ksDnrg/SOT24WkR9LqwArt3XBPhiXeaOxYQvH0xsN7za/qhkSkISl7wAV+G5q0ArY3RSGJep4JX8VmRxjG7/BTSVo1U0PaH3tsfRuMeOLKSDaKB+GiCtFD+wKWaMwwG4OvFfUkdH8tahjSX5SCEOtPLhLeVPhtYkGHjE4a9yH+7nwg/G+I1r/FTIs8qy37GA== ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zx9J43R47QH1zrykkR9tfPv7uSBXWgGXVu8WR9/YCuM=; b=PXfOx+TNgmQGT/EKleW7sPi9FoSK/nMCZ65Hzm5kz1mONCs5A7jMfk5QFE138HX/7g07Cs1JF4EEqZt41azxgi4+lbI0cqcYzqv22fTVpTgLTpxs7HHsKjiz6hjZL3/vMh/T5UjRVRn4wEK/Q8wyqQfTKgsa8QrTgmWQCBStHX6tA0WQxIOBcMVp5r5wEtKd9VViNhokwpyWEoXBCK9qr+J9Ltzoi1neZqJ9rL232LXnaQJfSexuDSh5JiTTJaj5IgTAoVefAyscSWrMQNFK7ufnK0LaIOZW+1uuatxQTfJS6m/IkN3WdhBsqZ7dZb84KWM7F6kz2uLF4ftgugs0sg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=zx9J43R47QH1zrykkR9tfPv7uSBXWgGXVu8WR9/YCuM=; b=OB0MMIr7sujqQLVA9NuWJC8kh42gsjgzmY5yA7f7kTYLEIQGb0zGwBUR1NohDs6ZvyrN0jSUW1ImhJC55fQD/wM7fVTrHkwFsHkBCvxlxA+Zs5Z5U/Mk/LF/w0jhKAvJ6ss81lIPA4waiWtuhDC2syCZpO6rl4fSBCHmdq9F6V4= Received: from DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) by DM5PR1001MB2251.namprd10.prod.outlook.com (2603:10b6:4:2e::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5546.22; Wed, 24 Aug 2022 17:58:07 +0000 Received: from DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec]) by DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec%6]) with mapi id 15.20.5546.022; Wed, 24 Aug 2022 17:58:07 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . 
Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH 1/8] hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race Date: Wed, 24 Aug 2022 10:57:50 -0700 Message-Id: <20220824175757.20590-2-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com> References: <20220824175757.20590-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MW4PR04CA0362.namprd04.prod.outlook.com (2603:10b6:303:81::7) To DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 381512ad-1c9a-4526-1742-08da85fa3355 X-MS-TrafficTypeDiagnostic: DM5PR1001MB2251:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: pX6p7Jg3WQGDRyeEAaDvrSxnVKY36hzV+Dq0HdXF9hMheHyr5hSnr6tSbwWPKBYZ/lr8e7UDXt55svpV0o68DbiCzvW0M/+qnGd4YWYfJtOqX9v5Q2qWj+dJuVQpvvrBMItOKOu/DTSoYE2jEw6zkUUhtc6jcw25p97CSd7912Ifl6Fp/U8MzMu8o1onTY7LJU6WhmlTUbx8R83fNBdWa3k0AcXzH45evVnqQmSrvYRWdz3Xie7yk47oHzokQp3ZaN7N0iQv7tmH2cC6MNVG4dK9Slxp+GusKOLURuhKC10yLnuVlM1TjANUfwKOeaY2sRr7rbohtJUnL5rYvch6+rXpfMT196o5IZwX2aB9vu6HJYB9uWhvOkePgGXVsbgUFwZ2I9BY0QGEkexXhindDW0f0Klm0wd9ehoNp4o+JFrJaioWvR4Ys8KmhQW4yD2g1J3etgyLIMyWg9Hdbmr0Cpbcll0frlll/ybcThz3pt6bOOqmsaEmGZui0Tv1gOrCo/KCxQJ1gyMTSLQl+4MO1qwJTqSja/zeMVnX+HYCw1Qraa272s/X0LMwxlb0IEmIA9HKn+ZjKhhhQ0ozLSv/stw5cF1t6gzmmUYWl0iPXxTSP7L5QFk9myLjd2g1TgoCS2owwRdnJXScUq/fOUiV6uTN7KfPWtpg6PWn9azhaxxMXOXX0n2EL/eQVhIJbDlHaMlRv1CcaNh5rqcJgAS/Mg== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM6PR10MB4201.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230016)(136003)(396003)(376002)(366004)(39860400002)(346002)(44832011)(2616005)(186003)(38100700002)(54906003)(83380400001)(1076003)(66946007)(86362001)(8676002)(66476007)(66556008)(316002)(478600001)(4326008)(41300700001)(6486002)(36756003)(7416002)(8936002)(5660300002)(6666004)(6506007)(6512007)(26005)(2906002)(107886003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
L9b3H2bGyVYD8eVvDdQfOpF3Fww8E8IAdX1RZ9qNkd6BzBHrODncM51xYXYF6+/TP3eTO2QwGDlrZYoY6p0ZSSYqPsLJklh+ARomiWXWlN2jfmUeck5u3b3AA7vOmtI7sTyuKRAx/IQvyPAcCcYfJtJCLA3bQjnTvilpK7F30RCId+sDu0+RlvhdMGyDp0Au5c062P/H04pdqen7YJnZUZpPCGcb/NSHEwcquiDo4Dl207+vWa9/aGR0hxoW94r2WxwUNoLnuF1xx8FAFeYYcVk36CDHcru0BgIPkVxW/7bR2bCB9Mz5iCGutNyt2P0xsWSvd0/IkdtQGu9Uvvptgk7qJnXB/ndMcX6OqoYt3nDaAbuHODiAtvcESn10l2hpDPLHIoT+bAMfOFLRug5Lgw9N+Xt+cDWkKW4mi+fHWrTz+QA4J0ET0CYCxhHMl3ALDo85GZVWvwb2vlIxL6LKQjIqBXWSHNM6CQtOLVPZb3EDCheX0iM1o+RHUUu/LuOFaoesCP3NculCQOB8qm5O6PJS7RvxOdzqZ77JTtiX4dCrZMfb5u19uecr0vTCfYyvl7y/gcqCMCzTCckoC/TGwGzZR5BKZMKW4dXdIm79KxwhIJrfC+sxBUJImFESCKoUOJ9rEs+QXawETvo2Qb9JLwLn9560+XMytLqxrL3fQHLhQ2pudIFqHtrNu+qBIBWNFRM9gDEE/itet/2YBRj44GtZC/hEUTCDCyqjYDpKok3IFoMWxiEuhZxmBgL5ksPcs17kIQZ2iJq2gzGmaSFz82prHyC2s/4qJNg5TgHg0SSeVVGVLKuA+p9UZbZWNfJC6ajZeGTW2UAVI60+YLa91kDGS263J/IoWLqfLTPkkXxLctwv6GCWdy8nhzI9HDhCvnM7UQNxcA+zgRwcynKnAW5uBAsUWLWXX6uGpzclBNQmMtLXKuZ/7ktdwZgMGKUA4BJRyLE8QI4xnm5boKfUiHQpAcLN0HK8cjQeq5EwcuFkbde8NkepaSY/GlvPkwzNYp3RxZ9y2h8/KLGtivGe09X+vCUfJqPTW15UfF3apwOruplAxImhs6iQ1WBTvJU1MIEwMTBzHGn+PU6UCkQNJI69h9Xw/BfsUz1fXpCDLsfqP3j3R+70KNAZ71YoCnq8e9KH+//srRM3mtkIdW9oi3JYJCB1ZTgIXatLIHpsZtJAnzWPdV4Fnssvn63fmjAKg4Rz5o8aMiWRses0sEvb5J8cqfsICcDSl+xzJJne2hRikT2STilLTfMNmpG+3vXBRZFNWW/+x5l+a6FqrfkVGVisK/teFp+kUsStB8cwVhYhUt8QJf/4i7fKBPSG4PkMt5mnnEE00ZRe7yB86JxSOB4MsCIM+y8NJQKLk5jhSUE7xtLgVaeWTj9QnPTokPENVUc4hUWS1yENyDBgm/W252VKdxG9lGCj7hd+3Zjd2GksgNlT9yTChZaCfbOTrKWxJJYXaQ17KDaCn4XVYJvpW4ur7Gi3U+O0SqxEnXCn2St/XCyUnOazrCO172cBdzsX2wJuMwC0sqvJlEnDiLr18ArdSrEddQtMQ+nOIN5w9p09T+wxb/4jEuLKCaVGvU8J8luf/6bQhFagtMUcaHo4Kw== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 381512ad-1c9a-4526-1742-08da85fa3355 X-MS-Exchange-CrossTenant-AuthSource: DM6PR10MB4201.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Aug 2022 17:58:07.6504 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: t7+HNqcezM0cGkV1hBpk5h7rJrXs9E5wmDDvLpgerts4XZcfR3hfvYdYsj+HZAI749+rUfq7ZARo7xMFhG1DeA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR1001MB2251 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-24_11,2022-08-22_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 adultscore=0 mlxscore=0 malwarescore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2208240066 X-Proofpoint-ORIG-GUID: J7FDiQgmc6_cczxmbC4jycd0woQ4ctum X-Proofpoint-GUID: J7FDiQgmc6_cczxmbC4jycd0woQ4ctum ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661363908; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zx9J43R47QH1zrykkR9tfPv7uSBXWgGXVu8WR9/YCuM=; b=qSBojSuY+QJHGkOWe1Z6qo8DnvNOH1W0bOta+YiRGHwF03shYf0RH1he3EVS5KmpzKnsSg bOHV+BUAdesZMYS1Hc+xzCdKEkLlpIvljuYi91xVa4+dq+hXZsl1qu+IfruDmeJQHmVPA8 zPtL6G1sboFttPMjmnEjSp72EcZ1ieU= ARC-Authentication-Results: i=2; imf06.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 
header.b=EYRpbRa3; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=OB0MMIr7; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf06.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1661363908; a=rsa-sha256; cv=pass; b=0vZdFfLJ6+TWxMpYWgsZCgXaq72VGOgqbvlm0bBo2EkpXqdvnk+mvppC/78o5P43PczCiZ n2OwfXfpn4v4xcqYWcsZ/xvQqCuYoCSTjBd99ILuvs5aPPO9zX0RpJH2p2RsSywgAWp3rY OhAeNHv1oiQntICj8gndcB+5BEt10M8= X-Stat-Signature: a6jiro6btpp7eqkxfengfeb1f4ifjkds X-Rspamd-Queue-Id: 4488F180047 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=EYRpbRa3; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=OB0MMIr7; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf06.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com X-Rspam-User: X-Rspamd-Server: rspam03 X-HE-Tag: 1661363908-112384 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") added code to take i_mmap_rwsem in read mode for the duration of fault processing. The use of i_mmap_rwsem to prevent fault/truncate races depends on this. However, this has been shown to cause performance/scaling issues. As a result, that code will be reverted. Since the use i_mmap_rwsem to address page fault/truncate races depends on this, it must also be reverted. In a subsequent patch, code will be added to detect the fault/truncate race and back out operations as required. Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin --- fs/hugetlbfs/inode.c | 30 +++++++++--------------------- mm/hugetlb.c | 23 ++++++++++++----------- 2 files changed, 21 insertions(+), 32 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index f7a5b5124d8a..a32031e751d1 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -419,9 +419,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * In this case, we first scan the range and release found pages. * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() holds i_mmap_rwsem and prevents - * page faults in the truncated range by checking i_size. i_size is - * modified while holding i_mmap_rwsem. + * in this routine. hugetlb_no_page() prevents page faults in the + * truncated range. It checks i_size before allocation, and again after + * with the page table lock for the page held. The same lock must be + * acquired to unmap a page. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserve map @@ -451,16 +452,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, u32 hash = 0; index = folio->index; - if (!truncate_op) { - /* - * Only need to hold the fault mutex in the - * hole punch case. This prevents races with - * page faults. 
Races are not possible in the - * case of truncation. - */ - hash = hugetlb_fault_mutex_hash(mapping, index); - mutex_lock(&hugetlb_fault_mutex_table[hash]); - } + hash = hugetlb_fault_mutex_hash(mapping, index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); /* * If folio is mapped, it was faulted in after being @@ -504,8 +497,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, } folio_unlock(folio); - if (!truncate_op) - mutex_unlock(&hugetlb_fault_mutex_table[hash]); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); } folio_batch_release(&fbatch); cond_resched(); @@ -543,8 +535,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) BUG_ON(offset & ~huge_page_mask(h)); pgoff = offset >> PAGE_SHIFT; - i_mmap_lock_write(mapping); i_size_write(inode, offset); + i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, ZAP_FLAG_DROP_MARKER); @@ -703,11 +695,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, /* addr is the offset within the file (zero based) */ addr = index * hpage_size; - /* - * fault mutex taken here, protects against fault path - * and hole punch. inode_lock previously taken protects - * against truncation. - */ + /* mutex taken here, fault path and hole punch */ hash = hugetlb_fault_mutex_hash(mapping, index); mutex_lock(&hugetlb_fault_mutex_table[hash]); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9a72499486c1..70bc7f867bc0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5575,18 +5575,17 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } /* - * We can not race with truncation due to holding i_mmap_rwsem. - * i_size is modified when holding i_mmap_rwsem, so check here - * once for faults beyond end of file. + * Use page lock to guard against racing truncation + * before we get page_table_lock. */ - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto out; - retry: new_page = false; page = find_lock_page(mapping, idx); if (!page) { + size = i_size_read(mapping->host) >> huge_page_shift(h); + if (idx >= size) + goto out; + /* Check for page in userfault range */ if (userfaultfd_missing(vma)) { ret = hugetlb_handle_userfault(vma, mapping, idx, @@ -5677,6 +5676,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } ptl = huge_pte_lock(h, mm, ptep); + size = i_size_read(mapping->host) >> huge_page_shift(h); + if (idx >= size) + goto backout; + ret = 0; /* If pte changed from under us, retry */ if (!pte_same(huge_ptep_get(ptep), old_pte)) @@ -5785,10 +5788,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold - * until finished with ptep. This serves two purposes: - * 1) It prevents huge_pmd_unshare from being called elsewhere - * and making the ptep no longer valid. - * 2) It synchronizes us with i_size modifications during truncation. + * until finished with ptep. This prevents huge_pmd_unshare from + * being called elsewhere and making the ptep no longer valid. * * ptep could have already be assigned via huge_pte_offset. 
That * is OK, as huge_pte_alloc will return the same value unless From patchwork Wed Aug 24 17:57:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12953865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CFF9C00140 for ; Wed, 24 Aug 2022 17:58:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C7BB16B0073; Wed, 24 Aug 2022 13:58:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C2AEF940008; Wed, 24 Aug 2022 13:58:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A309B940007; Wed, 24 Aug 2022 13:58:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 9173A6B0073 for ; Wed, 24 Aug 2022 13:58:25 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 23B11803F6 for ; Wed, 24 Aug 2022 17:58:25 +0000 (UTC) X-FDA: 79835245770.07.CD87BDE Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf29.hostedemail.com (Postfix) with ESMTP id 9C8B912006D for ; Wed, 24 Aug 2022 17:58:24 +0000 (UTC) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27OHkcR6007213; Wed, 24 Aug 2022 17:58:15 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=86V0o8EjtPizUkP7ckO7SCMiRdZbs1RSaDWcJarlHC4=; b=wl4aamfqsDo9TbB4SZSUh6evC6mBxlYsao0a3XRRVECg2x432l6KP8tO0EG1U+DSr3oz rU1bydrlQ3eT2sit24NIVOrfzQUeX5wkXkV7t8L4BZrntKOXE4lc7lrZ+8f2SK18jxwO J3lFGWFHDTqbqk56HtXVdTH/tnWdb018atAAwmYo16VO4s50GpaZJsc8nUlT5Z5auvhd dYU2ohazFgXFppjYPYLOfis0zf3QgeMQYKwQNE2fNjpZHraMRfs4XFCtV7EmUfTREWDj C39nXq9xIgRDJnuN90gmGLqW+aybanY/JvRQPhTD1UM1uYR0ul4nnXTqjo4COH3yFxTN lQ== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3j55nyapgr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:14 +0000 Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 27OGuJTu008242; Wed, 24 Aug 2022 17:58:13 GMT Received: from nam02-dm3-obe.outbound.protection.outlook.com (mail-dm3nam02lp2046.outbound.protection.outlook.com [104.47.56.46]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3j5n7akm5k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:13 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=GxNn8L9hQnYrANd+j0Y4x+ionV3qY6ctkRf9n2obZq+sA9ASfmzywRic7HsfIfq5ZKRz8yVO+p9UsGdgq4C72sKtIMd1dIhzwjgdLAzGJhOTmmOnB52AgjSbY83JMSAS+VazkyiRwCAFb92XecwIDsyuGGh5eH+MhrB3iPjSa4O9+rWo3EgeEOQHdmIfasXIRLOUSgQQ4IqwxhIlcXvyyWKNkehZZvujFpDCreQgQ0EdIvBYpigrJO90HACIOvJ/vMZHMkbFz4QDmdaNOix/2gVjtM6nb0j+45a1t7jrsabO/zq0w+ICT/ngmGDa6FBsxfZDCHosS0qKLZ6Usm6vDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=86V0o8EjtPizUkP7ckO7SCMiRdZbs1RSaDWcJarlHC4=; b=PverUZo0TRlOjNnczJ5Ugxd2LtzVAL5cNdxu7GV2HBEKanlUQYRAsb7r9K4PAnYAAT82ocH251mwISp05qo/o9T7tvQFR3R9esEmvCyFWIBjot4RpO4MEbG8wdroR4VF+Q/HZmuePpcOKtR7hXBm8KH9gf23lXeN2MuLXDNznNLgNeA36CG6DJSHLXcSEVp3Int0GoTRt8ckpv42grQPbhIvmF6+dMkGEAH/r6JEs6QpDIbdUK9aVBKZ1k1KHOIqoMuAU+p6AeeyOYUUV5CbHSUV8fPmVG5u3fB4mR+YLahULOehQTeQEnGUljqB2U+UmOy0/wnywoTv5RkfLq18Dg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=86V0o8EjtPizUkP7ckO7SCMiRdZbs1RSaDWcJarlHC4=; b=t303mTzXID7QtuNQrzdUYpbxON1hLXMg8WhCtu8ZE73GVDYx9zdzimM7JZ31nbneuqy/TzGPsxXVtx5QfjPhWe/wUR+dYYjmD21gUXQ9+ZCrEsZAcYPOnWNk8smrqN8OhTJFg1A2nVhgnADQrqMcWym1HmdfDgF9puPRUdRDMXU= Received: from DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) by DM5PR1001MB2251.namprd10.prod.outlook.com (2603:10b6:4:2e::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5546.22; Wed, 24 Aug 2022 17:58:10 +0000 Received: from DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec]) by DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec%6]) with mapi id 15.20.5546.022; Wed, 24 Aug 2022 17:58:10 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . 
Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH 2/8] hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization Date: Wed, 24 Aug 2022 10:57:51 -0700 Message-Id: <20220824175757.20590-3-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com> References: <20220824175757.20590-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MW4PR03CA0086.namprd03.prod.outlook.com (2603:10b6:303:b6::31) To DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: a5076828-2b29-4627-0168-08da85fa34eb X-MS-TrafficTypeDiagnostic: DM5PR1001MB2251:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: KXozgBxmEbhpsgOQchKZixjk3h7m7IxgxCpii/d0cvn/0sJMUGJcOTLuRikhzqfQtDyOQCtr919ABs3As601BLj8ZmuOwzH7Q0kfx8XnqKxux27B3TsZwPit3YGdnoM7dE0Xh8sBX4Uxk3uzogZ7EIo0e3XMXQpojKBURzsGDSzeuqOopJkWpBIwkDwfAsNu78tipiGv43K1svMVEzXGFePwpkkMtqevJge0kdqdkhcPOBjZshGrY/OVoxMWU7Xei7qY2+xxUqUwX3m9I86r3hszCioJrSI545MUEHGELVNXSqbFDnMZSk1netmW237fWE4S6r0g9o+tvZgzFv0G8XSnowoLa+OfM7n52HlWZv5/DXcGjE27+GWmwpPN/hq/bGRkgA60meNXrfEJ4ZWqVAljCPs1iKKybZB0MVJHJKF0dBtvwzYF4q3U4XLKo0xZ/VIWJaYS9z2tdHzVn4h37EVB1nn+D8Y/md6iedc/c9M8Ykb6RR/4+HmjcllpT0vNUhqCfvgfMr9g6womhDqloaq3bkShTlH5T/GJUCZrh392pQq4dSZAGsnLv6OhgO7mA8PjBxPwA2LHwLRjgZ3wKUdgUvdoNF3M4d/MGFHDEz5etGXREEWOvR9peXjxaVE+QOr5p917B/xEyZslkGEWGSW6gsE2vSDmCGSHGC5hwETFyov32Nrhm/rBwIEgGjeV8UO/mfBVk5KZu5Q/3s+dOA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM6PR10MB4201.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230016)(136003)(396003)(376002)(366004)(39860400002)(346002)(44832011)(2616005)(186003)(38100700002)(54906003)(83380400001)(1076003)(66946007)(86362001)(8676002)(66476007)(66556008)(316002)(478600001)(4326008)(41300700001)(6486002)(36756003)(7416002)(8936002)(5660300002)(6666004)(6506007)(30864003)(6512007)(26005)(2906002)(107886003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
fX7OiTm0JHTHFQEsvXpLSJrzQWfR/lKcgM7JIiziZXYMnXYhIADjTgH8JjCtsK9/0u1MGyXbxpicrJjqQQlstxA93ZM/m2uLn3SAyHqptQ1mANYTVe4gg5f1kSASjG6XieNBgD+JlOPahZ+UzQlKTmkGsffCb3J4iummqCoCS+C80AT3l/2oMKwSM7KKdZh27ONECAJh6LAqnnRptpMPO88q17qBEHIClJuxNeyY1H+sxZUiE1puyt5NdHetdtRMWhRUpyFocVzywoMZ+uXdulTR74fAoUgl+OzViDIcIkIiKb38Fr8SufC+6E/jMtKGHIWW+G3S8iKwRU2jj9uLSmvH7+F7KNHju+GTGJhd8lI0rBLA23QBXsVj4wlZjI47/lRUgTpsrTI9uFj4KSAFgsFqWsgrh+QFj7wza4zFtogeRTKphatBsIDFjin4sT4U0J6xoK/6S1mdjXoHrAYRaQEQgTfPVSKY/i2J5lAwTA1auqWEcdM1d75n1192ktQkOVfpcWqx01j2bYoBcKdG8cY4XA4DugALQnJka1l4U8Fcn2RFIZO1cTA8/xxWxCCrz+0FVOv6EkXNdta6wHxuEVrBTPiQkvuitvzcjfJj15EKYG2BLNmoAzzP2Hvu3f+1M5si2Ox0ncel1irZKCzs30BwRJlHcqxWYNLUw+X+YQCR8x734Q03urwcpJegsete+Mt9BauZwJXvG8TmQjzsQUumbcZi7P6fNymvkB6vGNRpDTViDCgNl6ouH+in4xdD4tflLWPTZJ9mURX4ktE7k/UE4FQePoFl1dDVhN5crq6wH9kDnTTIqJ7Gbg3SeDFeXhjU6QquhrQbmOXN6Gay4yPSQvzTNSL8ciwc0e479BdJdyAi5X/lCNqagl0YbKcii0v6sY7zrWFQ59SdiwLG10qz2LXkZzJ/HOlfZJvqQq+n0L4EtzzuPP/nA1xIgnpETVS18YFJTZ3bwD0F2v1mVp1LMiAhM0YNCrSJdnCNe5ioThWsFO+18OPuho4y+HJY12GQCNeTL4iyYhz0atlUy/09DUTI3IOnoI/lzX7L2ZEZ9TjWrfq/ON5rAuAimWHmRDnafFsBTdOAnB6O/leqbcRnT2EZWh0340ASh7+ZYy+ItcwBCKceGrOtPb9DUnNAydnNnoVA7WmRrvy+0uD6YCpQTD7yz+ni35pv0WPDrh0fv7X0fRAdVh6XSXx8Bbsc8vyz2feJ4WspQrTdJmHoFL3XZ0UfrPnbk+OcxgcBc2CaiK1mmlC6JpoT5FTKg2qCZT0oliOWgH+eMHMA8T3E+KsEdUOcGmzIxEB59tZv3imQWiDU5TVKRm3UV7+SDr1QScfdHbSkWa5ymlhJJ4SjmPJeZ4CAe89LD4Pc0W8NaSMk8aRSCyRYDtKncJVS6mDHPRZ2icRVu0Yw52OylkUGGzXIfIy2aOeaTHqdp5VgQ5+uPZwkldfeVNaGpxz1/wMftbMWjI9RaiRXf47gPPZYIkDPLri39U32kteM7Q+IsAWNpxLRGNXMC7rlIz8b8PqeDKfwolKxq0+AtrL85SJngBMCJxarYyYTn1/XDbwDrazEHGoeU2IGRsmNF2ErJkVrhtuqDWK2MFqpCuS6FySmKA== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: a5076828-2b29-4627-0168-08da85fa34eb X-MS-Exchange-CrossTenant-AuthSource: DM6PR10MB4201.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Aug 2022 17:58:10.3887 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: x8ueKKMJNCrJcgTMz3sR9Pi6K/ZIfddVA7fDAqnHyRinAl/0F52NWsdOERDOnjvcbb2Z+fj44Qn3Ps0Vy9+Xdw== X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR1001MB2251 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-24_11,2022-08-22_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 adultscore=0 mlxscore=0 malwarescore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2208240066 X-Proofpoint-ORIG-GUID: h9ZIU6WFqNdJUOFCw4Jj6cXNsoqRNSub X-Proofpoint-GUID: h9ZIU6WFqNdJUOFCw4Jj6cXNsoqRNSub ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1661363904; a=rsa-sha256; cv=pass; b=ETILCTqJ9UCrykivIbXVDrLBV7lHsHtMMywRDE2tAfgE8bLzRR6qVolEUPGgE+J+Dis/ig AUlMH+M8KR6ZtJTVJRRANzcIt2OUOc5nZgVlSKW1Q04+B8Q/1O4znVknJUgrOJlJ5YnLrO FKgR+IeZpm+IXmVLB/rYITEEuI49sGc= ARC-Authentication-Results: i=2; imf29.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=wl4aamfq; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=t303mTzX; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf29.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted 
sender) smtp.mailfrom=mike.kravetz@oracle.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661363904; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=86V0o8EjtPizUkP7ckO7SCMiRdZbs1RSaDWcJarlHC4=; b=s9VOl34026QWJe2T7fAExKpn0dAIblKZL7sPjZFE8Y3X4unY1aF/n2aHJCjXqF6eSu+x9U al97g28RGwjESZQB1gQSZWSWbM/rogU72gUFyAfMJeBjVVhs7tjAoAaA3TTjBmtIX/p2ja WdNzX/cQahHetnMmr/oH/AjMDMot/RQ= Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=wl4aamfq; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=t303mTzX; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf29.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com X-Rspamd-Queue-Id: 9C8B912006D X-Rspamd-Server: rspam02 X-Stat-Signature: au5a9dhdpi14xwt361kwffeymjfdssba X-Rspam-User: X-HE-Tag: 1661363904-349071 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") added code to take i_mmap_rwsem in read mode for the duration of fault processing. However, this has been shown to cause performance/scaling issues. Revert the code and go back to only taking the semaphore in huge_pmd_share during the fault path. Keep the code that takes i_mmap_rwsem in write mode before calling try_to_unmap as this is required if huge_pmd_unshare is called. NOTE: Reverting this code does expose the following race condition. Faulting thread Unsharing thread ... ... ptep = huge_pte_offset() or ptep = huge_pte_alloc() ... i_mmap_lock_write lock page table ptep invalid <------------------------ huge_pmd_unshare() Could be in a previously unlock_page_table sharing process or worse i_mmap_unlock_write ... ptl = huge_pte_lock(ptep) get/update pte set_pte_at(pte, ptep) It is unknown if the above race was ever experienced by a user. It was discovered via code inspection when initially addressed. In subsequent patches, a new synchronization mechanism will be added to coordinate pmd sharing and eliminate this race. 
Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin --- fs/hugetlbfs/inode.c | 2 -- mm/hugetlb.c | 77 +++++++------------------------------------- mm/rmap.c | 8 +---- mm/userfaultfd.c | 11 ++----- 4 files changed, 15 insertions(+), 83 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index a32031e751d1..dfb735a91bbb 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -467,9 +467,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, if (unlikely(folio_mapped(folio))) { BUG_ON(truncate_op); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_lock_write(mapping); - mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), (index + 1) * pages_per_huge_page(h), diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 70bc7f867bc0..95c6f9a5bbf0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4770,7 +4770,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct hstate *h = hstate_vma(src_vma); unsigned long sz = huge_page_size(h); unsigned long npages = pages_per_huge_page(h); - struct address_space *mapping = src_vma->vm_file->f_mapping; struct mmu_notifier_range range; unsigned long last_addr_mask; int ret = 0; @@ -4782,14 +4781,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, mmu_notifier_invalidate_range_start(&range); mmap_assert_write_locked(src); raw_write_seqcount_begin(&src->write_protect_seq); - } else { - /* - * For shared mappings i_mmap_rwsem must be held to call - * huge_pte_alloc, otherwise the returned ptep could go - * away if part of a shared pmd and another thread calls - * huge_pmd_unshare. - */ - i_mmap_lock_read(mapping); } last_addr_mask = hugetlb_mask_last_page(h); @@ -4937,8 +4928,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, if (cow) { raw_write_seqcount_end(&src->write_protect_seq); mmu_notifier_invalidate_range_end(&range); - } else { - i_mmap_unlock_read(mapping); } return ret; @@ -5347,30 +5336,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * may get SIGKILLed if it later faults. */ if (outside_reserve) { - struct address_space *mapping = vma->vm_file->f_mapping; - pgoff_t idx; - u32 hash; - put_page(old_page); BUG_ON(huge_pte_none(pte)); - /* - * Drop hugetlb_fault_mutex and i_mmap_rwsem before - * unmapping. unmapping needs to hold i_mmap_rwsem - * in write mode. Dropping i_mmap_rwsem in read mode - * here is OK as COW mappings do not interact with - * PMD sharing. - * - * Reacquire both after unmap operation. 
- */ - idx = vma_hugecache_offset(h, vma, haddr); - hash = hugetlb_fault_mutex_hash(mapping, idx); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); - unmap_ref_private(mm, vma, old_page, haddr); - - i_mmap_lock_read(mapping); - mutex_lock(&hugetlb_fault_mutex_table[hash]); spin_lock(ptl); ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); if (likely(ptep && @@ -5538,9 +5506,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, */ hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); ret = handle_userfault(&vmf, reason); - i_mmap_lock_read(mapping); mutex_lock(&hugetlb_fault_mutex_table[hash]); return ret; @@ -5772,11 +5738,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); if (ptep) { - /* - * Since we hold no locks, ptep could be stale. That is - * OK as we are only making decisions based on content and - * not actually modifying content here. - */ entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { migration_entry_wait_huge(vma, ptep); @@ -5784,31 +5745,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); + } else { + ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); + if (!ptep) + return VM_FAULT_OOM; } - /* - * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold - * until finished with ptep. This prevents huge_pmd_unshare from - * being called elsewhere and making the ptep no longer valid. - * - * ptep could have already be assigned via huge_pte_offset. That - * is OK, as huge_pte_alloc will return the same value unless - * something has changed. - */ mapping = vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); - ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) { - i_mmap_unlock_read(mapping); - return VM_FAULT_OOM; - } + idx = vma_hugecache_offset(h, vma, haddr); /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ - idx = vma_hugecache_offset(h, vma, haddr); hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -5873,7 +5823,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, put_page(pagecache_page); } mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); return handle_userfault(&vmf, VM_UFFD_WP); } @@ -5917,7 +5866,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } out_mutex: mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); /* * Generally it's safe to hold refcount during waiting page lock. But * here we just wait to defer the next page fault to avoid busy loop and @@ -6758,12 +6706,10 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc() * and returns the corresponding pte. While this is not necessary for the * !shared pmd case because we can allocate the pmd later as well, it makes the - * code much cleaner. - * - * This routine must be called with i_mmap_rwsem held in at least read mode if - * sharing is possible. 
For hugetlbfs, this prevents removal of any page - * table entries associated with the address space. This is important as we - * are setting up sharing based on existing page table entries (mappings). + * code much cleaner. pmd allocation is essential for the shared case because + * pud has to be populated inside the same i_mmap_rwsem section - otherwise + * racing tasks could either miss the sharing (see huge_pte_offset) or select a + * bad pmd for sharing. */ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud) @@ -6777,7 +6723,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *pte; spinlock_t *ptl; - i_mmap_assert_locked(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -6807,6 +6753,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, spin_unlock(ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); + i_mmap_unlock_read(mapping); return pte; } @@ -6817,7 +6764,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, * indicated by page_count > 1, unmap is achieved by clearing pud and * decrementing the ref count. If count == 1, the pte page is not shared. * - * Called with page table lock held and i_mmap_rwsem held in write mode. + * Called with page table lock held. * * returns: 1 successfully unmapped a shared pte page * 0 the underlying pte page is not shared, or it is the last user diff --git a/mm/rmap.c b/mm/rmap.c index 7dc6d77ae865..ad9c97c6445c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -23,10 +23,9 @@ * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock * mapping->invalidate_lock (in filemap_fault) - * page->flags PG_locked (lock_page) * (see hugetlbfs below) + * page->flags PG_locked (lock_page) * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) * anon_vma->rwsem * mm->page_table_lock or pte_lock * swap_lock (in swap_duplicate, swap_info_get) @@ -45,11 +44,6 @@ * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock * pte map lock - * - * * hugetlbfs PageHuge() pages take locks in this order: - * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * page->flags PG_locked (lock_page) */ #include diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 7327b2573f7c..7707f2664adb 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -377,14 +377,10 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(dst_addr >= dst_start + len); /* - * Serialize via i_mmap_rwsem and hugetlb_fault_mutex. - * i_mmap_rwsem ensures the dst_pte remains valid even - * in the case of shared pmds. fault mutex prevents - * races with other faulting threads. + * Serialize via hugetlb_fault_mutex. 
*/ - mapping = dst_vma->vm_file->f_mapping; - i_mmap_lock_read(mapping); idx = linear_page_index(dst_vma, dst_addr); + mapping = dst_vma->vm_file->f_mapping; hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -392,7 +388,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -400,7 +395,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err = -EEXIST; mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); goto out_unlock; } @@ -409,7 +403,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, wp_copy); mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); cond_resched(); From patchwork Wed Aug 24 17:57:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12953866 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC404C32796 for ; Wed, 24 Aug 2022 17:58:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 529146B0074; Wed, 24 Aug 2022 13:58:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4D8C1940008; Wed, 24 Aug 2022 13:58:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 21BB6940007; Wed, 24 Aug 2022 13:58:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 111796B0074 for ; Wed, 24 Aug 2022 13:58:27 -0400 (EDT) Received: from smtpin31.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id D5B124182C for ; Wed, 24 Aug 2022 17:58:26 +0000 (UTC) X-FDA: 79835245812.31.9AC9182 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf30.hostedemail.com (Postfix) with ESMTP id 4D3E68003E for ; Wed, 24 Aug 2022 17:58:26 +0000 (UTC) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27OHkFOH019876; Wed, 24 Aug 2022 17:58:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=3E9Q+FnEp5d1pm/OmIFD4QRcfBrQQhhBz51RIJ0xty4=; b=MUQujRCmVzUDFm4HNgHw3Stx6RoeZQKwBS82xe5ggBXUvZ0DZ6HhCDMHYy2IRTxGcvkL WYKQ8GYgW+jHhkfDoEcZbIgi9CFfw8pjyfFFN3H/U2xfCpkVcTxBOWR0fCu9tcvL/Xaa aj4lCs01lsPNiTSSVhtnkNq5dfknp3V07sNBRMBq/W0crs1wuesTtY/7MAih+iEUxLEg YRVq9cMa1s9ziMhYzK8UBwXxs4mrVUhBlBqlZuxyByY4yCcFvIChIG1wusNosldOuMBf oryavEYrdC7RvwdJUYIUXSZ+ZiYxKZ5HBEnASMWBTGLXTdY1xm+lCOB9ZycT3jAH341T cw== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3j4w23vbux-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:16 +0000 Received: from 
pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 27OHCX1a008968; Wed, 24 Aug 2022 17:58:15 GMT Received: from nam11-dm6-obe.outbound.protection.outlook.com (mail-dm6nam11lp2174.outbound.protection.outlook.com [104.47.57.174]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3j5n6nbqbn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Aug 2022 17:58:15 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Y2d6nDDVsmOqqe1yRH7y1y1HrR3V621yzDsopUpsKVKRyeBMe8D0OqdFqDa2CMGY4ZhO0Mj+Wj6m3b4quZXTP1Ps7XBnquBOhlVUrKomgDLEpaevQzjhj7vuxhKxP/CgW6xD3/JT0Qxx9itAkdYvPWe4AD8z9oAIirBBP3Oo045QRRCK/mw7Qt1oOo+RXSBZ+9gQ466TpN7ny6aDj8PY6MHfZo/7g8UEG6iSEjPPAv4TpE2stjh/3OCSczRImmyyIsO44lnJvSKPL57z3NKfv4L+u+dYMIjh/9GuS26pK1KfXwuPLrZlwc/+YJZEKiTYjbw2fmD1qtxrJheVXkIDFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=3E9Q+FnEp5d1pm/OmIFD4QRcfBrQQhhBz51RIJ0xty4=; b=QqSQcBZxMGiTObC2bKkbMuerNOnJs9qNgd5VZxm6+4TPw4EYo92nYNCcaNqi/fTYl6hCQ+R3ZFpMJTm/1MCjNDYfxSX2bPr5EqfNXmAd+NpyK9xssV/c3l7JaocefX+VGR++IJvmAts1kmUAPGUw+PyxZGhLY2MAYBsSMqTNYNNZ0f74lzSp8qJsOnn1OIEv3XALqA6duydwuxIwbtmgMXOxFpgupFkdRU90zpOuCk67jeiNpcoel49zwsND8LHStDnLJXIEpQ6Kqu0j1cJGhlLbr2WPdfnBaf/S/x+L21Zy/7iqRyDGkpo75AhFQbDvui3wN93gc4EMPpTbC3qEMQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=3E9Q+FnEp5d1pm/OmIFD4QRcfBrQQhhBz51RIJ0xty4=; b=gMzQ3qTEzS5NhaAHprh1NYnOLvSjtBXieOf+6nkmqZo97EYws9ilTou5jS6UnmYYTDv631FmpM8zdrhZkOf+IwCLA/ZfFhrNG4U2tfM2/BpjO/XKPOdGlRee8qWloYPB6VkuWuDElXrQtS3fQLr0jVqRE5QBGVWYEzmrWubWmLo= Received: from DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) by PH0PR10MB4520.namprd10.prod.outlook.com (2603:10b6:510:43::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5546.19; Wed, 24 Aug 2022 17:58:13 +0000 Received: from DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec]) by DM6PR10MB4201.namprd10.prod.outlook.com ([fe80::11b6:7a8a:1432:bec%6]) with mapi id 15.20.5546.022; Wed, 24 Aug 2022 17:58:13 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . 
Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH 3/8] hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache Date: Wed, 24 Aug 2022 10:57:52 -0700 Message-Id: <20220824175757.20590-4-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com> References: <20220824175757.20590-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MW4PR03CA0226.namprd03.prod.outlook.com (2603:10b6:303:b9::21) To DM6PR10MB4201.namprd10.prod.outlook.com (2603:10b6:5:216::10) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 8395f486-276b-4294-d321-08da85fa36a8 X-MS-TrafficTypeDiagnostic: PH0PR10MB4520:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: yBYpRofnoswpxOGiJ/0KswKblymS5d1yMog6Wuo0ae3/Hhwah/je4k59h6CReWJiT+/AuiSJhObZCaAQaGv+Gb0jjEa/1yEmDWNm6KZWXCbs78JYKz6vWRbm/LxfsXWpPzTDR5axpW4/6KJ6RVl8DXkZUude1nrOc+jyNtEusEc4RiV7BIgNcC+tmYj2wZ3PZPQDNXhxfBKCCH+MRRJPF96W21FrTpCn6fv2tONNz4yWebIMR+li4b+W56UrnVi+TghkhuU/3zMvoWUYJg1T6ISlVkJuXP3BaI0zzlPEXaOWgE7ntgn8nC6AwG6z0DUNi8Ix238RFyWd8FfCGm+ZNCVXqB3JU57OjvbpPGP9pqu67A0IptPv27t6VBFqdN1MfPf9QwodnD8RdSgMJQuxAP0+dD3iMXhhlvYdPG4T4aV0JrIcMjgKUJhvvqAotzMsCQ7P7T55l4rjjY3Tz/aZKxDNyN8tOCxWwB6h4rAWzeQ/KIfLT6ZqzQHwX2n82hMmqK/klCB2SRMItv0HNcLI9VyednmENRooaT3AH4IQArJ4axKBVBayYZeK6OHlqfyeKOx9J1KyA3kGCLCNLHNnlgxXAYecWuv+Gq17cDpH2oncz/uFcQh0uGeSAbzNb4u7vKznVIgOZHnSeD6WzUretwT93REHkaa16TINKVh0PukDifxHNQXSIItN7MwM2PEJzvB5PpRSxM0krGyx/PTjvi+kzLVPBORRCM0JvIyrkJxEEHKYYm4irOfneBmSJizW X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM6PR10MB4201.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230016)(39860400002)(396003)(136003)(376002)(346002)(366004)(7416002)(1076003)(6512007)(6666004)(8936002)(26005)(186003)(107886003)(2616005)(6506007)(44832011)(86362001)(36756003)(41300700001)(316002)(5660300002)(54906003)(478600001)(66556008)(8676002)(4326008)(6486002)(83380400001)(38100700002)(66946007)(2906002)(66476007)(14583001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
9VRwgO6jAwnUtzLD7AG/bu8IohRIbHDqw0aPTkvluxAM74621NBKVmF4ETs87XJoy5C9Q/RuLlQcQLWRjDenBL65laOxN0TnsVAOGLWr+mfWZgXl3aCzISdufYRJF5H0vPUcqmGJumWy1lFdxK6EfMcWxO0Vcr/pFEa5q9vMCjjeTDVEnD6DFqewye2znbEzkepS6J7wx4SQ+OEdoiEZCGfRZ7Oy2TJR0raQZdveYbGJxfT/SvdPJ6BtJIDVMCNwNk7AoyiZ9H5OaMlDFdoR5HXuzCwyPcpUYjExhS3tNvNQ1hFsFnqYEnsyhXznuzi4ZAVx8Wq4IKOfbOCKt9umsdUdjNS86SXmG52kh2FvKw6mH9CgPj0lMMWdbomqfO9w8Lvrj/Vgh+kjY6msPs3wiYYSzcK9Zb35ZI2WM/tbt/8x5PAOGcJlIAWEuz/YyjhhOFwqCkPUnvl/CYKNL7r3JNAGxkzmkoNiKP4pHWFmQJp+AR4eZQ1c3yJG1E159wD79AA8OSxPoC4v1Axg3Siu8JeaJ8YozR2v5dqXgKX9mqfnQY6pEjEcdXt3IXsCA420eqBjnv+eACXiVH9ZIJ+gFeVfrcZbfTpTrKbImCjXu3VzIl5SYaYQwj2nJnqJoyHJy/2gsyMSxOGl9axuh9mRIIpyapKy3m2AtSBC1otDcXBlR0C0YJOGTcR4UZ+AFMVRumc8JkLX36XnfkwnL/KEby/G280GSMGLRUA572VTxx7TcRxsCt03YB+qMJl2xZxGcwIMtsxbohXL0n4EAuACSNR0Xd3rct4/u0udEqvNDDpqSUxBnpVABtfXgANLwS9srW2TkLXMytKJZCBrjHqKNAV4TF92I+1ZgsxhAPk05PEaYYr7QdPSU2mE0lyUv99kdcvO8c1IAKThRBNAVS8RHVO1HHlVw8wA7+5q+CRAPNONyfC6JD/bM/OKj8FRKBG2dRSn4fcCEgSxfMMcdMnkVQng2X87pIz/NXKlpvjoYU4GJt4rv2iys07ODNPlb5devfH9JrPMhjl75hwxFR+n+5dSwndHnXYk/LP+oKZ34oUHP1Mj2mKYXYRvivQYI84VpE+m4dwrijqpkxhT03xpnWBC8yPhK6FdOec/yX9FjFmH2WuCmjRPD8Iuvu7zYtqv2vW2HRLiqdajHTgJLg1j2we817xVpEyiZx2w9XdO8bxtTK/zBgTvf4csJ+mskX8Ph2LwA2+H0HKN5Xj+wL8H78lRixV6jVq/n33hbLMklqcs+c3Hmzffj4nGNfeaXcjJvR6fn8sCRRJdCIED+fRA06K4OHP6LsF4/mXVXIQPL2Lk72ORMx/C3gAosIVhseCDaD0GipGwtWGrIzdL4Mx0gNxUXEckikO4NJn1EyTD8RIQ8J5s0SfvVYDCys0xO9LAI+yXt7F77BKIc6xU85biSSYf5rpDw/tdHQAl6uhl1c+kQ40BqwRwTufP0Xknebxi0b9meABv1+5EhiUm4j4cw2moDru7pGexXgDfISx990baKNLRucSJUgwXIEv92RJU00MI9msmy9jeQy5EhciJ3OtKsd38lIYDIV4VK+CYgnhiqXdJyH1R5aWyerfywbj5N8mnv8EBIK2O3Ou6X2xwDw== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8395f486-276b-4294-d321-08da85fa36a8 X-MS-Exchange-CrossTenant-AuthSource: DM6PR10MB4201.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Aug 2022 17:58:13.2791 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: wqSq6vaFzGtAfGF9d2f1fE5esREjys3nO8FoPb52h4rVQbOBE9IC+uhbXQTC55prSNxqQEvj3rXgYLs2DZocGA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR10MB4520 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-24_11,2022-08-22_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 spamscore=0 bulkscore=0 adultscore=0 mlxscore=0 malwarescore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2208240066 X-Proofpoint-ORIG-GUID: 6r7In_beGn3aI3qpcm5jGYeqCOOe8vU8 X-Proofpoint-GUID: 6r7In_beGn3aI3qpcm5jGYeqCOOe8vU8 ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1661363906; a=rsa-sha256; cv=pass; b=NSeVEUobPTZWRNLBtuTOI0Vp4UOCmECoEjz1T6KwxLdL6kvdGGGCoqn+hvMX8l2r0SPMVB 5KAfW/tBpSDQysHwdzOa9iEZPCgEOfH0TKidjKGElJcoMFw1oz01ZbeSH2lVxmpwOCpEHi KzFfEKJVrR3B+zK89V4X+76QVjZqKSE= ARC-Authentication-Results: i=2; imf30.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=MUQujRCm; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=gMzQ3qTE; spf=pass (imf30.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass 
("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661363906; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3E9Q+FnEp5d1pm/OmIFD4QRcfBrQQhhBz51RIJ0xty4=; b=JCbMVU+zF+YR1ZAlgCLMntcYRboFFcNOGM4lmGXXbtNDJpM7RxnGPD09e/TMo301rmCcUA oWCrO70wJYXm2TV76wmAkDmx2ZNf2B9t3fLDVaBC0ZoFGvD5DzpivEfSLL+Q4cb4FjvZWj XEoH2g3Q+xt87rFos+cZz7niysALyfQ= X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 4D3E68003E Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=MUQujRCm; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=gMzQ3qTE; spf=pass (imf30.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") X-Stat-Signature: yr1hofk6pdmn56wsqhnrh31jr4g1qy4k X-HE-Tag: 1661363906-195544 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: remove_huge_page removes a hugetlb page from the page cache. Change to hugetlb_delete_from_page_cache as it is a more descriptive name. huge_add_to_page_cache is global in scope, but only deals with hugetlb pages. For consistency and clarity, rename to hugetlb_add_to_page_cache. Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin --- fs/hugetlbfs/inode.c | 21 ++++++++++----------- include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 8 ++++---- 3 files changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index dfb735a91bbb..d98c6edbd1a4 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -364,7 +364,7 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping, return -EINVAL; } -static void remove_huge_page(struct page *page) +static void hugetlb_delete_from_page_cache(struct page *page) { ClearPageDirty(page); ClearPageUptodate(page); @@ -478,15 +478,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, folio_lock(folio); /* * We must free the huge page and remove from page - * cache (remove_huge_page) BEFORE removing the - * region/reserve map (hugetlb_unreserve_pages). In - * rare out of memory conditions, removal of the - * region/reserve map could fail. Correspondingly, - * the subpool and global reserve usage count can need - * to be adjusted. + * cache BEFORE removing the * region/reserve map + * (hugetlb_unreserve_pages). In rare out of memory + * conditions, removal of the region/reserve map could + * fail. Correspondingly, the subpool and global + * reserve usage count can need to be adjusted. 
*/ VM_BUG_ON(HPageRestoreReserve(&folio->page)); - remove_huge_page(&folio->page); + hugetlb_delete_from_page_cache(&folio->page); freed++; if (!truncate_op) { if (unlikely(hugetlb_unreserve_pages(inode, @@ -723,7 +722,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, } clear_huge_page(page, addr, pages_per_huge_page(h)); __SetPageUptodate(page); - error = huge_add_to_page_cache(page, mapping, index); + error = hugetlb_add_to_page_cache(page, mapping, index); if (unlikely(error)) { restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); @@ -735,7 +734,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, SetHPageMigratable(page); /* - * unlock_page because locked by huge_add_to_page_cache() + * unlock_page because locked by hugetlb_add_to_page_cache() * put_page() due to reference from alloc_huge_page() */ unlock_page(page); @@ -980,7 +979,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping, struct inode *inode = mapping->host; pgoff_t index = page->index; - remove_huge_page(page); + hugetlb_delete_from_page_cache(page); if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1))) hugetlb_fix_reserve_counts(inode); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3ec981a0d8b3..acace1a25226 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -665,7 +665,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct page *page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 95c6f9a5bbf0..11c02513588c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5445,7 +5445,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h, return page != NULL; } -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx) { struct folio *folio = page_folio(page); @@ -5586,7 +5586,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, new_page = true; if (vma->vm_flags & VM_MAYSHARE) { - int err = huge_add_to_page_cache(page, mapping, idx); + int err = hugetlb_add_to_page_cache(page, mapping, idx); if (err) { restore_reserve_on_error(h, vma, haddr, page); put_page(page); @@ -5993,11 +5993,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* * Serialization between remove_inode_hugepages() and - * huge_add_to_page_cache() below happens through the + * hugetlb_add_to_page_cache() below happens through the * hugetlb_fault_mutex_table that here must be hold by * the caller. 
 */ - ret = huge_add_to_page_cache(page, mapping, idx); + ret = hugetlb_add_to_page_cache(page, mapping, idx); if (ret) goto out_release_nounlock; page_in_pagecache = true;

From patchwork Wed Aug 24 17:57:53 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12953868
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH 4/8] hugetlb: handle truncate racing with page faults
Date: Wed, 24 Aug 2022 10:57:53 -0700
Message-Id: <20220824175757.20590-5-mike.kravetz@oracle.com>
In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com>
References: <20220824175757.20590-1-mike.kravetz@oracle.com>

When page fault code needs to allocate and instantiate a new hugetlb page
(hugetlb_no_page), it checks early to determine if the fault is beyond
i_size. When discovered early, it is easy to abort the fault and return
an error. It becomes much more difficult to handle when the condition is
discovered later, after allocating the page, consuming reservations and
adding the page to the page cache. Backing out changes in such instances
is difficult and error prone.

Instead of trying to catch and back out all such races, use the hugetlb
fault mutex to handle truncate racing with page faults. The most
significant change is to remove_inode_hugepages, which now takes the
fault mutex for EVERY index in the truncated range (or hole in the case
of hole punch). Since remove_inode_hugepages is called in the truncate
path after updating i_size, we can experience races as follows:
- truncate code updates i_size and takes the fault mutex before a racing
  fault. After the fault code takes the mutex, it will notice the fault
  is beyond i_size and abort early.
- fault code obtains the mutex, and truncate updates i_size after the
  early checks in the fault code. The fault code will add a page beyond
  i_size. When the truncate code takes the mutex for that page/index,
  it will remove the page.
- truncate updates i_size, but the fault code obtains the mutex first.
  If the fault code sees the updated i_size, it will abort early. If it
  does not, it will add a page beyond i_size and the truncate code will
  remove the page when it obtains the fault mutex.

Note, for performance reasons remove_inode_hugepages will still use
filemap_get_folios for bulk folio lookups. For indices not returned in
the bulk lookup, it needs to look up individual folios to check for
races with page faults.
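The heart of the scheme reduces to the per-index locking loop sketched
below. This is only an illustrative reduction of the pattern, not the
patch itself: serialize_range_with_faults() and remove_page_if_present()
are hypothetical names, the latter standing in for the folio lookup and
removal that fault_lock_inode_indicies() and remove_inode_single_folio()
perform in the actual diff.

static void serialize_range_with_faults(struct address_space *mapping,
					pgoff_t start, pgoff_t end)
{
	pgoff_t index;
	u32 hash;

	for (index = start; index < end; index++) {
		hash = hugetlb_fault_mutex_hash(mapping, index);
		mutex_lock(&hugetlb_fault_mutex_table[hash]);
		/*
		 * While the mutex for this index is held, either a racing
		 * fault already added a page here and we find and remove
		 * it, or the fault serializes behind us, sees the updated
		 * i_size, and aborts before adding a page.
		 */
		remove_page_if_present(mapping, index);	/* hypothetical */
		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
	}
}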
Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 184 +++++++++++++++++++++++++++++++------------ mm/hugetlb.c | 41 +++++----- 2 files changed, 152 insertions(+), 73 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index d98c6edbd1a4..e83fd31671b3 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -411,6 +411,95 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, } } +/* + * Called with hugetlb fault mutex held. + * Returns true if page was actually removed, false otherwise. + */ +static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, + struct address_space *mapping, + struct folio *folio, pgoff_t index, + bool truncate_op) +{ + bool ret = false; + + /* + * If folio is mapped, it was faulted in after being + * unmapped in caller. Unmap (again) while holding + * the fault mutex. The mutex will prevent faults + * until we finish removing the folio. + */ + if (unlikely(folio_mapped(folio))) { + i_mmap_lock_write(mapping); + hugetlb_vmdelete_list(&mapping->i_mmap, + index * pages_per_huge_page(h), + (index + 1) * pages_per_huge_page(h), + ZAP_FLAG_DROP_MARKER); + i_mmap_unlock_write(mapping); + } + + folio_lock(folio); + /* + * After locking page, make sure mapping is the same. + * We could have raced with page fault populate and + * backout code. + */ + if (folio_mapping(folio) == mapping) { + /* + * We must remove the folio from page cache before removing + * the region/ reserve map (hugetlb_unreserve_pages). In + * rare out of memory conditions, removal of the region/reserve + * map could fail. Correspondingly, the subpool and global + * reserve usage count can need to be adjusted. + */ + VM_BUG_ON(HPageRestoreReserve(&folio->page)); + hugetlb_delete_from_page_cache(&folio->page); + ret = true; + if (!truncate_op) { + if (unlikely(hugetlb_unreserve_pages(inode, index, + index + 1, 1))) + hugetlb_fix_reserve_counts(inode); + } + } + + folio_unlock(folio); + return ret; +} + +/* + * Take hugetlb fault mutex for a set of inode indicies. + * Check for and remove any found folios. Return the number of + * any removed folios. + * + */ +static long fault_lock_inode_indicies(struct hstate *h, + struct inode *inode, + struct address_space *mapping, + pgoff_t start, pgoff_t end, + bool truncate_op) +{ + struct folio *folio; + long freed = 0; + pgoff_t index; + u32 hash; + + for (index = start; index < end; index++) { + hash = hugetlb_fault_mutex_hash(mapping, index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + + folio = filemap_get_folio(mapping, index); + if (folio) { + if (remove_inode_single_folio(h, inode, mapping, folio, + index, truncate_op)) + freed++; + folio_put(folio); + } + + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + } + + return freed; +} + /* * remove_inode_hugepages handles two distinct cases: truncation and hole * punch. There are subtle differences in operation for each case. @@ -418,11 +507,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * truncation is indicated by end of range being LLONG_MAX * In this case, we first scan the range and release found pages. * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve - * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() prevents page faults in the - * truncated range. It checks i_size before allocation, and again after - * with the page table lock for the page held. 
The same lock must be - * acquired to unmap a page. + * maps and global counts. Page faults can race with truncation. + * During faults, hugetlb_no_page() checks i_size before page allocation, + * and again after obtaining page table lock. It will 'back out' + * allocations in the truncated range. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserve map @@ -431,75 +519,69 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * This is indicated if we find a mapped page. * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. + * + * Since page faults can race with this routine, care must be taken as both + * modify huge page reservation data. To somewhat synchronize these operations + * the hugetlb fault mutex is taken for EVERY index in the range to be hole + * punched or truncated. In this way, we KNOW either: + * - fault code has added a page beyond i_size, and we will remove here + * - fault code will see updated i_size and not add a page beyond + * The parameter 'lm__end' indicates the offset of the end of hole or file + * before truncation. For hole punch lm_end == lend. */ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, - loff_t lend) + loff_t lend, loff_t lm_end) { struct hstate *h = hstate_inode(inode); struct address_space *mapping = &inode->i_data; const pgoff_t start = lstart >> huge_page_shift(h); const pgoff_t end = lend >> huge_page_shift(h); + pgoff_t m_end = lm_end >> huge_page_shift(h); + pgoff_t m_start, m_index; struct folio_batch fbatch; + struct folio *folio; pgoff_t next, index; - int i, freed = 0; + unsigned int i; + long freed = 0; + u32 hash; bool truncate_op = (lend == LLONG_MAX); folio_batch_init(&fbatch); - next = start; + next = m_start = start; while (filemap_get_folios(mapping, &next, end - 1, &fbatch)) { for (i = 0; i < folio_batch_count(&fbatch); ++i) { - struct folio *folio = fbatch.folios[i]; - u32 hash = 0; + folio = fbatch.folios[i]; index = folio->index; - hash = hugetlb_fault_mutex_hash(mapping, index); - mutex_lock(&hugetlb_fault_mutex_table[hash]); - /* - * If folio is mapped, it was faulted in after being - * unmapped in caller. Unmap (again) now after taking - * the fault mutex. The mutex will prevent faults - * until we finish removing the folio. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. + * Take fault mutex for missing folios before index, + * while checking folios that might have been added + * due to a race with fault code. */ - if (unlikely(folio_mapped(folio))) { - BUG_ON(truncate_op); - - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h), - ZAP_FLAG_DROP_MARKER); - i_mmap_unlock_write(mapping); - } + freed += fault_lock_inode_indicies(h, inode, mapping, + m_start, m_index, truncate_op); - folio_lock(folio); /* - * We must free the huge page and remove from page - * cache BEFORE removing the * region/reserve map - * (hugetlb_unreserve_pages). In rare out of memory - * conditions, removal of the region/reserve map could - * fail. Correspondingly, the subpool and global - * reserve usage count can need to be adjusted. + * Remove folio that was part of folio_batch. 
*/ - VM_BUG_ON(HPageRestoreReserve(&folio->page)); - hugetlb_delete_from_page_cache(&folio->page); - freed++; - if (!truncate_op) { - if (unlikely(hugetlb_unreserve_pages(inode, - index, index + 1, 1))) - hugetlb_fix_reserve_counts(inode); - } - - folio_unlock(folio); + hash = hugetlb_fault_mutex_hash(mapping, index); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + if (remove_inode_single_folio(h, inode, mapping, folio, + index, truncate_op)) + freed++; mutex_unlock(&hugetlb_fault_mutex_table[hash]); } folio_batch_release(&fbatch); cond_resched(); } + /* + * Take fault mutex for missing folios at end of range while checking + * for folios that might have been added due to a race with fault code. + */ + freed += fault_lock_inode_indicies(h, inode, mapping, m_start, m_end, + truncate_op); + if (truncate_op) (void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed); } @@ -507,8 +589,9 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, static void hugetlbfs_evict_inode(struct inode *inode) { struct resv_map *resv_map; + loff_t prev_size = i_size_read(inode); - remove_inode_hugepages(inode, 0, LLONG_MAX); + remove_inode_hugepages(inode, 0, LLONG_MAX, prev_size); /* * Get the resv_map from the address space embedded in the inode. @@ -528,6 +611,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) pgoff_t pgoff; struct address_space *mapping = inode->i_mapping; struct hstate *h = hstate_inode(inode); + loff_t prev_size = i_size_read(inode); BUG_ON(offset & ~huge_page_mask(h)); pgoff = offset >> PAGE_SHIFT; @@ -538,7 +622,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, ZAP_FLAG_DROP_MARKER); i_mmap_unlock_write(mapping); - remove_inode_hugepages(inode, offset, LLONG_MAX); + remove_inode_hugepages(inode, offset, LLONG_MAX, prev_size); } static void hugetlbfs_zero_partial_page(struct hstate *h, @@ -610,7 +694,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) /* Remove full pages from the file. 
*/ if (hole_end > hole_start) - remove_inode_hugepages(inode, hole_start, hole_end); + remove_inode_hugepages(inode, hole_start, hole_end, hole_end); inode_unlock(inode); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 11c02513588c..a6eb46c64baf 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5527,6 +5527,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, spinlock_t *ptl; unsigned long haddr = address & huge_page_mask(h); bool new_page, new_pagecache_page = false; + bool reserve_alloc = false; /* * Currently, we are forced to kill the process in the event the @@ -5584,9 +5585,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, clear_huge_page(page, address, pages_per_huge_page(h)); __SetPageUptodate(page); new_page = true; + if (HPageRestoreReserve(page)) + reserve_alloc = true; if (vma->vm_flags & VM_MAYSHARE) { - int err = hugetlb_add_to_page_cache(page, mapping, idx); + int err; + + err = hugetlb_add_to_page_cache(page, mapping, idx); if (err) { restore_reserve_on_error(h, vma, haddr, page); put_page(page); @@ -5642,10 +5647,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } ptl = huge_pte_lock(h, mm, ptep); - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto backout; - ret = 0; /* If pte changed from under us, retry */ if (!pte_same(huge_ptep_get(ptep), old_pte)) @@ -5689,10 +5690,18 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, backout: spin_unlock(ptl); backout_unlocked: - unlock_page(page); - /* restore reserve for newly allocated pages not in page cache */ - if (new_page && !new_pagecache_page) + if (new_page && !new_pagecache_page) { + /* + * If reserve was consumed, make sure flag is set so that it + * will be restored in free_huge_page(). + */ + if (reserve_alloc) + SetHPageRestoreReserve(page); + restore_reserve_on_error(h, vma, haddr, page); + } + + unlock_page(page); put_page(page); goto out; } @@ -6006,26 +6015,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, ptl = huge_pte_lockptr(h, dst_mm, dst_pte); spin_lock(ptl); - /* - * Recheck the i_size after holding PT lock to make sure not - * to leave any page mapped (as page_mapped()) beyond the end - * of the i_size (remove_inode_hugepages() is strict about - * enforcing that). If we bail out here, we'll also leave a - * page in the radix tree in the vm_shared case beyond the end - * of the i_size, but remove_inode_hugepages() will take care - * of it as soon as we drop the hugetlb_fault_mutex_table. - */ - size = i_size_read(mapping->host) >> huge_page_shift(h); - ret = -EFAULT; - if (idx >= size) - goto out_release_unlock; - - ret = -EEXIST; /* * We allow to overwrite a pte marker: consider when both MISSING|WP * registered, we firstly wr-protect a none pte which has no page cache * page backing it, then access the page. 
 */ + ret = -EEXIST; if (!huge_pte_none_mostly(huge_ptep_get(dst_pte))) goto out_release_unlock;

From patchwork Wed Aug 24 17:57:54 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12953869
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH 5/8] hugetlb: rename vma_shareable() and refactor code
Date: Wed, 24 Aug 2022 10:57:54 -0700
Message-Id: <20220824175757.20590-6-mike.kravetz@oracle.com>
In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com>
References: <20220824175757.20590-1-mike.kravetz@oracle.com>
Rename the routine vma_shareable to vma_addr_pmd_shareable as it checks
whether a specific address within the vma is shareable. Refactor the code
to check if an aligned range is shareable, as this will be needed in a
subsequent patch.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 mm/hugetlb.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a6eb46c64baf..758b6844d566 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6648,26 +6648,33 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma, return saddr; } -static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr) +static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma, + unsigned long start, unsigned long end) { - unsigned long base = addr & PUD_MASK; - unsigned long end = base + PUD_SIZE; - /* * check on proper vm_flags and page table alignment */ - if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end)) + if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end)) return true; return false; } +static bool vma_addr_pmd_shareable(struct vm_area_struct *vma, + unsigned long addr) +{ + unsigned long start = addr & PUD_MASK; + unsigned long end = start + PUD_SIZE; + + return __vma_aligned_range_pmd_shareable(vma, start, end); +} + bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) { #ifdef CONFIG_USERFAULTFD if (uffd_disable_huge_pmd_share(vma)) return false; #endif - return vma_shareable(vma, addr); + return vma_addr_pmd_shareable(vma, addr); } /*

From patchwork Wed Aug 24 17:57:55 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12953870
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH 6/8] hugetlb: add vma based lock for pmd sharing
Date: Wed, 24 Aug 2022 10:57:55 -0700
Message-Id: <20220824175757.20590-7-mike.kravetz@oracle.com>
In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com>
References: <20220824175757.20590-1-mike.kravetz@oracle.com>

Allocate a rw semaphore and hang it off vm_private_data for
synchronization use by vmas that could be involved in pmd sharing. Only
the infrastructure for the new lock is added here; actual use will be
added in a subsequent patch.
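Since the real call sites only arrive later in the series, the intended
read/write pairing can be illustrated with a small sketch that uses just
the helpers added by this patch; walk_shared_range() is a hypothetical
caller, not part of the series:

static void walk_shared_range(struct vm_area_struct *vma)
{
	/*
	 * Readers (e.g. page fault and page table walk paths) take the
	 * vma lock for read; code that unshares pmds takes it for write,
	 * so a shared pmd cannot be unshared beneath a reader.
	 */
	hugetlb_vma_lock_read(vma);
	/* ... operate on possibly-shared pmd mappings ... */
	hugetlb_vma_unlock_read(vma);
}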
Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 36 ++++++++- kernel/fork.c | 6 +- mm/hugetlb.c | 170 ++++++++++++++++++++++++++++++++++++---- mm/rmap.c | 8 +- 4 files changed, 197 insertions(+), 23 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index acace1a25226..852f911d676e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -126,7 +126,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, long min_hpages); void hugepage_put_subpool(struct hugepage_subpool *spool); -void reset_vma_resv_huge_pages(struct vm_area_struct *vma); +void hugetlb_dup_vma_private(struct vm_area_struct *vma); void clear_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *, @@ -214,6 +214,13 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags); +void hugetlb_vma_lock_read(struct vm_area_struct *vma); +void hugetlb_vma_unlock_read(struct vm_area_struct *vma); +void hugetlb_vma_lock_write(struct vm_area_struct *vma); +void hugetlb_vma_unlock_write(struct vm_area_struct *vma); +int hugetlb_vma_trylock_write(struct vm_area_struct *vma); +void hugetlb_vma_assert_locked(struct vm_area_struct *vma); + int pmd_huge(pmd_t pmd); int pud_huge(pud_t pud); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, @@ -225,7 +232,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); #else /* !CONFIG_HUGETLB_PAGE */ -static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma) +static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma) { } @@ -336,6 +343,31 @@ static inline int prepare_hugepage_range(struct file *file, return -EINVAL; } +static inline void hugetlb_vma_lock_read(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_unlock_read(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_lock_write(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_unlock_write(struct vm_area_struct *vma) +{ +} + +static inline int hugetlb_vma_trylock_write(struct vm_area_struct *vma) +{ + return 1; +} + +static inline void hugetlb_vma_assert_locked(struct vm_area_struct *vma) +{ +} + static inline int pmd_huge(pmd_t pmd) { return 0; diff --git a/kernel/fork.c b/kernel/fork.c index 9470220e8f43..421c143286d2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -675,12 +675,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, } /* - * Clear hugetlb-related page reserves for children. This only - * affects MAP_PRIVATE mappings. Faults generated by the child - * are not guaranteed to succeed, even if read-only + * Copy/update hugetlb private vma information. */ if (is_vm_hugetlb_page(tmp)) - reset_vma_resv_huge_pages(tmp); + hugetlb_dup_vma_private(tmp); /* * Link in the new vma and copy the page table entries. 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 758b6844d566..6fb0bff2c7ee 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -91,6 +91,8 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); +static void hugetlb_vma_lock_free(struct vm_area_struct *vma); +static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma); static inline bool subpool_is_free(struct hugepage_subpool *spool) { @@ -1008,12 +1010,25 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) return (get_vma_private_data(vma) & flag) != 0; } -/* Reset counters to 0 and clear all HPAGE_RESV_* flags */ -void reset_vma_resv_huge_pages(struct vm_area_struct *vma) +void hugetlb_dup_vma_private(struct vm_area_struct *vma) { VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma); + /* + * Clear vm_private_data + * - For MAP_PRIVATE mappings, this is the reserve map which does + * not apply to children. Faults generated by the children are + * not guaranteed to succeed, even if read-only. + * - For shared mappings this is a per-vma semaphore that may be + * allocated below. + */ + vma->vm_private_data = (void *)0; if (!(vma->vm_flags & VM_MAYSHARE)) - vma->vm_private_data = (void *)0; + return; + + /* + * Allocate semaphore if pmd sharing is possible. + */ + hugetlb_vma_lock_alloc(vma); } /* @@ -1044,7 +1059,7 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma) kref_put(&reservations->refs, resv_map_release); } - reset_vma_resv_huge_pages(vma); + hugetlb_dup_vma_private(vma); } /* Returns true if the VMA has associated reserve pages */ @@ -4623,16 +4638,21 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma) resv_map_dup_hugetlb_cgroup_uncharge_info(resv); kref_get(&resv->refs); } + + hugetlb_vma_lock_alloc(vma); } static void hugetlb_vm_op_close(struct vm_area_struct *vma) { struct hstate *h = hstate_vma(vma); - struct resv_map *resv = vma_resv_map(vma); + struct resv_map *resv; struct hugepage_subpool *spool = subpool_vma(vma); unsigned long reserve, start, end; long gbl_reserve; + hugetlb_vma_lock_free(vma); + + resv = vma_resv_map(vma); if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) return; @@ -6447,6 +6467,11 @@ bool hugetlb_reserve_pages(struct inode *inode, return false; } + /* + * vma specific semaphore used for pmd sharing synchronization + */ + hugetlb_vma_lock_alloc(vma); + /* * Only apply hugepage reservation if asked. At fault time, an * attempt will be made for VM_NORESERVE to allocate a page @@ -6470,12 +6495,11 @@ bool hugetlb_reserve_pages(struct inode *inode, resv_map = inode_resv_map(inode); chg = region_chg(resv_map, from, to, ®ions_needed); - } else { /* Private mapping. */ resv_map = resv_map_alloc(); if (!resv_map) - return false; + goto out_err; chg = to - from; @@ -6570,6 +6594,7 @@ bool hugetlb_reserve_pages(struct inode *inode, hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h), chg * pages_per_huge_page(h), h_cg); out_err: + hugetlb_vma_lock_free(vma); if (!vma || vma->vm_flags & VM_MAYSHARE) /* Only call region_abort if the region_chg succeeded but the * region_add failed or didn't run. 
@@ -6649,14 +6674,34 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma, } static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + bool check_vma_lock) { +#ifdef CONFIG_USERFAULTFD + if (uffd_disable_huge_pmd_share(vma)) + return false; +#endif /* * check on proper vm_flags and page table alignment */ - if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end)) - return true; - return false; + if (!(vma->vm_flags & VM_MAYSHARE)) + return false; + if (check_vma_lock && !vma->vm_private_data) + return false; + if (!range_in_vma(vma, start, end)) + return false; + return true; +} + +static bool vma_pmd_shareable(struct vm_area_struct *vma) +{ + unsigned long start = ALIGN(vma->vm_start, PUD_SIZE), + end = ALIGN_DOWN(vma->vm_end, PUD_SIZE); + + if (start >= end) + return false; + + return __vma_aligned_range_pmd_shareable(vma, start, end, false); } static bool vma_addr_pmd_shareable(struct vm_area_struct *vma, @@ -6665,15 +6710,11 @@ static bool vma_addr_pmd_shareable(struct vm_area_struct *vma, unsigned long start = addr & PUD_MASK; unsigned long end = start + PUD_SIZE; - return __vma_aligned_range_pmd_shareable(vma, start, end); + return __vma_aligned_range_pmd_shareable(vma, start, end, true); } bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) { -#ifdef CONFIG_USERFAULTFD - if (uffd_disable_huge_pmd_share(vma)) - return false; -#endif return vma_addr_pmd_shareable(vma, addr); } @@ -6704,6 +6745,95 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, *end = ALIGN(*end, PUD_SIZE); } +static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) && + vma->vm_private_data; +} + +void hugetlb_vma_lock_read(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) + down_read((struct rw_semaphore *)vma->vm_private_data); +} + +void hugetlb_vma_unlock_read(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) + up_read((struct rw_semaphore *)vma->vm_private_data); +} + +void hugetlb_vma_lock_write(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) + down_write((struct rw_semaphore *)vma->vm_private_data); +} + +void hugetlb_vma_unlock_write(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) + up_write((struct rw_semaphore *)vma->vm_private_data); +} + +int hugetlb_vma_trylock_write(struct vm_area_struct *vma) +{ + if (!__vma_shareable_flags_pmd(vma)) + return 1; + + return down_write_trylock((struct rw_semaphore *)vma->vm_private_data); +} + +void hugetlb_vma_assert_locked(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) + lockdep_assert_held((struct rw_semaphore *) + vma->vm_private_data); +} + +static void hugetlb_vma_lock_free(struct vm_area_struct *vma) +{ + /* + * Only present in sharable vmas. 
See comment in
+	 * __unmap_hugepage_range_final about the need to check both
+	 * VM_SHARED and VM_MAYSHARE in the free path
+	 */
+	if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED)))
+		return;
+
+	if (vma->vm_private_data) {
+		kfree(vma->vm_private_data);
+		vma->vm_private_data = NULL;
+	}
+}
+
+static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+	struct rw_semaphore *vma_sema;
+
+	/* Only establish in (flags) sharable vmas */
+	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
+		return;
+
+	/* Should never get here with non-NULL vm_private_data */
+	if (vma->vm_private_data)
+		return;
+
+	/* Check size/alignment for pmd sharing possible */
+	if (!vma_pmd_shareable(vma))
+		return;
+
+	vma_sema = kmalloc(sizeof(*vma_sema), GFP_KERNEL);
+	if (!vma_sema)
+		/*
+		 * If we can not allocate semaphore, then vma can not
+		 * participate in pmd sharing.
+		 */
+		return;
+
+	init_rwsem(vma_sema);
+	vma->vm_private_data = vma_sema;
+}
+
 /*
  * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
  * and returns the corresponding pte. While this is not necessary for the
@@ -6790,6 +6920,14 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */

+static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+{
+}
+
+static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+}
+
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index ad9c97c6445c..55209e029847 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -24,7 +24,7 @@
  * mm->mmap_lock
  *   mapping->invalidate_lock (in filemap_fault)
  *     page->flags PG_locked (lock_page)
- *       hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
+ *       hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
  *         mapping->i_mmap_rwsem
  *           anon_vma->rwsem
  *             mm->page_table_lock or pte_lock
@@ -44,6 +44,12 @@
  *   anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon)
  *     ->tasklist_lock
  *       pte map lock
+ *
+ * hugetlbfs PageHuge() pages take locks in this order:
+ *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
+ *     vma_lock (hugetlb specific lock for pmd_sharing)
+ *       mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
+ *         page->flags PG_locked (lock_page)
  */

 #include
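For illustration, an acquisition sequence that honors the hugetlbfs ordering documented in the rmap.c comment above. This is a sketch rather than a call site from the series; hash, vma, mapping and page stand in for locals set up as in the hugetlb fault path:

	/* Sketch only: take the hugetlb locks in the documented order. */
	mutex_lock(&hugetlb_fault_mutex_table[hash]);	/* hugetlb_fault_mutex */
	hugetlb_vma_lock_read(vma);			/* vma_lock */
	i_mmap_lock_write(mapping);			/* mapping->i_mmap_rwsem */
	lock_page(page);				/* page->flags PG_locked */

	/* ... fault or unmap work ... */

	unlock_page(page);
	i_mmap_unlock_write(mapping);
	hugetlb_vma_unlock_read(vma);
	mutex_unlock(&hugetlb_fault_mutex_table[hash]);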
From patchwork Wed Aug 24 17:57:56 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12953871
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH 7/8] hugetlb: create hugetlb_unmap_file_folio to unmap single file folio
Date: Wed, 24 Aug 2022 10:57:56 -0700
Message-Id: <20220824175757.20590-8-mike.kravetz@oracle.com>
In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com>
References: <20220824175757.20590-1-mike.kravetz@oracle.com>
MIME-Version: 1.0
("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661363918; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uEL2h/a8wus2bTuGqexO7huRxlaSkSg+XS0O4TCJXZc=; b=fPXk2n162upvvlz5bf/NUN+3LLeOjNAkvLjyI4U1FVYF107ztKYcEeq+rOPF+xzgEvCss9 1TT24tZEiZrom64BB5HgQb2apolVXwJeIypdeMObblYslpGdeblSME/JlrGnRBjhGAjSia PHghbTGsxa8e4ligNzErpW5cItuQUFg= X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 5FEB218001C Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=xxBUvbCz; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=BqxUz5q3; spf=pass (imf16.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.177.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") X-Stat-Signature: fbssq45su1mjyy1ijxouaxpoejfujhg7 X-HE-Tag: 1661363918-553614 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Create the new routine hugetlb_unmap_file_folio that will unmap a single file folio. This is refactored code from hugetlb_vmdelete_list. It is modified to do locking within the routine itself and check whether the page is mapped within a specific vma before unmapping. This refactoring will be put to use and expanded upon in a subsequent patch adding vma specific locking. Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin --- fs/hugetlbfs/inode.c | 123 +++++++++++++++++++++++++++++++++---------- 1 file changed, 94 insertions(+), 29 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index e83fd31671b3..b93d131b0cb5 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -371,6 +371,94 @@ static void hugetlb_delete_from_page_cache(struct page *page) delete_from_page_cache(page); } +/* + * Called with i_mmap_rwsem held for inode based vma maps. This makes + * sure vma (and vm_mm) will not go away. We also hold the hugetlb fault + * mutex for the page in the mapping. So, we can not race with page being + * faulted into the vma. + */ +static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, + unsigned long addr, struct page *page) +{ + pte_t *ptep, pte; + + ptep = huge_pte_offset(vma->vm_mm, addr, + huge_page_size(hstate_vma(vma))); + + if (!ptep) + return false; + + pte = huge_ptep_get(ptep); + if (huge_pte_none(pte) || !pte_present(pte)) + return false; + + if (pte_page(pte) == page) + return true; + + return false; +} + +/* + * Can vma_offset_start/vma_offset_end overflow on 32-bit arches? + * No, because the interval tree returns us only those vmas + * which overlap the truncated area starting at pgoff, + * and no vma on a 32-bit arch can span beyond the 4GB. 
+ */ +static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start) +{ + if (vma->vm_pgoff < start) + return (start - vma->vm_pgoff) << PAGE_SHIFT; + else + return 0; +} + +static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end) +{ + unsigned long t_end; + + if (!end) + return vma->vm_end; + + t_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start; + if (t_end > vma->vm_end) + t_end = vma->vm_end; + return t_end; +} + +/* + * Called with hugetlb fault mutex held. Therefore, no more mappings to + * this folio can be created while executing the routine. + */ +static void hugetlb_unmap_file_folio(struct hstate *h, + struct address_space *mapping, + struct folio *folio, pgoff_t index) +{ + struct rb_root_cached *root = &mapping->i_mmap; + struct page *page = &folio->page; + struct vm_area_struct *vma; + unsigned long v_start; + unsigned long v_end; + pgoff_t start, end; + + start = index * pages_per_huge_page(h); + end = ((index + 1) * pages_per_huge_page(h)); + + i_mmap_lock_write(mapping); + + vma_interval_tree_foreach(vma, root, start, end - 1) { + v_start = vma_offset_start(vma, start); + v_end = vma_offset_end(vma, end); + + if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + continue; + + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, + NULL, ZAP_FLAG_DROP_MARKER); + } + + i_mmap_unlock_write(mapping); +} + static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, zap_flags_t zap_flags) @@ -383,30 +471,13 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * an inclusive "last". */ vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) { - unsigned long v_offset; + unsigned long v_start; unsigned long v_end; - /* - * Can the expression below overflow on 32-bit arches? - * No, because the interval tree returns us only those vmas - * which overlap the truncated area starting at pgoff, - * and no vma on a 32-bit arch can span beyond the 4GB. - */ - if (vma->vm_pgoff < start) - v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT; - else - v_offset = 0; - - if (!end) - v_end = vma->vm_end; - else { - v_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) - + vma->vm_start; - if (v_end > vma->vm_end) - v_end = vma->vm_end; - } + v_start = vma_offset_start(vma, start); + v_end = vma_offset_end(vma, end); - unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, zap_flags); } } @@ -428,14 +499,8 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, * the fault mutex. The mutex will prevent faults * until we finish removing the folio. 
 	 */
-	if (unlikely(folio_mapped(folio))) {
-		i_mmap_lock_write(mapping);
-		hugetlb_vmdelete_list(&mapping->i_mmap,
-			index * pages_per_huge_page(h),
-			(index + 1) * pages_per_huge_page(h),
-			ZAP_FLAG_DROP_MARKER);
-		i_mmap_unlock_write(mapping);
-	}
+	if (unlikely(folio_mapped(folio)))
+		hugetlb_unmap_file_folio(h, mapping, folio, index);

 	folio_lock(folio);
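To make the vma_offset_start()/vma_offset_end() arithmetic in the patch above concrete, a worked example with made-up values (assume 4K base pages and 2MB huge pages, so pages_per_huge_page(h) == 512):

	/*
	 * Hypothetical numbers, for illustration only:
	 *   vma->vm_pgoff = 512            (vma starts 2MB into the file)
	 *   vma->vm_start = 0x7f0000000000, vma->vm_end = vm_start + 4MB
	 *   index = 2  =>  start = 1024, end = 1536
	 *
	 * vma_offset_start(vma, 1024):
	 *   vm_pgoff (512) < start (1024)
	 *   => (1024 - 512) << PAGE_SHIFT = 2MB into the vma
	 *
	 * vma_offset_end(vma, 1536):
	 *   ((1536 - 512) << PAGE_SHIFT) + vm_start = vm_start + 4MB,
	 *   clamped to vm_end (equal here)
	 *
	 * So the folio at index 2 is unmapped over
	 * [vm_start + 2MB, vm_start + 4MB).
	 */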
From patchwork Wed Aug 24 17:57:57 2022
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12953872
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH 8/8] hugetlb: use new vma_lock for pmd sharing synchronization
Date: Wed, 24 Aug 2022 10:57:57 -0700
Message-Id: <20220824175757.20590-9-mike.kravetz@oracle.com>
In-Reply-To: <20220824175757.20590-1-mike.kravetz@oracle.com>
References: <20220824175757.20590-1-mike.kravetz@oracle.com>
MIME-Version: 1.0
("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661363920; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7CDRJVazPQPG9DiPh65pmT4kKeEC7OVNZLRGCF/gHE8=; b=74M0VMwMsxdeQcz78xi0w6EznG5XXupDEhqGoGczLDJFyA9iTmpxBDC/pCNAMmYYMeXLFl 3O3XhnFxdVsLZfFFIDVeLZh1VyigBk64ua9s1niAND4YBijHAh8dSBXkp0aF+BDytKX440 P4whIgWfFGnmlh3UlTwvH8JA+FXu5gA= X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 1B81D180051 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2022-7-12 header.b=vJM8yYPP; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=jQ2JxYFt; spf=pass (imf06.hostedemail.com: domain of mike.kravetz@oracle.com designates 205.220.177.32 as permitted sender) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") X-Stat-Signature: j8b1pkxn5s7nu8tko5zfqf5duo589181 X-HE-Tag: 1661363919-649918 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The new hugetlb vma lock (rw semaphore) is used to address this race: Faulting thread Unsharing thread ... ... ptep = huge_pte_offset() or ptep = huge_pte_alloc() ... i_mmap_lock_write lock page table ptep invalid <------------------------ huge_pmd_unshare() Could be in a previously unlock_page_table sharing process or worse i_mmap_unlock_write ... The vma_lock is used as follows: - During fault processing. the lock is acquired in read mode before doing a page table lock and allocation (huge_pte_alloc). The lock is held until code is finished with the page table entry (ptep). - The lock must be held in write mode whenever huge_pmd_unshare is called. Lock ordering issues come into play when unmapping a page from all vmas mapping the page. The i_mmap_rwsem must be held to search for the vmas, and the vma lock must be held before calling unmap which will call huge_pmd_unshare. This is done today in: - try_to_migrate_one and try_to_unmap_ for page migration and memory error handling. In these routines we 'try' to obtain the vma lock and fail to unmap if unsuccessful. Calling routines already deal with the failure of unmapping. - hugetlb_vmdelete_list for truncation and hole punch. This routine also tries to acquire the vma lock. If it fails, it skips the unmapping. However, we can not have file truncation or hole punch fail because of contention. After hugetlb_vmdelete_list, truncation and hole punch call remove_inode_hugepages. remove_inode_hugepages check for mapped pages and call hugetlb_unmap_file_page to unmap them. hugetlb_unmap_file_page is designed to drop locks and reacquire in the correct order to guarantee unmap success. 
Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c |  46 +++++++++++++++++++
 mm/hugetlb.c         | 102 +++++++++++++++++++++++++++++++++++++++----
 mm/memory.c          |   2 +
 mm/rmap.c            | 100 +++++++++++++++++++++++++++---------------
 mm/userfaultfd.c     |   9 +++-
 5 files changed, 214 insertions(+), 45 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b93d131b0cb5..52d9b390389b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -434,6 +434,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 					struct folio *folio, pgoff_t index)
 {
 	struct rb_root_cached *root = &mapping->i_mmap;
+	unsigned long skipped_vm_start;
+	struct mm_struct *skipped_mm;
 	struct page *page = &folio->page;
 	struct vm_area_struct *vma;
 	unsigned long v_start;
@@ -444,6 +446,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 	end = ((index + 1) * pages_per_huge_page(h));

 	i_mmap_lock_write(mapping);
+retry:
+	skipped_mm = NULL;

 	vma_interval_tree_foreach(vma, root, start, end - 1) {
 		v_start = vma_offset_start(vma, start);
@@ -452,11 +456,49 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
 			continue;

+		if (!hugetlb_vma_trylock_write(vma)) {
+			/*
+			 * If we can not get vma lock, we need to drop
+			 * i_mmap_rwsem and take locks in order.
+			 */
+			skipped_vm_start = vma->vm_start;
+			skipped_mm = vma->vm_mm;
+			/* grab mm-struct as we will be dropping i_mmap_rwsem */
+			mmgrab(skipped_mm);
+			break;
+		}
+
 		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
 				NULL, ZAP_FLAG_DROP_MARKER);
+		hugetlb_vma_unlock_write(vma);
 	}

 	i_mmap_unlock_write(mapping);
+
+	if (skipped_mm) {
+		mmap_read_lock(skipped_mm);
+		vma = find_vma(skipped_mm, skipped_vm_start);
+		if (!vma || !is_vm_hugetlb_page(vma) ||
+		    vma->vm_file->f_mapping != mapping ||
+		    vma->vm_start != skipped_vm_start) {
+			mmap_read_unlock(skipped_mm);
+			mmdrop(skipped_mm);
+			goto retry;
+		}
+
+		hugetlb_vma_lock_write(vma);
+		i_mmap_lock_write(mapping);
+		mmap_read_unlock(skipped_mm);
+		mmdrop(skipped_mm);
+
+		v_start = vma_offset_start(vma, start);
+		v_end = vma_offset_end(vma, end);
+		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
+				NULL, ZAP_FLAG_DROP_MARKER);
+		hugetlb_vma_unlock_write(vma);
+
+		goto retry;
+	}
 }

 static void
@@ -474,11 +516,15 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		unsigned long v_start;
 		unsigned long v_end;

+		if (!hugetlb_vma_trylock_write(vma))
+			continue;
+
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);

 		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
 				     NULL, zap_flags);
+		hugetlb_vma_unlock_write(vma);
 	}
 }

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6fb0bff2c7ee..5912c2b97ddf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4801,6 +4801,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		mmu_notifier_invalidate_range_start(&range);
 		mmap_assert_write_locked(src);
 		raw_write_seqcount_begin(&src->write_protect_seq);
+	} else {
+		/*
+		 * For shared mappings the vma lock must be held before
+		 * calling huge_pte_offset in the src vma. Otherwise, the
+		 * returned ptep could go away if part of a shared pmd and
+		 * another thread calls huge_pmd_unshare.
+		 */
+		hugetlb_vma_lock_read(src_vma);
 	}

 	last_addr_mask = hugetlb_mask_last_page(h);
@@ -4948,6 +4956,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	if (cow) {
 		raw_write_seqcount_end(&src->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
+	} else {
+		hugetlb_vma_unlock_read(src_vma);
 	}

 	return ret;
@@ -5006,6 +5016,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 	last_addr_mask = hugetlb_mask_last_page(h);
 	/* Prevent race with file truncation */
+	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
 	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
 		src_pte = huge_pte_offset(mm, old_addr, sz);
@@ -5037,6 +5048,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		flush_tlb_range(vma, old_end - len, old_end);
 	mmu_notifier_invalidate_range_end(&range);
 	i_mmap_unlock_write(mapping);
+	hugetlb_vma_unlock_write(vma);

 	return len + old_addr - old_end;
 }
@@ -5356,9 +5368,30 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * may get SIGKILLed if it later faults.
 		 */
 		if (outside_reserve) {
+			struct address_space *mapping = vma->vm_file->f_mapping;
+			pgoff_t idx;
+			u32 hash;
+
 			put_page(old_page);
 			BUG_ON(huge_pte_none(pte));
+			/*
+			 * Drop hugetlb_fault_mutex and vma_lock before
+			 * unmapping.  Unmapping needs to hold vma_lock
+			 * in write mode.  Dropping vma_lock in read mode
+			 * here is OK as COW mappings do not interact with
+			 * PMD sharing.
+			 *
+			 * Reacquire both after the unmap operation.
+			 */
+			idx = vma_hugecache_offset(h, vma, haddr);
+			hash = hugetlb_fault_mutex_hash(mapping, idx);
+			hugetlb_vma_unlock_read(vma);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+
 			unmap_ref_private(mm, vma, old_page, haddr);
+
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);
+			hugetlb_vma_lock_read(vma);
 			spin_lock(ptl);
 			ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 			if (likely(ptep &&
@@ -5520,14 +5553,16 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 		};

 		/*
-		 * hugetlb_fault_mutex and i_mmap_rwsem must be
+		 * vma_lock and hugetlb_fault_mutex must be
 		 * dropped before handling userfault.  Reacquire
 		 * after handling fault to make calling code simpler.
 		 */
+		hugetlb_vma_unlock_read(vma);
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		ret = handle_userfault(&vmf, reason);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
+		hugetlb_vma_lock_read(vma);

 		return ret;
 	}
@@ -5767,6 +5802,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
+		/*
+		 * Since we hold no locks, ptep could be stale.  That is
+		 * OK as we are only making decisions based on content and
+		 * not actually modifying content here.
+		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
@@ -5774,23 +5814,35 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
-	} else {
-		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
-		if (!ptep)
-			return VM_FAULT_OOM;
 	}

-	mapping = vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, vma, haddr);
-
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
 	 * the same page in the page cache.
 	 */
+	mapping = vma->vm_file->f_mapping;
+	idx = vma_hugecache_offset(h, vma, haddr);
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);

+	/*
+	 * Acquire vma lock before calling huge_pte_alloc and hold
+	 * until finished with ptep.  This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the ptep no longer valid.
+	 *
+	 * ptep could have already been assigned via huge_pte_offset.  That
+	 * is OK, as huge_pte_alloc will return the same value unless
+	 * something has changed.
+	 */
+	hugetlb_vma_lock_read(vma);
+	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
+	if (!ptep) {
+		hugetlb_vma_unlock_read(vma);
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		return VM_FAULT_OOM;
+	}
+
 	entry = huge_ptep_get(ptep);
 	/* PTE markers should be handled the same way as none pte */
 	if (huge_pte_none_mostly(entry)) {
@@ -5851,6 +5903,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unlock_page(pagecache_page);
 			put_page(pagecache_page);
 		}
+		hugetlb_vma_unlock_read(vma);
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		return handle_userfault(&vmf, VM_UFFD_WP);
 	}
@@ -5894,6 +5947,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		put_page(pagecache_page);
 	}
 out_mutex:
+	hugetlb_vma_unlock_read(vma);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 	/*
 	 * Generally it's safe to hold refcount during waiting page lock. But
@@ -6343,8 +6397,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	flush_cache_range(vma, range.start, range.end);

 	mmu_notifier_invalidate_range_start(&range);
-	last_addr_mask = hugetlb_mask_last_page(h);
+	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
+	last_addr_mask = hugetlb_mask_last_page(h);
 	for (; address < end; address += psize) {
 		spinlock_t *ptl;
 		ptep = huge_pte_offset(mm, address, psize);
@@ -6443,6 +6498,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 * See Documentation/mm/mmu_notifier.rst
 	 */
 	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	hugetlb_vma_unlock_write(vma);
 	mmu_notifier_invalidate_range_end(&range);

 	return pages << h->order;
@@ -6909,6 +6965,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	pud_t *pud = pud_offset(p4d, addr);

 	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+	hugetlb_vma_assert_locked(vma);
 	BUG_ON(page_count(virt_to_page(ptep)) == 0);
 	if (page_count(virt_to_page(ptep)) == 1)
 		return 0;
@@ -6920,6 +6977,31 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }

 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
+void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+}
+
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+}
+
+void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+}
+
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+}
+
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+	return 1;
+}
+
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+}
+
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 {
 }
@@ -7298,6 +7380,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
 				start, end);
 	mmu_notifier_invalidate_range_start(&range);
+	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (address = start; address < end; address += PUD_SIZE) {
 		ptep = huge_pte_offset(mm, address, sz);
@@ -7309,6 +7392,7 @@ void hugetlb_unshare_all_pmds(struct
vm_area_struct *vma) } flush_hugetlb_tlb_range(vma, start, end); i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); /* * No need to call mmu_notifier_invalidate_range(), see * Documentation/mm/mmu_notifier.rst. diff --git a/mm/memory.c b/mm/memory.c index 2f3cc57a5a11..55166045ab55 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1675,10 +1675,12 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (vma->vm_file) { zap_flags_t zap_flags = details ? details->zap_flags : 0; + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); __unmap_hugepage_range_final(tlb, vma, start, end, NULL, zap_flags); i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); } } else unmap_page_range(tlb, vma, start, end, details); diff --git a/mm/rmap.c b/mm/rmap.c index 55209e029847..60d7db60428e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1558,24 +1558,39 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and fail + * if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret = false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + hugetlb_vma_unlock_write(vma); } pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { @@ -1934,26 +1949,41 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and + * fail if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. 
When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret = false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + hugetlb_vma_unlock_write(vma); } - /* Nuke the hugetlb page table entry */ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 7707f2664adb..2b0502710ea1 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -377,16 +377,21 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(dst_addr >= dst_start + len); /* - * Serialize via hugetlb_fault_mutex. + * Serialize via vma_lock and hugetlb_fault_mutex. + * vma_lock ensures the dst_pte remains valid even + * in the case of shared pmds. fault mutex prevents + * races with other faulting threads. */ idx = linear_page_index(dst_vma, dst_addr); mapping = dst_vma->vm_file->f_mapping; hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_lock_read(dst_vma); err = -ENOMEM; dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } @@ -394,6 +399,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, if (mode != MCOPY_ATOMIC_CONTINUE && !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err = -EEXIST; + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } @@ -402,6 +408,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, dst_addr, src_addr, mode, &page, wp_copy); + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); cond_resched();