From patchwork Wed Aug 9 06:11:03 2023
X-Patchwork-Id: 13347444
From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, hughd@google.com, yosryahmed@google.com, ryan.roberts@arm.com, david@redhat.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 1/3] mm: add functions folio_in_range() and folio_within_vma()
Date: Wed, 9 Aug 2023 14:11:03 +0800
Message-Id: <20230809061105.3369958-2-fengwei.yin@intel.com>
In-Reply-To: <20230809061105.3369958-1-fengwei.yin@intel.com>
References: <20230809061105.3369958-1-fengwei.yin@intel.com>
folio_in_range() will be used to check whether a folio is mapped into a specific VMA and whether the folio's whole mapped address range lies within [start, end).

Also add a helper, folio_within_vma(), built on folio_in_range(), to check whether a folio lies entirely within the range of a VMA.

Signed-off-by: Yin Fengwei
---
 mm/internal.h | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index 154da4f0d557..5d1b71010fd2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -585,6 +585,41 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma,
 				   bool write, int *locked);
 extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
 			       unsigned long bytes);
+
+static inline bool
+folio_in_range(struct folio *folio, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	pgoff_t pgoff, addr;
+	unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+	VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio);
+	if (start > end)
+		return false;
+
+	if (start < vma->vm_start)
+		start = vma->vm_start;
+
+	if (end > vma->vm_end)
+		end = vma->vm_end;
+
+	pgoff = folio_pgoff(folio);
+
+	/* if folio start address is not in vma range */
+	if (!in_range(pgoff, vma->vm_pgoff, vma_pglen))
+		return false;
+
+	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+
+	return !(addr < start || end - addr < folio_size(folio));
+}
+
+static inline bool
+folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
+{
+	return folio_in_range(folio, vma, vma->vm_start, vma->vm_end);
+}
+
 /*
  * mlock_vma_folio() and munlock_vma_folio():
  * should be called with vma's mmap_lock held for read or write,

From patchwork Wed Aug 9 06:11:04 2023
X-Patchwork-Id: 13347445
From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, hughd@google.com, yosryahmed@google.com, ryan.roberts@arm.com, david@redhat.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 2/3] mm: handle large folio when large folio in VM_LOCKED VMA range
Date: Wed, 9 Aug 2023 14:11:04 +0800
Message-Id: <20230809061105.3369958-3-fengwei.yin@intel.com>
In-Reply-To: <20230809061105.3369958-1-fengwei.yin@intel.com>
References: <20230809061105.3369958-1-fengwei.yin@intel.com>
If a large folio lies within the range of a VM_LOCKED VMA, it should be mlocked so that page reclaim does not pick it. Otherwise reclaim may split the large folio and then mlock each of its pages again. Mlock such large folios to keep them away from page reclaim.

A large folio that crosses the boundary of a VM_LOCKED VMA, or is not fully mapped into it, should not be mlocked. Then, if the system is under memory pressure, such a folio can be split and the pages outside the VM_LOCKED VMA can be reclaimed.

Ideally, a large folio should be mlocked when it becomes fully mapped into the VMA and munlocked as soon as any of its pages is unmapped from the VMA. But in some paths (such as adding or removing an rmap) it is not easy to detect whether the large folio is fully mapped into the VMA. So update mlock_vma_folio() and munlock_vma_folio() to mlock/munlock the folio based only on vma->vm_flags, and let callers decide whether to call these two functions.

When adding an rmap, only mlock normal 4K folios and defer large folio handling to the page reclaim phase, where the page table iterator can be reused to detect whether the folio is fully mapped.

When removing an rmap, call munlock_vma_folio() to munlock the folio unconditionally, because removing the rmap leaves the folio no longer fully mapped into the VMA.
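As a rough illustration of the reclaim-time check described above, the userspace C sketch below models folio_in_range() from patch 1 and the post-walk test this patch adds to folio_referenced_one(). It is not kernel code: struct vma_model, struct folio_model, the model_* functions, and the 4 KiB page / 2 MiB PMD sizes are hypothetical stand-ins, the PTE count that the kernel gathers with page_vma_mapped_walk() is simply passed in, and the folio's start address is derived from its page offset rather than taken from the rmap walk. The sketch also adds one guard, marked in a comment, that is not in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for this sketch: 4 KiB pages, 2 MiB per PMD. */
#define PAGE_SHIFT 12UL
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PMD_SIZE (1UL << 21)
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/* Hypothetical VMA model: only the fields folio_in_range() reads. */
struct vma_model {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_end;	/* one past the last address */
	unsigned long vm_pgoff;	/* page offset mapped at vm_start */
};

/* Hypothetical folio model. */
struct folio_model {
	unsigned long pgoff;	/* like folio_pgoff(): offset of first page */
	unsigned long nr_pages;	/* like folio_nr_pages() */
};

/* Follows the logic of folio_in_range() from patch 1. */
static bool model_folio_in_range(const struct folio_model *f,
				 const struct vma_model *vma,
				 unsigned long start, unsigned long end)
{
	unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	unsigned long addr;

	if (start > end)
		return false;
	if (start < vma->vm_start)
		start = vma->vm_start;
	if (end > vma->vm_end)
		end = vma->vm_end;

	/* in_range(pgoff, vm_pgoff, vma_pglen): first page must be in the VMA. */
	if (f->pgoff - vma->vm_pgoff >= vma_pglen)
		return false;

	addr = vma->vm_start + ((f->pgoff - vma->vm_pgoff) << PAGE_SHIFT);

	/* Extra guard, not in the patch: keeps end - addr from wrapping. */
	if (addr > end)
		return false;

	return !(addr < start || end - addr < f->nr_pages * PAGE_SIZE);
}

/*
 * Models the test added after the pvmw loop: restore the mlock only if the
 * folio lies within the VMA, does not cross a PMD (page table) boundary,
 * and every one of its pages was found mapped by a PTE (ptes).
 */
static bool model_restore_mlock(const struct folio_model *f,
				const struct vma_model *vma,
				unsigned long ptes)
{
	unsigned long start, s_align, e_align;

	assert(f->nr_pages > 0);
	if (!model_folio_in_range(f, vma, vma->vm_start, vma->vm_end))
		return false;

	start = vma->vm_start + ((f->pgoff - vma->vm_pgoff) << PAGE_SHIFT);
	s_align = ALIGN_DOWN(start, PMD_SIZE);
	e_align = ALIGN_DOWN(start + f->nr_pages * PAGE_SIZE - 1, PMD_SIZE);
	return s_align == e_align && ptes == f->nr_pages;
}
```

With a VMA covering [0x200000, 0x600000), a 16-page folio at the start of the VMA passes only when all 16 PTEs were seen; a 16-page folio starting at 0x3f8000 straddles the 2 MiB boundary at 0x400000, and one starting at 0x5f8000 runs past vm_end, so both are left for reclaim to split.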
Signed-off-by: Yin Fengwei
---
 mm/internal.h | 23 ++++++++++--------
 mm/rmap.c     | 66 ++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5d1b71010fd2..b14fb2d8b04c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -628,14 +628,10 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
  * mlock is usually called at the end of page_add_*_rmap(), munlock at
  * the end of page_remove_rmap(); but new anon folios are managed by
  * folio_add_lru_vma() calling mlock_new_folio().
- *
- * @compound is used to include pmd mappings of THPs, but filter out
- * pte mappings of THPs, which cannot be consistently counted: a pte
- * mapping of the THP head cannot be distinguished by the page alone.
  */
 void mlock_folio(struct folio *folio);
 static inline void mlock_vma_folio(struct folio *folio,
-			struct vm_area_struct *vma, bool compound)
+			struct vm_area_struct *vma)
 {
 	/*
 	 * The VM_SPECIAL check here serves two purposes.
@@ -645,17 +641,24 @@ static inline void mlock_vma_folio(struct folio *folio,
 	 * file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
 	 * still be set while VM_SPECIAL bits are added: so ignore it then.
 	 */
-	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
-	    (compound || !folio_test_large(folio)))
+	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED))
 		mlock_folio(folio);
 }
 
 void munlock_folio(struct folio *folio);
 static inline void munlock_vma_folio(struct folio *folio,
-			struct vm_area_struct *vma, bool compound)
+			struct vm_area_struct *vma)
 {
-	if (unlikely(vma->vm_flags & VM_LOCKED) &&
-	    (compound || !folio_test_large(folio)))
+	/*
+	 * munlock if the function is called. Ideally, we should only
+	 * munlock if a page of the folio is unmapped from the VMA and
+	 * the folio is therefore no longer fully mapped to it.
+	 *
+	 * But it's not easy to confirm that's the situation. So we
+	 * always munlock the folio and page reclaim will correct it
+	 * if it's wrong.
+	 */
+	if (unlikely(vma->vm_flags & VM_LOCKED))
 		munlock_folio(folio);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 3c20d0d79905..dae0443e9ab0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -798,6 +798,7 @@ struct folio_referenced_arg {
 	unsigned long vm_flags;
 	struct mem_cgroup *memcg;
 };
+
 /*
  * arg: folio_referenced_arg will be passed
  */
@@ -807,17 +808,33 @@ static bool folio_referenced_one(struct folio *folio,
 	struct folio_referenced_arg *pra = arg;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	int referenced = 0;
+	unsigned long start = address, ptes = 0;
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		address = pvmw.address;
 
-		if ((vma->vm_flags & VM_LOCKED) &&
-		    (!folio_test_large(folio) || !pvmw.pte)) {
-			/* Restore the mlock which got missed */
-			mlock_vma_folio(folio, vma, !pvmw.pte);
-			page_vma_mapped_walk_done(&pvmw);
-			pra->vm_flags |= VM_LOCKED;
-			return false; /* To break the loop */
+		if (vma->vm_flags & VM_LOCKED) {
+			if (!folio_test_large(folio) || !pvmw.pte) {
+				/* Restore the mlock which got missed */
+				mlock_vma_folio(folio, vma);
+				page_vma_mapped_walk_done(&pvmw);
+				pra->vm_flags |= VM_LOCKED;
+				return false; /* To break the loop */
+			}
+			/*
+			 * A large folio fully mapped to the VMA will be
+			 * handled after the pvmw loop.
+			 *
+			 * A large folio crossing VMA boundaries is
+			 * expected to be picked by page reclaim, but
+			 * references to its pages inside the VM_LOCKED
+			 * VMA should be skipped: page reclaim should
+			 * count only references to pages outside the
+			 * VM_LOCKED VMA.
+			 */
+			ptes++;
+			pra->mapcount--;
+			continue;
 		}
 
 		if (pvmw.pte) {
@@ -842,6 +859,23 @@ static bool folio_referenced_one(struct folio *folio,
 		pra->mapcount--;
 	}
 
+	if ((vma->vm_flags & VM_LOCKED) &&
+			folio_test_large(folio) &&
+			folio_within_vma(folio, vma)) {
+		unsigned long s_align, e_align;
+
+		s_align = ALIGN_DOWN(start, PMD_SIZE);
+		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
+
+		/* folio doesn't cross a page table boundary and is fully mapped */
+		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
+			/* Restore the mlock which got missed */
+			mlock_vma_folio(folio, vma);
+			pra->vm_flags |= VM_LOCKED;
+			return false; /* To break the loop */
+		}
+	}
+
 	if (referenced)
 		folio_clear_idle(folio);
 	if (folio_test_clear_young(folio))
@@ -1260,7 +1294,14 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
 		__page_check_anon_rmap(folio, page, vma, address);
 	}
 
-	mlock_vma_folio(folio, vma, compound);
+	/*
+	 * For a large folio, only mlock it if it's fully mapped to the VMA.
+	 * It's not easy to check here whether the large folio is fully
+	 * mapped to the VMA, so only mlock a normal 4K folio and leave
+	 * large folios to page reclaim.
+	 */
+	if (!folio_test_large(folio))
+		mlock_vma_folio(folio, vma);
 }
 
 void folio_add_new_anon_rmap_range(struct folio *folio,
@@ -1371,7 +1412,9 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
 	if (nr)
 		__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
 
-	mlock_vma_folio(folio, vma, compound);
+	/* See comments in page_add_anon_rmap() */
+	if (!folio_test_large(folio))
+		mlock_vma_folio(folio, vma);
 }
 
 /**
@@ -1482,7 +1525,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	 * it's only reliable while mapped.
 	 */
 
-	munlock_vma_folio(folio, vma, compound);
+	munlock_vma_folio(folio, vma);
 }
 
 /*
@@ -1543,7 +1586,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		if (!(flags & TTU_IGNORE_MLOCK) &&
 		    (vma->vm_flags & VM_LOCKED)) {
 			/* Restore the mlock which got missed */
-			mlock_vma_folio(folio, vma, false);
+			if (!folio_test_large(folio))
+				mlock_vma_folio(folio, vma);
 			page_vma_mapped_walk_done(&pvmw);
 			ret = false;
 			break;

From patchwork Wed Aug 9 06:11:05 2023
X-Patchwork-Id: 13347446
From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, hughd@google.com, yosryahmed@google.com, ryan.roberts@arm.com, david@redhat.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
Date: Wed, 9 Aug 2023 14:11:05 +0800
Message-Id: <20230809061105.3369958-4-fengwei.yin@intel.com>
In-Reply-To: <20230809061105.3369958-1-fengwei.yin@intel.com>
References: <20230809061105.3369958-1-fengwei.yin@intel.com>
The current kernel only mlocks base-size folios during the mlock syscall. Add large folio support with the following rules:

  - Only mlock a large folio when it lies within the VM_LOCKED VMA range and is fully mapped by the page table.

    Full mapping is required because, if a folio is not fully mapped to a VM_LOCKED VMA and the system is under memory pressure, page reclaim is allowed to pick up the folio, split it, and reclaim the pages that are not in the VM_LOCKED VMA.

  - munlock applies to a large folio that lies within the VMA range or crosses the VMA boundary.

    This is required to handle the case where the large folio was mlocked and the VMA is later split in the middle of the folio.
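The two rules above are deliberately asymmetric: mlock needs the whole folio inside the walked range and fully mapped, while munlock is always allowed. The C sketch below models that decision, following allow_mlock_munlock() in this patch. It is an illustration under stated assumptions, not kernel code: struct folio_view and model_allow_mlock_munlock() are hypothetical, the vma_locked flag stands in for (vma->vm_flags & VM_LOCKED), and the folio's user address is passed in directly instead of being derived from folio_pgoff() and vm_pgoff as folio_in_range() does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flattened view of a folio as seen by the PTE walk:
 * user address of its first page, size in bytes, and page count.
 */
struct folio_view {
	unsigned long addr;
	unsigned long size;
	unsigned int nr_pages;
};

/*
 * Models the policy of allow_mlock_munlock(). step is the number of
 * consecutive PTEs found mapping this folio; [start, end) is the walked
 * range and is assumed to lie inside the VMA.
 */
static bool model_allow_mlock_munlock(bool vma_locked,
				      const struct folio_view *f,
				      unsigned long start, unsigned long end,
				      unsigned int step)
{
	assert(start <= end);

	/* munlock: always allowed, even for a partially mapped folio. */
	if (!vma_locked)
		return true;

	/* mlock rule 1: the whole folio must lie inside [start, end). */
	if (f->addr < start || f->addr > end || end - f->addr < f->size)
		return false;

	/* mlock rule 2: every page of the folio must be mapped. */
	return step == f->nr_pages;
}
```

For a walk over [0x10000, 0x30000), a 64 KiB folio at 0x10000 is mlocked only when all 16 of its PTEs were found, while a folio at 0x28000 that runs past the end of the range is never mlocked but can still be munlocked.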
Signed-off-by: Yin Fengwei
---
 mm/mlock.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 06bdfab83b58..1da1996745e7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -305,6 +305,58 @@ void munlock_folio(struct folio *folio)
 	local_unlock(&mlock_fbatch.lock);
 }
 
+static inline unsigned int folio_mlock_step(struct folio *folio,
+		pte_t *pte, unsigned long addr, unsigned long end)
+{
+	unsigned int count, i, nr = folio_nr_pages(folio);
+	unsigned long pfn = folio_pfn(folio);
+	pte_t ptent = ptep_get(pte);
+
+	if (!folio_test_large(folio))
+		return 1;
+
+	count = pfn + nr - pte_pfn(ptent);
+	count = min_t(unsigned int, count, (end - addr) >> PAGE_SHIFT);
+
+	for (i = 0; i < count; i++, pte++) {
+		pte_t entry = ptep_get(pte);
+
+		if (!pte_present(entry))
+			break;
+		if (pte_pfn(entry) - pfn >= nr)
+			break;
+	}
+
+	return i;
+}
+
+static inline bool allow_mlock_munlock(struct folio *folio,
+		struct vm_area_struct *vma, unsigned long start,
+		unsigned long end, unsigned int step)
+{
+	/*
+	 * For munlock, allow munlocking a large folio that is only
+	 * partially mapped to the VMA, because the large folio may
+	 * have been mlocked before the VMA was split.
+	 *
+	 * Under memory pressure, such a large folio can be split
+	 * and the pages that are not in the VM_LOCKED VMA can be
+	 * reclaimed.
+	 */
+	if (!(vma->vm_flags & VM_LOCKED))
+		return true;
+
+	/* folio not in range [start, end), skip mlock */
+	if (!folio_in_range(folio, vma, start, end))
+		return false;
+
+	/* folio is not fully mapped, skip mlock */
+	if (step != folio_nr_pages(folio))
+		return false;
+
+	return true;
+}
+
 static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
 			   unsigned long end, struct mm_walk *walk)
 
@@ -314,6 +366,8 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *start_pte, *pte;
 	pte_t ptent;
 	struct folio *folio;
+	unsigned int step = 1;
+	unsigned long start = addr;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -334,6 +388,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
+
 	for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = ptep_get(pte);
 		if (!pte_present(ptent))
@@ -341,12 +396,19 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
 		folio = vm_normal_folio(vma, addr, ptent);
 		if (!folio || folio_is_zone_device(folio))
 			continue;
-		if (folio_test_large(folio))
-			continue;
+
+		step = folio_mlock_step(folio, pte, addr, end);
+		if (!allow_mlock_munlock(folio, vma, start, end, step))
+			goto next_entry;
+
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_folio(folio);
 		else
 			munlock_folio(folio);
+
+next_entry:
+		pte += step - 1;
+		addr += (step - 1) << PAGE_SHIFT;
 	}
 	pte_unmap(start_pte);
 out:
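To see how folio_mlock_step() and the `pte += step - 1` skip in mlock_pte_range() interact, here is a userspace C sketch that walks a hypothetical array of PTEs for a VM_LOCKED VMA and counts the folios that would be mlocked. struct pte_model, struct folio_range, find_folio(), and the model_* functions are invented for this illustration. The sketch leaves out the munlock path and the folio_in_range() call; in this model, step can only equal the folio's page count when the run starts at the folio's first page, because the count is capped at the pages remaining from the current PTE to the end of the folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PTE model: present bit plus page frame number. */
struct pte_model {
	bool present;
	unsigned long pfn;
};

/* Hypothetical folio model: first pfn and number of pages. */
struct folio_range {
	unsigned long pfn;
	unsigned int nr;
};

/*
 * Models folio_mlock_step(): count consecutive present PTEs, starting at
 * pte[0], that map pages of the folio [folio_pfn, folio_pfn + nr), capped
 * by the folio pages left from pte[0] and by nr_left PTEs left in the walk.
 */
static unsigned int model_mlock_step(const struct pte_model *pte,
				     unsigned long folio_pfn, unsigned int nr,
				     unsigned int nr_left)
{
	unsigned int count, i;

	if (nr == 1)			/* like !folio_test_large() */
		return 1;

	count = (unsigned int)(folio_pfn + nr - pte[0].pfn);
	if (count > nr_left)
		count = nr_left;

	for (i = 0; i < count; i++) {
		if (!pte[i].present)
			break;
		/* Unsigned subtraction also rejects pfns below folio_pfn. */
		if (pte[i].pfn - folio_pfn >= nr)
			break;
	}
	return i;
}

/* Hypothetical stand-in for vm_normal_folio(): find the folio holding pfn. */
static const struct folio_range *find_folio(const struct folio_range *folios,
					    unsigned int nr_folios,
					    unsigned long pfn)
{
	for (unsigned int k = 0; k < nr_folios; k++)
		if (pfn - folios[k].pfn < folios[k].nr)
			return &folios[k];
	return NULL;
}

/*
 * Models the mlock_pte_range() loop for a VM_LOCKED VMA and returns how
 * many folios would be mlocked. Only the "fully mapped" rule is checked.
 */
static unsigned int model_mlock_walk(const struct pte_model *ptes,
				     unsigned int n,
				     const struct folio_range *folios,
				     unsigned int nr_folios)
{
	unsigned int idx, step, mlocked = 0;

	for (idx = 0; idx < n; idx++) {
		const struct folio_range *f;

		if (!ptes[idx].present)
			continue;
		f = find_folio(folios, nr_folios, ptes[idx].pfn);
		if (!f)
			continue;

		step = model_mlock_step(&ptes[idx], f->pfn, f->nr, n - idx);
		assert(step >= 1);	/* the current PTE always maps f */

		if (step == f->nr)	/* fully mapped: allowed to mlock */
			mlocked++;

		/* Skip the rest of the run, like "pte += step - 1". */
		idx += step - 1;
	}
	return mlocked;
}
```

With a single-page folio, a fully mapped 4-page folio, and a 4-page folio of which only two pages are mapped, the walk mlocks two folios; starting the walk partway into the fully mapped 4-page folio mlocks none, since no run then begins at a folio's first page.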