From patchwork Wed Jul 12 06:01:41 2023
X-Patchwork-Submitter: Yin Fengwei
X-Patchwork-Id: 13309581
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org,
    david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 0/3] support large folio for mlock
Date: Wed, 12 Jul 2023 14:01:41 +0800
Message-Id: <20230712060144.3006358-1-fengwei.yin@intel.com>

Yu mentioned at [1] that mlock() cannot be applied to large folios.
I studied the related code and here is my understanding:

- For RLIMIT_MEMLOCK, there is no problem, because the RLIMIT_MEMLOCK
  accounting does not depend on the underlying pages. Whether the
  underlying pages are mlocked or munlocked does not change the
  RLIMIT_MEMLOCK accounting, which is always correct.

- For keeping the pages in RAM, there is no problem either. During
  try_to_unmap_one(), once the VMA is found to have VM_LOCKED set in
  vm_flags, the folio is kept regardless of whether the folio itself is
  mlocked.

So mlock already works functionally for large folios. But it is not
optimized, because page reclaim still needs to scan these large folios
and may split them.

This series classifies large folios for mlock into two types:
- large folios fully inside a VM_LOCKED VMA range
- large folios crossing a VM_LOCKED VMA boundary

For the first type, we mlock the large folio so page reclaim will skip
it. For the second type, we do not mlock the large folio; it can be
picked by page reclaim and split, so the pages that are not in the
VM_LOCKED VMA range can be reclaimed/released.
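As an illustration of the "fully inside the VMA" case (not the actual
patch1 code), here is a minimal sketch. The name folio_within_vma_sketch()
and its exact form are made up for this cover letter; the real helpers
added by patch1 are folio_in_range() and folio_within_vma() and may
differ:

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Illustrative sketch only: test whether a large folio is mapped
 * entirely inside one VMA, i.e. the first type described above.
 */
static inline bool folio_within_vma_sketch(struct folio *folio,
					   struct vm_area_struct *vma)
{
	pgoff_t pgoff = folio_pgoff(folio);
	unsigned long addr;

	/* The folio must not start before the VMA's mapped offset. */
	if (pgoff < vma->vm_pgoff)
		return false;

	/* Virtual address of the folio's first page in this VMA. */
	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

	/* The whole folio must end at or before vm_end. */
	return addr + folio_size(folio) <= vma->vm_end;
}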
patch1 introduces an API to check whether a large folio is within a VMA
range.
patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
large folio mlock/munlock.
patch3 makes the mlock/munlock syscalls support large folios.

Testing done:
- kernel selftests: no extra failures introduced

Matthew commented on v1 that a large folio should be split if it crosses
VMA boundaries. But there is no obviously correct way to handle a split
failure, and this is a common issue for mprotect, mlock, mremap,
munmap.... So I keep the v1 behavior (do not split a folio that crosses
VMA boundaries) in v2.

[1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/

Changes from v1:
patch1:
  - Add new function folio_within_vma() based on folio_in_range() per
    Yu's suggestion
patch2:
  - Update folio_referenced_one() to skip the entries which are out of
    the VM_LOCKED VMA range if the large folio crosses VMA boundaries,
    per Yu's suggestion
patch3:
  - Simplify the changes in mlock_pte_range() by introducing two helper
    functions, should_mlock_folio() and get_folio_mlock_step(), per Yu's
    suggestion (a rough sketch of this logic follows the diffstat)

Yin Fengwei (3):
  mm: add functions folio_in_range() and folio_within_vma()
  mm: handle large folio when large folio in VM_LOCKED VMA range
  mm: mlock: update mlock_pte_range to handle large folio

 mm/internal.h |  43 +++++++++++++++++++--
 mm/mlock.c    | 104 +++++++++++++++++++++++++++++++++++++++++++++++---
 mm/rmap.c     |  34 +++++++++++++----
 3 files changed, 166 insertions(+), 15 deletions(-)
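For illustration only, this is the kind of dispatch patch3 describes for
mlock_pte_range(). The helper names follow the changelog above, but the
signatures and bodies here are guesses made for this cover letter, not
the actual patch:

/*
 * Rough, hypothetical sketch: only mlock a large folio when it is fully
 * covered by the VM_LOCKED VMA, and advance the PTE walk by the pages
 * the folio still spans so it is handled once.
 */
static bool should_mlock_folio_sketch(struct folio *folio,
				      struct vm_area_struct *vma)
{
	/*
	 * Small folios keep the existing behavior; large folios must lie
	 * entirely inside the VMA (see folio_within_vma_sketch() above).
	 */
	return !folio_test_large(folio) ||
	       folio_within_vma_sketch(folio, vma);
}

static unsigned int get_folio_mlock_step_sketch(struct folio *folio,
						struct page *page,
						unsigned long addr,
						unsigned long end)
{
	unsigned int nr;

	/* Pages left in the folio starting from the page this PTE maps... */
	nr = folio_nr_pages(folio) - folio_page_idx(folio, page);
	/* ...clamped to the PTEs remaining in the current walk range. */
	return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
}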