From: Yafang Shao
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
    shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, Yafang Shao
Subject: [RFC PATCH 0/2] memcg: add nomlock to avoid folios being mlocked in a memcg
Date: Sun, 15 Dec 2024 15:34:13 +0800
Message-Id: <20241215073415.88961-1-laoar.shao@gmail.com>

The Use Case
============

We have a scenario where multiple services (cgroups) may share the same
file cache, as illustrated below:

  download-proxy    application
             \        /
      /shared_path/shared_files

When the application needs specific types of files, it sends an RPC
request to the download-proxy.
The download-proxy then downloads the files to the shared paths, after
which the application reads them. All disk I/O is buffered. We use
buffered I/O rather than direct I/O because the download-proxy itself
may also read these shared files when it acts as a peer-to-peer (P2P)
service:

  download-proxy of server1  <- P2P ->  download-proxy of server2
  /shared_path/shared_files             /shared_path/shared_files

The Problem
===========

Applications reading these shared files may use mlock to pin the files
in memory for performance reasons. However, the shared file cache is
charged to the memory cgroup of the download-proxy during the download
or P2P process. Consequently, page cache pages of the shared files can
end up mlocked while still charged to the download-proxy's memcg, as
shown:

  download-proxy    application
        |               /
    (charged)      (mlocked)
        |             /
      pagecache pages
              \
               \
      /shared_path/shared_files

Because mlocked pages cannot be reclaimed, the memory usage of the
download-proxy's memcg frequently reaches its limit, potentially
resulting in OOM events. This behavior is undesirable.

The Solution
============

To address this, we propose introducing a new cgroup file,
memory.nomlock, which prevents page cache pages from being mlocked in a
specific memcg when set to 1.

Implementation Options
----------------------

- Solution A: Allow file caches on the unevictable list to become
  reclaimable. This approach would require significant refactoring of
  the page reclaim logic.

- Solution B: Prevent file caches from being moved to the unevictable
  list during mlock and ignore the VM_LOCKED flag during page reclaim.
  This is a more straightforward solution and is the one we have chosen.
  If the file caches are reclaimed from the download-proxy's memcg and
  subsequently accessed by tasks in the application's memcg, a filemap
  fault will occur. A new file cache will be faulted in, charged to the
  application's memcg, and locked there.

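The intended behavior of Solution B can be illustrated with a small toy
model in Python. This is only a sketch, not kernel code: the names
Memcg, Page and PageCache are invented for the illustration, a "page" is
just a dictionary entry, and the refault through the still VM_LOCKED
mapping is modeled as a second mlock() call.

```python
from dataclasses import dataclass


@dataclass
class Memcg:
    """Toy memory cgroup: a name plus the proposed memory.nomlock knob."""
    name: str
    nomlock: bool = False  # models writing 1 to memory.nomlock


@dataclass
class Page:
    """Toy page cache page, charged to the memcg that first faulted it in."""
    charged_to: Memcg
    unevictable: bool = False


class PageCache:
    """Maps file offsets to pages (hypothetical, for illustration only)."""

    def __init__(self):
        self.pages = {}

    def fault_in(self, offset, memcg):
        # A filemap fault allocates and charges only when the page is absent.
        if offset not in self.pages:
            self.pages[offset] = Page(charged_to=memcg)
        return self.pages[offset]

    def mlock(self, offset, memcg):
        page = self.fault_in(offset, memcg)
        # Solution B: keep the page off the unevictable list when the memcg
        # it is charged to has nomlock set.
        if not page.charged_to.nomlock:
            page.unevictable = True
        return page

    def reclaim(self, memcg):
        """Drop every evictable page charged to memcg; return the count."""
        victims = [off for off, page in self.pages.items()
                   if page.charged_to is memcg and not page.unevictable]
        for off in victims:
            del self.pages[off]
        return len(victims)


proxy = Memcg("download-proxy", nomlock=True)
app = Memcg("application")
cache = PageCache()

cache.fault_in(0, proxy)          # the download charges the page to the proxy
cache.mlock(0, app)               # the application's mlock does not pin it
reclaimed = cache.reclaim(proxy)  # so the proxy's memcg can still reclaim it
page = cache.mlock(0, app)        # refault: charged to and locked in the app
```

In this model the page ends up charged to the application's memcg and
unevictable there, while the download-proxy's memcg is free of it.
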
Current Limitations
===================

This solution is in its early stages and has the following limitations:

- Timing Dependency: memory.nomlock must be set before file caches are
  moved to the unevictable list. Otherwise, the file caches cannot be
  reclaimed.

- Metrics Inaccuracy: The "unevictable" metric in memory.stat and the
  "Mlocked" metric in /proc/meminfo may not be reliable. However, these
  metrics are already affected by the use of large folios.

If this solution is deemed acceptable, I will proceed with refining the
implementation and addressing these limitations.

Yafang Shao (2):
  mm/memcontrol: add a new cgroup file memory.nomlock
  mm: Add support for nomlock to avoid folios being mlocked in a memcg

 include/linux/memcontrol.h |  3 +++
 mm/memcontrol.c            | 35 +++++++++++++++++++++++++++++++++++
 mm/mlock.c                 |  9 +++++++++
 mm/rmap.c                  |  8 +++++++-
 mm/vmscan.c                |  5 +++++
 5 files changed, 59 insertions(+), 1 deletion(-)