From patchwork Tue Apr 4 12:01:14 2023
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, Kirill A. Shutemov, Yang Shi,
    David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v6 1/4] mm/khugepaged: drain lru after swapping in shmem
Date: Tue, 4 Apr 2023 21:01:14 +0900
Message-Id: <20230404120117.2562166-2-stevensd@google.com>
In-Reply-To: <20230404120117.2562166-1-stevensd@google.com>
References: <20230404120117.2562166-1-stevensd@google.com>
From: David Stevens

Call lru_add_drain after swapping in shmem pages so that isolate_lru_page
is more likely to succeed.

Signed-off-by: David Stevens
---
 mm/khugepaged.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 666d2c4e38dd..90577247cfaf 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1963,6 +1963,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 				result = SCAN_FAIL;
 				goto xa_unlocked;
 			}
+			/* drain pagevecs to help isolate_lru_page() */
+			lru_add_drain();
 			page = folio_file_page(folio, index);
 		} else if (trylock_page(page)) {
 			get_page(page);
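For context on why the drain helps: freshly swapped-in shmem folios are
queued on a per-CPU batch by folio_add_lru() and are only flagged as being
on an LRU list once that batch is drained. A simplified sketch of
folio_isolate_lru() from mm/vmscan.c (illustration only, not part of the
patch):

	bool folio_isolate_lru(struct folio *folio)	/* simplified */
	{
		/* Fails while the folio still sits in a per-CPU batch: */
		if (!folio_test_clear_lru(folio))
			return false;
		folio_get(folio);
		/* ... remove the folio from its LRU list ... */
		return true;
	}

collapse_file maps an isolation failure to SCAN_DEL_PAGE_LRU and gives up
on the range, so draining right after swap-in makes the subsequent
isolate_lru_page() call much more likely to succeed.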
From patchwork Tue Apr 4 12:01:15 2023
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, Kirill A. Shutemov, Yang Shi,
    David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v6 2/4] mm/khugepaged: refactor collapse_file control flow
Date: Tue, 4 Apr 2023 21:01:15 +0900
Message-Id: <20230404120117.2562166-3-stevensd@google.com>
In-Reply-To: <20230404120117.2562166-1-stevensd@google.com>
References: <20230404120117.2562166-1-stevensd@google.com>

From: David Stevens

Add a rollback label to deal with failure, instead of continuously
checking for SCAN_SUCCEED, to make it easier to add more failure cases.
The refactoring also allows the collapse_file tracepoint to include hpage
on success (instead of NULL).

Signed-off-by: David Stevens
Acked-by: Peter Xu
Reviewed-by: Yang Shi
Acked-by: Hugh Dickins
---
 mm/khugepaged.c | 230 ++++++++++++++++++++++++------------------------
 1 file changed, 113 insertions(+), 117 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 90577247cfaf..90828272a065 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1890,6 +1890,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	__SetPageLocked(hpage);
+	if (is_shmem)
+		__SetPageSwapBacked(hpage);
+	hpage->index = start;
+	hpage->mapping = mapping;
+
 	/*
 	 * Ensure we have slots for all the pages in the range.  This is
 	 * almost certainly a no-op because most of the pages must be present
@@ -1902,16 +1908,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
 			result = SCAN_FAIL;
-			goto out;
+			goto rollback;
 		}
 	} while (1);
 
-	__SetPageLocked(hpage);
-	if (is_shmem)
-		__SetPageSwapBacked(hpage);
-	hpage->index = start;
-	hpage->mapping = mapping;
-
 	/*
 	 * At this point the hpage is locked and not up-to-date.
 	 * It's safe to insert it into the page cache, because nobody would
@@ -2137,137 +2137,133 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	 */
 	try_to_unmap_flush();
 
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Replacing old pages with new one has succeeded, now we
-		 * attempt to copy the contents.
-		 */
-		index = start;
-		list_for_each_entry(page, &pagelist, lru) {
-			while (index < page->index) {
-				clear_highpage(hpage + (index % HPAGE_PMD_NR));
-				index++;
-			}
-			if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
-					     page) > 0) {
-				result = SCAN_COPY_MC;
-				break;
-			}
-			index++;
-		}
-		while (result == SCAN_SUCCEED && index < end) {
+	if (result != SCAN_SUCCEED)
+		goto rollback;
+
+	/*
+	 * Replacing old pages with new one has succeeded, now we
+	 * attempt to copy the contents.
+	 */
+	index = start;
+	list_for_each_entry(page, &pagelist, lru) {
+		while (index < page->index) {
 			clear_highpage(hpage + (index % HPAGE_PMD_NR));
 			index++;
 		}
+		if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR), page) > 0) {
+			result = SCAN_COPY_MC;
+			goto rollback;
+		}
+		index++;
+	}
+	while (index < end) {
+		clear_highpage(hpage + (index % HPAGE_PMD_NR));
+		index++;
+	}
+
+	/*
+	 * Copying old pages to huge one has succeeded, now we
+	 * need to free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
 	}
 
 	nr = thp_nr_pages(hpage);
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Copying old pages to huge one has succeeded, now we
-		 * need to free the old pages.
-		 */
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-			list_del(&page->lru);
-			page->mapping = NULL;
-			page_ref_unfreeze(page, 1);
-			ClearPageActive(page);
-			ClearPageUnevictable(page);
-			unlock_page(page);
-			put_page(page);
-		}
+	xas_lock_irq(&xas);
+	if (is_shmem)
+		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+	else
+		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
 
-		xas_lock_irq(&xas);
-		if (is_shmem)
-			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-		else
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	if (nr_none) {
+		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+		/* nr_none is always 0 for non-shmem. */
+		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
+	}
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
 
-		if (nr_none) {
-			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-			/* nr_none is always 0 for non-shmem. */
-			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-		}
-		/* Join all the small entries into a single multi-index entry. */
-		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-		xas_store(&xas, hpage);
-		xas_unlock_irq(&xas);
+	folio = page_folio(hpage);
+	folio_mark_uptodate(folio);
+	folio_ref_add(folio, HPAGE_PMD_NR - 1);
 
-		folio = page_folio(hpage);
-		folio_mark_uptodate(folio);
-		folio_ref_add(folio, HPAGE_PMD_NR - 1);
+	if (is_shmem)
+		folio_mark_dirty(folio);
+	folio_add_lru(folio);
 
-		if (is_shmem)
-			folio_mark_dirty(folio);
-		folio_add_lru(folio);
+	/*
+	 * Remove pte page tables, so we can re-fault the page as huge.
+	 */
+	result = retract_page_tables(mapping, start, mm, addr, hpage,
+				     cc);
+	unlock_page(hpage);
+	goto out;
+
+rollback:
+	/* Something went wrong: roll back page cache changes */
+	xas_lock_irq(&xas);
+	if (nr_none) {
+		mapping->nrpages -= nr_none;
+		shmem_uncharge(mapping->host, nr_none);
+	}
 
-		/*
-		 * Remove pte page tables, so we can re-fault the page as huge.
-		 */
-		result = retract_page_tables(mapping, start, mm, addr, hpage,
-					     cc);
-		unlock_page(hpage);
-		hpage = NULL;
-	} else {
-		/* Something went wrong: roll back page cache changes */
-		xas_lock_irq(&xas);
-		if (nr_none) {
-			mapping->nrpages -= nr_none;
-			shmem_uncharge(mapping->host, nr_none);
+	xas_set(&xas, start);
+	xas_for_each(&xas, page, end - 1) {
+		page = list_first_entry_or_null(&pagelist,
+				struct page, lru);
+		if (!page || xas.xa_index < page->index) {
+			if (!nr_none)
+				break;
+			nr_none--;
+			/* Put holes back where they were */
+			xas_store(&xas, NULL);
+			continue;
 		}
-		xas_set(&xas, start);
-		xas_for_each(&xas, page, end - 1) {
-			page = list_first_entry_or_null(&pagelist,
-					struct page, lru);
-			if (!page || xas.xa_index < page->index) {
-				if (!nr_none)
-					break;
-				nr_none--;
-				/* Put holes back where they were */
-				xas_store(&xas, NULL);
-				continue;
-			}
+		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
 
-			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
-
-			/* Unfreeze the page. */
-			list_del(&page->lru);
-			page_ref_unfreeze(page, 2);
-			xas_store(&xas, page);
-			xas_pause(&xas);
-			xas_unlock_irq(&xas);
-			unlock_page(page);
-			putback_lru_page(page);
-			xas_lock_irq(&xas);
-		}
-		VM_BUG_ON(nr_none);
+		/* Unfreeze the page. */
+		list_del(&page->lru);
+		page_ref_unfreeze(page, 2);
+		xas_store(&xas, page);
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		unlock_page(page);
+		putback_lru_page(page);
+		xas_lock_irq(&xas);
+	}
+	VM_BUG_ON(nr_none);
+	/*
+	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
+	 * file only. This undo is not needed unless failure is
+	 * due to SCAN_COPY_MC.
+	 */
+	if (!is_shmem && result == SCAN_COPY_MC) {
+		filemap_nr_thps_dec(mapping);
 		/*
-		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
-		 * file only. This undo is not needed unless failure is
-		 * due to SCAN_COPY_MC.
+		 * Paired with smp_mb() in do_dentry_open() to
+		 * ensure the update to nr_thps is visible.
 		 */
-		if (!is_shmem && result == SCAN_COPY_MC) {
-			filemap_nr_thps_dec(mapping);
-			/*
-			 * Paired with smp_mb() in do_dentry_open() to
-			 * ensure the update to nr_thps is visible.
-			 */
-			smp_mb();
-		}
+		smp_mb();
+	}
 
-		xas_unlock_irq(&xas);
+	xas_unlock_irq(&xas);
 
-		hpage->mapping = NULL;
-	}
+	hpage->mapping = NULL;
 
-	if (hpage)
-		unlock_page(hpage);
+	unlock_page(hpage);
+	put_page(hpage);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (hpage)
-		put_page(hpage);
-
 	trace_mm_khugepaged_collapse_file(mm, hpage, index, is_shmem, addr, file, nr, result);
 	return result;
 }
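The shape this refactor lands on is the common kernel goto-unwind idiom.
A minimal, self-contained illustration (hypothetical helper names,
ordinary userspace C rather than kernel code):

	#include <stdio.h>

	static int alloc_resources(void)    { return 0; }	/* 0 = success */
	static int copy_contents(void)      { return 0; }
	static void finalize(void)          { puts("committed"); }
	static void undo_partial_work(void) { puts("rolled back"); }

	static int collapse_example(void)
	{
		int result;

		result = alloc_resources();
		if (result)
			goto rollback;
		result = copy_contents();
		if (result)
			goto rollback;
		finalize();		/* success path runs straight through */
		return 0;
	rollback:
		undo_partial_work();	/* one shared failure path */
		return result;
	}

	int main(void)
	{
		return collapse_example();
	}

Each new failure case (such as SCAN_COPY_MC above) then only needs a goto
rather than another level of "if (result == SCAN_SUCCEED)" nesting.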
From patchwork Tue Apr 4 12:01:16 2023
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, Kirill A. Shutemov, Yang Shi,
    David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v6 3/4] mm/khugepaged: skip shmem with userfaultfd
Date: Tue, 4 Apr 2023 21:01:16 +0900
Message-Id: <20230404120117.2562166-4-stevensd@google.com>
In-Reply-To: <20230404120117.2562166-1-stevensd@google.com>
References: <20230404120117.2562166-1-stevensd@google.com>

From: David Stevens

Make sure that collapse_file respects any userfaultfds registered with
MODE_MISSING. If userspace has any such userfaultfds registered, then for
any page which it knows to be missing, it may expect a
UFFD_EVENT_PAGEFAULT. This means collapse_file needs to be careful when
collapsing a shmem range would result in replacing an empty page with a
THP, to avoid breaking userfaultfd.

Synchronization when checking for userfaultfds in collapse_file is tricky
because the mmap locks can't be used to prevent races with the
registration of new userfaultfds. Instead, we provide synchronization by
ensuring that userspace cannot observe the fact that pages are missing
before we check for userfaultfds. Although this allows registration of a
userfaultfd to race with collapse_file, it ensures that userspace cannot
observe any pages transition from missing to present after such a race
occurs. This makes such a race indistinguishable from the collapse
occurring immediately before the userfaultfd registration.

The first step to provide this synchronization is to stop filling gaps
during the loop iterating over the target range, since the page cache
lock can be dropped during that loop. The second step is to fill the gaps
with XA_RETRY_ENTRY after the page cache lock is acquired the final time,
to avoid races with accesses to the page cache that only take the RCU
read lock.
The fact that we don't fill holes during the initial iteration means that
collapse_file now has to handle faults occurring during the collapse.
This is done by re-validating the number of missing pages after acquiring
the page cache lock for the final time.

This fix is targeted at khugepaged, but the change also applies to
MADV_COLLAPSE. MADV_COLLAPSE on a range with a userfaultfd will now
return EBUSY if there are any missing pages (instead of succeeding on
shmem and returning EINVAL on anonymous memory). There is also now a
window during MADV_COLLAPSE where a fault on a missing page will cause
the syscall to fail with EAGAIN.

The fact that intermediate page cache state can no longer be observed
before the rollback of a failed collapse is also technically a
userspace-visible change (via at least SEEK_DATA and SEEK_END), but it is
exceedingly unlikely that anything relies on being able to observe that
transient state.

Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 include/trace/events/huge_memory.h |   3 +-
 mm/khugepaged.c                    | 109 +++++++++++++++++++++--------
 2 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index eca4c6f3625e..877cbf9fd2ec 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -38,7 +38,8 @@
 	EM( SCAN_TRUNCATED,		"truncated")			\
 	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
 	EM( SCAN_STORE_FAILED,		"store_failed")			\
-	EMe(SCAN_COPY_MC,		"copy_poisoned_page")
+	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
+	EMe(SCAN_PAGE_FILLED,		"page_filled")			\
 
 #undef EM
 #undef EMe
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 90828272a065..7679551e9540 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -57,6 +57,7 @@ enum scan_result {
 	SCAN_PAGE_HAS_PRIVATE,
 	SCAN_COPY_MC,
 	SCAN_STORE_FAILED,
+	SCAN_PAGE_FILLED,
 };
 
 #define CREATE_TRACE_POINTS
@@ -1856,8 +1857,8 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap/gup in pages if necessary;
- *    + fill in gaps;
  *    + keep old pages around in case rollback is required;
+ *  - finalize updates to the page cache;
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
@@ -1935,22 +1936,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 					result = SCAN_TRUNCATED;
 					goto xa_locked;
 				}
-				xas_set(&xas, index);
+				xas_set(&xas, index + 1);
 			}
 			if (!shmem_charge(mapping->host, 1)) {
 				result = SCAN_FAIL;
 				goto xa_locked;
 			}
-			xas_store(&xas, hpage);
-			if (xas_error(&xas)) {
-				/* revert shmem_charge performed
-				 * in the previous condition
-				 */
-				mapping->nrpages--;
-				shmem_uncharge(mapping->host, 1);
-				result = SCAN_STORE_FAILED;
-				goto xa_locked;
-			}
 			nr_none++;
 			continue;
 		}
@@ -2161,22 +2152,66 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		index++;
 	}
 
-	/*
-	 * Copying old pages to huge one has succeeded, now we
-	 * need to free the old pages.
-	 */
-	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-		list_del(&page->lru);
-		page->mapping = NULL;
-		page_ref_unfreeze(page, 1);
-		ClearPageActive(page);
-		ClearPageUnevictable(page);
-		unlock_page(page);
-		put_page(page);
+	if (nr_none) {
+		struct vm_area_struct *vma;
+		int nr_none_check = 0;
+
+		i_mmap_lock_read(mapping);
+		xas_lock_irq(&xas);
+
+		xas_set(&xas, start);
+		for (index = start; index < end; index++) {
+			if (!xas_next(&xas)) {
+				xas_store(&xas, XA_RETRY_ENTRY);
+				if (xas_error(&xas)) {
+					result = SCAN_STORE_FAILED;
+					goto immap_locked;
+				}
+				nr_none_check++;
+			}
+		}
+
+		if (nr_none != nr_none_check) {
+			result = SCAN_PAGE_FILLED;
+			goto immap_locked;
+		}
+
+		/*
+		 * If userspace observed a missing page in a VMA with a MODE_MISSING
+		 * userfaultfd, then it might expect a UFFD_EVENT_PAGEFAULT for that
+		 * page. If so, we need to roll back to avoid suppressing such an
+		 * event. Since wp/minor userfaultfds don't give userspace any
+		 * guarantees that the kernel doesn't fill a missing page with a zero
+		 * page, they don't matter here.
+		 *
+		 * Any userfaultfds registered after this point will not be able to
+		 * observe any missing pages due to the previously inserted retry
+		 * entries.
+		 */
+		vma_interval_tree_foreach(vma, &mapping->i_mmap, start, end) {
+			if (userfaultfd_missing(vma)) {
+				result = SCAN_EXCEED_NONE_PTE;
+				goto immap_locked;
+			}
+		}
+
+immap_locked:
+		i_mmap_unlock_read(mapping);
+		if (result != SCAN_SUCCEED) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+
+			xas_unlock_irq(&xas);
+			goto rollback;
+		}
+	} else {
+		xas_lock_irq(&xas);
 	}
 
 	nr = thp_nr_pages(hpage);
-	xas_lock_irq(&xas);
 	if (is_shmem)
 		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
 	else
@@ -2206,6 +2241,20 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	result = retract_page_tables(mapping, start, mm, addr, hpage,
 				     cc);
 	unlock_page(hpage);
+
+	/*
+	 * The collapse has succeeded, so free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
+	}
+
 	goto out;
 
 rollback:
@@ -2217,15 +2266,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 
 	xas_set(&xas, start);
-	xas_for_each(&xas, page, end - 1) {
+	end = index;
+	for (index = start; index < end; index++) {
+		xas_next(&xas);
 		page = list_first_entry_or_null(&pagelist,
 				struct page, lru);
 		if (!page || xas.xa_index < page->index) {
-			if (!nr_none)
-				break;
 			nr_none--;
-			/* Put holes back where they were */
-			xas_store(&xas, NULL);
 			continue;
 		}
 
@@ -2749,12 +2796,14 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_ALLOC_HUGE_PAGE_FAIL:
 		return -ENOMEM;
 	case SCAN_CGROUP_CHARGE_FAIL:
+	case SCAN_EXCEED_NONE_PTE:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
 	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
+	case SCAN_PAGE_FILLED:
 		return -EAGAIN;
 	/*
 	 * Other: Trying again likely not to succeed / error intrinsic to
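The reason XA_RETRY_ENTRY blocks RCU-only readers: every lockless page
cache lookup loops on retry entries. A simplified sketch of the lookup
loop in filemap_get_entry() (mm/filemap.c; illustration only, not part of
the patch):

	rcu_read_lock();
	repeat:
		xas_reset(&xas);
		folio = xas_load(&xas);
		if (xas_retry(&xas, folio))	/* true for XA_RETRY_ENTRY */
			goto repeat;		/* loop until the entry is resolved */
		/* ... validate the folio reference ... */
	rcu_read_unlock();

Because such readers spin until the retry entry is replaced, filling the
holes with XA_RETRY_ENTRY under the page cache lock guarantees that no
one can observe a missing page between the re-validation and either the
final store of hpage or the rollback that puts NULL back.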
From patchwork Tue Apr 4 12:01:17 2023
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, Kirill A. Shutemov, Yang Shi,
    David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v6 4/4] mm/khugepaged: maintain page cache uptodate flag
Date: Tue, 4 Apr 2023 21:01:17 +0900
Message-Id: <20230404120117.2562166-5-stevensd@google.com>
In-Reply-To: <20230404120117.2562166-1-stevensd@google.com>
References: <20230404120117.2562166-1-stevensd@google.com>

From: David Stevens

Make sure that collapse_file doesn't interfere with checking the uptodate
flag in the page cache by only inserting hpage into the page cache after
it has been updated and marked uptodate. This is achieved by simply not
replacing present pages with hpage when iterating over the target range.
The present pages are already locked, so replacing them with the locked
hpage before the collapse is finalized is unnecessary. However, it is
necessary to stop freezing the present pages after validating them, since
leaving long-term frozen pages in the page cache can lead to deadlocks.
Simply checking the reference count is sufficient to ensure that there
are no long-term references hanging around that the collapse would break.
Similar to hpage, there is no reason that the present pages actually need
to be frozen in addition to being locked.

This fixes a race where folio_seek_hole_data would mistake hpage for a
fallocated but unwritten page. This race is visible to userspace via data
temporarily disappearing from SEEK_DATA/SEEK_HOLE. This also fixes a
similar race where pages could temporarily disappear from mincore.
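The SEEK_DATA/SEEK_HOLE race comes from the uptodate check in
folio_seek_hole_data() in mm/filemap.c; a simplified sketch for
illustration (not part of the patch):

	/*
	 * A present folio only counts as data if it is marked uptodate;
	 * a !uptodate folio is reported like a hole:
	 */
	if (xa_is_value(folio) || folio_test_uptodate(folio))
		return seek_data;

Before this change, the not-yet-copied hpage sat in the page cache
without the uptodate flag, so a concurrent lseek could transiently report
written data as a hole. Inserting hpage only after folio_mark_uptodate()
closes that window.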
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: David Stevens
---
 mm/khugepaged.c | 79 ++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 50 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7679551e9540..a19aa140fd52 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1855,17 +1855,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and lock a new huge page;
- *  - scan page cache replacing old pages with the new one
+ *  - scan page cache, locking old pages
  *    + swap/gup in pages if necessary;
- *    + keep old pages around in case rollback is required;
+ *  - copy data to new page
+ *  - handle shmem holes
+ *    + re-validate that holes weren't filled by someone else
+ *    + check for userfaultfd
  *  - finalize updates to the page cache;
  *  - if replacing succeeds:
- *    + copy data over;
- *    + free old pages;
  *    + unlock huge page;
+ *    + free old pages;
  *  - if replacing failed;
- *    + put all pages back and unfreeze them;
- *    + restore gaps in the page cache;
+ *    + unlock old pages
  *    + unlock and free huge page;
  */
 static int collapse_file(struct mm_struct *mm, unsigned long addr,
@@ -1913,12 +1914,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	/*
-	 * At this point the hpage is locked and not up-to-date.
-	 * It's safe to insert it into the page cache, because nobody would
-	 * be able to map it or use it in another way until we unlock it.
-	 */
-
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
 		page = xas_next(&xas);
@@ -2076,12 +2071,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
 
 		/*
-		 * The page is expected to have page_count() == 3:
+		 * We control three references to the page:
 		 *  - we hold a pin on it;
 		 *  - one reference from page cache;
 		 *  - one from isolate_lru_page;
+		 * If those are the only references, then any new usage of the
+		 * page will have to fetch it from the page cache. That requires
+		 * locking the page to handle truncate, so any new usage will be
+		 * blocked until we unlock page after collapse/during rollback.
 		 */
-		if (!page_ref_freeze(page, 3)) {
+		if (page_count(page) != 3) {
 			result = SCAN_PAGE_COUNT;
 			xas_unlock_irq(&xas);
 			putback_lru_page(page);
@@ -2089,13 +2088,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		/*
-		 * Add the page to the list to be able to undo the collapse if
-		 * something go wrong.
+		 * Accumulate the pages that are being collapsed.
 		 */
 		list_add_tail(&page->lru, &pagelist);
-
-		/* Finally, replace with the new page. */
-		xas_store(&xas, hpage);
 		continue;
 out_unlock:
 		unlock_page(page);
@@ -2132,8 +2127,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto rollback;
 
 	/*
-	 * Replacing old pages with new one has succeeded, now we
-	 * attempt to copy the contents.
+	 * The old pages are locked, so they won't change anymore.
 	 */
 	index = start;
 	list_for_each_entry(page, &pagelist, lru) {
@@ -2222,11 +2216,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		/* nr_none is always 0 for non-shmem. */
 		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 	}
-	/* Join all the small entries into a single multi-index entry. */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
-	xas_unlock_irq(&xas);
 
+	/*
+	 * Mark hpage as uptodate before inserting it into the page cache so
+	 * that it isn't mistaken for a fallocated but unwritten page.
+	 */
 	folio = page_folio(hpage);
 	folio_mark_uptodate(folio);
 	folio_ref_add(folio, HPAGE_PMD_NR - 1);
@@ -2235,6 +2229,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_mark_dirty(folio);
 	folio_add_lru(folio);
 
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
+
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
@@ -2248,47 +2247,29 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		list_del(&page->lru);
 		page->mapping = NULL;
-		page_ref_unfreeze(page, 1);
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
 		unlock_page(page);
-		put_page(page);
+		folio_put_refs(page_folio(page), 3);
 	}
 
 	goto out;
 
 rollback:
 	/* Something went wrong: roll back page cache changes */
-	xas_lock_irq(&xas);
 	if (nr_none) {
+		xas_lock_irq(&xas);
 		mapping->nrpages -= nr_none;
 		shmem_uncharge(mapping->host, nr_none);
+		xas_unlock_irq(&xas);
 	}
 
-	xas_set(&xas, start);
-	end = index;
-	for (index = start; index < end; index++) {
-		xas_next(&xas);
-		page = list_first_entry_or_null(&pagelist,
-				struct page, lru);
-		if (!page || xas.xa_index < page->index) {
-			nr_none--;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
-
-		/* Unfreeze the page. */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		list_del(&page->lru);
-		page_ref_unfreeze(page, 2);
-		xas_store(&xas, page);
-		xas_pause(&xas);
-		xas_unlock_irq(&xas);
 		unlock_page(page);
 		putback_lru_page(page);
-		xas_lock_irq(&xas);
+		put_page(page);
 	}
-	VM_BUG_ON(nr_none);
+
 	/*
 	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
 	 * file only. This undo is not needed unless failure is
@@ -2303,8 +2284,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		smp_mb();
 	}
 
-	xas_unlock_irq(&xas);
-
 	hpage->mapping = NULL;
 
 	unlock_page(hpage);