From patchwork Tue Mar 7 05:20:34 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 13162795
From: David Stevens
To: linux-mm@kvack.org, Andrew Morton
Cc: Peter Xu, Matthew Wilcox, "Kirill A . Shutemov", Yang Shi,
 David Hildenbrand, Hugh Dickins, Jiaqi Yan,
 linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v5 1/3] mm/khugepaged: refactor collapse_file control flow
Date: Tue, 7 Mar 2023 14:20:34 +0900
Message-Id: <20230307052036.1520708-2-stevensd@google.com>
In-Reply-To: <20230307052036.1520708-1-stevensd@google.com>
References: <20230307052036.1520708-1-stevensd@google.com>

From: David Stevens

Add a rollback label to deal with failure, instead of continuously
checking for SCAN_SUCCEED, to make it easier to add more failure
cases. The refactoring also allows the collapse_file tracepoint to
include hpage on success (instead of NULL).

Signed-off-by: David Stevens
Acked-by: Peter Xu
Reviewed-by: Yang Shi
Acked-by: Hugh Dickins
---
 mm/khugepaged.c | 219 ++++++++++++++++++++++++------------------------
 1 file changed, 108 insertions(+), 111 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3ea2aa55c2c5..b954e3c685e7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1907,6 +1907,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	__SetPageLocked(hpage);
+	if (is_shmem)
+		__SetPageSwapBacked(hpage);
+	hpage->index = start;
+	hpage->mapping = mapping;
+
 	/*
 	 * Ensure we have slots for all the pages in the range.  This is
 	 * almost certainly a no-op because most of the pages must be present
@@ -1919,16 +1925,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
 			result = SCAN_FAIL;
-			goto out;
+			goto rollback;
 		}
 	} while (1);
 
-	__SetPageLocked(hpage);
-	if (is_shmem)
-		__SetPageSwapBacked(hpage);
-	hpage->index = start;
-	hpage->mapping = mapping;
-
 	/*
 	 * At this point the hpage is locked and not up-to-date.
 	 * It's safe to insert it into the page cache, because nobody would
@@ -2145,129 +2145,126 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	 */
 	try_to_unmap_flush();
 
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Replacing old pages with new one has succeeded, now we
-		 * attempt to copy the contents.
-		 */
-		index = start;
-		list_for_each_entry(page, &pagelist, lru) {
-			while (index < page->index) {
-				clear_highpage(hpage + (index % HPAGE_PMD_NR));
-				index++;
-			}
-			if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
-					     page) > 0) {
-				result = SCAN_COPY_MC;
-				break;
-			}
-			index++;
-		}
-		while (result == SCAN_SUCCEED && index < end) {
+	if (result != SCAN_SUCCEED)
+		goto rollback;
+
+	/*
+	 * Replacing old pages with new one has succeeded, now we
+	 * attempt to copy the contents.
+	 */
+	index = start;
+	list_for_each_entry(page, &pagelist, lru) {
+		while (index < page->index) {
 			clear_highpage(hpage + (index % HPAGE_PMD_NR));
 			index++;
 		}
+		if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR), page) > 0) {
+			result = SCAN_COPY_MC;
+			goto rollback;
+		}
+		index++;
+	}
+	while (index < end) {
+		clear_highpage(hpage + (index % HPAGE_PMD_NR));
+		index++;
 	}
 
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Copying old pages to huge one has succeeded, now we
-		 * need to free the old pages.
-		 */
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-			list_del(&page->lru);
-			page->mapping = NULL;
-			page_ref_unfreeze(page, 1);
-			ClearPageActive(page);
-			ClearPageUnevictable(page);
-			unlock_page(page);
-			put_page(page);
-		}
+	/*
+	 * Copying old pages to huge one has succeeded, now we
+	 * need to free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
+	}
 
-		xas_lock_irq(&xas);
-		if (is_shmem)
-			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-		else
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	xas_lock_irq(&xas);
+	if (is_shmem)
+		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+	else
+		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
 
-		if (nr_none) {
-			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-			/* nr_none is always 0 for non-shmem. */
-			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-		}
-		/* Join all the small entries into a single multi-index entry. */
-		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-		xas_store(&xas, hpage);
-		xas_unlock_irq(&xas);
+	if (nr_none) {
+		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+		/* nr_none is always 0 for non-shmem. */
+		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
+	}
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
 
-		folio = page_folio(hpage);
-		folio_mark_uptodate(folio);
-		folio_ref_add(folio, HPAGE_PMD_NR - 1);
+	folio = page_folio(hpage);
+	folio_mark_uptodate(folio);
+	folio_ref_add(folio, HPAGE_PMD_NR - 1);
 
-		if (is_shmem)
-			folio_mark_dirty(folio);
-		folio_add_lru(folio);
+	if (is_shmem)
+		folio_mark_dirty(folio);
+	folio_add_lru(folio);
 
-		/*
-		 * Remove pte page tables, so we can re-fault the page as huge.
-		 */
-		result = retract_page_tables(mapping, start, mm, addr, hpage,
-					     cc);
-		unlock_page(hpage);
-		hpage = NULL;
-	} else {
-		/* Something went wrong: roll back page cache changes */
-		xas_lock_irq(&xas);
-		if (nr_none) {
-			mapping->nrpages -= nr_none;
-			shmem_uncharge(mapping->host, nr_none);
+	/*
+	 * Remove pte page tables, so we can re-fault the page as huge.
+	 */
+	result = retract_page_tables(mapping, start, mm, addr, hpage,
+				     cc);
+	unlock_page(hpage);
+	goto out;
+
+rollback:
+	/* Something went wrong: roll back page cache changes */
+	xas_lock_irq(&xas);
+	if (nr_none) {
+		mapping->nrpages -= nr_none;
+		shmem_uncharge(mapping->host, nr_none);
+	}
+
+	xas_set(&xas, start);
+	xas_for_each(&xas, page, end - 1) {
+		page = list_first_entry_or_null(&pagelist,
+				struct page, lru);
+		if (!page || xas.xa_index < page->index) {
+			if (!nr_none)
+				break;
+			nr_none--;
+			/* Put holes back where they were */
+			xas_store(&xas, NULL);
+			continue;
 		}
 
-		xas_set(&xas, start);
-		xas_for_each(&xas, page, end - 1) {
-			page = list_first_entry_or_null(&pagelist,
-					struct page, lru);
-			if (!page || xas.xa_index < page->index) {
-				if (!nr_none)
-					break;
-				nr_none--;
-				/* Put holes back where they were */
-				xas_store(&xas, NULL);
-				continue;
-			}
+		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
 
-			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
+		/* Unfreeze the page. */
+		list_del(&page->lru);
+		page_ref_unfreeze(page, 2);
+		xas_store(&xas, page);
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		unlock_page(page);
+		putback_lru_page(page);
+		xas_lock_irq(&xas);
+	}
+	VM_BUG_ON(nr_none);
+	/*
+	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
+	 * This undo is not needed unless failure is due to SCAN_COPY_MC.
+	 */
+	if (!is_shmem && result == SCAN_COPY_MC)
+		filemap_nr_thps_dec(mapping);
 
-			/* Unfreeze the page. */
-			list_del(&page->lru);
-			page_ref_unfreeze(page, 2);
-			xas_store(&xas, page);
-			xas_pause(&xas);
-			xas_unlock_irq(&xas);
-			unlock_page(page);
-			putback_lru_page(page);
-			xas_lock_irq(&xas);
-		}
-		VM_BUG_ON(nr_none);
-		/*
-		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
-		 * This undo is not needed unless failure is due to SCAN_COPY_MC.
-		 */
-		if (!is_shmem && result == SCAN_COPY_MC)
-			filemap_nr_thps_dec(mapping);
+	xas_unlock_irq(&xas);
 
-		xas_unlock_irq(&xas);
+	hpage->mapping = NULL;
 
-		hpage->mapping = NULL;
-	}
+	unlock_page(hpage);
+	put_page(hpage);
 
-	if (hpage)
-		unlock_page(hpage);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (hpage)
-		put_page(hpage);
-
 	trace_mm_khugepaged_collapse_file(mm, hpage, index, is_shmem, addr, file, nr, result);
 	return result;
 }

From patchwork Tue Mar 7 05:20:35 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 13162796
From: David Stevens
To: linux-mm@kvack.org, Andrew Morton
Cc: Peter Xu, Matthew Wilcox, "Kirill A . Shutemov", Yang Shi,
 David Hildenbrand, Hugh Dickins, Jiaqi Yan,
 linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v5 2/3] mm/khugepaged: skip shmem with userfaultfd
Date: Tue, 7 Mar 2023 14:20:35 +0900
Message-Id: <20230307052036.1520708-3-stevensd@google.com>
In-Reply-To: <20230307052036.1520708-1-stevensd@google.com>
References: <20230307052036.1520708-1-stevensd@google.com>

From: David Stevens

Make sure that collapse_file respects any userfaultfds registered with
MODE_MISSING. If userspace has any such userfaultfds registered, then
for any page which it knows to be missing, it may expect a
UFFD_EVENT_PAGEFAULT. This means collapse_file needs to be careful when
collapsing a shmem range would result in replacing an empty page with a
THP, to avoid breaking userfaultfd.

Synchronization when checking for userfaultfds in collapse_file is
tricky because the mmap locks can't be used to prevent races with the
registration of new userfaultfds. Instead, we provide synchronization
by ensuring that userspace cannot observe the fact that pages are
missing before we check for userfaultfds. Although this allows
registration of a userfaultfd to race with collapse_file, it ensures
that userspace cannot observe any pages transition from missing to
present after such a race occurs. This makes such a race
indistinguishable from the collapse occurring immediately before the
userfaultfd registration.

The first step to provide this synchronization is to stop filling gaps
during the loop iterating over the target range, since the page cache
lock can be dropped during that loop. The second step is to fill the
gaps with XA_RETRY_ENTRY after the page cache lock is acquired the
final time, to avoid races with accesses to the page cache that only
take the RCU read lock.
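
For illustration only, the two steps above can be modelled in userspace
C. This is a minimal sketch under stated assumptions, not the kernel
code: the slot array, the SLOT_RETRY marker (standing in for
XA_RETRY_ENTRY), and the names scan_range and finalize_range are all
hypothetical, and no locking is modelled. The recount in the second
function corresponds to the nr_none_check comparison in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot states; SLOT_RETRY stands in for XA_RETRY_ENTRY. */
enum slot { SLOT_EMPTY, SLOT_PAGE, SLOT_RETRY };

/* Hypothetical result codes, loosely mirroring SCAN_SUCCEED / SCAN_PAGE_FILLED. */
enum sketch_result { SKETCH_SUCCEED, SKETCH_PAGE_FILLED };

/* Step 1: walk the range and count holes, but do not fill them. */
static int scan_range(const enum slot *slots, size_t n)
{
	int nr_none = 0;

	assert(slots != NULL);
	for (size_t i = 0; i < n; i++)
		if (slots[i] == SLOT_EMPTY)
			nr_none++;
	return nr_none;
}

/*
 * Step 2: plug every remaining hole with a retry marker and recount.
 * If the count differs from step 1, a hole was filled in between, so
 * the markers are removed again and the caller is told to roll back.
 */
static enum sketch_result finalize_range(enum slot *slots, size_t n, int nr_none)
{
	int nr_none_check = 0;

	assert(slots != NULL);
	for (size_t i = 0; i < n; i++) {
		if (slots[i] == SLOT_EMPTY) {
			slots[i] = SLOT_RETRY;
			nr_none_check++;
		}
	}
	if (nr_none_check == nr_none)
		return SKETCH_SUCCEED;

	/* Undo the markers so the range looks exactly as it did before. */
	for (size_t i = 0; i < n; i++)
		if (slots[i] == SLOT_RETRY)
			slots[i] = SLOT_EMPTY;
	return SKETCH_PAGE_FILLED;
}
```

In this model a caller runs scan_range first, may lose a hole to a
concurrent fault, and then calls finalize_range with the earlier count;
a mismatch leaves the array untouched apart from the new page.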

The fact that we don't fill holes during the initial iteration means
that collapse_file now has to handle faults occurring during the
collapse. This is done by re-validating the number of missing pages
after acquiring the page cache lock for the final time.

This fix is targeted at khugepaged, but the change also applies to
MADV_COLLAPSE. MADV_COLLAPSE on a range with a userfaultfd will now
return EBUSY if there are any missing pages (instead of succeeding on
shmem and returning EINVAL on anonymous memory). There is also now a
window during MADV_COLLAPSE where a fault on a missing page will cause
the syscall to fail with EAGAIN.

The fact that intermediate page cache state can no longer be observed
before the rollback of a failed collapse is also technically a
userspace-visible change (via at least SEEK_DATA and SEEK_END), but it
is exceedingly unlikely that anything relies on being able to observe
that transient state.

Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 include/trace/events/huge_memory.h |  3 +-
 mm/khugepaged.c                    | 92 +++++++++++++++++++++++-------
 2 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 46cce509957b..7ee85fff89a3 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -37,7 +37,8 @@
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
 	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
-	EMe(SCAN_COPY_MC,		"copy_poisoned_page")		\
+	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
+	EMe(SCAN_PAGE_FILLED,		"page_filled")			\
 
 #undef EM
 #undef EMe
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b954e3c685e7..51ae399f2035 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -57,6 +57,7 @@ enum scan_result {
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
 	SCAN_COPY_MC,
+	SCAN_PAGE_FILLED,
 };
 
 #define CREATE_TRACE_POINTS
@@ -1873,8 +1874,8 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap/gup in pages if necessary;
- *    + fill in gaps;
  *    + keep old pages around in case rollback is required;
+ *  - finalize updates to the page cache;
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
@@ -1952,13 +1953,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 						result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
-					xas_set(&xas, index);
+					xas_set(&xas, index + 1);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
 					result = SCAN_FAIL;
 					goto xa_locked;
 				}
-				xas_store(&xas, hpage);
 				nr_none++;
 				continue;
 			}
@@ -2169,21 +2169,57 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		index++;
 	}
 
-	/*
-	 * Copying old pages to huge one has succeeded, now we
-	 * need to free the old pages.
-	 */
-	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-		list_del(&page->lru);
-		page->mapping = NULL;
-		page_ref_unfreeze(page, 1);
-		ClearPageActive(page);
-		ClearPageUnevictable(page);
-		unlock_page(page);
-		put_page(page);
+	if (nr_none) {
+		struct vm_area_struct *vma;
+		int nr_none_check = 0;
+
+		i_mmap_lock_read(mapping);
+		xas_lock_irq(&xas);
+
+		xas_set(&xas, start);
+		for (index = start; index < end; index++) {
+			if (!xas_next(&xas)) {
+				xas_store(&xas, XA_RETRY_ENTRY);
+				nr_none_check++;
+			}
+		}
+
+		if (nr_none != nr_none_check) {
+			result = SCAN_PAGE_FILLED;
+			goto immap_locked;
+		}
+
+		/*
+		 * If userspace observed a missing page in a VMA with an armed
+		 * userfaultfd, then it might expect a UFFD_EVENT_PAGEFAULT for
+		 * that page, so we need to roll back to avoid suppressing such
+		 * an event. Any userfaultfds armed after this point will not be
+		 * able to observe any missing pages due to the previously
+		 * inserted retry entries.
+		 */
+		vma_interval_tree_foreach(vma, &mapping->i_mmap, start, start) {
+			if (userfaultfd_missing(vma)) {
+				result = SCAN_EXCEED_NONE_PTE;
+				goto immap_locked;
+			}
+		}
+
+immap_locked:
+		i_mmap_unlock_read(mapping);
+		if (result != SCAN_SUCCEED) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+
+			xas_unlock_irq(&xas);
+			goto rollback;
+		}
+	} else {
+		xas_lock_irq(&xas);
 	}
 
-	xas_lock_irq(&xas);
 	if (is_shmem)
 		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
 	else
@@ -2213,6 +2249,20 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	result = retract_page_tables(mapping, start, mm, addr, hpage,
 				     cc);
 	unlock_page(hpage);
+
+	/*
+	 * The collapse has succeeded, so free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
+	}
+
 	goto out;
 
 rollback:
@@ -2224,15 +2274,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 
 	xas_set(&xas, start);
-	xas_for_each(&xas, page, end - 1) {
+	end = index;
+	for (index = start; index < end; index++) {
+		xas_next(&xas);
 		page = list_first_entry_or_null(&pagelist,
 				struct page, lru);
 		if (!page || xas.xa_index < page->index) {
-			if (!nr_none)
-				break;
 			nr_none--;
-			/* Put holes back where they were */
-			xas_store(&xas, NULL);
 			continue;
 		}
 
@@ -2750,12 +2798,14 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_ALLOC_HUGE_PAGE_FAIL:
 		return -ENOMEM;
 	case SCAN_CGROUP_CHARGE_FAIL:
+	case SCAN_EXCEED_NONE_PTE:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
 	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
+	case SCAN_PAGE_FILLED:
 		return -EAGAIN;
 	/*
 	 * Other: Trying again likely not to succeed / error intrinsic to

From patchwork Tue Mar 7 05:20:36 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 13162797
From: David Stevens
To: linux-mm@kvack.org, Andrew Morton
Cc: Peter Xu, Matthew Wilcox, "Kirill A . Shutemov", Yang Shi,
 David Hildenbrand, Hugh Dickins, Jiaqi Yan,
 linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v5 3/3] mm/khugepaged: maintain page cache uptodate flag
Date: Tue, 7 Mar 2023 14:20:36 +0900
Message-Id: <20230307052036.1520708-4-stevensd@google.com>
In-Reply-To: <20230307052036.1520708-1-stevensd@google.com>
References: <20230307052036.1520708-1-stevensd@google.com>

From: David Stevens

Make sure that collapse_file doesn't interfere with checking the
uptodate flag in the page cache by only inserting hpage into the page
cache after it has been updated and marked uptodate. This is achieved by
simply not replacing present pages with hpage when iterating over the
target range. The present pages are already locked, so replacing them
with the locked hpage before the collapse is finalized is unnecessary.

This fixes a race where folio_seek_hole_data would mistake hpage for
an fallocated but unwritten page. This race is visible to userspace via
data temporarily disappearing from SEEK_DATA/SEEK_HOLE.

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 mm/khugepaged.c | 50 ++++++++++++-------------------------------------
 1 file changed, 12 insertions(+), 38 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 51ae399f2035..bdde0a02811b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1930,12 +1930,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	/*
-	 * At this point the hpage is locked and not up-to-date.
-	 * It's safe to insert it into the page cache, because nobody would
-	 * be able to map it or use it in another way until we unlock it.
-	 */
-
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
 		page = xas_next(&xas);
@@ -2104,13 +2098,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		/*
-		 * Add the page to the list to be able to undo the collapse if
-		 * something go wrong.
+		 * Accumulate the pages that are being collapsed.
 		 */
 		list_add_tail(&page->lru, &pagelist);
-
-		/* Finally, replace with the new page. */
-		xas_store(&xas, hpage);
 		continue;
 out_unlock:
 		unlock_page(page);
@@ -2149,8 +2139,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto rollback;
 
 	/*
-	 * Replacing old pages with new one has succeeded, now we
-	 * attempt to copy the contents.
+	 * The old pages are locked, so they won't change anymore.
 	 */
 	index = start;
 	list_for_each_entry(page, &pagelist, lru) {
@@ -2230,11 +2219,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		/* nr_none is always 0 for non-shmem. */
 		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 	}
-	/* Join all the small entries into a single multi-index entry. */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
-	xas_unlock_irq(&xas);
 
+	/*
+	 * Mark hpage as uptodate before inserting it into the page cache so
+	 * that it isn't mistaken for an fallocated but unwritten page.
+	 */
 	folio = page_folio(hpage);
 	folio_mark_uptodate(folio);
 	folio_ref_add(folio, HPAGE_PMD_NR - 1);
@@ -2243,6 +2232,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_mark_dirty(folio);
 	folio_add_lru(folio);
 
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
+
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
@@ -2267,36 +2261,18 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 rollback:
 	/* Something went wrong: roll back page cache changes */
-	xas_lock_irq(&xas);
 	if (nr_none) {
 		mapping->nrpages -= nr_none;
 		shmem_uncharge(mapping->host, nr_none);
 	}
 
-	xas_set(&xas, start);
-	end = index;
-	for (index = start; index < end; index++) {
-		xas_next(&xas);
-		page = list_first_entry_or_null(&pagelist,
-				struct page, lru);
-		if (!page || xas.xa_index < page->index) {
-			nr_none--;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
-
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		/* Unfreeze the page. */
 		list_del(&page->lru);
 		page_ref_unfreeze(page, 2);
-		xas_store(&xas, page);
-		xas_pause(&xas);
-		xas_unlock_irq(&xas);
 		unlock_page(page);
 		putback_lru_page(page);
-		xas_lock_irq(&xas);
 	}
-	VM_BUG_ON(nr_none);
 	/*
 	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
 	 * This undo is not needed unless failure is due to SCAN_COPY_MC.
@@ -2304,8 +2280,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (!is_shmem && result == SCAN_COPY_MC)
 		filemap_nr_thps_dec(mapping);
 
-	xas_unlock_irq(&xas);
-
 	hpage->mapping = NULL;
 
 	unlock_page(hpage);