From patchwork Fri Feb 17 08:54:37 2023
X-Patchwork-Id: 13144437
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Matthew Wilcox
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, David Hildenbrand,
 Hugh Dickins, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v4 1/3] mm/khugepaged: refactor collapse_file control flow
Date: Fri, 17 Feb 2023 17:54:37 +0900
Message-Id: <20230217085439.2826375-2-stevensd@google.com>
In-Reply-To: <20230217085439.2826375-1-stevensd@google.com>
References: <20230217085439.2826375-1-stevensd@google.com>

From: David Stevens

Add a rollback label to deal with failure, instead of continuously
checking for SCAN_SUCCEED, to make it easier to add more failure
cases. The refactoring also allows the collapse_file tracepoint to
include hpage on success (instead of NULL).

Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 mm/khugepaged.c | 223 ++++++++++++++++++++++++------------------------
 1 file changed, 110 insertions(+), 113 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8dbc39896811..6a3d6d2e25e0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1885,6 +1885,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	__SetPageLocked(hpage);
+	if (is_shmem)
+		__SetPageSwapBacked(hpage);
+	hpage->index = start;
+	hpage->mapping = mapping;
+
 	/*
 	 * Ensure we have slots for all the pages in the range. This is
 	 * almost certainly a no-op because most of the pages must be present
@@ -1897,16 +1903,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
 			result = SCAN_FAIL;
-			goto out;
+			goto rollback;
 		}
 	} while (1);
 
-	__SetPageLocked(hpage);
-	if (is_shmem)
-		__SetPageSwapBacked(hpage);
-	hpage->index = start;
-	hpage->mapping = mapping;
-
 	/*
 	 * At this point the hpage is locked and not up-to-date.
 	 * It's safe to insert it into the page cache, because nobody would
@@ -2123,131 +2123,128 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	 */
 	try_to_unmap_flush();
 
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Replacing old pages with new one has succeeded, now we
-		 * attempt to copy the contents.
-		 */
-		index = start;
-		list_for_each_entry(page, &pagelist, lru) {
-			while (index < page->index) {
-				clear_highpage(hpage + (index % HPAGE_PMD_NR));
-				index++;
-			}
-			if (copy_mc_page(hpage + (page->index % HPAGE_PMD_NR),
-					 page) > 0) {
-				result = SCAN_COPY_MC;
-				break;
-			}
-			index++;
-		}
-		while (result == SCAN_SUCCEED && index < end) {
+	if (result != SCAN_SUCCEED)
+		goto rollback;
+
+	/*
+	 * Replacing old pages with new one has succeeded, now we
+	 * attempt to copy the contents.
+	 */
+	index = start;
+	list_for_each_entry(page, &pagelist, lru) {
+		while (index < page->index) {
 			clear_highpage(hpage + (index % HPAGE_PMD_NR));
 			index++;
 		}
+		if (copy_mc_page(hpage + (page->index % HPAGE_PMD_NR),
+				 page) > 0) {
+			result = SCAN_COPY_MC;
+			goto rollback;
+		}
+		index++;
+	}
+	while (index < end) {
+		clear_highpage(hpage + (index % HPAGE_PMD_NR));
+		index++;
 	}
 
-	if (result == SCAN_SUCCEED) {
-		/*
-		 * Copying old pages to huge one has succeeded, now we
-		 * need to free the old pages.
-		 */
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-			list_del(&page->lru);
-			page->mapping = NULL;
-			page_ref_unfreeze(page, 1);
-			ClearPageActive(page);
-			ClearPageUnevictable(page);
-			unlock_page(page);
-			put_page(page);
-		}
+	/*
+	 * Copying old pages to huge one has succeeded, now we
+	 * need to free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
+	}
 
-		xas_lock_irq(&xas);
-		if (is_shmem)
-			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-		else
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	xas_lock_irq(&xas);
+	if (is_shmem)
+		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+	else
+		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+
+	if (nr_none) {
+		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+		/* nr_none is always 0 for non-shmem. */
+		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
+	}
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
 
-		if (nr_none) {
-			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-			/* nr_none is always 0 for non-shmem. */
-			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-		}
-		/* Join all the small entries into a single multi-index entry. */
-		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-		xas_store(&xas, hpage);
-		xas_unlock_irq(&xas);
+	folio = page_folio(hpage);
+	folio_mark_uptodate(folio);
+	folio_ref_add(folio, HPAGE_PMD_NR - 1);
 
-		folio = page_folio(hpage);
-		folio_mark_uptodate(folio);
-		folio_ref_add(folio, HPAGE_PMD_NR - 1);
+	if (is_shmem)
+		folio_mark_dirty(folio);
+	folio_add_lru(folio);
 
-		if (is_shmem)
-			folio_mark_dirty(folio);
-		folio_add_lru(folio);
+	/*
+	 * Remove pte page tables, so we can re-fault the page as huge.
+	 */
+	result = retract_page_tables(mapping, start, mm, addr, hpage,
+				     cc);
+	unlock_page(hpage);
+	goto out;
+
+rollback:
+	/* Something went wrong: roll back page cache changes */
+	xas_lock_irq(&xas);
+	if (nr_none) {
+		mapping->nrpages -= nr_none;
+		shmem_uncharge(mapping->host, nr_none);
+	}
 
-		/*
-		 * Remove pte page tables, so we can re-fault the page as huge.
-		 */
-		result = retract_page_tables(mapping, start, mm, addr, hpage,
-					     cc);
-		unlock_page(hpage);
-		hpage = NULL;
-	} else {
-		/* Something went wrong: roll back page cache changes */
-		xas_lock_irq(&xas);
-		if (nr_none) {
-			mapping->nrpages -= nr_none;
-			shmem_uncharge(mapping->host, nr_none);
+	xas_set(&xas, start);
+	xas_for_each(&xas, page, end - 1) {
+		page = list_first_entry_or_null(&pagelist,
+				struct page, lru);
+		if (!page || xas.xa_index < page->index) {
+			if (!nr_none)
+				break;
+			nr_none--;
+			/* Put holes back where they were */
+			xas_store(&xas, NULL);
+			continue;
 		}
-		xas_set(&xas, start);
-		xas_for_each(&xas, page, end - 1) {
-			page = list_first_entry_or_null(&pagelist,
-					struct page, lru);
-			if (!page || xas.xa_index < page->index) {
-				if (!nr_none)
-					break;
-				nr_none--;
-				/* Put holes back where they were */
-				xas_store(&xas, NULL);
-				continue;
-			}
 
-			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
+		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
 
-			/* Unfreeze the page. */
-			list_del(&page->lru);
-			page_ref_unfreeze(page, 2);
-			xas_store(&xas, page);
-			xas_pause(&xas);
-			xas_unlock_irq(&xas);
-			unlock_page(page);
-			putback_lru_page(page);
-			xas_lock_irq(&xas);
-		}
-		VM_BUG_ON(nr_none);
-		/*
-		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
-		 * This undo is not needed unless failure is due to SCAN_COPY_MC.
-		 */
-		if (!is_shmem && result == SCAN_COPY_MC)
-			filemap_nr_thps_dec(mapping);
+		/* Unfreeze the page. */
+		list_del(&page->lru);
+		page_ref_unfreeze(page, 2);
+		xas_store(&xas, page);
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		unlock_page(page);
+		putback_lru_page(page);
+		xas_lock_irq(&xas);
+	}
+	VM_BUG_ON(nr_none);
+	/*
+	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
+	 * This undo is not needed unless failure is due to SCAN_COPY_MC.
+	 */
+	if (!is_shmem && result == SCAN_COPY_MC)
+		filemap_nr_thps_dec(mapping);
 
-		xas_unlock_irq(&xas);
+	xas_unlock_irq(&xas);
 
-		hpage->mapping = NULL;
-	}
+	hpage->mapping = NULL;
 
-	if (hpage)
-		unlock_page(hpage);
+	unlock_page(hpage);
+	mem_cgroup_uncharge(page_folio(hpage));
+	put_page(hpage);
+
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (hpage) {
-		mem_cgroup_uncharge(page_folio(hpage));
-		put_page(hpage);
-	}
-
 	trace_mm_khugepaged_collapse_file(mm, hpage, index, is_shmem, addr, file, nr, result);
 	return result;
 }
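
The shape of the resulting control flow is the standard kernel error-path
idiom. As a standalone illustration (ordinary userspace C with hypothetical
resource names, not kernel code), the pattern the patch moves to looks like
this:

/*
 * Sketch of the goto-rollback idiom: each failure jumps to a single
 * label that undoes partial state, so the success path runs once at
 * the outermost indentation instead of being nested inside repeated
 * "if (result == SUCCESS)" blocks.
 */
#include <stdlib.h>

static int collapse_sketch(void)
{
	int result = 0;
	char *slots = NULL, *copy = NULL;

	slots = malloc(64);		/* "reserve slots" step */
	if (!slots) {
		result = -1;
		goto out;		/* nothing to undo yet */
	}

	copy = malloc(64);		/* "copy contents" step */
	if (!copy) {
		result = -1;
		goto rollback;		/* partial state must be undone */
	}

	/* success path runs straight through, no nesting */
	free(copy);
	free(slots);
	return 0;

rollback:
	free(slots);			/* one place to roll back */
out:
	return result;
}

int main(void)
{
	return collapse_sketch() ? 1 : 0;
}

Adding a new failure case then only requires a new "goto rollback", which is
exactly what the following two patches rely on.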
From patchwork Fri Feb 17 08:54:38 2023
X-Patchwork-Id: 13144438
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Matthew Wilcox
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, David Hildenbrand,
 Hugh Dickins, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v4 2/3] mm/khugepaged: skip shmem with userfaultfd
Date: Fri, 17 Feb 2023 17:54:38 +0900
Message-Id: <20230217085439.2826375-3-stevensd@google.com>
In-Reply-To: <20230217085439.2826375-1-stevensd@google.com>
References: <20230217085439.2826375-1-stevensd@google.com>
Shutemov" , Yang Shi , David Hildenbrand , Hugh Dickins , linux-kernel@vger.kernel.org, David Stevens Subject: [PATCH v4 2/3] mm/khugepaged: skip shmem with userfaultfd Date: Fri, 17 Feb 2023 17:54:38 +0900 Message-Id: <20230217085439.2826375-3-stevensd@google.com> X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog In-Reply-To: <20230217085439.2826375-1-stevensd@google.com> References: <20230217085439.2826375-1-stevensd@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 3E58B1C0002 X-Rspam-User: X-Stat-Signature: 9gzi6pnboggba4i44y5qarajewyhxuxy X-HE-Tag: 1676624112-139056 X-HE-Meta: U2FsdGVkX18LljEvdct8c0kz7RTllxRV55V4RtC4Lsbr4/CSwifxJdqfOTMdYNp+AXnjFrj2oFhLW3RQWb2NFoWSZcLJXCr4c6kQJdM2duOA5r0iOZjq1/ZQKulRHRWL3GarD/24ySNKBWbpQ1eAllLbKx5VmllhpV/iB8NPr72RDv9pCIUfxlnYhL791pX64CnkY8oHNlNb2J0ih7OPwnaHzmXV0AxDu1vRpAhzhiwvQyPsB2svm0GYUbMCN0Xl91sXYagpL2WcOY3bwZ83daFnmkfDFn5fvBWqnvmJw6RI2l2v3oTFZwzZMx2N/B+qiJsALi8O30VQQA5AngvfCK9TUgWkGdKWyuxarNLJL4uf/fu66MVG293YEHwjAXPgLikHP2fpyrioL0r7HansizGfhDFLHuF1WZRUBwez7s1VtBuVKp/bH8jh4SD+d+U530kVRwUMYtjsnZZql6agIw/lTY0T3+P7IDYQXb0i/n8x7ZuJrOjqsM8wzHK6HcnzoxetWaeL8ryb7pGc2oMnxz/HrCz0bxmbjxhKFfhfevuRniIBqX2N87cw2pJH1aISLmA+EciwCI0EFq6VNmxY+fca+tYBJx24QxmMoqwjKIrVl2TAc5wxX0457RQkf0cQcjHQrFh1iXLzYMTa4+IX8PRn4rrgOsFoun9cXb9E0ayAcInwH/Qg0fTnIA0TZASgiYJkvhEGNjolU6kDKliuxuy0/68lx6Jo6LUwvu3o8nB8i28R97coGtsKBIj0N/c4sYc43270uJnSNCzYA8TWqPQODSowU6GB8YbUF2bKmoTwNHSnT0g83YlFpWfwsWTqxWbLdKj8wMl617XGcLzkWs63aEDaBe9UTC1zpiCz28E2EYfBmOwCAHTbLrVgafbO5vndnr6gcf1l/B+semoJf93P7NIwmLt518FxQRIVUrc7x95ugn26DSw1caxmWc4WNtTa1OZUBp3TbANpsAv QdKBPgyJ vVjoFVo08zCCRhZQjPWLn4W6VArGp3Msn5UIv8Xih8Z3nEybVnxOclNkjzdntAF1yOXmsGhnt6fGHRGDr3Qk2TufhQtBtqYD6ve3YvAjgCloN2wyS2456EyqL4ZJYorYrQJWNztbd3eFHY+TqeIznyAE6tipKhMixU88MFyD2oWYBU6S5kUhy7m2Bzwgw3xBxzOQvYfUXR9zYe1XHNYQ2fMpb1y01P1fBoPe+xaMJ9jOqT5kQDul9RCRBHD/GFFoLoofj2domPXqAxnLXTbFEMJJRuJu3HY3G/lGyBhpVwIF1TkQAQfHFOb/FxNPPKuLlvm+f X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Stevens Make sure that collapse_file respects any userfaultfds registered with MODE_MISSING. If userspace has any such userfaultfds registered, then for any page which it knows to be missing, it may expect a UFFD_EVENT_PAGEFAULT. This means collapse_file needs to be careful when collapsing a shmem range would result in replacing an empty page with a THP, to avoid breaking userfaultfd. Synchronization when checking for userfaultfds in collapse_file is tricky because the mmap locks can't be used to prevent races with the registration of new userfaultfds. Instead, we provide synchronization by ensuring that userspace cannot observe the fact that pages are missing before we check for userfaultfds. Although this allows registration of a userfaultfd to race with collapse_file, it ensures that userspace cannot observe any pages transition from missing to present after such a race occurs. This makes such a race indistinguishable to the collapse occurring immediately before the userfaultfd registration. The first step to provide this synchronization is to stop filling gaps during the loop iterating over the target range, since the page cache lock can be dropped during that loop. The second step is to fill the gaps with XA_RETRY_ENTRY after the page cache lock is acquired the final time, to avoid races with accesses to the page cache that only take the RCU read lock. 
The fact that we don't fill holes during the initial iteration means
that collapse_file now has to handle faults occurring during the
collapse. This is done by re-validating the number of missing pages
after acquiring the page cache lock for the final time.

This fix is targeted at khugepaged, but the change also applies to
MADV_COLLAPSE. MADV_COLLAPSE on a range with a userfaultfd will now
return EBUSY if there are any missing pages (instead of succeeding on
shmem and returning EINVAL on anonymous memory). There is also now a
window during MADV_COLLAPSE where a fault on a missing page will cause
the syscall to fail with EAGAIN.

The fact that intermediate page cache state can no longer be observed
before the rollback of a failed collapse is also technically a
userspace-visible change (via at least SEEK_DATA and SEEK_END), but it
is exceedingly unlikely that anything relies on being able to observe
that transient state.

Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 include/trace/events/huge_memory.h |  3 +-
 mm/khugepaged.c                    | 92 +++++++++++++++++++++++-------
 2 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 46cce509957b..7ee85fff89a3 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -37,7 +37,8 @@
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
 	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
-	EMe(SCAN_COPY_MC,		"copy_poisoned_page")		\
+	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
+	EMe(SCAN_PAGE_FILLED,		"page_filled")			\
 
 #undef EM
 #undef EMe
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6a3d6d2e25e0..1c37f9151345 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -57,6 +57,7 @@ enum scan_result {
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
 	SCAN_COPY_MC,
+	SCAN_PAGE_FILLED,
 };
 
 #define CREATE_TRACE_POINTS
@@ -1851,8 +1852,8 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap/gup in pages if necessary;
- *    + fill in gaps;
  *    + keep old pages around in case rollback is required;
+ *  - finalize updates to the page cache;
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
@@ -1930,13 +1931,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 				result = SCAN_TRUNCATED;
 				goto xa_locked;
 			}
-			xas_set(&xas, index);
+			xas_set(&xas, index + 1);
 		}
 		if (!shmem_charge(mapping->host, 1)) {
 			result = SCAN_FAIL;
 			goto xa_locked;
 		}
-		xas_store(&xas, hpage);
 		nr_none++;
 		continue;
 	}
@@ -2148,21 +2148,57 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		index++;
 	}
 
-	/*
-	 * Copying old pages to huge one has succeeded, now we
-	 * need to free the old pages.
-	 */
-	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
-		list_del(&page->lru);
-		page->mapping = NULL;
-		page_ref_unfreeze(page, 1);
-		ClearPageActive(page);
-		ClearPageUnevictable(page);
-		unlock_page(page);
-		put_page(page);
+	if (nr_none) {
+		struct vm_area_struct *vma;
+		int nr_none_check = 0;
+
+		i_mmap_lock_read(mapping);
+		xas_lock_irq(&xas);
+
+		xas_set(&xas, start);
+		for (index = start; index < end; index++) {
+			if (!xas_next(&xas)) {
+				xas_store(&xas, XA_RETRY_ENTRY);
+				nr_none_check++;
+			}
+		}
+
+		if (nr_none != nr_none_check) {
+			result = SCAN_PAGE_FILLED;
+			goto immap_locked;
+		}
+
+		/*
+		 * If userspace observed a missing page in a VMA with an armed
+		 * userfaultfd, then it might expect a UFFD_EVENT_PAGEFAULT for
+		 * that page, so we need to roll back to avoid suppressing such
+		 * an event. Any userfaultfds armed after this point will not be
+		 * able to observe any missing pages due to the previously
+		 * inserted retry entries.
+		 */
+		vma_interval_tree_foreach(vma, &mapping->i_mmap, start, start) {
+			if (userfaultfd_missing(vma)) {
+				result = SCAN_EXCEED_NONE_PTE;
+				goto immap_locked;
+			}
+		}
+
+immap_locked:
+		i_mmap_unlock_read(mapping);
+		if (result != SCAN_SUCCEED) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+
+			xas_unlock_irq(&xas);
+			goto rollback;
+		}
+	} else {
+		xas_lock_irq(&xas);
 	}
 
-	xas_lock_irq(&xas);
 	if (is_shmem)
 		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
 	else
@@ -2192,6 +2228,20 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	result = retract_page_tables(mapping, start, mm, addr, hpage,
 				     cc);
 	unlock_page(hpage);
+
+	/*
+	 * The collapse has succeeded, so free the old pages.
+	 */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_del(&page->lru);
+		page->mapping = NULL;
+		page_ref_unfreeze(page, 1);
+		ClearPageActive(page);
+		ClearPageUnevictable(page);
+		unlock_page(page);
+		put_page(page);
+	}
+
 	goto out;
 
 rollback:
@@ -2203,15 +2253,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 
 	xas_set(&xas, start);
-	xas_for_each(&xas, page, end - 1) {
+	end = index;
+	for (index = start; index < end; index++) {
+		xas_next(&xas);
 		page = list_first_entry_or_null(&pagelist,
 				struct page, lru);
 		if (!page || xas.xa_index < page->index) {
-			if (!nr_none)
-				break;
 			nr_none--;
-			/* Put holes back where they were */
-			xas_store(&xas, NULL);
 			continue;
 		}
 
@@ -2730,12 +2778,14 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_ALLOC_HUGE_PAGE_FAIL:
 		return -ENOMEM;
 	case SCAN_CGROUP_CHARGE_FAIL:
+	case SCAN_EXCEED_NONE_PTE:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
 	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
+	case SCAN_PAGE_FILLED:
 		return -EAGAIN;
 	/*
 	 * Other: Trying again likely not to succeed / error intrinsic to
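
To make the new MADV_COLLAPSE semantics concrete, here is a hypothetical
userspace sketch (not part of the patch; error handling omitted): it arms a
MODE_MISSING userfaultfd over a PMD-sized shmem mapping whose pages are all
still missing, so with this change MADV_COLLAPSE is expected to fail with
EBUSY. MADV_COLLAPSE needs Linux 6.1+ headers, and creating a userfaultfd may
require privilege or vm.unprivileged_userfaultfd=1.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* not yet in all libc headers */
#endif

#define LEN (2UL << 20)		/* one PMD-sized range on x86-64 */

int main(void)
{
	int mfd = memfd_create("uffd-collapse", 0);
	ftruncate(mfd, LEN);

	/* Reserve 2x the size, then map the file at a PMD-aligned address. */
	char *hint = mmap(NULL, 2 * LEN, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *addr = (char *)(((uintptr_t)hint + LEN - 1) & ~(LEN - 1));
	addr = mmap(addr, LEN, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_FIXED, mfd, 0);

	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = { .api = UFFD_API };
	ioctl(uffd, UFFDIO_API, &api);

	struct uffdio_register reg = {
		.range = { .start = (uintptr_t)addr, .len = LEN },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	/*
	 * Every page in the range is still missing, so collapsing would
	 * fill holes that the userfaultfd is watching. After this patch
	 * the kernel backs off instead: expect EBUSY here.
	 */
	if (madvise(addr, LEN, MADV_COLLAPSE))
		printf("MADV_COLLAPSE: %s\n", strerror(errno));
	return 0;
}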
From patchwork Fri Feb 17 08:54:39 2023
X-Patchwork-Id: 13144439
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Matthew Wilcox
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, David Hildenbrand,
 Hugh Dickins, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH v4 3/3] mm/khugepaged: maintain page cache uptodate flag
Date: Fri, 17 Feb 2023 17:54:39 +0900
Message-Id: <20230217085439.2826375-4-stevensd@google.com>
In-Reply-To: <20230217085439.2826375-1-stevensd@google.com>
References: <20230217085439.2826375-1-stevensd@google.com>
0iVut9zjN9XqsBWMs0SF/aqbjOIlHk5aQ+h/YoIGiI5uitgCuxeuv0bHMewOPMgrUjLZ VhWQ== X-Gm-Message-State: AO0yUKWJH2poKQrNVaQPIrsxU2MPVe9/t6jMaO2lQCFxpjGwaBV9BR+S MwG6ztdSXcC3domUI3WKKt5Q326hTtOoZkGP X-Google-Smtp-Source: AK7set+gdOoMg/f/eZ6vrpn2+McA1nHrAJV9kACCzgvxBVBKR4TgUlsSlLQeog5O33xrEkX2ULDZpg== X-Received: by 2002:a05:6a20:7b11:b0:c7:32b8:d6b9 with SMTP id s17-20020a056a207b1100b000c732b8d6b9mr3982808pzh.13.1676624115269; Fri, 17 Feb 2023 00:55:15 -0800 (PST) Received: from localhost ([2401:fa00:8f:203:b7bc:8cb9:1364:30fb]) by smtp.gmail.com with UTF8SMTPSA id j12-20020a62b60c000000b0059072daa002sm2550076pff.192.2023.02.17.00.55.12 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 17 Feb 2023 00:55:14 -0800 (PST) From: David Stevens X-Google-Original-From: David Stevens To: linux-mm@kvack.org, Peter Xu , Matthew Wilcox Cc: Andrew Morton , "Kirill A . Shutemov" , Yang Shi , David Hildenbrand , Hugh Dickins , linux-kernel@vger.kernel.org, David Stevens Subject: [PATCH v4 3/3] mm/khugepaged: maintain page cache uptodate flag Date: Fri, 17 Feb 2023 17:54:39 +0900 Message-Id: <20230217085439.2826375-4-stevensd@google.com> X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog In-Reply-To: <20230217085439.2826375-1-stevensd@google.com> References: <20230217085439.2826375-1-stevensd@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: D01064000C X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: pgft3byr3enpoq7dumykmmunbnaeh8ky X-HE-Tag: 1676624116-342887 X-HE-Meta: U2FsdGVkX18XBY7txWY0IlNIDzCnwDRIm9pVfPyU3LbE+FnU/IT5NrzaxoTrqbup7RfpMPXsYM4TTs8gx/kcChJYF0TvAocKjw1CqNSJ/E9x4Hats0jbwBXHmQlvXDHCry4X1mcGXacknOvAWxGf3jwxlOQhPpy/k8DW7D6Ue6aHff9rPTqF0d83s+an//TMLrNiPu1sXaz6uOuZZy08ifSbm/FTJtp4fFey0mGPGsW1OhnKooQ12MZRlaC+Z6j+FUexpY49xCuRMVlfXY4CGO0UMVhLD+AFGScECt6J4Y8cP1eiZTtMCsKpCgujVNkau5Zux40B+jr507jYYqanBJ1GPMwJizyjz99bg6kk77PUIO6t/9uCqZU3rMQBGbYFYfI8hnhnLJ8p4EiDjctAgodw1bDi6lBzOszKvqtGJi25XlmRI+gbUvUZKWscBKF2MrlodJTLNdD/4w5ZG6gZ1H0nJ92XFA6SoCnEZJb8BmgdJMWXZwHkgfBAk35OFpBmfk0tXHW3W0JevwDCB+9VmkX3m+H17X2LSbJvAJz3cKJDo72ciNnFAEof2a15thcIBEPS0Q1DBl2trKtuzuz6/Y57mks/m9ToUpW8DuBcjZtc+Q5K2PnirYAv77onL6n+0H+dJhpPpEL9CmthjVe3MRZQhu9z3JbNo+fFYzbYwPL8GDGUZWfjeb0dR4NRIYA/aQrWAu6G0XDcUHpDdUM9PXNlR71B+aYkGJZxeCGidG0DyNnCZ4oLDxueZnlWX0qy8mRpShqrab/fAmnxag+Zvh97ojuW7IdFWsK/EolOh/STNJrZbNR+61pvaR3ELkNNbC7tW9aczwPlgcCvx3hDOV2Jl0LUYtby4xRoWcWMGE51JpzXoZdoi2la6uZOKjOugwWtCSK/TuVIwWRtVpau5RJtJRW6r2DMINeQlu+7qt94UYbFG0KrYgNhiEUtKOy/hybswlMZU6CbTB9iKPj 5ekd5J8Y NCN5vl6xyXC5cTrWql1vEyVDdMOmqll/qL9PUicDrKxXr6vMBukIDB8+xCu25I7Np6KqLJyl0hCUCUl8mWUygAP6uppAKWEuj2etmIBV/jspsPFlIUXgoHaaa90OwfRbeg0nWyOBUSjsD9HNtmAnJBd/llOLcLpKLxzW11hdSmJ+SEmflfkxT+6jKAgYA0SG285L/dJLrFgv2Beui1WpYG1zmIbsS/vrLFz+F9u3/o+qn0qD+W98IY692ZbLrkMVHKnfDuSPetTW/rlhrFhXerQu4QuNaAPxPr6WU5ODAWFVE2ivT+ytlr+dheuOYhIu6P4AM X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Stevens Make sure that collapse_file doesn't interfere with checking the uptodate flag in the page cache by only inserting hpage into the page cache after it has been updated and marked uptodate. This is achieved by simply not replacing present pages with hpage when iterating over them target range. The present pages are already locked, so replacing the with the locked hpage before the collapse is finalized is unnecessary. This fixes a race where folio_seek_hole_data would mistake hpage for an fallocated but unwritten page. 
This race is visible to userspace via data temporarily disappearing
from SEEK_DATA/SEEK_HOLE.

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: David Stevens
Acked-by: Peter Xu
---
 mm/khugepaged.c | 50 ++++++++++++-------------------------------------
 1 file changed, 12 insertions(+), 38 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1c37f9151345..e08cf7c5ebdf 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1908,12 +1908,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	/*
-	 * At this point the hpage is locked and not up-to-date.
-	 * It's safe to insert it into the page cache, because nobody would
-	 * be able to map it or use it in another way until we unlock it.
-	 */
-
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
 		page = xas_next(&xas);
@@ -2082,13 +2076,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		/*
-		 * Add the page to the list to be able to undo the collapse if
-		 * something go wrong.
+		 * Accumulate the pages that are being collapsed.
 		 */
 		list_add_tail(&page->lru, &pagelist);
-
-		/* Finally, replace with the new page. */
-		xas_store(&xas, hpage);
 		continue;
 out_unlock:
 		unlock_page(page);
@@ -2127,8 +2117,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto rollback;
 
 	/*
-	 * Replacing old pages with new one has succeeded, now we
-	 * attempt to copy the contents.
+	 * The old pages are locked, so they won't change anymore.
 	 */
 	index = start;
 	list_for_each_entry(page, &pagelist, lru) {
@@ -2209,11 +2198,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		/* nr_none is always 0 for non-shmem. */
 		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 	}
-	/* Join all the small entries into a single multi-index entry. */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
-	xas_unlock_irq(&xas);
 
+	/*
+	 * Mark hpage as uptodate before inserting it into the page cache so
+	 * that it isn't mistaken for an fallocated but unwritten page.
+	 */
 	folio = page_folio(hpage);
 	folio_mark_uptodate(folio);
 	folio_ref_add(folio, HPAGE_PMD_NR - 1);
@@ -2222,6 +2211,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_mark_dirty(folio);
 	folio_add_lru(folio);
 
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
+
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
@@ -2246,36 +2240,18 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 rollback:
 	/* Something went wrong: roll back page cache changes */
-	xas_lock_irq(&xas);
 	if (nr_none) {
 		mapping->nrpages -= nr_none;
 		shmem_uncharge(mapping->host, nr_none);
 	}
 
-	xas_set(&xas, start);
-	end = index;
-	for (index = start; index < end; index++) {
-		xas_next(&xas);
-		page = list_first_entry_or_null(&pagelist,
-				struct page, lru);
-		if (!page || xas.xa_index < page->index) {
-			nr_none--;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
-
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		/* Unfreeze the page. */
 		list_del(&page->lru);
 		page_ref_unfreeze(page, 2);
-		xas_store(&xas, page);
-		xas_pause(&xas);
-		xas_unlock_irq(&xas);
 		unlock_page(page);
 		putback_lru_page(page);
-		xas_lock_irq(&xas);
 	}
-	VM_BUG_ON(nr_none);
+
 	/*
 	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
 	 * This undo is not needed unless failure is due to SCAN_COPY_MC.
@@ -2283,8 +2259,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (!is_shmem && result == SCAN_COPY_MC)
 		filemap_nr_thps_dec(mapping);
 
-	xas_unlock_irq(&xas);
-
 	hpage->mapping = NULL;
 
 	unlock_page(hpage);
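
The symptom fixed here can be probed from userspace with a loop along the
following lines (a hypothetical detector sketch, not a test shipped with this
series): data written at offset 0 of a shmem-backed file must always be found
by SEEK_DATA, even while khugepaged collapses the range; before this fix it
could transiently be reported as a hole.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int i;
	/* /dev/shm is tmpfs, so the file is shmem-backed. */
	int fd = open("/dev/shm/seek-race", O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return 1;
	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		return 1;

	for (i = 0; i < 100000000; i++) {
		/* Offset 0 holds written data, so SEEK_DATA must return 0. */
		off_t off = lseek(fd, 0, SEEK_DATA);
		if (off != 0) {
			fprintf(stderr, "data misreported as hole: %lld\n",
				(long long)off);
			return 1;
		}
	}
	return 0;
}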