From: David Stevens
Date: Tue, 4 Apr 2023 21:01:17 +0900
Subject: [PATCH v6 4/4] mm/khugepaged: maintain page cache uptodate flag
To: linux-mm@kvack.org, Peter Xu, Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yang Shi, David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org, David Stevens

Make sure that collapse_file doesn't interfere with checking the
uptodate flag in the page cache by only inserting hpage into the page
cache after it has been updated and marked uptodate. This is achieved by
simply not replacing present pages with hpage when iterating over the
target range. The present pages are already locked, so replacing them
with the locked hpage before the collapse is finalized is unnecessary.

However, it is necessary to stop freezing the present pages after
validating them, since leaving long-term frozen pages in the page cache
can lead to deadlocks. Simply checking the reference count is sufficient
to ensure that there are no long-term references hanging around that the
collapse would break. As with hpage, there is no reason that the present
pages need to be frozen in addition to being locked.

This fixes a race where folio_seek_hole_data would mistake hpage for an
fallocated but unwritten page. This race is visible to userspace via
data temporarily disappearing from SEEK_DATA/SEEK_HOLE. This also fixes
a similar race where pages could temporarily disappear from mincore.
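
The visibility problem described above can be replayed with a small
userspace model. This is a hypothetical sketch, not kernel code: struct
slot, seek_data(), old_order_observe() and new_order_observe() are
invented for illustration, and the model assumes, following the
description of folio_seek_hole_data above, that an entry which is
present but not uptodate is reported as a hole. Each observe function
returns what a concurrent SEEK_DATA caller would see: in the old order
while a not-yet-uptodate hpage sits in the page cache, in the new order
while the old, uptodate pages are still in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page cache slot; not a kernel structure. */
struct slot {
	bool present;   /* an entry exists at this index */
	bool uptodate;  /* the entry's contents are valid */
};

/*
 * Simplified SEEK_DATA: return the first index in [from, n) that holds
 * data, or -1. Assumption: a present but !uptodate entry is treated like
 * a fallocated-but-unwritten page, i.e. as a hole.
 */
static long seek_data(const struct slot *cache, long n, long from)
{
	assert(from >= 0 && from <= n);
	for (long i = from; i < n; i++)
		if (cache[i].present && cache[i].uptodate)
			return i;
	return -1;
}

/* Old order: store the new page first, mark it uptodate afterwards. */
static long old_order_observe(struct slot *cache, long n)
{
	long seen;

	for (long i = 0; i < n; i++)
		cache[i] = (struct slot){ .present = true, .uptodate = false };
	seen = seek_data(cache, n, 0);  /* a reader running in the window */
	for (long i = 0; i < n; i++)
		cache[i].uptodate = true;
	return seen;
}

/* New order: old pages stay in place until hpage is already uptodate. */
static long new_order_observe(struct slot *cache, long n)
{
	const struct slot hpage = { .present = true, .uptodate = true };
	long seen;

	seen = seek_data(cache, n, 0);  /* a reader running during the copy */
	for (long i = 0; i < n; i++)
		cache[i] = hpage;
	return seen;
}
```

Starting from a range of present, uptodate pages, old_order_observe()
returns -1 (no data found) while new_order_observe() returns 0. The
model stores all slots at once; the old collapse_file replaced entries
one index at a time during the scan, which only changes how long the
window lasts.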

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: David Stevens
---
 mm/khugepaged.c | 79 ++++++++++++++++++-------------------------------------
 1 file changed, 29 insertions(+), 50 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7679551e9540..a19aa140fd52 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1855,17 +1855,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and lock a new huge page;
- *  - scan page cache replacing old pages with the new one
+ *  - scan page cache, locking old pages
  *    + swap/gup in pages if necessary;
- *    + keep old pages around in case rollback is required;
+ *  - copy data to new page
+ *  - handle shmem holes
+ *    + re-validate that holes weren't filled by someone else
+ *    + check for userfaultfd
  *  - finalize updates to the page cache;
  *  - if replacing succeeds:
- *    + copy data over;
- *    + free old pages;
  *    + unlock huge page;
+ *    + free old pages;
  *  - if replacing failed;
- *    + put all pages back and unfreeze them;
- *    + restore gaps in the page cache;
+ *    + unlock old pages
  *    + unlock and free huge page;
  */
 static int collapse_file(struct mm_struct *mm, unsigned long addr,
@@ -1913,12 +1914,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	/*
-	 * At this point the hpage is locked and not up-to-date.
-	 * It's safe to insert it into the page cache, because nobody would
-	 * be able to map it or use it in another way until we unlock it.
-	 */
-
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
 		page = xas_next(&xas);
@@ -2076,12 +2071,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
 
 		/*
-		 * The page is expected to have page_count() == 3:
+		 * We control three references to the page:
 		 *  - we hold a pin on it;
 		 *  - one reference from page cache;
 		 *  - one from isolate_lru_page;
+		 * If those are the only references, then any new usage of the
+		 * page will have to fetch it from the page cache. That requires
+		 * locking the page to handle truncate, so any new usage will be
+		 * blocked until we unlock page after collapse/during rollback.
 		 */
-		if (!page_ref_freeze(page, 3)) {
+		if (page_count(page) != 3) {
 			result = SCAN_PAGE_COUNT;
 			xas_unlock_irq(&xas);
 			putback_lru_page(page);
@@ -2089,13 +2088,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		/*
-		 * Add the page to the list to be able to undo the collapse if
-		 * something go wrong.
+		 * Accumulate the pages that are being collapsed.
 		 */
 		list_add_tail(&page->lru, &pagelist);
-
-		/* Finally, replace with the new page. */
-		xas_store(&xas, hpage);
 		continue;
 out_unlock:
 		unlock_page(page);
@@ -2132,8 +2127,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto rollback;
 
 	/*
-	 * Replacing old pages with new one has succeeded, now we
-	 * attempt to copy the contents.
+	 * The old pages are locked, so they won't change anymore.
 	 */
 	index = start;
 	list_for_each_entry(page, &pagelist, lru) {
@@ -2222,11 +2216,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		/* nr_none is always 0 for non-shmem. */
 		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 	}
-	/* Join all the small entries into a single multi-index entry. */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
-	xas_unlock_irq(&xas);
 
+	/*
+	 * Mark hpage as uptodate before inserting it into the page cache so
+	 * that it isn't mistaken for an fallocated but unwritten page.
+	 */
 	folio = page_folio(hpage);
 	folio_mark_uptodate(folio);
 	folio_ref_add(folio, HPAGE_PMD_NR - 1);
@@ -2235,6 +2229,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_mark_dirty(folio);
 	folio_add_lru(folio);
 
+	/* Join all the small entries into a single multi-index entry. */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, hpage);
+	xas_unlock_irq(&xas);
+
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
@@ -2248,47 +2247,29 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		list_del(&page->lru);
 		page->mapping = NULL;
-		page_ref_unfreeze(page, 1);
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
 		unlock_page(page);
-		put_page(page);
+		folio_put_refs(page_folio(page), 3);
 	}
 
 	goto out;
 
 rollback:
 	/* Something went wrong: roll back page cache changes */
-	xas_lock_irq(&xas);
 	if (nr_none) {
+		xas_lock_irq(&xas);
 		mapping->nrpages -= nr_none;
 		shmem_uncharge(mapping->host, nr_none);
+		xas_unlock_irq(&xas);
 	}
 
-	xas_set(&xas, start);
-	end = index;
-	for (index = start; index < end; index++) {
-		xas_next(&xas);
-		page = list_first_entry_or_null(&pagelist,
-				struct page, lru);
-		if (!page || xas.xa_index < page->index) {
-			nr_none--;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
-
-		/* Unfreeze the page. */
+	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 		list_del(&page->lru);
-		page_ref_unfreeze(page, 2);
-		xas_store(&xas, page);
-		xas_pause(&xas);
-		xas_unlock_irq(&xas);
 		unlock_page(page);
 		putback_lru_page(page);
-		xas_lock_irq(&xas);
+		put_page(page);
 	}
-	VM_BUG_ON(nr_none);
 	/*
 	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
 	 * file only. This undo is not needed unless failure is
@@ -2303,8 +2284,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		smp_mb();
 	}
 
-	xas_unlock_irq(&xas);
-
 	hpage->mapping = NULL;
 
 	unlock_page(hpage);
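
The reference-count handling in this patch can be modeled with C11
atomics. This is a hedged sketch under stated assumptions: struct
model_page and the model_* helpers are hypothetical stand-ins assumed to
mirror page_ref_freeze(), page_ref_unfreeze(), folio_try_get() and
folio_put_refs(); they are not the kernel implementations. It contrasts
freezing a count of 3 (speculative lookups can no longer take a
reference, and lockless page cache lookups retry until they can) with
merely checking that the count is 3 (a lookup can still take a
reference, and would then wait on the page lock, which this model does
not include).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page's reference count. */
struct model_page {
	atomic_int refcount;
};

/* Like page_ref_freeze(): succeed only if count == expected, leaving 0. */
static bool model_ref_freeze(struct model_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Like page_ref_unfreeze(): publish a new count on a frozen page. */
static void model_ref_unfreeze(struct model_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);  /* must be frozen */
	atomic_store(&p->refcount, count);
}

/*
 * Like folio_try_get(): a speculative lookup may only take a reference
 * if the count is non-zero, so a frozen page makes it fail.
 */
static bool model_try_get(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Like folio_put_refs(): drop 'refs' references; true if now free. */
static bool model_put_refs(struct model_page *p, int refs)
{
	return atomic_fetch_sub(&p->refcount, refs) == refs;
}

/* The patch's new check: exactly our three references, no freezing. */
static bool model_only_our_refs(struct model_page *p)
{
	return atomic_load(&p->refcount) == 3;
}
```

On the success path the old page is no longer in the page cache once
hpage's multi-index entry is stored, so folio_put_refs(..., 3) drops the
pin, the page cache reference and the isolation reference together. On
rollback, putback_lru_page() and put_page() drop only the isolation
reference and the pin, leaving the page cache reference in place.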