From patchwork Tue Feb 14 07:57:09 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 13139582
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Matthew Wilcox
Cc: Andrew Morton, "Kirill A . Shutemov", Yang Shi, David Hildenbrand,
 Hugh Dickins, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH 1/2] mm/khugepaged: set THP as uptodate earlier for shmem
Date: Tue, 14 Feb 2023 16:57:09 +0900
Message-Id: <20230214075710.2401855-1-stevensd@google.com>

From: David Stevens

In collapse_file, mark the THP as up-to-date before inserting it into
the page cache. This fixes a race where folio_seek_hole_data would
mistake the THP for an fallocated but unwritten page. This race is
visible to userspace via data temporarily disappearing from
SEEK_DATA/SEEK_HOLE, which can cause data loss for applications that
use lseek to efficiently snapshot sparse shmem.
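As a rough illustration of the userspace pattern affected by this race, a
snapshot tool typically walks a file's data extents with lseek(SEEK_DATA)
and lseek(SEEK_HOLE) and copies only those ranges. The sketch below is
hypothetical (count_data_bytes is an illustrative name, not taken from any
existing tool) and only counts the bytes reported as data. If a THP in the
page cache is transiently reported as a hole, a walk like this would skip
that range.

```c
#define _GNU_SOURCE	/* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA	/* Linux values, in case the libc headers hide them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: walk the data extents of fd and return the number
 * of bytes reported as data, or -1 on error. A real snapshot tool would
 * copy each [data, hole) range instead of just counting it.
 */
off_t count_data_bytes(int fd)
{
	off_t total = 0, data, hole = 0;

	for (;;) {
		/* Find the next offset >= hole that contains data. */
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0)
			return errno == ENXIO ? total : -1; /* ENXIO: no more data */

		/* Find where that data extent ends (next hole or EOF). */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		printf("data extent: [%lld, %lld)\n",
		       (long long)data, (long long)hole);
		total += hole - data;
	}
}
```

A snapshot built this way is only as complete as the extents the kernel
reports at the time of the walk, which is why a transiently misreported THP
can translate into missing data.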

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: David Stevens
---
 mm/khugepaged.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 79be13133322..b648f1053d95 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1779,10 +1779,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	hpage->mapping = mapping;

 	/*
-	 * At this point the hpage is locked and not up-to-date.
-	 * It's safe to insert it into the page cache, because nobody would
-	 * be able to map it or use it in another way until we unlock it.
+	 * Mark hpage as up-to-date before inserting it into the page cache to
+	 * prevent it from being mistaken for an fallocated but unwritten page.
+	 * Inserting the unfinished hpage into the page cache is safe because
+	 * it is locked, so nobody can map it or use it in another way until we
+	 * unlock it.
 	 */
+	SetPageUptodate(hpage);

 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {

From patchwork Tue Feb 14 07:57:10 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 13139583
From: David Stevens
To: linux-mm@kvack.org, Peter Xu, Matthew Wilcox
Cc: Andrew Morton, "Kirill A . Shutemov", Yang Shi, David Hildenbrand,
 Hugh Dickins, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH 2/2] mm/khugepaged: skip shmem with userfaultfd
Date: Tue, 14 Feb 2023 16:57:10 +0900
Message-Id: <20230214075710.2401855-2-stevensd@google.com>
In-Reply-To: <20230214075710.2401855-1-stevensd@google.com>
References: <20230214075710.2401855-1-stevensd@google.com>

From: David Stevens

Make sure that collapse_file respects any userfaultfds registered with
MODE_MISSING. If userspace has any such userfaultfds registered, then
for any page which it knows to be missing, it may expect a
UFFD_EVENT_PAGEFAULT. This means collapse_file needs to take care when
collapsing a shmem range would result in replacing an empty page with a
THP, so that it doesn't break userfaultfd.

Synchronization when checking for userfaultfds in collapse_file is
tricky because the mmap locks can't be used to prevent races with the
registration of new userfaultfds. Instead, we provide synchronization by
ensuring that userspace cannot observe the fact that pages are missing
before we check for userfaultfds. Although this allows registration of a
userfaultfd to race with collapse_file, it ensures that userspace cannot
observe any pages transition from missing to present after such a race.
This makes such a race indistinguishable from the collapse occurring
immediately before the userfaultfd registration.

The first step to provide this synchronization is to stop filling gaps
during the loop iterating over the target range, since the page cache
lock can be dropped during that loop. The second step is to fill the
gaps with XA_RETRY_ENTRY after the page cache lock is acquired the final
time, to avoid races with accesses to the page cache that only take the
RCU read lock.

This fix is targeted at khugepaged, but the change also applies to
MADV_COLLAPSE. MADV_COLLAPSE on a range with a userfaultfd will now
return EBUSY if there are any missing pages (instead of succeeding on
shmem and returning EINVAL on anonymous memory). There is also now a
window during MADV_COLLAPSE where a fault on a missing page will cause
the syscall to fail with EAGAIN.
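As a rough userspace-side sketch of the errno behavior described above, a
caller of MADV_COLLAPSE might retry on EAGAIN and skip the range on EBUSY.
The names classify_collapse_errno and try_collapse and the retry policy are
hypothetical, chosen only for illustration; the fallback definition of
MADV_COLLAPSE assumes the Linux UAPI value, for libc headers that predate it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* Linux 6.1+; older libc headers may lack it */
#endif

/* Hypothetical policy for a caller of MADV_COLLAPSE. */
enum collapse_action {
	COLLAPSE_DONE,	/* range was collapsed */
	COLLAPSE_RETRY,	/* transient failure, trying again might succeed */
	COLLAPSE_SKIP,	/* e.g. missing pages under an armed userfaultfd */
	COLLAPSE_FAIL,	/* retrying is unlikely to help */
};

enum collapse_action classify_collapse_errno(int err)
{
	switch (err) {
	case 0:
		return COLLAPSE_DONE;
	case EAGAIN:
		return COLLAPSE_RETRY;
	case EBUSY:
		return COLLAPSE_SKIP;
	default:
		return COLLAPSE_FAIL;
	}
}

/*
 * Call madvise(MADV_COLLAPSE) up to max_tries times, retrying only while
 * the failure is classified as transient. Returns the last classification;
 * COLLAPSE_RETRY means the retry budget ran out, and COLLAPSE_FAIL is
 * returned if max_tries is not positive.
 */
enum collapse_action try_collapse(void *addr, size_t len, int max_tries)
{
	enum collapse_action act = COLLAPSE_FAIL;

	while (max_tries-- > 0) {
		int err = madvise(addr, len, MADV_COLLAPSE) == 0 ? 0 : errno;

		act = classify_collapse_errno(err);
		if (act != COLLAPSE_RETRY)
			break;
	}
	return act;
}
```

Treating EBUSY as "skip" rather than "retry" reflects that, with this
change, the result is expected to persist for as long as the range has
missing pages covered by a userfaultfd.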

The fact that intermediate page cache state can no longer be observed
before the rollback of a failed collapse is also technically a
userspace-visible change (via at least SEEK_DATA and SEEK_END), but it
is exceedingly unlikely that anything relies on being able to observe
that transient state.

Signed-off-by: David Stevens
---
 mm/khugepaged.c | 66 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b648f1053d95..8c2e2349e883 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -55,6 +55,7 @@ enum scan_result {
 	SCAN_CGROUP_CHARGE_FAIL,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
+	SCAN_PAGE_FILLED,
 };

 #define CREATE_TRACE_POINTS
@@ -1725,8 +1726,8 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
  *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap/gup in pages if necessary;
- *    + fill in gaps;
  *    + keep old pages around in case rollback is required;
+ *  - finalize updates to the page cache;
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
@@ -1805,13 +1806,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 						result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
-					xas_set(&xas, index);
+					xas_set(&xas, index + 1);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
 					result = SCAN_FAIL;
 					goto xa_locked;
 				}
-				xas_store(&xas, hpage);
 				nr_none++;
 				continue;
 			}
@@ -1970,6 +1970,56 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		put_page(page);
 		goto xa_unlocked;
 	}
+
+	if (nr_none) {
+		struct vm_area_struct *vma;
+		int nr_none_check = 0;
+
+		xas_unlock_irq(&xas);
+		i_mmap_lock_read(mapping);
+		xas_lock_irq(&xas);
+
+		xas_set(&xas, start);
+		for (index = start; index < end; index++) {
+			if (!xas_next(&xas)) {
+				xas_store(&xas, XA_RETRY_ENTRY);
+				nr_none_check++;
+			}
+		}
+
+		if (nr_none != nr_none_check) {
+			result = SCAN_PAGE_FILLED;
+			goto immap_locked;
+		}
+
+		/*
+		 * If userspace observed a missing page in a VMA with an armed
+		 * userfaultfd, then it might expect a UFFD_EVENT_PAGEFAULT for
+		 * that page, so we need to roll back to avoid suppressing such
+		 * an event. Any userfaultfds armed after this point will not be
+		 * able to observe any missing pages due to the previously
+		 * inserted retry entries.
+		 */
+		vma_interval_tree_foreach(vma, &mapping->i_mmap, start, start) {
+			if (userfaultfd_missing(vma)) {
+				result = SCAN_EXCEED_NONE_PTE;
+				goto immap_locked;
+			}
+		}
+
+immap_locked:
+		i_mmap_unlock_read(mapping);
+		if (result != SCAN_SUCCEED) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+
+			goto xa_locked;
+		}
+	}
+
 	nr = thp_nr_pages(hpage);

 	if (is_shmem)
@@ -2068,15 +2118,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}

 		xas_set(&xas, start);
-		xas_for_each(&xas, page, end - 1) {
+		end = index;
+		for (index = start; index < end; index++) {
+			xas_next(&xas);
 			page = list_first_entry_or_null(&pagelist,
 					struct page, lru);
 			if (!page || xas.xa_index < page->index) {
-				if (!nr_none)
-					break;
 				nr_none--;
-				/* Put holes back where they were */
-				xas_store(&xas, NULL);
 				continue;
 			}

@@ -2592,11 +2640,13 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_ALLOC_HUGE_PAGE_FAIL:
 		return -ENOMEM;
 	case SCAN_CGROUP_CHARGE_FAIL:
+	case SCAN_EXCEED_NONE_PTE:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
+	case SCAN_PAGE_FILLED:
 		return -EAGAIN;
 	/*
 	 * Other: Trying again likely not to succeed / error intrinsic to