From patchwork Tue Aug 17 08:17:41 2021
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 12440857
Date: Tue, 17 Aug 2021 01:17:41 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Hugh Dickins, Shakeel Butt, "Kirill A. Shutemov", Yang Shi,
    Miaohe Lin, Mike Kravetz, Michal Hocko, Rik van Riel, Matthew Wilcox,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 6/9] huge tmpfs: SGP_NOALLOC to stop collapse_file() on race
Message-ID: <1355343b-acf-4653-ef79-6aee40214ac5@google.com>

khugepaged's collapse_file() currently uses SGP_NOHUGE to tell
shmem_getpage() not to try allocating a huge page, in the very unlikely
event that a racing hole-punch removes the swapped or fallocated page as
soon as i_pages lock is dropped.

We want to consolidate shmem's huge decisions, removing SGP_HUGE and
SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress
the protection in this case - Yang Shi points out that the huge page
would remain indefinitely, charged to root instead of the intended memcg.

collapse_file() should not even allocate a small page in this case: why
proceed if someone is punching a hole?
SGP_READ is almost the right flag here, except that it optimizes away
from a fallocated page, with NULL to tell caller to fill with zeroes
(like a hole); whereas collapse_file()'s sequence relies on using a
cache page.  Add SGP_NOALLOC just for this.

There are too many consecutive "if (page"s there in shmem_getpage_gfp():
group it better; and fix the outdated "bring it back from swap" comment.

Signed-off-by: Hugh Dickins
Reviewed-by: Yang Shi
---
 include/linux/shmem_fs.h |  1 +
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 29 +++++++++++++++++------------
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9b7f7ac52351..7d97b15a2f7a 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -94,6 +94,7 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
+	SGP_NOALLOC,	/* similar, but fail on hole or use fallocated page */
 	SGP_CACHE,	/* don't exceed i_size, may allocate page */
 	SGP_NOHUGE,	/* like SGP_CACHE, but no huge pages */
 	SGP_HUGE,	/* like SGP_CACHE, huge pages preferred */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0412be08fa2..045cc579f724 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
 				if (shmem_getpage(mapping->host, index, &page,
-						  SGP_NOHUGE)) {
+						  SGP_NOALLOC)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
diff --git a/mm/shmem.c b/mm/shmem.c
index 740d48ef1eb5..226ac3a911e9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1871,26 +1871,31 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		return error;
 	}
 
-	if (page)
+	if (page) {
 		hindex = page->index;
-	if (page && sgp == SGP_WRITE)
-		mark_page_accessed(page);
-
-	/* fallocated page? */
-	if (page && !PageUptodate(page)) {
+		if (sgp == SGP_WRITE)
+			mark_page_accessed(page);
+		if (PageUptodate(page))
+			goto out;
+		/* fallocated page */
 		if (sgp != SGP_READ)
 			goto clear;
 		unlock_page(page);
 		put_page(page);
-		page = NULL;
-		hindex = index;
 	}
-	if (page || sgp == SGP_READ)
-		goto out;
 
 	/*
-	 * Fast cache lookup did not find it:
-	 * bring it back from swap or allocate.
+	 * SGP_READ: succeed on hole, with NULL page, letting caller zero.
+	 * SGP_NOALLOC: fail on hole, with NULL page, letting caller fail.
+	 */
+	*pagep = NULL;
+	if (sgp == SGP_READ)
+		return 0;
+	if (sgp == SGP_NOALLOC)
+		return -ENOENT;
+
+	/*
+	 * Fast cache lookup and swap lookup did not find it: allocate.
 	 */
 
 	if (vma && userfaultfd_missing(vma)) {
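
For review purposes only, the flag semantics in the regrouped
shmem_getpage_gfp() block can be sketched as a small userspace model.
This is not kernel code: slot_state, outcome, model_getpage() and
model_selfcheck() are hypothetical names invented for illustration, the
enum here is a cut-down copy of sgp_type, and swap lookup, locking and
the actual allocation are left out.  It also assumes that the existing
"clear" label zeroes the fallocated page and returns it.

```c
#include <assert.h>
#include <errno.h>

/* Cut-down copy of the flags touched by the patch (illustration only). */
enum sgp_type { SGP_READ, SGP_NOALLOC, SGP_CACHE };

/* Hypothetical: what the page cache lookup found at the index. */
enum slot_state { SLOT_HOLE, SLOT_FALLOCATED, SLOT_UPTODATE };

/* Hypothetical: what the caller ends up with. */
enum outcome { GOT_PAGE, GOT_CLEARED_PAGE, GOT_NULL, MUST_ALLOCATE };

/*
 * Model of the decision made after the cache lookup.
 * Returns 0 or -ENOENT, mirroring the return values added by the patch.
 */
static int model_getpage(enum slot_state slot, enum sgp_type sgp,
			 enum outcome *out)
{
	if (slot != SLOT_HOLE) {
		if (slot == SLOT_UPTODATE) {
			*out = GOT_PAGE;		/* "goto out" */
			return 0;
		}
		/* fallocated page */
		if (sgp != SGP_READ) {
			*out = GOT_CLEARED_PAGE;	/* "goto clear" (assumed to zero it) */
			return 0;
		}
		/* SGP_READ drops the fallocated page and treats it as a hole */
	}

	*out = GOT_NULL;
	if (sgp == SGP_READ)
		return 0;		/* caller fills with zeroes */
	if (sgp == SGP_NOALLOC)
		return -ENOENT;		/* caller gives up, e.g. collapse_file() */

	*out = MUST_ALLOCATE;		/* SGP_CACHE and friends carry on */
	return 0;
}

/* Usage: the two cases the commit message contrasts. */
static void model_selfcheck(void)
{
	enum outcome out;

	/* Racing hole-punch: SGP_NOALLOC fails instead of allocating. */
	assert(model_getpage(SLOT_HOLE, SGP_NOALLOC, &out) == -ENOENT);
	assert(out == GOT_NULL);

	/* Fallocated page: SGP_NOALLOC uses it, SGP_READ reports a hole. */
	assert(model_getpage(SLOT_FALLOCATED, SGP_NOALLOC, &out) == 0);
	assert(out == GOT_CLEARED_PAGE);
	assert(model_getpage(SLOT_FALLOCATED, SGP_READ, &out) == 0);
	assert(out == GOT_NULL);
}
```

In this model SGP_NOALLOC differs from SGP_READ in exactly two places:
it keeps a fallocated page rather than discarding it, and it returns
-ENOENT on a hole rather than success with a NULL page.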