From patchwork Sat Sep 30 03:27:53 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13404928
Date: Fri, 29 Sep 2023 20:27:53 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Christian Brauner, Carlos Maiolino, Chuck Lever, Jan Kara,
    Matthew Wilcox, Johannes Weiner, Axel Rasmussen,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [PATCH 3/8] shmem: factor shmem_falloc_wait() out of shmem_fault()
Message-ID: <6fe379a4-6176-9225-9263-fe60d2633c0@google.com>

That Trinity livelock shmem_falloc avoidance block is unlikely, and a
distraction from the proper business of shmem_fault(): separate it out.
(This used to help compilers save stack on the fault path too, but both
gcc and clang nowadays seem to make better choices anyway.)

Signed-off-by: Hugh Dickins
Reviewed-by: Jan Kara
---
 mm/shmem.c | 126 +++++++++++++++++++++++++++++------------------------
 1 file changed, 69 insertions(+), 57 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 824eb55671d2..5501a5bc8d8c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2148,87 +2148,99 @@ int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
  * entry unconditionally - even if something else had already woken the
  * target.
  */
-static int synchronous_wake_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
+static int synchronous_wake_function(wait_queue_entry_t *wait,
+			unsigned int mode, int sync, void *key)
 {
 	int ret = default_wake_function(wait, mode, sync, key);
 	list_del_init(&wait->entry);
 	return ret;
 }
 
+/*
+ * Trinity finds that probing a hole which tmpfs is punching can
+ * prevent the hole-punch from ever completing: which in turn
+ * locks writers out with its hold on i_rwsem. So refrain from
+ * faulting pages into the hole while it's being punched. Although
+ * shmem_undo_range() does remove the additions, it may be unable to
+ * keep up, as each new page needs its own unmap_mapping_range() call,
+ * and the i_mmap tree grows ever slower to scan if new vmas are added.
+ *
+ * It does not matter if we sometimes reach this check just before the
+ * hole-punch begins, so that one fault then races with the punch:
+ * we just need to make racing faults a rare case.
+ *
+ * The implementation below would be much simpler if we just used a
+ * standard mutex or completion: but we cannot take i_rwsem in fault,
+ * and bloating every shmem inode for this unlikely case would be sad.
+ */
+static vm_fault_t shmem_falloc_wait(struct vm_fault *vmf, struct inode *inode)
+{
+	struct shmem_falloc *shmem_falloc;
+	struct file *fpin = NULL;
+	vm_fault_t ret = 0;
+
+	spin_lock(&inode->i_lock);
+	shmem_falloc = inode->i_private;
+	if (shmem_falloc &&
+	    shmem_falloc->waitq &&
+	    vmf->pgoff >= shmem_falloc->start &&
+	    vmf->pgoff < shmem_falloc->next) {
+		wait_queue_head_t *shmem_falloc_waitq;
+		DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function);
+
+		ret = VM_FAULT_NOPAGE;
+		fpin = maybe_unlock_mmap_for_io(vmf, NULL);
+		shmem_falloc_waitq = shmem_falloc->waitq;
+		prepare_to_wait(shmem_falloc_waitq, &shmem_fault_wait,
+				TASK_UNINTERRUPTIBLE);
+		spin_unlock(&inode->i_lock);
+		schedule();
+
+		/*
+		 * shmem_falloc_waitq points into the shmem_fallocate()
+		 * stack of the hole-punching task: shmem_falloc_waitq
+		 * is usually invalid by the time we reach here, but
+		 * finish_wait() does not dereference it in that case;
+		 * though i_lock needed lest racing with wake_up_all().
+		 */
+		spin_lock(&inode->i_lock);
+		finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
+	}
+	spin_unlock(&inode->i_lock);
+	if (fpin) {
+		fput(fpin);
+		ret = VM_FAULT_RETRY;
+	}
+	return ret;
+}
+
 static vm_fault_t shmem_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
-	struct inode *inode = file_inode(vma->vm_file);
+	struct inode *inode = file_inode(vmf->vma->vm_file);
 	gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
 	struct folio *folio = NULL;
+	vm_fault_t ret = 0;
 	int err;
-	vm_fault_t ret = VM_FAULT_LOCKED;
 
 	/*
 	 * Trinity finds that probing a hole which tmpfs is punching can
-	 * prevent the hole-punch from ever completing: which in turn
-	 * locks writers out with its hold on i_rwsem. So refrain from
-	 * faulting pages into the hole while it's being punched. Although
-	 * shmem_undo_range() does remove the additions, it may be unable to
-	 * keep up, as each new page needs its own unmap_mapping_range() call,
-	 * and the i_mmap tree grows ever slower to scan if new vmas are added.
-	 *
-	 * It does not matter if we sometimes reach this check just before the
-	 * hole-punch begins, so that one fault then races with the punch:
-	 * we just need to make racing faults a rare case.
-	 *
-	 * The implementation below would be much simpler if we just used a
-	 * standard mutex or completion: but we cannot take i_rwsem in fault,
-	 * and bloating every shmem inode for this unlikely case would be sad.
+	 * prevent the hole-punch from ever completing: noted in i_private.
 	 */
 	if (unlikely(inode->i_private)) {
-		struct shmem_falloc *shmem_falloc;
-
-		spin_lock(&inode->i_lock);
-		shmem_falloc = inode->i_private;
-		if (shmem_falloc &&
-		    shmem_falloc->waitq &&
-		    vmf->pgoff >= shmem_falloc->start &&
-		    vmf->pgoff < shmem_falloc->next) {
-			struct file *fpin;
-			wait_queue_head_t *shmem_falloc_waitq;
-			DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function);
-
-			ret = VM_FAULT_NOPAGE;
-			fpin = maybe_unlock_mmap_for_io(vmf, NULL);
-			if (fpin)
-				ret = VM_FAULT_RETRY;
-
-			shmem_falloc_waitq = shmem_falloc->waitq;
-			prepare_to_wait(shmem_falloc_waitq, &shmem_fault_wait,
-					TASK_UNINTERRUPTIBLE);
-			spin_unlock(&inode->i_lock);
-			schedule();
-
-			/*
-			 * shmem_falloc_waitq points into the shmem_fallocate()
-			 * stack of the hole-punching task: shmem_falloc_waitq
-			 * is usually invalid by the time we reach here, but
-			 * finish_wait() does not dereference it in that case;
-			 * though i_lock needed lest racing with wake_up_all().
-			 */
-			spin_lock(&inode->i_lock);
-			finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
-			spin_unlock(&inode->i_lock);
-
-			if (fpin)
-				fput(fpin);
+		ret = shmem_falloc_wait(vmf, inode);
+		if (ret)
 			return ret;
-		}
-		spin_unlock(&inode->i_lock);
 	}
 
+	WARN_ON_ONCE(vmf->page != NULL);
 	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
 				  gfp, vmf, &ret);
 	if (err)
 		return vmf_error(err);
-	if (folio)
+	if (folio) {
 		vmf->page = folio_file_page(folio, vmf->pgoff);
+		ret |= VM_FAULT_LOCKED;
+	}
 	return ret;
 }
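
For readers who want to see the scenario that comment describes from
userspace, here is a minimal, hypothetical sketch (not part of the patch,
and not what Trinity itself does): one thread keeps faulting pages of a
mapped tmpfs file while the main thread keeps punching a hole over the
same range with fallocate(). The /dev/shm path, region size and loop
structure are arbitrary choices for illustration only.

/* build: cc -O2 -pthread punch-race.c */
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN	(64UL << 20)	/* 64MB region: size is arbitrary */

static char *map;

static void *fault_loop(void *arg)
{
	for (;;)
		for (size_t off = 0; off < LEN; off += 4096)
			map[off] = 1;	/* each write faults a page back in */
	return NULL;
}

int main(void)
{
	/* assumes /dev/shm is tmpfs, as it is on typical distros */
	int fd = open("/dev/shm/punch-test", O_RDWR | O_CREAT, 0600);
	pthread_t t;

	if (fd < 0 || ftruncate(fd, LEN) < 0)
		err(1, "open/ftruncate");
	map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		err(1, "mmap");
	pthread_create(&t, NULL, fault_loop, NULL);

	for (;;)	/* hole-punch the whole range, over and over */
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  0, LEN);
}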