From patchwork Wed Feb 26 18:55:08 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13992984
Date: Wed, 26 Feb 2025 10:55:08 -0800
In-Reply-To: <20250226185510.2732648-1-surenb@google.com>
References: <20250226185510.2732648-1-surenb@google.com>
Message-ID: <20250226185510.2732648-2-surenb@google.com>
Subject: [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: lokeshgidra@google.com, aarcange@redhat.com, 21cnbao@gmail.com,
 v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com,
 willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
 hughd@google.com, jannh@google.com, kaleshsingh@google.com,
 surenb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org

Lokesh recently raised an issue about UFFDIO_MOVE getting into a livelock
when it enters split_folio() with a raised folio refcount. split_folio()
expects the reference count to be exactly
mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
EAGAIN otherwise.

If multiple processes are trying to move the same large folio, they all
raise the refcount (which always succeeds), then one of them succeeds in
locking the folio while the others block in folio_lock() with their
refcounts still raised. The winner of this race proceeds to call
split_folio(), which fails because of the extra references held by the
waiters; the winner then unlocks the folio and returns EAGAIN to the
caller. The next competing process gets the folio lock and goes through
the same flow. In the meantime the original winner is retried and blocks
in folio_lock(), joining the queue of waiting processes only to repeat
the same path. All this results in a livelock.

An easy fix is to avoid waiting for the folio lock while holding a folio
refcount, similar to madvise_free_huge_pmd(), where the folio lock is
acquired before the folio refcount is raised. Since we lock the folio and
take a refcount on it while holding the PTE lock, changing the order of
these operations should not break anything.

Modify move_pages_pte() to try locking the folio first; if that fails and
the folio is large, return EAGAIN without touching the folio refcount. If
the folio is a single page, split_folio() is not called, so this issue
does not arise.

Lokesh has a reproducer [1] and I verified that this change fixes the
issue.

[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock

Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: Lokesh Gidra
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
Cc: stable@vger.kernel.org
Acked-by: Liam R. Howlett
---
Note: this patch is v2 of [2], but I did not bump the version because it
is now part of a patchset that is at v1. Hopefully that's not too
confusing.

Changes since v1 [2]:
- Rebased over mm-hotfixes-unstable to avoid merge conflicts with [3]
- Added Reviewed-by, per Peter Xu
- Added a note about the PTE lock (PTL) in the changelog, per Liam R. Howlett
- CC'ed stable

[2] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/
[3] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/

 mm/userfaultfd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8eae4ea3cafd..e0f1e38ac5d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1250,6 +1250,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	 */
 	if (!src_folio) {
 		struct folio *folio;
+		bool locked;
 
 		/*
 		 * Pin the page while holding the lock to be sure the
@@ -1269,12 +1270,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto out;
 		}
 
+		locked = folio_trylock(folio);
+		/*
+		 * We avoid waiting for folio lock with a raised refcount
+		 * for large folios because extra refcounts will result in
+		 * split_folio() failing later and retrying. If multiple
+		 * tasks are trying to move a large folio we can end up
+		 * livelocking.
+		 */
+		if (!locked && folio_test_large(folio)) {
+			spin_unlock(src_ptl);
+			err = -EAGAIN;
+			goto out;
+		}
+
 		folio_get(folio);
 		src_folio = folio;
 		src_folio_pte = orig_src_pte;
 		spin_unlock(src_ptl);
 
-		if (!folio_trylock(src_folio)) {
+		if (!locked) {
 			pte_unmap(&orig_src_pte);
 			pte_unmap(&orig_dst_pte);
 			src_pte = dst_pte = NULL;
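
The effect of swapping the lock/refcount order can be illustrated with a
small single-threaded userspace model in C. This is only a sketch under
stated assumptions, not kernel code: struct toy_folio, toy_trylock(),
toy_can_split() and contend() are hypothetical names, the folio is assumed
to be large, and the split rule simply encodes the condition as the
changelog states it (refcount == mapcount + num_pages_in_folio + 1). The
model captures only the moment the lock holder calls split; real
contenders sleep and retry concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of the race described in the changelog.
 * Nothing here is kernel API: struct toy_folio, toy_trylock(),
 * toy_can_split() and contend() are hypothetical names for illustration.
 * The folio is assumed to be large (single-page folios are never split).
 */
struct toy_folio {
	int refcount;	/* baseline references plus temporary pins */
	int mapcount;
	int nr_pages;
	bool locked;
};

/* Split rule as stated in the changelog: the only extra reference allowed
 * is the caller's own pin, i.e. refcount == mapcount + nr_pages + 1. */
static bool toy_can_split(const struct toy_folio *f)
{
	return f->refcount == f->mapcount + f->nr_pages + 1;
}

/* Non-blocking lock attempt, standing in for folio_trylock(). */
static bool toy_trylock(struct toy_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

/*
 * ncontenders tasks arrive at once; returns true if the task that ends up
 * holding the lock would be able to split the folio.
 *
 * pin_first == true:  old order. Every task pins the folio, then the losers
 *                     sleep in the lock while keeping their pins.
 * pin_first == false: fixed order. Each task trylocks first; a loser backs
 *                     off with EAGAIN without raising the refcount.
 */
static bool contend(struct toy_folio *f, int ncontenders, bool pin_first)
{
	int pinned = 0;
	bool split_ok;

	for (int i = 0; i < ncontenders; i++) {
		if (pin_first) {
			f->refcount++;		/* folio_get() always succeeds */
			pinned++;
			(void)toy_trylock(f);	/* losers block, pin still held */
		} else if (toy_trylock(f)) {
			f->refcount++;		/* only the lock holder pins */
			pinned++;
		}
		/* else: lock busy on a large folio -> back off, no pin taken */
	}

	split_ok = toy_can_split(f);

	/* Drop the lock and all pins so the model can be reused. */
	f->locked = false;
	f->refcount -= pinned;
	/* Sanity check: never drop below the baseline references. */
	assert(f->refcount >= f->mapcount + f->nr_pages);
	return split_ok;
}
```

For example, with a 16-page folio mapped once (baseline refcount 17 in
this model) and three contenders, contend(&f, 3, true) returns false
because the lock holder sees a refcount of 20, while contend(&f, 3, false)
returns true because only the lock holder pinned the folio (refcount 18).
With a single contender both orders succeed, which is why the problem only
shows up when several tasks move the same large folio.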