From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: lokeshgidra@google.com, aarcange@redhat.com, 21cnbao@gmail.com,
 v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com,
 willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
 hughd@google.com, jannh@google.com, kaleshsingh@google.com,
 surenb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Tue, 25 Feb 2025 12:46:13 -0800
Message-ID: <20250225204613.2316092-1-surenb@google.com>
Subject: [PATCH 1/1] userfaultfd: do not block on locking a large folio with raised refcount
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13990925

Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with raised folio refcount.
split_folio() expects the reference count to be exactly
mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
EAGAIN otherwise.

If multiple processes are trying to move the same large folio, they all
raise the refcount (every task succeeds in that), then one of them succeeds
in locking the folio while the others block in folio_lock(), keeping the
refcount raised. The winner of this race proceeds to call split_folio(),
which fails, so the winner returns EAGAIN to the caller and unlocks the
folio. The next competing process then gets the folio locked and goes
through the same flow. In the meantime the original winner is retried and
blocks in folio_lock(), joining the queue of waiting processes only to
repeat the same path. All this results in a livelock.

An easy fix would be to avoid waiting for the folio lock while holding a
folio refcount, similar to madvise_free_huge_pmd(), where the folio lock is
acquired before raising the folio refcount. Modify move_pages_pte() to try
locking the folio first and, if that fails and the folio is large, return
EAGAIN without touching the folio refcount. If the folio is single-page
then split_folio() is not called, so we don't have this issue.

Lokesh has a reproducer [1] and I verified that this change fixes the
issue.

[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock

Reported-by: Lokesh Gidra
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
---
 mm/userfaultfd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

base-commit: 801d47bd96ce22acd43809bc09e004679f707c39

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 867898c4e30b..f17f8290c523 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1236,6 +1236,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		 */
 		if (!src_folio) {
 			struct folio *folio;
+			bool locked;
 
 			/*
 			 * Pin the page while holding the lock to be sure the
@@ -1255,12 +1256,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 				goto out;
 			}
 
+			locked = folio_trylock(folio);
+			/*
+			 * We avoid waiting for folio lock with a raised refcount
+			 * for large folios because extra refcounts will result in
+			 * split_folio() failing later and retrying. If multiple
+			 * tasks are trying to move a large folio we can end
+			 * livelocking.
+			 */
+			if (!locked && folio_test_large(folio)) {
+				spin_unlock(src_ptl);
+				err = -EAGAIN;
+				goto out;
+			}
+
 			folio_get(folio);
 			src_folio = folio;
 			src_folio_pte = orig_src_pte;
 			spin_unlock(src_ptl);
 
-			if (!folio_trylock(src_folio)) {
+			if (!locked) {
 				pte_unmap(&orig_src_pte);
 				pte_unmap(&orig_dst_pte);
 				src_pte = dst_pte = NULL;
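
For readers who want to experiment with the ordering outside the kernel,
here is a rough userspace sketch of the idea behind the patch. It is not
kernel code: struct toy_folio, toy_split() and toy_move() are hypothetical
names invented for illustration, a pthread mutex stands in for the folio
lock, and a C11 atomic stands in for the folio refcount. The split check is
simplified to "refcount must equal the baseline references plus the
caller's own pin", which only loosely mirrors can_split_folio().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio; not the kernel type. */
struct toy_folio {
	pthread_mutex_t lock;	/* plays the role of the folio lock */
	atomic_int refcount;	/* plays the role of the folio refcount */
	int base_refs;		/* references a split tolerates (mappings etc.) */
	bool large;
};

/*
 * Simplified split check: succeed only when the refcount is exactly the
 * baseline plus the caller's own pin, otherwise report -EAGAIN.
 */
static int toy_split(struct toy_folio *f)
{
	int refs = atomic_load(&f->refcount);

	assert(refs >= f->base_refs + 1);	/* caller must hold a pin */
	return refs == f->base_refs + 1 ? 0 : -EAGAIN;
}

/*
 * Ordering used by the patch: try the lock first and, for a large folio,
 * back off with -EAGAIN before the refcount is raised.
 */
static int toy_move(struct toy_folio *f)
{
	bool locked = pthread_mutex_trylock(&f->lock) == 0;
	int err = 0;

	if (!locked && f->large)
		return -EAGAIN;			/* refcount left untouched */

	atomic_fetch_add(&f->refcount, 1);	/* take our own pin */
	if (!locked)
		pthread_mutex_lock(&f->lock);	/* small folio: no split, waiting is safe */

	if (f->large)
		err = toy_split(f);

	pthread_mutex_unlock(&f->lock);
	atomic_fetch_sub(&f->refcount, 1);	/* drop our pin */
	return err;
}
```

In this model, a caller that finds a large folio locked returns -EAGAIN
with the refcount unchanged, so whoever holds the lock still sees the
count its split expects. If a blocked waiter's extra pin is simulated by
bumping refcount by hand, toy_split() fails, which is the condition the
commit message describes.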