From patchwork Mon Feb 10 19:37:43 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968977
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe, John Hubbard,
    stable@vger.kernel.org
Subject: [PATCH v2 01/17] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Mon, 10 Feb 2025 20:37:43 +0100
Message-ID: <20250210193801.781278-2-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

We only have two FOLL_SPLIT_PMD users. While uprobe refuses hugetlb
early, make_device_exclusive_range() can end up getting called on
hugetlb VMAs.

Right now, this means that with a PMD-sized hugetlb page, we can end up
calling split_huge_pmd(), because pmd_trans_huge() also succeeds with
hugetlb PMDs.

For example, using a modified hmm-test selftest one can trigger:

[ 207.017134][T14945] ------------[ cut here ]------------
[ 207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[ 207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 207.021072][T14945] CPU: 3 UID: 0 PID: ...
[ 207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[ 207.026128][T14945] Code: ...
[ 207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[ 207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[ 207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[ 207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[ 207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[ 207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[ 207.038711][T14945] FS: 00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[ 207.040407][T14945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[ 207.043196][T14945] PKRU: 55555554
[ 207.043880][T14945] Call Trace:
[ 207.044506][T14945]
[ 207.045086][T14945] ? __die+0x51/0x92
[ 207.045864][T14945] ? die+0x29/0x50
[ 207.046596][T14945] ? do_trap+0x250/0x320
[ 207.047430][T14945] ? do_error_trap+0xe7/0x220
[ 207.048346][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.049535][T14945] ? handle_invalid_op+0x34/0x40
[ 207.050494][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.051681][T14945] ? exc_invalid_op+0x2e/0x50
[ 207.052589][T14945] ? asm_exc_invalid_op+0x1a/0x20
[ 207.053596][T14945] ? page_table_check_clear.part.0+0x1fd/0x510
[ 207.054790][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.055993][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.057195][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.058384][T14945] __page_table_check_pmd_clear+0x34b/0x5a0
[ 207.059524][T14945] ? __pfx___page_table_check_pmd_clear+0x10/0x10
[ 207.060775][T14945] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 207.061940][T14945] ? __pfx___lock_acquire+0x10/0x10
[ 207.062967][T14945] pmdp_huge_clear_flush+0x279/0x360
[ 207.064024][T14945] split_huge_pmd_locked+0x82b/0x3750
...

Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the caller
is likely not prepared to handle any hugetlb folios.

We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Reviewed-by: John Hubbard
Reviewed-by: Alistair Popple
Cc:
Signed-off-by: David Hildenbrand
---
 mm/gup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..61e751baf862c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+		return -EOPNOTSUPP;
+
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
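For context, here is a minimal sketch (not part of the patch) of how a
FOLL_SPLIT_PMD user now observes the change: on a hugetlb VMA the GUP call
fails up front in check_vma_flags() instead of reaching split_huge_pmd().
The helper name demo_pin_and_split_pmd() is made up for illustration.

/* Sketch only: exercises GUP the way make_device_exclusive_range() does. */
static int demo_pin_and_split_pmd(struct mm_struct *mm, unsigned long addr,
				  struct page **page)
{
	long npages;

	mmap_read_lock(mm);
	npages = get_user_pages_remote(mm, addr, 1,
				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
				       page, NULL);
	mmap_read_unlock(mm);

	/* With this patch, a hugetlb VMA yields -EOPNOTSUPP here. */
	if (npages < 0)
		return npages;
	return npages == 1 ? 0 : -EFAULT;
}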
From patchwork Mon Feb 10 19:37:44 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968978
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe,
    stable@vger.kernel.org
Subject: [PATCH v2 02/17] mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:44 +0100
Message-ID: <20250210193801.781278-3-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be reworked.

In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. Note that hugetlb folios that are not PMD-sized would
never have been prone to FOLL_SPLIT_PMD.

hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.

Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Reviewed-by: Alistair Popple
Cc:
Signed-off-by: David Hildenbrand
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7e..17fbfa61f7efb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(struct folio *folio,
 	 * Restrict to anonymous folios for now to avoid potential writeback
 	 * issues.
 	 */
-	if (!folio_test_anon(folio))
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
 		return false;
 
 	rmap_walk(folio, &rwc);
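For background on the "hugetlb folios can be anonymous" point above, a small
userspace sketch (assuming a hugetlb pool has been configured, e.g. via
/proc/sys/vm/nr_hugepages) that creates exactly such an anonymous hugetlb
mapping; a driver path calling make_device_exclusive_range() on this range
is what the check above now rejects:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;	/* one 2 MiB hugetlb page */

	/* Anonymous hugetlb folio: anon, but not something FOLL_SPLIT_PMD
	 * or the device-exclusive rmap code is prepared to handle. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return 1;
	}
	memset(p, 0, len);
	munmap(p, len);
	return 0;
}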
From patchwork Mon Feb 10 19:37:45 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968979
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe, Simona Vetter
Subject: [PATCH v2 03/17] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:45 +0100
Message-ID: <20250210193801.781278-4-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

The single "real" in-tree user of make_device_exclusive_range() always
requests making only a single address exclusive. The current implementation
is hard to fix for properly supporting anonymous THP / large folios and for
avoiding messing with rmap walks in weird ways.

So let's always process a single address/page and return folio + page to
minimize page -> folio lookups. This is a preparation for further changes.

Reject any non-anonymous or hugetlb folios early, directly after GUP.

While at it, extend the documentation of make_device_exclusive() to clarify
some things.
Acked-by: Simona Vetter
Reviewed-by: Alistair Popple
Signed-off-by: David Hildenbrand
---
 Documentation/mm/hmm.rst                    |   2 +-
 Documentation/translations/zh_CN/mm/hmm.rst |   2 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c       |   5 +-
 include/linux/mmu_notifier.h                |   2 +-
 include/linux/rmap.h                        |   5 +-
 lib/test_hmm.c                              |  41 +++-----
 mm/rmap.c                                   | 103 ++++++++++++--------
 7 files changed, 83 insertions(+), 77 deletions(-)

diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst
index f6d53c37a2ca8..7d61b7a8b65b7 100644
--- a/Documentation/mm/hmm.rst
+++ b/Documentation/mm/hmm.rst
@@ -400,7 +400,7 @@ Exclusive access memory
 Some devices have features such as atomic PTE bits that can be used to implement
 atomic access to system memory. To support atomic operations to a shared virtual
 memory page such a device needs access to that page which is exclusive of any
-userspace access from the CPU. The ``make_device_exclusive_range()`` function
+userspace access from the CPU. The ``make_device_exclusive()`` function
 can be used to make a memory range inaccessible from userspace.
 
 This replaces all mappings for pages in the given range with special swap
diff --git a/Documentation/translations/zh_CN/mm/hmm.rst b/Documentation/translations/zh_CN/mm/hmm.rst
index 0669f947d0bc9..22c210f4e94f3 100644
--- a/Documentation/translations/zh_CN/mm/hmm.rst
+++ b/Documentation/translations/zh_CN/mm/hmm.rst
@@ -326,7 +326,7 @@ devm_memunmap_pages() 和 devm_release_mem_region() 当资源可以绑定到 ``s
 
 一些设备具有诸如原子PTE位的功能,可以用来实现对系统内存的原子访问。为了支持对一
 个共享的虚拟内存页的原子操作,这样的设备需要对该页的访问是排他的,而不是来自CPU
-的任何用户空间访问。 ``make_device_exclusive_range()`` 函数可以用来使一
+的任何用户空间访问。 ``make_device_exclusive()`` 函数可以用来使一
 个内存范围不能从用户空间访问。
 
 这将用特殊的交换条目替换给定范围内的所有页的映射。任何试图访问交换条目的行为都会
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index b4da82ddbb6b2..39e3740980bb7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -609,10 +609,9 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 		notifier_seq = mmu_interval_read_begin(&notifier->notifier);
 		mmap_read_lock(mm);
-		ret = make_device_exclusive_range(mm, start, start + PAGE_SIZE,
-						  &page, drm->dev);
+		page = make_device_exclusive(mm, start, drm->dev, &folio);
 		mmap_read_unlock(mm);
-		if (ret <= 0 || !page) {
+		if (IS_ERR(page)) {
 			ret = -EINVAL;
 			goto out;
 		}
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index e2dd57ca368b0..d4e7146618262 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -46,7 +46,7 @@ struct mmu_interval_notifier;
  * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
  * longer have exclusive access to the page. When sent during creation of an
  * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive_range(), otherwise the owner will be NULL.
+ * caller of make_device_exclusive(), otherwise the owner will be NULL.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 683a04088f3f2..86425d42c1a90 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -663,9 +663,8 @@ int folio_referenced(struct folio *, int is_locked,
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *arg);
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 056f2e411d7b4..e4afca8d18802 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -780,10 +780,8 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
 	struct mm_struct *mm = dmirror->notifier.mm;
-	struct page *pages[64];
 	struct dmirror_bounce bounce;
-	unsigned long next;
-	int ret;
+	int ret = 0;
 
 	start = cmd->addr;
 	end = start + size;
@@ -795,36 +793,27 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 		return -EINVAL;
 
 	mmap_read_lock(mm);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long mapped = 0;
-		int i;
-
-		next = min(end, addr + (ARRAY_SIZE(pages) << PAGE_SHIFT));
+	for (addr = start; !ret && addr < end; addr += PAGE_SIZE) {
+		struct folio *folio;
+		struct page *page;
 
-		ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
-		/*
-		 * Do dmirror_atomic_map() iff all pages are marked for
-		 * exclusive access to avoid accessing uninitialized
-		 * fields of pages.
-		 */
-		if (ret == (next - addr) >> PAGE_SHIFT)
-			mapped = dmirror_atomic_map(addr, next, pages, dmirror);
-		for (i = 0; i < ret; i++) {
-			if (pages[i]) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
+		page = make_device_exclusive(mm, addr, NULL, &folio);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			break;
 		}
 
-		if (addr + (mapped << PAGE_SHIFT) < next) {
-			mmap_read_unlock(mm);
-			mmput(mm);
-			return -EBUSY;
-		}
+		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
+		ret = ret == 1 ? 0 : -EBUSY;
+		folio_unlock(folio);
+		folio_put(folio);
 	}
 	mmap_read_unlock(mm);
 	mmput(mm);
 
+	if (ret)
+		return ret;
+
 	/* Return the migrated data for verification. */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
diff --git a/mm/rmap.c b/mm/rmap.c
index 17fbfa61f7efb..7ccf850565d33 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2495,70 +2495,89 @@ static bool folio_make_device_exclusive(struct folio *folio,
 		.arg = &args,
 	};
 
-	/*
-	 * Restrict to anonymous folios for now to avoid potential writeback
-	 * issues.
-	 */
-	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
-		return false;
-
 	rmap_walk(folio, &rwc);
 
 	return args.valid && !folio_mapcount(folio);
 }
 
 /**
- * make_device_exclusive_range() - Mark a range for exclusive use by a device
+ * make_device_exclusive() - Mark a page for exclusive use by a device
  * @mm: mm_struct of associated target process
- * @start: start of the region to mark for exclusive device access
- * @end: end address of region
- * @pages: returns the pages which were successfully marked for exclusive access
+ * @addr: the virtual address to mark for exclusive device access
  * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
+ * @foliop: folio pointer will be stored here on success.
+ *
+ * This function looks up the page mapped at the given address, grabs a
+ * folio reference, locks the folio and replaces the PTE with special
+ * device-exclusive PFN swap entry, preventing access through the process
+ * page tables. The function will return with the folio locked and referenced.
  *
- * Returns: number of pages found in the range by GUP. A page is marked for
- * exclusive access only if the page pointer is non-NULL.
+ * On fault, the device-exclusive entries are replaced with the original PTE
+ * under folio lock, after calling MMU notifiers.
  *
- * This function finds ptes mapping page(s) to the given address range, locks
- * them and replaces mappings with special swap entries preventing userspace CPU
- * access. On fault these entries are replaced with the original mapping after
- * calling MMU notifiers.
+ * Only anonymous non-hugetlb folios are supported and the VMA must have
+ * write permissions such that we can fault in the anonymous page writable
+ * in order to mark it exclusive. The caller must hold the mmap_lock in read
+ * mode.
  *
 * A driver using this to program access from a device must use a mmu notifier
 * critical section to hold a device specific lock during programming. Once
- * programming is complete it should drop the page lock and reference after
+ * programming is complete it should drop the folio lock and reference after
 * which point CPU access to the page will revoke the exclusive access.
+ *
+ * Notes:
+ * #. This function always operates on individual PTEs mapping individual
+ *    pages. PMD-sized THPs are first remapped to be mapped by PTEs before
+ *    the conversion happens on a single PTE corresponding to @addr.
+ * #. While concurrent access through the process page tables is prevented,
+ *    concurrent access through other page references (e.g., earlier GUP
+ *    invocation) is not handled and not supported.
+ * #. device-exclusive entries are considered "clean" and "old" by core-mm.
+ *    Device drivers must update the folio state when informed by MMU
+ *    notifiers.
+ *
+ * Returns: pointer to mapped page on success, otherwise a negative error.
  */
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *owner)
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop)
 {
-	long npages = (end - start) >> PAGE_SHIFT;
-	long i;
+	struct folio *folio;
+	struct page *page;
+	long npages;
+
+	mmap_assert_locked(mm);
 
-	npages = get_user_pages_remote(mm, start, npages,
+	/*
+	 * Fault in the page writable and try to lock it; note that if the
+	 * address would already be marked for exclusive use by a device,
+	 * the GUP call would undo that first by triggering a fault.
+	 */
+	npages = get_user_pages_remote(mm, addr, 1,
 				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       pages, NULL);
-	if (npages < 0)
-		return npages;
-
-	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
-		struct folio *folio = page_folio(pages[i]);
-		if (PageTail(pages[i]) || !folio_trylock(folio)) {
-			folio_put(folio);
-			pages[i] = NULL;
-			continue;
-		}
+				       &page, NULL);
+	if (npages != 1)
+		return ERR_PTR(npages);
+	folio = page_folio(page);
 
-		if (!folio_make_device_exclusive(folio, mm, start, owner)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			pages[i] = NULL;
-		}
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
 	}
 
-	return npages;
+	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+	*foliop = folio;
+	return page;
 }
-EXPORT_SYMBOL_GPL(make_device_exclusive_range);
+EXPORT_SYMBOL_GPL(make_device_exclusive);
 
 #endif
 
 void __put_anon_vma(struct anon_vma *anon_vma)
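To illustrate the converted calling convention, here is a hedged sketch of a
driver-side caller (modeled on the nouveau and test_hmm conversions above;
demo_make_addr_exclusive() and the "program the device" step are
placeholders, not code from this series):

static int demo_make_addr_exclusive(struct mm_struct *mm, unsigned long addr,
				    void *owner)
{
	struct folio *folio;
	struct page *page;

	mmap_read_lock(mm);
	page = make_device_exclusive(mm, addr, owner, &folio);
	mmap_read_unlock(mm);
	if (IS_ERR(page))
		return PTR_ERR(page);

	/* ... program the device mapping for "page" here ... */

	/* Dropping the folio lock + reference re-enables CPU access. */
	folio_unlock(folio);
	folio_put(folio);
	return 0;
}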
From patchwork Mon Feb 10 19:37:46 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968980
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 04/17] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
Date: Mon, 10 Feb 2025 20:37:46 +0100
Message-ID: <20250210193801.781278-5-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

We require a writable PTE and only support anonymous folios: we can only
have exactly one PTE pointing at that page, which we can just look up
using a folio walk, avoiding the rmap walk and the anon VMA lock.

So let's stop doing an rmap walk and perform a folio walk instead, so we
can easily just modify a single PTE and avoid relying on rmap/mapcounts.

We now effectively work on a single PTE instead of multiple PTEs of a
large folio, allowing for conversion of individual PTEs from
non-exclusive to device-exclusive -- note that the opposite direction
always works on single PTEs: restore_exclusive_pte().

With this change, device-exclusive handling is fully compatible with
THPs / large folios. We still require PMD-sized THPs to get PTE-mapped,
and supporting PMD-mapped THP (without the PTE-remapping) is a different
endeavour that might not be worth it at this point: it might even have
negative side-effects [1].

This gets rid of the "folio_mapcount()" usage and lets us fix ordinary
rmap walks (migration/swapout) next. Spell out that messing with the
mapcount is wrong and must be fixed.

[1] https://lkml.kernel.org/r/Z5tI-cOSyzdLjoe_@phenom.ffwll.local

Signed-off-by: David Hildenbrand
---
 mm/rmap.c | 200 ++++++++++++------------------------------
 1 file changed, 67 insertions(+), 133 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7ccf850565d33..0cd2a2d3de00d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2375,131 +2375,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 }
 
 #ifdef CONFIG_DEVICE_PRIVATE
-struct make_exclusive_args {
-	struct mm_struct *mm;
-	unsigned long address;
-	void *owner;
-	bool valid;
-};
-
-static bool page_make_device_exclusive_one(struct folio *folio,
-		struct vm_area_struct *vma, unsigned long address, void *priv)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	struct make_exclusive_args *args = priv;
-	pte_t pteval;
-	struct page *subpage;
-	bool ret = true;
-	struct mmu_notifier_range range;
-	swp_entry_t entry;
-	pte_t swp_pte;
-	pte_t ptent;
-
-	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
-				      vma->vm_mm, address, min(vma->vm_end,
-				      address + folio_size(folio)),
-				      args->owner);
-	mmu_notifier_invalidate_range_start(&range);
-
-	while (page_vma_mapped_walk(&pvmw)) {
-		/* Unexpected PMD-mapped THP? */
-		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
-
-		ptent = ptep_get(pvmw.pte);
-		if (!pte_present(ptent)) {
-			ret = false;
-			page_vma_mapped_walk_done(&pvmw);
-			break;
-		}
-
-		subpage = folio_page(folio,
-				pte_pfn(ptent) - folio_pfn(folio));
-		address = pvmw.address;
-
-		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(ptent));
-		pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/*
-		 * Check that our target page is still mapped at the expected
-		 * address.
-		 */
-		if (args->mm == mm && args->address == address &&
-		    pte_write(pteval))
-			args->valid = true;
-
-		/*
-		 * Store the pfn of the page in a special migration
-		 * pte. do_swap_page() will wait until the migration
-		 * pte is removed and then restart fault handling.
-		 */
-		if (pte_write(pteval))
-			entry = make_writable_device_exclusive_entry(
-							page_to_pfn(subpage));
-		else
-			entry = make_readable_device_exclusive_entry(
-							page_to_pfn(subpage));
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		if (pte_uffd_wp(pteval))
-			swp_pte = pte_swp_mkuffd_wp(swp_pte);
-
-		set_pte_at(mm, address, pvmw.pte, swp_pte);
-
-		/*
-		 * There is a reference on the page for the swap entry which has
-		 * been removed, so shouldn't take another.
-		 */
-		folio_remove_rmap_pte(folio, subpage, vma);
-	}
-
-	mmu_notifier_invalidate_range_end(&range);
-
-	return ret;
-}
-
-/**
- * folio_make_device_exclusive - Mark the folio exclusively owned by a device.
- * @folio: The folio to replace page table entries for.
- * @mm: The mm_struct where the folio is expected to be mapped.
- * @address: Address where the folio is expected to be mapped.
- * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier callbacks
- *
- * Tries to remove all the page table entries which are mapping this
- * folio and replace them with special device exclusive swap entries to
- * grant a device exclusive access to the folio.
- *
- * Context: Caller must hold the folio lock.
- * Return: false if the page is still mapped, or if it could not be unmapped
- * from the expected address. Otherwise returns true (success).
- */
-static bool folio_make_device_exclusive(struct folio *folio,
-		struct mm_struct *mm, unsigned long address, void *owner)
-{
-	struct make_exclusive_args args = {
-		.mm = mm,
-		.address = address,
-		.owner = owner,
-		.valid = false,
-	};
-	struct rmap_walk_control rwc = {
-		.rmap_one = page_make_device_exclusive_one,
-		.done = folio_not_mapped,
-		.anon_lock = folio_lock_anon_vma_read,
-		.arg = &args,
-	};
-
-	rmap_walk(folio, &rwc);
-
-	return args.valid && !folio_mapcount(folio);
-}
-
 /**
  * make_device_exclusive() - Mark a page for exclusive use by a device
  * @mm: mm_struct of associated target process
@@ -2541,22 +2416,31 @@ static bool folio_make_device_exclusive(struct folio *folio,
 struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		void *owner, struct folio **foliop)
 {
-	struct folio *folio;
+	struct mmu_notifier_range range;
+	struct folio *folio, *fw_folio;
+	struct vm_area_struct *vma;
+	struct folio_walk fw;
 	struct page *page;
-	long npages;
+	swp_entry_t entry;
+	pte_t swp_pte;
 
 	mmap_assert_locked(mm);
+	addr = PAGE_ALIGN_DOWN(addr);
 
 	/*
 	 * Fault in the page writable and try to lock it; note that if the
 	 * address would already be marked for exclusive use by a device,
 	 * the GUP call would undo that first by triggering a fault.
+	 *
+	 * If any other device would already map this page exclusively, the
+	 * fault will trigger a conversion to an ordinary
+	 * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE.
 	 */
-	npages = get_user_pages_remote(mm, addr, 1,
-				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       &page, NULL);
-	if (npages != 1)
-		return ERR_PTR(npages);
+	page = get_user_page_vma_remote(mm, addr,
+					FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
+					&vma);
+	if (IS_ERR(page))
+		return page;
 	folio = page_folio(page);
 
 	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
@@ -2569,11 +2453,61 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		return ERR_PTR(-EBUSY);
 	}
 
-	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+	/*
+	 * Inform secondary MMUs that we are going to convert this PTE to
+	 * device-exclusive, such that they unmap it now. Note that the
+	 * caller must filter this event out to prevent livelocks.
+	 */
+	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
+				      mm, addr, addr + PAGE_SIZE, owner);
+	mmu_notifier_invalidate_range_start(&range);
+
+	/*
+	 * Let's do a second walk and make sure we still find the same page
+	 * mapped writable. Note that any page of an anonymous folio can
+	 * only be mapped writable using exactly one PTE ("exclusive"), so
+	 * there cannot be other mappings.
+	 */
+	fw_folio = folio_walk_start(&fw, vma, addr, 0);
+	if (fw_folio != folio || fw.page != page ||
+	    fw.level != FW_LEVEL_PTE || !pte_write(fw.pte)) {
+		if (fw_folio)
+			folio_walk_end(&fw, vma);
+		mmu_notifier_invalidate_range_end(&range);
 		folio_unlock(folio);
 		folio_put(folio);
 		return ERR_PTR(-EBUSY);
 	}
+
+	/* Nuke the page table entry so we get the uptodate dirty bit. */
+	flush_cache_page(vma, addr, page_to_pfn(page));
+	fw.pte = ptep_clear_flush(vma, addr, fw.ptep);
+
+	/* Set the dirty flag on the folio now the PTE is gone. */
+	if (pte_dirty(fw.pte))
+		folio_mark_dirty(folio);
+
+	/*
+	 * Store the pfn of the page in a special device-exclusive PFN swap PTE.
+	 * do_swap_page() will trigger the conversion back while holding the
+	 * folio lock.
+	 */
+	entry = make_writable_device_exclusive_entry(page_to_pfn(page));
+	swp_pte = swp_entry_to_pte(entry);
+	if (pte_soft_dirty(fw.pte))
+		swp_pte = pte_swp_mksoft_dirty(swp_pte);
+	/* The pte is writable, uffd-wp does not apply. */
+	set_pte_at(mm, addr, fw.ptep, swp_pte);
+
+	/*
+	 * TODO: The device-exclusive PFN swap PTE holds a folio reference but
+	 * does not count as a mapping (mapcount), which is wrong and must be
+	 * fixed, otherwise RMAP walks don't behave as expected.
+	 */
+	folio_remove_rmap_pte(folio, page, vma);
+	folio_walk_end(&fw, vma);
+	mmu_notifier_invalidate_range_end(&range);
 	*foliop = folio;
 	return page;
 }
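As a side note on the folio_walk API relied upon above, a minimal sketch
(assumptions: the caller already holds the mmap_lock and its own reference
on the page; demo_mapped_writable() is a made-up name) of the "second walk"
check pattern used in the new implementation:

static bool demo_mapped_writable(struct vm_area_struct *vma,
				 unsigned long addr, struct page *page)
{
	struct folio_walk fw;
	struct folio *folio;
	bool ret;

	/* The page table lock is held between folio_walk_start()/_end(). */
	folio = folio_walk_start(&fw, vma, addr, 0);
	if (!folio)
		return false;
	ret = fw.page == page && fw.level == FW_LEVEL_PTE &&
	      pte_write(fw.pte);
	folio_walk_end(&fw, vma);
	return ret;
}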
From patchwork Mon Feb 10 19:37:47 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968981
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 05/17] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
Date: Mon, 10 Feb 2025 20:37:47 +0100
Message-ID: <20250210193801.781278-6-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

Let's do it just like mprotect write-upgrade or during NUMA-hinting
faults on PROT_NONE PTEs: detect if the PTE can be writable by using
can_change_pte_writable().

Set the PTE dirty only if the folio is dirty: we might not necessarily
have a write access, and setting the PTE writable doesn't require setting
the PTE dirty. From a CPU perspective, these entries are clean, so only
set the PTE dirty if the folio is dirty.
With this change in place, there is no need to have separate readable and
writable device-exclusive entry types, and we'll merge them next
separately.

Note that, during fork(), we first convert the device-exclusive entries
back to ordinary PTEs, and we only ever allow conversion of writable PTEs
to device-exclusive -- only mprotect can currently change them to
readable-device-exclusive. Consequently, we always expect
PageAnonExclusive(page)==true and can_change_pte_writable()==true, unless
we are dealing with soft-dirty tracking or uffd-wp. But reusing
can_change_pte_writable() for now is cleaner.

Signed-off-by: David Hildenbrand
---
 mm/memory.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d545..ba33ba3b7ea17 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -723,18 +723,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	struct folio *folio = page_folio(page);
 	pte_t orig_pte;
 	pte_t pte;
-	swp_entry_t entry;
 
 	orig_pte = ptep_get(ptep);
 	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
 
-	entry = pte_to_swp_entry(orig_pte);
 	if (pte_swp_uffd_wp(orig_pte))
 		pte = pte_mkuffd_wp(pte);
-	else if (is_writable_device_exclusive_entry(entry))
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+
+	if ((vma->vm_flags & VM_WRITE) &&
+	    can_change_pte_writable(vma, address, pte)) {
+		if (folio_test_dirty(folio))
+			pte = pte_mkdirty(pte);
+		pte = pte_mkwrite(pte, vma);
+	}
 
 	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
 					   PageAnonExclusive(page)), folio);

From patchwork Mon Feb 10 19:37:48 2025
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Subject: [PATCH v2 06/17] mm: use single SWP_DEVICE_EXCLUSIVE entry type
Date: Mon, 10 Feb 2025 20:37:48 +0100
Message-ID: <20250210193801.781278-7-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>

There is no need for the distinction anymore; let's merge the readable and
writable device-exclusive entries into a single device-exclusive entry
type.

Acked-by: Simona Vetter
Reviewed-by: Alistair Popple
Signed-off-by: David Hildenbrand
---
 include/linux/swap.h    |  7 +++----
 include/linux/swapops.h | 27 ++++-----------------------
 mm/mprotect.c           |  8 --------
 mm/page_table_check.c   |  5 ++---
 mm/rmap.c               |  2 +-
 5 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db33..26b1d8cc5b0e7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -74,14 +74,13 @@ static inline int current_is_kswapd(void)
  * to a special SWP_DEVICE_{READ|WRITE} entry.
  *
  * When a page is mapped by the device for exclusive access we set the CPU page
- * table entries to special SWP_DEVICE_EXCLUSIVE_* entries.
+ * table entries to a special SWP_DEVICE_EXCLUSIVE entry.
*/ #ifdef CONFIG_DEVICE_PRIVATE -#define SWP_DEVICE_NUM 4 +#define SWP_DEVICE_NUM 3 #define SWP_DEVICE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM) #define SWP_DEVICE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+1) -#define SWP_DEVICE_EXCLUSIVE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+2) -#define SWP_DEVICE_EXCLUSIVE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+3) +#define SWP_DEVICE_EXCLUSIVE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+2) #else #define SWP_DEVICE_NUM 0 #endif diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 96f26e29fefed..64ea151a7ae39 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -186,26 +186,16 @@ static inline bool is_writable_device_private_entry(swp_entry_t entry) return unlikely(swp_type(entry) == SWP_DEVICE_WRITE); } -static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset) +static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { - return swp_entry(SWP_DEVICE_EXCLUSIVE_READ, offset); -} - -static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset) -{ - return swp_entry(SWP_DEVICE_EXCLUSIVE_WRITE, offset); + return swp_entry(SWP_DEVICE_EXCLUSIVE, offset); } static inline bool is_device_exclusive_entry(swp_entry_t entry) { - return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ || - swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE; + return swp_type(entry) == SWP_DEVICE_EXCLUSIVE; } -static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) -{ - return unlikely(swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE); -} #else /* CONFIG_DEVICE_PRIVATE */ static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset) { @@ -227,12 +217,7 @@ static inline bool is_writable_device_private_entry(swp_entry_t entry) return false; } -static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset) -{ - return swp_entry(0, 0); -} - -static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset) +static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { return swp_entry(0, 0); } @@ -242,10 +227,6 @@ static inline bool is_device_exclusive_entry(swp_entry_t entry) return false; } -static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) -{ - return false; -} #endif /* CONFIG_DEVICE_PRIVATE */ #ifdef CONFIG_MIGRATION diff --git a/mm/mprotect.c b/mm/mprotect.c index 516b1d847e2cd..9cb6ab7c40480 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -225,14 +225,6 @@ static long change_pte_range(struct mmu_gather *tlb, newpte = swp_entry_to_pte(entry); if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); - } else if (is_writable_device_exclusive_entry(entry)) { - entry = make_readable_device_exclusive_entry( - swp_offset(entry)); - newpte = swp_entry_to_pte(entry); - if (pte_swp_soft_dirty(oldpte)) - newpte = pte_swp_mksoft_dirty(newpte); - if (pte_swp_uffd_wp(oldpte)) - newpte = pte_swp_mkuffd_wp(newpte); } else if (is_pte_marker_entry(entry)) { /* * Ignore error swap entries unconditionally, diff --git a/mm/page_table_check.c b/mm/page_table_check.c index 509c6ef8de400..c2b3600429a0c 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -196,9 +196,8 @@ EXPORT_SYMBOL(__page_table_check_pud_clear); /* Whether the swap entry cached writable information */ static inline bool swap_cached_writable(swp_entry_t entry) { - return is_writable_device_exclusive_entry(entry) || - is_writable_device_private_entry(entry) || - 
is_writable_migration_entry(entry); + return is_writable_device_private_entry(entry) || + is_writable_migration_entry(entry); } static inline void page_table_check_pte_flags(pte_t pte) diff --git a/mm/rmap.c b/mm/rmap.c index 0cd2a2d3de00d..1129ed132af94 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2492,7 +2492,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, * do_swap_page() will trigger the conversion back while holding the * folio lock. */ - entry = make_writable_device_exclusive_entry(page_to_pfn(page)); + entry = make_device_exclusive_entry(page_to_pfn(page)); swp_pte = swp_entry_to_pte(entry); if (pte_soft_dirty(fw.pte)) swp_pte = pte_swp_mksoft_dirty(swp_pte); From patchwork Mon Feb 10 19:37:49 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968983 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 23FC1257AEA for ; Mon, 10 Feb 2025 19:38:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216317; cv=none; b=hGvyZjF1hwhOZG+wsYqw6qQFEaI7BMhNS2+7aouiMD9Wu50MucSGxXnC0KxoqhhgA1RNpZweMd6/3kx7+cxOlg92MT7K7HEftPFFZnaw9tgn+7MIzme9dpzrshcLYKr3/aKbwmmrnr6W8SC4XUi5zXC6JXWw7GXf8QSPCr55Dco= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216317; c=relaxed/simple; bh=WTAKrIObiH0+XnAmevo4XgmL855z4Fu5Qvoz089fAGM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=ExtgwsfPz1MoKKSqFLYvXydZ3DFTXwXdUGb0Md58tMlKM4O/x58iqUxxn5uK+lliUlzqUMyL/O3iqDGXNAH3rWF87kyFWPu0MZmXvValEO3eW8Xaph7IEwl9M6a40PhdJN1P38jZVO6P9Qz/DzxwlX5J29RlwGTHx6SiKTSP29I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ikpo1MCT; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ikpo1MCT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216314; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NPITbiV5yRtNZIanqm2v+SKlb5hdEoVu/zyhpD+AgeU=; b=ikpo1MCTffr8Ho/CsjTxQinDzBJSBDm6LK1sMOgGp3IeFkuZ97IiW/rcOxRahPXs2RnVNs UB/aNkXi1DB6RTEIKPoJ69lKDqtUfbor+VUMI/31D+kaJ2UE4q8zU9NGL4O+TF+ZPf7h+I 8mrTEJmoqpQwZfgdk3D9cL5yoKTgl7c= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-462-6o3A221jOB6sUajnm4EB4w-1; Mon, 10 Feb 2025 14:38:32 -0500 X-MC-Unique: 6o3A221jOB6sUajnm4EB4w-1 X-Mimecast-MFC-AGG-ID: 6o3A221jOB6sUajnm4EB4w Received: by mail-wm1-f69.google.com with SMTP id 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 07/17] mm/page_vma_mapped: device-exclusive entries are not migration entries Date: Mon, 10 Feb 2025 20:37:49 +0100 Message-ID: <20250210193801.781278-8-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: YH5uvycJTXWKkJ3v9zO5vQyCtp-8BvkV_DHQ2sotTrI_1739216311 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true It's unclear why they would be considered migration entries; they are not. Likely we'll never really trigger that case in practice, because migration (including folio split) of a folio that has device-exclusive entries is never started, as we would detect "additional references": device-exclusive entries adjust the mapcount, but not the refcount. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Reviewed-by: Alistair Popple Signed-off-by: David Hildenbrand --- mm/page_vma_mapped.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 81839a9e74f16..32679be22d30c 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -111,8 +111,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return false; entry = pte_to_swp_entry(ptent); - if (!is_migration_entry(entry) && - !is_device_exclusive_entry(entry)) + if (!is_migration_entry(entry)) return false; pfn = swp_offset_pfn(entry); From patchwork Mon Feb 10 19:37:50 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968984 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64D0A25A34A for ; Mon, 10 Feb 2025 19:38:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216321; cv=none; b=docdehyUq5TzHGSoSpAwzgUbUNE5YGm4yHXmmedF+dWD4JR7dCoRW7QPrfUKDUxKS+3qlh7WdiktDE2gcUo9K7RGObcXhxFjd+hyqNcWXRnNnTzJ84cADpd9wf4bIfDPCBiyXH6TtV582n+SnNL2gDf4hd5SCYnv77WuvvwwfcY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216321; c=relaxed/simple; bh=76d5pEBVKf4Yijczbwj7KuBibjsq0aAlESoqsTf2wbw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=H8st6ays3a+qlXYmYocNYYxzpSPvP0kN3CQcBbQ91xmfaJwABM7wWPntr9cdtwGfNnNaYQDCDTF3hrKP04kw0omrFbrTlLCrey3JWOZmkbgho6xAW8n3Mtn9v7T7zafNeP5zaIrunJtO0KM+2ufCk5bFO2K4AfE0jILKK4vVmfc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PdE02XrA; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4390d94d802sm195260345e9.12.2025.02.10.11.38.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:33 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 08/17] kernel/events/uprobes: handle device-exclusive entries correctly in __replace_page() Date: Mon, 10 Feb 2025 20:37:50 +0100 Message-ID: <20250210193801.781278-9-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: iqWP5tRAb4tyomTnUzqDW4SvpJXhjyuOOO1Kqf5cAkU_1739216316 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). __replace_page() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, because GUP would never have returned such folios (conversion to device-private happens by page migration, not in-place conversion of the PTE). There is a race between GUP and us locking the folio to look it up using page_vma_mapped_walk(), so this is likely a fix (unless something else could prevent that race, but it doesn't look like). pte_pfn() on something that is not a present pte could give use garbage, and we'd wrongly mess up the mapcount because it was already adjusted by calling folio_remove_rmap_pte() when making the entry device-exclusive. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- kernel/events/uprobes.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 2ca797cbe465f..cd6105b100325 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -173,6 +173,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0); int err; struct mmu_notifier_range range; + pte_t pte; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr, addr + PAGE_SIZE); @@ -192,6 +193,16 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, if (!page_vma_mapped_walk(&pvmw)) goto unlock; VM_BUG_ON_PAGE(addr != pvmw.address, old_page); + pte = ptep_get(pvmw.pte); + + /* + * Handle PFN swap PTES, such as device-exclusive ones, that actually + * map pages: simply trigger GUP again to fix it up. 
+ */ + if (unlikely(!pte_present(pte))) { + page_vma_mapped_walk_done(&pvmw); + goto unlock; + } if (new_page) { folio_get(new_folio); @@ -206,7 +217,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, inc_mm_counter(mm, MM_ANONPAGES); } - flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte))); + flush_cache_page(vma, addr, pte_pfn(pte)); ptep_clear_flush(vma, addr, pvmw.pte); if (new_page) set_pte_at(mm, addr, pvmw.pte, From patchwork Mon Feb 10 19:37:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968985 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 022AF25A351 for ; Mon, 10 Feb 2025 19:38:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216323; cv=none; b=j0HOWm7X39LYzoKGSyGB8/50WozSgIpe2Aebz/u10fgjrmGQ7batCx/ULApav4ptJ5TEVoIFgZ+duUm/bRcZe6yh42I25Xeo5G4N3Q5OzocVgIbHD2SJGkrTPNQ3/B5zkyURA+oeHDxeeuasSaRV+IT1/6QeGBC1ldoMed5qsZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216323; c=relaxed/simple; bh=XECEsYRumwDXs38xfqw/DSW1eeDdoyEKYX/zmrQ3FPo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Acb8MEHJnZv3r1WNzFWFx/aQH9Cfcc6O/LW3UU8BbeRBP5kj4t2hCjrBg2Wx8eevpfsetPqhYYTVsZ6v32Ebbz2YpkWDY2Fl0UJnVreYX213zF5a0QtKQ9g00DHKgJhX4Nru+2TGVuXJJ4svbpXxFmn8FQ7Qe9PPc6rAGGk/u2k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WDw8D0Gg; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WDw8D0Gg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216321; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=b7jLfJ9vH0eUU1vYEXZ3dUW5UEIvFQqjIYltl+lFPPU=; b=WDw8D0GgGNKMZT0mBqbsJGqDgZ1bopJ6Wn8XeBgE2JfTPy/bXAIs9x7Icx+dU9j2gwqsH7 jNsRRx84/y3XfBK/gbSYVm6ut4C2//TEmmPA6N0oEjLTBJ09j5fk5qlU3sV7olL4mmVpB9 mHzpSMXwHM0FyOzUlGzTbGCVrtAXu2I= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-538-IT_o2I1tOgKoE6YqrmgDcw-1; Mon, 10 Feb 2025 14:38:39 -0500 X-MC-Unique: IT_o2I1tOgKoE6YqrmgDcw-1 X-Mimecast-MFC-AGG-ID: IT_o2I1tOgKoE6YqrmgDcw Received: by mail-wm1-f69.google.com with SMTP id 5b1f17b1804b1-4393e89e910so9545635e9.0 for ; Mon, 10 Feb 2025 11:38:39 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216318; x=1739821118; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=b7jLfJ9vH0eUU1vYEXZ3dUW5UEIvFQqjIYltl+lFPPU=; b=PNICslieMPLkrnbxZNnRKstOOqG2K3LBAm7ONAr2MlZJ1UeOgEbyYLvGdagOSin4SQ mw8Y++JNM5Fy65//uZJCm3t9ChBbRUg8knXCz4fCwMjYMw6vAmS7n/DDPgiicw0qfiJa IasTwLbznvKboke56Jcz7o7spWJjbkJeR+L0wzRqjAwyE33FYPqj/Tyxay8q7geSTn3k FKkR+3CrDWR/ZjchWz2MVdhZBGPUiRdereV3ooNq+zvzFFRyz5rHiWmRuHjUz8DsFN7y 0IMFpAbuzzsxVrh84jG3pn1nPikGHoZSG7lFBWNUlx47g3I/AZZZui0jg0gKS7i2kF13 O8zw== X-Forwarded-Encrypted: i=1; AJvYcCUwugW8DXuzgHG/i+6cTb+ZxlpGH9XcK0T7NHSPb8XMsHIKbP8bkAdHXJlUURawigyUTraWET4mUkQ8FzLP6NnTrjk=@vger.kernel.org X-Gm-Message-State: AOJu0Yz/D4uZUyOs2S3SQIcO3HGLvhWiiAmEE2A7df/RreCggCMLLX7K K+glf06ghjxZUleOm+OAgQfOg/5iRPpch+zP7COEygHkNpLmJD4x9BYKLNS/ia/TWG80U/TPOki Tc3QyB+Xs6FApqMY12OfF8nNMtaP4yWJ5bphpgvOkJ+yOutxMWI4ia/HCe/FSOpl333QdNA== X-Gm-Gg: ASbGncsMXUmiljZbNm/xcei8euUPv+UCcw/E4UE88UhDXh1ur6Ch1NEWmO3Kup0T+B4 TuLG03ivy5ZaRiefDHCjrs1fiYyoSQ+qCrueBjhlRaR5nttMWXSKFfgk6LzcPf7kmWi6RhDDRDD jf4V1YJHNzW9RXyz86CbPl9C5YJb6LLZ1VaSpxEVLKReQ1RfgOjY6fZy2pd/02xneuVZL58A6x7 Hg2aLOHYVHw+v4IR3uUY9kGNdeB97uWwHTQ87ivpDGVU2g2bL5Sp4Tji8vR7WNmoxluPoNg736s xf2R75pruLc8OfiKWPkz02ZWW/xVXCyd3L8Vww0Lm+kNz8GZO12b7xmqWmMlAoKrzA== X-Received: by 2002:a05:600c:4f90:b0:434:a7e7:a1ca with SMTP id 5b1f17b1804b1-439249b04f8mr116077455e9.20.1739216318624; Mon, 10 Feb 2025 11:38:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IHdlczuWpIJN55C/ZdLw7mis7jMK/FKd714PEZJ/2b6DQtULEf1bpuI2yab2McnQRseLHdLgg== X-Received: by 2002:a05:600c:4f90:b0:434:a7e7:a1ca with SMTP id 5b1f17b1804b1-439249b04f8mr116077285e9.20.1739216318299; Mon, 10 Feb 2025 11:38:38 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4391da96502sm158809495e9.1.2025.02.10.11.38.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:37 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 09/17] mm/ksm: handle device-exclusive entries correctly in write_protect_page() Date: Mon, 10 Feb 2025 20:37:51 +0100 Message-ID: <20250210193801.781278-10-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: svmNV2m3EKNCs8kNMkZi24gpe10V0go_TpBWYqKD138_1739216319 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). write_protect_page() is not prepared for that, so teach it about these PFN swap PTEs. 
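In essence, the fix mirrors the mm/ksm.c hunk quoted below (this is only a
sketch of that check, nothing beyond it): once the PTE has been re-read
under the page table lock, a non-present PFN swap PTE makes us give up,
exactly like a folio_walk would.

	entry = ptep_get(pvmw.pte);
	/* PFN swap PTE (e.g. device-exclusive): not a present mapping. */
	if (unlikely(!pte_present(entry)))
		goto out_unlock;	/* give up, like folio_walk would */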
Note that device-private entries are so far not applicable on that path, because GUP would never have returned such folios (conversion to device-private happens by page migration, not in-place conversion of the PTE). There is a race between performing the folio_walk (which fails on non-present PTEs) and locking the folio to look it up using page_vma_mapped_walk() again, so this is likely a fix (unless something else could prevent that race, but it doesn't look like). In the future it could be handled if ever required, for now just give up and ignore them like folio_walk would. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/ksm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/ksm.c b/mm/ksm.c index 8be2b144fefd6..8583fb91ef136 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1270,8 +1270,15 @@ static int write_protect_page(struct vm_area_struct *vma, struct folio *folio, if (WARN_ONCE(!pvmw.pte, "Unexpected PMD mapping?")) goto out_unlock; - anon_exclusive = PageAnonExclusive(&folio->page); entry = ptep_get(pvmw.pte); + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that actually + * map pages: give up just like the next folio_walk would. + */ + if (unlikely(!pte_present(entry))) + goto out_unlock; + + anon_exclusive = PageAnonExclusive(&folio->page); if (pte_write(entry) || pte_dirty(entry) || anon_exclusive || mm_tlb_flush_pending(mm)) { swapped = folio_test_swapcache(folio); From patchwork Mon Feb 10 19:37:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968986 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A2B224C684 for ; Mon, 10 Feb 2025 19:38:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216329; cv=none; b=eo62CzonVCBMDatqnccD4pPfmuKv4H8CpWprgsxJbBL3D4H0GEnDlLQCxKbgIvV2OREzY3kofhRuEAnmEOGe/kZgGvd7qo6NZvksi8l999+0oHs4+4UjFfNY3b0LpaJucJpwXPBZXDWaEXAWLmAG2JCsu1w1qXamWOiTMSkQ9PY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216329; c=relaxed/simple; bh=wTEfnlnRBIWds4A4VWjgD0zaJLq1iJwNdx86V2VqPGE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=MQST0F0uAuIIJmkGK96H7c3jXHF5jiUobfMVmYEKplvLj5alFj1N7tsCrtXR4eVWBexoY87ML6NmVfLxlQ6VTWXexpzcon85j+ghZXleMVp/UxPtBCpGxRKHGz0omN3ujCeMPx+xWMRSz1iuekxdFirTJ84HUQdifu+UF6rtlHg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=CMhZx0Z3; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CMhZx0Z3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216327; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 10/17] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() Date: Mon, 10 Feb 2025 20:37:52 +0100 Message-ID: <20250210193801.781278-11-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: egf2FGLyB5soqsJKdnYUjw_0UadFGeAbLH8ufUrjk7w_1739216322 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_unmap_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Further note that try_to_unmap() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for a device-exclusive PTE to "vanish". Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 52 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 39 insertions(+), 13 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 1129ed132af94..47142a656ae51 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1648,9 +1648,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, ret = true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; @@ -1722,7 +1722,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); - pfn = pte_pfn(ptep_get(pvmw.pte)); + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that + * actually map pages. + */ + pteval = ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn = pte_pfn(pteval); + } else { + pfn = swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + } + subpage = folio_page(folio, pfn - folio_pfn(folio)); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && @@ -1778,7 +1789,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, hugetlb_vma_unlock_write(vma); } pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. 
*/ if (should_defer_flush(mm, flags)) { @@ -1796,6 +1809,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else { + pte_clear(mm, address, pvmw.pte); } /* @@ -1805,10 +1822,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); - /* Update high watermark before we lower rss */ update_hiwater_rss(mm); @@ -1822,8 +1835,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan @@ -1902,6 +1915,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } + + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have PFN swap PTEs, + * so we'll not check/care. + */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); @@ -1926,10 +1945,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = swp_entry_to_pte(entry); if (anon_exclusive) swp_pte = pte_swp_mkexclusive(swp_pte); - if (pte_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + } else { + if (pte_swp_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { /* From patchwork Mon Feb 10 19:37:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968987 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 217CC2586D0 for ; Mon, 10 Feb 2025 19:38:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216332; cv=none; b=fjOMpNT+V4vLB4cxwZTSZ5LgODK4vLJPEvO7OaUXW0q53mrUw5iNr94hEr29epXXmJBG1gQ3UVEkQXVaUrqzR2sc4ye/OpeLMf7SCXjfl2gUCG/GWj66bA3atFyBuuB/rqmnnInLwUERzz2CHDvmEUtQvanwF3Z9tjySkyADnzQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216332; c=relaxed/simple; bh=up872fM6yJvkziP+4dI1MrGG1wgvrmLWB2nPuwT2cw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; 
ffacd0b85a97d-38ddc4d34b0mr6018834f8f.51.1739216325915; Mon, 10 Feb 2025 11:38:45 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dd295200asm7894656f8f.44.2025.02.10.11.38.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:44 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 11/17] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() Date: Mon, 10 Feb 2025 20:37:53 +0100 Message-ID: <20250210193801.781278-12-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: d9N5lophjGdNZE0gx3UvFIJPQgWcBB294-r8_xACml4_1739216326 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_migrate_one() is not prepared for that, so teach it about these PFN swap PTEs. We already handle device-private entries by specializing on the folio, so we can reshuffle that code to make it work on the PFN swap PTEs instead. Get rid of the folio_is_device_private() handling. Note that we never currently expect device-private folios with HWPoison flag set at that point, so add a warning in case that ever changes and we can figure out what the right thing to do is. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Further note that try_to_migrate() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for a device-exclusive PTE to "vanish". 
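For orientation, the recurring pattern in this and the previous patch
(distilled from the hunks quoted below; no behavior beyond them is
implied) is to compute the PFN differently for present PTEs and for PFN
swap PTEs before looking up the subpage:

	pteval = ptep_get(pvmw.pte);
	if (likely(pte_present(pteval))) {
		pfn = pte_pfn(pteval);
	} else {
		/* PFN swap PTE (e.g. device-exclusive): the PFN lives in the swap entry. */
		pfn = swp_offset_pfn(pte_to_swp_entry(pteval));
		VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
	}
	subpage = folio_page(folio, pfn - folio_pfn(folio));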
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 124 ++++++++++++++++++++++-------------------------------- 1 file changed, 51 insertions(+), 73 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 47142a656ae51..7c471c3ea64c4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2039,9 +2039,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, writable, ret = true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; @@ -2108,24 +2108,19 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); - pfn = pte_pfn(ptep_get(pvmw.pte)); - - if (folio_is_zone_device(folio)) { - /* - * Our PTE is a non-present device exclusive entry and - * calculating the subpage as for the common case would - * result in an invalid pointer. - * - * Since only PAGE_SIZE pages can currently be - * migrated, just set it to page. This will need to be - * changed when hugepage migrations to device private - * memory are supported. - */ - VM_BUG_ON_FOLIO(folio_nr_pages(folio) > 1, folio); - subpage = &folio->page; + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that + * actually map pages. + */ + pteval = ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn = pte_pfn(pteval); } else { - subpage = folio_page(folio, pfn - folio_pfn(folio)); + pfn = swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); } + + subpage = folio_page(folio, pfn - folio_pfn(folio)); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(subpage); @@ -2181,7 +2176,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } /* Nuke the hugetlb page table entry */ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable = pte_write(pteval); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. */ if (should_defer_flush(mm, flags)) { @@ -2199,54 +2197,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable = pte_write(pteval); + } else { + pte_clear(mm, address, pvmw.pte); + writable = is_writable_device_private_entry(pte_to_swp_entry(pteval)); } - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); + VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) && + !anon_exclusive, folio); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (folio_is_device_private(folio)) { - unsigned long pfn = folio_pfn(folio); - swp_entry_t entry; - pte_t swp_pte; - - if (anon_exclusive) - WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio, - subpage)); + if (PageHWPoison(subpage)) { + VM_WARN_ON_FOLIO(folio_is_device_private(folio), folio); - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. 
- */ - entry = pte_to_swp_entry(pteval); - if (is_writable_device_private_entry(entry)) - entry = make_writable_migration_entry(pfn); - else if (anon_exclusive) - entry = make_readable_exclusive_migration_entry(pfn); - else - entry = make_readable_migration_entry(pfn); - swp_pte = swp_entry_to_pte(entry); - - /* - * pteval maps a zone device page and is therefore - * a swap pte. - */ - if (pte_swp_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); - trace_set_migration_pte(pvmw.address, pte_val(swp_pte), - folio_order(folio)); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - */ - } else if (PageHWPoison(subpage)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); @@ -2256,8 +2223,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan @@ -2273,6 +2240,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_entry_t entry; pte_t swp_pte; + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have PFN swap PTEs, + * so we'll not check/care. + */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, @@ -2283,8 +2255,6 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } - VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) && - !anon_exclusive, subpage); /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ if (folio_test_hugetlb(folio)) { @@ -2309,7 +2279,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * pte. do_swap_page() will wait until the migration * pte is removed and then restart fault handling. 
*/ - if (pte_write(pteval)) + if (writable) entry = make_writable_migration_entry( page_to_pfn(subpage)); else if (anon_exclusive) @@ -2318,15 +2288,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, else entry = make_readable_migration_entry( page_to_pfn(subpage)); - if (pte_young(pteval)) - entry = make_migration_entry_young(entry); - if (pte_dirty(pteval)) - entry = make_migration_entry_dirty(entry); - swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_young(pteval)) + entry = make_migration_entry_young(entry); + if (pte_dirty(pteval)) + entry = make_migration_entry_dirty(entry); + swp_pte = swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + } else { + swp_pte = swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, hsz); From patchwork Mon Feb 10 19:37:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968988 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6C2024E4B6 for ; Mon, 10 Feb 2025 19:38:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216334; cv=none; b=PlfjMpg618obUy/jt30ZdfNEEaA5MzyqIDDKjLm1Pe/QMFvm57cH8Hr6/xgWX+PDEPXlui4gq6MpeNdJS1nKNtSJiHIM1l12DYZ9WOoc8x/NIpIURsVfyutiQcXK+uAjf6Ti9hcOeBaARU6CCIrgFFSegRh6ne/p0k7lA9vTMws= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216334; c=relaxed/simple; bh=cMJXMPJ9fLW/woS9dzf0yuP4P9GXVUKQkO8wkukVnbg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=A5SgQms0B9SSrzfZrJnauTZyU1D9/BWXv1NtGaRQEZr6y10LeaRDQrA69qFUkL8ctiWG8lb+2Fyo44vuVtxa8McSYOkbI3a/NuBF6E5vXDB0sIbawrnNPfKYPDkXORUA21mkQmG+BlQt7T93LuTIxFwy3JHW5hyAwR+c+GRV5Jc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Nrelez0O; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Nrelez0O" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216332; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LTEWxPe8B8Wum8eXVDX5hUwjNBhNNdMWJNli9ItwISM=; 
b=Nrelez0Okav55ARJLkfm+qqrceoMnlR7IrL0XVbPou2/EbswXQLfATuakvOZhSx0VRfZRh 61Wr6On1w9CqbWFhpEZdULEGO5HiRmKCbFAulzP7wCLkjOXCDPSLOmnD9s5Kg1RIYvrW3j brP2zX3puxzBFXtp96g+hIv4hRpxcz0= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-515-_RYw7GRDNOKp8xzVisyYog-1; Mon, 10 Feb 2025 14:38:51 -0500 X-MC-Unique: _RYw7GRDNOKp8xzVisyYog-1 X-Mimecast-MFC-AGG-ID: _RYw7GRDNOKp8xzVisyYog Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-43625ceae52so28220075e9.0 for ; Mon, 10 Feb 2025 11:38:50 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216330; x=1739821130; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LTEWxPe8B8Wum8eXVDX5hUwjNBhNNdMWJNli9ItwISM=; b=FDl8cfnshHKZgmMXQdV0vUOkALJJejNfqXMWPqPnxx8pOIgKs6BJnnVq0FdVTPtZ/5 M5+6PlrCaCi29K8ns+yPJz6L3s7+QzFQSXliQBNX+AwaRba3G1WpvP8gpCEU1cyy0IqO bVyEweXcLMobd1qHsFsjgXC3E70cWCIQE7kW09vEqGKpCIfD5iL8lgigeIX85Zy9rbqG qYw/TgpOCyPV9MLUW756ZPex64vRfsvmHVkbUOAJo03AEeujovp/XHU8w8/le6aitECP 2+NUZ2LEXV31KqNj9npF+bnDOj6vyRo/nWQkbV86EdkYYj9FlFq9qYgBla4HECSYNum9 cXxQ== X-Forwarded-Encrypted: i=1; AJvYcCVKVyVdkx8DQDfMbbnT8GrJeWJi8IOQiiZyEXGAxCeSFmdwsgNZHOsPmhrntErPLb76tX1C7e9xjj7rT38w0WQzwWU=@vger.kernel.org X-Gm-Message-State: AOJu0YwRJs/xzMTLv+R7L1mAMdd6RkJ6C0eSyls6WjN8BTnEJT2QwQ/J PmSgfcu7ZXOTvGCTY6fWOXy/UsetDzpi7y/cHe0ZxSXD4cAlAKfK3mpY4xIiO/Ifz01div+uh9o 830uCjEEmeOnfUdaGI4gWX5xrSvZ0Z2nnHY7ceU/mz4hlcz8iiCySsnvKLpAq9q6iHbxNRg== X-Gm-Gg: ASbGncvc5qLSHdzRpq2A7nmgfIda6JUP5xxDnYdvpAn7sWd7421gKguRzLB3j/deuz7 IbyM+UhIZXPSF0hJC2AsjEU1k7tRqX9/vapoK5nxSkgU/CutnpF4VMZd6SSCMXgkfQ947+KS7Xk Q8BGigwqTLzHB4wtlCOHwaDmE4aPL2r8ldUe4lTSFE9S+gqfEAA4DxNMkLYDHoVtjZp2o/NrKkI o+IiQ5dkwHDOqD+Cqznaua1Lp0j2HArI+L0/2PU2Gec+FywMkyQoWKZXQn+Uf1P/7Fcx3LzqU8t MsQZdxGR846tGY891q1OXS8NtZWp25dTOHR7KBuUIhyKusZjKckESmUNVkIZpgMiKQ== X-Received: by 2002:a05:600c:4f89:b0:439:4bb0:aba0 with SMTP id 5b1f17b1804b1-4394bb0adb6mr17902675e9.8.1739216329957; Mon, 10 Feb 2025 11:38:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IH0Un4mhmMUf8CyrAHOSxNb3M+hrifrlRxr2GMjHtCC/u6kqhVG2aMU260gZj/4uoW1cKICNg== X-Received: by 2002:a05:600c:4f89:b0:439:4bb0:aba0 with SMTP id 5b1f17b1804b1-4394bb0adb6mr17902495e9.8.1739216329619; Mon, 10 Feb 2025 11:38:49 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-43947bdc5c4sm26951255e9.23.2025.02.10.11.38.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:48 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 12/17] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() Date: Mon, 10 Feb 2025 20:37:54 +0100 Message-ID: <20250210193801.781278-13-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: m1Db-5n8xGi7e2fYmcV3zCVnH-ZKN3IN7g22YQAT44I_1739216330 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). page_vma_mkclean_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/rmap.c b/mm/rmap.c index 7c471c3ea64c4..7b737f0f68fb5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1044,6 +1044,14 @@ static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw) pte_t *pte = pvmw->pte; pte_t entry = ptep_get(pte); + /* + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are clean and not writable from a + * CPU perspective. The MMU notifier takes care of any + * device aspects. 
+ */ + if (!pte_present(entry)) + continue; if (!pte_dirty(entry) && !pte_write(entry)) continue; From patchwork Mon Feb 10 19:37:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968989 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1676624C67F for ; Mon, 10 Feb 2025 19:38:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216339; cv=none; b=m3STD5U9wtqv0otkea8in4y77VEMqtrTgbsR23qk3ZeBECcGYo2gMuGxjJNY10Ue/mOL5Qn8qLgvEPOBqdbdEbHwNx6tGDvjdubijpyya8yEr55eFE4l2/my116AFtJv3w8dMq+Ige6exStJrWWmpV/AI8lHq7/PWa9WldOJsWc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216339; c=relaxed/simple; bh=OGZL6ZsQM7FmXrtZCcJrdgAs8xmw1WSnlW2HAWblnM0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=JcOiIUOC0fH6VBo35Yl39TQbx1Oj6EpU3XlS2l/xV9oucFNCgh3tiy6eTQRBEvEcv8TKtdZ/9V4O6mxeKrOUybvD+1lW0wMxspmVsCVlqHLAUMvtZMWES4WzeRGyAKtNIxbzd7TZU/RUv0TBFLR1aqKTLnSQ9A693/0SnN3qnJE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=NzLQBD1Y; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NzLQBD1Y" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216337; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mwHnciP+OvnJG9alUSG9/jEypyVF3ZPTXQLV9dVcfyg=; b=NzLQBD1YoiGo2u0kAb6fxvb28ijj6H/GOmQSpBziT+sH5GV7wYf7AwEeNxiuLwNjku3WJk B7AaYQ1DZCY7RpTkUBgR4X6mHl7gUm0SpsbigSlof2w/lsdAC4Oegf/Cw6TZ9KoVcNQtwC m6b57wn0lY3bDqEhLsIONzX3qtQIplA= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-206-cnTFJMGlNTuuESr-xZUhOg-1; Mon, 10 Feb 2025 14:38:55 -0500 X-MC-Unique: cnTFJMGlNTuuESr-xZUhOg-1 X-Mimecast-MFC-AGG-ID: cnTFJMGlNTuuESr-xZUhOg Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-38dd6edef72so1033065f8f.0 for ; Mon, 10 Feb 2025 11:38:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216334; x=1739821134; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mwHnciP+OvnJG9alUSG9/jEypyVF3ZPTXQLV9dVcfyg=; b=nkmIISj10U0JWinN0pAcOAOmcInvvCZ/rveFH4huWc0YwRjDRTBu6BaWyXraWv2ZZI xoouoe1xiMufbmlBDXljG/ZMbFgc2Hyn/3s0Lum33reSR5eb3RO6veARaIS5Z24ifrPG Ps/FQpQ1vJxGTbiqvEbVHWSD5ZZ0b4MclQHsixyKl2K/owkmlQkf+ebbXEkkYKkEAclL 
3cehBs2UnBNaZvJN3hOlotZmjsYTbXQ9Friy1aJ7d+pMZeztmfBq5+4i9yN2WsM5VHrE GxwmfF0/9YbM54HRddQSD45zimrOAHXmdY3bO8xQxLHwozoGLsYDeSuZU6ldlclU6wMd h3Wg== X-Forwarded-Encrypted: i=1; AJvYcCVMkE7lsVC/daiw2jh8q/hgbkuY9xOCGel87WV2lmav8TIVeX1qv5XlkkPKC77m0qobLAOBHEvE0ngelLOCrZc+iI8=@vger.kernel.org X-Gm-Message-State: AOJu0YyRPg0iK3Gj+7CB02kkNL0fqYJpnaVHxOFVFGrcDNTMEcki10iM wAMFQyYcUR+vAqE0InFeq4Bt2ORIUv2S5B7yfG3dL1HI3oS/nki2tUFBMHYzW9ol8sEN5LGF2u0 btWRj3tjIlNsDO4IZIiUWZi2tpp2POEbbQ+27SuYzCA7ZJSZtZCeHWfSr4vKqGCqvGuzhXw== X-Gm-Gg: ASbGncuQnNt88qA0fKPrOPOHxOffiMjhehPlNKJppRZ6IS0zhYqbnklB0wC7lumOuVN 9+/uSQk72MHWX7JgKKq2f58X0OGqgtaeBdcmeZjpS889sgBiYPx1WTmM14rxeYrA2LX1ZTj5kg9 ECCdh8CuD1Q9kdO+9QsYZEOICapXjRvUSm9VdDjyPmB5a7EvUilJXiQQdcj3isw8NSebZ5HSjpQ 3pFnpJc12HcFHWcolRNHPCjlP06LyIdZL2KZh2vnBXq6hMvteAxZUOBabOO6o1czJA/vlXlXqas RzT9OvJETOLf8ue2vkTMMOC6wQgHiCoHuIFS4UsvSUXvg2oaj+m+VBqARqsnOkP/Ag== X-Received: by 2002:a5d:5f42:0:b0:38d:df15:2770 with SMTP id ffacd0b85a97d-38de432d90fmr568594f8f.0.1739216333878; Mon, 10 Feb 2025 11:38:53 -0800 (PST) X-Google-Smtp-Source: AGHT+IEQ8jI3Pr0PhgIrX5XFi88nBpwoaMc1TLFqexlyUt/iDFsCN9k6Hzlrlovdfu0/+3mYproMTQ== X-Received: by 2002:a5d:5f42:0:b0:38d:df15:2770 with SMTP id ffacd0b85a97d-38de432d90fmr568579f8f.0.1739216333460; Mon, 10 Feb 2025 11:38:53 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4390db11200sm187831345e9.38.2025.02.10.11.38.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:52 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 13/17] mm/page_idle: handle device-exclusive entries correctly in page_idle_clear_pte_refs_one() Date: Mon, 10 Feb 2025 20:37:55 +0100 Message-ID: <20250210193801.781278-14-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: sEdfl9eJ3n47tbl5-Pb-M7KT_eXLkYwg8Wbz5YnS4e0_1739216334 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). page_idle_clear_pte_refs_one() is not prepared for that, so let's teach it what to do with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as page_idle_get_folio() filters out non-lru folios. Should we just skip PFN swap PTEs completely? Possible, but it seems straight forward to just handle them correctly. Note that we could currently only run into this case with device-exclusive entries on THPs. 
We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park --- mm/page_idle.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/page_idle.c b/mm/page_idle.c index 947c7c7a37289..408aaf29a3ea6 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -62,9 +62,14 @@ static bool page_idle_clear_pte_refs_one(struct folio *folio, /* * For PTE-mapped THP, one sub page is referenced, * the whole THP is referenced. + * + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are "old" from a CPU perspective. + * The MMU notifier takes care of any device aspects. */ - if (ptep_clear_young_notify(vma, addr, pvmw.pte)) - referenced = true; + if (likely(pte_present(ptep_get(pvmw.pte)))) + referenced |= ptep_test_and_clear_young(vma, addr, pvmw.pte); + referenced |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE); } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { if (pmdp_clear_young_notify(vma, addr, pvmw.pmd)) referenced = true; From patchwork Mon Feb 10 19:37:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968990 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B9182505A8 for ; Mon, 10 Feb 2025 19:39:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216342; cv=none; b=EQxdPWqoo1C3zKSoCcIWaIjC42oJpPmYQDuc8B95QhV/AmbfPmynqAXmiivHiiUWU/YzZTbWu9/DH7TGwxH8F4n30Zrd857lwGhcak/sZtOd59yy3pNXxLaVKH/LCrpSkIbOCf0BQB6VOiLx0XUNU/L7xhCEJNS6ycvwMJvWwAo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216342; c=relaxed/simple; bh=xdb/Bd8nabay5U0SQXbchR0hBzNIzuX4jtY3vE4Olv8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=iHMudYlmvh/Hi79pnnaB9TvAxfZzNiIOKyPBOyQTvQyTUtxBejgMcYqgT/WqGJ9k8B+QBk7i1G+QNXNW4Cwh3mFY8xqBRM2Vqgw/3qZ92s2BS2lFgbFzKFpQYtw0ZLHS0CA74aIX2vR7jBF5cBFsHha9mmyCzK09baZCjg6gW3k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HENOEqYW; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HENOEqYW" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216340; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=oE9Ja/aC61CHup4qeT96Odiagg40Q7TPEFXRUvHwXRo=; b=HENOEqYW23FVkl2qDgOt1R9Yb2YQsPWKLOM7RE3Lz48FdX4WK6IjmMjQ3pPwnpSTyXYDuf w7jBnPIHOaeceoQaNQSfuxT7dsFlYfyy3jzEx0zmDHeAk5kH6M+hYBFq0RurpIBcTjw23V EHg3DsgrXoTX+ZQ2e/tGel8oyacktDM= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-447-9bPCQyu_NMyZHg0ZssKrWQ-1; Mon, 10 Feb 2025 14:38:59 -0500 X-MC-Unique: 9bPCQyu_NMyZHg0ZssKrWQ-1 X-Mimecast-MFC-AGG-ID: 9bPCQyu_NMyZHg0ZssKrWQ Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-38dc709f938so1785941f8f.0 for ; Mon, 10 Feb 2025 11:38:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216338; x=1739821138; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oE9Ja/aC61CHup4qeT96Odiagg40Q7TPEFXRUvHwXRo=; b=BQszKyd170cZYcJf83ZYziBJOXs0sd2eXykzsdRcPihNQSTXf3s70Vihjx3jk5Lbrh 6WKbruh98OOtiV+X1J+zQK1Pm6NAk+MS2IKRvfOFsHQlQ/HgPft8CbjkSBXG2zWssKNV Xws+iEYKAF2DY2A/jml3+c1UlinPjx6XZAhKPytSsXnll5801xWKt/5nRj7erHomkQa+ BP02fgaqIYxUYsmqvGhmAc4JunSzYld4UioTMTRAlAza+zn9GbEuHNvabocrTPtG4zUD SAI/DDnmLWtZEqywzX5k3XlPKn+bO5Byhq4hveIm1xhlZi8GFp5bzW5kDgL0NAj5vWy/ GY4w== X-Forwarded-Encrypted: i=1; AJvYcCV6eDn7OevGkGIoIjPqlzmf3uPPjAfCEznYtjgBKNojpYLbMxyrN+Z0CaV2bmk3QGPvNO3qP1PUjNoYXCUVgi757FA=@vger.kernel.org X-Gm-Message-State: AOJu0YzubdRyYlkFvJ09bBGLgUcclIzBYViN5/wflvO5LGRHwm+lgwk9 8V/jxZgqRJt+AuaAqwl0YP63QjKYMPPktO7SR7ggJWsKpOdJk1YxqIUFLubjlhLv66n3T9QxBkx 4N2956DCI3WO3NLnTb3gPg3BtRka/uOYGZEBDEZhatJtQFD2OKgZGxvysPtSby041hJJQhQ== X-Gm-Gg: ASbGncsQJNQcw9cBjrb+Z0RrsSVE0BEnRFr6RD/aZzK0dUwCw9vK9MxWqGR8OOC+3Z3 8/XAJa5VoLhf/dvbNEcdiazOpfgN6PQxHvWYJnECPBMTL8bDOpt0/OfT46G7hUc7AhWXAhfy0CH ynAp7UHY4YqetXpIhIcthhPR66TFLXJNlJ89PVeUqiHeSsGtqrcnRMakoyXtd+/wExVkc1IxP8p nbvjcxpBLH7a5LfDmTsEz6BjrDCeYm8j1+XTsh55QLqkz0+YLZ9DCLZbLKvCfB3658i7610b67n JnpZzGhYEirTxzyZi/cW7q26roKiYY84iMZvxcQGv1H22R5hTvBU3OvRBGHYw3qK3A== X-Received: by 2002:a05:6000:1887:b0:38b:f4e6:21aa with SMTP id ffacd0b85a97d-38de439b7e5mr512530f8f.5.1739216337879; Mon, 10 Feb 2025 11:38:57 -0800 (PST) X-Google-Smtp-Source: AGHT+IF/uzgNmJSm7r4vj5m1RC+0Hk4d4M8qsuHzJI2p/6gaMOev6b/snbfbww+U3ctbsFbws8/HTA== X-Received: by 2002:a05:6000:1887:b0:38b:f4e6:21aa with SMTP id ffacd0b85a97d-38de439b7e5mr512516f8f.5.1739216337518; Mon, 10 Feb 2025 11:38:57 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dcc9bd251sm9816921f8f.9.2025.02.10.11.38.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:56 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 14/17] mm/damon: handle device-exclusive entries correctly in damon_folio_young_one() Date: Mon, 10 Feb 2025 20:37:56 +0100 Message-ID: <20250210193801.781278-15-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: YJUsLkHCGBoG1cNoqx75xO6AVxqxZ3j6POs9H-dnZV0_1739216338 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_young_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. The impact is rather small: we'd be calling pte_young() on a non-present PTE, which is not really defined to have semantic. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park --- mm/damon/paddr.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c index 0f9ae14f884dd..10d75f9ceeafb 100644 --- a/mm/damon/paddr.c +++ b/mm/damon/paddr.c @@ -92,12 +92,20 @@ static bool damon_folio_young_one(struct folio *folio, { bool *accessed = arg; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, addr, 0); + pte_t pte; *accessed = false; while (page_vma_mapped_walk(&pvmw)) { addr = pvmw.address; if (pvmw.pte) { - *accessed = pte_young(ptep_get(pvmw.pte)) || + pte = ptep_get(pvmw.pte); + + /* + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are "old" from a CPU perspective. + * The MMU notifier takes care of any device aspects. 
+ */ + *accessed = (pte_present(pte) && pte_young(pte)) || !folio_test_idle(folio) || mmu_notifier_test_young(vma->vm_mm, addr); } else { From patchwork Mon Feb 10 19:37:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968991 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 744122505C6 for ; Mon, 10 Feb 2025 19:39:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216347; cv=none; b=tqqzn96sbztcQsdJpowr+ZiU91CXEcKLkk03/NYlqAKii+VaXJ4ttAriCxhkfuMI/9eepcVtSO2Qvfbn+lnbJI/38nfh8WJRUSlsy4z6GbNWU9e1Lhorlt2lawgEhp5GPLdEdiFH9IXN8updCbeAId36ViIorPYknyCq5X6rjHY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216347; c=relaxed/simple; bh=eeTQic3U3Pz3ShuuvH4rmh3qdHr2tcndU+mn29E8LNI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Gk1gEtbreW0DRh6Y500GV0r+Z2QW/a8TT2JWgVwGIbsJxGsZva/WUEKfBSMtf2NzawGA/duKNE6haEqYL7MZS9UZM3b2B304GEgWfHej4+RatsLvb8QcFZMFQrtlv2IfmlQ1oHtNUGyzbeA56ukaaT4HD4XwvMasR2ux7EQ8NAY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=VmY90EVe; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="VmY90EVe" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oNpxFEEEyxoH6lwF3K+DQE9+aA/s7jzY02gjXCRiBPg=; b=VmY90EVeuasymwo09mvaugZJVMt9XXNThXA2SNv5uzXii4d/djlwCwTyODIE6ja4L8CGzV B+D9OxHu7vyGm82L0J4kvuIfpk0aytE67PZsfz52BD5QVE9H8hZr65iLSgf756ORxVDG7E rFsd+L6rOqyNZa96f8zY/JRVVADuOhM= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-578-DWoiHk8hNGm0a8L8e6AOaA-1; Mon, 10 Feb 2025 14:39:03 -0500 X-MC-Unique: DWoiHk8hNGm0a8L8e6AOaA-1 X-Mimecast-MFC-AGG-ID: DWoiHk8hNGm0a8L8e6AOaA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38ddee8329eso708599f8f.1 for ; Mon, 10 Feb 2025 11:39:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216342; x=1739821142; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oNpxFEEEyxoH6lwF3K+DQE9+aA/s7jzY02gjXCRiBPg=; b=OhdOEdaG2TWULNiorx6LwOl8ooplYoa7inVo+Pz7aDQHKH4/7tc8Cfb6euSJBFG1y0 9Mn2D/rhayb3wXkC84mWjDQlfQJomSIp5AzvZYEGPdEGvoq6ShxonZp8JDaQIpZFjS5C 
F2mRaD+QXCT7dHlnCuBxzchP+DxupJE5j63iO+luXAAWm3H92HHDpoerfJtuBihoLdmv XJTq7gzN/MQUcopz4WvEHpceXSTkZz9TU64HhF9wV5tQA1h9AQ+3fsXTS2leG5mRevtn /H1VCVpf8lRTd8cW4lrMTNaMNiRf6W/xRj0nn3hCT+xJX42btqKnFRLI7Q7UAKPDrN2v SgQg== X-Forwarded-Encrypted: i=1; AJvYcCVSEeW3+H1H+b5bQ3QHmXcvptTLfND9UoOfuaco1EhNaJ7x74akk7xACTAk8w7OFBOW/PBb9nv5uiCiUrW1tKanyCk=@vger.kernel.org X-Gm-Message-State: AOJu0Yz2bBsCidXLJGR5ZZd5ZSWsygIh26nPGp0q3k73R7UTXUP24SWj 0nuJPpG+sKO9wO8TfUVQ69rDSBz+p3+HNlDqV42BnajTz2ScTbnTno5CQ9o9kOdfYGmjcKqP2zK sWg6QWt9XdoP3wNQzeWCnN9FiLx7xMbEZvAJHFv8Aci13uSTUJGBC6gtLAWYr+GzRUCYlyw== X-Gm-Gg: ASbGnctN/x2LThuubaO+SUa7/3gtifCkwYAF1EKwFO6ONmG95NcjikUZcoMEoFQkkO6 5OCXHEbpBtNAcmiWXmU9KxIrCwI/XxQTe4zWVQj0a3ODFFF7/QOYS/Cnc/QqM1LQPusYdfZ+NNS C5kKW7I5z4q7v16SgkunFtlRlDmWJ6gAWF6vIgvfUFMfWRRnGBCkj729/AuqeQ6IOjKqpxV0mOE uNQ7ZAFWX21ybz6mepPzWhYTFaR8nRypjSWLOKyvXZfYxPYHuMe3wJ/BCXUerOvGYfZgqvMnAXd eEzFFFOYUhobs1hAaF12r1guNJtgDoRfHl6PFDeKsxPX5n1ztdATgtqvf5Snrft6rg== X-Received: by 2002:a05:6000:1813:b0:38a:418e:21c7 with SMTP id ffacd0b85a97d-38dc935246fmr8277024f8f.53.1739216342063; Mon, 10 Feb 2025 11:39:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IHTiiTPkwAyfuLx0qL+LO6PapdaXuVNjUwBeGg/Z0ah/0RffyIckymQJ3LkKa1NUzxInLX6yQ== X-Received: by 2002:a05:6000:1813:b0:38a:418e:21c7 with SMTP id ffacd0b85a97d-38dc935246fmr8276996f8f.53.1739216341643; Mon, 10 Feb 2025 11:39:01 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dc4d00645sm11916376f8f.66.2025.02.10.11.38.58 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:00 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 15/17] mm/damon: handle device-exclusive entries correctly in damon_folio_mkold_one() Date: Mon, 10 Feb 2025 20:37:57 +0100 Message-ID: <20250210193801.781278-16-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: GHKHc-pr5u6hR1hkEp_Jp5sSjnxkaIJodDoxZD_sx2g_1739216342 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_mkold_one() is not prepared for that and calls damon_ptep_mkold() with PFN swap PTEs. Teach damon_ptep_mkold() to deal with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as damon_get_folio() filters out non-lru folios. Should we just skip PFN swap PTEs completely? 
Possible, but it seems straight forward to just handle it correctly. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park --- mm/damon/ops-common.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c index d25d99cb5f2bb..86a50e8fbc806 100644 --- a/mm/damon/ops-common.c +++ b/mm/damon/ops-common.c @@ -9,6 +9,8 @@ #include #include #include +#include +#include #include "ops-common.h" @@ -39,12 +41,29 @@ struct folio *damon_get_folio(unsigned long pfn) void damon_ptep_mkold(pte_t *pte, struct vm_area_struct *vma, unsigned long addr) { - struct folio *folio = damon_get_folio(pte_pfn(ptep_get(pte))); + pte_t pteval = ptep_get(pte); + struct folio *folio; + bool young = false; + unsigned long pfn; + + if (likely(pte_present(pteval))) + pfn = pte_pfn(pteval); + else + pfn = swp_offset_pfn(pte_to_swp_entry(pteval)); + folio = damon_get_folio(pfn); if (!folio) return; - if (ptep_clear_young_notify(vma, addr, pte)) + /* + * PFN swap PTEs, such as device-exclusive ones, that actually map pages + * are "old" from a CPU perspective. The MMU notifier takes care of any + * device aspects. + */ + if (likely(pte_present(pteval))) + young |= ptep_test_and_clear_young(vma, addr, pte); + young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE); + if (young) folio_set_young(folio); folio_set_idle(folio); From patchwork Mon Feb 10 19:37:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968992 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80BA72512C1 for ; Mon, 10 Feb 2025 19:39:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216351; cv=none; b=O4eSPSmdM7uFTouNhQpwsyFvB48wEEGAJ/O9C4nIwFAJgyHBIrsH5aXAsDYZXqEfAYWYIYlOl/Lrtp5uIK29400Ge23Mt8A3nKSsqw3XJwIp1dkIBhuzcd5Gg4tDYJ1QYYxH7KoJb1CNTxNbA9ev8MVCI/ltpcGFnFcYOy3sksU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216351; c=relaxed/simple; bh=zopD3ByQcw+T0PaGnkSFpwWr8v6nq8ox8511K+fI4zo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=jQQXGCukAOfaXgArQnAvgw5DxjTg09vDTxekmJ3LxpFQwxXe99IVsCEtcUGHnqet9YqA5xA6fLlCFRzt0Xr5yOdQHOflTiS2p7er8Ckmb8v3ES7s3sGVbQ/uVwAB31mPilCL7jYCIou3prljjuAFtOrtEDMCUegf26HxrzH2v9g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QK+hDwHu; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QK+hDwHu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216348; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=e5kPSWbMiVGXXHUJW6xa869lKm+zAgfCoOcPS4ARFgc=; b=QK+hDwHufu5+xDOo8no0EHLZEQGh5hKXHLjZKTQmUzmmYZC6kQn4zyQVnSbOqYUt80HSjf 6UVCqFyHyAgyyHOxfPUkDzqecgota90VdoabH0v21eJ6WTtcgqXGaYMKSOZ7T6JBFxaaQw MCPwAbpmVModELR3naTef5e0NxKyiZg= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-602-OA_AJe_YNReu83-YtC628A-1; Mon, 10 Feb 2025 14:39:07 -0500 X-MC-Unique: OA_AJe_YNReu83-YtC628A-1 X-Mimecast-MFC-AGG-ID: OA_AJe_YNReu83-YtC628A Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-43933a8ff58so13555185e9.2 for ; Mon, 10 Feb 2025 11:39:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216346; x=1739821146; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e5kPSWbMiVGXXHUJW6xa869lKm+zAgfCoOcPS4ARFgc=; b=MLO8ofhirKdKYnnpclrNnQ7mPqnaLt4i31cNDtIc6dtz/+U7BnodPH2yPjfQaSAvxz 9TSCXl6juxnDCXSno3lKNvdpS6LhFlKcN15GT/1Td6Vi+3iP+ziNrK6AqRMF3M5xjDZ4 1WK58Ho9GHsCnO4moZL/IO5bb9Ilnw9/SMit/DBp5j6prAYdLbvOq2nk11k4W6e5WEUB gS59KnyuL4AznMGItGJM20W6GSwWfPCOWJ5pDijCnuxhKnB2Br5SUo1fYamx7Kc1cLTZ Jcp0qK6yYWS/G82lk3bC01+vq4kXu1iSLLwQ4x67h4f65GWItnjplt9+GeKLA+n1C8k5 32FQ== X-Forwarded-Encrypted: i=1; AJvYcCVscC2ABU5vwg6yPOy301re2+QGfdJkSNH6+kR0BwAkv/2KVFPxIDPAo/KcoNRCOFuR2em3SR9tz4a36Sf0h4VD6Kk=@vger.kernel.org X-Gm-Message-State: AOJu0YwmMp8UUWjq8+SZcRMRadL2RLoeNVHzDBmhVw/9MPSixxS0eNWc 4G0xlOic5UAh/e3yCgq/T7RRmAzL/zqWWRQocCR4QiotS4hYDChp4HqjdMn5/9DSLXygDPxFud0 ztgLq93ils/R/YZXab8LTVvCLhtUfrtcSCnRSzoD9STGW/TdRapBISEBZHio067VI8VkQjw== X-Gm-Gg: ASbGncsyiWFCeRuloETkqIabVlVxageJmbGww8oDKtr9xqLkdz8tve5yHUkCflHMx/l CnNe6pGq1Issdvd0JXtbZhT258/Cgd10vMby1GHo8pj/yx2gjhk+8Ch68zj9zlLdpqQMf+mmKFd 2oDDH4zIzu/Krey+0DNkaBrXfMERSpVyq1csejigpbM1r7/KnjTJ9eE1k8BxJ7GhrQhNUBCGH1f IMlQKAMy/mRBdSv1IONYOGcWWdM0buDXBLPj+x1v6gXR6nNtL+JYkMA6OR3DGaM5+9oydTNLHfV xf5jRRfczYJlgtKIR87Rwvnt5uA1jjeTsXbTJbfcIk/e51+AYU0UpJVhwv4GXlgsGQ== X-Received: by 2002:a05:600c:1913:b0:434:faa9:5266 with SMTP id 5b1f17b1804b1-43924991f73mr122649015e9.13.1739216345788; Mon, 10 Feb 2025 11:39:05 -0800 (PST) X-Google-Smtp-Source: AGHT+IE0N2Q/3/pgsBEDUurQcQIsnMqE2C8UdYDG2bl1ZHVr+k7cZL7qtDol6R4B+8CggSnILo8JzA== X-Received: by 2002:a05:600c:1913:b0:434:faa9:5266 with SMTP id 5b1f17b1804b1-43924991f73mr122648595e9.13.1739216345384; Mon, 10 Feb 2025 11:39:05 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dd9c48173sm5308677f8f.37.2025.02.10.11.39.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:04 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 16/17] mm/rmap: keep mapcount untouched for device-exclusive entries Date: Mon, 10 Feb 2025 20:37:58 +0100 Message-ID: <20250210193801.781278-17-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: r7sFujFbO8Scw0Ee7ocLLwnwg5VGBPOAE3VG0dGwdYQ_1739216346 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Now that conversion to device-exclusive no longer performs an rmap walk and all page_vma_mapped_walk() users have been taught to properly handle device-exclusive entries, let's treat device-exclusive entries just as if they were present, similar to how we handle device-private entries already. This fixes swapout/migration/split/hwpoison of folios with device-exclusive entries. We only had to take care of page_vma_mapped_walk() users, because these traditionally assume pte_present(). Other page table walkers already have to handle !pte_present(), and some of them might simply skip them (e.g., MADV_PAGEOUT) if they are not specialized on them. This change doesn't modify the latter. Note that while folios with device-exclusive PTEs can now get migrated, khugepaged will not collapse a THP if there is a device-exclusive PTE. Doing so might also not be desired if the device frequently performs atomics to the same page. Similarly, KSM will never merge order-0 folios that are device-exclusive. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/memory.c | 17 +---------------- mm/rmap.c | 7 ------- 2 files changed, 1 insertion(+), 23 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index ba33ba3b7ea17..e9f54065b117f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -741,20 +741,6 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && PageAnonExclusive(page)), folio); - - /* - * No need to take a page reference as one was already - * created when the swap entry was made. - */ - if (folio_test_anon(folio)) - folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE); - else - /* - * Currently device exclusive access only supports anonymous - * memory so the entry shouldn't point to a filebacked page. 
- */ - WARN_ON_ONCE(1); - set_pte_at(vma->vm_mm, address, ptep, pte); /* @@ -1626,8 +1612,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, */ WARN_ON_ONCE(!vma_is_anonymous(vma)); rss[mm_counter(folio)]--; - if (is_device_private_entry(entry)) - folio_remove_rmap_pte(folio, page, vma); + folio_remove_rmap_pte(folio, page, vma); folio_put(folio); } else if (!non_swap_entry(entry)) { /* Genuine swap entries, hence a private anon pages */ diff --git a/mm/rmap.c b/mm/rmap.c index 7b737f0f68fb5..e2a543f639ce3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2511,13 +2511,6 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, /* The pte is writable, uffd-wp does not apply. */ set_pte_at(mm, addr, fw.ptep, swp_pte); - /* - * TODO: The device-exclusive PFN swap PTE holds a folio reference but - * does not count as a mapping (mapcount), which is wrong and must be - * fixed, otherwise RMAP walks don't behave as expected. - */ - folio_remove_rmap_pte(folio, page, vma); - folio_walk_end(&fw, vma); mmu_notifier_invalidate_range_end(&range); *foliop = folio; From patchwork Mon Feb 10 19:37:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13968993 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F73C2512EA for ; Mon, 10 Feb 2025 19:39:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216355; cv=none; b=PeGIcZjKcrJY45iDZorxwjB0cw7AZb2+7pKLgc17ks7FlN+4DWH0HAKamSOd9dvQnhiqwnOBlu/okzXYcTrIYnMoRyK70SQDdVAsadZUo229oYP4KH99c4y/8xMME4HkRHPs2QLTpim1SP9Es0aBkeFfk4xnnH03b095+lSVEN4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216355; c=relaxed/simple; bh=PA01cEf2/9mIQZ9O32CRa6PGoRajWbPJGABkde9iNzs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=HqFJTv3Ydn0lWay1Y7yIvyCk274NiQTIPzy7ku4AwsgRNHkvEa3WTbVgMjzhQE9rlYzhqR1QTQcI3t9L8qVxkgiAZHR/7UVi652/q+w8G9OrH/1j6+ea6P2v5ILf2MRi8dmbgh9h3YVW3qWBwZhj13KgGPKRw1i6X7Xz97UVZlI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OeIgX9EE; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OeIgX9EE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216352; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gxIo4/a9ZeK4nnisZvQU6AU2gxtpzJDN86Rrv62j7Tk=; b=OeIgX9EETrS1A6kA4ePwSvSgkM4zenbWyeU0W6YjUu9nulZgfEU6+PSVZfZRZ1hA0hywpI cXYdzRFPPGaRT9rrMDdYwHpydp9QQoDKEOVH9xaBMouefWFt6ShGyTHzrmn5fhnkNVMH1r bAWPMHZhTOxnBFqEu/mkH/9zRawwL+I= 
Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-146-42hDAaPFPGyGailSjtle1A-1; Mon, 10 Feb 2025 14:39:10 -0500 X-MC-Unique: 42hDAaPFPGyGailSjtle1A-1 X-Mimecast-MFC-AGG-ID: 42hDAaPFPGyGailSjtle1A Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-38ddba9814bso733056f8f.3 for ; Mon, 10 Feb 2025 11:39:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216349; x=1739821149; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gxIo4/a9ZeK4nnisZvQU6AU2gxtpzJDN86Rrv62j7Tk=; b=WtREMvIQEz6alI4YMqpQcvUFqMRW0mqM3HQgVNwe6YssZnUaoxP1+6WMYwwW+M/fVH dtEZdPFUvXBRmLrSMa+Ex/1Gd11mzh6HzGEFI5PQTRkqYdWsf1qTBNZTLMiGGdPanffU lMCfQz7QmEbNrrLDYhYni0WbrlNpfBiLOtv0fcyQjNznEZ+G4pR0fDG+qcmkIDI6jNgW gn3zshJaPvgPTiyieSLrTzes/LoBvCnxMfXB/wEPeK8OwSrT1z//sHnbsTmS6h+O9X3g Re/8MUZv35H4XU5/td0Szr5QV6Btba9oRRQMcWscarjUKHeumgSf0mQLjyVLlrQPltiN tPzw== X-Forwarded-Encrypted: i=1; AJvYcCUs/EUldD/WHMGSmNQ11HgPI5vXFnWvgs/z4bNSkKPrHGvNsSbM4BF2zAqeon+lHCCjtRzgzmMHRoyt9LPmSN+VJmw=@vger.kernel.org X-Gm-Message-State: AOJu0YxJmJLEM4I+VI+tZdBBJX8DLH/Y8nJlAZHriJL9BE9xRYlcMci7 fHbc/v+rnspHp7/vJRaV4OB6I6GSqJuUtxVeJGXDV73U86T/VFc0TyuUfEqfxKuFHYUKPib0/Of sDigF3SvgKN+2POeMpVww4E4oZbCa/o60MAmYVKeF4wmRf2VJ0VuN3kKxAolRThZDEjvQUg== X-Gm-Gg: ASbGncsfjQIUYMlkUCggNj4b7fVE5l7oge4tw+KDrgsye4k6vvYK+2WHDb7XPxnuCmH nzJsa5Rv1Kc2h0dh/TZZJ9rzK6V6gYhN9oh03DbrTL4fOy/E4VM+kALnFla1mjcVsRxM9mEpkVf pRPA6ZFUHJGFZ1IRGL3ivSQ+PpPmWCkhk/1xMHpYZMfBhfBaRjhtY71gORBXdDP8oB9OcisBOMe XuzK3oAfXu+LKNM00zi2vGHnD7bLY42vGCAagkL6+KxWNfxSm+7X+wXO5WRy2Q1IapJVHVl8O87 zcNfdCs6hhvTRZmk0b4ZWKyF4MhARrMD8P4bNiOc5ih43Cf3++YY4/VFTTSCLWvFmw== X-Received: by 2002:a05:6000:1448:b0:38d:a879:4778 with SMTP id ffacd0b85a97d-38dc9343f89mr13325606f8f.33.1739216349523; Mon, 10 Feb 2025 11:39:09 -0800 (PST) X-Google-Smtp-Source: AGHT+IFOm3YA+KmUhyhlND1SLmjpOJWvwKHSp1MsMH3YdrdBRb69cVHv165Q1M/eFn8bTRexW9yPlw== X-Received: by 2002:a05:6000:1448:b0:38d:a879:4778 with SMTP id ffacd0b85a97d-38dc9343f89mr13325571f8f.33.1739216349113; Mon, 10 Feb 2025 11:39:09 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dca0b4237sm10326047f8f.85.2025.02.10.11.39.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:07 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 17/17] mm/rmap: avoid -EBUSY from make_device_exclusive() Date: Mon, 10 Feb 2025 20:37:59 +0100 Message-ID: <20250210193801.781278-18-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: PkQ3SKXCfjrpPeDHrIT_D3gJIt_dGG1i55QJwjoUGiQ_1739216349 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true Failing to obtain the folio lock, for example because the folio is concurrently getting migrated or swapped out, can easily make the callers fail: for example, the hmm selftest can sometimes be observed to fail because of this. Instead of forcing the caller to retry, let's simply retry in this to-be-expected case. Similarly, avoid spurious failures simply because we raced with someone (e.g., swapout) modifying the page table such that our folio_walk fails. Simply unconditionally lock the folio, and retry GUP if our folio_walk fails. Note that the folio_walk repeatedly failing is not something we expect. Note that we might want to avoid grabbing the folio lock at some point; for now, keep that as is and only unconditionally lock the folio. With this change, the hmm selftests don't fail simply because the folio is already locked. While this fixes the selftests in some cases, it's likely not something that deserves a "Fixes:". Signed-off-by: David Hildenbrand --- mm/rmap.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index e2a543f639ce3..0f760b93fc0a2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2435,6 +2435,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, struct page *page; swp_entry_t entry; pte_t swp_pte; + int ret; mmap_assert_locked(mm); addr = PAGE_ALIGN_DOWN(addr); @@ -2448,6 +2449,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, * fault will trigger a conversion to an ordinary * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE. */ +retry: page = get_user_page_vma_remote(mm, addr, FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD, &vma); @@ -2460,9 +2462,10 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, return ERR_PTR(-EOPNOTSUPP); } - if (!folio_trylock(folio)) { + ret = folio_lock_killable(folio); + if (ret) { folio_put(folio); - return ERR_PTR(-EBUSY); + return ERR_PTR(ret); } /* @@ -2488,7 +2491,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, mmu_notifier_invalidate_range_end(&range); folio_unlock(folio); folio_put(folio); - return ERR_PTR(-EBUSY); + goto retry; } /* Nuke the page table entry so we get the uptodate dirty bit. */