From patchwork Fri Mar 21 11:37:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 14025296
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
    linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
    David Hildenbrand, Andrew Morton, Andrii Nakryiko, Matthew Wilcox,
    Russell King, Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra,
    Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
    "Liang, Kan", Tong Tiangen
Subject: [PATCH v3 3/3] kernel/events/uprobes: uprobe_write_opcode() rewrite
Date: Fri, 21 Mar 2025 12:37:13 +0100
Message-ID: <20250321113713.204682-4-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250321113713.204682-1-david@redhat.com>
References: <20250321113713.204682-1-david@redhat.com>

uprobe_write_opcode() does some pretty low-level things that it really
shouldn't be doing: for example, manually breaking COW by allocating
anonymous folios and replacing mapped pages.
Further, it does seem to do some shaky things: for example, writing to
possibly COW-shared anonymous pages, or zapping anonymous pages that
might be pinned. We're also not taking care of uffd, uffd-wp, softdirty
..., although these are rather corner cases here. Let's just get it
right, like ordinary ptrace writes would.

Let's rewrite the code, leaving COW-breaking to core-MM, triggered by
FOLL_FORCE|FOLL_WRITE (note that the code was already using FOLL_FORCE).
We'll use GUP to look up / fault in the page and break COW if required.
Then, we'll walk the page tables using a folio_walk to perform our page
modification atomically, by temporarily unmapping the PTE and flushing
the TLB. Likely, we could avoid the temporary unmap in cases where we
can just atomically write the instruction, but that will be a separate
project.

Unfortunately, we still have to implement the zapping logic manually,
because we only want to zap in specific circumstances (e.g., page
content identical).

Note that we can now handle large folios (compound pages) and the
shared zeropage just fine, so drop these checks.

Acked-by: Oleg Nesterov
Signed-off-by: David Hildenbrand
---
 kernel/events/uprobes.c | 312 ++++++++++++++++++++--------------------
 1 file changed, 158 insertions(+), 154 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ac17c16f65d63..f098e8a4f24ee 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include        /* check_stable_address_space */
+#include
 #include
@@ -151,91 +152,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
         return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
 }
 
-/**
- * __replace_page - replace page in vma by new page.
- * based on replace_page in mm/ksm.c
- *
- * @vma:      vma that holds the pte pointing to page
- * @addr:     address the old @page is mapped at
- * @old_page: the page we are replacing by new_page
- * @new_page: the modified page we replace page by
- *
- * If @new_page is NULL, only unmap @old_page.
- *
- * Returns 0 on success, negative error code otherwise.
- */
-static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
-                          struct page *old_page, struct page *new_page)
-{
-        struct folio *old_folio = page_folio(old_page);
-        struct folio *new_folio;
-        struct mm_struct *mm = vma->vm_mm;
-        DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0);
-        int err;
-        struct mmu_notifier_range range;
-        pte_t pte;
-
-        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
-                                addr + PAGE_SIZE);
-
-        if (new_page) {
-                new_folio = page_folio(new_page);
-                err = mem_cgroup_charge(new_folio, vma->vm_mm, GFP_KERNEL);
-                if (err)
-                        return err;
-        }
-
-        /* For folio_free_swap() below */
-        folio_lock(old_folio);
-
-        mmu_notifier_invalidate_range_start(&range);
-        err = -EAGAIN;
-        if (!page_vma_mapped_walk(&pvmw))
-                goto unlock;
-        VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
-        pte = ptep_get(pvmw.pte);
-
-        /*
-         * Handle PFN swap PTES, such as device-exclusive ones, that actually
-         * map pages: simply trigger GUP again to fix it up.
-         */
-        if (unlikely(!pte_present(pte))) {
-                page_vma_mapped_walk_done(&pvmw);
-                goto unlock;
-        }
-
-        if (new_page) {
-                folio_get(new_folio);
-                folio_add_new_anon_rmap(new_folio, vma, addr, RMAP_EXCLUSIVE);
-                folio_add_lru_vma(new_folio, vma);
-        } else
-                /* no new page, just dec_mm_counter for old_page */
-                dec_mm_counter(mm, MM_ANONPAGES);
-
-        if (!folio_test_anon(old_folio)) {
-                dec_mm_counter(mm, mm_counter_file(old_folio));
-                inc_mm_counter(mm, MM_ANONPAGES);
-        }
-
-        flush_cache_page(vma, addr, pte_pfn(pte));
-        ptep_clear_flush(vma, addr, pvmw.pte);
-        if (new_page)
-                set_pte_at(mm, addr, pvmw.pte,
-                           mk_pte(new_page, vma->vm_page_prot));
-
-        folio_remove_rmap_pte(old_folio, old_page, vma);
-        if (!folio_mapped(old_folio))
-                folio_free_swap(old_folio);
-        page_vma_mapped_walk_done(&pvmw);
-        folio_put(old_folio);
-
-        err = 0;
- unlock:
-        mmu_notifier_invalidate_range_end(&range);
-        folio_unlock(old_folio);
-        return err;
-}
-
 /**
  * is_swbp_insn - check if instruction is breakpoint instruction.
  * @insn: instruction to be checked.
@@ -463,6 +379,95 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
         return ret;
 }
 
+static bool orig_page_is_identical(struct vm_area_struct *vma,
+                unsigned long vaddr, struct page *page, bool *pmd_mappable)
+{
+        const pgoff_t index = vaddr_to_offset(vma, vaddr) >> PAGE_SHIFT;
+        struct folio *orig_folio = filemap_get_folio(vma->vm_file->f_mapping,
+                                                     index);
+        struct page *orig_page;
+        bool identical;
+
+        if (IS_ERR(orig_folio))
+                return false;
+        orig_page = folio_file_page(orig_folio, index);
+
+        *pmd_mappable = folio_test_pmd_mappable(orig_folio);
+        identical = folio_test_uptodate(orig_folio) &&
+                    pages_identical(page, orig_page);
+        folio_put(orig_folio);
+        return identical;
+}
+
+static int __uprobe_write_opcode(struct vm_area_struct *vma,
+                struct folio_walk *fw, struct folio *folio,
+                unsigned long opcode_vaddr, uprobe_opcode_t opcode)
+{
+        const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
+        const bool is_register = !!is_swbp_insn(&opcode);
+        bool pmd_mappable;
+
+        /* For now, we'll only handle PTE-mapped folios. */
+        if (fw->level != FW_LEVEL_PTE)
+                return -EFAULT;
+
+        /*
+         * See can_follow_write_pte(): we'd actually prefer a writable PTE here,
+         * but the VMA might not be writable.
+         */
+        if (!pte_write(fw->pte)) {
+                if (!PageAnonExclusive(fw->page))
+                        return -EFAULT;
+                if (unlikely(userfaultfd_pte_wp(vma, fw->pte)))
+                        return -EFAULT;
+                /* SOFTDIRTY is handled via pte_mkdirty() below. */
+        }
+
+        /*
+         * We'll temporarily unmap the page and flush the TLB, such that we can
+         * modify the page atomically.
+         */
+        flush_cache_page(vma, vaddr, pte_pfn(fw->pte));
+        fw->pte = ptep_clear_flush(vma, vaddr, fw->ptep);
+        copy_to_page(fw->page, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+
+        /*
+         * When unregistering, we may only zap a PTE if uffd is disabled and
+         * there are no unexpected folio references ...
+         */
+        if (is_register || userfaultfd_missing(vma) ||
+            (folio_ref_count(folio) != folio_mapcount(folio) + 1 +
+             folio_test_swapcache(folio) * folio_nr_pages(folio)))
+                goto remap;
+
+        /*
+         * ... and the mapped page is identical to the original page that
+         * would get faulted in on next access.
+         */
+        if (!orig_page_is_identical(vma, vaddr, fw->page, &pmd_mappable))
+                goto remap;
+
+        dec_mm_counter(vma->vm_mm, MM_ANONPAGES);
+        folio_remove_rmap_pte(folio, fw->page, vma);
+        if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
+            folio_trylock(folio)) {
+                folio_free_swap(folio);
+                folio_unlock(folio);
+        }
+        folio_put(folio);
+
+        return pmd_mappable;
+remap:
+        /*
+         * Make sure that our copy_to_page() changes become visible before the
+         * set_pte_at() write.
+         */
+        smp_wmb();
+        /* We modified the page. Make sure to mark the PTE dirty. */
+        set_pte_at(vma->vm_mm, vaddr, fw->ptep, pte_mkdirty(fw->pte));
+        return 0;
+}
+
 /*
  * NOTE:
  * Expect the breakpoint instruction to be the smallest size instruction for
@@ -475,116 +480,115 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
  * uprobe_write_opcode - write the opcode at a given virtual address.
  * @auprobe: arch specific probepoint information.
  * @vma: the probed virtual memory area.
- * @vaddr: the virtual address to store the opcode.
- * @opcode: opcode to be written at @vaddr.
+ * @opcode_vaddr: the virtual address to store the opcode.
+ * @opcode: opcode to be written at @opcode_vaddr.
  *
  * Called with mm->mmap_lock held for read or write.
  * Return 0 (success) or a negative errno.
  */
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
-                unsigned long vaddr, uprobe_opcode_t opcode)
+                const unsigned long opcode_vaddr, uprobe_opcode_t opcode)
 {
+        const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
         struct mm_struct *mm = vma->vm_mm;
         struct uprobe *uprobe;
-        struct page *old_page, *new_page;
         int ret, is_register, ref_ctr_updated = 0;
-        bool orig_page_huge = false;
         unsigned int gup_flags = FOLL_FORCE;
+        struct mmu_notifier_range range;
+        struct folio_walk fw;
+        struct folio *folio;
+        struct page *page;
 
         is_register = is_swbp_insn(&opcode);
         uprobe = container_of(auprobe, struct uprobe, arch);
 
-retry:
+        if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
+                return -EINVAL;
+
+        /*
+         * When registering, we have to break COW to get an exclusive anonymous
+         * page that we can safely modify. Use FOLL_WRITE to trigger a write
+         * fault if required. When unregistering, we might be lucky and the
+         * anon page is already gone. So defer write faults until really
+         * required. Use FOLL_SPLIT_PMD, because __uprobe_write_opcode()
+         * cannot deal with PMDs yet.
+         */
         if (is_register)
-                gup_flags |= FOLL_SPLIT_PMD;
-        /* Read the page with vaddr into memory */
-        ret = get_user_pages_remote(mm, vaddr, 1, gup_flags, &old_page, NULL);
-        if (ret != 1)
-                return ret;
+                gup_flags |= FOLL_WRITE | FOLL_SPLIT_PMD;
 
-        ret = verify_opcode(old_page, vaddr, &opcode);
+retry:
+        ret = get_user_pages_remote(mm, vaddr, 1, gup_flags, &page, NULL);
         if (ret <= 0)
-                goto put_old;
-
-        if (is_zero_page(old_page)) {
-                ret = -EINVAL;
-                goto put_old;
-        }
+                goto out;
+        folio = page_folio(page);
 
-        if (WARN(!is_register && PageCompound(old_page),
-                 "uprobe unregister should never work on compound page\n")) {
-                ret = -EINVAL;
-                goto put_old;
+        ret = verify_opcode(page, opcode_vaddr, &opcode);
+        if (ret <= 0) {
+                folio_put(folio);
+                goto out;
         }
 
         /* We are going to replace instruction, update ref_ctr. */
         if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
                 ret = update_ref_ctr(uprobe, mm, is_register ?
                                      1 : -1);
-                if (ret)
-                        goto put_old;
+                if (ret) {
+                        folio_put(folio);
+                        goto out;
+                }
                 ref_ctr_updated = 1;
         }
 
         ret = 0;
-        if (!is_register && !PageAnon(old_page))
-                goto put_old;
-
-        ret = anon_vma_prepare(vma);
-        if (ret)
-                goto put_old;
-
-        ret = -ENOMEM;
-        new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
-        if (!new_page)
-                goto put_old;
-
-        __SetPageUptodate(new_page);
-        copy_highpage(new_page, old_page);
-        copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+        if (unlikely(!folio_test_anon(folio))) {
+                VM_WARN_ON_ONCE(is_register);
+                folio_put(folio);
+                goto out;
+        }
 
         if (!is_register) {
-                struct page *orig_page;
-                pgoff_t index;
-
-                VM_BUG_ON_PAGE(!PageAnon(old_page), old_page);
-
-                index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
-                orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
-                                          index);
-
-                if (orig_page) {
-                        if (PageUptodate(orig_page) &&
-                            pages_identical(new_page, orig_page)) {
-                                /* let go new_page */
-                                put_page(new_page);
-                                new_page = NULL;
-
-                                if (PageCompound(orig_page))
-                                        orig_page_huge = true;
-                        }
-                        put_page(orig_page);
-                }
+                /*
+                 * In the common case, we'll be able to zap the page when
+                 * unregistering. So trigger MMU notifiers now, as we won't
+                 * be able to do it under PTL.
+                 */
+                mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+                                        vaddr, vaddr + PAGE_SIZE);
+                mmu_notifier_invalidate_range_start(&range);
+        }
+
+        ret = -EAGAIN;
+        /* Walk the page tables again, to perform the actual update. */
+        if (folio_walk_start(&fw, vma, vaddr, 0)) {
+                if (fw.page == page)
+                        ret = __uprobe_write_opcode(vma, &fw, folio, opcode_vaddr, opcode);
+                folio_walk_end(&fw, vma);
         }
 
-        ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
-        if (new_page)
-                put_page(new_page);
-put_old:
-        put_page(old_page);
+        if (!is_register)
+                mmu_notifier_invalidate_range_end(&range);
 
-        if (unlikely(ret == -EAGAIN))
+        folio_put(folio);
+        switch (ret) {
+        case -EFAULT:
+                gup_flags |= FOLL_WRITE | FOLL_SPLIT_PMD;
+                fallthrough;
+        case -EAGAIN:
                 goto retry;
+        default:
+                break;
+        }
 
+out:
         /* Revert back reference counter if instruction update failed. */
-        if (ret && is_register && ref_ctr_updated)
+        if (ret < 0 && is_register && ref_ctr_updated)
                 update_ref_ctr(uprobe, mm, -1);
 
         /* try collapse pmd for compound page */
-        if (!ret && orig_page_huge)
+        if (ret > 0)
                 collapse_pte_mapped_thp(mm, vaddr, false);
 
-        return ret;
+        return ret < 0 ? ret : 0;
 }
 
 /**
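
As a side note for readers unfamiliar with folio_walk: below is a minimal
sketch (not part of this patch) of the lookup pattern the rewrite relies on.
It assumes folio_walk_start() returns the mapped folio with the page table
lock held and folio_walk_end() drops that lock again; the helper name
example_peek_mapping is made up purely for illustration.

/*
 * Illustrative sketch only: look up what is currently mapped at vaddr
 * and report whether it is a PTE-mapped folio, mirroring the check in
 * __uprobe_write_opcode(). Caller is assumed to hold the mmap_lock.
 */
static int example_peek_mapping(struct vm_area_struct *vma, unsigned long vaddr)
{
        struct folio_walk fw;
        struct folio *folio;
        int ret = -EAGAIN;

        folio = folio_walk_start(&fw, vma, vaddr, 0);
        if (folio) {
                /* Only the PTE-mapped case is of interest here. */
                if (fw.level == FW_LEVEL_PTE)
                        ret = 0;
                /* Drops the page table lock taken by folio_walk_start(). */
                folio_walk_end(&fw, vma);
        }
        return ret;
}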