From patchwork Tue Apr 1 21:51:16 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 14035353
Message-ID: <20250401215333.257648667@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 01 Apr 2025 17:51:16 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Vlastimil Babka, Mike Rapoport, Jann Horn
Subject: [PATCH v4 1/4] tracing: Enforce
 the persistent ring buffer to be page aligned
References: <20250401215115.602501043@goodmis.org>

From: Steven Rostedt

Enforce that the address and the size of the memory used by the
persistent ring buffer are page aligned. Also update the documentation
to reflect this requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
Suggested-by: Linus Torvalds
Signed-off-by: Steven Rostedt (Google)
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 Documentation/trace/debugging.rst               |  2 ++
 kernel/trace/trace.c                            | 12 ++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3435a062a208..f904fd8481bd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7266,6 +7266,8 @@
 			This is just one of many ways that can clear memory. Make sure your system
 			keeps the content of memory across reboots before relying on this option.
 
+			NB: Both the mapped address and size must be page aligned for the architecture.
+
 			See also Documentation/trace/debugging.rst
 
 
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
index 54fb16239d70..d54bc500af80 100644
--- a/Documentation/trace/debugging.rst
+++ b/Documentation/trace/debugging.rst
@@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
 preserved. Switching to a different kernel version may find a different
 layout and mark the buffer as invalid.
 
+NB: Both the mapped address and size must be page aligned for the architecture.
+
 Using trace_printk() in the boot instance
 -----------------------------------------
 By default, the content of trace_printk() goes into the top level tracing
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index de6d7f0e6206..de9c237e5826 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
 		}
 
 		if (start) {
+			/* Start and size must be page aligned */
+			if (start & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
+					(unsigned long)start);
+				continue;
+			}
+			if (size & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping size %lx is not page aligned\n",
+					(unsigned long)size);
+				continue;
+			}
+
 			addr = map_pages(start, size);
 			if (addr) {
 				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",

From patchwork Tue Apr 1 21:51:17 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 14035354
Message-ID: <20250401215333.427506494@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 01 Apr 2025 17:51:17 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Vlastimil Babka, Mike Rapoport, Jann Horn
Subject: [PATCH v4 2/4] tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
References: <20250401215115.602501043@goodmis.org>

From: Steven Rostedt

The reserve_mem kernel command line option may pass back a physical
address, but the memory is still part of normal memory, just as it would
be had it been reserved with memblock_reserve(). This means that the
physical memory returned by the reserve_mem command line option can be
converted directly to virtual memory by simply using phys_to_virt().

When freeing the buffer allocated by reserve_mem, use free_reserved_area().

Because the persistent ring buffer can also be allocated via the memmap
option, which *is* different from normal memory as it cannot be added back
to the buddy system, it must be treated differently. It still needs to be
virtually mapped to have access to it. It also cannot be freed, nor can it
ever be memory mapped to user space.

Create a new trace_array flag called TRACE_ARRAY_FL_MEMMAP which gets set
if the buffer is created by the memmap option, and this will prevent the
buffer from being memory mapped by user space.

Also increment the ref count for memmap'ed buffers so that they can never
be freed.

Link: https://lore.kernel.org/all/Z-wFszhJ_9o4dc8O@kernel.org/
Suggested-by: Mike Rapoport
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c | 28 ++++++++++++++++++++++------
 kernel/trace/trace.h |  1 +
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index de9c237e5826..d960f80701bd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8505,6 +8505,10 @@ static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
 	struct trace_iterator *iter = &info->iter;
 	int ret = 0;
 
+	/* A memmap'ed buffer is not supported for user space mmap */
+	if (iter->tr->flags & TRACE_ARRAY_FL_MEMMAP)
+		return -ENODEV;
+
 	/* Currently the boot mapped buffer is not supported for mmap */
 	if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
 		return -ENODEV;
@@ -9615,8 +9619,12 @@ static void free_trace_buffers(struct trace_array *tr)
 	free_trace_buffer(&tr->max_buffer);
 #endif
 
-	if (tr->range_addr_start)
-		vunmap((void *)tr->range_addr_start);
+	if (tr->range_addr_start) {
+		void *start = (void *)tr->range_addr_start;
+		void *end = start + tr->range_addr_size;
+
+		free_reserved_area(start, end, 0, tr->range_name);
+	}
 }
 
 static void init_trace_flags_index(struct trace_array *tr)
@@ -10710,6 +10718,7 @@ static inline void do_allocate_snapshot(const char *name) { }
 __init static void enable_instances(void)
 {
 	struct trace_array *tr;
+	bool memmap_area = false;
 	char *curr_str;
 	char *name;
 	char *str;
@@ -10778,6 +10787,7 @@ __init static void enable_instances(void)
 					name);
 				continue;
 			}
+			memmap_area = true;
 		} else if (tok) {
 			if (!reserve_mem_find_by_name(tok, &start, &size)) {
 				start = 0;
@@ -10800,7 +10810,10 @@ __init static void enable_instances(void)
 				continue;
 			}
 
-			addr = map_pages(start, size);
+			if (memmap_area)
+				addr = map_pages(start, size);
+			else
+				addr = (unsigned long)phys_to_virt(start);
 			if (addr) {
 				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
 					name, &start, (unsigned long)size);
@@ -10827,10 +10840,13 @@ __init static void enable_instances(void)
 			update_printk_trace(tr);
 
 		/*
-		 * If start is set, then this is a mapped buffer, and
-		 * cannot be deleted by user space, so keep the reference
-		 * to it.
+		 * memmap'd buffers can not be freed.
 		 */
+		if (memmap_area) {
+			tr->flags |= TRACE_ARRAY_FL_MEMMAP;
+			tr->ref++;
+		}
+
 		if (start) {
 			tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
 			tr->range_name = no_free_ptr(rname);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c20f6bcc200a..f9513dc14c37 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -447,6 +447,7 @@ enum {
 	TRACE_ARRAY_FL_BOOT		= BIT(1),
 	TRACE_ARRAY_FL_LAST_BOOT	= BIT(2),
 	TRACE_ARRAY_FL_MOD_INIT		= BIT(3),
+	TRACE_ARRAY_FL_MEMMAP		= BIT(4),
 };
 
 #ifdef CONFIG_MODULES

From patchwork Tue Apr 1 21:51:18 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 14035356
Message-ID: <20250401215333.601438272@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 01 Apr 2025 17:51:18 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Vlastimil Babka, Mike Rapoport, Jann Horn
Subject: [PATCH v4 3/4] tracing: Use vmap_page_range() to map memmap ring buffer
References: <20250401215115.602501043@goodmis.org>

From: Steven Rostedt

The code to map the physical memory retrieved by memmap currently
allocates an array of pages to cover the physical memory and then calls
vmap() to map it to a virtual address. Instead of using this temporary
array of struct page descriptors, simply use vmap_page_range(), which can
directly map the contiguous physical memory to a virtual address.
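
Because the memmap region is physically contiguous, the page array that
the old map_pages() built is only a run of consecutive page frame
numbers. The user-space sketch below illustrates that point under stated
assumptions: 4 KiB pages, plain PFNs standing in for the struct page
pointers the kernel obtained with pfn_to_page(), and hypothetical helper
names (`pfn_array()`, `pfn_range()`, `same_frames()`). It checks that a
(first PFN, count) pair describes the same frames as the array, and that
the equivalence breaks for a size that is not page aligned, the case that
patch 1/4 now rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages; the real PAGE_SHIFT is architecture specific. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Old scheme: one entry per page, like the kmalloc_array() of struct page
 * pointers. Returns a malloc()ed array that the caller must free(). */
static uint64_t *pfn_array(uint64_t start, uint64_t size, size_t *count)
{
	size_t n = (size_t)((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
	uint64_t *pfns = malloc(n * sizeof(*pfns));

	if (!pfns)
		return NULL;
	for (size_t i = 0; i < n; i++)
		pfns[i] = (start + i * SKETCH_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
	*count = n;
	return pfns;
}

/* New scheme: a contiguous range needs only its first PFN and length. */
static void pfn_range(uint64_t start, uint64_t size,
		      uint64_t *first, size_t *count)
{
	*first = start >> SKETCH_PAGE_SHIFT;
	*count = (size_t)(size >> SKETCH_PAGE_SHIFT);
}

/* Returns 1 if both descriptions cover exactly the same page frames. */
static int same_frames(uint64_t start, uint64_t size)
{
	size_t n_arr, n_rng;
	uint64_t first;
	uint64_t *pfns = pfn_array(start, size, &n_arr);
	int same;

	assert(pfns != NULL);
	pfn_range(start, size, &first, &n_rng);
	same = (n_arr == n_rng);
	for (size_t i = 0; same && i < n_arr; i++)
		same = (pfns[i] == first + i);
	free(pfns);
	return same;
}
```

For example, a 64 KiB region at 4 GiB gives the same sixteen frames
either way, while a size of one page plus one byte makes the array round
up to two pages and the range round down to one.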

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
Suggested-by: Linus Torvalds
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d960f80701bd..24852bd4fc01 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include  /* vmap_page_range() */
 
 #include  /* COMMAND_LINE_SIZE */
 
@@ -9817,29 +9818,27 @@ static int instance_mkdir(const char *name)
 	return ret;
 }
 
-static u64 map_pages(u64 start, u64 size)
+static u64 map_pages(unsigned long start, unsigned long size)
 {
-	struct page **pages;
-	phys_addr_t page_start;
-	unsigned int page_count;
-	unsigned int i;
-	void *vaddr;
-
-	page_count = DIV_ROUND_UP(size, PAGE_SIZE);
+	unsigned long vmap_start, vmap_end;
+	struct vm_struct *area;
+	int ret;
 
-	page_start = start;
-	pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
+	area = get_vm_area(size, VM_IOREMAP);
+	if (!area)
 		return 0;
 
-	for (i = 0; i < page_count; i++) {
-		phys_addr_t addr = page_start + i * PAGE_SIZE;
-		pages[i] = pfn_to_page(addr >> PAGE_SHIFT);
+	vmap_start = (unsigned long) area->addr;
+	vmap_end = vmap_start + size;
+
+	ret = vmap_page_range(vmap_start, vmap_end,
+			      start, pgprot_nx(PAGE_KERNEL));
+	if (ret < 0) {
+		free_vm_area(area);
+		return 0;
 	}
-	vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL);
-	kfree(pages);
-	return (u64)(unsigned long)vaddr;
+
+	return (u64)vmap_start;
 }
 
 /**

From patchwork Tue Apr 1 21:51:19 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 14035355
Message-ID: <20250401215333.766290625@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 01 Apr 2025 17:51:19 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Vlastimil Babka, Mike Rapoport, Jann Horn, stable@vger.kernel.org
Subject: [PATCH v4 4/4] ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
References: <20250401215115.602501043@goodmis.org>

From: Steven Rostedt

Some architectures do not have data cache coherency between user and
kernel space.
For these architectures, the cache needs to be flushed on both the
kernel and user addresses so that user space can see the updates the
kernel has made.

Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range(), which
takes the virtual address and does the work for those architectures that
need it.

This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.

Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3aQ6SA@mail.gmail.com/
Cc: stable@vger.kernel.org
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Suggested-by: Jann Horn
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ring_buffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d8d7b28e2c2f..c0f877d39a24 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6016,7 +6016,7 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
 	meta->read = cpu_buffer->read;
 
 	/* Some archs do not have data cache coherency between kernel and user-space */
-	flush_dcache_folio(virt_to_folio(cpu_buffer->meta_page));
+	flush_kernel_vmap_range(cpu_buffer->meta_page, PAGE_SIZE);
 }
 
 static void
@@ -7319,7 +7319,8 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
 
 out:
 	/* Some archs do not have data cache coherency between kernel and user-space */
-	flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
+	flush_kernel_vmap_range(cpu_buffer->reader_page->page,
+				buffer->subbuf_size + BUF_PAGE_HDR_SIZE);
 
 	rb_update_meta_page(cpu_buffer);
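
The size of the reader-page bug can be worked out with a little
arithmetic. The sketch below assumes 4 KiB pages and assumes that a
sub-buffer of order N spans PAGE_SIZE << N bytes, its data area
(subbuf_size) being that minus the BUF_PAGE_HDR_SIZE header. The helper
names are hypothetical, and the header size is passed in rather than
hard-coded (16 in the checks is only an example value). Under those
assumptions, the new length handed to flush_kernel_vmap_range() covers
the whole sub-buffer, while the old one-page flush missed everything past
the first page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; PAGE_SIZE is architecture specific. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Assumption: an order-N sub-buffer spans PAGE_SIZE << N bytes, of which
 * hdr_size (BUF_PAGE_HDR_SIZE) is header and the rest is subbuf_size. */
static size_t subbuf_size(unsigned int order, size_t hdr_size)
{
	assert(hdr_size < SKETCH_PAGE_SIZE);
	return (SKETCH_PAGE_SIZE << order) - hdr_size;
}

/* Before the fix: the reader-page flush covered a single page. */
static size_t old_flush_len(void)
{
	return SKETCH_PAGE_SIZE;
}

/* After the fix: subbuf_size + BUF_PAGE_HDR_SIZE, the whole sub-buffer. */
static size_t new_flush_len(unsigned int order, size_t hdr_size)
{
	return subbuf_size(order, hdr_size) + hdr_size;
}

/* Bytes of the reader sub-buffer that the old flush never reached. */
static size_t unflushed_bytes(unsigned int order, size_t hdr_size)
{
	size_t full = new_flush_len(order, hdr_size);

	return full > old_flush_len() ? full - old_flush_len() : 0;
}
```

For an order-2 sub-buffer, unflushed_bytes(2, hdr) is 12288 for any
header smaller than a page: three of the four pages were never flushed.
For order 0 the old and new lengths agree, which is why the bug only
shows up once the sub-buffer order is raised.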