Message ID | 20250401225842.597899085@goodmis.org
---|---
State | Superseded
Series | tracing: Clean up persistent ring buffer code
On Tue, 1 Apr 2025 at 15:57, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The code to map the physical memory retrieved by memmap currently
> allocates an array of pages to cover the physical memory and then calls
> vmap() to map it to a virtual address. Instead of using this temporary
> array of struct page descriptors, simply use vmap_page_range() that can
> directly map the contiguous physical memory to a virtual address.

Wait, what?

Didn't you just say that you can just use page_address() for the kernel mapping?

So the whole vmap thing is entirely unnecessary in the first place.

Including the simpler vmap_page_range().

               Linus
On Wed, 2 Apr 2025 09:42:00 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Wait, what?
>
> Didn't you just say that you can just use page_address() for the kernel mapping?

Correct. I misunderstood Mike when he created reserve_mem and had it return physical addresses. I didn't realize he already had the memory mapped via the normal virtual mappings.

> So the whole vmap thing is entirely unnecessary in the first place.
>
> Including the simpler vmap_page_range().

Not entirely. The original code only used memmap=, which does require the vmap_page_range(). The reserve_mem option came later, and thinking it returned just physical address space (not a mapping), I had it use the memmap code.

That is, the persistent memory buffer was originally created on top of the memmap feature (and only for x86). I talked with Mike about making it more generic, and he and I worked out the reserve_mem option. Since the memmap code was already doing the vmap(), I had the reserve_mem version use the same, not knowing that it could just use phys_to_virt().

This patch series fixes that miscommunication and separates out a memmap'ed buffer from reserve_mem buffer and simplifies everything.

-- Steve
On Wed, 2 Apr 2025 12:55:48 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> This patch series fixes that miscommunication and separates out a memmap'ed
> buffer from reserve_mem buffer and simplifies everything.

Since I only care about memory mapping a buffer from reserve_mem to user space, this series makes sure that a buffer created from memmap (or any other physical memory area) is never used for mapping to user space. It also prevents that buffer from being freed, as it is not part of the buddy allocator and can't be handed back to it later.

That also means all the tricks to determine where a page in the ring buffer came from can go away.

-- Steve
On Wed, 2 Apr 2025 13:03:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> That also means all the tricks to determine where the page in the ring
> buffer came from can go away.

I wonder, just to be safe, if I should add this in the ring buffer mapping code:

	/* If the buffer is not part of the normal memory, do not allow mapping */
	if (!virt_addr_valid(cpu_buffer->reader_page->page))
		return -ENODEV;

? As the buffer can come from anywhere, but we only allow a memory mapping to user space if the buffer is from the normal memory allocator. Currently, this patch series has the caller (the tracing code) prevent memmap buffers from being user mapped.

Perhaps the above should include a WARN_ON_ONCE()?

-- Steve
On Wed, 2 Apr 2025 at 10:13, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>	/* If the buffer is not part of the normal memory, do not allow mapping */
>	if (!virt_addr_valid(cpu_buffer->reader_page->page))
>		return -ENODEV;

Honestly, virt_addr_valid() is pretty much random. It's not really a valid function outside of random debugging code. It has no real semantics. The comment saying that it validates some kind of 'virt_to_page()' thing is pure garbage.

> As the buffer can come from anywhere, but we only allow a memory mapping to
> user space if the buffer is from the normal memory allocator.

You should damn well keep track of where the memory comes from.

You can't just say "I'll take random shit, and then I'll ask the VM what it is".

Stop it.

If the address came from something you consider trustworthy, then just trust it. If some admin person gave you garbage, it's better to just get a random oops than to have random bogus code.

               Linus
On Wed, 2 Apr 2025 10:20:58 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> You should damn well keep track of where the memory comes from.

And it does.

> You can't just say "I'll take random shit, and then I'll ask the VM what it is".
>
> Stop it.
>
> If the address came from something you consider trustworthy, then just
> trust it. If some admin person gave you garbage, it's better to just
> get a random oops than to have random bogus code.

This has nothing to do with admins. This would only occur if the kernel itself created a buffer from some random physical address and then tried to mmap it to user space (which would be a bug).

My early career came from the military industry (I worked on the C17 engine control systems in the early '90s). It was drilled into my head to have safety checks throughout the code, so that if something buggy happened, it would be caught quickly. The virt_addr_valid() would only be a debugging feature, hence the added "WARN_ON()" for it.

But I'm fine to not do that.

-- Steve
On Wed, 2 Apr 2025 at 10:39, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> This has nothing to do with admins. This would only occur if the kernel
> itself created a buffer from some random physical address and then tried to
> mmap it to user space (which would be a bug).

Do *not* try to check for bugs like that with virt_addr_valid(). It literally makes debugging harder. You're much better off getting an oops, and then you have stack traces, distro bug trackers, and various other automated tools that give you information.

Trying to "validate" buggy data is crazy. It's absolutely the opposite of safety. It's going to cause more bugs, it's going to only work for the validation scenarios you thought about, and it's going to make it harder to debug the cases it actually catches.

And if you are trying to catch kernel bugs, *any* data could be that buggy data. So the whole concept is insane. Yes, you could make every single line be a "WARN_ON()" with some random check for the particular data you are using.

Or you could just write good solid code that is actually readable and maintainable, and doesn't have random pointless checks in it.

               Linus
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2f9c91f26d5b..e30d1e058fea 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -50,6 +50,7 @@
 #include <linux/irq_work.h>
 #include <linux/workqueue.h>
 #include <linux/sort.h>
+#include <linux/io.h> /* vmap_page_range() */

 #include <asm/setup.h> /* COMMAND_LINE_SIZE */

@@ -9810,29 +9811,27 @@ static int instance_mkdir(const char *name)
 	return ret;
 }

-static u64 map_pages(u64 start, u64 size)
+static u64 map_pages(unsigned long start, unsigned long size)
 {
-	struct page **pages;
-	phys_addr_t page_start;
-	unsigned int page_count;
-	unsigned int i;
-	void *vaddr;
-
-	page_count = DIV_ROUND_UP(size, PAGE_SIZE);
+	unsigned long vmap_start, vmap_end;
+	struct vm_struct *area;
+	int ret;

-	page_start = start;
-	pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
+	area = get_vm_area(size, VM_IOREMAP);
+	if (!area)
 		return 0;

-	for (i = 0; i < page_count; i++) {
-		phys_addr_t addr = page_start + i * PAGE_SIZE;
-		pages[i] = pfn_to_page(addr >> PAGE_SHIFT);
+	vmap_start = (unsigned long) area->addr;
+	vmap_end = vmap_start + size;
+
+	ret = vmap_page_range(vmap_start, vmap_end,
+			      start, pgprot_nx(PAGE_KERNEL));
+	if (ret < 0) {
+		free_vm_area(area);
+		return 0;
 	}

-	vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL);
-	kfree(pages);
-	return (u64)(unsigned long)vaddr;
+	return (u64)vmap_start;
 }

 /**