Message ID | 20240406173649.3210836-1-vdonnefort@google.com (mailing list archive) |
---|---|
Series | Introducing trace buffer mapping by user-space |
Hi Andrew, et al.

Linus said it's a hard requirement that this code gets an Acked-by (or
Reviewed-by) from the memory sub-maintainers before he will accept it.
He was upset that we faulted in pages one at a time instead of mapping it
in one go:

  https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w@mail.gmail.com/

Could you take a look at patches 1-3 to make sure they look sane from a
memory management point of view?

I really want this applied in the next merge window.

Thanks!

-- Steve


On Sat, 6 Apr 2024 18:36:44 +0100
Vincent Donnefort <vdonnefort@google.com> wrote:

> The tracing ring-buffers can be stored on disk or sent to network
> without any copy via splice. However the latter doesn't allow real-time
> processing of the traces. A solution is to give userspace direct access
> to the ring-buffer pages via a mapping. An application can now become a
> consumer of the ring-buffer, in a similar fashion to what trace_pipe
> offers.
>
> Support for this new feature can already be found in libtracefs from
> version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.
>
> Vincent
>
> v19 -> v20:
>   * Fix typos in documentation.
>   * Remove useless mmap open and fault callbacks.
>   * add mm.h include for vm_insert_pages
>
> v18 -> v19:
>   * Use VM_PFNMAP and vm_insert_pages
>   * Allocate ring-buffer subbufs with __GFP_COMP
>   * Pad the meta-page with the zero-page to align on the subbuf_order
>   * Extend the ring-buffer test with mmap() dedicated suite
>
> v17 -> v18:
>   * Fix lockdep_assert_held
>   * Fix spin_lock_init typo
>   * Fix CONFIG_TRACER_MAX_TRACE typo
>
> v16 -> v17:
>   * Documentation and comments improvements.
>   * Create get/put_snapshot_map() for clearer code.
>   * Replace kzalloc with kcalloc.
>   * Fix -ENOMEM handling in rb_alloc_meta_page().
>   * Move flush(cpu_buffer->reader_page) behind the reader lock.
>   * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
>     mutex. (removes READ_ONCE/WRITE_ONCE accesses).
>
> v15 -> v16:
>   * Add comment for the dcache flush.
>   * Remove now unnecessary WRITE_ONCE for the meta-page.
>
> v14 -> v15:
>   * Add meta-page and reader-page flush. Intends to fix the mapping
>     for VIVT and aliasing-VIPT data caches.
>   * -EPERM on VM_EXEC.
>   * Fix build warning !CONFIG_TRACER_MAX_TRACE.
>
> v13 -> v14:
>   * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
>   * on unmap, sync meta-page teardown with the reader_lock instead of
>     the synchronize_rcu.
>   * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
>     (intends to fix a lockdep issue)
>   * Add kerneldoc for flags and Reserved fields.
>   * Add kselftest for snapshot/map mutual exclusion.
>
> v12 -> v13:
>   * Swap subbufs_{touched,lost} for Reserved fields.
>   * Add a flag field in the meta-page.
>   * Fix CONFIG_TRACER_MAX_TRACE.
>   * Rebase on top of trace/urgent.
>   * Add a comment for try_unregister_trigger()
>
> v11 -> v12:
>   * Fix code sample mmap bug.
>   * Add logging in sample code.
>   * Reset tracer in selftest.
>   * Add a refcount for the snapshot users.
>   * Prevent mapping when there are snapshot users and vice versa.
>   * Refine the meta-page.
>   * Fix types in the meta-page.
>   * Collect Reviewed-by.
>
> v10 -> v11:
>   * Add Documentation and code sample.
>   * Add a selftest.
>   * Move all the update to the meta-page into a single
>     rb_update_meta_page().
>   * rb_update_meta_page() is now called from
>     ring_buffer_map_get_reader() to fix NOBLOCK callers.
>   * kerneldoc for struct trace_meta_page.
>   * Add a patch to zero all the ring-buffer allocations.
>
> v9 -> v10:
>   * Refactor rb_update_meta_page()
>   * In-loop declaration for foreach_subbuf_page()
>   * Check for cpu_buffer->mapped overflow
>
> v8 -> v9:
>   * Fix the unlock path in ring_buffer_map()
>   * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
>   * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)
>
> v7 -> v8:
>   * Drop the subbufs renaming into bpages
>   * Use subbuf as a name when relevant
>
> v6 -> v7:
>   * Rebase onto lore.kernel.org/lkml/20231215175502.106587604@goodmis.org/
>   * Support for subbufs
>   * Rename subbufs into bpages
>
> v5 -> v6:
>   * Rebase on next-20230802.
>   * (unsigned long) -> (void *) cast for virt_to_page().
>   * Add a wait for the GET_READER_PAGE ioctl.
>   * Move writer fields update (overrun/pages_lost/entries/pages_touched)
>     in the irq_work.
>   * Rearrange id in struct buffer_page.
>   * Rearrange the meta-page.
>   * ring_buffer_meta_page -> trace_buffer_meta_page.
>   * Add meta_struct_len into the meta-page.
>
> v4 -> v5:
>   * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)
>
> v3 -> v4:
>   * Add to the meta-page:
>     - pages_lost / pages_read (allow to compute how full is the
>       ring-buffer)
>     - read (allow to compute how many entries can be read)
>     - A reader_page struct.
>   * Rename ring_buffer_meta_header -> ring_buffer_meta
>   * Rename ring_buffer_get_reader_page -> ring_buffer_map_get_reader_page
>   * Properly consume events on ring_buffer_map_get_reader_page() with
>     rb_advance_reader().
>
> v2 -> v3:
>   * Remove data page list (for non-consuming read)
>     ** Implies removing order > 0 meta-page
>   * Add a new meta page field ->read
>   * Rename ring_buffer_meta_page_header into ring_buffer_meta_header
>
> v1 -> v2:
>   * Hide data_pages from the userspace struct
>   * Fix META_PAGE_MAX_PAGES
>   * Support for order > 0 meta-page
>   * Add missing page->mapping.
>
> Vincent Donnefort (5):
>   ring-buffer: allocate sub-buffers with __GFP_COMP
>   ring-buffer: Introducing ring-buffer mapping functions
>   tracing: Allow user-space mapping of the ring-buffer
>   Documentation: tracing: Add ring-buffer mapping
>   ring-buffer/selftest: Add ring-buffer mapping test
>
>  Documentation/trace/index.rst                 |   1 +
>  Documentation/trace/ring-buffer-map.rst       | 106 +++++
>  include/linux/ring_buffer.h                   |   6 +
>  include/uapi/linux/trace_mmap.h               |  48 +++
>  kernel/trace/ring_buffer.c                    | 403 +++++++++++++++++-
>  kernel/trace/trace.c                          | 113 ++++-
>  kernel/trace/trace.h                          |   1 +
>  tools/testing/selftests/ring-buffer/Makefile  |   8 +
>  tools/testing/selftests/ring-buffer/config    |   2 +
>  .../testing/selftests/ring-buffer/map_test.c  | 302 +++++++++++++
>  10 files changed, 979 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/trace/ring-buffer-map.rst
>  create mode 100644 include/uapi/linux/trace_mmap.h
>  create mode 100644 tools/testing/selftests/ring-buffer/Makefile
>  create mode 100644 tools/testing/selftests/ring-buffer/config
>  create mode 100644 tools/testing/selftests/ring-buffer/map_test.c
>
>
> base-commit: 7604256cecef34a82333d9f78262d3180f4eb525
(added linux-mm)

On Wed, Apr 10, 2024 at 01:56:12PM -0400, Steven Rostedt wrote:
>
> Hi Andrew, et al.
>
> Linus said it's a hard requirement that this code gets an Acked-by (or
> Reviewed-by) from the memory sub-maintainers before he will accept it.
> He was upset that we faulted in pages one at a time instead of mapping it
> in one go:
>
>   https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w@mail.gmail.com/
>
> Could you take a look at patches 1-3 to make sure they look sane from a
> memory management point of view?
>
> I really want this applied in the next merge window.
>
> Thanks!
>
> -- Steve