From patchwork Tue Dec 17 17:32:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 13912300
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A7121F9F77;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734456884; cv=none;
	b=CXxgeJ3zVEW9Fvn0CkZqC3d3caoRYDqV3fbfkgT3jtre9kGlxoZYxphS0duIFzEGZ0BpTvNFuNXDa9S/b9ePiviBLB3Ky1xfKxVE7sW8PIefWehga95sQUbHvUUD5Iyqg4zNuM4OfPzTAbEQGeCZRKNfowvrCoLcFyPfQTeeyTg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734456884; c=relaxed/simple;
	bh=uyKcL51FMKCU+8NLO6bsbfavxaLmObUo8GS/mhDFluw=;
	h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version:
	 Content-Type;
	b=ZmWrNrO/WuDB2PBkFSd8RzlxJ7jQwn3IuFKTVpEvJ56ynnuDsoS1m+ZbLcyHXSwVrscPXSgVkdDesXCZuWI9KbJ16Lr6f3f9obUHcGdTdkDAXrHTRE3puiaWTlKxsmRaPh4v8X64IzPtJU2Uhc57ldtxNlRQv+e+ZoKYnMYSRhw=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 262CDC4CED7;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Received: from rostedt by gandalf with local (Exim 4.98)
	(envelope-from )
	id 1tNbTc-00000008cuG-1wdH;
	Tue, 17 Dec 2024 12:35:20 -0500
Message-ID: <20241217173520.314190793@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 17 Dec 2024 12:32:38 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, stable@vger.kernel.org
Subject: [PATCH 1/3] ring-buffer: Add uname to match criteria for persistent ring buffer
References: <20241217173237.836878448@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Steven Rostedt

The persistent ring buffer can live across boots. It is expected that the
content in the buffer can be translated to the current kernel with delta
offsets even with KASLR enabled. But it can only guarantee this if the
content of the ring buffer came from the same kernel as the one that is
currently running.

Add uname into the meta data and if the uname in the meta data from the
previous boot does not match the uname of the current boot, then clear the
buffer and re-initialize it.

This only handles the case of kernel versions. It does not clear the
buffer for development. There are several mechanisms to keep bad data from
crashing the kernel. The worst that can happen is that some corrupt data
may be displayed.

Cc: stable@vger.kernel.org
Fixes: 8f3e6659656e6 ("ring-buffer: Save text and data locations in mapped meta data")
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ring_buffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7e257e855dd1..3c94c59d000c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include /* for self test */
+#include
 #include
 #include
 #include
@@ -45,10 +46,13 @@
 static void update_pages_handler(struct work_struct *work);
 
 #define RING_BUFFER_META_MAGIC	0xBADFEED
+#define UNAME_SZ		64
 
 struct ring_buffer_meta {
 	int		magic;
 	int		struct_size;
+	char		uname[UNAME_SZ];
+
 	unsigned long	text_addr;
 	unsigned long	data_addr;
 	unsigned long	first_buffer;
@@ -1687,6 +1691,11 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
 		return false;
 	}
 
+	if (strncmp(init_utsname()->release, meta->uname, UNAME_SZ - 1)) {
+		pr_info("Ring buffer boot meta[%d] mismatch of uname\n", cpu);
+		return false;
+	}
+
 	/* The subbuffer's size and number of subbuffers must match */
 	if (meta->subbuf_size != subbuf_size ||
 	    meta->nr_subbufs != nr_pages + 1) {
@@ -1920,6 +1929,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 
 		meta->magic = RING_BUFFER_META_MAGIC;
 		meta->struct_size = sizeof(*meta);
+		strscpy(meta->uname, init_utsname()->release, UNAME_SZ);
 
 		meta->nr_subbufs = nr_pages + 1;
 		meta->subbuf_size = PAGE_SIZE;

From patchwork Tue Dec 17 17:32:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 13912301
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DF361FA149;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734456884; cv=none;
	b=Xwp3/gLA1jqXEQtjXTNPbgI7VexkouGcdJhkXGsz+xmi8w4xGSjba8ncduRjZLfkICFAiE6GB+YuHFOO8nYnXw7ndVW/OA2lJrl6wivIizlYZlP45W1YtLuz/ZqTdQsyEOzbcN5zDS6TQwqaIhpW2AKA3YcvJjEfyBBduevHyDs=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734456884; c=relaxed/simple;
	bh=45aJ2iSzw6x0/ikwu8QXwZFc0u2PeAyud5FPEpmpQvI=;
	h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version:
	 Content-Type;
	b=gfQEdI290+hAivcjMDCfF5JAOPfiK5+EIMi4z9aIy1EW6ACuy/zSg7giudHK61RTsV8Pks+eeSdGC8F/AIvOPgeph3TOG1wACMjmXAOhBa/GZIt9cf+v7r8Qy3rATTHX3wvVw/bLQfeEmFDZ2yIDv8t98T1DpeFvMKgCQRNmztU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 58730C4CEDE;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Received: from rostedt by gandalf with local (Exim 4.98)
	(envelope-from )
	id 1tNbTc-00000008cul-2exr;
	Tue,
	17 Dec 2024 12:35:20 -0500
Message-ID: <20241217173520.483964366@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 17 Dec 2024 12:32:39 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, stable@vger.kernel.org
Subject: [PATCH 2/3] trace/ring-buffer: Do not create module or dynamic events in boot mapped buffers
References: <20241217173237.836878448@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Steven Rostedt

When a ring buffer is mapped across boots, a delta is saved between the
addresses of the previous kernel and the current kernel. But this handles
neither module events nor dynamic events.

Simply do not create module or dynamic events in a boot mapped instance.
This will keep them from ever being enabled and therefore from being part
of the previous kernel trace.
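As a rough stand-alone illustration of the rule described above (this is
not the kernel code), the check can be modeled in plain C. The struct
layouts, the SKETCH_* flag values and the sketch_event_allowed() helper are
assumptions invented for this example; only the shape of the test follows
the description: a boot mapped instance refuses any event that is dynamic
or that belongs to a module.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's real definitions differ. */
#define SKETCH_ARRAY_FL_BOOT	(1u << 0)
#define SKETCH_EVENT_FL_DYNAMIC	(1u << 0)

/* Simplified stand-in for a trace instance. */
struct sketch_trace_array {
	unsigned int flags;
};

/* Simplified stand-in for an event description. */
struct sketch_event_call {
	unsigned int flags;
	const void *module;	/* non-NULL when a module owns the event */
};

/* Return true if an event file may be created for this instance. */
static bool sketch_event_allowed(const struct sketch_trace_array *tr,
				 const struct sketch_event_call *call)
{
	/* Boot mapped instances cannot use modules or dynamic events */
	if (tr->flags & SKETCH_ARRAY_FL_BOOT) {
		if ((call->flags & SKETCH_EVENT_FL_DYNAMIC) || call->module)
			return false;
	}
	return true;
}
```

With this model, a built-in static event is still accepted by a boot mapped
instance, while the same instance rejects a dynamic event or one whose
module pointer is set. An instance without the boot flag accepts all three.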
Cc: stable@vger.kernel.org
Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace_events.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 77e68efbd43e..d6359318d5c1 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2984,6 +2984,12 @@ trace_create_new_event(struct trace_event_call *call,
 	if (!event_in_systems(call, tr->system_names))
 		return NULL;
 
+	/* Boot mapped instances cannot use modules or dynamic events */
+	if (tr->flags & TRACE_ARRAY_FL_BOOT) {
+		if ((call->flags & TRACE_EVENT_FL_DYNAMIC) || call->module)
+			return NULL;
+	}
+
 	file = kmem_cache_alloc(file_cachep, GFP_TRACE);
 	if (!file)
 		return ERR_PTR(-ENOMEM);

From patchwork Tue Dec 17 17:32:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 13912302
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B4E01FA150;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734456884; cv=none;
	b=tLPsqir99mTFpBOpjmrbavk7tbCIuvFDacBOqz6eO7Cq/unSUusGFHNbRYwHSFZ6Lu22y6XBd46HykkWv2DbdWWv21B8gRsGV+SpXrXSdEuPoPnCfLWttSG7S9byUg/hb4Cfq9SA2qofxNiJ+ibcA5l0xRdnZuXJJG+402hvjOQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734456884; c=relaxed/simple;
	bh=0CgAxagI/kBVjLWRC5Hl2/BndNX3/Vqror0Vy5bRnKw=;
	h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version:
	 Content-Type;
	b=OrlbNo1Os6adE1Tz+npf+JyrNvVuDMSDkOhLudJWExr2uZ704nY0jJNMjJqAshWY8zz6AO36UpvBve+R+wJlX7+9uzt5QqnRFKB7TOLwIENYDvZESeJQ2ryxbbf1OeMxkWMGYkvYvaAnXnz4JRDKbQ8KVhuTskB0NTWMVzciEJ0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6D585C4CEE0;
	Tue, 17 Dec 2024 17:34:44 +0000 (UTC)
Received: from rostedt by gandalf with local (Exim 4.98)
	(envelope-from )
	id 1tNbTc-00000008cvF-3MoP;
	Tue, 17 Dec 2024 12:35:20 -0500
Message-ID: <20241217173520.658174695@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 17 Dec 2024 12:32:40 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, stable@vger.kernel.org
Subject: [PATCH 3/3] trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
References: <20241217173237.836878448@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Steven Rostedt

The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the string addresses are different.

One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
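To make the idea of "iterate the fields and print them by type" concrete,
here is a small user-space sketch. It is not the kernel's print_fields();
struct sketch_field, struct sketch_record and sketch_print_fields() are
hypothetical names, only 4- and 8-byte integers are handled, and the
"name=0xhex (decimal)" layout is simply modeled on the fields output quoted
in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field descriptor: where a value lives in the raw record. */
struct sketch_field {
	const char *name;
	size_t offset;
	size_t size;		/* 4 or 8 bytes in this sketch */
};

/* Hypothetical raw event record used for illustration. */
struct sketch_record {
	uint64_t bytes_req;
	int32_t node;
};

/*
 * Walk the descriptors and format each value as "name=0xhex (decimal)",
 * ignoring any event specific format string. Returns the number of
 * characters written, or -1 on truncation or an unsupported size.
 */
static int sketch_print_fields(char *out, size_t outsz, const void *rec,
			       const struct sketch_field *fields, size_t n)
{
	size_t used = 0;

	if (outsz)
		out[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		const unsigned char *pos =
			(const unsigned char *)rec + fields[i].offset;
		const char *sep = i ? " " : "";
		int ret;

		if (fields[i].size == 4) {
			uint32_t v;

			memcpy(&v, pos, sizeof(v));
			ret = snprintf(out + used, outsz - used,
				       "%s%s=0x%x (%d)", sep, fields[i].name,
				       (unsigned int)v, (int)(int32_t)v);
		} else if (fields[i].size == 8) {
			uint64_t v;

			memcpy(&v, pos, sizeof(v));
			ret = snprintf(out + used, outsz - used,
				       "%s%s=0x%llx (%lld)", sep, fields[i].name,
				       (unsigned long long)v,
				       (long long)(int64_t)v);
		} else {
			return -1;
		}

		if (ret < 0 || (size_t)ret >= outsz - used)
			return -1;
		used += (size_t)ret;
	}
	return (int)used;
}
```

Because the loop only consults offsets and sizes, it never follows a format
string that might reference addresses from another boot. That is the
property that makes a field-by-field printer attractive for a buffer
carried over from a previous kernel.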
To understand the difference, when printing via TP_printk() the output
looks like this:

  4582.696626: kmem_cache_alloc: call_site=getname_flags+0x47/0x1f0 ptr=00000000e70e10e0 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL node=-1 accounted=false
  4582.696629: kmem_cache_alloc: call_site=alloc_empty_file+0x6b/0x110 ptr=0000000095808002 bytes_req=360 bytes_alloc=384 gfp_flags=GFP_KERNEL node=-1 accounted=false
  4582.696630: kmem_cache_alloc: call_site=security_file_alloc+0x24/0x100 ptr=00000000576339c3 bytes_req=16 bytes_alloc=16 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
  4582.696653: kmem_cache_free: call_site=do_sys_openat2+0xa7/0xd0 ptr=00000000e70e10e0 name=names_cache

But when printing via print_fields() (echo 1 > /sys/kernel/tracing/options/fields)
the same event output looks like this:

  4582.696626: kmem_cache_alloc: call_site=0xffffffff92d10d97 (-1831793257) ptr=0xffff9e0e8571e000 (-107689771147264) bytes_req=0x1000 (4096) bytes_alloc=0x1000 (4096) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
  4582.696629: kmem_cache_alloc: call_site=0xffffffff92d0250b (-1831852789) ptr=0xffff9e0e8577f800 (-107689770747904) bytes_req=0x168 (360) bytes_alloc=0x180 (384) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
  4582.696630: kmem_cache_alloc: call_site=0xffffffff92efca74 (-1829778828) ptr=0xffff9e0e8d35d3b0 (-107689640864848) bytes_req=0x10 (16) bytes_alloc=0x10 (16) gfp_flags=0xdc0 (3520) node=0xffffffff (-1) accounted=(0)
  4582.696653: kmem_cache_free: call_site=0xffffffff92cfbea7 (-1831879001) ptr=0xffff9e0e8571e000 (-107689771147264) name=names_cache

The print_fields() needed one update to handle this, and that's to add the
delta to the pointer strings. It also needs to handle %pS, but that is out
of scope of this fix. Currently, it only prints the raw address.

Ftrace events like stack trace and function tracing have their own methods
to print and already can handle the deltas. Those event types are less than
__TRACE_LAST_TYPE.
If the event type is greater than that, then the print_fields() output is
forced.

Cc: stable@vger.kernel.org
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c        | 9 +++++++++
 kernel/trace/trace_output.c | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index be62f0ea1814..6581cb2bc67f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4353,6 +4353,15 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
 	if (event) {
 		if (tr->trace_flags & TRACE_ITER_FIELDS)
 			return print_event_fields(iter, event);
+		/*
+		 * For TRACE_EVENT() events, the print_fmt is not
+		 * safe to use if the array has delta offsets
+		 * Force printing via the fields.
+		 */
+		if ((tr->text_delta || tr->data_delta) &&
+		    event->type > __TRACE_LAST_TYPE)
+			return print_event_fields(iter, event);
+
 		return event->funcs->trace(iter, sym_flags, event);
 	}
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index da748b7cbc4d..0a5d12dd860f 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -853,6 +853,7 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
 			 struct list_head *head)
 {
 	struct ftrace_event_field *field;
+	long delta = iter->tr->text_delta;
 	int offset;
 	int len;
 	int ret;
@@ -889,7 +890,7 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
 		case FILTER_PTR_STRING:
 			if (!iter->fmt_size)
 				trace_iter_expand_format(iter);
-			pos = *(void **)pos;
+			pos = (*(void **)pos) + delta;
 			ret = strncpy_from_kernel_nofault(iter->fmt, pos,
 							  iter->fmt_size);
 			if (ret < 0)