From patchwork Mon Dec 13 09:48:21 2021
X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)"
X-Patchwork-Id: 12673539
From: "Tzvetomir Stoyanov (VMware)"
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v4 1/5] [RFC] tracing: Refactor ring buffer implementation
Date: Mon, 13 Dec 2021 11:48:21 +0200
Message-Id: <20211213094825.61876-2-tz.stoyanov@gmail.com>
In-Reply-To: <20211213094825.61876-1-tz.stoyanov@gmail.com>
References: <20211213094825.61876-1-tz.stoyanov@gmail.com>

In order to introduce a sub-buffer size per ring buffer, some internal
refactoring is needed.
As ring_buffer_print_page_header() will depend on the trace_buffer
structure, it is moved after that structure's definition.

Signed-off-by: Tzvetomir Stoyanov (VMware)
---
 kernel/trace/ring_buffer.c | 59 +++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2699e9e562b1..cc34dbfdd29b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -371,35 +371,6 @@ static inline int test_time_stamp(u64 delta)
 /* Max payload is BUF_PAGE_SIZE - header (8bytes) */
 #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
 
-int ring_buffer_print_page_header(struct trace_seq *s)
-{
-	struct buffer_data_page field;
-
-	trace_seq_printf(s, "\tfield: u64 timestamp;\t"
-			 "offset:0;\tsize:%u;\tsigned:%u;\n",
-			 (unsigned int)sizeof(field.time_stamp),
-			 (unsigned int)is_signed_type(u64));
-
-	trace_seq_printf(s, "\tfield: local_t commit;\t"
-			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			 (unsigned int)offsetof(typeof(field), commit),
-			 (unsigned int)sizeof(field.commit),
-			 (unsigned int)is_signed_type(long));
-
-	trace_seq_printf(s, "\tfield: int overwrite;\t"
-			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			 (unsigned int)offsetof(typeof(field), commit),
-			 1,
-			 (unsigned int)is_signed_type(long));
-
-	trace_seq_printf(s, "\tfield: char data;\t"
-			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			 (unsigned int)offsetof(typeof(field), data),
-			 (unsigned int)BUF_PAGE_SIZE,
-			 (unsigned int)is_signed_type(char));
-
-	return !trace_seq_has_overflowed(s);
-}
 
 struct rb_irq_work {
 	struct irq_work			work;
@@ -559,6 +530,36 @@ struct ring_buffer_iter {
 	int				missed_events;
 };
 
+int ring_buffer_print_page_header(struct trace_seq *s)
+{
+	struct buffer_data_page field;
+
+	trace_seq_printf(s, "\tfield: u64 timestamp;\t"
+			 "offset:0;\tsize:%u;\tsigned:%u;\n",
+			 (unsigned int)sizeof(field.time_stamp),
+			 (unsigned int)is_signed_type(u64));
+
+	trace_seq_printf(s, "\tfield: local_t commit;\t"
+			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			 (unsigned int)offsetof(typeof(field), commit),
+			 (unsigned int)sizeof(field.commit),
+			 (unsigned int)is_signed_type(long));
+
+	trace_seq_printf(s, "\tfield: int overwrite;\t"
+			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			 (unsigned int)offsetof(typeof(field), commit),
+			 1,
+			 (unsigned int)is_signed_type(long));
+
+	trace_seq_printf(s, "\tfield: char data;\t"
+			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			 (unsigned int)offsetof(typeof(field), data),
+			 (unsigned int)BUF_PAGE_SIZE,
+			 (unsigned int)is_signed_type(char));
+
+	return !trace_seq_has_overflowed(s);
+}
+
 #ifdef RB_TIME_32
 
 /*
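For reference, the format this function emits is what user space reads from
the tracefs "header_page" file. On a typical x86_64 system with 4 KiB pages
it looks like the following; the exact offsets, sizes and signedness are
architecture dependent, so treat the numbers as illustrative only:

	field: u64 timestamp;	offset:0;	size:8;	signed:0;
	field: local_t commit;	offset:8;	size:8;	signed:1;
	field: int overwrite;	offset:8;	size:1;	signed:1;
	field: char data;	offset:16;	size:4080;	signed:1;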
From patchwork Mon Dec 13 09:48:22 2021
X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)"
X-Patchwork-Id: 12673545
From: "Tzvetomir Stoyanov (VMware)"
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v4 2/5] [RFC] tracing: Page size per ring buffer
Date: Mon, 13 Dec 2021 11:48:22 +0200
Message-Id: <20211213094825.61876-3-tz.stoyanov@gmail.com>
In-Reply-To: <20211213094825.61876-1-tz.stoyanov@gmail.com>
References: <20211213094825.61876-1-tz.stoyanov@gmail.com>

Currently the size of one sub-buffer page is global for all buffers and
is hard coded to one system page. In order to introduce a configurable
ring buffer sub-page size, the internal logic is refactored to work with
a sub-page size per ring buffer.
Signed-off-by: Tzvetomir Stoyanov (VMware)
---
 include/linux/ring_buffer.h |  2 +-
 kernel/trace/ring_buffer.c  | 66 ++++++++++++++++++++-----------------
 kernel/trace/trace.c        |  2 +-
 kernel/trace/trace.h        |  1 +
 kernel/trace/trace_events.c | 50 ++++++++++++++++++++++------
 5 files changed, 79 insertions(+), 42 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dac53fd3afea..d9a2e6e8fb79 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -200,7 +200,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer, void **data_page,
 struct trace_seq;
 
 int ring_buffer_print_entry_header(struct trace_seq *s);
-int ring_buffer_print_page_header(struct trace_seq *s);
+int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq *s);
 
 enum ring_buffer_flags {
 	RB_FL_OVERWRITE = 1 << 0,
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc34dbfdd29b..68fdeff449c3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -366,12 +366,6 @@ static inline int test_time_stamp(u64 delta)
 	return 0;
 }
 
-#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
-
-/* Max payload is BUF_PAGE_SIZE - header (8bytes) */
-#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
-
-
 struct rb_irq_work {
 	struct irq_work			work;
 	wait_queue_head_t		waiters;
@@ -515,6 +509,9 @@ struct trace_buffer {
 	struct rb_irq_work		irq_work;
 	bool				time_stamp_abs;
+
+	unsigned int			subbuf_size;
+	unsigned int			max_data_size;
 };
 
 struct ring_buffer_iter {
@@ -530,7 +527,7 @@ struct ring_buffer_iter {
 	int				missed_events;
 };
 
-int ring_buffer_print_page_header(struct trace_seq *s)
+int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq *s)
 {
 	struct buffer_data_page field;
 
@@ -554,7 +551,7 @@ int ring_buffer_print_page_header(struct trace_seq *s)
 	trace_seq_printf(s, "\tfield: char data;\t"
 			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
 			 (unsigned int)offsetof(typeof(field), data),
-			 (unsigned int)BUF_PAGE_SIZE,
+			 (unsigned int)buffer->subbuf_size,
 			 (unsigned int)is_signed_type(char));
 
 	return !trace_seq_has_overflowed(s);
@@ -1726,7 +1723,13 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!zalloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	/* Default buffer page size - one system page */
+	buffer->subbuf_size = PAGE_SIZE - BUF_PAGE_HDR_SIZE;
+
+	/* Max payload is buffer page size - header (8bytes) */
+	buffer->max_data_size = buffer->subbuf_size - (sizeof(u32) * 2);
+
+	nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
@@ -1920,7 +1923,8 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
 		 * Increment overrun to account for the lost events.
 		 */
 		local_add(page_entries, &cpu_buffer->overrun);
-		local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		local_sub(cpu_buffer->buffer->subbuf_size,
+			  &cpu_buffer->entries_bytes);
 	}
 
 	/*
@@ -2042,7 +2046,7 @@ static void update_pages_handler(struct work_struct *work)
  * @size: the new size.
  * @cpu_id: the cpu buffer to resize
  *
- * Minimum size is 2 * BUF_PAGE_SIZE.
+ * Minimum size is 2 * buffer->subbuf_size.
  *
 * Returns 0 on success and < 0 on failure.
 */
@@ -2064,7 +2068,7 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 	    !cpumask_test_cpu(cpu_id, buffer->cpumask))
 		return 0;
 
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size);
 
 	/* we need a minimum of two pages */
 	if (nr_pages < 2)
@@ -2291,7 +2295,7 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	 */
 	barrier();
 
-	if ((iter->head + length) > commit || length > BUF_MAX_DATA_SIZE)
+	if ((iter->head + length) > commit || length > iter->cpu_buffer->buffer->max_data_size)
 		/* Writer corrupted the read? */
 		goto reset;
@@ -2404,7 +2408,8 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
 	 * the counters.
 	 */
 	local_add(entries, &cpu_buffer->overrun);
-	local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+	local_sub(cpu_buffer->buffer->subbuf_size,
+		  &cpu_buffer->entries_bytes);
 
 	/*
 	 * The entries will be zeroed out when we move the
@@ -2523,6 +2528,7 @@ static inline void
 rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	      unsigned long tail, struct rb_event_info *info)
 {
+	unsigned long bsize = READ_ONCE(cpu_buffer->buffer->subbuf_size);
 	struct buffer_page *tail_page = info->tail_page;
 	struct ring_buffer_event *event;
 	unsigned long length = info->length;
@@ -2531,13 +2537,13 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	 * Only the event that crossed the page boundary
 	 * must fill the old tail_page with padding.
 	 */
-	if (tail >= BUF_PAGE_SIZE) {
+	if (tail >= bsize) {
 		/*
 		 * If the page was filled, then we still need
 		 * to update the real_end. Reset it to zero
 		 * and the reader will ignore it.
 		 */
-		if (tail == BUF_PAGE_SIZE)
+		if (tail == bsize)
 			tail_page->real_end = 0;
 
 		local_sub(length, &tail_page->write);
@@ -2547,7 +2553,7 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	event = __rb_page_index(tail_page, tail);
 
 	/* account for padding bytes */
-	local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
+	local_add(bsize - tail, &cpu_buffer->entries_bytes);
 
 	/*
 	 * Save the original length to the meta data.
@@ -2567,7 +2573,7 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	 * If we are less than the minimum size, we don't need to
	 * worry about it.
 	 */
-	if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE)) {
+	if (tail > (bsize - RB_EVNT_MIN_SIZE)) {
 		/* No room for any events */
 
 		/* Mark the rest of the page with padding */
@@ -2579,13 +2585,13 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	}
 
 	/* Put in a discarded event */
-	event->array[0] = (BUF_PAGE_SIZE - tail) - RB_EVNT_HDR_SIZE;
+	event->array[0] = (bsize - tail) - RB_EVNT_HDR_SIZE;
 	event->type_len = RINGBUF_TYPE_PADDING;
 	/* time delta must be non zero */
 	event->time_delta = 1;
 
 	/* Set write to end of buffer */
-	length = (tail + length) - BUF_PAGE_SIZE;
+	length = (tail + length) - bsize;
 	local_sub(length, &tail_page->write);
 }
 
@@ -3477,7 +3483,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	tail = write - info->length;
 
 	/* See if we shot pass the end of this buffer page */
-	if (unlikely(write > BUF_PAGE_SIZE)) {
+	if (unlikely(write > cpu_buffer->buffer->subbuf_size)) {
 		/* before and after may now different, fix it up*/
 		b_ok = rb_time_read(&cpu_buffer->before_stamp, &info->before);
 		a_ok = rb_time_read(&cpu_buffer->write_stamp, &info->after);
@@ -3686,7 +3692,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length)
 	if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
 		goto out;
 
-	if (unlikely(length > BUF_MAX_DATA_SIZE))
+	if (unlikely(length > buffer->max_data_size))
 		goto out;
 
 	if (unlikely(trace_recursive_lock(cpu_buffer)))
@@ -3836,7 +3842,7 @@ int ring_buffer_write(struct trace_buffer *buffer,
 	if (atomic_read(&cpu_buffer->record_disabled))
 		goto out;
 
-	if (length > BUF_MAX_DATA_SIZE)
+	if (length > buffer->max_data_size)
 		goto out;
 
 	if (unlikely(trace_recursive_lock(cpu_buffer)))
@@ -4958,7 +4964,7 @@ ring_buffer_read_prepare(struct trace_buffer *buffer, int cpu, gfp_t flags)
 	if (!iter)
 		return NULL;
 
-	iter->event = kmalloc(BUF_MAX_DATA_SIZE, flags);
+	iter->event = kmalloc(buffer->max_data_size, flags);
 	if (!iter->event) {
 		kfree(iter);
 		return NULL;
@@ -5076,14 +5082,14 @@ unsigned long ring_buffer_size(struct trace_buffer *buffer, int cpu)
 {
 	/*
 	 * Earlier, this method returned
-	 *  BUF_PAGE_SIZE * buffer->nr_pages
+	 *  buffer->subbuf_size * buffer->nr_pages
 	 * Since the nr_pages field is now removed, we have converted this to
 	 * return the per cpu buffer value.
 	 */
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return 0;
 
-	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
+	return buffer->subbuf_size * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -5619,7 +5625,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	} else {
 		/* update the entry counter */
 		cpu_buffer->read += rb_page_entries(reader);
-		cpu_buffer->read_bytes += BUF_PAGE_SIZE;
+		cpu_buffer->read_bytes += buffer->subbuf_size;
 
 		/* swap the pages */
 		rb_init_page(bpage);
@@ -5650,7 +5656,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 		/* If there is room at the end of the page to save the
 		 * missed events, then record it there.
 		 */
-		if (BUF_PAGE_SIZE - commit >= sizeof(missed_events)) {
+		if (buffer->subbuf_size - commit >= sizeof(missed_events)) {
 			memcpy(&bpage->data[commit], &missed_events,
 			       sizeof(missed_events));
 			local_add(RB_MISSED_STORED, &bpage->commit);
@@ -5662,8 +5668,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	/*
	 * This page may be off to user land. Zero it out here.
 	 */
-	if (commit < BUF_PAGE_SIZE)
-		memset(&bpage->data[commit], 0, BUF_PAGE_SIZE - commit);
+	if (commit < buffer->subbuf_size)
+		memset(&bpage->data[commit], 0, buffer->subbuf_size - commit);
 
  out_unlock:
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 88de94da596b..0eb8af875184 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4877,7 +4877,7 @@ static int tracing_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static int tracing_release_generic_tr(struct inode *inode, struct file *file)
+int tracing_release_generic_tr(struct inode *inode, struct file *file)
 {
 	struct trace_array *tr = inode->i_private;
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 38715aa6cfdf..101be5d43117 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -580,6 +580,7 @@ void tracing_reset_current(int cpu);
 void tracing_reset_all_online_cpus(void);
 int tracing_open_generic(struct inode *inode, struct file *filp);
 int tracing_open_generic_tr(struct inode *inode, struct file *filp);
+int tracing_release_generic_tr(struct inode *inode, struct file *file);
 bool tracing_is_disabled(void);
 bool tracer_tracing_is_on(struct trace_array *tr);
 void tracer_tracing_on(struct trace_array *tr);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 92be9cb1d7d4..7424c20514ec 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1847,9 +1847,9 @@ subsystem_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
 }
 
 static ssize_t
-show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+show_header_page_file(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
 {
-	int (*func)(struct trace_seq *s) = filp->private_data;
+	struct trace_array *tr = filp->private_data;
 	struct trace_seq *s;
 	int r;
 
@@ -1862,7 +1862,31 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
 
 	trace_seq_init(s);
 
-	func(s);
+	ring_buffer_print_page_header(tr->array_buffer.buffer, s);
+	r = simple_read_from_buffer(ubuf, cnt, ppos,
+				    s->buffer, trace_seq_used(s));
+
+	kfree(s);
+
+	return r;
+}
+
+static ssize_t
+show_header_event_file(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_seq *s;
+	int r;
+
+	if (*ppos)
+		return 0;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	trace_seq_init(s);
+
+	ring_buffer_print_entry_header(s);
 	r = simple_read_from_buffer(ubuf, cnt, ppos,
 				    s->buffer, trace_seq_used(s));
 
@@ -2117,10 +2141,18 @@ static const struct file_operations ftrace_tr_enable_fops = {
 	.release = subsystem_release,
 };
 
-static const struct file_operations ftrace_show_header_fops = {
-	.open = tracing_open_generic,
-	.read = show_header,
+static const struct file_operations ftrace_show_header_page_fops = {
+	.open = tracing_open_generic_tr,
+	.read = show_header_page_file,
+	.llseek = default_llseek,
+	.release = tracing_release_generic_tr,
+};
+
+static const struct file_operations ftrace_show_header_event_fops = {
+	.open = tracing_open_generic_tr,
+	.read = show_header_event_file,
 	.llseek = default_llseek,
+	.release = tracing_release_generic_tr,
 };
 
 static int
@@ -3481,14 +3513,12 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
 
 	/* ring buffer internal formats */
 	entry = trace_create_file("header_page", TRACE_MODE_READ, d_events,
-				  ring_buffer_print_page_header,
-				  &ftrace_show_header_fops);
+				  tr, &ftrace_show_header_page_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'header_page' entry\n");
 
 	entry = trace_create_file("header_event", TRACE_MODE_READ, d_events,
-				  ring_buffer_print_entry_header,
-				  &ftrace_show_header_fops);
+				  tr, &ftrace_show_header_event_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'header_event' entry\n");
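To make the size relationships in this patch concrete, here is a minimal
user-space sketch of how the two new per-buffer fields relate to the system
page size. The 4096-byte page size and the 16-byte BUF_PAGE_HDR_SIZE
(timestamp plus commit counter) are assumptions for illustration; the 8-byte
event header mirrors the sizeof(u32) * 2 term used in the patch:

	#include <stdio.h>

	/* Illustrative stand-ins for the kernel values used in the patch. */
	#define SYS_PAGE_SIZE     4096	/* assumed system page size */
	#define BUF_PAGE_HDR_SIZE 16	/* timestamp (8) + commit (8), assumed */

	int main(void)
	{
		/* Default: one sub-buffer page equals one system page. */
		unsigned int subbuf_size = SYS_PAGE_SIZE - BUF_PAGE_HDR_SIZE;

		/* Max payload is the sub-buffer size minus the 8-byte event header. */
		unsigned int max_data_size = subbuf_size - (sizeof(unsigned int) * 2);

		printf("subbuf_size=%u max_data_size=%u\n", subbuf_size, max_data_size);
		return 0;
	}

With the assumed values this prints subbuf_size=4080 max_data_size=4072,
matching what the old BUF_PAGE_SIZE and BUF_MAX_DATA_SIZE macros computed
globally before this patch made them per-buffer fields.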
From patchwork Mon Dec 13 09:48:23 2021
X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)"
X-Patchwork-Id: 12673543
From: "Tzvetomir Stoyanov (VMware)"
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v4 3/5] [RFC] tracing: Add interface for configuring trace sub buffer size
Date: Mon, 13 Dec 2021 11:48:23 +0200
Message-Id: <20211213094825.61876-4-tz.stoyanov@gmail.com>
In-Reply-To: <20211213094825.61876-1-tz.stoyanov@gmail.com>
References: <20211213094825.61876-1-tz.stoyanov@gmail.com>

The trace ring buffer sub page size can be configured per trace instance.
A new ftrace file, "buffer_subbuf_order", is added to get and set the size
of the ring buffer sub page for the current trace instance. The size must
be a power-of-two multiple of the system page size, which is why the new
interface works with the system page order instead of an absolute size:
0 means the ring buffer sub page is equal to 1 system page, and so forth:
 0 - 1 system page
 1 - 2 system pages
 2 - 4 system pages
 ...
The ring buffer sub page size is limited to between 1 and 128 system pages.
The default value is 1 system page.

New ring buffer APIs are introduced:
 ring_buffer_subbuf_order_set()
 ring_buffer_subbuf_order_get()
 ring_buffer_subbuf_size_get()

Signed-off-by: Tzvetomir Stoyanov (VMware)
---
 include/linux/ring_buffer.h |  4 ++
 kernel/trace/ring_buffer.c  | 73 +++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c        | 48 ++++++++++++++++++++++++
 3 files changed, 125 insertions(+)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index d9a2e6e8fb79..9103462f6e85 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -202,6 +202,10 @@ struct trace_seq;
 int ring_buffer_print_entry_header(struct trace_seq *s);
 int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq *s);
 
+int ring_buffer_subbuf_order_get(struct trace_buffer *buffer);
+int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order);
+int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
+
 enum ring_buffer_flags {
 	RB_FL_OVERWRITE = 1 << 0,
 };
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 68fdeff449c3..4aa5361a8f4c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -511,6 +511,7 @@ struct trace_buffer {
 	bool				time_stamp_abs;
 
 	unsigned int			subbuf_size;
+	unsigned int			subbuf_order;
 	unsigned int			max_data_size;
 };
 
@@ -5679,6 +5680,78 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_page);
 
+/**
+ * ring_buffer_subbuf_size_get - get size of the sub buffer.
+ * @buffer: the buffer to get the sub buffer size from
+ *
+ * Returns size of the sub buffer, in bytes.
+ */
+int ring_buffer_subbuf_size_get(struct trace_buffer *buffer)
+{
+	return buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_size_get);
+
+/**
+ * ring_buffer_subbuf_order_get - get order of system sub pages in one buffer page.
+ * @buffer: The ring_buffer to get the system sub page order from
+ *
+ * By default, one ring buffer sub page equals one system page. This parameter
+ * is configurable, per ring buffer. The size of the ring buffer sub page can be
+ * extended, but must be a power of two number of system pages.
+ *
+ * Returns the order of the buffer sub page size, in system pages:
+ * 0 means the sub buffer size is 1 system page, and so forth.
+ * In case of an error < 0 is returned.
+ */
+int ring_buffer_subbuf_order_get(struct trace_buffer *buffer)
+{
+	if (!buffer)
+		return -EINVAL;
+
+	return buffer->subbuf_order;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_get);
+
+/**
+ * ring_buffer_subbuf_order_set - set the size of ring buffer sub page.
+ * @buffer: The ring_buffer to set the new page size.
+ * @order: Order of the system pages in one sub buffer page
+ *
+ * By default, one ring buffer page equals one system page. This API can be
+ * used to set a new size of the ring buffer page. The size must be a power
+ * of two number of system pages, which is why the input parameter @order is
+ * the order of the system pages that are allocated for one ring buffer page:
+ *  0 - 1 system page
+ *  1 - 2 system pages
+ *  2 - 4 system pages
+ *  ...
+ *
+ * Returns 0 on success or < 0 in case of an error.
+ */
+int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
+{
+	int psize;
+
+	if (!buffer || order < 0)
+		return -EINVAL;
+
+	if (buffer->subbuf_order == order)
+		return 0;
+
+	psize = (1 << order) * PAGE_SIZE;
+	if (psize <= BUF_PAGE_HDR_SIZE)
+		return -EINVAL;
+
+	buffer->subbuf_order = order;
+	buffer->subbuf_size = psize - BUF_PAGE_HDR_SIZE;
+
+	/* Todo: reset the buffer with the new page size */
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_set);
+
 /*
  * We only allocate new buffers, never free them if the CPU goes down.
  * If we were to free the buffer, then the user would lose any trace that was in
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0eb8af875184..867a220b4ef2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9015,6 +9015,51 @@ static const struct file_operations buffer_percent_fops = {
 	.llseek		= default_llseek,
 };
 
+static ssize_t
+buffer_order_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	char buf[64];
+	int r;
+
+	r = sprintf(buf, "%d\n", ring_buffer_subbuf_order_get(tr->array_buffer.buffer));
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+buffer_order_write(struct file *filp, const char __user *ubuf,
+		   size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	unsigned long val;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+	if (ret)
+		return ret;
+
+	/* limit between 1 and 128 system pages */
+	if (val < 0 || val > 7)
+		return -EINVAL;
+
+	ret = ring_buffer_subbuf_order_set(tr->array_buffer.buffer, val);
+	if (ret)
+		return ret;
+
+	(*ppos)++;
+
+	return cnt;
+}
+
+static const struct file_operations buffer_order_fops = {
+	.open		= tracing_open_generic_tr,
+	.read		= buffer_order_read,
+	.write		= buffer_order_write,
+	.release	= tracing_release_generic_tr,
+	.llseek		= default_llseek,
+};
+
 static struct dentry *trace_instance_dir;
 
 static void
@@ -9468,6 +9513,9 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("buffer_percent", TRACE_MODE_READ, d_tracer,
 			tr, &buffer_percent_fops);
 
+	trace_create_file("buffer_subbuf_order", TRACE_MODE_WRITE, d_tracer,
+			  tr, &buffer_order_fops);
+
 	create_trace_options_dir(tr);
 
 	trace_create_maxlat_file(tr, d_tracer);
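As a quick usage sketch on top of this patch, assuming tracefs is mounted at
/sys/kernel/tracing, the new file could be exercised as below. The accepted
values follow the 0..7 range enforced by buffer_order_write() above; the
error output is what a shell would typically report for -EINVAL:

	# cd /sys/kernel/tracing
	# cat buffer_subbuf_order
	0
	# echo 2 > buffer_subbuf_order        # one sub buffer spans 4 system pages
	# echo 8 > buffer_subbuf_order        # order 8 = 256 pages, above the limit
	sh: write error: Invalid argument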
From patchwork Mon Dec 13 09:48:24 2021
X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)"
X-Patchwork-Id: 12673565
From: "Tzvetomir Stoyanov (VMware)"
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v4 4/5] [RFC] tracing: Set new size of the ring buffer sub page
Date: Mon, 13 Dec 2021 11:48:24 +0200
Message-Id: <20211213094825.61876-5-tz.stoyanov@gmail.com>
In-Reply-To: <20211213094825.61876-1-tz.stoyanov@gmail.com>
References: <20211213094825.61876-1-tz.stoyanov@gmail.com>

There are two approaches when changing the size of the ring buffer
sub page:
 1. Destroying all pages and allocating new pages with the new size.
 2. Allocating new pages, copying the content of the old pages before
    destroying them.
The first approach is easier and is selected in the proposed
implementation. Changing the ring buffer sub page size is not expected
to happen frequently. Usually, that size should be set only once, while
the buffer is not yet in use and is still empty.
Signed-off-by: Tzvetomir Stoyanov (VMware)
---
 kernel/trace/ring_buffer.c | 80 ++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4aa5361a8f4c..a40fcb1cb299 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -323,6 +323,7 @@ struct buffer_page {
 	unsigned	 read;		/* index for next read */
 	local_t		 entries;	/* entries on this page */
 	unsigned long	 real_end;	/* real end of data */
+	unsigned	 order;		/* order of the page */
 	struct buffer_data_page *page;	/* Actual data page */
 };
 
@@ -352,7 +353,7 @@ static void rb_init_page(struct buffer_data_page *bpage)
  */
 static void free_buffer_page(struct buffer_page *bpage)
 {
-	free_page((unsigned long)bpage->page);
+	free_pages((unsigned long)bpage->page, bpage->order);
 	kfree(bpage);
 }
 
@@ -1563,10 +1564,12 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 		list_add(&bpage->list, pages);
 
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags, 0);
+		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags,
+					cpu_buffer->buffer->subbuf_order);
 		if (!page)
 			goto free_pages;
 		bpage->page = page_address(page);
+		bpage->order = cpu_buffer->buffer->subbuf_order;
 		rb_init_page(bpage->page);
 
 		if (user_thread && fatal_signal_pending(current))
@@ -1645,7 +1648,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	rb_check_bpage(cpu_buffer, bpage);
 
 	cpu_buffer->reader_page = bpage;
-	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, cpu_buffer->buffer->subbuf_order);
 	if (!page)
 		goto fail_free_reader;
 	bpage->page = page_address(page);
@@ -1725,6 +1729,7 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 		goto fail_free_buffer;
 
 	/* Default buffer page size - one system page */
+	buffer->subbuf_order = 0;
 	buffer->subbuf_size = PAGE_SIZE - BUF_PAGE_HDR_SIZE;
 
 	/* Max payload is buffer page size - header (8bytes) */
@@ -5434,8 +5439,8 @@ void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
 	if (bpage)
 		goto out;
 
-	page = alloc_pages_node(cpu_to_node(cpu),
-				GFP_KERNEL | __GFP_NORETRY, 0);
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_NORETRY,
+				cpu_buffer->buffer->subbuf_order);
 	if (!page)
 		return ERR_PTR(-ENOMEM);
 
@@ -5479,7 +5484,7 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data
 	local_irq_restore(flags);
 
  out:
-	free_page((unsigned long)bpage);
+	free_pages((unsigned long)bpage, buffer->subbuf_order);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
 
@@ -5731,7 +5736,13 @@ EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_get);
  */
 int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 {
+	struct ring_buffer_per_cpu **cpu_buffers;
+	int old_order, old_size;
+	int nr_pages;
 	int psize;
+	int bsize;
+	int err;
+	int cpu;
 
 	if (!buffer || order < 0)
 		return -EINVAL;
@@ -5743,12 +5754,67 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	if (psize <= BUF_PAGE_HDR_SIZE)
 		return -EINVAL;
 
+	bsize = sizeof(void *) * buffer->cpus;
+	cpu_buffers = kzalloc(bsize, GFP_KERNEL);
+	if (!cpu_buffers)
+		return -ENOMEM;
+
+	old_order = buffer->subbuf_order;
+	old_size = buffer->subbuf_size;
+
+	/* prevent another thread from changing buffer sizes */
+	mutex_lock(&buffer->mutex);
+	atomic_inc(&buffer->record_disabled);
+
+	/* Make sure all commits have finished */
+	synchronize_rcu();
+
 	buffer->subbuf_order = order;
 	buffer->subbuf_size = psize - BUF_PAGE_HDR_SIZE;
 
-	/* Todo: reset the buffer with the new page size */
+	/* Make sure all new buffers are allocated, before deleting the old ones */
+	for_each_buffer_cpu(buffer, cpu) {
+		if (!cpumask_test_cpu(cpu, buffer->cpumask))
+			continue;
+
+		nr_pages = buffer->buffers[cpu]->nr_pages;
+		cpu_buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
+		if (!cpu_buffers[cpu]) {
+			err = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for_each_buffer_cpu(buffer, cpu) {
+		if (!cpumask_test_cpu(cpu, buffer->cpumask))
+			continue;
+
+		rb_free_cpu_buffer(buffer->buffers[cpu]);
+		buffer->buffers[cpu] = cpu_buffers[cpu];
+	}
+
+	atomic_dec(&buffer->record_disabled);
+	mutex_unlock(&buffer->mutex);
+
+	kfree(cpu_buffers);
 
 	return 0;
+
+error:
+	buffer->subbuf_order = old_order;
+	buffer->subbuf_size = old_size;
+
+	atomic_dec(&buffer->record_disabled);
+	mutex_unlock(&buffer->mutex);
+
+	for_each_buffer_cpu(buffer, cpu) {
+		if (!cpu_buffers[cpu])
+			continue;
+		kfree(cpu_buffers[cpu]);
+	}
+	kfree(cpu_buffers);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_set);
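A sketch of the intended workflow with this patch applied (the tracefs path
and instance name are examples, not part of the patch): since changing the
order throws away the current buffer contents, it is best done on an idle
instance before tracing starts:

	# cd /sys/kernel/tracing
	# mkdir instances/foo                 # buffer_subbuf_order exists per instance
	# echo 0 > instances/foo/tracing_on   # make sure nothing is being recorded
	# echo 3 > instances/foo/buffer_subbuf_order   # 8 system pages per sub buffer
	# echo 1 > instances/foo/tracing_on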
From patchwork Mon Dec 13 09:48:25 2021
X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)"
X-Patchwork-Id: 12673567
From: "Tzvetomir Stoyanov (VMware)"
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v4 5/5] [RFC] tracing: Read and write to ring buffers with custom sub buffer size
Date: Mon, 13 Dec 2021 11:48:25 +0200
Message-Id: <20211213094825.61876-6-tz.stoyanov@gmail.com>
In-Reply-To: <20211213094825.61876-1-tz.stoyanov@gmail.com>
References: <20211213094825.61876-1-tz.stoyanov@gmail.com>

As the size of the ring buffer sub page can now be changed dynamically,
the logic that reads from and writes to the buffer is fixed to take that
into account. Some internal ring buffer APIs are changed:
 ring_buffer_alloc_read_page()
 ring_buffer_free_read_page()
 ring_buffer_read_page()
A new API is introduced:
 ring_buffer_read_page_data()

Signed-off-by: Tzvetomir Stoyanov (VMware)
---
 include/linux/ring_buffer.h          | 11 ++--
 kernel/trace/ring_buffer.c           | 75 ++++++++++++++++++++--------
 kernel/trace/ring_buffer_benchmark.c | 10 ++--
 kernel/trace/trace.c                 | 34 +++++++------
 4 files changed, 89 insertions(+), 41 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 9103462f6e85..8d6807e3865d 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -192,10 +192,15 @@ bool ring_buffer_time_stamp_abs(struct trace_buffer *buffer);
 size_t ring_buffer_nr_pages(struct trace_buffer *buffer, int cpu);
 size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu);
 
-void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu);
-void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data);
-int ring_buffer_read_page(struct trace_buffer *buffer, void **data_page,
+struct buffer_data_read_page;
+struct buffer_data_read_page *
+ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu);
+void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu,
+				struct buffer_data_read_page *page);
+int ring_buffer_read_page(struct trace_buffer *buffer,
+			  struct buffer_data_read_page *data_page,
 			  size_t len, int cpu, int full);
+void *ring_buffer_read_page_data(struct buffer_data_read_page *page);
 
 struct trace_seq;
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a40fcb1cb299..fd22e0fc7195 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -309,6 +309,11 @@ struct buffer_data_page {
 	unsigned char	 data[] RB_ALIGN_DATA;	/* data of buffer page */
 };
 
+struct buffer_data_read_page {
+	unsigned		order;	/* order of the page */
+	struct buffer_data_page	*data;	/* actual data, stored in this page */
+};
+
 /*
 * Note, the buffer_page list must be first. The buffer pages
 * are allocated in cache lines, which means that each buffer
@@ -5414,40 +5419,48 @@ EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
 * Returns:
 *  The page allocated, or ERR_PTR
 */
-void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
+struct buffer_data_read_page *
+ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	struct buffer_data_page *bpage = NULL;
+	struct buffer_data_read_page *bpage = NULL;
 	unsigned long flags;
 	struct page *page;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return ERR_PTR(-ENODEV);
 
+	bpage = kzalloc(sizeof(*bpage), GFP_KERNEL);
+	if (!bpage)
+		return ERR_PTR(-ENOMEM);
+
+	bpage->order = buffer->subbuf_order;
 	cpu_buffer = buffer->buffers[cpu];
 	local_irq_save(flags);
 	arch_spin_lock(&cpu_buffer->lock);
 
 	if (cpu_buffer->free_page) {
-		bpage = cpu_buffer->free_page;
+		bpage->data = cpu_buffer->free_page;
 		cpu_buffer->free_page = NULL;
 	}
 
 	arch_spin_unlock(&cpu_buffer->lock);
 	local_irq_restore(flags);
 
-	if (bpage)
+	if (bpage->data)
 		goto out;
 
 	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_NORETRY,
 				cpu_buffer->buffer->subbuf_order);
-	if (!page)
+	if (!page) {
+		kfree(bpage);
 		return ERR_PTR(-ENOMEM);
+	}
 
-	bpage = page_address(page);
+	bpage->data = page_address(page);
 
  out:
-	rb_init_page(bpage);
+	rb_init_page(bpage->data);
 
 	return bpage;
 }
@@ -5457,19 +5470,24 @@ EXPORT_SYMBOL_GPL(ring_buffer_alloc_read_page);
 * ring_buffer_free_read_page - free an allocated read page
 * @buffer: the buffer the page was allocate for
 * @cpu: the cpu buffer the page came from
- * @data: the page to free
+ * @page: the page to free
 *
 * Free a page allocated from ring_buffer_alloc_read_page.
 */
-void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data)
+void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu,
+				struct buffer_data_read_page *data_page)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
-	struct buffer_data_page *bpage = data;
+	struct buffer_data_page *bpage = data_page->data;
 	struct page *page = virt_to_page(bpage);
 	unsigned long flags;
 
-	/* If the page is still in use someplace else, we can't reuse it */
-	if (page_ref_count(page) > 1)
+	/*
+	 * If the page is still in use someplace else, or order of the page
+	 * is different from the subbuffer order of the buffer -
+	 * we can't reuse it
+	 */
+	if (page_ref_count(page) > 1 || data_page->order != buffer->subbuf_order)
 		goto out;
 
 	local_irq_save(flags);
@@ -5484,7 +5502,8 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data
 	local_irq_restore(flags);
 
  out:
-	free_pages((unsigned long)bpage, buffer->subbuf_order);
+	free_pages((unsigned long)bpage, data_page->order);
+	kfree(data_page);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
 
@@ -5505,9 +5524,10 @@ EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
 *  rpage = ring_buffer_alloc_read_page(buffer, cpu);
 *  if (IS_ERR(rpage))
 *	return PTR_ERR(rpage);
- *  ret = ring_buffer_read_page(buffer, &rpage, len, cpu, 0);
+ *  ret = ring_buffer_read_page(buffer, rpage, len, cpu, 0);
 *  if (ret >= 0)
- *	process_page(rpage, ret);
+ *	process_page(ring_buffer_read_page_data(rpage), ret);
+ *  ring_buffer_free_read_page(buffer, cpu, rpage);
 *
 * When @full is set, the function will not return true unless
 * the writer is off the reader page.
@@ -5522,7 +5542,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
 *  <0 if no data has been transferred.
 */
 int ring_buffer_read_page(struct trace_buffer *buffer,
-			  void **data_page, size_t len, int cpu, int full)
+			  struct buffer_data_read_page *data_page,
+			  size_t len, int cpu, int full)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
 	struct ring_buffer_event *event;
@@ -5547,10 +5568,12 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 
 	len -= BUF_PAGE_HDR_SIZE;
 
-	if (!data_page)
+	if (!data_page || !data_page->data)
+		goto out;
+	if (data_page->order != buffer->subbuf_order)
 		goto out;
 
-	bpage = *data_page;
+	bpage = data_page->data;
 	if (!bpage)
 		goto out;
 
@@ -5636,11 +5659,11 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 		/* swap the pages */
 		rb_init_page(bpage);
 		bpage = reader->page;
-		reader->page = *data_page;
+		reader->page = data_page->data;
 		local_set(&reader->write, 0);
 		local_set(&reader->entries, 0);
 		reader->read = 0;
-		*data_page = bpage;
+		data_page->data = bpage;
 
 		/*
 		 * Use the real_end for the data size,
@@ -5685,6 +5708,18 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_page);
 
+/**
+ * ring_buffer_read_page_data - get pointer to the data in the page.
+ * @page:  the page to get the data from
+ *
+ * Returns pointer to the actual data in this page.
+ */
+void *ring_buffer_read_page_data(struct buffer_data_read_page *page)
+{
+	return page->data;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read_page_data);
+
 /**
  * ring_buffer_subbuf_size_get - get size of the sub buffer.
  * @buffer: the buffer to get the sub buffer size from
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 78e576575b79..7202d6d650e6 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -104,10 +104,11 @@ static enum event_status read_event(int cpu)
 
 static enum event_status read_page(int cpu)
 {
+	struct buffer_data_read_page *bpage;
 	struct ring_buffer_event *event;
 	struct rb_page *rpage;
 	unsigned long commit;
-	void *bpage;
+	int page_size;
 	int *entry;
 	int ret;
 	int inc;
@@ -117,14 +118,15 @@ static enum event_status read_page(int cpu)
 	if (IS_ERR(bpage))
 		return EVENT_DROPPED;
 
-	ret = ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
+	page_size = ring_buffer_subbuf_size_get(buffer);
+	ret = ring_buffer_read_page(buffer, bpage, page_size, cpu, 1);
 	if (ret >= 0) {
-		rpage = bpage;
+		rpage = ring_buffer_read_page_data(bpage);
 		/* The commit may have missed event flags set, clear them */
 		commit = local_read(&rpage->commit) & 0xfffff;
 		for (i = 0; i < commit && !test_error ; i += inc) {
-			if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
+			if (i >= (page_size - offsetof(struct rb_page, data))) {
 				TEST_ERROR();
 				break;
 			}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 867a220b4ef2..edcf30ea1d25 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8004,6 +8004,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
 {
 	struct ftrace_buffer_info *info = filp->private_data;
 	struct trace_iterator *iter = &info->iter;
+	void *trace_data;
+	int page_size;
 	ssize_t ret = 0;
 	ssize_t size;
 
@@ -8015,6 +8017,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
 		return -EBUSY;
 #endif
 
+	page_size = ring_buffer_subbuf_size_get(iter->array_buffer->buffer);
+
 	if (!info->spare) {
 		info->spare = ring_buffer_alloc_read_page(iter->array_buffer->buffer,
 							  iter->cpu_file);
@@ -8029,13 +8033,13 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
 		return ret;
 
	/* Do we have previous read data to read? */
-	if (info->read < PAGE_SIZE)
+	if (info->read < page_size)
 		goto read;
 
  again:
 	trace_access_lock(iter->cpu_file);
 	ret = ring_buffer_read_page(iter->array_buffer->buffer,
-				    &info->spare,
+				    info->spare,
 				    count,
 				    iter->cpu_file, 0);
 	trace_access_unlock(iter->cpu_file);
@@ -8056,11 +8060,11 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
 		info->read = 0;
 
  read:
-	size = PAGE_SIZE - info->read;
+	size = page_size - info->read;
 	if (size > count)
 		size = count;
-
-	ret = copy_to_user(ubuf, info->spare + info->read, size);
+	trace_data = ring_buffer_read_page_data(info->spare);
+	ret = copy_to_user(ubuf, trace_data + info->read, size);
 	if (ret == size)
 		return -EFAULT;
 
@@ -8165,6 +8169,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		.spd_release	= buffer_spd_release,
 	};
 	struct buffer_ref *ref;
+	int page_size;
 	int entries, i;
 	ssize_t ret = 0;
 
@@ -8173,13 +8178,14 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		return -EBUSY;
 #endif
 
-	if (*ppos & (PAGE_SIZE - 1))
+	page_size = ring_buffer_subbuf_size_get(iter->array_buffer->buffer);
+	if (*ppos & (page_size - 1))
 		return -EINVAL;
 
-	if (len & (PAGE_SIZE - 1)) {
-		if (len < PAGE_SIZE)
+	if (len & (page_size - 1)) {
+		if (len < page_size)
 			return -EINVAL;
-		len &= PAGE_MASK;
+		len &= (~(page_size - 1));
 	}
 
 	if (splice_grow_spd(pipe, &spd))
@@ -8189,7 +8195,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 	trace_access_lock(iter->cpu_file);
 	entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
 
-	for (i = 0; i < spd.nr_pages_max && len && entries; i++, len -= PAGE_SIZE) {
+	for (i = 0; i < spd.nr_pages_max && len && entries; i++, len -= page_size) {
 		struct page *page;
 		int r;
 
@@ -8210,7 +8216,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		}
 		ref->cpu = iter->cpu_file;
 
-		r = ring_buffer_read_page(ref->buffer, &ref->page,
+		r = ring_buffer_read_page(ref->buffer, ref->page,
 					  len, iter->cpu_file, 1);
 		if (r < 0) {
 			ring_buffer_free_read_page(ref->buffer, ref->cpu,
@@ -8219,14 +8225,14 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 			break;
 		}
 
-		page = virt_to_page(ref->page);
+		page = virt_to_page(ring_buffer_read_page_data(ref->page));
 
 		spd.pages[i] = page;
-		spd.partial[i].len = PAGE_SIZE;
+		spd.partial[i].len = page_size;
 		spd.partial[i].offset = 0;
 		spd.partial[i].private = (unsigned long)ref;
 		spd.nr_pages++;
 
-		*ppos += PAGE_SIZE;
+		*ppos += page_size;
 
 		entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
 	}
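Taken together, a consumer of the reworked read-page API follows the pattern
from the ring_buffer_read_page() kernel-doc above. This sketch of a
kernel-side reader shows the full alloc/read/process/free cycle over the
functions introduced in this series; process_page() is a hypothetical
consumer callback, not part of the patches:

	/* Sketch of a kernel-side reader using the APIs from this series. */
	static int drain_cpu(struct trace_buffer *buffer, int cpu)
	{
		struct buffer_data_read_page *rpage;
		int size, ret;

		rpage = ring_buffer_alloc_read_page(buffer, cpu);
		if (IS_ERR(rpage))
			return PTR_ERR(rpage);

		/* The length must cover a whole sub buffer, whatever its order. */
		size = ring_buffer_subbuf_size_get(buffer);

		ret = ring_buffer_read_page(buffer, rpage, size, cpu, 0);
		if (ret >= 0)
			process_page(ring_buffer_read_page_data(rpage), ret);

		ring_buffer_free_read_page(buffer, cpu, rpage);
		return ret;
	}

Note how the opaque struct buffer_data_read_page carries the order the page
was allocated with, which is what lets ring_buffer_read_page() and
ring_buffer_free_read_page() reject or discard pages whose order no longer
matches the buffer after a sub-buffer resize.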