From patchwork Tue Mar 26 10:08:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Donnefort
X-Patchwork-Id: 13603822
Date: Tue, 26 Mar 2024 10:08:28 +0000
In-Reply-To: <20240326100830.1326610-1-vdonnefort@google.com>
References: <20240326100830.1326610-1-vdonnefort@google.com>
X-Mailer: git-send-email 2.44.0.396.g6e790dbe36-goog
Message-ID: <20240326100830.1326610-4-vdonnefort@google.com>
Subject: [PATCH v19 RESEND 3/5] tracing: Allow user-space mapping of the ring-buffer
From: Vincent Donnefort
To: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: mathieu.desnoyers@efficios.com, kernel-team@android.com, Vincent Donnefort, linux-mm@kvack.org
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.

A solution for that problem is to let user-space map the ring-buffer
directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:

  * Meta-page -- include/uapi/linux/trace_mmap.h for a description
  * Subbuf ID 0
  * Subbuf ID 1
     ...

It is therefore easy to translate a subbuf ID into an offset in the
mapping:

  reader_id = meta->reader->id;
  reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.

CC:
Signed-off-by: Vincent Donnefort

diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index ffcd8dfcaa4f..d25b9d504a7c 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
 	__u64 Reserved2;
 };
 
+#define TRACE_MMAP_IOCTL_GET_READER _IO('T', 0x1)
+
 #endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 233d1af39fff..0f37aa9860fd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1191,6 +1191,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
 		return;
 	}
 
+	if (tr->mapped) {
+		trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+		trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+		return;
+	}
+
 	local_irq_save(flags);
 	update_max_tr(tr, current, smp_processor_id(), cond_data);
 	local_irq_restore(flags);
@@ -1323,7 +1329,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
 	lockdep_assert_held(&trace_types_lock);
 
 	spin_lock(&tr->snapshot_trigger_lock);
-	if (tr->snapshot == UINT_MAX) {
+	if (tr->snapshot == UINT_MAX || tr->mapped) {
 		spin_unlock(&tr->snapshot_trigger_lock);
 		return -EBUSY;
 	}
@@ -6068,7 +6074,7 @@ static void tracing_set_nop(struct trace_array *tr)
 {
 	if (tr->current_trace == &nop_trace)
 		return;
-	
+
 	tr->current_trace->enabled--;
 
 	if (tr->current_trace->reset)
@@ -8194,15 +8200,32 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 	return ret;
 }
 
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
 static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct ftrace_buffer_info *info = file->private_data;
 	struct trace_iterator *iter = &info->iter;
+	int err;
+
+	if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+		if (!(file->f_flags & O_NONBLOCK)) {
+			err = ring_buffer_wait(iter->array_buffer->buffer,
+					       iter->cpu_file,
+					       iter->tr->buffer_percent,
+					       NULL, NULL);
+			if (err)
+				return err;
+		}
 
-	if (cmd)
-		return -ENOIOCTLCMD;
+		return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+						  iter->cpu_file);
+	} else if (cmd) {
+		return -ENOTTY;
+	}
 
+	/*
+	 * An ioctl call with cmd 0 to the ring buffer file will wake up all
+	 * waiters
+	 */
 	mutex_lock(&trace_types_lock);
 
 	/* Make sure the waiters see the new wait_index */
@@ -8214,6 +8237,94 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
 	return 0;
 }
 
+static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int get_snapshot_map(struct trace_array *tr)
+{
+	int err = 0;
+
+	/*
+	 * Called with mmap_lock held. lockdep would be unhappy if we would now
+	 * take trace_types_lock. Instead use the specific
+	 * snapshot_trigger_lock.
+	 */
+	spin_lock(&tr->snapshot_trigger_lock);
+
+	if (tr->snapshot || tr->mapped == UINT_MAX)
+		err = -EBUSY;
+	else
+		tr->mapped++;
+
+	spin_unlock(&tr->snapshot_trigger_lock);
+
+	/* Wait for update_max_tr() to observe iter->tr->mapped */
+	if (tr->mapped == 1)
+		synchronize_rcu();
+
+	return err;
+
+}
+static void put_snapshot_map(struct trace_array *tr)
+{
+	spin_lock(&tr->snapshot_trigger_lock);
+	if (!WARN_ON(!tr->mapped))
+		tr->mapped--;
+	spin_unlock(&tr->snapshot_trigger_lock);
+}
+#else
+static inline int get_snapshot_map(struct trace_array *tr) { return 0; }
+static inline void put_snapshot_map(struct trace_array *tr) { }
+#endif
+
+static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
+{
+	struct ftrace_buffer_info *info = vma->vm_file->private_data;
+	struct trace_iterator *iter = &info->iter;
+
+	WARN_ON(ring_buffer_unmap(iter->array_buffer->buffer, iter->cpu_file));
+	put_snapshot_map(iter->tr);
+}
+
+static void tracing_buffers_mmap_open(struct vm_area_struct *vma) { }
+
+static const struct vm_operations_struct tracing_buffers_vmops = {
+	.open = tracing_buffers_mmap_open,
+	.close = tracing_buffers_mmap_close,
+	.fault = tracing_buffers_mmap_fault,
+};
+
+static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct ftrace_buffer_info *info = filp->private_data;
+	struct trace_iterator *iter = &info->iter;
+	int ret = 0;
+
+	if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
+	    !(vma->vm_flags & VM_MAYSHARE))
+		return -EPERM;
+
+	vm_flags_mod(vma,
+		     VM_MIXEDMAP | VM_PFNMAP |
+		     VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO,
+		     VM_MAYWRITE);
+
+	vma->vm_ops = &tracing_buffers_vmops;
+
+	ret = get_snapshot_map(iter->tr);
+	if (ret)
+		return ret;
+
+	ret = ring_buffer_map(iter->array_buffer->buffer, iter->cpu_file, vma);
+	if (ret)
+		put_snapshot_map(iter->tr);
+
+	return ret;
+}
+
 static const struct file_operations tracing_buffers_fops = {
 	.open = tracing_buffers_open,
 	.read = tracing_buffers_read,
@@ -8223,6 +8334,7 @@ static const struct file_operations tracing_buffers_fops = {
 	.splice_read = tracing_buffers_splice_read,
 	.unlocked_ioctl = tracing_buffers_ioctl,
 	.llseek = no_llseek,
+	.mmap = tracing_buffers_mmap,
 };
 
 static ssize_t
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 64450615ca0c..749a182dab48 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -336,6 +336,7 @@ struct trace_array {
 	bool allocated_snapshot;
 	spinlock_t snapshot_trigger_lock;
 	unsigned int snapshot;
+	unsigned int mapped;
 	unsigned long max_latency;
 #ifdef CONFIG_FSNOTIFY
 	struct dentry *d_max_latency;
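
For reviewers who want to reason about the user-space side, the offset
arithmetic from the commit message and the flag check in
tracing_buffers_mmap() can be sketched as below. This is only an
illustration under stated assumptions: struct example_meta and the
example_*() helpers are hypothetical names that do not exist in the
kernel, and the struct mirrors just the fields named in the commit
message rather than the real struct trace_buffer_meta from
include/uapi/linux/trace_mmap.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * ASSUMPTION: hypothetical, cut-down mirror of the meta-page fields named
 * in the commit message. Real code should use struct trace_buffer_meta
 * from include/uapi/linux/trace_mmap.h instead.
 */
struct example_meta {
	uint32_t meta_page_size;	/* bytes taken by the meta-page */
	uint32_t subbuf_size;		/* bytes taken by one subbuffer */
	struct {
		uint32_t id;		/* ID of the current reader subbuffer */
	} reader;
};

/* Offset of subbuffer `id`, counted from the start of the mapping. */
static size_t example_subbuf_offset(const struct example_meta *meta,
				    uint32_t id)
{
	return (size_t)meta->meta_page_size + (size_t)id * meta->subbuf_size;
}

/* Offset of the subbuffer the meta-page currently designates as reader. */
static size_t example_reader_offset(const struct example_meta *meta)
{
	return example_subbuf_offset(meta, meta->reader.id);
}

/*
 * User-space view of the check done in tracing_buffers_mmap(): the request
 * must be shared and must not ask for write or exec permission, otherwise
 * the kernel side returns -EPERM.
 */
static bool example_mmap_request_ok(int prot, int flags)
{
	if (prot & (PROT_WRITE | PROT_EXEC))
		return false;

	return (flags & MAP_SHARED) != 0;
}
```

A consumer would presumably map trace_pipe_raw with PROT_READ and
MAP_SHARED, call the TRACE_MMAP_IOCTL_GET_READER ioctl when new data is
expected, and then read the subbuffer found at the reader offset. The
helpers above only cover the arithmetic and the flag constraints, not the
ioctl or the event parsing.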