Message ID | 20200509031058.8239-5-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: fix blktrace debugfs use after free | expand |
On 2020-05-08 20:10, Luis Chamberlain wrote: > @@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, > */ > strreplace(buts->name, '/', '_'); > > + if (q->blk_trace) { > + pr_warn("Concurrent blktraces are not allowed on %s\n", > + buts->name); > + return -EBUSY; > + } > + > bt = kzalloc(sizeof(*bt), GFP_KERNEL); > if (!bt) > return -ENOMEM; Is this really sufficient? Shouldn't concurrent do_blk_trace_setup() calls that refer to the same request queue be serialized to really prevent that debugfs attribute creation fails? How about using the block device name instead of the partition name in the error message since the concurrency context is the block device and not the partition? Thanks, Bart.
On Sat, May 09, 2020 at 06:09:38PM -0700, Bart Van Assche wrote: > On 2020-05-08 20:10, Luis Chamberlain wrote: > > @@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, > > */ > > strreplace(buts->name, '/', '_'); > > > > + if (q->blk_trace) { > > + pr_warn("Concurrent blktraces are not allowed on %s\n", > > + buts->name); > > + return -EBUSY; > > + } > > + > > bt = kzalloc(sizeof(*bt), GFP_KERNEL); > > if (!bt) > > return -ENOMEM; > > Is this really sufficient? Shouldn't concurrent do_blk_trace_setup() > calls that refer to the same request queue be serialized to really > prevent that debugfs attribute creation fails? We'd have to add something like a linked list. Right now I'm just clarifying things which were not clear before. What you describe is a functional feature change. I'm just trying to fix a bug and clarify limitations. > How about using the block device name instead of the partition name in > the error message since the concurrency context is the block device and > not the partition? blk device argument can be NULL here. sg-generic is one case. Luis
On Mon, May 11, 2020 at 7:39 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > On Sat, May 09, 2020 at 06:09:38PM -0700, Bart Van Assche wrote: > > How about using the block device name instead of the partition name in > > the error message since the concurrency context is the block device and > > not the partition? > > blk device argument can be NULL here. sg-generic is one case. I'm going to add a comment about this, as it is easily forgotten. Luis
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 6c10a1427de2..bd5ec2184d46 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -3,6 +3,9 @@ * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk> * */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/blktrace_api.h> @@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, */ strreplace(buts->name, '/', '_'); + if (q->blk_trace) { + pr_warn("Concurrent blktraces are not allowed on %s\n", + buts->name); + return -EBUSY; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM;
We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this: The kernel will show these: ``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! `` And userspace just sees this error message for the second call: ``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ``` The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we *have* been asked to take teardown the blktrace. This can easily be reproduced with break-blktrace [0] run_0005.sh test. Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist. [0] https://github.com/mcgrof/break-blktrace Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- kernel/trace/blktrace.c | 9 +++++++++ 1 file changed, 9 insertions(+)