diff mbox series

[v3,5/6] blktrace: break out of blktrace setup on concurrent calls

Message ID 20200429074627.5955-6-mcgrof@kernel.org (mailing list archive)
State New, archived
Headers show
Series block: fix blktrace debugfs use after free | expand

Commit Message

Luis Chamberlain April 29, 2020, 7:46 a.m. UTC
We use one blktrace per request_queue, that means one per the entire
disk.  So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
or just two calls on /dev/vda.

We check for concurrent setup only at the very end of the blktrace setup though.

If we try to run two concurrent blktraces on the same block device the
second one will fail, and the first one seems to go on. However when
one tries to kill the first one one will see things like this:

The kernel will show these:

```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
``

And userspace just sees this error message for the second call:

```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```

The first userspace process #1 will also claim that the files
were taken underneath their nose as well. The files are taken
away form the first process given that when the second blktrace
fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
This means that even if go-happy process #1 is waiting for blktrace
data, we *have* been asked to take teardown the blktrace.

This can easily be reproduced with break-blktrace [0] run_0005.sh test.

Just break out early if we know we're already going to fail, this will
prevent trying to create the files all over again, which we know still
exist.

[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/trace/blktrace.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Greg KH April 29, 2020, 9:49 a.m. UTC | #1
On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote:
> We use one blktrace per request_queue, that means one per the entire
> disk.  So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
> or just two calls on /dev/vda.
> 
> We check for concurrent setup only at the very end of the blktrace setup though.
> 
> If we try to run two concurrent blktraces on the same block device the
> second one will fail, and the first one seems to go on. However when
> one tries to kill the first one one will see things like this:
> 
> The kernel will show these:
> 
> ```
> debugfs: File 'dropped' in directory 'nvme1n1' already present!
> debugfs: File 'msg' in directory 'nvme1n1' already present!
> debugfs: File 'trace0' in directory 'nvme1n1' already present!
> ``
> 
> And userspace just sees this error message for the second call:
> 
> ```
> blktrace /dev/nvme1n1
> BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
> ```
> 
> The first userspace process #1 will also claim that the files
> were taken underneath their nose as well. The files are taken
> away form the first process given that when the second blktrace
> fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
> This means that even if go-happy process #1 is waiting for blktrace
> data, we *have* been asked to take teardown the blktrace.
> 
> This can easily be reproduced with break-blktrace [0] run_0005.sh test.
> 
> Just break out early if we know we're already going to fail, this will
> prevent trying to create the files all over again, which we know still
> exist.
> 
> [0] https://github.com/mcgrof/break-blktrace
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  kernel/trace/blktrace.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index 5c52976bd762..383045f67cb8 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -4,6 +4,8 @@
>   *
>   */
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/blktrace_api.h>
> @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
>  	 */
>  	strreplace(buts->name, '/', '_');
>  
> +	if (q->blk_trace) {
> +		pr_warn("Concurrent blktraces are not allowed\n");
> +		return -EBUSY;

You have access to a block device here, please use dev_warn() instead
here for that, that makes it obvious as to what device a "concurrent
blktrace" was attempted for.

thanks,

greg k-h
Luis Chamberlain May 1, 2020, 3:06 p.m. UTC | #2
On Wed, Apr 29, 2020 at 11:49:37AM +0200, Greg KH wrote:
> On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote:
> > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> > index 5c52976bd762..383045f67cb8 100644
> > --- a/kernel/trace/blktrace.c
> > +++ b/kernel/trace/blktrace.c
> > @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> >  	 */
> >  	strreplace(buts->name, '/', '_');
> >  
> > +	if (q->blk_trace) {
> > +		pr_warn("Concurrent blktraces are not allowed\n");
> > +		return -EBUSY;
> 
> You have access to a block device here, please use dev_warn() instead
> here for that, that makes it obvious as to what device a "concurrent
> blktrace" was attempted for.

The block device may be empty, one example is for scsi-generic, but I'll
use buts->name.

  Luis
Christoph Hellwig May 1, 2020, 3:34 p.m. UTC | #3
On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote:
> > You have access to a block device here, please use dev_warn() instead
> > here for that, that makes it obvious as to what device a "concurrent
> > blktrace" was attempted for.
> 
> The block device may be empty, one example is for scsi-generic, but I'll
> use buts->name.

Is blktrace on /dev/sg something we intentionally support, or just by
some accident of history?  Given all the pains it causes I'd be tempted
to just remove the support and see if anyone screams.
Luis Chamberlain May 1, 2020, 3:40 p.m. UTC | #4
On Fri, May 01, 2020 at 08:34:23AM -0700, Christoph Hellwig wrote:
> On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote:
> > > You have access to a block device here, please use dev_warn() instead
> > > here for that, that makes it obvious as to what device a "concurrent
> > > blktrace" was attempted for.
> > 
> > The block device may be empty, one example is for scsi-generic, but I'll
> > use buts->name.
> 
> Is blktrace on /dev/sg something we intentionally support, or just by
> some accident of history?  Given all the pains it causes I'd be tempted
> to just remove the support and see if anyone screams.

From what I can tell I think it was a historic and brutal mistake. I am
more than happy to remove it.

Re-adding support would just be a symlink.

  Luis
Luis Chamberlain May 1, 2020, 3:50 p.m. UTC | #5
On Fri, May 1, 2020 at 9:40 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 01, 2020 at 08:34:23AM -0700, Christoph Hellwig wrote:
> > On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote:
> > > > You have access to a block device here, please use dev_warn() instead
> > > > here for that, that makes it obvious as to what device a "concurrent
> > > > blktrace" was attempted for.
> > >
> > > The block device may be empty, one example is for scsi-generic, but I'll
> > > use buts->name.
> >
> > Is blktrace on /dev/sg something we intentionally support, or just by
> > some accident of history?  Given all the pains it causes I'd be tempted
> > to just remove the support and see if anyone screams.
>
> From what I can tell I think it was a historic and brutal mistake. I am
> more than happy to remove it.

I take that back:

commit 6da127ad0918f93ea93678dad62ce15ffed18797
Author: Christof Schmitt <christof.schmitt@de.ibm.com>
Date:   Fri Jan 11 10:09:43 2008 +0100

    blktrace: Add blktrace ioctls to SCSI generic devices

    Since the SCSI layer uses the request queues from the block layer,
blktrace can
    also be used to trace the requests to all SCSI devices (like SCSI
tape drives),
    not only disks. The only missing part is the ioctl interface to
start and stop
    tracing.

    This patch adds the SETUP, START, STOP and TEARDOWN ioctls from
blktrace to the
    sg device files. With this change, blktrace can be used for SCSI
devices like
    for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i -

    Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Christof, any thoughts on removing this support?

 Luis
Greg KH May 1, 2020, 4:51 p.m. UTC | #6
On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote:
> On Wed, Apr 29, 2020 at 11:49:37AM +0200, Greg KH wrote:
> > On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote:
> > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> > > index 5c52976bd762..383045f67cb8 100644
> > > --- a/kernel/trace/blktrace.c
> > > +++ b/kernel/trace/blktrace.c
> > > @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> > >  	 */
> > >  	strreplace(buts->name, '/', '_');
> > >  
> > > +	if (q->blk_trace) {
> > > +		pr_warn("Concurrent blktraces are not allowed\n");
> > > +		return -EBUSY;
> > 
> > You have access to a block device here, please use dev_warn() instead
> > here for that, that makes it obvious as to what device a "concurrent
> > blktrace" was attempted for.
> 
> The block device may be empty, one example is for scsi-generic, but I'll
> use buts->name.

That's fine, give us a chance to know what went wrong, your line as is
does not do that :(

thanks,

greg k-h
diff mbox series

Patch

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 5c52976bd762..383045f67cb8 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -4,6 +4,8 @@ 
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/blktrace_api.h>
@@ -516,6 +518,11 @@  static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	 */
 	strreplace(buts->name, '/', '_');
 
+	if (q->blk_trace) {
+		pr_warn("Concurrent blktraces are not allowed\n");
+		return -EBUSY;
+	}
+
 	bt = kzalloc(sizeof(*bt), GFP_KERNEL);
 	if (!bt)
 		return -ENOMEM;