diff mbox

block: add blktrace C events for bio-based drivers

Message ID x49ziips61y.fsf@segfault.boston.devel.redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

Jeff Moyer Jan. 17, 2017, 9:57 p.m. UTC
Only a few bio-based drivers actually generate blktrace completion
(C) events.  Instead of changing all bio-based drivers to call
trace_block_bio_complete, move the tracing to bio_complete, and remove
the explicit tracing from the few drivers that actually do it.  After
this patch, there is exactly one caller of trace_block_bio_complete
and one caller of trace_block_rq_complete.  More importantly, all
bio-based drivers now generate C events, which is useful for
performance analysis.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
---

Testing: I made sure that request-based drivers don't see duplicate
completions, and that bio-based drivers show both Q and C events.  I
haven't tested all affected drivers or combinations, though.


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Jens Axboe Jan. 17, 2017, 10:07 p.m. UTC | #1
On 01/17/2017 01:57 PM, Jeff Moyer wrote:
> Only a few bio-based drivers actually generate blktrace completion
> (C) events.  Instead of changing all bio-based drivers to call
> trace_block_bio_complete, move the tracing to bio_complete, and remove
> the explicit tracing from the few drivers that actually do it.  After
> this patch, there is exactly one caller of trace_block_bio_complete
> and one caller of trace_block_rq_complete.  More importantly, all
> bio-based drivers now generate C events, which is useful for
> performance analysis.

I like the change, hate the naming. I'd prefer one of two things:

- Add bio_endio_complete() instead. That name sucks too, the
  important part is flipping the __name() to have a trace
  version instead.

- Mark the bio as trace completed, and keep the naming. Since
  it's only off the completion path, that can be just marking
  the bi_flags non-atomically.

I probably prefer the latter.
Jeff Moyer Jan. 17, 2017, 10:39 p.m. UTC | #2
Jens Axboe <axboe@kernel.dk> writes:

> On 01/17/2017 01:57 PM, Jeff Moyer wrote:
>> Only a few bio-based drivers actually generate blktrace completion
>> (C) events.  Instead of changing all bio-based drivers to call
>> trace_block_bio_complete, move the tracing to bio_complete, and remove
>> the explicit tracing from the few drivers that actually do it.  After
>> this patch, there is exactly one caller of trace_block_bio_complete
>> and one caller of trace_block_rq_complete.  More importantly, all
>> bio-based drivers now generate C events, which is useful for
>> performance analysis.
>
> I like the change, hate the naming. I'd prefer one of two things:
>
> - Add bio_endio_complete() instead. That name sucks too, the
>   important part is flipping the __name() to have a trace
>   version instead.

I had also considered bio_endio_notrace().

> - Mark the bio as trace completed, and keep the naming. Since
>   it's only off the completion path, that can be just marking
>   the bi_flags non-atomically.
>
> I probably prefer the latter.

Hmm, okay.  I'll take a crack at that.

Thanks!
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Jeff Moyer Jan. 18, 2017, 4:01 p.m. UTC | #3
Hi, Jens,

Jens Axboe <axboe@kernel.dk> writes:

> I like the change, hate the naming. I'd prefer one of two things:
>
> - Add bio_endio_complete() instead. That name sucks too, the
>   important part is flipping the __name() to have a trace
>   version instead.

ITYM a notrace version.  By default, we want tracing for bio_endio.  The
only callers that need the inverse of that are in the request-based
path, and there are only 2 of them.

> - Mark the bio as trace completed, and keep the naming. Since
>   it's only off the completion path, that can be just marking
>   the bi_flags non-atomically.

One issue with this is in generic_make_request_checks, where we can call
bio_endio without having called trace_block_bio_queue (so you could get
a C event with no corresponding Q).  To address that, we could make the
flag indicate that trace_block_bio_queue was performed, and clear it in
bio_complete, like so:

	if (test_and_clear_bit(BIO_QUEUE_TRACED, &bio->bi_flags))
		trace_block_bio_complete(...);

That would solve the problem of duplicate completions, but requires
setting the flag in the submission path and clearing it in the
completion path.  I think the former can be done with just a
bio_set_flag (i.e. non-atomic), right?  Of course, where to stick that
bio_set_flag call is another bike-shedding discussion waiting to happen
(i.e. does it go in the tracepoint itself?).

Alternatively, we could set the trace_completed flag in the paths where
we end I/O without having done the trace_block_bio_queue, but that seems
way uglier to me.

Can you think of any other options?  If we're choosing from the above,
my preference is for adding the bio_endio_notrace(), since it's so much
simpler.

Cheers,
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index 2b37502..ba5daad 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1785,16 +1785,7 @@  static inline bool bio_remaining_done(struct bio *bio)
 	return false;
 }
 
-/**
- * bio_endio - end I/O on a bio
- * @bio:	bio
- *
- * Description:
- *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
- *   way to end I/O on a bio. No one should call bi_end_io() directly on a
- *   bio unless they own it and thus know that it has an end_io function.
- **/
-void bio_endio(struct bio *bio)
+void __bio_endio(struct bio *bio)
 {
 again:
 	if (!bio_remaining_done(bio))
@@ -1816,6 +1807,22 @@  void bio_endio(struct bio *bio)
 	if (bio->bi_end_io)
 		bio->bi_end_io(bio);
 }
+
+/**
+ * bio_endio - end I/O on a bio
+ * @bio:	bio
+ *
+ * Description:
+ *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
+ *   way to end I/O on a bio. No one should call bi_end_io() directly on a
+ *   bio unless they own it and thus know that it has an end_io function.
+ **/
+void bio_endio(struct bio *bio)
+{
+	trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
+				 bio, bio->bi_error);
+	__bio_endio(bio);
+}
 EXPORT_SYMBOL(bio_endio);
 
 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c..f77f2d9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -153,7 +153,7 @@  static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
-		bio_endio(bio);
+		__bio_endio(bio);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -1947,7 +1947,7 @@  generic_make_request_checks(struct bio *bio)
 	err = -EOPNOTSUPP;
 end_io:
 	bio->bi_error = err;
-	bio_endio(bio);
+	__bio_endio(bio);
 	return false;
 }
 
diff --git a/block/blk.h b/block/blk.h
index 041185e..1c9b50a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -57,6 +57,7 @@  int blk_init_rl(struct request_list *rl, struct request_queue *q,
 		gfp_t gfp_mask);
 void blk_exit_rl(struct request_list *rl);
 void init_request_from_bio(struct request *req, struct bio *bio);
+void __bio_endio(struct bio *bio);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 			struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5..e151aef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -807,7 +807,6 @@  static void dec_pending(struct dm_io *io, int error)
 			queue_io(md, bio);
 		} else {
 			/* done with normal IO or empty flush */
-			trace_block_bio_complete(md->queue, bio, io_error);
 			bio->bi_error = io_error;
 			bio_endio(bio);
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 36c13e4..17b4e06 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -159,8 +159,6 @@  static void return_io(struct bio_list *return_bi)
 	struct bio *bi;
 	while ((bi = bio_list_pop(return_bi)) != NULL) {
 		bi->bi_iter.bi_size = 0;
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
 		bio_endio(bi);
 	}
 }
@@ -4902,8 +4900,6 @@  static void raid5_align_endio(struct bio *bi)
 	rdev_dec_pending(rdev, conf->mddev);
 
 	if (!error) {
-		trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
-					 raid_bi, 0);
 		bio_endio(raid_bi);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_quiescent);
@@ -5470,8 +5466,6 @@  static void raid5_make_request(struct mddev *mddev, struct bio * bi)
 		if ( rw == WRITE )
 			md_write_end(mddev);
 
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
 		bio_endio(bi);
 	}
 }
@@ -5878,11 +5872,9 @@  static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
 		handled++;
 	}
 	remaining = raid5_dec_bi_active_stripes(raid_bio);
-	if (remaining == 0) {
-		trace_block_bio_complete(bdev_get_queue(raid_bio->bi_bdev),
-					 raid_bio, 0);
+	if (remaining == 0)
 		bio_endio(raid_bio);
-	}
+
 	if (atomic_dec_and_test(&conf->active_aligned_reads))
 		wake_up(&conf->wait_for_quiescent);
 	return handled;