diff mbox

[V7,0/2] block: introduce block_rq_error tracepoint

Message ID 20220205091150.6105-1-chaitanyak@nvidia.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chaitanya Kulkarni Feb. 5, 2022, 9:11 a.m. UTC
From: Chaitanya Kulkarni <kch@nvidia.com>

Hi Yang,

I spent sometime generating V7 and testing the same. I kept your
original authorship for the 2nd patch and added a prep patch.
Let me know if you find anything wrong with this series.

Following is the cover-letter for everyone else:-

Currently, rasdaemon uses existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.
The generic tracepoint brings overhead see below and requires filtering
for the requests which are failed due to error :-

block_rq_complete() tracepint enabled fio randread :-

  read: IOPS=107k, BW=418MiB/s (439MB/s)(24.5GiB/60001msec)
  read: IOPS=107k, BW=419MiB/s (439MB/s)(24.5GiB/60001msec)
  read: IOPS=106k, BW=416MiB/s (436MB/s)(24.4GiB/60001msec)
  read: IOPS=107k, BW=417MiB/s (437MB/s)(24.4GiB/60001msec)
  read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
  read: IOPS=106k, BW=414MiB/s (434MB/s)(24.3GiB/60001msec)
  read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
  read: IOPS=106k, BW=413MiB/s (434MB/s)(24.2GiB/60001msec)
  read: IOPS=106k, BW=414MiB/s (434MB/s)(24.2GiB/60001msec)
  read: IOPS=109k, BW=425MiB/s (445MB/s)(24.9GiB/60001msec)
  AVG = 417 MiB/s

block_rq_complete() tracepint disabled fio randread :-
  read: IOPS=110k, BW=428MiB/s (449MB/s)(25.1GiB/60001msec)
  read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
  read: IOPS=108k, BW=421MiB/s (442MB/s)(24.7GiB/60001msec)
  read: IOPS=107k, BW=419MiB/s (439MB/s)(24.5GiB/60001msec)
  read: IOPS=108k, BW=422MiB/s (442MB/s)(24.7GiB/60001msec)
  read: IOPS=108k, BW=422MiB/s (443MB/s)(24.7GiB/60001msec)
  read: IOPS=108k, BW=422MiB/s (442MB/s)(24.7GiB/60001msec)
  read: IOPS=108k, BW=422MiB/s (443MB/s)(24.8GiB/60001msec)
  read: IOPS=108k, BW=421MiB/s (442MB/s)(24.7GiB/60001msec)
  read: IOPS=108k, BW=423MiB/s (443MB/s)(24.8GiB/60001msec)
  AVG = 421 MiB/s

Introduce a new tracepoint block_rq_error() just for the error case to
reduce the overhead of generic block_rq_complete() tracepoint to only
trace requests with errors.

Below is the detailed log of testing block_rq_complete() and
block_rq_error() tracepoints.

-ck

The v3 patch was submitted in Feb 2020, and Steven reviewed the patch, but
it was not merged to upstream. See
https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@gmail.com/.

The problems fixed by that patch still exist and we do need it to make
disk error handling in rasdaemon easier. So this resurrected it and
continued the version number.

V6 --> V7:
 * Declare new trace event block_rq_completion and use it with
   bio_rq_complete and bio_rq_error() to avoid code repetation.
 * Add cover letter and document details.
 * Add performance numbers.

v5 --> v6:
 * Removed disk name per Christoph and Chaitanya
 * Kept errno since I didn't find any other block tracepoints print blk
   status code and userspace (i.e. rasdaemon) does expect errno.
v4 --> v5:
 * Report the actual block layer status code instead of the errno per
   Christoph Hellwig.
v3 --> v4:
 * Rebased to v5.17-rc1.
 * Collected reviewed-by tag from Steven.


Chaitanya Kulkarni (1):
  block: create event class for rq completion

Yang Shi (1):
  block: introduce block_rq_error tracepoint

 block/blk-mq.c               |  4 ++-
 include/trace/events/block.h | 49 ++++++++++++++++++++++++++----------
 2 files changed, 39 insertions(+), 14 deletions(-)



* Detailed test log for two tracepoints block_rq_complete() and
  block_rq_error():

* Modified null_blk to fail any incoming REQ_OP_WRITE for
   testing :-

root@dev linux-block (for-next) # git diff drivers/block/null_blk/main.c
root@dev tracing # modprobe null_blk
root@dev tracing # ls -l /dev/nullb0 
brw-rw----. 1 root disk 251, 0 Feb  4 02:45 /dev/nullb0
root@dev tracing # # note the above major number

* Test both block_rq_complete() and block_rq_error() tracepoints :- 

+ cd tracing
+ modprobe -r null_blk
+ rm -fr /dev/nullb0
+ modprobe null_blk
+ sleep 1
+ set +x
###############################################################
# Disable block_rq_[complete|error] tracepoints
#
+ echo 0
+ echo 0
+ cat events/block/block_rq_complete/enable
0
+ cat events/block/block_rq_error/enable
0
+ set +x
###############################################################
# Enable block_rq_complete() tracepoint and generate write error
#
+ echo 1
+ cat events/block/block_rq_complete/enable
1
+ dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=1024
dd: error writing '/dev/nullb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.00268675 s, 0.0 kB/s
+ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:48
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
    kworker/0:1H-1090    [000] .....   586.020342: block_rq_complete: 251,0 WS () 131072 + 128 [-5]
+ echo ''
+ set +x
###############################################################
# Enable block_rq_[complete|error]() tracepoint and 
# generate write error 
#
+ echo 1
+ cat events/block/block_rq_error/enable
1
+ dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=10240
dd: error writing '/dev/nullb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.0022944 s, 0.0 kB/s
+ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:48
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
    kworker/0:1H-1090    [000] .....   586.064527: block_rq_complete: 251,0 WS () 1310720 + 128 [-5]
    kworker/0:1H-1090    [000] .....   586.066135: block_rq_error: 251,0 WS () 1310720 + 128 [-5]
+ echo ''
+ set +x
###############################################################
# Disable block_rq_complete() and keep block_rq_error()
# tracepoint enabled and generate write error 
#
+ echo 0
+ cat events/block/block_rq_complete/enable
0
+ dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=10240
dd: error writing '/dev/nullb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.00235022 s, 0.0 kB/s
+ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:48
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
    kworker/0:1H-1090    [000] .....   586.110419: block_rq_error: 251,0 WS () 1310720 + 128 [-5]
+ echo ''
+ modprobe -r null_blk
+ set +x

Comments

Yang Shi Feb. 7, 2022, 5:17 p.m. UTC | #1
On Sat, Feb 5, 2022 at 1:12 AM Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>
> From: Chaitanya Kulkarni <kch@nvidia.com>
>
> Hi Yang,
>
> I spent sometime generating V7 and testing the same. I kept your
> original authorship for the 2nd patch and added a prep patch.
> Let me know if you find anything wrong with this series.

Hi Chaitanya,

Thank you so much for preparing the patches. They look good to me. But
it seems you forgot cc Steven Rostedt. Looped him in.

>
> Following is the cover-letter for everyone else:-
>
> Currently, rasdaemon uses existing tracepoint block_rq_complete
> and filters out non-error cases in order to capture block disk errors.
> The generic tracepoint brings overhead see below and requires filtering
> for the requests which are failed due to error :-
>
> block_rq_complete() tracepint enabled fio randread :-
>
>   read: IOPS=107k, BW=418MiB/s (439MB/s)(24.5GiB/60001msec)
>   read: IOPS=107k, BW=419MiB/s (439MB/s)(24.5GiB/60001msec)
>   read: IOPS=106k, BW=416MiB/s (436MB/s)(24.4GiB/60001msec)
>   read: IOPS=107k, BW=417MiB/s (437MB/s)(24.4GiB/60001msec)
>   read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
>   read: IOPS=106k, BW=414MiB/s (434MB/s)(24.3GiB/60001msec)
>   read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
>   read: IOPS=106k, BW=413MiB/s (434MB/s)(24.2GiB/60001msec)
>   read: IOPS=106k, BW=414MiB/s (434MB/s)(24.2GiB/60001msec)
>   read: IOPS=109k, BW=425MiB/s (445MB/s)(24.9GiB/60001msec)
>   AVG = 417 MiB/s
>
> block_rq_complete() tracepint disabled fio randread :-
>   read: IOPS=110k, BW=428MiB/s (449MB/s)(25.1GiB/60001msec)
>   read: IOPS=107k, BW=418MiB/s (438MB/s)(24.5GiB/60001msec)
>   read: IOPS=108k, BW=421MiB/s (442MB/s)(24.7GiB/60001msec)
>   read: IOPS=107k, BW=419MiB/s (439MB/s)(24.5GiB/60001msec)
>   read: IOPS=108k, BW=422MiB/s (442MB/s)(24.7GiB/60001msec)
>   read: IOPS=108k, BW=422MiB/s (443MB/s)(24.7GiB/60001msec)
>   read: IOPS=108k, BW=422MiB/s (442MB/s)(24.7GiB/60001msec)
>   read: IOPS=108k, BW=422MiB/s (443MB/s)(24.8GiB/60001msec)
>   read: IOPS=108k, BW=421MiB/s (442MB/s)(24.7GiB/60001msec)
>   read: IOPS=108k, BW=423MiB/s (443MB/s)(24.8GiB/60001msec)
>   AVG = 421 MiB/s
>
> Introduce a new tracepoint block_rq_error() just for the error case to
> reduce the overhead of generic block_rq_complete() tracepoint to only
> trace requests with errors.
>
> Below is the detailed log of testing block_rq_complete() and
> block_rq_error() tracepoints.
>
> -ck
>
> The v3 patch was submitted in Feb 2020, and Steven reviewed the patch, but
> it was not merged to upstream. See
> https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@gmail.com/.
>
> The problems fixed by that patch still exist and we do need it to make
> disk error handling in rasdaemon easier. So this resurrected it and
> continued the version number.
>
> V6 --> V7:
>  * Declare new trace event block_rq_completion and use it with
>    bio_rq_complete and bio_rq_error() to avoid code repetation.
>  * Add cover letter and document details.
>  * Add performance numbers.
>
> v5 --> v6:
>  * Removed disk name per Christoph and Chaitanya
>  * Kept errno since I didn't find any other block tracepoints print blk
>    status code and userspace (i.e. rasdaemon) does expect errno.
> v4 --> v5:
>  * Report the actual block layer status code instead of the errno per
>    Christoph Hellwig.
> v3 --> v4:
>  * Rebased to v5.17-rc1.
>  * Collected reviewed-by tag from Steven.
>
>
> Chaitanya Kulkarni (1):
>   block: create event class for rq completion
>
> Yang Shi (1):
>   block: introduce block_rq_error tracepoint
>
>  block/blk-mq.c               |  4 ++-
>  include/trace/events/block.h | 49 ++++++++++++++++++++++++++----------
>  2 files changed, 39 insertions(+), 14 deletions(-)
>
>
>
> * Detailed test log for two tracepoints block_rq_complete() and
>   block_rq_error():
>
> * Modified null_blk to fail any incoming REQ_OP_WRITE for
>    testing :-
>
> root@dev linux-block (for-next) # git diff drivers/block/null_blk/main.c
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 13004beb48ca..0376d0f46fdf 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1414,6 +1414,9 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector,
>                         return sts;
>         }
>
> +       if (op == REQ_OP_WRITE)
> +               return BLK_STS_IOERR;
> +
>         if (op == REQ_OP_FLUSH) {
>                 cmd->error = errno_to_blk_status(null_handle_flush(nullb));
>                 goto out;
> root@dev tracing # modprobe null_blk
> root@dev tracing # ls -l /dev/nullb0
> brw-rw----. 1 root disk 251, 0 Feb  4 02:45 /dev/nullb0
> root@dev tracing # # note the above major number
>
> * Test both block_rq_complete() and block_rq_error() tracepoints :-
>
> + cd tracing
> + modprobe -r null_blk
> + rm -fr /dev/nullb0
> + modprobe null_blk
> + sleep 1
> + set +x
> ###############################################################
> # Disable block_rq_[complete|error] tracepoints
> #
> + echo 0
> + echo 0
> + cat events/block/block_rq_complete/enable
> 0
> + cat events/block/block_rq_error/enable
> 0
> + set +x
> ###############################################################
> # Enable block_rq_complete() tracepoint and generate write error
> #
> + echo 1
> + cat events/block/block_rq_complete/enable
> 1
> + dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=1024
> dd: error writing '/dev/nullb0': Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.00268675 s, 0.0 kB/s
> + cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 1/1   #P:48
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>     kworker/0:1H-1090    [000] .....   586.020342: block_rq_complete: 251,0 WS () 131072 + 128 [-5]
> + echo ''
> + set +x
> ###############################################################
> # Enable block_rq_[complete|error]() tracepoint and
> # generate write error
> #
> + echo 1
> + cat events/block/block_rq_error/enable
> 1
> + dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=10240
> dd: error writing '/dev/nullb0': Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.0022944 s, 0.0 kB/s
> + cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 2/2   #P:48
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>     kworker/0:1H-1090    [000] .....   586.064527: block_rq_complete: 251,0 WS () 1310720 + 128 [-5]
>     kworker/0:1H-1090    [000] .....   586.066135: block_rq_error: 251,0 WS () 1310720 + 128 [-5]
> + echo ''
> + set +x
> ###############################################################
> # Disable block_rq_complete() and keep block_rq_error()
> # tracepoint enabled and generate write error
> #
> + echo 0
> + cat events/block/block_rq_complete/enable
> 0
> + dd if=/dev/zero of=/dev/nullb0 bs=64k count=10 oflag=direct seek=10240
> dd: error writing '/dev/nullb0': Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.00235022 s, 0.0 kB/s
> + cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 1/1   #P:48
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>     kworker/0:1H-1090    [000] .....   586.110419: block_rq_error: 251,0 WS () 1310720 + 128 [-5]
> + echo ''
> + modprobe -r null_blk
> + set +x
>
> --
> 2.29.0
>
diff mbox

Patch

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 13004beb48ca..0376d0f46fdf 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1414,6 +1414,9 @@  static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector,
                        return sts;
        }
 
+       if (op == REQ_OP_WRITE)
+               return BLK_STS_IOERR;
+
        if (op == REQ_OP_FLUSH) {
                cmd->error = errno_to_blk_status(null_handle_flush(nullb));
                goto out;