Message ID | 20221016052006.11126-1-kch@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | null_blk: allow teardown on request timeout | expand |
On 10/16/22 14:20, Chaitanya Kulkarni wrote: > In current timeout implementation null_blk just completes the request > with error=BLK_STS_TIMEOUT without doing any cleanup, hence device > cleanup code including handling inflight requests on timeout and > teardown is never exercised. > > Add a module parameter rq_abort_limit to allow null_blk perform device > cleanup when time out occurs. The non zero value of this parameter > allows user to set the number of timeouts to occur before triggering > cleanup/teardown work. > > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> > --- > * Test result enabling the rq_abort_limit from module parameter and > issueing dd until null_blk device disappears from the pathname > with connetion timeout error. > > linux-block (for-next) # !./test_ > ./test_null_blk_abort_on_timeout.sh > commit 4fbcf978ae8f385aa1830ce40f210c7eec8d58fa (HEAD -> for-next) > Author: Chaitanya Kulkarni <kch@nvidia.com> > Date: Tue Oct 11 23:22:43 2022 -0700 > > null_blk: allow teardown on request timeout > > In current timeout implementation null_blk just completes the request > with error=BLK_STS_TIMEOUT without doing any cleanup, hence device > cleanup code including handling inflight requests on timeout and > teardown is never exercised. > > Add a module parameter rq_abort_limit to allow null_blk perform device > cleanup when time out occurs. The non zero value of this parameter > allows user to set the number of timeouts to occur before triggering > cleanup/teardown work. > > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> > + umount /mnt/nullb0 > umount: /mnt/nullb0: not mounted. > + rmdir 'config/nullb/nullb*' > rmdir: failed to remove 'config/nullb/nullb*': No such file or directory > + dmesg -c > + modprobe -r null_blk > + lsmod > + grep null_blk > ++ nproc > + make -j 48 M=drivers/block modules > CC [M] drivers/block/null_blk/main.o > CC [M] drivers/block/null_blk/trace.o > CC [M] drivers/block/null_blk/zoned.o > LD [M] drivers/block/null_blk/null_blk.o > MODPOST drivers/block/Module.symvers > LD [M] drivers/block/null_blk/null_blk.ko > + HOST=drivers/block/null_blk/ > ++ uname -r > + HOST_DEST=/lib/modules/6.0.0blk+/kernel/drivers/block/null_blk/ > + cp drivers/block/null_blk//null_blk.ko /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk// > + ls -lrth /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk//null_blk.ko > -rw-r--r--. 1 root root 1.2M Oct 15 21:51 /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk//null_blk.ko > + sleep 1 > + dmesg -c > ################## > ################## > nullb0 250:0 0 250G 0 disk > dd: error writing '/dev/nullb0': Connection timed out > 3+0 records in > 2+0 records out > 8192 bytes (8.2 kB, 8.0 KiB) copied, 6.13436 s, 1.3 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1598.393544] null_blk: rq 0000000015ff5165 timed out > [ 1598.393555] blk_print_req_error: 74 callbacks suppressed > [ 1598.393558] timeout error, dev nullb0, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1598.395885] Buffer I/O error on dev nullb0, logical block 1, async page read > [ 1598.397731] Dev nullb0: unable to read RDB block 8 > [ 1598.398990] nullb0: unable to read partition table > [ 1598.399069] null_blk: disk nullb0 created > [ 1598.399071] null_blk: module loaded > [ 1604.537424] null_blk: rq 00000000b2a753f0 timed out > [ 1604.537437] timeout error, dev nullb0, sector 16 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.13561 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1610.681242] null_blk: rq 00000000a392790b timed out > [ 1610.681253] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.13985 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1616.825155] null_blk: rq 0000000091a70f01 timed out > [ 1616.825168] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.1325 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1622.969017] null_blk: rq 00000000cdb610e7 timed out > [ 1622.969029] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 2+0 records in > 1+0 records out > 4096 bytes (4.1 kB, 4.0 KiB) copied, 6.14283 s, 0.7 kB/s > ################## > ################## > [ 1629.112893] null_blk: rq 00000000555ecc5e timed out > [ 1629.118817] blk_sync_queue 230 > [ 1629.118821] blk_sync_queue 232 > [ 1629.118823] blk_sync_queue 234 > [ 1629.118828] timeout error, dev nullb0, sector 8 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > [ 1629.125044] blk_sync_queue 230 > [ 1629.125048] blk_sync_queue 232 > [ 1629.125049] blk_sync_queue 234 > ################## > ################## > > * Test result enabling the rq_abort_limit from module parameter and > issueing fio until null_blk device disappears from the pathname > with connetion timeout error. > nullb0 250:0 0 250G 0 disk > RANDREAD: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 > fio-3.27 > Starting 1 process > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=214274048, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=32268288, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=107802624, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=193708032, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=202928128, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=281436160, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=33292288, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=250576896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=41902080, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=225521664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=303058944, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=405196800, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=309719040, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=111214592, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=254062592, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=128663552, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=76795904, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=355647488, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=5873664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=69328896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=216276992, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=16027648, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=328343552, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=385630208, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=137695232, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=146546688, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=157765632, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=290615296, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=230440960, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=130961408, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=88547328, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=157401088, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=296202240, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=35819520, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=212545536, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=261697536, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=66195456, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=254603264, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=169242624, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=258154496, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=69320704, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=335650816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=192950272, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=32813056, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=5804032, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=217219072, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=153227264, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=91131904, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=241209344, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=246509568, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=332935168, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=108859392, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=86061056, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=278966272, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=219004928, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=290512896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=143306752, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=176312320, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=346566656, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=183422976, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=383885312, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=199393280, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=160944128, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=142491648, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=217841664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=178466816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=331423744, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=88883200, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=311562240, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=228642816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=362893312, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=323035136, buflen=4096 > fio: io_u error on file /dev/nullb0: Input/output error: read offset=385089536, buflen=4096 > fio: pid=7811, err=110/file:io_u.c:1841, func=io_u error, error=Connection timed out > > RANDREAD: (groupid=0, jobs=1): err=110 (file:io_u.c:1841, func=io_u error, error=Connection timed out): pid=7811: Sat Oct 15 21:52:09 2022 > read: IOPS=33, BW=64.6KiB/s (66.1kB/s)(420KiB/6503msec) > slat (nsec): min=942, max=6485.4M, avg=29824299.02, stdev=439245367.62 > clat (usec): min=525, max=6501.3k, avg=3899782.52, stdev=3194606.52 > lat (usec): min=530, max=6501.3k, avg=3961597.01, stdev=3181404.42 > clat percentiles (usec): > | 1.00th=[ 545], 5.00th=[ 553], 10.00th=[ 562], > | 20.00th=[ 611], 30.00th=[ 15795], 40.00th=[ 15795], > | 50.00th=[6476006], 60.00th=[6476006], 70.00th=[6476006], > | 80.00th=[6476006], 90.00th=[6476006], 95.00th=[6476006], > | 99.00th=[6476006], 99.50th=[6476006], 99.90th=[6476006], > | 99.95th=[6476006], 99.99th=[6476006] > bw ( KiB/s): min= 216, max= 216, per=100.00%, avg=216.00, stdev= 0.00, samples=1 > iops : min= 54, max= 54, avg=54.00, stdev= 0.00, samples=1 > lat (usec) : 750=12.39% > lat (msec) : 20=6.88%, >=2000=28.90% > cpu : usr=0.00%, sys=0.00%, ctx=4, majf=0, minf=150 > IO depths : 1=0.5%, 2=0.9%, 4=1.8%, 8=3.7%, 16=7.3%, 32=14.7%, >=64=71.1% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=98.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=1.1% > issued rwts: total=218,0,0,0 short=40,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=128 > > Run status group 0 (all jobs): > READ: bw=64.6KiB/s (66.1kB/s), 64.6KiB/s-64.6KiB/s (66.1kB/s-66.1kB/s), io=420KiB (430kB), run=6503-6503msec > > Disk stats (read/write): > nullb0: ios=90/0, merge=0/0, ticks=0/0, in_queue=0, util=98.55% > ################## > ################## > [ 1635.256737] null_blk: rq 00000000834b8ba3 timed out > [ 1635.256748] timeout error, dev nullb0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1635.258876] Buffer I/O error on dev nullb0, logical block 0, async page read > [ 1641.400625] null_blk: rq 00000000ee97fbff timed out > [ 1641.400636] timeout error, dev nullb0, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1641.402919] Buffer I/O error on dev nullb0, logical block 1, async page read > [ 1641.404755] Dev nullb0: unable to read RDB block 8 > [ 1641.406666] nullb0: unable to read partition table > [ 1641.407559] null_blk: disk nullb0 created > [ 1641.407562] null_blk: module loaded > [ 1648.056420] null_blk: rq 00000000834b8ba3 timed out > [ 1648.056432] timeout error, dev nullb0, sector 418504 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.058914] null_blk: rq 0000000064c974a9 timed out > [ 1648.058918] timeout error, dev nullb0, sector 63024 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.061288] null_blk: rq 00000000ee97fbff timed out > [ 1648.061291] timeout error, dev nullb0, sector 210552 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.063663] null_blk: rq 000000006c6fa717 timed out > [ 1648.063667] timeout error, dev nullb0, sector 378336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.066011] null_blk: rq 00000000fa856166 timed out > [ 1648.066015] null_blk: rq 00000000de9bec9a timed out > [ 1648.066016] timeout error, dev nullb0, sector 396344 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.068390] null_blk: rq 00000000f12b4b17 timed out > [ 1648.068393] timeout error, dev nullb0, sector 549680 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.071389] null_blk: rq 00000000667726b2 timed out > [ 1648.071393] timeout error, dev nullb0, sector 65024 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.073740] null_blk: rq 0000000088f0f903 timed out > [ 1648.073744] timeout error, dev nullb0, sector 489408 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.078759] null_blk: rq 00000000fd349776 timed out > [ 1648.078762] timeout error, dev nullb0, sector 81840 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.081093] null_blk: rq 000000002eb20dc4 timed out > [ 1648.081096] timeout error, dev nullb0, sector 440472 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.083492] null_blk: rq 000000002db4026f timed out > [ 1648.083496] null_blk: rq 00000000db8f8be3 timed out > [ 1648.083499] null_blk: rq 00000000078a6daf timed out > [ 1648.083502] null_blk: rq 000000002c1b7c68 timed out > [ 1648.083505] null_blk: rq 00000000790d87a1 timed out > [ 1648.083508] null_blk: rq 000000004f48b7de timed out > [ 1648.083516] null_blk: rq 0000000074a5ff86 timed out > [ 1648.083518] null_blk: rq 000000000daf10f8 timed out > [ 1648.083521] null_blk: rq 00000000d54f527e timed out > [ 1648.083523] null_blk: rq 0000000009dc7bb8 timed out > [ 1648.083526] null_blk: rq 0000000017e81cd2 timed out > [ 1648.083529] null_blk: rq 0000000088e83fa2 timed out > [ 1648.083532] null_blk: rq 000000003c014afe timed out > [ 1648.083534] null_blk: rq 00000000b7a99358 timed out > [ 1648.083537] null_blk: rq 00000000d2d80485 timed out > [ 1648.083539] null_blk: rq 000000008805db58 timed out > [ 1648.083541] null_blk: rq 000000005c2e8f7a timed out > [ 1648.083544] null_blk: rq 00000000a33a2f2d timed out > [ 1648.083546] null_blk: rq 00000000d05bf454 timed out > [ 1648.083549] null_blk: rq 0000000080cb849f timed out > [ 1648.083552] null_blk: rq 000000009cbb48e0 timed out > [ 1648.083555] null_blk: rq 000000009e426689 timed out > [ 1648.083558] null_blk: rq 000000003705d117 timed out > [ 1648.083561] null_blk: rq 0000000010bc80b4 timed out > [ 1648.083563] null_blk: rq 000000004da816da timed out > [ 1648.083566] null_blk: rq 0000000081ceaf3f timed out > [ 1648.083568] null_blk: rq 00000000fb1e808e timed out > [ 1648.083570] null_blk: rq 000000006baf39c2 timed out > [ 1648.083573] null_blk: rq 00000000241e851f timed out > [ 1648.083575] null_blk: rq 00000000a7b7ff27 timed out > [ 1648.083577] null_blk: rq 0000000092e37377 timed out > [ 1648.083580] null_blk: rq 00000000a2b070cd timed out > [ 1648.083583] null_blk: rq 00000000a2c63d6f timed out > [ 1648.083585] null_blk: rq 00000000b775711d timed out > [ 1648.083588] null_blk: rq 000000000bfe50e6 timed out > [ 1648.083590] null_blk: rq 000000005c0b7039 timed out > [ 1648.083593] null_blk: rq 00000000af60d454 timed out > [ 1648.083595] null_blk: rq 00000000d2a38d75 timed out > [ 1648.083598] null_blk: rq 0000000000ae81a8 timed out > [ 1648.083601] null_blk: rq 000000001974c62a timed out > [ 1648.083603] null_blk: rq 000000002aaf80e6 timed out > [ 1648.083605] null_blk: rq 00000000bcdf1c85 timed out > [ 1648.083608] null_blk: rq 000000008b5d9eb6 timed out > [ 1648.083610] null_blk: rq 0000000066fcf345 timed out > [ 1648.083613] null_blk: rq 0000000067dc1221 timed out > [ 1648.083615] null_blk: rq 0000000088e78508 timed out > [ 1648.083618] null_blk: rq 000000009d1352eb timed out > [ 1648.083620] null_blk: rq 00000000a7019839 timed out > [ 1648.083622] null_blk: rq 00000000f5756a31 timed out > [ 1648.083625] null_blk: rq 00000000841b16a6 timed out > [ 1648.083627] null_blk: rq 0000000082fa3e11 timed out > [ 1648.083630] null_blk: rq 00000000794e9904 timed out > [ 1648.083632] null_blk: rq 00000000c3bcf5d1 timed out > [ 1648.094395] blk_sync_queue 230 > [ 1648.094400] blk_sync_queue 232 > [ 1648.094402] blk_sync_queue 234 > [ 1648.096130] blk_sync_queue 230 > [ 1648.096133] blk_sync_queue 232 > [ 1648.096135] blk_sync_queue 234 > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 50G 0 disk > ├─sda1 8:1 0 1G 0 part /boot > └─sda2 8:2 0 49G 0 part /home > sdb 8:16 0 100G 0 disk /mnt/data > sr0 11:0 1 1024M 0 rom > zram0 251:0 0 8G 0 disk [SWAP] > --- > drivers/block/null_blk/main.c | 90 +++++++++++++++++++++++++++++-- > drivers/block/null_blk/null_blk.h | 10 ++++ > 2 files changed, 97 insertions(+), 3 deletions(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index 1f154f92f4c2..52db6b99b448 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -77,6 +77,10 @@ enum { > NULL_IRQ_TIMER = 2, > }; > > +static unsigned int g_rq_abort_limit = 5; > +module_param_named(rq_abort_limit, g_rq_abort_limit, uint, 0644); > +MODULE_PARM_DESC(rq_abort_limit, "request timeout teardown limit. Default:5"); Number of request timeout to trigger device teardown ? That would a lot clearer in my opinion. > + > static bool g_virt_boundary = false; > module_param_named(virt_boundary, g_virt_boundary, bool, 0444); > MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); > @@ -247,6 +251,7 @@ static void null_del_dev(struct nullb *nullb); > static int null_add_dev(struct nullb_device *dev); > static struct nullb *null_find_dev_by_name(const char *name); > static void null_free_device_storage(struct nullb_device *dev, bool is_cache); > +static void null_destroy_dev(struct nullb *nullb); > > static inline struct nullb_device *to_nullb_device(struct config_item *item) > { > @@ -578,6 +583,18 @@ config_item *nullb_group_make_item(struct config_group *group, const char *name) > { > struct nullb_device *dev; > > + if (g_rq_abort_limit) { > + /* > + * abort_on_timeout removes the null_blk and resources. When ...the null_blk device and its resources. When the null_blk device is created using configfs, ... The remaining of the sentence does not parse at all. > + * null_blk is created using configfs entry by user we will also > + * need to cleanup the those entries when abort_on_timeout is set > + * from null_abort_work() and that we shold not do it, since > + * manupulating user's entries from kernel can create confusion, > + * so just don't allow it. > + */ > + pr_err("don't use g_abort_on_timeout with configfs entries\n"); > + return ERR_PTR(-EINVAL); > + } > if (null_find_dev_by_name(name)) > return ERR_PTR(-EEXIST); > > @@ -614,7 +631,7 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page) > "poll_queues,power,queue_mode,shared_tag_bitmap,size," > "submit_queues,use_per_node_hctx,virt_boundary,zoned," > "zone_capacity,zone_max_active,zone_max_open," > - "zone_nr_conv,zone_size\n"); > + "zone_nr_conv,zone_size,abort_on_timeout,rq_abort_limit\n"); Where is abort_on_timeout defined ? Nowhere to be seen. Does this patch even compile ? Also, assuming this is a boolean, why introduce it ? Wouldn't using "rq_abort_limit > 0" be equivalent ? > } > > CONFIGFS_ATTR_RO(memb_group_, features); > @@ -1609,10 +1626,60 @@ static int null_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) > return nr; > } > > +static bool null_cancel_request(struct request *rq, void *data) > +{ > + struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); > + > + /* > + * Keep it simple and set the timeout error since we've been schedule > + * from timeout handler afterall. > + */ > + cmd->error = BLK_STS_TIMEOUT; > + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) > + blk_mq_complete_request(rq); > + return true; > +} > + > +static void null_abort_work(struct work_struct *w) > +{ > + struct nullb *nullb = container_of(w, struct nullb, abort_work); > + > + mutex_lock(&lock); > + if (!nullb || !nullb->dev) > + goto unlock; > + > + /* > + * To make sure that null_handle_cmd() callers are not in progress from > + * null_queue_rq()/null_submit_bio() when we are stopping nullb, we > + * quiece the block layer request queue and engage in teardown sequence. > + */ > + blk_freeze_queue_start(nullb->q); > + blk_mq_quiesce_queue(nullb->q); > + /* > + * We already ensured that submit_bio() will not add any plugging by > + * quiescing so it is safe to sync queue now. > + */ > + blk_sync_queue(nullb->q); > + blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb); > + blk_mq_tagset_wait_completed_request(nullb->tag_set); > + /* > + * Unblock any pending dispatch I/Os before we destroy the device. > + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag > + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. > + */ > + blk_mq_unquiesce_queue(nullb->q); > + > + null_destroy_dev(nullb); > +unlock: > + mutex_unlock(&lock); > +} > + > static enum blk_eh_timer_return null_timeout_rq(struct request *rq) > { > struct blk_mq_hw_ctx *hctx = rq->mq_hctx; > struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); > + struct nullb *nullb = cmd->nq->dev->nullb; > + bool fake_timeout_no_abort = cmd->fake_timeout && !g_rq_abort_limit; > > pr_info("rq %p timed out\n", rq); > > @@ -1632,9 +1699,23 @@ static enum blk_eh_timer_return null_timeout_rq(struct request *rq) > * Only fake timeouts need to execute blk_mq_complete_request() here. > */ > cmd->error = BLK_STS_TIMEOUT; > - if (cmd->fake_timeout || hctx->type == HCTX_TYPE_POLL) > + if (fake_timeout_no_abort || hctx->type == HCTX_TYPE_POLL) { > blk_mq_complete_request(rq); > - return BLK_EH_DONE; > + return BLK_EH_DONE; > + } > + > + if (atomic_dec_return(&nullb->rq_abort_count) != 0) { > + blk_mq_complete_request(rq); > + return BLK_EH_DONE; > + } > + > + if (!flush_work(&nullb->abort_work)) > + queue_work(system_highpri_wq, &nullb->abort_work); > + /* > + * null_abort_work() will take care of completing all the requests > + * including timed out one. > + */ > + return BLK_EH_RESET_TIMER; > } > > static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx, > @@ -2124,6 +2205,9 @@ static int null_add_dev(struct nullb_device *dev) > if (rv) > goto out_ida_free; > > + INIT_WORK(&nullb->abort_work, null_abort_work); > + atomic_set(&nullb->rq_abort_count, g_rq_abort_limit); > + > mutex_lock(&lock); > list_add_tail(&nullb->list, &nullb_list); > mutex_unlock(&lock); > diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h > index 94ff68052b1e..d7f5d3161908 100644 > --- a/drivers/block/null_blk/null_blk.h > +++ b/drivers/block/null_blk/null_blk.h > @@ -133,6 +133,16 @@ struct nullb { > > struct nullb_queue *queues; > unsigned int nr_queues; > + /* > + * null_blk request timeout teardown limit when device is in the > + * stable state, i.e. once this limit is reached issue > + * null_abort_work() to teardown the device from block lyaer > + * request timeout callback and cleanup resources such as > + * memory and pathname. s/issue/execute s/lyaer/layer But I think this can be simplified to something like: /* * Number of requests timeout failures allowed before trigerring * a device teardown from the block layer request timeout callback. */ > + */ > + atomic_t rq_abort_count; > + /* tear down work to be scheduled from block layer request handler */ This comment is not really useful. > + struct work_struct abort_work; > char disk_name[DISK_NAME_LEN]; > }; >
>> drivers/block/null_blk/main.c | 90 +++++++++++++++++++++++++++++-- >> drivers/block/null_blk/null_blk.h | 10 ++++ >> 2 files changed, 97 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c >> index 1f154f92f4c2..52db6b99b448 100644 >> --- a/drivers/block/null_blk/main.c >> +++ b/drivers/block/null_blk/main.c >> @@ -77,6 +77,10 @@ enum { >> NULL_IRQ_TIMER = 2, >> }; >> >> +static unsigned int g_rq_abort_limit = 5; >> +module_param_named(rq_abort_limit, g_rq_abort_limit, uint, 0644); >> +MODULE_PARM_DESC(rq_abort_limit, "request timeout teardown limit. Default:5"); > > Number of request timeout to trigger device teardown ? > > That would a lot clearer in my opinion. okay, will add it to V2. > >> + >> static bool g_virt_boundary = false; >> module_param_named(virt_boundary, g_virt_boundary, bool, 0444); >> MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); >> @@ -247,6 +251,7 @@ static void null_del_dev(struct nullb *nullb); >> static int null_add_dev(struct nullb_device *dev); >> static struct nullb *null_find_dev_by_name(const char *name); >> static void null_free_device_storage(struct nullb_device *dev, bool is_cache); >> +static void null_destroy_dev(struct nullb *nullb); >> >> static inline struct nullb_device *to_nullb_device(struct config_item *item) >> { >> @@ -578,6 +583,18 @@ config_item *nullb_group_make_item(struct config_group *group, const char *name) >> { >> struct nullb_device *dev; >> >> + if (g_rq_abort_limit) { >> + /* >> + * abort_on_timeout removes the null_blk and resources. When > > ...the null_blk device and its resources. When the null_blk device is > created using configfs, ... > > The remaining of the sentence does not parse at all. okay, will rephrase it in V2. > >> + * null_blk is created using configfs entry by user we will also >> + * need to cleanup the those entries when abort_on_timeout is set >> + * from null_abort_work() and that we shold not do it, since >> + * manupulating user's entries from kernel can create confusion, >> + * so just don't allow it. >> + */ >> + pr_err("don't use g_abort_on_timeout with configfs entries\n"); >> + return ERR_PTR(-EINVAL); >> + } >> if (null_find_dev_by_name(name)) >> return ERR_PTR(-EEXIST); >> >> @@ -614,7 +631,7 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page) >> "poll_queues,power,queue_mode,shared_tag_bitmap,size," >> "submit_queues,use_per_node_hctx,virt_boundary,zoned," >> "zone_capacity,zone_max_active,zone_max_open," >> - "zone_nr_conv,zone_size\n"); >> + "zone_nr_conv,zone_size,abort_on_timeout,rq_abort_limit\n"); > > Where is abort_on_timeout defined ? Nowhere to be seen. Does this patch > even compile ? Also, assuming this is a boolean, why introduce it ? Please note that this patch is complied and tested with the test report of applying this patch see above for the git am, compile and running fio n dd commands to trigger multiple timeouts and teardown. The abort on timeout needs to be removed since I got confused between what name should be used... > Wouldn't using "rq_abort_limit > 0" be equivalent ? > this is exactly what it is doing right now, unless I missed that... [...] >> + /* >> + * null_blk request timeout teardown limit when device is in the >> + * stable state, i.e. once this limit is reached issue >> + * null_abort_work() to teardown the device from block lyaer >> + * request timeout callback and cleanup resources such as >> + * memory and pathname. > > s/issue/execute > s/lyaer/layer > > But I think this can be simplified to something like: > > /* > * Number of requests timeout failures allowed before trigerring > * a device teardown from the block layer request timeout callback. > */ > Okay, if that works for everyone will add to V2. >> + */ >> + atomic_t rq_abort_count; >> + /* tear down work to be scheduled from block layer request handler */ > > This comment is not really useful. > >> + struct work_struct abort_work; >> char disk_name[DISK_NAME_LEN]; >> }; >> > thanks for the comments, will send out V2 with comments fixed. -ck
On Sat, Oct 15, 2022 at 10:20:06PM -0700, Chaitanya Kulkarni wrote: > In current timeout implementation null_blk just completes the request > with error=BLK_STS_TIMEOUT without doing any cleanup, hence device > cleanup code including handling inflight requests on timeout and > teardown is never exercised. > > Add a module parameter rq_abort_limit to allow null_blk perform device > cleanup when time out occurs. The non zero value of this parameter > allows user to set the number of timeouts to occur before triggering > cleanup/teardown work. > > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> > --- > * Test result enabling the rq_abort_limit from module parameter and > issueing dd until null_blk device disappears from the pathname > with connetion timeout error. > > linux-block (for-next) # !./test_ > ./test_null_blk_abort_on_timeout.sh > commit 4fbcf978ae8f385aa1830ce40f210c7eec8d58fa (HEAD -> for-next) > Author: Chaitanya Kulkarni <kch@nvidia.com> > Date: Tue Oct 11 23:22:43 2022 -0700 > > null_blk: allow teardown on request timeout > > In current timeout implementation null_blk just completes the request > with error=BLK_STS_TIMEOUT without doing any cleanup, hence device > cleanup code including handling inflight requests on timeout and > teardown is never exercised. > > Add a module parameter rq_abort_limit to allow null_blk perform device > cleanup when time out occurs. The non zero value of this parameter > allows user to set the number of timeouts to occur before triggering > cleanup/teardown work. > > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> > + umount /mnt/nullb0 > umount: /mnt/nullb0: not mounted. > + rmdir 'config/nullb/nullb*' > rmdir: failed to remove 'config/nullb/nullb*': No such file or directory > + dmesg -c > + modprobe -r null_blk > + lsmod > + grep null_blk > ++ nproc > + make -j 48 M=drivers/block modules > CC [M] drivers/block/null_blk/main.o > CC [M] drivers/block/null_blk/trace.o > CC [M] drivers/block/null_blk/zoned.o > LD [M] drivers/block/null_blk/null_blk.o > MODPOST drivers/block/Module.symvers > LD [M] drivers/block/null_blk/null_blk.ko > + HOST=drivers/block/null_blk/ > ++ uname -r > + HOST_DEST=/lib/modules/6.0.0blk+/kernel/drivers/block/null_blk/ > + cp drivers/block/null_blk//null_blk.ko /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk// > + ls -lrth /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk//null_blk.ko > -rw-r--r--. 1 root root 1.2M Oct 15 21:51 /lib/modules/6.0.0blk+/kernel/drivers/block/null_blk//null_blk.ko > + sleep 1 > + dmesg -c > ################## > ################## > nullb0 250:0 0 250G 0 disk > dd: error writing '/dev/nullb0': Connection timed out > 3+0 records in > 2+0 records out > 8192 bytes (8.2 kB, 8.0 KiB) copied, 6.13436 s, 1.3 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1598.393544] null_blk: rq 0000000015ff5165 timed out > [ 1598.393555] blk_print_req_error: 74 callbacks suppressed > [ 1598.393558] timeout error, dev nullb0, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1598.395885] Buffer I/O error on dev nullb0, logical block 1, async page read > [ 1598.397731] Dev nullb0: unable to read RDB block 8 > [ 1598.398990] nullb0: unable to read partition table > [ 1598.399069] null_blk: disk nullb0 created > [ 1598.399071] null_blk: module loaded > [ 1604.537424] null_blk: rq 00000000b2a753f0 timed out > [ 1604.537437] timeout error, dev nullb0, sector 16 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.13561 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1610.681242] null_blk: rq 00000000a392790b timed out > [ 1610.681253] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.13985 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1616.825155] null_blk: rq 0000000091a70f01 timed out > [ 1616.825168] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 1+0 records in > 0+0 records out > 0 bytes copied, 6.1325 s, 0.0 kB/s > ################## > ################## > nullb0 250:0 0 250G 0 disk > [ 1622.969017] null_blk: rq 00000000cdb610e7 timed out > [ 1622.969029] timeout error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > dd: error writing '/dev/nullb0': Connection timed out > 2+0 records in > 1+0 records out > 4096 bytes (4.1 kB, 4.0 KiB) copied, 6.14283 s, 0.7 kB/s > ################## > ################## > [ 1629.112893] null_blk: rq 00000000555ecc5e timed out > [ 1629.118817] blk_sync_queue 230 > [ 1629.118821] blk_sync_queue 232 > [ 1629.118823] blk_sync_queue 234 > [ 1629.118828] timeout error, dev nullb0, sector 8 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2 > [ 1629.125044] blk_sync_queue 230 > [ 1629.125048] blk_sync_queue 232 > [ 1629.125049] blk_sync_queue 234 > ################## > ################## > > * Test result enabling the rq_abort_limit from module parameter and > issueing fio until null_blk device disappears from the pathname > with connetion timeout error. > nullb0 250:0 0 250G 0 disk > RANDREAD: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 > fio-3.27 > Starting 1 process > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=214274048, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=32268288, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=107802624, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=193708032, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=202928128, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=281436160, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=33292288, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=250576896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=41902080, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=225521664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=303058944, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=405196800, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=309719040, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=111214592, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=254062592, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=128663552, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=76795904, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=355647488, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=5873664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=69328896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=216276992, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=16027648, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=328343552, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=385630208, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=137695232, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=146546688, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=157765632, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=290615296, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=230440960, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=130961408, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=88547328, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=157401088, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=296202240, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=35819520, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=212545536, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=261697536, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=66195456, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=254603264, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=169242624, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=258154496, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=69320704, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=335650816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=192950272, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=32813056, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=5804032, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=217219072, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=153227264, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=91131904, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=241209344, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=246509568, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=332935168, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=108859392, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=86061056, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=278966272, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=219004928, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=290512896, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=143306752, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=176312320, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=346566656, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=183422976, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=383885312, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=199393280, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=160944128, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=142491648, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=217841664, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=178466816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=331423744, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=88883200, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=311562240, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=228642816, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=362893312, buflen=4096 > fio: io_u error on file /dev/nullb0: Connection timed out: read offset=323035136, buflen=4096 > fio: io_u error on file /dev/nullb0: Input/output error: read offset=385089536, buflen=4096 > fio: pid=7811, err=110/file:io_u.c:1841, func=io_u error, error=Connection timed out > > RANDREAD: (groupid=0, jobs=1): err=110 (file:io_u.c:1841, func=io_u error, error=Connection timed out): pid=7811: Sat Oct 15 21:52:09 2022 > read: IOPS=33, BW=64.6KiB/s (66.1kB/s)(420KiB/6503msec) > slat (nsec): min=942, max=6485.4M, avg=29824299.02, stdev=439245367.62 > clat (usec): min=525, max=6501.3k, avg=3899782.52, stdev=3194606.52 > lat (usec): min=530, max=6501.3k, avg=3961597.01, stdev=3181404.42 > clat percentiles (usec): > | 1.00th=[ 545], 5.00th=[ 553], 10.00th=[ 562], > | 20.00th=[ 611], 30.00th=[ 15795], 40.00th=[ 15795], > | 50.00th=[6476006], 60.00th=[6476006], 70.00th=[6476006], > | 80.00th=[6476006], 90.00th=[6476006], 95.00th=[6476006], > | 99.00th=[6476006], 99.50th=[6476006], 99.90th=[6476006], > | 99.95th=[6476006], 99.99th=[6476006] > bw ( KiB/s): min= 216, max= 216, per=100.00%, avg=216.00, stdev= 0.00, samples=1 > iops : min= 54, max= 54, avg=54.00, stdev= 0.00, samples=1 > lat (usec) : 750=12.39% > lat (msec) : 20=6.88%, >=2000=28.90% > cpu : usr=0.00%, sys=0.00%, ctx=4, majf=0, minf=150 > IO depths : 1=0.5%, 2=0.9%, 4=1.8%, 8=3.7%, 16=7.3%, 32=14.7%, >=64=71.1% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=98.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=1.1% > issued rwts: total=218,0,0,0 short=40,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=128 > > Run status group 0 (all jobs): > READ: bw=64.6KiB/s (66.1kB/s), 64.6KiB/s-64.6KiB/s (66.1kB/s-66.1kB/s), io=420KiB (430kB), run=6503-6503msec > > Disk stats (read/write): > nullb0: ios=90/0, merge=0/0, ticks=0/0, in_queue=0, util=98.55% > ################## > ################## > [ 1635.256737] null_blk: rq 00000000834b8ba3 timed out > [ 1635.256748] timeout error, dev nullb0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1635.258876] Buffer I/O error on dev nullb0, logical block 0, async page read > [ 1641.400625] null_blk: rq 00000000ee97fbff timed out > [ 1641.400636] timeout error, dev nullb0, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1641.402919] Buffer I/O error on dev nullb0, logical block 1, async page read > [ 1641.404755] Dev nullb0: unable to read RDB block 8 > [ 1641.406666] nullb0: unable to read partition table > [ 1641.407559] null_blk: disk nullb0 created > [ 1641.407562] null_blk: module loaded > [ 1648.056420] null_blk: rq 00000000834b8ba3 timed out > [ 1648.056432] timeout error, dev nullb0, sector 418504 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.058914] null_blk: rq 0000000064c974a9 timed out > [ 1648.058918] timeout error, dev nullb0, sector 63024 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.061288] null_blk: rq 00000000ee97fbff timed out > [ 1648.061291] timeout error, dev nullb0, sector 210552 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.063663] null_blk: rq 000000006c6fa717 timed out > [ 1648.063667] timeout error, dev nullb0, sector 378336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.066011] null_blk: rq 00000000fa856166 timed out > [ 1648.066015] null_blk: rq 00000000de9bec9a timed out > [ 1648.066016] timeout error, dev nullb0, sector 396344 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.068390] null_blk: rq 00000000f12b4b17 timed out > [ 1648.068393] timeout error, dev nullb0, sector 549680 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.071389] null_blk: rq 00000000667726b2 timed out > [ 1648.071393] timeout error, dev nullb0, sector 65024 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.073740] null_blk: rq 0000000088f0f903 timed out > [ 1648.073744] timeout error, dev nullb0, sector 489408 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.078759] null_blk: rq 00000000fd349776 timed out > [ 1648.078762] timeout error, dev nullb0, sector 81840 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.081093] null_blk: rq 000000002eb20dc4 timed out > [ 1648.081096] timeout error, dev nullb0, sector 440472 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 > [ 1648.083492] null_blk: rq 000000002db4026f timed out > [ 1648.083496] null_blk: rq 00000000db8f8be3 timed out > [ 1648.083499] null_blk: rq 00000000078a6daf timed out > [ 1648.083502] null_blk: rq 000000002c1b7c68 timed out > [ 1648.083505] null_blk: rq 00000000790d87a1 timed out > [ 1648.083508] null_blk: rq 000000004f48b7de timed out > [ 1648.083516] null_blk: rq 0000000074a5ff86 timed out > [ 1648.083518] null_blk: rq 000000000daf10f8 timed out > [ 1648.083521] null_blk: rq 00000000d54f527e timed out > [ 1648.083523] null_blk: rq 0000000009dc7bb8 timed out > [ 1648.083526] null_blk: rq 0000000017e81cd2 timed out > [ 1648.083529] null_blk: rq 0000000088e83fa2 timed out > [ 1648.083532] null_blk: rq 000000003c014afe timed out > [ 1648.083534] null_blk: rq 00000000b7a99358 timed out > [ 1648.083537] null_blk: rq 00000000d2d80485 timed out > [ 1648.083539] null_blk: rq 000000008805db58 timed out > [ 1648.083541] null_blk: rq 000000005c2e8f7a timed out > [ 1648.083544] null_blk: rq 00000000a33a2f2d timed out > [ 1648.083546] null_blk: rq 00000000d05bf454 timed out > [ 1648.083549] null_blk: rq 0000000080cb849f timed out > [ 1648.083552] null_blk: rq 000000009cbb48e0 timed out > [ 1648.083555] null_blk: rq 000000009e426689 timed out > [ 1648.083558] null_blk: rq 000000003705d117 timed out > [ 1648.083561] null_blk: rq 0000000010bc80b4 timed out > [ 1648.083563] null_blk: rq 000000004da816da timed out > [ 1648.083566] null_blk: rq 0000000081ceaf3f timed out > [ 1648.083568] null_blk: rq 00000000fb1e808e timed out > [ 1648.083570] null_blk: rq 000000006baf39c2 timed out > [ 1648.083573] null_blk: rq 00000000241e851f timed out > [ 1648.083575] null_blk: rq 00000000a7b7ff27 timed out > [ 1648.083577] null_blk: rq 0000000092e37377 timed out > [ 1648.083580] null_blk: rq 00000000a2b070cd timed out > [ 1648.083583] null_blk: rq 00000000a2c63d6f timed out > [ 1648.083585] null_blk: rq 00000000b775711d timed out > [ 1648.083588] null_blk: rq 000000000bfe50e6 timed out > [ 1648.083590] null_blk: rq 000000005c0b7039 timed out > [ 1648.083593] null_blk: rq 00000000af60d454 timed out > [ 1648.083595] null_blk: rq 00000000d2a38d75 timed out > [ 1648.083598] null_blk: rq 0000000000ae81a8 timed out > [ 1648.083601] null_blk: rq 000000001974c62a timed out > [ 1648.083603] null_blk: rq 000000002aaf80e6 timed out > [ 1648.083605] null_blk: rq 00000000bcdf1c85 timed out > [ 1648.083608] null_blk: rq 000000008b5d9eb6 timed out > [ 1648.083610] null_blk: rq 0000000066fcf345 timed out > [ 1648.083613] null_blk: rq 0000000067dc1221 timed out > [ 1648.083615] null_blk: rq 0000000088e78508 timed out > [ 1648.083618] null_blk: rq 000000009d1352eb timed out > [ 1648.083620] null_blk: rq 00000000a7019839 timed out > [ 1648.083622] null_blk: rq 00000000f5756a31 timed out > [ 1648.083625] null_blk: rq 00000000841b16a6 timed out > [ 1648.083627] null_blk: rq 0000000082fa3e11 timed out > [ 1648.083630] null_blk: rq 00000000794e9904 timed out > [ 1648.083632] null_blk: rq 00000000c3bcf5d1 timed out > [ 1648.094395] blk_sync_queue 230 > [ 1648.094400] blk_sync_queue 232 > [ 1648.094402] blk_sync_queue 234 > [ 1648.096130] blk_sync_queue 230 > [ 1648.096133] blk_sync_queue 232 > [ 1648.096135] blk_sync_queue 234 > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 50G 0 disk > ├─sda1 8:1 0 1G 0 part /boot > └─sda2 8:2 0 49G 0 part /home > sdb 8:16 0 100G 0 disk /mnt/data > sr0 11:0 1 1024M 0 rom > zram0 251:0 0 8G 0 disk [SWAP] > --- > drivers/block/null_blk/main.c | 90 +++++++++++++++++++++++++++++-- > drivers/block/null_blk/null_blk.h | 10 ++++ > 2 files changed, 97 insertions(+), 3 deletions(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index 1f154f92f4c2..52db6b99b448 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -77,6 +77,10 @@ enum { > NULL_IRQ_TIMER = 2, > }; > > +static unsigned int g_rq_abort_limit = 5; > +module_param_named(rq_abort_limit, g_rq_abort_limit, uint, 0644); > +MODULE_PARM_DESC(rq_abort_limit, "request timeout teardown limit. Default:5"); > + > static bool g_virt_boundary = false; > module_param_named(virt_boundary, g_virt_boundary, bool, 0444); > MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); > @@ -247,6 +251,7 @@ static void null_del_dev(struct nullb *nullb); > static int null_add_dev(struct nullb_device *dev); > static struct nullb *null_find_dev_by_name(const char *name); > static void null_free_device_storage(struct nullb_device *dev, bool is_cache); > +static void null_destroy_dev(struct nullb *nullb); > > static inline struct nullb_device *to_nullb_device(struct config_item *item) > { > @@ -578,6 +583,18 @@ config_item *nullb_group_make_item(struct config_group *group, const char *name) > { > struct nullb_device *dev; > > + if (g_rq_abort_limit) { > + /* > + * abort_on_timeout removes the null_blk and resources. When > + * null_blk is created using configfs entry by user we will also > + * need to cleanup the those entries when abort_on_timeout is set > + * from null_abort_work() and that we shold not do it, since > + * manupulating user's entries from kernel can create confusion, > + * so just don't allow it. > + */ > + pr_err("don't use g_abort_on_timeout with configfs entries\n"); > + return ERR_PTR(-EINVAL); > + } > if (null_find_dev_by_name(name)) > return ERR_PTR(-EEXIST); > > @@ -614,7 +631,7 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page) > "poll_queues,power,queue_mode,shared_tag_bitmap,size," > "submit_queues,use_per_node_hctx,virt_boundary,zoned," > "zone_capacity,zone_max_active,zone_max_open," > - "zone_nr_conv,zone_size\n"); > + "zone_nr_conv,zone_size,abort_on_timeout,rq_abort_limit\n"); > } > > CONFIGFS_ATTR_RO(memb_group_, features); > @@ -1609,10 +1626,60 @@ static int null_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) > return nr; > } > > +static bool null_cancel_request(struct request *rq, void *data) > +{ > + struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); > + > + /* > + * Keep it simple and set the timeout error since we've been schedule > + * from timeout handler afterall. > + */ > + cmd->error = BLK_STS_TIMEOUT; > + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) > + blk_mq_complete_request(rq); > + return true; > +} > + > +static void null_abort_work(struct work_struct *w) > +{ > + struct nullb *nullb = container_of(w, struct nullb, abort_work); > + > + mutex_lock(&lock); > + if (!nullb || !nullb->dev) > + goto unlock; > + > + /* > + * To make sure that null_handle_cmd() callers are not in progress from > + * null_queue_rq()/null_submit_bio() when we are stopping nullb, we > + * quiece the block layer request queue and engage in teardown sequence. > + */ > + blk_freeze_queue_start(nullb->q); > + blk_mq_quiesce_queue(nullb->q); > + /* > + * We already ensured that submit_bio() will not add any plugging by > + * quiescing so it is safe to sync queue now. > + */ > + blk_sync_queue(nullb->q); > + blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb); > + blk_mq_tagset_wait_completed_request(nullb->tag_set); > + /* > + * Unblock any pending dispatch I/Os before we destroy the device. > + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag > + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. > + */ > + blk_mq_unquiesce_queue(nullb->q); > + > + null_destroy_dev(nullb); destroying device is never good cleanup for handling timeout/abort, and it should have been the last straw any time. thanks, Ming
>> + /* >> + * Unblock any pending dispatch I/Os before we destroy the device. >> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag >> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. >> + */ >> + blk_mq_unquiesce_queue(nullb->q); >> + >> + null_destroy_dev(nullb); > > destroying device is never good cleanup for handling timeout/abort, and it > should have been the last straw any time. > That is exactly why I've added the rq_abort_limit, so until the limit is not reached null_abort_work() will not get scheduled and device is not destroyed. -ck
On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote: > > >> + /* > >> + * Unblock any pending dispatch I/Os before we destroy the device. > >> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag > >> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. > >> + */ > >> + blk_mq_unquiesce_queue(nullb->q); > >> + > >> + null_destroy_dev(nullb); > > > > destroying device is never good cleanup for handling timeout/abort, and it > > should have been the last straw any time. > > > > That is exactly why I've added the rq_abort_limit, so until the limit > is not reached null_abort_work() will not get scheduled and device is > not destroyed. I meant destroying device should only be done iff the normal abort handler can't recover the device, however, your patch simply destroys device without running any abort handling. Thanks, Ming
On 10/17/22 02:50, Ming Lei wrote: > On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote: >> >>>> + /* >>>> + * Unblock any pending dispatch I/Os before we destroy the device. >>>> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag >>>> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. >>>> + */ >>>> + blk_mq_unquiesce_queue(nullb->q); >>>> + >>>> + null_destroy_dev(nullb); >>> >>> destroying device is never good cleanup for handling timeout/abort, and it >>> should have been the last straw any time. >>> >> >> That is exactly why I've added the rq_abort_limit, so until the limit >> is not reached null_abort_work() will not get scheduled and device is >> not destroyed. > > I meant destroying device should only be done iff the normal abort handler > can't recover the device, however, your patch simply destroys device > without running any abort handling. > I did not understand your comment, can you please elaborate on exactly where and which abort handlers needs to be called in this patch before null_destroy_nullb() ? the objective of this patch it to simulate the teardown scenario from timeout handler so it can get tested on regular basis with null_blk ... -ck
On Mon, Oct 17, 2022 at 10:04:26AM +0000, Chaitanya Kulkarni wrote: > On 10/17/22 02:50, Ming Lei wrote: > > On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote: > >> > >>>> + /* > >>>> + * Unblock any pending dispatch I/Os before we destroy the device. > >>>> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag > >>>> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. > >>>> + */ > >>>> + blk_mq_unquiesce_queue(nullb->q); > >>>> + > >>>> + null_destroy_dev(nullb); > >>> > >>> destroying device is never good cleanup for handling timeout/abort, and it > >>> should have been the last straw any time. > >>> > >> > >> That is exactly why I've added the rq_abort_limit, so until the limit > >> is not reached null_abort_work() will not get scheduled and device is > >> not destroyed. > > > > I meant destroying device should only be done iff the normal abort handler > > can't recover the device, however, your patch simply destroys device > > without running any abort handling. > > > > I did not understand your comment, can you please elaborate on exactly > where and which abort handlers needs to be called in this patch before > null_destroy_nullb() ? In case of request timeout, there may be something wrong which needs to be recovered. > > the objective of this patch it to simulate the teardown scenario > from timeout handler so it can get tested on regular basis with > null_blk ... Why does teardown scenario have to be triggered for timeout? That looks you think teardown & destroying device for timeout is one normal and common way, but I think it is not, the device shouldn't be removed if it still can work. I have got such kind of complaints of disk disappeared just by request timeout, such as, nvme-pci. thanks, Ming
On 10/17/22 03:16, Ming Lei wrote: > On Mon, Oct 17, 2022 at 10:04:26AM +0000, Chaitanya Kulkarni wrote: >> On 10/17/22 02:50, Ming Lei wrote: >>> On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote: >>>> >>>>>> + /* >>>>>> + * Unblock any pending dispatch I/Os before we destroy the device. >>>>>> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag >>>>>> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. >>>>>> + */ >>>>>> + blk_mq_unquiesce_queue(nullb->q); >>>>>> + >>>>>> + null_destroy_dev(nullb); >>>>> >>>>> destroying device is never good cleanup for handling timeout/abort, and it >>>>> should have been the last straw any time. >>>>> >>>> >>>> That is exactly why I've added the rq_abort_limit, so until the limit >>>> is not reached null_abort_work() will not get scheduled and device is >>>> not destroyed. >>> >>> I meant destroying device should only be done iff the normal abort handler >>> can't recover the device, however, your patch simply destroys device >>> without running any abort handling. >>> >> >> I did not understand your comment, can you please elaborate on exactly >> where and which abort handlers needs to be called in this patch before >> null_destroy_nullb() ? > > In case of request timeout, there may be something wrong which needs > to be recovered. > In case of null_blk there is no real backend controller hence we don't have anything to try to recover, only recovery scenario is exercised by allowing multiple timed out request and waiting for the rq_abort_limit to be reached before teardown. >> >> the objective of this patch it to simulate the teardown scenario >> from timeout handler so it can get tested on regular basis with >> null_blk ... > > Why does teardown scenario have to be triggered for timeout? That The ideal way is to read the controller status from timeout and check if it is recoverable, if controller is not recoverable then we need to gracefully shutdown the device else continuing to issue I/Os to the non-recoverable device can create more damage to the device and potentially to the system, and I've encountered this scenario where SSD was getting hot since device F/W had a bug regarding temperature control reporting and it became non-responsive-> timing out the requests in the SSD Qualification process and lack of teardown made system non-responsive and we only figured it out by logging the temperature with Vendor unique commands.. > looks you think teardown & destroying device for timeout is one normal > and common way, but I think it is not, the device shouldn't be removed No I do not think like that, null_blk has no backend to check for the status of the device. As explained earlier the decision to remove the device only needs to be made after reading controller's state and confirming that it is in non-recoverable state, which is not possible to check for null_blk so we cannot call abort routines before destroying the device .. > if it still can work. I have got such kind of complaints of disk ofcourse it still can work see above explanation but if device status is non-recoverable then it should be allow to gracefully teardown-canceling the I/Os and removing from the system... > disappeared just by request timeout, such as, nvme-pci. > It will be great if we can start a new thread on linux-nvme list to address the complaints that you are received, I'll be happy to review and reply.. -ck
On 10/15/22 22:20, Chaitanya Kulkarni wrote: > In current timeout implementation null_blk just completes the request > with error=BLK_STS_TIMEOUT without doing any cleanup, hence device > cleanup code including handling inflight requests on timeout and > teardown is never exercised. Hi Chaitanya, How about removing that code instead of adding a mechanism for triggering it? > Add a module parameter rq_abort_limit to allow null_blk perform device > cleanup when time out occurs. The non zero value of this parameter > allows user to set the number of timeouts to occur before triggering > cleanup/teardown work. As Ming Lei wrote, there are no other block drivers that destroy themselves if a certain number of timeouts occur. It seems weird to me to trigger self-removal from inside a timeout handler. Thanks, Bart.
On 10/17/22 07:21, Bart Van Assche wrote: > On 10/15/22 22:20, Chaitanya Kulkarni wrote: >> In current timeout implementation null_blk just completes the request >> with error=BLK_STS_TIMEOUT without doing any cleanup, hence device >> cleanup code including handling inflight requests on timeout and >> teardown is never exercised. > > Hi Chaitanya, > > How about removing that code instead of adding a mechanism for > triggering it? > Can you please elaborate on this ? which code needs to be removed? >> Add a module parameter rq_abort_limit to allow null_blk perform device >> cleanup when time out occurs. The non zero value of this parameter >> allows user to set the number of timeouts to occur before triggering >> cleanup/teardown work. > > As Ming Lei wrote, there are no other block drivers that destroy > themselves if a certain number of timeouts occur. It seems weird to me > to trigger self-removal from inside a timeout handler. > Ming thought I'm proposing first line of action to remove the device in the timeout callback without having to look into the device if it can be aborted and make it functional again, which is I'm not, new module parameter allows to set multiple requests to be timed out before engaging in teardown sequence. nvme-rdma host (and I guess nvme-tcp host) does have a the similar behavior to remove the device from the err_work issued from request timeout callback:- from nvme/host/rdma.c nvme_rdma_timeout() nvme_rdma_error_recovery() nvme_err_work() -> nvme_reset_wq nvme_rdma_error_recovery_work() ... nvme_rdma_tear_down_io_queues() nvme_start_freeze() blk_freeze_queue_start() nvme_stop_queues() nvme_stop_ns_queue() blk_mq_quiesce_queue() or blk_mq_wait_quiesce_done() nvme_sync_io_queues() blk_sync_queue() nvme_start_queues() nvme_start_ns_queue() blk_mq_unquiesce_queue() nvme_rdma_reconnect_or_remove() Also, I've listed the problem that I've seen first hand for keeping the device in the system that is non-responsive due to request timeouts, in that case we should let user decide whether user wants to remove or keep the device in the system instead of forcing user to keep the device in the system bringing down whole system, and these problems are really hard to debug even with Teledyne LeCroy [1]. This patch follows the same philosophy where user can decide to opt in for removal with module parameter. Once opt-in user knows what he is getting into. -ck [1] https://teledynelecroy.com/protocolanalyzer/pci-express/interposers-and-probes
On 10/18/22 21:19, Chaitanya Kulkarni wrote: > Also, I've listed the problem that I've seen first hand for keeping the > device in the system that is non-responsive due to request timeouts, in > that case we should let user decide whether user wants to remove or keep > the device in the system instead of forcing user to keep the device in > the system bringing down whole system, and these problems are really > hard to debug even with Teledyne LeCroy [1]. This patch follows the same > philosophy where user can decide to opt in for removal with module > parameter. Once opt-in user knows what he is getting into. Hi Chaitanya, From commit f2298c0403b0 ("null_blk: multi queue aware block test driver"): "Written to facilitate testing of the blk-mq code". I'm not sure of this but adding a mechanism like the one in this patch may fall outside the original scope of the null_blk driver. Thanks, Bart.
On 10/19/22 10:41, Bart Van Assche wrote: > On 10/18/22 21:19, Chaitanya Kulkarni wrote: >> Also, I've listed the problem that I've seen first hand for keeping the >> device in the system that is non-responsive due to request timeouts, in >> that case we should let user decide whether user wants to remove or keep >> the device in the system instead of forcing user to keep the device in >> the system bringing down whole system, and these problems are really >> hard to debug even with Teledyne LeCroy [1]. This patch follows the same >> philosophy where user can decide to opt in for removal with module >> parameter. Once opt-in user knows what he is getting into. > > Hi Chaitanya, > > From commit f2298c0403b0 ("null_blk: multi queue aware block test > driver"): "Written to facilitate testing of the blk-mq code". I'm not > sure of this but adding a mechanism like the one in this patch may fall > outside the original scope of the null_blk driver. > I did not understand your comment, this patch uses blk_mq_XXX() APIs:- + blk_freeze_queue_start(nullb->q); + blk_mq_quiesce_queue(nullb->q); + /* + * We already ensured that submit_bio() will not add any plugging by + * quiescing so it is safe to sync queue now. + */ + blk_sync_queue(nullb->q); + blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb); + blk_mq_tagset_wait_completed_request(nullb->tag_set); + /* + * Unblock any pending dispatch I/Os before we destroy the device. + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. + */ + blk_mq_unquiesce_queue(nullb->q); which are called from error_work issued from blk-mq timeout handler and that needs to be a part of null_blk so I can submit the testcases in blktests, these testcases are part of the smoke tests under block category where block tests needs to be run first to established the stability of the block layer baseline with minimal driver then moving on to complex subsystems of additional driver code. Calling blk_mq_XXX() APIs from error work issued from block layer timeout handlers facilitates the testing of the blk-mq code with combination of different error injection parameters for null_blk with blk_should_fake_timeout() as posted in the patch test-report. -ck
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 1f154f92f4c2..52db6b99b448 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -77,6 +77,10 @@ enum { NULL_IRQ_TIMER = 2, }; +static unsigned int g_rq_abort_limit = 5; +module_param_named(rq_abort_limit, g_rq_abort_limit, uint, 0644); +MODULE_PARM_DESC(rq_abort_limit, "request timeout teardown limit. Default:5"); + static bool g_virt_boundary = false; module_param_named(virt_boundary, g_virt_boundary, bool, 0444); MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); @@ -247,6 +251,7 @@ static void null_del_dev(struct nullb *nullb); static int null_add_dev(struct nullb_device *dev); static struct nullb *null_find_dev_by_name(const char *name); static void null_free_device_storage(struct nullb_device *dev, bool is_cache); +static void null_destroy_dev(struct nullb *nullb); static inline struct nullb_device *to_nullb_device(struct config_item *item) { @@ -578,6 +583,18 @@ config_item *nullb_group_make_item(struct config_group *group, const char *name) { struct nullb_device *dev; + if (g_rq_abort_limit) { + /* + * abort_on_timeout removes the null_blk and resources. When + * null_blk is created using configfs entry by user we will also + * need to cleanup the those entries when abort_on_timeout is set + * from null_abort_work() and that we shold not do it, since + * manupulating user's entries from kernel can create confusion, + * so just don't allow it. + */ + pr_err("don't use g_abort_on_timeout with configfs entries\n"); + return ERR_PTR(-EINVAL); + } if (null_find_dev_by_name(name)) return ERR_PTR(-EEXIST); @@ -614,7 +631,7 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page) "poll_queues,power,queue_mode,shared_tag_bitmap,size," "submit_queues,use_per_node_hctx,virt_boundary,zoned," "zone_capacity,zone_max_active,zone_max_open," - "zone_nr_conv,zone_size\n"); + "zone_nr_conv,zone_size,abort_on_timeout,rq_abort_limit\n"); } CONFIGFS_ATTR_RO(memb_group_, features); @@ -1609,10 +1626,60 @@ static int null_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) return nr; } +static bool null_cancel_request(struct request *rq, void *data) +{ + struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); + + /* + * Keep it simple and set the timeout error since we've been schedule + * from timeout handler afterall. + */ + cmd->error = BLK_STS_TIMEOUT; + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) + blk_mq_complete_request(rq); + return true; +} + +static void null_abort_work(struct work_struct *w) +{ + struct nullb *nullb = container_of(w, struct nullb, abort_work); + + mutex_lock(&lock); + if (!nullb || !nullb->dev) + goto unlock; + + /* + * To make sure that null_handle_cmd() callers are not in progress from + * null_queue_rq()/null_submit_bio() when we are stopping nullb, we + * quiece the block layer request queue and engage in teardown sequence. + */ + blk_freeze_queue_start(nullb->q); + blk_mq_quiesce_queue(nullb->q); + /* + * We already ensured that submit_bio() will not add any plugging by + * quiescing so it is safe to sync queue now. + */ + blk_sync_queue(nullb->q); + blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb); + blk_mq_tagset_wait_completed_request(nullb->tag_set); + /* + * Unblock any pending dispatch I/Os before we destroy the device. + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV. + */ + blk_mq_unquiesce_queue(nullb->q); + + null_destroy_dev(nullb); +unlock: + mutex_unlock(&lock); +} + static enum blk_eh_timer_return null_timeout_rq(struct request *rq) { struct blk_mq_hw_ctx *hctx = rq->mq_hctx; struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); + struct nullb *nullb = cmd->nq->dev->nullb; + bool fake_timeout_no_abort = cmd->fake_timeout && !g_rq_abort_limit; pr_info("rq %p timed out\n", rq); @@ -1632,9 +1699,23 @@ static enum blk_eh_timer_return null_timeout_rq(struct request *rq) * Only fake timeouts need to execute blk_mq_complete_request() here. */ cmd->error = BLK_STS_TIMEOUT; - if (cmd->fake_timeout || hctx->type == HCTX_TYPE_POLL) + if (fake_timeout_no_abort || hctx->type == HCTX_TYPE_POLL) { blk_mq_complete_request(rq); - return BLK_EH_DONE; + return BLK_EH_DONE; + } + + if (atomic_dec_return(&nullb->rq_abort_count) != 0) { + blk_mq_complete_request(rq); + return BLK_EH_DONE; + } + + if (!flush_work(&nullb->abort_work)) + queue_work(system_highpri_wq, &nullb->abort_work); + /* + * null_abort_work() will take care of completing all the requests + * including timed out one. + */ + return BLK_EH_RESET_TIMER; } static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx, @@ -2124,6 +2205,9 @@ static int null_add_dev(struct nullb_device *dev) if (rv) goto out_ida_free; + INIT_WORK(&nullb->abort_work, null_abort_work); + atomic_set(&nullb->rq_abort_count, g_rq_abort_limit); + mutex_lock(&lock); list_add_tail(&nullb->list, &nullb_list); mutex_unlock(&lock); diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h index 94ff68052b1e..d7f5d3161908 100644 --- a/drivers/block/null_blk/null_blk.h +++ b/drivers/block/null_blk/null_blk.h @@ -133,6 +133,16 @@ struct nullb { struct nullb_queue *queues; unsigned int nr_queues; + /* + * null_blk request timeout teardown limit when device is in the + * stable state, i.e. once this limit is reached issue + * null_abort_work() to teardown the device from block lyaer + * request timeout callback and cleanup resources such as + * memory and pathname. + */ + atomic_t rq_abort_count; + /* tear down work to be scheduled from block layer request handler */ + struct work_struct abort_work; char disk_name[DISK_NAME_LEN]; };