[bug,report] block nbd0: Unexpected reply (15) 000000009c07859b

Message ID CABvJ_xhxR22i4_xfuFjf9PQxJHVUmW-Xr8dut_1F4Ys7gxW5pw@mail.gmail.com (mailing list archive)
State New, archived
Series [bug,report] block nbd0: Unexpected reply (15) 000000009c07859b

Commit Message

Vincent Chen May 10, 2024, 5:04 a.m. UTC
Hi,
I occasionally encountered this NBD error on the Linux 6.9.0-rc7
(commit hash: dd5a440a31fae) arm64 kernel when I executed the
stress-ng HDD test on NBD. The failure rate is approximately 40% in my
testing environment. Since this test case consistently passes on the
Linux 6.2 kernel, I performed a bisect to find the problematic commit.
I discovered that this NBD issue might be caused by commit
65a558f66c30 ("block: Improve performance for BLK_MQ_F_BLOCKING
drivers"). After reverting this commit, I did not encounter any NBD
issues when executing this test case. Unfortunately, I was unable to
determine the root cause of the problem, so I hope that experts in
this area can help clarify it. I have posted the execution log and
all relevant experimental information below for further analysis.

 ==== Execution log and error message ====
# udhcpc
udhcpc: started, v1.36.1
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.0.2.15, server 10.0.2.2
udhcpc: lease of 10.0.2.15 obtained from 10.0.2.2, lease time 86400
deleting routers
adding dns 10.0.2.3
# nbd-client 192.168.10.169 -N arm64-new-poky 10809 /dev/nbd0
Negotiation: ..size = 385MB
bs=512, sz=404463616 bytes
# [  114.228171] nbd0: detected capacity change from 0 to 789968
mount /dev/nbd0 /mnt && \
> cd /mnt && \
> mount -t proc /proc proc/ && \
> mount --rbind /sys sys/ && \
> mount --rbind /dev dev/ && \
> cd - && \
> chroot /mnt
[  119.563027] EXT4-fs (nbd0): recovery complete
[  119.566381] EXT4-fs (nbd0): mounted filesystem
153d936d-5294-4c41-8c7d-e0a6df246e30 r/w with ordered data mode. Quota
mode: none.
/
# stress-ng --seq 0 -t 60 --pathological --verbose --times --tz
--metrics --hdd 4
stress-ng: debug: [230] invoked with 'stress-ng --seq 0 -t 60
--pathological --verbose --times --tz --metrics --hdd 4' by user 0
'root'
stress-ng: debug: [230] stress-ng 0.17.07 g519151f46073
stress-ng: debug: [230] system: Linux buildroot 6.9.0-rc7 #20 SMP
PREEMPT Fri May 10 11:40:17 CST 2024 aarch64, gcc 13.2.0, glibc 2.39,
little endian
stress-ng: debug: [230] RAM total: 1.9G, RAM free: 1.9G, swap free: 0.0
stress-ng: debug: [230] temporary file path: '/', filesystem type:
ext2 (158058 blocks available)
stress-ng: debug: [230] 4 processors online, 4 processors configured
stress-ng: info:  [230] setting to a 1 min, 0 secs run per stressor
stress-ng: debug: [230] cache allocate: using defaults, cannot
determine cache level details
stress-ng: debug: [230] cache allocate: shared cache buffer size: 2048K
stress-ng: info:  [230] dispatching hogs: 4 hdd
stress-ng: debug: [230] starting stressors
stress-ng: debug: [231] hdd: [231] started (instance 0 on CPU 3)
stress-ng: debug: [232] hdd: [232] started (instance 1 on CPU 0)
stress-ng: debug: [230] 4 stressors started
stress-ng: debug: [233] hdd: [233] started (instance 2 on CPU 1)
stress-ng: debug: [234] hdd: [234] started (instance 3 on CPU 2)
stress-ng: debug: [233] hdd: [233] exited (instance 2 on CPU 3)
stress-ng: debug: [232] hdd: [232] exited (instance 1 on CPU 2)
[  196.497492] block nbd0: Unexpected reply (15) 000000009c07859b
[  196.539765] block nbd0: Dead connection, failed to find a fallback
[  196.540442] block nbd0: shutting down sockets
[  196.540787] I/O error, dev nbd0, sector 594178 op 0x1:(WRITE) flags
0x4000 phys_seg 2 prio class 0
[  196.540871] I/O error, dev nbd0, sector 591362 op 0x1:(WRITE) flags
0x4000 phys_seg 1 prio class 0
[  196.541976] I/O error, dev nbd0, sector 594690 op 0x1:(WRITE) flags
0x4000 phys_seg 2 prio class 0
[  196.542335] I/O error, dev nbd0, sector 591618 op 0x1:(WRITE) flags
0x4000 phys_seg 2 prio class 0
[  196.542821] I/O error, dev nbd0, sector 594946 op 0x1:(WRITE) flags
0x4000 phys_seg 1 prio class 0
[  196.544018] I/O error, dev nbd0, sector 594434 op 0x1:(WRITE) flags
0x4000 phys_seg 1 prio class 0
[  196.544786] I/O error, dev nbd0, sector 591874 op 0x1:(WRITE) flags
0x4000 phys_seg 1 prio class 0
[  196.545507] I/O error, dev nbd0, sector 592130 op 0x1:(WRITE) flags
0x4000 phys_seg 2 prio class 0
[  196.546175] I/O error, dev nbd0, sector 592386 op 0x1:(WRITE) flags
0x4000 phys_seg 1 prio class 0
[  196.546829] I/O error, dev nbd0, sector 592642 op 0x1:(WRITE) flags
0x4000 phys_seg 2 prio class 0
[  196.570813] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 317313)
[  196.572501] Buffer I/O error on device nbd0, logical block 315393
[  196.573011] Buffer I/O error on device nbd0, logical block 315394
[  196.573478] Buffer I/O error on device nbd0, logical block 315395
[  196.573949] Buffer I/O error on device nbd0, logical block 315396
[  196.574475] Buffer I/O error on device nbd0, logical block 315397
[  196.574974] Buffer I/O error on device nbd0, logical block 315398
[  196.575420] Buffer I/O error on device nbd0, logical block 315399
[  196.575411] EXT4-fs (nbd0): shut down requested (2)
[  196.576081] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 319361)
[  196.576243] Buffer I/O error on device nbd0, logical block 315400
[  196.577125] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 296833)
[  196.578172] Buffer I/O error on device nbd0, logical block 315401
[  196.578713] Buffer I/O error on device nbd0, logical block 315402
[  196.579861] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 298881)
[  196.582001] Aborting journal on device nbd0-8.
[  196.582146] EXT4-fs (nbd0): ext4_do_writepages: jbd2_start:
9223372036854772702 pages, ino 64522; err -5
[  196.582192] EXT4-fs (nbd0): ext4_do_writepages: jbd2_start:
9223372036854772768 pages, ino 64522; err -5
[  196.582940] Buffer I/O error on dev nbd0, logical block 139265,
lost sync page write
[  196.584172] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 385021)
[  196.584626] JBD2: I/O error when updating journal superblock for nbd0-8.
[  196.585291] EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O
error 10 writing to inode 64522 starting block 302849)
stress-ng: fail:  [231] hdd: rmdir './tmp-stress-ng-hdd-231-0' failed,
errno=5 (Input/output error)
stress-ng: debug: [231] hdd: [231] exited (instance 0 on CPU 1)
stress-ng: debug: [230] hdd: [231] terminated (success)
stress-ng: debug: [230] hdd: removing temporary files in
./tmp-stress-ng-hdd-231-0
stress-ng: debug: [230] hdd: [232] terminated (success)
stress-ng: debug: [230] hdd: [233] terminated (success)
stress-ng: fail:  [234] hdd: rmdir './tmp-stress-ng-hdd-234-3' failed,
errno=5 (Input/output error)
stress-ng: debug: [234] hdd: [234] exited (instance 3 on CPU 3)
stress-ng: debug: [230] hdd: [234] terminated (success)
stress-ng: debug: [230] hdd: removing temporary files in
./tmp-stress-ng-hdd-234-3
stress-ng: metrc: [230] stressor       bogo ops real time  usr time
sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [230]                           (secs)    (secs)
(secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [230] hdd               10839     61.51      2.14
 94.28       176.20         112.41        39.19          2124
stress-ng: metrc: [230] miscellaneous metrics:
stress-ng: metrc: [230] hdd                    0.00 MB/sec read rate
(harmonic mean of 4 instances)
stress-ng: metrc: [230] hdd                    2.96 MB/sec write rate
(harmonic mean of 4 instances)
stress-ng: metrc: [230] hdd                    2.96 MB/sec read/write
combined rate (harmonic mean of 4 instances)
stress-ng: debug: [230] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [230] thermal zone temperatures not available
stress-ng: info:  [230] for a 62.50s run time:
stress-ng: info:  [230]     250.00s available CPU time
stress-ng: info:  [230]       2.13s user time   (  0.85%)
stress-ng: info:  [230]      94.32s system time ( 37.73%)
stress-ng: info:  [230]      96.45s total time  ( 38.58%)
stress-ng: info:  [230] load average: 3.75 1.12 0.39
stress-ng: info:  [230] skipped: 0
stress-ng: info:  [230] passed: 4: hdd (4)
stress-ng: info:  [230] failed: 0
stress-ng: info:  [230] metrics untrustworthy: 0
stress-ng: info:  [230] successful run completed in 1 min, 2.50 secs
Bus error
#

==== Kernel information ====
The git tag of the testing kernel is 6.9-rc7 (commit hash: dd5a440a31fae).
The configuration of this arm64 kernel is based on the arm64
defconfig; I only changed CONFIG_BLK_DEV_NBD from m to y and
specified a rootfs.cpio via CONFIG_INITRAMFS_SOURCE.

==== Platform information ====
The platform I used is the QEMU arm64 virt machine (QEMU git tag:
v9.0.0). Below is the QEMU command.

$qemu_img \
        -M virt -cpu max -m 2G \
        -machine virtualization=true \
        -kernel $IMAGE \
        -netdev type=user,id=net0,hostfwd=tcp::7171-:22,hostfwd=tcp::7070-:23 \
        -device e1000e,netdev=net0,mac=52:54:00:12:35:03,bus=pcie.0 \
        -gdb tcp::1234 \
        -nographic \
        -device virtio-iommu-pci \
        -smp 4

==== Disk content information ====
I used Yocto poky (commit 13078ea23ffea) to generate all the contents
of the disk. I referred to this website
https://learn.arm.com/learning-paths/embedded-systems/yocto_qemu/yocto_build/
to build it.

Because the default setting of poky does not include stress-ng, I made
the following changes to include stress-ng (the diff is shown in the
Patch section below). I also placed the disk image I used here:
https://github.com/VincentZWC/HDD_NBD_issue/blob/main/core-image-full-cmdline-qemuarm64.rootfs-20240510025944.ext4

==== stress-ng command ====
 stress-ng --seq 0 -t 60 --pathological --verbose --times --tz --metrics --hdd 4



Please let me know if you need more information about this bug. Thanks.

Best regards,
Vincent

Comments

Jens Axboe May 10, 2024, 12:50 p.m. UTC | #1
Adding Bart


On 5/9/24 11:04 PM, Vincent Chen wrote:
> Hi,
> I occasionally encountered this NBD error on the Linux 6.9.0-rc7
> (commit hash: dd5a440a31fae) arm64 kernel when I executed the
> stress-ng HDD test on NBD. The failure rate is approximately 40% in my
> testing environment. Since this test case can consistently pass on
> Linux 6.2 kernel, I performed a bisect to find the problematic commit.
> Finally, I discovered that this NBD issue might be caused by this
> commit 65a558f66c30 ("block: Improve performance for BLK_MQ_F_BLOCKING
> drivers"). After reverting this commit, I didn't encounter any NBD
> issues when executing this test case. Unfortunately, I was unable to
> determine the root cause of the problem. I hope that experts in this
> area can help clarify this issue. I have posted the execution log and
> all relevant experimental information below for further analysis.
> 
> [...]
>
Bart Van Assche May 10, 2024, 8:51 p.m. UTC | #2
On 5/9/24 10:04 PM, Vincent Chen wrote:
> I occasionally encountered this NBD error on the Linux 6.9.0-rc7
> (commit hash: dd5a440a31fae) arm64 kernel when I executed the
> stress-ng HDD test on NBD. The failure rate is approximately 40% in my
> testing environment. Since this test case can consistently pass on
> Linux 6.2 kernel, I performed a bisect to find the problematic commit.
> Finally, I discovered that this NBD issue might be caused by this
> commit 65a558f66c30 ("block: Improve performance for BLK_MQ_F_BLOCKING
> drivers"). After reverting this commit, I didn't encounter any NBD
> issues when executing this test case. Unfortunately, I was unable to
> determine the root cause of the problem. I hope that experts in this
> area can help clarify this issue. I have posted the execution log and
> all relevant experimental information below for further analysis.

(+Jens, Christoph and Josef)

Thank you for the detailed report. Unfortunately it is nontrivial to
replicate your test setup, so I took a look at the nbd source code.

It seems likely to me that the root cause is in nbd, e.g. in the
signal handling code. If nbd_queue_rq() is run asynchronously, it runs
in the context of a kernel worker thread and hence does not receive
signals from the process that submits I/O. If nbd_queue_rq() is run
synchronously, it may receive signals from the process that submits
I/O. I think that I have found a bug in the nbd signal handling code.
See also
https://lore.kernel.org/linux-block/20240510202313.25209-1-bvanassche@acm.org/T/#t
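
As a user-space analogy only (not nbd code, and with arbitrary thread
and signal names): a blocking call is interrupted only in the thread
the signal is actually delivered to, so the same work behaves
differently depending on whether it runs in the submitting task or has
been handed off to a worker. A minimal sketch:

/*
 * User-space analogy, not kernel code: a blocking call returns
 * -1/EINTR only in the thread the signal is delivered to; work that
 * was handed off to a worker thread keeps running undisturbed.
 * Build with: cc -pthread analogy.c
 */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_t submitter;                  /* the "I/O submitting" thread */

static void on_usr1(int sig) { (void)sig; }  /* no-op handler, no SA_RESTART */

static void *worker(void *arg)               /* stands in for a kworker */
{
	struct timespec ts = { .tv_sec = 2 };

	(void)arg;
	if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
		printf("worker:    interrupted by signal\n");
	else
		printf("worker:    completed normally\n");
	return NULL;
}

static void *killer(void *arg)               /* signals the submitter only */
{
	(void)arg;
	usleep(500 * 1000);
	pthread_kill(submitter, SIGUSR1);
	return NULL;
}

int main(void)
{
	struct sigaction sa = { .sa_handler = on_usr1 };
	struct timespec ts = { .tv_sec = 2 };
	pthread_t w, k;

	submitter = pthread_self();
	sigaction(SIGUSR1, &sa, NULL);

	pthread_create(&w, NULL, worker, NULL);
	pthread_create(&k, NULL, killer, NULL);

	if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
		printf("submitter: interrupted by signal\n");  /* this branch */
	else
		printf("submitter: completed normally\n");

	pthread_join(w, NULL);
	pthread_join(k, NULL);
	return 0;
}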

Bart.
Bart Van Assche May 13, 2024, 1:20 p.m. UTC | #3
On 5/12/24 18:59, Vincent Chen wrote:
> Thank you very much for providing the fixed patches. I applied them to 
> my environment, but unfortunately, I still encountered similar issues.
> 
> [   96.784384] block nbd0: Double reply on req 00000000619eb9fb, 
> cmd_cookie 51, handle cookie 49
> [   96.847731] block nbd0: Dead connection, failed to find a fallback
> [   96.848661] block nbd0: shutting down sockets

Hi Vincent,

Thank you for having tested my patches. I will try to reproduce your
test environment and take a closer look as soon as I have the time
(probably early next week).

Bart.
Bart Van Assche May 20, 2024, 9:49 p.m. UTC | #4
On 5/12/24 18:59, Vincent Chen wrote:
> Thank you very much for providing the fixed patches. I applied them to my environment, but unfortunately, I still encountered similar issues.
> 
> [   96.784384] block nbd0: Double reply on req 00000000619eb9fb, cmd_cookie 51, handle cookie 49
> [   96.847731] block nbd0: Dead connection, failed to find a fallback
> [   96.848661] block nbd0: shutting down sockets

Hi Vincent,

That's unfortunate. Since the provided qemu command did not work on my setup,
I have tried to reproduce this issue as follows (in a VM):

modprobe brd
nbd-server 1234 /dev/ram0
nbd-client localhost 1234 /dev/nbd0
mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /mnt
cd /mnt
stress-ng --seq 0 -t 60 --pathological --verbose --times --tz --metrics --hdd 4

I ran the above stress-ng command several times and it succeeded every
time. I'm not sure how to proceed since I cannot reproduce the reported
issue.

Thanks,

Bart.
Leon Schuermann Oct. 14, 2024, 2:07 p.m. UTC | #5
Vincent Chen <vincent.chen@sifive.com> writes:
> I occasionally encountered this NBD error on the Linux 6.9.0-rc7
> (commit hash: dd5a440a31fae) arm64 kernel when I executed the
> stress-ng HDD test on NBD.
>
> [...]
>
> [  196.497492] block nbd0: Unexpected reply (15) 000000009c07859b
> [  196.539765] block nbd0: Dead connection, failed to find a fallback
> [  196.540442] block nbd0: shutting down sockets

Unfortunately, I can confirm this and the other "Double reply on req"
bugs, and I can reproduce them fairly reliably across a variety of
configurations and devices, such as

- an AMD64 VM running a vanilla 6.12-rc3+1 6485cf5ea253 ("Merge tag
  'hid-for-linus-2024101301'") with the Fedora 40 kernel config,

- and a Raspberry Pi running its vendor kernel (6.6.51+rpt-rpi-2712 #1
  SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64).

I'm testing these NBD clients against both the nbd-server package
(3.26.1) and QEMU's qemu-nbd (v8.2.2).

Key to reproducing this issue seems to be high CPU load caused by the
NBD / filesystem / networking stack (so, running on a high-bandwidth
connection with a server that can keep up), as well as an SMP system (I
am testing with 4 cores on both AMD64 and aarch64 platforms).

To pinpoint the issue, I enabled the `nbd_send_request`,
`nbd_header_sent` and `nbd_header_received` tracepoints while triggering
the following bug:

    [    0.000000] Linux version 6.12.0-rc3-nbddebug+ (root@treadmill-nbd-debug-4) (gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3), GNU ld version 2.41-37.fc40) #1 SMP PREEMPT_DYNAMIC Mon Oct 14 01:07:12 UTC 2024
    [    0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.12.0-rc3-nbddebug+ root=UUID=944ab680-e1ab-49d7-a580-dfce30985180 ro consoleblank=0 systemd.show_status=true crashkernel=auto no_timer_check console=tty1 console=ttyS0,115200n8

    [  +0.011557] EXT4-fs (nbd0): mounted filesystem 8998d2ee-2045-4708-bb3a-0fff335c437f r/w with ordered data mode. Quota mode: none.
    [Oct14 02:36] block nbd0: Double reply on req 0000000015334b0a, cmd_cookie 193, handle cookie 187
    [  +0.004639] block nbd0: Dead connection, failed to find a fallback

Looking at the reversed trace file, we can see the offending header with
cookie 187 == 0xbb:

    kworker/u17:0-121     [000] .....   103.672786: nbd_header_received: nbd transport event: request 0000000015334b0a, handle 0x000000bb00000042
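
For reference, decoding that handle as a stand-alone illustration (not
kernel code; it assumes the layout drivers/block/nbd.c uses for its
handles: upper 32 bits cmd_cookie, lower 32 bits the blk-mq unique
tag, which in turn is hwq << 16 | tag):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	uint64_t handle = 0x000000bb00000042ULL;     /* from the trace above */
	uint32_t cookie = (uint32_t)(handle >> 32);  /* cmd_cookie */
	uint32_t utag   = (uint32_t)handle;          /* blk-mq unique tag */
	uint16_t hwq    = (uint16_t)(utag >> 16);
	uint16_t tag    = (uint16_t)utag;

	printf("handle 0x%016" PRIx64 ": cmd_cookie=%u hwq=%u tag=%u\n",
	       handle, cookie, hwq, tag);
	return 0;
}

This prints cmd_cookie=187 hwq=0 tag=66, matching the "handle cookie
187" in the dmesg line above.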

However, some 11 events back, we can see that this request object was
just sent with a different NBD handle/cookie:

    stress-ng-hdd-1119    [000] .....   103.668096: nbd_header_sent: nbd transport event: request 0000000015334b0a, handle 0x000000c100000059
    stress-ng-hdd-1119    [000] .....   103.668084: nbd_send_request: nbd0: request 0000000015334b0a

This new NBD handle (0x000000c100000059) never appears in the trace
file, perhaps because the socket was closed before it could be received.

However, the exact handle of the offending NBD request was sent 49
events before it was received again:

    stress-ng-hdd-1120    [002] .....   103.647257: nbd_header_sent: nbd transport event: request 00000000ae03f314, handle 0x000000bb00000042
    stress-ng-hdd-1120    [002] .....   103.647244: nbd_send_request: nbd0: request 00000000ae03f314

In fact, all other requests in the trace have almost exactly 50 trace
events between them being sent and received again. From this I conclude
that it is most likely that the handle of the _received_ offending
reply (103.672786) is indeed correct (0x000000bb00000042) and
corresponds to the request 103.647244 (latency works out), but it is
mapped onto an entirely wrong request object (0000000015334b0a), or onto
a non-existent request object in the case of an "Unexpected reply"
error. Could it be that sometimes a request is prematurely marked as
completed such that the queue tag gets reused?

I tried a packet capture to confirm that the trace outputs do not
diverge from what is sent on the wire, but my capture node could not
keep up with the amount of traffic, and the missed packets cause
Wireshark's protocol analyzer to choke on the pcap. Reducing the
bandwidth makes it harder to trigger a "Double reply" error in favor
of "Unexpected reply" errors, which are trickier to correlate with
traffic.

All in all, this smells like a race condition around the NBD blk_mq
management code / nbd_handle_reply function to me. I tried to trace
through this code, but I am not familiar with the locking semantics and
conventions of this subsystem. Unfortunately, KCSAN did not yield any
promising findings (apart from a bunch of virtqueue and other
filesystem-related races).

I will try to investigate this further over the coming days. If anyone
has advice on how to best debug these issues, that would be appreciated!

-Leon
Kevin Wolf Oct. 16, 2024, 4:43 p.m. UTC | #6
Hi Leon,

I only saw now that you recently replied to this old thread; what you
describe looks a lot like what I reported independently yesterday:

https://lore.kernel.org/linux-block/Zw8i6-DVDsLk3sq9@fedora/T/#t

On 14.10.2024 at 16:07, Leon Schuermann wrote:
> Vincent Chen <vincent.chen@sifive.com> writes:
> > I occasionally encountered this NBD error on the Linux 6.9.0-rc7
> > (commit hash: dd5a440a31fae) arm64 kernel when I executed the
> > stress-ng HDD test on NBD.
> >
> > [...]
> >
> > [  196.497492] block nbd0: Unexpected reply (15) 000000009c07859b
> > [  196.539765] block nbd0: Dead connection, failed to find a fallback
> > [  196.540442] block nbd0: shutting down sockets
> 
> Unfortunately, I can confirm this and the other "Double reply on req"
> bugs and pretty reliably reproduce them among a variety of different
> configurations and devices, such as
> 
> - an AMD64 VM running a vanilla 6.12-rc3+1 6485cf5ea253 ("Merge tag
>   'hid-for-linus-2024101301'") with the Fedora 40 kernel config,
> 
> - and a Raspberry Pi running its vendor kernel (6.6.51+rpt-rpi-2712 #1
>   SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64).
> 
> I'm testing these NBD clients against both the nbd-server package
> (3.26.1) and QEMU's qemu-nbd (v8.2.2).
> 
> Key to reproducing this issue seems to be high CPU load caused by the
> NBD / filesystem / networking stack (so, running on a high-bandwidth
> connection with a server that can keep up), as well as an SMP system (I
> am testing with 4 cores on both AMD64 and aarch64 platforms).

This sounds like you could reproduce the problem with an actual network
connection, going over TCP to another host? This is an interesting
addition for me, because I only saw it with local unix domain sockets.

> [...]
> 
> In fact, all other requests in the trace have almost exactly 50 trace
> events between them being sent and received again. From this I conclude
> that it is mostly likely that the handle of the _received_ offending
> reply (103.672786) is indeed correct (0x000000bb00000042) and
> corresponds to the request 103.647244 (latency works out), but it is
> mapped onto an entirely wrong request object (0000000015334b0a), or onto
> a non-existent request object in the case of a "Unexpected request"
> error. Could it be that sometimes a request is prematurely marked as
> completed such that the queue tag gets reused?

My conclusion was that we hit one of the code paths where a partially
completed request gets requeued (and then assigned a new tag when it
runs the next time).
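
A stand-alone sketch of the consistency check implied by those error
messages (illustrative types and names, not the actual nbd.c code): a
late reply whose handle cookie no longer matches the cookie of the
request currently owning that tag shows up as a "Double reply", and a
reply for a tag with no request in flight as an "Unexpected reply".

#include <stdint.h>
#include <stdio.h>

struct inflight {
	uint32_t cmd_cookie;  /* cookie of the request currently using this tag */
	int      started;
};

static int check_reply(const struct inflight *slot, uint64_t reply_handle)
{
	uint32_t cookie = (uint32_t)(reply_handle >> 32);

	if (!slot->started) {
		printf("Unexpected reply\n");
		return -1;
	}
	if (slot->cmd_cookie != cookie) {
		printf("Double reply, cmd_cookie %u, handle cookie %u\n",
		       slot->cmd_cookie, cookie);
		return -1;
	}
	return 0;
}

int main(void)
{
	/* The tag is now owned by a request sent with cookie 193, while the
	 * late reply still carries cookie 187 from the earlier send
	 * (numbers taken from the dmesg in Leon's report). */
	struct inflight slot = { .cmd_cookie = 193, .started = 1 };

	return check_reply(&slot, 0x000000bb00000042ULL) ? 1 : 0;
}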

Maybe you want to test the patch (v3) Ming suggested in the other thread
on your setup? The problem doesn't reproduce for me any more with it,
but I suppose more testing can't hurt.

Kevin

Patch

diff --git a/meta/recipes-extended/images/core-image-full-cmdline.bb
b/meta/recipes-extended/images/core-image-full-cmdline.bb
index b034cd0aeb..4b92dbfbb9 100644
--- a/meta/recipes-extended/images/core-image-full-cmdline.bb
+++ b/meta/recipes-extended/images/core-image-full-cmdline.bb
@@ -7,6 +7,7 @@  IMAGE_INSTALL = "\
     packagegroup-core-boot \
     packagegroup-core-full-cmdline \
     ${CORE_IMAGE_EXTRA_INSTALL} \
+    stress-ng \
     "

I also placed this disk image I used here:
https://github.com/VincentZWC/HDD_NBD_issue/blob/main/core-image-full-cmdline-qemuarm64.rootfs-20240510025944.ext4