Message ID | tencent_9A3345EA79C1EE9DC4464BB576C6A602A105@qq.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block:added printing when bio->bi_status fails | expand |
Hello, kernel test robot noticed "WARNING:at_fs/buffer.c:#mark_buffer_dirty" on: commit: 0824beb1d430c30731166484b8c26e37147d4dbb ("[PATCH] block:added printing when bio->bi_status fails") url: https://github.com/intel-lab-lkp/linux/commits/824731276-qq-com/block-added-printing-when-bio-bi_status-fails/20240808-181758 base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next patch link: https://lore.kernel.org/all/tencent_9A3345EA79C1EE9DC4464BB576C6A602A105@qq.com/ patch subject: [PATCH] block:added printing when bio->bi_status fails in testcase: xfstests version: xfstests-x86_64-f5ada754-1_20240812 with following parameters: disk: 4HDD fs: udf test: generic-081 compiler: gcc-12 test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake) with 28G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202408161114.dfe9cc49-oliver.sang@intel.com we do see a lot of print that: [ 68.449409][ T89] bio: bio_endio status is 10, disk[253:2] [ 68.454169][ T2095] bio: bio_endio status is 10, disk[253:3] [ 68.455863][ T89] bio: bio_endio status is 10, disk[253:2] [ 68.466598][ T89] bio: bio_endio status is 10, disk[253:2] [ 68.472314][ T89] bio: bio_endio status is 10, disk[253:2] ... [ 74.216172][ T89] bio: bio_endio status is 10, disk[253:2] [ 74.221903][ T89] bio: bio_endio status is 10, disk[253:2] [ 74.2:2] [ 74.348274][ T89] bio: bio_endio status is 10, disk[253:3] [ 74.356178][ T2096] ------------[ cut here ]------------ [ 74.361531][ T2096] WARNING: CPU: 0 PID: 2096 at fs/buffer.c:1181 mark_buffer_dirty+0x1e6/0x240 then see below WARNING [ 72.605562][ T2097] ------------[ cut here ]------------ [ 72.610907][ T2097] WARNING: CPU: 7 PID: 2097 at fs/buffer.c:1181 mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) [ 72.619661][ T2097] Modules linked in: dm_snapshot dm_bufio udf crc_itu_t cdrom btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp sd_mod ipmi_devintf sg ipmi_msghandler kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_buddy ahci intel_gtt rapl mei_wdt drm_display_helper libahci intel_cstate wmi_bmof ttm mei_me i2c_i801 intel_uncore libata drm_kms_helper i2c_smbus mei intel_pch_thermal video wmi acpi_pad binfmt_misc loop fuse drm dm_mod ip_tables [ 72.671579][ T2097] CPU: 7 UID: 0 PID: 2097 Comm: xfs_io Not tainted 6.11.0-rc1-00021-g0824beb1d430 #1 [ 72.680936][ T2097] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016 [ 72.689071][ T2097] RIP: 0010:mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) [ 72.694676][ T2097] Code: 58 c6 ff 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 60 48 8b 7d 00 5b be 04 00 00 00 5d e9 3a fc fc ff <0f> 0b e9 34 fe ff ff 48 89 df e8 5b 23 e5 ff e9 54 fe ff ff 48 89 All code ======== 0: 58 pop %rax 1: c6 (bad) 2: ff 48 89 decl -0x77(%rax) 5: ea (bad) 6: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax d: fc ff df 10: 48 c1 ea 03 shr $0x3,%rdx 14: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 18: 75 60 jne 0x7a 1a: 48 8b 7d 00 mov 0x0(%rbp),%rdi 1e: 5b pop %rbx 1f: be 04 00 00 00 mov $0x4,%esi 24: 5d pop %rbp 25: e9 3a fc fc ff jmpq 0xfffffffffffcfc64 2a:* 0f 0b ud2 <-- trapping instruction 2c: e9 34 fe ff ff jmpq 0xfffffffffffffe65 31: 48 89 df mov %rbx,%rdi 34: e8 5b 23 e5 ff callq 0xffffffffffe52394 39: e9 54 fe ff ff jmpq 0xfffffffffffffe92 3e: 48 rex.W 3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: e9 34 fe ff ff jmpq 0xfffffffffffffe3b 7: 48 89 df mov %rbx,%rdi a: e8 5b 23 e5 ff callq 0xffffffffffe5236a f: e9 54 fe ff ff jmpq 0xfffffffffffffe68 14: 48 rex.W 15: 89 .byte 0x89 [ 72.714238][ T2097] RSP: 0018:ffffc900033df8a0 EFLAGS: 00010246 [ 72.720206][ T2097] RAX: 0000000000000001 RBX: ffff888120d219d8 RCX: ffffffff81c2b878 [ 72.728093][ T2097] RDX: ffffed10241a433c RSI: 0000000000000008 RDI: ffff888120d219d8 [ 72.735965][ T2097] RBP: ffff888120d219d8 R08: 0000000000000000 R09: ffffed10241a433b [ 72.743867][ T2097] R10: ffff888120d219df R11: 0000000000000008 R12: ffffed10241a4340 [ 72.751752][ T2097] R13: 0000000000000004 R14: ffff8887472e8000 R15: 0000000000000948 [ 72.759638][ T2097] FS: 0000000000000000(0000) GS:ffff888634180000(0000) knlGS:0000000000000000 [ 72.768470][ T2097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 72.774945][ T2097] CR2: 000055d09397fc48 CR3: 000000075685c004 CR4: 00000000003706f0 [ 72.782843][ T2097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 72.790713][ T2097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 72.798583][ T2097] Call Trace: [ 72.801752][ T2097] <TASK> [ 72.804567][ T2097] ? __warn (kernel/panic.c:735) [ 72.808515][ T2097] ? mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) [ 72.813514][ T2097] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 72.817908][ T2097] ? handle_bug (arch/x86/kernel/traps.c:239) [ 72.822125][ T2097] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 72.826692][ T2097] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 72.831612][ T2097] ? mark_buffer_dirty (arch/x86/include/asm/bitops.h:213 arch/x86/include/asm/bitops.h:245 include/asm-generic/bitops/instrumented-non-atomic.h:154 include/linux/buffer_head.h:171 fs/buffer.c:1181) [ 72.836529][ T2097] ? mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) [ 72.841536][ T2097] udf_bitmap_free_blocks (fs/udf/balloc.c:164) udf [ 72.847326][ T2097] udf_free_blocks (fs/udf/balloc.c:662) udf [ 72.852530][ T2097] udf_discard_prealloc (fs/udf/truncate.c:147) udf [ 72.858161][ T2097] ? __pfx_udf_discard_prealloc (fs/udf/truncate.c:118) udf [ 72.864303][ T2097] ? __pfx_down_write (kernel/locking/rwsem.c:1577) [ 72.869051][ T2097] ? __pfx_locks_remove_file (fs/locks.c:2687) [ 72.874412][ T2097] udf_release_file (fs/udf/file.c:185 fs/udf/file.c:174) udf [ 72.879584][ T2097] ? security_file_release (security/security.c:2754 (discriminator 11)) [ 72.884757][ T2097] __fput (fs/file_table.c:422) [ 72.888638][ T2097] task_work_run (kernel/task_work.c:222 (discriminator 1)) [ 72.893108][ T2097] ? __pfx_task_work_run (kernel/task_work.c:190) [ 72.898101][ T2097] do_exit (kernel/exit.c:883) [ 72.902048][ T2097] ? __pfx_do_exit (kernel/exit.c:821) [ 72.906518][ T2097] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) [ 72.911423][ T2097] do_group_exit (kernel/exit.c:1012) [ 72.915818][ T2097] get_signal (include/linux/signal.h:78 kernel/signal.c:2751) [ 72.920215][ T2097] ? finish_task_switch+0x495/0x750 [ 72.925907][ T2097] ? __switch_to (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/thread_info.h:89 include/linux/sched.h:1945 arch/x86/include/asm/fpu/sched.h:68 arch/x86/kernel/process_64.c:674) [ 72.930377][ T2097] ? __pfx_get_signal (kernel/signal.c:2682) [ 72.935108][ T2097] ? __schedule (kernel/sched/core.c:6399) [ 72.939576][ T2097] arch_do_signal_or_restart (arch/x86/kernel/signal.c:310) [ 72.945005][ T2097] ? __pfx_arch_do_signal_or_restart (arch/x86/kernel/signal.c:307) [ 72.951046][ T2097] syscall_exit_to_user_mode (kernel/entry/common.c:111 include/linux/entry-common.h:328 kernel/entry/common.c:207 kernel/entry/common.c:218) [ 72.956561][ T2097] do_syscall_64 (arch/x86/entry/common.c:102) [ 72.960942][ T2097] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 72.966720][ T2097] RIP: 0033:0x7f896a5efd32 [ 72.971012][ T2097] Code: Unable to access opcode bytes at 0x7f896a5efd08. Code starting with the faulting instruction =========================================== [ 72.977922][ T2097] RSP: 002b:00007f896a1ffdb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000022 [ 72.986232][ T2097] RAX: fffffffffffffdfe RBX: 00007f896a2006c0 RCX: 00007f896a5efd32 [ 72.994101][ T2097] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000 [ 73.001969][ T2097] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffdb5bc88c7 [ 73.009844][ T2097] R10: 00007f896a535f80 R11: 0000000000000293 R12: ffffffffffffff80 [ 73.017712][ T2097] R13: 0000000000000002 R14: 00007ffdb5bc87d0 R15: 00007f8969a00000 [ 73.025582][ T2097] </TASK> [ 73.028478][ T2097] ---[ end trace 0000000000000000 ]--- [ 73.034147][ T66] bio: bio_endio status is 10, disk[253:3] [ 73.034167][ T2097] bio: bio_endio status is 10, disk[253:3] [ 73.039846][ T66] Buffer I/O error on dev dm-3, logical block 259, lost async page write [ 73.045565][ T2097] Buffer I/O error on dev dm-3, logical block 128, lost async page write [ 73.053894][ T66] bio: bio_endio status is 10, disk[253:3] [ 73.067887][ T66] Buffer I/O error on dev dm-3, logical block 387, lost async page write [ 73.076268][ T66] bio: bio_endio status is 10, disk[253:3] [ 73.081976][ T66] Buffer I/O error on dev dm-3, logical block 388, lost async page write [ 73.180045][ T2097] bio: bio_endio status is 10, disk[253:3] [ 73.185743][ T2097] Buffer I/O error on dev dm-3, logical block 128, lost sync page write The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240816/202408161114.dfe9cc49-oliver.sang@intel.com
Hi, 在 2024/08/08 17:54, 824731276@qq.com 写道: > From: baiguo <baiguo@kylinos.cn> > > When ftrace is not enabled and bio is not OK, > the system cannot actively record which disk is abnormal. > Add a message record to bio_endio. > > Signed-off-by: baiguo <baiguo@kylinos.cn> > --- > block/bio.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/block/bio.c b/block/bio.c > index c4053d496..fb07589c8 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1617,6 +1617,11 @@ void bio_endio(struct bio *bio) > bio_clear_flag(bio, BIO_TRACE_COMPLETION); > } > > + if (bio->bi_status && bio->bi_bdev) > + printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\ > + __func__, bio->bi_status, bio->bi_bdev->bd_disk->major,\ > + bio->bi_bdev->bd_disk->first_minor); I don't understand why you'll need this, bio_endio() will still be called for unsupported bio from submit_bio_noacct() when the disk is fine. For real disks blk_print_req_error() already print message for failed IO that are submitted to disk. Thanks, Kuai > + > /* > * Need to have a real endio function for chained bios, otherwise > * various corner cases will break (like stacking block devices that >
diff --git a/block/bio.c b/block/bio.c index c4053d496..fb07589c8 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1617,6 +1617,11 @@ void bio_endio(struct bio *bio) bio_clear_flag(bio, BIO_TRACE_COMPLETION); } + if (bio->bi_status && bio->bi_bdev) + printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\ + __func__, bio->bi_status, bio->bi_bdev->bd_disk->major,\ + bio->bi_bdev->bd_disk->first_minor); + /* * Need to have a real endio function for chained bios, otherwise * various corner cases will break (like stacking block devices that