Message ID | 6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: scrub: refine the error messages | expand |
On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote: > > Currently the scrub error report is pretty long: > > BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872 > BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1) > > Since we have so many things to output, it's not a surprise we got long > error lines. > > To make the lines a little shorter, and make important info more > obvious, change the unify output to something like this: > > BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 > BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 I find that hard to read because: 1) Please leave spaces before opening parenthesis. That makes things less cluttered and easier to the eyes, more consistent with what we generally do and it's the formal way to do things in English (see https://www.scribens.com/grammar-rules/typography/spacing.html); 2) Instead of "inode 5/257(file1)", something a lot easier to understand like "inode 5 root 257 path \"file1\"", which leaves no margin for doubt about what each number is; 3) Same with "logical 13647872(1)" - what is the 1? That will be the question anyone reading such a log message will likely have. Something like "logical 13647872 mirror 1" makes it clear; 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872". Something like "physical 13647872 device ID 1 device path \"/dev/mapper/test-scratch1\"", makes it clear what each number is and easier to read. Thanks. > > The idea is, to put large values/small values/string separated meanwhile > skip the some descriptor directly: > > - "logical LOGICAL(MIRROR)" > LOGICAL is always a large value, meanwhile MIRROR is always a single > digit. > Thus combining them together, human can still split them instinctively. > > - "physical DEVID(DEVPATH)PHYSICAL" > DEVID is normally a short number, DEVPATH is a long string, and PHYSICAL > is a normally a large number. > And for most end users, the most important device path is still easy > to catch, even surrounded by some large numbers. > > - inode ROOT/INO(PATH) > To locate a btrfs inode, we have to provide both root and inode > number. Normally ROOT should be a much smaller number (3 digits) > meanwhile the INO can be pretty large. > > However for the end user, the most important thing is the PATH, which > is still not hard to locate. > > Any better ideas would be appreciated to make those lines shorter. > > However what we really need is a proper way to report all those info > (maybe put the file/dev name resolution into the user space) to user > space reliably (without rate limit). > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/scrub.c | 55 +++++++++++++++++++++++++++--------------------- > 1 file changed, 31 insertions(+), 24 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 18b2ee3b1616..17e492076bf8 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -226,6 +226,7 @@ struct scrub_warning { > u64 physical; > u64 logical; > struct btrfs_device *dev; > + int mirror_num; > }; > > static void release_scrub_stripe(struct scrub_stripe *stripe) > @@ -427,12 +428,12 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, > */ > for (i = 0; i < ipath->fspath->elem_cnt; ++i) > btrfs_warn_in_rcu(fs_info, > -"%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu, path: %s", > - swarn->errstr, swarn->logical, > - btrfs_dev_name(swarn->dev), > - swarn->physical, > - root, inum, offset, > - (char *)(unsigned long)ipath->fspath->val[i]); > +"%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", > + swarn->errstr, root, inum, > + (char *)ipath->fspath->val[i], offset, > + swarn->logical, swarn->mirror_num, > + swarn->dev->devid, btrfs_dev_name(swarn->dev), > + swarn->physical); > > btrfs_put_root(local_root); > free_ipath(ipath); > @@ -440,18 +441,17 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, > > err: > btrfs_warn_in_rcu(fs_info, > - "%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu: path resolving failed with ret=%d", > - swarn->errstr, swarn->logical, > - btrfs_dev_name(swarn->dev), > - swarn->physical, > - root, inum, offset, ret); > - > + "%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", > + swarn->errstr, root, inum, offset, > + swarn->logical, swarn->mirror_num, > + swarn->dev->devid, btrfs_dev_name(swarn->dev), > + swarn->physical); > free_ipath(ipath); > return 0; > } > > static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev, > - u64 logical, u64 physical) > + u64 logical, u64 physical, int mirror_num) > { > struct btrfs_fs_info *fs_info = dev->fs_info; > struct btrfs_path *path; > @@ -471,6 +471,7 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device * > swarn.logical = logical; > swarn.errstr = errstr; > swarn.dev = NULL; > + swarn.mirror_num = mirror_num; > > ret = extent_from_logical(fs_info, swarn.logical, path, &found_key, > &flags); > @@ -501,10 +502,11 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device * > if (ret > 0) > break; > btrfs_warn_in_rcu(fs_info, > -"%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree %llu", > - errstr, swarn.logical, btrfs_dev_name(dev), > - swarn.physical, (ref_level ? "node" : "leaf"), > - ref_level, ref_root); > +"%s at metadata in tree %llu(%u), logical %llu(%u) physical %llu(%s)%llu", > + errstr, ref_root, ref_level, > + swarn.logical, swarn.mirror_num, > + dev->devid, btrfs_dev_name(dev), > + swarn.physical); > } > btrfs_release_path(path); > } else { > @@ -879,27 +881,32 @@ static void scrub_stripe_report_errors(struct scrub_ctx *sctx, > */ > if (repaired) { > btrfs_err_rl_in_rcu(fs_info, > - "fixed up error at logical %llu on dev %s physical %llu", > - logical, btrfs_dev_name(dev), > + "fixed up error at logical %llu(%u) physical %llu(%s)%llu", > + logical, stripe->mirror_num, > + dev->devid, btrfs_dev_name(dev), > physical); > continue; > } > > /* The remaining are all for unrepaired. */ > btrfs_err_rl_in_rcu(fs_info, > - "unable to fixup (regular) error at logical %llu on dev %s physical %llu", > - logical, btrfs_dev_name(dev), > + "unable to fixup (regular) error at logical %llu(%u) physical %llu(%s)%llu", > + logical, stripe->mirror_num, > + dev->devid, btrfs_dev_name(dev), > physical); > > if (test_bit(sector_nr, &stripe->io_error_bitmap)) > scrub_print_common_warning("i/o error", dev, > - logical, physical); > + logical, physical, > + stripe->mirror_num); > if (test_bit(sector_nr, &stripe->csum_error_bitmap)) > scrub_print_common_warning("checksum error", dev, > - logical, physical); > + logical, physical, > + stripe->mirror_num); > if (test_bit(sector_nr, &stripe->meta_error_bitmap)) > scrub_print_common_warning("header error", dev, > - logical, physical); > + logical, physical, > + stripe->mirror_num); > } > > spin_lock(&sctx->stat_lock); > -- > 2.44.0 > >
在 2024/3/15 04:10, Filipe Manana 写道: > On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote: >> >> Currently the scrub error report is pretty long: >> >> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872 >> BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1) >> >> Since we have so many things to output, it's not a surprise we got long >> error lines. >> >> To make the lines a little shorter, and make important info more >> obvious, change the unify output to something like this: >> >> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 >> BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 > > I find that hard to read because: > > 1) Please leave spaces before opening parenthesis. > That makes things less cluttered and easier to the eyes, more > consistent with what we generally do and it's the formal way to do > things in English (see > https://www.scribens.com/grammar-rules/typography/spacing.html); Yep, I can do that, but I also want to keep the involved members together thus closer. Not sure if adding spaces would still keep the close relationships between the values. E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1 (scratch1) 123456 It makes it a little harder to indicate that "(1)" is related to logical address (thus mirror number). > > 2) Instead of "inode 5/257(file1)", something a lot easier to > understand like "inode 5 root 257 path \"file1\"", which leaves no > margin for doubt about what each number is; > > 3) Same with "logical 13647872(1)" - what is the 1? That will be the > question anyone reading such a log message will likely have. > Something like "logical 13647872 mirror 1" makes it clear; > > 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872". > Something like "physical 13647872 device ID 1 device path > \"/dev/mapper/test-scratch1\"", makes it clear what each number is and > easier to read. I totally understand the original output format is much easier to read on its own. However if we have hundreds lines of similar output, it's a different story, a small change in any of the value is much harder to catch, and the extra helpful prompt is in fact a burden other than a bless. (That's why things like spreadsheet is much easier to reader for multiple similarly structured data, and in that case it's much better to craft a script to convert the lines into a csv) Unfortunately we don't have the benefit (at least yet) to structure all these info into a structured output. But what I'm doing here is to reduce the prompts to minimal (at most 4 prompts, "inode", "fileoff", "logical", "physical"), so less duplicated strings for multiple lines of similar error messages. The patchset is in the middle ground between fully detailed output (the old one, focusing on single line) and the fully script orient output (no prompt at all, just all numbers/strings, focusing on CSV like output). Furthermore, I would also argue that, for entry level end users, they can still catch the critical info (like file path and device path) without much difficult, and those info is enough for them to take action. (E.g. deleting the file, or replace the disk) Yes, they would get confused on the devid or rootid, but that's fine and everyone can learn something new. For experienced users who understand basics of btrfs, or us developers, the split in values are already arranged in a way similar sized values are never close together (aka, string/large/small value split). Thanks, Qu > > Thanks. >
Hi Qu, kernel test robot noticed the following build warnings: [auto build test WARNING on kdave/for-next] [also build test WARNING on linus/master v6.8 next-20240314] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-fix-incorrectly-reported-logical-physical-address/20240314-215457 base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next patch link: https://lore.kernel.org/r/6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu%40suse.com patch subject: [PATCH 6/7] btrfs: scrub: unify and shorten the error message config: m68k-defconfig (https://download.01.org/0day-ci/archive/20240315/202403150650.dNFtHzxf-lkp@intel.com/config) compiler: m68k-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240315/202403150650.dNFtHzxf-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403150650.dNFtHzxf-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from fs/btrfs/extent_io.h:16, from fs/btrfs/locking.h:13, from fs/btrfs/ctree.h:19, from fs/btrfs/scrub.c:10: fs/btrfs/scrub.c: In function 'scrub_print_warning_inode': >> fs/btrfs/scrub.c:433:35: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 433 | (char *)ipath->fspath->val[i], offset, | ^ fs/btrfs/messages.h:27:39: note: in definition of macro 'btrfs_printk' 27 | _btrfs_printk(fs_info, fmt, ##args) | ^~~~ fs/btrfs/messages.h:66:9: note: in expansion of macro 'btrfs_printk_in_rcu' 66 | btrfs_printk_in_rcu(fs_info, KERN_WARNING fmt, ##args) | ^~~~~~~~~~~~~~~~~~~ fs/btrfs/scrub.c:430:17: note: in expansion of macro 'btrfs_warn_in_rcu' 430 | btrfs_warn_in_rcu(fs_info, | ^~~~~~~~~~~~~~~~~ vim +433 fs/btrfs/scrub.c 388 389 static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, 390 u64 root, void *warn_ctx) 391 { 392 int ret; 393 int i; 394 unsigned nofs_flag; 395 struct scrub_warning *swarn = warn_ctx; 396 struct btrfs_fs_info *fs_info = swarn->dev->fs_info; 397 struct inode_fs_paths *ipath = NULL; 398 struct btrfs_root *local_root; 399 400 local_root = btrfs_get_fs_root(fs_info, root, true); 401 if (IS_ERR(local_root)) { 402 ret = PTR_ERR(local_root); 403 goto err; 404 } 405 406 /* 407 * init_path might indirectly call vmalloc, or use GFP_KERNEL. Scrub 408 * uses GFP_NOFS in this context, so we keep it consistent but it does 409 * not seem to be strictly necessary. 410 */ 411 nofs_flag = memalloc_nofs_save(); 412 ipath = init_ipath(4096, local_root, swarn->path); 413 memalloc_nofs_restore(nofs_flag); 414 if (IS_ERR(ipath)) { 415 btrfs_put_root(local_root); 416 ret = PTR_ERR(ipath); 417 ipath = NULL; 418 goto err; 419 } 420 ret = paths_from_inode(inum, ipath); 421 422 if (ret < 0) 423 goto err; 424 425 /* 426 * we deliberately ignore the bit ipath might have been too small to 427 * hold all of the paths here 428 */ 429 for (i = 0; i < ipath->fspath->elem_cnt; ++i) 430 btrfs_warn_in_rcu(fs_info, 431 "%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", 432 swarn->errstr, root, inum, > 433 (char *)ipath->fspath->val[i], offset, 434 swarn->logical, swarn->mirror_num, 435 swarn->dev->devid, btrfs_dev_name(swarn->dev), 436 swarn->physical); 437 438 btrfs_put_root(local_root); 439 free_ipath(ipath); 440 return 0; 441 442 err: 443 btrfs_warn_in_rcu(fs_info, 444 "%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", 445 swarn->errstr, root, inum, offset, 446 swarn->logical, swarn->mirror_num, 447 swarn->dev->devid, btrfs_dev_name(swarn->dev), 448 swarn->physical); 449 free_ipath(ipath); 450 return 0; 451 } 452
Hi Qu,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kdave/for-next]
[also build test WARNING on linus/master v6.8 next-20240315]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-fix-incorrectly-reported-logical-physical-address/20240314-215457
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/r/6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu%40suse.com
patch subject: [PATCH 6/7] btrfs: scrub: unify and shorten the error message
config: hexagon-randconfig-r111-20240315 (https://download.01.org/0day-ci/archive/20240315/202403151954.2aDLBPQm-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8f68022f8e6e54d1aeae4ed301f5a015963089b7)
reproduce: (https://download.01.org/0day-ci/archive/20240315/202403151954.2aDLBPQm-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403151954.2aDLBPQm-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> fs/btrfs/scrub.c:430:17: sparse: sparse: non size-preserving integer to pointer cast
vim +430 fs/btrfs/scrub.c
a2de733c78fa7a Arne Jansen 2011-03-08 388
c7499a64dcf62b Filipe Manana 2022-11-01 389 static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
c7499a64dcf62b Filipe Manana 2022-11-01 390 u64 root, void *warn_ctx)
558540c17771ea Jan Schmidt 2011-06-13 391 {
558540c17771ea Jan Schmidt 2011-06-13 392 int ret;
558540c17771ea Jan Schmidt 2011-06-13 393 int i;
de2491fdefe7e5 David Sterba 2017-05-31 394 unsigned nofs_flag;
ff023aac31198e Stefan Behrens 2012-11-06 395 struct scrub_warning *swarn = warn_ctx;
fb456252d3d9c0 Jeff Mahoney 2016-06-22 396 struct btrfs_fs_info *fs_info = swarn->dev->fs_info;
558540c17771ea Jan Schmidt 2011-06-13 397 struct inode_fs_paths *ipath = NULL;
558540c17771ea Jan Schmidt 2011-06-13 398 struct btrfs_root *local_root;
558540c17771ea Jan Schmidt 2011-06-13 399
56e9357a1e8167 David Sterba 2020-05-15 400 local_root = btrfs_get_fs_root(fs_info, root, true);
558540c17771ea Jan Schmidt 2011-06-13 401 if (IS_ERR(local_root)) {
558540c17771ea Jan Schmidt 2011-06-13 402 ret = PTR_ERR(local_root);
558540c17771ea Jan Schmidt 2011-06-13 403 goto err;
558540c17771ea Jan Schmidt 2011-06-13 404 }
558540c17771ea Jan Schmidt 2011-06-13 405
de2491fdefe7e5 David Sterba 2017-05-31 406 /*
de2491fdefe7e5 David Sterba 2017-05-31 407 * init_path might indirectly call vmalloc, or use GFP_KERNEL. Scrub
de2491fdefe7e5 David Sterba 2017-05-31 408 * uses GFP_NOFS in this context, so we keep it consistent but it does
de2491fdefe7e5 David Sterba 2017-05-31 409 * not seem to be strictly necessary.
de2491fdefe7e5 David Sterba 2017-05-31 410 */
de2491fdefe7e5 David Sterba 2017-05-31 411 nofs_flag = memalloc_nofs_save();
558540c17771ea Jan Schmidt 2011-06-13 412 ipath = init_ipath(4096, local_root, swarn->path);
de2491fdefe7e5 David Sterba 2017-05-31 413 memalloc_nofs_restore(nofs_flag);
26bdef541d26fd Dan Carpenter 2011-11-16 414 if (IS_ERR(ipath)) {
0024652895e347 Josef Bacik 2020-01-24 415 btrfs_put_root(local_root);
26bdef541d26fd Dan Carpenter 2011-11-16 416 ret = PTR_ERR(ipath);
26bdef541d26fd Dan Carpenter 2011-11-16 417 ipath = NULL;
26bdef541d26fd Dan Carpenter 2011-11-16 418 goto err;
26bdef541d26fd Dan Carpenter 2011-11-16 419 }
558540c17771ea Jan Schmidt 2011-06-13 420 ret = paths_from_inode(inum, ipath);
558540c17771ea Jan Schmidt 2011-06-13 421
558540c17771ea Jan Schmidt 2011-06-13 422 if (ret < 0)
558540c17771ea Jan Schmidt 2011-06-13 423 goto err;
558540c17771ea Jan Schmidt 2011-06-13 424
558540c17771ea Jan Schmidt 2011-06-13 425 /*
558540c17771ea Jan Schmidt 2011-06-13 426 * we deliberately ignore the bit ipath might have been too small to
558540c17771ea Jan Schmidt 2011-06-13 427 * hold all of the paths here
558540c17771ea Jan Schmidt 2011-06-13 428 */
558540c17771ea Jan Schmidt 2011-06-13 429 for (i = 0; i < ipath->fspath->elem_cnt; ++i)
5d163e0e68ce74 Jeff Mahoney 2016-09-20 @430 btrfs_warn_in_rcu(fs_info,
023af88ce10a69 Qu Wenruo 2024-03-14 431 "%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
023af88ce10a69 Qu Wenruo 2024-03-14 432 swarn->errstr, root, inum,
023af88ce10a69 Qu Wenruo 2024-03-14 433 (char *)ipath->fspath->val[i], offset,
023af88ce10a69 Qu Wenruo 2024-03-14 434 swarn->logical, swarn->mirror_num,
023af88ce10a69 Qu Wenruo 2024-03-14 435 swarn->dev->devid, btrfs_dev_name(swarn->dev),
023af88ce10a69 Qu Wenruo 2024-03-14 436 swarn->physical);
558540c17771ea Jan Schmidt 2011-06-13 437
0024652895e347 Josef Bacik 2020-01-24 438 btrfs_put_root(local_root);
558540c17771ea Jan Schmidt 2011-06-13 439 free_ipath(ipath);
558540c17771ea Jan Schmidt 2011-06-13 440 return 0;
558540c17771ea Jan Schmidt 2011-06-13 441
558540c17771ea Jan Schmidt 2011-06-13 442 err:
5d163e0e68ce74 Jeff Mahoney 2016-09-20 443 btrfs_warn_in_rcu(fs_info,
023af88ce10a69 Qu Wenruo 2024-03-14 444 "%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
023af88ce10a69 Qu Wenruo 2024-03-14 445 swarn->errstr, root, inum, offset,
023af88ce10a69 Qu Wenruo 2024-03-14 446 swarn->logical, swarn->mirror_num,
023af88ce10a69 Qu Wenruo 2024-03-14 447 swarn->dev->devid, btrfs_dev_name(swarn->dev),
023af88ce10a69 Qu Wenruo 2024-03-14 448 swarn->physical);
558540c17771ea Jan Schmidt 2011-06-13 449 free_ipath(ipath);
558540c17771ea Jan Schmidt 2011-06-13 450 return 0;
558540c17771ea Jan Schmidt 2011-06-13 451 }
558540c17771ea Jan Schmidt 2011-06-13 452
On Thu, Mar 14, 2024 at 8:56 PM Qu Wenruo <wqu@suse.com> wrote: > > > > 在 2024/3/15 04:10, Filipe Manana 写道: > > On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote: > >> > >> Currently the scrub error report is pretty long: > >> > >> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872 > >> BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1) > >> > >> Since we have so many things to output, it's not a surprise we got long > >> error lines. > >> > >> To make the lines a little shorter, and make important info more > >> obvious, change the unify output to something like this: > >> > >> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 > >> BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 > > > > I find that hard to read because: > > > > 1) Please leave spaces before opening parenthesis. > > That makes things less cluttered and easier to the eyes, more > > consistent with what we generally do and it's the formal way to do > > things in English (see > > https://www.scribens.com/grammar-rules/typography/spacing.html); > > Yep, I can do that, but I also want to keep the involved members > together thus closer. > > Not sure if adding spaces would still keep the close relationships > between the values. > > E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1 > (scratch1) 123456 > > It makes it a little harder to indicate that "(1)" is related to logical > address (thus mirror number). > > > > > 2) Instead of "inode 5/257(file1)", something a lot easier to > > understand like "inode 5 root 257 path \"file1\"", which leaves no > > margin for doubt about what each number is; > > > > 3) Same with "logical 13647872(1)" - what is the 1? That will be the > > question anyone reading such a log message will likely have. > > Something like "logical 13647872 mirror 1" makes it clear; > > > > 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872". > > Something like "physical 13647872 device ID 1 device path > > \"/dev/mapper/test-scratch1\"", makes it clear what each number is and > > easier to read. > > I totally understand the original output format is much easier to read > on its own. > > However if we have hundreds lines of similar output, it's a different > story, a small change in any of the value is much harder to catch, and > the extra helpful prompt is in fact a burden other than a bless. > > (That's why things like spreadsheet is much easier to reader for > multiple similarly structured data, and in that case it's much better to > craft a script to convert the lines into a csv) > > Unfortunately we don't have the benefit (at least yet) to structure all > these info into a structured output. > > > But what I'm doing here is to reduce the prompts to minimal (at most 4 > prompts, "inode", "fileoff", "logical", "physical"), so less duplicated > strings for multiple lines of similar error messages. > > The patchset is in the middle ground between fully detailed output (the > old one, focusing on single line) and the fully script orient output (no > prompt at all, just all numbers/strings, focusing on CSV like output). > > > Furthermore, I would also argue that, for entry level end users, they > can still catch the critical info (like file path and device path) > without much difficult, and those info is enough for them to take action. > (E.g. deleting the file, or replace the disk) > > Yes, they would get confused on the devid or rootid, but that's fine and > everyone can learn something new. > > For experienced users who understand basics of btrfs, or us developers, > the split in values are already arranged in a way similar sized values > are never close together (aka, string/large/small value split). Well, I'm a developer and I can tell you if I ever see such log messages, I'll have to go to the source code to figure out what each value is supposed to be. I'll be able to memorize for some hours or even a few days, but after that I'll forget again and have to look at the source code again. > > Thanks, > Qu > > > > > Thanks. > >
在 2024/3/19 02:56, Filipe Manana 写道: > On Thu, Mar 14, 2024 at 8:56 PM Qu Wenruo <wqu@suse.com> wrote: >> >> >> >> 在 2024/3/15 04:10, Filipe Manana 写道: >>> On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote: >>>> >>>> Currently the scrub error report is pretty long: >>>> >>>> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872 >>>> BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1) >>>> >>>> Since we have so many things to output, it's not a surprise we got long >>>> error lines. >>>> >>>> To make the lines a little shorter, and make important info more >>>> obvious, change the unify output to something like this: >>>> >>>> BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 >>>> BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 >>> >>> I find that hard to read because: >>> >>> 1) Please leave spaces before opening parenthesis. >>> That makes things less cluttered and easier to the eyes, more >>> consistent with what we generally do and it's the formal way to do >>> things in English (see >>> https://www.scribens.com/grammar-rules/typography/spacing.html); >> >> Yep, I can do that, but I also want to keep the involved members >> together thus closer. >> >> Not sure if adding spaces would still keep the close relationships >> between the values. >> >> E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1 >> (scratch1) 123456 >> >> It makes it a little harder to indicate that "(1)" is related to logical >> address (thus mirror number). >> >>> >>> 2) Instead of "inode 5/257(file1)", something a lot easier to >>> understand like "inode 5 root 257 path \"file1\"", which leaves no >>> margin for doubt about what each number is; >>> >>> 3) Same with "logical 13647872(1)" - what is the 1? That will be the >>> question anyone reading such a log message will likely have. >>> Something like "logical 13647872 mirror 1" makes it clear; >>> >>> 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872". >>> Something like "physical 13647872 device ID 1 device path >>> \"/dev/mapper/test-scratch1\"", makes it clear what each number is and >>> easier to read. >> >> I totally understand the original output format is much easier to read >> on its own. >> >> However if we have hundreds lines of similar output, it's a different >> story, a small change in any of the value is much harder to catch, and >> the extra helpful prompt is in fact a burden other than a bless. >> >> (That's why things like spreadsheet is much easier to reader for >> multiple similarly structured data, and in that case it's much better to >> craft a script to convert the lines into a csv) >> >> Unfortunately we don't have the benefit (at least yet) to structure all >> these info into a structured output. >> >> >> But what I'm doing here is to reduce the prompts to minimal (at most 4 >> prompts, "inode", "fileoff", "logical", "physical"), so less duplicated >> strings for multiple lines of similar error messages. >> >> The patchset is in the middle ground between fully detailed output (the >> old one, focusing on single line) and the fully script orient output (no >> prompt at all, just all numbers/strings, focusing on CSV like output). >> >> >> Furthermore, I would also argue that, for entry level end users, they >> can still catch the critical info (like file path and device path) >> without much difficult, and those info is enough for them to take action. >> (E.g. deleting the file, or replace the disk) >> >> Yes, they would get confused on the devid or rootid, but that's fine and >> everyone can learn something new. >> >> For experienced users who understand basics of btrfs, or us developers, >> the split in values are already arranged in a way similar sized values >> are never close together (aka, string/large/small value split). > > Well, I'm a developer and I can tell you if I ever see such log > messages, I'll have to go to the source code to figure out what each > value is supposed to be. > I'll be able to memorize for some hours or even a few days, but after > that I'll forget again and have to look at the source code again. As the core problem is, if you just see one such error, and you need to investigate, yes it needs more digging. But in the real world, the support cases I hit are all have tons of similar lines almost flooding the dmesg (well, in that case they don't include any useful info either). In that case, digging the format once is not a big deal at all, in fact the main heavy lifting part is to determine how severe the corruption is. And I'm not going full csv-like output, we still have several prompts to provide some help. Thanks, Qu > > >> >> Thanks, >> Qu >> >>> >>> Thanks. >>> >
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 18b2ee3b1616..17e492076bf8 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -226,6 +226,7 @@ struct scrub_warning { u64 physical; u64 logical; struct btrfs_device *dev; + int mirror_num; }; static void release_scrub_stripe(struct scrub_stripe *stripe) @@ -427,12 +428,12 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, */ for (i = 0; i < ipath->fspath->elem_cnt; ++i) btrfs_warn_in_rcu(fs_info, -"%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu, path: %s", - swarn->errstr, swarn->logical, - btrfs_dev_name(swarn->dev), - swarn->physical, - root, inum, offset, - (char *)(unsigned long)ipath->fspath->val[i]); +"%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", + swarn->errstr, root, inum, + (char *)ipath->fspath->val[i], offset, + swarn->logical, swarn->mirror_num, + swarn->dev->devid, btrfs_dev_name(swarn->dev), + swarn->physical); btrfs_put_root(local_root); free_ipath(ipath); @@ -440,18 +441,17 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, err: btrfs_warn_in_rcu(fs_info, - "%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu: path resolving failed with ret=%d", - swarn->errstr, swarn->logical, - btrfs_dev_name(swarn->dev), - swarn->physical, - root, inum, offset, ret); - + "%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu", + swarn->errstr, root, inum, offset, + swarn->logical, swarn->mirror_num, + swarn->dev->devid, btrfs_dev_name(swarn->dev), + swarn->physical); free_ipath(ipath); return 0; } static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev, - u64 logical, u64 physical) + u64 logical, u64 physical, int mirror_num) { struct btrfs_fs_info *fs_info = dev->fs_info; struct btrfs_path *path; @@ -471,6 +471,7 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device * swarn.logical = logical; swarn.errstr = errstr; swarn.dev = NULL; + swarn.mirror_num = mirror_num; ret = extent_from_logical(fs_info, swarn.logical, path, &found_key, &flags); @@ -501,10 +502,11 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device * if (ret > 0) break; btrfs_warn_in_rcu(fs_info, -"%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree %llu", - errstr, swarn.logical, btrfs_dev_name(dev), - swarn.physical, (ref_level ? "node" : "leaf"), - ref_level, ref_root); +"%s at metadata in tree %llu(%u), logical %llu(%u) physical %llu(%s)%llu", + errstr, ref_root, ref_level, + swarn.logical, swarn.mirror_num, + dev->devid, btrfs_dev_name(dev), + swarn.physical); } btrfs_release_path(path); } else { @@ -879,27 +881,32 @@ static void scrub_stripe_report_errors(struct scrub_ctx *sctx, */ if (repaired) { btrfs_err_rl_in_rcu(fs_info, - "fixed up error at logical %llu on dev %s physical %llu", - logical, btrfs_dev_name(dev), + "fixed up error at logical %llu(%u) physical %llu(%s)%llu", + logical, stripe->mirror_num, + dev->devid, btrfs_dev_name(dev), physical); continue; } /* The remaining are all for unrepaired. */ btrfs_err_rl_in_rcu(fs_info, - "unable to fixup (regular) error at logical %llu on dev %s physical %llu", - logical, btrfs_dev_name(dev), + "unable to fixup (regular) error at logical %llu(%u) physical %llu(%s)%llu", + logical, stripe->mirror_num, + dev->devid, btrfs_dev_name(dev), physical); if (test_bit(sector_nr, &stripe->io_error_bitmap)) scrub_print_common_warning("i/o error", dev, - logical, physical); + logical, physical, + stripe->mirror_num); if (test_bit(sector_nr, &stripe->csum_error_bitmap)) scrub_print_common_warning("checksum error", dev, - logical, physical); + logical, physical, + stripe->mirror_num); if (test_bit(sector_nr, &stripe->meta_error_bitmap)) scrub_print_common_warning("header error", dev, - logical, physical); + logical, physical, + stripe->mirror_num); } spin_lock(&sctx->stat_lock);
Currently the scrub error report is pretty long: BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872 BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1) Since we have so many things to output, it's not a surprise we got long error lines. To make the lines a little shorter, and make important info more obvious, change the unify output to something like this: BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872 The idea is, to put large values/small values/string separated meanwhile skip the some descriptor directly: - "logical LOGICAL(MIRROR)" LOGICAL is always a large value, meanwhile MIRROR is always a single digit. Thus combining them together, human can still split them instinctively. - "physical DEVID(DEVPATH)PHYSICAL" DEVID is normally a short number, DEVPATH is a long string, and PHYSICAL is a normally a large number. And for most end users, the most important device path is still easy to catch, even surrounded by some large numbers. - inode ROOT/INO(PATH) To locate a btrfs inode, we have to provide both root and inode number. Normally ROOT should be a much smaller number (3 digits) meanwhile the INO can be pretty large. However for the end user, the most important thing is the PATH, which is still not hard to locate. Any better ideas would be appreciated to make those lines shorter. However what we really need is a proper way to report all those info (maybe put the file/dev name resolution into the user space) to user space reliably (without rate limit). Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/scrub.c | 55 +++++++++++++++++++++++++++--------------------- 1 file changed, 31 insertions(+), 24 deletions(-)