Message ID | 20250129140207.22718-4-joshi.k@samsung.com (mailing list archive) |
---|---|
State | New |
Series | Btrfs checksum offload |
On 2025/1/30 00:32, Kanchan Joshi wrote:
> Add new mount option 'datasum_offload'.
>
> When passed
> - Data checksumming at the FS level is disabled.
> - Data checksumming at the device level is enabled. This is done by
> setting REQ_INTEGRITY_OFFLOAD flag for data I/O if the underlying device is
> capable.
>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>

As Christoph mentioned, the change to the csum tree is an on-disk
format change, which requires an extra incompat flag.
I believe that's fine because this is only a prototype.

But my concern is that the lack of a csum tree has the following problems:

- It requires all devices in the btrfs to have the same capability
  E.g. you can no longer add a device without the REQ_INTEGRITY_OFFLOAD
  capability.

  This can be a very big problem, especially if one just wants to migrate
  the fs to another device.

- It is less versatile than the nodatacsum flag/mount option
  The NODATACSUM flag can be set on a per-inode basis, but the new
  no-datacsum behavior is bound to the hardware storage.

And finally, my question is why btrfs datacsum should be removed at all.

I understand the device's own checksum is super fast and efficient, but
that doesn't mean checksums at different layers are mutually exclusive.

Yes, it causes some extra workload, but since it's handled by hardware
there is no obvious penalty.
On the other hand, it adds one extra layer of protection on top of the
existing btrfs checksum.

Thus if the end user fully trusts the hardware's protection, they can
just use the nodatasum mount option and call it a day.
The benefit you have shown is really just the benefit of the "nodatasum"
behavior, and that's more or less expected due to the COW overhead.

So I would really prefer to let the end user choose what they want.
If they want to rely fully on the hardware's internal checksum, then
configure the block device to do it, and create a btrfs with nodatasum.

If they want both the btrfs and the hardware checksum, just do it the
usual way.

Thanks,
Qu

> ---
>  fs/btrfs/bio.c   | 12 ++++++++++++
>  fs/btrfs/fs.h    |  1 +
>  fs/btrfs/super.c |  9 +++++++++
>  3 files changed, 22 insertions(+)
>
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 7ea6f0b43b95..811d89c64991 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -5,6 +5,7 @@
>   */
>
>  #include <linux/bio.h>
> +#include <linux/blk-integrity.h>
>  #include "bio.h"
>  #include "ctree.h"
>  #include "volumes.h"
> @@ -424,6 +425,15 @@ static void btrfs_clone_write_end_io(struct bio *bio)
>  	bio_put(bio);
>  }
>
> +static void btrfs_prep_csum_offload_hw(struct btrfs_device *dev, struct bio *bio)
> +{
> +	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
> +
> +	if (btrfs_test_opt(dev->fs_info, DATASUM_OFFLOAD) &&
> +	    bi && bi->offload_type != BLK_INTEGRITY_OFFLOAD_NONE)
> +		bio->bi_opf |= REQ_INTEGRITY_OFFLOAD;
> +}
> +
>  static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
>  {
>  	if (!dev || !dev->bdev ||
> @@ -435,6 +445,8 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
>  	}
>
>  	bio_set_dev(bio, dev->bdev);
> +	if (!(bio->bi_opf & REQ_META))
> +		btrfs_prep_csum_offload_hw(dev, bio);
>
>  	/*
>  	 * For zone append writing, bi_sector must point the beginning of the
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 79a1a3d6f04d..88e493967100 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -228,6 +228,7 @@ enum {
>  	BTRFS_MOUNT_NOSPACECACHE		= (1ULL << 30),
>  	BTRFS_MOUNT_IGNOREMETACSUMS		= (1ULL << 31),
>  	BTRFS_MOUNT_IGNORESUPERFLAGS		= (1ULL << 32),
> +	BTRFS_MOUNT_DATASUM_OFFLOAD		= (1ULL << 33),
>  };
>
>  /*
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 7dfe5005129a..d0d5b35c2df9 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -121,6 +121,7 @@ enum {
>  	Opt_treelog,
>  	Opt_user_subvol_rm_allowed,
>  	Opt_norecovery,
> +	Opt_datasum_offload,
>
>  	/* Rescue options */
>  	Opt_rescue,
> @@ -223,6 +224,7 @@ static const struct fs_parameter_spec btrfs_fs_parameters[] = {
>  	fsparam_string("compress-force", Opt_compress_force_type),
>  	fsparam_flag_no("datacow", Opt_datacow),
>  	fsparam_flag_no("datasum", Opt_datasum),
> +	fsparam_flag_no("datasum_offload", Opt_datasum_offload),
>  	fsparam_flag("degraded", Opt_degraded),
>  	fsparam_string("device", Opt_device),
>  	fsparam_flag_no("discard", Opt_discard),
> @@ -323,6 +325,10 @@ static int btrfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  			btrfs_clear_opt(ctx->mount_opt, NODATASUM);
>  		}
>  		break;
> +	case Opt_datasum_offload:
> +		btrfs_set_opt(ctx->mount_opt, NODATASUM);
> +		btrfs_set_opt(ctx->mount_opt, DATASUM_OFFLOAD);
> +		break;
>  	case Opt_datacow:
>  		if (result.negated) {
>  			btrfs_clear_opt(ctx->mount_opt, COMPRESS);
> @@ -1057,6 +1063,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>  		seq_puts(seq, ",degraded");
>  	if (btrfs_test_opt(info, NODATASUM))
>  		seq_puts(seq, ",nodatasum");
> +	if (btrfs_test_opt(info, DATASUM_OFFLOAD))
> +		seq_puts(seq, ",datasum_offload");
>  	if (btrfs_test_opt(info, NODATACOW))
>  		seq_puts(seq, ",nodatacow");
>  	if (btrfs_test_opt(info, NOBARRIER))
> @@ -1434,6 +1442,7 @@ static void btrfs_emit_options(struct btrfs_fs_info *info,
>  	btrfs_info_if_set(info, old, NODATASUM, "setting nodatasum");
>  	btrfs_info_if_set(info, old, DEGRADED, "allowing degraded mounts");
>  	btrfs_info_if_set(info, old, NODATASUM, "setting nodatasum");
> +	btrfs_info_if_set(info, old, DATASUM_OFFLOAD, "setting datasum offload to the device");
>  	btrfs_info_if_set(info, old, SSD, "enabling ssd optimizations");
>  	btrfs_info_if_set(info, old, SSD_SPREAD, "using spread ssd allocation scheme");
>  	btrfs_info_if_set(info, old, NOBARRIER, "turning off barriers");
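
The mixed-device concern raised in the reply can be made concrete with a small userspace model of the per-bio decision in the quoted bio.c hunk. This is only a sketch, not kernel code: `struct model_dev`, `struct model_bio`, `enum csum_layer` and `csum_layer_for()` are hypothetical names, and the default (no `datasum_offload`) case assumes a mount with ordinary btrfs data checksums. The one rule it mirrors from the patch is: metadata is skipped, and the offload flag is set only when the option is on and the device reports an offload type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; none of these names exist in btrfs. */
struct model_dev {
	const char *name;
	bool integrity_offload;	/* models bi && bi->offload_type != BLK_INTEGRITY_OFFLOAD_NONE */
};

struct model_bio {
	bool is_meta;		/* models bio->bi_opf & REQ_META */
};

enum csum_layer { CSUM_BTRFS, CSUM_DEVICE, CSUM_NONE };

/* Which layer ends up checksumming a bio under the rules of the quoted patch. */
static enum csum_layer csum_layer_for(bool datasum_offload,
				      const struct model_dev *dev,
				      const struct model_bio *bio)
{
	assert(dev && bio);

	if (bio->is_meta)
		return CSUM_BTRFS;	/* the patch leaves metadata bios alone */
	if (!datasum_offload)
		return CSUM_BTRFS;	/* assumed default mount: data csums in the csum tree */
	if (dev->integrity_offload)
		return CSUM_DEVICE;	/* REQ_INTEGRITY_OFFLOAD would be set */
	return CSUM_NONE;		/* NODATASUM is set, but the device cannot help */
}

#ifdef CSUM_MODEL_DEMO	/* build with -DCSUM_MODEL_DEMO to run the demo */
int main(void)
{
	static const char *const names[] = { "btrfs", "device", "none" };
	const struct model_dev capable = { "offload-capable", true };
	const struct model_dev plain = { "plain", false };
	const struct model_bio data = { false };

	printf("%s: %s\n", capable.name, names[csum_layer_for(true, &capable, &data)]);
	printf("%s: %s\n", plain.name, names[csum_layer_for(true, &plain, &data)]);
	return 0;
}
#endif
```

In this model, a data bio sent to the non-capable device with `datasum_offload` set ends up with no checksum at either layer, which is the situation the reply describes for adding or migrating to a device without the capability.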
Add new mount option 'datasum_offload'.

When passed
- Data checksumming at the FS level is disabled.
- Data checksumming at the device level is enabled. This is done by
  setting REQ_INTEGRITY_OFFLOAD flag for data I/O if the underlying device is
  capable.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 fs/btrfs/bio.c   | 12 ++++++++++++
 fs/btrfs/fs.h    |  1 +
 fs/btrfs/super.c |  9 +++++++++
 3 files changed, 22 insertions(+)
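
The effect of the new option on the mount flags can be sketched in plain C. The following is a userspace model under stated assumptions, not the btrfs parser: `MODEL_NODATASUM`, `MODEL_DATASUM_OFFLOAD`, `model_parse_opt()` and `model_show_opts()` are hypothetical names, and the bit positions are illustrative rather than the kernel's values. It mirrors only what the patch shows: `datasum_offload` sets both flags, and the option display emits `nodatasum` before `datasum_offload`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Model flags; bit positions are illustrative, not the kernel's values. */
#define MODEL_NODATASUM		(1ULL << 0)
#define MODEL_DATASUM_OFFLOAD	(1ULL << 1)

/* Apply one option token to the flag word; returns 0 on success, -1 if unknown. */
static int model_parse_opt(const char *opt, uint64_t *flags)
{
	assert(opt && flags);

	if (strcmp(opt, "nodatasum") == 0) {
		*flags |= MODEL_NODATASUM;
	} else if (strcmp(opt, "datasum_offload") == 0) {
		/* As in the patch: offload implies no filesystem-level data csums. */
		*flags |= MODEL_NODATASUM;
		*flags |= MODEL_DATASUM_OFFLOAD;
	} else {
		return -1;
	}
	return 0;
}

/* Render the two flags in the order the show_options hunk prints them. */
static void model_show_opts(uint64_t flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s",
		 (flags & MODEL_NODATASUM) ? ",nodatasum" : "",
		 (flags & MODEL_DATASUM_OFFLOAD) ? ",datasum_offload" : "");
}

#ifdef MOUNT_MODEL_DEMO	/* build with -DMOUNT_MODEL_DEMO to run the demo */
int main(void)
{
	uint64_t flags = 0;
	char buf[64];

	if (model_parse_opt("datasum_offload", &flags) == 0) {
		model_show_opts(flags, buf, sizeof(buf));
		printf("options:%s\n", buf);
	}
	return 0;
}
#endif
```

Because the option is layered on top of NODATASUM, a filesystem mounted this way looks like a nodatasum mount to the rest of the model, with the offload flag as the only difference.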