Message ID | 20240607055912.3586772-4-hch@lst.de (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | [01/11] dm-integrity: use the nop integrity profile | expand |
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_11-PR | fail | merge-conflict |
On 6/7/24 07:58, Christoph Hellwig wrote: > Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk > checksum type instead. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Bart Van Assche <bvanassche@acm.org> > Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> > Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> > --- > block/bio-integrity.c | 3 --- > drivers/scsi/sd.c | 6 +++--- > include/linux/bio.h | 5 ++--- > 3 files changed, 5 insertions(+), 9 deletions(-) > Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes
Christoph, > Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk > checksum type instead. This removes the ability to submit an individual I/O using a CRC instead of the IP checksum. There are cases which can't be expressed when the controller is operating in IP checksum mode.
On Mon, Jun 10, 2024 at 07:56:00AM -0400, Martin K. Petersen wrote: > > Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk > > checksum type instead. > > This removes the ability to submit an individual I/O using a CRC instead > of the IP checksum. There are cases which can't be expressed when > the controller is operating in IP checksum mode. Huh, how?
Christoph, >> This removes the ability to submit an individual I/O using a CRC >> instead of the IP checksum. There are cases which can't be expressed >> when the controller is operating in IP checksum mode. > > Huh, how? On the wire between controller and target there's only CRC. If I want to write a "bad" CRC to disk, I have switch the controller to CRC mode. The controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test tooling relies heavily on being able to write "bad" things to disk and read them back to validate that we detect the error.
On Mon, Jun 10, 2024 at 08:19:33AM -0400, Martin K. Petersen wrote: > On the wire between controller and target there's only CRC. If I want to > write a "bad" CRC to disk, I have switch the controller to CRC mode. The > controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test > tooling relies heavily on being able to write "bad" things to disk and > read them back to validate that we detect the error. But how do you even toggle the flag? There is no no code to do that. And if you already have a special kernel module for that it really should just use a passthrough request to take care of that. Note that unlike the NOCHECK flag which I just cleaned up because they were unused, this one actually does get in the way of the architecture of the whole series :( We could add a per-bip csum_type but it would feel really weird.
Christoph, Sorry about the delay. Travel got in the way. >> On the wire between controller and target there's only CRC. If I want to >> write a "bad" CRC to disk, I have switch the controller to CRC mode. The >> controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test >> tooling relies heavily on being able to write "bad" things to disk and >> read them back to validate that we detect the error. > > But how do you even toggle the flag? There is no no code to do that. > And if you already have a special kernel module for that it really > should just use a passthrough request to take care of that. A passthrough command to the controller? > Note that unlike the NOCHECK flag which I just cleaned up because they > were unused, this one actually does get in the way of the architecture > of the whole series :( We could add a per-bip csum_type but it would > feel really weird. Why would it feel weird? That's how it currently works. The qualification tool issues a flurry of commands injecting errors at various places in the stack to identify that the right entity (block layer, controller, storage device) catch a bad checksum, reference tag, etc. Being able to enable/disable checking at each place in the stack is important. I also have code for target that does the same thing in the reverse direction.
On Tue, Jun 11, 2024 at 03:51:27PM -0400, Martin K. Petersen wrote: > > But how do you even toggle the flag? There is no no code to do that. > > And if you already have a special kernel module for that it really > > should just use a passthrough request to take care of that. > > A passthrough command to the controller? A passthrough command to the LU that has a mismatching checksum. > > Note that unlike the NOCHECK flag which I just cleaned up because they > > were unused, this one actually does get in the way of the architecture > > of the whole series :( We could add a per-bip csum_type but it would > > feel really weird. > > Why would it feel weird? That's how it currently works. Because there's no way to have it set to anything but the per-queue one. > The qualification tool issues a flurry of commands injecting errors at > various places in the stack to identify that the right entity (block > layer, controller, storage device) catch a bad checksum, reference tag, > etc. How does it do that? There's no actualy way to make it mismatch.
Christoph, >> > Note that unlike the NOCHECK flag which I just cleaned up because they >> > were unused, this one actually does get in the way of the architecture >> > of the whole series :( We could add a per-bip csum_type but it would >> > feel really weird. >> >> Why would it feel weird? That's how it currently works. > > Because there's no way to have it set to anything but the per-queue > one. That's what the io_uring passthrough changes enable. Note that the IP checksum is an optional performance feature. A SCSI controller supporting IP-to-CRC conversion does not imply that all submitted metadata must use IP checksum format. The T10 CRC used to be painfully slow to calculate prior to processors growing support for pclmulqdq or similar. Hence the optional IP checksum. But on a modern CPU, the T10 CRC can often be calculated fast enough that it is less of a performance impediment. The interface was explicitly designed so that the entity which creates the metadata decides which checksum it wants to use. And then it uses the bip flag to communicate that to the HBA. The patch which allowed the user to set the desired guard tag format for block layer-owned PI fell by the wayside, apparently. Possibly lost track because the T10 CRC hardware offload changes took a while to land. Note that I would personally love to get rid of the IP checksum altogether but I think it's too soon to make it obsolete. Still a lot of SCSI stuff out there which runs in IP checksum mode. And it is still a bit faster than CRC for many workloads. And as long as it is in use, we need the ability to support it and qualify it. All I'm asking is that we retain the ability to disable checking at the controller level and at the target level. And that the optional IP checksum can be selected on a per-I/O basis. IOW, please just retain the three existing bip flags. Happy to look into what polarity-reversal would look like but I don't think that should hold up your series. >> The qualification tool issues a flurry of commands injecting errors at >> various places in the stack to identify that the right entity (block >> layer, controller, storage device) catch a bad checksum, reference tag, >> etc. > > How does it do that? There's no actualy way to make it mismatch. Through a custom passthrough driver that we want to get rid of and replace with the io_uring interface series.
On Wed, Jun 12, 2024 at 01:27:47PM -0400, Martin K. Petersen wrote: > >> > Note that unlike the NOCHECK flag which I just cleaned up because they > >> > were unused, this one actually does get in the way of the architecture > >> > of the whole series :( We could add a per-bip csum_type but it would > >> > feel really weird. > >> > >> Why would it feel weird? That's how it currently works. > > > > Because there's no way to have it set to anything but the per-queue > > one. > > That's what the io_uring passthrough changes enable. The checksum type? How is that compatible with nvme? Anyway, I'll just leave this flag in for the resend, but if we can't come up with a coherent user for it in a merge cycle or two (which I very much doubt) I'll send another patch to remove it.
Christoph,
> The checksum type? How is that compatible with nvme?
NVMe does not support IP checksum.
On Thu, Jun 13, 2024 at 08:57:26PM -0400, Martin K. Petersen wrote: > > Christoph, > > > The checksum type? How is that compatible with nvme? > > NVMe does not support IP checksum. Yes, I know. So how do you make an interface work where you can arbitrarily force it?
diff --git a/block/bio-integrity.c b/block/bio-integrity.c index af7f71d16114de..c69da65759af89 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -488,9 +488,6 @@ bool bio_integrity_prep(struct bio *bio) bip->bip_flags |= BIP_BLOCK_INTEGRITY; bip_set_seed(bip, bio->bi_iter.bi_sector); - if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM) - bip->bip_flags |= BIP_IP_CHECKSUM; - /* Map it */ offset = offset_in_page(buf); for (i = 0; i < nr_pages && len > 0; i++) { diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index b477383ccc3b2a..e21b7df5c31b0d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -43,7 +43,7 @@ #include <linux/idr.h> #include <linux/interrupt.h> #include <linux/init.h> -#include <linux/blkdev.h> +#include <linux/blk-integrity.h> #include <linux/blkpg.h> #include <linux/blk-pm.h> #include <linux/delay.h> @@ -799,12 +799,12 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd, unsigned int dix, unsigned int dif) { struct request *rq = scsi_cmd_to_rq(scmd); - struct bio *bio = rq->bio; + struct blk_integrity *bi = &rq->q->integrity; unsigned int prot_op = sd_prot_op(rq_data_dir(rq), dix, dif); unsigned int protect = 0; if (dix) { /* DIX Type 0, 1, 2, 3 */ - if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM)) + if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM) scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM; scmd->prot_flags |= SCSI_PROT_GUARD_CHECK; } diff --git a/include/linux/bio.h b/include/linux/bio.h index ec5dcf8635ac66..3295dd6021659b 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -324,9 +324,8 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio) enum bip_flags { BIP_BLOCK_INTEGRITY = 1 << 0, /* block layer owns integrity data */ BIP_MAPPED_INTEGRITY = 1 << 1, /* ref tag has been remapped */ - BIP_IP_CHECKSUM = 1 << 4, /* IP checksum */ - BIP_INTEGRITY_USER = 1 << 5, /* Integrity payload is user address */ - BIP_COPY_USER = 1 << 6, /* Kernel bounce buffer in use */ + BIP_INTEGRITY_USER = 1 << 2, /* Integrity payload is user address */ + BIP_COPY_USER = 1 << 3, /* Kernel bounce buffer in use */ }; /*