Message ID | cover.1607976425.git.asml.silence@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | no-copy bvec | expand |
On Tue, Dec 15, 2020 at 12:20:19AM +0000, Pavel Begunkov wrote: > Instead of creating a full copy of iter->bvec into bio in direct I/O, > the patchset makes use of the one provided. It changes semantics and > obliges users of asynchronous kiocb to track bvec lifetime, and [1/6] > converts the only place that doesn't. Just think of one corner case: iov_iter(BVEC) may pass bvec table with zero length bvec, which may not be supported by block layer or driver, so this patchset has to address this case first. Please see 7e24969022cb ("block: allow for_each_bvec to support zero len bvec"). thanks, Ming
On 15/12/2020 01:41, Ming Lei wrote: > On Tue, Dec 15, 2020 at 12:20:19AM +0000, Pavel Begunkov wrote: >> Instead of creating a full copy of iter->bvec into bio in direct I/O, >> the patchset makes use of the one provided. It changes semantics and >> obliges users of asynchronous kiocb to track bvec lifetime, and [1/6] >> converts the only place that doesn't. > > Just think of one corner case: iov_iter(BVEC) may pass bvec table with zero > length bvec, which may not be supported by block layer or driver, so > this patchset has to address this case first. The easiest for me would be to fallback to copy if there are zero bvecs, e.g. finding such during iov_iter_alignment(), but do we know from where zero bvecs can came? As it's internals we may want to forbid them if there is not too much hassle. > > Please see 7e24969022cb ("block: allow for_each_bvec to support zero len bvec").
On Tue, Dec 15, 2020 at 11:14:20AM +0000, Pavel Begunkov wrote: > On 15/12/2020 01:41, Ming Lei wrote: > > On Tue, Dec 15, 2020 at 12:20:19AM +0000, Pavel Begunkov wrote: > >> Instead of creating a full copy of iter->bvec into bio in direct I/O, > >> the patchset makes use of the one provided. It changes semantics and > >> obliges users of asynchronous kiocb to track bvec lifetime, and [1/6] > >> converts the only place that doesn't. > > > > Just think of one corner case: iov_iter(BVEC) may pass bvec table with zero > > length bvec, which may not be supported by block layer or driver, so > > this patchset has to address this case first. > > The easiest for me would be to fallback to copy if there are zero bvecs, > e.g. finding such during iov_iter_alignment(), but do we know from where > zero bvecs can came? As it's internals we may want to forbid them if > there is not too much hassle. You may find clue from the following link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html Thanks, Ming
On 15/12/2020 12:03, Ming Lei wrote: > On Tue, Dec 15, 2020 at 11:14:20AM +0000, Pavel Begunkov wrote: >> On 15/12/2020 01:41, Ming Lei wrote: >>> On Tue, Dec 15, 2020 at 12:20:19AM +0000, Pavel Begunkov wrote: >>>> Instead of creating a full copy of iter->bvec into bio in direct I/O, >>>> the patchset makes use of the one provided. It changes semantics and >>>> obliges users of asynchronous kiocb to track bvec lifetime, and [1/6] >>>> converts the only place that doesn't. >>> >>> Just think of one corner case: iov_iter(BVEC) may pass bvec table with zero >>> length bvec, which may not be supported by block layer or driver, so >>> this patchset has to address this case first. >> >> The easiest for me would be to fallback to copy if there are zero bvecs, >> e.g. finding such during iov_iter_alignment(), but do we know from where >> zero bvecs can came? As it's internals we may want to forbid them if >> there is not too much hassle. > > You may find clue from the following link: > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html Thanks for the link! Al, you mentioned "Zero-length segments are not disallowed", do you have a strong opinion on that? Apart from already diverged behaviour from the block layer and getting in the way of this series, without it we'd also be able to remove some extra ifs, e.g. in iterate_bvec()
On Tue, Dec 15, 2020 at 02:05:35PM +0000, Pavel Begunkov wrote: > > You may find clue from the following link: > > > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html > > Thanks for the link! > > Al, you mentioned "Zero-length segments are not disallowed", do you have > a strong opinion on that? Apart from already diverged behaviour from the > block layer and getting in the way of this series, without it we'd also be > able to remove some extra ifs, e.g. in iterate_bvec() I'd prefer not to support zero-length ITER_BVEC and catching them early, as the block layer can't deal with them either. From a quick look at iter_file_splice_write it should be pretty trivial to fix there, although we'll need to audit other callers as well (even if I don't expect them to submit this degenerate case).
On 22/12/2020 14:11, Christoph Hellwig wrote: > On Tue, Dec 15, 2020 at 02:05:35PM +0000, Pavel Begunkov wrote: >>> You may find clue from the following link: >>> >>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html >> >> Thanks for the link! >> >> Al, you mentioned "Zero-length segments are not disallowed", do you have >> a strong opinion on that? Apart from already diverged behaviour from the >> block layer and getting in the way of this series, without it we'd also be >> able to remove some extra ifs, e.g. in iterate_bvec() > > I'd prefer not to support zero-length ITER_BVEC and catching them > early, as the block layer can't deal with them either. From a quick > look at iter_file_splice_write it should be pretty trivial to fix there, > although we'll need to audit other callers as well (even if I don't > expect them to submit this degenerate case). Can scatterlist have 0-len entries? Those are directly translated into bvecs, e.g. in nvme/target/io-cmd-file.c and target/target_core_file.c. I've audited most of others by this moment, they're fine. Thanks for other nits, they will go into next version.
On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: > Can scatterlist have 0-len entries? Those are directly translated into > bvecs, e.g. in nvme/target/io-cmd-file.c and target/target_core_file.c. > I've audited most of others by this moment, they're fine. For block layer SGLs we should never see them, and for nvme neither. I think the same is true for the SCSI target code, but please double check.
On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: > On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: > > Can scatterlist have 0-len entries? Those are directly translated > > into bvecs, e.g. in nvme/target/io-cmd-file.c and > > target/target_core_file.c. I've audited most of others by this > > moment, they're fine. > > For block layer SGLs we should never see them, and for nvme neither. > I think the same is true for the SCSI target code, but please double > check. Right, no-one ever wants to see a 0-len scatter list entry. The reason is that every driver uses the sgl to program the device DMA engine in the way NVME does. a 0 length sgl would be a dangerous corner case: some DMA engines would ignore it and others would go haywire, so if we ever let a 0 length list down into the driver, they'd have to understand the corner case behaviour of their DMA engine and filter it accordingly, which is why we disallow them in the upper levels, since they're effective nops anyway. James
On 2020-12-23 11:04 a.m., James Bottomley wrote: > On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: >> On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: >>> Can scatterlist have 0-len entries? Those are directly translated >>> into bvecs, e.g. in nvme/target/io-cmd-file.c and >>> target/target_core_file.c. I've audited most of others by this >>> moment, they're fine. >> >> For block layer SGLs we should never see them, and for nvme neither. >> I think the same is true for the SCSI target code, but please double >> check. > > Right, no-one ever wants to see a 0-len scatter list entry. The reason > is that every driver uses the sgl to program the device DMA engine in > the way NVME does. a 0 length sgl would be a dangerous corner case: > some DMA engines would ignore it and others would go haywire, so if we > ever let a 0 length list down into the driver, they'd have to > understand the corner case behaviour of their DMA engine and filter it > accordingly, which is why we disallow them in the upper levels, since > they're effective nops anyway. When using scatter gather lists at the far end (i.e. on the storage device) the T10 examples (WRITE SCATTERED and POPULATE TOKEN in SBC-4) explicitly allow the "number of logical blocks" in their sgl_s to be zero and state that it is _not_ to be considered an error. Doug Gilbert
On 23/12/2020 20:23, Douglas Gilbert wrote: > On 2020-12-23 11:04 a.m., James Bottomley wrote: >> On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: >>> On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: >>>> Can scatterlist have 0-len entries? Those are directly translated >>>> into bvecs, e.g. in nvme/target/io-cmd-file.c and >>>> target/target_core_file.c. I've audited most of others by this >>>> moment, they're fine. >>> >>> For block layer SGLs we should never see them, and for nvme neither. >>> I think the same is true for the SCSI target code, but please double >>> check. >> >> Right, no-one ever wants to see a 0-len scatter list entry. The reason >> is that every driver uses the sgl to program the device DMA engine in >> the way NVME does. a 0 length sgl would be a dangerous corner case: >> some DMA engines would ignore it and others would go haywire, so if we >> ever let a 0 length list down into the driver, they'd have to >> understand the corner case behaviour of their DMA engine and filter it >> accordingly, which is why we disallow them in the upper levels, since >> they're effective nops anyway. > > When using scatter gather lists at the far end (i.e. on the storage device) > the T10 examples (WRITE SCATTERED and POPULATE TOKEN in SBC-4) explicitly > allow the "number of logical blocks" in their sgl_s to be zero and state > that it is _not_ to be considered an error. It's fine for my case unless it leaks them out of device driver to the net/block layer/etc. Is it?
On Wed, Dec 23, 2020 at 08:32:45PM +0000, Pavel Begunkov wrote: > On 23/12/2020 20:23, Douglas Gilbert wrote: > > On 2020-12-23 11:04 a.m., James Bottomley wrote: > >> On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: > >>> On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: > >>>> Can scatterlist have 0-len entries? Those are directly translated > >>>> into bvecs, e.g. in nvme/target/io-cmd-file.c and > >>>> target/target_core_file.c. I've audited most of others by this > >>>> moment, they're fine. > >>> > >>> For block layer SGLs we should never see them, and for nvme neither. > >>> I think the same is true for the SCSI target code, but please double > >>> check. > >> > >> Right, no-one ever wants to see a 0-len scatter list entry.?? The reason > >> is that every driver uses the sgl to program the device DMA engine in > >> the way NVME does.?? a 0 length sgl would be a dangerous corner case: > >> some DMA engines would ignore it and others would go haywire, so if we > >> ever let a 0 length list down into the driver, they'd have to > >> understand the corner case behaviour of their DMA engine and filter it > >> accordingly, which is why we disallow them in the upper levels, since > >> they're effective nops anyway. > > > > When using scatter gather lists at the far end (i.e. on the storage device) > > the T10 examples (WRITE SCATTERED and POPULATE TOKEN in SBC-4) explicitly > > allow the "number of logical blocks" in their sgl_s to be zero and state > > that it is _not_ to be considered an error. > > It's fine for my case unless it leaks them out of device driver to the > net/block layer/etc. Is it? None of the SCSI Command mentions above are supported by Linux, nevermind mapped to struct scatterlist.
On 2020-12-24 1:41 a.m., Christoph Hellwig wrote: > On Wed, Dec 23, 2020 at 08:32:45PM +0000, Pavel Begunkov wrote: >> On 23/12/2020 20:23, Douglas Gilbert wrote: >>> On 2020-12-23 11:04 a.m., James Bottomley wrote: >>>> On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: >>>>> On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: >>>>>> Can scatterlist have 0-len entries? Those are directly translated >>>>>> into bvecs, e.g. in nvme/target/io-cmd-file.c and >>>>>> target/target_core_file.c. I've audited most of others by this >>>>>> moment, they're fine. >>>>> >>>>> For block layer SGLs we should never see them, and for nvme neither. >>>>> I think the same is true for the SCSI target code, but please double >>>>> check. >>>> >>>> Right, no-one ever wants to see a 0-len scatter list entry.?? The reason >>>> is that every driver uses the sgl to program the device DMA engine in >>>> the way NVME does.?? a 0 length sgl would be a dangerous corner case: >>>> some DMA engines would ignore it and others would go haywire, so if we >>>> ever let a 0 length list down into the driver, they'd have to >>>> understand the corner case behaviour of their DMA engine and filter it >>>> accordingly, which is why we disallow them in the upper levels, since >>>> they're effective nops anyway. >>> >>> When using scatter gather lists at the far end (i.e. on the storage device) >>> the T10 examples (WRITE SCATTERED and POPULATE TOKEN in SBC-4) explicitly >>> allow the "number of logical blocks" in their sgl_s to be zero and state >>> that it is _not_ to be considered an error. >> >> It's fine for my case unless it leaks them out of device driver to the >> net/block layer/etc. Is it? > > None of the SCSI Command mentions above are supported by Linux, > nevermind mapped to struct scatterlist. > The POPULATE TOKEN / WRITE USING TOKEN pair can be viewed as a subset of EXTENDED COPY (SPC-4) which also supports "range descriptors". It is not clear if target_core_xcopy.c supports these range descriptors but if it did, it would be trying to map them to struct scatterlist objects. That said, it would be easy to skip the "number of logical blocks" == 0 case when translating range descriptors to sgl_s. In my ddpt utility (a dd clone) I have generalized skip= and seek= to optionally take sgl_s. If the last element in one of those sgl_s is LBAn,0 then it is interpreted as "until the end of that device" which is further restricted if the other sgl has a "hard" length or count= is given. The point being a length of 0 can have meaning, a benefit lost with NVMe's 0-based counts. Doug Gilbert
On Wed, 2020-12-23 at 15:23 -0500, Douglas Gilbert wrote: > On 2020-12-23 11:04 a.m., James Bottomley wrote: > > On Wed, 2020-12-23 at 15:51 +0000, Christoph Hellwig wrote: > > > On Wed, Dec 23, 2020 at 12:52:59PM +0000, Pavel Begunkov wrote: > > > > Can scatterlist have 0-len entries? Those are directly > > > > translated into bvecs, e.g. in nvme/target/io-cmd-file.c and > > > > target/target_core_file.c. I've audited most of others by this > > > > moment, they're fine. > > > > > > For block layer SGLs we should never see them, and for nvme > > > neither. I think the same is true for the SCSI target code, but > > > please double check. > > > > Right, no-one ever wants to see a 0-len scatter list entry. The > > reason is that every driver uses the sgl to program the device DMA > > engine in the way NVME does. a 0 length sgl would be a dangerous > > corner case: some DMA engines would ignore it and others would go > > haywire, so if we ever let a 0 length list down into the driver, > > they'd have to understand the corner case behaviour of their DMA > > engine and filter it accordingly, which is why we disallow them in > > the upper levels, since they're effective nops anyway. > > When using scatter gather lists at the far end (i.e. on the storage > device) the T10 examples (WRITE SCATTERED and POPULATE TOKEN in SBC- > 4) explicitly allow the "number of logical blocks" in their sgl_s to > be zero and state that it is _not_ to be considered an error. But that's pretty irrelevant. The scatterlists that block has been constructing to drive DMA engines pre-date SCSI's addition of SGLs by decades (all SCSI commands before the object commands use a linear buffer which is implemented in the HBA engine as a scatterlist but not described by the SCSI standard as one). So the answer to the question should the block layer emit zero length sgl elements is "no" because they can confuse some DMA engines. If there's a more theoretical question of whether the target driver in adding commands it doesn't yet support should inject zero length SGL elements into block because SCSI allows it, the answer is still "no" because we don't want block to have SGLs that may confuse other DMA engines. There's lots of daft corner cases in the SCSI standard we don't implement and a nop for SGL elements seems to be one of the more hare brained because it adds no useful feature and merely causes compatibility issues. James