Message ID | 20240213093713.1753368-11-kernel@pankajraghav.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | enable bs > ps in XFS | expand |
On 2/13/24 10:37, Pankaj Raghav (Samsung) wrote: > From: Pankaj Raghav <p.raghav@samsung.com> > > iomap_dio_zero() will pad a fs block with zeroes if the direct IO size > < fs block size. iomap_dio_zero() has an implicit assumption that fs block > size < page_size. This is true for most filesystems at the moment. > > If the block size > page size, this will send the contents of the page > next to zero page(as len > PAGE_SIZE) to the underlying block device, > causing FS corruption. > > iomap is a generic infrastructure and it should not make any assumptions > about the fs block size and the page size of the system. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > fs/iomap/direct-io.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index bcd3f8cf5ea4..04f6c5548136 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, > struct page *page = ZERO_PAGE(0); > struct bio *bio; > > - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); > + > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > GFP_KERNEL); > + > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > bio->bi_private = dio; > bio->bi_end_io = iomap_dio_bio_end_io; > > - __bio_add_page(bio, page, len, 0); > + while (len) { > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > + > + __bio_add_page(bio, page, io_len, 0); > + len -= io_len; > + } > iomap_dio_submit_bio(iter, dio, bio, pos); > } > Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes
On Tue, Feb 13, 2024 at 10:37:09AM +0100, Pankaj Raghav (Samsung) wrote: > From: Pankaj Raghav <p.raghav@samsung.com> > > iomap_dio_zero() will pad a fs block with zeroes if the direct IO size > < fs block size. iomap_dio_zero() has an implicit assumption that fs block > size < page_size. This is true for most filesystems at the moment. > > If the block size > page size, this will send the contents of the page > next to zero page(as len > PAGE_SIZE) to the underlying block device, > causing FS corruption. > > iomap is a generic infrastructure and it should not make any assumptions > about the fs block size and the page size of the system. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > fs/iomap/direct-io.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index bcd3f8cf5ea4..04f6c5548136 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, > struct page *page = ZERO_PAGE(0); > struct bio *bio; > > - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); > + > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > GFP_KERNEL); > + > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > bio->bi_private = dio; > bio->bi_end_io = iomap_dio_bio_end_io; > > - __bio_add_page(bio, page, len, 0); > + while (len) { > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); What was the result of all that discussion about using the PMD-sized zero-folio the last time this patch was submitted? Did that prove to be unwieldly, or did it require enough extra surgery to become its own series? (The code here looks good to me.) --D > + > + __bio_add_page(bio, page, io_len, 0); > + len -= io_len; > + } > iomap_dio_submit_bio(iter, dio, bio, pos); > } > > -- > 2.43.0 > >
On Tue, Feb 13, 2024 at 08:30:37AM -0800, Darrick J. Wong wrote: > On Tue, Feb 13, 2024 at 10:37:09AM +0100, Pankaj Raghav (Samsung) wrote: > > From: Pankaj Raghav <p.raghav@samsung.com> > > > > iomap_dio_zero() will pad a fs block with zeroes if the direct IO size > > < fs block size. iomap_dio_zero() has an implicit assumption that fs block > > size < page_size. This is true for most filesystems at the moment. > > > > If the block size > page size, this will send the contents of the page > > next to zero page(as len > PAGE_SIZE) to the underlying block device, > > causing FS corruption. > > > > iomap is a generic infrastructure and it should not make any assumptions > > about the fs block size and the page size of the system. > > > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > > --- > > fs/iomap/direct-io.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > index bcd3f8cf5ea4..04f6c5548136 100644 > > --- a/fs/iomap/direct-io.c > > +++ b/fs/iomap/direct-io.c > > @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, > > struct page *page = ZERO_PAGE(0); > > struct bio *bio; > > > > - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); > > + > > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > > GFP_KERNEL); > > + > > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > > bio->bi_private = dio; > > bio->bi_end_io = iomap_dio_bio_end_io; > > > > - __bio_add_page(bio, page, len, 0); > > + while (len) { > > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > What was the result of all that discussion about using the PMD-sized > zero-folio the last time this patch was submitted? Did that prove to be > unwieldly, or did it require enough extra surgery to become its own > series? > It proved a bit unwieldly to me at least as I did not know any straight forward way to do it at the time. So I thought I will keep this approach as it is, and add support for the PMD-sized zero folio for later improvement. > (The code here looks good to me.) Thanks! > > --D > > > + > > + __bio_add_page(bio, page, io_len, 0); > > + len -= io_len; > > + } > > iomap_dio_submit_bio(iter, dio, bio, pos); > > } > > > > -- > > 2.43.0 > > > >
On Tue, Feb 13, 2024 at 10:27:32PM +0100, Pankaj Raghav (Samsung) wrote: > On Tue, Feb 13, 2024 at 08:30:37AM -0800, Darrick J. Wong wrote: > > On Tue, Feb 13, 2024 at 10:37:09AM +0100, Pankaj Raghav (Samsung) wrote: > > > From: Pankaj Raghav <p.raghav@samsung.com> > > > > > > iomap_dio_zero() will pad a fs block with zeroes if the direct IO size > > > < fs block size. iomap_dio_zero() has an implicit assumption that fs block > > > size < page_size. This is true for most filesystems at the moment. > > > > > > If the block size > page size, this will send the contents of the page > > > next to zero page(as len > PAGE_SIZE) to the underlying block device, > > > causing FS corruption. > > > > > > iomap is a generic infrastructure and it should not make any assumptions > > > about the fs block size and the page size of the system. > > > > > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > > > --- > > > fs/iomap/direct-io.c | 13 +++++++++++-- > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index bcd3f8cf5ea4..04f6c5548136 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, > > > struct page *page = ZERO_PAGE(0); > > > struct bio *bio; > > > > > > - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > > + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); > > > + > > > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > > > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > > > GFP_KERNEL); > > > + > > > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > > > bio->bi_private = dio; > > > bio->bi_end_io = iomap_dio_bio_end_io; > > > > > > - __bio_add_page(bio, page, len, 0); > > > + while (len) { > > > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > > > What was the result of all that discussion about using the PMD-sized > > zero-folio the last time this patch was submitted? Did that prove to be > > unwieldly, or did it require enough extra surgery to become its own > > series? > > > > It proved a bit unwieldly to me at least as I did not know any straight > forward way to do it at the time. So I thought I will keep this approach > as it is, and add support for the PMD-sized zero folio for later > improvement. > > > (The code here looks good to me.) > > Thanks! In that case I'll throw it on the testing pile and let's ask brauner to merge this for 6.9 if nothing blows up. Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > > > > --D > > > > > + > > > + __bio_add_page(bio, page, io_len, 0); > > > + len -= io_len; > > > + } > > > iomap_dio_submit_bio(iter, dio, bio, pos); > > > } > > > > > > -- > > > 2.43.0 > > > > > > >
> > In that case I'll throw it on the testing pile and let's ask brauner to > merge this for 6.9 if nothing blows up. > Sounds good. Thanks. > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > --D >
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index bcd3f8cf5ea4..04f6c5548136 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, struct page *page = ZERO_PAGE(0); struct bio *bio; - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); + + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, GFP_KERNEL); + bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; - __bio_add_page(bio, page, len, 0); + while (len) { + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); + + __bio_add_page(bio, page, io_len, 0); + len -= io_len; + } iomap_dio_submit_bio(iter, dio, bio, pos); }