Message ID | 20240503095353.3798063-8-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | enable bs > ps in XFS | expand |
On Fri, May 03, 2024 at 02:53:49AM -0700, Luis Chamberlain wrote: > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > GFP_KERNEL); > + > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > bio->bi_private = dio; > bio->bi_end_io = iomap_dio_bio_end_io; > > - __bio_add_page(bio, page, len, 0); > + while (len) { > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > + > + __bio_add_page(bio, page, io_len, 0); > + len -= io_len; > + } > iomap_dio_submit_bio(iter, dio, bio, pos); If the len is more than PAGE_SIZE * BIO_MAX_VECS, __bio_add_page() will fail silently. I hate this interface. You should be doing something like ... while (len) { unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); while (!bio || bio_add_page() < io_len) { if (bio) iomap_dio_submit_bio(iter, dio, bio, pos); bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, GFP_KERNEL); } }
On Tue, May 07, 2024 at 05:00:28PM +0100, Matthew Wilcox wrote: > If the len is more than PAGE_SIZE * BIO_MAX_VECS, __bio_add_page() > will fail silently. I hate this interface. No, it won't. You can pass an arbitray len to it. > > You should be doing something like ... > > while (len) { > unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > while (!bio || bio_add_page() < io_len) { > if (bio) > iomap_dio_submit_bio(iter, dio, bio, pos); > bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > fscrypt_set_bio_crypt_ctx(bio, inode, > pos >> inode->i_blkbits, GFP_KERNEL); > } > } Wee, no. The right way is: bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); __bio_add_page(bio, page, len, 0); fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, GFP_KERNEL); (or even better the folio version)
On Tue, May 07, 2024 at 09:10:15AM -0700, Christoph Hellwig wrote: > On Tue, May 07, 2024 at 05:00:28PM +0100, Matthew Wilcox wrote: > > If the len is more than PAGE_SIZE * BIO_MAX_VECS, __bio_add_page() > > will fail silently. I hate this interface. > > No, it won't. You can pass an arbitray len to it. > > > > > You should be doing something like ... > > > > while (len) { > > unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > > > while (!bio || bio_add_page() < io_len) { > > if (bio) > > iomap_dio_submit_bio(iter, dio, bio, pos); > > bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > > REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > fscrypt_set_bio_crypt_ctx(bio, inode, > > pos >> inode->i_blkbits, GFP_KERNEL); > > } > > } > > Wee, no. The right way is: > > bio = iomap_dio_alloc_bio(iter, dio, 1, > REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > __bio_add_page(bio, page, len, 0); no? len can be > PAGE_SIZE. and that can be true in the folio version too, because we won't necessarily be able to allocate the THP. > fscrypt_set_bio_crypt_ctx(bio, inode, > pos >> inode->i_blkbits, GFP_KERNEL); > > (or even better the folio version)
On Tue, May 07, 2024 at 05:11:58PM +0100, Matthew Wilcox wrote: > > __bio_add_page(bio, page, len, 0); > > no? len can be > PAGE_SIZE. Yes. So what?
On Tue, May 07, 2024 at 09:13:17AM -0700, Christoph Hellwig wrote: > On Tue, May 07, 2024 at 05:11:58PM +0100, Matthew Wilcox wrote: > > > __bio_add_page(bio, page, len, 0); > > > > no? len can be > PAGE_SIZE. > > Yes. So what? the zero_page is only PAGE_SIZE bytes long. so you'd be writing from the page that's after the zero page, whatever contents that has.
On Tue, May 07, 2024 at 05:00:28PM +0100, Matthew Wilcox wrote: > On Fri, May 03, 2024 at 02:53:49AM -0700, Luis Chamberlain wrote: > > + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > > + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > > fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, > > GFP_KERNEL); > > + > > bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); > > bio->bi_private = dio; > > bio->bi_end_io = iomap_dio_bio_end_io; > > > > - __bio_add_page(bio, page, len, 0); > > + while (len) { > > + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > + > > + __bio_add_page(bio, page, io_len, 0); > > + len -= io_len; > > + } > > iomap_dio_submit_bio(iter, dio, bio, pos); > > If the len is more than PAGE_SIZE * BIO_MAX_VECS, __bio_add_page() > will fail silently. I hate this interface. I added a WARN_ON_ONCE() at the start of the function so that it does not silently fail: WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); This function is used to do only sub block zeroing, and I don't think we will cross 1MB block size in the forseeable future, and even if we do, we have this to warn us about so that it can be changed? > > You should be doing something like ... > > while (len) { > unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); > > while (!bio || bio_add_page() < io_len) { > if (bio) > iomap_dio_submit_bio(iter, dio, bio, pos); > bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, > REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); > fscrypt_set_bio_crypt_ctx(bio, inode, > pos >> inode->i_blkbits, GFP_KERNEL); > } > }
On Wed, May 08, 2024 at 05:24:53AM +0100, Matthew Wilcox wrote: > On Tue, May 07, 2024 at 09:13:17AM -0700, Christoph Hellwig wrote: > > On Tue, May 07, 2024 at 05:11:58PM +0100, Matthew Wilcox wrote: > > > > __bio_add_page(bio, page, len, 0); > > > > > > no? len can be > PAGE_SIZE. > > > > Yes. So what? > > the zero_page is only PAGE_SIZE bytes long. so you'd be writing > from the page that's after the zero page, whatever contents that has. This is the exact reason this patch was added. We were writing the garbage value past the PAGE_SIZE for LBS that led to FS corruption. Not an issue where FS block size <= page size.
On Wed, May 08, 2024 at 05:24:53AM +0100, Matthew Wilcox wrote: > On Tue, May 07, 2024 at 09:13:17AM -0700, Christoph Hellwig wrote: > > On Tue, May 07, 2024 at 05:11:58PM +0100, Matthew Wilcox wrote: > > > > __bio_add_page(bio, page, len, 0); > > > > > > no? len can be > PAGE_SIZE. > > > > Yes. So what? > > the zero_page is only PAGE_SIZE bytes long. so you'd be writing > from the page that's after the zero page, whatever contents that has. Except that the whole point of the exercise is to use the huge folio so that we don't run past the end of the zero page. Yes, if we use ZERO_PAGE we need to chunk things up and use bio_add_page instead bio_page, check the return value and potentially deal with multiple bios. I'd rather avoid that, though.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index f3b43d223a46..5f481068de5b 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, struct page *page = ZERO_PAGE(0); struct bio *bio; - bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); + WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE)); + + bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS, + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, GFP_KERNEL); + bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; - __bio_add_page(bio, page, len, 0); + while (len) { + unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE); + + __bio_add_page(bio, page, io_len, 0); + len -= io_len; + } iomap_dio_submit_bio(iter, dio, bio, pos); }