Message ID | 20200406232440.4027-2-chaitanya.kulkarni@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: add bio based read-write helper | expand |
On 2020/04/07 8:24, Chaitanya Kulkarni wrote: > One of the common application for the file systems and drivers to map > the buffer to the bio and issue I/Os on the block device. Drivers generally do not do this. At least not lower ones (scsi, nvme, etc). I guess you are referring to DM drivers. If yes, better be clear about this. Also, "the buffer" is a little unclear. Better qualify that. Another thing: "to map the buffer to the bio" is a little strange. The reverse seems more natural: a bio maps a buffer. > > This is a prep patch which adds two helper functions for block layer > which allows file-systems and the drivers to map data buffer on to the > bios and issue bios synchronously using blkdev_issue_rw() or > asynchronously using __blkdev_issue_rw(). The function names are basically the same but do completely different things. Better rename that. May be blkdev_issue_rw_sync() and blkdev_issue_rw_async() ? > > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> > --- > block/blk-lib.c | 105 +++++++++++++++++++++++++++++++++++++++++ > include/linux/blkdev.h | 7 +++ > 2 files changed, 112 insertions(+) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 5f2c429d4378..451c367fc0d6 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > return ret; > } > EXPORT_SYMBOL(blkdev_issue_zeroout); > + > +/** > + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously s/__blkdev_ssue_rw/__blkdev_issue_rw > + * @bdev: blockdev to read/write > + * @buf: data buffer > + * @sector: start sector > + * @nr_sects: number of sectors > + * @op: REQ_OP_READ or REQ_OP_WRITE > + * @opf: flags > + * @gfp_mask: memory allocation flags (for bio_alloc) > + * @biop: pointer to anchor bio > + * > + * Description: > + * Generic helper function to map data buffer into bios for read and write ops. > + * Returns pointer to the anchored last bio for caller to submit asynchronously > + * or synchronously. > + */ > +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector, > + sector_t nr_sects, unsigned op, unsigned opf, > + gfp_t gfp_mask, struct bio **biop) > +{ > + bool vm = is_vmalloc_addr(buf) ? true : false; You do not need the clunky "? true : false" here. is_vmalloc_addr() returns a bool already. > + struct bio *bio = *biop; > + unsigned int nr_pages; > + struct page *page; > + unsigned int off; > + unsigned int len; > + int bi_size; > + > + if (!bdev_get_queue(bdev)) > + return -ENXIO; > + > + if (bdev_read_only(bdev)) > + return -EPERM; One can read a read-only device. So the check here is not correct. You need a "&& op == REQ_OP_WRITE" in the condition. > + > + if (!(op == REQ_OP_READ || op == REQ_OP_WRITE)) > + return -EINVAL; This probably should be checked before read-only. > + > + while (nr_sects != 0) { > + nr_pages = __blkdev_sectors_to_bio_pages(nr_sects); > + > + bio = blk_next_bio(bio, nr_pages, gfp_mask); > + bio->bi_iter.bi_sector = sector; > + bio_set_dev(bio, bdev); > + bio_set_op_attrs(bio, op, 0); > + > + while (nr_sects != 0) { > + off = offset_in_page(buf); > + page = vm ? vmalloc_to_page(buf) : virt_to_page(buf); > + len = min((sector_t) PAGE_SIZE, nr_sects << 9); > + > + bi_size = bio_add_page(bio, page, len, off); The variable naming is super confusing. bio_add_page() returns 0 if nothing is added and len if the page was added. So bi_size as a var name is not the best of name in my opinion. > + > + nr_sects -= bi_size >> 9; > + sector += bi_size >> 9; > + buf += bi_size; > + > + if (bi_size < len) > + break; You will get either 0 or len from bio_add_page. So the check here is not ideal. I think it is better to move it up under bio_add_page() and make it: len = bioa_add_page(bio, page, len, off); if (!len) break; > + } > + cond_resched(); > + } > + *biop = bio; > + return 0; > +} > +EXPORT_SYMBOL_GPL(__blkdev_issue_rw); > + > +/** > + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously > + * @bdev: blockdev to read/write > + * @buf: data buffer > + * @sector: start sector > + * @count: number of bytes > + * @op: REQ_OP_READ or REQ_OP_WRITE > + * @opf: flags > + * @gfp_mask: memory allocation flags (for bio_alloc) > + * > + * Description: > + * Generic helper function to map data buffer buffer into bios for read and > + * write requests. > + */ > +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, > + unsigned count, unsigned op, unsigned opf, gfp_t mask) function name differs between description and declaration. blkdev_execute_rw_sync() is better than blkdev_issue_rw() in my opinion. > +{ > + unsigned int is_vmalloc = is_vmalloc_addr(buf); > + sector_t nr_sects = count >> 9; > + struct bio *bio = NULL; > + int error; > + > + if (is_vmalloc && op == REQ_OP_WRITE) > + flush_kernel_vmap_range(buf, count); > + > + opf |= REQ_SYNC; You can add this directly in the call below. > + error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio); > + if (!error && bio) { > + error = submit_bio_wait(bio); > + bio_put(bio); > + } > + > + if (is_vmalloc && op == REQ_OP_READ) > + invalidate_kernel_vmap_range(buf, count); > + > + return error; > +} > +EXPORT_SYMBOL_GPL(blkdev_issue_rw); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 32868fbedc9e..cb315b301ad9 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, > gfp_mask, 0); > } > > +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, > + sector_t sector, sector_t nr_sects, unsigned op, > + unsigned opf, gfp_t gfp_mask, struct bio **biop); > +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, > + unsigned count, unsigned op, unsigned opf, > + gfp_t mask); No need for the extern I think. > + > extern int blk_verify_command(unsigned char *cmd, fmode_t mode); > > enum blk_default_limits { >
In __blkdev_issue_rw(): > + bio = blk_next_bio(bio, nr_pages, gfp_mask); Doesn't this already submits the bio even before the pages are added? > + error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio); > + if (!error && bio) { > + error = submit_bio_wait(bio); And then the bio is submitted again in blkdev_issue_rw()... Or do I understand it wrong?
On 04/06/2020 05:28 PM, Damien Le Moal wrote: > On 2020/04/07 8:24, Chaitanya Kulkarni wrote: >> One of the common application for the file systems and drivers to map >> the buffer to the bio and issue I/Os on the block device. > > Drivers generally do not do this. At least not lower ones (scsi, nvme, etc). I > guess you are referring to DM drivers. If yes, better be clear about this. > Also, "the buffer" is a little unclear. Better qualify that. > Agree most drivers deals with sg_list. Right now there is only one such an example I know which is rnbd as driver as it is mentioned in the cover-letter references. > Another thing: "to map the buffer to the bio" is a little strange. The reverse > seems more natural: a bio maps a buffer. > Make sense, will do. >> >> This is a prep patch which adds two helper functions for block layer >> which allows file-systems and the drivers to map data buffer on to the >> bios and issue bios synchronously using blkdev_issue_rw() or >> asynchronously using __blkdev_issue_rw(). > > The function names are basically the same but do completely different things. > Better rename that. May be blkdev_issue_rw_sync() and blkdev_issue_rw_async() ? > > This is exactly I named functions to start with (see the function documentation which I missed to update), but the pattern in blk-lib.c uses no such sync and async postfix, which is used is sync and async context all over the kernel :- # grep EXPORT block/blk-lib.c EXPORT_SYMBOL(__blkdev_issue_discard); EXPORT_SYMBOL(blkdev_issue_discard);, since as it is nr_sects and sector calculation doesn't matter after break. EXPORT_SYMBOL(blkdev_issue_write_same); EXPORT_SYMBOL(__blkdev_issue_zeroout); EXPORT_SYMBOL(blkdev_issue_zeroout); EXPORT_SYMBOL_GPL(__blkdev_issue_rw); EXPORT_SYMBOL_GPL(blkdev_issue_rw); In absence of documentation for naming how about we just follow existing naming convention ? Which matches the pattern for new API. >> >> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> >> --- >> block/blk-lib.c | 105 +++++++++++++++++++++++++++++++++++++++++ >> include/linux/blkdev.h | 7 +++ >> 2 files changed, 112 insertions(+) >> >> diff --git a/block/blk-lib.c b/block/blk-lib.c >> index 5f2c429d4378..451c367fc0d6 100644 >> --- a/block/blk-lib.c >> +++ b/block/blk-lib.c >> @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, >> return ret; >> } >> EXPORT_SYMBOL(blkdev_issue_zeroout); >> + >> +/** >> + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously > > s/__blkdev_ssue_rw/__blkdev_issue_rw Thanks, will fix. > >> + * @bdev: blockdev to read/write >> + * @buf: data buffer >> + * @sector: start sector >> + * @nr_sects: number of sectors >> + * @op: REQ_OP_READ or REQ_OP_WRITE >> + * @opf: flags >> + * @gfp_mask: memory allocation flags (for bio_alloc) >> + * @biop: pointer to anchor bio >> + * >> + * Description: >> + * Generic helper function to map data buffer into bios for read and write ops. >> + * Returns pointer to the anchored last bio for caller to submit asynchronously >> + * or synchronously. >> + */ >> +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector, >> + sector_t nr_sects, unsigned op, unsigned opf, >> + gfp_t gfp_mask, struct bio **biop) >> +{ >> + bool vm = is_vmalloc_addr(buf) ? true : false; > > You do not need the clunky "? true : false" here. is_vmalloc_addr() returns a > bool already. > I will remove it. >> + struct bio *bio = *biop; >> + unsigned int nr_pages; >> + struct page *page; >> + unsigned int off; >> + unsigned int len; >> + int bi_size; >> + >> + if (!bdev_get_queue(bdev)) >> + return -ENXIO; >> + >> + if (bdev_read_only(bdev)) >> + return -EPERM; > > One can read a read-only device. So the check here is not correct. You need a > "&& op == REQ_OP_WRITE" in the condition. > True, this shouldn't be here, also I'm thinking of lifting these checks to caller, we can add it later if needed. >> + >> + if (!(op == REQ_OP_READ || op == REQ_OP_WRITE)) >> + return -EINVAL; > > This probably should be checked before read-only. > Yes. >> + >> + while (nr_sects != 0) { >> + nr_pages = __blkdev_sectors_to_bio_pages(nr_sects); >> + >> + bio = blk_next_bio(bio, nr_pages, gfp_mask); >> + bio->bi_iter.bi_sector = sector; >> + bio_set_dev(bio, bdev); >> + bio_set_op_attrs(bio, op, 0); >> + >> + while (nr_sects != 0) { >> + off = offset_in_page(buf); >> + page = vm ? vmalloc_to_page(buf) : virt_to_page(buf); >> + len = min((sector_t) PAGE_SIZE, nr_sects << 9); >> + >> + bi_size = bio_add_page(bio, page, len, off); > > The variable naming is super confusing. bio_add_page() returns 0 if nothing is > added and len if the page was added. So bi_size as a var name is not the best of > name in my opinion. > Here bio, page, len, off variables are passed to the bio_add_page() function, which has the same names in the function parameter list. (I'll fix the off to offset) Regarding the bi_size, given that bio_add_page() returns bio->bi_iter.bi_size, isn't bi_size also maps to the what function is returning and explains what we are dealing with? Also, bi_size is used in the blk-lib.c: __blkdev_issue_zero_pages(), if that is confusing then we need to change that, what is your preference? I'll send a cleanup patch __blkdev_issue_zero_pages() also. >> + >> + nr_sects -= bi_size >> 9; >> + sector += bi_size >> 9; >> + buf += bi_size; >> + >> + if (bi_size < len) >> + break; > > You will get either 0 or len from bio_add_page. So the check here is not ideal. > I think it is better to move it up under bio_add_page() and make it: > > len = bioa_add_page(bio, page, len, off); > if (!len) > break; I'd still like to keep bi_size, I think I had received a comment about using same variable for function call and updating as a return value. Regarding the check, will move it. > >> + } >> + cond_resched(); >> + } >> + *biop = bio; >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(__blkdev_issue_rw); >> + >> +/** >> + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously >> + * @bdev: blockdev to read/write >> + * @buf: data buffer >> + * @sector: start sector >> + * @count: number of bytes >> + * @op: REQ_OP_READ or REQ_OP_WRITE >> + * @opf: flags >> + * @gfp_mask: memory allocation flags (for bio_alloc) >> + * >> + * Description: >> + * Generic helper function to map data buffer buffer into bios for read and >> + * write requests. >> + */ >> +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + unsigned count, unsigned op, unsigned opf, gfp_t mask) > > function name differs between description and declaration. > blkdev_execute_rw_sync() is better than blkdev_issue_rw() in my opinion. > That was remained from initial version, will fix it, thanks for pointing it out. >> +{ >> + unsigned int is_vmalloc = is_vmalloc_addr(buf); >> + sector_t nr_sects = count >> 9; >> + struct bio *bio = NULL; >> + int error; >> + >> + if (is_vmalloc && op == REQ_OP_WRITE) >> + flush_kernel_vmap_range(buf, count); >> + >> + opf |= REQ_SYNC; > > You can add this directly in the call below. > Breaks calling of __blkdev_issue_rw() to new line. This also adds a new line which is unavoidable but keeps the function call smooth in one line. >> + error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio); >> + if (!error && bio) { >> + error = submit_bio_wait(bio); >> + bio_put(bio); >> + } >> + >> + if (is_vmalloc && op == REQ_OP_READ) >> + invalidate_kernel_vmap_range(buf, count); >> + >> + return error; >> +} >> +EXPORT_SYMBOL_GPL(blkdev_issue_rw); >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >> index 32868fbedc9e..cb315b301ad9 100644 >> --- a/include/linux/blkdev.h >> +++ b/include/linux/blkdev.h >> @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, >> gfp_mask, 0); >> } >> >> +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, >> + sector_t sector, sector_t nr_sects, unsigned op, >> + unsigned opf, gfp_t gfp_mask, struct bio **biop); >> +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + unsigned count, unsigned op, unsigned opf, >> + gfp_t mask); > > No need for the extern I think. > Again, following the exiting pattern in the file for the header:- # grep blkdev_issue include/linux/blkdev.h extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *); extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + >> extern int blk_verify_command(unsigned char *cmd, fmode_t mode); >> >> enum blk_default_limits { >> > >
diff --git a/block/blk-lib.c b/block/blk-lib.c index 5f2c429d4378..451c367fc0d6 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, return ret; } EXPORT_SYMBOL(blkdev_issue_zeroout); + +/** + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously + * @bdev: blockdev to read/write + * @buf: data buffer + * @sector: start sector + * @nr_sects: number of sectors + * @op: REQ_OP_READ or REQ_OP_WRITE + * @opf: flags + * @gfp_mask: memory allocation flags (for bio_alloc) + * @biop: pointer to anchor bio + * + * Description: + * Generic helper function to map data buffer into bios for read and write ops. + * Returns pointer to the anchored last bio for caller to submit asynchronously + * or synchronously. + */ +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector, + sector_t nr_sects, unsigned op, unsigned opf, + gfp_t gfp_mask, struct bio **biop) +{ + bool vm = is_vmalloc_addr(buf) ? true : false; + struct bio *bio = *biop; + unsigned int nr_pages; + struct page *page; + unsigned int off; + unsigned int len; + int bi_size; + + if (!bdev_get_queue(bdev)) + return -ENXIO; + + if (bdev_read_only(bdev)) + return -EPERM; + + if (!(op == REQ_OP_READ || op == REQ_OP_WRITE)) + return -EINVAL; + + while (nr_sects != 0) { + nr_pages = __blkdev_sectors_to_bio_pages(nr_sects); + + bio = blk_next_bio(bio, nr_pages, gfp_mask); + bio->bi_iter.bi_sector = sector; + bio_set_dev(bio, bdev); + bio_set_op_attrs(bio, op, 0); + + while (nr_sects != 0) { + off = offset_in_page(buf); + page = vm ? vmalloc_to_page(buf) : virt_to_page(buf); + len = min((sector_t) PAGE_SIZE, nr_sects << 9); + + bi_size = bio_add_page(bio, page, len, off); + + nr_sects -= bi_size >> 9; + sector += bi_size >> 9; + buf += bi_size; + + if (bi_size < len) + break; + } + cond_resched(); + } + *biop = bio; + return 0; +} +EXPORT_SYMBOL_GPL(__blkdev_issue_rw); + +/** + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously + * @bdev: blockdev to read/write + * @buf: data buffer + * @sector: start sector + * @count: number of bytes + * @op: REQ_OP_READ or REQ_OP_WRITE + * @opf: flags + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Generic helper function to map data buffer buffer into bios for read and + * write requests. + */ +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, + unsigned count, unsigned op, unsigned opf, gfp_t mask) +{ + unsigned int is_vmalloc = is_vmalloc_addr(buf); + sector_t nr_sects = count >> 9; + struct bio *bio = NULL; + int error; + + if (is_vmalloc && op == REQ_OP_WRITE) + flush_kernel_vmap_range(buf, count); + + opf |= REQ_SYNC; + error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio); + if (!error && bio) { + error = submit_bio_wait(bio); + bio_put(bio); + } + + if (is_vmalloc && op == REQ_OP_READ) + invalidate_kernel_vmap_range(buf, count); + + return error; +} +EXPORT_SYMBOL_GPL(blkdev_issue_rw); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 32868fbedc9e..cb315b301ad9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, gfp_mask, 0); } +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, + sector_t sector, sector_t nr_sects, unsigned op, + unsigned opf, gfp_t gfp_mask, struct bio **biop); +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, + unsigned count, unsigned op, unsigned opf, + gfp_t mask); + extern int blk_verify_command(unsigned char *cmd, fmode_t mode); enum blk_default_limits {
One of the common application for the file systems and drivers to map the buffer to the bio and issue I/Os on the block device. This is a prep patch which adds two helper functions for block layer which allows file-systems and the drivers to map data buffer on to the bios and issue bios synchronously using blkdev_issue_rw() or asynchronously using __blkdev_issue_rw(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- block/blk-lib.c | 105 +++++++++++++++++++++++++++++++++++++++++ include/linux/blkdev.h | 7 +++ 2 files changed, 112 insertions(+)