Message ID | 20201206055332.3144-3-tom.ty89@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [1/3] block: try one write zeroes request before going further | expand |
On 12/6/20 6:53 AM, Tom Yan wrote: > Mimicking blkdev_issue_flush(). Seems like a right thing to do, as > they are a bunch of REQ_OP_WRITE. > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > --- > block/blk-lib.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 354dcab760c7..5579fdea893d 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -422,6 +422,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) { > ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects, > gfp_mask, &bio); > + if (bio) > + bio->bi_opf |= REQ_PREFLUSH; > } else { > /* No zeroing offload support */ > ret = -EOPNOTSUPP; > PREFLUSH is for the 'flush' machinery (cf blk-flush.c). Which is okay for blkdev_issue_flush(), but certainly not for zeroout. Cheers, Hannes
Why? Did you miss that it is in the condition where __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache (that is on the device; not sure about dirty pages) flush, wouldn't it be a right thing to do after we performed a series of WRITE (which is more or less purposed to get a drive wiped clean). On Sun, 6 Dec 2020 at 19:31, Hannes Reinecke <hare@suse.de> wrote: > > On 12/6/20 6:53 AM, Tom Yan wrote: > > Mimicking blkdev_issue_flush(). Seems like a right thing to do, as > > they are a bunch of REQ_OP_WRITE. > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > --- > > block/blk-lib.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c > > index 354dcab760c7..5579fdea893d 100644 > > --- a/block/blk-lib.c > > +++ b/block/blk-lib.c > > @@ -422,6 +422,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > > } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) { > > ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects, > > gfp_mask, &bio); > > + if (bio) > > + bio->bi_opf |= REQ_PREFLUSH; > > } else { > > /* No zeroing offload support */ > > ret = -EOPNOTSUPP; > > > PREFLUSH is for the 'flush' machinery (cf blk-flush.c). Which is okay > for blkdev_issue_flush(), but certainly not for zeroout. > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
On 12/6/20 2:32 PM, Tom Yan wrote: > Why? Did you miss that it is in the condition where > __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but > WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache > (that is on the device; not sure about dirty pages) flush, wouldn't it > be a right thing to do after we performed a series of WRITE (which is > more or less purposed to get a drive wiped clean). > But what makes 'zero_pages' special as compared to, say, WRITE_SAME? One could use WRITE SAME with '0' content, arriving at pretty much the same content than usine zeroout without unmapping. And neither of them worries about cache flushing. Nor should they, IMO. These are 'native' block layer calls, providing abstract accesses to hardware functionality. If an application wants to use them, it would be the task of the application to insert a 'flush' if it deems neccessary. (There _is_ blkdev_issue_flush(), after all). Cheers, Hannes
On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote: > > On 12/6/20 2:32 PM, Tom Yan wrote: > > Why? Did you miss that it is in the condition where > > __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but > > WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache > > (that is on the device; not sure about dirty pages) flush, wouldn't it > > be a right thing to do after we performed a series of WRITE (which is > > more or less purposed to get a drive wiped clean). > > > > But what makes 'zero_pages' special as compared to, say, WRITE_SAME? > One could use WRITE SAME with '0' content, arriving at pretty much the > same content than usine zeroout without unmapping. And neither of them > worries about cache flushing. > Nor should they, IMO. Because we are writing actual pages (just that they are zero and "shared memory" in the system) to the device, instead of triggering a special command (with a specific parameter)? > > These are 'native' block layer calls, providing abstract accesses to > hardware functionality. If an application wants to use them, it would be > the task of the application to insert a 'flush' if it deems neccessary. > (There _is_ blkdev_issue_flush(), after all). Well my argument would be the call has the purpose of "wiping" so it should try to "atomically" guarantee that the wiping is synced. It's like a complement to REQ_SYNC in the final submit_bio_wait(). > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
On 12/6/20 3:14 PM, Tom Yan wrote: > On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote: >> >> On 12/6/20 2:32 PM, Tom Yan wrote: >>> Why? Did you miss that it is in the condition where >>> __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but >>> WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache >>> (that is on the device; not sure about dirty pages) flush, wouldn't it >>> be a right thing to do after we performed a series of WRITE (which is >>> more or less purposed to get a drive wiped clean). >>> >> >> But what makes 'zero_pages' special as compared to, say, WRITE_SAME? >> One could use WRITE SAME with '0' content, arriving at pretty much the >> same content than usine zeroout without unmapping. And neither of them >> worries about cache flushing. >> Nor should they, IMO. > > Because we are writing actual pages (just that they are zero and > "shared memory" in the system) to the device, instead of triggering a > special command (with a specific parameter)? > But these pages are ephemeral, and never visible to the user. >> >> These are 'native' block layer calls, providing abstract accesses to >> hardware functionality. If an application wants to use them, it would be >> the task of the application to insert a 'flush' if it deems neccessary. >> (There _is_ blkdev_issue_flush(), after all). > > Well my argument would be the call has the purpose of "wiping" so it > should try to "atomically" guarantee that the wiping is synced. It's > like a complement to REQ_SYNC in the final submit_bio_wait(). > That's an assumption. It would be valid if blkdev_issue_zeroout() would only allow to wipe the entire disk. As it stands, it doesn't, and so we shouldn't presume what users might want to do with it. Cheers, Hannes
On Sun, Dec 06, 2020 at 01:53:32PM +0800, Tom Yan wrote: > Mimicking blkdev_issue_flush(). Seems like a right thing to do, as > they are a bunch of REQ_OP_WRITE. Huh? Zeroing out is in no way a data integrity operation that needs a reflush.
On Mon, 7 Dec 2020 at 00:05, Hannes Reinecke <hare@suse.de> wrote: > > On 12/6/20 3:14 PM, Tom Yan wrote: > > On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote: > >> > >> On 12/6/20 2:32 PM, Tom Yan wrote: > >>> Why? Did you miss that it is in the condition where > >>> __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but > >>> WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache > >>> (that is on the device; not sure about dirty pages) flush, wouldn't it > >>> be a right thing to do after we performed a series of WRITE (which is > >>> more or less purposed to get a drive wiped clean). > >>> > >> > >> But what makes 'zero_pages' special as compared to, say, WRITE_SAME? > >> One could use WRITE SAME with '0' content, arriving at pretty much the > >> same content than usine zeroout without unmapping. And neither of them > >> worries about cache flushing. > >> Nor should they, IMO. > > > > Because we are writing actual pages (just that they are zero and > > "shared memory" in the system) to the device, instead of triggering a > > special command (with a specific parameter)? > > > > But these pages are ephemeral, and never visible to the user. What do you mean by the "user"? What I meant was, since it's no different than "normal" write operation, the zero pages will go to the volatile write cache of the device. > > >> > >> These are 'native' block layer calls, providing abstract accesses to > >> hardware functionality. If an application wants to use them, it would be > >> the task of the application to insert a 'flush' if it deems neccessary. > >> (There _is_ blkdev_issue_flush(), after all). > > > > Well my argument would be the call has the purpose of "wiping" so it > > should try to "atomically" guarantee that the wiping is synced. It's > > like a complement to REQ_SYNC in the final submit_bio_wait(). > > > That's an assumption. > > It would be valid if blkdev_issue_zeroout() would only allow to wipe the > entire disk. As it stands, it doesn't, and so we shouldn't presume what > users might want to do with it. Whether it's an entire disk doesn't matter. It still stands when it's only a certain range of blocks. > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
diff --git a/block/blk-lib.c b/block/blk-lib.c index 354dcab760c7..5579fdea893d 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -422,6 +422,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) { ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask, &bio); + if (bio) + bio->bi_opf |= REQ_PREFLUSH; } else { /* No zeroing offload support */ ret = -EOPNOTSUPP;
Mimicking blkdev_issue_flush(). Seems like a right thing to do, as they are a bunch of REQ_OP_WRITE. Signed-off-by: Tom Yan <tom.ty89@gmail.com> --- block/blk-lib.c | 2 ++ 1 file changed, 2 insertions(+)