Message ID | 20160526180813.GA49039@shli-mbp.local (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, May 26, 2016 at 11:08:14AM -0700, Shaohua Li wrote: > -int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > - sector_t nr_sects, gfp_t gfp_mask, unsigned long flags) > +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector, We've split blkdev_issue_discard into __blkdev_issue_discard and a small wrapper around in for 4.7, so this will need a bit of an update. As part of that I also removed the strange EOPNOTSUPP ignore, but Mike reverted it just because it changed something in the dm testsuite. I still believe we should never ignore it in this helper, and only do so in callers that believe it's the right thing. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Sitsofe" == Sitsofe Wheeler <sitsofe@yahoo.com> writes:
Sitsofe> The original SCSI WRITE SAME has overloaded semantics - not
Sitsofe> only does it mean "write this data multiple times" but it can
Sitsofe> also be used to mean "discard this range" too. If the kernel's
Sitsofe> command was modelled on the SCSI original perhaps this
Sitsofe> conflation clouded things?
REQ_WRITE_SAME in the context of the kernel explicitly means "write
payload to this block range".
A REQ_DISCARD command may be serviced using WRITE SAME(16) with the
UNMAP bit set in the SCSI disk driver but that's entirely orthogonal.
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
Christoph> As part of that I also removed the strange EOPNOTSUPP ignore,
Christoph> but Mike reverted it just because it changed something in the
Christoph> dm testsuite.
Mike?
Christoph> I still believe we should never ignore it in this helper, and
Christoph> only do so in callers that believe it's the right thing.
Yeah.
I really wish EOPNOTSUPP would just go away except for ioctl callers.
Now that we have real bi_error I don't understand why we need it.
On Thu, Jun 02 2016 at 11:06pm -0400, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: > > Christoph> As part of that I also removed the strange EOPNOTSUPP ignore, > Christoph> but Mike reverted it just because it changed something in the > Christoph> dm testsuite. > > Mike? Yes? ;) Seems there is some serious confusion going on here. The entirety of hch's post (to which you quoted a subset) makes little sense to me. shli's patch builds ontop of latest blk-lib.c code yet hch said this:: "We've split blkdev_issue_discard into __blkdev_issue_discard and a small wrapper around in for 4.7, so this will need a bit of an update." And hch never "removed the strange EOPNOTSUPP ignore". He preserved it (see his commit 38f25255330's "return ret != -EOPNOTSUPP ? ret : 0;" that I adjusted in commit bbd848e0f -- _and_ he expanded it to eat the early return that I restored). So I can only infer that hch is still missing why my revert fixes historic blkdev_issue_discard() behavior that his commit regressed. Please read commit bbd848e0f's header. That at least details the early vs late -EOPNOTSUPP blkdev_issue_discard() return. But all that nuance aside, AFAICT my commit bbd848e0f ("block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard") really has _nothing_ to do with the issue shli is addressing with his fix. > Christoph> I still believe we should never ignore it in this helper, and > Christoph> only do so in callers that believe it's the right thing. > > Yeah. Hmm... You agreed to what hch said there about how we should probably always return EOPNOTSUPP but then you immediately elaborated with details that mean you don't agree: > I really wish EOPNOTSUPP would just go away except for ioctl callers. > Now that we have real bi_error I don't understand why we need it. But hch was originally in favor of _always_ dropping EOPNOTSUPP on the floor (that is what his commit 38f25255330 did). Then he said he disagrees with these interfaces playing games with masking EOPNOTSUPP -- to which you seemingly really don't agree. Unless I'm completely misreading you. Anyway, shli is at least making it so that blkdev_issue_zerout() can fallback to other mechanisms as needed. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> But hch was originally in favor of _always_ dropping EOPNOTSUPP on
Mike> the floor (that is what his commit 38f25255330 did). Then he said
Mike> he disagrees with these interfaces playing games with masking
Mike> EOPNOTSUPP -- to which you seemingly really don't agree. Unless
Mike> I'm completely misreading you.
Userland apps rely on EOPNOTSUPP, we can't break that.
What I don't like this is "soft" error special casing of EOPNOTSUPP in
the actual implementation of discard, write same, etc. These functions
should return either success or failure. And the ioctl wrapper should
then decide whether to return EOPNOTSUPP, EIO or EPONIES.
I.e. separate the policy from the implementation. This would also solve
some of the grievances for the target folks.
On Mon, Jun 06, 2016 at 10:32:38PM -0400, Martin K. Petersen wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> But hch was originally in favor of _always_ dropping EOPNOTSUPP on > Mike> the floor (that is what his commit 38f25255330 did). Then he said > Mike> he disagrees with these interfaces playing games with masking > Mike> EOPNOTSUPP -- to which you seemingly really don't agree. Unless > Mike> I'm completely misreading you. > > Userland apps rely on EOPNOTSUPP, we can't break that. Rely on what exactly? Current we return EOPNOTSUPP if the device doesn't claim to support discards, but it returns 0 if the device first claims to support it but then fails the I/O. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: >> Userland apps rely on EOPNOTSUPP, we can't break that. Christoph> Rely on what exactly? Current we return EOPNOTSUPP if the Christoph> device doesn't claim to support discards, but it returns 0 if Christoph> the device first claims to support it but then fails the I/O. Hopefully we can clean up this when/if we go the fallocate() route.
diff --git a/block/blk-lib.c b/block/blk-lib.c index 23d7f30..232f9ea 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -95,8 +95,9 @@ EXPORT_SYMBOL(__blkdev_issue_discard); * Description: * Issue a discard request for the sectors in question. */ -int blkdev_issue_discard(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask, unsigned long flags) +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask, unsigned long flags, + bool ignore_nosupport) { int type = REQ_WRITE | REQ_DISCARD; struct bio *bio = NULL; @@ -111,13 +112,20 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, &bio); if (!ret && bio) { ret = submit_bio_wait(type, bio); - if (ret == -EOPNOTSUPP) + if (ignore_nosupport && ret == -EOPNOTSUPP) ret = 0; } blk_finish_plug(&plug); return ret; } + +int blkdev_issue_discard(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask, unsigned long flags) +{ + return do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, + flags, true); +} EXPORT_SYMBOL(blkdev_issue_discard); /** @@ -131,9 +139,9 @@ EXPORT_SYMBOL(blkdev_issue_discard); * Description: * Issue a write same request for the sectors in question. */ -int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, +static int do_blkdev_issue_write_same(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, - struct page *page) + struct page *page, bool ignore_nosupport) { struct request_queue *q = bdev_get_queue(bdev); unsigned int max_write_same_sectors; @@ -167,7 +175,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (bio) ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio); - return ret != -EOPNOTSUPP ? ret : 0; + return (ret != -EOPNOTSUPP || !ignore_nosupport) ? ret : 0; +} + +int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask, + struct page *page) +{ + return do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, + page, true); } EXPORT_SYMBOL(blkdev_issue_write_same); @@ -238,12 +254,13 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data && - blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0) + do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0, + false) == 0) return 0; if (bdev_write_same(bdev) && - blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, - ZERO_PAGE(0)) == 0) + do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, + ZERO_PAGE(0), false) == 0) return 0; return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
blkdev_issue_zeroout try discard/writesame first, if they fail, zeroout fallback to regular write. The problem is discard/writesame doesn't return error for -EOPNOTSUPP, then zeroout can't do fallback and leave disk data not changed. zeroout should have guaranteed zero-fill behavior. BTW, I saw several callers of blkdev_issue_discard can handle -EOPNOTSUPP, not sure why blkdev_issue_discard not returns -EOPNOTSUPP. The same story for blkdev_issue_write_same. https://bugzilla.kernel.org/show_bug.cgi?id=118581 Cc: Sitsofe Wheeler <sitsofe@yahoo.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@fb.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com> --- block/blk-lib.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-)