Message ID | 20230625022633.2753877-1-houtao@huaweicloud.com
---|---
State | New, archived
Series | [v3] virtio_pmem: add the missing REQ_OP_WRITE for flush bio
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On 6/24/2023 7:26 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> When doing mkfs.xfs on a pmem device, the following warning was
> reported:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
> Modules linked in:
> CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> RIP: 0010:submit_bio_noacct+0x340/0x520
> ......
> Call Trace:
>  <TASK>
>  ? asm_exc_invalid_op+0x1b/0x20
>  ? submit_bio_noacct+0x340/0x520
>  ? submit_bio_noacct+0xd5/0x520
>  submit_bio+0x37/0x60
>  async_pmem_flush+0x79/0xa0
>  nvdimm_flush+0x17/0x40
>  pmem_submit_bio+0x370/0x390
>  __submit_bio+0xbc/0x190
>  submit_bio_noacct_nocheck+0x14d/0x370
>  submit_bio_noacct+0x1ef/0x520
>  submit_bio+0x55/0x60
>  submit_bio_wait+0x5a/0xc0
>  blkdev_issue_flush+0x44/0x60
>
> The root cause is that submit_bio_noacct() requires bio_op() to be
> either WRITE or ZONE_APPEND for a flush bio, but async_pmem_flush()
> doesn't assign REQ_OP_WRITE when allocating the flush bio, so
> submit_bio_noacct() just fails the flush bio.
>
> Simply fix it by adding the missing REQ_OP_WRITE for the flush bio.
> The flush ordering issue and the flush optimization can be addressed
> later.
>
> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> v3:
>  * adjust the overly long lines in both commit message and code
>
> v2: https://lore.kernel.org/linux-block/20230621134340.878461-1-houtao@huaweicloud.com
>  * do a minimal fix first (suggested by Christoph)
>
> v1: https://lore.kernel.org/linux-block/ZJLpYMC8FgtZ0k2k@infradead.org/T/#t
>
> Hi Jens & Dan,
>
> I found Pankaj was working on the optimization of the virtio-pmem flush
> bio [0], but considering the last status update was on 1/12/2022, could
> you please pick this patch up for v6.4 so we can do the flush
> optimization later?
>
> [0]: https://lore.kernel.org/lkml/20220111161937.56272-1-pankaj.gupta.linux@gmail.com/T/

I've failed to understand why we should wait for [0] ...

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
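[Editorial note: for readers following the trace, the warning comes from the
sanity check that commit b4a6bb3a67aa added to submit_bio_noacct() in
block/blk-core.c. The snippet below is a paraphrased sketch of that check,
not a verbatim excerpt; the exact surrounding code may differ.]

	/*
	 * Sketch of the check added by commit b4a6bb3a67aa: a bio that
	 * carries REQ_PREFLUSH/REQ_FUA must be a write-type operation,
	 * otherwise the bio is warned about and failed.
	 */
	if (op_is_flush(bio->bi_opf) &&
	    WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
			 bio_op(bio) != REQ_OP_ZONE_APPEND))
		goto end_io;

Since async_pmem_flush() allocated its child bio with only REQ_PREFLUSH,
bio_op() evaluated to REQ_OP_READ (0), so the WARN_ON_ONCE fired and the
flush bio was failed.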
> From: Hou Tao <houtao1@huawei.com>
>
> When doing mkfs.xfs on a pmem device, the following warning was
> reported:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
> Modules linked in:
> CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> RIP: 0010:submit_bio_noacct+0x340/0x520
> ......
> Call Trace:
>  <TASK>
>  ? asm_exc_invalid_op+0x1b/0x20
>  ? submit_bio_noacct+0x340/0x520
>  ? submit_bio_noacct+0xd5/0x520
>  submit_bio+0x37/0x60
>  async_pmem_flush+0x79/0xa0
>  nvdimm_flush+0x17/0x40
>  pmem_submit_bio+0x370/0x390
>  __submit_bio+0xbc/0x190
>  submit_bio_noacct_nocheck+0x14d/0x370
>  submit_bio_noacct+0x1ef/0x520
>  submit_bio+0x55/0x60
>  submit_bio_wait+0x5a/0xc0
>  blkdev_issue_flush+0x44/0x60
>
> The root cause is that submit_bio_noacct() requires bio_op() to be
> either WRITE or ZONE_APPEND for a flush bio, but async_pmem_flush()
> doesn't assign REQ_OP_WRITE when allocating the flush bio, so
> submit_bio_noacct() just fails the flush bio.
>
> Simply fix it by adding the missing REQ_OP_WRITE for the flush bio.
> The flush ordering issue and the flush optimization can be addressed
> later.
>
> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
> Signed-off-by: Hou Tao <houtao1@huawei.com>

With 6.3+ stable Cc, feel free to add:

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Pankaj Gupta <pankaj.gupta@amd.com>

Thanks,
Pankaj
> I've failed to understand why we should wait for [0] ...
Sorry, I missed answering this before. The optimization was to make the
virtio-pmem flush interface completely asynchronous and to coalesce the
flush requests. As it uses the asynchronous behavior of the virtio bus,
there is no stall of the guest vCPU, so the optimization intends to
solve the following:
- The guest I/O submitting thread can do other operations until the
  host flush completes.
- All guest flushes go to one virtio-pmem device, so if there is a
  single flush in the queue, all subsequent flushes would wait,
  enabling request coalescing.
Currently, this is handled by splitting the bio request into parent and
child bios, but that results in an ordering issue (the PREFLUSH happens
after the first fsync, because of the way bio_submit() works). I think
the preflush ordering issue will still remain after this patch, but this
interface is currently broken in mainline, so this patch fixes that
issue.
Thanks,
Pankaj
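[Editorial note: to make the parent/child split Pankaj describes concrete,
here is a sketch of async_pmem_flush() in drivers/nvdimm/nd_virtio.c with
the fix below applied. It is reconstructed around the hunk in this patch,
so details may differ slightly from the exact tree.]

int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
{
	/*
	 * Create child bio for asynchronous flush and chain with
	 * parent bio. Otherwise directly call nd_region flush.
	 */
	if (bio && bio->bi_iter.bi_sector != -1) {
		struct bio *child = bio_alloc(bio->bi_bdev, 0,
					      REQ_OP_WRITE | REQ_PREFLUSH,
					      GFP_ATOMIC);

		if (!child)
			return -ENOMEM;
		bio_clone_blkg_association(child, bio);
		child->bi_iter.bi_sector = -1;
		/* parent data bio completes only after the child flush bio */
		bio_chain(child, bio);
		submit_bio(child);
		return 0;
	}
	if (virtio_pmem_flush(nd_region))
		return -EIO;

	return 0;
}

The empty child bio carries the PREFLUSH and is chained to the parent data
bio, which is where the ordering concern mentioned above comes from.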
+Cc Vishal,

>> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>
> With 6.3+ stable Cc, feel free to add:

Hi Dan, Vishal,

Do you have any suggestions on this patch? Or can we queue it, please?

Hi Hou,

Could you please send a new version with Cc stable, or as per any
suggestions from the maintainers?

Thanks,
Pankaj
Hi Pankaj,

On 7/13/2023 4:23 PM, Pankaj Gupta wrote:
> +Cc Vishal,
>
>>> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> With 6.3+ stable Cc, feel free to add:
>
> Hi Dan, Vishal,
>
> Do you have any suggestions on this patch? Or can we queue it, please?
>
> Hi Hou,
>
> Could you please send a new version with Cc stable, or as per any
> suggestions from the maintainers?

Will add Cc stable for v6.3 in v4.

> Thanks,
> Pankaj
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index c6a648fd8744..1f8c667c6f1e 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -105,7 +105,8 @@ int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
 	 * parent bio. Otherwise directly call nd_region flush.
 	 */
 	if (bio && bio->bi_iter.bi_sector != -1) {
-		struct bio *child = bio_alloc(bio->bi_bdev, 0, REQ_PREFLUSH,
+		struct bio *child = bio_alloc(bio->bi_bdev, 0,
+					      REQ_OP_WRITE | REQ_PREFLUSH,
 					      GFP_ATOMIC);
 
 		if (!child)
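[Editorial note: as a point of comparison, not part of the patch, other
flush submitters in the block layer already follow the
REQ_OP_WRITE | REQ_PREFLUSH convention the fix adopts. For example,
blkdev_issue_flush(), at the bottom of the call trace above, is
implemented roughly like this in current kernels; the sketch is from
memory and may not match the tree exactly.]

int blkdev_issue_flush(struct block_device *bdev)
{
	struct bio bio;

	/* an empty write bio with PREFLUSH set: "flush the device cache" */
	bio_init(&bio, bdev, NULL, 0, REQ_OP_WRITE | REQ_PREFLUSH);
	return submit_bio_wait(&bio);
}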