diff mbox series

[v3] virtio_pmem: add the missing REQ_OP_WRITE for flush bio

Message ID 20230625022633.2753877-1-houtao@huaweicloud.com (mailing list archive)
State New, archived
Headers show
Series [v3] virtio_pmem: add the missing REQ_OP_WRITE for flush bio | expand

Commit Message

Hou Tao June 25, 2023, 2:26 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

When doing mkfs.xfs on a pmem device, the following warning was
reported and :

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
 Modules linked in:
 CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:submit_bio_noacct+0x340/0x520
 ......
 Call Trace:
  <TASK>
  ? asm_exc_invalid_op+0x1b/0x20
  ? submit_bio_noacct+0x340/0x520
  ? submit_bio_noacct+0xd5/0x520
  submit_bio+0x37/0x60
  async_pmem_flush+0x79/0xa0
  nvdimm_flush+0x17/0x40
  pmem_submit_bio+0x370/0x390
  __submit_bio+0xbc/0x190
  submit_bio_noacct_nocheck+0x14d/0x370
  submit_bio_noacct+0x1ef/0x520
  submit_bio+0x55/0x60
  submit_bio_wait+0x5a/0xc0
  blkdev_issue_flush+0x44/0x60

The root cause is that submit_bio_noacct() needs bio_op() is either
WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign
REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail
the flush bio.

Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we
could fix the flush order issue and do flush optimization later.

Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
v3:
 * adjust the overly long lines in both commit message and code

v2: https://lore.kernel.org/linux-block/20230621134340.878461-1-houtao@huaweicloud.com
 * do a minimal fix first (Suggested by Christoph)

v1: https://lore.kernel.org/linux-block/ZJLpYMC8FgtZ0k2k@infradead.org/T/#t

Hi Jens & Dan,

I found Pankaj was working on the optimization of virtio-pmem flush bio
[0], but considering the last status update was 1/12/2022, so could you
please pick the patch up for v6.4 and we can do the flush optimization
later ?

[0]: https://lore.kernel.org/lkml/20220111161937.56272-1-pankaj.gupta.linux@gmail.com/T/

 drivers/nvdimm/nd_virtio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig June 26, 2023, 7:53 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Chaitanya Kulkarni June 26, 2023, 8:40 a.m. UTC | #2
On 6/24/2023 7:26 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> When doing mkfs.xfs on a pmem device, the following warning was
> reported and :
> 
>   ------------[ cut here ]------------
>   WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
>   Modules linked in:
>   CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>   RIP: 0010:submit_bio_noacct+0x340/0x520
>   ......
>   Call Trace:
>    <TASK>
>    ? asm_exc_invalid_op+0x1b/0x20
>    ? submit_bio_noacct+0x340/0x520
>    ? submit_bio_noacct+0xd5/0x520
>    submit_bio+0x37/0x60
>    async_pmem_flush+0x79/0xa0
>    nvdimm_flush+0x17/0x40
>    pmem_submit_bio+0x370/0x390
>    __submit_bio+0xbc/0x190
>    submit_bio_noacct_nocheck+0x14d/0x370
>    submit_bio_noacct+0x1ef/0x520
>    submit_bio+0x55/0x60
>    submit_bio_wait+0x5a/0xc0
>    blkdev_issue_flush+0x44/0x60
> 
> The root cause is that submit_bio_noacct() needs bio_op() is either
> WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign
> REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail
> the flush bio.
> 
> Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we
> could fix the flush order issue and do flush optimization later.
> 
> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> v3:
>   * adjust the overly long lines in both commit message and code
> 
> v2: https://lore.kernel.org/linux-block/20230621134340.878461-1-houtao@huaweicloud.com
>   * do a minimal fix first (Suggested by Christoph)
> 
> v1: https://lore.kernel.org/linux-block/ZJLpYMC8FgtZ0k2k@infradead.org/T/#t
> 
> Hi Jens & Dan,
> 
> I found Pankaj was working on the optimization of virtio-pmem flush bio
> [0], but considering the last status update was 1/12/2022, so could you
> please pick the patch up for v6.4 and we can do the flush optimization
> later ?
> 
> [0]: https://lore.kernel.org/lkml/20220111161937.56272-1-pankaj.gupta.linux@gmail.com/T/
> 

I've failed to understand why we should wait for [0] ...

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
Pankaj Gupta June 27, 2023, 9:03 a.m. UTC | #3
> From: Hou Tao <houtao1@huawei.com>
>
> When doing mkfs.xfs on a pmem device, the following warning was
> reported and :
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
>  Modules linked in:
>  CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>  RIP: 0010:submit_bio_noacct+0x340/0x520
>  ......
>  Call Trace:
>   <TASK>
>   ? asm_exc_invalid_op+0x1b/0x20
>   ? submit_bio_noacct+0x340/0x520
>   ? submit_bio_noacct+0xd5/0x520
>   submit_bio+0x37/0x60
>   async_pmem_flush+0x79/0xa0
>   nvdimm_flush+0x17/0x40
>   pmem_submit_bio+0x370/0x390
>   __submit_bio+0xbc/0x190
>   submit_bio_noacct_nocheck+0x14d/0x370
>   submit_bio_noacct+0x1ef/0x520
>   submit_bio+0x55/0x60
>   submit_bio_wait+0x5a/0xc0
>   blkdev_issue_flush+0x44/0x60
>
> The root cause is that submit_bio_noacct() needs bio_op() is either
> WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign
> REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail
> the flush bio.
>
> Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we
> could fix the flush order issue and do flush optimization later.
>
> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
> Signed-off-by: Hou Tao <houtao1@huawei.com>

With 6.3+ stable Cc, Feel free to add:

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Pankaj Gupta <pankaj.gupta@amd.com>

Thanks,
Pankaj
Pankaj Gupta July 13, 2023, 8:17 a.m. UTC | #4
> I've failed to understand why we should wait for [0] ...

Sorry, missed to answer before. The optimization was to make the
virtio-pmem interface completely
asynchronous and coalesce the flush requests. As this uses the
asynchronous behavior of virtio bus
there is no stall of guest vCPU, so this optimization intends to solve below:

- Guest IO submitting thread can do other other operations till the
host flush completes.
- As all the guest flush work on one virtio-pmem device. If there is
single flush in queue all the subsequent flush would wait enabling
request coalescing.

Currently, its solved by splitting the bio request to parent & child.
But this results in preorder issue (As PREFLUSH happens after the
first fsync, because of the way how bio_submit() works). I think the
preflush order issue will still remain after this patch. But currently
this interface is broken in mainline. So, this patch fixes the issue.


Thanks,
Pankaj
Pankaj Gupta July 13, 2023, 8:23 a.m. UTC | #5
+Cc Vishal,

> > Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
>
> With 6.3+ stable Cc, Feel free to add:

Hi Dan, Vishal,

Do you have any suggestion on this patch? Or can we queue this please?

Hi Hou,

Could you please send a new version with Cc stable or as per any
suggestions from maintainers.

Thanks,
Pankaj
Hou Tao July 13, 2023, 1:06 p.m. UTC | #6
Hi Pankaj,

On 7/13/2023 4:23 PM, Pankaj Gupta wrote:
> +Cc Vishal,
>
>>> Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> With 6.3+ stable Cc, Feel free to add:
> Hi Dan, Vishal,
>
> Do you have any suggestion on this patch? Or can we queue this please?
>
> Hi Hou,
>
> Could you please send a new version with Cc stable or as per any
> suggestions from maintainers.
Will add Cc stable for v6.3 in v4.
>
> Thanks,
> Pankaj
> .
diff mbox series

Patch

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index c6a648fd8744..1f8c667c6f1e 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -105,7 +105,8 @@  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
 	 * parent bio. Otherwise directly call nd_region flush.
 	 */
 	if (bio && bio->bi_iter.bi_sector != -1) {
-		struct bio *child = bio_alloc(bio->bi_bdev, 0, REQ_PREFLUSH,
+		struct bio *child = bio_alloc(bio->bi_bdev, 0,
+					      REQ_OP_WRITE | REQ_PREFLUSH,
 					      GFP_ATOMIC);
 
 		if (!child)