mbox series

[v3,0/6] bio_split() error handling rework

Message ID 20241031095918.99964-1-john.g.garry@oracle.com (mailing list archive)
Headers show
Series bio_split() error handling rework | expand

Message

John Garry Oct. 31, 2024, 9:59 a.m. UTC
bio_split() error handling could be improved as follows:
- Instead of returning NULL for an error - which is vague - return a
  PTR_ERR, which may hint what went wrong.
- Remove BUG_ON() calls - which are generally not preferred - and instead
  WARN and pass an error code back to the caller. Many callers of
  bio_split() don't check the return code. As such, for an error we would
  be getting a crash still from an invalid pointer dereference.

Most bio_split() callers don't check the return value. However, it could
be argued the bio_split() calls should not fail. So far I have just
fixed up the md RAID code to handle these errors, as that is my interest
now.

The motivator for this series was initial md RAID atomic write support in
https://lore.kernel.org/linux-block/20241030094912.3960234-1-john.g.garry@oracle.com/T/#m5859ee900de8e6554d5bb027c0558f0147c32df8

There I wanted to ensure that we don't split an atomic write bio, and it
made more sense to handle this in bio_split() (instead of the bio_split()
caller).

Based on 133008e84b99 (block/for-6.13/block) blk-integrity: remove
seed for user mapped buffers

Changes since v2:
- Drop "block: Use BLK_STS_OK in bio_init()" change (Christoph)
- Use proper rdev indexing in raid10_write_request() (Kuai)
- Decrement rdev nr_pending in raid1 read error path (Kuai)
- Add RB tags from Christoph, Johannes, and Kuai (thanks!)

Changes since RFC:
- proper handling to end the raid bio in all cases, and also pass back
  proper error code (Kuai)
- Add WARN_ON_ERROR in bio_split() (Johannes, Christoph)
- Add small patch to use BLK_STS_OK in bio_init()
- Change bio_submit_split() error path (Christoph)

John Garry (6):
  block: Rework bio_split() return value
  block: Error an attempt to split an atomic write in bio_split()
  block: Handle bio_split() errors in bio_submit_split()
  md/raid0: Handle bio_split() errors
  md/raid1: Handle bio_split() errors
  md/raid10: Handle bio_split() errors

 block/bio.c                 | 14 +++++++----
 block/blk-crypto-fallback.c |  2 +-
 block/blk-merge.c           | 15 ++++++++----
 drivers/md/raid0.c          | 12 ++++++++++
 drivers/md/raid1.c          | 33 ++++++++++++++++++++++++--
 drivers/md/raid10.c         | 47 ++++++++++++++++++++++++++++++++++++-
 6 files changed, 110 insertions(+), 13 deletions(-)

Comments

John Garry Nov. 7, 2024, 6:49 a.m. UTC | #1
On 31/10/2024 09:59, John Garry wrote:

Hi Jens,

Could you kindly consider picking up this series via the block tree?

Please note that the raid 0/1/10 atomic write support in 
https://lore.kernel.org/linux-block/20241101144616.497602-1-john.g.garry@oracle.com/ 
depends on this series, so maybe you would also be willing to pick that 
one up (when it's fully reviewed). Or create a branch with all the block 
changes, like which was done for the atomic writes XFS support.

Thanks very much,
John

> bio_split() error handling could be improved as follows:
> - Instead of returning NULL for an error - which is vague - return a
>    PTR_ERR, which may hint what went wrong.
> - Remove BUG_ON() calls - which are generally not preferred - and instead
>    WARN and pass an error code back to the caller. Many callers of
>    bio_split() don't check the return code. As such, for an error we would
>    be getting a crash still from an invalid pointer dereference.
> 
> Most bio_split() callers don't check the return value. However, it could
> be argued the bio_split() calls should not fail. So far I have just
> fixed up the md RAID code to handle these errors, as that is my interest
> now.
> 
> The motivator for this series was initial md RAID atomic write support in
> https://lore.kernel.org/linux-block/20241030094912.3960234-1-john.g.garry@oracle.com/T/#m5859ee900de8e6554d5bb027c0558f0147c32df8
> 
> There I wanted to ensure that we don't split an atomic write bio, and it
> made more sense to handle this in bio_split() (instead of the bio_split()
> caller).
> 
> Based on 133008e84b99 (block/for-6.13/block) blk-integrity: remove
> seed for user mapped buffers
> 
> Changes since v2:
> - Drop "block: Use BLK_STS_OK in bio_init()" change (Christoph)
> - Use proper rdev indexing in raid10_write_request() (Kuai)
> - Decrement rdev nr_pending in raid1 read error path (Kuai)
> - Add RB tags from Christoph, Johannes, and Kuai (thanks!)
> 
> Changes since RFC:
> - proper handling to end the raid bio in all cases, and also pass back
>    proper error code (Kuai)
> - Add WARN_ON_ERROR in bio_split() (Johannes, Christoph)
> - Add small patch to use BLK_STS_OK in bio_init()
> - Change bio_submit_split() error path (Christoph)
> 
> John Garry (6):
>    block: Rework bio_split() return value
>    block: Error an attempt to split an atomic write in bio_split()
>    block: Handle bio_split() errors in bio_submit_split()
>    md/raid0: Handle bio_split() errors
>    md/raid1: Handle bio_split() errors
>    md/raid10: Handle bio_split() errors
> 
>   block/bio.c                 | 14 +++++++----
>   block/blk-crypto-fallback.c |  2 +-
>   block/blk-merge.c           | 15 ++++++++----
>   drivers/md/raid0.c          | 12 ++++++++++
>   drivers/md/raid1.c          | 33 ++++++++++++++++++++++++--
>   drivers/md/raid10.c         | 47 ++++++++++++++++++++++++++++++++++++-
>   6 files changed, 110 insertions(+), 13 deletions(-)
>
John Garry Nov. 7, 2024, 6:49 a.m. UTC | #2
On 31/10/2024 09:59, John Garry wrote:

Hi Jens,

Could you kindly consider picking up this series via the block tree?

Please note that the raid 0/1/10 atomic write support in 
https://lore.kernel.org/linux-block/20241101144616.497602-1-john.g.garry@oracle.com/ 
depends on this series, so maybe you would also be willing to pick that 
one up (when it's fully reviewed). Or create a branch with all the block 
changes, like which was done for the atomic writes XFS support.

Thanks very much,
John

> bio_split() error handling could be improved as follows:
> - Instead of returning NULL for an error - which is vague - return a
>    PTR_ERR, which may hint what went wrong.
> - Remove BUG_ON() calls - which are generally not preferred - and instead
>    WARN and pass an error code back to the caller. Many callers of
>    bio_split() don't check the return code. As such, for an error we would
>    be getting a crash still from an invalid pointer dereference.
> 
> Most bio_split() callers don't check the return value. However, it could
> be argued the bio_split() calls should not fail. So far I have just
> fixed up the md RAID code to handle these errors, as that is my interest
> now.
> 
> The motivator for this series was initial md RAID atomic write support in
> https://lore.kernel.org/linux-block/20241030094912.3960234-1-john.g.garry@oracle.com/T/#m5859ee900de8e6554d5bb027c0558f0147c32df8
> 
> There I wanted to ensure that we don't split an atomic write bio, and it
> made more sense to handle this in bio_split() (instead of the bio_split()
> caller).
> 
> Based on 133008e84b99 (block/for-6.13/block) blk-integrity: remove
> seed for user mapped buffers
> 
> Changes since v2:
> - Drop "block: Use BLK_STS_OK in bio_init()" change (Christoph)
> - Use proper rdev indexing in raid10_write_request() (Kuai)
> - Decrement rdev nr_pending in raid1 read error path (Kuai)
> - Add RB tags from Christoph, Johannes, and Kuai (thanks!)
> 
> Changes since RFC:
> - proper handling to end the raid bio in all cases, and also pass back
>    proper error code (Kuai)
> - Add WARN_ON_ERROR in bio_split() (Johannes, Christoph)
> - Add small patch to use BLK_STS_OK in bio_init()
> - Change bio_submit_split() error path (Christoph)
> 
> John Garry (6):
>    block: Rework bio_split() return value
>    block: Error an attempt to split an atomic write in bio_split()
>    block: Handle bio_split() errors in bio_submit_split()
>    md/raid0: Handle bio_split() errors
>    md/raid1: Handle bio_split() errors
>    md/raid10: Handle bio_split() errors
> 
>   block/bio.c                 | 14 +++++++----
>   block/blk-crypto-fallback.c |  2 +-
>   block/blk-merge.c           | 15 ++++++++----
>   drivers/md/raid0.c          | 12 ++++++++++
>   drivers/md/raid1.c          | 33 ++++++++++++++++++++++++--
>   drivers/md/raid10.c         | 47 ++++++++++++++++++++++++++++++++++++-
>   6 files changed, 110 insertions(+), 13 deletions(-)
>
Jens Axboe Nov. 7, 2024, 6:27 p.m. UTC | #3
On 11/6/24 11:49 PM, John Garry wrote:
> On 31/10/2024 09:59, John Garry wrote:
> 
> Hi Jens,
> 
> Could you kindly consider picking up this series via the block tree?
> 
> Please note that the raid 0/1/10 atomic write support in https://lore.kernel.org/linux-block/20241101144616.497602-1-john.g.garry@oracle.com/ depends on this series, so maybe you would also be willing to pick that one up (when it's fully reviewed). Or create a branch with all the block changes, like which was done for the atomic writes XFS support.

The series doesn't apply to for-6.13/block - the 3rd patch for bio
splitting conflicts with:

commit b35243a447b9fe6457fa8e1352152b818436ba5a
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Aug 26 19:37:54 2024 +0200

    block: rework bio splitting

which was in long before for-6.13/block, which it's supposed to be based
on?

Please double check and resend a v4.
John Garry Nov. 7, 2024, 7:54 p.m. UTC | #4
On 07/11/2024 18:27, Jens Axboe wrote:
> The series doesn't apply to for-6.13/block - the 3rd patch for bio
> splitting conflicts with:
> 
> commit b35243a447b9fe6457fa8e1352152b818436ba5a
> Author: Christoph Hellwig<hch@lst.de>
> Date:   Mon Aug 26 19:37:54 2024 +0200
> 
>      block: rework bio splitting
> 
> which was in long before for-6.13/block, which it's supposed to be based
> on?
> 
> Please double check and resend a v4.

Hi Jens,

I was based on a recent commit, 133008e84b99 (block/for-6.13/block) 
blk-integrity: remove seed for user mapped buffers

Anyway, it is a week old, so I will rebase and repost.

Thanks,
John