diff mbox series

[1/2] block: introduce BLK_STS_OFFLINE

Message ID 20220203064009.1795344-2-song@kernel.org (mailing list archive)
State Superseded
Headers show
Series block: scsi: introduce and use BLK_STS_OFFLINE | expand

Commit Message

Song Liu Feb. 3, 2022, 6:40 a.m. UTC
Currently, drivers reports BLK_STS_IOERR for devices that are not full
online or being removed. This behavior could cause confusion for users,
as they are not really I/O errors from the device.

Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
offline error" in dmesg instead of "I/O error".

Signed-off-by: Song Liu <song@kernel.org>
---
 block/blk-core.c          | 1 +
 include/linux/blk_types.h | 7 +++++++
 2 files changed, 8 insertions(+)

Comments

Song Liu Feb. 3, 2022, 6:52 a.m. UTC | #1
CC linux-block (it was a typo in the original email)

On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>
> Currently, drivers reports BLK_STS_IOERR for devices that are not full
> online or being removed. This behavior could cause confusion for users,
> as they are not really I/O errors from the device.
>
> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
> offline error" in dmesg instead of "I/O error".
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  block/blk-core.c          | 1 +
>  include/linux/blk_types.h | 7 +++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 61f6a0dc4511..24035dd2eef1 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -164,6 +164,7 @@ static const struct {
>         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>
>         /* device mapper special case, should not leak out: */
>         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index fe065c394fff..5561e58d158a 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>   */
>  #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>
> +/*
> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
> + * or is being taken offline. This could help differentiate the case where a
> + * device is intentionally being shut down from a real I/O error.
> + */
> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
> +
>  /**
>   * blk_path_error - returns true if error may be path related
>   * @error: status the request was completed with
> --
> 2.30.2
>
Hannes Reinecke Feb. 3, 2022, 7:24 a.m. UTC | #2
On 2/3/22 07:52, Song Liu wrote:
> CC linux-block (it was a typo in the original email)
> 
> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>>
>> Currently, drivers reports BLK_STS_IOERR for devices that are not full
>> online or being removed. This behavior could cause confusion for users,
>> as they are not really I/O errors from the device.
>>
>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
>> offline error" in dmesg instead of "I/O error".
>>
>> Signed-off-by: Song Liu <song@kernel.org>
>> ---
>>   block/blk-core.c          | 1 +
>>   include/linux/blk_types.h | 7 +++++++
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 61f6a0dc4511..24035dd2eef1 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -164,6 +164,7 @@ static const struct {
>>          [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>>          [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>>          [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
>> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>>
>>          /* device mapper special case, should not leak out: */
>>          [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index fe065c394fff..5561e58d158a 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>>    */
>>   #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>>
>> +/*
>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
>> + * or is being taken offline. This could help differentiate the case where a
>> + * device is intentionally being shut down from a real I/O error.
>> + */
>> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
>> +
>>   /**
>>    * blk_path_error - returns true if error may be path related
>>    * @error: status the request was completed with
>> --
>> 2.30.2
>>
Please do not overload EIO here.
EIO already is a catch-all error if we don't know any better, but for 
the 'device offline' case we do (or rather should).
Please map it onto 'ENODEV' or 'ENXIO'.

Cheers,

Hannes
Jens Axboe Feb. 3, 2022, 1:47 p.m. UTC | #3
On 2/3/22 12:24 AM, Hannes Reinecke wrote:
> On 2/3/22 07:52, Song Liu wrote:
>> CC linux-block (it was a typo in the original email)
>>
>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>>>
>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full
>>> online or being removed. This behavior could cause confusion for users,
>>> as they are not really I/O errors from the device.
>>>
>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
>>> offline error" in dmesg instead of "I/O error".
>>>
>>> Signed-off-by: Song Liu <song@kernel.org>
>>> ---
>>>   block/blk-core.c          | 1 +
>>>   include/linux/blk_types.h | 7 +++++++
>>>   2 files changed, 8 insertions(+)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 61f6a0dc4511..24035dd2eef1 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -164,6 +164,7 @@ static const struct {
>>>          [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>>>          [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>>>          [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
>>> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>>>
>>>          /* device mapper special case, should not leak out: */
>>>          [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>> index fe065c394fff..5561e58d158a 100644
>>> --- a/include/linux/blk_types.h
>>> +++ b/include/linux/blk_types.h
>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>>>    */
>>>   #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>>>
>>> +/*
>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
>>> + * or is being taken offline. This could help differentiate the case where a
>>> + * device is intentionally being shut down from a real I/O error.
>>> + */
>>> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
>>> +
>>>   /**
>>>    * blk_path_error - returns true if error may be path related
>>>    * @error: status the request was completed with
>>> --
>>> 2.30.2
>>>
> Please do not overload EIO here.
> EIO already is a catch-all error if we don't know any better, but for 
> the 'device offline' case we do (or rather should).
> Please map it onto 'ENODEV' or 'ENXIO'.

It's deliberately EIO as not to force a change in behavior. I don't mind
using something else, but that should be a separate change then.
Song Liu Feb. 3, 2022, 5:23 p.m. UTC | #4
Hi Hannes and Jens,

> On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 2/3/22 12:24 AM, Hannes Reinecke wrote:
>> On 2/3/22 07:52, Song Liu wrote:
>>> CC linux-block (it was a typo in the original email)
>>> 
>>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>>>> 
>>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full
>>>> online or being removed. This behavior could cause confusion for users,
>>>> as they are not really I/O errors from the device.
>>>> 
>>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
>>>> offline error" in dmesg instead of "I/O error".
>>>> 
>>>> Signed-off-by: Song Liu <song@kernel.org>
>>>> ---
>>>>  block/blk-core.c          | 1 +
>>>>  include/linux/blk_types.h | 7 +++++++
>>>>  2 files changed, 8 insertions(+)
>>>> 
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 61f6a0dc4511..24035dd2eef1 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -164,6 +164,7 @@ static const struct {
>>>>         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>>>>         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>>>>         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
>>>> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>>>> 
>>>>         /* device mapper special case, should not leak out: */
>>>>         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>> index fe065c394fff..5561e58d158a 100644
>>>> --- a/include/linux/blk_types.h
>>>> +++ b/include/linux/blk_types.h
>>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>>>>   */
>>>>  #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>>>> 
>>>> +/*
>>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
>>>> + * or is being taken offline. This could help differentiate the case where a
>>>> + * device is intentionally being shut down from a real I/O error.
>>>> + */
>>>> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
>>>> +
>>>>  /**
>>>>   * blk_path_error - returns true if error may be path related
>>>>   * @error: status the request was completed with
>>>> --
>>>> 2.30.2
>>>> 
>> Please do not overload EIO here.
>> EIO already is a catch-all error if we don't know any better, but for 
>> the 'device offline' case we do (or rather should).
>> Please map it onto 'ENODEV' or 'ENXIO'.
> 
> It's deliberately EIO as not to force a change in behavior. I don't mind
> using something else, but that should be a separate change then.

Thanks for these feedbacks. Shall I send v2 with an extra patch that 
changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch? 
Also, any preference between ENODEV and ENXIO? 

Thanks,
Song
Jens Axboe Feb. 3, 2022, 6:51 p.m. UTC | #5
On 2/3/22 10:23 AM, Song Liu wrote:
> Hi Hannes and Jens,
> 
>> On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 2/3/22 12:24 AM, Hannes Reinecke wrote:
>>> On 2/3/22 07:52, Song Liu wrote:
>>>> CC linux-block (it was a typo in the original email)
>>>>
>>>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>>>>>
>>>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full
>>>>> online or being removed. This behavior could cause confusion for users,
>>>>> as they are not really I/O errors from the device.
>>>>>
>>>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
>>>>> offline error" in dmesg instead of "I/O error".
>>>>>
>>>>> Signed-off-by: Song Liu <song@kernel.org>
>>>>> ---
>>>>>  block/blk-core.c          | 1 +
>>>>>  include/linux/blk_types.h | 7 +++++++
>>>>>  2 files changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index 61f6a0dc4511..24035dd2eef1 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -164,6 +164,7 @@ static const struct {
>>>>>         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>>>>>         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>>>>>         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
>>>>> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>>>>>
>>>>>         /* device mapper special case, should not leak out: */
>>>>>         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
>>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>>> index fe065c394fff..5561e58d158a 100644
>>>>> --- a/include/linux/blk_types.h
>>>>> +++ b/include/linux/blk_types.h
>>>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>>>>>   */
>>>>>  #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>>>>>
>>>>> +/*
>>>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
>>>>> + * or is being taken offline. This could help differentiate the case where a
>>>>> + * device is intentionally being shut down from a real I/O error.
>>>>> + */
>>>>> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
>>>>> +
>>>>>  /**
>>>>>   * blk_path_error - returns true if error may be path related
>>>>>   * @error: status the request was completed with
>>>>> --
>>>>> 2.30.2
>>>>>
>>> Please do not overload EIO here.
>>> EIO already is a catch-all error if we don't know any better, but for 
>>> the 'device offline' case we do (or rather should).
>>> Please map it onto 'ENODEV' or 'ENXIO'.
>>
>> It's deliberately EIO as not to force a change in behavior. I don't mind
>> using something else, but that should be a separate change then.
> 
> Thanks for these feedbacks. Shall I send v2 with an extra patch that 
> changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch? 
> Also, any preference between ENODEV and ENXIO? 

Yeah I think so, and perhaps put a mention in this patch on why EIO is
chosen to not change the user visible return value.
Hannes Reinecke Feb. 4, 2022, 7:14 a.m. UTC | #6
On 2/3/22 18:23, Song Liu wrote:
> Hi Hannes and Jens,
> 
>> On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 2/3/22 12:24 AM, Hannes Reinecke wrote:
>>> On 2/3/22 07:52, Song Liu wrote:
>>>> CC linux-block (it was a typo in the original email)
>>>>
>>>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote:
>>>>>
>>>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full
>>>>> online or being removed. This behavior could cause confusion for users,
>>>>> as they are not really I/O errors from the device.
>>>>>
>>>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
>>>>> offline error" in dmesg instead of "I/O error".
>>>>>
>>>>> Signed-off-by: Song Liu <song@kernel.org>
>>>>> ---
>>>>>   block/blk-core.c          | 1 +
>>>>>   include/linux/blk_types.h | 7 +++++++
>>>>>   2 files changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index 61f6a0dc4511..24035dd2eef1 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -164,6 +164,7 @@ static const struct {
>>>>>          [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>>>>>          [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>>>>>          [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
>>>>> +       [BLK_STS_OFFLINE]       = { -EIO,       "device offline" },
>>>>>
>>>>>          /* device mapper special case, should not leak out: */
>>>>>          [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
>>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>>> index fe065c394fff..5561e58d158a 100644
>>>>> --- a/include/linux/blk_types.h
>>>>> +++ b/include/linux/blk_types.h
>>>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t;
>>>>>    */
>>>>>   #define BLK_STS_ZONE_ACTIVE_RESOURCE   ((__force blk_status_t)16)
>>>>>
>>>>> +/*
>>>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline
>>>>> + * or is being taken offline. This could help differentiate the case where a
>>>>> + * device is intentionally being shut down from a real I/O error.
>>>>> + */
>>>>> +#define BLK_STS_OFFLINE                ((__force blk_status_t)17)
>>>>> +
>>>>>   /**
>>>>>    * blk_path_error - returns true if error may be path related
>>>>>    * @error: status the request was completed with
>>>>> --
>>>>> 2.30.2
>>>>>
>>> Please do not overload EIO here.
>>> EIO already is a catch-all error if we don't know any better, but for
>>> the 'device offline' case we do (or rather should).
>>> Please map it onto 'ENODEV' or 'ENXIO'.
>>
>> It's deliberately EIO as not to force a change in behavior. I don't mind
>> using something else, but that should be a separate change then.
> 
> Thanks for these feedbacks. Shall I send v2 with an extra patch that
> changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch?
> Also, any preference between ENODEV and ENXIO?
> 
Please make it an addtional patch, and use ENODEV as a return value.
For this patch you can add:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
diff mbox series

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 61f6a0dc4511..24035dd2eef1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -164,6 +164,7 @@  static const struct {
 	[BLK_STS_RESOURCE]	= { -ENOMEM,	"kernel resource" },
 	[BLK_STS_DEV_RESOURCE]	= { -EBUSY,	"device resource" },
 	[BLK_STS_AGAIN]		= { -EAGAIN,	"nonblocking retry" },
+	[BLK_STS_OFFLINE]	= { -EIO,	"device offline" },
 
 	/* device mapper special case, should not leak out: */
 	[BLK_STS_DM_REQUEUE]	= { -EREMCHG, "dm internal retry" },
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index fe065c394fff..5561e58d158a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -153,6 +153,13 @@  typedef u8 __bitwise blk_status_t;
  */
 #define BLK_STS_ZONE_ACTIVE_RESOURCE	((__force blk_status_t)16)
 
+/*
+ * BLK_STS_OFFLINE is returned from the driver when the target device is offline
+ * or is being taken offline. This could help differentiate the case where a
+ * device is intentionally being shut down from a real I/O error.
+ */
+#define BLK_STS_OFFLINE		((__force blk_status_t)17)
+
 /**
  * blk_path_error - returns true if error may be path related
  * @error: status the request was completed with