Message ID | 20190227171442.11853-9-igor.j.konopko@intel.com (mailing list archive)
---|---
State | New, archived
Series | lightnvm: bugfixes and improvements
> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> Currently, in case of read errors, bi_status is not set properly, which
> leads to returning improper data to the higher layer. This patch fixes
> that by setting the proper status in case of read errors.
>
> The patch also removes an unnecessary WARN_ONCE(), which does not make
> sense in that place, since the user bio is not used for interaction with
> the drive and thus bi_status will not be set here.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-read.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 3789185144da..39c1d6ccaedb 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>  	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>  }
>
> -static void pblk_end_user_read(struct bio *bio)
> +static void pblk_end_user_read(struct bio *bio, int error)
>  {
> -#ifdef CONFIG_NVM_PBLK_DEBUG
> -	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
> -#endif
> +	if (error && error != NVM_RSP_WARN_HIGHECC)
> +		bio_io_error(bio);
>  	bio_endio(bio);
>  }
>
> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>  	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>  	struct bio *bio = (struct bio *)r_ctx->private;
>
> -	pblk_end_user_read(bio);
> +	pblk_end_user_read(bio, rqd->error);
>  	__pblk_end_io_read(pblk, rqd, true);
>  }
>
> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>  	rqd->bio = NULL;
>  	rqd->nr_ppas = nr_secs;
>
> -	bio_endio(bio);
> +	pblk_end_user_read(bio, rqd->error);
>  	__pblk_end_io_read(pblk, rqd, false);
>  }
>
> --
> 2.17.1

This is by design. We do not report the read errors as in any other
block device - this is why we clone the read bio.

If you want to remove the WARN_ONCE, it is fine by me - it helped in the
past to debug the read path, as when one read failed we got a storm of
them. Now we have better controller tools to debug this, so if nobody
else wants it there, let's remove it.

Javier
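For reference on the helper the patch switches to on the error path: bio_io_error() is a thin wrapper around bio_endio() that first marks the bio as failed. The sketch below paraphrases its definition in include/linux/bio.h from this kernel generation and is meant only as an illustration, not an exact copy:

/* Roughly what bio_io_error() does: record a generic I/O error in
 * bi_status and then complete the bio, so the submitter's ->bi_end_io
 * callback runs and can observe the failure.
 */
static inline void bio_io_error(struct bio *bio)
{
        bio->bi_status = BLK_STS_IOERR;
        bio_endio(bio);
}

In other words, bio_endio() alone completes the cloned user bio with whatever bi_status already holds (zero, i.e. success), while bio_io_error() completes it with BLK_STS_IOERR - that distinction is what the rest of the thread argues about.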
On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> [...]
>
> This is by design. We do not report the read errors as in any other
> block device - this is why we clone the read bio.

Could you elaborate on why not reporting read errors is a good thing in pblk?
> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>
> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>> [...]
>>
>> This is by design. We do not report the read errors as in any other
>> block device - this is why we clone the read bio.
>
> Could you elaborate on why not reporting read errors is a good thing in pblk?

Normal block devices do not report read errors on the completion path
unless it is a fatal error. This is actually not well understood by the
upper layers, which tend to assume that the device is completely broken.

This is a challenge for OCSSD / Denali / Zone devices, as there are cases
where reads can fail. Unfortunately, at this point we need to mask these
errors and deal with them in the different layers.

For OCSSD currently, we do this in pblk, which I think fits the model
well, as we expose a normal block device.

Javier
On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>
>> [...]
>>
>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>
> Normal block devices do not report read errors on the completion path
> unless it is a fatal error. This is actually not well understood by the
> upper layers, which tend to assume that the device is completely broken.

So returning bogus data without even a warning is a preferred solution?
You want to force "the upper layers" to do checksumming?

It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a warning
that OCSSD 2.0 adds; the data should still be good. All other errors
(see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that the command did not
complete, as far as I can tell.
On 04.03.2019 13:14, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>> [...]
>>
>> Normal block devices do not report read errors on the completion path
>> unless it is a fatal error. This is actually not well understood by the
>> upper layers, which tend to assume that the device is completely broken.
>
> So returning bogus data without even a warning is a preferred solution?
> You want to force "the upper layers" to do checksumming?
>
> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a warning
> that OCSSD 2.0 adds; the data should still be good. All other errors
> (see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that the command did not
> complete, as far as I can tell.

My approach was exactly like that. In all cases other than WARN_HIGHECC
we don't have valid data. Without setting bio_io_error() we create the
impression for other layers that we read the data correctly, which is
not the case.

I'm also seeing that this patch is not the only user of the
bio_io_error() API; other drivers, such as md, use it commonly.
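To make the point about "other layers" concrete, here is a minimal sketch of a generic submitter's completion callback. This is not pblk code: struct my_request, my_request_complete() and my_read_endio() are hypothetical stand-ins for a caller's own bookkeeping, using only the standard block-layer bio fields. The only failure signal such a caller gets is bio->bi_status, so if pblk completes the cloned user bio without setting it, the error branch below is never taken and the buffer is treated as valid data:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

/* Hypothetical caller-side bookkeeping, for illustration only. */
struct my_request {
        struct completion done;
        int error;
};

static void my_request_complete(struct my_request *req, int error)
{
        req->error = error;
        complete(&req->done);
}

/* Generic upper-layer read completion: the only error information
 * available here is bio->bi_status, which pblk must set (e.g. via
 * bio_io_error()) when a read actually failed.
 */
static void my_read_endio(struct bio *bio)
{
        struct my_request *req = bio->bi_private;

        if (bio->bi_status) {
                /* Read failed: do not trust the data in the bio's pages. */
                my_request_complete(req, blk_status_to_errno(bio->bi_status));
        } else {
                /* bi_status == BLK_STS_OK: the read is assumed to have
                 * returned valid data.
                 */
                my_request_complete(req, 0);
        }
        bio_put(bio);
}

Such a callback would be installed with bio->bi_end_io = my_read_endio before submit_bio(); with the patch applied, failed pblk reads reach the first branch as BLK_STS_IOERR instead of silently taking the success branch.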
On 3/4/19 10:48 AM, Javier González wrote:
>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>
>> [...]
>>
>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>
> Normal block devices do not report read errors on the completion path
> unless it is a fatal error. This is actually not well understood by the
> upper layers, which tend to assume that the device is completely broken.
>
> This is a challenge for OCSSD / Denali / Zone devices, as there are cases
> where reads can fail. Unfortunately, at this point we need to mask these
> errors and deal with them in the different layers.

Please don't include zone devices in that list. ZAC/ZBC are
well-behaved, and an error is a real error.
On 3/4/19 1:51 PM, Igor Konopko wrote:
> On 04.03.2019 13:14, Hans Holmberg wrote:
>> [...]
>>
>> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a warning
>> that OCSSD 2.0 adds; the data should still be good. All other errors
>> (see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that the command did not
>> complete, as far as I can tell.
>
> My approach was exactly like that. In all cases other than WARN_HIGHECC
> we don't have valid data. Without setting bio_io_error() we create the
> impression for other layers that we read the data correctly, which is
> not the case.
>
> I'm also seeing that this patch is not the only user of the
> bio_io_error() API; other drivers, such as md, use it commonly.

Yes, agree. This is an actual error in pblk that lets it return bogus data.
> On 4 Mar 2019, at 14.04, Matias Bjørling <mb@lightnvm.io> wrote:
>
> On 3/4/19 10:48 AM, Javier González wrote:
>> [...]
>>
>> This is a challenge for OCSSD / Denali / Zone devices, as there are cases
>> where reads can fail. Unfortunately, at this point we need to mask these
>> errors and deal with them in the different layers.
>
> Please don't include zone devices in that list. ZAC/ZBC are
> well-behaved, and an error is a real error.

They have worked around this for years. AFAIK the read path still
returns predefined data, not an error. So, as I see it, it is the same.

Javier
> On 4 Mar 2019, at 14.08, Matias Bjørling <mb@lightnvm.io> wrote:
>
> On 3/4/19 1:51 PM, Igor Konopko wrote:
>> [...]
>>
>> My approach was exactly like that. In all cases other than WARN_HIGHECC
>> we don't have valid data. Without setting bio_io_error() we create the
>> impression for other layers that we read the data correctly, which is
>> not the case.
>>
>> I'm also seeing that this patch is not the only user of the
>> bio_io_error() API; other drivers, such as md, use it commonly.
>
> Yes, agree. This is an actual error in pblk that lets it return bogus data.

I am not against returning an error, I am just saying that this is not
normal behavior on the read path.

The problem is that the upper layers might interpret that the device is
broken completely, which is not true for a spa failing. Think for
example of a host reading under mw_cunits - in reality this is not even
a device problem but a host bug that might result in a fatal error.

Matias: I am surprised to see you answer this way - when I tried to
define a sane read error path with meaningful errors, starting in the
spec and all the way up the stack, you were the first one to argue for
reads to always succeed no matter what. In fact, using ZBC/ZAC as an
example...

Javier
On 3/4/19 2:45 PM, Javier González wrote:
>> On 4 Mar 2019, at 14.08, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> [...]
>>
>> Yes, agree. This is an actual error in pblk that lets it return bogus data.
>
> I am not against returning an error, I am just saying that this is not
> normal behavior on the read path.
>
> The problem is that the upper layers might interpret that the device is
> broken completely, which is not true for a spa failing. Think for
> example of a host reading under mw_cunits - in reality this is not even
> a device problem but a host bug that might result in a fatal error.

Agree, and the host should manage it. The drive shall return a fatal
error when it is asked to return invalid data, e.g. reads under
mw_cunits.

> Matias: I am surprised to see you answer this way - when I tried to
> define a sane read error path with meaningful errors, starting in the
> spec and all the way up the stack, you were the first one to argue for
> reads to always succeed no matter what. In fact, using ZBC/ZAC as an
> example...

What I objected against was having error messages for each type of
error that could happen, instead of a single or a few error types that
trigger the same set of recovery procedures on the host side.
> On 4 Mar 2019, at 16.12, Matias Bjørling <mb@lightnvm.io> wrote:
>
> On 3/4/19 2:45 PM, Javier González wrote:
>> [...]
>>
>> Matias: I am surprised to see you answer this way - when I tried to
>> define a sane read error path with meaningful errors, starting in the
>> spec and all the way up the stack, you were the first one to argue for
>> reads to always succeed no matter what. In fact, using ZBC/ZAC as an
>> example...
>
> What I objected against was having error messages for each type of
> error that could happen, instead of a single or a few error types that
> trigger the same set of recovery procedures on the host side.

Ok. So, just to agree on a good way to move forward: do you think it is
a good idea for the LightNVM subsystem to propagate errors on the read
path going forward? This would include reading ahead of the write
pointer, which _does not_ report errors in ZAC/ZBC either.

Note that currently in pblk we do not prevent a read from going to the
device in any case, and it is the device's responsibility to return an
error. However, since OCSSD 2.0 does not define a rich error model, many
errors are masked under generic buckets.

If so, we need to define which errors are propagated and which are not.
Can you share some insights, using OCSSD 2.0 as a base?

Javier
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 3789185144da..39c1d6ccaedb 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
 	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
 }
 
-static void pblk_end_user_read(struct bio *bio)
+static void pblk_end_user_read(struct bio *bio, int error)
 {
-#ifdef CONFIG_NVM_PBLK_DEBUG
-	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
-#endif
+	if (error && error != NVM_RSP_WARN_HIGHECC)
+		bio_io_error(bio);
 	bio_endio(bio);
 }
 
@@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct bio *bio = (struct bio *)r_ctx->private;
 
-	pblk_end_user_read(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
@@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	rqd->bio = NULL;
 	rqd->nr_ppas = nr_secs;
 
-	bio_endio(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, false);
 }
Currently, in case of read errors, bi_status is not set properly, which
leads to returning improper data to the higher layer. This patch fixes
that by setting the proper status in case of read errors.

The patch also removes an unnecessary WARN_ONCE(), which does not make
sense in that place, since the user bio is not used for interaction with
the drive and thus bi_status will not be set here.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
drivers/lightnvm/pblk-read.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
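As a closing illustration of the masking rule discussed in the thread, the small user-space program below mimics the condition the patch adds to pblk_end_user_read(). The NVM_RSP_* values are copied from my reading of include/linux/lightnvm.h for this kernel generation and should be treated as assumptions; only the shape of the check matters here.

#include <stdio.h>

/* Status values as assumed from include/linux/lightnvm.h (illustrative). */
#define NVM_RSP_SUCCESS      0x0000
#define NVM_RSP_WARN_HIGHECC 0x4700  /* data read OK, but ECC was close to its limit */
#define NVM_RSP_ERR_FAILECC  0x4281  /* uncorrectable read error */

/* Mirrors the condition in the patched pblk_end_user_read(): everything
 * except success and the high-ECC warning is propagated as an I/O error.
 */
static int read_should_fail(int rqd_error)
{
        return rqd_error && rqd_error != NVM_RSP_WARN_HIGHECC;
}

int main(void)
{
        printf("success:    fail=%d\n", read_should_fail(NVM_RSP_SUCCESS));      /* 0 */
        printf("high ECC:   fail=%d\n", read_should_fail(NVM_RSP_WARN_HIGHECC)); /* 0 */
        printf("failed ECC: fail=%d\n", read_should_fail(NVM_RSP_ERR_FAILECC));  /* 1 */
        return 0;
}

With this rule, a high-ECC warning still completes the user bio as a successful read, while any other non-zero completion status reaches the submitter as an I/O error.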