diff mbox series

[15/18] lightnvm: pblk: fix in case of lack of lines

Message ID 20190314160428.3559-16-igor.j.konopko@intel.com (mailing list archive)
State New, archived
Headers show
Series lightnvm: next set of improvements for 5.2 | expand

Commit Message

Igor Konopko March 14, 2019, 4:04 p.m. UTC
In case when mapping fails (called from writer thread) due to lack of
lines, currently we are calling pblk_pipeline_stop(), which waits
for pending write IOs, so it will lead to the deadlock. Switching
to __pblk_pipeline_stop() in that case instead will fix that.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Javier González March 18, 2019, 7:42 a.m. UTC | #1
> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In case when mapping fails (called from writer thread) due to lack of
> lines, currently we are calling pblk_pipeline_stop(), which waits
> for pending write IOs, so it will lead to the deadlock. Switching
> to __pblk_pipeline_stop() in that case instead will fix that.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-map.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index 5408e32..afc10306 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> 		pblk_line_close_meta(pblk, prev_line);
> 
> 		if (!line) {
> -			pblk_pipeline_stop(pblk);
> +			__pblk_pipeline_stop(pblk);
> 			return -ENOSPC;
> 		}
> 
> --
> 2.9.5

Have you seeing this problem?

Before checking if there is a line, we are closing metadata for the
previous line, so all inflight I/Os should be clear. Can you develop on
the case in which this would happen?
Igor Konopko March 18, 2019, 1:28 p.m. UTC | #2
On 18.03.2019 08:42, Javier González wrote:
>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> In case when mapping fails (called from writer thread) due to lack of
>> lines, currently we are calling pblk_pipeline_stop(), which waits
>> for pending write IOs, so it will lead to the deadlock. Switching
>> to __pblk_pipeline_stop() in that case instead will fix that.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-map.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>> index 5408e32..afc10306 100644
>> --- a/drivers/lightnvm/pblk-map.c
>> +++ b/drivers/lightnvm/pblk-map.c
>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> 		pblk_line_close_meta(pblk, prev_line);
>>
>> 		if (!line) {
>> -			pblk_pipeline_stop(pblk);
>> +			__pblk_pipeline_stop(pblk);
>> 			return -ENOSPC;
>> 		}
>>
>> --
>> 2.9.5
> 
> Have you seeing this problem?
> 
> Before checking if there is a line, we are closing metadata for the
> previous line, so all inflight I/Os should be clear. Can you develop on
> the case in which this would happen?

So we have following sequence: pblk_pipeline_stop() -> 
__pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying 
round buffer.
This will never complete, since we still have some RB entries, which 
cannot be written since writer thread is blocked with waiting inside 
pblk_flush_writer().

Am I missing sth?

>
Javier González March 18, 2019, 7:21 p.m. UTC | #3
> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 18.03.2019 08:42, Javier González wrote:
>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> In case when mapping fails (called from writer thread) due to lack of
>>> lines, currently we are calling pblk_pipeline_stop(), which waits
>>> for pending write IOs, so it will lead to the deadlock. Switching
>>> to __pblk_pipeline_stop() in that case instead will fix that.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-map.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>> index 5408e32..afc10306 100644
>>> --- a/drivers/lightnvm/pblk-map.c
>>> +++ b/drivers/lightnvm/pblk-map.c
>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>> 		pblk_line_close_meta(pblk, prev_line);
>>> 
>>> 		if (!line) {
>>> -			pblk_pipeline_stop(pblk);
>>> +			__pblk_pipeline_stop(pblk);
>>> 			return -ENOSPC;
>>> 		}
>>> 
>>> --
>>> 2.9.5
>> Have you seeing this problem?
>> Before checking if there is a line, we are closing metadata for the
>> previous line, so all inflight I/Os should be clear. Can you develop on
>> the case in which this would happen?
> 
> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer.
> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer().
> 
> Am I missing sth?

So this will be the case in which we are in the last line and
pblk_flush_writer() needs to allocate an extra line persist the write
buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
implemented some logic to guarantee that at least one line is always
available for GC, which in turn will free a line for user data. When we
hit this limit, performance will drop dramatically, but it should not
stall.

The reason I want to understand the real case behind this fix is that by
calling __pblk_pipeline_stop() we are basically stopping all other
inflight I/Os; we should be able to serve all inflight I/Os before a
mapping error triggers the pipeline to get into read-only mode.
Igor Konopko March 21, 2019, 1:21 p.m. UTC | #4
On 18.03.2019 20:21, Javier González wrote:
> 
>> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>>
>>
>> On 18.03.2019 08:42, Javier González wrote:
>>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>
>>>> In case when mapping fails (called from writer thread) due to lack of
>>>> lines, currently we are calling pblk_pipeline_stop(), which waits
>>>> for pending write IOs, so it will lead to the deadlock. Switching
>>>> to __pblk_pipeline_stop() in that case instead will fix that.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>> drivers/lightnvm/pblk-map.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>> index 5408e32..afc10306 100644
>>>> --- a/drivers/lightnvm/pblk-map.c
>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> 		pblk_line_close_meta(pblk, prev_line);
>>>>
>>>> 		if (!line) {
>>>> -			pblk_pipeline_stop(pblk);
>>>> +			__pblk_pipeline_stop(pblk);
>>>> 			return -ENOSPC;
>>>> 		}
>>>>
>>>> --
>>>> 2.9.5
>>> Have you seeing this problem?
>>> Before checking if there is a line, we are closing metadata for the
>>> previous line, so all inflight I/Os should be clear. Can you develop on
>>> the case in which this would happen?
>>
>> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer.
>> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer().
>>
>> Am I missing sth?
> 
> So this will be the case in which we are in the last line and
> pblk_flush_writer() needs to allocate an extra line persist the write
> buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
> implemented some logic to guarantee that at least one line is always
> available for GC, which in turn will free a line for user data. When we
> hit this limit, performance will drop dramatically, but it should not
> stall.
> 
> The reason I want to understand the real case behind this fix is that by
> calling __pblk_pipeline_stop() we are basically stopping all other
> inflight I/Os; we should be able to serve all inflight I/Os before a
> mapping error triggers the pipeline to get into read-only mode.
> 

Javier,
my understanding was that if we hit that particular case, we are simply 
"done" with pblk and there is no way to recover it, so I made this 
changes based on code analysis, if it is not true, than this patch does 
not make sense anymore.

Hans,
could you help me to understand how rate limiter ensure that what Javier 
mentioned about? Thanks
Hans Holmberg March 22, 2019, 12:17 p.m. UTC | #5
On Thu, Mar 21, 2019 at 2:21 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 18.03.2019 20:21, Javier González wrote:
> >
> >> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>
> >>
> >>
> >> On 18.03.2019 08:42, Javier González wrote:
> >>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>>
> >>>> In case when mapping fails (called from writer thread) due to lack of
> >>>> lines, currently we are calling pblk_pipeline_stop(), which waits
> >>>> for pending write IOs, so it will lead to the deadlock. Switching
> >>>> to __pblk_pipeline_stop() in that case instead will fix that.
> >>>>
> >>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>> ---
> >>>> drivers/lightnvm/pblk-map.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> >>>> index 5408e32..afc10306 100644
> >>>> --- a/drivers/lightnvm/pblk-map.c
> >>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> >>>>            pblk_line_close_meta(pblk, prev_line);
> >>>>
> >>>>            if (!line) {
> >>>> -                  pblk_pipeline_stop(pblk);
> >>>> +                  __pblk_pipeline_stop(pblk);
> >>>>                    return -ENOSPC;
> >>>>            }
> >>>>
> >>>> --
> >>>> 2.9.5
> >>> Have you seeing this problem?
> >>> Before checking if there is a line, we are closing metadata for the
> >>> previous line, so all inflight I/Os should be clear. Can you develop on
> >>> the case in which this would happen?
> >>
> >> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer.
> >> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer().
> >>
> >> Am I missing sth?
> >
> > So this will be the case in which we are in the last line and
> > pblk_flush_writer() needs to allocate an extra line persist the write
> > buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
> > implemented some logic to guarantee that at least one line is always
> > available for GC, which in turn will free a line for user data. When we
> > hit this limit, performance will drop dramatically, but it should not
> > stall.
> >
> > The reason I want to understand the real case behind this fix is that by
> > calling __pblk_pipeline_stop() we are basically stopping all other
> > inflight I/Os; we should be able to serve all inflight I/Os before a
> > mapping error triggers the pipeline to get into read-only mode.
> >
>
> Javier,
> my understanding was that if we hit that particular case, we are simply
> "done" with pblk and there is no way to recover it, so I made this
> changes based on code analysis, if it is not true, than this patch does
> not make sense anymore.
>
> Hans,
> could you help me to understand how rate limiter ensure that what Javier
> mentioned about? Thanks

I think Javier refers to: '3bcebc5bac09 ("lightnvm: pblk: set
conservative threshold for user writes")'

See the commit message. Let me know if something is unclear :)

Hans
diff mbox series

Patch

diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5408e32..afc10306 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -46,7 +46,7 @@  static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 		pblk_line_close_meta(pblk, prev_line);
 
 		if (!line) {
-			pblk_pipeline_stop(pblk);
+			__pblk_pipeline_stop(pblk);
 			return -ENOSPC;
 		}