Message ID | 20190314160428.3559-16-igor.j.konopko@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lightnvm: next set of improvements for 5.2 | expand |
> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote: > > In case when mapping fails (called from writer thread) due to lack of > lines, currently we are calling pblk_pipeline_stop(), which waits > for pending write IOs, so it will lead to the deadlock. Switching > to __pblk_pipeline_stop() in that case instead will fix that. > > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > --- > drivers/lightnvm/pblk-map.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c > index 5408e32..afc10306 100644 > --- a/drivers/lightnvm/pblk-map.c > +++ b/drivers/lightnvm/pblk-map.c > @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, > pblk_line_close_meta(pblk, prev_line); > > if (!line) { > - pblk_pipeline_stop(pblk); > + __pblk_pipeline_stop(pblk); > return -ENOSPC; > } > > -- > 2.9.5 Have you seeing this problem? Before checking if there is a line, we are closing metadata for the previous line, so all inflight I/Os should be clear. Can you develop on the case in which this would happen?
On 18.03.2019 08:42, Javier González wrote: >> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote: >> >> In case when mapping fails (called from writer thread) due to lack of >> lines, currently we are calling pblk_pipeline_stop(), which waits >> for pending write IOs, so it will lead to the deadlock. Switching >> to __pblk_pipeline_stop() in that case instead will fix that. >> >> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >> --- >> drivers/lightnvm/pblk-map.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c >> index 5408e32..afc10306 100644 >> --- a/drivers/lightnvm/pblk-map.c >> +++ b/drivers/lightnvm/pblk-map.c >> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >> pblk_line_close_meta(pblk, prev_line); >> >> if (!line) { >> - pblk_pipeline_stop(pblk); >> + __pblk_pipeline_stop(pblk); >> return -ENOSPC; >> } >> >> -- >> 2.9.5 > > Have you seeing this problem? > > Before checking if there is a line, we are closing metadata for the > previous line, so all inflight I/Os should be clear. Can you develop on > the case in which this would happen? So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer. This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer(). Am I missing sth? >
> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote: > > > > On 18.03.2019 08:42, Javier González wrote: >>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote: >>> >>> In case when mapping fails (called from writer thread) due to lack of >>> lines, currently we are calling pblk_pipeline_stop(), which waits >>> for pending write IOs, so it will lead to the deadlock. Switching >>> to __pblk_pipeline_stop() in that case instead will fix that. >>> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>> --- >>> drivers/lightnvm/pblk-map.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c >>> index 5408e32..afc10306 100644 >>> --- a/drivers/lightnvm/pblk-map.c >>> +++ b/drivers/lightnvm/pblk-map.c >>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >>> pblk_line_close_meta(pblk, prev_line); >>> >>> if (!line) { >>> - pblk_pipeline_stop(pblk); >>> + __pblk_pipeline_stop(pblk); >>> return -ENOSPC; >>> } >>> >>> -- >>> 2.9.5 >> Have you seeing this problem? >> Before checking if there is a line, we are closing metadata for the >> previous line, so all inflight I/Os should be clear. Can you develop on >> the case in which this would happen? > > So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer. > This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer(). > > Am I missing sth? So this will be the case in which we are in the last line and pblk_flush_writer() needs to allocate an extra line persist the write buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans implemented some logic to guarantee that at least one line is always available for GC, which in turn will free a line for user data. When we hit this limit, performance will drop dramatically, but it should not stall. The reason I want to understand the real case behind this fix is that by calling __pblk_pipeline_stop() we are basically stopping all other inflight I/Os; we should be able to serve all inflight I/Os before a mapping error triggers the pipeline to get into read-only mode.
On 18.03.2019 20:21, Javier González wrote: > >> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote: >> >> >> >> On 18.03.2019 08:42, Javier González wrote: >>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote: >>>> >>>> In case when mapping fails (called from writer thread) due to lack of >>>> lines, currently we are calling pblk_pipeline_stop(), which waits >>>> for pending write IOs, so it will lead to the deadlock. Switching >>>> to __pblk_pipeline_stop() in that case instead will fix that. >>>> >>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> >>>> --- >>>> drivers/lightnvm/pblk-map.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c >>>> index 5408e32..afc10306 100644 >>>> --- a/drivers/lightnvm/pblk-map.c >>>> +++ b/drivers/lightnvm/pblk-map.c >>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >>>> pblk_line_close_meta(pblk, prev_line); >>>> >>>> if (!line) { >>>> - pblk_pipeline_stop(pblk); >>>> + __pblk_pipeline_stop(pblk); >>>> return -ENOSPC; >>>> } >>>> >>>> -- >>>> 2.9.5 >>> Have you seeing this problem? >>> Before checking if there is a line, we are closing metadata for the >>> previous line, so all inflight I/Os should be clear. Can you develop on >>> the case in which this would happen? >> >> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer. >> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer(). >> >> Am I missing sth? > > So this will be the case in which we are in the last line and > pblk_flush_writer() needs to allocate an extra line persist the write > buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans > implemented some logic to guarantee that at least one line is always > available for GC, which in turn will free a line for user data. When we > hit this limit, performance will drop dramatically, but it should not > stall. > > The reason I want to understand the real case behind this fix is that by > calling __pblk_pipeline_stop() we are basically stopping all other > inflight I/Os; we should be able to serve all inflight I/Os before a > mapping error triggers the pipeline to get into read-only mode. > Javier, my understanding was that if we hit that particular case, we are simply "done" with pblk and there is no way to recover it, so I made this changes based on code analysis, if it is not true, than this patch does not make sense anymore. Hans, could you help me to understand how rate limiter ensure that what Javier mentioned about? Thanks
On Thu, Mar 21, 2019 at 2:21 PM Igor Konopko <igor.j.konopko@intel.com> wrote: > > > > On 18.03.2019 20:21, Javier González wrote: > > > >> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote: > >> > >> > >> > >> On 18.03.2019 08:42, Javier González wrote: > >>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote: > >>>> > >>>> In case when mapping fails (called from writer thread) due to lack of > >>>> lines, currently we are calling pblk_pipeline_stop(), which waits > >>>> for pending write IOs, so it will lead to the deadlock. Switching > >>>> to __pblk_pipeline_stop() in that case instead will fix that. > >>>> > >>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> > >>>> --- > >>>> drivers/lightnvm/pblk-map.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c > >>>> index 5408e32..afc10306 100644 > >>>> --- a/drivers/lightnvm/pblk-map.c > >>>> +++ b/drivers/lightnvm/pblk-map.c > >>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, > >>>> pblk_line_close_meta(pblk, prev_line); > >>>> > >>>> if (!line) { > >>>> - pblk_pipeline_stop(pblk); > >>>> + __pblk_pipeline_stop(pblk); > >>>> return -ENOSPC; > >>>> } > >>>> > >>>> -- > >>>> 2.9.5 > >>> Have you seeing this problem? > >>> Before checking if there is a line, we are closing metadata for the > >>> previous line, so all inflight I/Os should be clear. Can you develop on > >>> the case in which this would happen? > >> > >> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer. > >> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer(). > >> > >> Am I missing sth? > > > > So this will be the case in which we are in the last line and > > pblk_flush_writer() needs to allocate an extra line persist the write > > buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans > > implemented some logic to guarantee that at least one line is always > > available for GC, which in turn will free a line for user data. When we > > hit this limit, performance will drop dramatically, but it should not > > stall. > > > > The reason I want to understand the real case behind this fix is that by > > calling __pblk_pipeline_stop() we are basically stopping all other > > inflight I/Os; we should be able to serve all inflight I/Os before a > > mapping error triggers the pipeline to get into read-only mode. > > > > Javier, > my understanding was that if we hit that particular case, we are simply > "done" with pblk and there is no way to recover it, so I made this > changes based on code analysis, if it is not true, than this patch does > not make sense anymore. > > Hans, > could you help me to understand how rate limiter ensure that what Javier > mentioned about? Thanks I think Javier refers to: '3bcebc5bac09 ("lightnvm: pblk: set conservative threshold for user writes")' See the commit message. Let me know if something is unclear :) Hans
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 5408e32..afc10306 100644 --- a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, pblk_line_close_meta(pblk, prev_line); if (!line) { - pblk_pipeline_stop(pblk); + __pblk_pipeline_stop(pblk); return -ENOSPC; }
In case when mapping fails (called from writer thread) due to lack of lines, currently we are calling pblk_pipeline_stop(), which waits for pending write IOs, so it will lead to the deadlock. Switching to __pblk_pipeline_stop() in that case instead will fix that. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> --- drivers/lightnvm/pblk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)