Message ID | 1539543498-29105-13-git-send-email-jsimmons@infradead.org (mailing list archive)
---|---
State | New, archived
Series | lustre: more assorted fixes for lustre 2.10
On Sun, Oct 14 2018, James Simmons wrote:

> From: Alex Zhuravlev <bzzz@whamcloud.com>
>
> Even if there are no RPC requests on the set, there is no need to
> wake up every second. The thread is woken up when a request is added
> to the set or when the STOP bit is set, so it is sufficient to only
> wake up when there are requests on the set to worry about.
>
> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-9660
> Reviewed-on: https://review.whamcloud.com/28776
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> Reviewed-by: Patrick Farrell <paf@cray.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> index c201a88..5b4977b 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> @@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
>  		}
>  	}
>
> -	return rc;
> +	return rc || test_bit(LIOD_STOP, &pc->pc_flags);
>  }
>
>  /**
> @@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
>  		lu_context_enter(env.le_ses);
>  		if (wait_event_idle_timeout(set->set_waitq,
>  					    ptlrpcd_check(&env, pc),
> -					    (timeout ? timeout : 1) * HZ) == 0)
> +					    timeout * HZ) == 0)
>  			ptlrpc_expired_set(set);

This is incorrect.
A timeout of zero means the timeout happens after zero jiffies
(immediately), it doesn't mean there is no timeout.
If we want a "timeout" of zero to mean "Wait forever", we need something
like:

  wait_event_idle_timeout(.....,
                          timeout ? (timeout * HZ) : MAX_SCHEDULE_TIMEOUT) == 0

I've updated the patch accordingly.

Thanks,
NeilBrown

>
>  		lu_context_exit(&env.le_ctx);
> --
> 1.8.3.1
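For readers following the thread, here is a minimal sketch of how the second hunk reads once the suggestion above is folded in. The updated patch itself is not reproduced on this page, so the exact formatting is an assumption; only the timeout expression differs from the hunk quoted above.

/* Sketch of the corrected wait in ptlrpcd(): a timeout of 0 now selects
 * MAX_SCHEDULE_TIMEOUT ("no timeout") instead of timing out after zero
 * jiffies.
 */
if (wait_event_idle_timeout(set->set_waitq,
			    ptlrpcd_check(&env, pc),
			    timeout ? timeout * HZ :
				      MAX_SCHEDULE_TIMEOUT) == 0)
	ptlrpc_expired_set(set);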
Neil,

Does your statement imply this would spin?  It definitely doesn’t just
spin (that behavior in a main “wait for work” spot of a (depending on
settings) ~per-CPU daemon would render systems unusable and this patch
has been in testing for a while).  So what is the detailed behavior of
a “timeout that expires immediately”?

- Patrick
On Mon, Oct 29 2018, Patrick Farrell wrote:

> Neil,
>
> Does your statement imply this would spin?  It definitely doesn’t just
> spin (that behavior in a main “wait for work” spot of a (depending on
> settings) ~per-CPU daemon would render systems unusable and this patch
> has been in testing for a while).  So what is the detailed behavior of
> a “timeout that expires immediately”?

Hi Patrick,
it definitely spins for me.

I should have clarified that the SFS patch

  e81847bd0651 LU-9660 ptlrpc: do not wakeup every second

is correct, as __l_wait_event() treats a timeout value of 0 as meaning an
indefinite timeout.
The error was in the conversion to wait_event_idle_timeout().  The
various wait_event*timeout() functions treat 0 as 1 less than 1.
If you want to not have a timeout, you need to not use the *_timeout()
version.
If a timeout is undesirable rather than fatal, then
MAX_SCHEDULE_TIMEOUT can be used.  In this case, that seemed best.

Thanks,
NeilBrown

>
> - Patrick
>
>
> ________________________________
> From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
> Sent: Sunday, October 28, 2018 7:03:02 PM
> To: James Simmons; Andreas Dilger; Oleg Drokin
> Cc: Lustre Development List
> Subject: Re: [lustre-devel] [PATCH 12/28] lustre: ptlrpc: do not wakeup every second
>
> On Sun, Oct 14 2018, James Simmons wrote:
>
>> From: Alex Zhuravlev <bzzz@whamcloud.com>
>>
>> Even if there are no RPC requests on the set, there is no need to
>> wake up every second. The thread is woken up when a request is added
>> to the set or when the STOP bit is set, so it is sufficient to only
>> wake up when there are requests on the set to worry about.
>>
>> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
>> WC-bug-id: https://jira.whamcloud.com/browse/LU-9660
>> Reviewed-on: https://review.whamcloud.com/28776
>> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
>> Reviewed-by: Patrick Farrell <paf@cray.com>
>> Reviewed-by: Oleg Drokin <green@whamcloud.com>
>> Signed-off-by: James Simmons <jsimmons@infradead.org>
>> ---
>>  drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
>> index c201a88..5b4977b 100644
>> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
>> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
>> @@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
>>  		}
>>  	}
>>
>> -	return rc;
>> +	return rc || test_bit(LIOD_STOP, &pc->pc_flags);
>>  }
>>
>>  /**
>> @@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
>>  		lu_context_enter(env.le_ses);
>>  		if (wait_event_idle_timeout(set->set_waitq,
>>  					    ptlrpcd_check(&env, pc),
>> -					    (timeout ? timeout : 1) * HZ) == 0)
>> +					    timeout * HZ) == 0)
>>  			ptlrpc_expired_set(set);
>
> This is incorrect.
> A timeout of zero means the timeout happens after zero jiffies
> (immediately), it doesn't mean there is no timeout.
> If we want a "timeout" of zero to mean "Wait forever", we need something
> like:
>
>   wait_event_idle_timeout(.....,
>                           timeout ? (timeout * HZ) : MAX_SCHEDULE_TIMEOUT) == 0
>
> I've updated the patch accordingly.
>
> Thanks,
> NeilBrown
>
>>
>>  		lu_context_exit(&env.le_ctx);
>> --
>> 1.8.3.1
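To make the distinction concrete, here is a small self-contained sketch in kernel C. Nothing in it comes from the patch itself: demo_waitq, demo_cond and demo_wait() are hypothetical names, and the comments only restate the behaviour described above.

#include <linux/jiffies.h>	/* HZ */
#include <linux/sched.h>	/* MAX_SCHEDULE_TIMEOUT */
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_waitq);	/* hypothetical wait queue */
static bool demo_cond;				/* hypothetical wake-up condition */

/* Wait for demo_cond, treating timeout_secs == 0 as "no timeout". */
static long demo_wait(unsigned long timeout_secs)
{
	/*
	 * wait_event_idle_timeout(demo_waitq, demo_cond, 0) would NOT mean
	 * "wait forever": the timeout elapses after zero jiffies, which is
	 * what made the converted ptlrpcd thread spin and burn CPU.
	 *
	 * "No timeout" is spelled MAX_SCHEDULE_TIMEOUT, so map a zero
	 * timeout_secs to that; otherwise convert seconds to jiffies.
	 */
	return wait_event_idle_timeout(demo_waitq, demo_cond,
				       timeout_secs ? timeout_secs * HZ
						    : MAX_SCHEDULE_TIMEOUT);
}

The same timeout ? ... : MAX_SCHEDULE_TIMEOUT mapping is the shape NeilBrown suggests for the ptlrpcd() call in his reply above.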
> > Neil,
> >
> > Does your statement imply this would spin?  It definitely doesn’t just
> > spin (that behavior in a main “wait for work” spot of a (depending on
> > settings) ~per-CPU daemon would render systems unusable and this patch
> > has been in testing for a while).  So what is the detailed behavior of
> > a “timeout that expires immediately”?
>
> Hi Patrick,
> it definitely spins for me.

Ah, that is where the high CPU load is coming from.

> I should have clarified that the SFS patch
>
>   e81847bd0651 LU-9660 ptlrpc: do not wakeup every second
>
> is correct, as __l_wait_event() treats a timeout value of 0 as meaning an
> indefinite timeout.
> The error was in the conversion to wait_event_idle_timeout().  The
> various wait_event*timeout() functions treat 0 as 1 less than 1.
> If you want to not have a timeout, you need to not use the *_timeout()
> version.
> If a timeout is undesirable rather than fatal, then
> MAX_SCHEDULE_TIMEOUT can be used.  In this case, that seemed best.

I missed that in the conversion. Now that I know the mapping I will do
the future server code port correctly ;-)

> >
> > - Patrick
> >
> >
> > ________________________________
> > From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
> > Sent: Sunday, October 28, 2018 7:03:02 PM
> > To: James Simmons; Andreas Dilger; Oleg Drokin
> > Cc: Lustre Development List
> > Subject: Re: [lustre-devel] [PATCH 12/28] lustre: ptlrpc: do not wakeup every second
> >
> > On Sun, Oct 14 2018, James Simmons wrote:
> >
> >> From: Alex Zhuravlev <bzzz@whamcloud.com>
> >>
> >> Even if there are no RPC requests on the set, there is no need to
> >> wake up every second. The thread is woken up when a request is added
> >> to the set or when the STOP bit is set, so it is sufficient to only
> >> wake up when there are requests on the set to worry about.
> >>
> >> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> >> WC-bug-id: https://jira.whamcloud.com/browse/LU-9660
> >> Reviewed-on: https://review.whamcloud.com/28776
> >> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> >> Reviewed-by: Patrick Farrell <paf@cray.com>
> >> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> >> Signed-off-by: James Simmons <jsimmons@infradead.org>
> >> ---
> >>  drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> index c201a88..5b4977b 100644
> >> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> @@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
> >>  		}
> >>  	}
> >>
> >> -	return rc;
> >> +	return rc || test_bit(LIOD_STOP, &pc->pc_flags);
> >>  }
> >>
> >>  /**
> >> @@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
> >>  		lu_context_enter(env.le_ses);
> >>  		if (wait_event_idle_timeout(set->set_waitq,
> >>  					    ptlrpcd_check(&env, pc),
> >> -					    (timeout ? timeout : 1) * HZ) == 0)
> >> +					    timeout * HZ) == 0)
> >>  			ptlrpc_expired_set(set);
> >
> > This is incorrect.
> > A timeout of zero means the timeout happens after zero jiffies
> > (immediately), it doesn't mean there is no timeout.
> > If we want a "timeout" of zero to mean "Wait forever", we need something
> > like:
> >
> >   wait_event_idle_timeout(.....,
> >                           timeout ? (timeout * HZ) : MAX_SCHEDULE_TIMEOUT) == 0
> >
> > I've updated the patch accordingly.
> >
> > Thanks,
> > NeilBrown
> >
> >>
> >>  		lu_context_exit(&env.le_ctx);
> >> --
> >> 1.8.3.1
>
Ah, OK!  Thanks, Neil.  That makes sense.  I just knew there was no way
that had been missed in the WC branch - Just as I'm sure you two noticed
it immediately in your testing.

Also useful to know about the timeout functions and 0.

Regards,
Patrick
> > Neil,
> >
> > Does your statement imply this would spin?  It definitely doesn’t just
> > spin (that behavior in a main “wait for work” spot of a (depending on
> > settings) ~per-CPU daemon would render systems unusable and this patch
> > has been in testing for a while).  So what is the detailed behavior of
> > a “timeout that expires immediately”?
>
> Hi Patrick,
> it definitely spins for me.
>
> I should have clarified that the SFS patch
>
>   e81847bd0651 LU-9660 ptlrpc: do not wakeup every second
>
> is correct, as __l_wait_event() treats a timeout value of 0 as meaning an
> indefinite timeout.
> The error was in the conversion to wait_event_idle_timeout().  The
> various wait_event*timeout() functions treat 0 as 1 less than 1.
> If you want to not have a timeout, you need to not use the *_timeout()
> version.
> If a timeout is undesirable rather than fatal, then
> MAX_SCHEDULE_TIMEOUT can be used.  In this case, that seemed best.
>
> Thanks,
> NeilBrown
>
> >
> > - Patrick
> >
> >
> > ________________________________
> > From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
> > Sent: Sunday, October 28, 2018 7:03:02 PM
> > To: James Simmons; Andreas Dilger; Oleg Drokin
> > Cc: Lustre Development List
> > Subject: Re: [lustre-devel] [PATCH 12/28] lustre: ptlrpc: do not wakeup every second
> >
> > On Sun, Oct 14 2018, James Simmons wrote:
> >
> >> From: Alex Zhuravlev <bzzz@whamcloud.com>
> >>
> >> Even if there are no RPC requests on the set, there is no need to
> >> wake up every second. The thread is woken up when a request is added
> >> to the set or when the STOP bit is set, so it is sufficient to only
> >> wake up when there are requests on the set to worry about.
> >>
> >> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> >> WC-bug-id: https://jira.whamcloud.com/browse/LU-9660
> >> Reviewed-on: https://review.whamcloud.com/28776
> >> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> >> Reviewed-by: Patrick Farrell <paf@cray.com>
> >> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> >> Signed-off-by: James Simmons <jsimmons@infradead.org>
> >> ---
> >>  drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> index c201a88..5b4977b 100644
> >> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> @@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
> >>  		}
> >>  	}
> >>
> >> -	return rc;
> >> +	return rc || test_bit(LIOD_STOP, &pc->pc_flags);
> >>  }
> >>
> >>  /**
> >> @@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
> >>  		lu_context_enter(env.le_ses);
> >>  		if (wait_event_idle_timeout(set->set_waitq,
> >>  					    ptlrpcd_check(&env, pc),
> >> -					    (timeout ? timeout : 1) * HZ) == 0)
> >> +					    timeout * HZ) == 0)
> >>  			ptlrpc_expired_set(set);
> >
> > This is incorrect.
> > A timeout of zero means the timeout happens after zero jiffies
> > (immediately), it doesn't mean there is no timeout.
> > If we want a "timeout" of zero to mean "Wait forever", we need something
> > like:
> >
> >   wait_event_idle_timeout(.....,
> >                           timeout ? (timeout * HZ) : MAX_SCHEDULE_TIMEOUT) == 0
> >
> > I've updated the patch accordingly.

I did that change locally as well and my CPU load problem went away.
Thanks for figuring it out. Will be more careful in the future.
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
index c201a88..5b4977b 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
@@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
 		}
 	}
 
-	return rc;
+	return rc || test_bit(LIOD_STOP, &pc->pc_flags);
 }
 
 /**
@@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
 		lu_context_enter(env.le_ses);
 		if (wait_event_idle_timeout(set->set_waitq,
 					    ptlrpcd_check(&env, pc),
-					    (timeout ? timeout : 1) * HZ) == 0)
+					    timeout * HZ) == 0)
 			ptlrpc_expired_set(set);
 
 		lu_context_exit(&env.le_ctx);
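A note on the first hunk: once the one-second wakeup is gone, the wait in ptlrpcd() only ends when ptlrpcd_check() returns true, so the check must also report the STOP bit as a reason to wake up. The corresponding stop path is not shown on this page; the sketch below is only an assumption of its general shape (LIOD_STOP, pc_flags and set_waitq appear in the hunks above, while the pc_set field and the function body are guesses for illustration).

/* Hypothetical sketch of a ptlrpcd shutdown path: set the STOP bit that
 * the first hunk now reports from ptlrpcd_check(), then wake the set's
 * wait queue so the (now potentially indefinite) wait terminates.
 */
static void ptlrpcd_stop_sketch(struct ptlrpcd_ctl *pc)
{
	set_bit(LIOD_STOP, &pc->pc_flags);	/* makes ptlrpcd_check() return true */
	wake_up(&pc->pc_set->set_waitq);	/* end the wait in ptlrpcd() */
}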