mbox series

[RFC,00/13] NFSD backchannel fixes

Message ID 170619984210.2833.7173004255003914651.stgit@manet.1015granger.net (mailing list archive)
Headers show
Series NFSD backchannel fixes | expand

Message

Chuck Lever Jan. 25, 2024, 4:28 p.m. UTC
The first three patches fix bugs that prevent NFSD's backchannel
from reliably retransmitting after a client reconnects. These fixes
might be appropriate for 6.8-rc.

Following that are some new trace points that might be helpful for
field troubleshooting.

Then there are some minor clean-ups.

I am still testing this series, and there is one msleep() call that
needs some thought. Thoughts, comments, opinions, rotten fruit? You
know the drill.

---

Chuck Lever (13):
      NFSD: Reset cb_seq_status after NFS4ERR_DELAY
      NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
      NFSD: Retransmit callbacks after client reconnects
      NFSD: Add nfsd_seq4_status trace event
      NFSD: Replace dprintks in nfsd4_cb_sequence_done()
      NFSD: Rename nfsd_cb_state trace point
      NFSD: Add callback operation lifetime trace points
      SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
      NFSD: Remove unused @reason argument
      NFSD: Replace comment with lockdep assertion
      NFSD: Remove BUG_ON in nfsd4_process_cb_update()
      SUNRPC: Remove stale comments
      NFSD: Remove redundant cb_seq_status initialization


 fs/nfsd/nfs4callback.c   |  81 +++++++++++++-------
 fs/nfsd/nfs4state.c      |   1 +
 fs/nfsd/trace.h          | 162 ++++++++++++++++++++++++++++++++++++++-
 include/trace/misc/nfs.h |  34 ++++++++
 net/sunrpc/svc.c         |   1 -
 net/sunrpc/xprtsock.c    |   9 ---
 6 files changed, 250 insertions(+), 38 deletions(-)

--
Chuck Lever

Comments

Jeff Layton Jan. 25, 2024, 8:41 p.m. UTC | #1
On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> The first three patches fix bugs that prevent NFSD's backchannel
> from reliably retransmitting after a client reconnects. These fixes
> might be appropriate for 6.8-rc.
> 
> Following that are some new trace points that might be helpful for
> field troubleshooting.
> 
> Then there are some minor clean-ups.
> 
> I am still testing this series, and there is one msleep() call that
> needs some thought. Thoughts, comments, opinions, rotten fruit? You
> know the drill.
> 
> ---
> 
> Chuck Lever (13):
>       NFSD: Reset cb_seq_status after NFS4ERR_DELAY
>       NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
>       NFSD: Retransmit callbacks after client reconnects
>       NFSD: Add nfsd_seq4_status trace event
>       NFSD: Replace dprintks in nfsd4_cb_sequence_done()
>       NFSD: Rename nfsd_cb_state trace point
>       NFSD: Add callback operation lifetime trace points
>       SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
>       NFSD: Remove unused @reason argument
>       NFSD: Replace comment with lockdep assertion
>       NFSD: Remove BUG_ON in nfsd4_process_cb_update()
>       SUNRPC: Remove stale comments
>       NFSD: Remove redundant cb_seq_status initialization
> 
> 
>  fs/nfsd/nfs4callback.c   |  81 +++++++++++++-------
>  fs/nfsd/nfs4state.c      |   1 +
>  fs/nfsd/trace.h          | 162 ++++++++++++++++++++++++++++++++++++++-
>  include/trace/misc/nfs.h |  34 ++++++++
>  net/sunrpc/svc.c         |   1 -
>  net/sunrpc/xprtsock.c    |   9 ---
>  6 files changed, 250 insertions(+), 38 deletions(-)
> 
> --
> Chuck Lever
> 
> 

Love some of the new backchannel tracepoints. That should be helpful.

You can add this to 4-13:

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Benjamin Coddington Jan. 25, 2024, 10:09 p.m. UTC | #2
On 25 Jan 2024, at 11:28, Chuck Lever wrote:

> The first three patches fix bugs that prevent NFSD's backchannel
> from reliably retransmitting after a client reconnects. These fixes
> might be appropriate for 6.8-rc.
>
> Following that are some new trace points that might be helpful for
> field troubleshooting.
>
> Then there are some minor clean-ups.
>
> I am still testing this series, and there is one msleep() call that
> needs some thought. Thoughts, comments, opinions, rotten fruit? You
> know the drill.
>
> ---
>
> Chuck Lever (13):
>       NFSD: Reset cb_seq_status after NFS4ERR_DELAY
>       NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
>       NFSD: Retransmit callbacks after client reconnects
>       NFSD: Add nfsd_seq4_status trace event
>       NFSD: Replace dprintks in nfsd4_cb_sequence_done()
>       NFSD: Rename nfsd_cb_state trace point
>       NFSD: Add callback operation lifetime trace points
>       SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
>       NFSD: Remove unused @reason argument
>       NFSD: Replace comment with lockdep assertion
>       NFSD: Remove BUG_ON in nfsd4_process_cb_update()
>       SUNRPC: Remove stale comments
>       NFSD: Remove redundant cb_seq_status initialization
>
>
>  fs/nfsd/nfs4callback.c   |  81 +++++++++++++-------
>  fs/nfsd/nfs4state.c      |   1 +
>  fs/nfsd/trace.h          | 162 ++++++++++++++++++++++++++++++++++++++-
>  include/trace/misc/nfs.h |  34 ++++++++
>  net/sunrpc/svc.c         |   1 -
>  net/sunrpc/xprtsock.c    |   9 ---
>  6 files changed, 250 insertions(+), 38 deletions(-)


These are great, looking forward to see how 02/13 waits for reconnection.
Seems like a wait_on_bit or wait_on_var triggered from nfsd4_init_conn()
would do, but that's just my wild speculation.

Ben
Chuck Lever III Jan. 26, 2024, 1:56 p.m. UTC | #3
> On Jan 25, 2024, at 5:09 PM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 25 Jan 2024, at 11:28, Chuck Lever wrote:
> 
>> The first three patches fix bugs that prevent NFSD's backchannel
>> from reliably retransmitting after a client reconnects. These fixes
>> might be appropriate for 6.8-rc.
>> 
>> Following that are some new trace points that might be helpful for
>> field troubleshooting.
>> 
>> Then there are some minor clean-ups.
>> 
>> I am still testing this series, and there is one msleep() call that
>> needs some thought. Thoughts, comments, opinions, rotten fruit? You
>> know the drill.
>> 
>> ---
>> 
>> Chuck Lever (13):
>>      NFSD: Reset cb_seq_status after NFS4ERR_DELAY
>>      NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
>>      NFSD: Retransmit callbacks after client reconnects
>>      NFSD: Add nfsd_seq4_status trace event
>>      NFSD: Replace dprintks in nfsd4_cb_sequence_done()
>>      NFSD: Rename nfsd_cb_state trace point
>>      NFSD: Add callback operation lifetime trace points
>>      SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
>>      NFSD: Remove unused @reason argument
>>      NFSD: Replace comment with lockdep assertion
>>      NFSD: Remove BUG_ON in nfsd4_process_cb_update()
>>      SUNRPC: Remove stale comments
>>      NFSD: Remove redundant cb_seq_status initialization
>> 
>> 
>> fs/nfsd/nfs4callback.c   |  81 +++++++++++++-------
>> fs/nfsd/nfs4state.c      |   1 +
>> fs/nfsd/trace.h          | 162 ++++++++++++++++++++++++++++++++++++++-
>> include/trace/misc/nfs.h |  34 ++++++++
>> net/sunrpc/svc.c         |   1 -
>> net/sunrpc/xprtsock.c    |   9 ---
>> 6 files changed, 250 insertions(+), 38 deletions(-)
> 
> 
> These are great, looking forward to see how 02/13 waits for reconnection.
> Seems like a wait_on_bit or wait_on_var triggered from nfsd4_init_conn()
> would do, but that's just my wild speculation.

Thanks for having a look!

Well Jeff pointed out that we don't want to put a workqueue worker
thread to sleep, so wait_on_* would work but is not an appropriate
long-term solution.

I'd rather prefer to have a mechanism where the callback RPC tasks
can be "hard" and then they would wait in the RPC client for the
reconnect. The issue then is how connection establishment is
indicated so that the waiting tasks are woken at the right time.


--
Chuck Lever