Message ID | 20241031134000.53396-18-cel@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | async COPY fixes for NFSD | expand |
On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > If an async COPY operation happens to be running when the server is > shut down, notify the requesting client that the copy has completed. > > Since the nfs4_client is going away, seems like this could introduce > some UAFs. > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/nfs4proc.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 4c964bce6bd7..51b3f85f3791 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, > > #define NFSDDBG_FACILITY NFSDDBG_PROC > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy); > + > static u32 nfsd_attrmask[] = { > NFSD_WRITEABLE_ATTRS_WORD0, > NFSD_WRITEABLE_ATTRS_WORD1, > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp) > { > struct nfsd4_copy *copy; > > - while ((copy = nfsd4_get_copy(clp)) != NULL) > + while ((copy = nfsd4_get_copy(clp)) != NULL) { > nfsd4_stop_copy(copy); > + nfsd4_send_cb_offload(copy); > + } Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might delay the client destruction for quite a while. Maybe this CB_OFFLOAD shouldn't retry on DELAY? > } > #ifdef CONFIG_NFSD_V4_2_INTER_SSC >
On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote: > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > If an async COPY operation happens to be running when the server is > > shut down, notify the requesting client that the copy has completed. > > > > Since the nfs4_client is going away, seems like this could introduce > > some UAFs. > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > --- > > fs/nfsd/nfs4proc.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index 4c964bce6bd7..51b3f85f3791 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, > > > > #define NFSDDBG_FACILITY NFSDDBG_PROC > > > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy); > > + > > static u32 nfsd_attrmask[] = { > > NFSD_WRITEABLE_ATTRS_WORD0, > > NFSD_WRITEABLE_ATTRS_WORD1, > > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp) > > { > > struct nfsd4_copy *copy; > > > > - while ((copy = nfsd4_get_copy(clp)) != NULL) > > + while ((copy = nfsd4_get_copy(clp)) != NULL) { > > nfsd4_stop_copy(copy); > > + nfsd4_send_cb_offload(copy); > > + } > > Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might > delay the client destruction for quite a while. The nfsd4_copy() is removed from the nfs4_client's async_copies list, and the RPC proceeds in the background. It doesn't block the destruction of the CLIENTID. But it might be a problem for the RPC logic to transmit the call when there's no nfs4_client to dereference. I should probably drop this patch and try a different approach. > Maybe this CB_OFFLOAD shouldn't retry on DELAY? > > > > } > > #ifdef CONFIG_NFSD_V4_2_INTER_SSC > > > > > -- > Jeff Layton <jlayton@kernel.org> >
On Fri, 2024-11-01 at 09:18 -0400, Chuck Lever wrote: > On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote: > > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > If an async COPY operation happens to be running when the server is > > > shut down, notify the requesting client that the copy has completed. > > > > > > Since the nfs4_client is going away, seems like this could introduce > > > some UAFs. > > > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > --- > > > fs/nfsd/nfs4proc.c | 6 +++++- > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > index 4c964bce6bd7..51b3f85f3791 100644 > > > --- a/fs/nfsd/nfs4proc.c > > > +++ b/fs/nfsd/nfs4proc.c > > > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, > > > > > > #define NFSDDBG_FACILITY NFSDDBG_PROC > > > > > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy); > > > + > > > static u32 nfsd_attrmask[] = { > > > NFSD_WRITEABLE_ATTRS_WORD0, > > > NFSD_WRITEABLE_ATTRS_WORD1, > > > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp) > > > { > > > struct nfsd4_copy *copy; > > > > > > - while ((copy = nfsd4_get_copy(clp)) != NULL) > > > + while ((copy = nfsd4_get_copy(clp)) != NULL) { > > > nfsd4_stop_copy(copy); > > > + nfsd4_send_cb_offload(copy); > > > + } > > > > Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might > > delay the client destruction for quite a while. > > The nfsd4_copy() is removed from the nfs4_client's async_copies > list, and the RPC proceeds in the background. It doesn't block the > destruction of the CLIENTID. > > But it might be a problem for the RPC logic to transmit the call > when there's no nfs4_client to dereference. I should probably > drop this patch and try a different approach. No, I think this will block the client shutdown. After nfsd4_shutdown_copy(), it calls nfsd4_shutdown_callback(), which does: flush_workqueue(clp->cl_callback_wq); nfsd41_cb_inflight_wait_complete(clp); So the CB_OFFLOAD workqueue jobs will run, and everything will wait for clp->cl_cb_inflight to go to 0. That won't happen until the CB_OFFLOAD jobs complete. No objection from me though if you see a better approach. Dropping this one for now seems reasonable. > > > Maybe this CB_OFFLOAD shouldn't retry on DELAY? > > > > > > > } > > > #ifdef CONFIG_NFSD_V4_2_INTER_SSC > > > > > > > > > -- > > Jeff Layton <jlayton@kernel.org> > > >
On Fri, Nov 01, 2024 at 09:30:44AM -0400, Jeff Layton wrote: > On Fri, 2024-11-01 at 09:18 -0400, Chuck Lever wrote: > > On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote: > > > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > > > If an async COPY operation happens to be running when the server is > > > > shut down, notify the requesting client that the copy has completed. > > > > > > > > Since the nfs4_client is going away, seems like this could introduce > > > > some UAFs. > > > > > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > > --- > > > > fs/nfsd/nfs4proc.c | 6 +++++- > > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > > index 4c964bce6bd7..51b3f85f3791 100644 > > > > --- a/fs/nfsd/nfs4proc.c > > > > +++ b/fs/nfsd/nfs4proc.c > > > > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, > > > > > > > > #define NFSDDBG_FACILITY NFSDDBG_PROC > > > > > > > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy); > > > > + > > > > static u32 nfsd_attrmask[] = { > > > > NFSD_WRITEABLE_ATTRS_WORD0, > > > > NFSD_WRITEABLE_ATTRS_WORD1, > > > > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp) > > > > { > > > > struct nfsd4_copy *copy; > > > > > > > > - while ((copy = nfsd4_get_copy(clp)) != NULL) > > > > + while ((copy = nfsd4_get_copy(clp)) != NULL) { > > > > nfsd4_stop_copy(copy); > > > > + nfsd4_send_cb_offload(copy); > > > > + } > > > > > > Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might > > > delay the client destruction for quite a while. > > > > The nfsd4_copy() is removed from the nfs4_client's async_copies > > list, and the RPC proceeds in the background. It doesn't block the > > destruction of the CLIENTID. > > > > But it might be a problem for the RPC logic to transmit the call > > when there's no nfs4_client to dereference. I should probably > > drop this patch and try a different approach. > > > No, I think this will block the client shutdown. After > nfsd4_shutdown_copy(), it calls nfsd4_shutdown_callback(), which does: > > flush_workqueue(clp->cl_callback_wq); > nfsd41_cb_inflight_wait_complete(clp); > > So the CB_OFFLOAD workqueue jobs will run, and everything will wait for > clp->cl_cb_inflight to go to 0. That won't happen until the CB_OFFLOAD > jobs complete. > > No objection from me though if you see a better approach. Dropping this > one for now seems reasonable. There is no spec mandate (yet) for CB_OFFLOAD notification on DESTROY_CLIENTID, so I'm dropping the patch for now. > > > Maybe this CB_OFFLOAD shouldn't retry on DELAY? Some logic can be added such that no retry is done. However, if the client does not respond at all to the CB_OFFLOAD, NFSD's RPC timeout is still 7 seconds. We expect this will be exceptionally rare, though -- clients are typically aware of ongoing COPY operations and generally won't try to destroy a client ID where there is pending work that they care about. > > > > } > > > > #ifdef CONFIG_NFSD_V4_2_INTER_SSC > > > > > > > > > > > > > -- > > > Jeff Layton <jlayton@kernel.org> > > > > > > > -- > Jeff Layton <jlayton@kernel.org> >
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 4c964bce6bd7..51b3f85f3791 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, #define NFSDDBG_FACILITY NFSDDBG_PROC +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy); + static u32 nfsd_attrmask[] = { NFSD_WRITEABLE_ATTRS_WORD0, NFSD_WRITEABLE_ATTRS_WORD1, @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp) { struct nfsd4_copy *copy; - while ((copy = nfsd4_get_copy(clp)) != NULL) + while ((copy = nfsd4_get_copy(clp)) != NULL) { nfsd4_stop_copy(copy); + nfsd4_send_cb_offload(copy); + } } #ifdef CONFIG_NFSD_V4_2_INTER_SSC