Message ID | Z1cra8/5H5HvJ5Sw@igm-qws-u22929a.delacy.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | nfs: propagate fileid changed errors back to syscall | expand |
On 9 Dec 2024, at 12:39, Nikhil Jha wrote: > Hello! This is the first kernel patch I have tried to upstream. I'm > following along with the kernel newbies guide but apologies if I got > anything wrong. > > Currently, if there is a mismatch in the request and response fileids in > an NFS request, the kernel logs an error and attempts to return ESTALE. > However, this error is currently dropped before it makes it all the way > to userspace. This appears to be a mistake, since as far as I can tell > that ESTALE value is never consumed from anywhere. > > Callstack for async NFS write, at time of error: > > nfs_update_inode <- returns -ESTALE > nfs_refresh_inode_locked > nfs_writeback_update_inode <- error is dropped here > nfs3_write_done > nfs_writeback_done > nfs_pgio_result <- other errors are collected here > rpc_exit_task > __rpc_execute > rpc_async_schedule > process_one_work > worker_thread > kthread > ret_from_fork > > We ran into this issue ourselves, and seeing the -ESTALE in the kernel > source code but not from userspace was surprising. > > I tested a rebased version of this patch on an el8 kernel (v6.1.114), > and it seems to correctly propagate this error. > >> 8------------------------------------------------------8< > > If an NFS server returns a response with a different file id to the > response, the kernel currently prints out an error and attempts to > return -ESTALE. However, this -ESTALE value is never surfaced anywhere. Hi Nikhil Jha, Will this cause us to return -ESTALE to the application even if the WRITE was successful? Ben
On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote: > On 9 Dec 2024, at 12:39, Nikhil Jha wrote: > > > Hello! This is the first kernel patch I have tried to upstream. I'm > > following along with the kernel newbies guide but apologies if I got > > anything wrong. > > > > Currently, if there is a mismatch in the request and response fileids in > > an NFS request, the kernel logs an error and attempts to return ESTALE. > > However, this error is currently dropped before it makes it all the way > > to userspace. This appears to be a mistake, since as far as I can tell > > that ESTALE value is never consumed from anywhere. > > > > Callstack for async NFS write, at time of error: > > > > nfs_update_inode <- returns -ESTALE > > nfs_refresh_inode_locked > > nfs_writeback_update_inode <- error is dropped here > > nfs3_write_done > > nfs_writeback_done > > nfs_pgio_result <- other errors are collected here > > rpc_exit_task > > __rpc_execute > > rpc_async_schedule > > process_one_work > > worker_thread > > kthread > > ret_from_fork > > > > We ran into this issue ourselves, and seeing the -ESTALE in the kernel > > source code but not from userspace was surprising. > > > > I tested a rebased version of this patch on an el8 kernel (v6.1.114), > > and it seems to correctly propagate this error. > > > >> 8------------------------------------------------------8< > > > > If an NFS server returns a response with a different file id to the > > response, the kernel currently prints out an error and attempts to > > return -ESTALE. However, this -ESTALE value is never surfaced anywhere. > > Hi Nikhil Jha, > > Will this cause us to return -ESTALE to the application even if the WRITE > was successful? > > Ben > Hi Ben, Hmm.. I'm not sure how to answer that question exactly. A fileid is only mismatched between a request RPC and response RPC when something really weird is going on (i.e. a bug in the NFS server), so it's hard to reason about if a WRITE was successful or not. The `return -ESTALE` was already in the kernel code, but this particular codepath seems to have accidentally dropped that error. Nikhil
On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote: > On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote: > > On 9 Dec 2024, at 12:39, Nikhil Jha wrote: > > > > > Hello! This is the first kernel patch I have tried to upstream. > > > I'm > > > following along with the kernel newbies guide but apologies if I > > > got > > > anything wrong. > > > > > > Currently, if there is a mismatch in the request and response > > > fileids in > > > an NFS request, the kernel logs an error and attempts to return > > > ESTALE. > > > However, this error is currently dropped before it makes it all > > > the way > > > to userspace. This appears to be a mistake, since as far as I can > > > tell > > > that ESTALE value is never consumed from anywhere. > > > > > > Callstack for async NFS write, at time of error: > > > > > > nfs_update_inode <- returns -ESTALE > > > nfs_refresh_inode_locked > > > nfs_writeback_update_inode <- error is dropped here > > > nfs3_write_done > > > nfs_writeback_done > > > nfs_pgio_result <- other errors are collected here > > > rpc_exit_task > > > __rpc_execute > > > rpc_async_schedule > > > process_one_work > > > worker_thread > > > kthread > > > ret_from_fork > > > > > > We ran into this issue ourselves, and seeing the -ESTALE in the > > > kernel > > > source code but not from userspace was surprising. > > > > > > I tested a rebased version of this patch on an el8 kernel > > > (v6.1.114), > > > and it seems to correctly propagate this error. > > > > > > > 8------------------------------------------------------8< > > > > > > If an NFS server returns a response with a different file id to > > > the > > > response, the kernel currently prints out an error and attempts > > > to > > > return -ESTALE. However, this -ESTALE value is never surfaced > > > anywhere. > > > > Hi Nikhil Jha, > > > > Will this cause us to return -ESTALE to the application even if the > > WRITE > > was successful? > > > > Ben > > > > Hi Ben, > > Hmm.. I'm not sure how to answer that question exactly. A fileid is > only > mismatched between a request RPC and response RPC when something > really > weird is going on (i.e. a bug in the NFS server), so it's hard to > reason > about if a WRITE was successful or not. > > The `return -ESTALE` was already in the kernel code, but this > particular > codepath seems to have accidentally dropped that error. > > Nikhil > I'm not keen on taking any client changes to paper over what appears to be a major server bug. We simply can't support broken servers that reuse filehandles across files.
On Tue, Dec 10, 2024 at 05:33:14PM +0000, Trond Myklebust wrote: > On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote: > > On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote: > > > On 9 Dec 2024, at 12:39, Nikhil Jha wrote: > > > > > > > Hello! This is the first kernel patch I have tried to upstream. > > > > I'm > > > > following along with the kernel newbies guide but apologies if I > > > > got > > > > anything wrong. > > > > > > > > Currently, if there is a mismatch in the request and response > > > > fileids in > > > > an NFS request, the kernel logs an error and attempts to return > > > > ESTALE. > > > > However, this error is currently dropped before it makes it all > > > > the way > > > > to userspace. This appears to be a mistake, since as far as I can > > > > tell > > > > that ESTALE value is never consumed from anywhere. > > > > > > > > Callstack for async NFS write, at time of error: > > > > > > > > nfs_update_inode <- returns -ESTALE > > > > nfs_refresh_inode_locked > > > > nfs_writeback_update_inode <- error is dropped here > > > > nfs3_write_done > > > > nfs_writeback_done > > > > nfs_pgio_result <- other errors are collected here > > > > rpc_exit_task > > > > __rpc_execute > > > > rpc_async_schedule > > > > process_one_work > > > > worker_thread > > > > kthread > > > > ret_from_fork > > > > > > > > We ran into this issue ourselves, and seeing the -ESTALE in the > > > > kernel > > > > source code but not from userspace was surprising. > > > > > > > > I tested a rebased version of this patch on an el8 kernel > > > > (v6.1.114), > > > > and it seems to correctly propagate this error. > > > > > > > > > 8------------------------------------------------------8< > > > > > > > > If an NFS server returns a response with a different file id to > > > > the > > > > response, the kernel currently prints out an error and attempts > > > > to > > > > return -ESTALE. However, this -ESTALE value is never surfaced > > > > anywhere. > > > > > > Hi Nikhil Jha, > > > > > > Will this cause us to return -ESTALE to the application even if the > > > WRITE > > > was successful? > > > > > > Ben > > > > > > > Hi Ben, > > > > Hmm.. I'm not sure how to answer that question exactly. A fileid is > > only > > mismatched between a request RPC and response RPC when something > > really > > weird is going on (i.e. a bug in the NFS server), so it's hard to > > reason > > about if a WRITE was successful or not. > > > > The `return -ESTALE` was already in the kernel code, but this > > particular > > codepath seems to have accidentally dropped that error. > > > > Nikhil > > > > I'm not keen on taking any client changes to paper over what appears to > be a major server bug. We simply can't support broken servers that > reuse filehandles across files. > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > Right, this patch *doesn't* alleviate any server bug from the client. I don't think it would make sense to try to fix a buggy server from the Linux NFS client. What I'm thinking about is... 1. We're already detecting this problem, printing an error message, and trying to return ESTALE. 2. That ESTALE is not propagated anywhere. It just gets dropped. If we don't want this ESTALE to be visible, should it just get removed entirely? The value does not appear to be consumed from anywhere in the kernel code.
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index d39a1f58e18d..4e80a13e9639 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -335,7 +335,7 @@ static int filelayout_write_done_cb(struct rpc_task *task, /* zero out the fattr */ hdr->fattr.valid = 0; if (task->tk_status >= 0) - nfs_writeback_update_inode(hdr); + return nfs_writeback_update_inode(hdr); return 0; } diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index f78115c6c2c1..d15e3799a351 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1503,7 +1503,7 @@ static int ff_layout_write_done_cb(struct rpc_task *task, /* zero out fattr since we don't care DS attr at all */ hdr->fattr.valid = 0; if (task->tk_status >= 0) - nfs_writeback_update_inode(hdr); + return nfs_writeback_update_inode(hdr); return 0; } diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index e564bd11ba60..5c4e2fa88324 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -592,7 +592,7 @@ void nfs_mark_request_commit(struct nfs_page *req, struct nfs_commit_info *cinfo, u32 ds_commit_idx); int nfs_write_need_commit(struct nfs_pgio_header *); -void nfs_writeback_update_inode(struct nfs_pgio_header *hdr); +int nfs_writeback_update_inode(struct nfs_pgio_header *hdr); int nfs_generic_commit_list(struct inode *inode, struct list_head *head, int how, struct nfs_commit_info *cinfo); void nfs_retry_commit(struct list_head *page_list, diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index 1566163c6d85..42ddbc21fb05 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -887,7 +887,7 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr) if (nfs3_async_handle_jukebox(task, inode)) return -EAGAIN; if (task->tk_status >= 0) - nfs_writeback_update_inode(hdr); + return nfs_writeback_update_inode(hdr); return 0; } diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 405f17e6e0b4..7ec372a1eb98 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5661,7 +5661,7 @@ static int nfs4_write_done_cb(struct rpc_task *task, } if (task->tk_status >= 0) { renew_lease(NFS_SERVER(inode), hdr->timestamp); - nfs_writeback_update_inode(hdr); + return nfs_writeback_update_inode(hdr); } return 0; } diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index e27c07bd8929..19cb080653e3 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -854,9 +854,12 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata) { struct nfs_pgio_header *hdr = calldata; struct inode *inode = hdr->inode; + int status = hdr->rw_ops->rw_done(task, hdr, inode); - if (hdr->rw_ops->rw_done(task, hdr, inode) != 0) + if (status != 0) { + nfs_set_pgio_error(hdr, status, hdr->args.offset); return; + } if (task->tk_status < 0) nfs_set_pgio_error(hdr, task->tk_status, hdr->args.offset); else diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c index 6c09cd090c34..72ffbdfc7ae6 100644 --- a/fs/nfs/proc.c +++ b/fs/nfs/proc.c @@ -629,7 +629,7 @@ static int nfs_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr) { if (task->tk_status >= 0) { hdr->res.count = hdr->args.count; - nfs_writeback_update_inode(hdr); + return nfs_writeback_update_inode(hdr); } return 0; } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 50fa539611f5..151da29175fd 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1507,22 +1507,25 @@ static void nfs_writeback_check_extend(struct nfs_pgio_header *hdr, fattr->valid |= NFS_ATTR_FATTR_SIZE; } -void nfs_writeback_update_inode(struct nfs_pgio_header *hdr) +int nfs_writeback_update_inode(struct nfs_pgio_header *hdr) { struct nfs_fattr *fattr = &hdr->fattr; struct inode *inode = hdr->inode; + int ret = 0; if (nfs_have_delegated_mtime(inode)) { spin_lock(&inode->i_lock); nfs_set_cache_invalid(inode, NFS_INO_INVALID_BLOCKS); spin_unlock(&inode->i_lock); - return; + return 0; } spin_lock(&inode->i_lock); nfs_writeback_check_extend(hdr, fattr); - nfs_post_op_update_inode_force_wcc_locked(inode, fattr); + ret = nfs_post_op_update_inode_force_wcc_locked(inode, fattr); spin_unlock(&inode->i_lock); + + return ret; } EXPORT_SYMBOL_GPL(nfs_writeback_update_inode);