Message ID | 1641840653-23059-1-git-send-email-dai.ngo@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | nfsd: Initial implementation of NFSv4 Courteous Server | expand |
> On Jan 10, 2022, at 1:50 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > Hi Bruce, Chuck > > This series of patches implement the NFSv4 Courteous Server. > > A server which does not immediately expunge the state on lease expiration > is known as a Courteous Server. A Courteous Server continues to recognize > previously generated state tokens as valid until conflict arises between > the expired state and the requests from another client, or the server > reboots. > > The v2 patch includes the following: > > . add new callback, lm_expire_lock, to lock_manager_operations to > allow the lock manager to take appropriate action with conflict lock. > > . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. > > . expire courtesy client after 24hr if client has not reconnected. > > . do not allow expired client to become courtesy client if there are > waiters for client's locks. > > . modify client_info_show to show courtesy client and seconds from > last renew. > > . fix a problem with NFSv4.1 server where the it keeps returning > SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after > the courtesy client re-connects, causing the client to keep sending > BCTS requests to server. > > The v3 patch includes the following: > > . modified posix_test_lock to check and resolve conflict locks > to handle NLM TEST and NFSv4 LOCKT requests. > > . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v4 patch includes: > > . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock > by asking the laudromat thread to destroy the courtesy client. > > . handle NFSv4 share reservation conflicts with courtesy client. This > includes conflicts between access mode and deny mode and vice versa. > > . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v5 patch includes: > > . fix recursive locking of file_rwsem from posix_lock_file. > > . retest with LOCKDEP enabled. > > The v6 patch includes: > > . merge witn 5.15-rc7 > > . fix a bug in nfs4_check_deny_bmap that did not check for matched > nfs4_file before checking for access/deny conflict. This bug causes > pynfs OPEN18 to fail since the server taking too long to release > lots of un-conflict clients' state. > > . enhance share reservation conflict handler to handle case where > a large number of conflict courtesy clients need to be expired. > The 1st 100 clients are expired synchronously and the rest are > expired in the background by the laundromat and NFS4ERR_DELAY > is returned to the NFS client. This is needed to prevent the > NFS client from timing out waiting got the reply. > > The v7 patch includes: > > . Fix race condition in posix_test_lock and posix_lock_inode after > dropping spinlock. > > . Enhance nfsd4_fl_expire_lock to work with with new lm_expire_lock > callback > > . Always resolve share reservation conflicts asynchrously. > > . Fix bug in nfs4_laundromat where spinlock is not used when > scanning cl_ownerstr_hashtbl. > > . Fix bug in nfs4_laundromat where idr_get_next was called > with incorrect 'id'. > > . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock. > > The v8 patch includes: > > . Fix warning in nfsd4_fl_expire_lock reported by test robot. > > The V9 patch include: > > . Simplify lm_expire_lock API by (1) remove the 'testonly' flag > and (2) specifying return value as true/false to indicate > whether conflict was succesfully resolved. > > . Rework nfsd4_fl_expire_lock to mark client with > NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire > the client in the background. > > . Add a spinlock in nfs4_client to synchronize access to the > NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flag to > handle race conditions when resolving lock and share reservation > conflict. > > . Courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT > are now consisdered 'dead', waiting for the laundromat to expire > it. This client is no longer allowed to use its states if it > re-connects before the laundromat finishes expiring the client. > > For v4.1 client, the detection is done in the processing of the > SEQUENCE op and returns NFS4ERR_BAD_SESSION to force the client > to re-establish new clientid and session. > For v4.0 client, the detection is done in the processing of the > RENEW and state-related ops and return NFS4ERR_EXPIRE to force > the client to re-establish new clientid. I've made v9 available for testing and review in the nfsd-courteous-server topic branch at: git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git -- Chuck Lever
Could you look back over previous comments? I notice there's a couple unaddressed (circular locking dependency, Documentation/filesystems/). I agree with Chuck that we don't need to reschedule the laundromat, it's OK if it takes longer to get around to cleaning up a dead client. --b. On Mon, Jan 10, 2022 at 10:50:51AM -0800, Dai Ngo wrote: > Hi Bruce, Chuck > > This series of patches implement the NFSv4 Courteous Server. > > A server which does not immediately expunge the state on lease expiration > is known as a Courteous Server. A Courteous Server continues to recognize > previously generated state tokens as valid until conflict arises between > the expired state and the requests from another client, or the server > reboots. > > The v2 patch includes the following: > > . add new callback, lm_expire_lock, to lock_manager_operations to > allow the lock manager to take appropriate action with conflict lock. > > . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. > > . expire courtesy client after 24hr if client has not reconnected. > > . do not allow expired client to become courtesy client if there are > waiters for client's locks. > > . modify client_info_show to show courtesy client and seconds from > last renew. > > . fix a problem with NFSv4.1 server where the it keeps returning > SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after > the courtesy client re-connects, causing the client to keep sending > BCTS requests to server. > > The v3 patch includes the following: > > . modified posix_test_lock to check and resolve conflict locks > to handle NLM TEST and NFSv4 LOCKT requests. > > . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v4 patch includes: > > . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock > by asking the laudromat thread to destroy the courtesy client. > > . handle NFSv4 share reservation conflicts with courtesy client. This > includes conflicts between access mode and deny mode and vice versa. > > . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v5 patch includes: > > . fix recursive locking of file_rwsem from posix_lock_file. > > . retest with LOCKDEP enabled. > > The v6 patch includes: > > . merge witn 5.15-rc7 > > . fix a bug in nfs4_check_deny_bmap that did not check for matched > nfs4_file before checking for access/deny conflict. This bug causes > pynfs OPEN18 to fail since the server taking too long to release > lots of un-conflict clients' state. > > . enhance share reservation conflict handler to handle case where > a large number of conflict courtesy clients need to be expired. > The 1st 100 clients are expired synchronously and the rest are > expired in the background by the laundromat and NFS4ERR_DELAY > is returned to the NFS client. This is needed to prevent the > NFS client from timing out waiting got the reply. > > The v7 patch includes: > > . Fix race condition in posix_test_lock and posix_lock_inode after > dropping spinlock. > > . Enhance nfsd4_fl_expire_lock to work with with new lm_expire_lock > callback > > . Always resolve share reservation conflicts asynchrously. > > . Fix bug in nfs4_laundromat where spinlock is not used when > scanning cl_ownerstr_hashtbl. > > . Fix bug in nfs4_laundromat where idr_get_next was called > with incorrect 'id'. > > . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock. > > The v8 patch includes: > > . Fix warning in nfsd4_fl_expire_lock reported by test robot. > > The V9 patch include: > > . Simplify lm_expire_lock API by (1) remove the 'testonly' flag > and (2) specifying return value as true/false to indicate > whether conflict was succesfully resolved. > > . Rework nfsd4_fl_expire_lock to mark client with > NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire > the client in the background. > > . Add a spinlock in nfs4_client to synchronize access to the > NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flag to > handle race conditions when resolving lock and share reservation > conflict. > > . Courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT > are now consisdered 'dead', waiting for the laundromat to expire > it. This client is no longer allowed to use its states if it > re-connects before the laundromat finishes expiring the client. > > For v4.1 client, the detection is done in the processing of the > SEQUENCE op and returns NFS4ERR_BAD_SESSION to force the client > to re-establish new clientid and session. > For v4.0 client, the detection is done in the processing of the > RENEW and state-related ops and return NFS4ERR_EXPIRE to force > the client to re-establish new clientid.
On 1/12/22 10:59 AM, J. Bruce Fields wrote: > Could you look back over previous comments? I notice there's a couple > unaddressed (circular locking dependency, Documentation/filesystems/). I think v9 addresses the circular locking dependency. I will update the Documentation/filesystems/locking.rst in v10. > > I agree with Chuck that we don't need to reschedule the laundromat, it's > OK if it takes longer to get around to cleaning up a dead client. Yes, it is now implemented for lock conflict and share reservation resolution. I'm doing the same for delegation conflict. -Dai > > --b. > > On Mon, Jan 10, 2022 at 10:50:51AM -0800, Dai Ngo wrote: >> Hi Bruce, Chuck >> >> This series of patches implement the NFSv4 Courteous Server. >> >> A server which does not immediately expunge the state on lease expiration >> is known as a Courteous Server. A Courteous Server continues to recognize >> previously generated state tokens as valid until conflict arises between >> the expired state and the requests from another client, or the server >> reboots. >> >> The v2 patch includes the following: >> >> . add new callback, lm_expire_lock, to lock_manager_operations to >> allow the lock manager to take appropriate action with conflict lock. >> >> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >> >> . expire courtesy client after 24hr if client has not reconnected. >> >> . do not allow expired client to become courtesy client if there are >> waiters for client's locks. >> >> . modify client_info_show to show courtesy client and seconds from >> last renew. >> >> . fix a problem with NFSv4.1 server where the it keeps returning >> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >> the courtesy client re-connects, causing the client to keep sending >> BCTS requests to server. >> >> The v3 patch includes the following: >> >> . modified posix_test_lock to check and resolve conflict locks >> to handle NLM TEST and NFSv4 LOCKT requests. >> >> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >> >> The v4 patch includes: >> >> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock >> by asking the laudromat thread to destroy the courtesy client. >> >> . handle NFSv4 share reservation conflicts with courtesy client. This >> includes conflicts between access mode and deny mode and vice versa. >> >> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >> >> The v5 patch includes: >> >> . fix recursive locking of file_rwsem from posix_lock_file. >> >> . retest with LOCKDEP enabled. >> >> The v6 patch includes: >> >> . merge witn 5.15-rc7 >> >> . fix a bug in nfs4_check_deny_bmap that did not check for matched >> nfs4_file before checking for access/deny conflict. This bug causes >> pynfs OPEN18 to fail since the server taking too long to release >> lots of un-conflict clients' state. >> >> . enhance share reservation conflict handler to handle case where >> a large number of conflict courtesy clients need to be expired. >> The 1st 100 clients are expired synchronously and the rest are >> expired in the background by the laundromat and NFS4ERR_DELAY >> is returned to the NFS client. This is needed to prevent the >> NFS client from timing out waiting got the reply. >> >> The v7 patch includes: >> >> . Fix race condition in posix_test_lock and posix_lock_inode after >> dropping spinlock. >> >> . Enhance nfsd4_fl_expire_lock to work with with new lm_expire_lock >> callback >> >> . Always resolve share reservation conflicts asynchrously. >> >> . Fix bug in nfs4_laundromat where spinlock is not used when >> scanning cl_ownerstr_hashtbl. >> >> . Fix bug in nfs4_laundromat where idr_get_next was called >> with incorrect 'id'. >> >> . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock. >> >> The v8 patch includes: >> >> . Fix warning in nfsd4_fl_expire_lock reported by test robot. >> >> The V9 patch include: >> >> . Simplify lm_expire_lock API by (1) remove the 'testonly' flag >> and (2) specifying return value as true/false to indicate >> whether conflict was succesfully resolved. >> >> . Rework nfsd4_fl_expire_lock to mark client with >> NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire >> the client in the background. >> >> . Add a spinlock in nfs4_client to synchronize access to the >> NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flag to >> handle race conditions when resolving lock and share reservation >> conflict. >> >> . Courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT >> are now consisdered 'dead', waiting for the laundromat to expire >> it. This client is no longer allowed to use its states if it >> re-connects before the laundromat finishes expiring the client. >> >> For v4.1 client, the detection is done in the processing of the >> SEQUENCE op and returns NFS4ERR_BAD_SESSION to force the client >> to re-establish new clientid and session. >> For v4.0 client, the detection is done in the processing of the >> RENEW and state-related ops and return NFS4ERR_EXPIRE to force >> the client to re-establish new clientid.