Message ID | 1650739455-26096-1-git-send-email-dai.ngo@oracle.com (mailing list archive) |
---|---|
Series | NFSD: Initial implementation of NFSv4 Courteous Server |
On Sat, Apr 23, 2022 at 11:44:08AM -0700, Dai Ngo wrote:
> . Redo based on Bruce's suggestion by breaking the patches into functionality
> and also don't remove client record of courtesy client until the client is
> actually expired.
>
> 0001: courteous server framework with support for client with delegation only.
> This patch also handles COURTESY and EXPIRABLE reconnect.
> Conflict is resolved by set the courtesy client to EXPIRABLE, let the
> laundromat expires the client on next run and return NFS4ERR_DELAY
> OPEN request.
>
> 0002: add support for opens/share reservation to courteous server
> Conflict is resolved by set the courtesy client to EXPIRABLE, let the
> laundromat expires the client on next run and return NFS4ERR_DELAY
> OPEN request.
>
> 0003: mv creation/destroying laundromat workqueue from nfs4_state_start and
> and nfs4_state_shutdown_net to init_nfsd and exit_nfsd.
>
> 0004: fs/lock: add locks_owner_has_blockers helper
>
> 0005: add 2 callbacks to lock_manager_operations for resolving lock conflict
>
> 0006: add support for locks to courteous server, making use of 0004 and 0005
> Conflict is resolved by set the courtesy client to EXPIRABLE, run the
> laundromat immediately and wait for it to complete before returning to
> fs/lock code to recheck the lock list from the beginning.
>
> NOTE: I could not get queue_work/queue_delay_work and flush_workqueue
> to work as expected, I have to use mod_delayed_work and flush_workqueue
> to get the laundromat to run immediately.

Whoops, yes, my bad.

> When we check for blockers in nfs4_anylock_blockers, we do not check
> for client with delegation conflict. This is because we already hold
> the client_lock and to check for delegation conflict we need the state_lock
> and scanning the del_recall_lru list each time. So to avoid this overhead
> and potential deadlock (not sure about lock of ordering of these locks)
> we check and set the COURTESY client with delegation being recalled to
> EXPIRABLE later in nfs4_laundromat.

Hm, OK, I'll think about that, but sounds like it should work.

--b.

>
> 0007: show state of courtesy client in client info.
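
The two conflict-resolution paths described in the cover letter (OPEN conflicts in 0001/0002, lock conflicts in 0006) can be illustrated with a toy model. This is only a sketch in Python, not the nfsd code: the `Server` class, its tables, and the rule that each file has a single holder are assumptions made for illustration. Only the state names (COURTESY, EXPIRABLE), the laundromat, and the NFS4ERR_DELAY behaviour come from the thread; the numeric status values are the ones defined in RFC 7530.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# NFSv4 status codes (values from RFC 7530).
NFS4_OK = 0
NFS4ERR_DELAY = 10008
NFS4ERR_DENIED = 10010
NFS4ERR_SHARE_DENIED = 10015


class ClientState(Enum):
    ACTIVE = auto()     # lease is valid
    COURTESY = auto()   # lease expired, state kept as a courtesy
    EXPIRABLE = auto()  # courtesy client that caused a conflict


@dataclass
class Server:
    """Hypothetical toy server: one holder per file, per table."""
    clients: dict = field(default_factory=dict)  # client name -> ClientState
    opens: dict = field(default_factory=dict)    # path -> client name
    locks: dict = field(default_factory=dict)    # path -> client name

    def laundromat(self):
        """Expire every EXPIRABLE client and drop all of its state."""
        doomed = [n for n, s in self.clients.items()
                  if s is ClientState.EXPIRABLE]
        for name in doomed:
            del self.clients[name]
            for table in (self.opens, self.locks):
                for path in [p for p, o in table.items() if o == name]:
                    del table[path]

    def _try_to_expire(self, holder):
        """Mark a COURTESY holder EXPIRABLE; True if it will be expired."""
        if self.clients.get(holder) is ClientState.COURTESY:
            self.clients[holder] = ClientState.EXPIRABLE
        return self.clients.get(holder) is ClientState.EXPIRABLE

    def open(self, client, path):
        holder = self.opens.get(path)
        if holder is not None and holder != client:
            if self._try_to_expire(holder):
                # OPEN path: do not wait. The laundromat cleans up on
                # its next run and the caller is told to retry.
                return NFS4ERR_DELAY
            return NFS4ERR_SHARE_DENIED
        self.opens[path] = client
        return NFS4_OK

    def lock(self, client, path):
        while True:
            holder = self.locks.get(path)
            if holder is None or holder == client:
                self.locks[path] = client
                return NFS4_OK
            if not self._try_to_expire(holder):
                return NFS4ERR_DENIED
            # Lock path: run the laundromat now, wait for it, then
            # recheck the lock list from the beginning.
            self.laundromat()
```

In this model an OPEN that conflicts with a courtesy client fails once with NFS4ERR_DELAY and succeeds after the next laundromat run, while a conflicting lock request succeeds in a single call because the cleanup happens synchronously.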
I'm getting a few new pynfs failures after applying these. I haven't
tried to investigate what's happening.

--b.

**************************************************
RENEW3 st_renew.testExpired : FAILURE
    nfs4lib.BadCompoundRes: Opening file b'RENEW3-1':
    operation OP_OPEN should return NFS4_OK, instead got
    NFS4ERR_DELAY
LKU10 st_locku.testTimedoutUnlock : FAILURE
    nfs4lib.BadCompoundRes: Opening file b'LKU10-1':
    operation OP_OPEN should return NFS4_OK, instead got
    NFS4ERR_DELAY
CLOSE9 st_close.testTimedoutClose2 : FAILURE
    nfs4lib.BadCompoundRes: Opening file b'CLOSE9-1':
    operation OP_OPEN should return NFS4_OK, instead got
    NFS4ERR_DELAY
CLOSE8 st_close.testTimedoutClose1 : FAILURE
    nfs4lib.BadCompoundRes: Opening file b'CLOSE8-1':
    operation OP_OPEN should return NFS4_OK, instead got
    NFS4ERR_DELAY
On 4/25/22 10:53 AM, J. Bruce Fields wrote:
> I'm getting a few new pynfs failures after applying these. I haven't
> tried to investigate what's happening.
>
> --b.
>
> **************************************************
> RENEW3 st_renew.testExpired : FAILURE
>     nfs4lib.BadCompoundRes: Opening file b'RENEW3-1':
>     operation OP_OPEN should return NFS4_OK, instead got
>     NFS4ERR_DELAY
> LKU10 st_locku.testTimedoutUnlock : FAILURE
>     nfs4lib.BadCompoundRes: Opening file b'LKU10-1':
>     operation OP_OPEN should return NFS4_OK, instead got
>     NFS4ERR_DELAY
> CLOSE9 st_close.testTimedoutClose2 : FAILURE
>     nfs4lib.BadCompoundRes: Opening file b'CLOSE9-1':
>     operation OP_OPEN should return NFS4_OK, instead got
>     NFS4ERR_DELAY
> CLOSE8 st_close.testTimedoutClose1 : FAILURE
>     nfs4lib.BadCompoundRes: Opening file b'CLOSE8-1':
>     operation OP_OPEN should return NFS4_OK, instead got
>     NFS4ERR_DELAY

With these patches, OPEN (v4.0 and v4.1) might have to handle NFS4ERR_DELAY
if there is a reservation conflict. I had to modify open_confirm (v4.0) and
open_create_file (v4.1) to handle the NFS4ERR_DELAY error.

-Dai
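
For readers unfamiliar with what "handle NFS4ERR_DELAY" means on the client side, the usual pattern is to resend the operation after a pause. The following is a minimal Python sketch of that pattern, not the actual pynfs change: `retry_on_delay`, `send_op`, and the parameter names are hypothetical, and `send_op` stands in for whatever function issues the OPEN compound and returns its status. The status values are those defined in RFC 7530.

```python
import time

# NFSv4 status codes (values from RFC 7530).
NFS4_OK = 0
NFS4ERR_DELAY = 10008


def retry_on_delay(send_op, max_tries=5, delay=1.0, sleep=time.sleep):
    """Call send_op() until it returns something other than NFS4ERR_DELAY.

    send_op   -- hypothetical callable that sends one request and
                 returns its NFSv4 status code
    max_tries -- upper bound on the number of attempts
    delay     -- seconds to pause between attempts
    sleep     -- injectable sleep function, so tests need not wait
    """
    status = NFS4ERR_DELAY
    for attempt in range(max_tries):
        status = send_op()
        if status != NFS4ERR_DELAY:
            break
        if attempt < max_tries - 1:
            sleep(delay)
    return status
```

A test that previously asserted NFS4_OK on the first OPEN would instead assert it on the value returned by `retry_on_delay`; if the server keeps answering NFS4ERR_DELAY for all `max_tries` attempts, that status is returned and the assertion still fails.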
On Mon, Apr 25, 2022 at 11:16:54AM -0700, dai.ngo@oracle.com wrote:
>
> On 4/25/22 10:53 AM, J. Bruce Fields wrote:
> >I'm getting a few new pynfs failures after applying these. I haven't
> >tried to investigate what's happening.
> >
> >--b.
> >
> >**************************************************
> >RENEW3 st_renew.testExpired : FAILURE
> >    nfs4lib.BadCompoundRes: Opening file b'RENEW3-1':
> >    operation OP_OPEN should return NFS4_OK, instead got
> >    NFS4ERR_DELAY
> >LKU10 st_locku.testTimedoutUnlock : FAILURE
> >    nfs4lib.BadCompoundRes: Opening file b'LKU10-1':
> >    operation OP_OPEN should return NFS4_OK, instead got
> >    NFS4ERR_DELAY
> >CLOSE9 st_close.testTimedoutClose2 : FAILURE
> >    nfs4lib.BadCompoundRes: Opening file b'CLOSE9-1':
> >    operation OP_OPEN should return NFS4_OK, instead got
> >    NFS4ERR_DELAY
> >CLOSE8 st_close.testTimedoutClose1 : FAILURE
> >    nfs4lib.BadCompoundRes: Opening file b'CLOSE8-1':
> >    operation OP_OPEN should return NFS4_OK, instead got
> >    NFS4ERR_DELAY
>
> With these patches, OPEN (v4.0 and v4.1) might have to handle NFS4ERR_DELAY
> if there is a reservation conflict. I had to modify open_confirm (v4.0) and
> open_create_file (v4.1) to handle the NFS4ERR_DELAY error.

Looking at your patch 2....

OK, my suggestion was: "in the case of a conflict, call try_to_expire_client
and queue_work(), then modify e.g. nfs4_get_vfs_file to flush_workqueue()
and then retry after unlocking fi_lock."

You're returning DELAY instead of waiting for the workqueue.

After thinking on it a minute.... I like your way better. It's simpler.
Any client has to handle DELAY on OPEN, for the delegation-conflict case.
Maybe it's a little suboptimal, but so what, this is a very rare case.

So, good thinking, let's stick with that idea. We'll just need to fix up
pynfs some time.

--b.