Message ID: 20150430143731.GA22038@infradead.org
State: New, archived
Apologies for the resend...

On Thu, Apr 30, 2015 at 11:01 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>>> I've been discussing the possibility of adding more session slots on
>>> the Linux NFS client with jlayton. We think it would be straightforward,
>>> once the workqueue-based NFSD patches are in, to make the backchannel
>>> service into a workqueue. Then it would be a simple matter to increase
>>> the number of session slots.
>>>
>>> We haven't discussed what would be needed on the server side of this
>>> equation, but it sounds like it has some deeper problems if it is not
>>> obeying the session slot table limits advertised by the client.
>>
>> No, the client isn't obeying its own slot limits.
>>
>> The problem is that when the client responds to a callback it still
>> holds a reference on the rpc_rqst for a while. If the server sends
>> the next callback fast enough to hit that race window, the client
>> incorrectly rejects it. Note that we never even get to the NFS code
>> that checks the slot id in this case; it's low-level sunrpc code
>> that is the problem.
>
> We can add dynamic allocation of a new slot as part of the backchannel
> reply transmit workload. That way we close the race without opening the
> door to violations of the session limits.
>
> Cheers
>   Trond
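A minimal sketch of what Trond is suggesting here, with the usual caveats:
bc_rqst_alloc() and bc_xmit() are hypothetical helpers, not the actual
SUNRPC API; only the bc_pa_list/bc_pa_lock preallocation fields are real
structure members. The replacement rpc_rqst is allocated on the reply
transmit path, so the free pool is refilled before the old request is
released:

#include <linux/sunrpc/xprt.h>

/*
 * Sketch only: allocate a replacement rpc_rqst as part of sending the
 * backchannel reply.  The next callback then always finds a free entry
 * on xprt->bc_pa_list even while the RPC code still holds a reference
 * on the old request; when that old request is finally released, it
 * can be freed outright instead of being returned to the list, so the
 * pool stays bounded by the advertised slot count.
 */
static int bc_send_reply(struct rpc_xprt *xprt, struct rpc_rqst *req)
{
	struct rpc_rqst *spare;

	spare = bc_rqst_alloc(xprt, GFP_KERNEL);	/* hypothetical */
	if (!spare)
		return -ENOMEM;

	/* Refill the preallocation list before transmitting. */
	spin_lock_bh(&xprt->bc_pa_lock);
	list_add_tail(&spare->rq_bc_pa_list, &xprt->bc_pa_list);
	spin_unlock_bh(&xprt->bc_pa_lock);

	/* The transport may hold on to @req for a while after this. */
	return bc_xmit(xprt, req);			/* hypothetical */
}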
On Apr 30, 2015, at 11:01 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>>> I've been discussing the possibility of adding more session slots on
>>> the Linux NFS client with jlayton. We think it would be straightforward,
>>> once the workqueue-based NFSD patches are in, to make the backchannel
>>> service into a workqueue. Then it would be a simple matter to increase
>>> the number of session slots.
>>>
>>> We haven't discussed what would be needed on the server side of this
>>> equation, but it sounds like it has some deeper problems if it is not
>>> obeying the session slot table limits advertised by the client.
>>
>> No, the client isn't obeying its own slot limits.
>>
>> The problem is that when the client responds to a callback it still
>> holds a reference on the rpc_rqst for a while. If the server sends
>> the next callback fast enough to hit that race window, the client
>> incorrectly rejects it. Note that we never even get to the NFS code
>> that checks the slot id in this case; it's low-level sunrpc code
>> that is the problem.
>
> We can add dynamic allocation of a new slot as part of the backchannel
> reply transmit workload. That way we close the race without opening the
> door to violations of the session limits.

I'll have to think about how that would affect the RPC/RDMA backchannel.
Transport resources are allocated when the transport is created, and
can't be dynamically added. (It certainly wouldn't be a problem to
overprovision, as Christoph has done here.)

I was thinking that using a local copy of the rpc_rqst for sending the
backchannel reply, and freeing the rpc_rqst before sending, might close
the window.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
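A rough sketch of that local-copy idea, under the same hedging:
bc_reply_ctx and bc_xmit_ctx are hypothetical, and buffer ownership is
glossed over; xprt_free_bc_request() is the real SUNRPC release helper.

#include <linux/sunrpc/bc_xprt.h>

/*
 * Sketch only: reply from a private copy so the shared rpc_rqst can be
 * recycled before the send even starts.  Copying the xdr_buf by value
 * only duplicates the iovec pointers; a real implementation would also
 * have to copy the underlying send buffer or transfer its ownership.
 */
struct bc_reply_ctx {				/* hypothetical */
	struct xdr_buf	snd_buf;
	__be32		xid;
};

static int bc_send_reply_from_copy(struct rpc_xprt *xprt,
				   struct rpc_rqst *req)
{
	struct bc_reply_ctx ctx = {
		.snd_buf	= req->rq_snd_buf,
		.xid		= req->rq_xid,
	};

	/* Return the rpc_rqst to the free list before transmitting. */
	xprt_free_bc_request(req);

	return bc_xmit_ctx(xprt, &ctx);		/* hypothetical */
}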
On Apr 30, 2015, at 11:11 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

> On Apr 30, 2015, at 11:01 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>
>> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>>>> I've been discussing the possibility of adding more session slots on
>>>> the Linux NFS client with jlayton. We think it would be straightforward,
>>>> once the workqueue-based NFSD patches are in, to make the backchannel
>>>> service into a workqueue. Then it would be a simple matter to increase
>>>> the number of session slots.
>>>>
>>>> We haven't discussed what would be needed on the server side of this
>>>> equation, but it sounds like it has some deeper problems if it is not
>>>> obeying the session slot table limits advertised by the client.
>>>
>>> No, the client isn't obeying its own slot limits.
>>>
>>> The problem is that when the client responds to a callback it still
>>> holds a reference on the rpc_rqst for a while. If the server sends
>>> the next callback fast enough to hit that race window, the client
>>> incorrectly rejects it. Note that we never even get to the NFS code
>>> that checks the slot id in this case; it's low-level sunrpc code
>>> that is the problem.
>>
>> We can add dynamic allocation of a new slot as part of the backchannel
>> reply transmit workload. That way we close the race without opening the
>> door to violations of the session limits.
>
> I'll have to think about how that would affect the RPC/RDMA backchannel.
> Transport resources are allocated when the transport is created, and
> can't be dynamically added. (It certainly wouldn't be a problem to
> overprovision, as Christoph has done here.)

We discussed this briefly during the Linux NFS town hall meeting.
I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
can use simple overprovisioning.

This way the upper layer (the NFSv4.1 client) doesn't have to be aware
of limitations in the RPC layer mechanism.

Trond may have an additional concern that I didn't capture.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
> We discussed this briefly during the Linux NFS town hall meeting.
> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
> can use simple overprovisioning.
>
> This way the upper layer (the NFSv4.1 client) doesn't have to be aware
> of limitations in the RPC layer mechanism.
>
> Trond may have an additional concern that I didn't capture.

The other option would be to simply overallocate in the transport layer,
as that is the layer which causes the problem to start with.

That being said, what is the argument for doing any sort of static
allocation here? I'm fine with doing fully dynamic allocation if that
works out fine, but a mixed static/dynamic allocation sounds like a
nightmare.
On Fri, May 1, 2015 at 1:23 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
>> We discussed this briefly during the Linux NFS town hall meeting.
>> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
>> can use simple overprovisioning.
>>
>> This way the upper layer (the NFSv4.1 client) doesn't have to be aware
>> of limitations in the RPC layer mechanism.
>>
>> Trond may have an additional concern that I didn't capture.
>
> The other option would be to simply overallocate in the transport layer,
> as that is the layer which causes the problem to start with.
>
> That being said, what is the argument for doing any sort of static
> allocation here? I'm fine with doing fully dynamic allocation if that
> works out fine, but a mixed static/dynamic allocation sounds like a
> nightmare.

The concern is not so much static vs dynamic. The concern is limiting
incoming RPC calls to the number allowed by the NFSv4.1 session. Right
now, the static allocation enforces the limit of one slot that the
client offers to the server (albeit with the race), and so I want any
replacement to meet the same requirement.

Cheers
  Trond
On May 1, 2015, at 1:23 PM, Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
>> We discussed this briefly during the Linux NFS town hall meeting.
>> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
>> can use simple overprovisioning.
>>
>> This way the upper layer (the NFSv4.1 client) doesn't have to be aware
>> of limitations in the RPC layer mechanism.
>>
>> Trond may have an additional concern that I didn't capture.
>
> The other option would be to simply overallocate in the transport layer,
> as that is the layer which causes the problem to start with.

That's exactly what the RDMA backchannel will do.

> That being said, what is the argument for doing any sort of static
> allocation here? I'm fine with doing fully dynamic allocation if that
> works out fine, but a mixed static/dynamic allocation sounds like a
> nightmare.

RDMA resources must be allocated and pre-registered up front. The RDMA
transport can't support dynamic slot allocation without adding a lot
more complexity.

The RDMA transport will need to have separate backchannel setup and
destroy methods anyway, so doing dynamic allocation for TCP and
overprovisioning for RDMA will be simple.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Fri, May 01, 2015 at 01:28:12PM -0400, Trond Myklebust wrote:
> The concern is not so much static vs dynamic. The concern is limiting
> incoming RPC calls to the number allowed by the NFSv4.1 session. Right
> now, the static allocation enforces the limit of one slot that the
> client offers to the server (albeit with the race), and so I want any
> replacement to meet the same requirement.

Either variant will allow accepting more than one backchannel request
at the RPC layer; that's the whole point. With the simple patch I
posted it will accept a second one; with dynamic allocation the number
would be potentially unbounded.

But given that the NFS client already enforces the slot limit properly
in validate_seqid, and even returns the proper NFS error for this case,
I don't really see what enforcing it at a lower level, where we can't
even report proper errors, buys us.
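For reference, the check Christoph is pointing at lives in
fs/nfs/callback_proc.c; as it reads with the patch at the end of this
thread applied, a slot id beyond the advertised range is rejected with
a proper NFS-level error:

	/* fs/nfs/callback_proc.c, validate_seqid() */
	if (args->csa_slotid >= NFS41_BC_SLOTS)
		return htonl(NFS4ERR_BADSLOT);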
On Fri, May 1, 2015 at 1:37 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, May 01, 2015 at 01:28:12PM -0400, Trond Myklebust wrote:
>> The concern is not so much static vs dynamic. The concern is limiting
>> incoming RPC calls to the number allowed by the NFSv4.1 session. Right
>> now, the static allocation enforces the limit of one slot that the
>> client offers to the server (albeit with the race), and so I want any
>> replacement to meet the same requirement.
>
> Either variant will allow accepting more than one backchannel request
> at the RPC layer; that's the whole point. With the simple patch I
> posted it will accept a second one; with dynamic allocation the number
> would be potentially unbounded.

If you do it wrong, yes. The point is that if you do the allocation as
part of the send process, then you prevent unbounded growth.

> But given that the NFS client already enforces the slot limit properly
> in validate_seqid, and even returns the proper NFS error for this case,
> I don't really see what enforcing it at a lower level, where we can't
> even report proper errors, buys us.

It buys us better congestion control. We don't even need to process
the request.

Cheers
  Trond
diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..7f79345 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -200,13 +200,18 @@ extern int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation,
 					   const nfs4_stateid *stateid);
 extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
 #endif /* CONFIG_NFS_V4 */
+
 /*
  * nfs41: Callbacks are expected to not cause substantial latency,
  * so we limit their concurrency to 1 by setting up the maximum number
  * of slots for the backchannel.
+ *
+ * Note that we allocate a second RPC request structure because the
+ * RPC code holds on to the buffer for a while after sending a reply
+ * to the server.
  */
-#define NFS41_BC_MIN_CALLBACKS 1
-#define NFS41_BC_MAX_CALLBACKS 1
+#define NFS41_BC_SLOTS 1
+#define NFS41_BC_REQUESTS (NFS41_BC_SLOTS * 2)
 
 extern unsigned int nfs_callback_set_tcpport;
 extern unsigned short nfs_callback_tcpport;
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 197806f..d733f04 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -320,7 +320,7 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
 	dprintk("%s enter. slotid %u seqid %u\n",
 		__func__, args->csa_slotid, args->csa_sequenceid);
 
-	if (args->csa_slotid >= NFS41_BC_MAX_CALLBACKS)
+	if (args->csa_slotid >= NFS41_BC_SLOTS)
 		return htonl(NFS4ERR_BADSLOT);
 
 	slot = tbl->slots + args->csa_slotid;
@@ -465,8 +465,8 @@ __be32 nfs4_callback_sequence(struct cb_sequenceargs *args,
 	       sizeof(res->csr_sessionid));
 	res->csr_sequenceid = args->csa_sequenceid;
 	res->csr_slotid = args->csa_slotid;
-	res->csr_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
-	res->csr_target_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
+	res->csr_highestslotid = NFS41_BC_SLOTS - 1;
+	res->csr_target_highestslotid = NFS41_BC_SLOTS - 1;
 
 out:
 	cps->clp = clp; /* put in nfs4_callback_compound */
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index e42be52..e36717c 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -248,7 +248,7 @@ static int nfs4_init_callback(struct nfs_client *clp)
 	xprt = rcu_dereference_raw(clp->cl_rpcclient->cl_xprt);
 
 	if (nfs4_has_session(clp)) {
-		error = xprt_setup_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
+		error = xprt_setup_backchannel(xprt, NFS41_BC_REQUESTS);
 		if (error < 0)
 			return error;
 	}
diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c
index e23366e..3ad4da4 100644
--- a/fs/nfs/nfs4session.c
+++ b/fs/nfs/nfs4session.c
@@ -500,7 +500,7 @@ void nfs4_destroy_session(struct nfs4_session *session)
 	rcu_read_unlock();
 	dprintk("%s Destroy backchannel for xprt %p\n",
 		__func__, xprt);
-	xprt_destroy_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
+	xprt_destroy_backchannel(xprt, NFS41_BC_REQUESTS);
 	nfs4_destroy_session_slot_tables(session);
 	kfree(session);
 }