Message ID | 20150428202157.GA23972@infradead.org (mailing list archive) |
---|---|
State | New, archived |
On 4/29/2015 4:21 AM, Christoph Hellwig wrote:
> Currently the client will just crap out if a CB_NULL comes in at the
> same time as a slot-controlled CB_COMPOUND that includes a CB_SEQUENCE.
>
> I don't really understand how the spec wants to treat CB_NULL in
> relation to the available backchannel slots, but given that it's
> not part of the sequences in CB_SEQUENCE it somehow needs to bypass them.
>
> If we make sure to overallocate the rpc-level buffers so that we
> have more than the available NFS-level slots, we should be safe from
> this condition, which makes a 4.1 server doing heavy recalls under
> load very unhappy by not returning an NFS-level reply to its layout
> recalls.
>
> I don't really like this patch much, so any ideas for a better solution
> would be highly welcome!
>
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index 84326e9..7afb3ef 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
>   * so we limit their concurrency to 1 by setting up the maximum number
>   * of slots for the backchannel.
>   */
> -#define NFS41_BC_MIN_CALLBACKS 1
> +#define NFS41_BC_MIN_CALLBACKS 2
>  #define NFS41_BC_MAX_CALLBACKS 1

Are you sure that's okay without updating NFS41_BC_MAX_CALLBACKS?

thanks,
Kinglong Mee
On Wed, Apr 29, 2015 at 08:08:29PM +0800, Kinglong Mee wrote:
> > diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> > index 84326e9..7afb3ef 100644
> > --- a/fs/nfs/callback.h
> > +++ b/fs/nfs/callback.h
> > @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
> >   * so we limit their concurrency to 1 by setting up the maximum number
> >   * of slots for the backchannel.
> >   */
> > -#define NFS41_BC_MIN_CALLBACKS 1
> > +#define NFS41_BC_MIN_CALLBACKS 2
> >  #define NFS41_BC_MAX_CALLBACKS 1
>
> Are you sure that's okay without updating NFS41_BC_MAX_CALLBACKS?

Yes. NFS41_BC_MIN_CALLBACKS is the number of slots we tell the sunrpc
layer to allocate, while NFS41_BC_MAX_CALLBACKS is used for nfs-internal
accounting, so having them differ is intentional for this RFC.

If we have to merge this instead of a better fix, we definitely should
fix up the naming, though.
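[The distinction can be made concrete with a minimal user-space model.
This is only an illustration of the two limits as described above, not
the actual sunrpc/nfs code; all names and the pool/slot mechanics are
hypothetical. The rpc layer preallocates receive buffers, the nfs layer
enforces CB_SEQUENCE slots, and a CB_NULL consumes a buffer but never a
slot, which is why the two limits can legitimately differ.]

/*
 * Toy model of the two backchannel limits; illustrative only,
 * not the kernel code.  Build with: cc -o bc bc.c
 */
#include <stdbool.h>
#include <stdio.h>

#define BC_MIN_CALLBACKS 2	/* rpc layer: preallocated receive buffers */
#define BC_MAX_CALLBACKS 1	/* nfs layer: CB_SEQUENCE session slots */

static int rpc_buffers_free = BC_MIN_CALLBACKS;
static int nfs_slots_free = BC_MAX_CALLBACKS;

/* Every callback RPC, CB_NULL included, consumes an rpc-level buffer. */
static bool rpc_recv(const char *proc)
{
	if (rpc_buffers_free == 0) {
		printf("%s: callback slot table overflowed\n", proc);
		return false;
	}
	rpc_buffers_free--;
	printf("%s: accepted\n", proc);
	return true;
}

/* Only slot-controlled CB_COMPOUNDs also consume an nfs-level slot. */
static bool cb_compound(void)
{
	if (!rpc_recv("CB_COMPOUND") || nfs_slots_free == 0)
		return false;
	nfs_slots_free--;
	return true;
}

int main(void)
{
	/*
	 * A slot-controlled CB_COMPOUND in flight plus a concurrent
	 * CB_NULL: with only one rpc buffer the CB_NULL would overflow;
	 * with two it fits.
	 */
	cb_compound();
	rpc_recv("CB_NULL");
	return 0;
}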
On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:

> Currently the client will just crap out if a CB_NULL comes in at the
> same time as a slot-controlled CB_COMPOUND that includes a CB_SEQUENCE.

Under what circumstances does the server send a CB_NULL while a
CB_COMPOUND is in flight?

> I don't really understand how the spec wants to treat CB_NULL in
> relation to the available backchannel slots, but given that it's
> not part of the sequences in CB_SEQUENCE it somehow needs to bypass them.
>
> If we make sure to overallocate the rpc-level buffers so that we
> have more than the available NFS-level slots, we should be safe from
> this condition, which makes a 4.1 server doing heavy recalls under
> load very unhappy by not returning an NFS-level reply to its layout
> recalls.
>
> I don't really like this patch much, so any ideas for a better solution
> would be highly welcome!
>
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index 84326e9..7afb3ef 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
>   * so we limit their concurrency to 1 by setting up the maximum number
>   * of slots for the backchannel.
>   */
> -#define NFS41_BC_MIN_CALLBACKS 1
> +#define NFS41_BC_MIN_CALLBACKS 2
>  #define NFS41_BC_MAX_CALLBACKS 1
>
>  extern unsigned int nfs_callback_set_tcpport;

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Wed, Apr 29, 2015 at 10:55 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
>> Currently the client will just crap out if a CB_NULL comes in at the
>> same time as a slot-controlled CB_COMPOUND that includes a CB_SEQUENCE.
>
> Under what circumstances does the server send a CB_NULL while a
> CB_COMPOUND is in flight?

I agree with Chuck. Why does knfsd send a CB_NULL at all if the
intention is to send CB_COMPOUND?

Cheers
Trond
On Wed, Apr 29, 2015 at 10:55:10AM -0400, Chuck Lever wrote:
>
> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> > Currently the client will just crap out if a CB_NULL comes in at the
> > same time as a slot-controlled CB_COMPOUND that includes a CB_SEQUENCE.
>
> Under what circumstances does the server send a CB_NULL while a
> CB_COMPOUND is in flight?

When a client is under heavy load from fsx or aio-stress and we lose
the connection (nfsd4_conn_lost is called) while doing heavy recalls.

xfstests against a server offering pnfs layouts for which the client
can't reach the storage devices is an easy reproducer.
On Wed, Apr 29, 2015 at 11:14 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Apr 29, 2015 at 10:55:10AM -0400, Chuck Lever wrote:
>>
>> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>>
>> > Currently the client will just crap out if a CB_NULL comes in at the
>> > same time as a slot-controlled CB_COMPOUND that includes a CB_SEQUENCE.
>>
>> Under what circumstances does the server send a CB_NULL while a
>> CB_COMPOUND is in flight?
>
> When a client is under heavy load from fsx or aio-stress and we lose
> the connection (nfsd4_conn_lost is called) while doing heavy recalls.
>
> xfstests against a server offering pnfs layouts for which the client
> can't reach the storage devices is an easy reproducer.

Why does it need to do this? If the client has sent the
BIND_CONN_TO_SESSION (which I believe knfsd asks for), then the
server knows that this is a bi-directional connection.

The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
almost always be redundant, because the client initiates the
connection and explicitly tells the server whether or not it is to
be used for the callback channel.

The CB_NULL should always be redundant.

Trond
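[Trond's argument amounts to a simple predicate on the server side. The
following is only a rough sketch of that reasoning — the identifiers
are hypothetical and this is not knfsd's actual callback state machine:
the probe is only ever justified when nothing has confirmed that the
connection carries the backchannel.]

/*
 * Sketch of the server-side decision described above; hypothetical
 * names, not knfsd's real code.
 */
#include <stdbool.h>

enum cb_state { CB_UNKNOWN, CB_UP, CB_DOWN };

struct cb_conn {
	enum cb_state state;
	bool bound_by_client;	/* client sent BIND_CONN_TO_SESSION, or
				 * flagged the connection for the
				 * backchannel at session creation */
};

static bool need_cb_null_probe(const struct cb_conn *conn)
{
	/*
	 * NFSv4.0: the server dials back to the client, so a CB_NULL
	 * ping is the only way to learn whether the path works.
	 *
	 * NFSv4.1: the client created the connection and told the
	 * server explicitly that it carries the backchannel; probing
	 * with CB_NULL is redundant and can race with slot-controlled
	 * CB_COMPOUNDs.
	 */
	if (conn->bound_by_client)
		return false;
	return conn->state == CB_UNKNOWN;
}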
On Wed, Apr 29, 2015 at 11:24:02AM -0400, Trond Myklebust wrote:
> Why does it need to do this? If the client has sent the
> BIND_CONN_TO_SESSION (which I believe knfsd asks for), then the
> server knows that this is a bi-directional connection.
> The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
> almost always be redundant, because the client initiates the
> connection and explicitly tells the server whether or not it is to
> be used for the callback channel.
>
> The CB_NULL should always be redundant.

I'd be fine with suppressing it. I think I actually intended to but
screwed it up. (Chuck or somebody convinced me the
NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken, but I never got
around to fixing it.)

--b.
On Wed, Apr 29, 2015 at 01:34:54PM -0400, J. Bruce Fields wrote:
> > Why does it need to do this? If the client has sent the
> > BIND_CONN_TO_SESSION (which I believe knfsd asks for), then the
> > server knows that this is a bi-directional connection.
> > The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
> > almost always be redundant, because the client initiates the
> > connection and explicitly tells the server whether or not it is to
> > be used for the callback channel.
> >
> > The CB_NULL should always be redundant.
>
> I'd be fine with suppressing it. I think I actually intended to but
> screwed it up. (Chuck or somebody convinced me the
> NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken, but I never got
> around to fixing it.)

I've dived into removing CB_NULL and fixed various major breakage in
the nfsd callback path, for which I will send you an RFC series ASAP.

However, even with that I see the "Callback slot table overflowed"
message from the client under load. I think the problem is the
following:

Between sending the callback response in call_bc_transmit ->
xprt_transmit and actually releasing the request from rpc_exit_task ->
xprt_release -> xprt_free_bc_request there is a race window, and
between an overloaded client and a fast connection we can hit it
easily.

My patch to increase the number of buffers for the backchannel ensures
this doesn't happen in my setup, but of course I could envision a
theoretical setup where the client is so slow that multiple
already-processed requests might not have been returned yet.
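[The window can be pictured with a toy model. This is a sketch under
the assumptions stated above — the buffer count is illustrative and
only the call-chain names in the comments come from the thread; it is
not the sunrpc code. The NFS-level slot is effectively free the moment
the reply is transmitted, but the rpc-level buffer only returns to the
pool when the request is released, so anything the server sends in
between needs a spare buffer.]

/*
 * Toy timeline of the race described above; not the sunrpc code.
 */
#include <stdio.h>

#define BC_PREALLOC_BUFFERS 2	/* the overallocation from the patch */

static int buffers_free = BC_PREALLOC_BUFFERS;

static void recv_callback(const char *what)
{
	if (buffers_free == 0) {
		printf("%s: callback slot table overflowed\n", what);
		return;
	}
	buffers_free--;
	printf("%s: ok, %d buffer(s) left\n", what, buffers_free);
}

int main(void)
{
	recv_callback("CB_COMPOUND #1");

	/*
	 * call_bc_transmit -> xprt_transmit: the reply to #1 is on the
	 * wire, so the server's NFS-level slot is free again and it may
	 * immediately send the next callback ...
	 */
	recv_callback("CB_COMPOUND #2");

	/*
	 * ... but only now does rpc_exit_task -> xprt_release ->
	 * xprt_free_bc_request return #1's buffer to the pool.  With a
	 * single preallocated buffer, #2 above would have overflowed.
	 */
	buffers_free++;
	return 0;
}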
On Apr 30, 2015, at 2:25 AM, Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Apr 29, 2015 at 01:34:54PM -0400, J. Bruce Fields wrote:
>>> Why does it need to do this? If the client has sent the
>>> BIND_CONN_TO_SESSION (which I believe knfsd asks for), then the
>>> server knows that this is a bi-directional connection.
>>> The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
>>> almost always be redundant, because the client initiates the
>>> connection and explicitly tells the server whether or not it is to
>>> be used for the callback channel.
>>>
>>> The CB_NULL should always be redundant.
>>
>> I'd be fine with suppressing it. I think I actually intended to but
>> screwed it up. (Chuck or somebody convinced me the
>> NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken, but I never got
>> around to fixing it.)
>
> I've dived into removing CB_NULL and fixed various major breakage in
> the nfsd callback path, for which I will send you an RFC series ASAP.
>
> However, even with that I see the "Callback slot table overflowed"
> message from the client under load. I think the problem is the
> following:
>
> Between sending the callback response in call_bc_transmit ->
> xprt_transmit and actually releasing the request from rpc_exit_task ->
> xprt_release -> xprt_free_bc_request there is a race window, and
> between an overloaded client and a fast connection we can hit it
> easily.
>
> My patch to increase the number of buffers for the backchannel ensures
> this doesn't happen in my setup, but of course I could envision a
> theoretical setup where the client is so slow that multiple
> already-processed requests might not have been returned yet.

I've been discussing the possibility of adding more session slots on
the Linux NFS client with jlayton. We think it would be
straightforward, once the workqueue-based NFSD patches are in, to make
the backchannel service into a workqueue. Then it would be a simple
matter to increase the number of session slots.

We haven't discussed what would be needed on the server side of this
equation, but it sounds like the server has some deeper problems if it
is not obeying the session slot table limits advertised by the client.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..7afb3ef 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
  * so we limit their concurrency to 1 by setting up the maximum number
  * of slots for the backchannel.
  */
-#define NFS41_BC_MIN_CALLBACKS 1
+#define NFS41_BC_MIN_CALLBACKS 2
 #define NFS41_BC_MAX_CALLBACKS 1
 
 extern unsigned int nfs_callback_set_tcpport;