Message ID | 20130708095812.355d7cfc@notabene.brown (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
> 
> This patch adds a "nosharetransport" option to allow two different
> mounts from the same server to use different transports.
> If the mounts use NFSv4, or are of the same filesystem, then
> "nosharecache" must be used as well.

Won't this interfere with the recently added NFSv4 trunking detection?

Also, how will it work with NFSv4.1 sessions? The server will usually
require a BIND_CONN_TO_SESSION when new TCP connections attempt to
attach to an existing session.

> There are at least two circumstances where it might be desirable
> to use separate transports:
> 
> 1/ If the NFS server can get into a state where it will ignore
>    requests for one filesystem while servicing requests for another,
>    then using separate connections for the separate filesystems can
>    stop problems with one affecting access to the other.
> 
>    This is particularly relevant for NetApp filers where one filesystem
>    has been "suspended".  Requests to that filesystem will be dropped
>    (rather than the more correct NFS3ERR_JUKEBOX).  This currently
>    interferes with other filesystems.

This is a known issue that really needs to be fixed on the server, not
on the client. As far as I know, work is already underway to fix this.

> 2/ If a very fast network is used with a many-processor client, a
>    single TCP connection can present a bottleneck which reduces total
>    throughput.  Using multiple TCP connections (one per mount) removes
>    the bottleneck.
>    An alternate workaround is to configure multiple virtual IP
>    addresses on the server and mount each filesystem from a different
>    IP.  This is effective (throughput goes up) but an unnecessary
>    administrative burden.

As I understand it, using multiple simultaneous TCP connections between
the same endpoints also adds a risk that the congestion windows will
interfere. Do you have numbers to back up the claim of a performance
improvement?

The other issue I can think of is that for NFS versions < 4.1, this may
cause the server to allocate more resources per client in the form of
replay caches etc.

> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> ---
> Is this a good idea?  Bad idea?  Have I missed something important?
> 
> NeilBrown
> 
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index c513b0c..64e3f39 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -403,8 +403,13 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
>  	const struct sockaddr *sap = data->addr;
>  	struct nfs_net *nn = net_generic(data->net, nfs_net_id);
>  
> +	if (test_bit(NFS_CS_NO_SHARE, &data->init_flags))
> +		return NULL;
> +
>  	list_for_each_entry(clp, &nn->nfs_client_list, cl_share_link) {
>  		const struct sockaddr *clap = (struct sockaddr *)&clp->cl_addr;
> +		if (test_bit(NFS_CS_NO_SHARE,&clp->cl_flags))
> +			continue;
>  		/* Don't match clients that failed to initialise properly */
>  		if (clp->cl_cons_state < 0)
>  			continue;
> @@ -753,6 +758,8 @@ static int nfs_init_server(struct nfs_server *server,
>  			data->timeo, data->retrans);
>  	if (data->flags & NFS_MOUNT_NORESVPORT)
>  		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
> +	if (data->flags & NFS_MOUNT_NOSHARE_XPRT)
> +		set_bit(NFS_CS_NO_SHARE, &cl_init.init_flags);
>  	if (server->options & NFS_OPTION_MIGRATION)
>  		set_bit(NFS_CS_MIGRATION, &cl_init.init_flags);
>  
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 2d7525f..d9141d8 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -88,6 +88,7 @@ enum {
>  	Opt_acl, Opt_noacl,
>  	Opt_rdirplus, Opt_nordirplus,
>  	Opt_sharecache, Opt_nosharecache,
> +	Opt_sharetransport, Opt_nosharetransport,
>  	Opt_resvport, Opt_noresvport,
>  	Opt_fscache, Opt_nofscache,
>  	Opt_migration, Opt_nomigration,
> @@ -146,6 +147,8 @@ static const match_table_t nfs_mount_option_tokens = {
>  	{ Opt_nordirplus, "nordirplus" },
>  	{ Opt_sharecache, "sharecache" },
>  	{ Opt_nosharecache, "nosharecache" },
> +	{ Opt_sharetransport, "sharetransport"},
> +	{ Opt_nosharetransport, "nosharetransport"},
>  	{ Opt_resvport, "resvport" },
>  	{ Opt_noresvport, "noresvport" },
>  	{ Opt_fscache, "fsc" },
> @@ -634,6 +637,7 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
>  		{ NFS_MOUNT_NOACL, ",noacl", "" },
>  		{ NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
>  		{ NFS_MOUNT_UNSHARED, ",nosharecache", "" },
> +		{ NFS_MOUNT_NOSHARE_XPRT, ",nosharetransport", ""},
>  		{ NFS_MOUNT_NORESVPORT, ",noresvport", "" },
>  		{ 0, NULL, NULL }
>  	};
> @@ -1239,6 +1243,12 @@ static int nfs_parse_mount_options(char *raw,
>  		case Opt_nosharecache:
>  			mnt->flags |= NFS_MOUNT_UNSHARED;
>  			break;
> +		case Opt_sharetransport:
> +			mnt->flags &= ~NFS_MOUNT_NOSHARE_XPRT;
> +			break;
> +		case Opt_nosharetransport:
> +			mnt->flags |= NFS_MOUNT_NOSHARE_XPRT;
> +			break;
>  		case Opt_resvport:
>  			mnt->flags &= ~NFS_MOUNT_NORESVPORT;
>  			break;
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 3b7fa2a..9e9d7d3 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -41,6 +41,7 @@ struct nfs_client {
>  #define NFS_CS_DISCRTRY		1	/* - disconnect on RPC retry */
>  #define NFS_CS_MIGRATION	2	/* - transparent state migr */
>  #define NFS_CS_INFINITE_SLOTS	3	/* - don't limit TCP slots */
> +#define NFS_CS_NO_SHARE		4	/* - don't share across mounts */
>  	struct sockaddr_storage	cl_addr;	/* server identifier */
>  	size_t			cl_addrlen;
>  	char *			cl_hostname;	/* hostname of server */
> diff --git a/include/uapi/linux/nfs_mount.h b/include/uapi/linux/nfs_mount.h
> index 576bddd..81c49ff 100644
> --- a/include/uapi/linux/nfs_mount.h
> +++ b/include/uapi/linux/nfs_mount.h
> @@ -73,5 +73,6 @@ struct nfs_mount_data {
>  
>  #define NFS_MOUNT_LOCAL_FLOCK	0x100000
>  #define NFS_MOUNT_LOCAL_FCNTL	0x200000
> +#define NFS_MOUNT_NOSHARE_XPRT	0x400000
>  
>  #endif

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
On Mon, 8 Jul 2013 18:51:40 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
> > 
> > This patch adds a "nosharetransport" option to allow two different
> > mounts from the same server to use different transports.
> > If the mounts use NFSv4, or are of the same filesystem, then
> > "nosharecache" must be used as well.
> 
> Won't this interfere with the recently added NFSv4 trunking detection?

Will it?  I googled around a bit but couldn't find anything that tells me
what trunking really was in this context.  Then I found commit 05f4c350ee02
which makes it quite clear (thanks Chuck!).

Probably the code I wrote could interfere.

> 
> Also, how will it work with NFSv4.1 sessions? The server will usually
> require a BIND_CONN_TO_SESSION when new TCP connections attempt to
> attach to an existing session.

Why would it attempt to attach to an existing session?  I would hope that
the two different mounts with separate TCP connections would look completely
separate - different transport, different cache, different session.
??

> 
> > There are at least two circumstances where it might be desirable
> > to use separate transports:
> > 
> > 1/ If the NFS server can get into a state where it will ignore
> >    requests for one filesystem while servicing requests for another,
> >    then using separate connections for the separate filesystems can
> >    stop problems with one affecting access to the other.
> > 
> >    This is particularly relevant for NetApp filers where one filesystem
> >    has been "suspended".  Requests to that filesystem will be dropped
> >    (rather than the more correct NFS3ERR_JUKEBOX).  This currently
> >    interferes with other filesystems.
> 
> This is a known issue that really needs to be fixed on the server, not
> on the client. As far as I know, work is already underway to fix this.

I wasn't aware of this, nor were our support people.  I've passed it on so
maybe they can bug NetApp....

> 
> > 2/ If a very fast network is used with a many-processor client, a
> >    single TCP connection can present a bottleneck which reduces total
> >    throughput.  Using multiple TCP connections (one per mount) removes
> >    the bottleneck.
> >    An alternate workaround is to configure multiple virtual IP
> >    addresses on the server and mount each filesystem from a different
> >    IP.  This is effective (throughput goes up) but an unnecessary
> >    administrative burden.
> 
> As I understand it, using multiple simultaneous TCP connections between
> the same endpoints also adds a risk that the congestion windows will
> interfere. Do you have numbers to back up the claim of a performance
> improvement?

A customer upgraded from SLES10 (2.6.16 based) to SLES11 (3.0 based) and saw
a slowdown on some large DB jobs of between 1.5 and 2 times (i.e. total time
150% to 200% of what it was before).
After some analysis they created multiple virtual IPs on the server and
mounted the several filesystems, each from a different IP, and got the
performance back (they see this as a work-around rather than a genuine
solution).
Numbers are like "500MB/s on a single connection, 850MB/sec peaking to
1000MB/sec on multiple connections".

If I can get something more concrete I'll let you know.

As this worked well in 2.6.16 (which doesn't try to share connections) this
is seen as a regression.

On links that are easy to saturate, congestion windows are important and
having a single connection is probably a good idea - so the current default
is certainly correct.
On a 10G Ethernet or InfiniBand connection (where the issue has been
measured) congestion just doesn't seem to be an issue.

Thanks,
NeilBrown
On Tue, 9 Jul 2013 13:22:53 +1000 NeilBrown <neilb@suse.de> wrote:

> A customer upgraded from SLES10 (2.6.16 based) to SLES11 (3.0 based) and saw
> a slowdown on some large DB jobs of between 1.5 and 2 times (i.e. total time
> 150% to 200% of what it was before).
> After some analysis they created multiple virtual IPs on the server and
> mounted the several filesystems, each from a different IP, and got the
> performance back (they see this as a work-around rather than a genuine
> solution).
> Numbers are like "500MB/s on a single connection, 850MB/sec peaking to
> 1000MB/sec on multiple connections".
> 
> If I can get something more concrete I'll let you know.

Slightly more concrete: 4 mounts from the one server, 10 threads of fio on
each mount, all over 10G Ethernet:

  1 IP address  without "nosharetransport":   ~700MB/s
  4 IP addresses without "nosharetransport": ~1100MB/s
  1 IP address  with "nosharetransport":     ~1100MB/s

This is all NFSv3.  NFSv4 is much slower with nosharecache (32MB/s!), but I
might have botched the backport to 3.0 (or didn't address the v4-specific
issues you raised).  I haven't looked into that yet.

NeilBrown
On Jul 8, 2013, at 11:22 PM, NeilBrown <neilb@suse.de> wrote:

> On Mon, 8 Jul 2013 18:51:40 +0000 "Myklebust, Trond"
> <Trond.Myklebust@netapp.com> wrote:
> 
>> On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
>>> 
>>> This patch adds a "nosharetransport" option to allow two different
>>> mounts from the same server to use different transports.
>>> If the mounts use NFSv4, or are of the same filesystem, then
>>> "nosharecache" must be used as well.
>> 
>> Won't this interfere with the recently added NFSv4 trunking detection?
> 
> Will it?  I googled around a bit but couldn't find anything that tells me
> what trunking really was in this context.  Then I found commit 05f4c350ee02
> which makes it quite clear (thanks Chuck!).
> 
> Probably the code I wrote could interfere.
> 
>> 
>> Also, how will it work with NFSv4.1 sessions? The server will usually
>> require a BIND_CONN_TO_SESSION when new TCP connections attempt to
>> attach to an existing session.
> 
> Why would it attempt to attach to an existing session?  I would hope that
> the two different mounts with separate TCP connections would look completely
> separate - different transport, different cache, different session.
> ??
> 
>> 
>>> There are at least two circumstances where it might be desirable
>>> to use separate transports:
>>> 
>>> 1/ If the NFS server can get into a state where it will ignore
>>>    requests for one filesystem while servicing requests for another,
>>>    then using separate connections for the separate filesystems can
>>>    stop problems with one affecting access to the other.
>>> 
>>>    This is particularly relevant for NetApp filers where one filesystem
>>>    has been "suspended".  Requests to that filesystem will be dropped
>>>    (rather than the more correct NFS3ERR_JUKEBOX).  This currently
>>>    interferes with other filesystems.
>> 
>> This is a known issue that really needs to be fixed on the server, not
>> on the client. As far as I know, work is already underway to fix this.
> 
> I wasn't aware of this, nor were our support people.  I've passed it on so
> maybe they can bug NetApp....
> 
>> 
>>> 2/ If a very fast network is used with a many-processor client, a
>>>    single TCP connection can present a bottleneck which reduces total
>>>    throughput.  Using multiple TCP connections (one per mount) removes
>>>    the bottleneck.
>>>    An alternate workaround is to configure multiple virtual IP
>>>    addresses on the server and mount each filesystem from a different
>>>    IP.  This is effective (throughput goes up) but an unnecessary
>>>    administrative burden.
>> 
>> As I understand it, using multiple simultaneous TCP connections between
>> the same endpoints also adds a risk that the congestion windows will
>> interfere. Do you have numbers to back up the claim of a performance
>> improvement?
> 
> A customer upgraded from SLES10 (2.6.16 based) to SLES11 (3.0 based) and saw
> a slowdown on some large DB jobs of between 1.5 and 2 times (i.e. total time
> 150% to 200% of what it was before).
> After some analysis they created multiple virtual IPs on the server and
> mounted the several filesystems, each from a different IP, and got the
> performance back (they see this as a work-around rather than a genuine
> solution).
> Numbers are like "500MB/s on a single connection, 850MB/sec peaking to
> 1000MB/sec on multiple connections".
> 
> If I can get something more concrete I'll let you know.
> 
> As this worked well in 2.6.16 (which doesn't try to share connections) this
> is seen as a regression.
> 
> On links that are easy to saturate, congestion windows are important and
> having a single connection is probably a good idea - so the current default
> is certainly correct.
> On a 10G Ethernet or InfiniBand connection (where the issue has been
> measured) congestion just doesn't seem to be an issue.

We've toyed with the idea of using multiple TCP connections per mount for
years.  The choice was made to stick with one connection (and one session
on NFSv4.1) for each server.

The main limitation has been having a single RPC slot table for the
transport, allowing only 16 concurrent RPC requests per server at a time.
Andy and Trond did some good work making the slot table widen itself
dynamically as the TCP window opens.

A secondary concern is head-of-queue blocking.  The server end can
certainly stall a client by not taking the top request off the socket
queue, and thereby delay any requests that are behind that one in the
queue.  I think the preferred solution there is to build out support for
RPC over SCTP, and use SCTP's multi-stream feature.  Alternately we might
choose to try out M-TCP.  Server implementations can also be made
sensitive to this issue to help prevent delays.

A tertiary issue is contention for the transport on multi-socket systems.
For a long while I've suspected it may occur, but I've never measured it
in practice.

Re: the problem at hand: you've definitely measured a performance
regression.  However, I don't think those numbers explain _why_ it is
occurring.  The first thing to check is whether SLES11 has the dynamic RPC
slot table logic I mentioned above.  I think it starts with upstream
commit d9ba131d, but someone should correct me if I'm wrong.
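To illustrate the dynamic RPC slot table idea mentioned above, here is a
small standalone sketch, not the actual sunrpc implementation: it models a
table that starts at the historical 16 slots and widens on demand up to an
assumed ceiling instead of queueing callers.  All names and limits are
illustrative assumptions.

/*
 * Toy model of a dynamically sized RPC slot table (illustration only).
 */
#include <stdio.h>

#define RPC_DEF_SLOT_TABLE 16    /* historical fixed size */
#define RPC_MAX_SLOT_TABLE 128   /* assumed ceiling for growth */

struct slot_table {
	unsigned int allocated;  /* slots currently allocated */
	unsigned int in_use;     /* slots handed out to requests */
	unsigned int max;        /* hard ceiling */
};

static void slot_table_init(struct slot_table *tbl)
{
	tbl->allocated = RPC_DEF_SLOT_TABLE;
	tbl->in_use = 0;
	tbl->max = RPC_MAX_SLOT_TABLE;
}

/*
 * Get a slot for a new RPC.  If every allocated slot is busy but we are
 * still under the ceiling, grow the table instead of making the caller
 * wait - the "widen as the window opens" behaviour in miniature.
 */
static int slot_table_get(struct slot_table *tbl)
{
	if (tbl->in_use == tbl->allocated) {
		if (tbl->allocated >= tbl->max)
			return -1;	/* caller must wait */
		tbl->allocated++;	/* dynamic growth */
	}
	tbl->in_use++;
	return 0;
}

static void slot_table_put(struct slot_table *tbl)
{
	if (tbl->in_use)
		tbl->in_use--;
}

int main(void)
{
	struct slot_table tbl;
	unsigned int i, granted = 0;

	slot_table_init(&tbl);

	/* 40 concurrent requests: a fixed 16-slot table would queue 24. */
	for (i = 0; i < 40; i++)
		if (slot_table_get(&tbl) == 0)
			granted++;

	printf("granted %u of 40 requests, table grew to %u slots\n",
	       granted, tbl.allocated);

	for (i = 0; i < granted; i++)
		slot_table_put(&tbl);
	return 0;
}

With a fixed 16-entry table the remaining requests would simply wait, which
is one plausible source of the single-connection bottleneck being discussed.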
On Tue, 2013-07-09 at 13:22 +1000, NeilBrown wrote:
> On Mon, 8 Jul 2013 18:51:40 +0000 "Myklebust, Trond"
> <Trond.Myklebust@netapp.com> wrote:
> 
> > On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
> > > 
> > > This patch adds a "nosharetransport" option to allow two different
> > > mounts from the same server to use different transports.
> > > If the mounts use NFSv4, or are of the same filesystem, then
> > > "nosharecache" must be used as well.
> > 
> > Won't this interfere with the recently added NFSv4 trunking detection?
> 
> Will it?  I googled around a bit but couldn't find anything that tells me
> what trunking really was in this context.  Then I found commit 05f4c350ee02
> which makes it quite clear (thanks Chuck!).
> 
> Probably the code I wrote could interfere.
> 
> > 
> > Also, how will it work with NFSv4.1 sessions? The server will usually
> > require a BIND_CONN_TO_SESSION when new TCP connections attempt to
> > attach to an existing session.
> 
> Why would it attempt to attach to an existing session?  I would hope that
> the two different mounts with separate TCP connections would look completely
> separate - different transport, different cache, different session.
> ??

Currently we map sessions and leases 1-1. You'd have quite some work to
do to change that, and it is very unclear to me that there is any
benefit to doing so.

> > 
> > > 2/ If a very fast network is used with a many-processor client, a
> > >    single TCP connection can present a bottleneck which reduces total
> > >    throughput.  Using multiple TCP connections (one per mount) removes
> > >    the bottleneck.
> > >    An alternate workaround is to configure multiple virtual IP
> > >    addresses on the server and mount each filesystem from a different
> > >    IP.  This is effective (throughput goes up) but an unnecessary
> > >    administrative burden.
> > 
> > As I understand it, using multiple simultaneous TCP connections between
> > the same endpoints also adds a risk that the congestion windows will
> > interfere. Do you have numbers to back up the claim of a performance
> > improvement?
> 
> A customer upgraded from SLES10 (2.6.16 based) to SLES11 (3.0 based) and saw
> a slowdown on some large DB jobs of between 1.5 and 2 times (i.e. total time
> 150% to 200% of what it was before).
> After some analysis they created multiple virtual IPs on the server and
> mounted the several filesystems, each from a different IP, and got the
> performance back (they see this as a work-around rather than a genuine
> solution).
> Numbers are like "500MB/s on a single connection, 850MB/sec peaking to
> 1000MB/sec on multiple connections".
> 
> If I can get something more concrete I'll let you know.
> 
> As this worked well in 2.6.16 (which doesn't try to share connections) this
> is seen as a regression.
> 
> On links that are easy to saturate, congestion windows are important and
> having a single connection is probably a good idea - so the current default
> is certainly correct.
> On a 10G Ethernet or InfiniBand connection (where the issue has been
> measured) congestion just doesn't seem to be an issue.

It would help if we can understand where the actual bottleneck is. If
this really is about lock contention, then solving that problem might
help the single mount case too...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
On Tue, Jul 09, 2013 at 01:22:53PM +1000, NeilBrown wrote:
> On Mon, 8 Jul 2013 18:51:40 +0000 "Myklebust, Trond"
> <Trond.Myklebust@netapp.com> wrote:
> 
> > On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
> > > 
> > > This patch adds a "nosharetransport" option to allow two different
> > > mounts from the same server to use different transports.
> > > If the mounts use NFSv4, or are of the same filesystem, then
> > > "nosharecache" must be used as well.
> > 
> > Won't this interfere with the recently added NFSv4 trunking detection?
> 
> Will it?  I googled around a bit but couldn't find anything that tells me
> what trunking really was in this context.  Then I found commit 05f4c350ee02
> which makes it quite clear (thanks Chuck!).
> 
> Probably the code I wrote could interfere.
> 
> > 
> > Also, how will it work with NFSv4.1 sessions? The server will usually
> > require a BIND_CONN_TO_SESSION when new TCP connections attempt to
> > attach to an existing session.

Since the current client only requests SP4_NONE state protection, the
BIND_CONN_TO_SESSION is implicit:

	If, when the client ID was created, the client opted for
	SP4_NONE state protection, the client is not required to use
	BIND_CONN_TO_SESSION to associate the connection with the
	session, unless the client wishes to associate the connection
	with the backchannel.  When SP4_NONE protection is used, simply
	sending a COMPOUND request with a SEQUENCE operation is
	sufficient to associate the connection with the session
	specified in SEQUENCE.

But Neil would need to make sure he's using all the state associated
with the existing client.

> Why would it attempt to attach to an existing session?  I would hope that
> the two different mounts with separate TCP connections would look completely
> separate - different transport, different cache, different session.
> ??

Sounds to me like sharing these things shouldn't be a problem in your case,
but I don't know.

--b.
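To make the quoted RFC 5661 rule concrete, the following is a small
illustrative model, not the Linux client or server code, and every name in
it is invented for the example: with SP4_NONE the first SEQUENCE carried on
a new connection is enough to associate that connection with the session,
while stronger state protection would demand an explicit
BIND_CONN_TO_SESSION first.

/*
 * Toy model of implicit connection binding under SP4_NONE.
 * Purely illustrative - not an NFS implementation.
 */
#include <stdio.h>
#include <stdbool.h>

enum state_protect { SP4_NONE, SP4_MACH_CRED, SP4_SSV };

struct session {
	enum state_protect how;
	int bound_conns[8];	/* connection ids bound to this session */
	int nr_bound;
};

static bool conn_is_bound(const struct session *s, int conn)
{
	for (int i = 0; i < s->nr_bound; i++)
		if (s->bound_conns[i] == conn)
			return true;
	return false;
}

static void bind_conn_to_session(struct session *s, int conn)
{
	if (!conn_is_bound(s, conn))
		s->bound_conns[s->nr_bound++] = conn;
}

/* Returns 0 if the SEQUENCE is accepted, -1 if an explicit bind is needed. */
static int handle_sequence(struct session *s, int conn)
{
	if (!conn_is_bound(s, conn)) {
		if (s->how != SP4_NONE)
			return -1;		/* explicit bind required */
		bind_conn_to_session(s, conn);	/* implicit association */
	}
	return 0;				/* process the compound */
}

int main(void)
{
	struct session none = { .how = SP4_NONE };
	struct session mach = { .how = SP4_MACH_CRED };

	printf("SP4_NONE, new conn:      %s\n",
	       handle_sequence(&none, 1) == 0 ? "accepted" : "needs bind");
	printf("SP4_MACH_CRED, new conn: %s\n",
	       handle_sequence(&mach, 2) == 0 ? "accepted" : "needs bind");
	return 0;
}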
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index c513b0c..64e3f39 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -403,8 +403,13 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
 	const struct sockaddr *sap = data->addr;
 	struct nfs_net *nn = net_generic(data->net, nfs_net_id);
 
+	if (test_bit(NFS_CS_NO_SHARE, &data->init_flags))
+		return NULL;
+
 	list_for_each_entry(clp, &nn->nfs_client_list, cl_share_link) {
 		const struct sockaddr *clap = (struct sockaddr *)&clp->cl_addr;
+		if (test_bit(NFS_CS_NO_SHARE,&clp->cl_flags))
+			continue;
 		/* Don't match clients that failed to initialise properly */
 		if (clp->cl_cons_state < 0)
 			continue;
@@ -753,6 +758,8 @@ static int nfs_init_server(struct nfs_server *server,
 			data->timeo, data->retrans);
 	if (data->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
+	if (data->flags & NFS_MOUNT_NOSHARE_XPRT)
+		set_bit(NFS_CS_NO_SHARE, &cl_init.init_flags);
 	if (server->options & NFS_OPTION_MIGRATION)
 		set_bit(NFS_CS_MIGRATION, &cl_init.init_flags);
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 2d7525f..d9141d8 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -88,6 +88,7 @@ enum {
 	Opt_acl, Opt_noacl,
 	Opt_rdirplus, Opt_nordirplus,
 	Opt_sharecache, Opt_nosharecache,
+	Opt_sharetransport, Opt_nosharetransport,
 	Opt_resvport, Opt_noresvport,
 	Opt_fscache, Opt_nofscache,
 	Opt_migration, Opt_nomigration,
@@ -146,6 +147,8 @@ static const match_table_t nfs_mount_option_tokens = {
 	{ Opt_nordirplus, "nordirplus" },
 	{ Opt_sharecache, "sharecache" },
 	{ Opt_nosharecache, "nosharecache" },
+	{ Opt_sharetransport, "sharetransport"},
+	{ Opt_nosharetransport, "nosharetransport"},
 	{ Opt_resvport, "resvport" },
 	{ Opt_noresvport, "noresvport" },
 	{ Opt_fscache, "fsc" },
@@ -634,6 +637,7 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 		{ NFS_MOUNT_NOACL, ",noacl", "" },
 		{ NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
 		{ NFS_MOUNT_UNSHARED, ",nosharecache", "" },
+		{ NFS_MOUNT_NOSHARE_XPRT, ",nosharetransport", ""},
 		{ NFS_MOUNT_NORESVPORT, ",noresvport", "" },
 		{ 0, NULL, NULL }
 	};
@@ -1239,6 +1243,12 @@ static int nfs_parse_mount_options(char *raw,
 		case Opt_nosharecache:
 			mnt->flags |= NFS_MOUNT_UNSHARED;
 			break;
+		case Opt_sharetransport:
+			mnt->flags &= ~NFS_MOUNT_NOSHARE_XPRT;
+			break;
+		case Opt_nosharetransport:
+			mnt->flags |= NFS_MOUNT_NOSHARE_XPRT;
+			break;
 		case Opt_resvport:
 			mnt->flags &= ~NFS_MOUNT_NORESVPORT;
 			break;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 3b7fa2a..9e9d7d3 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -41,6 +41,7 @@ struct nfs_client {
 #define NFS_CS_DISCRTRY		1	/* - disconnect on RPC retry */
 #define NFS_CS_MIGRATION	2	/* - transparent state migr */
 #define NFS_CS_INFINITE_SLOTS	3	/* - don't limit TCP slots */
+#define NFS_CS_NO_SHARE		4	/* - don't share across mounts */
 	struct sockaddr_storage	cl_addr;	/* server identifier */
 	size_t			cl_addrlen;
 	char *			cl_hostname;	/* hostname of server */
diff --git a/include/uapi/linux/nfs_mount.h b/include/uapi/linux/nfs_mount.h
index 576bddd..81c49ff 100644
--- a/include/uapi/linux/nfs_mount.h
+++ b/include/uapi/linux/nfs_mount.h
@@ -73,5 +73,6 @@ struct nfs_mount_data {
 
 #define NFS_MOUNT_LOCAL_FLOCK	0x100000
 #define NFS_MOUNT_LOCAL_FCNTL	0x200000
+#define NFS_MOUNT_NOSHARE_XPRT	0x400000
 
 #endif
This patch adds a "nosharetransport" option to allow two different
mounts from the same server to use different transports.
If the mounts use NFSv4, or are of the same filesystem, then
"nosharecache" must be used as well.

There are at least two circumstances where it might be desirable
to use separate transports:

1/ If the NFS server can get into a state where it will ignore
   requests for one filesystem while servicing requests for another,
   then using separate connections for the separate filesystems can
   stop problems with one affecting access to the other.

   This is particularly relevant for NetApp filers where one filesystem
   has been "suspended".  Requests to that filesystem will be dropped
   (rather than the more correct NFS3ERR_JUKEBOX).  This currently
   interferes with other filesystems.

2/ If a very fast network is used with a many-processor client, a
   single TCP connection can present a bottleneck which reduces total
   throughput.  Using multiple TCP connections (one per mount) removes
   the bottleneck.
   An alternate workaround is to configure multiple virtual IP
   addresses on the server and mount each filesystem from a different
   IP.  This is effective (throughput goes up) but an unnecessary
   administrative burden.

Signed-off-by: NeilBrown <neilb@suse.de>

---
Is this a good idea?  Bad idea?  Have I missed something important?

NeilBrown
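For readers following the fs/nfs/client.c hunk above: the effect of the new
flag is simply that a mount marked "no share" never matches an existing
nfs_client, so it always gets its own nfs_client and therefore its own
transport.  The sketch below is a stripped-down userspace model of that
matching logic, illustrative only; it omits real protocol version checks,
reference counting and locking, and all names are stand-ins.

/*
 * Minimal model of the nfs_match_client() change: a client marked
 * "no share" neither matches an existing client nor is matched by
 * later mounts, so each such mount ends up with its own transport.
 * Simplified illustration, not the kernel code.
 */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

struct fake_client {
	char addr[64];		/* stands in for clp->cl_addr */
	bool no_share;		/* stands in for NFS_CS_NO_SHARE */
};

static struct fake_client clients[16];
static int nr_clients;

static struct fake_client *match_client(const char *addr, bool no_share)
{
	int i;

	/* A "nosharetransport" mount never reuses an existing client. */
	if (no_share)
		return NULL;

	for (i = 0; i < nr_clients; i++) {
		/* ...and an existing no-share client is never reused either. */
		if (clients[i].no_share)
			continue;
		if (strcmp(clients[i].addr, addr) == 0)
			return &clients[i];
	}
	return NULL;
}

static struct fake_client *get_client(const char *addr, bool no_share)
{
	struct fake_client *clp = match_client(addr, no_share);

	if (clp)
		return clp;		/* shared transport */

	clp = &clients[nr_clients++];	/* new client => new transport */
	snprintf(clp->addr, sizeof(clp->addr), "%s", addr);
	clp->no_share = no_share;
	return clp;
}

int main(void)
{
	/* Two ordinary mounts from one server share a client. */
	struct fake_client *a = get_client("192.168.1.1", false);
	struct fake_client *b = get_client("192.168.1.1", false);
	/* A nosharetransport mount from the same server gets its own. */
	struct fake_client *c = get_client("192.168.1.1", true);

	printf("a==b: %d (shared), a==c: %d (separate)\n", a == b, a == c);
	return 0;
}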