Message ID: 20170630132120.31578-6-stefanha@redhat.com (mailing list archive)
State: New, archived
On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. For similar
> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
>
> It is now possible to mount a file system from the host (hypervisor)
> over AF_VSOCK like this:
>
> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>
> The VM's cid address is 3 and the hypervisor is 2.

So this is how vsocks are going to look...

There is not going to be a way to look up a vsock address? Since the
clientaddr parameter has a new format, shouldn't that be documented in
the man page?

I guess a general question: is this new mount type documented anywhere?

steved.

> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> support/nfs/getport.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/support/nfs/getport.c b/support/nfs/getport.c
> index 081594c..0b857af 100644
> --- a/support/nfs/getport.c
> +++ b/support/nfs/getport.c
> @@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> 	struct protoent *proto;
>
> 	/*
> -	 * IANA does not define a protocol number for rdma netids,
> -	 * since "rdma" is not an IP protocol.
> +	 * IANA does not define protocol numbers for non-IP netids.
> 	 */
> 	if (strcmp(netid, "rdma") == 0) {
> 		*family = AF_INET;
> @@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> 		*protocol = NFSPROTO_RDMA;
> 		return 1;
> 	}
> +	if (strcmp(netid, "vsock") == 0) {
> +		*family = AF_VSOCK;
> +		*protocol = 0;
> +		return 1;
> +	}
>
> 	nconf = getnetconfigent(netid);
> 	if (nconf == NULL)
> @@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> 	struct protoent *proto;
>
> 	/*
> -	 * IANA does not define a protocol number for rdma netids,
> -	 * since "rdma" is not an IP protocol.
> +	 * IANA does not define protocol numbers for non-IP netids.
> 	 */
> 	if (strcmp(netid, "rdma") == 0) {
> 		*family = AF_INET;
> 		*protocol = NFSPROTO_RDMA;
> 		return 1;
> 	}
> +	if (strcmp(netid, "vsock") == 0) {
> +		*family = AF_VSOCK;
> +		*protocol = 0;
> +		return 1;
> +	}
>
> 	proto = getprotobyname(netid);
> 	if (proto == NULL)
Hi Stefan- > On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. Why? Basically you are building a lot of specialized awareness in applications and leaving the network layer alone. That seems backwards to me. > For similar > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c. rdma/rdma6 are specified by standards, and appear in the IANA Network Identifiers database: https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml Is there a standard netid for vsock? If not, there needs to be some discussion with the nfsv4 Working Group to get this worked out. Because AF_VSOCK is an address family and the RPC framing is the same as TCP, the netid should be something like "tcpv" and not "vsock". I've complained about this before and there has been no response of any kind. I'll note that rdma/rdma6 do not use alternate address families: an IP address is specified and mapped to a GUID by the underlying transport. We purposely did not expose GUIDs to NFS, which is based on AF_INET/AF_INET6. rdma co-exists with IP. vsock doesn't have this fallback. It might be a better approach to use well-known (say, link-local or loopback) addresses and let the underlying network layer figure it out. Then hide all this stuff with DNS and let the client mount the server by hostname and use normal sockaddr's and "proto=tcp". Then you don't need _any_ application layer changes. Without hostnames, how does a client pick a Kerberos service principal for the server? Does rpcbind implement "vsock" netids? Does the NFSv4.0 client advertise "vsock" in SETCLIENTID, and provide a "vsock" callback service? > It is now possible to mount a file system from the host (hypervisor) > over AF_VSOCK like this: > > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock > > The VM's cid address is 3 and the hypervisor is 2. The mount command is supposed to supply "clientaddr" automatically. This mount option is exposed only for debugging purposes or very special cases (like disabling NFSv4 callback operations). I mean the whole point of this exercise is to get rid of network configuration, but here you're adding the need to additionally specify both the proto option and the clientaddr option to get this to work. Seems like that isn't zero-configuration at all. Wouldn't it be nicer if it worked like this: (guest)$ cat /etc/hosts 129.0.0.2 localhyper (guest)$ mount.nfs localhyper:/export /mnt And the result was a working NFS mount of the local hypervisor, using whatever NFS version the two both support, with no changes needed to the NFS implementation or the understanding of the system administrator? > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > support/nfs/getport.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/support/nfs/getport.c b/support/nfs/getport.c > index 081594c..0b857af 100644 > --- a/support/nfs/getport.c > +++ b/support/nfs/getport.c > @@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol) > struct protoent *proto; > > /* > - * IANA does not define a protocol number for rdma netids, > - * since "rdma" is not an IP protocol. > + * IANA does not define protocol numbers for non-IP netids. 
> */ > if (strcmp(netid, "rdma") == 0) { > *family = AF_INET; > @@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol) > *protocol = NFSPROTO_RDMA; > return 1; > } > + if (strcmp(netid, "vsock") == 0) { > + *family = AF_VSOCK; > + *protocol = 0; > + return 1; > + } > > nconf = getnetconfigent(netid); > if (nconf == NULL) > @@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol) > struct protoent *proto; > > /* > - * IANA does not define a protocol number for rdma netids, > - * since "rdma" is not an IP protocol. > + * IANA does not define protocol numbers for non-IP netids. > */ > if (strcmp(netid, "rdma") == 0) { > *family = AF_INET; > *protocol = NFSPROTO_RDMA; > return 1; > } > + if (strcmp(netid, "vsock") == 0) { > + *family = AF_VSOCK; > + *protocol = 0; > + return 1; > + } > > proto = getprotobyname(netid); > if (proto == NULL) > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 30 2017, Chuck Lever wrote: > > Wouldn't it be nicer if it worked like this: > > (guest)$ cat /etc/hosts > 129.0.0.2 localhyper > (guest)$ mount.nfs localhyper:/export /mnt > > And the result was a working NFS mount of the > local hypervisor, using whatever NFS version the > two both support, with no changes needed to the > NFS implementation or the understanding of the > system administrator? Yes. Yes. Definitely Yes. Though I suspect you mean "127.0.0.2", not "129..."?? There must be some way to redirect TCP connections to some address transparently through to the vsock protocol. The "sshuttle" program does this to transparently forward TCP connections over an ssh connection. Using a similar technique to forward connections over vsock shouldn't be hard. Or is performance really critical, and you get too much copying when you try forwarding connections? I suspect that is fixable, but it would be a little less straight forward. I would really *not* like to see vsock support being bolted into one network tool after another. NeilBrown
On Fri, Jul 07 2017, NeilBrown wrote:
> On Fri, Jun 30 2017, Chuck Lever wrote:
>>
>> Wouldn't it be nicer if it worked like this:
>>
>> (guest)$ cat /etc/hosts
>> 129.0.0.2 localhyper
>> (guest)$ mount.nfs localhyper:/export /mnt
>>
>> And the result was a working NFS mount of the
>> local hypervisor, using whatever NFS version the
>> two both support, with no changes needed to the
>> NFS implementation or the understanding of the
>> system administrator?
>
> Yes. Yes. Definitely Yes.
> Though I suspect you mean "127.0.0.2", not "129..."??
>
> There must be some way to redirect TCP connections to some address
> transparently through to the vsock protocol.
> The "sshuttle" program does this to transparently forward TCP connections
> over an ssh connection. Using a similar technique to forward
> connections over vsock shouldn't be hard.
>
> Or is performance really critical, and you get too much copying when you
> try forwarding connections? I suspect that is fixable, but it would be
> a little less straight forward.
>
> I would really *not* like to see vsock support being bolted into one
> network tool after another.

I've been digging into this a bit more. I came across
https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf
which (on page 7) lists some reasons not to use TCP/IP between guest
and host.

. Adding & configuring guest interfaces is invasive

That is possibly true. But adding support for a new address family to
NFS, NFSD, and nfs-utils is also very invasive. You would need to
install this software on the guest. I suggest you install different
software on the guest which solves the problem better.

. Prone to break due to config changes inside guest

This is, I suspect, a key issue. With vsock, the address of the
guest-side interface is defined by options passed to qemu. With
normal IP addressing, the guest has to configure the address.

However I think that IPv6 autoconfig makes this work well without vsock.
If I create a bridge interface on the host, run
    ip -6 addr add fe80::1 dev br0
then run a guest with
    -net nic,macaddr=Ch:oo:se:an:ad:dr \
    -net bridge,br=br0 \
then the client can
    mount [fe80::1%interfacename]:/path /mountpoint
and the host will see a connection from
    fe80::ch:oo:se:an:ad:dr

So from the guest side, I have achieved zero-config NFS mounts from the
host.

I don't think the server can filter connections based on which interface
a link-local address came from. If that was a problem that someone
wanted to be fixed, I'm sure we can fix it.

If you need to be sure that clients don't fake their IPv6 address, I'm
sure netfilter is up to the task.

. Creates network interfaces on host that must be managed

What vsock does is effectively create a hidden interface on the host
that only the kernel knows about and so the sysadmin cannot break it.
The only difference between this and an explicit interface on the host
is that the latter requires a competent sysadmin.

If you have other reasons for preferring the use of vsock for NFS, I'd
be happy to hear them. So far I'm not convinced.

Thanks,
NeilBrown
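As a rough illustration of that last netfilter point, host-side filtering
for the bridged link-local setup could be a couple of rules like the
following (a sketch only; the br0 name and fe80:: addressing come from the
example above, and a real deployment might want finer-grained matches):

    # host: accept only link-local sources from guests on br0, and never
    # forward guest traffic anywhere else
    ip6tables -A INPUT -i br0 ! -s fe80::/64 -j DROP
    ip6tables -A FORWARD -i br0 -j DROP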
> On Jul 6, 2017, at 11:17 PM, NeilBrown <neilb@suse.com> wrote: > > On Fri, Jun 30 2017, Chuck Lever wrote: >> >> Wouldn't it be nicer if it worked like this: >> >> (guest)$ cat /etc/hosts >> 129.0.0.2 localhyper >> (guest)$ mount.nfs localhyper:/export /mnt >> >> And the result was a working NFS mount of the >> local hypervisor, using whatever NFS version the >> two both support, with no changes needed to the >> NFS implementation or the understanding of the >> system administrator? > > Yes. Yes. Definitely Yes. > Though I suspect you mean "127.0.0.2", not "129..."?? I meant 129.x. 127.0.0 has well-defined semantics as a loopback to the same host. The hypervisor is clearly a network entity that is distinct from the local host. But maybe you could set up 127.0.0.2, .3 for this purpose? Someone smarter than me could figure out what is best to use here. I'm not familiar with all the rules for loopback and link-local IPv4 addressing. Loopback is the correct analogy, though. It has predictable host numbers that can be known in advance, and loopback networking is set up automatically on a host, without the need for a physical network interface. These are the stated goals for vsock. The benefit for re-using loopback here is that every application that can speak AF_INET can already use it. For NFS that means all the traditional features work: rpcbind, NFSv4.0 callback, IP-based share access control, and Kerberos, and especially DNS so that you can mount by hostname. > There must be some way to redirect TCP connections to some address > transparently through to the vsock protocol. > The "sshuttle" program does this to transparently forward TCP connections > over an ssh connection. Using a similar technique to forward > connections over vsock shouldn't be hard. > > Or is performance really critical, and you get too much copying when you > try forwarding connections? I suspect that is fixable, but it would be > a little less straight forward. > > I would really *not* like to see vsock support being bolted into one > network tool after another. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 30, 2017 at 11:01:13AM -0400, Steve Dickson wrote:
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. For similar
> > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
> >
> > It is now possible to mount a file system from the host (hypervisor)
> > over AF_VSOCK like this:
> >
> > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
> >
> > The VM's cid address is 3 and the hypervisor is 2.
> So this is how vsocks are going to look...
> There is not going to be a way to look up a vsock address?
> Since the clientaddr parameter has a new format, shouldn't
> that be documented in the man page?

AF_VSOCK does not have name resolution. The scope of the CID addresses
is just the hypervisor that the VMs are running on. Inter-VM
communication is not allowed. The virtualization software has the CIDs
so there's not much use for name resolution.

> I guess a general question: is this new mount type
> documented anywhere?

Thanks for pointing this out. I'll update the man pages in the next
revision of this patch series.
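For readers unfamiliar with AF_VSOCK addressing, this is roughly what
"no name resolution" means in practice: the guest reaches the hypervisor
by its well-known CID alone. A minimal sketch, not taken from the patch
set, assuming kernel headers that define AF_VSOCK and an assumed NFS port
of 2049:

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>   /* struct sockaddr_vm, VMADDR_CID_HOST */

    int main(void)
    {
        struct sockaddr_vm sa;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        memset(&sa, 0, sizeof(sa));
        sa.svm_family = AF_VSOCK;
        sa.svm_cid = VMADDR_CID_HOST;   /* the hypervisor is always CID 2 */
        sa.svm_port = 2049;             /* assumed NFS port */

        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            perror("vsock connect");
        return 0;
    }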
On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote: > > On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. > > Why? > > Basically you are building a lot of specialized > awareness in applications and leaving the > network layer alone. That seems backwards to me. Yes. I posted glibc patches but there were concerns that getaddrinfo(3) is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway, so there's not much to gain by adding it: https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html > > For similar > > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c. > > rdma/rdma6 are specified by standards, and appear > in the IANA Network Identifiers database: > > https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml > > Is there a standard netid for vsock? If not, > there needs to be some discussion with the nfsv4 > Working Group to get this worked out. > > Because AF_VSOCK is an address family and the RPC > framing is the same as TCP, the netid should be > something like "tcpv" and not "vsock". I've > complained about this before and there has been > no response of any kind. > > I'll note that rdma/rdma6 do not use alternate > address families: an IP address is specified and > mapped to a GUID by the underlying transport. > We purposely did not expose GUIDs to NFS, which > is based on AF_INET/AF_INET6. > > rdma co-exists with IP. vsock doesn't have this > fallback. Thanks for explaining the tcp + rdma relationship, that makes sense. There is no standard netid for vsock yet. Sorry I didn't ask about "tcpv" when you originally proposed it, I lost track of that discussion. You said: If this really is just TCP on a new address family, then "tcpv" is more in line with previous work, and you can get away with just an IANA action for a new netid, since RPC-over-TCP is already specified. Does "just TCP" mean a "connection-oriented, stream-oriented transport using RFC 1831 Record Marking"? Or does "TCP" have any other attributes? NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented transport using RFC 1831 Record Marking". I'm just not sure whether there are any other assumptions beyond this that AF_VSOCK might not meet because it isn't IP and has 32-bit port numbers. > It might be a better approach to use well-known > (say, link-local or loopback) addresses and let > the underlying network layer figure it out. > > Then hide all this stuff with DNS and let the > client mount the server by hostname and use > normal sockaddr's and "proto=tcp". Then you don't > need _any_ application layer changes. > > Without hostnames, how does a client pick a > Kerberos service principal for the server? I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows about the VMs, addresses cannot be spoofed, and VMs can only communicate with the hypervisor. This leads to a simple trust relationship. > Does rpcbind implement "vsock" netids? I have not modified rpcbind. My understanding is that rpcbind isn't required for NFSv4. Since this is a new transport there is no plan for it to run old protocol versions. > Does the NFSv4.0 client advertise "vsock" in > SETCLIENTID, and provide a "vsock" callback > service? The kernel patches implement backchannel support although I haven't exercised it. 
> > It is now possible to mount a file system from the host (hypervisor) > > over AF_VSOCK like this: > > > > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock > > > > The VM's cid address is 3 and the hypervisor is 2. > > The mount command is supposed to supply "clientaddr" > automatically. This mount option is exposed only for > debugging purposes or very special cases (like > disabling NFSv4 callback operations). > > I mean the whole point of this exercise is to get > rid of network configuration, but here you're > adding the need to additionally specify both the > proto option and the clientaddr option to get this > to work. Seems like that isn't zero-configuration > at all. Thanks for pointing this out. Will fix in v2, there should be no need to manually specify the client address, this is a remnant from early development. > Wouldn't it be nicer if it worked like this: > > (guest)$ cat /etc/hosts > 129.0.0.2 localhyper > (guest)$ mount.nfs localhyper:/export /mnt > > And the result was a working NFS mount of the > local hypervisor, using whatever NFS version the > two both support, with no changes needed to the > NFS implementation or the understanding of the > system administrator? This is an interesting idea, thanks! It would be neat to have AF_INET access over the loopback interface on both guest and host.
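On the "just TCP" question above, the RFC 1831 record marking being
discussed is only a framing convention and says nothing about the address
family underneath. A small sketch, as this reader understands it and not
taken from the patches:

    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl */

    /*
     * RFC 1831/5531 record marking: each record fragment is preceded by a
     * 4-byte big-endian word whose top bit flags the last fragment and
     * whose low 31 bits carry the fragment length. The framing works over
     * any connection-oriented byte stream, whether the socket is AF_INET,
     * AF_INET6 or AF_VSOCK.
     */
    static uint32_t rpc_record_mark(uint32_t frag_len, int last_fragment)
    {
        uint32_t mark = frag_len & 0x7fffffff;

        if (last_fragment)
            mark |= 0x80000000;
        return htonl(mark);   /* written on the wire before the fragment */
    }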
On Wed, 2017-07-19 at 16:11 +0100, Stefan Hajnoczi wrote: > On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote: > > > On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > > > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. > > > > Why? > > > > Basically you are building a lot of specialized > > awareness in applications and leaving the > > network layer alone. That seems backwards to me. > > Yes. I posted glibc patches but there were concerns that getaddrinfo(3) > is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway, > so there's not much to gain by adding it: > https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html > > > > For similar > > > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c. > > > > rdma/rdma6 are specified by standards, and appear > > in the IANA Network Identifiers database: > > > > https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml > > > > Is there a standard netid for vsock? If not, > > there needs to be some discussion with the nfsv4 > > Working Group to get this worked out. > > > > Because AF_VSOCK is an address family and the RPC > > framing is the same as TCP, the netid should be > > something like "tcpv" and not "vsock". I've > > complained about this before and there has been > > no response of any kind. > > > > I'll note that rdma/rdma6 do not use alternate > > address families: an IP address is specified and > > mapped to a GUID by the underlying transport. > > We purposely did not expose GUIDs to NFS, which > > is based on AF_INET/AF_INET6. > > > > rdma co-exists with IP. vsock doesn't have this > > fallback. > > Thanks for explaining the tcp + rdma relationship, that makes sense. > > There is no standard netid for vsock yet. > > Sorry I didn't ask about "tcpv" when you originally proposed it, I lost > track of that discussion. You said: > > If this really is just TCP on a new address family, then "tcpv" > is more in line with previous work, and you can get away with > just an IANA action for a new netid, since RPC-over-TCP is > already specified. > > Does "just TCP" mean a "connection-oriented, stream-oriented transport > using RFC 1831 Record Marking"? Or does "TCP" have any other > attributes? > > NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented > transport using RFC 1831 Record Marking". I'm just not sure whether > there are any other assumptions beyond this that AF_VSOCK might not meet > because it isn't IP and has 32-bit port numbers. > > > It might be a better approach to use well-known > > (say, link-local or loopback) addresses and let > > the underlying network layer figure it out. > > > > Then hide all this stuff with DNS and let the > > client mount the server by hostname and use > > normal sockaddr's and "proto=tcp". Then you don't > > need _any_ application layer changes. > > > > Without hostnames, how does a client pick a > > Kerberos service principal for the server? > > I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows > about the VMs, addresses cannot be spoofed, and VMs can only communicate > with the hypervisor. This leads to a simple trust relationship. > > > Does rpcbind implement "vsock" netids? > > I have not modified rpcbind. My understanding is that rpcbind isn't > required for NFSv4. Since this is a new transport there is no plan for > it to run old protocol versions. > > > Does the NFSv4.0 client advertise "vsock" in > > SETCLIENTID, and provide a "vsock" callback > > service? 
> > The kernel patches implement backchannel support although I haven't > exercised it. > > > > It is now possible to mount a file system from the host (hypervisor) > > > over AF_VSOCK like this: > > > > > > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock > > > > > > The VM's cid address is 3 and the hypervisor is 2. > > > > The mount command is supposed to supply "clientaddr" > > automatically. This mount option is exposed only for > > debugging purposes or very special cases (like > > disabling NFSv4 callback operations). > > > > I mean the whole point of this exercise is to get > > rid of network configuration, but here you're > > adding the need to additionally specify both the > > proto option and the clientaddr option to get this > > to work. Seems like that isn't zero-configuration > > at all. > > Thanks for pointing this out. Will fix in v2, there should be no need > to manually specify the client address, this is a remnant from early > development. > > > Wouldn't it be nicer if it worked like this: > > > > (guest)$ cat /etc/hosts > > 129.0.0.2 localhyper > > (guest)$ mount.nfs localhyper:/export /mnt > > > > And the result was a working NFS mount of the > > local hypervisor, using whatever NFS version the > > two both support, with no changes needed to the > > NFS implementation or the understanding of the > > system administrator? > > This is an interesting idea, thanks! It would be neat to have AF_INET > access over the loopback interface on both guest and host. I too really like this idea better as it seems a lot less invasive. Existing applications would "just work" without needing to be changed, and you get name resolution to boot. Chuck, is 129.0.0.X within some reserved block of addrs such that you could get a standard range for this? I didn't see that block listed here during my half-assed web search: https://en.wikipedia.org/wiki/Reserved_IP_addresses Maybe you meant 192.0.0.X ? It might be easier and more future proof to get a chunk of ipv6 addrs carved out though.
> On Jul 19, 2017, at 17:35, Jeff Layton <jlayton@redhat.com> wrote: > > On Wed, 2017-07-19 at 16:11 +0100, Stefan Hajnoczi wrote: >> On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote: >>>> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote: >>>> >>>> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. >>> >>> Why? >>> >>> Basically you are building a lot of specialized >>> awareness in applications and leaving the >>> network layer alone. That seems backwards to me. >> >> Yes. I posted glibc patches but there were concerns that getaddrinfo(3) >> is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway, >> so there's not much to gain by adding it: >> https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html >> >>>> For similar >>>> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c. >>> >>> rdma/rdma6 are specified by standards, and appear >>> in the IANA Network Identifiers database: >>> >>> https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml >>> >>> Is there a standard netid for vsock? If not, >>> there needs to be some discussion with the nfsv4 >>> Working Group to get this worked out. >>> >>> Because AF_VSOCK is an address family and the RPC >>> framing is the same as TCP, the netid should be >>> something like "tcpv" and not "vsock". I've >>> complained about this before and there has been >>> no response of any kind. >>> >>> I'll note that rdma/rdma6 do not use alternate >>> address families: an IP address is specified and >>> mapped to a GUID by the underlying transport. >>> We purposely did not expose GUIDs to NFS, which >>> is based on AF_INET/AF_INET6. >>> >>> rdma co-exists with IP. vsock doesn't have this >>> fallback. >> >> Thanks for explaining the tcp + rdma relationship, that makes sense. >> >> There is no standard netid for vsock yet. >> >> Sorry I didn't ask about "tcpv" when you originally proposed it, I lost >> track of that discussion. You said: >> >> If this really is just TCP on a new address family, then "tcpv" >> is more in line with previous work, and you can get away with >> just an IANA action for a new netid, since RPC-over-TCP is >> already specified. >> >> Does "just TCP" mean a "connection-oriented, stream-oriented transport >> using RFC 1831 Record Marking"? Or does "TCP" have any other >> attributes? >> >> NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented >> transport using RFC 1831 Record Marking". I'm just not sure whether >> there are any other assumptions beyond this that AF_VSOCK might not meet >> because it isn't IP and has 32-bit port numbers. >> >>> It might be a better approach to use well-known >>> (say, link-local or loopback) addresses and let >>> the underlying network layer figure it out. >>> >>> Then hide all this stuff with DNS and let the >>> client mount the server by hostname and use >>> normal sockaddr's and "proto=tcp". Then you don't >>> need _any_ application layer changes. >>> >>> Without hostnames, how does a client pick a >>> Kerberos service principal for the server? >> >> I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows >> about the VMs, addresses cannot be spoofed, and VMs can only communicate >> with the hypervisor. This leads to a simple trust relationship. >> >>> Does rpcbind implement "vsock" netids? >> >> I have not modified rpcbind. My understanding is that rpcbind isn't >> required for NFSv4. Since this is a new transport there is no plan for >> it to run old protocol versions. 
>> >>> Does the NFSv4.0 client advertise "vsock" in >>> SETCLIENTID, and provide a "vsock" callback >>> service? >> >> The kernel patches implement backchannel support although I haven't >> exercised it. >> >>>> It is now possible to mount a file system from the host (hypervisor) >>>> over AF_VSOCK like this: >>>> >>>> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock >>>> >>>> The VM's cid address is 3 and the hypervisor is 2. >>> >>> The mount command is supposed to supply "clientaddr" >>> automatically. This mount option is exposed only for >>> debugging purposes or very special cases (like >>> disabling NFSv4 callback operations). >>> >>> I mean the whole point of this exercise is to get >>> rid of network configuration, but here you're >>> adding the need to additionally specify both the >>> proto option and the clientaddr option to get this >>> to work. Seems like that isn't zero-configuration >>> at all. >> >> Thanks for pointing this out. Will fix in v2, there should be no need >> to manually specify the client address, this is a remnant from early >> development. >> >>> Wouldn't it be nicer if it worked like this: >>> >>> (guest)$ cat /etc/hosts >>> 129.0.0.2 localhyper >>> (guest)$ mount.nfs localhyper:/export /mnt >>> >>> And the result was a working NFS mount of the >>> local hypervisor, using whatever NFS version the >>> two both support, with no changes needed to the >>> NFS implementation or the understanding of the >>> system administrator? >> >> This is an interesting idea, thanks! It would be neat to have AF_INET >> access over the loopback interface on both guest and host. > > I too really like this idea better as it seems a lot less invasive. > Existing applications would "just work" without needing to be changed, > and you get name resolution to boot. > > Chuck, is 129.0.0.X within some reserved block of addrs such that you > could get a standard range for this? I didn't see that block listed here > during my half-assed web search: > > https://en.wikipedia.org/wiki/Reserved_IP_addresses I thought there would be some range of link-local addresses that could make this work with IPv4, similar to 192. or 10. that are "unroutable" site-local addresses. If there isn't then IPv6 might have what we need. > Maybe you meant 192.0.0.X ? It might be easier and more future proof to > get a chunk of ipv6 addrs carved out though. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jul 19, 2017, at 17:11, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote: >>> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote: >>> >>> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. >> >> Why? >> >> Basically you are building a lot of specialized >> awareness in applications and leaving the >> network layer alone. That seems backwards to me. > > Yes. I posted glibc patches but there were concerns that getaddrinfo(3) > is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway, > so there's not much to gain by adding it: > https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html > >>> For similar >>> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c. >> >> rdma/rdma6 are specified by standards, and appear >> in the IANA Network Identifiers database: >> >> https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml >> >> Is there a standard netid for vsock? If not, >> there needs to be some discussion with the nfsv4 >> Working Group to get this worked out. >> >> Because AF_VSOCK is an address family and the RPC >> framing is the same as TCP, the netid should be >> something like "tcpv" and not "vsock". I've >> complained about this before and there has been >> no response of any kind. >> >> I'll note that rdma/rdma6 do not use alternate >> address families: an IP address is specified and >> mapped to a GUID by the underlying transport. >> We purposely did not expose GUIDs to NFS, which >> is based on AF_INET/AF_INET6. >> >> rdma co-exists with IP. vsock doesn't have this >> fallback. > > Thanks for explaining the tcp + rdma relationship, that makes sense. > > There is no standard netid for vsock yet. > > Sorry I didn't ask about "tcpv" when you originally proposed it, I lost > track of that discussion. You said: > > If this really is just TCP on a new address family, then "tcpv" > is more in line with previous work, and you can get away with > just an IANA action for a new netid, since RPC-over-TCP is > already specified. > > Does "just TCP" mean a "connection-oriented, stream-oriented transport > using RFC 1831 Record Marking"? Or does "TCP" have any other > attributes? > > NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented > transport using RFC 1831 Record Marking". I'm just not sure whether > there are any other assumptions beyond this that AF_VSOCK might not meet > because it isn't IP and has 32-bit port numbers. Right, it is TCP in the sense that it is connection-oriented and so on. It looks like a stream socket to the RPC client. TI-RPC calls this "tpi_cots_ord". But it isn't TCP in the sense that you aren't moving TCP segments over the link. I think the "IP / 32-bit ports" is handled entirely within the address variant that your link is using. >> It might be a better approach to use well-known >> (say, link-local or loopback) addresses and let >> the underlying network layer figure it out. >> >> Then hide all this stuff with DNS and let the >> client mount the server by hostname and use >> normal sockaddr's and "proto=tcp". Then you don't >> need _any_ application layer changes. >> >> Without hostnames, how does a client pick a >> Kerberos service principal for the server? > > I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows > about the VMs, addresses cannot be spoofed, and VMs can only communicate > with the hypervisor. This leads to a simple trust relationship. 
The clients can be exploited if they are exposed in any way to remote users. Having at least sec=krb5 might be a way to block attackers from accessing data on the NFS server from a compromised client. In any event, NFSv4 will need ID mapping. Do you have a sense of how the server and clients will determine their NFSv4 ID mapping domain name? How will the server and client user ID databases be kept in synchrony? You might have some issues if there is a "cel" in multiple guests that are actually different users. >> Does rpcbind implement "vsock" netids? > > I have not modified rpcbind. My understanding is that rpcbind isn't > required for NFSv4. Since this is a new transport there is no plan for > it to run old protocol versions. > >> Does the NFSv4.0 client advertise "vsock" in >> SETCLIENTID, and provide a "vsock" callback >> service? > > The kernel patches implement backchannel support although I haven't > exercised it. > >>> It is now possible to mount a file system from the host (hypervisor) >>> over AF_VSOCK like this: >>> >>> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock >>> >>> The VM's cid address is 3 and the hypervisor is 2. >> >> The mount command is supposed to supply "clientaddr" >> automatically. This mount option is exposed only for >> debugging purposes or very special cases (like >> disabling NFSv4 callback operations). >> >> I mean the whole point of this exercise is to get >> rid of network configuration, but here you're >> adding the need to additionally specify both the >> proto option and the clientaddr option to get this >> to work. Seems like that isn't zero-configuration >> at all. > > Thanks for pointing this out. Will fix in v2, there should be no need > to manually specify the client address, this is a remnant from early > development. > >> Wouldn't it be nicer if it worked like this: >> >> (guest)$ cat /etc/hosts >> 129.0.0.2 localhyper >> (guest)$ mount.nfs localhyper:/export /mnt >> >> And the result was a working NFS mount of the >> local hypervisor, using whatever NFS version the >> two both support, with no changes needed to the >> NFS implementation or the understanding of the >> system administrator? > > This is an interesting idea, thanks! It would be neat to have AF_INET > access over the loopback interface on both guest and host. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
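On the ID mapping question raised above, one possible answer (a sketch,
not anything defined by this patch set) is simply pinning the same idmapd
domain on the host and in every guest; the domain value below is a
placeholder:

    # /etc/idmapd.conf on the host and in every guest (sketch)
    [General]
    # Placeholder domain; must be identical on client and server so that
    # user@domain strings map consistently.
    Domain = vsock.localdomain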
On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: > On Fri, Jul 07 2017, NeilBrown wrote: > > > On Fri, Jun 30 2017, Chuck Lever wrote: > >> > >> Wouldn't it be nicer if it worked like this: > >> > >> (guest)$ cat /etc/hosts > >> 129.0.0.2 localhyper > >> (guest)$ mount.nfs localhyper:/export /mnt > >> > >> And the result was a working NFS mount of the > >> local hypervisor, using whatever NFS version the > >> two both support, with no changes needed to the > >> NFS implementation or the understanding of the > >> system administrator? > > > > Yes. Yes. Definitely Yes. > > Though I suspect you mean "127.0.0.2", not "129..."?? > > > > There must be some way to redirect TCP connections to some address > > transparently through to the vsock protocol. > > The "sshuttle" program does this to transparently forward TCP connections > > over an ssh connection. Using a similar technique to forward > > connections over vsock shouldn't be hard. > > > > Or is performance really critical, and you get too much copying when you > > try forwarding connections? I suspect that is fixable, but it would be > > a little less straight forward. > > > > I would really *not* like to see vsock support being bolted into one > > network tool after another. > > I've been digging into this a big more. I came across > https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf > > which (on page 7) lists some reasons not to use TCP/IP between guest > and host. > > . Adding & configuring guest interfaces is invasive > > That is possibly true. But adding support for a new address family to > NFS, NFSD, and nfs-utils is also very invasive. You would need to > install this software on the guest. I suggest you install different > software on the guest which solves the problem better. Two different types of "invasive": 1. Requiring guest configuration changes that are likely to cause conflicts. 2. Requiring changes to the software stack. Once installed there are no conflicts. I'm interested and open to a different solution but it must avoid invasive configuration changes, especially inside the guest. > . Prone to break due to config changes inside guest > > This is, I suspect, a key issue. With vsock, the address of the > guest-side interface is defined by options passed to qemu. With > normal IP addressing, the guest has to configure the address. > > However I think that IPv6 autoconfig makes this work well without vsock. > If I create a bridge interface on the host, run > ip -6 addr add fe80::1 dev br0 > then run a guest with > -net nic,macaddr=Ch:oo:se:an:ad:dr \ > -net bridge,br=br0 \ > > then the client can > mount [fe80::1%interfacename]:/path /mountpoint > > and the host will see a connection from > fe80::ch:oo:se:an:ad:dr > > So from the guest side, I have achieved zero-config NFS mounts from the > host. It is not zero-configuration since [fe80::1%interfacename] contains a variable, "interfacename", whose value is unknown ahead of time. This will make documentation as well as ability to share configuration between VMs more difficult. In other words, we're back to something that requires per-guest configuration and doesn't just work everywhere. > I don't think the server can filter connections based on which interface > a link-local address came from. If that was a problem that someone > wanted to be fixed, I'm sure we can fix it. > > If you need to be sure that clients don't fake their IPv6 address, I'm > sure netfilter is up to the task. 
Yes, it's common to prevent spoofing on the host using netfilter and I
think it wouldn't be a problem.

> . Creates network interfaces on host that must be managed
>
> What vsock does is effectively create a hidden interface on the host
> that only the kernel knows about and so the sysadmin cannot break it.
> The only difference between this and an explicit interface on the host
> is that the latter requires a competent sysadmin.
>
> If you have other reasons for preferring the use of vsock for NFS, I'd
> be happy to hear them. So far I'm not convinced.

Before working on AF_VSOCK I originally proposed adding dedicated
network interfaces to guests, similar to what you've suggested, but
there was resistance for additional reasons that weren't covered in the
presentation:

Using AF_INET exposes the host's network stack to guests, and through
accidental misconfiguration even external traffic could reach the host's
network stack. AF_VSOCK doesn't do routing or forwarding so we can be
sure that any activity is intentional.

Some virtualization use cases run guests without any network interfaces
as a matter of security policy. One could argue that AF_VSOCK is just
another network channel, but due to its restricted usage, the attack
surface is much smaller than an AF_INET network interface.
On Fri, Jul 07, 2017 at 01:17:54PM +1000, NeilBrown wrote: > On Fri, Jun 30 2017, Chuck Lever wrote: > > > > Wouldn't it be nicer if it worked like this: > > > > (guest)$ cat /etc/hosts > > 129.0.0.2 localhyper > > (guest)$ mount.nfs localhyper:/export /mnt > > > > And the result was a working NFS mount of the > > local hypervisor, using whatever NFS version the > > two both support, with no changes needed to the > > NFS implementation or the understanding of the > > system administrator? > > Yes. Yes. Definitely Yes. > Though I suspect you mean "127.0.0.2", not "129..."?? > > There must be some way to redirect TCP connections to some address > transparently through to the vsock protocol. > The "sshuttle" program does this to transparently forward TCP connections > over an ssh connection. Using a similar technique to forward > connections over vsock shouldn't be hard. Thanks for the sshuttle reference. I've taken a look at it and the underlying iptables extensions. sshuttle does not have the ability to accept incoming connections but that can be achieved by adding the IP to the loopback device. Here is how bi-directional TCP connections can be tunnelled without network interfaces: host <-> vsock transport <-> guest 129.0.0.2 (lo) 129.0.0.2 (lo) 129.0.0.3 (lo) 129.0.0.3 (lo) iptables REDIRECT is used to catch 129.0.0.2->.3 connections on the host and 129.0.0.3->.2 connections in the guest. A "connect" command is then sent across the tunnel to establish a new TCP connection on the other side. Note that this isn't NAT since both sides see the correct IP addresses. Unlike using a network interface (even tun/tap) this tunnelling approach is restricted to TCP connections. It doesn't have UDP, etc. Issues: 1. Adding IPs to dev lo has side effects. For example, firewall rules on dev lo will affect the traffic. This alone probably prevents the approach from working without conflicts on existing guests. 2. Is there a safe address range to use? Using IPv6 link-local addresses as suggested in this thread might work, especially when using an OUI so we can be sure there are no address collisions. 3. Performance has already been mentioned since a userspace process tunnels from loopback TCP to the vsock transport. splice(2) can probably be used. Stefan
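For concreteness, the guest-side half of the redirection described above
might look roughly like this (the 129.0.0.x addresses are the ones from
the example; port 12049 for the userspace vsock forwarder is an arbitrary,
assumed choice):

    # guest: give the loopback device both example addresses
    ip addr add 129.0.0.2/32 dev lo
    ip addr add 129.0.0.3/32 dev lo
    # catch guest->host NFS connections and hand them to the local
    # forwarder, which relays the byte stream over the vsock transport
    iptables -t nat -A OUTPUT -d 129.0.0.2 -p tcp --dport 2049 \
            -j REDIRECT --to-ports 12049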
On Tue, Jul 25 2017, Stefan Hajnoczi wrote: > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: >> On Fri, Jul 07 2017, NeilBrown wrote: >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: >> >> >> >> Wouldn't it be nicer if it worked like this: >> >> >> >> (guest)$ cat /etc/hosts >> >> 129.0.0.2 localhyper >> >> (guest)$ mount.nfs localhyper:/export /mnt >> >> >> >> And the result was a working NFS mount of the >> >> local hypervisor, using whatever NFS version the >> >> two both support, with no changes needed to the >> >> NFS implementation or the understanding of the >> >> system administrator? >> > >> > Yes. Yes. Definitely Yes. >> > Though I suspect you mean "127.0.0.2", not "129..."?? >> > >> > There must be some way to redirect TCP connections to some address >> > transparently through to the vsock protocol. >> > The "sshuttle" program does this to transparently forward TCP connections >> > over an ssh connection. Using a similar technique to forward >> > connections over vsock shouldn't be hard. >> > >> > Or is performance really critical, and you get too much copying when you >> > try forwarding connections? I suspect that is fixable, but it would be >> > a little less straight forward. >> > >> > I would really *not* like to see vsock support being bolted into one >> > network tool after another. >> >> I've been digging into this a big more. I came across >> https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf >> >> which (on page 7) lists some reasons not to use TCP/IP between guest >> and host. >> >> . Adding & configuring guest interfaces is invasive >> >> That is possibly true. But adding support for a new address family to >> NFS, NFSD, and nfs-utils is also very invasive. You would need to >> install this software on the guest. I suggest you install different >> software on the guest which solves the problem better. > > Two different types of "invasive": > 1. Requiring guest configuration changes that are likely to cause > conflicts. > 2. Requiring changes to the software stack. Once installed there are no > conflicts. > > I'm interested and open to a different solution but it must avoid > invasive configuration changes, especially inside the guest. Sounds fair. > >> . Prone to break due to config changes inside guest >> >> This is, I suspect, a key issue. With vsock, the address of the >> guest-side interface is defined by options passed to qemu. With >> normal IP addressing, the guest has to configure the address. >> >> However I think that IPv6 autoconfig makes this work well without vsock. >> If I create a bridge interface on the host, run >> ip -6 addr add fe80::1 dev br0 >> then run a guest with >> -net nic,macaddr=Ch:oo:se:an:ad:dr \ >> -net bridge,br=br0 \ >> >> then the client can >> mount [fe80::1%interfacename]:/path /mountpoint >> >> and the host will see a connection from >> fe80::ch:oo:se:an:ad:dr >> >> So from the guest side, I have achieved zero-config NFS mounts from the >> host. > > It is not zero-configuration since [fe80::1%interfacename] contains a > variable, "interfacename", whose value is unknown ahead of time. This > will make documentation as well as ability to share configuration > between VMs more difficult. In other words, we're back to something > that requires per-guest configuration and doesn't just work everywhere. Maybe. Why isn't the interfacename known ahead of time. Once upon a time it was always "eth0", but I guess guests can rename it.... You can use a number instead of a name. %1 would always be lo. 
%2 seems to always (often?) be the first physical interface. Presumably
the order in which you describe interfaces to qemu directly maps to the
order that Linux sees. Maybe %2 could always work. Maybe we could make
it so that it always works, even if that requires small changes to Linux
(and/or qemu).

>
>> I don't think the server can filter connections based on which interface
>> a link-local address came from. If that was a problem that someone
>> wanted to be fixed, I'm sure we can fix it.
>>
>> If you need to be sure that clients don't fake their IPv6 address, I'm
>> sure netfilter is up to the task.
>
> Yes, it's common to prevent spoofing on the host using netfilter and I
> think it wouldn't be a problem.
>
>> . Creates network interfaces on host that must be managed
>>
>> What vsock does is effectively create a hidden interface on the host
>> that only the kernel knows about and so the sysadmin cannot break it.
>> The only difference between this and an explicit interface on the host
>> is that the latter requires a competent sysadmin.
>>
>> If you have other reasons for preferring the use of vsock for NFS, I'd
>> be happy to hear them. So far I'm not convinced.
>
> Before working on AF_VSOCK I originally proposed adding dedicated
> network interfaces to guests, similar to what you've suggested, but
> there was resistance for additional reasons that weren't covered in the
> presentation:

I would like to suggest that this is critical information for
understanding the design rationale for AF_VSOCK and should be easily
found from http://wiki.qemu.org/Features/VirtioVsock

> Using AF_INET exposes the host's network stack to guests, and through
> accidental misconfiguration even external traffic could reach the host's
> network stack. AF_VSOCK doesn't do routing or forwarding so we can be
> sure that any activity is intentional.

If I understand this correctly, the suggested configuration has the host
completely isolated from network traffic, and the guests directly control
the physical network interfaces, so the guests see external traffic, but
neither the guests nor the wider network can communicate with the host.
Except that sometimes the guests do need to communicate with the host, so
we create a whole new protocol just for that.

> Some virtualization use cases run guests without any network interfaces
> as a matter of security policy. One could argue that AF_VSOCK is just
> another network channel, but due to its restricted usage, the attack
> surface is much smaller than an AF_INET network interface.

No network interfaces, but they still want to use NFS. Does anyone think
that sounds rational?

"due to its restricted usage, the attack surface is much smaller" or
"due to its niche use-case, bugs are likely to go undetected for longer".
I'm not convinced that is sensible security policy.

I think I see where you are coming from now - thanks. I'm not convinced
though. It feels like someone is paranoid about possible exploits using
protocols that they think they understand, so they ask you to create a
new protocol that they don't understand (and so cannot be afraid of).

Maybe the NFS server should be run in a guest. Surely that would protect
the host's network stack. This would be a rather paranoid configuration,
but it seems to match the paranoia of the requirements.

I'm not against people being paranoid. I am against major code changes
to well-established software, just to placate that paranoia.

To achieve zero-config, I think link-local addresses are by far the best
answer.
To achieve isolation, some targeted filtering seems like the best approach. If you really want traffic between guest and host to go over a vsock, then some sort of packet redirection should be possible. NeilBrown
On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: > On Tue, Jul 25 2017, Stefan Hajnoczi wrote: > > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: > >> On Fri, Jul 07 2017, NeilBrown wrote: > >> > On Fri, Jun 30 2017, Chuck Lever wrote: > >> I don't think the server can filter connections based on which interface > >> a link-local address came from. If that was a problem that someone > >> wanted to be fixed, I'm sure we can fix it. > >> > >> If you need to be sure that clients don't fake their IPv6 address, I'm > >> sure netfilter is up to the task. > > > > Yes, it's common to prevent spoofing on the host using netfilter and I > > think it wouldn't be a problem. > > > >> . Creates network interfaces on host that must be managed > >> > >> What vsock does is effectively create a hidden interface on the host that only the > >> kernel knows about and so the sysadmin cannot break it. The only > >> difference between this and an explicit interface on the host is that > >> the latter requires a competent sysadmin. > >> > >> If you have other reasons for preferring the use of vsock for NFS, I'd be > >> happy to hear them. So far I'm not convinced. > > > > Before working on AF_VSOCK I originally proposed adding dedicated > > network interfaces to guests, similar to what you've suggested, but > > there was resistance for additional reasons that weren't covered in the > > presentation: > > I would like to suggest that this is critical information for > understanding the design rationale for AF_VSOCK and should be easily > found from http://wiki.qemu.org/Features/VirtioVsock Thanks, I have updated the wiki. > To achieve zero-config, I think link-local addresses are by far the best > answer. To achieve isolation, some targeted filtering seems like the > best approach. > > If you really want traffic between guest and host to go over a vsock, > then some sort of packet redirection should be possible. The issue we seem to hit with designs using AF_INET and network interfaces is that they cannot meet the "it must avoid invasive configuration changes, especially inside the guest" requirement. It's very hard to autoconfigure in a way that doesn't conflict with the user's network configuration inside the guest. One thought about solving the interface naming problem: if the dedicated NIC uses a well-known OUI dedicated for this purpose then udev could assign a persistent name (e.g. "virtguestif"). This gets us one step closer to non-invasive automatic configuration. Stefan
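As a sketch of that persistent-naming idea (the OUI below is purely
illustrative; a real deployment would match whatever OUI ends up being
reserved for the dedicated NIC):

    # /etc/udev/rules.d/70-virtguestif.rules (sketch)
    # Pin the name of any NIC whose MAC falls in the reserved OUI.
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="52:54:ff:*", NAME="virtguestif"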
On Thu, 2017-07-27 at 11:58 +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> > On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> > > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> > > > On Fri, Jul 07 2017, NeilBrown wrote:
> > > > > On Fri, Jun 30 2017, Chuck Lever wrote:
> > > >
> > > > I don't think the server can filter connections based on which
> > > > interface a link-local address came from. If that was a problem
> > > > that someone wanted to be fixed, I'm sure we can fix it.
> > > >
> > > > If you need to be sure that clients don't fake their IPv6
> > > > address, I'm sure netfilter is up to the task.
> > >
> > > Yes, it's common to prevent spoofing on the host using netfilter
> > > and I think it wouldn't be a problem.
> > >
> > > > . Creates network interfaces on host that must be managed
> > > >
> > > > What vsock does is effectively create a hidden interface on the
> > > > host that only the kernel knows about and so the sysadmin cannot
> > > > break it. The only difference between this and an explicit
> > > > interface on the host is that the latter requires a competent
> > > > sysadmin.
> > > >
> > > > If you have other reasons for preferring the use of vsock for
> > > > NFS, I'd be happy to hear them. So far I'm not convinced.
> > >
> > > Before working on AF_VSOCK I originally proposed adding dedicated
> > > network interfaces to guests, similar to what you've suggested,
> > > but there was resistance for additional reasons that weren't
> > > covered in the presentation:
> >
> > I would like to suggest that this is critical information for
> > understanding the design rationale for AF_VSOCK and should be
> > easily found from http://wiki.qemu.org/Features/VirtioVsock
>
> Thanks, I have updated the wiki.
>
> > To achieve zero-config, I think link-local addresses are by far the
> > best answer. To achieve isolation, some targeted filtering seems
> > like the best approach.
> >
> > If you really want traffic between guest and host to go over a
> > vsock, then some sort of packet redirection should be possible.
>
> The issue we seem to hit with designs using AF_INET and network
> interfaces is that they cannot meet the "it must avoid invasive
> configuration changes, especially inside the guest" requirement. It's
> very hard to autoconfigure in a way that doesn't conflict with the
> user's network configuration inside the guest.
>
> One thought about solving the interface naming problem: if the
> dedicated NIC uses a well-known OUI dedicated for this purpose then
> udev could assign a persistent name (e.g. "virtguestif"). This gets us
> one step closer to non-invasive automatic configuration.

Link-local IPv6 addresses are always present once you bring up an IPv6
interface. You can use them to communicate with other hosts on the same
network segment. They're just not routable. That seems entirely fine
here where you're not dealing with routing anyway.

What I would (naively) envision is a new network interface driver that
presents itself as "hvlo0" or something, much like we do with the
loopback interface. You just need the guest to ensure that it plugs in
that driver and brings up the interface for ipv6.

Then the only issue is discovery of addresses. The HV should be able to
figure that out and present it. Maybe roll up a new nsswitch module that
queries the HV directly somehow?
The nice thing there is that you get name resolution "for free", since it's just plain old IPv6 traffic at that point. AF_VSOCK just seems like a very invasive solution to this problem that's going to add a lot of maintenance burden to a lot of different code.
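One practical wrinkle with the link-local scheme sketched above is that fe80:: addresses are only meaningful per link, so a client always has to carry the interface scope alongside the address. A minimal sketch of that in C, assuming the hypothetical "hvlo0" interface name and an arbitrary fe80::2 host address, neither of which exists today:

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>

int connect_to_hypervisor(void)
{
        struct sockaddr_in6 sin6;
        int fd;

        fd = socket(AF_INET6, SOCK_STREAM, 0);
        if (fd < 0)
                return -1;

        memset(&sin6, 0, sizeof(sin6));
        sin6.sin6_family = AF_INET6;
        sin6.sin6_port = htons(2049);           /* NFS */
        if (inet_pton(AF_INET6, "fe80::2", &sin6.sin6_addr) != 1) {
                close(fd);
                return -1;
        }
        /*
         * Link-local addresses are per-link, so the scope id must
         * name the dedicated guest<->host interface explicitly.
         */
        sin6.sin6_scope_id = if_nametoindex("hvlo0");

        if (connect(fd, (struct sockaddr *)&sin6, sizeof(sin6)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

On the command line the same thing would normally be written as a scoped address, e.g. fe80::2%hvlo0.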
On Thu, Jul 27 2017, Stefan Hajnoczi wrote: > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: >> >> On Fri, Jul 07 2017, NeilBrown wrote: >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: >> >> I don't think the server can filter connections based on which interface >> >> a link-local address came from. If that was a problem that someone >> >> wanted to be fixed, I'm sure we can fix it. >> >> >> >> If you need to be sure that clients don't fake their IPv6 address, I'm >> >> sure netfilter is up to the task. >> > >> > Yes, it's common to prevent spoofing on the host using netfilter and I >> > think it wouldn't be a problem. >> > >> >> . Creates network interfaces on host that must be managed >> >> >> >> What vsock does is effectively create a hidden interface on the host that only the >> >> kernel knows about and so the sysadmin cannot break it. The only >> >> difference between this and an explicit interface on the host is that >> >> the latter requires a competent sysadmin. >> >> >> >> If you have other reasons for preferring the use of vsock for NFS, I'd be >> >> happy to hear them. So far I'm not convinced. >> > >> > Before working on AF_VSOCK I originally proposed adding dedicated >> > network interfaces to guests, similar to what you've suggested, but >> > there was resistance for additional reasons that weren't covered in the >> > presentation: >> >> I would like to suggest that this is critical information for >> understanding the design rationale for AF_VSOCK and should be easily >> found from http://wiki.qemu.org/Features/VirtioVsock > > Thanks, I have updated the wiki. Thanks. Though this one: "Can be used with VMs that have no network interfaces" is really crying out for some sort of justification. And given that ethernet/tcpip must be some of the most attacked (and hence hardened) code in Linux, some explanation of why it is thought that they expose more of an attack surface than some brand new code, might be helpful. > >> To achieve zero-config, I think link-local addresses are by far the best >> answer. To achieve isolation, some targeted filtering seems like the >> best approach. >> >> If you really want traffic between guest and host to go over a vsock, >> then some sort of packet redirection should be possible. > > The issue we seem to hit with designs using AF_INET and network > interfaces is that they cannot meet the "it must avoid invasive > configuration changes, especially inside the guest" requirement. It's > very hard to autoconfigure in a way that doesn't conflict with the > user's network configuration inside the guest. > > One thought about solving the interface naming problem: if the dedicated > NIC uses a well-known OUI dedicated for this purpose then udev could > assign a persistent name (e.g. "virtguestif"). This gets us one step > closer to non-invasive automatic configuration. I think this is well worth pursuing. As you say, an OUI allows the guest to reliably detect the right interface to use a link-local address on. Thanks, NeilBrown > > Stefan
Hi, On Fri, Jun 30, 2017 at 11:52 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > Hi Stefan- > > Is there a standard netid for vsock? If not, > there needs to be some discussion with the nfsv4 > Working Group to get this worked out. > > Because AF_VSOCK is an address family and the RPC > framing is the same as TCP, the netid should be > something like "tcpv" and not "vsock". I've > complained about this before and there has been > no response of any kind. the onc record marking is just the length/end-of-transmission bit, and the bytes. something is being borrowed, but it isn't tcp > > I'll note that rdma/rdma6 do not use alternate > address families: an IP address is specified and > mapped to a GUID by the underlying transport. > We purposely did not expose GUIDs to NFS, which > is based on AF_INET/AF_INET6. but, as you state, vsock is an address family. > > rdma co-exists with IP. vsock doesn't have this > fallback. doesn't appear to be needed. > > It might be a better approach to use well-known > (say, link-local or loopback) addresses and let > the underlying network layer figure it out. > > Then hide all this stuff with DNS and let the > client mount the server by hostname and use > normal sockaddr's and "proto=tcp". Then you don't > need _any_ application layer changes. the changes in nfs-ganesha and ntirpc along these lines were rather trivial. > > Without hostnames, how does a client pick a > Kerberos service principal for the server? no mechanism has been proposed > > Does rpcbind implement "vsock" netids? are they needed? > > Does the NFSv4.0 client advertise "vsock" in > SETCLIENTID, and provide a "vsock" callback > service? It should at least do the latter; does it need to advertise differently in SETCLIENTID? > > >> It is now possible to mount a file system from the host (hypervisor) >> over AF_VSOCK like this: >> >> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock >> >> The VM's cid address is 3 and the hypervisor is 2. > > The mount command is supposed to supply "clientaddr" > automatically. This mount option is exposed only for > debugging purposes or very special cases (like > disabling NFSv4 callback operations). > > I mean the whole point of this exercise is to get > rid of network configuration, but here you're > adding the need to additionally specify both the > proto option and the clientaddr option to get this > to work. Seems like that isn't zero-configuration > at all. This whole line of criticism seems to me kind of off-kilter. The concept of cross-vm pipes appears pretty classical, and one can see why it might not need to follow Internet conventions. I'll give you that I never found the zeroconf or security rationales as compelling--which is to say, I wouldn't restrict vsock to guest-host communications, except by policy. > > Wouldn't it be nicer if it worked like this: > > (guest)$ cat /etc/hosts > 129.0.0.2 localhyper > (guest)$ mount.nfs localhyper:/export /mnt > > And the result was a working NFS mount of the > local hypervisor, using whatever NFS version the > two both support, with no changes needed to the > NFS implementation or the understanding of the > system administrator? > > not clear; I can understand 2:/export pretty easily, and I don't think any minds would be blown if "localhyper:" effected 2:. 
Matt
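To make the framing point above concrete: RPC record marking on a stream socket is just a 4-byte header in front of each fragment of XDR bytes (RFC 5531, "Record Marking Standard"), so nothing TCP-specific is borrowed by reusing it on another SOCK_STREAM address family. A sketch, with an illustrative constant and helper name:

#include <stdint.h>
#include <arpa/inet.h>

#define RPC_LAST_FRAGMENT       0x80000000u

/*
 * Build the 4-byte record mark that precedes each RPC fragment on a
 * stream transport: the top bit flags the final fragment, the low 31
 * bits carry the fragment length, and the word goes out big-endian.
 */
static uint32_t rpc_record_mark(uint32_t frag_len, int last)
{
        return htonl((last ? RPC_LAST_FRAGMENT : 0) | (frag_len & 0x7fffffffu));
}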
On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote: > On Thu, Jul 27 2017, Stefan Hajnoczi wrote: > > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: > >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: > >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: > >> >> On Fri, Jul 07 2017, NeilBrown wrote: > >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: > >> To achieve zero-config, I think link-local addresses are by far the best > >> answer. To achieve isolation, some targeted filtering seems like the > >> best approach. > >> > >> If you really want traffic between guest and host to go over a vsock, > >> then some sort of packet redirection should be possible. > > > > The issue we seem to hit with designs using AF_INET and network > > interfaces is that they cannot meet the "it must avoid invasive > > configuration changes, especially inside the guest" requirement. It's > > very hard to autoconfigure in a way that doesn't conflict with the > > user's network configuration inside the guest. > > > > One thought about solving the interface naming problem: if the dedicated > > NIC uses a well-known OUI dedicated for this purpose then udev could > > assign a persistent name (e.g. "virtguestif"). This gets us one step > > closer to non-invasive automatic configuration. > > I think this is well worth pursuing. As you say, an OUI allows the > guest to reliably detect the right interface to use a link-local address > on. IPv6 link-local addressing with a well-known MAC address range solves address collisions. The presence of a network interface still has the following issues: 1. Network management tools (e.g. NetworkManager) inside the guest detect the interface and may auto-configure it (e.g. DHCP). Guest administrators are confronted with a new interface - this opens up the possibility that they change its configuration. 2. Default drop firewall policies conflict with the interface. The guest administrator would have to manually configure exceptions for their firewall. 3. udev is a Linux-only solution and other OSes do not offer a configurable interface naming scheme. Manual configuration would be required. I still see these as blockers preventing guest<->host file system sharing. Users can already manually add a NIC and configure NFS today, but the goal here is to offer this as a feature that works in an automated way (useful both for GUI-style virtual machine management and for OpenStack clouds where guest configuration must be simple and scale). In contrast, AF_VSOCK works as long as the driver is loaded. There is no configuration. The changes required to Linux and nfs-utils are related to the sunrpc transport and configuration. They do not introduce risks to core NFS or TCP/IP. I would really like to get patches merged because I currently have to direct interested users to building Linux and nfs-utils from source to try this out. Stefan
On Thu, Aug 03 2017, Stefan Hajnoczi wrote: > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote: >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote: >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: >> >> >> On Fri, Jul 07 2017, NeilBrown wrote: >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: >> >> To achieve zero-config, I think link-local addresses are by far the best >> >> answer. To achieve isolation, some targeted filtering seems like the >> >> best approach. >> >> >> >> If you really want traffic between guest and host to go over a vsock, >> >> then some sort of packet redirection should be possible. >> > >> > The issue we seem to hit with designs using AF_INET and network >> > interfaces is that they cannot meet the "it must avoid invasive >> > configuration changes, especially inside the guest" requirement. It's >> > very hard to autoconfigure in a way that doesn't conflict with the >> > user's network configuration inside the guest. >> > >> > One thought about solving the interface naming problem: if the dedicated >> > NIC uses a well-known OUI dedicated for this purpose then udev could >> > assign a persistent name (e.g. "virtguestif"). This gets us one step >> > closer to non-invasive automatic configuration. >> >> I think this is well worth pursuing. As you say, an OUI allows the >> guest to reliably detect the right interface to use a link-local address >> on. > > IPv6 link-local addressing with a well-known MAC address range solves > address collisions. The presence of a network interface still has the > following issues: > > 1. Network management tools (e.g. NetworkManager) inside the guest > detect the interface and may auto-configure it (e.g. DHCP). Why would this matter? Auto-configuring may add addresses to the interface, but will not remove the link-local address. > Guest > administrators are confronted with a new interface - this opens up > the possibility that they change its configuration. True, the admin might delete the link-local address themselves. They might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /". A rogue admin can always shoot themselves in the foot. Trying to prevent this is pointless. > > 2. Default drop firewall policies conflict with the interface. The > guest administrator would have to manually configure exceptions for > their firewall. This gets back to my original point. You are willing to stick required configuration in the kernel and in nfs-utils, but you are not willing to require some fixed configuration which actually addresses your problem. If you want an easy way to punch a firewall hole for a particular port on a particular interface, then resolve that by talking with people who understand firewalls. Not by creating a new protocol which cannot be firewalled. > > 3. udev is a Linux-only solution and other OSes do not offer a > configurable interface naming scheme. Manual configuration would > be required. Not my problem. If some other OS is lacking important functionality, you don't fix it by adding rubbish to Linux. You fix it by fixing those OSes. For example, if Linux didn't have udev or anything like it, I might be open to enhancing mount.nfs so that an address syntax like: fe80::1%*:xx:yy:xx:* would mean that the glob pattern should be matched against the MAC address of each interface and the first such interface used. This would be a focused change aimed at fixing a specific issue.
I might not actually like it, but if it was the best/simplest mechanism to achieve the goal, I doubt I would fight it. Fortunately I don't need to decide as we already have udev. If some other OS doesn't have a way to find the interface for a particular MAC address, maybe you need to create one. > > I still see these as blockers preventing guest<->host file system > sharing. Users can already manually add a NIC and configure NFS today, > but the goal here is to offer this as a feature that works in an > automated way (useful both for GUI-style virtual machine management and > for OpenStack clouds where guest configuration must be simple and > scale). > > In contrast, AF_VSOCK works as long as the driver is loaded. There is > no configuration. I think we all agree that providing something that "just works" is a worthy goal. The only question is about how much new code can be justified, and where it should be put. Given that almost everything you need already exists, it seems best to just tie those pieces together. NeilBrown > > The changes required to Linux and nfs-utils are related to the sunrpc > transport and configuration. They do not introduce risks to core NFS or > TCP/IP. I would really like to get patches merged because I currently > have to direct interested users to building Linux and nfs-utils from > source to try this out. > > Stefan
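For what it's worth, the interface lookup implied by the glob suggestion above is only a handful of lines against getifaddrs(3). A sketch under that assumption; the helper name and the separate MAC pattern argument are invented for illustration and are not part of mount.nfs:

#include <fnmatch.h>
#include <ifaddrs.h>
#include <stdio.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netpacket/packet.h>

/*
 * Return the index of the first interface whose MAC address matches a
 * shell-style glob such as "52:54:00:*", or 0 if none matches.  The
 * index is what a link-local sockaddr_in6 needs in sin6_scope_id.
 */
static unsigned int ifindex_by_mac_glob(const char *glob)
{
        struct ifaddrs *ifa, *p;
        unsigned int ifindex = 0;

        if (getifaddrs(&ifa) < 0)
                return 0;

        for (p = ifa; p != NULL; p = p->ifa_next) {
                struct sockaddr_ll *ll = (struct sockaddr_ll *)p->ifa_addr;
                char mac[18];

                if (ll == NULL || ll->sll_family != AF_PACKET || ll->sll_halen != 6)
                        continue;

                snprintf(mac, sizeof(mac), "%02x:%02x:%02x:%02x:%02x:%02x",
                         ll->sll_addr[0], ll->sll_addr[1], ll->sll_addr[2],
                         ll->sll_addr[3], ll->sll_addr[4], ll->sll_addr[5]);

                if (fnmatch(glob, mac, 0) == 0) {
                        ifindex = if_nametoindex(p->ifa_name);
                        break;
                }
        }

        freeifaddrs(ifa);
        return ifindex;
}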
Hi Neil, On Thu, Aug 3, 2017 at 5:45 PM, NeilBrown <neilb@suse.com> wrote: > On Thu, Aug 03 2017, Stefan Hajnoczi wrote: The vsock address family has been in the tin since 4.8, so this argument appears to be about, precisely, tying existing pieces together. The ceph developers working on openstack manila did find the nfs over vsock use case compelling. I appreciate this because it has encouraged more interest in the cephfs community around using the standardized NFS protocol for deployment. Matt > > I think we all agree that providing something that "just works" is a > worth goal. In only question is about how much new code can be > justified, and where it should be put. > > Given that almost everything you need already exists, it seems best to > just tie those pieces together. > > NeilBrown > > >> >> The changes required to Linux and nfs-utils are related to the sunrpc >> transport and configuration. They do not introduce risks to core NFS or >> TCP/IP. I would really like to get patches merged because I currently >> have to direct interested users to building Linux and nfs-utils from >> source to try this out. >> >> Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 03 2017, Matt Benjamin wrote: > Hi Neil, > > On Thu, Aug 3, 2017 at 5:45 PM, NeilBrown <neilb@suse.com> wrote: >> On Thu, Aug 03 2017, Stefan Hajnoczi wrote: > > Since the vsock address family is in the tin since 4.8, this argument > appears to be about, precisely, tying existing pieces together. No, it is about adding new, unnecessary pieces into various places. > The > ceph developers working on openstack manila did find the nfs over > vsock use case compelling. I appreciate this because it has > encouraged more interest in the cephfs community around using the > standardized NFS protocol for deployment. I'm sure the ceph developers find zero-conf NFS a compelling use case. I would be surprised if they care whether it is over vsock or IPv6. But I'm losing interest here. I'm not a gate-keeper. If you can convince Steve/Trond/Anna/Bruce to accept your code, then good luck to you. I don't think a convincing case has been made though. NeilBrown > > Matt > >> >> I think we all agree that providing something that "just works" is a >> worth goal. In only question is about how much new code can be >> justified, and where it should be put. >> >> Given that almost everything you need already exists, it seems best to >> just tie those pieces together. >> >> NeilBrown >> >> >>> >>> The changes required to Linux and nfs-utils are related to the sunrpc >>> transport and configuration. They do not introduce risks to core NFS or >>> TCP/IP. I would really like to get patches merged because I currently >>> have to direct interested users to building Linux and nfs-utils from >>> source to try this out. >>> >>> Stefan
On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote: > On Thu, Aug 03 2017, Stefan Hajnoczi wrote: > > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote: > >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote: > >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: > >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: > >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: > >> >> >> On Fri, Jul 07 2017, NeilBrown wrote: > >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: > > I still see these as blockers preventing guest<->host file system > > sharing. Users can already manually add a NIC and configure NFS today, > > but the goal here is to offer this as a feature that works in an > > automated way (useful both for GUI-style virtual machine management and > > for OpenStack clouds where guest configuration must be simple and > > scale). > > > > In contrast, AF_VSOCK works as long as the driver is loaded. There is > > no configuration. > > I think we all agree that providing something that "just works" is a > worth goal. In only question is about how much new code can be > justified, and where it should be put. > > Given that almost everything you need already exists, it seems best to > just tie those pieces together. Neil, You said downthread you're losing interest but there's a point that I hope you have time to consider because it's key: Even if the NFS transport can be set up automatically without conflicting with the user's system configuration, it needs to stay available going forward. A network interface is prone to user configuration changes through network management tools, firewalls, and other utilities. The risk of breakage is significant. That's not really a technical problem - it will be caused by some user action - but by using the existing Linux AF_VSOCK feature that whole class of issues can be eliminated. Stefan
On Fri, Aug 04 2017, Stefan Hajnoczi wrote: > On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote: >> On Thu, Aug 03 2017, Stefan Hajnoczi wrote: >> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote: >> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote: >> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: >> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: >> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: >> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote: >> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: >> > I still see these as blockers preventing guest<->host file system >> > sharing. Users can already manually add a NIC and configure NFS today, >> > but the goal here is to offer this as a feature that works in an >> > automated way (useful both for GUI-style virtual machine management and >> > for OpenStack clouds where guest configuration must be simple and >> > scale). >> > >> > In contrast, AF_VSOCK works as long as the driver is loaded. There is >> > no configuration. >> >> I think we all agree that providing something that "just works" is a >> worth goal. In only question is about how much new code can be >> justified, and where it should be put. >> >> Given that almost everything you need already exists, it seems best to >> just tie those pieces together. > > Neil, > You said downthread you're losing interest but there's a point that I > hope you have time to consider because it's key: > > Even if the NFS transport can be set up automatically without > conflicting with the user's system configuration, it needs to stay > available going forward. A network interface is prone to user > configuration changes through network management tools, firewalls, and > other utilities. The risk of it breakage is significant. I've already addressed this issue. I wrote: True, the admin might delete the link-local address themselves. They might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /". A rogue admin can always shoot themselves in the foot. Trying to prevent this is pointless. > > That's not really a technical problem - it will be caused by some user > action - but using the existing Linux AF_VSOCK feature that whole class > of issues can be eliminated. I suggest you look up the proverb about making things fool-proof and learn to apply it. Meanwhile I have another issue. Is it possible for tcpdump, or some other tool, to capture all the packets flowing over a vsock? If it isn't possible to analyse the traffic with wireshark, it will be much harder to diagnose issues that customers have. NeilBrown > > Stefan
On Sat, Aug 05, 2017 at 08:35:52AM +1000, NeilBrown wrote: > On Fri, Aug 04 2017, Stefan Hajnoczi wrote: > > > On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote: > >> On Thu, Aug 03 2017, Stefan Hajnoczi wrote: > >> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote: > >> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote: > >> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote: > >> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote: > >> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote: > >> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote: > >> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote: > >> > I still see these as blockers preventing guest<->host file system > >> > sharing. Users can already manually add a NIC and configure NFS today, > >> > but the goal here is to offer this as a feature that works in an > >> > automated way (useful both for GUI-style virtual machine management and > >> > for OpenStack clouds where guest configuration must be simple and > >> > scale). > >> > > >> > In contrast, AF_VSOCK works as long as the driver is loaded. There is > >> > no configuration. > >> > >> I think we all agree that providing something that "just works" is a > >> worth goal. In only question is about how much new code can be > >> justified, and where it should be put. > >> > >> Given that almost everything you need already exists, it seems best to > >> just tie those pieces together. > > > > Neil, > > You said downthread you're losing interest but there's a point that I > > hope you have time to consider because it's key: > > > > Even if the NFS transport can be set up automatically without > > conflicting with the user's system configuration, it needs to stay > > available going forward. A network interface is prone to user > > configuration changes through network management tools, firewalls, and > > other utilities. The risk of it breakage is significant. > > I've already addressed this issue. I wrote: > > True, the admin might delete the link-local address themselves. They > might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /". > A rogue admin can always shoot themselves in the foot. Trying to > prevent this is pointless. These are not things that I'm worried about. I agree that it's pointless trying to prevent them. The issue is genuine configuration changes either by the user or by software they are running that simply interfere with the host<->guest interface. For example, a default DROP iptables policy. > Meanwhile I have another issue. Is it possible for tcpdump, or some > other tool, to capture all the packets flowing over a vsock? If it > isn't possible to analyse the traffic with wireshark, it will be much > harder to diagnose issues that customers have. Yes, packet capture is possible. The vsockmon driver was added in Linux 4.11. Wireshark has a dissector for AF_VSOCK. Stefan
diff --git a/support/nfs/getport.c b/support/nfs/getport.c
index 081594c..0b857af 100644
--- a/support/nfs/getport.c
+++ b/support/nfs/getport.c
@@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
 	struct protoent *proto;
 
 	/*
-	 * IANA does not define a protocol number for rdma netids,
-	 * since "rdma" is not an IP protocol.
+	 * IANA does not define protocol numbers for non-IP netids.
 	 */
 	if (strcmp(netid, "rdma") == 0) {
 		*family = AF_INET;
@@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
 		*protocol = NFSPROTO_RDMA;
 		return 1;
 	}
+	if (strcmp(netid, "vsock") == 0) {
+		*family = AF_VSOCK;
+		*protocol = 0;
+		return 1;
+	}
 
 	nconf = getnetconfigent(netid);
 	if (nconf == NULL)
@@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
 	struct protoent *proto;
 
 	/*
-	 * IANA does not define a protocol number for rdma netids,
-	 * since "rdma" is not an IP protocol.
+	 * IANA does not define protocol numbers for non-IP netids.
 	 */
 	if (strcmp(netid, "rdma") == 0) {
 		*family = AF_INET;
 		*protocol = NFSPROTO_RDMA;
 		return 1;
 	}
+	if (strcmp(netid, "vsock") == 0) {
+		*family = AF_VSOCK;
+		*protocol = 0;
+		return 1;
+	}
 
 	proto = getprotobyname(netid);
 	if (proto == NULL)
Neither libtirpc nor getprotobyname(3) knows about AF_VSOCK. For similar reasons as for "rdma"/"rdma6", translate "vsock" manually in getport.c. It is now possible to mount a file system from the host (hypervisor) over AF_VSOCK like this: (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock The VM's CID address is 3 and the hypervisor is 2. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- support/nfs/getport.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
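For readers new to AF_VSOCK: once nfs_get_proto() reports AF_VSOCK for the "vsock" netid, the endpoint the client ultimately connects to is a struct sockaddr_vm rather than a sockaddr_in. A minimal sketch of that endpoint, using the CIDs from the commit message above and assuming the usual NFS port number; this is illustration, not code from the series:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40     /* value used by the kernel */
#endif

int vsock_connect_to_host_nfs(void)
{
        struct sockaddr_vm svm;
        int fd;

        fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        if (fd < 0)
                return -1;

        memset(&svm, 0, sizeof(svm));
        svm.svm_family = AF_VSOCK;
        svm.svm_cid = 2;        /* the hypervisor (VMADDR_CID_HOST) */
        svm.svm_port = 2049;    /* assuming the usual NFS port */

        if (connect(fd, (struct sockaddr *)&svm, sizeof(svm)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

The (CID, port) pair plays the role that (IP address, port) plays for TCP, which is why the patch can report a protocol number of 0 for this netid.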