diff mbox

NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

Message ID 52AF1BE4.1090702@parallels.com (mailing list archive)
State New, archived
Headers show

Commit Message

Stanislav Kinsbursky Dec. 16, 2013, 3:27 p.m. UTC
16.12.2013 05:26, Weng Meiling ?????:
> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>is what I do:
>>
>>The steps:
>>
>>step 1: start NFS server in init_net net ns
>>#service nfsserver start
>>
>>step 2: stop NFS server in non init_net net ns
>>#ip netns add test
>>#ip netns list
>>test
>>#ip netns exec test service nfsserver stop
>>
>>step 3: start NFS server again in the non init_net net ns
>>#ip netns exec test service nfsserver start
>>
>>This step 3 will trigger kernel panic.


This sequence can be reduced to steps 2 and 3.


>>The reason seems that "ip
>>netns exec" creates a new mount namespace, the changes to the
>>new mount namespace don't propgate to other namespaces. So
>>when stop NFS server in second step, the NFSD filesystem isn't
>>umounted.  When restart NFS server in third step, the NFSD
>>filesystem will not remount,  this result to the NFSD file
>>system superblock's net ns is still init_net and RPCBIND client
>>will be NULL when register RPC service with the local portmapper
>>in svc_addsock(). Do you have any ideas about this problem?
>>

The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
because network namespace context is being taken from NFSd superblock.
On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
to register passed socket in nested net leads to panic. I think, this collusion should be handled
as error and can be fixed like below.

BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
superblock on new mount namespace creation and should be handled in more complex way.

Eric, Al, could you share your opinion how this problem should be solved?

Comments

Weng Meiling Dec. 30, 2013, 9:04 a.m. UTC | #1
Hi Stanislav,

I test kernel with this patch, the problem has be fixed. Would you please send a
formal one? :) Thanks very much!

Weng Meiling
Thanks

On 2013/12/16 23:27, Stanislav Kinsbursky wrote:
> 16.12.2013 05:26, Weng Meiling ?????:
>> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>> is what I do:
>>>
>>> The steps:
>>>
>>> step 1: start NFS server in init_net net ns
>>> #service nfsserver start
>>>
>>> step 2: stop NFS server in non init_net net ns
>>> #ip netns add test
>>> #ip netns list
>>> test
>>> #ip netns exec test service nfsserver stop
>>>
>>> step 3: start NFS server again in the non init_net net ns
>>> #ip netns exec test service nfsserver start
>>>
>>> This step 3 will trigger kernel panic.
> 
> 
> This sequence can be reduced to steps 2 and 3.
> 
> 
>>> The reason seems that "ip
>>> netns exec" creates a new mount namespace, the changes to the
>>> new mount namespace don't propgate to other namespaces. So
>>> when stop NFS server in second step, the NFSD filesystem isn't
>>> umounted.  When restart NFS server in third step, the NFSD
>>> filesystem will not remount,  this result to the NFSD file
>>> system superblock's net ns is still init_net and RPCBIND client
>>> will be NULL when register RPC service with the local portmapper
>>> in svc_addsock(). Do you have any ideas about this problem?
>>>
> 
> The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
> because network namespace context is being taken from NFSd superblock.
> On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
> which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
> to register passed socket in nested net leads to panic. I think, this collusion should be handled
> as error and can be fixed like below.
> 
> BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
> superblock on new mount namespace creation and should be handled in more complex way.
> 
> Eric, Al, could you share your opinion how this problem should be solved?
> 
> =======================================================================================
> 
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 7f55517..f34d9de 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -699,6 +699,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net)
>         if (err != 0 || fd < 0)
>                 return -EINVAL;
> 
> +       if (svc_alien_sock(net, fd)) {
> +               printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
> +               return -EINVAL;
> +       }
> +
>         err = nfsd_create_serv(net);
>         if (err != 0)
>                 return err;
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index 62fd1b7..947009e 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -56,6 +56,7 @@ int           svc_recv(struct svc_rqst *, long);
>  int            svc_send(struct svc_rqst *);
>  void           svc_drop(struct svc_rqst *);
>  void           svc_sock_update_bufs(struct svc_serv *serv);
> +bool           svc_alien_sock(struct net *net, int fd);
>  int            svc_addsock(struct svc_serv *serv, const int fd,
>                                         char *name_return, const size_t len);
>  void           svc_init_xprt_sock(void);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index b6e59f0..3ba5b87 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1397,6 +1397,17 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>         return svsk;
>  }
> 
> +bool svc_alien_sock(struct net *net, int fd)
> +{
> +       int err;
> +       struct socket *sock = sockfd_lookup(fd, &err);
> +
> +       if (sock && (sock_net(sock->sk) != net))
> +               return true;
> +       return false;
> +}
> +EXPORT_SYMBOL_GPL(svc_alien_sock);
> +
>  /**
>   * svc_addsock - add a listener socket to an RPC service
>   * @serv: pointer to RPC service to which to add a new listener
> 
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanislav Kinsbursky Dec. 30, 2013, 9:21 a.m. UTC | #2
30.12.2013 13:04, Weng Meiling ?????:
> Hi Stanislav,
>
> I test kernel with this patch, the problem has be fixed. Would you please send a
> formal one? :) Thanks very much!
>

Thank you, Weng!
Sure, I'll resend.

> Weng Meiling
> Thanks
>
> On 2013/12/16 23:27, Stanislav Kinsbursky wrote:
>> 16.12.2013 05:26, Weng Meiling ?????:
>>> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>>> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>>> is what I do:
>>>>
>>>> The steps:
>>>>
>>>> step 1: start NFS server in init_net net ns
>>>> #service nfsserver start
>>>>
>>>> step 2: stop NFS server in non init_net net ns
>>>> #ip netns add test
>>>> #ip netns list
>>>> test
>>>> #ip netns exec test service nfsserver stop
>>>>
>>>> step 3: start NFS server again in the non init_net net ns
>>>> #ip netns exec test service nfsserver start
>>>>
>>>> This step 3 will trigger kernel panic.
>>
>>
>> This sequence can be reduced to steps 2 and 3.
>>
>>
>>>> The reason seems that "ip
>>>> netns exec" creates a new mount namespace, the changes to the
>>>> new mount namespace don't propgate to other namespaces. So
>>>> when stop NFS server in second step, the NFSD filesystem isn't
>>>> umounted.  When restart NFS server in third step, the NFSD
>>>> filesystem will not remount,  this result to the NFSD file
>>>> system superblock's net ns is still init_net and RPCBIND client
>>>> will be NULL when register RPC service with the local portmapper
>>>> in svc_addsock(). Do you have any ideas about this problem?
>>>>
>>
>> The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
>> because network namespace context is being taken from NFSd superblock.
>> On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
>> which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
>> to register passed socket in nested net leads to panic. I think, this collusion should be handled
>> as error and can be fixed like below.
>>
>> BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
>> superblock on new mount namespace creation and should be handled in more complex way.
>>
>> Eric, Al, could you share your opinion how this problem should be solved?
>>
>> =======================================================================================
>>
>>
>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>> index 7f55517..f34d9de 100644
>> --- a/fs/nfsd/nfsctl.c
>> +++ b/fs/nfsd/nfsctl.c
>> @@ -699,6 +699,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net)
>>          if (err != 0 || fd < 0)
>>                  return -EINVAL;
>>
>> +       if (svc_alien_sock(net, fd)) {
>> +               printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
>> +               return -EINVAL;
>> +       }
>> +
>>          err = nfsd_create_serv(net);
>>          if (err != 0)
>>                  return err;
>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>> index 62fd1b7..947009e 100644
>> --- a/include/linux/sunrpc/svcsock.h
>> +++ b/include/linux/sunrpc/svcsock.h
>> @@ -56,6 +56,7 @@ int           svc_recv(struct svc_rqst *, long);
>>   int            svc_send(struct svc_rqst *);
>>   void           svc_drop(struct svc_rqst *);
>>   void           svc_sock_update_bufs(struct svc_serv *serv);
>> +bool           svc_alien_sock(struct net *net, int fd);
>>   int            svc_addsock(struct svc_serv *serv, const int fd,
>>                                          char *name_return, const size_t len);
>>   void           svc_init_xprt_sock(void);
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index b6e59f0..3ba5b87 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1397,6 +1397,17 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>>          return svsk;
>>   }
>>
>> +bool svc_alien_sock(struct net *net, int fd)
>> +{
>> +       int err;
>> +       struct socket *sock = sockfd_lookup(fd, &err);
>> +
>> +       if (sock && (sock_net(sock->sk) != net))
>> +               return true;
>> +       return false;
>> +}
>> +EXPORT_SYMBOL_GPL(svc_alien_sock);
>> +
>>   /**
>>    * svc_addsock - add a listener socket to an RPC service
>>    * @serv: pointer to RPC service to which to add a new listener
>>
>>
>>
>>
>
>
diff mbox

Patch

=======================================================================================


diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7f55517..f34d9de 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -699,6 +699,11 @@  static ssize_t __write_ports_addfd(char *buf, struct net *net)
         if (err != 0 || fd < 0)
                 return -EINVAL;

+       if (svc_alien_sock(net, fd)) {
+               printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
+               return -EINVAL;
+       }
+
         err = nfsd_create_serv(net);
         if (err != 0)
                 return err;
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 62fd1b7..947009e 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -56,6 +56,7 @@  int           svc_recv(struct svc_rqst *, long);
  int            svc_send(struct svc_rqst *);
  void           svc_drop(struct svc_rqst *);
  void           svc_sock_update_bufs(struct svc_serv *serv);
+bool           svc_alien_sock(struct net *net, int fd);
  int            svc_addsock(struct svc_serv *serv, const int fd,
                                         char *name_return, const size_t len);
  void           svc_init_xprt_sock(void);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b6e59f0..3ba5b87 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1397,6 +1397,17 @@  static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
         return svsk;
  }

+bool svc_alien_sock(struct net *net, int fd)
+{
+       int err;
+       struct socket *sock = sockfd_lookup(fd, &err);
+
+       if (sock && (sock_net(sock->sk) != net))
+               return true;
+       return false;
+}
+EXPORT_SYMBOL_GPL(svc_alien_sock);
+
  /**
   * svc_addsock - add a listener socket to an RPC service
   * @serv: pointer to RPC service to which to add a new listener