Message ID | 1498658565-3408-1-git-send-email-mustafa.ismail@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > and modify ibnl_unicast to not wait/retry. This eliminates > the undesirable wait for future users of ibnl_unicast. > > Change Portmapper calls originating from kernel to user-space > to use ibnl_unicast_wait and take advantage of the wait/retry > logic in netlink_unicast. > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > --- > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > drivers/infiniband/core/netlink.c | 12 +++++++++++- > include/rdma/rdma_netlink.h | 10 ++++++++++ > 3 files changed, 24 insertions(+), 4 deletions(-) Please apply this patch instead of Leon's patch to revert "IB/core: Add flow control to the portmapper netlink calls". Leon, we can work out names and parameters if this works for you. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > and modify ibnl_unicast to not wait/retry. This eliminates > > the undesirable wait for future users of ibnl_unicast. > > > > Change Portmapper calls originating from kernel to user-space > > to use ibnl_unicast_wait and take advantage of the wait/retry > > logic in netlink_unicast. > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > --- > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > 3 files changed, 24 insertions(+), 4 deletions(-) > > Please apply this patch instead of Leon's patch to revert > "IB/core: Add flow control to the portmapper netlink calls". > > Leon, we can work out names and parameters if this works for you. Chien, The names are less my worries with this patch. First of all, it misleads by using wait/retry naming, because it blocks and not waits. The second, I disagree with solution in kernel for user space application which can't handle the netlink errors. Can you please explain why the kernel fix is unavoidable and the bug in the user space application is not needed to be fixed? Thanks > > Chien
On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote: > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > > and modify ibnl_unicast to not wait/retry. This eliminates > > > the undesirable wait for future users of ibnl_unicast. > > > > > > Change Portmapper calls originating from kernel to user-space > > > to use ibnl_unicast_wait and take advantage of the wait/retry > > > logic in netlink_unicast. > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > > --- > > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > > 3 files changed, 24 insertions(+), 4 deletions(-) > > > > Please apply this patch instead of Leon's patch to revert > > "IB/core: Add flow control to the portmapper netlink calls". > > > > Leon, we can work out names and parameters if this works for you. > > Chien, > > The names are less my worries with this patch. First of all, it misleads > by using wait/retry naming, because it blocks and not waits. Nope. It does a single shot retry and waits in a waitqueue. Go look at netlink_unicast and in turn netlink_attachskb. If you still disagree, please flag specific code where it blocks. Here are the two functions for your convenience. int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 portid, int nonblock) { struct sock *sk; int err; long timeo; skb = netlink_trim(skb, gfp_any()); timeo = sock_sndtimeo(ssk, nonblock); retry: sk = netlink_getsockbyportid(ssk, portid); if (IS_ERR(sk)) { kfree_skb(skb); return PTR_ERR(sk); } if (netlink_is_kernel(sk)) return netlink_unicast_kernel(sk, skb, ssk); if (sk_filter(sk, skb)) { err = skb->len; kfree_skb(skb); sock_put(sk); return err; } err = netlink_attachskb(sk, skb, &timeo, ssk); if (err == 1) goto retry; if (err) return err; return netlink_sendskb(sk, skb); } /* * Attach a skb to a netlink socket. * The caller must hold a reference to the destination socket. On error, the * reference is dropped. The skb is not send to the destination, just all * all error checks are performed and memory in the queue is reserved. * Return values: * < 0: error. skb freed, reference to sock dropped. * 0: continue * 1: repeat lookup - reference dropped while waiting for socket memory. */ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, long *timeo, struct sock *ssk) { struct netlink_sock *nlk; nlk = nlk_sk(sk); if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || test_bit(NETLINK_S_CONGESTED, &nlk->state))) { DECLARE_WAITQUEUE(wait, current); if (!*timeo) { if (!ssk || netlink_is_kernel(ssk)) netlink_overrun(sk); sock_put(sk); kfree_skb(skb); return -EAGAIN; } __set_current_state(TASK_INTERRUPTIBLE); add_wait_queue(&nlk->wait, &wait); if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || test_bit(NETLINK_S_CONGESTED, &nlk->state)) && !sock_flag(sk, SOCK_DEAD)) *timeo = schedule_timeout(*timeo); __set_current_state(TASK_RUNNING); remove_wait_queue(&nlk->wait, &wait); sock_put(sk); if (signal_pending(current)) { kfree_skb(skb); return sock_intr_errno(*timeo); } return 1; } netlink_skb_set_owner_r(skb, sk); return 0; } BTW, _nobody_ is resetting the socket attribute from O_NONBLOCK. It is very difficult to understand your argument of "blocking" when you are not sharing the specifics. Please put your finger on it so everyone can understand your point. > The second, I disagree with solution in kernel for user space application which can't > handle the netlink errors. There is no guarantee delivery nor blocking on send. Like I mentioned above, it is a 1 shot retry with a set wait time. The code obviousely handles error condition as it can happen. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote: > On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote: > > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > > > and modify ibnl_unicast to not wait/retry. This eliminates > > > > the undesirable wait for future users of ibnl_unicast. > > > > > > > > Change Portmapper calls originating from kernel to user-space > > > > to use ibnl_unicast_wait and take advantage of the wait/retry > > > > logic in netlink_unicast. > > > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > > > --- > > > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > > > 3 files changed, 24 insertions(+), 4 deletions(-) > > > > > > Please apply this patch instead of Leon's patch to revert > > > "IB/core: Add flow control to the portmapper netlink calls". > > > > > > Leon, we can work out names and parameters if this works for you. > > > > Chien, > > > > The names are less my worries with this patch. First of all, it misleads > > by using wait/retry naming, because it blocks and not waits. > > Nope. It does a single shot retry and waits in a waitqueue. > Go look at netlink_unicast and in turn netlink_attachskb. If you still > disagree, please flag specific code where it blocks. I agree, it wouldn't block in your scenario. However will it work in more hostile environments? For example, malicious user can open RDMA netlink socket directly (socket(...)), set sndtimeo to be MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1) and send custom netlink messages right to your new _wait function. If I understand correctly from the code, it will add them to waitqueue and won't release skb till the end of processing. Will it cause to mark whole netlink socket as NETLINK_S_CONGESTED? Will other users will be able to progress with their messages or they will need to wait till those _wait calls finish? > > Here are the two functions for your convenience. > > > int netlink_unicast(struct sock *ssk, struct sk_buff *skb, > u32 portid, int nonblock) > { > struct sock *sk; > int err; > long timeo; > > skb = netlink_trim(skb, gfp_any()); > > timeo = sock_sndtimeo(ssk, nonblock); > retry: > sk = netlink_getsockbyportid(ssk, portid); > if (IS_ERR(sk)) { > kfree_skb(skb); > return PTR_ERR(sk); > } > if (netlink_is_kernel(sk)) > return netlink_unicast_kernel(sk, skb, ssk); > > if (sk_filter(sk, skb)) { > err = skb->len; > kfree_skb(skb); > sock_put(sk); > return err; > } > > err = netlink_attachskb(sk, skb, &timeo, ssk); > if (err == 1) > goto retry; > if (err) > return err; > > return netlink_sendskb(sk, skb); > } > > /* > * Attach a skb to a netlink socket. > * The caller must hold a reference to the destination socket. On error, the > * reference is dropped. The skb is not send to the destination, just all > * all error checks are performed and memory in the queue is reserved. > * Return values: > * < 0: error. skb freed, reference to sock dropped. > * 0: continue > * 1: repeat lookup - reference dropped while waiting for socket memory. > */ > int netlink_attachskb(struct sock *sk, struct sk_buff *skb, > long *timeo, struct sock *ssk) > { > struct netlink_sock *nlk; > > nlk = nlk_sk(sk); > > if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || > test_bit(NETLINK_S_CONGESTED, &nlk->state))) { > DECLARE_WAITQUEUE(wait, current); > if (!*timeo) { > if (!ssk || netlink_is_kernel(ssk)) > netlink_overrun(sk); > sock_put(sk); > kfree_skb(skb); > return -EAGAIN; > } > > __set_current_state(TASK_INTERRUPTIBLE); > add_wait_queue(&nlk->wait, &wait); > > if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || > test_bit(NETLINK_S_CONGESTED, &nlk->state)) && > !sock_flag(sk, SOCK_DEAD)) > *timeo = schedule_timeout(*timeo); > > __set_current_state(TASK_RUNNING); > remove_wait_queue(&nlk->wait, &wait); > sock_put(sk); > > if (signal_pending(current)) { > kfree_skb(skb); > return sock_intr_errno(*timeo); > } > return 1; > } > netlink_skb_set_owner_r(skb, sk); > return 0; > } > > > BTW, _nobody_ is resetting the socket attribute from O_NONBLOCK. > > It is very difficult to understand your argument of "blocking" when you are not > sharing the specifics. Please put your finger on it so everyone can understand > your point. I hope that I succeeded to answer on your questions. > > > The second, I disagree with solution in kernel for user space application which can't > > handle the netlink errors. > > There is no guarantee delivery nor blocking on send. Like I mentioned above, > it is a 1 shot retry with a set wait time. The code obviousely handles error > condition as it can happen. So, can you please refresh our memory and explain again what exactly this patch is fixing if user-space handles errors correctly? > > > Chien
On Thu, Jun 29, 2017 at 08:04:36AM +0300, Leon Romanovsky wrote: > On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote: > > On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote: > > > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > > > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > > > > and modify ibnl_unicast to not wait/retry. This eliminates > > > > > the undesirable wait for future users of ibnl_unicast. > > > > > > > > > > Change Portmapper calls originating from kernel to user-space > > > > > to use ibnl_unicast_wait and take advantage of the wait/retry > > > > > logic in netlink_unicast. > > > > > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > > > > --- > > > > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > > > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > > > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > > > > 3 files changed, 24 insertions(+), 4 deletions(-) > > > > > > > > Please apply this patch instead of Leon's patch to revert > > > > "IB/core: Add flow control to the portmapper netlink calls". > > > > > > > > Leon, we can work out names and parameters if this works for you. > > > > > > Chien, > > > > > > The names are less my worries with this patch. First of all, it misleads > > > by using wait/retry naming, because it blocks and not waits. > > > > Nope. It does a single shot retry and waits in a waitqueue. > > Go look at netlink_unicast and in turn netlink_attachskb. If you still > > disagree, please flag specific code where it blocks. > > I agree, it wouldn't block in your scenario. However will it work in more hostile > environments? The reason behind the original patch was to make it more tolerable to "hostile" environments by giving it one more try allocate buffer to send the message. So yes, it does work in more hostile environment. > > For example, malicious user can open RDMA netlink socket directly (socket(...)), > set sndtimeo to be MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1) How do you open the _kernel_ RDMA netlink socket directly from user-space and set the timeout value? > and send custom > netlink messages right to your new _wait function. If I understand correctly > from the code, it will add them to waitqueue and won't release skb till > the end of processing. The only caller of ibnl_unicast_wait is portmapper kernel client. The call is initiated from kernel not in response to user message so you did not understand the code correctly. > Will it cause to mark whole netlink socket as NETLINK_S_CONGESTED? No. It has nothing to do with the new function or relevant to this discussion. > Will other users will be able to progress with their messages or they > will need to wait till those _wait calls finish? Short answer is yes. With this patch other users of ibnl_unicast would not get the retry/wait. If you choose to use the wait version, obviousely you will have retry/wait. For the work you are doing, you should stick with ibnl_unicast and there's no impact to your code. This is the reason for the two functions as proposed originally. > > > The second, I disagree with solution in kernel for user space application which can't > > > handle the netlink errors. > > > > There is no guarantee delivery nor blocking on send. Like I mentioned above, > > it is a 1 shot retry with a set wait time. The code obviousely handles error > > condition as it can happen. > > So, can you please refresh our memory and explain again what exactly > this patch is fixing if user-space handles errors correctly? You assertion of user-space flaw is incorrect. It is based on incorrect conclusion of deadlock and blocking. There is no deadlock and no blocking, period. So what's the issue with taking this patch instead of the revert? Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 29, 2017 at 10:02:49AM -0500, Chien Tin Tung wrote: > On Thu, Jun 29, 2017 at 08:04:36AM +0300, Leon Romanovsky wrote: > > On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote: > > > On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote: > > > > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > > > > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > > > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > > > > > and modify ibnl_unicast to not wait/retry. This eliminates > > > > > > the undesirable wait for future users of ibnl_unicast. > > > > > > > > > > > > Change Portmapper calls originating from kernel to user-space > > > > > > to use ibnl_unicast_wait and take advantage of the wait/retry > > > > > > logic in netlink_unicast. > > > > > > > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > > > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > > > > > --- > > > > > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > > > > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > > > > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > > > > > 3 files changed, 24 insertions(+), 4 deletions(-) > > > > > > > > > > Please apply this patch instead of Leon's patch to revert > > > > > "IB/core: Add flow control to the portmapper netlink calls". > > > > > > > > > > Leon, we can work out names and parameters if this works for you. > > > > > > > > Chien, > > > > > > > > The names are less my worries with this patch. First of all, it misleads > > > > by using wait/retry naming, because it blocks and not waits. > > > > > > Nope. It does a single shot retry and waits in a waitqueue. > > > Go look at netlink_unicast and in turn netlink_attachskb. If you still > > > disagree, please flag specific code where it blocks. > > > > I agree, it wouldn't block in your scenario. However will it work in more hostile > > environments? > > The reason behind the original patch was to make it more tolerable to "hostile" > environments by giving it one more try allocate buffer to send the message. > So yes, it does work in more hostile environment. Are you really calling extra load as a "hostile environment"? > > > > > For example, malicious user can open RDMA netlink socket directly (socket(...)), > > set sndtimeo to be MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1) > > How do you open the _kernel_ RDMA netlink socket directly from user-space and > set the timeout value? Exactly as libnl, libmnl and rdma-core do. The netlink socket is accessible from the user space without root access. It is the whole purpose of it. > > > and send custom > > netlink messages right to your new _wait function. If I understand correctly > > from the code, it will add them to waitqueue and won't release skb till > > the end of processing. > > The only caller of ibnl_unicast_wait is portmapper kernel client. The call is > initiated from kernel not in response to user message so you did not understand > the code correctly. Maybe and maybe you didn't read your code at all. Calls to _unicast are triggered from kernel for LS only, in you case as an example the request to get RDMA_NL_IWPM_MAPINFO will trigger ibnl_unicast_wait. > > > Will it cause to mark whole netlink socket as NETLINK_S_CONGESTED? > > No. It has nothing to do with the new function or relevant to this discussion. Again, read the source, the netlink socket overrun will set that flag. > > > Will other users will be able to progress with their messages or they > > will need to wait till those _wait calls finish? > > Short answer is yes. With this patch other users of ibnl_unicast would not > get the retry/wait. If you choose to use the wait version, obviousely you will > have retry/wait. For the work you are doing, you should stick with ibnl_unicast > and there's no impact to your code. This is the reason for the two functions as > proposed originally. Again, read the code and my proposed scenario. User without root permissions can open netlink socket (I'm doing it in rdmatool) and can send any message he wants. He can and will overflow the calls with ibnl_unicast_wait, so no other user will have resources available. > > > > > The second, I disagree with solution in kernel for user space application which can't > > > > handle the netlink errors. > > > > > > There is no guarantee delivery nor blocking on send. Like I mentioned above, > > > it is a 1 shot retry with a set wait time. The code obviousely handles error > > > condition as it can happen. > > > > So, can you please refresh our memory and explain again what exactly > > this patch is fixing if user-space handles errors correctly? > > You assertion of user-space flaw is incorrect. It is based on incorrect > conclusion of deadlock and blocking. > > There is no deadlock and no blocking, period. That question doesn't assume anything. I'm still asking what exactly are you fixing? > > > So what's the issue with taking this patch instead of the revert? It doesn't fix anything broken in kernel. > > Chien >
On Thu, Jun 29, 2017 at 07:37:19PM +0300, Leon Romanovsky wrote: > On Thu, Jun 29, 2017 at 10:02:49AM -0500, Chien Tin Tung wrote: > > On Thu, Jun 29, 2017 at 08:04:36AM +0300, Leon Romanovsky wrote: > > > On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote: > > > > On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote: > > > > > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote: > > > > > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote: > > > > > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, > > > > > > > and modify ibnl_unicast to not wait/retry. This eliminates > > > > > > > the undesirable wait for future users of ibnl_unicast. > > > > > > > > > > > > > > Change Portmapper calls originating from kernel to user-space > > > > > > > to use ibnl_unicast_wait and take advantage of the wait/retry > > > > > > > logic in netlink_unicast. > > > > > > > > > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> > > > > > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> > > > > > > > --- > > > > > > > drivers/infiniband/core/iwpm_msg.c | 6 +++--- > > > > > > > drivers/infiniband/core/netlink.c | 12 +++++++++++- > > > > > > > include/rdma/rdma_netlink.h | 10 ++++++++++ > > > > > > > 3 files changed, 24 insertions(+), 4 deletions(-) > > > > > > > > > > > > Please apply this patch instead of Leon's patch to revert > > > > > > "IB/core: Add flow control to the portmapper netlink calls". > > > > > > > > > > > > Leon, we can work out names and parameters if this works for you. > > > > > > > > > > Chien, > > > > > > > > > > The names are less my worries with this patch. First of all, it misleads > > > > > by using wait/retry naming, because it blocks and not waits. > > > > > > > > Nope. It does a single shot retry and waits in a waitqueue. > > > > Go look at netlink_unicast and in turn netlink_attachskb. If you still > > > > disagree, please flag specific code where it blocks. > > > > > > I agree, it wouldn't block in your scenario. However will it work in more hostile > > > environments? > > > > The reason behind the original patch was to make it more tolerable to "hostile" > > environments by giving it one more try allocate buffer to send the message. > > So yes, it does work in more hostile environment. > > Are you really calling extra load as a "hostile environment"? I'm not playing 20 questions (https://en.wikipedia.org/wiki/Twenty_Questions). If you have something in mind say so. If you are referring to your malicious user attack scenario see my response below. > > > For example, malicious user can open RDMA netlink socket directly (socket(...)), > > > set sndtimeo to be MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1) > > > > How do you open the _kernel_ RDMA netlink socket directly from user-space and > > set the timeout value? > > Exactly as libnl, libmnl and rdma-core do. The netlink socket is > accessible from the user space without root access. It is the whole > purpose of it. Wow, you just uncovered a big security hole. Isn't this beyond a simple timeout value? Wouldn't the ability to set the socket to _blocking_ blow your top? > > > and send custom > > > netlink messages right to your new _wait function. If I understand correctly > > > from the code, it will add them to waitqueue and won't release skb till > > > the end of processing. > > > > The only caller of ibnl_unicast_wait is portmapper kernel client. The call is > > initiated from kernel not in response to user message so you did not understand > > the code correctly. > > Maybe and maybe you didn't read your code at all. I'm gonna give you a pass on this one. > Calls to _unicast are triggered from kernel for LS only, in you case as > an example the request to get RDMA_NL_IWPM_MAPINFO will trigger ibnl_unicast_wait. LS? I don't have your acronym database in my head... With Mustafa's patch, this is the flow with RDMA_NL_IWPM_MAPINFO from user-space into the kernel: iwpm_mapping_info_cb() iwpm_send_mapinfo() send_mapinfo_num() send_nlmsg_done() Both send_mapinfo_num and sen_nlmsg_done call ibnl_unicast() not ibnl_unicast_wait(). I know it is late where you are. So let's talk about the spirit of the patch. Anything that is in response to user-space should use ibnl_unicast. If you find something that's using ibnl_unicast_wait, then the patch is wrong and will be corrected. Now, is there actually one? Please include call flow if you find one. > > > Will it cause to mark whole netlink socket as NETLINK_S_CONGESTED? > > > > No. It has nothing to do with the new function or relevant to this discussion. > > Again, read the source, the netlink socket overrun will set that flag. Sorry, this is still beyond me. What's the connection to wait/retry? How is using wait/retry impacting setting of this flag? Are you back to your malicious user attack scenario? See response below. > > > Will other users will be able to progress with their messages or they > > > will need to wait till those _wait calls finish? > > > > Short answer is yes. With this patch other users of ibnl_unicast would not > > get the retry/wait. If you choose to use the wait version, obviousely you will > > have retry/wait. For the work you are doing, you should stick with ibnl_unicast > > and there's no impact to your code. This is the reason for the two functions as > > proposed originally. > > Again, read the code and my proposed scenario. User without root > permissions can open netlink socket (I'm doing it in rdmatool) and can > send any message he wants. He can and will overflow the calls with > ibnl_unicast_wait, so no other user will have resources available. So, in the spirit of the patch, this cannot happen. End of story. Ignoring rest of your email. > > > > > The second, I disagree with solution in kernel for user space application which can't > > > > > handle the netlink errors. > > > > > > > > There is no guarantee delivery nor blocking on send. Like I mentioned above, > > > > it is a 1 shot retry with a set wait time. The code obviousely handles error > > > > condition as it can happen. > > > > > > So, can you please refresh our memory and explain again what exactly > > > this patch is fixing if user-space handles errors correctly? > > > > You assertion of user-space flaw is incorrect. It is based on incorrect > > conclusion of deadlock and blocking. > > > > There is no deadlock and no blocking, period. > > That question doesn't assume anything. I'm still asking what exactly are > you fixing? > > > > > > So what's the issue with taking this patch instead of the revert? > > It doesn't fix anything broken in kernel. > > > > > Chien > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/iwpm_msg.c b/drivers/infiniband/core/iwpm_msg.c index a0e7c16..add99b9 100644 --- a/drivers/infiniband/core/iwpm_msg.c +++ b/drivers/infiniband/core/iwpm_msg.c @@ -174,7 +174,7 @@ int iwpm_add_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client) goto add_mapping_error; nlmsg_request->req_buffer = pm_msg; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = ibnl_unicast_wait(skb, nlh, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ iwpm_user_pid = IWPM_PID_UNDEFINED; @@ -251,7 +251,7 @@ int iwpm_add_and_query_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client) goto query_mapping_error; nlmsg_request->req_buffer = pm_msg; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = ibnl_unicast_wait(skb, nlh, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ err_str = "Unable to send a nlmsg"; @@ -312,7 +312,7 @@ int iwpm_remove_mapping(struct sockaddr_storage *local_addr, u8 nl_client) if (ret) goto remove_mapping_error; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = ibnl_unicast_wait(skb, nlh, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ iwpm_user_pid = IWPM_PID_UNDEFINED; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 94931c4..0fc50e1 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -232,11 +232,21 @@ int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh, { int err; - err = netlink_unicast(nls, skb, pid, 0); + err = netlink_unicast(nls, skb, pid, MSG_DONTWAIT); return (err < 0) ? err : 0; } EXPORT_SYMBOL(ibnl_unicast); +int ibnl_unicast_wait(struct sk_buff *skb, struct nlmsghdr *nlh, + __u32 pid) +{ + int err; + + err = netlink_unicast(nls, skb, pid, 0); + return (err < 0) ? err : 0; +} +EXPORT_SYMBOL(ibnl_unicast_wait); + int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh, unsigned int group, gfp_t flags) { diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 348c102..5b14667 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -64,6 +64,16 @@ int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh, __u32 pid); /** + * Send, with wait/1 retry, the supplied skb to a specific userspace PID. + * @skb: The netlink skb + * @nlh: Header of the netlink message to send + * @pid: Userspace netlink process ID + * Returns 0 on success or a negative error code. + */ +int ibnl_unicast_wait(struct sk_buff *skb, struct nlmsghdr *nlh, + __u32 pid); + +/** * Send the supplied skb to a netlink group. * @skb: The netlink skb * @nlh: Header of the netlink message to send