diff mbox

RDS: sync congestion map updating

Message ID 56FED04C.2060806@oracle.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Santosh Shilimkar April 1, 2016, 7:47 p.m. UTC
(cc-ing netdev)
On 3/30/2016 7:59 PM, Wengang Wang wrote:
>
>
> ? 2016?03?31? 09:51, Wengang Wang ??:
>>
>>
>> ? 2016?03?31? 01:16, santosh shilimkar ??:
>>> Hi Wengang,
>>>
>>> On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
>>>> On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
>>>>> Problem is found that some among a lot of parallel RDS
>>>>> communications hang.
>>>>> In my test ten or so among 33 communications hang. The send
>>>>> requests got
>>>>> -ENOBUF error meaning the peer socket (port) is congested. But
>>>>> meanwhile,
>>>>> peer socket (port) is not congested.
>>>>>
>>>>> The congestion map updating can happen in two paths: one is in
>>>>> rds_recvmsg path
>>>>> and the other is when it receives packets from the hardware. There
>>>>> is no
>>>>> synchronization when updating the congestion map. So a bit
>>>>> operation (clearing)
>>>>> in the rds_recvmsg path can be skipped by another bit operation
>>>>> (setting) in
>>>>> hardware packet receving path.
>>>>>
>
> To be more detailed.  Here, the two paths (user calls recvmsg and
> hardware receives data) are for different rds socks. thus the
> rds_sock->rs_recv_lock is not helpful to sync the updating on congestion
> map.
>
For archive purpose, let me try to conclude the thread. I synced
with Wengang offlist and came up with below fix. I was under
impression that __set_bit_le() was atmoic version. After fixing
it like patch(end of the email), the bug gets addressed.

I will probably send this as fix for stable as well.


 From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Date: Wed, 30 Mar 2016 23:26:47 -0700
Subject: [PATCH] RDS: Fix the atomicity for congestion map update

Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Lets use atomics to avoid such an issue.

Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
to offending code with spin lock based fix.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
  net/rds/cong.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

  void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port)
@@ -313,7 +313,7 @@ void rds_cong_clear_bit(struct rds_cong_map *map, 
__be16 port)
  	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
  	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;

-	__clear_bit_le(off, (void *)map->m_page_addrs[i]);
+	clear_bit_le(off, (void *)map->m_page_addrs[i]);
  }

  static int rds_cong_test_bit(struct rds_cong_map *map, __be16 port)

Comments

Leon Romanovsky April 2, 2016, 1:14 a.m. UTC | #1
On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
> (cc-ing netdev)
> On 3/30/2016 7:59 PM, Wengang Wang wrote:
> >
> >
> >? 2016?03?31? 09:51, Wengang Wang ??:
> >>
> >>
> >>? 2016?03?31? 01:16, santosh shilimkar ??:
> >>>Hi Wengang,
> >>>
> >>>On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
> >>>>On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
> >>>>>Problem is found that some among a lot of parallel RDS
> >>>>>communications hang.
> >>>>>In my test ten or so among 33 communications hang. The send
> >>>>>requests got
> >>>>>-ENOBUF error meaning the peer socket (port) is congested. But
> >>>>>meanwhile,
> >>>>>peer socket (port) is not congested.
> >>>>>
> >>>>>The congestion map updating can happen in two paths: one is in
> >>>>>rds_recvmsg path
> >>>>>and the other is when it receives packets from the hardware. There
> >>>>>is no
> >>>>>synchronization when updating the congestion map. So a bit
> >>>>>operation (clearing)
> >>>>>in the rds_recvmsg path can be skipped by another bit operation
> >>>>>(setting) in
> >>>>>hardware packet receving path.
> >>>>>
> >
> >To be more detailed.  Here, the two paths (user calls recvmsg and
> >hardware receives data) are for different rds socks. thus the
> >rds_sock->rs_recv_lock is not helpful to sync the updating on congestion
> >map.
> >
> For archive purpose, let me try to conclude the thread. I synced
> with Wengang offlist and came up with below fix. I was under
> impression that __set_bit_le() was atmoic version. After fixing
> it like patch(end of the email), the bug gets addressed.
> 
> I will probably send this as fix for stable as well.
> 
> 
> From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Date: Wed, 30 Mar 2016 23:26:47 -0700
> Subject: [PATCH] RDS: Fix the atomicity for congestion map update
> 
> Two different threads with different rds sockets may be in
> rds_recv_rcvbuf_delta() via receive path. If their ports
> both map to the same word in the congestion map, then
> using non-atomic ops to update it could cause the map to
> be incorrect. Lets use atomics to avoid such an issue.
> 
> Full credit to Wengang <wen.gang.wang@oracle.com> for
> finding the issue, analysing it and also pointing out
> to offending code with spin lock based fix.

I'm glad that you solved the issue without spinlocks.
Out of curiosity, I see that this patch is needed to be sent
to Dave and applied by him. Is it right?

?  linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
Santosh Shilimkar <santosh.shilimkar@oracle.com> (supporter:RDS -
RELIABLE DATAGRAM SOCKETS)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING
[GENERAL])
netdev@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-rdma@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
rds-devel@oss.oracle.com (moderated list:RDS - RELIABLE DATAGRAM
SOCKETS)
linux-kernel@vger.kernel.org (open list)

> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Reviewed-by: Leon Romanovsky <leon@leon.nu>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Santosh Shilimkar April 2, 2016, 4:30 a.m. UTC | #2
On 4/1/16 6:14 PM, Leon Romanovsky wrote:
> On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
>> (cc-ing netdev)
>> On 3/30/2016 7:59 PM, Wengang Wang wrote:
>>>
>>>
>>> ? 2016?03?31? 09:51, Wengang Wang ??:
>>>>
>>>>
>>>> ? 2016?03?31? 01:16, santosh shilimkar ??:
>>>>> Hi Wengang,
>>>>>
>>>>> On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
>>>>>> On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
>>>>>>> Problem is found that some among a lot of parallel RDS
>>>>>>> communications hang.
>>>>>>> In my test ten or so among 33 communications hang. The send
>>>>>>> requests got
>>>>>>> -ENOBUF error meaning the peer socket (port) is congested. But
>>>>>>> meanwhile,
>>>>>>> peer socket (port) is not congested.
>>>>>>>
>>>>>>> The congestion map updating can happen in two paths: one is in
>>>>>>> rds_recvmsg path
>>>>>>> and the other is when it receives packets from the hardware. There
>>>>>>> is no
>>>>>>> synchronization when updating the congestion map. So a bit
>>>>>>> operation (clearing)
>>>>>>> in the rds_recvmsg path can be skipped by another bit operation
>>>>>>> (setting) in
>>>>>>> hardware packet receving path.
>>>>>>>
>>>
>>> To be more detailed.  Here, the two paths (user calls recvmsg and
>>> hardware receives data) are for different rds socks. thus the
>>> rds_sock->rs_recv_lock is not helpful to sync the updating on congestion
>>> map.
>>>
>> For archive purpose, let me try to conclude the thread. I synced
>> with Wengang offlist and came up with below fix. I was under
>> impression that __set_bit_le() was atmoic version. After fixing
>> it like patch(end of the email), the bug gets addressed.
>>
>> I will probably send this as fix for stable as well.
>>
>>
>>  From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
>> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> Date: Wed, 30 Mar 2016 23:26:47 -0700
>> Subject: [PATCH] RDS: Fix the atomicity for congestion map update
>>
>> Two different threads with different rds sockets may be in
>> rds_recv_rcvbuf_delta() via receive path. If their ports
>> both map to the same word in the congestion map, then
>> using non-atomic ops to update it could cause the map to
>> be incorrect. Lets use atomics to avoid such an issue.
>>
>> Full credit to Wengang <wen.gang.wang@oracle.com> for
>> finding the issue, analysing it and also pointing out
>> to offending code with spin lock based fix.
>
> I'm glad that you solved the issue without spinlocks.
> Out of curiosity, I see that this patch is needed to be sent
> to Dave and applied by him. Is it right?
>
Right. I was planning send this one along with one more fix
together on netdev for Dave to pick it up.

> ?  linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
> Santosh Shilimkar <santosh.shilimkar@oracle.com> (supporter:RDS -
> RELIABLE DATAGRAM SOCKETS)
> "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING
> [GENERAL])
> netdev@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
> linux-rdma@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
> rds-devel@oss.oracle.com (moderated list:RDS - RELIABLE DATAGRAM
> SOCKETS)
> linux-kernel@vger.kernel.org (open list)
>
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Reviewed-by: Leon Romanovsky <leon@leon.nu>
>
Thanks for review.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/rds/cong.c b/net/rds/cong.c
index e6144b8..6641bcf 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -299,7 +299,7 @@  void rds_cong_set_bit(struct rds_cong_map *map, 
__be16 port)
  	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
  	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;

-	__set_bit_le(off, (void *)map->m_page_addrs[i]);
+	set_bit_le(off, (void *)map->m_page_addrs[i]);
  }