
[PATCHv2,1/1] net: rds: add service level support in rds-info

Message ID 1566608656-30836-1-git-send-email-yanjun.zhu@oracle.com (mailing list archive)
State Not Applicable
Series [PATCHv2,1/1] net: rds: add service level support in rds-info

Commit Message

Zhu Yanjun Aug. 24, 2019, 1:04 a.m. UTC
Per IB specification 7.6.5 SERVICE LEVEL, the Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.

Before this commit, running "rds-info -I" gives the output
below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"

The commit fe3475af3bdf ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
        __u32           rdma_mr_max;
        __u32           rdma_mr_size;
        __u8            tos;
        __u32           cache_allocs;
 };
The corresponding struct rds_info_rdma_connection in rds-tools is as
below:
struct rds_info_rdma_connection {
...
        uint32_t        rdma_mr_max;
        uint32_t        rdma_mr_size;
        uint8_t         tos;
        uint8_t         sl;
        uint32_t        cache_allocs;
};
The only difference between the userspace and kernel definitions is the
member sl, which is missing from the kernel struct. Because the two
layouts differ, userspace may misinterpret the data the kernel returns.
This commit adds sl to the kernel struct to remove that risk.

Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
V1->V2: fix typos in commit logs.
---
 include/uapi/linux/rds.h |    2 ++
 net/rds/ib.c             |   16 ++++++++++------
 net/rds/ib.h             |    1 +
 net/rds/ib_cm.c          |    3 +++
 net/rds/rdma_transport.c |   10 ++++++++--
 5 files changed, 24 insertions(+), 8 deletions(-)
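
As a rough illustration of the mismatch described in the commit message
above, the sketch below compares the two layouts at compile time. The
struct names (kernel_tail, tools_tail and their packed variants) are
hypothetical stand-ins that model only the last few members quoted in
the commit message. Whether the real header is packed is not assumed
here, so both cases are shown. The checks assume a GCC/Clang-style
compiler and an ABI where uint32_t has an alignment of at least 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tail of struct rds_info_rdma_connection.
 * Only members quoted in the commit message are modelled. */
struct kernel_tail {		/* kernel layout before this patch */
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint32_t cache_allocs;
};

struct tools_tail {		/* rds-tools layout, which already has sl */
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint8_t  sl;
	uint32_t cache_allocs;
};

/* The same two layouts under the assumption of a packed definition. */
struct kernel_tail_packed {
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint32_t cache_allocs;
} __attribute__((packed));

struct tools_tail_packed {
	uint32_t rdma_mr_size;
	uint8_t  tos;
	uint8_t  sl;
	uint32_t cache_allocs;
} __attribute__((packed));

/* Natural alignment: cache_allocs sits at the same offset in both
 * layouts, and sl overlays a padding byte of the kernel struct. */
static_assert(offsetof(struct kernel_tail, cache_allocs) ==
	      offsetof(struct tools_tail, cache_allocs),
	      "unpacked: cache_allocs lines up");

/* Packed: cache_allocs moves by one byte, so a reader built against
 * the rds-tools layout would decode it from the wrong position. */
static_assert(offsetof(struct tools_tail_packed, cache_allocs) ==
	      offsetof(struct kernel_tail_packed, cache_allocs) + 1,
	      "packed: cache_allocs shifts by one byte");
```

In the unpacked case the only visible symptom would be an sl value read
from a byte the kernel never fills in, which is consistent with the
"SL 0" column in the "before" output above.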

Comments

Santosh Shilimkar Aug. 24, 2019, 1:25 a.m. UTC | #1
On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>  From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
> is used to identify different flows within an IBA subnet.
> It is carried in the local route header of the packet.
> 
> Before this commit, run "rds-info -I". The outputs are as
> below:
> "
> RDS IB Connections:
>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
> 192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> "
> After this commit, the output is as below:
> "
> RDS IB Connections:
>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
> 192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
> "
> 
> The commit fe3475af3bdf ("net: rds: add per rds connection cache
> statistics") adds cache_allocs in struct rds_info_rdma_connection
> as below:
> struct rds_info_rdma_connection {
> ...
>          __u32           rdma_mr_max;
>          __u32           rdma_mr_size;
>          __u8            tos;
>          __u32           cache_allocs;
>   };
> The peer struct in rds-tools of struct rds_info_rdma_connection is as
> below:
> struct rds_info_rdma_connection {
> ...
>          uint32_t        rdma_mr_max;
>          uint32_t        rdma_mr_size;
>          uint8_t         tos;
>          uint8_t         sl;
>          uint32_t        cache_allocs;
> };
> The difference between userspace and kernel is the member variable sl.
> In the kernel struct, the member variable sl is missing. This will
> introduce risks. So it is necessary to use this commit to avoid this risk.
> 
> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
> CC: Joe Jin <joe.jin@oracle.com>
> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> V1->V2: fix typos in commit logs.
> ---
When you posted the patch, I did ask whether you had done backward
compatibility tests, to which you replied that you had done all the
tests and said "So do not worry about backward compatibility.  This
commit will work well with older rds-tools2.0.5 and 2.0.6."

https://www.spinics.net/lists/netdev/msg574691.html

I was worried about exactly the kind of issue described in this commit.

Anyway, thanks for the fixup patch. It should be applied to stable
as well.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Regards,
Santosh
Zhu Yanjun Aug. 24, 2019, 1:36 a.m. UTC | #2
On 2019/8/24 9:25, santosh.shilimkar@oracle.com wrote:
> On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>>  From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
>> is used to identify different flows within an IBA subnet.
>> It is carried in the local route header of the packet.
>>
>> Before this commit, run "rds-info -I". The outputs are as
>> below:
>> "
>> RDS IB Connections:
>>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
>> 192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>> After this commit, the output is as below:
>> "
>> RDS IB Connections:
>>   LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
>> 192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>>
>> The commit fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics") adds cache_allocs in struct rds_info_rdma_connection
>> as below:
>> struct rds_info_rdma_connection {
>> ...
>>          __u32           rdma_mr_max;
>>          __u32           rdma_mr_size;
>>          __u8            tos;
>>          __u32           cache_allocs;
>>   };
>> The peer struct in rds-tools of struct rds_info_rdma_connection is as
>> below:
>> struct rds_info_rdma_connection {
>> ...
>>          uint32_t        rdma_mr_max;
>>          uint32_t        rdma_mr_size;
>>          uint8_t         tos;
>>          uint8_t         sl;
>>          uint32_t        cache_allocs;
>> };
>> The difference between userspace and kernel is the member variable sl.
>> In the kernel struct, the member variable sl is missing. This will
>> introduce risks. So it is necessary to use this commit to avoid this 
>> risk.
>>
>> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache 
>> statistics")
>> CC: Joe Jin <joe.jin@oracle.com>
>> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
>> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> ---
>> V1->V2: fix typos in commit logs.
>> ---
> I did ask you when ypu posted the patch about whether you did
> backward compatibility tests for which you said, you did all the
> tests and said "So do not worry about backward compatibility. This
> commit will work well with older rds-tools2.0.5 and 2.0.6."
>
> https://www.spinics.net/lists/netdev/msg574691.html
>
> I was worried about exactly such issue as described in commit.

Sorry, my bad. I will do more work to make RDS robust.

Thanks a lot for your Ack.

Zhu Yanjun

>
> Anyways thanks for the fixup patch. Should be applied to stable
> as well.
>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Regards,
> Santosh
>
>
David Miller Aug. 24, 2019, 11:58 p.m. UTC | #3
From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Fri, 23 Aug 2019 21:04:16 -0400

> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
> index fd6b5f6..cba368e 100644
> --- a/include/uapi/linux/rds.h
> +++ b/include/uapi/linux/rds.h
> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>  	__u32		rdma_mr_max;
>  	__u32		rdma_mr_size;
>  	__u8		tos;
> +	__u8		sl;
>  	__u32		cache_allocs;
>  };

I'm applying this, but I am once again severely disappointed in how
RDS development is being handled.

From the Fixes: commit:

	Since rds.h in rds-tools is not related with the kernel rds.h,
	the change in kernel rds.h does not affect rds-tools.

This is the height of arrogance and shows a lack of understanding of
what user ABI requirements are all about.

It is possible for other userland components to be built by other
people, outside of your controlled eco-system and tools, that use
these interfaces.

And you cannot control that.

Therefore you cannot make arbitrary changes to UABI data structures
just because the tool you use and maintain is not affected by it.

Please stop making these incredibly incompatible user interface
changes in the RDS stack.

I am, from this point forward, going to be extra strict on RDS stack
changes especially in this area.
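
One commonly used way to leave room for later fields in a userspace-visible
struct is to spell out padding as reserved bytes and pin the layout with
compile-time assertions. The sketch below shows that idea only. The names
demo_info_v1, demo_info_v2 and demo_upgrade are hypothetical, this is not
the real rds.h definition, and nothing like it was proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, version 1: padding is explicit and zero-filled. */
struct demo_info_v1 {
	uint32_t mr_size;
	uint8_t  tos;
	uint8_t  reserved[3];	/* must be zero */
	uint32_t cache_allocs;
};

/* Version 2 turns one reserved byte into a field; nothing else moves. */
struct demo_info_v2 {
	uint32_t mr_size;
	uint8_t  tos;
	uint8_t  sl;
	uint8_t  reserved[2];	/* must be zero */
	uint32_t cache_allocs;
};

/* Pin the parts of the layout that existing binaries depend on. */
static_assert(sizeof(struct demo_info_v2) == sizeof(struct demo_info_v1),
	      "record size must not change");
static_assert(offsetof(struct demo_info_v2, cache_allocs) ==
	      offsetof(struct demo_info_v1, cache_allocs),
	      "existing members must not move");
static_assert(offsetof(struct demo_info_v2, sl) ==
	      offsetof(struct demo_info_v1, reserved),
	      "sl reuses the first reserved byte");

/* Decode a v1 record with the v2 layout, as a newer reader would. */
struct demo_info_v2 demo_upgrade(const struct demo_info_v1 *old)
{
	struct demo_info_v2 out;

	memcpy(&out, old, sizeof(out));
	return out;
}
```

With this arrangement a newer reader decoding an older record sees sl as
zero rather than an undefined byte, and every pre-existing member keeps
its offset.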
Zhu Yanjun Aug. 25, 2019, 2:11 p.m. UTC | #4
On 2019/8/25 7:58, David Miller wrote:
> From: Zhu Yanjun <yanjun.zhu@oracle.com>
> Date: Fri, 23 Aug 2019 21:04:16 -0400
>
>> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
>> index fd6b5f6..cba368e 100644
>> --- a/include/uapi/linux/rds.h
>> +++ b/include/uapi/linux/rds.h
>> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>>   	__u32		rdma_mr_max;
>>   	__u32		rdma_mr_size;
>>   	__u8		tos;
>> +	__u8		sl;
>>   	__u32		cache_allocs;
>>   };
> I'm applying this, but I am once again severely disappointed in how
> RDS development is being handled.
>
> >From the Fixes: commit:
>
> 	Since rds.h in rds-tools is not related with the kernel rds.h,
> 	the change in kernel rds.h does not affect rds-tools.
>
> This is the height of arrogance and shows a lack of understanding of
> what user ABI requirements are all about.
>
> It is possible for other userland components to be built by other
> people, outside of your controlled eco-system and tools, that use
> these interfaces.
>
> And you cannot control that.
>
> Therefore you cannot make arbitrary changes to UABI data strucures
> just because the tool you use and maintain is not effected by it.
>
> Please stop making these incredibly incompatible user interface
> changes in the RDS stack.
>
> I am, from this point forward, going to be extra strict on RDS stack
> changes especially in this area.

OK. It is up to you to decide whether or not to merge this commit.

Zhu Yanjun

>
>
Gustavo A. R. Silva Sept. 3, 2019, 1:58 a.m. UTC | #5
Hi,

On 8/23/19 8:04 PM, Zhu Yanjun wrote:

[..]

> diff --git a/net/rds/ib.c b/net/rds/ib.c
> index ec05d91..45acab2 100644
> --- a/net/rds/ib.c
> +++ b/net/rds/ib.c
> @@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>  				    void *buffer)
>  {
>  	struct rds_info_rdma_connection *iinfo = buffer;
> -	struct rds_ib_connection *ic;
> +	struct rds_ib_connection *ic = conn->c_transport_data;
>  
>  	/* We will only ever look at IB transports */
>  	if (conn->c_trans != &rds_ib_transport)
> @@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>  
>  	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
>  	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
> -	iinfo->tos = conn->c_tos;
> +	if (ic) {

Is this null-check actually necessary? (see related comments below...)

> +		iinfo->tos = conn->c_tos;
> +		iinfo->sl = ic->i_sl;
> +	}
>  
>  	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
>  	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>  		struct rds_ib_device *rds_ibdev;
>  
> -		ic = conn->c_transport_data;
> -
>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,

Notice that *ic* is dereferenced here without null-checking it. More
comments below...

>  			       (union ib_gid *)&iinfo->dst_gid);
>  
> @@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>  				     void *buffer)
>  {
>  	struct rds6_info_rdma_connection *iinfo6 = buffer;
> -	struct rds_ib_connection *ic;
> +	struct rds_ib_connection *ic = conn->c_transport_data;
>  
>  	/* We will only ever look at IB transports */
>  	if (conn->c_trans != &rds_ib_transport)
> @@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>  
>  	iinfo6->src_addr = conn->c_laddr;
>  	iinfo6->dst_addr = conn->c_faddr;
> +	if (ic) {
> +		iinfo6->tos = conn->c_tos;
> +		iinfo6->sl = ic->i_sl;
> +	}
>  
>  	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
>  	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
> @@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>  	if (rds_conn_state(conn) == RDS_CONN_UP) {
>  		struct rds_ib_device *rds_ibdev;
>  
> -		ic = conn->c_transport_data;
>  		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,

Again, *ic* is being dereferenced here without a previous null-check.

>  			       (union ib_gid *)&iinfo6->dst_gid);
>  		rds_ibdev = ic->rds_ibdev;


--
Gustavo
Zhu Yanjun Sept. 4, 2019, 5:08 a.m. UTC | #6
On 2019/9/3 9:58, Gustavo A. R. Silva wrote:
> Hi,
>
> On 8/23/19 8:04 PM, Zhu Yanjun wrote:
>
> [..]
>
>> diff --git a/net/rds/ib.c b/net/rds/ib.c
>> index ec05d91..45acab2 100644
>> --- a/net/rds/ib.c
>> +++ b/net/rds/ib.c
>> @@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>>   				    void *buffer)
>>   {
>>   	struct rds_info_rdma_connection *iinfo = buffer;
>> -	struct rds_ib_connection *ic;
>> +	struct rds_ib_connection *ic = conn->c_transport_data;
>>   
>>   	/* We will only ever look at IB transports */
>>   	if (conn->c_trans != &rds_ib_transport)
>> @@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
>>   
>>   	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
>>   	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
>> -	iinfo->tos = conn->c_tos;
>> +	if (ic) {
> Is this null-check actually necessary? (see related comments below...)
>
>> +		iinfo->tos = conn->c_tos;
>> +		iinfo->sl = ic->i_sl;
>> +	}
>>   
>>   	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
>>   	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
>>   	if (rds_conn_state(conn) == RDS_CONN_UP) {
>>   		struct rds_ib_device *rds_ibdev;
>>   
>> -		ic = conn->c_transport_data;
>> -
>>   		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
> Notice that *ic* is dereferenced here without null-checking it. More
> comments below...
>
>>   			       (union ib_gid *)&iinfo->dst_gid);
>>   
>> @@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>   				     void *buffer)
>>   {
>>   	struct rds6_info_rdma_connection *iinfo6 = buffer;
>> -	struct rds_ib_connection *ic;
>> +	struct rds_ib_connection *ic = conn->c_transport_data;
>>   
>>   	/* We will only ever look at IB transports */
>>   	if (conn->c_trans != &rds_ib_transport)
>> @@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>   
>>   	iinfo6->src_addr = conn->c_laddr;
>>   	iinfo6->dst_addr = conn->c_faddr;
>> +	if (ic) {
>> +		iinfo6->tos = conn->c_tos;
>> +		iinfo6->sl = ic->i_sl;
>> +	}
>>   
>>   	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
>>   	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
>> @@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
>>   	if (rds_conn_state(conn) == RDS_CONN_UP) {
>>   		struct rds_ib_device *rds_ibdev;
>>   
>> -		ic = conn->c_transport_data;
>>   		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,
> Again, *ic* is being dereferenced here without a previous null-check.

Please check when "rds_conn_state(conn) == RDS_CONN_UP" holds.

Thanks a lot.

Zhu Yanjun
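
The exchange above turns on an invariant: the pointer is null-checked
before the tos/sl assignment but dereferenced without a check once the
connection is in the UP state, on the understanding that UP implies the
transport data is set. A minimal sketch of that pattern, using
hypothetical stand-in types (demo_conn, demo_transport, demo_info) rather
than the real RDS structures, makes the invariant explicit instead of
implicit. It is an illustration, not a proposed change to net/rds/ib.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_state { DEMO_CONN_DOWN, DEMO_CONN_CONNECTING, DEMO_CONN_UP };

/* Hypothetical stand-in for the per-transport connection data. */
struct demo_transport {
	uint8_t  sl;
	uint32_t cm_id;
};

/* Hypothetical stand-in for the generic connection. */
struct demo_conn {
	enum demo_state state;
	uint8_t tos;
	struct demo_transport *transport_data;	/* may be NULL before setup */
};

struct demo_info {
	uint8_t  tos;
	uint8_t  sl;
	uint32_t cm_id;
};

/* Fill info from conn. Returns 0 on success, or -1 if the connection
 * claims to be UP but has no transport data (invariant violated). */
int demo_fill_info(const struct demo_conn *conn, struct demo_info *info)
{
	const struct demo_transport *t;

	assert(conn != NULL && info != NULL);
	t = conn->transport_data;

	/* Start from a known state so unset fields are never garbage. */
	info->tos = 0;
	info->sl = 0;
	info->cm_id = 0;

	if (t) {
		info->tos = conn->tos;
		info->sl = t->sl;
	}

	if (conn->state == DEMO_CONN_UP) {
		if (!t)
			return -1;	/* UP is expected to imply t != NULL */
		info->cm_id = t->cm_id;
	}
	return 0;
}
```

If the invariant really does hold in the kernel code, the first check is
redundant for UP connections but still matters for connections that are
visited before their transport data has been set up.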

>
>>   			       (union ib_gid *)&iinfo6->dst_gid);
>>   		rds_ibdev = ic->rds_ibdev;
>
> --
> Gustavo
>
Patch

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index fd6b5f6..cba368e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@  struct rds_info_rdma_connection {
 	__u32		rdma_mr_max;
 	__u32		rdma_mr_size;
 	__u8		tos;
+	__u8		sl;
 	__u32		cache_allocs;
 };
 
@@ -265,6 +266,7 @@  struct rds6_info_rdma_connection {
 	__u32		rdma_mr_max;
 	__u32		rdma_mr_size;
 	__u8		tos;
+	__u8		sl;
 	__u32		cache_allocs;
 };
 
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ec05d91..45acab2 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -291,7 +291,7 @@  static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 				    void *buffer)
 {
 	struct rds_info_rdma_connection *iinfo = buffer;
-	struct rds_ib_connection *ic;
+	struct rds_ib_connection *ic = conn->c_transport_data;
 
 	/* We will only ever look at IB transports */
 	if (conn->c_trans != &rds_ib_transport)
@@ -301,15 +301,16 @@  static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 
 	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
 	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
-	iinfo->tos = conn->c_tos;
+	if (ic) {
+		iinfo->tos = conn->c_tos;
+		iinfo->sl = ic->i_sl;
+	}
 
 	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
 	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
 	if (rds_conn_state(conn) == RDS_CONN_UP) {
 		struct rds_ib_device *rds_ibdev;
 
-		ic = conn->c_transport_data;
-
 		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
 			       (union ib_gid *)&iinfo->dst_gid);
 
@@ -329,7 +330,7 @@  static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 				     void *buffer)
 {
 	struct rds6_info_rdma_connection *iinfo6 = buffer;
-	struct rds_ib_connection *ic;
+	struct rds_ib_connection *ic = conn->c_transport_data;
 
 	/* We will only ever look at IB transports */
 	if (conn->c_trans != &rds_ib_transport)
@@ -337,6 +338,10 @@  static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 
 	iinfo6->src_addr = conn->c_laddr;
 	iinfo6->dst_addr = conn->c_faddr;
+	if (ic) {
+		iinfo6->tos = conn->c_tos;
+		iinfo6->sl = ic->i_sl;
+	}
 
 	memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
 	memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
@@ -344,7 +349,6 @@  static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
 	if (rds_conn_state(conn) == RDS_CONN_UP) {
 		struct rds_ib_device *rds_ibdev;
 
-		ic = conn->c_transport_data;
 		rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,
 			       (union ib_gid *)&iinfo6->dst_gid);
 		rds_ibdev = ic->rds_ibdev;
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 303c6ee..f2b558e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -220,6 +220,7 @@  struct rds_ib_connection {
 	/* Send/Recv vectors */
 	int			i_scq_vector;
 	int			i_rcq_vector;
+	u8			i_sl;
 };
 
 /* This assumes that atomic_t is at least 32 bits */
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index fddaa09..233f136 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -152,6 +152,9 @@  void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 		  RDS_PROTOCOL_MINOR(conn->c_version),
 		  ic->i_flowctl ? ", flow control" : "");
 
+	/* receive sl from the peer */
+	ic->i_sl = ic->i_cm_id->route.path_rec->sl;
+
 	atomic_set(&ic->i_cq_quiesce, 0);
 
 	/* Init rings and fill recv. this needs to wait until protocol
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index ff74c4b..28668ad 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -43,6 +43,9 @@ 
 static struct rdma_cm_id *rds6_rdma_listen_id;
 #endif
 
+/* Per IB specification 7.7.3, service level is a 4-bit field. */
+#define TOS_TO_SL(tos)		((tos) & 0xF)
+
 static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 					 struct rdma_cm_event *event,
 					 bool isv6)
@@ -97,10 +100,13 @@  static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 			struct rds_ib_connection *ibic;
 
 			ibic = conn->c_transport_data;
-			if (ibic && ibic->i_cm_id == cm_id)
+			if (ibic && ibic->i_cm_id == cm_id) {
+				cm_id->route.path_rec[0].sl =
+					TOS_TO_SL(conn->c_tos);
 				ret = trans->cm_initiate_connect(cm_id, isv6);
-			else
+			} else {
 				rds_conn_drop(conn);
+			}
 		}
 		break;
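
For reference, the TOS_TO_SL() macro added in rdma_transport.c above keeps
only the low four bits of the ToS value. The short sketch below repeats
the macro as it appears in the patch and wraps it in a hypothetical helper,
demo_tos_to_sl(), purely to show the effect of the mask on a few values.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in the patch: the service level is a 4-bit field. */
#define TOS_TO_SL(tos)		((tos) & 0xF)

/* Compile-time check: bits above the low nibble are discarded. */
static_assert(TOS_TO_SL(0x12) == 0x2, "upper bits are dropped");

/* Hypothetical helper, for illustration only. */
uint8_t demo_tos_to_sl(uint8_t tos)
{
	return (uint8_t)TOS_TO_SL(tos);
}
```

ToS values 0, 1 and 2 map to SL 0, 1 and 2, matching the "after" output in
the commit message, while any ToS value above 15 is folded back into the
0 to 15 range.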