diff mbox series

[for-rc] RDMA/siw: add sendpage_ok() check to disable MSG_SPLICE_PAGES

Message ID 20241003124611.35060-1-showrya@chelsio.com (mailing list archive)
State Superseded
Headers show
Series [for-rc] RDMA/siw: add sendpage_ok() check to disable MSG_SPLICE_PAGES | expand

Commit Message

Showrya M N Oct. 3, 2024, 12:46 p.m. UTC
While running ISER over SIW, the initiator machine encounters a warning
from skb_splice_from_iter() indicating that a slab page is being used
in send_page. To address this, it is better to add a sendpage_ok() check
within the driver itself, and if it returns 0, then MSG_SPLICE_PAGES flag
should be disabled before entering the network stack.

A similar issue has been discussed for NVMe in this thread:
https://lore.kernel.org/all/20240530142417.146696-1-ofir.gal@volumez.com/

stack trace:
...
[ 2157.532917] WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140 skb_splice_from_iter+0x173/0x320
Call Trace:
[ 2157.533064] Call Trace:
[ 2157.533069]  ? __warn+0x84/0x130
[ 2157.533073]  ? skb_splice_from_iter+0x173/0x320
[ 2157.533075]  ? report_bug+0xfc/0x1e0
[ 2157.533081]  ? handle_bug+0x3f/0x70
[ 2157.533085]  ? exc_invalid_op+0x17/0x70
[ 2157.533088]  ? asm_exc_invalid_op+0x1a/0x20
[ 2157.533096]  ? skb_splice_from_iter+0x173/0x320
[ 2157.533101]  tcp_sendmsg_locked+0x368/0xe40
[ 2157.533111]  siw_tx_hdt+0x695/0xa40 [siw]
[ 2157.533134]  ? sched_balance_find_src_group+0x44/0x4f0
[ 2157.533143]  ? __update_load_avg_cfs_rq+0x272/0x300
[ 2157.533152]  ? place_entity+0x19/0xf0
[ 2157.533157]  ? enqueue_entity+0xdb/0x3d0
[ 2157.533162]  ? pick_eevdf+0xe2/0x120
[ 2157.533169]  ? check_preempt_wakeup_fair+0x161/0x1f0
[ 2157.533174]  ? wakeup_preempt+0x61/0x70
[ 2157.533177]  ? ttwu_do_activate+0x5d/0x1e0
[ 2157.533183]  ? try_to_wake_up+0x78/0x610
[ 2157.533188]  ? xas_load+0xd/0xb0
[ 2157.533193]  ? xa_load+0x80/0xb0
[ 2157.533200]  siw_qp_sq_process+0x102/0xb00 [siw]
[ 2157.533213]  ? __pfx_siw_run_sq+0x10/0x10 [siw]
[ 2157.533224]  siw_sq_resume+0x39/0x110 [siw]
[ 2157.533236]  siw_run_sq+0x74/0x160 [siw]
[ 2157.533246]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 2157.533252]  kthread+0xd2/0x100
[ 2157.533257]  ? __pfx_kthread+0x10/0x10
[ 2157.533261]  ret_from_fork+0x34/0x40
[ 2157.533266]  ? __pfx_kthread+0x10/0x10
[ 2157.533269]  ret_from_fork_asm+0x1a/0x30
.
[ 2157.533301] iser: iser_qp_event_callback: qp event QP request error (2)
[ 2157.533307] iser: iser_qp_event_callback: qp event send queue drained (5)
[ 2157.533348]  connection26:0: detected conn error (1011)

Signed-off-by: Showrya M N <showrya@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
---
 drivers/infiniband/sw/siw/siw_qp_tx.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bernard Metzler Oct. 3, 2024, 2:25 p.m. UTC | #1
> -----Original Message-----
> From: Showrya M N <showrya@chelsio.com>
> Sent: Thursday, October 3, 2024 2:46 PM
> To: jgg@nvidia.com; leonro@nvidia.com; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: linux-rdma@vger.kernel.org; Showrya M N <showrya@chelsio.com>; Potnuri
> Bharat Teja <bharat@chelsio.com>
> Subject: [EXTERNAL] [PATCH for-rc] RDMA/siw: add sendpage_ok() check to
> disable MSG_SPLICE_PAGES
> 
> While running ISER over SIW, the initiator machine encounters a warning
> from skb_splice_from_iter() indicating that a slab page is being used
> in send_page. To address this, it is better to add a sendpage_ok() check
> within the driver itself, and if it returns 0, then MSG_SPLICE_PAGES flag
> should be disabled before entering the network stack.
> 
> A similar issue has been discussed for NVMe in this thread:
> INVALID URI REMOVED
> 3A__lore.kernel.org_all_20240530142417.146696-2D1-2Dofir.gal-
> 40volumez.com_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=4ynb4Sj_4MUcZXbhvovE4tYS
> bqxyOwdSiLedP4yO55g&m=5zLkbzFXCNvwiYwzyhSLC9r5GQnIt4VKawdBIinJdVEfirE1BFqwS
> 9QGbhVWOOKo&s=wxM2sRuzDoq36W23_jA5pNgeGDEPfTmyPhsvPqp0_-E&e=
> 
> stack trace:
> ...
> [ 2157.532917] WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140
> skb_splice_from_iter+0x173/0x320
> Call Trace:
> [ 2157.533064] Call Trace:
> [ 2157.533069]  ? __warn+0x84/0x130
> [ 2157.533073]  ? skb_splice_from_iter+0x173/0x320
> [ 2157.533075]  ? report_bug+0xfc/0x1e0
> [ 2157.533081]  ? handle_bug+0x3f/0x70
> [ 2157.533085]  ? exc_invalid_op+0x17/0x70
> [ 2157.533088]  ? asm_exc_invalid_op+0x1a/0x20
> [ 2157.533096]  ? skb_splice_from_iter+0x173/0x320
> [ 2157.533101]  tcp_sendmsg_locked+0x368/0xe40
> [ 2157.533111]  siw_tx_hdt+0x695/0xa40 [siw]
> [ 2157.533134]  ? sched_balance_find_src_group+0x44/0x4f0
> [ 2157.533143]  ? __update_load_avg_cfs_rq+0x272/0x300
> [ 2157.533152]  ? place_entity+0x19/0xf0
> [ 2157.533157]  ? enqueue_entity+0xdb/0x3d0
> [ 2157.533162]  ? pick_eevdf+0xe2/0x120
> [ 2157.533169]  ? check_preempt_wakeup_fair+0x161/0x1f0
> [ 2157.533174]  ? wakeup_preempt+0x61/0x70
> [ 2157.533177]  ? ttwu_do_activate+0x5d/0x1e0
> [ 2157.533183]  ? try_to_wake_up+0x78/0x610
> [ 2157.533188]  ? xas_load+0xd/0xb0
> [ 2157.533193]  ? xa_load+0x80/0xb0
> [ 2157.533200]  siw_qp_sq_process+0x102/0xb00 [siw]
> [ 2157.533213]  ? __pfx_siw_run_sq+0x10/0x10 [siw]
> [ 2157.533224]  siw_sq_resume+0x39/0x110 [siw]
> [ 2157.533236]  siw_run_sq+0x74/0x160 [siw]
> [ 2157.533246]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 2157.533252]  kthread+0xd2/0x100
> [ 2157.533257]  ? __pfx_kthread+0x10/0x10
> [ 2157.533261]  ret_from_fork+0x34/0x40
> [ 2157.533266]  ? __pfx_kthread+0x10/0x10
> [ 2157.533269]  ret_from_fork_asm+0x1a/0x30
> .
> [ 2157.533301] iser: iser_qp_event_callback: qp event QP request error (2)
> [ 2157.533307] iser: iser_qp_event_callback: qp event send queue drained
> (5)
> [ 2157.533348]  connection26:0: detected conn error (1011)
> 
> Signed-off-by: Showrya M N <showrya@chelsio.com>
> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
> ---
>  drivers/infiniband/sw/siw/siw_qp_tx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
> b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 64ad9e0895bd..d777d06037db 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -334,6 +334,9 @@ static int siw_tcp_sendpages(struct socket *s, struct
> page **page, int offset,
>  		bvec_set_page(&bvec, page[i], bytes, offset);
>  		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> 
> +		if (!sendpage_ok(page[i]))
> +			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> +

Thanks! This looks good to me. Although, I would suggest moving
this further up just before bvec_set_page(): While it
is not dong anything to the page, it looks more clean
to first alter the pages flags before linking the bvec with it.

Thanks! Bernard

>  try_page_again:
>  		lock_sock(sk);
>  		rv = tcp_sendmsg_locked(sk, &msg, size);
> --
> 2.39.1
Showrya M N Oct. 4, 2024, 6:13 a.m. UTC | #2
ok bernard, I will send a v2.

Thanks,
Showrya

-----Original Message-----
From: Bernard Metzler <BMT@zurich.ibm.com> 
Sent: 03 October 2024 19:56
To: Showrya M N <showrya@chelsio.com>; jgg@nvidia.com; leonro@nvidia.com
Cc: linux-rdma@vger.kernel.org; Potnuri Bharat Teja <bharat@chelsio.com>
Subject: RE: [PATCH for-rc] RDMA/siw: add sendpage_ok() check to disable MSG_SPLICE_PAGES



> -----Original Message-----
> From: Showrya M N <showrya@chelsio.com>
> Sent: Thursday, October 3, 2024 2:46 PM
> To: jgg@nvidia.com; leonro@nvidia.com; Bernard Metzler 
> <BMT@zurich.ibm.com>
> Cc: linux-rdma@vger.kernel.org; Showrya M N <showrya@chelsio.com>; 
> Potnuri Bharat Teja <bharat@chelsio.com>
> Subject: [EXTERNAL] [PATCH for-rc] RDMA/siw: add sendpage_ok() check 
> to disable MSG_SPLICE_PAGES
> 
> While running ISER over SIW, the initiator machine encounters a 
> warning from skb_splice_from_iter() indicating that a slab page is 
> being used in send_page. To address this, it is better to add a 
> sendpage_ok() check within the driver itself, and if it returns 0, 
> then MSG_SPLICE_PAGES flag should be disabled before entering the network stack.
> 
> A similar issue has been discussed for NVMe in this thread:
> INVALID URI REMOVED
> 3A__lore.kernel.org_all_20240530142417.146696-2D1-2Dofir.gal-
> 40volumez.com_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=4ynb4Sj_4MUcZXbhvov
> E4tYS 
> bqxyOwdSiLedP4yO55g&m=5zLkbzFXCNvwiYwzyhSLC9r5GQnIt4VKawdBIinJdVEfirE1
> BFqwS 9QGbhVWOOKo&s=wxM2sRuzDoq36W23_jA5pNgeGDEPfTmyPhsvPqp0_-E&e=
> 
> stack trace:
> ...
> [ 2157.532917] WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140
> skb_splice_from_iter+0x173/0x320
> Call Trace:
> [ 2157.533064] Call Trace:
> [ 2157.533069]  ? __warn+0x84/0x130
> [ 2157.533073]  ? skb_splice_from_iter+0x173/0x320 [ 2157.533075]  ? 
> report_bug+0xfc/0x1e0 [ 2157.533081]  ? handle_bug+0x3f/0x70 [ 
> 2157.533085]  ? exc_invalid_op+0x17/0x70 [ 2157.533088]  ? 
> asm_exc_invalid_op+0x1a/0x20 [ 2157.533096]  ? 
> skb_splice_from_iter+0x173/0x320 [ 2157.533101]  
> tcp_sendmsg_locked+0x368/0xe40 [ 2157.533111]  siw_tx_hdt+0x695/0xa40 
> [siw] [ 2157.533134]  ? sched_balance_find_src_group+0x44/0x4f0
> [ 2157.533143]  ? __update_load_avg_cfs_rq+0x272/0x300
> [ 2157.533152]  ? place_entity+0x19/0xf0 [ 2157.533157]  ? 
> enqueue_entity+0xdb/0x3d0 [ 2157.533162]  ? pick_eevdf+0xe2/0x120 [ 
> 2157.533169]  ? check_preempt_wakeup_fair+0x161/0x1f0
> [ 2157.533174]  ? wakeup_preempt+0x61/0x70 [ 2157.533177]  ? 
> ttwu_do_activate+0x5d/0x1e0 [ 2157.533183]  ? 
> try_to_wake_up+0x78/0x610 [ 2157.533188]  ? xas_load+0xd/0xb0 [ 
> 2157.533193]  ? xa_load+0x80/0xb0 [ 2157.533200]  
> siw_qp_sq_process+0x102/0xb00 [siw] [ 2157.533213]  ? 
> __pfx_siw_run_sq+0x10/0x10 [siw] [ 2157.533224]  
> siw_sq_resume+0x39/0x110 [siw] [ 2157.533236]  siw_run_sq+0x74/0x160 
> [siw] [ 2157.533246]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 2157.533252]  kthread+0xd2/0x100
> [ 2157.533257]  ? __pfx_kthread+0x10/0x10 [ 2157.533261]  
> ret_from_fork+0x34/0x40 [ 2157.533266]  ? __pfx_kthread+0x10/0x10 [ 
> 2157.533269]  ret_from_fork_asm+0x1a/0x30 .
> [ 2157.533301] iser: iser_qp_event_callback: qp event QP request error 
> (2) [ 2157.533307] iser: iser_qp_event_callback: qp event send queue 
> drained
> (5)
> [ 2157.533348]  connection26:0: detected conn error (1011)
> 
> Signed-off-by: Showrya M N <showrya@chelsio.com>
> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
> ---
>  drivers/infiniband/sw/siw/siw_qp_tx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
> b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 64ad9e0895bd..d777d06037db 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -334,6 +334,9 @@ static int siw_tcp_sendpages(struct socket *s, 
> struct page **page, int offset,
>  		bvec_set_page(&bvec, page[i], bytes, offset);
>  		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> 
> +		if (!sendpage_ok(page[i]))
> +			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> +

Thanks! This looks good to me. Although, I would suggest moving this further up just before bvec_set_page(): While it is not dong anything to the page, it looks more clean to first alter the pages flags before linking the bvec with it.

Thanks! Bernard

>  try_page_again:
>  		lock_sock(sk);
>  		rv = tcp_sendmsg_locked(sk, &msg, size);
> --
> 2.39.1
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 64ad9e0895bd..d777d06037db 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -334,6 +334,9 @@  static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		bvec_set_page(&bvec, page[i], bytes, offset);
 		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
 
+		if (!sendpage_ok(page[i]))
+			msg.msg_flags &= ~MSG_SPLICE_PAGES;
+
 try_page_again:
 		lock_sock(sk);
 		rv = tcp_sendmsg_locked(sk, &msg, size);