Message ID | 20220703155625.14497-1-yangx.jy@fujitsu.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | RDMA/rxe: Process received packets in time | expand |
On Sun, Jul 03, 2022 at 11:56:25PM +0800, Xiao Yang wrote: > If received packets (i.e. skb) stored in qp->resp_pkts > cannot be processed in time, they may be ovewritten/reused > and then lead to abnormal behavior. I just merged a bunch of fixed from Bob on atomics, do they solve this problem too by chance? Jason
On 2022/7/4 20:55, Jason Gunthorpe wrote: > On Sun, Jul 03, 2022 at 11:56:25PM +0800, Xiao Yang wrote: >> If received packets (i.e. skb) stored in qp->resp_pkts >> cannot be processed in time, they may be ovewritten/reused >> and then lead to abnormal behavior. > > I just merged a bunch of fixed from Bob on atomics, do they solve this > problem too by chance? Hi Jason, Do you mean that I should use RDMA for-next branch to confirm the issue? I don't think Bob's patches can solve this problem by chance, but I will use RDMA for-next branch to test the issue on my slow vm. Best Regards, Xiao Yang > > Jason
on 7/3/2022 11:56 PM, Xiao Yang wrote: > If received packets (i.e. skb) stored in qp->resp_pkts > cannot be processed in time, they may be ovewritten/reused > and then lead to abnormal behavior. > > For example, running test_qp_ex_rc_atomic_cmp_swp always > reproduced a panic on my slow vm: > -------------------------------------------------------- > [39867.797693] rdma_rxe: qp#17 state = GET ACK > [39867.800667] rdma_rxe: qp#17 state = GET WQE > [39867.800722] rdma_rxe: qp#17 state = CHECK PSN > [39867.800739] rdma_rxe: qp#17 state = CHECK ACK > [39867.800776] rdma_rxe: unexpected opcode > [39867.800790] rdma_rxe: qp#17 state = ERROR > ... > [39867.822361] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [39867.822361] #PF: supervisor read access in kernel mode > [39867.822361] #PF: error_code(0x0000) - not-present page > [39867.822361] PGD 0 P4D 0 > [39867.822361] Oops: 0000 [#1] PREEMPT SMP NOPTI > [39867.822361] CPU: 3 PID: 19605 Comm: python3 Kdump: loaded Tainted: G W 5.19.0-rc2+ #33 > [39867.822361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014 > [39867.822361] RIP: 0010:rxe_completer+0x41f/0xd10 [rdma_rxe] > [39867.822361] Code: 41 83 ff 0d 0f 84 a7 02 00 00 41 83 ff 0e 0f 85 da 00 00 00 48 85 db 0f 84 74 fe ff ff 48 8b 6b 08 48 8d 7b d8 be 01 00 00 00 <4c> 8b 75 00 e8 68 65 56 dd 48 8d bd a0 01 00 00 e8 8c 47 00 00 4c > [39867.822361] RSP: 0000:ffff9eab813c7e18 EFLAGS: 00000286 > [39867.822361] RAX: 0000000000022b5a RBX: ffff8adb07efeb28 RCX: 0000000000000006 > [39867.822361] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8adb07efeb00 > [39867.822361] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 > [39867.822361] R10: 0000000000000000 R11: 0000000000011fac R12: ffff8adb06ae11a0 > [39867.822361] R13: 00000000fffffff5 R14: ffff9eab81803e80 R15: 000000000000000c > [39867.822361] FS: 00007f87f99eb6c0(0000) GS:ffff8adb3dd00000(0000) knlGS:0000000000000000 > [39867.822361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [39867.822361] CR2: 0000000000000000 CR3: 0000000106c3e000 CR4: 00000000000006e0 > [39867.822361] Call Trace: > [39867.822361] <TASK> > [39867.822361] ? __local_bh_enable_ip+0x83/0xf0 > [39867.822361] rxe_do_task+0x67/0xc0 [rdma_rxe] > [39867.822361] tasklet_action_common.isra.0+0xe2/0x110 > [39867.822361] __do_softirq+0xf0/0x4b5 > [39867.822361] irq_exit_rcu+0xef/0x130 > [39867.822361] sysvec_apic_timer_interrupt+0x40/0xc0 > [39867.822361] asm_sysvec_apic_timer_interrupt+0x1b/0x20 > [39867.822361] RIP: 0033:0x7f87f96fcce5 > ... > -------------------------------------------------------- > > The root cause is that a received skb stored in qp->resp_pkts > has been ovewritten by another QP. Like the following logic: > -------------------------------------------------------- > /* Trigger timeout */ > Requester(QP#17) -> retransmit_timer() > -> rxe_completer() > /* No skb (Atomic ACK) is added > * to qp->resp_pkts */ > qp->req.need_retry = 1; > rxe_run_task(&qp->req.task, 0); > > /* First skb (Atomic ACK) arrived && > * rxe_requester() is not called yet */ > Requester(QP#17) -> rxe_rcv() > /* Add the skb (Atomic ACK) into qp->resp_pkts */ > -> rxe_comp_queue_pkt() > -> rxe_completer() > /* qp->req.need_retry == 1 */ > if (qp->req.need_retry) { > ret = -EAGAIN; > goto done; > } > > Responder(QP#18) -> rxe_rcv() > /* The skb (Atomic ACK) has been reused > * to store the Atomic Comapre_Swap request */ Well, i didn't see how/when another QP can reuse/overwrite this skb. If this is true, i think we should fix it as well. Thanks > -> rxe_responder() > > /* rxe_requester() is called */ > Requester(QP#17) -> rxe_requester() > if (unlikely(qp->req.need_retry)) { > req_retry(qp); > qp->req.need_retry = 0; > } > > /* Trigger timeout again */ > Requester(QP#17) -> retransmit_timer() > -> rxe_completer() > /* qp->req.need_retry == 0 */ > if (qp->req.need_retry) { > ret = -EAGAIN; > goto done; > } > /* check_ack() throws "unexpected opcode" */ > -> check_ack() > return COMPST_ERROR; > -------------------------------------------------------- > > If qp->req.need_retry is set and qp->resp_pkts is not empty, > Process received packets in time to fix the issue. > > Fixes: 8700e3e7c485 ("Soft RoCE driver") > Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_comp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c > index da3a398053b8..8ffc874a25af 100644 > --- a/drivers/infiniband/sw/rxe/rxe_comp.c > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c > @@ -580,7 +580,7 @@ int rxe_completer(void *arg) > qp->comp.timeout_retry = 0; > } > > - if (qp->req.need_retry) { > + if (qp->req.need_retry && !skb_queue_len(&qp->resp_pkts)) { > ret = -EAGAIN; > goto done; > }
On 2022/7/4 22:23, yangx.jy@fujitsu.com wrote: > On 2022/7/4 20:55, Jason Gunthorpe wrote: >> On Sun, Jul 03, 2022 at 11:56:25PM +0800, Xiao Yang wrote: >>> If received packets (i.e. skb) stored in qp->resp_pkts >>> cannot be processed in time, they may be ovewritten/reused >>> and then lead to abnormal behavior. >> >> I just merged a bunch of fixed from Bob on atomics, do they solve this >> problem too by chance? > Hi Jason, > > Do you mean that I should use RDMA for-next branch to confirm the issue? > > I don't think Bob's patches can solve this problem by chance, but I will > use RDMA for-next branch to test the issue on my slow vm. Hi Jason, I confirmed that the following patch avoided this issue on my slow vm. commit dc1848388137d20e5786b976caa49a26889f36f3 Author: Bob Pearson <rpearsonhpe@gmail.com> Date: Mon Jun 6 09:38:37 2022 -0500 RDMA/rxe: Merge normal and retry atomic flows I think this issue is related to res->atomic.skb which has been removed by the patch. However I am still confused about why res->atomic.skb will be overwritten/reused as my commit message described. Best Regards, Xiao Yang > > Best Regards, > Xiao Yang >> >> Jason
I am on vacation until July 12. I will be happy to review these when I get back. Bob -----Original Message----- From: yangx.jy@fujitsu.com <yangx.jy@fujitsu.com> Sent: Friday, July 8, 2022 1:34 AM To: Jason Gunthorpe <jgg@ziepe.ca>; Bob Pearson <rpearsonhpe@gmail.com> Cc: linux-rdma@vger.kernel.org; leon@kernel.org; zyjzyj2000@gmail.com Subject: Re: [PATCH] RDMA/rxe: Process received packets in time On 2022/7/4 22:23, yangx.jy@fujitsu.com wrote: > On 2022/7/4 20:55, Jason Gunthorpe wrote: >> On Sun, Jul 03, 2022 at 11:56:25PM +0800, Xiao Yang wrote: >>> If received packets (i.e. skb) stored in qp->resp_pkts cannot be >>> processed in time, they may be ovewritten/reused and then lead to >>> abnormal behavior. >> >> I just merged a bunch of fixed from Bob on atomics, do they solve >> this problem too by chance? > Hi Jason, > > Do you mean that I should use RDMA for-next branch to confirm the issue? > > I don't think Bob's patches can solve this problem by chance, but I > will use RDMA for-next branch to test the issue on my slow vm. Hi Jason, I confirmed that the following patch avoided this issue on my slow vm. commit dc1848388137d20e5786b976caa49a26889f36f3 Author: Bob Pearson <rpearsonhpe@gmail.com> Date: Mon Jun 6 09:38:37 2022 -0500 RDMA/rxe: Merge normal and retry atomic flows I think this issue is related to res->atomic.skb which has been removed by the patch. However I am still confused about why res->atomic.skb will be overwritten/reused as my commit message described. Best Regards, Xiao Yang > > Best Regards, > Xiao Yang >> >> Jason
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index da3a398053b8..8ffc874a25af 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -580,7 +580,7 @@ int rxe_completer(void *arg) qp->comp.timeout_retry = 0; } - if (qp->req.need_retry) { + if (qp->req.need_retry && !skb_queue_len(&qp->resp_pkts)) { ret = -EAGAIN; goto done; }
If received packets (i.e. skb) stored in qp->resp_pkts cannot be processed in time, they may be ovewritten/reused and then lead to abnormal behavior. For example, running test_qp_ex_rc_atomic_cmp_swp always reproduced a panic on my slow vm: -------------------------------------------------------- [39867.797693] rdma_rxe: qp#17 state = GET ACK [39867.800667] rdma_rxe: qp#17 state = GET WQE [39867.800722] rdma_rxe: qp#17 state = CHECK PSN [39867.800739] rdma_rxe: qp#17 state = CHECK ACK [39867.800776] rdma_rxe: unexpected opcode [39867.800790] rdma_rxe: qp#17 state = ERROR ... [39867.822361] BUG: kernel NULL pointer dereference, address: 0000000000000000 [39867.822361] #PF: supervisor read access in kernel mode [39867.822361] #PF: error_code(0x0000) - not-present page [39867.822361] PGD 0 P4D 0 [39867.822361] Oops: 0000 [#1] PREEMPT SMP NOPTI [39867.822361] CPU: 3 PID: 19605 Comm: python3 Kdump: loaded Tainted: G W 5.19.0-rc2+ #33 [39867.822361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014 [39867.822361] RIP: 0010:rxe_completer+0x41f/0xd10 [rdma_rxe] [39867.822361] Code: 41 83 ff 0d 0f 84 a7 02 00 00 41 83 ff 0e 0f 85 da 00 00 00 48 85 db 0f 84 74 fe ff ff 48 8b 6b 08 48 8d 7b d8 be 01 00 00 00 <4c> 8b 75 00 e8 68 65 56 dd 48 8d bd a0 01 00 00 e8 8c 47 00 00 4c [39867.822361] RSP: 0000:ffff9eab813c7e18 EFLAGS: 00000286 [39867.822361] RAX: 0000000000022b5a RBX: ffff8adb07efeb28 RCX: 0000000000000006 [39867.822361] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8adb07efeb00 [39867.822361] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 [39867.822361] R10: 0000000000000000 R11: 0000000000011fac R12: ffff8adb06ae11a0 [39867.822361] R13: 00000000fffffff5 R14: ffff9eab81803e80 R15: 000000000000000c [39867.822361] FS: 00007f87f99eb6c0(0000) GS:ffff8adb3dd00000(0000) knlGS:0000000000000000 [39867.822361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [39867.822361] CR2: 0000000000000000 CR3: 0000000106c3e000 CR4: 00000000000006e0 [39867.822361] Call Trace: [39867.822361] <TASK> [39867.822361] ? __local_bh_enable_ip+0x83/0xf0 [39867.822361] rxe_do_task+0x67/0xc0 [rdma_rxe] [39867.822361] tasklet_action_common.isra.0+0xe2/0x110 [39867.822361] __do_softirq+0xf0/0x4b5 [39867.822361] irq_exit_rcu+0xef/0x130 [39867.822361] sysvec_apic_timer_interrupt+0x40/0xc0 [39867.822361] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [39867.822361] RIP: 0033:0x7f87f96fcce5 ... -------------------------------------------------------- The root cause is that a received skb stored in qp->resp_pkts has been ovewritten by another QP. Like the following logic: -------------------------------------------------------- /* Trigger timeout */ Requester(QP#17) -> retransmit_timer() -> rxe_completer() /* No skb (Atomic ACK) is added * to qp->resp_pkts */ qp->req.need_retry = 1; rxe_run_task(&qp->req.task, 0); /* First skb (Atomic ACK) arrived && * rxe_requester() is not called yet */ Requester(QP#17) -> rxe_rcv() /* Add the skb (Atomic ACK) into qp->resp_pkts */ -> rxe_comp_queue_pkt() -> rxe_completer() /* qp->req.need_retry == 1 */ if (qp->req.need_retry) { ret = -EAGAIN; goto done; } Responder(QP#18) -> rxe_rcv() /* The skb (Atomic ACK) has been reused * to store the Atomic Comapre_Swap request */ -> rxe_responder() /* rxe_requester() is called */ Requester(QP#17) -> rxe_requester() if (unlikely(qp->req.need_retry)) { req_retry(qp); qp->req.need_retry = 0; } /* Trigger timeout again */ Requester(QP#17) -> retransmit_timer() -> rxe_completer() /* qp->req.need_retry == 0 */ if (qp->req.need_retry) { ret = -EAGAIN; goto done; } /* check_ack() throws "unexpected opcode" */ -> check_ack() return COMPST_ERROR; -------------------------------------------------------- If qp->req.need_retry is set and qp->resp_pkts is not empty, Process received packets in time to fix the issue. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> --- drivers/infiniband/sw/rxe/rxe_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)