Message ID | 20210909204456.7476-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
Series | RDMA/rxe: Various bug fixes. |
On 9/9/21 1:44 PM, Bob Pearson wrote: > This series of patches implements several bug fixes and minor > cleanups of the rxe driver. Specifically these fix a bug exposed > by blktest. > > They apply cleanly to both > commit 2169b908894df2ce83e7eb4a399d3224b2635126 (origin/for-rc, for-rc) > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (HEAD -> for-next, > origin/wip/jgg-for-next, origin/for-next, origin/HEAD) > > These are being resubmitted to for-rc instead of for-next. Hi Bob, Thanks for having rebased and reposted this patch series. I have applied this series on top of commit 2169b908894d ("IB/hfi1: make hist static"). A kernel bug was triggered while running test srp/001. I have attached the kernel configuration used in my test to this email. Thanks, Bart. ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:5054:00ff:fe86:7464, t_port_id 5054:00ff:fe86:7464:5054:00ff:fe86:7464 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:5054:00ff:fe86:7464); pkey 0xffff BUG: unable to handle page fault for address: ffffc900e357d614 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 26 PID: 148 Comm: ksoftirqd/26 Tainted: G E 5.14.0-rc6-dbg+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:rxe_completer+0x96d/0x1050 [rdma_rxe] Code: e0 49 8b 44 24 08 44 89 e9 41 d3 e6 4e 8d a4 30 80 01 00 00 4d 85 e4 0f 84 f9 00 00 00 49 8d bc 24 94 00 00 00 e8 73 a8 b1 e0 <41> 8b 84 24 94 00 00 00 85 c0 0f 84 df 00 00 00 83 f8 03 0f 84 bf RSP: 0018:ffff8881014075f8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88813c67c000 RCX: dffffc0000000000 RDX: 0000000000000007 RSI: ffffffff826920c0 RDI: ffffc900e357d614 RBP: ffff8881014076e8 R08: ffffffffa09b228d R09: ffff88813c67c57b R10: ffffed10278cf8af R11: 0000000000000000 R12: ffffc900e357d580 R13: 000000000000000a R14: 00000000d9c99400 R15: ffff8881515ddd08 FS: 
0000000000000000(0000) GS:ffff88842d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900e357d614 CR3: 0000000002e29005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rxe_do_task+0xdd/0x160 [rdma_rxe] rxe_run_task+0x67/0x80 [rdma_rxe] rxe_comp_queue_pkt+0x75/0x80 [rdma_rxe] rxe_rcv+0x345/0x480 [rdma_rxe] rxe_xmit_packet+0x1af/0x300 [rdma_rxe] send_ack.isra.0+0x88/0xd0 [rdma_rxe] rxe_responder+0xf4c/0x15e0 [rdma_rxe] rxe_do_task+0xdd/0x160 [rdma_rxe] rxe_run_task+0x67/0x80 [rdma_rxe] rxe_resp_queue_pkt+0x5a/0x60 [rdma_rxe] rxe_rcv+0x370/0x480 [rdma_rxe] rxe_xmit_packet+0x1af/0x300 [rdma_rxe] rxe_requester+0x4f4/0xe80 [rdma_rxe] rxe_do_task+0xdd/0x160 [rdma_rxe] tasklet_action_common.constprop.0+0x168/0x1b0 tasklet_action+0x44/0x60 __do_softirq+0x1db/0x6ed run_ksoftirqd+0x37/0x60 smpboot_thread_fn+0x302/0x410 kthread+0x1f6/0x220 ret_from_fork+0x1f/0x30 Modules linked in: ib_srp(E) scsi_transport_srp(E) target_core_user(E) uio(E) target_core_pscsi(E) target_core_file(E) ib_srpt(E) target_core_iblock(E) target_core_mod(E) ib_umad(E) rdma_ucm(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) rdma_cm(E) iw_cm(E) scsi_debug(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_uverbs(E) null_blk(E) ib_core(E) brd(E) af_packet(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_tables(E) ebtable_nat(E) iTCO_wdt(E) watchdog(E) ebtable_broute(E) intel_rapl_msr(E) intel_pmc_bxt(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) iptable_mangle(E) iptable_raw(E) ip_set(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) rfkill(E) iptable_filter(E) ip_tables(E) x_tables(E) 
bpfilter(E) intel_rapl_common(E) iosf_mbi(E) isst_if_common(E) i2c_i801(E) pcspkr(E) i2c_smbus(E) virtio_net(E) lpc_ich(E) virtio_balloon(E) net_failover(E) failover(E) tiny_power_button(E) button(E) fuse(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) sr_mod(E) serio_raw(E) cdrom(E) virtio_gpu(E) virtio_dma_buf(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) cec(E) drm(E) qemu_fw_cfg(E) sg(E) nbd(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) virtio_rng(E) CR2: ffffc900e357d614 ---[ end trace 0667a278da47193a ]--- RIP: 0010:rxe_completer+0x96d/0x1050 [rdma_rxe] Code: e0 49 8b 44 24 08 44 89 e9 41 d3 e6 4e 8d a4 30 80 01 00 00 4d 85 e4 0f 84 f9 00 00 00 49 8d bc 24 94 00 00 00 e8 73 a8 b1 e0 <41> 8b 84 24 94 00 00 00 85 c0 0f 84 df 00 00 00 83 f8 03 0f 84 bf RSP: 0018:ffff8881014075f8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88813c67c000 RCX: dffffc0000000000 RDX: 0000000000000007 RSI: ffffffff826920c0 RDI: ffffc900e357d614 RBP: ffff8881014076e8 R08: ffffffffa09b228d R09: ffff88813c67c57b R10: ffffed10278cf8af R11: 0000000000000000 R12: ffffc900e357d580 R13: 000000000000000a R14: 00000000d9c99400 R15: ffff8881515ddd08 FS: 0000000000000000(0000) GS:ffff88842d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900e357d614 CR3: 0000000002e29005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Rebooting in 90 seconds..
Bart,

I was able to run this test case but it is not failing. On my system it passes in ~1sec. I have several questions about your system setup.

1. Which rdma-core are you running? Out of box or the github tree?
2. Can you run ib_send_bw? Python test suite in rdma-core?
3. Where did you get the kernel bits? Which git tree? Which branch?

Thanks,

Bob Pearson

-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org>
Sent: Thursday, September 9, 2021 4:52 PM
To: Bob Pearson <rpearsonhpe@gmail.com>; jgg@nvidia.com; zyjzyj2000@gmail.com; linux-rdma@vger.kernel.org; mie@igel.co.jp
Subject: Re: [PATCH for-rc v3 0/6] RDMA/rxe: Various bug fixes.

On 9/9/21 1:44 PM, Bob Pearson wrote:
> This series of patches implements several bug fixes and minor cleanups
> of the rxe driver. Specifically these fix a bug exposed by blktest.
>
> They apply cleanly to both
> commit 2169b908894df2ce83e7eb4a399d3224b2635126 (origin/for-rc,
> for-rc) commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (HEAD -> for-next,
> origin/wip/jgg-for-next, origin/for-next, origin/HEAD)
>
> These are being resubmitted to for-rc instead of for-next.

Hi Bob,

Thanks for having rebased and reposted this patch series. I have applied this series on top of commit 2169b908894d ("IB/hfi1: make hist static"). A kernel bug was triggered while running test srp/001. I have attached the kernel configuration used in my test to this email.

Thanks,

Bart.
On 9/10/21 12:38 PM, Pearson, Robert B wrote:
> 1. Which rdma-core are you running? Out of box or the github tree?

I'm using the rdma-core package included in openSUSE Tumbleweed. blktests pass with that rdma-core package against older kernel versions so I think the rdma-core package is fine. The version number of the rdma-core package I'm using is as follows:

$ rpm -q rdma-core
rdma-core-36.0-1.1.x86_64

The rdma tool comes from the iproute2 package:

$ rpm -qf /sbin/rdma
iproute2-5.13-1.1.x86_64

> 3. Where did you get the kernel bits? Which git tree? Which branch?

Hmm ... wasn't that mentioned in my previous email? I mentioned a commit SHA and these SHA numbers are unique and unambiguous. Anyway: commit 2169b908894d comes from the for-rc branch of the following git repository: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.

Bart.
On 9/10/21 3:23 PM, Bart Van Assche wrote:
> On 9/10/21 12:38 PM, Pearson, Robert B wrote:
>> 1. Which rdma-core are you running? Out of box or the github tree?
>
> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
> pass with that rdma-core package against older kernel versions so I think
> the rdma-core package is fine. The version number of the rdma-core package
> I'm using is as follows:
> $ rpm -q rdma-core
> rdma-core-36.0-1.1.x86_64
>
> The rdma tool comes from the iproute2 package:
> $ rpm -qf /sbin/rdma
> iproute2-5.13-1.1.x86_64
>
>> 3. Where did you get the kernel bits? Which git tree? Which branch?
>
> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
> SHA and these SHA numbers are unique and unambiguous. Anyway: commit
> 2169b908894d comes from the for-rc branch of the following git repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.
>
> Bart.

You'd be surprised how much I don't know. I do know the numbers are unique but I haven't the faintest idea how to decode them into useful strings.

In theory you are correct and rdma-core and kernels are supposed to be forwards and backwards compatible but that is a goal and sometimes regressions do occur. I can try to run with that version just to make sure.

There is a problem I have seen where some newer distros do not create the default IPv6 address from the MAC address. They randomize it (Ubuntu does this) and rxe is broken as a result. I end up having to add a line like

sudo ip addr add dev enp6s0 fe80::b62e:99ff:fef9:fa2e/64

(where the MAC address is b4:2e:99:f9:fa:2e) just before the line

sudo rdma link add rxe_1 type rxe netdev enp6s0

But, when this is an issue rxe is really broken and almost nothing works so that may not be an issue for you.

I will try to recreate your setup and retest.

Thanks,

Bob
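The link-local address in the `ip addr add` line above appears to follow the usual modified EUI-64 rule: flip the universal/local bit of the first MAC octet and insert `ff:fe` in the middle. A minimal sketch of that derivation follows. The helper names `mac_to_link_local` and `check_thread_example` are made up for illustration and are not part of rxe, rdma-core or iproute2; only the MAC and the address come from the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the modified EUI-64 link-local address for a 48-bit MAC.
 * Illustrative helper only (hypothetical name).
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static inline int mac_to_link_local(const uint8_t mac[6], char *out, size_t len)
{
	/* Flip the universal/local bit (0x02) in the first octet. */
	uint8_t first = mac[0] ^ 0x02;

	/* Insert ff:fe between the two halves of the MAC. */
	int n = snprintf(out, len, "fe80::%02x%02x:%02xff:fe%02x:%02x%02x",
			 first, mac[1], mac[2], mac[3], mac[4], mac[5]);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Re-derive the address quoted in the message from the MAC quoted with it. */
static inline void check_thread_example(void)
{
	const uint8_t mac[6] = { 0xb4, 0x2e, 0x99, 0xf9, 0xfa, 0x2e };
	char addr[40];

	assert(mac_to_link_local(mac, addr, sizeof(addr)) == 0);
	assert(strcmp(addr, "fe80::b62e:99ff:fef9:fa2e") == 0);
}
```

The helper keeps leading zeros in each 16-bit group, so its output is a valid address but not necessarily in the shortest textual form.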
On 9/10/21 3:23 PM, Bart Van Assche wrote:
> On 9/10/21 12:38 PM, Pearson, Robert B wrote:
>> 1. Which rdma-core are you running? Out of box or the github tree?
>
> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
> pass with that rdma-core package against older kernel versions so I think
> the rdma-core package is fine. The version number of the rdma-core package
> I'm using is as follows:
> $ rpm -q rdma-core
> rdma-core-36.0-1.1.x86_64
>
> The rdma tool comes from the iproute2 package:
> $ rpm -qf /sbin/rdma
> iproute2-5.13-1.1.x86_64
>
>> 3. Where did you get the kernel bits? Which git tree? Which branch?
>
> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
> SHA and these SHA numbers are unique and unambiguous. Anyway: commit
> 2169b908894d comes from the for-rc branch of the following git repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.
>
> Bart.

OK I checked out the kernel with the SHA number above and applied the patch series and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script is

export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
sudo ip link set dev enp0s3 mtu 8500
sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
sudo rdma link add rxe0 type rxe netdev enp0s3

I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.

The test looks like

sudo ./check -q srp/001
srp/001 (Create and remove LUNs) [passed]
    runtime 1.174s ... 1.236s

There were no issues.

Any guesses what else to look at?

Thanks,

Bob
On 9/10/21 4:47 PM, Bob Pearson wrote: > On 9/10/21 3:23 PM, Bart Van Assche wrote: >> On 9/10/21 12:38 PM, Pearson, Robert B wrote: >>> 1. Which rdma-core are you running? Out of box or the github tree? >> >> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests >> pass with that rdma-core package against older kernel versions so I think >> the rdma-core package is fine. The version number of the rdma-core package >> I'm using is as follows: >> $ rpm -q rdma-core >> rdma-core-36.0-1.1.x86_64 >> >> The rdma tool comes from the iproute2 package: >> $ rpm -qf /sbin/rdma >> iproute2-5.13-1.1.x86_64 >> >>> 3. Where did you get the kernel bits? Which git tree? Which branch? >> >> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit >> SHA and these SHA numbers are unique and unambiguous. Anyway: commit >> 2169b908894d comes from the for-rc branch of the following git repository: >> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git. >> >> Bart. >> >> > > OK I checked out the kernel with the SHA number above and applied the patch series > and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt > that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script > is > > export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib > > > > sudo ip link set dev enp0s3 mtu 8500 > > sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64 > > sudo rdma link add rxe0 type rxe netdev enp0s3 > > > I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM. > > The test looks like > > sudo ./check -q srp/001 > > srp/001 (Create and remove LUNs) [passed] > > runtime 1.174s ... 1.236s > > There were no issues. > > Any guesses what else to look at? > > Thanks, > > Bob > The 8500 is not required. It runs fine with 4K MTU just as well.
On 9/10/21 2:47 PM, Bob Pearson wrote:
> OK I checked out the kernel with the SHA number above and applied the patch series
> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
> is
>
> export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
>
> sudo ip link set dev enp0s3 mtu 8500
> sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
> sudo rdma link add rxe0 type rxe netdev enp0s3
>
> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
>
> The test looks like
>
> sudo ./check -q srp/001
> srp/001 (Create and remove LUNs) [passed]
>     runtime 1.174s ... 1.236s
>
> There were no issues.
>
> Any guesses what else to look at?

The test I ran is different. I did not run any of the ip link / ip addr / rdma link commands since the blktests scripts already run the rdma link command. The bug I reported in my previous email is reproducible and triggers a VM halt.

Are we using the same kernel config? I attached my kernel config to my previous email. The source code location of the crash address is as follows:

(gdb) list *(rxe_completer+0x96d)
0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149).
144              */
145             wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
146             *wqe_p = wqe;
147
148             /* no WQE or requester has not started it yet */
149             if (!wqe || wqe->state == wqe_state_posted)
150                     return pkt ? COMPST_DONE : COMPST_EXIT;
151
152             /* WQE does not require an ack */
153             if (wqe->state == wqe_state_done)

The disassembly output is as follows:

drivers/infiniband/sw/rxe/rxe_comp.c:
149             if (!wqe || wqe->state == wqe_state_posted)
   0x0000000000002277 <+2391>:  test   %r12,%r12
   0x000000000000227a <+2394>:  je     0x2379 <rxe_completer+2649>
   0x0000000000002280 <+2400>:  lea    0x94(%r12),%rdi
   0x0000000000002288 <+2408>:  call   0x228d <rxe_completer+2413>
   0x000000000000228d <+2413>:  mov    0x94(%r12),%eax
   0x0000000000002295 <+2421>:  test   %eax,%eax
   0x0000000000002297 <+2423>:  je     0x237c <rxe_completer+2652>

So the instruction that triggers the crash is "mov 0x94(%r12),%eax". Does consumer_addr() perhaps return an invalid address under certain circumstances?

Thanks,

Bart.
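The register dump from the oops can be cross-checked against that instruction: R12 plus the 0x94 displacement should equal the faulting address in CR2. A small sketch of the arithmetic follows. The function names are made up for illustration; the three constants are copied from the oops and the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "mov disp(%base),%reg": base register plus displacement. */
static inline uint64_t effective_address(uint64_t base, int32_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}

/* Compare R12 + 0x94 with CR2, using the values printed in the oops. */
static inline void check_oops_values(void)
{
	const uint64_t r12 = 0xffffc900e357d580ULL; /* R12 in the register dump */
	const uint64_t cr2 = 0xffffc900e357d614ULL; /* faulting address (CR2) */

	assert(effective_address(r12, 0x94) == cr2);
}
```

The values agree, which is consistent with the wqe pointer itself (held in R12) being non-NULL but pointing at unmapped memory, so the `!wqe` test passes and the read of `wqe->state` faults.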
On 9/10/21 5:07 PM, Bart Van Assche wrote:
> On 9/10/21 2:47 PM, Bob Pearson wrote:
>> OK I checked out the kernel with the SHA number above and applied the patch series
>> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
>> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
>> is
>>
>> export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
>>
>> sudo ip link set dev enp0s3 mtu 8500
>> sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
>> sudo rdma link add rxe0 type rxe netdev enp0s3
>>
>> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
>>
>> The test looks like
>>
>> sudo ./check -q srp/001
>> srp/001 (Create and remove LUNs) [passed]
>>     runtime 1.174s ... 1.236s
>>
>> There were no issues.
>>
>> Any guesses what else to look at?
>
> The test I ran is different. I did not run any of the ip link / ip addr /
> rdma link commands since the blktests scripts already run the rdma link
> command. The bug I reported in my previous email is reproducible and
> triggers a VM halt.
>
> Are we using the same kernel config? I attached my kernel config to my
> previous email. The source code location of the crash address is as
> follows:
>
> (gdb) list *(rxe_completer+0x96d)
> 0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149).
> 144              */
> 145             wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
> 146             *wqe_p = wqe;
> 147
> 148             /* no WQE or requester has not started it yet */
> 149             if (!wqe || wqe->state == wqe_state_posted)
> 150                     return pkt ? COMPST_DONE : COMPST_EXIT;
> 151
> 152             /* WQE does not require an ack */
> 153             if (wqe->state == wqe_state_done)
>
> The disassembly output is as follows:
>
> drivers/infiniband/sw/rxe/rxe_comp.c:
> 149             if (!wqe || wqe->state == wqe_state_posted)
>    0x0000000000002277 <+2391>:  test   %r12,%r12
>    0x000000000000227a <+2394>:  je     0x2379 <rxe_completer+2649>
>    0x0000000000002280 <+2400>:  lea    0x94(%r12),%rdi
>    0x0000000000002288 <+2408>:  call   0x228d <rxe_completer+2413>
>    0x000000000000228d <+2413>:  mov    0x94(%r12),%eax
>    0x0000000000002295 <+2421>:  test   %eax,%eax
>    0x0000000000002297 <+2423>:  je     0x237c <rxe_completer+2652>
>
> So the instruction that triggers the crash is "mov 0x94(%r12),%eax".
> Does consumer_addr() perhaps return an invalid address under certain
> circumstances?
>
> Thanks,
>
> Bart.

The most likely cause of this was fixed by a patch submitted 8/20/2021 by Xiao Yang. It is copied here

From: Xiao Yang <yangx.jy@fujitsu.com>
To: <linux-rdma@vger.kernel.org>
Cc: <aglo@umich.edu>, <rpearsonhpe@gmail.com>, <zyjzyj2000@gmail.com>, <jgg@nvidia.com>, <leon@kernel.org>, Xiao Yang <yangx.jy@fujitsu.com>
Subject: [PATCH] RDMA/rxe: Zero out index member of struct rxe_queue
Date: Fri, 20 Aug 2021 19:15:09 +0800
Message-ID: <20210820111509.172500-1-yangx.jy@fujitsu.com>

1) New index member of struct rxe_queue is introduced but not zeroed so the initial value of index may be random.
2) Current index is not masked off to index_mask.

In such case, producer_addr() and consumer_addr() will get an invalid address by the random index and then accessing the invalid address triggers the following panic:
"BUG: unable to handle page fault for address: ffff9ae2c07a1414"

Fix the issue by using kzalloc() to zero out index member.

Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
index 85b812586ed4..72d95398e604 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.c
+++ b/drivers/infiniband/sw/rxe/rxe_queue.c
@@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
 	if (*num_elem < 0)
 		goto err1;

-	q = kmalloc(sizeof(*q), GFP_KERNEL);
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
 	if (!q)
 		goto err1;
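The failure mode described in the commit message can be modelled in userspace. The sketch below is not the rxe code: `struct demo_queue` and the `demo_*` functions are hypothetical stand-ins, `calloc()` plays the role of `kzalloc()`, and the stale index value is an arbitrary number chosen for illustration. It only shows why an index that is neither zeroed nor masked produces an offset far outside the element buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct rxe_queue; field names are illustrative. */
struct demo_queue {
	uint32_t index;          /* kernel-private consumer index */
	uint32_t index_mask;     /* num_elem - 1, num_elem a power of two */
	uint32_t log2_elem_size; /* element size is 1 << log2_elem_size */
	size_t buf_size;         /* bytes in the element buffer */
};

/* Offset when the index is used as-is, as the commit message describes. */
static inline size_t demo_offset_unmasked(const struct demo_queue *q)
{
	return (size_t)q->index << q->log2_elem_size;
}

/* Offset when the index is first masked to the ring size. */
static inline size_t demo_offset_masked(const struct demo_queue *q)
{
	return (size_t)(q->index & q->index_mask) << q->log2_elem_size;
}

/* Zeroing allocation: index starts at 0, so the first offset is in bounds. */
static inline struct demo_queue *demo_queue_alloc(uint32_t num_elem,
						  uint32_t log2_elem_size)
{
	struct demo_queue *q;

	/* The ring size must be a nonzero power of two for the mask to work. */
	assert(num_elem && (num_elem & (num_elem - 1)) == 0);

	q = calloc(1, sizeof(*q));
	if (!q)
		return NULL;

	q->index_mask = num_elem - 1;
	q->log2_elem_size = log2_elem_size;
	q->buf_size = (size_t)num_elem << log2_elem_size;
	return q;
}
```

With 128 elements of 1 KiB each, a zeroed index gives offset 0, while a leftover value such as 0x367265 gives an unmasked offset well beyond the 128 KiB buffer. Masking alone would keep the offset in bounds, but the patch takes the simpler route of zeroing the structure at allocation time.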
On 9/10/21 5:07 PM, Bart Van Assche wrote: > On 9/10/21 2:47 PM, Bob Pearson wrote: >> OK I checked out the kernel with the SHA number above and applied the patch series >> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt >> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script >> is >> >> export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib >> >> >> >> sudo ip link set dev enp0s3 mtu 8500 >> >> sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64 >> >> sudo rdma link add rxe0 type rxe netdev enp0s3 >> >> >> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM. >> >> The test looks like >> >> sudo ./check -q srp/001 >> >> srp/001 (Create and remove LUNs) [passed] >> >> runtime 1.174s ... 1.236s >> >> There were no issues. >> >> Any guesses what else to look at? > > The test I ran is different. I did not run any of the ip link / ip addr / > rdma link commands since the blktests scripts already run the rdma link > command. The bug I reported in my previous email is reproducible and > triggers a VM halt. > > Are we using the same kernel config? I attached my kernel config to my > previous email. The source code location of the crash address is as > follows: > > (gdb) list *(rxe_completer+0x96d) > 0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149). > 144 */ > 145 wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT); > 146 *wqe_p = wqe; > 147 > 148 /* no WQE or requester has not started it yet */ > 149 if (!wqe || wqe->state == wqe_state_posted) > 150 return pkt ? 
COMPST_DONE : COMPST_EXIT; > 151 > 152 /* WQE does not require an ack */ > 153 if (wqe->state == wqe_state_done) > > The disassembly output is as follows: > > drivers/infiniband/sw/rxe/rxe_comp.c: > 149 if (!wqe || wqe->state == wqe_state_posted) > 0x0000000000002277 <+2391>: test %r12,%r12 > 0x000000000000227a <+2394>: je 0x2379 <rxe_completer+2649> > 0x0000000000002280 <+2400>: lea 0x94(%r12),%rdi > 0x0000000000002288 <+2408>: call 0x228d <rxe_completer+2413> > 0x000000000000228d <+2413>: mov 0x94(%r12),%eax > 0x0000000000002295 <+2421>: test %eax,%eax > 0x0000000000002297 <+2423>: je 0x237c <rxe_completer+2652> > > So the instruction that triggers the crash is "mov 0x94(%r12),%eax". > Does consumer_addr() perhaps return an invalid address under certain > circumstances? > > Thanks, > > Bart. By the way I did rebuild the kernel with your config file. No change. - Bob
On 9/12/21 07:41, Bob Pearson wrote:
> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
> index 85b812586ed4..72d95398e604 100644
> --- a/drivers/infiniband/sw/rxe/rxe_queue.c
> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c
> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
>  	if (*num_elem < 0)
>  		goto err1;
>
> -	q = kmalloc(sizeof(*q), GFP_KERNEL);
> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
>  	if (!q)
>  		goto err1;

Hi Bob,

If I rebase this patch series on top of kernel v5.15-rc1 then the srp tests from the blktests suite pass. Kernel v5.15-rc1 includes the above patch. Feel free to add the following to this patch series:

Tested-by: Bart Van Assche <bvanassche@acm.org>

Thanks,

Bart.
On 9/13/21 10:26 PM, Bart Van Assche wrote:
> On 9/12/21 07:41, Bob Pearson wrote:
>> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
>> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
>> index 85b812586ed4..72d95398e604 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_queue.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c
>> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
>>  	if (*num_elem < 0)
>>  		goto err1;
>> -	q = kmalloc(sizeof(*q), GFP_KERNEL);
>> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
>>  	if (!q)
>>  		goto err1;
>
> Hi Bob,
>
> If I rebase this patch series on top of kernel v5.15-rc1 then the srp tests from the blktests suite pass. Kernel v5.15-rc1 includes the above patch. Feel free to add the following to this patch series:
>
> Tested-by: Bart Van Assche <bvanassche@acm.org>
>
> Thanks,
>
> Bart.

Sadly, I have been trying to resolve the note from Shaib Rao who was trying to make rping work. His solution was not correct but it led to a can of worms. The kernel verbs consumer APIs were all using the same APIs from rxe_queue.h to manipulate the client ends of the queues but that was totally incorrect. These are written from the POV of the driver and use the private index which is not supposed to be visible to users of the queues. A whole day later I think I have that one about fixed. So I will be resubmitting the series again in the morning. It's all just memory barriers so it may not affect you.

Bob