Message ID | 1698147005-5396-1-git-send-email-george.kennedy@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | mlx5: reset state to avoid attempted QP double free and UAF | expand |
On Tue, Oct 24, 2023 at 06:30:05AM -0500, George Kennedy wrote: > In the unlikely event that workqueue allocation fails and returns > NULL in mlx5_mkey_cache_init(), reset the state to > MLX5_UMR_STATE_UNINIT in mlx5_ib_stage_post_ib_reg_umr_init() > after the call to mlx5r_umr_resource_cleanup(), which frees > the QP. This will avoid attempted double free of the same QP > when __mlx5_ib_add() does its cleanup. > > Syzkaller reported a UAF in ib_destroy_qp_user > > workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR > infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642): > failed to create work queue > infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642): > mr cache init failed -12 > ================================================================== > BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) > Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642 > > Call Trace: > <TASK> > kasan_report (mm/kasan/report.c:590) > ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) > mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) > __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178) > mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) > ... > </TASK> > > Allocated by task 1642: > __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026 > mm/slab_common.c:1039) > create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720 > ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209) > ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347) > mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164) > mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070) > __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) > mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) > ... > > Freed by task 1642: > __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822) > ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112) > mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) > mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076 > drivers/infiniband/hw/mlx5/main.c:4065) > __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) > mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) > ... > > The buggy address belongs to the object at ffff88810da31000 > which belongs to the cache kmalloc-2k of size 2048 > The buggy address is located 168 bytes inside of > freed 2048-byte region [ffff88810da31000, ffff88810da31800) > > The buggy address belongs to the physical page: > page:000000003b5e469d refcount:1 mapcount:0 mapping:0000000000000000 > index:0x0 pfn:0x10da30 > head:000000003b5e469d order:3 entire_mapcount:0 nr_pages_mapped:0 > pincount:0 > flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff) > page_type: 0xffffffff() > raw: 0017ffffc0000840 ffff888100042f00 ffffea0004180800 dead000000000002 > raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff88810da30f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff88810da31000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >ffff88810da31080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff88810da31100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88810da31180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ================================================================== > Disabling lock debugging due to kernel taint > > Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c") > Reported-by: syzkaller <syzkaller@googlegroups.com> > Signed-off-by: George Kennedy <george.kennedy@oracle.com> > --- > drivers/infiniband/hw/mlx5/main.c | 1 + > 1 file changed, 1 insertion(+) Thanks for the report, I think that the following change will be better aligned to mlx5_ib code. Can you please resend your patch? diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index ec7c45272764..b1f8914abf44 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -4092,10 +4092,8 @@ static int mlx5_ib_stage_post_ib_reg_umr_init(struct mlx5_ib_dev *dev) return ret; ret = mlx5_mkey_cache_init(dev); - if (ret) { + if (ret) mlx5_ib_warn(dev, "mr cache init failed %d\n", ret); - mlx5r_umr_resource_cleanup(dev); - } return ret; }
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 555629b7..eca8c1c 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -4074,6 +4074,7 @@ static int mlx5_ib_stage_post_ib_reg_umr_init(struct mlx5_ib_dev *dev) if (ret) { mlx5_ib_warn(dev, "mr cache init failed %d\n", ret); mlx5r_umr_resource_cleanup(dev); + dev->umrc.state = MLX5_UMR_STATE_UNINIT; } return ret; }