Message ID | 20190430162018.40040-1-mark.rutland@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | io_uring: free allocated io_memory once | expand |
On 4/30/19 10:20 AM, Mark Rutland wrote: > If io_allocate_scq_urings() fails to allocate an sq_* region, it will > call io_mem_free() for any previously allocated regions, but leave > dangling pointers to these regions in the ctx. Any regions which have > not yet been allocated are left NULL. Note that when returning > -EOVERFLOW, the previously allocated sq_ring is not freed, which appears > to be an unintentional leak. > > When io_allocate_scq_urings() fails, io_uring_create() will call > io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_* > regions, assuming the pointers are valid and not NULL. > > This can result in pages being freed multiple times, which has been > observed to corrupt the page state, leading to subsequent fun. This can > also result in virt_to_page() on NULL, resulting in the use of bogus > page addresses, and yet more subsequent fun. The latter can be detected > with CONFIG_DEBUG_VIRTUAL on arm64. > > Adding a cleanup path to io_allocate_scq_urings() complicates the logic, > so let's leave it to io_ring_ctx_free() to consistently free these > pointers, and simplify the io_allocate_scq_urings() error paths. > > Full splats from before this patch below. Note that the pointer logged > by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and > is actually NULL. > > [ 26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0 > [ 26.102976] flags: 0x63fffc000000() > [ 26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000 > [ 26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 > [ 26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) > [ 26.143960] ------------[ cut here ]------------ > [ 26.146020] kernel BUG at include/linux/mm.h:547! > [ 26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > [ 26.149163] Modules linked in: > [ 26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb) > [ 26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18 > [ 26.156566] Hardware name: linux,dummy-virt (DT) > [ 26.158089] pstate: 40400005 (nZcv daif +PAN -UAO) > [ 26.159869] pc : io_mem_free+0x9c/0xa8 > [ 26.161436] lr : io_mem_free+0x9c/0xa8 > [ 26.162720] sp : ffff000013003d60 > [ 26.164048] x29: ffff000013003d60 x28: ffff800025048040 > [ 26.165804] x27: 0000000000000000 x26: ffff800025048040 > [ 26.167352] x25: 00000000000000c0 x24: ffff0000112c2820 > [ 26.169682] x23: 0000000000000000 x22: 0000000020000080 > [ 26.171899] x21: ffff80002143b418 x20: ffff80002143b400 > [ 26.174236] x19: ffff80002143b280 x18: 0000000000000000 > [ 26.176607] x17: 0000000000000000 x16: 0000000000000000 > [ 26.178997] x15: 0000000000000000 x14: 0000000000000000 > [ 26.181508] x13: 00009178a5e077b2 x12: 0000000000000001 > [ 26.183863] x11: 0000000000000000 x10: 0000000000000980 > [ 26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20 > [ 26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118 > [ 26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000 > [ 26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00 > [ 26.196642] x1 : 0000000000000000 x0 : 000000000000003e > [ 26.198892] Call trace: > [ 26.199893] io_mem_free+0x9c/0xa8 > [ 26.201155] io_ring_ctx_wait_and_kill+0xec/0x180 > [ 26.202688] io_uring_setup+0x6c4/0x6f0 > [ 26.204091] __arm64_sys_io_uring_setup+0x18/0x20 > [ 26.205576] el0_svc_common.constprop.0+0x7c/0xe8 > [ 26.207186] el0_svc_handler+0x28/0x78 > [ 26.208389] el0_svc+0x8/0xc > [ 26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000) > [ 26.211995] ---[ end trace bdb81cd43a21e50d ]--- > > [ 81.770626] ------------[ cut here ]------------ > [ 81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 ( (null)) > [ 81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68 > [ 81.831202] Modules linked in: > [ 81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19 > [ 81.835616] Hardware name: linux,dummy-virt (DT) > [ 81.836863] pstate: 60400005 (nZCv daif +PAN -UAO) > [ 81.838727] pc : __virt_to_phys+0x48/0x68 > [ 81.840572] lr : __virt_to_phys+0x48/0x68 > [ 81.842264] sp : ffff80002cf67c70 > [ 81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18 > [ 81.846463] x27: 0000000000000000 x26: 0000000020000080 > [ 81.849148] x25: 0000000000000000 x24: ffff80001bb01f40 > [ 81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60 > [ 81.854351] x21: ffff800014358cc0 x20: ffff800014358d98 > [ 81.856711] x19: 0000000000000000 x18: 0000000000000000 > [ 81.859132] x17: 0000000000000000 x16: 0000000000000000 > [ 81.861586] x15: 0000000000000000 x14: 0000000000000000 > [ 81.863905] x13: 0000000000000000 x12: ffff1000037603e9 > [ 81.866226] x11: 1ffff000037603e8 x10: 0000000000000980 > [ 81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920 > [ 81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47 > [ 81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000 > [ 81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58 > [ 81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000 > [ 81.880453] Call trace: > [ 81.881164] __virt_to_phys+0x48/0x68 > [ 81.882919] io_mem_free+0x18/0x110 > [ 81.886585] io_ring_ctx_wait_and_kill+0x13c/0x1f0 > [ 81.891212] io_uring_setup+0xa60/0xad0 > [ 81.892881] __arm64_sys_io_uring_setup+0x2c/0x38 > [ 81.894398] el0_svc_common.constprop.0+0xac/0x150 > [ 81.896306] el0_svc_handler+0x34/0x88 > [ 81.897744] el0_svc+0x8/0xc > [ 81.898715] ---[ end trace b4a703802243cbba ]--- > > Fixes: 2b188cc1bb857a9d ("Add io_uring IO interface") > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Jens Axboe <axboe@kernel.dk> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: linux-block@vger.kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-kernel@vger.kernel.or > --- > fs/io_uring.c | 18 ++++++++---------- > 1 file changed, 8 insertions(+), 10 deletions(-) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 25fc8cb56fc5..5228e9b41708 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -2552,9 +2552,12 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx) > sock_release(ctx->ring_sock); > #endif > > - io_mem_free(ctx->sq_ring); > - io_mem_free(ctx->sq_sqes); > - io_mem_free(ctx->cq_ring); > + if (ctx->sq_ring) > + io_mem_free(ctx->sq_ring); > + if (ctx->sq_sqes) > + io_mem_free(ctx->sq_sqes); > + if (ctx->cq_ring) > + io_mem_free(ctx->cq_ring); Please just make io_mem_free() handle a NULL pointer so we don't need these checks.
On Tue, Apr 30, 2019 at 10:22:23AM -0600, Jens Axboe wrote: > On 4/30/19 10:20 AM, Mark Rutland wrote: > > - io_mem_free(ctx->sq_ring); > > - io_mem_free(ctx->sq_sqes); > > - io_mem_free(ctx->cq_ring); > > + if (ctx->sq_ring) > > + io_mem_free(ctx->sq_ring); > > + if (ctx->sq_sqes) > > + io_mem_free(ctx->sq_sqes); > > + if (ctx->cq_ring) > > + io_mem_free(ctx->cq_ring); > > Please just make io_mem_free() handle a NULL pointer so we don't need > these checks. Sure; I'll spin a v2 momentarily. Thanks, Mark.
diff --git a/fs/io_uring.c b/fs/io_uring.c index 25fc8cb56fc5..5228e9b41708 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2552,9 +2552,12 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx) sock_release(ctx->ring_sock); #endif - io_mem_free(ctx->sq_ring); - io_mem_free(ctx->sq_sqes); - io_mem_free(ctx->cq_ring); + if (ctx->sq_ring) + io_mem_free(ctx->sq_ring); + if (ctx->sq_sqes) + io_mem_free(ctx->sq_sqes); + if (ctx->cq_ring) + io_mem_free(ctx->cq_ring); percpu_ref_exit(&ctx->refs); if (ctx->account_mem) @@ -2747,17 +2750,12 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; ctx->sq_sqes = io_mem_alloc(size); - if (!ctx->sq_sqes) { - io_mem_free(ctx->sq_ring); + if (!ctx->sq_sqes) return -ENOMEM; - } cq_ring = io_mem_alloc(struct_size(cq_ring, cqes, p->cq_entries)); - if (!cq_ring) { - io_mem_free(ctx->sq_ring); - io_mem_free(ctx->sq_sqes); + if (!cq_ring) return -ENOMEM; - } ctx->cq_ring = cq_ring; cq_ring->ring_mask = p->cq_entries - 1;
If io_allocate_scq_urings() fails to allocate an sq_* region, it will call io_mem_free() for any previously allocated regions, but leave dangling pointers to these regions in the ctx. Any regions which have not yet been allocated are left NULL. Note that when returning -EOVERFLOW, the previously allocated sq_ring is not freed, which appears to be an unintentional leak. When io_allocate_scq_urings() fails, io_uring_create() will call io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_* regions, assuming the pointers are valid and not NULL. This can result in pages being freed multiple times, which has been observed to corrupt the page state, leading to subsequent fun. This can also result in virt_to_page() on NULL, resulting in the use of bogus page addresses, and yet more subsequent fun. The latter can be detected with CONFIG_DEBUG_VIRTUAL on arm64. Adding a cleanup path to io_allocate_scq_urings() complicates the logic, so let's leave it to io_ring_ctx_free() to consistently free these pointers, and simplify the io_allocate_scq_urings() error paths. Full splats from before this patch below. Note that the pointer logged by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and is actually NULL. [ 26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0 [ 26.102976] flags: 0x63fffc000000() [ 26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000 [ 26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 [ 26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) [ 26.143960] ------------[ cut here ]------------ [ 26.146020] kernel BUG at include/linux/mm.h:547! [ 26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 26.149163] Modules linked in: [ 26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb) [ 26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18 [ 26.156566] Hardware name: linux,dummy-virt (DT) [ 26.158089] pstate: 40400005 (nZcv daif +PAN -UAO) [ 26.159869] pc : io_mem_free+0x9c/0xa8 [ 26.161436] lr : io_mem_free+0x9c/0xa8 [ 26.162720] sp : ffff000013003d60 [ 26.164048] x29: ffff000013003d60 x28: ffff800025048040 [ 26.165804] x27: 0000000000000000 x26: ffff800025048040 [ 26.167352] x25: 00000000000000c0 x24: ffff0000112c2820 [ 26.169682] x23: 0000000000000000 x22: 0000000020000080 [ 26.171899] x21: ffff80002143b418 x20: ffff80002143b400 [ 26.174236] x19: ffff80002143b280 x18: 0000000000000000 [ 26.176607] x17: 0000000000000000 x16: 0000000000000000 [ 26.178997] x15: 0000000000000000 x14: 0000000000000000 [ 26.181508] x13: 00009178a5e077b2 x12: 0000000000000001 [ 26.183863] x11: 0000000000000000 x10: 0000000000000980 [ 26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20 [ 26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118 [ 26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000 [ 26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00 [ 26.196642] x1 : 0000000000000000 x0 : 000000000000003e [ 26.198892] Call trace: [ 26.199893] io_mem_free+0x9c/0xa8 [ 26.201155] io_ring_ctx_wait_and_kill+0xec/0x180 [ 26.202688] io_uring_setup+0x6c4/0x6f0 [ 26.204091] __arm64_sys_io_uring_setup+0x18/0x20 [ 26.205576] el0_svc_common.constprop.0+0x7c/0xe8 [ 26.207186] el0_svc_handler+0x28/0x78 [ 26.208389] el0_svc+0x8/0xc [ 26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000) [ 26.211995] ---[ end trace bdb81cd43a21e50d ]--- [ 81.770626] ------------[ cut here ]------------ [ 81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 ( (null)) [ 81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68 [ 81.831202] Modules linked in: [ 81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19 [ 81.835616] Hardware name: linux,dummy-virt (DT) [ 81.836863] pstate: 60400005 (nZCv daif +PAN -UAO) [ 81.838727] pc : __virt_to_phys+0x48/0x68 [ 81.840572] lr : __virt_to_phys+0x48/0x68 [ 81.842264] sp : ffff80002cf67c70 [ 81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18 [ 81.846463] x27: 0000000000000000 x26: 0000000020000080 [ 81.849148] x25: 0000000000000000 x24: ffff80001bb01f40 [ 81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60 [ 81.854351] x21: ffff800014358cc0 x20: ffff800014358d98 [ 81.856711] x19: 0000000000000000 x18: 0000000000000000 [ 81.859132] x17: 0000000000000000 x16: 0000000000000000 [ 81.861586] x15: 0000000000000000 x14: 0000000000000000 [ 81.863905] x13: 0000000000000000 x12: ffff1000037603e9 [ 81.866226] x11: 1ffff000037603e8 x10: 0000000000000980 [ 81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920 [ 81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47 [ 81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000 [ 81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58 [ 81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000 [ 81.880453] Call trace: [ 81.881164] __virt_to_phys+0x48/0x68 [ 81.882919] io_mem_free+0x18/0x110 [ 81.886585] io_ring_ctx_wait_and_kill+0x13c/0x1f0 [ 81.891212] io_uring_setup+0xa60/0xad0 [ 81.892881] __arm64_sys_io_uring_setup+0x2c/0x38 [ 81.894398] el0_svc_common.constprop.0+0xac/0x150 [ 81.896306] el0_svc_handler+0x34/0x88 [ 81.897744] el0_svc+0x8/0xc [ 81.898715] ---[ end trace b4a703802243cbba ]--- Fixes: 2b188cc1bb857a9d ("Add io_uring IO interface") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.or --- fs/io_uring.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)