diff mbox series

[-next] block: Warn if page offset is equal to or larger than page size

Message ID 1555606721-17679-1-git-send-email-linux@roeck-us.net (mailing list archive)
State New, archived
Headers show
Series [-next] block: Warn if page offset is equal to or larger than page size | expand

Commit Message

Guenter Roeck April 18, 2019, 4:58 p.m. UTC
Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a
nth_page call"), it was valid to call blk_bvec_map_sg() with an offset
equal to or larger than the page size. blk_bvec_map_sg() would then adjust
page and offset by calling bvec_nth_page(). This is no longer possible.
Large offsets are now directly passed without validation to sg_set_page().
This results in data corruption and crashes such as the following,
observed when trying to boot arm:vexpress from mmc.

Kernel panic - not syncing: Attempted to kill init!  exitcode=0x0000000b
CPU: 0 PID: 1 Comm: init Tainted: G        W
5.1.0-rc5-next-20190418-dirty #1
Hardware name: ARM-Versatile Express
[<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
[<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
[<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328)
[<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08)
[<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc)
[<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4)
[<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0)
[<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)

The above does not help at all when trying to track down the problem.
Let's at least add a warning to generate a traceback. The crash will
still be observed, but at least we'll have a hint what is causing it.
With this patch applied, we'll see something like the following
prior to the crash.

WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac
page offset 19456 larger than page size 4096
Modules linked in:
CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1
Hardware name: ARM-Versatile Express
Workqueue: kblockd blk_mq_run_work_fn
[<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
[<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
[<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8)
[<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70)
[<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac)
[<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8)
[<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178)
[<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880)
[<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230)
[<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0)
[<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c)
[<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198)
[<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170)
[<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4)
[<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528)
[<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164)
[<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc58d5fb0 to 0xc58d5ff8)
5fa0:                                     00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace ef7077b559d20ddc ]---

This patch is needed even if the call from mmc_blk_data_prep() is fixed.
We'll have to expect further problems from callers who are not aware about
the API change. Those problems will be difficult to track down without
extra information.

Use WARN_ONCE to report the problem. If the problem is seen, it may be
seen many times, and we don't want to flood the kernel log with repeated
messages.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
I don't know if there are situations where offset >= PAGE_SIZE
is acceptable. A quick comparison with current mainline did not report
any instances where offset >= PAGE_SIZE is passed to sg_set_page().
I do see a number of warnings triggered by this patch in -next
(for example with alpha, m68k, ppc64, and riscv64 emulations) which
do not (or not immediately) result in a crash.

If there is a different / better means to detect a problem, please
let me know and I'll be happy to adjust the patch accordingly.

 block/blk-merge.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Ming Lei April 19, 2019, 1 a.m. UTC | #1
On Fri, Apr 19, 2019 at 12:59 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a
> nth_page call"), it was valid to call blk_bvec_map_sg() with an offset
> equal to or larger than the page size. blk_bvec_map_sg() would then adjust
> page and offset by calling bvec_nth_page(). This is no longer possible.
> Large offsets are now directly passed without validation to sg_set_page().
> This results in data corruption and crashes such as the following,
> observed when trying to boot arm:vexpress from mmc.

The issue is in scatterlist code, which claims(and is supposed) to support
>=PAGE_SIZE offset and start page. However, there is bug somewhere.

>
> Kernel panic - not syncing: Attempted to kill init!  exitcode=0x0000000b
> CPU: 0 PID: 1 Comm: init Tainted: G        W
> 5.1.0-rc5-next-20190418-dirty #1
> Hardware name: ARM-Versatile Express
> [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
> [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
> [<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328)
> [<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08)
> [<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc)
> [<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4)
> [<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0)
> [<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)
>
> The above does not help at all when trying to track down the problem.
> Let's at least add a warning to generate a traceback. The crash will
> still be observed, but at least we'll have a hint what is causing it.
> With this patch applied, we'll see something like the following
> prior to the crash.
>
> WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac
> page offset 19456 larger than page size 4096
> Modules linked in:
> CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1
> Hardware name: ARM-Versatile Express
> Workqueue: kblockd blk_mq_run_work_fn
> [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
> [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
> [<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8)
> [<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70)
> [<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac)
> [<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8)
> [<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178)
> [<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880)
> [<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230)
> [<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0)
> [<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c)
> [<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198)
> [<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170)
> [<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4)
> [<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528)
> [<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164)
> [<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xc58d5fb0 to 0xc58d5ff8)
> 5fa0:                                     00000000 00000000 00000000 00000000
> 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace ef7077b559d20ddc ]---
>
> This patch is needed even if the call from mmc_blk_data_prep() is fixed.
> We'll have to expect further problems from callers who are not aware about
> the API change. Those problems will be difficult to track down without
> extra information.
>
> Use WARN_ONCE to report the problem. If the problem is seen, it may be
> seen many times, and we don't want to flood the kernel log with repeated
> messages.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> I don't know if there are situations where offset >= PAGE_SIZE
> is acceptable. A quick comparison with current mainline did not report
> any instances where offset >= PAGE_SIZE is passed to sg_set_page().
> I do see a number of warnings triggered by this patch in -next
> (for example with alpha, m68k, ppc64, and riscv64 emulations) which
> do not (or not immediately) result in a crash.
>
> If there is a different / better means to detect a problem, please
> let me know and I'll be happy to adjust the patch accordingly.
>
>  block/blk-merge.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 247b17f2a0f6..9432cd88bd7e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -475,6 +475,10 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
>                 unsigned offset = bvec->bv_offset + total;
>                 unsigned len = min(get_max_segment_size(q, offset), nbytes);
>
> +               WARN_ONCE(offset >= PAGE_SIZE,
> +                         "page offset %u larger than or equal to page size %ld\n",
> +                         offset, PAGE_SIZE);
> +
>                 *sg = blk_next_sg(sg, sglist);
>                 sg_set_page(*sg, bvec->bv_page, len, offset);

No, it shouldn't be difficult to trigger >= PAGE_SIZE offset, and it
is definitely
correct.


Thanks,
Ming Lei
Christoph Hellwig April 19, 2019, 6:57 a.m. UTC | #2
A warning isn't enough.  After seeing the mess in the MMC code I think
we should flip the switch back for now, as I can guarantee I can fix
all that up for this merge window, and I've just sent a patch for that.
diff mbox series

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 247b17f2a0f6..9432cd88bd7e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -475,6 +475,10 @@  static unsigned blk_bvec_map_sg(struct request_queue *q,
 		unsigned offset = bvec->bv_offset + total;
 		unsigned len = min(get_max_segment_size(q, offset), nbytes);
 
+		WARN_ONCE(offset >= PAGE_SIZE,
+			  "page offset %u larger than or equal to page size %ld\n",
+			  offset, PAGE_SIZE);
+
 		*sg = blk_next_sg(sg, sglist);
 		sg_set_page(*sg, bvec->bv_page, len, offset);