[RFC,v3,0/3] blk-mq: Avoid use-after-free for accessing old requests

Message ID 1614957294-188540-1-git-send-email-john.garry@huawei.com (mailing list archive)

Message

John Garry March 5, 2021, 3:14 p.m. UTC
This series aims to tackle the various UAF reports, like:
[0] https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@huawei.com/
[1] https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@acm.org/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
[2] https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@kernel.dk/
[3] https://lore.kernel.org/linux-block/b859618aeac58bd9bb620d7ebdb24b90@codeaurora.org/

Details are in the commit messages.

The issue addressed in patch 1/3 is pretty easy to reproduce, 2+3/3 not so
much, and I had to add mdelays in the iters functions to recreate in
sane timeframes.

As regards patch 1/3, if 2+3/3 are adopted, then this can be simplified to
simply clear the tagset requests pointers without using any atomic
operations. However, this patch on its own seems to solve the problem [3],
above. So the other 2x patches are really for extreme scenarios which may
never be seen in practice. As such, it could be considered to just accept
patch 1/3 now.
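
To illustrate the idea behind patch 1/3, here is a minimal user-space sketch, not the patch itself. It assumes a shared tag-to-request lookup table that iterators walk, and clears any slot still pointing at a request whose memory is about to be freed. The names `struct tag_set`, `NR_TAGS` and `clear_stale_refs()` are hypothetical, and C11 atomics stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's structures. */
struct request {
	int tag;
};

#define NR_TAGS 4

struct tag_set {
	/* Shared lookup table walked by iterators: tag -> last request seen. */
	_Atomic(struct request *) rqs[NR_TAGS];
};

/*
 * Before the memory behind pool[0..nr) is released, drop every reference
 * to it from the shared table. The compare-and-exchange only clears a slot
 * that still points at the request being freed, so a slot that has since
 * been reassigned to a live request is left alone.
 *
 * Returns the number of slots that were cleared.
 */
static int clear_stale_refs(struct tag_set *set, struct request *pool,
			    size_t nr)
{
	int cleared = 0;

	for (size_t i = 0; i < nr; i++) {
		for (size_t t = 0; t < NR_TAGS; t++) {
			struct request *expected = &pool[i];

			if (atomic_compare_exchange_strong(&set->rqs[t],
							   &expected,
							   (struct request *)NULL))
				cleared++;
		}
	}
	return cleared;
}
```

If iterators could be guaranteed not to run concurrently, which is what 2+3/3 aim for, the compare-and-exchange in this sketch could become a plain load and store.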

Differences to v2:
- Add patch 2+3/3
- Drop patch to lockout blk_mq_queue_tag_busy_iter() when exiting elevator

John Garry (3):
  blk-mq: Clean up references to old requests when freeing rqs
  blk-mq: Freeze and quiesce all queues for tagset in elevator_exit()
  blk-mq: Lockout tagset iterator when exiting elevator

 block/blk-mq-sched.c   |  2 +-
 block/blk-mq-tag.c     |  7 ++++++-
 block/blk-mq.c         | 21 +++++++++++++++++++--
 block/blk-mq.h         |  2 ++
 block/blk.h            | 23 +++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 6 files changed, 52 insertions(+), 4 deletions(-)

Comments

Shinichiro Kawasaki March 18, 2021, 10:26 a.m. UTC | #1
On Mar 05, 2021 / 23:14, John Garry wrote:
> This series aims to tackle the various UAF reports, like:
> [0] https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@huawei.com/
> [1] https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@acm.org/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
> [2] https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@kernel.dk/
> [3] https://lore.kernel.org/linux-block/b859618aeac58bd9bb620d7ebdb24b90@codeaurora.org/
> 
> Details are in the commit messages.
> 
> The issue addressed in patch 1/3 is pretty easy to reproduce, 2+3/3 not so
> much, and I had to add mdelays in the iters functions to recreate in
> sane timeframes.

I also observe the KASAN UAF in blk_mq_queue_tag_busy_iter during blktests runs
with kernel versions 5.12-rc2 and 5.12-rc3. When the test case block/005 is run
for HDDs behind a SAS HBA (Broadcom 9400), the UAF message is always reported
and it makes the test case fail. This failure was not observed with kernel
v5.11. I suppose the failure was rare until v5.11, but changes between 5.11 and
5.12-rcX made this failure happen more frequently.

I tried patch 1/3 by John, and saw that it avoids the UAF message and the
block/005 failure. I also tried the patch Bart suggested in this discussion
thread [1], and confirmed that it also avoids the UAF message. I appreciate
this fix work and the discussion.

[1] https://marc.info/?l=linux-kernel&m=161559032606201&w=2