Message ID: 20220824054744.77812-5-ZiyangZhang@linux.alibaba.com (mailing list archive)
State: New, archived
Series: ublk_drv: add USER_RECOVERY support
On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote:
> If one rq is handled by io_uring_cmd_complete_in_task(), after a crash
> this rq is actually handled by an io_uring fallback wq. We have to
> end(abort) this rq since this fallback wq is a task other than the
> crashed task. However, current code does not call io_uring_cmd_done()
> at the same time but does it in ublk_cancel_queue(). With the current
> design this does work, because ublk_cancel_queue() is called AFTER
> del_gendisk(), which waits for the rq to be ended(aborted) in the
> fallback wq. This implies that the fallback wq on this rq is scheduled
> BEFORE io_uring_cmd_done() is called on the corresponding ioucmd in
> ublk_cancel_queue().

Right.

> However, while considering the recovery feature, we cannot rely on
> del_gendisk() or blk_mq_freeze_queue() to wait for completion of all
> rqs, because we may not want any rq to be aborted. Besides, io_uring
> does not provide a "flush fallback" mechanism, so we cannot trace this
> ioucmd.

Why not?

If user recovery is enabled, del_gendisk() can be replaced with
blk_mq_quiesce_queue(), then let the abort work function do:

- cancel all in-flight requests by holding them in the requeue list
  instead of finishing them as before; this is safe because the abort
  worker does know the ubq daemon is dying
- cancel pending commands as before, because the situation is the same
  as with the disk deleted or the queue frozen

With this approach, the current abort logic won't be changed much.

And user recovery should only be started _after_ the ublk device is
found to be aborted.

> The recovery mechanism needs to complete all ioucmds of a dying ubq
> to avoid leaking the io_uring ctx. But as discussed above, it is
> unsafe to call io_uring_cmd_done() in the recovery task if the
> fallback wq happens to run simultaneously. This is a UAF case because
> the io_uring ctx may be freed. Actually a similar case happens in
> (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
> re-issued request aborted previously to ioucmd's task_work).

If you take the above approach, I guess there isn't such a problem,
because abort can handle the case well as before.

> Besides, in order to implement the recovery mechanism, in
> ublk_queue_rq() and __ublk_rq_task_work() we should not end(abort)
> the current rq while the ubq_daemon is dying.

Right, I believe a helper like ublk_abort_request() would be useful
here.

Thanks,
Ming
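A rough sketch of the abort flow proposed above (not code from the
posted series; the abort work item and the requeue helper are
hypothetical names):

	/* hypothetical abort worker for the quiesce + requeue approach */
	static void ublk_abort_work_fn(struct work_struct *work)
	{
		struct ublk_device *ub = container_of(work,
				struct ublk_device, abort_work); /* hypothetical field */
		int i;

		/* after this returns, no new rq can enter ->queue_rq() */
		blk_mq_quiesce_queue(ub->ub_disk->queue);

		for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
			struct ublk_queue *ubq = ublk_get_queue(ub, i);

			if (!ubq_daemon_is_dying(ubq))
				continue;

			/*
			 * Hold in-flight requests in the requeue list instead
			 * of failing them; safe because the worker knows the
			 * ubq daemon is dying.
			 */
			ublk_requeue_inflight_rqs(ubq);	/* hypothetical helper */

			/* cancel pending commands, as for a deleted disk */
			ublk_cancel_queue(ubq);
		}
	}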
On 2022/8/29 13:40, Ming Lei wrote:
> On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote:
>> If one rq is handled by io_uring_cmd_complete_in_task(), after a
>> crash this rq is actually handled by an io_uring fallback wq. We
>> have to end(abort) this rq since this fallback wq is a task other
>> than the crashed task. However, current code does not call
>> io_uring_cmd_done() at the same time but does it in
>> ublk_cancel_queue(). With the current design this does work, because
>> ublk_cancel_queue() is called AFTER del_gendisk(), which waits for
>> the rq to be ended(aborted) in the fallback wq. This implies that
>> the fallback wq on this rq is scheduled BEFORE io_uring_cmd_done()
>> is called on the corresponding ioucmd in ublk_cancel_queue().
>
> Right.
>
>> However, while considering the recovery feature, we cannot rely on
>> del_gendisk() or blk_mq_freeze_queue() to wait for completion of all
>> rqs, because we may not want any rq to be aborted. Besides, io_uring
>> does not provide a "flush fallback" mechanism, so we cannot trace
>> this ioucmd.
>
> Why not?
>
> If user recovery is enabled, del_gendisk() can be replaced with
> blk_mq_quiesce_queue(), then let the abort work function do:
>
> - cancel all in-flight requests by holding them in the requeue list
>   instead of finishing them as before; this is safe because the abort
>   worker does know the ubq daemon is dying
> - cancel pending commands as before, because the situation is the
>   same as with the disk deleted or the queue frozen

The problem is: we cannot control when the fallback wq is scheduled,
so it is unsafe to call io_uring_cmd_done() in another process.
Otherwise, there is a UAF, just as in
(5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
re-issued request aborted previously to ioucmd's task_work).

Yeah, I know the answer is very simple: flush the fallback wq. But
here are two more questions:

(1) Should ublk_drv rely on the fallback wq mechanism?
    IMO, ublk_drv should not know the details of
    io_uring_cmd_complete_in_task() because its implementation may
    change in the future.
    BTW, I think the current ublk_rq_task_work_cb() is not correct
    because it does not always call io_uring_cmd_done() before
    returning. nvme_uring_cmd_end_io() always calls io_uring_cmd_done()
    for each ioucmd, no matter whether the rq succeeds or fails.

(2) Suppose io_uring does export the symbol 'flush_fallback_work',
    should we call it before starting a new process (recovery)?
    What if the fallback wq is not scheduled immediately because many
    processes are running and the system overhead is heavy? In this
    case the recovery process may wait for too long. Really, we should
    not depend on the fallback wq; please let the fallback wq complete
    the ioucmd itself.

>
> With this approach, the current abort logic won't be changed much.
>
> And user recovery should only be started _after_ the ublk device is
> found to be aborted.

START_RECOVERY will check if all ubq_daemons (the processes) are
PF_EXITING.

>> The recovery mechanism needs to complete all ioucmds of a dying ubq
>> to avoid leaking the io_uring ctx. But as discussed above, it is
>> unsafe to call io_uring_cmd_done() in the recovery task if the
>> fallback wq happens to run simultaneously. This is a UAF case
>> because the io_uring ctx may be freed. Actually a similar case
>> happens in (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do
>> not add a re-issued request aborted previously to ioucmd's
>> task_work).
>
> If you take the above approach, I guess there isn't such a problem,
> because abort can handle the case well as before.

Ming, we did consider this approach (quiesce, requeue rqs/complete
ioucmds) at the very beginning. But we decided to drop it because we
do not want to rely on the 'flush fallback wq' mechanism, which makes
ublk_drv depend on io_uring's internal implementation.

>> Besides, in order to implement the recovery mechanism, in
>> ublk_queue_rq() and __ublk_rq_task_work() we should not end(abort)
>> the current rq while the ubq_daemon is dying.
>
> Right, I believe a helper like ublk_abort_request() would be useful
> here.
>
>
> Thanks,
> Ming
On Mon, Aug 29, 2022 at 02:13:12PM +0800, Ziyang Zhang wrote:
> On 2022/8/29 13:40, Ming Lei wrote:
> > On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote:
> >> If one rq is handled by io_uring_cmd_complete_in_task(), after a
> >> crash this rq is actually handled by an io_uring fallback wq. We
> >> have to end(abort) this rq since this fallback wq is a task other
> >> than the crashed task. However, current code does not call
> >> io_uring_cmd_done() at the same time but does it in
> >> ublk_cancel_queue(). With the current design this does work,
> >> because ublk_cancel_queue() is called AFTER del_gendisk(), which
> >> waits for the rq to be ended(aborted) in the fallback wq. This
> >> implies that the fallback wq on this rq is scheduled BEFORE
> >> io_uring_cmd_done() is called on the corresponding ioucmd in
> >> ublk_cancel_queue().
> >
> > Right.
> >
> >> However, while considering the recovery feature, we cannot rely on
> >> del_gendisk() or blk_mq_freeze_queue() to wait for completion of
> >> all rqs, because we may not want any rq to be aborted. Besides,
> >> io_uring does not provide a "flush fallback" mechanism, so we
> >> cannot trace this ioucmd.
> >
> > Why not?
> >
> > If user recovery is enabled, del_gendisk() can be replaced with
> > blk_mq_quiesce_queue(), then let the abort work function do:
> >
> > - cancel all in-flight requests by holding them in the requeue list
> >   instead of finishing them as before; this is safe because the
> >   abort worker does know the ubq daemon is dying
> > - cancel pending commands as before, because the situation is the
> >   same as with the disk deleted or the queue frozen
>
> The problem is: we cannot control when the fallback wq is scheduled,
> so it is unsafe to call io_uring_cmd_done() in another process.

What is the other process?

It can't be the fallback wq, since any ublk request is aborted at the
beginning of __ublk_rq_task_work().

It shouldn't be the process calling ublk_cancel_dev(), since it is
safe to call io_uring_cmd_done() if ubq->nr_io_ready > 0.

Or others?

> Otherwise, there is a UAF, just as in
> (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
> re-issued request aborted previously to ioucmd's task_work).

As I mentioned, del_gendisk() can be replaced with
blk_mq_quiesce_queue() in case of user recovery, then no new request
can be queued after blk_mq_quiesce_queue() returns.

> Yeah, I know the answer is very simple: flush the fallback wq. But
> here are two more questions:

I don't see why we need to flush the fallback wq, care to provide some
details?

> (1) Should ublk_drv rely on the fallback wq mechanism?
>     IMO, ublk_drv should not know the details of
>     io_uring_cmd_complete_in_task() because its implementation may
>     change in the future.
>     BTW, I think the current ublk_rq_task_work_cb() is not correct
>     because it does not always call io_uring_cmd_done() before
>     returning. nvme_uring_cmd_end_io() always calls
>     io_uring_cmd_done() for each ioucmd, no matter whether the rq
>     succeeds or fails.
>
> (2) Suppose io_uring does export the symbol 'flush_fallback_work',
>     should we call it before starting a new process (recovery)?
>     What if the fallback wq is not scheduled immediately because many
>     processes are running and the system overhead is heavy? In this
>     case the recovery process may wait for too long. Really, we
>     should not depend on the fallback wq; please let the fallback wq
>     complete the ioucmd itself.
>
> >
> > With this approach, the current abort logic won't be changed much.
> >
> > And user recovery should only be started _after_ the ublk device is
> > found to be aborted.
>
> START_RECOVERY will check if all ubq_daemons (the processes) are
> PF_EXITING.

That is different. If START_RECOVERY is only run on an aborted device,
the recovery handler could be simplified.

> >> The recovery mechanism needs to complete all ioucmds of a dying
> >> ubq to avoid leaking the io_uring ctx. But as discussed above, it
> >> is unsafe to call io_uring_cmd_done() in the recovery task if the
> >> fallback wq happens to run simultaneously. This is a UAF case
> >> because the io_uring ctx may be freed. Actually a similar case
> >> happens in (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do
> >> not add a re-issued request aborted previously to ioucmd's
> >> task_work).
> >
> > If you take the above approach, I guess there isn't such a problem,
> > because abort can handle the case well as before.
>
> Ming, we did consider this approach (quiesce, requeue rqs/complete
> ioucmds) at the very beginning. But we decided to drop it because we
> do not want to rely on the 'flush fallback wq' mechanism, which makes
> ublk_drv depend on io_uring's internal implementation.

Then the focus is 'flush fallback wq', please see my question above.

Thanks,
Ming
On 2022/8/29 16:11, Ming Lei wrote:
> On Mon, Aug 29, 2022 at 02:13:12PM +0800, Ziyang Zhang wrote:
>> On 2022/8/29 13:40, Ming Lei wrote:
>>> On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote:
>>>> If one rq is handled by io_uring_cmd_complete_in_task(), after a
>>>> crash this rq is actually handled by an io_uring fallback wq. We
>>>> have to end(abort) this rq since this fallback wq is a task other
>>>> than the crashed task. However, current code does not call
>>>> io_uring_cmd_done() at the same time but does it in
>>>> ublk_cancel_queue(). With the current design this does work,
>>>> because ublk_cancel_queue() is called AFTER del_gendisk(), which
>>>> waits for the rq to be ended(aborted) in the fallback wq. This
>>>> implies that the fallback wq on this rq is scheduled BEFORE
>>>> io_uring_cmd_done() is called on the corresponding ioucmd in
>>>> ublk_cancel_queue().
>>>
>>> Right.

[1]

>>>
>>>> However, while considering the recovery feature, we cannot rely on
>>>> del_gendisk() or blk_mq_freeze_queue() to wait for completion of
>>>> all rqs, because we may not want any rq to be aborted. Besides,
>>>> io_uring does not provide a "flush fallback" mechanism, so we
>>>> cannot trace this ioucmd.
>>>
>>> Why not?
>>>
>>> If user recovery is enabled, del_gendisk() can be replaced with
>>> blk_mq_quiesce_queue(), then let the abort work function do:
>>>
>>> - cancel all in-flight requests by holding them in the requeue list
>>>   instead of finishing them as before; this is safe because the
>>>   abort worker does know the ubq daemon is dying
>>> - cancel pending commands as before, because the situation is the
>>>   same as with the disk deleted or the queue frozen
>>
>> The problem is: we cannot control when the fallback wq is scheduled,
>> so it is unsafe to call io_uring_cmd_done() in another process.
>
> What is the other process?

Hi Ming.

Actually patches 1-5 are all preparations for patches 6-9, so I suggest
reading patches 6-9 at the same time if you are confused about why I
say there is a 'problem'. In the current ublk_drv we really do not need
to consider it (as I explained in [1] and you replied 'Right').

To answer your question: in this patchset it is the START_RECOVERY
process, which calls ublk_recover_rq(). Please see patch 8.

> It can't be the fallback wq since any ublk request is aborted at the
> beginning of __ublk_rq_task_work().

With the recovery feature enabled, we cannot abort the rq; we just let
__ublk_rq_task_work() return. We will requeue the rq soon.

> It shouldn't be the process calling ublk_cancel_dev(), since it is
> safe to call io_uring_cmd_done() if ubq->nr_io_ready > 0.
>
> Or others?

It is the START_RECOVERY process (or the 'abort_work' you proposed).
It is whatever calls io_uring_cmd_done() on the old ioucmd belonging to
the dying ubq_daemon.

>> Otherwise, there is a UAF, just as in
>> (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
>> re-issued request aborted previously to ioucmd's task_work).
>
> As I mentioned, del_gendisk() can be replaced with
> blk_mq_quiesce_queue() in case of user recovery, then no new request
> can be queued after blk_mq_quiesce_queue() returns.

For this, +1.

>> Yeah, I know the answer is very simple: flush the fallback wq. But
>> here are two more questions:
>
> I don't see why we need to flush the fallback wq, care to provide
> some details?

Anyway, here is a case:

(1) Assume there is only one tag and only one ubq.

(2) The ublk_io is ACTIVE, which means an ioucmd (cmd_1) was sent from
    ublksrv and ublk_drv has not completed it yet (no
    io_uring_cmd_done() has been called).

(3) A new rq comes in, and ublk_queue_rq() is called.

(4) The ublksrv process crashes (PF_EXITING).

(5) io_uring_cmd_complete_in_task(cmd_1) is called in ublk_queue_rq(),
    and cmd_1 is put into a fallback_list.

(6) We want to re-attach a new process and assign a new ioucmd (cmd_2)
    to the ublk_io.

(7) We try to complete the old cmd_1 now by io_uring_cmd_done(cmd_1)...

(8) Shortly after (7), the io_uring exit work is scheduled, and it
    finds that no inflight ioucmd exists, so it starts to release some
    resources.

(9) Shortly after (8), in the fallback wq, a null-deref on cmd_1 or
    ctx->refs may happen, just like in
    (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
    re-issued request aborted previously to ioucmd's task_work).

If we flush the fallback wq before (7), then everything is OKAY.

Why does this happen? The root cause is that we do not ALWAYS complete
the io_uring cmd in ublk_rq_task_work_cb(). The commit "ublk_drv: do
not add a re-issued request aborted previously to ioucmd's task_work"
does fix a problem, but I think we really need to refactor
ublk_rq_task_work_cb() so that it takes care of the old ioucmd.

>> (1) Should ublk_drv rely on the fallback wq mechanism?
>>     IMO, ublk_drv should not know the details of
>>     io_uring_cmd_complete_in_task() because its implementation may
>>     change in the future.
>>     BTW, I think the current ublk_rq_task_work_cb() is not correct
>>     because it does not always call io_uring_cmd_done() before
>>     returning. nvme_uring_cmd_end_io() always calls
>>     io_uring_cmd_done() for each ioucmd, no matter whether the rq
>>     succeeds or fails.
>>
>> (2) Suppose io_uring does export the symbol 'flush_fallback_work',
>>     should we call it before starting a new process (recovery)?
>>     What if the fallback wq is not scheduled immediately because
>>     many processes are running and the system overhead is heavy? In
>>     this case the recovery process may wait for too long. Really, we
>>     should not depend on the fallback wq; please let the fallback wq
>>     complete the ioucmd itself.
>>
>>>
>>> With this approach, the current abort logic won't be changed much.
>>>
>>> And user recovery should only be started _after_ the ublk device is
>>> found to be aborted.
>>
>> START_RECOVERY will check if all ubq_daemons (the processes) are
>> PF_EXITING.
>
> That is different. If START_RECOVERY is only run on an aborted
> device, the recovery handler could be simplified.

But stop_dev could become complicated, since with recovery enabled
stop_dev has to do different things. Please see patch 5. Really, we do
not have to do anything after the crash until the user sends
START_RECOVERY or STOP_DEV.

Regards,
Zhang

>
>>
>>>
>>>> The recovery mechanism needs to complete all ioucmds of a dying
>>>> ubq to avoid leaking the io_uring ctx. But as discussed above, it
>>>> is unsafe to call io_uring_cmd_done() in the recovery task if the
>>>> fallback wq happens to run simultaneously. This is a UAF case
>>>> because the io_uring ctx may be freed. Actually a similar case
>>>> happens in (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do
>>>> not add a re-issued request aborted previously to ioucmd's
>>>> task_work).
>>>
>>> If you take the above approach, I guess there isn't such a problem,
>>> because abort can handle the case well as before.
>>
>> Ming, we did consider this approach (quiesce, requeue rqs/complete
>> ioucmds) at the very beginning. But we decided to drop it because we
>> do not want to rely on the 'flush fallback wq' mechanism, which
>> makes ublk_drv depend on io_uring's internal implementation.
>
> Then the focus is 'flush fallback wq', please see my question above.
>
>
> Thanks,
> Ming
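The refactor argued for here is the one the posted patch applies to
__ublk_rq_task_work(): the task-work callback always completes its own
ioucmd, in the spirit of nvme_uring_cmd_end_io(). Condensed from the
patch hunk shown in the diff below:

	static inline void __ublk_rq_task_work(struct io_uring_cmd *cmd)
	{
		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
		struct ublk_device *ub = cmd->file->private_data;
		struct ublk_queue *ubq = ublk_get_queue(ub, pdu->q_id);

		if (unlikely(current != ubq->ubq_daemon ||
			     current->flags & PF_EXITING)) {
			/*
			 * Running in the fallback wq or exit_task_work:
			 * complete the old cmd right here, so no other task
			 * (e.g. the recovery task) ever has to call
			 * io_uring_cmd_done() on it.
			 */
			io_uring_cmd_done(cmd, UBLK_IO_RES_ABORT, 0);
			return;
		}

		/*
		 * Normal path: look up the rq by pdu->q_id/pdu->tag and
		 * handle it in the ubq_daemon context, as in the full patch.
		 */
	}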
On Mon, Aug 29, 2022 at 05:09:45PM +0800, Ziyang Zhang wrote:
> On 2022/8/29 16:11, Ming Lei wrote:
> > On Mon, Aug 29, 2022 at 02:13:12PM +0800, Ziyang Zhang wrote:
> >> On 2022/8/29 13:40, Ming Lei wrote:
> >>> On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote:
> >>>> If one rq is handled by io_uring_cmd_complete_in_task(), after a
> >>>> crash this rq is actually handled by an io_uring fallback wq. We
> >>>> have to end(abort) this rq since this fallback wq is a task
> >>>> other than the crashed task. However, current code does not call
> >>>> io_uring_cmd_done() at the same time but does it in
> >>>> ublk_cancel_queue(). With the current design this does work,
> >>>> because ublk_cancel_queue() is called AFTER del_gendisk(), which
> >>>> waits for the rq to be ended(aborted) in the fallback wq. This
> >>>> implies that the fallback wq on this rq is scheduled BEFORE
> >>>> io_uring_cmd_done() is called on the corresponding ioucmd in
> >>>> ublk_cancel_queue().
> >>>
> >>> Right.
>
> [1]
>
> >>>
> >>>> However, while considering the recovery feature, we cannot rely
> >>>> on del_gendisk() or blk_mq_freeze_queue() to wait for completion
> >>>> of all rqs, because we may not want any rq to be aborted.
> >>>> Besides, io_uring does not provide a "flush fallback" mechanism,
> >>>> so we cannot trace this ioucmd.
> >>>
> >>> Why not?
> >>>
> >>> If user recovery is enabled, del_gendisk() can be replaced with
> >>> blk_mq_quiesce_queue(), then let the abort work function do:
> >>>
> >>> - cancel all in-flight requests by holding them in the requeue
> >>>   list instead of finishing them as before; this is safe because
> >>>   the abort worker does know the ubq daemon is dying
> >>> - cancel pending commands as before, because the situation is the
> >>>   same as with the disk deleted or the queue frozen
> >>
> >> The problem is: we cannot control when the fallback wq is
> >> scheduled, so it is unsafe to call io_uring_cmd_done() in another
> >> process.
> >
> > What is the other process?
>
> Hi Ming.
>
> Actually patches 1-5 are all preparations for patches 6-9, so I
> suggest reading patches 6-9 at the same time if you are confused
> about why I say there is a 'problem'. In the current ublk_drv we
> really do not need to consider it (as I explained in [1] and you
> replied 'Right').
>
> To answer your question: in this patchset it is the START_RECOVERY
> process, which calls ublk_recover_rq(). Please see patch 8.

Why does the START_RECOVERY process need to call io_uring_cmd_done()?

As I mentioned, clean up the old ubq_daemon with ublk_cancel_queue()
just like before. Then the recovery handling can be simplified a lot.

> > It can't be the fallback wq since any ublk request is aborted at
> > the beginning of __ublk_rq_task_work().
>
> With the recovery feature enabled, we cannot abort the rq; we just
> let __ublk_rq_task_work() return. We will requeue the rq soon.

Yeah, the request is requeued, and the queue is quiesced during
aborting in ublk_cancel_queue().

> > It shouldn't be the process calling ublk_cancel_dev(), since it is
> > safe to call io_uring_cmd_done() if ubq->nr_io_ready > 0.
> >
> > Or others?
>
> It is the START_RECOVERY process (or the 'abort_work' you proposed).
> It is whatever calls io_uring_cmd_done() on the old ioucmd belonging
> to the dying ubq_daemon.
>
> >> Otherwise, there is a UAF, just as in
> >> (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
> >> re-issued request aborted previously to ioucmd's task_work).
> >
> > As I mentioned, del_gendisk() can be replaced with
> > blk_mq_quiesce_queue() in case of user recovery, then no new
> > request can be queued after blk_mq_quiesce_queue() returns.
>
> For this, +1.
>
> >> Yeah, I know the answer is very simple: flush the fallback wq.
> >> But here are two more questions:
> >
> > I don't see why we need to flush the fallback wq, care to provide
> > some details?
>
> Anyway, here is a case:
>
> (1) Assume there is only one tag and only one ubq.
>
> (2) The ublk_io is ACTIVE, which means an ioucmd (cmd_1) was sent
>     from ublksrv and ublk_drv has not completed it yet (no
>     io_uring_cmd_done() has been called).
>
> (3) A new rq comes in, and ublk_queue_rq() is called.
>
> (4) The ublksrv process crashes (PF_EXITING).
>
> (5) io_uring_cmd_complete_in_task(cmd_1) is called in
>     ublk_queue_rq(), and cmd_1 is put into a fallback_list.

What I suggested is to abort the ubq daemon in ublk_abort_device()
like before, but without failing in-flight requests: just quiesce the
queue and requeue all in-flight requests. Specifically, you can wait
until all in-flight requests are requeued; similar handling has been
used by NVMe for a long time. Then the fallback wq can be thought of
as flushed here.

There are two kinds of in-flight requests:

1) UBLK_IO_FLAG_ACTIVE is set: this is what ublk_abort_queue() needs
   to wait on until the req's state becomes IDLE, which can be done
   via the following change at the entry of __ublk_rq_task_work():

	if (unlikely(task_exiting)) {
		if (user_recovery)
			blk_mq_requeue_request(req, false);
		else
			blk_mq_end_request(req, BLK_STS_IOERR);
	}

2) UBLK_IO_FLAG_ACTIVE is cleared: no need to wait, since
   io_uring_cmd_done() has been called for this request.

> (6) We want to re-attach a new process and assign a new ioucmd
>     (cmd_2) to the ublk_io.
>
> (7) We try to complete the old cmd_1 now by
>     io_uring_cmd_done(cmd_1)...
>
> (8) Shortly after (7), the io_uring exit work is scheduled, and it
>     finds that no inflight ioucmd exists, so it starts to release
>     some resources.
>
> (9) Shortly after (8), in the fallback wq, a null-deref on cmd_1 or
>     ctx->refs may happen, just like in
>     (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add
>     a re-issued request aborted previously to ioucmd's task_work).
>
> If we flush the fallback wq before (7), then everything is OKAY.
>
> Why does this happen? The root cause is that we do not ALWAYS
> complete the io_uring cmd in ublk_rq_task_work_cb(). The commit
> "ublk_drv: do not add a re-issued request aborted previously to
> ioucmd's task_work" does fix a problem, but I think we really need
> to refactor ublk_rq_task_work_cb() so that it takes care of the old
> ioucmd.

The race exists because you drop the existing abort mechanism for user
recovery. If the existing abort is reused for recovery, there is no
such race at all.

> >> (1) Should ublk_drv rely on the fallback wq mechanism?
> >>     IMO, ublk_drv should not know the details of
> >>     io_uring_cmd_complete_in_task() because its implementation may
> >>     change in the future.
> >>     BTW, I think the current ublk_rq_task_work_cb() is not correct
> >>     because it does not always call io_uring_cmd_done() before
> >>     returning. nvme_uring_cmd_end_io() always calls
> >>     io_uring_cmd_done() for each ioucmd, no matter whether the rq
> >>     succeeds or fails.
> >>
> >> (2) Suppose io_uring does export the symbol
> >>     'flush_fallback_work', should we call it before starting a new
> >>     process (recovery)?
> >>     What if the fallback wq is not scheduled immediately because
> >>     many processes are running and the system overhead is heavy?
> >>     In this case the recovery process may wait for too long.
> >>     Really, we should not depend on the fallback wq; please let
> >>     the fallback wq complete the ioucmd itself.
> >>
> >>>
> >>> With this approach, the current abort logic won't be changed
> >>> much.
> >>>
> >>> And user recovery should only be started _after_ the ublk device
> >>> is found to be aborted.
> >>
> >> START_RECOVERY will check if all ubq_daemons (the processes) are
> >> PF_EXITING.
> >
> > That is different. If START_RECOVERY is only run on an aborted
> > device, the recovery handler could be simplified.
>
> But stop_dev could become complicated, since with recovery enabled
> stop_dev has to do different things. Please see patch 5. Really, we
> do not have to do anything after the crash until the user sends
> START_RECOVERY or STOP_DEV.

99% is the same; the extra thing is just to fail all in-queue requests
by unquiescing the queue before stopping a to-be-recovered device.

Thanks,
Ming
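A condensed sketch of that extra step; ublk_can_use_recovery() is a
hypothetical helper, and the surrounding body mirrors the current
ublk_stop_dev():

	static void ublk_stop_dev(struct ublk_device *ub)
	{
		mutex_lock(&ub->mutex);
		if (ub->dev_info.state != UBLK_S_DEV_LIVE)
			goto unlock;

		/*
		 * Extra step for a to-be-recovered device: let the requeued
		 * rqs be dispatched again and failed by the aborted queues.
		 */
		if (ublk_can_use_recovery(ub))	/* hypothetical helper */
			blk_mq_unquiesce_queue(ub->ub_disk->queue);

		del_gendisk(ub->ub_disk);
		ub->dev_info.state = UBLK_S_DEV_DEAD;
		ub->dev_info.ublksrv_pid = -1;
		put_disk(ub->ub_disk);
		ub->ub_disk = NULL;
	 unlock:
		mutex_unlock(&ub->mutex);
		ublk_cancel_dev(ub);
		cancel_delayed_work_sync(&ub->monitor_work);
	}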
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 8add6e3ae15f..1b1c0190bba4 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -54,12 +54,10 @@
 /* All UBLK_PARAM_TYPE_* should be included here */
 #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
 
-struct ublk_rq_data {
-	struct callback_head work;
-};
-
 struct ublk_uring_cmd_pdu {
-	struct request *req;
+	int q_id;
+	int tag;
+	struct callback_head work;
 };
 
 /*
@@ -122,6 +120,7 @@ struct ublk_queue {
 	bool abort_work_pending;
 	unsigned short nr_io_ready;	/* how many ios setup */
 	struct ublk_device *dev;
+	bool force_abort;
 	struct ublk_io ios[0];
 };
 
@@ -610,24 +609,6 @@ static void ublk_complete_rq(struct request *req)
 	__blk_mq_end_request(req, BLK_STS_OK);
 }
 
-/*
- * Since __ublk_rq_task_work always fails requests immediately during
- * exiting, __ublk_fail_req() is only called from abort context during
- * exiting. So lock is unnecessary.
- *
- * Also aborting may not be started yet, keep in mind that one failed
- * request may be issued by block layer again.
- */
-static void __ublk_fail_req(struct ublk_io *io, struct request *req)
-{
-	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
-
-	if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
-		io->flags |= UBLK_IO_FLAG_ABORTED;
-		blk_mq_end_request(req, BLK_STS_IOERR);
-	}
-}
-
 static void ubq_complete_io_cmd(struct ublk_io *io, int res)
 {
 	/* mark this cmd owned by ublksrv */
@@ -645,18 +626,14 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
 
 #define UBLK_REQUEUE_DELAY_MS	3
 
-static inline void __ublk_rq_task_work(struct request *req)
+static inline void __ublk_rq_task_work(struct io_uring_cmd *cmd)
 {
-	struct ublk_queue *ubq = req->mq_hctx->driver_data;
-	struct ublk_device *ub = ubq->dev;
-	int tag = req->tag;
-	struct ublk_io *io = &ubq->ios[tag];
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+	struct ublk_device *ub = cmd->file->private_data;
+	struct ublk_queue *ubq = ublk_get_queue(ub, pdu->q_id);
+	struct ublk_io *io;
+	struct request *req;
 	unsigned int mapped_bytes;
-
-	pr_devel("%s: complete: op %d, qid %d tag %d io_flags %x addr %llx\n",
-			__func__, io->cmd->cmd_op, ubq->q_id, req->tag, io->flags,
-			ublk_get_iod(ubq, req->tag)->addr);
-
 	/*
 	 * Task is exiting if either:
 	 *
@@ -667,11 +644,22 @@ static inline void __ublk_rq_task_work(struct request *req)
 	 * (2) current->flags & PF_EXITING.
 	 */
 	if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
-		blk_mq_end_request(req, BLK_STS_IOERR);
-		mod_delayed_work(system_wq, &ub->monitor_work, 0);
+		pr_devel("%s: (%s) done old cmd: qid %d tag %d\n",
+				__func__,
+				current != ubq->ubq_daemon ? "fallback wq" : "exit_task_work",
+				pdu->q_id, pdu->tag);
+		io_uring_cmd_done(cmd, UBLK_IO_RES_ABORT, 0);
 		return;
 	}
 
+	io = &ubq->ios[pdu->tag];
+	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], pdu->tag);
+	WARN_ON_ONCE(!req);
+
+	pr_devel("%s: complete: op %d, qid %d tag %d io_flags %x addr %llx\n",
+			__func__, cmd->cmd_op, ubq->q_id, req->tag, io->flags,
+			ublk_get_iod(ubq, req->tag)->addr);
+
 	if (ublk_need_get_data(ubq) &&
 			(req_op(req) == REQ_OP_WRITE ||
 			req_op(req) == REQ_OP_FLUSH)) {
@@ -728,18 +716,16 @@ static inline void __ublk_rq_task_work(struct request *req)
 
 static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd)
 {
-	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
-
-	__ublk_rq_task_work(pdu->req);
+	__ublk_rq_task_work(cmd);
 }
 
 static void ublk_rq_task_work_fn(struct callback_head *work)
 {
-	struct ublk_rq_data *data = container_of(work,
-			struct ublk_rq_data, work);
-	struct request *req = blk_mq_rq_from_pdu(data);
+	struct ublk_uring_cmd_pdu *pdu = container_of(work,
+			struct ublk_uring_cmd_pdu, work);
+	struct io_uring_cmd *cmd = ublk_uring_cmd_from_pdu(pdu);
 
-	__ublk_rq_task_work(req);
+	__ublk_rq_task_work(cmd);
 }
 
 static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
@@ -747,6 +733,8 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 {
 	struct ublk_queue *ubq = hctx->driver_data;
 	struct request *rq = bd->rq;
+	struct ublk_io *io = &ubq->ios[rq->tag];
+	struct ublk_uring_cmd_pdu *pdu;
 	blk_status_t res;
 
 	/* fill iod to slot in io cmd buffer */
@@ -754,43 +742,43 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(res != BLK_STS_OK))
 		return BLK_STS_IOERR;
 
-	blk_mq_start_request(bd->rq);
-
-	if (unlikely(ubq_daemon_is_dying(ubq))) {
- fail:
-		mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
+	/*
+	 * We set force_abort because we want to abort this rq ASAP.
+	 *
+	 * NOTE: At this moment, rq is NOT inflight. So after the ubq is marked
+	 * as force_abort, no new rq can be inflight and we are safe to check
+	 * all rqs' status in monitor_work and choose whether:
+	 *
+	 * (1) if inflight, end(abort) this rq;
+	 * (2) if not inflight, io_uring_cmd_done() the ioucmd.
+	 */
+	if (ubq->force_abort) {
+		pr_devel("%s: force abort: qid %d tag %d io_flag %d\n",
+				__func__, ubq->q_id, rq->tag, io->flags);
 		return BLK_STS_IOERR;
 	}
 
+	blk_mq_start_request(bd->rq);
+	pr_devel("%s: start req: q_id %d tag %d io_flags %d\n",
+			__func__, ubq->q_id, rq->tag, io->flags);
+	pdu = ublk_get_uring_cmd_pdu(io->cmd);
+	pdu->q_id = ubq->q_id;
+	pdu->tag = rq->tag;
+
 	if (ublk_can_use_task_work(ubq)) {
-		struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
 		enum task_work_notify_mode notify_mode = bd->last ?
 			TWA_SIGNAL_NO_IPI : TWA_NONE;
 
-		if (task_work_add(ubq->ubq_daemon, &data->work, notify_mode))
-			goto fail;
-	} else {
-		struct ublk_io *io = &ubq->ios[rq->tag];
-		struct io_uring_cmd *cmd = io->cmd;
-		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+		init_task_work(&pdu->work, ublk_rq_task_work_fn);
 
-		/*
-		 * If the check pass, we know that this is a re-issued request aborted
-		 * previously in monitor_work because the ubq_daemon(cmd's task) is
-		 * PF_EXITING. We cannot call io_uring_cmd_complete_in_task() anymore
-		 * because this ioucmd's io_uring context may be freed now if no inflight
-		 * ioucmd exists. Otherwise we may cause null-deref in ctx->fallback_work.
-		 *
-		 * Note: monitor_work sets UBLK_IO_FLAG_ABORTED and ends this request(releasing
-		 * the tag). Then the request is re-started(allocating the tag) and we are here.
-		 * Since releasing/allocating a tag implies smp_mb(), finding UBLK_IO_FLAG_ABORTED
-		 * guarantees that here is a re-issued request aborted previously.
-		 */
-		if ((io->flags & UBLK_IO_FLAG_ABORTED))
-			goto fail;
+		/* If task_work_add() fails, we know that ubq_daemon(cmd's task) is PF_EXITING. */
+		if (task_work_add(ubq->ubq_daemon, &pdu->work, notify_mode))
+			pr_devel("%s: done old cmd: qid %d tag %d\n",
+					__func__, ubq->q_id, rq->tag);
+			io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0);
 
-		pdu->req = rq;
-		io_uring_cmd_complete_in_task(cmd, ublk_rq_task_work_cb);
+	} else {
+		io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
 	}
 
 	return BLK_STS_OK;
@@ -814,20 +802,10 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
 	return 0;
 }
 
-static int ublk_init_rq(struct blk_mq_tag_set *set, struct request *req,
-		unsigned int hctx_idx, unsigned int numa_node)
-{
-	struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
-
-	init_task_work(&data->work, ublk_rq_task_work_fn);
-	return 0;
-}
-
 static const struct blk_mq_ops ublk_mq_ops = {
 	.queue_rq       = ublk_queue_rq,
 	.commit_rqs     = ublk_commit_rqs,
 	.init_hctx	= ublk_init_hctx,
-	.init_request   = ublk_init_rq,
 };
 
 static int ublk_ch_open(struct inode *inode, struct file *filp)
@@ -906,31 +884,46 @@ static void ublk_commit_completion(struct ublk_device *ub,
 	ublk_complete_rq(req);
 }
 
-/*
- * When ->ubq_daemon is exiting, either new request is ended immediately,
- * or any queued io command is drained, so it is safe to abort queue
- * lockless
+/* do cleanup work for a dying(PF_EXITING) ubq_daemon:
+ * (1) end(abort) all 'inflight' rqs here.
+ * (2) complete ioucmd of all not 'inflight' rqs here.
+ *
+ * Note: we have set ubq->force_abort before. So ublk_queue_rq() must fail
+ * a rq immediately before it calls blk_mq_start_request() which will
+ * set a rq's status as 'inflight'. Therefore, set of 'inflight'
+ * rqs must not change for a dying ubq.
+ *
+ * Note: If we fail one rq in ublk_queue_rq(), we cannot fail it here again.
+ * This is because checking 'force_abort' happens before blk_mq_start_request()
+ * so its status is always 'idle' and never changes to 'inflight'.
+ *
+ * Also aborting may not be started yet, keep in mind that one failed
+ * request may be issued by block layer again.
  */
 static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 {
 	int i;
 
+	WARN_ON_ONCE(!ubq->force_abort);
+
 	if (!ublk_get_device(ub))
 		return;
 
 	for (i = 0; i < ubq->q_depth; i++) {
-		struct ublk_io *io = &ubq->ios[i];
-
-		if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
-			struct request *rq;
+		struct request *rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
 
+		if (rq && blk_mq_request_started(rq)) {
 			/*
-			 * Either we fail the request or ublk_rq_task_work_fn
+			 * Either we fail the request or ublk_queue_rq
 			 * will do it
 			 */
-			rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
-			if (rq)
-				__ublk_fail_req(io, rq);
+			pr_devel("%s: abort request: qid %d tag %d\n",
+					__func__, ubq->q_id, i);
+			blk_mq_end_request(rq, BLK_STS_IOERR);
+		} else {
+			pr_devel("%s: done old cmd: qid %d tag %d\n",
+					__func__, ubq->q_id, i);
+			io_uring_cmd_done(ubq->ios[i].cmd, UBLK_IO_RES_ABORT, 0);
 		}
 	}
 	ublk_put_device(ub);
@@ -945,7 +938,18 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
-		if (ubq_daemon_is_dying(ubq)) {
+		/* check force_abort so we do not abort a queue two times! */
+		if (ubq->ubq_daemon && ubq_daemon_is_dying(ubq) && !ubq->force_abort) {
+			struct request_queue *q = ub->ub_disk->queue;
+
+			ubq->force_abort = true;
+
+			/* ensure that all ublk_queue_rq() calls see force_abort */
+			if (blk_queue_has_srcu(q))
+				synchronize_srcu(q->srcu);
+			else
+				synchronize_rcu();
+
 			schedule_work(&ub->stop_work);
 
 			/* abort queue is for making forward progress */
@@ -997,8 +1001,18 @@ static void ublk_cancel_dev(struct ublk_device *ub)
 {
 	int i;
 
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+		struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+		/*
+		 * for ubq with a dying daemon, we have io_uring_cmd_done() all cmd in
+		 * ublk_abort_queue(). Do not io_uring_cmd_done() cmd two times!
+		 */
+		if (ubq->ubq_daemon && ubq_daemon_is_dying(ubq))
+			continue;
+
 		ublk_cancel_queue(ublk_get_queue(ub, i));
+	}
 }
 
 static void ublk_stop_dev(struct ublk_device *ub)
@@ -1037,17 +1051,17 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id,
 		int tag, struct io_uring_cmd *cmd)
 {
 	struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
-	struct request *req = blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+
+	pdu->q_id = q_id;
+	pdu->tag = tag;
 
 	if (ublk_can_use_task_work(ubq)) {
-		struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+		init_task_work(&pdu->work, ublk_rq_task_work_fn);
 
 		/* should not fail since we call it just in ubq->ubq_daemon */
-		task_work_add(ubq->ubq_daemon, &data->work, TWA_SIGNAL_NO_IPI);
+		task_work_add(ubq->ubq_daemon, &pdu->work, TWA_SIGNAL_NO_IPI);
 	} else {
-		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
-
-		pdu->req = req;
 		io_uring_cmd_complete_in_task(cmd, ublk_rq_task_work_cb);
 	}
 }
@@ -1320,7 +1334,7 @@ static int ublk_add_tag_set(struct ublk_device *ub)
 	ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues;
 	ub->tag_set.queue_depth = ub->dev_info.queue_depth;
 	ub->tag_set.numa_node = NUMA_NO_NODE;
-	ub->tag_set.cmd_size = sizeof(struct ublk_rq_data);
+	ub->tag_set.cmd_size = 0;
 	ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	ub->tag_set.driver_data = ub;
 	return blk_mq_alloc_tag_set(&ub->tag_set);
If one rq is handled by io_uring_cmd_complete_in_task(), after a crash
this rq is actually handled by an io_uring fallback wq. We have to
end(abort) this rq since this fallback wq is a task other than the
crashed task. However, current code does not call io_uring_cmd_done()
at the same time but does it in ublk_cancel_queue(). With the current
design this does work, because ublk_cancel_queue() is called AFTER
del_gendisk(), which waits for the rq to be ended(aborted) in the
fallback wq. This implies that the fallback wq on this rq is scheduled
BEFORE io_uring_cmd_done() is called on the corresponding ioucmd in
ublk_cancel_queue().

However, while considering the recovery feature, we cannot rely on
del_gendisk() or blk_mq_freeze_queue() to wait for completion of all
rqs, because we may not want any rq to be aborted. Besides, io_uring
does not provide a "flush fallback" mechanism, so we cannot trace this
ioucmd.

The recovery mechanism needs to complete all ioucmds of a dying ubq to
avoid leaking the io_uring ctx. But as discussed above, it is unsafe
to call io_uring_cmd_done() in the recovery task if the fallback wq
happens to run simultaneously. This is a UAF case because the io_uring
ctx may be freed. Actually a similar case happens in
(5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a
re-issued request aborted previously to ioucmd's task_work).

Besides, in order to implement the recovery mechanism, in
ublk_queue_rq() and __ublk_rq_task_work() we should not end(abort) the
current rq while the ubq_daemon is dying. Instead, we should wait for
the new ubq_daemon to get ready and then requeue the rq.

In summary, we refactor some code to avoid the UAF problem and prepare
for the recovery mechanism:

(1) Refactor monitor_work.
    monitor_work is only used without the recovery feature; it aborts
    rqs and stops the device. Now ublk_abort_queue() is the only
    function that ends(aborts) inflight rqs with a dying ubq_daemon.
    For a not inflight (idle) rq, its ioucmd is completed safely by
    io_uring_cmd_done(). We do not rely on UBLK_IO_FLAG_ACTIVE
    anymore. monitor_work also sets 'force_abort' for a dying ubq.

(2) Refactor ublk_queue_rq().
    Check 'force_abort' before blk_mq_start_request(). If
    'force_abort' is true, end(abort) the current rq immediately.
    'force_abort' is set by monitor_work for a dying ubq. Aborting rqs
    ASAP ensures that no rq's status changes while we traverse rqs in
    monitor_work.

(3) Refactor __ublk_rq_task_work().
    No matter whether we use task_work_add() or
    io_uring_cmd_complete_in_task(), __ublk_rq_task_work() now accepts
    only one argument: the ioucmd, which is set earlier in
    ublk_queue_rq(). And no matter whether the ubq_daemon is dying or
    not, we always call io_uring_cmd_done(ioucmd). Note that the
    ioucmd might not be the same as io->cmd, because a new ubq_daemon
    may set new ioucmds before the old fallback wq or exit_task_work
    runs. In this way the old ioucmd can be safely freed eventually,
    and io->cmd can be updated without a race.

(4) Refactor ublk_cancel_dev().
    ublk_cancel_dev() must not complete ioucmds for a dying ubq,
    because we have already done that in monitor_work.

Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
---
 drivers/block/ublk_drv.c | 216 +++++++++++++++++++++------------------
 1 file changed, 115 insertions(+), 101 deletions(-)