From patchwork Mon Oct 14 09:29:32 2024
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 13834518
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH RESEND v3 1/3] block: fix missing dispatching request when queue is started or unquiesced
Date: Mon, 14 Oct 2024 17:29:32 +0800
Message-Id: <20241014092934.53630-2-songmuchun@bytedance.com>
In-Reply-To: <20241014092934.53630-1-songmuchun@bytedance.com>
References: <20241014092934.53630-1-songmuchun@bytedance.com>

Suppose the following scenario with a virtio_blk driver.

CPU0                              CPU1                            CPU2

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                                                  virtblk_done()
                                  blk_mq_try_issue_directly()
                                    if (blk_mq_hctx_stopped())
  blk_mq_request_bypass_insert()                                    blk_mq_run_hw_queue()
  blk_mq_run_hw_queue()               blk_mq_run_hw_queue()
                                      blk_mq_insert_request()
                                      return

After CPU0 has marked the queue as stopped, CPU1 sees that the queue is stopped. But before CPU1 inserts its request, CPU2 receives the request-completion interrupt, marks the queue as non-stopped, and runs the hardware queue. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1 and CPU2 have completed blk_mq_run_hw_queue(), CPU1 inserts its request into the same hardware queue and returns without running the queue again, so nobody dispatches that request.

Fix it by explicitly running the hardware queue after inserting the request. blk_mq_request_issue_directly() can hit the same situation; fix it as well.
Fixes: d964f04a8fde ("blk-mq: fix direct issue")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return;
 	}
 
@@ -2649,6 +2650,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return BLK_STS_OK;
 	}
 

From patchwork Mon Oct 14 09:29:33 2024
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 13834519
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH RESEND v3 2/3] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests
Date: Mon, 14 Oct 2024 17:29:33 +0800
Message-Id: <20241014092934.53630-3-songmuchun@bytedance.com>
In-Reply-To: <20241014092934.53630-1-songmuchun@bytedance.com>
References: <20241014092934.53630-1-songmuchun@bytedance.com>

Suppose the following scenario.

CPU0                                  CPU1

blk_mq_insert_request()                                                         1) store
                                      blk_mq_unquiesce_queue()
                                        blk_queue_flag_clear()                  3) store
                                        blk_mq_run_hw_queues()
                                          blk_mq_run_hw_queue()
                                            if (!blk_mq_hctx_has_pending())     4) load
                                              return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())                                                     2) load
    return
  blk_mq_sched_dispatch_requests()

A full memory barrier is needed between 1) and 2), as well as between 3) and 4), to guarantee that either CPU0 sees QUEUE_FLAG_QUIESCED cleared, or CPU1 sees the request on the dispatch list or the pending bit set in the software queue bitmap. Otherwise, neither CPU reruns the hardware queue, and the request starves.

There are two ways to fix this: (a) add a pair of memory barriers, or (b) use hctx->queue->queue_lock to synchronize QUEUE_FLAG_QUIESCED. Choose (b), since memory barriers are harder to maintain.

Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c | 47 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2d0f22de0c7f..ff6df6c7eeb25 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2202,6 +2202,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
+{
+	bool need_run;
+
+	/*
+	 * When queue is quiesced, we may be switching io scheduler, or
+	 * updating nr_hw_queues, or other things, and we can't run queue
+	 * any more, even blk_mq_hctx_has_pending() can't be called safely.
+	 *
+	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+	 * quiesced.
+	 */
+	__blk_mq_run_dispatch_ops(hctx->queue, false,
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+		blk_mq_hctx_has_pending(hctx));
+	return need_run;
+}
+
 /**
  * blk_mq_run_hw_queue - Start to run a hardware queue.
  * @hctx: Pointer to the hardware queue to run.
@@ -2222,20 +2240,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 
 	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
 
-	/*
-	 * When queue is quiesced, we may be switching io scheduler, or
-	 * updating nr_hw_queues, or other things, and we can't run queue
-	 * any more, even __blk_mq_hctx_has_pending() can't be called safely.
-	 *
-	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
-	 * quiesced.
-	 */
-	__blk_mq_run_dispatch_ops(hctx->queue, false,
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-		blk_mq_hctx_has_pending(hctx));
+	need_run = blk_mq_hw_queue_need_run(hctx);
+	if (!need_run) {
+		unsigned long flags;
 
-	if (!need_run)
-		return;
+		/*
+		 * Synchronize with blk_mq_unquiesce_queue(), because we check
+		 * if hw queue is quiesced locklessly above, we need to use
+		 * ->queue_lock to make sure we see the up-to-date status to
+		 * not miss rerunning the hw queue.
+		 */
+		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+		need_run = blk_mq_hw_queue_need_run(hctx);
+		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+
+		if (!need_run)
+			return;
+	}
 
 	if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
 		blk_mq_delay_run_hw_queue(hctx, 0);

From patchwork Mon Oct 14 09:29:34 2024
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 13834520
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH RESEND v3 3/3] block: fix ordering between checking BLK_MQ_S_STOPPED and adding requests
Date: Mon, 14 Oct 2024 17:29:34 +0800
Message-Id: <20241014092934.53630-4-songmuchun@bytedance.com>
In-Reply-To: <20241014092934.53630-1-songmuchun@bytedance.com>
References: <20241014092934.53630-1-songmuchun@bytedance.com>

First, suppose the following scenario with a virtio_blk driver.

CPU0                                  CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                      virtblk_done()
  blk_mq_request_bypass_insert()                                                1) store
                                        blk_mq_start_stopped_hw_queue()
                                          clear_bit(BLK_MQ_S_STOPPED)           3) store
                                          blk_mq_run_hw_queue()
                                            if (!blk_mq_hctx_has_pending())     4) load
                                              return
                                            blk_mq_sched_dispatch_requests()
  blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())
      return
    blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())                                                2) load
        return
      __blk_mq_sched_dispatch_requests()

Now suppose another scenario.

CPU0                                  CPU1

blk_mq_requeue_work()
  blk_mq_insert_request()                                                       1) store
                                      virtblk_done()
                                        blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queues()
                                          clear_bit(BLK_MQ_S_STOPPED)           3) store
                                          blk_mq_run_hw_queue()
                                            if (!blk_mq_hctx_has_pending())     4) load
                                              return
                                            blk_mq_sched_dispatch_requests()
    if (blk_mq_hctx_stopped())                                                  2) load
      continue
    blk_mq_run_hw_queue()

The two scenarios are similar: a full memory barrier is needed between 1) and 2), as well as between 3) and 4), to guarantee that either CPU0 sees BLK_MQ_S_STOPPED cleared or CPU1 sees the request on the dispatch list. Otherwise, neither CPU reruns the hardware queue, and the request starves.

The simplest fix is to add the required full memory barrier to the blk_mq_hctx_stopped() helper. To avoid affecting the fast path (the hardware queue is not stopped most of the time), the barrier is inserted only into the slow path; only the slow path needs to care about missing the dispatch of a request to the low-level device driver.

Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c |  6 ++++++
 block/blk-mq.h | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ff6df6c7eeb25..b90c1680cb780 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2413,6 +2413,12 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 		return;
 
 	clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+	/*
+	 * Pairs with the smp_mb() in blk_mq_hctx_stopped() to order the
+	 * clearing of BLK_MQ_S_STOPPED above and the checking of dispatch
+	 * list in the subsequent routine.
+	 */
+	smp_mb__after_atomic();
 	blk_mq_run_hw_queue(hctx, async);
 }
 EXPORT_SYMBOL_GPL(blk_mq_start_stopped_hw_queue);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 260beea8e332c..f36f3bff70d86 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,6 +228,19 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
 
 static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
 {
+	/* Fast path: hardware queue is not stopped most of the time. */
+	if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+		return false;
+
+	/*
+	 * This barrier is used to order adding of dispatch list before and
+	 * the test of BLK_MQ_S_STOPPED below. Pairs with the memory barrier
+	 * in blk_mq_start_stopped_hw_queue() so that dispatch code could
+	 * either see BLK_MQ_S_STOPPED is cleared or dispatch list is not
+	 * empty to avoid missing dispatching requests.
+	 */
+	smp_mb();
+
 	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 