From patchwork Fri Oct 25 21:36:39 2024
From: Keith Busch
Subject: [PATCHv9 1/7] block: use generic u16 for write hints
Date: Fri, 25 Oct 2024 14:36:39 -0700
Message-ID: <20241025213645.3464331-2-kbusch@meta.com>
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Store write hints as a generic unsigned 16-bit value instead of enum
rw_hint. This is still backwards compatible with the lifetime hints; it
just doesn't constrain the hints to that definition.

Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
 include/linux/blk-mq.h    | 3 +--
 include/linux/blk_types.h | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 59e9adf815a49..bf007a4081d9b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,7 +8,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -156,7 +155,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
-	enum rw_hint write_hint;
+	unsigned short write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7e..6737795220e18 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct bio_set;
 struct bio;
@@ -219,7 +218,7 @@ struct bio {
 	 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
-	enum rw_hint		bi_write_hint;
+	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 

From patchwork Fri Oct 25 21:36:40 2024
From: Keith Busch
Subject: [PATCHv9 2/7] block: introduce max_write_hints queue limit
Date: Fri, 25 Oct 2024 14:36:40 -0700
Message-ID: <20241025213645.3464331-3-kbusch@meta.com>
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>

From: Keith Busch

Drivers for hardware that supports write streams need a way to export
how many streams are available, so that applications can query this
generically.
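The following is a minimal userspace sketch in C showing how an application might consume the new attribute. It assumes only the semantics documented in the sysfs ABI entry added by this patch: 0 means write hints are unsupported, and otherwise valid hints are 1 through max_write_hints, inclusive. The helper names parse_max_write_hints(), write_hint_valid() and read_max_write_hints() are hypothetical, and any disk name passed in is a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a max_write_hints sysfs attribute, e.g. "8\n".
 * The kernel limit is an unsigned short, so valid values are 0..65535.
 * Returns the limit, or -1 if the text is not a valid value.
 */
long parse_max_write_hints(const char *text)
{
	char *end;
	unsigned long val;

	assert(text != NULL);
	errno = 0;
	val = strtoul(text, &end, 10);
	if (end == text || errno == ERANGE || val > 65535)
		return -1;
	if (*end == '\n')	/* sysfs values are newline-terminated */
		end++;
	if (*end != '\0')
		return -1;
	return (long)val;
}

/* Documented semantics: 0 = unsupported, otherwise hints 1..max are valid. */
bool write_hint_valid(unsigned int hint, long max_hints)
{
	return max_hints > 0 && hint >= 1 && (long)hint <= max_hints;
}

/*
 * Hypothetical helper: read the limit for a disk name such as "nvme0n1".
 * Returns -1 if the attribute is missing (for example, on a kernel
 * without this series) or cannot be read.
 */
long read_max_write_hints(const char *disk)
{
	char path[256], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/max_write_hints", disk);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_max_write_hints(buf);
}
```

Because blk_set_stacking_limits() starts the limit at USHRT_MAX and blk_stack_limits() takes the min() of the components, a stacked device reports the smallest limit among its components. If any component does not support hints, the stacked device reports 0.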
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
 Documentation/ABI/stable/sysfs-block |  7 +++++++
 block/blk-settings.c                 |  3 +++
 block/blk-sysfs.c                    |  3 +++
 include/linux/blkdev.h               | 12 ++++++++++++
 4 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8353611107154..f2db2cabb8e75 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.
 
+What:		/sys/block//queue/max_write_hints
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write hints supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_hints, inclusive.
 
 What:		/sys/block//queue/max_segments
 Date:		March 2010
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5e..921fb4d334fa4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
 
 	/* Inherit limits from component devices */
+	lim->max_write_hints = USHRT_MAX;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
@@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
+
 	alignment = queue_limit_alignment_offset(b, start);
 
 	/* Bottom device has different alignment.  Check that it is
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 741b95dfdbf6f..85f48ca461049 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -104,6 +104,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_hints)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -457,6 +458,7 @@ QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_hints, "max_write_hints");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -591,6 +593,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_hints_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 55bec14fe55f9..a8ad41ee07234 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -393,6 +393,8 @@ struct queue_limits {
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
 
+	unsigned short		max_write_hints;
+
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
 
@@ -1183,6 +1185,11 @@ static inline unsigned short queue_max_segments(const struct request_queue *q)
 	return q->limits.max_segments;
 }
 
+static inline unsigned short queue_max_write_hints(struct request_queue *q)
+{
+	return q->limits.max_write_hints;
+}
+
 static inline unsigned short queue_max_discard_segments(const struct request_queue *q)
 {
 	return q->limits.max_discard_segments;
@@ -1230,6 +1237,11 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }
 
+static inline unsigned short bdev_max_write_hints(struct block_device *bdev)
+{
+	return queue_max_write_hints(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;

From patchwork Fri Oct 25 21:36:41 2024
From: Keith Busch
Subject: [PATCHv9 3/7] block: allow ability to limit partition write hints
Date: Fri, 25 Oct 2024 14:36:41 -0700
Message-ID: <20241025213645.3464331-4-kbusch@meta.com>
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>

From: Keith Busch

When multiple partitions are used, you may want to enforce different
subsets of the available write hints for each partition. Provide a
bitmap attribute of the available write hints, and allow an admin to
write a different mask to set the partition's allowed write hints.

Signed-off-by: Keith Busch
---
 block/bdev.c              | 15 +++++++++++++
 block/partitions/core.c   | 46 +++++++++++++++++++++++++++++++++++++--
 include/linux/blk_types.h |  1 +
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 738e3c8457e7f..5d23648db457b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -414,6 +414,7 @@ void __init bdev_cache_init(void)
 
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 {
+	unsigned short max_write_hints;
 	struct block_device *bdev;
 	struct inode *inode;
 
@@ -440,6 +441,20 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		return NULL;
 	}
 	bdev->bd_disk = disk;
+
+	max_write_hints = bdev_max_write_hints(bdev);
+	if (max_write_hints) {
+		int size = BITS_TO_LONGS(max_write_hints) * sizeof(long);
+
+		bdev->write_hint_mask = kmalloc(size, GFP_KERNEL);
+		if (!bdev->write_hint_mask) {
+			free_percpu(bdev->bd_stats);
+			iput(inode);
+			return NULL;
+		}
+		memset(bdev->write_hint_mask, 0xff, size);
+	}
+
 	return bdev;
 }
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 815ed33caa1b8..c0ea0a7b6fa87 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -203,6 +203,42 @@ static ssize_t part_discard_alignment_show(struct device *dev,
 	return sprintf(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
 }
 
+static ssize_t part_write_hint_mask_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	unsigned short max_write_hints = bdev_max_write_hints(bdev);
+
+	if (max_write_hints)
+		return sprintf(buf, "%*pb\n", max_write_hints, bdev->write_hint_mask);
+	else
+		return sprintf(buf, "0");
+}
+
+static ssize_t part_write_hint_mask_store(struct device *dev,
+					  struct device_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	unsigned short max_write_hints = bdev_max_write_hints(bdev);
+	unsigned long *new_mask;
+	int size;
+
+	if (!max_write_hints)
+		return count;
+
+	size = BITS_TO_LONGS(max_write_hints) * sizeof(long);
+	new_mask = kzalloc(size, GFP_KERNEL);
+	if (!new_mask)
+		return -ENOMEM;
+
+	bitmap_parse(buf, count, new_mask, max_write_hints);
+	bitmap_copy(bdev->write_hint_mask, new_mask, max_write_hints);
+
+	return count;
+}
+
 static DEVICE_ATTR(partition, 0444, part_partition_show, NULL);
 static DEVICE_ATTR(start, 0444, part_start_show, NULL);
 static DEVICE_ATTR(size, 0444, part_size_show, NULL);
@@ -211,6 +247,8 @@ static DEVICE_ATTR(alignment_offset, 0444, part_alignment_offset_show, NULL);
 static DEVICE_ATTR(discard_alignment, 0444, part_discard_alignment_show, NULL);
 static DEVICE_ATTR(stat, 0444, part_stat_show, NULL);
 static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
+static DEVICE_ATTR(write_hint_mask, 0644, part_write_hint_mask_show,
+		   part_write_hint_mask_store);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
 	__ATTR(make-it-fail, 0644, part_fail_show, part_fail_store);
@@ -225,6 +263,7 @@ static struct attribute *part_attrs[] = {
 	&dev_attr_discard_alignment.attr,
 	&dev_attr_stat.attr,
 	&dev_attr_inflight.attr,
+	&dev_attr_write_hint_mask.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
 #endif
@@ -245,8 +284,11 @@ static const struct attribute_group *part_attr_groups[] = {
 
 static void part_release(struct device *dev)
 {
-	put_disk(dev_to_bdev(dev)->bd_disk);
-	bdev_drop(dev_to_bdev(dev));
+	struct block_device *part = dev_to_bdev(dev);
+
+	kfree(part->write_hint_mask);
+	put_disk(part->bd_disk);
+	bdev_drop(part);
 }
 
 static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6737795220e18..af430e543f7f7 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -73,6 +73,7 @@ struct block_device {
 #ifdef CONFIG_SECURITY
 	void			*bd_security;
 #endif
+	unsigned long		*write_hint_mask;
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path

From patchwork Fri Oct 25 21:36:42 2024
From: Keith Busch
Subject: [PATCHv9 4/7] block, fs: add write hint to kiocb
Date: Fri, 25 Oct 2024 14:36:42 -0700
Message-ID: <20241025213645.3464331-5-kbusch@meta.com>
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>

From: Keith Busch

Add a write hint to struct kiocb so that sources other than the inode
can provide one. For direct IO, the block layer uses the requested hint
if it is within the block device's capabilities, and otherwise falls
back to the inode's write hint.
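The selection rule that blkdev_write_hint() applies in this patch can be modeled outside the kernel. The sketch below is a hypothetical userspace restatement, not kernel code. choose_write_hint() and hint_bit_set() are illustrative names, and hint_bit_set() stands in for the kernel's test_bit(). The sketch shows two things: hint N corresponds to bit N - 1 of a partition's write_hint_mask, and a rejected hint silently falls back to the inode's hint rather than failing the I/O.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's test_bit() on an unsigned long bitmap. */
static bool hint_bit_set(const unsigned long *mask, unsigned int bit)
{
	return (mask[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL;
}

/*
 * Model of blkdev_write_hint(): use the per-I/O hint only if it is
 * non-zero, no larger than the device limit and, for a partition,
 * enabled in the partition's mask (hint N is bit N - 1). Otherwise
 * fall back to the inode's hint; a rejected hint is never an error.
 */
uint16_t choose_write_hint(uint16_t io_hint, uint16_t inode_hint,
			   uint16_t max_hints, bool is_partition,
			   const unsigned long *part_mask)
{
	if (!io_hint || io_hint > max_hints)
		return inode_hint;
	/* A partition with a non-zero limit always has a mask allocated. */
	assert(!is_partition || part_mask != NULL);
	if (is_partition && !hint_bit_set(part_mask, io_hint - 1u))
		return inode_hint;
	return io_hint;
}
```

bitmap_parse() takes hexadecimal input. Writing 5 to a partition's write_hint_mask attribute would therefore leave only bits 0 and 2 set, which allows hints 1 and 3 on that partition.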
Signed-off-by: Keith Busch
---
 block/fops.c       | 26 +++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 2d01c90076813..e3f3f1957d86d 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -71,7 +71,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio.bi_write_hint = iocb->ki_write_hint;
 	bio.bi_ioprio = iocb->ki_ioprio;
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio.bi_opf |= REQ_ATOMIC;
@@ -200,7 +200,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+		bio->bi_write_hint = iocb->ki_write_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -316,7 +316,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio->bi_write_hint = iocb->ki_write_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
@@ -362,6 +362,23 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	return -EIOCBQUEUED;
 }
 
+static u16 blkdev_write_hint(struct kiocb *iocb, struct block_device *bdev)
+{
+	u16 hint = iocb->ki_write_hint;
+
+	if (!hint)
+		return file_inode(iocb->ki_filp)->i_write_hint;
+
+	if (hint > bdev_max_write_hints(bdev))
+		return file_inode(iocb->ki_filp)->i_write_hint;
+
+	if (bdev_is_partition(bdev) &&
+	    !test_bit(hint - 1, bdev->write_hint_mask))
+		return file_inode(iocb->ki_filp)->i_write_hint;
+
+	return hint;
+}
+
 static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
@@ -373,6 +390,9 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (blkdev_dio_invalid(bdev, iocb, iter))
 		return -EINVAL;
 
+	if (iov_iter_rw(iter) == WRITE)
+		iocb->ki_write_hint = blkdev_write_hint(iocb, bdev);
+
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
 		if (is_sync_kiocb(iocb))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4b5cad44a1268..1a00accf412e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -370,6 +370,7 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	u16			ki_write_hint;
 	union {
 		/*
 		 * Only used for async buffered reads, where it denotes the

From patchwork Fri Oct 25 21:36:43 2024
From: Keith Busch
Subject: [PATCHv9 5/7] io_uring: enable per-io hinting capability
Date: Fri, 25 Oct 2024 14:36:43 -0700
Message-ID: <20241025213645.3464331-6-kbusch@meta.com>
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>

From: Kanchan Joshi

With the F_SET_RW_HINT fcntl, a user can set a hint on the file inode,
and all subsequent writes on the file pass that hint value down. This
can be limiting for a block device, as all writes are tagged with only
one lifetime hint value. Concurrent writes (with different hint values)
are hard to manage. Per-IO hinting solves that problem.

Allow userspace to pass additional metadata in the SQE:

	__u16 write_hint;

If the hint is provided, filesystems may optionally use it. A filesystem
may ignore this field if it does not support per-IO hints, or if the
value is invalid for its backing storage. Just like the inode hints,
requesting values that are not supported by the hardware is not an
error.
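The new write_hint field shares storage with the existing addr_len member, because both sit in the same union of struct io_uring_sqe. The C sketch below mirrors only the two union members shown in this patch; the real union has further members, which are omitted here. The type name sqe_tail_mirror and the function sqe_set_write_hint() are hypothetical. The padding fields are renamed pad3/pad4 to avoid reserved identifiers. The sketch deliberately does not include the real uapi header, since an installed header may not carry this series' field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of one union in struct io_uring_sqe, reduced to the
 * two members shown in this patch. The real union has further members,
 * and its padding fields are named __pad3/__pad4.
 */
union sqe_tail_mirror {
	struct {
		uint16_t addr_len;
		uint16_t pad3[1];
	};
	struct {
		uint16_t write_hint;	/* new per-I/O hint */
		uint16_t pad4[1];
	};
};

/* The hint reuses the bytes of addr_len, so the SQE does not grow. */
static_assert(offsetof(union sqe_tail_mirror, write_hint) ==
	      offsetof(union sqe_tail_mirror, addr_len),
	      "write_hint overlays addr_len");
static_assert(sizeof(union sqe_tail_mirror) == 4,
	      "union stays 4 bytes");

/* Set the per-I/O hint on a write request; 0 means "no hint". */
void sqe_set_write_hint(union sqe_tail_mirror *u, uint16_t hint)
{
	u->write_hint = hint;
	u->pad4[0] = 0;
}
```

On the kernel side, io_prep_rw() in this patch copies sqe->write_hint into the kiocb only when ddir == ITER_SOURCE, that is, for writes. Reads leave ki_write_hint untouched.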
Reviewed-by: Hannes Reinecke
Signed-off-by: Kanchan Joshi
Signed-off-by: Nitesh Shetty
Signed-off-by: Keith Busch
---
 include/uapi/linux/io_uring.h | 4 ++++
 io_uring/rw.c                 | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 60b9c98595faf..8cdcc461d464c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -92,6 +92,10 @@ struct io_uring_sqe {
 			__u16	addr_len;
 			__u16	__pad3[1];
 		};
+		struct {
+			__u16	write_hint;
+			__u16	__pad4[1];
+		};
 	};
 	union {
 		struct {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 8080ffd6d5712..5a1231bfecc3a 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -279,7 +279,8 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
-
+	if (ddir == ITER_SOURCE)
+		rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);

From patchwork Fri Oct 25 21:36:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13851871
From: Keith Busch
CC: Hui Qi, Nitesh Shetty, Hannes Reinecke, Keith Busch
Subject: [PATCHv9 6/7] nvme: enable FDP support
Date: Fri, 25 Oct 2024 14:36:44 -0700
Message-ID: <20241025213645.3464331-7-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Kanchan Joshi

Flexible Data Placement (FDP), as ratified in TP 4146a, allows the host to control the placement of logical blocks so as to reduce the SSD write amplification factor (WAF). Userspace can send the write hint information using io_uring or fcntl.

Fetch the placement identifiers if the device supports FDP. The incoming write hint is mapped to a placement identifier, which in turn is set in the DSPEC field of the write command.
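The two steps described above (fetch the placement IDs, then map each incoming write hint onto one of them) can be sketched in user space. This sketch assumes the byte layout implied by `struct nvme_fdp_ruh_status` and `struct nvme_fdp_ruh_status_desc` in the patch and decodes fields as little-endian; the helpers `parse_ruh_status()`, `map_write_hint()` and `numd_for()` are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout implied by the patch's structs: 16-byte header, 32-byte descriptors. */
#define RUHS_HDR_SIZE	16u	/* u8 rsvd0[14] + __le16 nruhsd */
#define RUHS_NRUHSD_OFF	14u
#define RUHS_DESC_SIZE	32u	/* pid, ruhid, earutr, ruamw, rsvd16[16] */

/* DTYPE value used by the patch (NVME_RW_DTYPE_DPLCMT). */
#define RW_DTYPE_DPLCMT	(2u << 4)

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* NUMD is a 0's based dword count: (size / 4) - 1. */
static uint32_t numd_for(size_t size)
{
	return (uint32_t)(size >> 2) - 1;
}

/*
 * Copy up to max_plids placement IDs (the pid of each descriptor) out of a
 * Reclaim Unit Handle Status buffer. Returns how many were stored.
 */
static uint16_t parse_ruh_status(const uint8_t *buf, size_t len,
				 uint16_t *plids, uint16_t max_plids)
{
	size_t fit;
	uint16_t n, i;

	if (len < RUHS_HDR_SIZE)
		return 0;
	n = get_le16(buf + RUHS_NRUHSD_OFF);
	if (n > max_plids)
		n = max_plids;
	/* Extra safeguard (not in the patch): never read past the buffer. */
	fit = (len - RUHS_HDR_SIZE) / RUHS_DESC_SIZE;
	if (n > fit)
		n = (uint16_t)fit;
	for (i = 0; i < n; i++)
		plids[i] = get_le16(buf + RUHS_HDR_SIZE + (size_t)i * RUHS_DESC_SIZE);
	return n;
}

/*
 * Map a 1-based write hint onto a PLID: the PLID goes into DSPEC
 * (bits 31:16 of dsmgmt) and DTYPE selects data placement in control.
 * Hints above nr_plids are clamped so plids[idx - 1] stays in bounds.
 */
static void map_write_hint(uint16_t hint, const uint16_t *plids,
			   uint16_t nr_plids, uint32_t *dsmgmt,
			   uint16_t *control)
{
	uint16_t idx;

	if (!hint || !nr_plids)
		return;		/* no hint or no PLIDs: leave the command alone */
	idx = hint < nr_plids ? hint : nr_plids;
	*dsmgmt |= (uint32_t)plids[idx - 1] << 16;
	*control = (uint16_t)(*control | RW_DTYPE_DPLCMT);
}
```

With three PLIDs, a hint of 2 selects `plids[1]`, while a hint of 9 is clamped to the last PLID instead of indexing past the array. For sizing: with `NVME_CTRL_PAGE_SIZE` of 4096 and `sizeof(16)` evaluating to `sizeof(int)` (4 on common ABIs), `NVME_MAX_PLIDS` is 1024, so the receive buffer is 16 + 1024 * 32 = 32784 bytes and NUMD is 8195.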
Signed-off-by: Kanchan Joshi
Signed-off-by: Hui Qi
Signed-off-by: Nitesh Shetty
Nacked-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
 drivers/nvme/host/core.c | 82 ++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  5 +++
 include/linux/nvme.h     | 19 ++++++++++
 3 files changed, 106 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 84cb859a911d0..36c2b9be8eee7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -44,6 +44,20 @@ struct nvme_ns_info {
 	bool is_removed;
 };
 
+struct nvme_fdp_ruh_status_desc {
+	u16 pid;
+	u16 ruhid;
+	u32 earutr;
+	u64 ruamw;
+	u8  rsvd16[16];
+};
+
+struct nvme_fdp_ruh_status {
+	u8  rsvd0[14];
+	__le16 nruhsd;
+	struct nvme_fdp_ruh_status_desc ruhsd[];
+};
+
 unsigned int admin_timeout = 60;
 module_param(admin_timeout, uint, 0644);
 MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands");
@@ -657,6 +671,7 @@ static void nvme_free_ns_head(struct kref *ref)
 	ida_free(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
+	kfree(head->plids);
 	kfree(head);
 }
 
@@ -974,6 +989,13 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	if (req->cmd_flags & REQ_RAHEAD)
 		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
 
+	if (req->write_hint && ns->head->nr_plids) {
+		u16 hint = min(req->write_hint, ns->head->nr_plids);
+
+		dsmgmt |= ns->head->plids[hint - 1] << 16;
+		control |= NVME_RW_DTYPE_DPLCMT;
+	}
+
 	if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
 		return BLK_STS_INVAL;
 
@@ -2105,6 +2127,52 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
 	return ret;
 }
 
+static int nvme_fetch_fdp_plids(struct nvme_ns *ns, u32 nsid)
+{
+	struct nvme_fdp_ruh_status_desc *ruhsd;
+	struct nvme_ns_head *head = ns->head;
+	struct nvme_fdp_ruh_status *ruhs;
+	struct nvme_command c = {};
+	int size, ret, i;
+
+	if (head->plids)
+		return 0;
+
+	size = struct_size(ruhs, ruhsd, NVME_MAX_PLIDS);
+	ruhs = kzalloc(size, GFP_KERNEL);
+	if (!ruhs)
+		return -ENOMEM;
+
+	c.imr.opcode = nvme_cmd_io_mgmt_recv;
+	c.imr.nsid = cpu_to_le32(nsid);
+	c.imr.mo = 0x1;
+	c.imr.numd = cpu_to_le32((size >> 2) - 1);
+
+	ret = nvme_submit_sync_cmd(ns->queue, &c, ruhs, size);
+	if (ret)
+		goto out;
+
+	i = le16_to_cpu(ruhs->nruhsd);
+	if (!i)
+		goto out;
+
+	ns->head->nr_plids = min_t(u16, i, NVME_MAX_PLIDS);
+	head->plids = kcalloc(ns->head->nr_plids, sizeof(*head->plids),
+			      GFP_KERNEL);
+	if (!head->plids) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < ns->head->nr_plids; i++) {
+		ruhsd = &ruhs->ruhsd[i];
+		head->plids[i] = le16_to_cpu(ruhsd->pid);
+	}
+out:
+	kfree(ruhs);
+	return ret;
+}
+
 static int nvme_update_ns_info_block(struct nvme_ns *ns,
 		struct nvme_ns_info *info)
 {
@@ -2141,6 +2209,19 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 		goto out;
 	}
 
+	if (ns->ctrl->ctratt & NVME_CTRL_ATTR_FDPS) {
+		ret = nvme_fetch_fdp_plids(ns, info->nsid);
+		if (ret)
+			dev_warn(ns->ctrl->device,
+				"FDP failure status:0x%x\n", ret);
+		if (ret < 0)
+			goto out;
+	} else {
+		ns->head->nr_plids = 0;
+		kfree(ns->head->plids);
+		ns->head->plids = NULL;
+	}
+
 	blk_mq_freeze_queue(ns->disk->queue);
 	ns->head->lba_shift = id->lbaf[lbaf].ds;
 	ns->head->nuse = le64_to_cpu(id->nuse);
@@ -2171,6 +2252,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 	if (!nvme_init_integrity(ns->head, &lim, info))
 		capacity = 0;
 
+	lim.max_write_hints = ns->head->nr_plids;
 	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	if (ret) {
 		blk_mq_unfreeze_queue(ns->disk->queue);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 093cb423f536b..cec8e5d96377b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -454,6 +454,8 @@ struct nvme_ns_ids {
 	u8	csi;
 };
 
+#define NVME_MAX_PLIDS   (NVME_CTRL_PAGE_SIZE / sizeof(16))
+
 /*
  * Anchor structure for namespaces. There is one for each namespace in a
  * NVMe subsystem that any of our controllers can see, and the namespace
@@ -490,6 +492,9 @@ struct nvme_ns_head {
 	struct device		cdev_device;
 
 	struct gendisk		*disk;
+
+	u16			nr_plids;
+	u16			*plids;
 #ifdef CONFIG_NVME_MULTIPATH
 	struct bio_list		requeue_list;
 	spinlock_t		requeue_lock;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index b58d9405d65e0..a954eaee5b0f3 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -275,6 +275,7 @@ enum nvme_ctrl_attr {
 	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
 	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
 	NVME_CTRL_ATTR_ELBAS		= (1 << 15),
+	NVME_CTRL_ATTR_FDPS		= (1 << 19),
 };
 
 struct nvme_id_ctrl {
@@ -843,6 +844,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_register	= 0x0d,
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
+	nvme_cmd_io_mgmt_recv	= 0x12,
 	nvme_cmd_resv_release	= 0x15,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
@@ -864,6 +866,7 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_register),	\
 		nvme_opcode_name(nvme_cmd_resv_report),		\
 		nvme_opcode_name(nvme_cmd_resv_acquire),	\
+		nvme_opcode_name(nvme_cmd_io_mgmt_recv),	\
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
@@ -1015,6 +1018,7 @@ enum {
 	NVME_RW_PRINFO_PRCHK_GUARD	= 1 << 12,
 	NVME_RW_PRINFO_PRACT		= 1 << 13,
 	NVME_RW_DTYPE_STREAMS		= 1 << 4,
+	NVME_RW_DTYPE_DPLCMT		= 2 << 4,
 	NVME_WZ_DEAC			= 1 << 9,
 };
 
@@ -1102,6 +1106,20 @@ struct nvme_zone_mgmt_recv_cmd {
 	__le32			cdw14[2];
 };
 
+struct nvme_io_mgmt_recv_cmd {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__le64			rsvd2[2];
+	union nvme_data_ptr	dptr;
+	__u8			mo;
+	__u8			rsvd11;
+	__u16			mos;
+	__le32			numd;
+	__le32			cdw12[4];
+};
+
 enum {
 	NVME_ZRA_ZONE_REPORT		= 0,
 	NVME_ZRASF_ZONE_REPORT_ALL	= 0,
@@ -1822,6 +1840,7 @@ struct nvme_command {
 		struct nvmf_auth_receive_command auth_receive;
 		struct nvme_dbbuf dbbuf;
 		struct nvme_directive_cmd directive;
+		struct nvme_io_mgmt_recv_cmd imr;
 	};
 };
 
From patchwork Fri Oct 25 21:36:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13851853
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv9 7/7] scsi: set permanent stream count in block limits
Date: Fri, 25 Oct 2024 14:36:45 -0700
Message-ID: <20241025213645.3464331-8-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241025213645.3464331-1-kbusch@meta.com>
References: <20241025213645.3464331-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

The block limits export the number of write hints, so set this limit if the device reports support for lifetime hints. Not only does this inform the user of which hints are possible, it also allows SCSI devices supporting the feature to use the full range through raw block device direct I/O.

Signed-off-by: Keith Busch
Reviewed-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/sd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ca4bc0ac76adc..235dd6e5b6688 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3768,6 +3768,8 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_config_protection(sdkp, &lim);
 	}
 
+	lim.max_write_hints = sdkp->permanent_stream_count;
+
 	/*
 	 * We now have all cache related info, determine how we deal
 	 * with flush requests.