From patchwork Tue Mar 4 10:22:30 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000372
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de,
    axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 1/7] block: acquire q->limits_lock while reading sysfs attributes
Date: Tue, 4 Mar 2025 15:52:30 +0530
Message-ID: <20250304102551.2533767-2-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

A few read-write (RW) sysfs attributes have a store method that is
protected with q->limits_lock, while the corresponding show method runs
holding q->sysfs_lock. That is inconsistent: ideally the show method of
these attributes should also run holding q->limits_lock. Update the
show methods of these attributes so that reading them acquires
q->limits_lock instead of q->sysfs_lock.

Similarly, a few read-only (RO) sysfs attributes have a show method
that is currently protected with q->sysfs_lock, although updates to
these attributes occur through the atomic limit update APIs
queue_limits_start_update() and queue_limits_commit_update(), which run
holding q->limits_lock. Reading these attributes under q->sysfs_lock
therefore offers no protection. Update the show methods of these RO
attributes so that they run holding q->limits_lock instead of
q->sysfs_lock.

Define a new macro, QUEUE_LIM_RO_ENTRY(), which uses the new
->show_limit() method; that method runs holding q->limits_lock. All
existing RO sysfs attributes that need q->limits_lock protection while
being read are now initialized with this new macro. Also update the
existing QUEUE_LIM_RW_ENTRY() to use the new ->show_limit() method for
reading attributes instead of the existing ->show() method. As
->show_limit() runs holding q->limits_lock, the existing RW sysfs
attributes requiring protection are now inherently protected by
q->limits_lock instead of q->sysfs_lock.
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
 block/blk-sysfs.c | 102 +++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 37 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6f548a4376aa..eba5121690af 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -23,9 +23,12 @@ struct queue_sysfs_entry {
 	struct attribute attr;
 
 	ssize_t (*show)(struct gendisk *disk, char *page);
+	ssize_t (*show_limit)(struct gendisk *disk, char *page);
+
 	ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
 	int (*store_limit)(struct gendisk *disk, const char *page,
 			size_t count, struct queue_limits *lim);
+
 	void (*load_module)(struct gendisk *disk, const char *page, size_t count);
 };
@@ -412,10 +415,16 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 	.store	= _prefix##_store,				\
 };
 
+#define QUEUE_LIM_RO_ENTRY(_prefix, _name)			\
+static struct queue_sysfs_entry _prefix##_entry = {		\
+	.attr		= { .name = _name, .mode = 0444 },	\
+	.show_limit	= _prefix##_show,			\
+}
+
 #define QUEUE_LIM_RW_ENTRY(_prefix, _name)			\
 static struct queue_sysfs_entry _prefix##_entry = {		\
 	.attr		= { .name = _name, .mode = 0644 },	\
-	.show		= _prefix##_show,			\
+	.show_limit	= _prefix##_show,			\
 	.store_limit	= _prefix##_store,			\
 }
@@ -430,39 +439,39 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 QUEUE_RW_ENTRY(queue_requests, "nr_requests");
 QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
 QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
-QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
-QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
-QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
-QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_LIM_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
+QUEUE_LIM_RO_ENTRY(queue_max_segments, "max_segments");
+QUEUE_LIM_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
+QUEUE_LIM_RO_ENTRY(queue_max_segment_size, "max_segment_size");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
-QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
-QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size");
-QUEUE_RO_ENTRY(queue_chunk_sectors, "chunk_sectors");
-QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
-QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
+QUEUE_LIM_RO_ENTRY(queue_logical_block_size, "logical_block_size");
+QUEUE_LIM_RO_ENTRY(queue_physical_block_size, "physical_block_size");
+QUEUE_LIM_RO_ENTRY(queue_chunk_sectors, "chunk_sectors");
+QUEUE_LIM_RO_ENTRY(queue_io_min, "minimum_io_size");
+QUEUE_LIM_RO_ENTRY(queue_io_opt, "optimal_io_size");
 
-QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
-QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
-QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
+QUEUE_LIM_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
+QUEUE_LIM_RO_ENTRY(queue_discard_granularity, "discard_granularity");
+QUEUE_LIM_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
 QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
 QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 
-QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_boundary_sectors,
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_boundary_sectors,
 		"atomic_write_boundary_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
 
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
-QUEUE_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
-QUEUE_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
-QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
+QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
-QUEUE_RO_ENTRY(queue_zoned, "zoned");
+QUEUE_LIM_RO_ENTRY(queue_zoned, "zoned");
 QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
-QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
-QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
+QUEUE_LIM_RO_ENTRY(queue_max_open_zones, "max_open_zones");
+QUEUE_LIM_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
 QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
 QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
@@ -470,16 +479,16 @@ QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY(queue_poll, "io_poll");
 QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
 QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
-QUEUE_RO_ENTRY(queue_fua, "fua");
-QUEUE_RO_ENTRY(queue_dax, "dax");
+QUEUE_LIM_RO_ENTRY(queue_fua, "fua");
+QUEUE_LIM_RO_ENTRY(queue_dax, "dax");
 QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
-QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
-QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
+QUEUE_LIM_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
+QUEUE_LIM_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
 /* legacy alias for logical_block_size: */
 static struct queue_sysfs_entry queue_hw_sector_size_entry = {
-	.attr = {.name = "hw_sector_size", .mode = 0444 },
-	.show = queue_logical_block_size_show,
+	.attr		= {.name = "hw_sector_size", .mode = 0444 },
+	.show_limit	= queue_logical_block_size_show,
 };
 
 QUEUE_LIM_RW_ENTRY(queue_rotational, "rotational");
@@ -561,7 +570,9 @@ QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 
 /* Common attributes for bio-based and request-based queues. */
 static struct attribute *queue_attrs[] = {
-	&queue_ra_entry.attr,
+	/*
+	 * Attributes which are protected with q->limits_lock.
+	 */
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_sectors_entry.attr,
 	&queue_max_segments_entry.attr,
@@ -577,37 +588,46 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_granularity_entry.attr,
 	&queue_max_discard_sectors_entry.attr,
 	&queue_max_hw_discard_sectors_entry.attr,
-	&queue_discard_zeroes_data_entry.attr,
 	&queue_atomic_write_max_sectors_entry.attr,
 	&queue_atomic_write_boundary_sectors_entry.attr,
 	&queue_atomic_write_unit_min_entry.attr,
 	&queue_atomic_write_unit_max_entry.attr,
-	&queue_write_same_max_entry.attr,
 	&queue_max_write_zeroes_sectors_entry.attr,
 	&queue_max_zone_append_sectors_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
 	&queue_rotational_entry.attr,
 	&queue_zoned_entry.attr,
-	&queue_nr_zones_entry.attr,
 	&queue_max_open_zones_entry.attr,
 	&queue_max_active_zones_entry.attr,
-	&queue_nomerges_entry.attr,
 	&queue_iostats_passthrough_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
 	&queue_add_random_entry.attr,
-	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
 	&queue_fua_entry.attr,
 	&queue_dax_entry.attr,
-	&queue_poll_delay_entry.attr,
 	&queue_virt_boundary_mask_entry.attr,
 	&queue_dma_alignment_entry.attr,
+
+	/*
+	 * Attributes which are protected with q->sysfs_lock.
+	 */
+	&queue_ra_entry.attr,
+	&queue_discard_zeroes_data_entry.attr,
+	&queue_write_same_max_entry.attr,
+	&queue_nr_zones_entry.attr,
+	&queue_nomerges_entry.attr,
+	&queue_poll_entry.attr,
+	&queue_poll_delay_entry.attr,
+
 	NULL,
 };
 
 /* Request-based queue attributes that are not relevant for bio-based queues. */
 static struct attribute *blk_mq_queue_attrs[] = {
+	/*
+	 * Attributes which are protected with q->sysfs_lock.
+	 */
 	&queue_requests_entry.attr,
 	&elv_iosched_entry.attr,
 	&queue_rq_affinity_entry.attr,
@@ -666,8 +686,16 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
 	ssize_t res;
 
-	if (!entry->show)
+	if (!entry->show && !entry->show_limit)
 		return -EIO;
+
+	if (entry->show_limit) {
+		mutex_lock(&disk->queue->limits_lock);
+		res = entry->show_limit(disk, page);
+		mutex_unlock(&disk->queue->limits_lock);
+		return res;
+	}
+
 	mutex_lock(&disk->queue->sysfs_lock);
 	res = entry->show(disk, page);
 	mutex_unlock(&disk->queue->sysfs_lock);

From patchwork Tue Mar 4 10:22:31 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000369
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de,
    axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 2/7] block: move q->sysfs_lock and queue-freeze under show/store method
Date: Tue, 4 Mar 2025 15:52:31 +0530
Message-ID: <20250304102551.2533767-3-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

In preparation for further simplifying and grouping the sysfs
attributes that require either no locking, or some form of locking
other than q->limits_lock, move the acquire/release of q->sysfs_lock
and the queue freeze/unfreeze into each attribute's respective
show/store method.

While at it, also remove ->load_module(), which was used to load a
module before the queue is frozen. Now that the queue freeze has moved
into ->store(), the module can be loaded directly from an attribute's
store method before the queue freeze actually starts. Currently
->load_module() is only used by the "scheduler" attribute, so the
relevant elevator module is now loaded before the queue freeze starts
in elv_iosched_store().

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
 block/blk-sysfs.c | 210 +++++++++++++++++++++++++++++++---------------
 block/elevator.c  |  20 ++++-
 2 files changed, 162 insertions(+), 68 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index eba5121690af..4700ee168ed5 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -28,8 +28,6 @@ struct queue_sysfs_entry {
 	ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
 	int (*store_limit)(struct gendisk *disk, const char *page,
 			size_t count, struct queue_limits *lim);
-
-	void (*load_module)(struct gendisk *disk, const char *page, size_t count);
 };
 
 static ssize_t
@@ -55,7 +53,12 @@ queue_var_store(unsigned long *var, const char *page, size_t count)
 
 static ssize_t queue_requests_show(struct gendisk *disk, char *page)
 {
-	return queue_var_show(disk->queue->nr_requests, page);
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	ret = queue_var_show(disk->queue->nr_requests, page);
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t
@@ -63,27 +66,38 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 {
 	unsigned long nr;
 	int ret, err;
+	unsigned int memflags;
+	struct request_queue *q = disk->queue;
 
-	if (!queue_is_mq(disk->queue))
+	if (!queue_is_mq(q))
 		return -EINVAL;
 
 	ret = queue_var_store(&nr, page, count);
 	if (ret < 0)
 		return ret;
 
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
 	err = blk_mq_update_nr_requests(disk->queue, nr);
 	if (err)
-		return err;
-
+		ret = err;
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
 
 static ssize_t queue_ra_show(struct gendisk *disk, char *page)
 {
-	return queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page);
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	ret = queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page);
+	mutex_unlock(&disk->queue->sysfs_lock);
+
+	return ret;
 }
 
 static ssize_t
@@ -91,11 +105,19 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
 {
 	unsigned long ra_kb;
 	ssize_t ret;
+	unsigned int memflags;
+	struct request_queue *q = disk->queue;
 
 	ret = queue_var_store(&ra_kb, page, count);
 	if (ret < 0)
 		return ret;
+
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
 	disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
+
 	return ret;
 }
@@ -150,7 +172,12 @@ QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_KB(max_hw_sectors)
 #define QUEUE_SYSFS_SHOW_CONST(_name, _val)				\
 static ssize_t queue_##_name##_show(struct gendisk *disk, char *page)	\
 {									\
-	return sysfs_emit(page, "%d\n", _val);				\
+	ssize_t ret;							\
+									\
+	mutex_lock(&disk->queue->sysfs_lock);				\
+	ret = sysfs_emit(page, "%d\n", _val);				\
+	mutex_unlock(&disk->queue->sysfs_lock);				\
+	return ret;							\
 }
 
 /* deprecated fields */
@@ -239,10 +266,17 @@ QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
 
 static ssize_t queue_poll_show(struct gendisk *disk, char *page)
 {
-	if (queue_is_mq(disk->queue))
-		return sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue));
-	return sysfs_emit(page, "%u\n",
-			!!(disk->queue->limits.features & BLK_FEAT_POLL));
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	if (queue_is_mq(disk->queue)) {
+		ret = sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue));
+	} else {
+		ret = sysfs_emit(page, "%u\n",
+			!!(disk->queue->limits.features & BLK_FEAT_POLL));
+	}
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
@@ -254,7 +288,12 @@ static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
 
 static ssize_t queue_nr_zones_show(struct gendisk *disk, char *page)
 {
-	return queue_var_show(disk_nr_zones(disk), page);
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	ret = queue_var_show(disk_nr_zones(disk), page);
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page)
@@ -281,35 +320,51 @@ static int queue_iostats_passthrough_store(struct gendisk *disk,
 
 static ssize_t queue_nomerges_show(struct gendisk *disk, char *page)
 {
-	return queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	ret = queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
 			       blk_queue_noxmerges(disk->queue), page);
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
 				    size_t count)
 {
 	unsigned long nm;
+	unsigned int memflags;
+	struct request_queue *q = disk->queue;
 	ssize_t ret = queue_var_store(&nm, page, count);
 
 	if (ret < 0)
 		return ret;
 
-	blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, disk->queue);
-	blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, disk->queue);
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
+	blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
+	blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
 	if (nm == 2)
-		blk_queue_flag_set(QUEUE_FLAG_NOMERGES, disk->queue);
+		blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
 	else if (nm)
-		blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, disk->queue);
+		blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
+
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
 
 	return ret;
 }
 
 static ssize_t queue_rq_affinity_show(struct gendisk *disk, char *page)
 {
-	bool set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags);
-	bool force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags);
+	ssize_t ret;
+	bool set, force;
 
-	return queue_var_show(set << force, page);
+	mutex_lock(&disk->queue->sysfs_lock);
+	set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags);
+	force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags);
+	ret = queue_var_show(set << force, page);
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t
@@ -319,11 +374,14 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
 #ifdef CONFIG_SMP
 	struct request_queue *q = disk->queue;
 	unsigned long val;
+	unsigned int memflags;
 
 	ret = queue_var_store(&val, page, count);
 	if (ret < 0)
 		return ret;
 
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
 	if (val == 2) {
 		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
 		blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
@@ -334,6 +392,8 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
 		blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
 		blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
 	}
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
 #endif
 	return ret;
 }
@@ -347,29 +407,52 @@ static ssize_t queue_poll_delay_store(struct gendisk *disk, const char *page,
 
 static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
 				size_t count)
 {
-	if (!(disk->queue->limits.features & BLK_FEAT_POLL))
-		return -EINVAL;
+	unsigned int memflags;
+	ssize_t ret = count;
+	struct request_queue *q = disk->queue;
+
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
+	if (!(q->limits.features & BLK_FEAT_POLL)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	pr_info_ratelimited("writes to the poll attribute are ignored.\n");
 	pr_info_ratelimited("please use driver specific parameters instead.\n");
-	return count;
+out:
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
+
+	return ret;
 }
 
 static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page)
 {
-	return sysfs_emit(page, "%u\n", jiffies_to_msecs(disk->queue->rq_timeout));
+	ssize_t ret;
+
+	mutex_lock(&disk->queue->sysfs_lock);
+	ret = sysfs_emit(page, "%u\n",
+			jiffies_to_msecs(disk->queue->rq_timeout));
+	mutex_unlock(&disk->queue->sysfs_lock);
+	return ret;
 }
 
 static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
 				  size_t count)
 {
-	unsigned int val;
+	unsigned int val, memflags;
 	int err;
+	struct request_queue *q = disk->queue;
 
 	err = kstrtou32(page, 10, &val);
 	if (err || val == 0)
 		return -EINVAL;
 
-	blk_queue_rq_timeout(disk->queue, msecs_to_jiffies(val));
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
+	blk_queue_rq_timeout(q, msecs_to_jiffies(val));
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
 
 	return count;
 }
@@ -428,14 +511,6 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 	.store_limit	= _prefix##_store,			\
 }
 
-#define QUEUE_RW_LOAD_MODULE_ENTRY(_prefix, _name)		\
-static struct queue_sysfs_entry _prefix##_entry = {		\
-	.attr		= { .name = _name, .mode = 0644 },	\
-	.show		= _prefix##_show,			\
-	.load_module	= _prefix##_load_module,		\
-	.store		= _prefix##_store,			\
-}
-
 QUEUE_RW_ENTRY(queue_requests, "nr_requests");
 QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
 QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
@@ -443,7 +518,7 @@ QUEUE_LIM_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_LIM_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_LIM_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_LIM_RO_ENTRY(queue_max_segment_size, "max_segment_size");
-QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
+QUEUE_RW_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_LIM_RO_ENTRY(queue_logical_block_size, "logical_block_size");
 QUEUE_LIM_RO_ENTRY(queue_physical_block_size, "physical_block_size");
@@ -512,14 +587,24 @@ static ssize_t queue_var_store64(s64 *var, const char *page)
 
 static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page)
 {
-	if (!wbt_rq_qos(disk->queue))
-		return -EINVAL;
+	ssize_t ret;
+	struct request_queue *q = disk->queue;
 
-	if (wbt_disabled(disk->queue))
-		return sysfs_emit(page, "0\n");
+	mutex_lock(&q->sysfs_lock);
+	if (!wbt_rq_qos(q)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
-	return sysfs_emit(page, "%llu\n",
-			div_u64(wbt_get_min_lat(disk->queue), 1000));
+	if (wbt_disabled(q)) {
+		ret = sysfs_emit(page, "0\n");
+		goto out;
+	}
+
+	ret = sysfs_emit(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000));
+out:
+	mutex_unlock(&q->sysfs_lock);
+	return ret;
 }
 
 static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
@@ -529,6 +614,7 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	struct rq_qos *rqos;
 	ssize_t ret;
 	s64 val;
+	unsigned int memflags;
 
 	ret = queue_var_store64(&val, page);
 	if (ret < 0)
@@ -536,20 +622,24 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	if (val < -1)
 		return -EINVAL;
 
+	mutex_lock(&q->sysfs_lock);
+	memflags = blk_mq_freeze_queue(q);
+
 	rqos = wbt_rq_qos(q);
 	if (!rqos) {
 		ret = wbt_init(disk);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
+	ret = count;
 	if (val == -1)
 		val = wbt_default_latency_nsec(q);
 	else if (val >= 0)
 		val *= 1000ULL;
 
 	if (wbt_get_min_lat(q) == val)
-		return count;
+		goto out;
 
 	/*
	 * Ensure that the queue is idled, in case the latency update
@@ -561,8 +651,11 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	wbt_set_min_lat(q, val);
 	blk_mq_unquiesce_queue(q);
+out:
+	blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->sysfs_lock);
 
-	return count;
+	return ret;
 }
 
 QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
@@ -684,22 +777,20 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 {
 	struct queue_sysfs_entry *entry = to_queue(attr);
 	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
-	ssize_t res;
 
 	if (!entry->show && !entry->show_limit)
 		return -EIO;
 
 	if (entry->show_limit) {
+		ssize_t res;
+
 		mutex_lock(&disk->queue->limits_lock);
 		res = entry->show_limit(disk, page);
 		mutex_unlock(&disk->queue->limits_lock);
 		return res;
 	}
 
-	mutex_lock(&disk->queue->sysfs_lock);
-	res = entry->show(disk, page);
-	mutex_unlock(&disk->queue->sysfs_lock);
-	return res;
+	return entry->show(disk, page);
 }
 
 static ssize_t
@@ -709,21 +800,13 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct queue_sysfs_entry *entry = to_queue(attr);
 	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
 	struct request_queue *q = disk->queue;
-	unsigned int memflags;
-	ssize_t res;
 
 	if (!entry->store_limit && !entry->store)
 		return -EIO;
 
-	/*
-	 * If the attribute needs to load a module, do it before freezing the
-	 * queue to ensure that the module file can be read when the request
-	 * queue is the one for the device storing the module file.
-	 */
-	if (entry->load_module)
-		entry->load_module(disk, page, length);
-
 	if (entry->store_limit) {
+		ssize_t res;
+
 		struct queue_limits lim = queue_limits_start_update(q);
 
 		res = entry->store_limit(disk, page, length, &lim);
@@ -738,12 +821,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 		return length;
 	}
 
-	mutex_lock(&q->sysfs_lock);
-	memflags = blk_mq_freeze_queue(q);
-	res = entry->store(disk, page, length);
-	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
-	return res;
+	return entry->store(disk, page, length);
 }
 
 static const struct sysfs_ops queue_sysfs_ops = {
diff --git a/block/elevator.c b/block/elevator.c
index cd2ce4921601..041f1d983bc7 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -723,11 +723,24 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 {
 	char elevator_name[ELV_NAME_MAX];
 	int ret;
+	unsigned int memflags;
+	struct request_queue *q = disk->queue;
 
+	/*
+	 * If the attribute needs to load a module, do it before freezing the
+	 * queue to ensure that the module file can be read when the request
+	 * queue is the one for the device storing the module file.
+ */ + elv_iosched_load_module(disk, buf, count); strscpy(elevator_name, buf, sizeof(elevator_name)); - ret = elevator_change(disk->queue, strstrip(elevator_name)); + + mutex_lock(&q->sysfs_lock); + memflags = blk_mq_freeze_queue(q); + ret = elevator_change(q, strstrip(elevator_name)); if (!ret) - return count; + ret = count; + blk_mq_unfreeze_queue(q, memflags); + mutex_unlock(&q->sysfs_lock); return ret; } @@ -738,6 +751,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name) struct elevator_type *cur = NULL, *e; int len = 0; + mutex_lock(&q->sysfs_lock); if (!q->elevator) { len += sprintf(name+len, "[none] "); } else { @@ -755,6 +769,8 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name) spin_unlock(&elv_list_lock); len += sprintf(name+len, "\n"); + mutex_unlock(&q->sysfs_lock); + return len; }

From patchwork Tue Mar 4 10:22:32 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000371
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de, axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 3/7] block: remove q->sysfs_lock for attributes which don't need it
Date: Tue, 4 Mar 2025 15:52:32 +0530
Message-ID: <20250304102551.2533767-4-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>
There are a few sysfs attributes in the block layer which don't really need to acquire q->sysfs_lock while being accessed. The reason is that reading/writing a value from/to such attributes is either atomic or can easily be protected using READ_ONCE()/WRITE_ONCE(). Moreover, sysfs attributes are inherently protected by sysfs/kernfs internal locking. This change therefore segregates all existing sysfs attributes for which acquiring q->sysfs_lock can be avoided. For read-only attributes, q->sysfs_lock is removed from the show method; for read/write attributes, it is removed from both the show and store methods. We audited all block sysfs attributes and found the following list of attributes which shouldn't require q->sysfs_lock protection:

1. io_poll: Writes to this attribute are ignored, so we don't need q->sysfs_lock.

2. io_poll_delay: Writes to this attribute are a NOP, so we don't need q->sysfs_lock.

3. io_timeout: A write to this attribute updates q->rq_timeout and a read returns the value stored in q->rq_timeout. Moreover, q->rq_timeout is set only once when the queue is initialized (in blk_mq_init_allocated_queue()), even before the disk is added, so we don't need to protect it with q->sysfs_lock. As this attribute is not directly correlated with anything else, simply using READ_ONCE()/WRITE_ONCE() should be enough.

4. nomerges: A write to this attribute file updates two queue flags, QUEUE_FLAG_NOMERGES and QUEUE_FLAG_NOXMERGES. These flags are accessed during bio merging, which in any case doesn't run with q->sysfs_lock held. Moreover, the queue flags are updated/accessed with bitops, which are atomic. So protecting them with q->sysfs_lock is not necessary.
5. rq_affinity: A write to this attribute file makes atomic updates to the queue flags QUEUE_FLAG_SAME_COMP and QUEUE_FLAG_SAME_FORCE. These flags are also accessed from blk_mq_complete_need_ipi() using the test_bit macro. As reads/writes of the queue flags use bitops, which are atomic, protecting them with q->sysfs_lock is not necessary.

6. nr_zones: A write to this attribute happens in the driver probe method (except for nvme) before the disk is added, outside of q->sysfs_lock or any other lock. Moreover, nr_zones is defined as "unsigned int", so reading this attribute, even while it is simultaneously being updated on another CPU, should not return a torn value on any architecture supported by Linux. So we can avoid using q->sysfs_lock or any other lock/protection while reading this attribute.

7. discard_zeroes_data: Reading this attribute always returns 0, so we don't require holding q->sysfs_lock.

8. write_same_max_bytes: Reading this attribute always returns 0, so we don't require holding q->sysfs_lock.

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
block/blk-settings.c | 2 +- block/blk-sysfs.c | 81 +++++++++++++++----------------------------- 2 files changed, 29 insertions(+), 54 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index b9c6f0ec1c49..78f6a832a3f4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -21,7 +21,7 @@ void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout) { - q->rq_timeout = timeout; + WRITE_ONCE(q->rq_timeout, timeout); } EXPORT_SYMBOL_GPL(blk_queue_rq_timeout); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 4700ee168ed5..bc641ac71cde 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -172,12 +172,7 @@ QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_KB(max_hw_sectors) #define QUEUE_SYSFS_SHOW_CONST(_name, _val) \ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \ { \ - ssize_t ret; \ - \ -
mutex_lock(&disk->queue->sysfs_lock); \ - ret = sysfs_emit(page, "%d\n", _val); \ - mutex_unlock(&disk->queue->sysfs_lock); \ - return ret; \ + return sysfs_emit(page, "%d\n", _val); \ } /* deprecated fields */ @@ -266,17 +261,11 @@ QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX); static ssize_t queue_poll_show(struct gendisk *disk, char *page) { - ssize_t ret; + if (queue_is_mq(disk->queue)) + return sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue)); - mutex_lock(&disk->queue->sysfs_lock); - if (queue_is_mq(disk->queue)) { - ret = sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue)); - } else { - ret = sysfs_emit(page, "%u\n", + return sysfs_emit(page, "%u\n", !!(disk->queue->limits.features & BLK_FEAT_POLL)); - } - mutex_unlock(&disk->queue->sysfs_lock); - return ret; } static ssize_t queue_zoned_show(struct gendisk *disk, char *page) @@ -288,12 +277,7 @@ static ssize_t queue_zoned_show(struct gendisk *disk, char *page) static ssize_t queue_nr_zones_show(struct gendisk *disk, char *page) { - ssize_t ret; - - mutex_lock(&disk->queue->sysfs_lock); - ret = queue_var_show(disk_nr_zones(disk), page); - mutex_unlock(&disk->queue->sysfs_lock); - return ret; + return queue_var_show(disk_nr_zones(disk), page); } static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page) @@ -320,13 +304,8 @@ static int queue_iostats_passthrough_store(struct gendisk *disk, static ssize_t queue_nomerges_show(struct gendisk *disk, char *page) { - ssize_t ret; - - mutex_lock(&disk->queue->sysfs_lock); - ret = queue_var_show((blk_queue_nomerges(disk->queue) << 1) | + return queue_var_show((blk_queue_nomerges(disk->queue) << 1) | blk_queue_noxmerges(disk->queue), page); - mutex_unlock(&disk->queue->sysfs_lock); - return ret; } static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page, @@ -340,7 +319,6 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page, if (ret < 0) return ret; - mutex_lock(&q->sysfs_lock); memflags = 
blk_mq_freeze_queue(q); blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q); blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q); @@ -349,22 +327,16 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page, else if (nm) blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q); blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); return ret; } static ssize_t queue_rq_affinity_show(struct gendisk *disk, char *page) { - ssize_t ret; - bool set, force; + bool set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags); + bool force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags); - mutex_lock(&disk->queue->sysfs_lock); - set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags); - force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags); - ret = queue_var_show(set << force, page); - mutex_unlock(&disk->queue->sysfs_lock); - return ret; + return queue_var_show(set << force, page); } static ssize_t @@ -380,7 +352,12 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count) if (ret < 0) return ret; - mutex_lock(&q->sysfs_lock); + /* + * Here we update two queue flags each using atomic bitops, although + * updating two flags isn't atomic it should be harmless as those flags + * are accessed individually using atomic test_bit operation. So we + * don't grab any lock while updating these flags. 
+ */ memflags = blk_mq_freeze_queue(q); if (val == 2) { blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); @@ -393,7 +370,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count) blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); } blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); #endif return ret; } @@ -411,30 +387,23 @@ static ssize_t queue_poll_store(struct gendisk *disk, const char *page, ssize_t ret = count; struct request_queue *q = disk->queue; - mutex_lock(&q->sysfs_lock); memflags = blk_mq_freeze_queue(q); if (!(q->limits.features & BLK_FEAT_POLL)) { ret = -EINVAL; goto out; } + pr_info_ratelimited("writes to the poll attribute are ignored.\n"); pr_info_ratelimited("please use driver specific parameters instead.\n"); out: blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); - return ret; } static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page) { - ssize_t ret; - - mutex_lock(&disk->queue->sysfs_lock); - ret = sysfs_emit(page, "%u\n", - jiffies_to_msecs(disk->queue->rq_timeout)); - mutex_unlock(&disk->queue->sysfs_lock); - return ret; + return sysfs_emit(page, "%u\n", + jiffies_to_msecs(READ_ONCE(disk->queue->rq_timeout))); } static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page, @@ -448,11 +417,9 @@ static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page, if (err || val == 0) return -EINVAL; - mutex_lock(&q->sysfs_lock); memflags = blk_mq_freeze_queue(q); blk_queue_rq_timeout(q, msecs_to_jiffies(val)); blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); return count; } @@ -706,6 +673,10 @@ static struct attribute *queue_attrs[] = { * Attributes which are protected with q->sysfs_lock. */ &queue_ra_entry.attr, + + /* + * Attributes which don't require locking. 
+ */ &queue_discard_zeroes_data_entry.attr, &queue_write_same_max_entry.attr, &queue_nr_zones_entry.attr, @@ -723,11 +694,15 @@ static struct attribute *blk_mq_queue_attrs[] = { */ &queue_requests_entry.attr, &elv_iosched_entry.attr, - &queue_rq_affinity_entry.attr, - &queue_io_timeout_entry.attr, #ifdef CONFIG_BLK_WBT &queue_wb_lat_entry.attr, #endif + /* + * Attributes which don't require locking. + */ + &queue_rq_affinity_entry.attr, + &queue_io_timeout_entry.attr, + NULL, };

From patchwork Tue Mar 4 10:22:33 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000373
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de, axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 4/7] block: introduce a dedicated lock for protecting queue elevator updates
Date: Tue, 4 Mar 2025 15:52:33 +0530
Message-ID: <20250304102551.2533767-5-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

A queue's elevator can be updated either when modifying nr_hw_queues or through the sysfs scheduler attribute.
Currently, elevator switching/updating is protected using q->sysfs_lock, but this has led to lockdep splats[1] due to inconsistent lock ordering between q->sysfs_lock and the freeze lock at multiple block layer call sites. As the scope of q->sysfs_lock is not well defined, its (mis)use has resulted in numerous lockdep warnings. To address this, introduce a new q->elevator_lock, dedicated specifically to protecting elevator switches/updates, and use it instead of q->sysfs_lock for that purpose. While at it, make elv_iosched_load_module() a static function, as it is only called from elv_iosched_store(), and remove the redundant parameters from its signature.

[1] https://lore.kernel.org/all/67637e70.050a0220.3157ee.000c.GAE@google.com/

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
block/blk-core.c | 1 + block/blk-mq.c | 15 +++++++-------- block/blk-sysfs.c | 32 ++++++++++++++++++++++---------- block/elevator.c | 35 ++++++++++++++++------------------- block/elevator.h | 2 -- block/genhd.c | 9 ++++++--- include/linux/blkdev.h | 8 ++++++++ 7 files changed, 60 insertions(+), 42 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index d6c4fa3943b5..362d0a55b07a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -429,6 +429,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id) refcount_set(&q->refs, 1); mutex_init(&q->debugfs_mutex); + mutex_init(&q->elevator_lock); mutex_init(&q->sysfs_lock); mutex_init(&q->limits_lock); mutex_init(&q->rq_qos_mutex); diff --git a/block/blk-mq.c b/block/blk-mq.c index 40490ac88045..5a2d63927525 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4467,7 +4467,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, unsigned long i, j; /* protect against switching io scheduler */ - mutex_lock(&q->sysfs_lock); +
mutex_lock(&q->elevator_lock); for (i = 0; i < set->nr_hw_queues; i++) { int old_node; int node = blk_mq_get_hctx_node(set, i); @@ -4500,7 +4500,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); /* unregister cpuhp callbacks for exited hctxs */ blk_mq_remove_hw_queues_cpuhp(q); @@ -4933,10 +4933,9 @@ static bool blk_mq_elv_switch_none(struct list_head *head, if (!qe) return false; - /* q->elevator needs protection from ->sysfs_lock */ - mutex_lock(&q->sysfs_lock); + /* Accessing q->elevator needs protection from ->elevator_lock. */ + mutex_lock(&q->elevator_lock); - /* the check has to be done with holding sysfs_lock */ if (!q->elevator) { kfree(qe); goto unlock; @@ -4950,7 +4949,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head, list_add(&qe->node, head); elevator_disable(q); unlock: - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); return true; } @@ -4980,11 +4979,11 @@ static void blk_mq_elv_switch_back(struct list_head *head, list_del(&qe->node); kfree(qe); - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); elevator_switch(q, t); /* drop the reference acquired in blk_mq_elv_switch_none */ elevator_put(t); - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); } static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index bc641ac71cde..1562e22877e1 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -693,10 +693,15 @@ static struct attribute *blk_mq_queue_attrs[] = { * Attributes which are protected with q->sysfs_lock. */ &queue_requests_entry.attr, - &elv_iosched_entry.attr, #ifdef CONFIG_BLK_WBT &queue_wb_lat_entry.attr, #endif + /* + * Attributes which require some form of locking other than + * q->sysfs_lock. 
+ */ + &elv_iosched_entry.attr, + /* * Attributes which don't require locking. */ @@ -865,15 +870,19 @@ int blk_register_queue(struct gendisk *disk) if (ret) goto out_debugfs_remove; + ret = blk_crypto_sysfs_register(disk); + if (ret) + goto out_unregister_ia_ranges; + + mutex_lock(&q->elevator_lock); if (q->elevator) { ret = elv_register_queue(q, false); - if (ret) - goto out_unregister_ia_ranges; + if (ret) { + mutex_unlock(&q->elevator_lock); + goto out_crypto_sysfs_unregister; + } } - - ret = blk_crypto_sysfs_register(disk); - if (ret) - goto out_elv_unregister; + mutex_unlock(&q->elevator_lock); blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); wbt_enable_default(disk); @@ -898,8 +907,8 @@ int blk_register_queue(struct gendisk *disk) return ret; -out_elv_unregister: - elv_unregister_queue(q); +out_crypto_sysfs_unregister: + blk_crypto_sysfs_unregister(disk); out_unregister_ia_ranges: disk_unregister_independent_access_ranges(disk); out_debugfs_remove: @@ -945,8 +954,11 @@ void blk_unregister_queue(struct gendisk *disk) blk_mq_sysfs_unregister(disk); blk_crypto_sysfs_unregister(disk); - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); elv_unregister_queue(q); + mutex_unlock(&q->elevator_lock); + + mutex_lock(&q->sysfs_lock); disk_unregister_independent_access_ranges(disk); mutex_unlock(&q->sysfs_lock); diff --git a/block/elevator.c b/block/elevator.c index 041f1d983bc7..b4d08026b02c 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -457,7 +457,7 @@ int elv_register_queue(struct request_queue *q, bool uevent) struct elevator_queue *e = q->elevator; int error; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); error = kobject_add(&e->kobj, &q->disk->queue_kobj, "iosched"); if (!error) { @@ -481,7 +481,7 @@ void elv_unregister_queue(struct request_queue *q) { struct elevator_queue *e = q->elevator; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); if (e && 
test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) { kobject_uevent(&e->kobj, KOBJ_REMOVE); @@ -618,7 +618,7 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e) unsigned int memflags; int ret; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); memflags = blk_mq_freeze_queue(q); blk_mq_quiesce_queue(q); @@ -655,7 +655,7 @@ void elevator_disable(struct request_queue *q) { unsigned int memflags; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); memflags = blk_mq_freeze_queue(q); blk_mq_quiesce_queue(q); @@ -700,28 +700,23 @@ static int elevator_change(struct request_queue *q, const char *elevator_name) return ret; } -void elv_iosched_load_module(struct gendisk *disk, const char *buf, - size_t count) +static void elv_iosched_load_module(char *elevator_name) { - char elevator_name[ELV_NAME_MAX]; struct elevator_type *found; - const char *name; - - strscpy(elevator_name, buf, sizeof(elevator_name)); - name = strstrip(elevator_name); spin_lock(&elv_list_lock); - found = __elevator_find(name); + found = __elevator_find(elevator_name); spin_unlock(&elv_list_lock); if (!found) - request_module("%s-iosched", name); + request_module("%s-iosched", elevator_name); } ssize_t elv_iosched_store(struct gendisk *disk, const char *buf, size_t count) { char elevator_name[ELV_NAME_MAX]; + char *name; int ret; unsigned int memflags; struct request_queue *q = disk->queue; @@ -731,16 +726,18 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf, * queue to ensure that the module file can be read when the request * queue is the one for the device storing the module file. 
 */
-	elv_iosched_load_module(disk, buf, count);
 	strscpy(elevator_name, buf, sizeof(elevator_name));
+	name = strstrip(elevator_name);
+
+	elv_iosched_load_module(name);
 
-	mutex_lock(&q->sysfs_lock);
 	memflags = blk_mq_freeze_queue(q);
-	ret = elevator_change(q, strstrip(elevator_name));
+	mutex_lock(&q->elevator_lock);
+	ret = elevator_change(q, name);
 	if (!ret)
 		ret = count;
+	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
 
@@ -751,7 +748,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name)
 	struct elevator_type *cur = NULL, *e;
 	int len = 0;
 
-	mutex_lock(&q->sysfs_lock);
+	mutex_lock(&q->elevator_lock);
 	if (!q->elevator) {
 		len += sprintf(name+len, "[none] ");
 	} else {
@@ -769,7 +766,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name)
 	spin_unlock(&elv_list_lock);
 	len += sprintf(name+len, "\n");
-	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->elevator_lock);
 	return len;
 }

diff --git a/block/elevator.h b/block/elevator.h
index e526662c5dbb..e4e44dfac503 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -148,8 +148,6 @@ extern void elv_unregister(struct elevator_type *);
 * io scheduler sysfs switching
 */
 ssize_t elv_iosched_show(struct gendisk *disk, char *page);
-void elv_iosched_load_module(struct gendisk *disk, const char *page,
-		size_t count);
 ssize_t elv_iosched_store(struct gendisk *disk, const char *page,
 		size_t count);
 extern bool elv_bio_merge_ok(struct request *, struct bio *);

diff --git a/block/genhd.c b/block/genhd.c
index e9375e20d866..c2bd86cd09de 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -565,8 +565,11 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
 	if (disk->major == BLOCK_EXT_MAJOR)
 		blk_free_ext_minor(disk->first_minor);
 out_exit_elevator:
-	if (disk->queue->elevator)
+	if (disk->queue->elevator) {
+		mutex_lock(&disk->queue->elevator_lock);
 		elevator_exit(disk->queue);
+		mutex_unlock(&disk->queue->elevator_lock);
+	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_disk_fwnode);

@@ -742,9 +745,9 @@ void del_gendisk(struct gendisk *disk)
 	blk_mq_quiesce_queue(q);
 	if (q->elevator) {
-		mutex_lock(&q->sysfs_lock);
+		mutex_lock(&q->elevator_lock);
 		elevator_exit(q);
-		mutex_unlock(&q->sysfs_lock);
+		mutex_unlock(&q->elevator_lock);
 	}
 	rq_qos_exit(q);
 	blk_mq_unquiesce_queue(q);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d37751789bf5..079bc2d00e22 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -562,6 +562,14 @@ struct request_queue {
 	struct blk_flush_queue	*fq;
 	struct list_head	flush_list;
 
+	/*
+	 * Protects against I/O scheduler switching, specifically when
+	 * updating q->elevator. To ensure proper locking order during
+	 * an elevator update, first freeze the queue, then acquire
+	 * ->elevator_lock.
+	 */
+	struct mutex		elevator_lock;
+
 	struct mutex		sysfs_lock;
 	struct mutex		limits_lock;

From patchwork Tue Mar 4 10:22:34 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000376
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de,
    axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 5/7] block: protect nr_requests update using q->elevator_lock
Date: Tue, 4 Mar 2025 15:52:34 +0530
Message-ID: <20250304102551.2533767-6-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

The sysfs attribute nr_requests could be updated simultaneously from
the elevator switch/update and nr_hw_queues update code paths, and the
nr_requests update in each of those code paths already runs holding
q->elevator_lock. So protect access to the sysfs attribute nr_requests
using q->elevator_lock instead of q->sysfs_lock.

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
 block/blk-sysfs.c      | 10 +++++-----
 include/linux/blkdev.h | 10 ++++++----
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1562e22877e1..f1fa57de29ed 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -55,9 +55,9 @@ static ssize_t queue_requests_show(struct gendisk *disk, char *page)
 {
 	ssize_t ret;
 
-	mutex_lock(&disk->queue->sysfs_lock);
+	mutex_lock(&disk->queue->elevator_lock);
 	ret = queue_var_show(disk->queue->nr_requests, page);
-	mutex_unlock(&disk->queue->sysfs_lock);
+	mutex_unlock(&disk->queue->elevator_lock);
 	return ret;
 }
 
@@ -76,16 +76,16 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 	if (ret < 0)
 		return ret;
 
-	mutex_lock(&q->sysfs_lock);
 	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
 
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
 	err = blk_mq_update_nr_requests(disk->queue, nr);
 	if (err)
 		ret = err;
 
+	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
 
@@ -692,7 +692,6 @@ static struct attribute *blk_mq_queue_attrs[] = {
 	/*
 	 * Attributes which are protected with q->sysfs_lock.
 	 */
-	&queue_requests_entry.attr,
 #ifdef CONFIG_BLK_WBT
 	&queue_wb_lat_entry.attr,
 #endif
@@ -701,6 +700,7 @@ static struct attribute *blk_mq_queue_attrs[] = {
 	 * q->sysfs_lock.
 	 */
 	&elv_iosched_entry.attr,
+	&queue_requests_entry.attr,
 
 	/*
 	 * Attributes which don't require locking.

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 079bc2d00e22..ed8d139a4f5d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -563,10 +563,12 @@ struct request_queue {
 	struct list_head	flush_list;
 
 	/*
-	 * Protects against I/O scheduler switching, specifically when
-	 * updating q->elevator. To ensure proper locking order during
-	 * an elevator update, first freeze the queue, then acquire
-	 * ->elevator_lock.
+	 * Protects against I/O scheduler switching, particularly when
+	 * updating q->elevator. Since the elevator update code path may
+	 * also modify q->nr_requests, this lock also protects the sysfs
+	 * attribute nr_requests.
+	 * To ensure proper locking order during an elevator update, first
+	 * freeze the queue, then acquire ->elevator_lock.
	 */
	struct mutex		elevator_lock;

From patchwork Tue Mar 4 10:22:35 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000375
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de,
    axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 6/7] block: protect wbt_lat_usec using q->elevator_lock
Date: Tue, 4 Mar 2025 15:52:35 +0530
Message-ID: <20250304102551.2533767-7-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

The wbt latency and state could be updated while initializing or
exiting the elevator, and also while configuring I/O latency QoS
parameters through cgroup. The elevator code path is now protected
with q->elevator_lock, so protect access to the sysfs attribute
wbt_lat_usec using q->elevator_lock instead of q->sysfs_lock.

While we're at it, also protect ioc_qos_write(), which configures
wbt parameters via cgroup, using q->elevator_lock.
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
 block/blk-iocost.c     |  2 ++
 block/blk-sysfs.c      | 20 ++++++++------------
 include/linux/blkdev.h |  4 ++--
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 65a1d4427ccf..c68373361301 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3249,6 +3249,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	}
 
 	memflags = blk_mq_freeze_queue(disk->queue);
+	mutex_lock(&disk->queue->elevator_lock);
 	blk_mq_quiesce_queue(disk->queue);
 
 	spin_lock_irq(&ioc->lock);
@@ -3356,6 +3357,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	spin_unlock_irq(&ioc->lock);
 
 	blk_mq_unquiesce_queue(disk->queue);
+	mutex_unlock(&disk->queue->elevator_lock);
 	blk_mq_unfreeze_queue(disk->queue, memflags);
 
 	ret = -EINVAL;

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f1fa57de29ed..223da196a548 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -557,7 +557,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page)
 	ssize_t ret;
 	struct request_queue *q = disk->queue;
 
-	mutex_lock(&q->sysfs_lock);
+	mutex_lock(&q->elevator_lock);
 	if (!wbt_rq_qos(q)) {
 		ret = -EINVAL;
 		goto out;
@@ -570,7 +570,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page)
 	ret = sysfs_emit(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000));
 out:
-	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->elevator_lock);
 	return ret;
 }
 
@@ -589,8 +589,8 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	if (val < -1)
 		return -EINVAL;
 
-	mutex_lock(&q->sysfs_lock);
 	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
 
 	rqos = wbt_rq_qos(q);
 	if (!rqos) {
@@ -619,8 +619,8 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	blk_mq_unquiesce_queue(q);
 
 out:
+	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
 
@@ -689,19 +689,15 @@ static struct attribute *queue_attrs[] = {
 /* Request-based queue attributes that are not relevant for bio-based queues. */
 static struct attribute *blk_mq_queue_attrs[] = {
-	/*
-	 * Attributes which are protected with q->sysfs_lock.
-	 */
-#ifdef CONFIG_BLK_WBT
-	&queue_wb_lat_entry.attr,
-#endif
 	/*
 	 * Attributes which require some form of locking other than
 	 * q->sysfs_lock.
 	 */
 	&elv_iosched_entry.attr,
 	&queue_requests_entry.attr,
-
+#ifdef CONFIG_BLK_WBT
+	&queue_wb_lat_entry.attr,
+#endif
 	/*
 	 * Attributes which don't require locking.
 	 */
@@ -882,10 +878,10 @@ int blk_register_queue(struct gendisk *disk)
 			goto out_crypto_sysfs_unregister;
 		}
 	}
+	wbt_enable_default(disk);
 	mutex_unlock(&q->elevator_lock);
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
-	wbt_enable_default(disk);
 
 	/* Now everything is ready and send out KOBJ_ADD uevent */
 	kobject_uevent(&disk->queue_kobj, KOBJ_ADD);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed8d139a4f5d..2977a583d740 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -565,8 +565,8 @@ struct request_queue {
 	/*
 	 * Protects against I/O scheduler switching, particularly when
 	 * updating q->elevator. Since the elevator update code path may
-	 * also modify q->nr_requests, this lock also protects the sysfs
-	 * attribute nr_requests.
+	 * also modify q->nr_requests and wbt latency, this lock also
+	 * protects the sysfs attributes nr_requests and wbt_lat_usec.
 	 * To ensure proper locking order during an elevator update, first
 	 * freeze the queue, then acquire ->elevator_lock.
	 */

From patchwork Tue Mar 4 10:22:36 2025
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 14000374
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, hare@suse.de,
    axboe@kernel.dk, lkp@intel.com, gjoyce@ibm.com
Subject: [PATCHv6 7/7] block: protect read_ahead_kb using q->limits_lock
Date: Tue, 4 Mar 2025 15:52:36 +0530
Message-ID: <20250304102551.2533767-8-nilay@linux.ibm.com>
In-Reply-To: <20250304102551.2533767-1-nilay@linux.ibm.com>
References: <20250304102551.2533767-1-nilay@linux.ibm.com>

The bdi->ra_pages field could be updated under q->limits_lock because
it is usually calculated from the queue limits by
queue_limits_commit_update. So protect reading/writing the sysfs
attribute read_ahead_kb using q->limits_lock instead of q->sysfs_lock.
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Signed-off-by: Nilay Shroff
---
 block/blk-sysfs.c      | 16 ++++++++++------
 include/linux/blkdev.h |  3 +++
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 223da196a548..d584461a1d84 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -93,9 +93,9 @@ static ssize_t queue_ra_show(struct gendisk *disk, char *page)
 {
 	ssize_t ret;
 
-	mutex_lock(&disk->queue->sysfs_lock);
+	mutex_lock(&disk->queue->limits_lock);
 	ret = queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page);
-	mutex_unlock(&disk->queue->sysfs_lock);
+	mutex_unlock(&disk->queue->limits_lock);
 	return ret;
 }
 
@@ -111,12 +111,15 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
 	ret = queue_var_store(&ra_kb, page, count);
 	if (ret < 0)
 		return ret;
-
-	mutex_lock(&q->sysfs_lock);
+	/*
+	 * ->ra_pages is protected by ->limits_lock because it is usually
+	 * calculated from the queue limits by queue_limits_commit_update.
+	 */
+	mutex_lock(&q->limits_lock);
 	memflags = blk_mq_freeze_queue(q);
 	disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
+	mutex_unlock(&q->limits_lock);
 	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
 
@@ -670,7 +673,8 @@ static struct attribute *queue_attrs[] = {
 	&queue_dma_alignment_entry.attr,
 
 	/*
-	 * Attributes which are protected with q->sysfs_lock.
+	 * Attributes which require some form of locking other than
+	 * q->sysfs_lock.
 	 */
 	&queue_ra_entry.attr,

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2977a583d740..4daef0408683 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -573,6 +573,9 @@ struct request_queue {
 	struct mutex		elevator_lock;
 
 	struct mutex		sysfs_lock;
+	/*
+	 * Protects queue limits and also sysfs attribute read_ahead_kb.
+	 */
 	struct mutex		limits_lock;
 
 	/*