Message ID | 1468934439-93579-5-git-send-email-hare@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 7/19/16 22:20, Hannes Reinecke wrote: > Add a sysfs queue attribute 'zoned' to display the zone layout > for zoned devices. > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > block/blk-sysfs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 46 insertions(+) Reviewed-by: Damien Le Moal <damien.lemoal@hgst.com> Tested-by: Damien Le Moal <damien.lemoal@hgst.com>
>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout
Hannes> for zoned devices.
Not quite one value per file :(
On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: > > Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout > Hannes> for zoned devices. > > Not quite one value per file :( > Yes. But I wanted to display the zone layout in a concise way allowing user-space programs to determine the zone layout without having to issue a 'REPORT ZONES' command themselves. I found it slightly pointless to add one sysfs entry per zone, and at the same time a simple 'zone_size' attribute wouldn't cover all possibilities. However, as SMR drives seem to stabilise around having a fixed zone size (with a possible exemption of the last zone to cover left-overs) I'd be fine a replace this with a single 'zone_size' attribute which could be set to eg '-1' for drives which indeed would implement variable zone sizes. Cheers, Hannes
On 07/23/16 05:43, Hannes Reinecke wrote: > On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >> >> Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout >> Hannes> for zoned devices. >> >> Not quite one value per file :( >> > Yes. > But I wanted to display the zone layout in a concise way allowing > user-space programs to determine the zone layout without having to > issue a 'REPORT ZONES' command themselves. > I found it slightly pointless to add one sysfs entry per zone, > and at the same time a simple 'zone_size' attribute wouldn't cover all > possibilities. > > However, as SMR drives seem to stabilise around having a fixed zone size > (with a possible exemption of the last zone to cover left-overs) > I'd be fine a replace this with a single 'zone_size' attribute which > could be set to eg '-1' for drives which indeed would implement variable > zone sizes. It's not that hard to convert the information exported by queue_zoned_show() from a single sysfs attribute into one directory per zone. Doing so would make it much easier for scripts to parse that information and would also avoid that the zone information has to be truncated because not all of it fits into a single page. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/24/2016 12:13 AM, Bart Van Assche wrote: > On 07/23/16 05:43, Hannes Reinecke wrote: >> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>> >>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout >>> Hannes> for zoned devices. >>> >>> Not quite one value per file :( >>> >> Yes. >> But I wanted to display the zone layout in a concise way allowing >> user-space programs to determine the zone layout without having to >> issue a 'REPORT ZONES' command themselves. >> I found it slightly pointless to add one sysfs entry per zone, >> and at the same time a simple 'zone_size' attribute wouldn't cover all >> possibilities. >> >> However, as SMR drives seem to stabilise around having a fixed zone size >> (with a possible exemption of the last zone to cover left-overs) >> I'd be fine a replace this with a single 'zone_size' attribute which >> could be set to eg '-1' for drives which indeed would implement variable >> zone sizes. > > It's not that hard to convert the information exported by > queue_zoned_show() from a single sysfs attribute into one directory per > zone. Doing so would make it much easier for scripts to parse that > information and would also avoid that the zone information has to be > truncated because not all of it fits into a single page. > But this is precisely what I've tried to avoid. Creating one file or directory per zone would mean we'll end up with rough 20k files/directories. Which I found rather excessive. Of course, it that is not a concern that I can easily convert it. Cheers, Hannes
On 07/24/16 00:10, Hannes Reinecke wrote: > On 07/24/2016 12:13 AM, Bart Van Assche wrote: >> On 07/23/16 05:43, Hannes Reinecke wrote: >>> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>>> >>>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout >>>> Hannes> for zoned devices. >>>> >>>> Not quite one value per file :( >>>> >>> Yes. >>> But I wanted to display the zone layout in a concise way allowing >>> user-space programs to determine the zone layout without having to >>> issue a 'REPORT ZONES' command themselves. >>> I found it slightly pointless to add one sysfs entry per zone, >>> and at the same time a simple 'zone_size' attribute wouldn't cover all >>> possibilities. >>> >>> However, as SMR drives seem to stabilise around having a fixed zone size >>> (with a possible exemption of the last zone to cover left-overs) >>> I'd be fine a replace this with a single 'zone_size' attribute which >>> could be set to eg '-1' for drives which indeed would implement variable >>> zone sizes. >> >> It's not that hard to convert the information exported by >> queue_zoned_show() from a single sysfs attribute into one directory per >> zone. Doing so would make it much easier for scripts to parse that >> information and would also avoid that the zone information has to be >> truncated because not all of it fits into a single page. >> > But this is precisely what I've tried to avoid. > Creating one file or directory per zone would mean we'll end up with > rough 20k files/directories. > Which I found rather excessive. > > Of course, it that is not a concern that I can easily convert it. If there are 10K zones and since queue_zoned_show() is limited to one page then only a very small fraction of the zone information will be available through sysfs. I remember from your presentations that reading the zone information is slow. Is 10K zones a typical value or a worst case value? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/24/16 06:37, Bart Van Assche wrote: > On 07/24/16 00:10, Hannes Reinecke wrote: >> On 07/24/2016 12:13 AM, Bart Van Assche wrote: >>> On 07/23/16 05:43, Hannes Reinecke wrote: >>>> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>>>> >>>>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout >>>>> Hannes> for zoned devices. >>>>> >>>>> Not quite one value per file :( >>>>> >>>> Yes. >>>> But I wanted to display the zone layout in a concise way allowing >>>> user-space programs to determine the zone layout without having to >>>> issue a 'REPORT ZONES' command themselves. >>>> I found it slightly pointless to add one sysfs entry per zone, >>>> and at the same time a simple 'zone_size' attribute wouldn't cover all >>>> possibilities. >>>> >>>> However, as SMR drives seem to stabilise around having a fixed zone size >>>> (with a possible exemption of the last zone to cover left-overs) >>>> I'd be fine a replace this with a single 'zone_size' attribute which >>>> could be set to eg '-1' for drives which indeed would implement variable >>>> zone sizes. >>> >>> It's not that hard to convert the information exported by >>> queue_zoned_show() from a single sysfs attribute into one directory per >>> zone. Doing so would make it much easier for scripts to parse that >>> information and would also avoid that the zone information has to be >>> truncated because not all of it fits into a single page. >>> >> But this is precisely what I've tried to avoid. >> Creating one file or directory per zone would mean we'll end up with >> rough 20k files/directories. >> Which I found rather excessive. >> >> Of course, it that is not a concern that I can easily convert it. > > If there are 10K zones and since queue_zoned_show() is limited to one > page then only a very small fraction of the zone information will be > available through sysfs. I remember from your presentations that reading > the zone information is slow. Is 10K zones a typical value or a worst > case value? (replying to my own e-mail) Something I should have asked before: is the zone information intended for end users or rather for software developers? In the latter case, have you considered to use debugfs instead of sysfs to export this information? From Documentation/filesystems/debugfs.txt: "Unlike /proc, which is only meant for information about a process, or sysfs, which has strict one-value-per-file rules, debugfs has no rules at all." Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bart, > On Jul 24, 2016, at 22:37, Bart Van Assche <bart.vanassche@sandisk.com> wrote: > > If there are 10K zones and since queue_zoned_show() is limited to one page then only a very small fraction of the zone information will be available through sysfs. I remember from your presentations that reading the zone information is slow. Is 10K zones a typical value or a worst case value? Our 10 TB drive with 256 MB zones has 37256 zones. A drive with the same capacity and 128 MB zones will double that (there are some out there with 128 MB zones). And higher capacities are coming. So 100K+ zones is not unrealistic in the very near future. I think that what Hannes proposed, i.e. to have the zoned file only specify the zone size (so only one value), is OK as long as we also add an ioctl to get the detailed zone information, which would be much faster for applications than using something like libzbc which will be sending SG_IOs to retrieve the zone information from the disk. Best regards. ------------------------ Damien Le Moal, Ph.D. Sr. Manager, System Software Group, HGST Research, HGST, a Western Digital brand Damien.LeMoal@hgst.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.hgst.com Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer: This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bart, > On Jul 24, 2016, at 22:51, Bart Van Assche <bart.vanassche@sandisk.com> wrote: > > Something I should have asked before: is the zone information intended for end users or rather for software developers? In the latter case, have you considered to use debugfs instead of sysfs to export this information? From Documentation/filesystems/debugfs.txt: "Unlike /proc, which is only meant for information about a process, or sysfs, which has strict one-value-per-file rules, debugfs has no rules at all." I think that the zone information will definitively be of higher value for software developers rather than end users. But that information must be available at run time for things like mkfs.xxx and having to mount debugfs on a production system to be able to run mkfs on an SMR disk (or initialize an application using the raw block device) does not sound ideal. Back to my previous reply, having the sysfs zoned file specifying just the disk zone size and having an ioctl to retrieve the detailed zone information looks to me much more sensible. Best regards. ------------------------ Damien Le Moal, Ph.D. Sr. Manager, System Software Group, HGST Research, HGST, a Western Digital brand Damien.LeMoal@hgst.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.hgst.com Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer: This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/24/2016 03:51 PM, Bart Van Assche wrote: > On 07/24/16 06:37, Bart Van Assche wrote: >> On 07/24/16 00:10, Hannes Reinecke wrote: >>> On 07/24/2016 12:13 AM, Bart Van Assche wrote: >>>> On 07/23/16 05:43, Hannes Reinecke wrote: >>>>> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>>>>> >>>>>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone >>>>>> layout >>>>>> Hannes> for zoned devices. >>>>>> >>>>>> Not quite one value per file :( >>>>>> >>>>> Yes. >>>>> But I wanted to display the zone layout in a concise way allowing >>>>> user-space programs to determine the zone layout without having to >>>>> issue a 'REPORT ZONES' command themselves. >>>>> I found it slightly pointless to add one sysfs entry per zone, >>>>> and at the same time a simple 'zone_size' attribute wouldn't cover all >>>>> possibilities. >>>>> >>>>> However, as SMR drives seem to stabilise around having a fixed zone >>>>> size >>>>> (with a possible exemption of the last zone to cover left-overs) >>>>> I'd be fine a replace this with a single 'zone_size' attribute which >>>>> could be set to eg '-1' for drives which indeed would implement >>>>> variable >>>>> zone sizes. >>>> >>>> It's not that hard to convert the information exported by >>>> queue_zoned_show() from a single sysfs attribute into one directory per >>>> zone. Doing so would make it much easier for scripts to parse that >>>> information and would also avoid that the zone information has to be >>>> truncated because not all of it fits into a single page. >>>> >>> But this is precisely what I've tried to avoid. >>> Creating one file or directory per zone would mean we'll end up with >>> rough 20k files/directories. >>> Which I found rather excessive. >>> >>> Of course, it that is not a concern that I can easily convert it. >> >> If there are 10K zones and since queue_zoned_show() is limited to one >> page then only a very small fraction of the zone information will be >> available through sysfs. I remember from your presentations that reading >> the zone information is slow. Is 10K zones a typical value or a worst >> case value? > > (replying to my own e-mail) > > Something I should have asked before: is the zone information intended > for end users or rather for software developers? In the latter case, > have you considered to use debugfs instead of sysfs to export this > information? From Documentation/filesystems/debugfs.txt: "Unlike /proc, > which is only meant for information about a process, or sysfs, which has > strict one-value-per-file rules, debugfs has no rules at all." > I would be perfectly fine with exporting the zone information via debugfs. But at the same time I would like to have a simple sysfs representation for SMR drives which can be utilized by mkfs and friend. Typically SMR drives have a fixed zone size, and more often than not the size of the zones is identical. So mkfs would benefit from knowing the fixed zone layout as it then can arrange the filesystem structures to align to those zones. At the same time it doesn't need to know the write pointer, so full zone information is not required here. And during normal operation ideally the zone information is handled within the kernel, so again userspace doesn't necessarily need to have access to the full zone information. So I guess I'll redo this one to export a sysfs attribute 'zone_size' (if the zones are of identical size), and make the full zone information available via debugfs. Cheers, Hannes
On 07/24/2016 07:37 AM, Bart Van Assche wrote: > On 07/24/16 00:10, Hannes Reinecke wrote: >> On 07/24/2016 12:13 AM, Bart Van Assche wrote: >>> On 07/23/16 05:43, Hannes Reinecke wrote: >>>> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>>>> >>>>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone layout >>>>> Hannes> for zoned devices. >>>>> >>>>> Not quite one value per file :( >>>>> >>>> Yes. >>>> But I wanted to display the zone layout in a concise way allowing >>>> user-space programs to determine the zone layout without having to >>>> issue a 'REPORT ZONES' command themselves. >>>> I found it slightly pointless to add one sysfs entry per zone, >>>> and at the same time a simple 'zone_size' attribute wouldn't cover all >>>> possibilities. >>>> >>>> However, as SMR drives seem to stabilise around having a fixed zone >>>> size >>>> (with a possible exemption of the last zone to cover left-overs) >>>> I'd be fine a replace this with a single 'zone_size' attribute which >>>> could be set to eg '-1' for drives which indeed would implement >>>> variable >>>> zone sizes. >>> >>> It's not that hard to convert the information exported by >>> queue_zoned_show() from a single sysfs attribute into one directory per >>> zone. Doing so would make it much easier for scripts to parse that >>> information and would also avoid that the zone information has to be >>> truncated because not all of it fits into a single page. >>> >> But this is precisely what I've tried to avoid. >> Creating one file or directory per zone would mean we'll end up with >> rough 20k files/directories. >> Which I found rather excessive. >> >> Of course, it that is not a concern that I can easily convert it. > > If there are 10K zones and since queue_zoned_show() is limited to one > page then only a very small fraction of the zone information will be > available through sysfs. I remember from your presentations that reading > the zone information is slow. Is 10K zones a typical value or a worst > case value? Just kill the sysfs file, it's useless for many zones. Either have the application use some library to provide the information, or use bsg and similar to get it.
On 07/24/2016 11:56 PM, Hannes Reinecke wrote: > On 07/24/2016 03:51 PM, Bart Van Assche wrote: >> On 07/24/16 06:37, Bart Van Assche wrote: >>> On 07/24/16 00:10, Hannes Reinecke wrote: >>>> On 07/24/2016 12:13 AM, Bart Van Assche wrote: >>>>> On 07/23/16 05:43, Hannes Reinecke wrote: >>>>>> On 07/22/2016 10:45 PM, Martin K. Petersen wrote: >>>>>>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >>>>>>> >>>>>>> Hannes> Add a sysfs queue attribute 'zoned' to display the zone >>>>>>> layout >>>>>>> Hannes> for zoned devices. >>>>>>> >>>>>>> Not quite one value per file :( >>>>>>> >>>>>> Yes. >>>>>> But I wanted to display the zone layout in a concise way allowing >>>>>> user-space programs to determine the zone layout without having to >>>>>> issue a 'REPORT ZONES' command themselves. >>>>>> I found it slightly pointless to add one sysfs entry per zone, >>>>>> and at the same time a simple 'zone_size' attribute wouldn't cover all >>>>>> possibilities. >>>>>> >>>>>> However, as SMR drives seem to stabilise around having a fixed zone >>>>>> size >>>>>> (with a possible exemption of the last zone to cover left-overs) >>>>>> I'd be fine a replace this with a single 'zone_size' attribute which >>>>>> could be set to eg '-1' for drives which indeed would implement >>>>>> variable >>>>>> zone sizes. >>>>> >>>>> It's not that hard to convert the information exported by >>>>> queue_zoned_show() from a single sysfs attribute into one directory per >>>>> zone. Doing so would make it much easier for scripts to parse that >>>>> information and would also avoid that the zone information has to be >>>>> truncated because not all of it fits into a single page. >>>>> >>>> But this is precisely what I've tried to avoid. >>>> Creating one file or directory per zone would mean we'll end up with >>>> rough 20k files/directories. >>>> Which I found rather excessive. >>>> >>>> Of course, it that is not a concern that I can easily convert it. >>> >>> If there are 10K zones and since queue_zoned_show() is limited to one >>> page then only a very small fraction of the zone information will be >>> available through sysfs. I remember from your presentations that reading >>> the zone information is slow. Is 10K zones a typical value or a worst >>> case value? >> >> (replying to my own e-mail) >> >> Something I should have asked before: is the zone information intended >> for end users or rather for software developers? In the latter case, >> have you considered to use debugfs instead of sysfs to export this >> information? From Documentation/filesystems/debugfs.txt: "Unlike /proc, >> which is only meant for information about a process, or sysfs, which has >> strict one-value-per-file rules, debugfs has no rules at all." >> > I would be perfectly fine with exporting the zone information via debugfs. > But at the same time I would like to have a simple sysfs representation > for SMR drives which can be utilized by mkfs and friend. Then provide a real interface, through a library. Don't expose tons of zones through sysfs, and expect applications to parse that. That's silly.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 73200b8..7ffbc44 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -229,6 +229,42 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page) return queue_var_show(max_hw_sectors_kb, (page)); } +#ifdef CONFIG_BLK_DEV_ZONED +static ssize_t queue_zoned_show(struct request_queue *q, char *page) +{ + struct rb_node *node; + struct blk_zone *zone; + ssize_t offset = 0, end = 0; + size_t size = 0, num = 0; + enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN; + + for (node = rb_first(&q->zones); node; node = rb_next(node)) { + zone = rb_entry(node, struct blk_zone, node); + if (zone->type != type || + zone->len != size || + end != zone->start) { + if (size != 0) + offset += sprintf(page + offset, "%zu\n", num); + /* We can only store one page ... */ + if (offset + 42 > PAGE_SIZE) { + offset += sprintf(page + offset, "...\n"); + return offset; + } + size = zone->len; + type = zone->type; + offset += sprintf(page + offset, "%zu %zu %d ", + zone->start, size, type); + num = 0; + end = zone->start + size; + } else + end += zone->len; + num++; + } + offset += sprintf(page + offset, "%zu\n", num); + return offset; +} +#endif + #define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \ static ssize_t \ queue_show_##name(struct request_queue *q, char *page) \ @@ -484,6 +520,13 @@ static struct queue_sysfs_entry queue_write_same_max_entry = { .show = queue_write_same_max_show, }; +#ifdef CONFIG_BLK_DEV_ZONED +static struct queue_sysfs_entry queue_zoned_entry = { + .attr = {.name = "zoned", .mode = S_IRUGO }, + .show = queue_zoned_show, +}; +#endif + static struct queue_sysfs_entry queue_nonrot_entry = { .attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR }, .show = queue_show_nonrot, @@ -547,6 +590,9 @@ static struct attribute *default_attrs[] = { &queue_discard_zeroes_data_entry.attr, &queue_write_same_max_entry.attr, &queue_nonrot_entry.attr, +#ifdef CONFIG_BLK_DEV_ZONED + &queue_zoned_entry.attr, +#endif &queue_nomerges_entry.attr, &queue_rq_affinity_entry.attr, &queue_iostats_entry.attr,
Add a sysfs queue attribute 'zoned' to display the zone layout for zoned devices. Signed-off-by: Hannes Reinecke <hare@suse.de> --- block/blk-sysfs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+)