Message ID | 20241119002815.600608-1-bvanassche@acm.org (mailing list archive) |
---|---|
Headers | show |
Series | Improve write performance for zoned UFS devices | expand |
On 11/19/24 09:27, Bart Van Assche wrote: > Hi Damien and Christoph, > > This patch series improves small write IOPS by a factor of four (+300%) for > zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Although > you are probably busy because the merge window is open, please take a look > at this patch series when you have the time. This patch series is organized > as follows: > - Bug fixes for existing code at the start of the series. > - The write pipelining support implementation comes after the bug fixes. Impressive improvements but the changes are rather invasive. Have you tried simpler solution like forcing unplugging a zone write plug from the driver once a command is passed to the driver and the driver did not reject it ? It seems like this would make everything simpler on the block layer side. But I am not sure if the performance gains would be the same.
On Mon, Nov 18, 2024 at 04:27:49PM -0800, Bart Van Assche wrote: > Hi Damien and Christoph, > > This patch series improves small write IOPS by a factor of four (+300%) for > zoned UFS devices on my test setup with an UFSHCI 3.0 controller. What's your exact test setup? Which upstream kernel support zoned UFS device are using?
On 11/19/24 4:25 AM, Christoph Hellwig wrote: > What's your exact test setup? Pixel 2023 and 2025 development boards with android-mainline kernel (https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline). The android mainline kernel tracks the upstream kernel closely. While the performance of Pixel development boards is identical to that of the corresponding smartphone generation, these development boards have the following advantages: - A UFS socket that allows swapping UFS devices without any soldering. - A USB port that makes it easy to monitor and capture kernel log messages. > Which upstream kernel support zoned UFS device are using? A Micron ZUFS device. Micron zoned UFS devices have a zone size that is a power of two. Thanks, Bart.
On 11/19/24 12:01 AM, Damien Le Moal wrote: > Impressive improvements but the changes are rather invasive. Have you tried > simpler solution like forcing unplugging a zone write plug from the driver once > a command is passed to the driver and the driver did not reject it ? It seems > like this would make everything simpler on the block layer side. But I am not > sure if the performance gains would be the same. Hi Damien, I'm not sure that the approach of submitting a new zoned write if the driver did not reject the previous write would result in a simpler solution. SCSI devices are allowed to respond to any command with a unit attention instead of processing the command. If a unit attention is reported, the SCSI core requeues the command. In other words, even with this approach, proper support for requeued zoned writes in the block layer is required. Additionally, that approach is not compatible with using .queue_rqs(). While the SCSI core does not yet support a .queue_rqs() callback, I think it would be good to have this support in the SCSI core. If we need requeuing support anyway, why to select an approach that probably will result in lower performance than what has been implemented in this patch series? Thanks, Bart.
On 11/20/24 04:08, Bart Van Assche wrote: > On 11/19/24 12:01 AM, Damien Le Moal wrote: >> Impressive improvements but the changes are rather invasive. Have you tried >> simpler solution like forcing unplugging a zone write plug from the driver once >> a command is passed to the driver and the driver did not reject it ? It seems >> like this would make everything simpler on the block layer side. But I am not >> sure if the performance gains would be the same. > > Hi Damien, > > I'm not sure that the approach of submitting a new zoned write if the > driver did not reject the previous write would result in a simpler > solution. SCSI devices are allowed to respond to any command with a unit > attention instead of processing the command. If a unit attention is > reported, the SCSI core requeues the command. In other words, even with > this approach, proper support for requeued zoned writes in the block > layer is required. Yes, but it would be vastly simpler because you would be guaranteed to having only a single write request per zone being in-flight between the write plug and the device at any time. So the requeue would not need reordering, and likely not need any special code at all. nless I am missing something, this would be simpler, no ? The main question though with such approach is: does it give you the same performance improvements as your current (more invasive) approach ? > Additionally, that approach is not compatible with using .queue_rqs(). > While the SCSI core does not yet support a .queue_rqs() callback, I > think it would be good to have this support in the SCSI core. I do not understand why it is not compatible. What is the problem ? > If we need requeuing support anyway, why to select an approach that > probably will result in lower performance than what has been implemented > in this patch series? I am only trying to see if there is not a simpler approach than what you did. The less changes, the better, right ?
On 11/20/24 7:20 PM, Damien Le Moal wrote: > I am only trying to see if there is not a simpler approach than what you did. > The less changes, the better, right ? Hi Damien, I agree with you that we should select the simplest approach that yields the desired performance. Regarding the proposed approach, forcing unplugging a zone write plug from the driver once a command is passed to the driver and the driver did not reject it, is this approach compatible with SCSI devices that may report a unit attention? If two zoned writes for the same zone are submitted in order to a SCSI device and the SCSI device responds with a unit attention condition to the first write then the second write will fail with an "unaligned write" error. This will have to be handled by pausing zoned write submission and by resubmitting zoned writes after all pending zoned writes for the given zone have completed. In other words, if higher queue depths are supported per zone, we cannot avoid increasing the complexity of the code in block/blk-zoned.c. If we cannot avoid increasing the complexity of that code, I think we can as well select the approach that yields the highest performance and the fewest changes in the block layer code for regular reads and writes. Thanks, Bart.
On 11/22/24 3:00 AM, Bart Van Assche wrote: > On 11/20/24 7:20 PM, Damien Le Moal wrote: >> I am only trying to see if there is not a simpler approach than what you did. >> The less changes, the better, right ? > > Hi Damien, > > I agree with you that we should select the simplest approach that yields > the desired performance. > > Regarding the proposed approach, forcing unplugging a zone write plug > from the driver once a command is passed to the driver and the driver > did not reject it, is this approach compatible with SCSI devices that > may report a unit attention? If two zoned writes for the same zone are > submitted in order to a SCSI device and the SCSI device responds with a > unit attention condition to the first write then the second write will > fail with an "unaligned write" error. This will have to be handled by I have never seen this happen (i mean UNIT ATTENTION being returned) in practice with SAS HDDs. Not sure how zoned UFS devices behave though. > pausing zoned write submission and by resubmitting zoned writes after > all pending zoned writes for the given zone have completed. In other > words, if higher queue depths are supported per zone, we cannot avoid > increasing the complexity of the code in block/blk-zoned.c. If we cannot > avoid increasing the complexity of that code, I think we can as well > select the approach that yields the highest performance and the fewest > changes in the block layer code for regular reads and writes. But it seems that all we need to do is a better request handling in the requeue+dispatch queue to handle requeued writes. I am not yet convinced that zone write plugging needs a lot of changes beside allowing more than one write per zone and some form of throttling based on feedback from the requeue path. I.e. the current code throttles write command issuing every time one command is submitted: the submission "plugs" the write plug. In your case, it seems that the plugging should be driven by the requeue path, or the driver. Unplugging of a plug happens currently when a BIO completes, but in your case, it would need to be also driven by the requeue path or driver. I am thinking that things could be cleaner/easier to maintain with a zoned write requeue special path. May be... I am thinking aloud here as I have not tried to code anything. > > Thanks, > > Bart. > >