diff mbox

[1/1] sd: fix lbprz discard granularity as expected

Message ID 56e12d57.4b43620a.4c8c0.ffff9bc5@mx.google.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

Tom Yan March 10, 2016, 8:16 a.m. UTC
From: Tom Yan <tom.ty89@gmail.com>

According to its own comment, the discard granularity should
fixed to the logical block size. However, the actual code has
it hardcoded as 1 byte. Changing it to logical_block_size.

Signed-off-by: Tom Yan <tom.ty89@gmail.com>

Comments

Tom Yan March 10, 2016, 9:15 a.m. UTC | #1
Nevertheless, I don't really quite get the sense of the original commit anyway.

Isn't discard granularity the minimum size we can discard? In that
case why would it have anything to do with "read zeroes"?

Suppose we are handling a device reports a discard granularity of 4096
bytes, while logical block size and physical block size are both 512
bytes. For such a device, a single 512-byte discard request should
simply be rejected because it's not allowed by the device. When it's
rejected, it doesn't mean that "the 512-byte block does not read
zeroes after being discarded"; instead, it's just "we did not discard
the 512-byte block as per requested because it's not allowed".

On 10 March 2016 at 16:16,  <tom.ty89@gmail.com> wrote:
> From: Tom Yan <tom.ty89@gmail.com>
>
> According to its own comment, the discard granularity should
> fixed to the logical block size. However, the actual code has
> it hardcoded as 1 byte. Changing it to logical_block_size.
>
> Signed-off-by: Tom Yan <tom.ty89@gmail.com>
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d749da7..5a5457a 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -648,7 +648,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
>          */
>         if (sdkp->lbprz) {
>                 q->limits.discard_alignment = 0;
> -               q->limits.discard_granularity = 1;
> +               q->limits.discard_granularity = logical_block_size;
>         } else {
>                 q->limits.discard_alignment = sdkp->unmap_alignment *
>                         logical_block_size;
> --
> 2.7.2
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen March 11, 2016, 2:08 a.m. UTC | #2
>>>>> "Tom" == tom ty89 <tom.ty89@gmail.com> writes:

Tom> According to its own comment, the discard granularity should fixed
Tom> to the logical block size. However, the actual code has it
Tom> hardcoded as 1 byte. Changing it to logical_block_size.

Already fixed.
Martin K. Petersen March 11, 2016, 2:16 a.m. UTC | #3
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> Nevertheless, I don't really quite get the sense of the original
Tom> commit anyway.  Isn't discard granularity the minimum size we can
Tom> discard? In that case why would it have anything to do with "read
Tom> zeroes"?

On devices that guarantee returning zeroes after discard we use the
feature to clear block ranges (for filesystem metadata, etc.).

A filesystem that clears a block range obviously needs every logical
block to be properly zeroed. And not just the portion that are a certain
size and aligned to a certain boundary.

Tom> Suppose we are handling a device reports a discard granularity of
Tom> 4096 bytes, while logical block size and physical block size are
Tom> both 512 bytes. For such a device, a single 512-byte discard
Tom> request should simply be rejected because it's not allowed by the
Tom> device.

Discard is advisory, the command will not get rejected.

Tom> When it's rejected, it doesn't mean that "the 512-byte block does
Tom> not read zeroes after being discarded"; instead, it's just "we did
Tom> not discard the 512-byte block as per requested because it's not
Tom> allowed".

We only honor discard_zeroes_data for devices that support WRITE SAME w/
UNMAP. And WRITE SAME will manually write any block that it can not
deprovision so you are guaranteed reliable results.
Tom Yan March 11, 2016, 3:53 a.m. UTC | #4
On 11 March 2016 at 10:16, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
> Discard is advisory, the command will not get rejected.
>

In that case, if the granularity only act as some sort of "reference"
but has no real "binding" to the actual behaviour, why would the
kernel even "make up" the granularity itself according to the physical
block size of the device:

q->limits.discard_granularity = max(sdkp->physical_block_size,
sdkp->unmap_granularity * logical_block_size);

Shouldn't it simply be:

q->limits.discard_granularity = sdkp->unmap_granularity * logical_block_size;
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen March 11, 2016, 12:37 p.m. UTC | #5
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom,

Tom> In that case, if the granularity only act as some sort of
Tom> "reference" but has no real "binding" to the actual behaviour, why
Tom> would the kernel even "make up" the granularity itself according to
Tom> the physical block size of the device:

Tom> q-> limits.discard_granularity = max(sdkp->physical_block_size,
Tom>     sdkp-> unmap_granularity * logical_block_size);

Many devices predate the UNMAP-related fields in the Block Limits
VPD. There are also devices that do not report an OPTIMAL UNMAP
GRANULARITY despite supporting UNMAP.
Tom Yan March 11, 2016, 2:55 p.m. UTC | #6
Would min_not_zero() be more proper than max()?

On 11 March 2016 at 20:37, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>
> Tom,
>
> Tom> In that case, if the granularity only act as some sort of
> Tom> "reference" but has no real "binding" to the actual behaviour, why
> Tom> would the kernel even "make up" the granularity itself according to
> Tom> the physical block size of the device:
>
> Tom> q-> limits.discard_granularity = max(sdkp->physical_block_size,
> Tom>     sdkp-> unmap_granularity * logical_block_size);
>
> Many devices predate the UNMAP-related fields in the Block Limits
> VPD. There are also devices that do not report an OPTIMAL UNMAP
> GRANULARITY despite supporting UNMAP.
>
> --
> Martin K. Petersen      Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen March 11, 2016, 9:41 p.m. UTC | #7
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom,

Tom> Would min_not_zero() be more proper than max()?

That would effectively set discard_granularity to physical_block_size
regardless of whether unmap_granularity was provided or not.
Tom Yan March 12, 2016, 5:37 a.m. UTC | #8
Suppose a device has logical block size of 512 bytes and physical
block size of 4096 bytes ("AF 512e").

If it has unmap granularity reported to be 1, then its discard
granularity should be 1 * 512 = 512 bytes. However, with max() it is
set to 4096 bytes instead. With min_not_zero(), it is CORRECTLY set to
512 bytes.

If it does not has unmap granularity reported, then with max() its
discard granularity is set to 4096 bytes. But with min_not_zero(), it
will ALSO be set to 4096 bytes.

Therefore, I don't see why max() should be used instead according to
your previous mails.

On 12 March 2016 at 05:41, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>
> Tom,
>
> Tom> Would min_not_zero() be more proper than max()?
>
> That would effectively set discard_granularity to physical_block_size
> regardless of whether unmap_granularity was provided or not.
>
> --
> Martin K. Petersen      Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Yan March 12, 2016, 5:43 a.m. UTC | #9
But yeah if it has unmap granularity reported to be, for example, 16,
then it will not be correctly set with min_not_zero().

Wait, shouldn't it be:

max(sdkp->logical_block_size,
   sdkp->unmap_granularity * logical_block_size);

then?


On 12 March 2016 at 13:37, Tom Yan <tom.ty89@gmail.com> wrote:
> Suppose a device has logical block size of 512 bytes and physical
> block size of 4096 bytes ("AF 512e").
>
> If it has unmap granularity reported to be 1, then its discard
> granularity should be 1 * 512 = 512 bytes. However, with max() it is
> set to 4096 bytes instead. With min_not_zero(), it is CORRECTLY set to
> 512 bytes.
>
> If it does not has unmap granularity reported, then with max() its
> discard granularity is set to 4096 bytes. But with min_not_zero(), it
> will ALSO be set to 4096 bytes.
>
> Therefore, I don't see why max() should be used instead according to
> your previous mails.
>
> On 12 March 2016 at 05:41, Martin K. Petersen
> <martin.petersen@oracle.com> wrote:
>>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>>
>> Tom,
>>
>> Tom> Would min_not_zero() be more proper than max()?
>>
>> That would effectively set discard_granularity to physical_block_size
>> regardless of whether unmap_granularity was provided or not.
>>
>> --
>> Martin K. Petersen      Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen March 14, 2016, 8:07 p.m. UTC | #10
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom,

Tom> But yeah if it has unmap granularity reported to be, for example,
Tom> 16, then it will not be correctly set with min_not_zero().

Tom> Wait, shouldn't it be:

Tom> max(sdkp->logical_block_size, sdkp->unmap_granularity * logical_block_size);

Tom> then?

Let's take the your example of a drive with 512-byte logical blocks and
4096-byte physical blocks. How would a drive partially unmap a sector?
diff mbox

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d749da7..5a5457a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -648,7 +648,7 @@  static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 	 */
 	if (sdkp->lbprz) {
 		q->limits.discard_alignment = 0;
-		q->limits.discard_granularity = 1;
+		q->limits.discard_granularity = logical_block_size;
 	} else {
 		q->limits.discard_alignment = sdkp->unmap_alignment *
 			logical_block_size;