diff mbox series

[RFC,1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size

Message ID 20240903150748.2179966-2-john.g.garry@oracle.com (mailing list archive)
State New, archived
Headers show
Series RAID0 atomic write support | expand

Commit Message

John Garry Sept. 3, 2024, 3:07 p.m. UTC
For bdev file operations, a write will be truncated when trying to write
past the end of the device. This could not be tolerated for an atomic
write.

Ensure that the size of the bdev matches max atomic write unit so that this
truncation would never occur.

Fixes: 9da3d1e912f3 ("block: Add core atomic write support")
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 include/linux/blkdev.h | 3 +++
 1 file changed, 3 insertions(+)

Comments

Christoph Hellwig Sept. 12, 2024, 1:15 p.m. UTC | #1
On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
> For bdev file operations, a write will be truncated when trying to write
> past the end of the device. This could not be tolerated for an atomic
> write.
> 
> Ensure that the size of the bdev matches max atomic write unit so that this
> truncation would never occur.

But we'd still support atomic writes for all but the last sectors of
the device?  Isn't this really an application problem?

If not supporting atomic writes at all for unaligned devices is the right
thing to do, we'll need to clearly document this somewhere.  Any maybe
also add a pr_once to log a message?
John Garry Sept. 12, 2024, 2:58 p.m. UTC | #2
On 12/09/2024 14:15, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
>> For bdev file operations, a write will be truncated when trying to write
>> past the end of the device. This could not be tolerated for an atomic
>> write.
>>
>> Ensure that the size of the bdev matches max atomic write unit so that this
>> truncation would never occur.
> 
> But we'd still support atomic writes for all but the last sectors of
> the device? 

We should do be able to, but with this patch we cannot. However, a 
misaligned partition would be very much unexpected.

> Isn't this really an application problem?

Sure, if the application tried to do an atomic write to the end of the 
device and it was truncated.

> 
> If not supporting atomic writes at all for unaligned devices is the right
> thing to do, we'll need to clearly document this somewhere.  Any maybe
> also add a pr_once to log a message?

I could also just reject any truncation on the atomic write in fops. 
Maybe that is better.

And at some stage looking at making parted, fdisk, and other 
partitioning tools atomic write aware would be good, so that the user 
knows about these restrictions.

Thanks,
John
Christoph Hellwig Sept. 12, 2024, 3:07 p.m. UTC | #3
On Thu, Sep 12, 2024 at 03:58:00PM +0100, John Garry wrote:
> On 12/09/2024 14:15, Christoph Hellwig wrote:
>> On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
>>> For bdev file operations, a write will be truncated when trying to write
>>> past the end of the device. This could not be tolerated for an atomic
>>> write.
>>>
>>> Ensure that the size of the bdev matches max atomic write unit so that this
>>> truncation would never occur.
>>
>> But we'd still support atomic writes for all but the last sectors of
>> the device? 
>
> We should do be able to, but with this patch we cannot. However, a 
> misaligned partition would be very much unexpected.

Yes, misaligned partitions is very unexpected, but with large and
potentially unlimited atomic boundaries I would not expect the size
to always be aligned.  But then again at least in NVMe atomic writes
don't need to match the max size anyway, so I'm not entirely sure
what the problem actually is.

> I could also just reject any truncation on the atomic write in fops. Maybe 
> that is better.

It probably is.
John Garry Sept. 12, 2024, 3:22 p.m. UTC | #4
On 12/09/2024 16:07, Christoph Hellwig wrote:
>> We should do be able to, but with this patch we cannot. However, a
>> misaligned partition would be very much unexpected.
> Yes, misaligned partitions is very unexpected, but with large and
> potentially unlimited atomic boundaries I would not expect the size
> to always be aligned.  But then again at least in NVMe atomic writes
> don't need to match the max size anyway, so I'm not entirely sure
> what the problem actually is.

Actually it's not an alignment issue, but a size issue.

Consider a 3.5MB partition and atomic write max is 1MB. If we tried to 
atomic write 1MB at offset 3MB, then it would be truncated to a 0.5MB write.

So maybe it is an application bug.

> 
>> I could also just reject any truncation on the atomic write in fops. Maybe
>> that is better.
Hannes Reinecke Sept. 13, 2024, 8:36 a.m. UTC | #5
On 9/12/24 17:22, John Garry wrote:
> On 12/09/2024 16:07, Christoph Hellwig wrote:
>>> We should do be able to, but with this patch we cannot. However, a
>>> misaligned partition would be very much unexpected.
>> Yes, misaligned partitions is very unexpected, but with large and
>> potentially unlimited atomic boundaries I would not expect the size
>> to always be aligned.  But then again at least in NVMe atomic writes
>> don't need to match the max size anyway, so I'm not entirely sure
>> what the problem actually is.
> 
> Actually it's not an alignment issue, but a size issue.
> 
> Consider a 3.5MB partition and atomic write max is 1MB. If we tried to 
> atomic write 1MB at offset 3MB, then it would be truncated to a 0.5MB 
> write.
> 
> So maybe it is an application bug.
> 
Hmm. Why don't we reject such an I/O? We cannot guarantee an atomic 
write, so I think we should be perfectly fine to return an error to
userspace.

Cheers,

Hannes
diff mbox series

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b7664d593486..af8434e391fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1662,6 +1662,9 @@  static inline bool bdev_can_atomic_write(struct block_device *bdev)
 	if (!limits->atomic_write_unit_min)
 		return false;
 
+	if (!IS_ALIGNED(bdev_nr_bytes(bdev), limits->atomic_write_unit_max))
+		return false;
+
 	if (bdev_is_partition(bdev)) {
 		sector_t bd_start_sect = bdev->bd_start_sect;
 		unsigned int alignment =