diff mbox series

[RFC,v2,09/11] block/bdev: lift block size restrictions and use common definition

Message ID 20241214031050.1337920-10-mcgrof@kernel.org (mailing list archive)
State New
Headers show
Series enable bs > ps for block devices | expand

Commit Message

Luis Chamberlain Dec. 14, 2024, 3:10 a.m. UTC
We now can support blocksizes larger than PAGE_SIZE, so lift
the restriction up to the max supported page cache order and
just bake this into a common helper used by the block layer.

We bound ourselves to 64k as a sensible limit. The hard limit,
however is 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c           |  5 ++---
 include/linux/blkdev.h | 11 ++++++++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

Comments

John Garry Dec. 16, 2024, 8:55 a.m. UTC | #1
On 14/12/2024 03:10, Luis Chamberlain wrote:
> index 167d82b46781..b57dc4bff81b 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
>   	struct inode *inode = file->f_mapping->host;
>   	struct block_device *bdev = I_BDEV(inode);
>   
> -	/* Size must be a power of two, and between 512 and PAGE_SIZE */
> -	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
> +	if (blk_validate_block_size(size))
>   		return -EINVAL;

I suppose that this can be sent as a separate patch to be merged now.
Ming Lei Dec. 16, 2024, 9:19 a.m. UTC | #2
On Mon, Dec 16, 2024 at 4:58 PM John Garry <john.g.garry@oracle.com> wrote:
>
> On 14/12/2024 03:10, Luis Chamberlain wrote:
> > index 167d82b46781..b57dc4bff81b 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
> >       struct inode *inode = file->f_mapping->host;
> >       struct block_device *bdev = I_BDEV(inode);
> >
> > -     /* Size must be a power of two, and between 512 and PAGE_SIZE */
> > -     if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
> > +     if (blk_validate_block_size(size))
> >               return -EINVAL;
>
> I suppose that this can be sent as a separate patch to be merged now.

There have been some bugs found in case that PAGE_SIZE == 64K, and I
think it is bad to use PAGE_SIZE for validating many hw/queue limits, we might
have to fix them first.

Such as:

1) blk_validate_limits()

- failure if max_segment_size is less than 64K

- max_user_sectors

if (lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
       return -EINVAL;

2) bio_may_need_split()

- max_segment_size may be less than 64K

Thanks,
John Garry Dec. 16, 2024, 10:13 a.m. UTC | #3
On 16/12/2024 09:19, Ming Lei wrote:
> On Mon, Dec 16, 2024 at 4:58 PM John Garry <john.g.garry@oracle.com> wrote:
>>
>> On 14/12/2024 03:10, Luis Chamberlain wrote:
>>> index 167d82b46781..b57dc4bff81b 100644
>>> --- a/block/bdev.c
>>> +++ b/block/bdev.c
>>> @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
>>>        struct inode *inode = file->f_mapping->host;
>>>        struct block_device *bdev = I_BDEV(inode);
>>>
>>> -     /* Size must be a power of two, and between 512 and PAGE_SIZE */
>>> -     if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
>>> +     if (blk_validate_block_size(size))
>>>                return -EINVAL;
>>
>> I suppose that this can be sent as a separate patch to be merged now.
> 
> There have been some bugs found in case that PAGE_SIZE == 64K, and I
> think it is bad to use PAGE_SIZE for validating many hw/queue limits, we might
> have to fix them first.

I am just suggesting to remove duplicated code, as these checks are same 
as blk_validate_block_size()

> 
> Such as:

Aren't the below list just enforcing block layer requirements? And so 
only block drivers need fixing for PAGE_SIZE > 4K (or cannot be used for 
PAGE_SIZE > 4K), right?

> 
> 1) blk_validate_limits()
> 
> - failure if max_segment_size is less than 64K
 > > - max_user_sectors
> 
> if (lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
>         return -EINVAL;
> 
> 2) bio_may_need_split()
> 
> - max_segment_size may be less than 64K
> 
> Thanks,



>
Ming Lei Dec. 16, 2024, 10:23 a.m. UTC | #4
On Mon, Dec 16, 2024 at 6:14 PM John Garry <john.g.garry@oracle.com> wrote:
>
> On 16/12/2024 09:19, Ming Lei wrote:
> > On Mon, Dec 16, 2024 at 4:58 PM John Garry <john.g.garry@oracle.com> wrote:
> >>
> >> On 14/12/2024 03:10, Luis Chamberlain wrote:
> >>> index 167d82b46781..b57dc4bff81b 100644
> >>> --- a/block/bdev.c
> >>> +++ b/block/bdev.c
> >>> @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
> >>>        struct inode *inode = file->f_mapping->host;
> >>>        struct block_device *bdev = I_BDEV(inode);
> >>>
> >>> -     /* Size must be a power of two, and between 512 and PAGE_SIZE */
> >>> -     if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
> >>> +     if (blk_validate_block_size(size))
> >>>                return -EINVAL;
> >>
> >> I suppose that this can be sent as a separate patch to be merged now.
> >
> > There have been some bugs found in case that PAGE_SIZE == 64K, and I
> > think it is bad to use PAGE_SIZE for validating many hw/queue limits, we might
> > have to fix them first.
>
> I am just suggesting to remove duplicated code, as these checks are same
> as blk_validate_block_size()

My fault, misunderstood your point as pushing this single patch only.

>
> >
> > Such as:
>
> Aren't the below list just enforcing block layer requirements? And so
> only block drivers need fixing for PAGE_SIZE > 4K (or cannot be used for
> PAGE_SIZE > 4K), right?

It is block layer which should be fixed to support  PAGE_SIZE > 4K.

Thanks,
Hannes Reinecke Dec. 17, 2024, 10:05 a.m. UTC | #5
On 12/14/24 04:10, Luis Chamberlain wrote:
> We now can support blocksizes larger than PAGE_SIZE, so lift
> the restriction up to the max supported page cache order and
> just bake this into a common helper used by the block layer.
> 
> We bound ourselves to 64k as a sensible limit. The hard limit,
> however is 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   block/bdev.c           |  5 ++---
>   include/linux/blkdev.h | 11 ++++++++++-
>   2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 167d82b46781..b57dc4bff81b 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
>   	struct inode *inode = file->f_mapping->host;
>   	struct block_device *bdev = I_BDEV(inode);
>   
> -	/* Size must be a power of two, and between 512 and PAGE_SIZE */
> -	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
> +	if (blk_validate_block_size(size))
>   		return -EINVAL;
>   
>   	/* Size cannot be smaller than the size supported by the device */
> @@ -185,7 +184,7 @@ int sb_set_blocksize(struct super_block *sb, int size)
>   	if (set_blocksize(sb->s_bdev_file, size))
>   		return 0;
>   	/* If we get here, we know size is power of two
> -	 * and it's value is between 512 and PAGE_SIZE */
> +	 * and its value is larger than 512 */
>   	sb->s_blocksize = size;
>   	sb->s_blocksize_bits = blksize_bits(size);
>   	return sb->s_blocksize;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 08a727b40816..a7303a55ed2a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -269,10 +269,19 @@ static inline dev_t disk_devt(struct gendisk *disk)
>   	return MKDEV(disk->major, disk->first_minor);
>   }
>   
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +/*
> + * The hard limit is (1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).
As this is the hard limit, one wonders why we don't use it ...
So please add a comment why we restrict it to 64k.

> + */
> +#define BLK_MAX_BLOCK_SIZE      (SZ_64K)
> +#else
> +#define BLK_MAX_BLOCK_SIZE      (PAGE_SIZE)
> +#endif
> +
>   /* blk_validate_limits() validates bsize, so drivers don't usually need to */
>   static inline int blk_validate_block_size(unsigned long bsize)
>   {
> -	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
> +	if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
>   		return -EINVAL;
>   
>   	return 0;

Otherwise:
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
John Garry Dec. 17, 2024, 8:51 p.m. UTC | #6
On 16/12/2024 08:55, John Garry wrote:
> On 14/12/2024 03:10, Luis Chamberlain wrote:
>> index 167d82b46781..b57dc4bff81b 100644
>> --- a/block/bdev.c
>> +++ b/block/bdev.c
>> @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size)
>>       struct inode *inode = file->f_mapping->host;
>>       struct block_device *bdev = I_BDEV(inode);
>> -    /* Size must be a power of two, and between 512 and PAGE_SIZE */
>> -    if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
>> +    if (blk_validate_block_size(size))
>>           return -EINVAL;
> 
> I suppose that this can be sent as a separate patch to be merged now.

And if you want to send this as a separate prep change, feel free to add:

Reviewed-by: John Garry <john.g.garry@oracle.com>
Bart Van Assche Dec. 17, 2024, 9 p.m. UTC | #7
On 12/13/24 7:10 PM, Luis Chamberlain wrote:
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +/*
> + * The hard limit is (1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).
> + */

A closing parenthesis is missing from the above comment.

> +#define BLK_MAX_BLOCK_SIZE      (SZ_64K)
> +#else
> +#define BLK_MAX_BLOCK_SIZE      (PAGE_SIZE)
> +#endif

Parentheses are never necessary around constants since the definition of
the constant should include parenthesis if necessary.

Thanks,

Bart.
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 167d82b46781..b57dc4bff81b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -157,8 +157,7 @@  int set_blocksize(struct file *file, int size)
 	struct inode *inode = file->f_mapping->host;
 	struct block_device *bdev = I_BDEV(inode);
 
-	/* Size must be a power of two, and between 512 and PAGE_SIZE */
-	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
+	if (blk_validate_block_size(size))
 		return -EINVAL;
 
 	/* Size cannot be smaller than the size supported by the device */
@@ -185,7 +184,7 @@  int sb_set_blocksize(struct super_block *sb, int size)
 	if (set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is power of two
-	 * and it's value is between 512 and PAGE_SIZE */
+	 * and its value is larger than 512 */
 	sb->s_blocksize = size;
 	sb->s_blocksize_bits = blksize_bits(size);
 	return sb->s_blocksize;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 08a727b40816..a7303a55ed2a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -269,10 +269,19 @@  static inline dev_t disk_devt(struct gendisk *disk)
 	return MKDEV(disk->major, disk->first_minor);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * The hard limit is (1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).
+ */
+#define BLK_MAX_BLOCK_SIZE      (SZ_64K)
+#else
+#define BLK_MAX_BLOCK_SIZE      (PAGE_SIZE)
+#endif
+
 /* blk_validate_limits() validates bsize, so drivers don't usually need to */
 static inline int blk_validate_block_size(unsigned long bsize)
 {
-	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
+	if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
 		return -EINVAL;
 
 	return 0;