Message ID | 20240123-vfs-bdev-file-v2-31-adbd023e19cc@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Open block devices as files | expand |
On Tue, Jan 23, 2024 at 02:26:48PM +0100, Christian Brauner wrote: > Make it possible to detected a block device that was opened with > restricted write access solely based on its file operations that it was > opened with. This avoids wasting an FMODE_* flag. > > def_blk_fops isn't needed to check whether something is a block device > checking the inode type is enough for that. And def_blk_fops_restricted > can be kept private to the block layer. I agree with not wasting a FMODE_* flag, but I also really hate duplicating the file operations. I went to search for a good place to stash this information and ended up at the f_version field in struct file. That one is never touched by the VFS proper but just the file system and a few lseek helpers. The latter currently happen to be used by block devices unfortunately, but it seem like moving the f_version clearing into the few filesystem actualy using it would be good code hygiene anyway.
On Mon, Jan 29, 2024 at 05:49:34PM +0100, Christoph Hellwig wrote: > On Tue, Jan 23, 2024 at 02:26:48PM +0100, Christian Brauner wrote: > > Make it possible to detected a block device that was opened with > > restricted write access solely based on its file operations that it was > > opened with. This avoids wasting an FMODE_* flag. > > > > def_blk_fops isn't needed to check whether something is a block device > > checking the inode type is enough for that. And def_blk_fops_restricted > > can be kept private to the block layer. > > I agree with not wasting a FMODE_* flag, but I also really hate > duplicating the file operations. I don't think it's that bad and is temporary until we can unconditionally disable writing to mounted block devices. Until then we can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED) in a single location in block/fops.c so its nicely encapsulated and confined. > I went to search for a good place to stash this information and ended up > at the f_version field in struct file. That one is never touched by the > VFS proper but just the file system and a few lseek helpers. The > latter currently happen to be used by block devices unfortunately, > but it seem like moving the f_version clearing into the few filesystem > actualy using it would be good code hygiene anyway. Kinda like choosing between pest and cholera. I think that the f_op solution is nicer. Overloading f_version is not something I feel I have the stomach for. The cleanup itself might still be worth it ofc.
On Mon, Jan 29, 2024 at 06:09:37PM +0100, Christian Brauner wrote: > I don't think it's that bad and is temporary until we can > unconditionally disable writing to mounted block devices. Until then we > can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED) > in a single location in block/fops.c so its nicely encapsulated and > confined. Oh well. If Jens is fine with this I can live with it even if I don't like it too much. I'll probably just clean it up as a follow up. OTOH I fear we won't be able to unconditionally disable writing to mounted block devices anytime soon if ever.
On Tue, Jan 30, 2024 at 09:32:13AM +0100, Christoph Hellwig wrote: > On Mon, Jan 29, 2024 at 06:09:37PM +0100, Christian Brauner wrote: > > I don't think it's that bad and is temporary until we can > > unconditionally disable writing to mounted block devices. Until then we > > can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED) > > in a single location in block/fops.c so its nicely encapsulated and > > confined. > > Oh well. If Jens is fine with this I can live with it even if I don't > like it too much. I'll probably just clean it up as a follow up. > > OTOH I fear we won't be able to unconditionally disable writing to > mounted block devices anytime soon if ever. One my dream. Put another way, if we don't even allow us to think that we can remove insecure functionality in the future then we have to accept that we'll be piling on #ifdefine's and mostly unused code forever which is just sad. :/ I'm hopeful that writing to mounted block devices is something that we can make all major distros move away from. We should start just because we need to figure out what tools do actually try to do stuff like that.
On Tue 23-01-24 14:26:48, Christian Brauner wrote: > Make it possible to detected a block device that was opened with > restricted write access solely based on its file operations that it was > opened with. This avoids wasting an FMODE_* flag. > > def_blk_fops isn't needed to check whether something is a block device > checking the inode type is enough for that. And def_blk_fops_restricted > can be kept private to the block layer. > > Signed-off-by: Christian Brauner <brauner@kernel.org> I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE file against a bdev with bdev_writes_blocked() == true, we are sure this is the handle blocking other writes so we can unblock them in bdev_yield_write_access()... Honza > --- > block/bdev.c | 16 ++++++++++++---- > block/blk.h | 2 ++ > block/fops.c | 3 +++ > 3 files changed, 17 insertions(+), 4 deletions(-) > > diff --git a/block/bdev.c b/block/bdev.c > index 71eaa1b5b7eb..9d96a43f198d 100644 > --- a/block/bdev.c > +++ b/block/bdev.c > @@ -799,13 +799,16 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode) > bdev->bd_writers++; > } > > -static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode) > +static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode) > { > + struct block_device *bdev; > + > if (bdev_allow_write_mounted) > return; > > + bdev = file_bdev(bdev_file); > /* Yield exclusive or shared write access. */ > - if (mode & BLK_OPEN_RESTRICT_WRITES) > + if (bdev_file->f_op == &def_blk_fops_restricted) > bdev_unblock_writes(bdev); > else if (mode & BLK_OPEN_WRITE) > bdev->bd_writers--; > @@ -959,6 +962,7 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder, > const struct blk_holder_ops *hops) > { > struct file *bdev_file; > + const struct file_operations *blk_fops; > struct block_device *bdev; > unsigned int flags; > int ret; > @@ -972,8 +976,12 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder, > return ERR_PTR(-ENXIO); > > flags = blk_to_file_flags(mode); > + if (mode & BLK_OPEN_RESTRICT_WRITES) > + blk_fops = &def_blk_fops_restricted; > + else > + blk_fops = &def_blk_fops; > bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode, > - blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops); > + blockdev_mnt, "", flags | O_LARGEFILE, blk_fops); > if (IS_ERR(bdev_file)) { > blkdev_put_no_open(bdev); > return bdev_file; > @@ -1033,7 +1041,7 @@ void bdev_release(struct file *bdev_file) > sync_blockdev(bdev); > > mutex_lock(&disk->open_mutex); > - bdev_yield_write_access(bdev, handle->mode); > + bdev_yield_write_access(bdev_file, handle->mode); > > if (handle->holder) > bd_end_claim(bdev, handle->holder); > diff --git a/block/blk.h b/block/blk.h > index 7ca24814f3a0..dfa958909c54 100644 > --- a/block/blk.h > +++ b/block/blk.h > @@ -9,6 +9,8 @@ > > struct elevator_type; > > +extern const struct file_operations def_blk_fops_restricted; > + > /* Max future timer expiry for timeouts */ > #define BLK_MAX_TIMEOUT (5 * HZ) > > diff --git a/block/fops.c b/block/fops.c > index 5589bf9c3822..f56bdfe459de 100644 > --- a/block/fops.c > +++ b/block/fops.c > @@ -862,6 +862,9 @@ const struct file_operations def_blk_fops = { > .fallocate = blkdev_fallocate, > }; > > +/* Indicator that this block device is opened with restricted write access. */ > +const struct file_operations def_blk_fops_restricted = def_blk_fops; > + > static __init int blkdev_init(void) > { > return bioset_init(&blkdev_dio_pool, 4, > > -- > 2.43.0 >
On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote: > On Tue 23-01-24 14:26:48, Christian Brauner wrote: > > Make it possible to detected a block device that was opened with > > restricted write access solely based on its file operations that it was > > opened with. This avoids wasting an FMODE_* flag. > > > > def_blk_fops isn't needed to check whether something is a block device > > checking the inode type is enough for that. And def_blk_fops_restricted > > can be kept private to the block layer. > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE > file against a bdev with bdev_writes_blocked() == true, we are sure this is > the handle blocking other writes so we can unblock them in > bdev_yield_write_access()... Excellent: commit e2dd15e4c32ad66d938d35e1acd26375a7f355fb Author: Christian Brauner <brauner@kernel.org> AuthorDate: Tue Jan 23 14:26:48 2024 +0100 Commit: Christian Brauner <brauner@kernel.org> CommitDate: Thu Feb 1 17:13:16 2024 +0100 block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access Make it possible to detected a block device that was opened with restricted write access based only on BLK_OPEN_WRITE and bdev->bd_writers < 0 so we won't have to claim another FMODE_* flag. Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-31-adbd023e19cc@kernel.org base-commit: 0bd1bf95a554f5f877724c27dbe33d4db0af4d0c change-id: 20240129-vfs-bdev-file-bd_inode-385a56c57a51 Signed-off-by: Christian Brauner <brauner@kernel.org> diff --git a/block/bdev.c b/block/bdev.c index 9be8c3c683ae..0edeb073e4d8 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -799,13 +799,22 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode) bdev->bd_writers++; } -static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode) +static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode) { + struct block_device *bdev; + if (bdev_allow_write_mounted) return; + bdev = file_bdev(bdev_file); /* Yield exclusive or shared write access. */ - if (mode & BLK_OPEN_RESTRICT_WRITES) + if (mode & BLK_OPEN_WRITE) { + if (bdev_writes_blocked(bdev)) + bdev_unblock_writes(bdev); + else + bdev->bd_writers--; + } + if (bdev_file->f_op == &def_blk_fops_restricted) bdev_unblock_writes(bdev); else if (mode & BLK_OPEN_WRITE) bdev->bd_writers--; @@ -1020,7 +1029,7 @@ void bdev_release(struct file *bdev_file) sync_blockdev(bdev); mutex_lock(&disk->open_mutex); - bdev_yield_write_access(bdev, handle->mode); + bdev_yield_write_access(bdev_file, handle->mode); if (handle->holder) bd_end_claim(bdev, handle->holder);
On Thu 01-02-24 17:16:02, Christian Brauner wrote: > On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote: > > On Tue 23-01-24 14:26:48, Christian Brauner wrote: > > > Make it possible to detected a block device that was opened with > > > restricted write access solely based on its file operations that it was > > > opened with. This avoids wasting an FMODE_* flag. > > > > > > def_blk_fops isn't needed to check whether something is a block device > > > checking the inode type is enough for that. And def_blk_fops_restricted > > > can be kept private to the block layer. > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE > > file against a bdev with bdev_writes_blocked() == true, we are sure this is > > the handle blocking other writes so we can unblock them in > > bdev_yield_write_access()... ... > - if (mode & BLK_OPEN_RESTRICT_WRITES) > + if (mode & BLK_OPEN_WRITE) { > + if (bdev_writes_blocked(bdev)) > + bdev_unblock_writes(bdev); > + else > + bdev->bd_writers--; > + } > + if (bdev_file->f_op == &def_blk_fops_restricted) Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you can delete def_blk_fops_restricted completely... Honza
On Thu, Feb 01, 2024 at 06:36:31PM +0100, Jan Kara wrote: > On Thu 01-02-24 17:16:02, Christian Brauner wrote: > > On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote: > > > On Tue 23-01-24 14:26:48, Christian Brauner wrote: > > > > Make it possible to detected a block device that was opened with > > > > restricted write access solely based on its file operations that it was > > > > opened with. This avoids wasting an FMODE_* flag. > > > > > > > > def_blk_fops isn't needed to check whether something is a block device > > > > checking the inode type is enough for that. And def_blk_fops_restricted > > > > can be kept private to the block layer. > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE > > > file against a bdev with bdev_writes_blocked() == true, we are sure this is > > > the handle blocking other writes so we can unblock them in > > > bdev_yield_write_access()... > > ... > > > - if (mode & BLK_OPEN_RESTRICT_WRITES) > > + if (mode & BLK_OPEN_WRITE) { > > + if (bdev_writes_blocked(bdev)) > > + bdev_unblock_writes(bdev); > > + else > > + bdev->bd_writers--; > > + } > > + if (bdev_file->f_op == &def_blk_fops_restricted) > > Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you > can delete def_blk_fops_restricted completely... Copy-paste error when dumping this into here. Here's the full patch.
On Fri 02-02-24 12:45:49, Christian Brauner wrote: > On Thu, Feb 01, 2024 at 06:36:31PM +0100, Jan Kara wrote: > > On Thu 01-02-24 17:16:02, Christian Brauner wrote: > > > On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote: > > > > On Tue 23-01-24 14:26:48, Christian Brauner wrote: > > > > > Make it possible to detected a block device that was opened with > > > > > restricted write access solely based on its file operations that it was > > > > > opened with. This avoids wasting an FMODE_* flag. > > > > > > > > > > def_blk_fops isn't needed to check whether something is a block device > > > > > checking the inode type is enough for that. And def_blk_fops_restricted > > > > > can be kept private to the block layer. > > > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > > > > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE > > > > file against a bdev with bdev_writes_blocked() == true, we are sure this is > > > > the handle blocking other writes so we can unblock them in > > > > bdev_yield_write_access()... > > > > ... > > > > > - if (mode & BLK_OPEN_RESTRICT_WRITES) > > > + if (mode & BLK_OPEN_WRITE) { > > > + if (bdev_writes_blocked(bdev)) > > > + bdev_unblock_writes(bdev); > > > + else > > > + bdev->bd_writers--; > > > + } > > > + if (bdev_file->f_op == &def_blk_fops_restricted) > > > > Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you > > can delete def_blk_fops_restricted completely... > > Copy-paste error when dumping this into here. Here's the full patch. Yes, the full patch looks good to me! Thanks! Honza
diff --git a/block/bdev.c b/block/bdev.c index 71eaa1b5b7eb..9d96a43f198d 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -799,13 +799,16 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode) bdev->bd_writers++; } -static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode) +static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode) { + struct block_device *bdev; + if (bdev_allow_write_mounted) return; + bdev = file_bdev(bdev_file); /* Yield exclusive or shared write access. */ - if (mode & BLK_OPEN_RESTRICT_WRITES) + if (bdev_file->f_op == &def_blk_fops_restricted) bdev_unblock_writes(bdev); else if (mode & BLK_OPEN_WRITE) bdev->bd_writers--; @@ -959,6 +962,7 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder, const struct blk_holder_ops *hops) { struct file *bdev_file; + const struct file_operations *blk_fops; struct block_device *bdev; unsigned int flags; int ret; @@ -972,8 +976,12 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder, return ERR_PTR(-ENXIO); flags = blk_to_file_flags(mode); + if (mode & BLK_OPEN_RESTRICT_WRITES) + blk_fops = &def_blk_fops_restricted; + else + blk_fops = &def_blk_fops; bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode, - blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops); + blockdev_mnt, "", flags | O_LARGEFILE, blk_fops); if (IS_ERR(bdev_file)) { blkdev_put_no_open(bdev); return bdev_file; @@ -1033,7 +1041,7 @@ void bdev_release(struct file *bdev_file) sync_blockdev(bdev); mutex_lock(&disk->open_mutex); - bdev_yield_write_access(bdev, handle->mode); + bdev_yield_write_access(bdev_file, handle->mode); if (handle->holder) bd_end_claim(bdev, handle->holder); diff --git a/block/blk.h b/block/blk.h index 7ca24814f3a0..dfa958909c54 100644 --- a/block/blk.h +++ b/block/blk.h @@ -9,6 +9,8 @@ struct elevator_type; +extern const struct file_operations def_blk_fops_restricted; + /* Max future timer expiry for timeouts */ #define BLK_MAX_TIMEOUT (5 * HZ) diff --git a/block/fops.c b/block/fops.c index 5589bf9c3822..f56bdfe459de 100644 --- a/block/fops.c +++ b/block/fops.c @@ -862,6 +862,9 @@ const struct file_operations def_blk_fops = { .fallocate = blkdev_fallocate, }; +/* Indicator that this block device is opened with restricted write access. */ +const struct file_operations def_blk_fops_restricted = def_blk_fops; + static __init int blkdev_init(void) { return bioset_init(&blkdev_dio_pool, 4,
Make it possible to detected a block device that was opened with restricted write access solely based on its file operations that it was opened with. This avoids wasting an FMODE_* flag. def_blk_fops isn't needed to check whether something is a block device checking the inode type is enough for that. And def_blk_fops_restricted can be kept private to the block layer. Signed-off-by: Christian Brauner <brauner@kernel.org> --- block/bdev.c | 16 ++++++++++++---- block/blk.h | 2 ++ block/fops.c | 3 +++ 3 files changed, 17 insertions(+), 4 deletions(-)