mbox series

[v2,00/34] Open block devices as files

Message ID 20240123-vfs-bdev-file-v2-0-adbd023e19cc@kernel.org (mailing list archive)
Headers show
Series Open block devices as files | expand

Message

Christian Brauner Jan. 23, 2024, 1:26 p.m. UTC
Hey Christoph,
Hey Jan,
Hey Jens,

This opens block devices as files. Instead of introducing a separate
indirection into bdev_open_by_*() vis struct bdev_handle we can just
make bdev_file_open_by_*() return a struct file. Opening and closing a
block device from setup_bdev_super() and in all other places just
becomes equivalent to opening and closing a file.

This has held up in xfstests and in blktests so far and it seems stable
and clean. The equivalence of opening and closing block devices to
regular files is a win in and of itself imho. Added to that is the
ability to do away with struct bdev_handle completely and make various
low-level helpers private to the block layer.

All places were we currently stash a struct bdev_handle we just stash a
file and use an accessor such as file_bdev() akin to I_BDEV() to get to
the block device.

It's now also possible to use file->f_mapping as a replacement for
bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as
an alternative to bdev->bd_inode allowing us to significantly reduce or
even fully remove bdev->bd_inode in follow-up patches.

In addition, we could get rid of sb->s_bdev and various other places
that stash the block device directly and instead stash the block device
file. Again, this is follow-up work.

Thanks!
Christian

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- This is not an RFC anymore.
- The patches to convert all of fs/buffer.c and associated helpers to
  struct file have been split out of the core infrastructure.
- Various renaming of helpers in response to v1.
- Link to v1: https://lore.kernel.org/r/20240103-vfs-bdev-file-v1-0-6c8ee55fb6ef@kernel.org

---
Christian Brauner (34):
      bdev: open block device as files
      block/ioctl: port blkdev_bszset() to file
      block/genhd: port disk_scan_partitions() to file
      md: port block device access to file
      swap: port block device usage to file
      power: port block device access to file
      xfs: port block device access to files
      drbd: port block device access to file
      pktcdvd: port block device access to file
      rnbd: port block device access to file
      xen: port block device access to file
      zram: port block device access to file
      bcache: port block device access to files
      block2mtd: port device access to files
      nvme: port block device access to file
      s390: port block device access to file
      target: port block device access to file
      bcachefs: port block device access to file
      btrfs: port device access to file
      erofs: port device access to file
      ext4: port block device access to file
      f2fs: port block device access to files
      jfs: port block device access to file
      nfs: port block device access to files
      ocfs2: port block device access to file
      reiserfs: port block device access to file
      bdev: remove bdev_open_by_path()
      bdev: make bdev_release() private to block layer
      bdev: make struct bdev_handle private to the block layer
      bdev: remove bdev pointer from struct bdev_handle
      block: use file->f_op to indicate restricted writes
      block: remove bdev_handle completely
      block: expose bdev_file_inode()
      ext4: rely on sb->f_bdev only

 block/bdev.c                        | 249 ++++++++++++++++++++++--------------
 block/blk.h                         |   6 +
 block/fops.c                        |  48 ++++---
 block/genhd.c                       |  12 +-
 block/ioctl.c                       |   9 +-
 drivers/block/drbd/drbd_int.h       |   4 +-
 drivers/block/drbd/drbd_nl.c        |  58 ++++-----
 drivers/block/pktcdvd.c             |  68 +++++-----
 drivers/block/rnbd/rnbd-srv.c       |  28 ++--
 drivers/block/rnbd/rnbd-srv.h       |   2 +-
 drivers/block/xen-blkback/blkback.c |   4 +-
 drivers/block/xen-blkback/common.h  |   4 +-
 drivers/block/xen-blkback/xenbus.c  |  37 +++---
 drivers/block/zram/zram_drv.c       |  26 ++--
 drivers/block/zram/zram_drv.h       |   2 +-
 drivers/md/bcache/bcache.h          |   4 +-
 drivers/md/bcache/super.c           |  74 +++++------
 drivers/md/dm.c                     |  23 ++--
 drivers/md/md.c                     |  12 +-
 drivers/md/md.h                     |   2 +-
 drivers/mtd/devices/block2mtd.c     |  46 +++----
 drivers/nvme/target/io-cmd-bdev.c   |  16 +--
 drivers/nvme/target/nvmet.h         |   2 +-
 drivers/s390/block/dasd.c           |  10 +-
 drivers/s390/block/dasd_genhd.c     |  36 +++---
 drivers/s390/block/dasd_int.h       |   2 +-
 drivers/s390/block/dasd_ioctl.c     |   2 +-
 drivers/target/target_core_iblock.c |  18 +--
 drivers/target/target_core_iblock.h |   2 +-
 drivers/target/target_core_pscsi.c  |  22 ++--
 drivers/target/target_core_pscsi.h  |   2 +-
 fs/bcachefs/super-io.c              |  20 +--
 fs/bcachefs/super_types.h           |   2 +-
 fs/btrfs/dev-replace.c              |  14 +-
 fs/btrfs/ioctl.c                    |  16 +--
 fs/btrfs/volumes.c                  |  92 ++++++-------
 fs/btrfs/volumes.h                  |   4 +-
 fs/cramfs/inode.c                   |   2 +-
 fs/erofs/data.c                     |   6 +-
 fs/erofs/internal.h                 |   2 +-
 fs/erofs/super.c                    |  16 +--
 fs/ext4/dir.c                       |   2 +-
 fs/ext4/ext4.h                      |   2 +-
 fs/ext4/ext4_jbd2.c                 |   2 +-
 fs/ext4/fast_commit.c               |   2 +-
 fs/ext4/fsmap.c                     |   8 +-
 fs/ext4/super.c                     |  87 ++++++-------
 fs/f2fs/f2fs.h                      |   2 +-
 fs/f2fs/super.c                     |  12 +-
 fs/file_table.c                     |  36 +++++-
 fs/jfs/jfs_logmgr.c                 |  26 ++--
 fs/jfs/jfs_logmgr.h                 |   2 +-
 fs/jfs/jfs_mount.c                  |   2 +-
 fs/nfs/blocklayout/blocklayout.h    |   2 +-
 fs/nfs/blocklayout/dev.c            |  68 +++++-----
 fs/ocfs2/cluster/heartbeat.c        |  32 ++---
 fs/reiserfs/journal.c               |  38 +++---
 fs/reiserfs/procfs.c                |   2 +-
 fs/reiserfs/reiserfs.h              |   8 +-
 fs/romfs/super.c                    |   2 +-
 fs/super.c                          |  18 +--
 fs/xfs/xfs_buf.c                    |  10 +-
 fs/xfs/xfs_buf.h                    |   4 +-
 fs/xfs/xfs_super.c                  |  43 +++----
 include/linux/blkdev.h              |  18 +--
 include/linux/device-mapper.h       |   2 +-
 include/linux/file.h                |   2 +
 include/linux/fs.h                  |   4 +-
 include/linux/pktcdvd.h             |   4 +-
 include/linux/swap.h                |   2 +-
 kernel/power/swap.c                 |  28 ++--
 mm/swapfile.c                       |  22 ++--
 72 files changed, 791 insertions(+), 705 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240103-vfs-bdev-file-1208da73d7ea

Comments

Christoph Hellwig Jan. 29, 2024, 6:17 a.m. UTC | #1
Do you have a git tree for this series?
Christian Brauner Jan. 29, 2024, 10:17 a.m. UTC | #2
On Mon, Jan 29, 2024 at 07:17:07AM +0100, Christoph Hellwig wrote:
> Do you have a git tree for this series? 

b4/vfs-bdev-file on vfs.git
Christian Brauner Jan. 29, 2024, 10:56 a.m. UTC | #3
Hey Christoph,
Hey Jan,

This is an attempt to remove bdev->bd_inode and restrict direct access
to the block layer, block drivers and few instances in fs/buffer.c where
it's needed. Suggestions to do better welcome!

Thanks!
Christian

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (2):
      fs & block: remove bdev->bd_inode
      fs,drivers: remove bdev_inode() usage outside of block layer and drivers

 block/bdev.c                          | 48 ++++++++++--------
 block/blk-zoned.c                     |  4 +-
 block/fops.c                          |  4 +-
 block/genhd.c                         |  8 +--
 block/ioctl.c                         |  8 +--
 block/partitions/core.c               | 11 ++--
 drivers/gpu/drm/drm_gem_vram_helper.c |  2 +-
 drivers/md/bcache/super.c             |  7 +--
 drivers/md/md-bitmap.c                |  2 +-
 drivers/mtd/devices/block2mtd.c       |  4 +-
 drivers/s390/block/dasd_ioctl.c       |  2 +-
 drivers/scsi/scsicam.c                |  2 +-
 fs/affs/file.c                        |  2 +-
 fs/bcachefs/util.h                    |  5 --
 fs/btrfs/dev-replace.c                |  2 +-
 fs/btrfs/disk-io.c                    | 17 ++++---
 fs/btrfs/disk-io.h                    |  4 +-
 fs/btrfs/inode.c                      |  2 +-
 fs/btrfs/super.c                      |  2 +-
 fs/btrfs/volumes.c                    | 26 +++++-----
 fs/btrfs/volumes.h                    |  2 +-
 fs/btrfs/zoned.c                      | 18 ++++---
 fs/btrfs/zoned.h                      |  4 +-
 fs/buffer.c                           | 95 +++++++++++++++++++----------------
 fs/cramfs/inode.c                     |  2 +-
 fs/direct-io.c                        |  7 +--
 fs/erofs/data.c                       |  5 +-
 fs/erofs/internal.h                   |  1 +
 fs/erofs/zmap.c                       |  2 +-
 fs/ext2/inode.c                       |  4 +-
 fs/ext2/xattr.c                       |  2 +-
 fs/ext4/inode.c                       |  8 +--
 fs/ext4/mmp.c                         |  2 +-
 fs/ext4/page-io.c                     |  4 +-
 fs/ext4/super.c                       |  7 ++-
 fs/ext4/xattr.c                       |  2 +-
 fs/f2fs/data.c                        |  7 ++-
 fs/f2fs/f2fs.h                        |  1 +
 fs/fuse/dax.c                         |  2 +-
 fs/gfs2/aops.c                        |  2 +-
 fs/gfs2/bmap.c                        |  2 +-
 fs/gfs2/glock.c                       |  2 +-
 fs/gfs2/meta_io.c                     |  2 +-
 fs/gfs2/ops_fstype.c                  |  2 +-
 fs/hpfs/file.c                        |  2 +-
 fs/iomap/buffered-io.c                |  8 +--
 fs/iomap/direct-io.c                  | 10 ++--
 fs/iomap/swapfile.c                   |  2 +-
 fs/iomap/trace.h                      |  2 +-
 fs/jbd2/commit.c                      |  2 +-
 fs/jbd2/journal.c                     | 29 ++++++-----
 fs/jbd2/recovery.c                    |  6 +--
 fs/jbd2/revoke.c                      | 10 ++--
 fs/jbd2/transaction.c                 |  8 +--
 fs/mpage.c                            | 26 ++++++----
 fs/nilfs2/btnode.c                    |  4 +-
 fs/nilfs2/gcinode.c                   |  2 +-
 fs/nilfs2/mdt.c                       |  2 +-
 fs/nilfs2/page.c                      |  4 +-
 fs/nilfs2/recovery.c                  | 20 ++++----
 fs/nilfs2/segment.c                   |  2 +-
 fs/nilfs2/the_nilfs.c                 |  1 +
 fs/nilfs2/the_nilfs.h                 |  1 +
 fs/ntfs/aops.c                        |  6 +--
 fs/ntfs/file.c                        |  2 +-
 fs/ntfs/mft.c                         |  4 +-
 fs/ntfs3/fsntfs.c                     |  8 +--
 fs/ntfs3/inode.c                      |  2 +-
 fs/ntfs3/super.c                      |  2 +-
 fs/ocfs2/aops.c                       |  2 +-
 fs/ocfs2/journal.c                    |  2 +-
 fs/reiserfs/fix_node.c                |  2 +-
 fs/reiserfs/journal.c                 | 10 ++--
 fs/reiserfs/prints.c                  |  4 +-
 fs/reiserfs/reiserfs.h                |  6 +--
 fs/reiserfs/stree.c                   |  2 +-
 fs/reiserfs/tail_conversion.c         |  2 +-
 fs/xfs/xfs_iomap.c                    |  4 +-
 fs/zonefs/file.c                      |  4 +-
 include/linux/blk_types.h             |  1 -
 include/linux/blkdev.h                |  6 ++-
 include/linux/buffer_head.h           | 68 ++++++++++++++++---------
 include/linux/fs.h                    |  4 +-
 include/linux/iomap.h                 | 13 ++++-
 include/linux/jbd2.h                  | 10 ++--
 include/trace/events/block.h          |  2 +-
 86 files changed, 363 insertions(+), 291 deletions(-)
---
base-commit: 0bd1bf95a554f5f877724c27dbe33d4db0af4d0c
change-id: 20240129-vfs-bdev-file-bd_inode-385a56c57a51
Christian Brauner Feb. 5, 2024, 11:55 a.m. UTC | #4
On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> Hey Christoph,
> Hey Jan,
> Hey Jens,
> 
> This opens block devices as files. Instead of introducing a separate
> indirection into bdev_open_by_*() vis struct bdev_handle we can just
> make bdev_file_open_by_*() return a struct file. Opening and closing a
> block device from setup_bdev_super() and in all other places just
> becomes equivalent to opening and closing a file.
> 
> This has held up in xfstests and in blktests so far and it seems stable
> and clean. The equivalence of opening and closing block devices to
> regular files is a win in and of itself imho. Added to that is the
> ability to do away with struct bdev_handle completely and make various
> low-level helpers private to the block layer.
> 
> All places were we currently stash a struct bdev_handle we just stash a
> file and use an accessor such as file_bdev() akin to I_BDEV() to get to
> the block device.
> 
> It's now also possible to use file->f_mapping as a replacement for
> bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as
> an alternative to bdev->bd_inode allowing us to significantly reduce or
> even fully remove bdev->bd_inode in follow-up patches.
> 
> In addition, we could get rid of sb->s_bdev and various other places
> that stash the block device directly and instead stash the block device
> file. Again, this is follow-up work.
> 
> Thanks!
> Christian
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---

With all fixes applied I've moved this into vfs.super on vfs/vfs.git so
this gets some exposure in -next asap. Please let me know if you have
quarrels with that.
Jan Kara Feb. 5, 2024, 2:19 p.m. UTC | #5
Hi!

On Mon 05-02-24 12:55:18, Christian Brauner wrote:
> On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> > Hey Christoph,
> > Hey Jan,
> > Hey Jens,
> > 
> > This opens block devices as files. Instead of introducing a separate
> > indirection into bdev_open_by_*() vis struct bdev_handle we can just
> > make bdev_file_open_by_*() return a struct file. Opening and closing a
> > block device from setup_bdev_super() and in all other places just
> > becomes equivalent to opening and closing a file.
> > 
> > This has held up in xfstests and in blktests so far and it seems stable
> > and clean. The equivalence of opening and closing block devices to
> > regular files is a win in and of itself imho. Added to that is the
> > ability to do away with struct bdev_handle completely and make various
> > low-level helpers private to the block layer.
> > 
> > All places were we currently stash a struct bdev_handle we just stash a
> > file and use an accessor such as file_bdev() akin to I_BDEV() to get to
> > the block device.
> > 
> > It's now also possible to use file->f_mapping as a replacement for
> > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as
> > an alternative to bdev->bd_inode allowing us to significantly reduce or
> > even fully remove bdev->bd_inode in follow-up patches.
> > 
> > In addition, we could get rid of sb->s_bdev and various other places
> > that stash the block device directly and instead stash the block device
> > file. Again, this is follow-up work.
> > 
> > Thanks!
> > Christian
> > 
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> 
> With all fixes applied I've moved this into vfs.super on vfs/vfs.git so
> this gets some exposure in -next asap. Please let me know if you have
> quarrels with that.

No quarrels really. I went through the patches and all of them look fine to
me to feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

I have just noticed that in "bdev: make struct bdev_handle private to the
block layer" in bdev_open() we are still leaking the handle in case of
error. This is however temporary (until the end of the series when we get
rid of handles altogether) so whatever.

								Honza
Christian Brauner Feb. 6, 2024, 1:39 p.m. UTC | #6
On Mon, Feb 05, 2024 at 03:19:11PM +0100, Jan Kara wrote:
> Hi!
> 
> On Mon 05-02-24 12:55:18, Christian Brauner wrote:
> > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> > > Hey Christoph,
> > > Hey Jan,
> > > Hey Jens,
> > > 
> > > This opens block devices as files. Instead of introducing a separate
> > > indirection into bdev_open_by_*() vis struct bdev_handle we can just
> > > make bdev_file_open_by_*() return a struct file. Opening and closing a
> > > block device from setup_bdev_super() and in all other places just
> > > becomes equivalent to opening and closing a file.
> > > 
> > > This has held up in xfstests and in blktests so far and it seems stable
> > > and clean. The equivalence of opening and closing block devices to
> > > regular files is a win in and of itself imho. Added to that is the
> > > ability to do away with struct bdev_handle completely and make various
> > > low-level helpers private to the block layer.
> > > 
> > > All places were we currently stash a struct bdev_handle we just stash a
> > > file and use an accessor such as file_bdev() akin to I_BDEV() to get to
> > > the block device.
> > > 
> > > It's now also possible to use file->f_mapping as a replacement for
> > > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as
> > > an alternative to bdev->bd_inode allowing us to significantly reduce or
> > > even fully remove bdev->bd_inode in follow-up patches.
> > > 
> > > In addition, we could get rid of sb->s_bdev and various other places
> > > that stash the block device directly and instead stash the block device
> > > file. Again, this is follow-up work.
> > > 
> > > Thanks!
> > > Christian
> > > 
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > ---
> > 
> > With all fixes applied I've moved this into vfs.super on vfs/vfs.git so
> > this gets some exposure in -next asap. Please let me know if you have
> > quarrels with that.
> 
> No quarrels really. I went through the patches and all of them look fine to
> me to feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> I have just noticed that in "bdev: make struct bdev_handle private to the
> block layer" in bdev_open() we are still leaking the handle in case of
> error. This is however temporary (until the end of the series when we get
> rid of handles altogether) so whatever.

Can you double-check what's in vfs.super right now? I thought I fixed
this up. I'll check too!
Jan Kara Feb. 6, 2024, 1:58 p.m. UTC | #7
On Tue 06-02-24 14:39:13, Christian Brauner wrote:
> On Mon, Feb 05, 2024 at 03:19:11PM +0100, Jan Kara wrote:
> > Hi!
> > 
> > On Mon 05-02-24 12:55:18, Christian Brauner wrote:
> > > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> > > > Hey Christoph,
> > > > Hey Jan,
> > > > Hey Jens,
> > > > 
> > > > This opens block devices as files. Instead of introducing a separate
> > > > indirection into bdev_open_by_*() vis struct bdev_handle we can just
> > > > make bdev_file_open_by_*() return a struct file. Opening and closing a
> > > > block device from setup_bdev_super() and in all other places just
> > > > becomes equivalent to opening and closing a file.
> > > > 
> > > > This has held up in xfstests and in blktests so far and it seems stable
> > > > and clean. The equivalence of opening and closing block devices to
> > > > regular files is a win in and of itself imho. Added to that is the
> > > > ability to do away with struct bdev_handle completely and make various
> > > > low-level helpers private to the block layer.
> > > > 
> > > > All places were we currently stash a struct bdev_handle we just stash a
> > > > file and use an accessor such as file_bdev() akin to I_BDEV() to get to
> > > > the block device.
> > > > 
> > > > It's now also possible to use file->f_mapping as a replacement for
> > > > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as
> > > > an alternative to bdev->bd_inode allowing us to significantly reduce or
> > > > even fully remove bdev->bd_inode in follow-up patches.
> > > > 
> > > > In addition, we could get rid of sb->s_bdev and various other places
> > > > that stash the block device directly and instead stash the block device
> > > > file. Again, this is follow-up work.
> > > > 
> > > > Thanks!
> > > > Christian
> > > > 
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > ---
> > > 
> > > With all fixes applied I've moved this into vfs.super on vfs/vfs.git so
> > > this gets some exposure in -next asap. Please let me know if you have
> > > quarrels with that.
> > 
> > No quarrels really. I went through the patches and all of them look fine to
> > me to feel free to add:
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > 
> > I have just noticed that in "bdev: make struct bdev_handle private to the
> > block layer" in bdev_open() we are still leaking the handle in case of
> > error. This is however temporary (until the end of the series when we get
> > rid of handles altogether) so whatever.
> 
> Can you double-check what's in vfs.super right now? I thought I fixed
> this up. I'll check too!

Well, you've fixed the "double allocation" issue but there's still a
problem that you do:

int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
	      const struct blk_holder_ops *hops, struct file *bdev_file)
{
...
	handle = kmalloc(sizeof(struct bdev_handle), GFP_KERNEL);
	if (!handle)
		return -ENOMEM;
 	if (holder) {
 		mode |= BLK_OPEN_EXCL;
 		ret = bd_prepare_to_claim(bdev, holder, hops);
 		if (ret)
			return ret;
 	} else {
...


So in case bd_prepare_to_claim() fails we forget to free the allocated
handle.

								Honza
Christian Brauner Feb. 6, 2024, 4:10 p.m. UTC | #8
> > Can you double-check what's in vfs.super right now? I thought I fixed
> > this up. I'll check too!
> 
> Well, you've fixed the "double allocation" issue but there's still a
> problem that you do:
> 
> int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
> 	      const struct blk_holder_ops *hops, struct file *bdev_file)
> {
> ...
> 	handle = kmalloc(sizeof(struct bdev_handle), GFP_KERNEL);
> 	if (!handle)
> 		return -ENOMEM;
>  	if (holder) {
>  		mode |= BLK_OPEN_EXCL;
>  		ret = bd_prepare_to_claim(bdev, holder, hops);
>  		if (ret)
> 			return ret;
>  	} else {
> ...
> 
> 
> So in case bd_prepare_to_claim() fails we forget to free the allocated
> handle.

Grumble grumble grumble, thank you! Fixing.
Matthew Wilcox March 21, 2024, 10:17 p.m. UTC | #9
On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> This opens block devices as files. Instead of introducing a separate
> indirection into bdev_open_by_*() vis struct bdev_handle we can just
> make bdev_file_open_by_*() return a struct file. Opening and closing a
> block device from setup_bdev_super() and in all other places just
> becomes equivalent to opening and closing a file.
> 
> This has held up in xfstests and in blktests so far and it seems stable
> and clean. The equivalence of opening and closing block devices to
> regular files is a win in and of itself imho. Added to that is the
> ability to do away with struct bdev_handle completely and make various
> low-level helpers private to the block layer.

It fails to hold up in xfstests for me.

git bisect leads to:

commit 321de651fa565dcf76c017b257bdf15ec7fff45d
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jan 23 14:26:48 2024 +0100

    block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access

QA output created by 015
mkfs failed
(see /ktest-out/xfstests/generic/015.full for details)
umount: /dev/vdc: not mounted.

** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 **
** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 **
mkfs.xfs: cannot open /dev/vdc: Device or resource busy
mkfs failed

About half the xfstests fail this way (722 of 1387 tests)
Kent Overstreet March 22, 2024, 3:38 a.m. UTC | #10
On Thu, Mar 21, 2024 at 10:17:55PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote:
> > This opens block devices as files. Instead of introducing a separate
> > indirection into bdev_open_by_*() vis struct bdev_handle we can just
> > make bdev_file_open_by_*() return a struct file. Opening and closing a
> > block device from setup_bdev_super() and in all other places just
> > becomes equivalent to opening and closing a file.
> > 
> > This has held up in xfstests and in blktests so far and it seems stable
> > and clean. The equivalence of opening and closing block devices to
> > regular files is a win in and of itself imho. Added to that is the
> > ability to do away with struct bdev_handle completely and make various
> > low-level helpers private to the block layer.
> 
> It fails to hold up in xfstests for me.
> 
> git bisect leads to:
> 
> commit 321de651fa565dcf76c017b257bdf15ec7fff45d
> Author: Christian Brauner <brauner@kernel.org>
> Date:   Tue Jan 23 14:26:48 2024 +0100
> 
>     block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
> 
> QA output created by 015
> mkfs failed
> (see /ktest-out/xfstests/generic/015.full for details)
> umount: /dev/vdc: not mounted.
> 
> ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 **
> ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 **
> mkfs.xfs: cannot open /dev/vdc: Device or resource busy
> mkfs failed
> 
> About half the xfstests fail this way (722 of 1387 tests)

Christain, let's chat about testing at LSF - I was looking at this too
because we thought it was a ktest update that broke at first, but if we
can get you using the automated test infrastructure I built this could
get caught before hitting -next
Christian Brauner March 22, 2024, 12:31 p.m. UTC | #11
> ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 **
> ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 **
> mkfs.xfs: cannot open /dev/vdc: Device or resource busy
> mkfs failed
> 
> About half the xfstests fail this way (722 of 1387 tests)

Thanks for the report. Can you please show me the kernel config and the
xfstests config that was used for this?
Matthew Wilcox March 22, 2024, 12:40 p.m. UTC | #12
On Fri, Mar 22, 2024 at 01:31:23PM +0100, Christian Brauner wrote:
> > ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 **
> > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 **
> > mkfs.xfs: cannot open /dev/vdc: Device or resource busy
> > mkfs failed
> > 
> > About half the xfstests fail this way (722 of 1387 tests)
> 
> Thanks for the report. Can you please show me the kernel config and the
> xfstests config that was used for this?

Kernel config attached.

I'll have to defer to Kent on the xfstests config that's used.  It might
be this:

    cat << EOF > /ktest/tests/xfstests/local.config
TEST_DEV=${ktest_scratch_dev[0]}
TEST_DIR=$TEST_DIR
SCRATCH_DEV=${ktest_scratch_dev[1]}
SCRATCH_MNT=/mnt/scratch
LOGWRITES_DEV=${ktest_scratch_dev[2]}
RESULT_BASE=/ktest-out/xfstests
LOGGER_PROG=true
EOF


Also, while generic/015 is the first to fail, you can't just run
generic/015.  You can't even just run 012, 013, 014, 015.  I haven't
bisected to exactly how many predecessor tests are necessary to get
a failure.
Christian Brauner March 22, 2024, 1:53 p.m. UTC | #13
On Fri, Mar 22, 2024 at 12:40:03PM +0000, Matthew Wilcox wrote:
> On Fri, Mar 22, 2024 at 01:31:23PM +0100, Christian Brauner wrote:
> > > ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 **
> > > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 **
> > > mkfs.xfs: cannot open /dev/vdc: Device or resource busy
> > > mkfs failed
> > > 
> > > About half the xfstests fail this way (722 of 1387 tests)
> > 
> > Thanks for the report. Can you please show me the kernel config and the
> > xfstests config that was used for this?
> 
> Kernel config attached.
> 
> I'll have to defer to Kent on the xfstests config that's used.  It might
> be this:
> 
>     cat << EOF > /ktest/tests/xfstests/local.config
> TEST_DEV=${ktest_scratch_dev[0]}
> TEST_DIR=$TEST_DIR
> SCRATCH_DEV=${ktest_scratch_dev[1]}
> SCRATCH_MNT=/mnt/scratch
> LOGWRITES_DEV=${ktest_scratch_dev[2]}
> RESULT_BASE=/ktest-out/xfstests
> LOGGER_PROG=true
> EOF
> 
> 
> Also, while generic/015 is the first to fail, you can't just run
> generic/015.  You can't even just run 012, 013, 014, 015.  I haven't
> bisected to exactly how many predecessor tests are necessary to get
> a failure.

Thanks for the info. So it's as I suspected. The config that was used
has
# CONFIG_BLK_DEV_WRITE_MOUNTED is not set
That means it isn't possible to write to mounted block devices. The
default is CONFIG_BLK_DEV_WRITE_MOUNTED=y.

I go through all block-based filesystems with xfstests for such changes
wit various config options. My test matrix hasn't been updated to
specifically unset CONFIG_BLK_DEV_WRITE_MOUNTED which is why this
escaped. I'll send a fix shortly.
Christian Brauner March 22, 2024, 1:56 p.m. UTC | #14
> Christain, let's chat about testing at LSF - I was looking at this too
> because we thought it was a ktest update that broke at first, but if we
> can get you using the automated test infrastructure I built this could
> get caught before hitting -next

I already do automated testing. This specific error depends on a new
config option CONFIG_BLK_DEV_WRITE_MOUNTED which wasn't reflected in my
test matrix. I've fixed that now. But I'm always happy to have the tree
integrated with other automated testing as well!