Message ID | 20230704125702.23180-1-jack@suse.cz (mailing list archive) |
---|---|
State | Superseded, archived |
Series | block: Add config option to not allow writing to mounted devices |
On Tue, Jul 4, 2023, at 8:56 AM, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link:
> https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
>
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely

s/very/is very/

> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be

s/priviledged/privileged/

> +	overridden with bdev_allow_write_mounted boot option.

s/with/with the/

> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))

Bikeshed but: I think BLK and BLOCK "stutter" here. The doc comment
already uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?
On Tue, Jul 04, 2023 at 11:56:44AM -0400, Colin Walters wrote:
> > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
>
> Bikeshed but: I think BLK and BLOCK "stutter" here. The doc comment already
> uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?

Yeah, using "block" in two different ways at the same time is confusing.
BLK_OPEN_EXCLUSIVE would probably be good, as would something like
BLK_OPEN_RESTRICT_WRITES.

I can't figure out how to apply this patch series, so I can't really see it in
context though.

- Eric
On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
>
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely
> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be
> +	overridden with bdev_allow_write_mounted boot option.

Does this prevent the underlying storage from being written to? Say if the
mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
region that contains sda1.

I *think* the answer is no, writes to /dev/sda are still allowed since the goal
is just to prevent writes to the buffer cache of mounted block devices, not
writes to the underlying storage. That is really something that should be
stated explicitly, though.

- Eric
On Tue, Jul 04, 2023 at 11:44:16AM -0700, Eric Biggers wrote:
> Does this prevent the underlying storage from being written to? Say if the
> mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
> region that contains sda1.
>
> I *think* the answer is no, writes to /dev/sda are still allowed since the goal
> is just to prevent writes to the buffer cache of mounted block devices, not
> writes to the underlying storage. That is really something that should be
> stated explicitly, though.

Well, at the risk of giving the Syzbot developers any ideas, we also
aren't preventing someone from opening the SCSI generic device and
manually sending raw SCSI commands to modify a mounted block device,
and then no doubt they would claim that the kernel config
CONFIG_CHR_DEV_SG is "insecure", and so therefore any kernel that
could support writing CD or DVD's is by definition "insecure" by their
lights...

Which is why talking about security models without having an agreed
upon threat model is really a waste of time...

- Ted
On Tue 04-07-23 11:44:16, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> >
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> >
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/Kconfig             | 16 ++++++++++
> >  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
> >  include/linux/blk_types.h |  1 +
> >  include/linux/blkdev.h    |  3 ++
> >  4 files changed, 82 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/Kconfig b/block/Kconfig
> > index 86122e459fe0..8b4fa105b854 100644
> > --- a/block/Kconfig
> > +++ b/block/Kconfig
> > @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
> >  	select CRC_T10DIF
> >  	select CRC64_ROCKSOFT
> >
> > +config BLK_DEV_WRITE_MOUNTED
> > +	bool "Allow writing to mounted block devices"
> > +	default y
> > +	help
> > +	When a block device is mounted, writing to its buffer cache very likely
> > +	going to cause filesystem corruption. It is also rather easy to crash
> > +	the kernel in this way since the filesystem has no practical way of
> > +	detecting these writes to buffer cache and verifying its metadata
> > +	integrity. However there are some setups that need this capability
> > +	like running fsck on read-only mounted root device, modifying some
> > +	features on mounted ext4 filesystem, and similar. If you say N, the
> > +	kernel will prevent processes from writing to block devices that are
> > +	mounted by filesystems which provides some more protection from runaway
> > +	priviledged processes. If in doubt, say Y. The configuration can be
> > +	overridden with bdev_allow_write_mounted boot option.
>
> Does this prevent the underlying storage from being written to? Say if the
> mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
> region that contains sda1.
>
> I *think* the answer is no, writes to /dev/sda are still allowed since the goal
> is just to prevent writes to the buffer cache of mounted block devices, not
> writes to the underlying storage. That is really something that should be
> stated explicitly, though.

You are correct. The answer is "no" because as Ted says, there are many
ways to do that anyway and for a filesystem it is generally not much
different from just corrupted fs image. I'll explicitly mention it in the
config text, that's a good idea.

Honza
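The distinction confirmed above can be observed from userspace with a small probe. The following is a minimal sketch, not part of the patch: `probe_write_open` is a hypothetical helper name, and the expected results are assumptions drawn from this thread. Based on the `ret = -EBUSY` path in the posted diff, a write open of a partition held by a filesystem that uses the new flag would presumably fail with `EBUSY` when `bdev_allow_write_mounted` is off, while a write open of the containing whole disk (for example /dev/sda) would not be refused by this mechanism.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` for writing.
 * Returns 0 if the open succeeded, otherwise the errno reported by open(2).
 * Nothing is ever written; the descriptor is closed immediately.
 */
static int probe_write_open(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}
```

Calling `probe_write_open()` on a mounted partition and then on its parent disk, with the option disabled, would be one way to check whether the behaviour matches the description in the thread. Since the probe never writes, it should be harmless on a live system, though it does need permission to open the device node.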
On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
>
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely
> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be
> +	overridden with bdev_allow_write_mounted boot option.
> +
>  config BLK_DEV_ZONED
>  	bool "Zoned block device support"
>  	select MQ_IOSCHED_DEADLINE
> diff --git a/block/bdev.c b/block/bdev.c
> index 523ea7289834..346e68dbf0bf 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -30,6 +30,9 @@
>  #include "../fs/internal.h"
>  #include "blk.h"
>
> +/* Should we allow writing to mounted block devices? */
> +static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);

This might be premature at this point, but I wonder if you've given any
consideration to adding a lockdown prohibition as well? e.g.

static inline bool bdev_allow_write_mounted(void)
{
	if (security_locked_down(LOCKDOWN_MOUNTED_BDEV) != 0)
		return false;

	return __bdev_allow_write_mounted;
}

--D

>  struct bdev_inode {
>  	struct block_device bdev;
>  	struct inode vfs_inode;
> @@ -744,7 +747,34 @@ void blkdev_put_no_open(struct block_device *bdev)
>  {
>  	put_device(&bdev->bd_device);
>  }
> -
> +
> +static bool bdev_writes_blocked(struct block_device *bdev)
> +{
> +	return bdev->bd_writers == -1;
> +}
> +
> +static void bdev_block_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = -1;
> +}
> +
> +static void bdev_unblock_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = 0;
> +}
> +
> +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
> +{
> +	if (!bdev_allow_write_mounted) {
> +		/* Writes blocked? */
> +		if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
> +			return false;
> +		if (mode & BLK_OPEN_BLOCK_WRITES && bdev->bd_writers > 0)
> +			return false;
> +	}
> +	return true;
> +}
> +
>  /**
>   * blkdev_get_by_dev - open a block device by device number
>   * @dev: device number of block device to open
> @@ -787,6 +817,10 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  	if (ret)
>  		goto free_handle;
>
> +	/* Blocking writes requires exclusive opener */
> +	if (mode & BLK_OPEN_BLOCK_WRITES && !holder)
> +		return ERR_PTR(-EINVAL);
> +
>  	bdev = blkdev_get_no_open(dev);
>  	if (!bdev) {
>  		ret = -ENXIO;
> @@ -814,12 +848,21 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		goto abort_claiming;
>  	if (!try_module_get(disk->fops->owner))
>  		goto abort_claiming;
> +	ret = -EBUSY;
> +	if (!blkdev_open_compatible(bdev, mode))
> +		goto abort_claiming;
>  	if (bdev_is_partition(bdev))
>  		ret = blkdev_get_part(bdev, mode);
>  	else
>  		ret = blkdev_get_whole(bdev, mode);
>  	if (ret)
>  		goto put_module;
> +	if (!bdev_allow_write_mounted) {
> +		if (mode & BLK_OPEN_BLOCK_WRITES)
> +			bdev_block_writes(bdev);
> +		else if (mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers++;
> +	}
>  	if (holder) {
>  		bd_finish_claiming(bdev, holder, hops);
>
> @@ -842,6 +885,7 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		disk_unblock_events(disk);
>  	handle->bdev = bdev;
>  	handle->holder = holder;
> +	handle->mode = mode;
>  	return handle;
>  put_module:
>  	module_put(disk->fops->owner);
> @@ -914,6 +958,14 @@ void blkdev_put(struct bdev_handle *handle)
>  		sync_blockdev(bdev);
>
>  	mutex_lock(&disk->open_mutex);
> +	if (!bdev_allow_write_mounted) {
> +		/* The exclusive opener was blocking writes? Unblock them. */
> +		if (handle->mode & BLK_OPEN_BLOCK_WRITES)
> +			bdev_unblock_writes(bdev);
> +		else if (handle->mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers--;
> +	}
> +
>  	if (handle->holder)
>  		bd_end_claim(bdev, handle->holder);
>
> @@ -1070,3 +1122,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>
>  	blkdev_put_no_open(bdev);
>  }
> +
> +static int __init setup_bdev_allow_write_mounted(char *str)
> +{
> +	if (kstrtobool(str, &bdev_allow_write_mounted))
> +		pr_warn("Invalid option string for bdev_allow_write_mounted:"
> +			" '%s'\n", str);
> +	return 1;
> +}
> +__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0bad62cca3d0..5bf0d2d458fd 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -70,6 +70,7 @@ struct block_device {
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
>  	bool			bd_make_it_fail;
>  #endif
> +	int			bd_writers;
>  	/*
>  	 * keep this out-of-line as it's both big and not needed in the fast
>  	 * path
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4ae3647a0322..ca467525e6e4 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
>  #define BLK_OPEN_NDELAY		((__force blk_mode_t)(1 << 3))
>  /* open for "writes" only for ioctls (specialy hack for floppy.c) */
>  #define BLK_OPEN_WRITE_IOCTL	((__force blk_mode_t)(1 << 4))
> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
>
>  struct gendisk {
>  	/*
> @@ -1474,6 +1476,7 @@ struct blk_holder_ops {
>  struct bdev_handle {
>  	struct block_device *bdev;
>  	void *holder;
> +	blk_mode_t mode;
>  };
>
>  struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> --
> 2.35.3
>
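The open/close accounting in the quoted patch is easier to follow as a small state machine: `bd_writers` is -1 while a restricting opener holds the device, and otherwise counts plain writers. Below is a userspace sketch of that logic under stated assumptions. It is an illustration, not kernel code: every name prefixed `model_` or `MODEL_` is hypothetical, the flag values merely stand in for the kernel's, and the locking that the real code gets from `disk->open_mutex` is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BLK_OPEN_WRITE and BLK_OPEN_BLOCK_WRITES. */
#define MODEL_OPEN_WRITE        (1u << 1)
#define MODEL_OPEN_BLOCK_WRITES (1u << 5)

/* Models only the bd_writers field that the patch adds. */
struct model_bdev {
	int writers;	/* -1: writes blocked; >= 0: number of open writers */
};

/* Models bdev_allow_write_mounted; false means the restriction is enforced. */
static bool model_allow_write_mounted = false;

/* Mirrors blkdev_open_compatible() plus the bookkeeping done on open. */
static int model_open(struct model_bdev *bdev, unsigned int mode)
{
	if (model_allow_write_mounted)
		return 0;
	/* A writer may not open while a restricting opener holds the device. */
	if ((mode & MODEL_OPEN_WRITE) && bdev->writers == -1)
		return -EBUSY;
	/* A restricting opener may not open while writers exist. */
	if ((mode & MODEL_OPEN_BLOCK_WRITES) && bdev->writers > 0)
		return -EBUSY;
	if (mode & MODEL_OPEN_BLOCK_WRITES)
		bdev->writers = -1;
	else if (mode & MODEL_OPEN_WRITE)
		bdev->writers++;
	return 0;
}

/* Mirrors the bookkeeping the patch adds to blkdev_put(). */
static void model_close(struct model_bdev *bdev, unsigned int mode)
{
	if (model_allow_write_mounted)
		return;
	if (mode & MODEL_OPEN_BLOCK_WRITES)
		bdev->writers = 0;
	else if (mode & MODEL_OPEN_WRITE)
		bdev->writers--;
}
```

In this model, a plain writer followed by a restricting open yields `-EBUSY`, and once the writer closes, the restricting open succeeds and later write opens are refused until it closes again. A lockdown check such as the one suggested above would presumably slot in where `model_allow_write_mounted` is consulted.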
On Tue 04-07-23 09:52:40, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 11:56:44AM -0400, Colin Walters wrote:
> > > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
> >
> > Bikeshed but: I think BLK and BLOCK "stutter" here. The doc comment already
> > uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?
>
> Yeah, using "block" in two different ways at the same time is confusing.
> BLK_OPEN_EXCLUSIVE would probably be good, as would something like
> BLK_OPEN_RESTRICT_WRITES.

BLK_OPEN_RESTRICT_WRITES sounds good to me. I'll rename the flag.

Honza
On Tue 04-07-23 11:56:44, Colin Walters wrote:
> On Tue, Jul 4, 2023, at 8:56 AM, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> >
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> >
> > Link:
> > https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/Kconfig             | 16 ++++++++++
> >  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
> >  include/linux/blk_types.h |  1 +
> >  include/linux/blkdev.h    |  3 ++
> >  4 files changed, 82 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/Kconfig b/block/Kconfig
> > index 86122e459fe0..8b4fa105b854 100644
> > --- a/block/Kconfig
> > +++ b/block/Kconfig
> > @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
> >  	select CRC_T10DIF
> >  	select CRC64_ROCKSOFT
> >
> > +config BLK_DEV_WRITE_MOUNTED
> > +	bool "Allow writing to mounted block devices"
> > +	default y
> > +	help
> > +	When a block device is mounted, writing to its buffer cache very likely
>
> s/very/is very/
>
> > +	going to cause filesystem corruption. It is also rather easy to crash
> > +	the kernel in this way since the filesystem has no practical way of
> > +	detecting these writes to buffer cache and verifying its metadata
> > +	integrity. However there are some setups that need this capability
> > +	like running fsck on read-only mounted root device, modifying some
> > +	features on mounted ext4 filesystem, and similar. If you say N, the
> > +	kernel will prevent processes from writing to block devices that are
> > +	mounted by filesystems which provides some more protection from runaway
> > +	priviledged processes. If in doubt, say Y. The configuration can be
>
> s/priviledged/privileged/
>
> > +	overridden with bdev_allow_write_mounted boot option.
>
> s/with/with the/

Thanks for the language fixes!

> > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
>
> Bikeshed but: I think BLK and BLOCK "stutter" here. The doc comment
> already uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?

Well, we already have exclusive opens of block devices which are different
(they are exclusive only wrt other exclusive opens) so BLK_OPEN_EXCLUSIVE
will be really confusing. But BLK_OPEN_RESTRICT_WRITES sounds good to me.

Honza
Hi Jan,

On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Can you make it clear that the important thing this patch prevents is
writes to the block device's buffer cache, not writes to the underlying
storage? It's super important not to confuse the two cases.

Related to this topic, I wonder if there is any value in providing an option
that would allow O_DIRECT writes but forbid buffered writes? Would that be
useful for any of the known use cases for writing to mounted block devices?

- Eric
Hi Eric!

On Mon 21-08-23 22:35:23, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> >
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> >
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
>
> Can you make it clear that the important thing this patch prevents is
> writes to the block device's buffer cache, not writes to the underlying
> storage? It's super important not to confuse the two cases.

Right, I've already updated the description of the help text in the kconfig
to explicitly explain that this does not prevent underlying device content
from being modified, it just prevents writes to the block device itself. But
I guess I can also explain this (with a bit more technical details) in the
changelog. Good idea.

> Related to this topic, I wonder if there is any value in providing an option
> that would allow O_DIRECT writes but forbid buffered writes? Would that be
> useful for any of the known use cases for writing to mounted block devices?

I'm not sure how useful that would be but it would be certainly rather
difficult to implement. The problem is we can currently fall back from direct
to buffered IO as we see fit, also we need to invalidate page cache while
doing direct IO which can fail etc. So it will be a rather nasty can of
worms to open...

Honza
Hi Jan,

Thank you for the series!

Have you already had a chance to push an updated version of it?
I tried to search LKML, but didn't find anything.

Or did you decide to put it off until later?
Hi!

On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> Thank you for the series!
>
> Have you already had a chance to push an updated version of it?
> I tried to search LKML, but didn't find anything.
>
> Or did you decide to put it off until later?

So there is preliminary series sitting in VFS tree that changes how block
devices are open. There are some conflicts with btrfs tree and bcachefs
merge that complicate all this (plus there was quite some churn in VFS
itself due to changing rules how block devices are open) so I didn't push
out the series that actually forbids opening of mounted block devices
because that would cause a "merge from hell" issues. I plan to push out the
remaining patches once the merge window closes and all the dependencies are
hopefully in a stable state. Maybe I can push out the series earlier based
on linux-next so that people can have a look at the current state.

Honza
I see, thanks for sharing the details!

We'll set CONFIG_BLK_DEV_WRITE_MOUNTED=n on syzbot once the series is
in linux-next.

On Tue, Oct 24, 2023 at 1:10 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi!
>
> On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> > Thank you for the series!
> >
> > Have you already had a chance to push an updated version of it?
> > I tried to search LKML, but didn't find anything.
> >
> > Or did you decide to put it off until later?
>
> So there is preliminary series sitting in VFS tree that changes how block
> devices are open. There are some conflicts with btrfs tree and bcachefs
> merge that complicate all this (plus there was quite some churn in VFS
> itself due to changing rules how block devices are open) so I didn't push
> out the series that actually forbids opening of mounted block devices
> because that would cause a "merge from hell" issues. I plan to push out the
> remaining patches once the merge window closes and all the dependencies are
> hopefully in a stable state. Maybe I can push out the series earlier based
> on linux-next so that people can have a look at the current state.
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Hi!

On Tue 24-10-23 13:10:15, Jan Kara wrote:
> On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> > Thank you for the series!
> >
> > Have you already had a chance to push an updated version of it?
> > I tried to search LKML, but didn't find anything.
> >
> > Or did you decide to put it off until later?
>
> So there is preliminary series sitting in VFS tree that changes how block
> devices are open. There are some conflicts with btrfs tree and bcachefs
> merge that complicate all this (plus there was quite some churn in VFS
> itself due to changing rules how block devices are open) so I didn't push
> out the series that actually forbids opening of mounted block devices
> because that would cause a "merge from hell" issues. I plan to push out the
> remaining patches once the merge window closes and all the dependencies are
> hopefully in a stable state. Maybe I can push out the series earlier based
> on linux-next so that people can have a look at the current state.

So patches are now in VFS tree [1] so they should be in linux-next as well.
You should be able to start using the config option for syzbot runs :)

Honza

[1] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.super
Hi!

Thanks for letting me know!

I've sent a PR with new syzbot configs:
https://github.com/google/syzkaller/pull/4324
diff --git a/block/Kconfig b/block/Kconfig
index 86122e459fe0..8b4fa105b854 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
 	select CRC_T10DIF
 	select CRC64_ROCKSOFT
 
+config BLK_DEV_WRITE_MOUNTED
+	bool "Allow writing to mounted block devices"
+	default y
+	help
+	When a block device is mounted, writing to its buffer cache very likely
+	going to cause filesystem corruption. It is also rather easy to crash
+	the kernel in this way since the filesystem has no practical way of
+	detecting these writes to buffer cache and verifying its metadata
+	integrity. However there are some setups that need this capability
+	like running fsck on read-only mounted root device, modifying some
+	features on mounted ext4 filesystem, and similar. If you say N, the
+	kernel will prevent processes from writing to block devices that are
+	mounted by filesystems which provides some more protection from runaway
+	priviledged processes. If in doubt, say Y. The configuration can be
+	overridden with bdev_allow_write_mounted boot option.
+
 config BLK_DEV_ZONED
 	bool "Zoned block device support"
 	select MQ_IOSCHED_DEADLINE
diff --git a/block/bdev.c b/block/bdev.c
index 523ea7289834..346e68dbf0bf 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,6 +30,9 @@
 #include "../fs/internal.h"
 #include "blk.h"
 
+/* Should we allow writing to mounted block devices? */
+static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
+
 struct bdev_inode {
 	struct block_device bdev;
 	struct inode vfs_inode;
@@ -744,7 +747,34 @@ void blkdev_put_no_open(struct block_device *bdev)
 {
 	put_device(&bdev->bd_device);
 }
-
+
+static bool bdev_writes_blocked(struct block_device *bdev)
+{
+	return bdev->bd_writers == -1;
+}
+
+static void bdev_block_writes(struct block_device *bdev)
+{
+	bdev->bd_writers = -1;
+}
+
+static void bdev_unblock_writes(struct block_device *bdev)
+{
+	bdev->bd_writers = 0;
+}
+
+static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
+{
+	if (!bdev_allow_write_mounted) {
+		/* Writes blocked? */
+		if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
+			return false;
+		if (mode & BLK_OPEN_BLOCK_WRITES && bdev->bd_writers > 0)
+			return false;
+	}
+	return true;
+}
+
 /**
  * blkdev_get_by_dev - open a block device by device number
  * @dev: device number of block device to open
@@ -787,6 +817,10 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 	if (ret)
 		goto free_handle;
 
+	/* Blocking writes requires exclusive opener */
+	if (mode & BLK_OPEN_BLOCK_WRITES && !holder)
+		return ERR_PTR(-EINVAL);
+
 	bdev = blkdev_get_no_open(dev);
 	if (!bdev) {
 		ret = -ENXIO;
@@ -814,12 +848,21 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		goto abort_claiming;
 	if (!try_module_get(disk->fops->owner))
 		goto abort_claiming;
+	ret = -EBUSY;
+	if (!blkdev_open_compatible(bdev, mode))
+		goto abort_claiming;
 	if (bdev_is_partition(bdev))
 		ret = blkdev_get_part(bdev, mode);
 	else
 		ret = blkdev_get_whole(bdev, mode);
 	if (ret)
 		goto put_module;
+	if (!bdev_allow_write_mounted) {
+		if (mode & BLK_OPEN_BLOCK_WRITES)
+			bdev_block_writes(bdev);
+		else if (mode & BLK_OPEN_WRITE)
+			bdev->bd_writers++;
+	}
 	if (holder) {
 		bd_finish_claiming(bdev, holder, hops);
 
@@ -842,6 +885,7 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		disk_unblock_events(disk);
 	handle->bdev = bdev;
 	handle->holder = holder;
+	handle->mode = mode;
 	return handle;
 put_module:
 	module_put(disk->fops->owner);
@@ -914,6 +958,14 @@ void blkdev_put(struct bdev_handle *handle)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
+	if (!bdev_allow_write_mounted) {
+		/* The exclusive opener was blocking writes? Unblock them. */
+		if (handle->mode & BLK_OPEN_BLOCK_WRITES)
+			bdev_unblock_writes(bdev);
+		else if (handle->mode & BLK_OPEN_WRITE)
+			bdev->bd_writers--;
+	}
+
 	if (handle->holder)
 		bd_end_claim(bdev, handle->holder);
 
@@ -1070,3 +1122,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 	blkdev_put_no_open(bdev);
 }
+
+static int __init setup_bdev_allow_write_mounted(char *str)
+{
+	if (kstrtobool(str, &bdev_allow_write_mounted))
+		pr_warn("Invalid option string for bdev_allow_write_mounted:"
+			" '%s'\n", str);
+	return 1;
+}
+__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0bad62cca3d0..5bf0d2d458fd 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -70,6 +70,7 @@ struct block_device {
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	bool			bd_make_it_fail;
 #endif
+	int			bd_writers;
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4ae3647a0322..ca467525e6e4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
 #define BLK_OPEN_NDELAY		((__force blk_mode_t)(1 << 3))
 /* open for "writes" only for ioctls (specialy hack for floppy.c) */
 #define BLK_OPEN_WRITE_IOCTL	((__force blk_mode_t)(1 << 4))
+/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
+#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
 
 struct gendisk {
 	/*
@@ -1474,6 +1476,7 @@ struct blk_holder_ops {
 struct bdev_handle {
 	struct block_device *bdev;
 	void *holder;
+	blk_mode_t mode;
 };
 
 struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more and
more involved examples how to corrupt block device under a mounted
filesystem leading to kernel crashes and reports we can do nothing
about. Add tracking of writers to each block device and a kernel cmdline
argument which controls whether writes to block devices open with
BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
this flag for used devices.

Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening.

Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/Kconfig             | 16 ++++++++++
 block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
 include/linux/blk_types.h |  1 +
 include/linux/blkdev.h    |  3 ++
 4 files changed, 82 insertions(+), 1 deletion(-)
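The interaction between the Kconfig default and the `bdev_allow_write_mounted=` cmdline argument can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: `model_strtobool` is a hypothetical stand-in that accepts only a subset of the spellings the kernel's `kstrtobool()` understands (y/n, 1/0, on/off), and `model_setup` is a hypothetical name for the step that `setup_bdev_allow_write_mounted()` performs in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified, hypothetical stand-in for kstrtobool(): only the first one or
 * two characters are examined. On failure *res is left untouched.
 */
static int model_strtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

/*
 * Start from the compile-time default (the Kconfig choice) and let the boot
 * argument override it. An unparsable argument only produces a warning, so
 * the default survives, as in the patch.
 */
static bool model_setup(const char *arg, bool config_default)
{
	bool allow = config_default;

	if (model_strtobool(arg, &allow))
		fprintf(stderr,
			"Invalid option string for bdev_allow_write_mounted: '%s'\n",
			arg ? arg : "(null)");
	return allow;
}
```

Under this model, a kernel built with the option set to Y and booted with `bdev_allow_write_mounted=0` ends up enforcing the restriction, while a malformed value leaves the build-time choice in effect.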