Message ID | 4ebef78f07ff1ea4d553c481ffa9e130d65db772.1719520771.git.daniel@makrotopia.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | block: preparations for NVMEM provider | expand |
On Thu, Jun 27, 2024 at 09:50:50PM +0100, Daniel Golle wrote: > Add notifier block to notify other subsystems about the addition or > removal of block devices. Notification for what? I really hate the concept of random modular code being able to hook into device discovery / partition scanning. And not actually having a user for it is a complete no-go.
On Thu, Jun 27, 2024 at 09:45:45PM -0700, Christoph Hellwig wrote: > On Thu, Jun 27, 2024 at 09:50:50PM +0100, Daniel Golle wrote: > > Add notifier block to notify other subsystems about the addition or > > removal of block devices. > > Notification for what? I really hate the concept of random modular > code being able to hook into device discovery / partition scanning. Adding a dedicated notification interface (instead of using block internals in the nvmem driver) has been requested in a previous review by Bart Van Assche: https://patchwork.kernel.org/comment/25771998/ Quote from that previous review comment: >>> Why to add this functionality to the block layer instead of somewhere >>> in the drivers/ directory? >> >> Simply because we need notifications about appearing and disappearing >> block devices, or a way to iterate over all block devices in a system. >> For both there isn't currently any other interface than using a >> class_interface for that, and that requires access to &block_class >> which is considered a block subsystem internal. > > That's an argument for adding an interface to the block layer that > implements this functionality but not for adding this code in the block > layer. --- So that's what I did consequently. Using the notification interface the NVMEM driver can live in drivers/nvmem/ and doesn't need to be using block internals. > And not actually having a user for it is a complete no-go. > The user will be the nvmem provider, you can see the code in earlier versions of the patchset where I had included it: https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/ Being another subsystem I thought it'd be better to deal with the block related things first, and once that has been sorted out I will move on to add the NVMEM driver and make the necessary changes for using it on eMMC. Thank you for taking a look at this. Let me know if anything else is not clear or needs to be changed. Cheers Daniel
On Fri, Jun 28, 2024 at 01:23:47PM +0100, Daniel Golle wrote: > So that's what I did consequently. Using the notification interface > the NVMEM driver can live in drivers/nvmem/ and doesn't need to be > using block internals. > > > And not actually having a user for it is a complete no-go. > > > > The user will be the nvmem provider, you can see the code in earlier > versions of the patchset where I had included it: > > https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/ > > Being another subsystem I thought it'd be better to deal with the > block related things first, and once that has been sorted out I will > move on to add the NVMEM driver and make the necessary changes for > using it on eMMC. It is rather hard to review an interface without the users. I still dislike the idea of notifications from bdev discovery / partition scanning into the users of them. We have one such users in the MD legacy autodetect code, but that has largely been considered at bad idea and distros tend to use mdadm based assembly from initramfs instead. Which IMHO feels like the right thing for nvmem as well, just have an nvmem provider that opens a file for a user provided path and use kernel_read on it. This can covered block devices, character devices and even regular files. It will require initramfs support, but that is pretty much used everywhere for storage discovery and setup anyway.
Hi Christoph, On Fri, Jun 28, 2024 at 05:50:20AM -0700, Christoph Hellwig wrote: > On Fri, Jun 28, 2024 at 01:23:47PM +0100, Daniel Golle wrote: > > So that's what I did consequently. Using the notification interface > > the NVMEM driver can live in drivers/nvmem/ and doesn't need to be > > using block internals. > > > > > And not actually having a user for it is a complete no-go. > > > > > > > The user will be the nvmem provider, you can see the code in earlier > > versions of the patchset where I had included it: > > > > https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/ > > > > Being another subsystem I thought it'd be better to deal with the > > block related things first, and once that has been sorted out I will > > move on to add the NVMEM driver and make the necessary changes for > > using it on eMMC. > > It is rather hard to review an interface without the users. I understand that, please take a look at previous iterations of this very series or use the link to the patch which I have sent before (see above). > > I still dislike the idea of notifications from bdev discovery / > partition scanning into the users of them. We have one such users > in the MD legacy autodetect code, but that has largely been considered > at bad idea and distros tend to use mdadm based assembly from initramfs > instead. Which IMHO feels like the right thing for nvmem as well, > just have an nvmem provider that opens a file for a user provided > path and use kernel_read on it. This can covered block devices, > character devices and even regular files. It will require initramfs > support, but that is pretty much used everywhere for storage discovery > and setup anyway. The problem there is that then we cannot use Device Tree to device the NVMEM layouts, and reference NVMEM bits to the dirvers which need them. Hence also the definition of the NVMEM layout would have to happen in userspace, inside an initramfs. I know that having an initramfs is common for classic desktop or server distributions, but the same is not true on more simple embedded devices such as router/firewall or WiFi access point appliances running OpenWrt. Carrying all that board-specific knowledge in the form of scripts identifying the board is not exactly nice, nor is creating an individual initramfs for each of the 1000+ boards supported by OpenWrt, for example. Extracting the layout information from /sys/firmware/devicetree in userspace just to then somehow use libblkid to identify the block device and throw that information back at the kernel which requested it e.g. using a firmware hotplug call is an option, but imho an unnecessary complication. And obviously it would still prevent things like nfsroot (which requires the MAC address and potentially a WiFi calibration data to be setup) from working out of the box. Doing the detour via userspace only for devices with an eMMC would be different from how it is done for any other type of backing storage such as simple I2C EEPROMs, (SPI-)flashes or UBI volumes. Having a unified approach for all of them would make things much more convenient, as typically the actual layouts are even the same and can be reused accross devices of the same vendor. GL-iNet or ASUS router devices are a good example for that: The more high-end ones come with an eMMC and use the same NVMEM layout inside a GPT partition than mid-field devices on SPI-NAND or UBI, and older and/or lower-end devices on an SPI-NOR flash MTD partition. To better understand the situation, maybe look at a few examples for which we are currently already using this patchset downstream at OpenWrt: Inside a GPT partition on an eMMC: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-unielec-u7981-01-emmc.dts;h=abd4d4e59d74cc0d24e8c9e10e16c0859844e003;hb=HEAD#l39 On raw SPI-NAND (already possible with vanilla Linux): https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-unielec-u7981-01-nand.dts;h=230a612a34f7d370eb09e92284950d9949bf10cd;hb=HEAD#l45 Inside a UBI volume (also already possible with vanilla Linux): https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7986a-asus-tuf-ax4200.dts;h=e40602fa215e1a677b534c85b767e824af041518;hb=HEAD#l255 Inside the boot hwpartition of the eMMC: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-glinet-gl-mt2500.dts;h=15818a90fca02955879d1bcca55ba2153e390621;hb=HEAD#l156 Inside a GPT partition on an eMMC: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7986a-glinet-gl-mt6000.dts;h=fd0e1a69157ed0f27c32376aabab0fcc85ce23b9;hb=HEAD#l317 https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7988a-smartrg-mt-stuart.dtsi;h=2b468f9bb311141d083e51ca5edaa7dce253c91c;hb=HEAD#l504 Those are just the examples I have been working on myself, so it was easy to come up with them. There are also ASUS devices with Qualcomm SoC using the same layout as the MediaTek-based devices inside a UBI volume. Once a unified way to describe the loaction of the NVMEM bits is also present in upstream Linux, all those downstream device trees could be submitted for inclusion in Linux, and one could install a general-purpose OS like Debian **which wouldn't need to know anything about the details of where to read MAC addresses or calibration data from**, all hardware-specific knowledge would reside in DT. The failure of defining this in a nice way results in very ugly situations such as distributions carrying the (board-specific!) calibration data for very few but very common single-board computers like the RaspberryPi in their rootfs or even in initramfs. Each distribution with a different level of hacks to give them individual MAC addresses and to load the correct file... And while this doesn't look so bad for systems which anyway come only with removable microSD storage, it **does* get ugly when it comes to systems which do store that information on their eMMC (typically "appliances" rather than SBCs meant for tinkerers). Please take all that into consiration and also note that obviously, on systems not making use of any of that, you may simple not enable CONFIG_BLOCK_NOTIFIERS -- other than in previous iterations of the patchset this is all completely optional now. Cheers Daniel
On Fri, Jun 28, 2024 at 03:00:05PM +0100, Daniel Golle wrote: > The problem there is that then we cannot use Device Tree to device the > NVMEM layouts, and reference NVMEM bits to the dirvers which need them. > Hence also the definition of the NVMEM layout would have to happen in > userspace, inside an initramfs. I know that having an initramfs is > common for classic desktop or server distributions, but the same is not > true on more simple embedded devices such as router/firewall or WiFi > access point appliances running OpenWrt. Maybe it needs to become more common so that we don't need crazy kernel workarounds for something that can be trivially done with a few lines of userspace code?
diff --git a/block/Kconfig b/block/Kconfig index 5b623b876d3b..67cd4f92378a 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -209,6 +209,12 @@ config BLK_INLINE_ENCRYPTION_FALLBACK by falling back to the kernel crypto API when inline encryption hardware is not present. +config BLOCK_NOTIFIERS + bool "Enable support for notifications in block layer" + help + Enable this option to provide notifiers for other subsystems + upon addition or removal of block devices. + source "block/partitions/Kconfig" config BLK_MQ_PCI diff --git a/block/Makefile b/block/Makefile index ddfd21c1a9ff..a131fa7d6b26 100644 --- a/block/Makefile +++ b/block/Makefile @@ -38,3 +38,4 @@ obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o \ blk-crypto-sysfs.o obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o +obj-$(CONFIG_BLOCK_NOTIFIERS) += blk-notify.o diff --git a/block/blk-notify.c b/block/blk-notify.c new file mode 100644 index 000000000000..fd727288ea19 --- /dev/null +++ b/block/blk-notify.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Notifiers for addition and removal of block devices + * + * Copyright (c) 2024 Daniel Golle <daniel@makrotopia.org> + */ + +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/notifier.h> + +#include "blk.h" + +struct blk_device_list { + struct device *dev; + struct list_head list; +}; + +static RAW_NOTIFIER_HEAD(blk_notifier_list); +static DEFINE_MUTEX(blk_notifier_lock); +static LIST_HEAD(blk_devices); + +void blk_register_notify(struct notifier_block *nb) +{ + struct blk_device_list *existing_blkdev; + + mutex_lock(&blk_notifier_lock); + raw_notifier_chain_register(&blk_notifier_list, nb); + + list_for_each_entry(existing_blkdev, &blk_devices, list) + nb->notifier_call(nb, BLK_DEVICE_ADD, existing_blkdev->dev); + + mutex_unlock(&blk_notifier_lock); +} +EXPORT_SYMBOL_GPL(blk_register_notify); + +void blk_unregister_notify(struct notifier_block *nb) +{ + mutex_lock(&blk_notifier_lock); + raw_notifier_chain_unregister(&blk_notifier_list, nb); + mutex_unlock(&blk_notifier_lock); +} +EXPORT_SYMBOL_GPL(blk_unregister_notify); + +static int blk_call_notifier_add(struct device *dev) +{ + struct blk_device_list *new_blkdev; + + new_blkdev = kmalloc(sizeof(*new_blkdev), GFP_KERNEL); + if (!new_blkdev) + return -ENOMEM; + + new_blkdev->dev = dev; + mutex_lock(&blk_notifier_lock); + list_add_tail(&new_blkdev->list, &blk_devices); + raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_ADD, dev); + mutex_unlock(&blk_notifier_lock); + return 0; +} + +static void blk_call_notifier_remove(struct device *dev) +{ + struct blk_device_list *old_blkdev, *tmp; + + mutex_lock(&blk_notifier_lock); + list_for_each_entry_safe(old_blkdev, tmp, &blk_devices, list) { + if (old_blkdev->dev != dev) + continue; + + list_del(&old_blkdev->list); + kfree(old_blkdev); + } + raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_REMOVE, dev); + mutex_unlock(&blk_notifier_lock); +} + +static struct class_interface blk_notifications_bus_interface __refdata = { + .class = &block_class, + .add_dev = &blk_call_notifier_add, + .remove_dev = &blk_call_notifier_remove, +}; + +static int __init blk_notifications_init(void) +{ + return class_interface_register(&blk_notifications_bus_interface); +} +device_initcall(blk_notifications_init); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b2f1362c4681..c6e9fb16f76e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1687,4 +1687,15 @@ static inline bool bdev_can_atomic_write(struct block_device *bdev) #define DEFINE_IO_COMP_BATCH(name) struct io_comp_batch name = { } + +#define BLK_DEVICE_ADD 1 +#define BLK_DEVICE_REMOVE 2 +#if defined(CONFIG_BLOCK_NOTIFIERS) +void blk_register_notify(struct notifier_block *nb); +void blk_unregister_notify(struct notifier_block *nb); +#else +static inline void blk_register_notify(struct notifier_block *nb) { }; +static inline void blk_unregister_notify(struct notifier_block *nb) { }; +#endif + #endif /* _LINUX_BLKDEV_H */
Add notifier block to notify other subsystems about the addition or removal of block devices. Signed-off-by: Daniel Golle <daniel@makrotopia.org> --- block/Kconfig | 6 +++ block/Makefile | 1 + block/blk-notify.c | 87 ++++++++++++++++++++++++++++++++++++++++++ include/linux/blkdev.h | 11 ++++++ 4 files changed, 105 insertions(+) create mode 100644 block/blk-notify.c