Message ID | 20210928123730.393551-1-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] btrfs-progs: Ignore path device during device scan | expand |
On 28/09/2021 20:37, Nikolay Borisov wrote: > Currently btrfs-progs will happily enumerate any device which has a > btrfs filesystem on it. For the majority of use cases that's fine and > there haven't been any problems with that. > However, there was a recent > report Could you point to the report or if it is internal? Kernel message has the process of name for the device scan. We don't have to fix the btrfs-progs end if it is not doing it. > that in multipath scenario when running "btrfs fi show" after a > path flap It is better to use 'btrfs fi show -m' it provides kernel perspective. What do you mean by path flap here? Do you mean a device-path in a multi-path config disappeared forever or failed temporarily? > instead of the multipath device being show the path device is > shown. So a multipath filesystem might look like: > > Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba > Total devices 1 FS bytes used 192.00KiB > devid 1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763 > > /dev/mapper/xxx can actually be backed by an arbitrary number of path, It is not arbitrary it depends on the number of HBAs configured to the storage/LUN in a SAN. > which in turn are presented to the system as ordinary scsi devices i.e > /dev/sdd. Yeah, it is problematic. Other OSs got away with that approach by introducing a path consolidator between the HBA and the target driver. > If a path flaps and a user re-runs 'btrfs fi show' the output > would look like: > Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba > Total devices 1 FS bytes used 192.00KiB > devid 1 size 10.00GiB used 536.00MiB path /dev/sdd Kernel messages will know when we change the device path and who called the device scan. Could you pls put those messages here? Thanks, Anand > Turns out the output of this command is consumed by libraries and the > presence of a path device rather than the actual multipath one can cause > issues. > > Fix this by relying on the fact that path devices are tagged with the > DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts. > In > order to access it an optional dependency on libudev is added, if the > library can't be found then device enumeration will continue working > as it was before the commit. > Since libudev doesn't have static library > for now support for this behavior in case of static builds is going to > be disabled. Fallback code for static builds will come in a future > commit. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > --- > Makefile | 2 +- > Makefile.inc.in | 2 +- > common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > configure.ac | 9 +++++++++ > 4 files changed, 55 insertions(+), 2 deletions(-) > > diff --git a/Makefile b/Makefile > index 93fe4c2b3e08..e96f66a36b46 100644 > --- a/Makefile > +++ b/Makefile > @@ -129,7 +129,7 @@ LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) > LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) > > # Static compilation flags > -STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections > +STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD > STATIC_LDFLAGS = -static -Wl,--gc-sections > STATIC_LIBS = $(STATIC_LIBS_BASE) > > diff --git a/Makefile.inc.in b/Makefile.inc.in > index 9f49337147b8..c995aef97219 100644 > --- a/Makefile.inc.in > +++ b/Makefile.inc.in > @@ -27,7 +27,7 @@ CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@ > SUBST_CFLAGS = @CFLAGS@ > SUBST_LDFLAGS = @LDFLAGS@ > > -LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread > +LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread > LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@ > LIBS_PYTHON = @PYTHON_LIBS@ > LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@ > diff --git a/common/device-scan.c b/common/device-scan.c > index b5bfe844104b..2ed0e34d3664 100644 > --- a/common/device-scan.c > +++ b/common/device-scan.c > @@ -14,6 +14,10 @@ > * Boston, MA 021110-1307, USA. > */ > > +#ifdef STATIC_BUILD > +#undef HAVE_LIBUDEV > +#endif > + > #include "kerncompat.h" > #include <sys/ioctl.h> > #include <stdlib.h> > @@ -25,6 +29,10 @@ > #include <dirent.h> > #include <blkid/blkid.h> > #include <uuid/uuid.h> > +#ifdef HAVE_LIBUDEV > +#include <sys/stat.h> > +#include <libudev.h> > +#endif > #include "kernel-lib/overflow.h" > #include "common/path-utils.h" > #include "common/device-scan.h" > @@ -364,6 +372,37 @@ void free_seen_fsid(struct seen_fsid *seen_fsid_hash[]) > } > } > > +#ifdef HAVE_LIBUDEV > +static bool is_path_device(char *device_path) > +{ > + struct udev *udev = NULL; > + struct udev_device *dev = NULL; > + struct stat dev_stat; > + const char *val; > + bool ret = false; > + > + if (stat(device_path, &dev_stat) < 0) > + return false; > + > + udev = udev_new(); > + if (!udev) > + goto out; > + > + dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev); > + if (!dev) > + goto out; > + > + val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH"); > + if (val && atoi(val) > 0) > + ret = true; > +out: > + udev_device_unref(dev); > + udev_unref(udev); > + > + return ret; > +} > +#endif > + > int btrfs_scan_devices(int verbose) > { > int fd = -1; > @@ -394,6 +433,11 @@ int btrfs_scan_devices(int verbose) > /* if we are here its definitely a btrfs disk*/ > strncpy_null(path, blkid_dev_devname(dev)); > > +#ifdef HAVE_LIBUDEV > + if (is_path_device(path)) > + continue; > +#endif > + > fd = open(path, O_RDONLY); > if (fd < 0) { > error("cannot open %s: %m", path); > diff --git a/configure.ac b/configure.ac > index 038c2688421c..d0ceb0d70d16 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -304,6 +304,15 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid]) > PKG_CHECK_MODULES(ZLIB, [zlib]) > PKG_STATIC(ZLIB_LIBS_STATIC, [zlib]) > > +PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no]) > +if test "x$pkg_config_libudev" = xyes; then > + PKG_CHECK_MODULES([LIBUDEV], [libudev]) > + AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available]) > +else > + AC_MSG_CHECKING([for LIBUDEV]) > + AC_MSG_RESULT([no]) > +fi > + > AC_ARG_ENABLE([zstd], > AS_HELP_STRING([--disable-zstd], [build without zstd support]), > [], [enable_zstd=yes] >
On 29.09.21 г. 2:03, Anand Jain wrote: > On 28/09/2021 20:37, Nikolay Borisov wrote: >> Currently btrfs-progs will happily enumerate any device which has a >> btrfs filesystem on it. For the majority of use cases that's fine and >> there haven't been any problems with that. > >> However, there was a recent >> report > > Could you point to the report or if it is internal? Internal > > Kernel message has the process of name for the device scan. > We don't have to fix the btrfs-progs end if it is not doing it. > >> that in multipath scenario when running "btrfs fi show" after a >> path flap > > It is better to use 'btrfs fi show -m' it provides kernel perspective. [146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1 generation 8 scanned by systemd-udevd (6254) [146823.060984] BTRFS info (device dm-0): devid 1 device path /dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0 scanned by systemd-udevd (6266) [146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0 changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by systemd-udevd (6266) btrfs fi show -m is actually consistent with always showing the mapper device. > > What do you mean by path flap here? Do you mean a device-path in a > multi-path config disappeared forever or failed temporarily? flap means going up and down. The gist is that btrfs fi show would show the latest device being scanned, which in the case of multipath device could be any of the paths. > >> instead of the multipath device being show the path device is >> shown. So a multipath filesystem might look like: >> >> Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba >> Total devices 1 FS bytes used 192.00KiB >> devid 1 size 10.00GiB used 536.00MiB path >> /dev/mapper/3600140501cc1f49e5364f0093869c763 > >> >> /dev/mapper/xxx can actually be backed by an arbitrary number of path, > > It is not arbitrary it depends on the number of HBAs configured to the > storage/LUN in a SAN. And this precisely means that the number of paths is arbitrary, based on particular configuration. > >> which in turn are presented to the system as ordinary scsi devices i.e >> /dev/sdd. > <snip>
On 29/09/2021 14:59, Nikolay Borisov wrote: > > > On 29.09.21 г. 2:03, Anand Jain wrote: >> On 28/09/2021 20:37, Nikolay Borisov wrote: >>> Currently btrfs-progs will happily enumerate any device which has a >>> btrfs filesystem on it. For the majority of use cases that's fine and >>> there haven't been any problems with that. >> >>> However, there was a recent >>> report >> >> Could you point to the report or if it is internal? > > Internal > >> >> Kernel message has the process of name for the device scan. >> We don't have to fix the btrfs-progs end if it is not doing it. >> >>> that in multipath scenario when running "btrfs fi show" after a >>> path flap >> >> It is better to use 'btrfs fi show -m' it provides kernel perspective. > > [146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1 > generation 8 scanned by systemd-udevd (6254) > > [146823.060984] BTRFS info (device dm-0): devid 1 device path > /dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0 > scanned by systemd-udevd (6266) So /dev/sdd is a device-path of the multi-path /dev/mapper/3600140501cc1f49e5364f0093869c763 aka /dev/dm-0 which, the kernel rejected to use, that's good. > [146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0 > changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by > systemd-udevd (6266) systemd-udevd is scanning the same device too many times. > btrfs fi show -m is actually consistent with always showing the mapper > device. Do you mean either of /dev/mapper/3600140501cc1f49e5364f0093869c763 or /dev/dm-0 but, never /dev/sdd? That's expected as per the above messages which is not a problem. >> What do you mean by path flap here? Do you mean a device-path in a >> multi-path config disappeared forever or failed temporarily? > > flap means going up and down. The gist is that btrfs fi show would show > the latest device being scanned, which in the case of multipath device > could be any of the paths. Do you mean 'btrfs fi show' shows a device of the multi-path however, 'btrfs fi show -m' shows the correct multi-path device? <snip>
On 29.09.21 г. 14:13, Anand Jain wrote: <snip> >> flap means going up and down. The gist is that btrfs fi show would show >> the latest device being scanned, which in the case of multipath device >> could be any of the paths. > > Do you mean 'btrfs fi show' shows a device of the multi-path however, > 'btrfs fi show -m' shows the correct multi-path device? Yes, that's a problem purely in btrfs-progs, as the path devices are opened exclusively for the purpose of multiapth. > > <snip> >
>>> flap means going up and down. The gist is that btrfs fi show would show >>> the latest device being scanned, which in the case of multipath device >>> could be any of the paths. But why the problem is only when a device flaps? Or it doesn't matter? >> Do you mean 'btrfs fi show' shows a device of the multi-path however, >> 'btrfs fi show -m' shows the correct multi-path device? > > Yes, that's a problem purely in btrfs-progs, as the path devices are > opened exclusively for the purpose of multiapth. Ok. All parts of the test case is with an unmounted btrfs, I am clarified, now.
On 29.09.21 г. 15:44, Anand Jain wrote: > > >>>> flap means going up and down. The gist is that btrfs fi show would show >>>> the latest device being scanned, which in the case of multipath device >>>> could be any of the paths. > > But why the problem is only when a device flaps? Or it doesn't matter? Because when the device re-appears it will be the last device scanned by btrfs scanning code. It can be reproduced by following steps: 1) Validate 'btrfs fi show' is showing /dev/mapper/xxxx for all fs's 1) Unplug one of the cables from the FC adapter < this can be simulated by simply doing 'echo 1 > /sys/block/sdd/device/delete' for the given path device > 2) Wait for the paths/drives associated with the downed port to disappear 3) Check again that 'btrfs fi show' still lists the /dev/mapper entry 4) Reattach the cable to the hba port <this can be simulated by rescanning the HBA or w/e bus you have: echo "- - -" > /sys/class/scsi_host/host1/scan > 5) Check that 'btrfs fi show' is now shows /dev/sdX devices for all mpio fs's > >>> Do you mean 'btrfs fi show' shows a device of the multi-path however, >>> 'btrfs fi show -m' shows the correct multi-path device? >> >> Yes, that's a problem purely in btrfs-progs, as the path devices are >> opened exclusively for the purpose of multiapth. > > Ok. All parts of the test case is with an unmounted btrfs, I am > clarified, now. > > >
On Tue, Sep 28, 2021 at 03:37:29PM +0300, Nikolay Borisov wrote: > Currently btrfs-progs will happily enumerate any device which has a > btrfs filesystem on it. For the majority of use cases that's fine and > there haven't been any problems with that. However, there was a recent > report that in multipath scenario when running "btrfs fi show" after a > path flap instead of the multipath device being show the path device is > shown. So a multipath filesystem might look like: > > Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba > Total devices 1 FS bytes used 192.00KiB > devid 1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763 > > /dev/mapper/xxx can actually be backed by an arbitrary number of path, > which in turn are presented to the system as ordinary scsi devices i.e > /dev/sdd. If a path flaps and a user re-runs 'btrfs fi show' the output > would look like: > > Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba > Total devices 1 FS bytes used 192.00KiB > devid 1 size 10.00GiB used 536.00MiB path /dev/sdd > > Turns out the output of this command is consumed by libraries and the > presence of a path device rather than the actual multipath one can cause > issues. > > Fix this by relying on the fact that path devices are tagged with the > DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts. In > order to access it an optional dependency on libudev is added, if the > library can't be found then device enumeration will continue working > as it was before the commit. Since libudev doesn't have static library > for now support for this behavior in case of static builds is going to > be disabled. Fallback code for static builds will come in a future > commit. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > --- > Makefile | 2 +- > Makefile.inc.in | 2 +- > common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > configure.ac | 9 +++++++++ > 4 files changed, 55 insertions(+), 2 deletions(-) > > diff --git a/Makefile b/Makefile > index 93fe4c2b3e08..e96f66a36b46 100644 > --- a/Makefile > +++ b/Makefile > @@ -129,7 +129,7 @@ LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) > LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) > > # Static compilation flags > -STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections > +STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD Could you please split the configure/makefile changes to a separate patch? For progs the patch separation rules/recommendations are less strict as for kernel but at least the build and code changes should be separate as each has a different justification. Doing libudev and -DSTATIC_BUILD is probably fine in one patch. > STATIC_LDFLAGS = -static -Wl,--gc-sections > STATIC_LIBS = $(STATIC_LIBS_BASE) > > diff --git a/Makefile.inc.in b/Makefile.inc.in > index 9f49337147b8..c995aef97219 100644 > --- a/Makefile.inc.in > +++ b/Makefile.inc.in > @@ -27,7 +27,7 @@ CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@ > SUBST_CFLAGS = @CFLAGS@ > SUBST_LDFLAGS = @LDFLAGS@ > > -LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread > +LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread > LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@ > LIBS_PYTHON = @PYTHON_LIBS@ > LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@ > diff --git a/common/device-scan.c b/common/device-scan.c > index b5bfe844104b..2ed0e34d3664 100644 > --- a/common/device-scan.c > +++ b/common/device-scan.c > @@ -14,6 +14,10 @@ > * Boston, MA 021110-1307, USA. > */ > > +#ifdef STATIC_BUILD > +#undef HAVE_LIBUDEV > +#endif > + > #include "kerncompat.h" > #include <sys/ioctl.h> > #include <stdlib.h> > @@ -25,6 +29,10 @@ > #include <dirent.h> > #include <blkid/blkid.h> > #include <uuid/uuid.h> > +#ifdef HAVE_LIBUDEV > +#include <sys/stat.h> > +#include <libudev.h> > +#endif > #include "kernel-lib/overflow.h" > #include "common/path-utils.h" > #include "common/device-scan.h" > @@ -364,6 +372,37 @@ void free_seen_fsid(struct seen_fsid *seen_fsid_hash[]) > } > } > > +#ifdef HAVE_LIBUDEV > +static bool is_path_device(char *device_path) Please rename it to is_multipath_device, we've been using 'path' for normal paths and I find using 'path' confusing in this context. > +{ > + struct udev *udev = NULL; > + struct udev_device *dev = NULL; > + struct stat dev_stat; > + const char *val; > + bool ret = false; > + > + if (stat(device_path, &dev_stat) < 0) > + return false; > + > + udev = udev_new(); > + if (!udev) > + goto out; > + > + dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev); > + if (!dev) > + goto out; > + > + val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH"); > + if (val && atoi(val) > 0) > + ret = true; > +out: > + udev_device_unref(dev); > + udev_unref(udev); > + > + return ret; > +} > +#endif > + > int btrfs_scan_devices(int verbose) > { > int fd = -1; > @@ -394,6 +433,11 @@ int btrfs_scan_devices(int verbose) > /* if we are here its definitely a btrfs disk*/ > strncpy_null(path, blkid_dev_devname(dev)); > > +#ifdef HAVE_LIBUDEV > + if (is_path_device(path)) > + continue; > +#endif I'd rather avoid inline ifdef, the cleaner way is to define 2 versions based on the macro like #ifdef HAVE_LIBUDEV static bool is_path_device(char *device_path) { ... as above } #else static bool is_path_device(char *device_path) { return false; } #endif and inside btrfs_scan_devices do if (is_multipath_device(...)) > + > fd = open(path, O_RDONLY); > if (fd < 0) { > error("cannot open %s: %m", path); > diff --git a/configure.ac b/configure.ac > index 038c2688421c..d0ceb0d70d16 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -304,6 +304,15 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid]) > PKG_CHECK_MODULES(ZLIB, [zlib]) > PKG_STATIC(ZLIB_LIBS_STATIC, [zlib]) > > +PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no]) > +if test "x$pkg_config_libudev" = xyes; then > + PKG_CHECK_MODULES([LIBUDEV], [libudev]) > + AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available]) > +else > + AC_MSG_CHECKING([for LIBUDEV]) > + AC_MSG_RESULT([no]) > +fi > + > AC_ARG_ENABLE([zstd], > AS_HELP_STRING([--disable-zstd], [build without zstd support]), > [], [enable_zstd=yes] > -- > 2.17.1
On Wed, Sep 29, 2021 at 09:59:15AM +0300, Nikolay Borisov wrote: > On 29.09.21 г. 2:03, Anand Jain wrote: > > On 28/09/2021 20:37, Nikolay Borisov wrote: > >> Currently btrfs-progs will happily enumerate any device which has a > >> btrfs filesystem on it. For the majority of use cases that's fine and > >> there haven't been any problems with that. > > > >> However, there was a recent > >> report > > > > Could you point to the report or if it is internal? > > Internal > > > > > Kernel message has the process of name for the device scan. > > We don't have to fix the btrfs-progs end if it is not doing it. > > > >> that in multipath scenario when running "btrfs fi show" after a > >> path flap > > > > It is better to use 'btrfs fi show -m' it provides kernel perspective. > > [146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1 > generation 8 scanned by systemd-udevd (6254) > > [146823.060984] BTRFS info (device dm-0): devid 1 device path > /dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0 > scanned by systemd-udevd (6266) > > [146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0 > changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by > systemd-udevd (6266) > > > btrfs fi show -m is actually consistent with always showing the mapper > device. > > > > > What do you mean by path flap here? Do you mean a device-path in a > > multi-path config disappeared forever or failed temporarily? > > flap means going up and down. Please use a more verbose description like "a path can flap (go up and down)"
On 29/09/2021 20:51, Nikolay Borisov wrote: > > > On 29.09.21 г. 15:44, Anand Jain wrote: >> >> >>>>> flap means going up and down. The gist is that btrfs fi show would show >>>>> the latest device being scanned, which in the case of multipath device >>>>> could be any of the paths. >> >> But why the problem is only when a device flaps? Or it doesn't matter? > > Because when the device re-appears it will be the last device scanned by > btrfs scanning code. It shouldn't be so fragile that we merely depend on the order of the btrfs device enumeration. Device orders can be random. It shouldn't matter. Even if it is working fine before the disappear-reappear cycle, it is just by luck. To show the list of btrfs devices that are unmounted, we use btrfs_scan_devices(). In btrfs_scan_devices(), we extensively use libblkid to enumerate btrfs devices. Few important lines of it are as below. ----------------- 367 int btrfs_scan_devices(int verbose) <snip> 381 ret = blkid_get_cache(&cache, NULL); 382 if (ret < 0) { 383 errno = -ret; 384 error("blkid cache get failed: %m"); 385 return ret; 386 } 387 blkid_probe_all(cache); 388 iter = blkid_dev_iterate_begin(cache); 389 blkid_dev_set_search(iter, "TYPE", "btrfs"); 390 while (blkid_dev_next(iter, &dev) == 0) { 391 dev = blkid_verify(cache, dev); 392 if (!dev) 393 continue; 394 /* if we are here its definitely a btrfs disk*/ 395 strncpy_null(path, blkid_dev_devname(dev)); ----------------- So, you mean to say the blkid_dev_set_search() always finds all three paths containing the same fsid+uuid+devid? But, only its order varies when the a underlying device disappears and reappears. For example: Device order before /dev/sdb disappeared /dev/sda MAJ:MIN?? /dev/sdb MAJ:MIN?? /dev/mapper/3600140501cc1f49e5364f0093869c763 MAJ:MIN?? Device order after /dev/sdb reappeared /dev/sda MAJ:MIN?? /dev/mapper/3600140501cc1f49e5364f0093869c763 MAJ:MIN?? /dev/sdb MAJ:MIN?? Could you please help to find the MAJ:MIN of the devices before and after the disappear-reappear cycle? Are we sure the reappeared device has the same MAJ:MIN as before? is it shown as a new device? If not then it could be block layer problem too. So as said we shouldn't depend on the order of the device enumeration. Your fix makes sense to an extent but, still depend on the device order of reporting. Let us say we dd a device to another device then, btrfs fi show will show the last enumerated device by blkid (I think). If yes then, it is wrong. Could we make it order neutral? I don't know how. I think the only choice is to list all the devices in a tree format. Similar to lsblk(8). Thanks, Anand > It can be reproduced by following steps: > 1) Validate 'btrfs fi show' is showing /dev/mapper/xxxx for all fs's > 1) Unplug one of the cables from the FC adapter < this can be simulated > by simply doing 'echo 1 > /sys/block/sdd/device/delete' for the given > path device > > 2) Wait for the paths/drives associated with the downed port to disappear > 3) Check again that 'btrfs fi show' still lists the /dev/mapper entry > 4) Reattach the cable to the hba port <this can be simulated by > rescanning the HBA or w/e bus you have: echo "- - -" > > /sys/class/scsi_host/host1/scan > > 5) Check that 'btrfs fi show' is now shows /dev/sdX devices for all mpio > fs's > >> >>>> Do you mean 'btrfs fi show' shows a device of the multi-path however, >>>> 'btrfs fi show -m' shows the correct multi-path device? >>> >>> Yes, that's a problem purely in btrfs-progs, as the path devices are >>> opened exclusively for the purpose of multiapth. >> >> Ok. All parts of the test case is with an unmounted btrfs, I am >> clarified, now.
diff --git a/Makefile b/Makefile index 93fe4c2b3e08..e96f66a36b46 100644 --- a/Makefile +++ b/Makefile @@ -129,7 +129,7 @@ LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO) # Static compilation flags -STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections +STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD STATIC_LDFLAGS = -static -Wl,--gc-sections STATIC_LIBS = $(STATIC_LIBS_BASE) diff --git a/Makefile.inc.in b/Makefile.inc.in index 9f49337147b8..c995aef97219 100644 --- a/Makefile.inc.in +++ b/Makefile.inc.in @@ -27,7 +27,7 @@ CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@ SUBST_CFLAGS = @CFLAGS@ SUBST_LDFLAGS = @LDFLAGS@ -LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread +LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@ LIBS_PYTHON = @PYTHON_LIBS@ LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@ diff --git a/common/device-scan.c b/common/device-scan.c index b5bfe844104b..2ed0e34d3664 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -14,6 +14,10 @@ * Boston, MA 021110-1307, USA. */ +#ifdef STATIC_BUILD +#undef HAVE_LIBUDEV +#endif + #include "kerncompat.h" #include <sys/ioctl.h> #include <stdlib.h> @@ -25,6 +29,10 @@ #include <dirent.h> #include <blkid/blkid.h> #include <uuid/uuid.h> +#ifdef HAVE_LIBUDEV +#include <sys/stat.h> +#include <libudev.h> +#endif #include "kernel-lib/overflow.h" #include "common/path-utils.h" #include "common/device-scan.h" @@ -364,6 +372,37 @@ void free_seen_fsid(struct seen_fsid *seen_fsid_hash[]) } } +#ifdef HAVE_LIBUDEV +static bool is_path_device(char *device_path) +{ + struct udev *udev = NULL; + struct udev_device *dev = NULL; + struct stat dev_stat; + const char *val; + bool ret = false; + + if (stat(device_path, &dev_stat) < 0) + return false; + + udev = udev_new(); + if (!udev) + goto out; + + dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev); + if (!dev) + goto out; + + val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH"); + if (val && atoi(val) > 0) + ret = true; +out: + udev_device_unref(dev); + udev_unref(udev); + + return ret; +} +#endif + int btrfs_scan_devices(int verbose) { int fd = -1; @@ -394,6 +433,11 @@ int btrfs_scan_devices(int verbose) /* if we are here its definitely a btrfs disk*/ strncpy_null(path, blkid_dev_devname(dev)); +#ifdef HAVE_LIBUDEV + if (is_path_device(path)) + continue; +#endif + fd = open(path, O_RDONLY); if (fd < 0) { error("cannot open %s: %m", path); diff --git a/configure.ac b/configure.ac index 038c2688421c..d0ceb0d70d16 100644 --- a/configure.ac +++ b/configure.ac @@ -304,6 +304,15 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid]) PKG_CHECK_MODULES(ZLIB, [zlib]) PKG_STATIC(ZLIB_LIBS_STATIC, [zlib]) +PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no]) +if test "x$pkg_config_libudev" = xyes; then + PKG_CHECK_MODULES([LIBUDEV], [libudev]) + AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available]) +else + AC_MSG_CHECKING([for LIBUDEV]) + AC_MSG_RESULT([no]) +fi + AC_ARG_ENABLE([zstd], AS_HELP_STRING([--disable-zstd], [build without zstd support]), [], [enable_zstd=yes]
Currently btrfs-progs will happily enumerate any device which has a btrfs filesystem on it. For the majority of use cases that's fine and there haven't been any problems with that. However, there was a recent report that in multipath scenario when running "btrfs fi show" after a path flap instead of the multipath device being show the path device is shown. So a multipath filesystem might look like: Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba Total devices 1 FS bytes used 192.00KiB devid 1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763 /dev/mapper/xxx can actually be backed by an arbitrary number of path, which in turn are presented to the system as ordinary scsi devices i.e /dev/sdd. If a path flaps and a user re-runs 'btrfs fi show' the output would look like: Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba Total devices 1 FS bytes used 192.00KiB devid 1 size 10.00GiB used 536.00MiB path /dev/sdd Turns out the output of this command is consumed by libraries and the presence of a path device rather than the actual multipath one can cause issues. Fix this by relying on the fact that path devices are tagged with the DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts. In order to access it an optional dependency on libudev is added, if the library can't be found then device enumeration will continue working as it was before the commit. Since libudev doesn't have static library for now support for this behavior in case of static builds is going to be disabled. Fallback code for static builds will come in a future commit. Signed-off-by: Nikolay Borisov <nborisov@suse.com> --- Makefile | 2 +- Makefile.inc.in | 2 +- common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ configure.ac | 9 +++++++++ 4 files changed, 55 insertions(+), 2 deletions(-)