[v3,0/6] block: add a sequence number to disks

Message ID 20210623105858.6978-1-mcroce@linux.microsoft.com (mailing list archive)

Message

Matteo Croce June 23, 2021, 10:58 a.m. UTC
From: Matteo Croce <mcroce@microsoft.com>

With this series a monotonically increasing number is added to disks,
precisely in the genhd struct, and it's exported in sysfs and uevent.

This helps userspace correlate events for devices that reuse the
same device name, like loop.

The first patch is the core one, patches 2-4 expose the information in
different ways, the 5th increases the seqnum on media change, and
the last one increases the sequence number for loop devices upon
attach, detach or reconfigure.
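
As a rough illustration of how userspace might consume the sysfs
export, the sketch below reads the sequence number of one disk. The
attribute path <sysfs_root>/<disk>/diskseq is an assumption inferred
from the patch titles rather than something this cover letter states,
and read_diskseq is a hypothetical helper name.

```python
import os
from typing import Optional


def read_diskseq(disk: str, sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the disk sequence number, or None if it is not exported.

    Assumption: the attribute holds a decimal number and lives at
    <sysfs_root>/<disk>/diskseq.
    """
    path = os.path.join(sysfs_root, disk, "diskseq")
    try:
        with open(path, "r", encoding="ascii") as attr:
            return int(attr.read().strip())
    except (FileNotFoundError, NotADirectoryError, ValueError):
        # Kernel without the feature, unknown disk, or unexpected content.
        return None
```

Returning None instead of raising lets a caller fall back to name-only
matching on kernels that do not carry the series.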

If merged, this feature will immediately be used by userspace:
https://github.com/systemd/systemd/issues/17469#issuecomment-762919781

v2 -> v3:
- rebased on top of 5.13-rc7
- resend because it appeared archived on patchwork

v1 -> v2:
- increase seqnum on media change
- increase on loop detach

Matteo Croce (6):
  block: add disk sequence number
  block: add ioctl to read the disk sequence number
  block: refactor sysfs code
  block: export diskseq in sysfs
  block: increment sequence number
  loop: increment sequence number

 Documentation/ABI/testing/sysfs-block | 12 +++++++
 block/genhd.c                         | 46 ++++++++++++++++++++++++---
 block/ioctl.c                         |  2 ++
 drivers/block/loop.c                  |  5 +++
 include/linux/genhd.h                 |  2 ++
 include/uapi/linux/fs.h               |  1 +
 6 files changed, 64 insertions(+), 4 deletions(-)

Comments

Hannes Reinecke June 23, 2021, 12:03 p.m. UTC | #1
On 6/23/21 12:58 PM, Matteo Croce wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> With this series a monotonically increasing number is added to disks,
> precisely in the genhd struct, and it's exported in sysfs and uevent.
> 
> This helps the userspace correlate events for devices that reuse the
> same device, like loop.
> 
I'm failing to see the point here.
Apparently you are assuming that there is a userspace tool tracking 
events, and has a need to correlate events related to different 
instances of the disk.
But if you have an userspace application tracking events, why can't the 
same application track the 'add' and 'remove' events to track the 
lifetime of the devices, and implement its own numbering based on that?

Why do we need to burden the kernel with this?

Cheers,

Hannes
Luca Boccassi June 23, 2021, 12:46 p.m. UTC | #2
On Wed, 2021-06-23 at 14:03 +0200, Hannes Reinecke wrote:
> On 6/23/21 12:58 PM, Matteo Croce wrote:
> > From: Matteo Croce <mcroce@microsoft.com>
> > 
> > With this series a monotonically increasing number is added to disks,
> > precisely in the genhd struct, and it's exported in sysfs and uevent.
> > 
> > This helps the userspace correlate events for devices that reuse the
> > same device, like loop.
> > 
> I'm failing to see the point here.
> Apparently you are assuming that there is a userspace tool tracking 
> events, and has a need to correlate events related to different 
> instances of the disk.
> But if you have an userspace application tracking events, why can't the 
> same application track the 'add' and 'remove' events to track the 
> lifetime of the devices, and implement its own numbering based on that?
> 
> Why do we need to burden the kernel with this?
> 
> Cheers,
> 
> Hannes

Hi,

It is not an assumption: such a tool does exist, and manual tracking
does not work because events cannot be reliably correlated to devices
(we've tried, again and again and again). Solving this long-standing
issue is the purpose of this series. It has been causing problems in
both testing and production for a long time now, despite our best
efforts to add workaround after workaround.

For more info please see the discussion on the v1:

https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m5b03e48013de14b4a080c90afdc4a8b8c94c30d4

and the bug linked in the cover letter:

https://github.com/systemd/systemd/issues/17469#issuecomment-762919781
Lennart Poettering June 23, 2021, 2:07 p.m. UTC | #3
On Mi, 23.06.21 14:03, Hannes Reinecke (hare@suse.de) wrote:

> On 6/23/21 12:58 PM, Matteo Croce wrote:
> > From: Matteo Croce <mcroce@microsoft.com>
> >
> > With this series a monotonically increasing number is added to disks,
> > precisely in the genhd struct, and it's exported in sysfs and uevent.
> >
> > This helps the userspace correlate events for devices that reuse the
> > same device, like loop.
> >
> I'm failing to see the point here.
> Apparently you are assuming that there is a userspace tool tracking events,
> and has a need to correlate events related to different instances of the
> disk.
> But if you have an userspace application tracking events, why can't the same
> application track the 'add' and 'remove' events to track the lifetime of the
> devices, and implement its own numbering based on that?
>
> Why do we need to burden the kernel with this?

The problem is that tracking the "add" and "remove" events is simply
not safely possible right now for block devices whose names are
frequently reused.

Consider the loopback block device subsystem: whenever some tool wants
a loopback block device it will ask the kernel for one and the kernel
allocates from the bottom, hence /dev/loop0 is the most frequently
used loopback block device. If a large number of concurrently running
programs now repeatedly/quickly allocate/deallocate block devices they
all sooner or later get /dev/loop0. If they now want to watch the
"add" and "remove" uevents for that device for their own use of it
there's a very good chance they'll end up seeing the previous user's
"add" and "remove" events, as there's simply no way to associate the
uevents you see with *your* *own* use of /dev/loop0 right now, and
distinguish them from the uevents that might have been queued from a
prior use of /dev/loop0 and were just slow to be processed.
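
The "allocates from the bottom" behaviour can be sketched with a toy
model. This is not kernel code; LoopPool and its methods are
hypothetical names used only to show why every short-lived user ends
up with the same device name.

```python
class LoopPool:
    """Toy model of a pool that always hands out the lowest free index."""

    PREFIX = "/dev/loop"

    def __init__(self):
        self.in_use = set()

    def allocate(self):
        # Scan upwards from 0 and take the first index that is free.
        index = 0
        while index in self.in_use:
            index += 1
        self.in_use.add(index)
        return f"{self.PREFIX}{index}"

    def release(self, name):
        self.in_use.discard(int(name[len(self.PREFIX):]))


pool = LoopPool()
names = []
for _ in range(3):
    # Three users in a row, each releasing before the next one allocates.
    name = pool.allocate()
    names.append(name)
    pool.release(name)
```

Here all three users receive /dev/loop0, so the name alone cannot tell
their uses apart.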

Or to say this differently: loopback devices are named from a very
small, dense pool of names, and are frequently and quickly
reused. uevents are enqueued asynchronously and potentially take a long
time to reach the listeners (after all they have to travel through two
AF_NETLINK sockets and udev) and the only way to match up the device
uses and their uevents is by these kernel device names that are so
useless as a stable identifier.
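
To make the matching problem concrete, here is a minimal sketch of a
listener that filters a delayed event queue. The events are plain
dicts; the DISKSEQ key is an assumed property name for the sequence
number proposed by this series, and events_for_my_use is a
hypothetical helper, not an existing udev API.

```python
def events_for_my_use(events, devname, my_diskseq):
    """Keep only the uevents that belong to one specific use of a name."""
    return [
        ev for ev in events
        if ev["DEVNAME"] == devname and ev.get("DISKSEQ") == my_diskseq
    ]


# A slow listener's queue: the previous user's events for loop0 are
# still pending when our own events for the same name arrive.
queue = [
    {"ACTION": "add", "DEVNAME": "loop0", "DISKSEQ": 7},
    {"ACTION": "remove", "DEVNAME": "loop0", "DISKSEQ": 7},
    {"ACTION": "add", "DEVNAME": "loop0", "DISKSEQ": 8},
    {"ACTION": "remove", "DEVNAME": "loop0", "DISKSEQ": 8},
]

# Matching on the name alone cannot separate the two uses.
by_name = [ev for ev in queue if ev["DEVNAME"] == "loop0"]

# Matching on name plus sequence number keeps only our own events.
mine = events_for_my_use(queue, "loop0", 8)
```

With name-only matching all four events look like ours; adding the
sequence number leaves just the two from our own use.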

This applies not only to loopback block devices, but to many other
block device subsystems too. For example nbd allocates from the bottom,
too, i.e. /dev/nbd0 is the most likely name to be used. And for SCSI
devices too: if you quickly plug/unplug/replug a bunch of USB sticks,
you'll likely always get /dev/sda...

Lennart

--
Lennart Poettering, Berlin