diff mbox series

[v3,6/6] loop: increment sequence number

Message ID 20210623105858.6978-7-mcroce@linux.microsoft.com (mailing list archive)
State New, archived
Headers show
Series block: add a sequence number to disks | expand

Commit Message

Matteo Croce June 23, 2021, 10:58 a.m. UTC
From: Matteo Croce <mcroce@microsoft.com>

On a very loaded system, if there are many events queued up from multiple
attach/detach cycles, it's impossible to match them up with the
LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
of our own association in the queue is[1].
Not even an empty uevent queue is a reliable indication that we already
received the uevent we were waiting for, since with multi-partition block
devices each partition's event is queued asynchronously and might be
delivered later.

Increment the disk sequence number when setting or changing the backing
file, so the userspace knows which backing file generated the event:

    # udevadm monitor -kp |grep -e ^DEVNAME -e ^DISKSEQ &
    [1] 263
    # losetup -fP 3part
    [   12.309974] loop0: detected capacity change from 0 to 2048
    DEVNAME=/dev/loop0
    DISKSEQ=8
    [   12.360252]  loop0: p1 p2 p3
    DEVNAME=/dev/loop0
    DISKSEQ=8
    DEVNAME=/dev/loop0p1
    DISKSEQ=8
    DEVNAME=/dev/loop0p2
    DISKSEQ=8
    DEVNAME=/dev/loop0p3
    DISKSEQ=8
    # losetup -D
    DEVNAME=/dev/loop0
    DISKSEQ=9
    DEVNAME=/dev/loop0p1
    DISKSEQ=9
    DEVNAME=/dev/loop0p2
    DISKSEQ=9
    DEVNAME=/dev/loop0p3
    DISKSEQ=9
    # losetup -fP 2part
    [   29.001344] loop0: detected capacity change from 0 to 2048
    DEVNAME=/dev/loop0
    DISKSEQ=10
    [   29.040226]  loop0: p1 p2
    DEVNAME=/dev/loop0
    DISKSEQ=10
    DEVNAME=/dev/loop0p1
    DISKSEQ=10
    DEVNAME=/dev/loop0p2
    DISKSEQ=10

[1] https://github.com/systemd/systemd/issues/17469#issuecomment-762919781

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
---
 drivers/block/loop.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Christoph Hellwig June 23, 2021, 11:57 a.m. UTC | #1
On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> On a very loaded system, if there are many events queued up from multiple
> attach/detach cycles, it's impossible to match them up with the
> LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
> of our own association in the queue is[1].
> Not even an empty uevent queue is a reliable indication that we already
> received the uevent we were waiting for, since with multi-partition block
> devices each partition's event is queued asynchronously and might be
> delivered later.
> 
> Increment the disk sequence number when setting or changing the backing
> file, so the userspace knows which backing file generated the event:

Instead of manually incrementing the sequence here, can we make loop
generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka
media) change?
Luca Boccassi June 23, 2021, 1:13 p.m. UTC | #2
On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote:
> On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote:
> > From: Matteo Croce <mcroce@microsoft.com>
> > 
> > On a very loaded system, if there are many events queued up from multiple
> > attach/detach cycles, it's impossible to match them up with the
> > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
> > of our own association in the queue is[1].
> > Not even an empty uevent queue is a reliable indication that we already
> > received the uevent we were waiting for, since with multi-partition block
> > devices each partition's event is queued asynchronously and might be
> > delivered later.
> > 
> > Increment the disk sequence number when setting or changing the backing
> > file, so the userspace knows which backing file generated the event:
> 
> Instead of manually incrementing the sequence here, can we make loop
> generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka
> media) change?

Hi,

This was answered in the v1 thread:

https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b

The fundamental issue is that we'd be back at trying to correlate
events to loopdev instances, which does not work reliably - hence this
patch series. With the new ioctl, we can get the id immediately and
without delay when we create the device, with no possible races. Then
we can handle events reliably, as we can correlate correctly in all
cases.
Christoph Hellwig June 23, 2021, 2:25 p.m. UTC | #3
On Wed, Jun 23, 2021 at 02:13:25PM +0100, Luca Boccassi wrote:
> On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote:
> > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote:
> > > From: Matteo Croce <mcroce@microsoft.com>
> > > 
> > > On a very loaded system, if there are many events queued up from multiple
> > > attach/detach cycles, it's impossible to match them up with the
> > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
> > > of our own association in the queue is[1].
> > > Not even an empty uevent queue is a reliable indication that we already
> > > received the uevent we were waiting for, since with multi-partition block
> > > devices each partition's event is queued asynchronously and might be
> > > delivered later.
> > > 
> > > Increment the disk sequence number when setting or changing the backing
> > > file, so the userspace knows which backing file generated the event:
> > 
> > Instead of manually incrementing the sequence here, can we make loop
> > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka
> > media) change?
> 
> Hi,
> 
> This was answered in the v1 thread:
> 
> https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b
> 
> The fundamental issue is that we'd be back at trying to correlate
> events to loopdev instances, which does not work reliably - hence this
> patch series. With the new ioctl, we can get the id immediately and
> without delay when we create the device, with no possible races. Then
> we can handle events reliably, as we can correlate correctly in all
> cases.

I very much disagree with your reply there.  The device now points to
a different media.  Both for the loop device, a floppy or a CD changer
probably by some kind of user action.  In the last cast it might even
by done entirely locally through a script just like the loop device.
Lennart Poettering June 23, 2021, 3:29 p.m. UTC | #4
On Mi, 23.06.21 15:25, Christoph Hellwig (hch@infradead.org) wrote:

> On Wed, Jun 23, 2021 at 02:13:25PM +0100, Luca Boccassi wrote:
> > On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote:
> > > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote:
> > > > From: Matteo Croce <mcroce@microsoft.com>
> > > >
> > > > On a very loaded system, if there are many events queued up from multiple
> > > > attach/detach cycles, it's impossible to match them up with the
> > > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
> > > > of our own association in the queue is[1].
> > > > Not even an empty uevent queue is a reliable indication that we already
> > > > received the uevent we were waiting for, since with multi-partition block
> > > > devices each partition's event is queued asynchronously and might be
> > > > delivered later.
> > > >
> > > > Increment the disk sequence number when setting or changing the backing
> > > > file, so the userspace knows which backing file generated the event:
> > >
> > > Instead of manually incrementing the sequence here, can we make loop
> > > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka
> > > media) change?
> >
> > Hi,
> >
> > This was answered in the v1 thread:
> >
> > https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b
> >
> > The fundamental issue is that we'd be back at trying to correlate
> > events to loopdev instances, which does not work reliably - hence this
> > patch series. With the new ioctl, we can get the id immediately and
> > without delay when we create the device, with no possible races. Then
> > we can handle events reliably, as we can correlate correctly in all
> > cases.
>
> I very much disagree with your reply there.  The device now points to
> a different media.  Both for the loop device, a floppy or a CD changer
> probably by some kind of user action.  In the last cast it might even
> by done entirely locally through a script just like the loop device.

I am not sure I grok your point.

but let me try to explain why I think it's better to make media
changes *also* trigger seqno changes, and not make media change events
the *only* way to trigger seqno changes.

1. First of all, loopback devices currently don't hook into the media
   change logic (which so far is focussed on time-based polling
   actually, for both CDs and floppies). Adding this would change
   semantics visibly to userspace (since userspace would suddenly see
   another action=change + DISK_MEDIA_CHANGE=1 uevent suddenly that it
   needs to handle correctly). One can certainly argue that userspace
   must be ready to get additional uevents like this any time, but
   knowing userspace a bit I am pretty sure this will confuse some
   userspace that doesn't expect this. I mean loopback block devices
   already issue "change" uevents on attachment and detachment, one
   that userpace typically expects, but adding the media change one
   probably means sending two (?) of these out for each
   attachment. One being the old one from the loopback device itself,
   and then another one for the media change from the mdia change
   logic. That's not just noisy, but also ugly.

2. We want seqnums to be allocated for devices not only when doing
   media change (e.g. when attaching or detaching a loopback device)
   but also when allocating a block device, so that even before the
   first media change event a block device has a sequence number. This
   means allocating a sequence number for block devices won't be
   limited to the media change code anyway.

3. Doing the sequence number bumping in media change code exclusively
   kinda suggests this was something we could properly abstract away,
   to be done only there, and that the rest of the block subsystems
   wouldn#t have to bother much. But I am pretty sure that's not
   correct: in particular in the loopback block device code (but in
   other block subsystems too I guess) we really want to be able to
   atomically attach a loopback block device and return the seqnum of
   that very attachmnt so that we can focus on uevents for it. Thus,
   attachment, allocation and returning the seqnum to userspace in the
   LOOP_CONFIGURE ioctl (or whatever is appropriate) kinda go hand in
   hand.

4. The media change protocol follows a protocol along with the eject
   button handling (DISK_EVENT_EJECT_REQUEST), the locking of the tray
   and the time based polling. i.e. it's specifically focussed on
   certain hw features, none of which really apply to loopback block
   devices, which have no trays to lock, but eject buttons to handle,
   and where media change is always triggered internally by privileged
   code, never externally by the user. I doubt it makes sense to mix
   these up. Currently udev+udisks in userspace implement that
   procotol for cdroms/floppies, and I doubt we would want to bother
   to extend this for loopback devices in userspace. In fact, if media
   change events are added to loopback devices, we probably would have
   to change userspace to ignore them.

TLDR: making loopback block devices generate media is a (probably
minor, but unclear) API break, probably means duplicating uevent
traffic for it, won't abstract things away anyway, won't allocate a
seqnum on device allocation, and won't actually use anything of the
real media change functionality.

Or to say this differently: if the goal is to make loopback block
devices to send CDROM compatible media change events to userspace,
then I think it would probably make more sense to attach the
DISK_MEDIA_CHANGE=1 property to the attachment/detachment uevents the
loopback devices *already* generate, rather than to try to shoehorn the
existing media change infrastructure into the loopback device logic,
trying to reuse it even though nothing of it is really needed.

But that said, I am not aware of anyone ever asking for CDROM
compatible EDISK_MEDIA_CHANGE=1 uevents for loopback devices. I really
wouldn't bother with that. Sounds like nothing anyone want with a
chance of breaking things everywhere. (i.e. remember the
"bind"/"unbind" uevent fiasco?)

Lennart

--
Lennart Poettering, Berlin
Christoph Hellwig June 24, 2021, 6:11 a.m. UTC | #5
On Wed, Jun 23, 2021 at 05:29:17PM +0200, Lennart Poettering wrote:
> I am not sure I grok your point.

You don't.

> 1. First of all, loopback devices currently don't hook into the media
>    change logic (which so far is focussed on time-based polling
>    actually, for both CDs and floppies).

And that is the whole problem.  We need to first fix loop devices to
hook into the existing mechanism to report media changes.  We can then
enhance that mechanism to be more suitable to loop (avoid the polling)
and userspace (add a sequence number).  But if we don't have the basic
agreement to fully integreat loop with the existing way that the kernel
reports media change we don't even need to discuss this series and can
just ignore it, as it simply won't be acceptable.

>    Adding this would change
>    semantics visibly to userspace (since userspace would suddenly see
>    another action=change + DISK_MEDIA_CHANGE=1 uevent suddenly that it
>    needs to handle correctly).

Yes, and that is a good thing as loop is currently completely broken
in this respect.

> 2. We want seqnums to be allocated for devices not only when doing
>    media change (e.g. when attaching or detaching a loopback device)
>    but also when allocating a block device, so that even before the
>    first media change event a block device has a sequence number. This
>    means allocating a sequence number for block devices won't be
>    limited to the media change code anyway.

Doing this on creation is fine, and attaching is by definition a media
change.
diff mbox series

Patch

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 76e12f3482a9..b1c638d23306 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -735,6 +735,7 @@  static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 		goto out_err;
 
 	/* and ... switch */
+	inc_diskseq(lo->lo_disk);
 	blk_mq_freeze_queue(lo->lo_queue);
 	mapping_set_gfp_mask(old_file->f_mapping, lo->old_gfp_mask);
 	lo->lo_backing_file = file;
@@ -1123,6 +1124,8 @@  static int loop_configure(struct loop_device *lo, fmode_t mode,
 	if (error)
 		goto out_unlock;
 
+	inc_diskseq(lo->lo_disk);
+
 	if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
 	    !file->f_op->write_iter)
 		lo->lo_flags |= LO_FLAGS_READ_ONLY;
@@ -1223,6 +1226,8 @@  static int __loop_clr_fd(struct loop_device *lo, bool release)
 	lo->lo_backing_file = NULL;
 	spin_unlock_irq(&lo->lo_lock);
 
+	inc_diskseq(lo->lo_disk);
+
 	loop_release_xfer(lo);
 	lo->transfer = NULL;
 	lo->ioctl = NULL;