Message ID | 20210623105858.6978-7-mcroce@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: add a sequence number to disks | expand |
On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote: > From: Matteo Croce <mcroce@microsoft.com> > > On a very loaded system, if there are many events queued up from multiple > attach/detach cycles, it's impossible to match them up with the > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position > of our own association in the queue is[1]. > Not even an empty uevent queue is a reliable indication that we already > received the uevent we were waiting for, since with multi-partition block > devices each partition's event is queued asynchronously and might be > delivered later. > > Increment the disk sequence number when setting or changing the backing > file, so the userspace knows which backing file generated the event: Instead of manually incrementing the sequence here, can we make loop generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka media) change?
On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote: > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote: > > From: Matteo Croce <mcroce@microsoft.com> > > > > On a very loaded system, if there are many events queued up from multiple > > attach/detach cycles, it's impossible to match them up with the > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position > > of our own association in the queue is[1]. > > Not even an empty uevent queue is a reliable indication that we already > > received the uevent we were waiting for, since with multi-partition block > > devices each partition's event is queued asynchronously and might be > > delivered later. > > > > Increment the disk sequence number when setting or changing the backing > > file, so the userspace knows which backing file generated the event: > > Instead of manually incrementing the sequence here, can we make loop > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka > media) change? Hi, This was answered in the v1 thread: https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b The fundamental issue is that we'd be back at trying to correlate events to loopdev instances, which does not work reliably - hence this patch series. With the new ioctl, we can get the id immediately and without delay when we create the device, with no possible races. Then we can handle events reliably, as we can correlate correctly in all cases.
On Wed, Jun 23, 2021 at 02:13:25PM +0100, Luca Boccassi wrote: > On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote: > > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote: > > > From: Matteo Croce <mcroce@microsoft.com> > > > > > > On a very loaded system, if there are many events queued up from multiple > > > attach/detach cycles, it's impossible to match them up with the > > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position > > > of our own association in the queue is[1]. > > > Not even an empty uevent queue is a reliable indication that we already > > > received the uevent we were waiting for, since with multi-partition block > > > devices each partition's event is queued asynchronously and might be > > > delivered later. > > > > > > Increment the disk sequence number when setting or changing the backing > > > file, so the userspace knows which backing file generated the event: > > > > Instead of manually incrementing the sequence here, can we make loop > > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka > > media) change? > > Hi, > > This was answered in the v1 thread: > > https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b > > The fundamental issue is that we'd be back at trying to correlate > events to loopdev instances, which does not work reliably - hence this > patch series. With the new ioctl, we can get the id immediately and > without delay when we create the device, with no possible races. Then > we can handle events reliably, as we can correlate correctly in all > cases. I very much disagree with your reply there. The device now points to a different media. Both for the loop device, a floppy or a CD changer probably by some kind of user action. In the last cast it might even by done entirely locally through a script just like the loop device.
On Mi, 23.06.21 15:25, Christoph Hellwig (hch@infradead.org) wrote: > On Wed, Jun 23, 2021 at 02:13:25PM +0100, Luca Boccassi wrote: > > On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote: > > > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote: > > > > From: Matteo Croce <mcroce@microsoft.com> > > > > > > > > On a very loaded system, if there are many events queued up from multiple > > > > attach/detach cycles, it's impossible to match them up with the > > > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position > > > > of our own association in the queue is[1]. > > > > Not even an empty uevent queue is a reliable indication that we already > > > > received the uevent we were waiting for, since with multi-partition block > > > > devices each partition's event is queued asynchronously and might be > > > > delivered later. > > > > > > > > Increment the disk sequence number when setting or changing the backing > > > > file, so the userspace knows which backing file generated the event: > > > > > > Instead of manually incrementing the sequence here, can we make loop > > > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka > > > media) change? > > > > Hi, > > > > This was answered in the v1 thread: > > > > https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b > > > > The fundamental issue is that we'd be back at trying to correlate > > events to loopdev instances, which does not work reliably - hence this > > patch series. With the new ioctl, we can get the id immediately and > > without delay when we create the device, with no possible races. Then > > we can handle events reliably, as we can correlate correctly in all > > cases. > > I very much disagree with your reply there. The device now points to > a different media. Both for the loop device, a floppy or a CD changer > probably by some kind of user action. In the last cast it might even > by done entirely locally through a script just like the loop device. I am not sure I grok your point. but let me try to explain why I think it's better to make media changes *also* trigger seqno changes, and not make media change events the *only* way to trigger seqno changes. 1. First of all, loopback devices currently don't hook into the media change logic (which so far is focussed on time-based polling actually, for both CDs and floppies). Adding this would change semantics visibly to userspace (since userspace would suddenly see another action=change + DISK_MEDIA_CHANGE=1 uevent suddenly that it needs to handle correctly). One can certainly argue that userspace must be ready to get additional uevents like this any time, but knowing userspace a bit I am pretty sure this will confuse some userspace that doesn't expect this. I mean loopback block devices already issue "change" uevents on attachment and detachment, one that userpace typically expects, but adding the media change one probably means sending two (?) of these out for each attachment. One being the old one from the loopback device itself, and then another one for the media change from the mdia change logic. That's not just noisy, but also ugly. 2. We want seqnums to be allocated for devices not only when doing media change (e.g. when attaching or detaching a loopback device) but also when allocating a block device, so that even before the first media change event a block device has a sequence number. This means allocating a sequence number for block devices won't be limited to the media change code anyway. 3. Doing the sequence number bumping in media change code exclusively kinda suggests this was something we could properly abstract away, to be done only there, and that the rest of the block subsystems wouldn#t have to bother much. But I am pretty sure that's not correct: in particular in the loopback block device code (but in other block subsystems too I guess) we really want to be able to atomically attach a loopback block device and return the seqnum of that very attachmnt so that we can focus on uevents for it. Thus, attachment, allocation and returning the seqnum to userspace in the LOOP_CONFIGURE ioctl (or whatever is appropriate) kinda go hand in hand. 4. The media change protocol follows a protocol along with the eject button handling (DISK_EVENT_EJECT_REQUEST), the locking of the tray and the time based polling. i.e. it's specifically focussed on certain hw features, none of which really apply to loopback block devices, which have no trays to lock, but eject buttons to handle, and where media change is always triggered internally by privileged code, never externally by the user. I doubt it makes sense to mix these up. Currently udev+udisks in userspace implement that procotol for cdroms/floppies, and I doubt we would want to bother to extend this for loopback devices in userspace. In fact, if media change events are added to loopback devices, we probably would have to change userspace to ignore them. TLDR: making loopback block devices generate media is a (probably minor, but unclear) API break, probably means duplicating uevent traffic for it, won't abstract things away anyway, won't allocate a seqnum on device allocation, and won't actually use anything of the real media change functionality. Or to say this differently: if the goal is to make loopback block devices to send CDROM compatible media change events to userspace, then I think it would probably make more sense to attach the DISK_MEDIA_CHANGE=1 property to the attachment/detachment uevents the loopback devices *already* generate, rather than to try to shoehorn the existing media change infrastructure into the loopback device logic, trying to reuse it even though nothing of it is really needed. But that said, I am not aware of anyone ever asking for CDROM compatible EDISK_MEDIA_CHANGE=1 uevents for loopback devices. I really wouldn't bother with that. Sounds like nothing anyone want with a chance of breaking things everywhere. (i.e. remember the "bind"/"unbind" uevent fiasco?) Lennart -- Lennart Poettering, Berlin
On Wed, Jun 23, 2021 at 05:29:17PM +0200, Lennart Poettering wrote: > I am not sure I grok your point. You don't. > 1. First of all, loopback devices currently don't hook into the media > change logic (which so far is focussed on time-based polling > actually, for both CDs and floppies). And that is the whole problem. We need to first fix loop devices to hook into the existing mechanism to report media changes. We can then enhance that mechanism to be more suitable to loop (avoid the polling) and userspace (add a sequence number). But if we don't have the basic agreement to fully integreat loop with the existing way that the kernel reports media change we don't even need to discuss this series and can just ignore it, as it simply won't be acceptable. > Adding this would change > semantics visibly to userspace (since userspace would suddenly see > another action=change + DISK_MEDIA_CHANGE=1 uevent suddenly that it > needs to handle correctly). Yes, and that is a good thing as loop is currently completely broken in this respect. > 2. We want seqnums to be allocated for devices not only when doing > media change (e.g. when attaching or detaching a loopback device) > but also when allocating a block device, so that even before the > first media change event a block device has a sequence number. This > means allocating a sequence number for block devices won't be > limited to the media change code anyway. Doing this on creation is fine, and attaching is by definition a media change.
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 76e12f3482a9..b1c638d23306 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -735,6 +735,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev, goto out_err; /* and ... switch */ + inc_diskseq(lo->lo_disk); blk_mq_freeze_queue(lo->lo_queue); mapping_set_gfp_mask(old_file->f_mapping, lo->old_gfp_mask); lo->lo_backing_file = file; @@ -1123,6 +1124,8 @@ static int loop_configure(struct loop_device *lo, fmode_t mode, if (error) goto out_unlock; + inc_diskseq(lo->lo_disk); + if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || !file->f_op->write_iter) lo->lo_flags |= LO_FLAGS_READ_ONLY; @@ -1223,6 +1226,8 @@ static int __loop_clr_fd(struct loop_device *lo, bool release) lo->lo_backing_file = NULL; spin_unlock_irq(&lo->lo_lock); + inc_diskseq(lo->lo_disk); + loop_release_xfer(lo); lo->transfer = NULL; lo->ioctl = NULL;