Message ID | 20150930151449.GC26299@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello, Christoph. On Wed, Sep 30, 2015 at 05:14:49PM +0200, Christoph Hellwig wrote: > The problem is that async probing deadlocks vs a synchronous > request_module, as Tejun figured out based on the thrad in > http://thread.gmane.org/gmane.linux.kernel/1420814 > > Tejun, if I understand the thread and your patch right there really > isn't any good altenative to a synchronous request_module, and you > thus disabled autoloading elevator modules, right? Yeap, that's right. IIRC, the conclusion was "let's not do that". Thanks.
On Wed, 2015-09-30 at 17:53 -0400, Tejun Heo wrote: > Hello, Christoph. > > On Wed, Sep 30, 2015 at 05:14:49PM +0200, Christoph Hellwig wrote: > > The problem is that async probing deadlocks vs a synchronous > > request_module, as Tejun figured out based on the thrad in > > http://thread.gmane.org/gmane.linux.kernel/1420814 > > > > Tejun, if I understand the thread and your patch right there really > > isn't any good altenative to a synchronous request_module, and you > > thus disabled autoloading elevator modules, right? > > Yeap, that's right. IIRC, the conclusion was "let's not do that". Perhaps we don't have to be that draconian. There's no real reason we can't autoload asynchronously. If the module isn't ready by the time we come to check the attachment, then it will attach to the device later when the module init routine runs. Should we do anything to limit the module_request floods? This will likely happen for every LUN of an ALUA system ... there can be hundreds of those. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 30, 2015 at 05:14:49PM +0200, Christoph Hellwig wrote: > > Paul, can you try the trivial one liner below? > > diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c > index edb044a..fbc9502 100644 > --- a/drivers/scsi/scsi_dh.c > +++ b/drivers/scsi/scsi_dh.c > @@ -391,7 +391,7 @@ int scsi_dh_attach(struct request_queue *q, const char *name) > if (!sdev) > return -ENODEV; > > - scsi_dh = scsi_dh_lookup(name); > + scsi_dh = __scsi_dh_lookup(name); > if (!scsi_dh) { > err = -EINVAL; > goto out_put_device; I still get the warning: WARNING: at /home/paulus/kernel/kvm/kernel/kmod.c:140 Modules linked in: radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3(+) ptp ipr cxgb3 pps_core i2c_core mdio dm_multipath CPU: 11 PID: 7 Comm: kworker/u321:0 Not tainted 4.3.0-rc2-kvm+ #73 Workqueue: events_unbound async_run_entry_fn task: c000000ff0545080 ti: c000000ff0610000 task.ti: c000000ff0610000 ... NIP [c0000000000d390c] __request_module+0x21c/0x380 LR [c0000000000d38f4] __request_module+0x204/0x380 Call Trace: [c000000ff0613920] [c0000000000d38f4] __request_module+0x204/0x380 (unreliable) [c000000ff06139e0] [c0000000006369d4] scsi_dh_lookup+0x64/0x80 [c000000ff0613a50] [c000000000636fcc] scsi_dh_add_device+0x13c/0x170 [c000000ff0613aa0] [c000000000630ea4] scsi_sysfs_add_sdev+0x114/0x380 [c000000ff0613b30] [c00000000062e040] do_scan_async+0xf0/0x240 [c000000ff0613bc0] [c0000000000e6bc0] async_run_entry_fn+0xa0/0x200 [c000000ff0613c50] [c0000000000d9750] process_one_work+0x1a0/0x4b0 [c000000ff0613ce0] [c0000000000d9bf0] worker_thread+0x190/0x5f0 [c000000ff0613d80] [c0000000000e21b0] kthread+0x110/0x130 [c000000ff0613e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac So it's the call to scsi_dh_lookup() in scsi_dh_add_device() that is causing the warning this time. Do you want me to add the "__" to that one too? Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 30, 2015 at 03:34:54PM -0700, James Bottomley wrote: > Perhaps we don't have to be that draconian. There's no real reason we > can't autoload asynchronously. If the module isn't ready by the time we > come to check the attachment, then it will attach to the device later > when the module init routine runs. I don't think this works. We're attaching just after the request_module call, so modprobe doesn't really have a chance to finish before we return. > Should we do anything to limit the module_request floods? This will > likely happen for every LUN of an ALUA system ... there can be hundreds > of those. Once the alua module is loaded there won't be any request_module calls. If you build with device handler support but without ALUA support you'll get a lot of calls, but that's not any different than probing for other optional features. Personally I think we probably should simple build ALUA support into scsi_mod if SCSI_DH is selected, as that's part of the standard and supported by any modern multi pathing device. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2015-10-02 at 14:56 +0200, Christoph Hellwig wrote: > On Wed, Sep 30, 2015 at 03:34:54PM -0700, James Bottomley wrote: > > Perhaps we don't have to be that draconian. There's no real reason we > > can't autoload asynchronously. If the module isn't ready by the time we > > come to check the attachment, then it will attach to the device later > > when the module init routine runs. > > I don't think this works. We're attaching just after the request_module > call, so modprobe doesn't really have a chance to finish before we > return. Yes, I suspect it will mostly happen in the alua attach, except for large LUN sets. > > Should we do anything to limit the module_request floods? This will > > likely happen for every LUN of an ALUA system ... there can be hundreds > > of those. > > Once the alua module is loaded there won't be any request_module calls. > If you build with device handler support but without ALUA support > you'll get a lot of calls, but that's not any different than probing > for other optional features. That doesn't matter: if you modprobe alua after all devices are discovered, it will attach correctly to all potential devices from the alua module_init. This means the effect is the same whether the request_module is sync or async ... the object is to get the device attached to alua if it is an alua device. The only drawback of an async request_module is that we get a flood of request_modules()s from every LUN on an array until alua is loaded. > Personally I think we probably should simple build ALUA support into > scsi_mod if SCSI_DH is selected, as that's part of the standard and > supported by any modern multi pathing device. I suppose; the code size is roughly the same as scsi_dh.o which we just made non modular. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 02, 2015 at 06:25:01AM -0700, James Bottomley wrote: > That doesn't matter: if you modprobe alua after all devices are > discovered, it will attach correctly to all potential devices from the > alua module_init. This means the effect is the same whether the > request_module is sync or async ... the object is to get the device > attached to alua if it is an alua device. No, in 4.3-rc it won't. We removed that feature. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2015-10-02 at 15:34 +0200, Christoph Hellwig wrote: > On Fri, Oct 02, 2015 at 06:25:01AM -0700, James Bottomley wrote: > > That doesn't matter: if you modprobe alua after all devices are > > discovered, it will attach correctly to all potential devices from the > > alua module_init. This means the effect is the same whether the > > request_module is sync or async ... the object is to get the device > > attached to alua if it is an alua device. > > No, in 4.3-rc it won't. We removed that feature. I think I prefer restoring that to having to build in every dh module to get them to work. If we take your proposed fix for the sync module load in the current scheme, any non-built in modules would never attach, so we'd be moving towards the conclusion that *every* device handler has to be non-modular. Skimming the code it looks like dh should be using the driver binding model rather than reinventing it. That would decouple it better and make sure binding happened regardless of when the module was loaded. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 02, 2015 at 06:44:57AM -0700, James Bottomley wrote: > I think I prefer restoring that to having to build in every dh module to > get them to work. If we take your proposed fix for the sync module load > in the current scheme, any non-built in modules would never attach, so > we'd be moving towards the conclusion that *every* device handler has to > be non-modular. You don't need to build every module in to make it work. In 4.2 and earlier we already only auto load modules when dm-multipath explicitly attaches to them. That will still work in 4.3+. In fact we will now autoload when activating through sysfs as well. With the change I sent to Paul we still won't autoload at scan time, which would be really useful to have, but wasn't implemented previously. > Skimming the code it looks like dh should be using the driver binding > model rather than reinventing it. That would decouple it better and > make sure binding happened regardless of when the module was loaded. I tried this early on but gave up because I ran into too many problems. I can try to give it a spin again. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/04/2015 09:45 AM, Christoph Hellwig wrote: > On Fri, Oct 02, 2015 at 06:44:57AM -0700, James Bottomley wrote: >> I think I prefer restoring that to having to build in every dh module to >> get them to work. If we take your proposed fix for the sync module load >> in the current scheme, any non-built in modules would never attach, so >> we'd be moving towards the conclusion that *every* device handler has to >> be non-modular. > > You don't need to build every module in to make it work. In 4.2 and earlier > we already only auto load modules when dm-multipath explicitly attaches > to them. That will still work in 4.3+. In fact we will now autoload > when activating through sysfs as well. With the change I sent to Paul > we still won't autoload at scan time, which would be really useful to have, > but wasn't implemented previously. > >> Skimming the code it looks like dh should be using the driver binding >> model rather than reinventing it. That would decouple it better and >> make sure binding happened regardless of when the module was loaded. > > I tried this early on but gave up because I ran into too many problems. > I can try to give it a spin again. You cannot easily use the driver model here as the scsi_device is already (potentially) bound to the ULDs. If you were to go with the driver model you'd have to introduce another sub device between scsi_target and scsi_device. Actually I have been thinking that, as it might make my life for the ALUA handler easier. However, this would be quite a largish redesign of the current handler infrastructure, pushing out my ALUA handler update even more. So I'd like to have the ALUA changes ironed out first and merged, and then work on a redesigned device handler infrastructure. Cheers, Hannes
On Mon, Oct 12, 2015 at 02:45:38PM +0200, Hannes Reinecke wrote: > You cannot easily use the driver model here as the scsi_device is > already (potentially) bound to the ULDs. > If you were to go with the driver model you'd have to introduce > another sub device between scsi_target and scsi_device. You can have two struct devices in struct scsi_device, while it's not pretty there are plenty of example all over the kernel with multiple devices in a single containing structure. > Actually I have been thinking that, as it might make my life for the > ALUA handler easier. However, this would be quite a largish redesign > of the current handler infrastructure, pushing out my ALUA handler > update even more. > So I'd like to have the ALUA changes ironed out first and merged, > and then work on a redesigned device handler infrastructure. Fine with me. As mentioned before we've never supported autoloading the device handler modules at boot time - we only ever loaded them when explicitly attaching them through device mapper. Waiting another release or maybe two to finally get there isn't the end of the world. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-10-12 at 14:45 +0200, Hannes Reinecke wrote: > On 10/04/2015 09:45 AM, Christoph Hellwig wrote: > > On Fri, Oct 02, 2015 at 06:44:57AM -0700, James Bottomley wrote: > >> I think I prefer restoring that to having to build in every dh module to > >> get them to work. If we take your proposed fix for the sync module load > >> in the current scheme, any non-built in modules would never attach, so > >> we'd be moving towards the conclusion that *every* device handler has to > >> be non-modular. > > > > You don't need to build every module in to make it work. In 4.2 and earlier > > we already only auto load modules when dm-multipath explicitly attaches > > to them. That will still work in 4.3+. In fact we will now autoload > > when activating through sysfs as well. With the change I sent to Paul > > we still won't autoload at scan time, which would be really useful to have, > > but wasn't implemented previously. > > > >> Skimming the code it looks like dh should be using the driver binding > >> model rather than reinventing it. That would decouple it better and > >> make sure binding happened regardless of when the module was loaded. > > > > I tried this early on but gave up because I ran into too many problems. > > I can try to give it a spin again. > > You cannot easily use the driver model here as the scsi_device is > already (potentially) bound to the ULDs. > If you were to go with the driver model you'd have to introduce > another sub device between scsi_target and scsi_device. I was thinking more like what we do today for the ULD's: 3 of them use the driver binding and one uses the class interface model. Why can't we also use the class interface model for dh? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 12, 2015 at 10:39 AM, Christoph Hellwig <hch@lst.de> wrote: > On Mon, Oct 12, 2015 at 02:45:38PM +0200, Hannes Reinecke wrote: >> You cannot easily use the driver model here as the scsi_device is >> already (potentially) bound to the ULDs. >> If you were to go with the driver model you'd have to introduce >> another sub device between scsi_target and scsi_device. > > You can have two struct devices in struct scsi_device, while it's > not pretty there are plenty of example all over the kernel with > multiple devices in a single containing structure. > >> Actually I have been thinking that, as it might make my life for the >> ALUA handler easier. However, this would be quite a largish redesign >> of the current handler infrastructure, pushing out my ALUA handler >> update even more. >> So I'd like to have the ALUA changes ironed out first and merged, >> and then work on a redesigned device handler infrastructure. > > Fine with me. As mentioned before we've never supported autoloading > the device handler modules at boot time - we only ever loaded them when > explicitly attaching them through device mapper. Waiting another > release or maybe two to finally get there isn't the end of the world. What may be getting lost during this discussion is that historically it has been important to be able to attach the scsi_dh during the SCSI scan. As the scsi_dh alters the SCSI midlayer's sense code processing (via callout to the attached scsi_dh). And this altered SCSI sense code handling amounts to the difference between a successful/quick boot versus hugely delayed and ultimately error-prone boot on systems with many LUNs that have multiple paths. So that is why either of these solutions were deployed: 1) in RHEL6 we'd require dracut to preload the scsi_dh modules early in loading the initramfs 2) in RHEL7 all scsi_dh modules _are_ builtin Both achieve the goal of having all required scsi_dh available during SCSI scan. Mike -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 12, 2015 at 03:29:45PM -0400, Mike Snitzer wrote: > What may be getting lost during this discussion is that historically > it has been important to be able to attach the scsi_dh during the SCSI > scan. As the scsi_dh alters the SCSI midlayer's sense code processing > (via callout to the attached scsi_dh). And this altered SCSI sense > code handling amounts to the difference between a successful/quick > boot versus hugely delayed and ultimately error-prone boot on systems > with many LUNs that have multiple paths. > > So that is why either of these solutions were deployed: > 1) in RHEL6 we'd require dracut to preload the scsi_dh modules early > in loading the initramfs > 2) in RHEL7 all scsi_dh modules _are_ builtin > > Both achieve the goal of having all required scsi_dh available during SCSI scan. Yes, and both are workarounds. I tried to implement this properly, but due to async probing it doesn't actually work. Given the statement from Tejun I don't really see how to ever get it to work as long as we use async probing and the module loading code isn't safe to call from the async probe path unfortunately. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/12/2015 09:36 PM, Christoph Hellwig wrote: > On Mon, Oct 12, 2015 at 03:29:45PM -0400, Mike Snitzer wrote: >> What may be getting lost during this discussion is that historically >> it has been important to be able to attach the scsi_dh during the SCSI >> scan. As the scsi_dh alters the SCSI midlayer's sense code processing >> (via callout to the attached scsi_dh). And this altered SCSI sense >> code handling amounts to the difference between a successful/quick >> boot versus hugely delayed and ultimately error-prone boot on systems >> with many LUNs that have multiple paths. >> >> So that is why either of these solutions were deployed: >> 1) in RHEL6 we'd require dracut to preload the scsi_dh modules early >> in loading the initramfs >> 2) in RHEL7 all scsi_dh modules _are_ builtin >> >> Both achieve the goal of having all required scsi_dh available during SCSI scan. > > Yes, and both are workarounds. I tried to implement this properly, > but due to async probing it doesn't actually work. Given the statement > from Tejun I don't really see how to ever get it to work as long as > we use async probing and the module loading code isn't safe to call > from the async probe path unfortunately. > We've originally designed them to be modular as some handler (notably emc, netapp, and alua) are mutually exclusive, ie you could load either the alua or one of the vendor specific ones. In the light of the discussion I think it would be better to upgrade the device handler to become their own transport class, to be hooked in between scsi_target and scsi_device. That way we would have a way of exposing the topology (target port groups etc) and would get rid of the probing problem. It would mean that effectively the device handler would be demodularized and become a compile-time option, but even that wouldn't be too much of an issue, seeing that most vendors load the modules unconditionally anyway. (BTW, SUSE also loads the modules unconditionally ...) Cheers, Hannes
On Tue, Oct 13, 2015 at 08:00:43AM +0200, Hannes Reinecke wrote: > In the light of the discussion I think it would be better to > upgrade the device handler to become their own transport class, > to be hooked in between scsi_target and scsi_device. > That way we would have a way of exposing the topology (target port > groups etc) and would get rid of the probing problem. I don't think a transport class makes sense here. A struct device or a class_interface as suggested by James sound much better. But even then unless we solve the request_module from async context problem we can't reliably load them as soon as we encounter the first ALUA/RDAC/EMC/HP device, so distribution would probably still need to pre-load them. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c index edb044a..fbc9502 100644 --- a/drivers/scsi/scsi_dh.c +++ b/drivers/scsi/scsi_dh.c @@ -391,7 +391,7 @@ int scsi_dh_attach(struct request_queue *q, const char *name) if (!sdev) return -ENODEV; - scsi_dh = scsi_dh_lookup(name); + scsi_dh = __scsi_dh_lookup(name); if (!scsi_dh) { err = -EINVAL; goto out_put_device;