diff mbox series

[v1,1/5] drivercore: Revert "deferral race condition fix"

Message ID 20181110181101.24557-1-andriy.shevchenko@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/5] drivercore: Revert "deferral race condition fix" | expand

Commit Message

Andy Shevchenko Nov. 10, 2018, 6:10 p.m. UTC
Consider the following scenario.

There are two independent devices coupled together by functional dependencies:
 - USB OTG (dwc3-pci)
 - extcon (tested with extcon-intel-mrfld, not yet in upstream)

Each of the driver services a corresponding device is built as a module. In the
Buildroot environment the modules are probed by alphabetical ordering of their
modaliases. The latter comes to the case when USB OTG driver will be probed
first followed by extcon one.

So, if the platform anticipates extcon device to be appeared, in the above case
we will get deferred probe of USB OTG, because of ordering.

Now, a cherry on top of the cake, the deferred probing list contains
the only two modules, i.e. USB OTG and extcon. Due to above circumstances,
values in the local_trigger_count and deferred_trigger_count are not the same,
and thus provokes deferred probe triggering again and again.

...
[   20.678332] platform dwc3.0.auto: Retrying from deferred list
[   20.694743] platform dwc3.0.auto: Driver dwc3 requests probe deferral
[   20.701254] platform dwc3.0.auto: Added to deferred list
[   20.706620] platform dwc3.0.auto: driver_deferred_probe_add_trigger 1 2
[   20.713732] platform dwc3.0.auto: Retrying from deferred list
[   20.730035] platform dwc3.0.auto: Driver dwc3 requests probe deferral
[   20.736540] platform dwc3.0.auto: Added to deferred list
[   20.741889] platform dwc3.0.auto: driver_deferred_probe_add_trigger 3 4
[   20.748991] platform dwc3.0.auto: Retrying from deferred list
[   20.765416] platform dwc3.0.auto: Driver dwc3 requests probe deferral
[   20.771914] platform dwc3.0.auto: Added to deferred list
[   20.777279] platform dwc3.0.auto: driver_deferred_probe_add_trigger 5 6
...

Deeper investigation shows the culprit commit 58b116bce136
("drivercore: deferral race condition fix") which was dedicated to fix some
other issue while bringing a regression.

This reverts commit 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835 for good until
we will have better solution.

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/base/dd.c | 27 ++-------------------------
 1 file changed, 2 insertions(+), 25 deletions(-)

Comments

Greg Kroah-Hartman Nov. 10, 2018, 6:26 p.m. UTC | #1
On Sat, Nov 10, 2018 at 08:10:57PM +0200, Andy Shevchenko wrote:
> Consider the following scenario.
> 
> There are two independent devices coupled together by functional dependencies:
>  - USB OTG (dwc3-pci)
>  - extcon (tested with extcon-intel-mrfld, not yet in upstream)
> 
> Each of the driver services a corresponding device is built as a module. In the
> Buildroot environment the modules are probed by alphabetical ordering of their
> modaliases. The latter comes to the case when USB OTG driver will be probed
> first followed by extcon one.
> 
> So, if the platform anticipates extcon device to be appeared, in the above case
> we will get deferred probe of USB OTG, because of ordering.
> 
> Now, a cherry on top of the cake, the deferred probing list contains
> the only two modules, i.e. USB OTG and extcon. Due to above circumstances,
> values in the local_trigger_count and deferred_trigger_count are not the same,
> and thus provokes deferred probe triggering again and again.
> 
> ...
> [   20.678332] platform dwc3.0.auto: Retrying from deferred list
> [   20.694743] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.701254] platform dwc3.0.auto: Added to deferred list
> [   20.706620] platform dwc3.0.auto: driver_deferred_probe_add_trigger 1 2
> [   20.713732] platform dwc3.0.auto: Retrying from deferred list
> [   20.730035] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.736540] platform dwc3.0.auto: Added to deferred list
> [   20.741889] platform dwc3.0.auto: driver_deferred_probe_add_trigger 3 4
> [   20.748991] platform dwc3.0.auto: Retrying from deferred list
> [   20.765416] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.771914] platform dwc3.0.auto: Added to deferred list
> [   20.777279] platform dwc3.0.auto: driver_deferred_probe_add_trigger 5 6
> ...
> 
> Deeper investigation shows the culprit commit 58b116bce136
> ("drivercore: deferral race condition fix") which was dedicated to fix some
> other issue while bringing a regression.
> 
> This reverts commit 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835 for good until
> we will have better solution.
> 
> Cc: Grant Likely <grant.likely@linaro.org>
> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Andrzej Hajda <a.hajda@samsung.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/base/dd.c | 27 ++-------------------------
>  1 file changed, 2 insertions(+), 25 deletions(-)

Shouldn't there be a "Fixes:" line and cc: stable here?

thanks,

greg k-h
Andy Shevchenko Nov. 10, 2018, 6:36 p.m. UTC | #2
On Sat, Nov 10, 2018 at 10:26:22AM -0800, Greg Kroah-Hartman wrote:
> On Sat, Nov 10, 2018 at 08:10:57PM +0200, Andy Shevchenko wrote:
> > Consider the following scenario.
> > 
> > There are two independent devices coupled together by functional dependencies:
> >  - USB OTG (dwc3-pci)
> >  - extcon (tested with extcon-intel-mrfld, not yet in upstream)
> > 
> > Each of the driver services a corresponding device is built as a module. In the
> > Buildroot environment the modules are probed by alphabetical ordering of their
> > modaliases. The latter comes to the case when USB OTG driver will be probed
> > first followed by extcon one.
> > 
> > So, if the platform anticipates extcon device to be appeared, in the above case
> > we will get deferred probe of USB OTG, because of ordering.
> > 
> > Now, a cherry on top of the cake, the deferred probing list contains
> > the only two modules, i.e. USB OTG and extcon. Due to above circumstances,
> > values in the local_trigger_count and deferred_trigger_count are not the same,
> > and thus provokes deferred probe triggering again and again.
> > 
> > ...
> > [   20.678332] platform dwc3.0.auto: Retrying from deferred list
> > [   20.694743] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> > [   20.701254] platform dwc3.0.auto: Added to deferred list
> > [   20.706620] platform dwc3.0.auto: driver_deferred_probe_add_trigger 1 2
> > [   20.713732] platform dwc3.0.auto: Retrying from deferred list
> > [   20.730035] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> > [   20.736540] platform dwc3.0.auto: Added to deferred list
> > [   20.741889] platform dwc3.0.auto: driver_deferred_probe_add_trigger 3 4
> > [   20.748991] platform dwc3.0.auto: Retrying from deferred list
> > [   20.765416] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> > [   20.771914] platform dwc3.0.auto: Added to deferred list
> > [   20.777279] platform dwc3.0.auto: driver_deferred_probe_add_trigger 5 6
> > ...
> > 
> > Deeper investigation shows the culprit commit 58b116bce136
> > ("drivercore: deferral race condition fix") which was dedicated to fix some
> > other issue while bringing a regression.
> > 
> > This reverts commit 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835 for good until
> > we will have better solution.
> > 
> > Cc: Grant Likely <grant.likely@linaro.org>
> > Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Mark Brown <broonie@kernel.org>
> > Cc: Felipe Balbi <balbi@kernel.org>
> > Cc: Andrzej Hajda <a.hajda@samsung.com>
> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > ---
> >  drivers/base/dd.c | 27 ++-------------------------
> >  1 file changed, 2 insertions(+), 25 deletions(-)
> 
> Shouldn't there be a "Fixes:" line and cc: stable here?

I'm not sure (yet). I would like to hear from people first, especially from
Grant (I spoke to him already for the matters at ELCE in Edinburg).

Perhaps, Hans can have a chance to test this and comment on.
Hans de Goede Nov. 11, 2018, 11:45 a.m. UTC | #3
Hi,

On 11/10/18 7:36 PM, Andy Shevchenko wrote:
> On Sat, Nov 10, 2018 at 10:26:22AM -0800, Greg Kroah-Hartman wrote:
>> On Sat, Nov 10, 2018 at 08:10:57PM +0200, Andy Shevchenko wrote:
>>> Consider the following scenario.
>>>
>>> There are two independent devices coupled together by functional dependencies:
>>>   - USB OTG (dwc3-pci)
>>>   - extcon (tested with extcon-intel-mrfld, not yet in upstream)
>>>
>>> Each of the driver services a corresponding device is built as a module. In the
>>> Buildroot environment the modules are probed by alphabetical ordering of their
>>> modaliases. The latter comes to the case when USB OTG driver will be probed
>>> first followed by extcon one.
>>>
>>> So, if the platform anticipates extcon device to be appeared, in the above case
>>> we will get deferred probe of USB OTG, because of ordering.
>>>
>>> Now, a cherry on top of the cake, the deferred probing list contains
>>> the only two modules, i.e. USB OTG and extcon. Due to above circumstances,
>>> values in the local_trigger_count and deferred_trigger_count are not the same,
>>> and thus provokes deferred probe triggering again and again.
>>>
>>> ...
>>> [   20.678332] platform dwc3.0.auto: Retrying from deferred list
>>> [   20.694743] platform dwc3.0.auto: Driver dwc3 requests probe deferral
>>> [   20.701254] platform dwc3.0.auto: Added to deferred list
>>> [   20.706620] platform dwc3.0.auto: driver_deferred_probe_add_trigger 1 2
>>> [   20.713732] platform dwc3.0.auto: Retrying from deferred list
>>> [   20.730035] platform dwc3.0.auto: Driver dwc3 requests probe deferral
>>> [   20.736540] platform dwc3.0.auto: Added to deferred list
>>> [   20.741889] platform dwc3.0.auto: driver_deferred_probe_add_trigger 3 4
>>> [   20.748991] platform dwc3.0.auto: Retrying from deferred list
>>> [   20.765416] platform dwc3.0.auto: Driver dwc3 requests probe deferral
>>> [   20.771914] platform dwc3.0.auto: Added to deferred list
>>> [   20.777279] platform dwc3.0.auto: driver_deferred_probe_add_trigger 5 6
>>> ...
>>>
>>> Deeper investigation shows the culprit commit 58b116bce136
>>> ("drivercore: deferral race condition fix") which was dedicated to fix some
>>> other issue while bringing a regression.
>>>
>>> This reverts commit 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835 for good until
>>> we will have better solution.
>>>
>>> Cc: Grant Likely <grant.likely@linaro.org>
>>> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: Mark Brown <broonie@kernel.org>
>>> Cc: Felipe Balbi <balbi@kernel.org>
>>> Cc: Andrzej Hajda <a.hajda@samsung.com>
>>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>>> ---
>>>   drivers/base/dd.c | 27 ++-------------------------
>>>   1 file changed, 2 insertions(+), 25 deletions(-)
>>
>> Shouldn't there be a "Fixes:" line and cc: stable here?
> 
> I'm not sure (yet). I would like to hear from people first, especially from
> Grant (I spoke to him already for the matters at ELCE in Edinburg).
> 
> Perhaps, Hans can have a chance to test this and comment on.

I'm currently hitting a regression in 4.20-rc1 which causes it to not
boot on my Cherry Trail test devices (sdhci driver times out so it
cannot find its root filesystem).

I'm currently debugging this (it looks like I need todo a full bisect to
find the cause) once I've that working, if I hit something which look
likes you've described I will give this patch a test. But currently I
cannot reproduce the problem you describe.

As for patches 2-55, they look good to me:

Ackeed-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans
Peter Ujfalusi Nov. 12, 2018, 4:11 p.m. UTC | #4
Hi Andy,

On 2018-11-10 20:10, Andy Shevchenko wrote:
> Consider the following scenario.
> 
> There are two independent devices coupled together by functional dependencies:
>  - USB OTG (dwc3-pci)
>  - extcon (tested with extcon-intel-mrfld, not yet in upstream)
> 
> Each of the driver services a corresponding device is built as a module. In the
> Buildroot environment the modules are probed by alphabetical ordering of their
> modaliases. The latter comes to the case when USB OTG driver will be probed
> first followed by extcon one.
> 
> So, if the platform anticipates extcon device to be appeared, in the above case
> we will get deferred probe of USB OTG, because of ordering.
> 
> Now, a cherry on top of the cake, the deferred probing list contains
> the only two modules, i.e. USB OTG and extcon. Due to above circumstances,
> values in the local_trigger_count and deferred_trigger_count are not the same,
> and thus provokes deferred probe triggering again and again.
> 
> ...
> [   20.678332] platform dwc3.0.auto: Retrying from deferred list
> [   20.694743] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.701254] platform dwc3.0.auto: Added to deferred list
> [   20.706620] platform dwc3.0.auto: driver_deferred_probe_add_trigger 1 2
> [   20.713732] platform dwc3.0.auto: Retrying from deferred list
> [   20.730035] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.736540] platform dwc3.0.auto: Added to deferred list
> [   20.741889] platform dwc3.0.auto: driver_deferred_probe_add_trigger 3 4
> [   20.748991] platform dwc3.0.auto: Retrying from deferred list
> [   20.765416] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> [   20.771914] platform dwc3.0.auto: Added to deferred list
> [   20.777279] platform dwc3.0.auto: driver_deferred_probe_add_trigger 5 6
> ...
> 
> Deeper investigation shows the culprit commit 58b116bce136
> ("drivercore: deferral race condition fix") which was dedicated to fix some
> other issue while bringing a regression.
> 
> This reverts commit 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835 for good until
> we will have better solution.

if we revert the commit then the original issue will re-surfaces. afaik
it was not only audio which hit the 'last driver to be probed from the
deferred list would never probe, unless we provoke the kernel to load
additional module, or remove/reload the module' issue.

Do I understand correctly that in your case you have two modules
(dwc3-pci and extcon-intel-mrfld) in a deferred probe loop, iow both of
the drivers returns -EPROBE_DEFER and they just spin?

If both is deferring, how this supposed to work?

If we revert 58b116bce13612e5aa6fcd49ecbd4cf8bb59e835, then you might be
hitting the very same issue as described by the commit:
s/davinci_evm sound.3/dwc3-pci
s/davinci-mcasp 4803c000.mcasp/extcon-intel-mrfld

Am I missing something?

> 
> Cc: Grant Likely <grant.likely@linaro.org>
> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Andrzej Hajda <a.hajda@samsung.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/base/dd.c | 27 ++-------------------------
>  1 file changed, 2 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 169412ee4ae8..9a966e45fda5 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -53,7 +53,6 @@
>  static DEFINE_MUTEX(deferred_probe_mutex);
>  static LIST_HEAD(deferred_probe_pending_list);
>  static LIST_HEAD(deferred_probe_active_list);
> -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
>  static struct dentry *deferred_devices;
>  static bool initcalls_done;
>  
> @@ -143,17 +142,6 @@ static bool driver_deferred_probe_enable = false;
>   * This functions moves all devices from the pending list to the active
>   * list and schedules the deferred probe workqueue to process them.  It
>   * should be called anytime a driver is successfully bound to a device.
> - *
> - * Note, there is a race condition in multi-threaded probe. In the case where
> - * more than one device is probing at the same time, it is possible for one
> - * probe to complete successfully while another is about to defer. If the second
> - * depends on the first, then it will get put on the pending list after the
> - * trigger event has already occurred and will be stuck there.
> - *
> - * The atomic 'deferred_trigger_count' is used to determine if a successful
> - * trigger has occurred in the midst of probing a driver. If the trigger count
> - * changes in the midst of a probe, then deferred processing should be triggered
> - * again.
>   */
>  static void driver_deferred_probe_trigger(void)
>  {
> @@ -166,7 +154,6 @@ static void driver_deferred_probe_trigger(void)
>  	 * into the active list so they can be retried by the workqueue
>  	 */
>  	mutex_lock(&deferred_probe_mutex);
> -	atomic_inc(&deferred_trigger_count);
>  	list_splice_tail_init(&deferred_probe_pending_list,
>  			      &deferred_probe_active_list);
>  	mutex_unlock(&deferred_probe_mutex);
> @@ -434,19 +421,9 @@ EXPORT_SYMBOL_GPL(device_bind_driver);
>  static atomic_t probe_count = ATOMIC_INIT(0);
>  static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
>  
> -static void driver_deferred_probe_add_trigger(struct device *dev,
> -					      int local_trigger_count)
> -{
> -	driver_deferred_probe_add(dev);
> -	/* Did a trigger occur while probing? Need to re-trigger if yes */
> -	if (local_trigger_count != atomic_read(&deferred_trigger_count))
> -		driver_deferred_probe_trigger();
> -}
> -
>  static int really_probe(struct device *dev, struct device_driver *drv)
>  {
>  	int ret = -EPROBE_DEFER;
> -	int local_trigger_count = atomic_read(&deferred_trigger_count);
>  	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
>  			   !drv->suppress_bind_attrs;
>  
> @@ -463,7 +440,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  
>  	ret = device_links_check_suppliers(dev);
>  	if (ret == -EPROBE_DEFER)
> -		driver_deferred_probe_add_trigger(dev, local_trigger_count);
> +		driver_deferred_probe_add(dev);
>  	if (ret)
>  		return ret;
>  
> @@ -559,7 +536,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  	case -EPROBE_DEFER:
>  		/* Driver requested deferred probing */
>  		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> -		driver_deferred_probe_add_trigger(dev, local_trigger_count);
> +		driver_deferred_probe_add(dev);
>  		break;
>  	case -ENODEV:
>  	case -ENXIO:
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Mark Brown Nov. 14, 2018, 12:33 a.m. UTC | #5
On Mon, Nov 12, 2018 at 06:11:26PM +0200, Peter Ujfalusi wrote:

> if we revert the commit then the original issue will re-surfaces. afaik
> it was not only audio which hit the 'last driver to be probed from the
> deferred list would never probe, unless we provoke the kernel to load
> additional module, or remove/reload the module' issue.

Right, aren't we just going to be swapping one bug for another?

> Do I understand correctly that in your case you have two modules
> (dwc3-pci and extcon-intel-mrfld) in a deferred probe loop, iow both of
> the drivers returns -EPROBE_DEFER and they just spin?

> If both is deferring, how this supposed to work?

I'm struggling to follow the original explanation too :(
Andy Shevchenko Nov. 14, 2018, 8:45 a.m. UTC | #6
On Wed, Nov 14, 2018 at 2:34 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Mon, Nov 12, 2018 at 06:11:26PM +0200, Peter Ujfalusi wrote:
>
> > if we revert the commit then the original issue will re-surfaces. afaik
> > it was not only audio which hit the 'last driver to be probed from the
> > deferred list would never probe, unless we provoke the kernel to load
> > additional module, or remove/reload the module' issue.
>
> Right, aren't we just going to be swapping one bug for another?

Have anyone in possession of Davinchi tested most recent kernel with
this revert?

> > Do I understand correctly that in your case you have two modules
> > (dwc3-pci and extcon-intel-mrfld) in a deferred probe loop, iow both of
> > the drivers returns -EPROBE_DEFER and they just spin?
>
> > If both is deferring, how this supposed to work?
>
> I'm struggling to follow the original explanation too :(

Sorry, guys, I confused a nit myself. The bug is there, but
exxplanation is not fully corrent, indeed. I'll come back with more
details later.
Peter Ujfalusi Nov. 14, 2018, 10:19 a.m. UTC | #7
Andy,

On 2018-11-14 10:45, Andy Shevchenko wrote:
> On Wed, Nov 14, 2018 at 2:34 AM Mark Brown <broonie@kernel.org> wrote:
>>
>> On Mon, Nov 12, 2018 at 06:11:26PM +0200, Peter Ujfalusi wrote:
>>
>>> if we revert the commit then the original issue will re-surfaces. afaik
>>> it was not only audio which hit the 'last driver to be probed from the
>>> deferred list would never probe, unless we provoke the kernel to load
>>> additional module, or remove/reload the module' issue.
>>
>> Right, aren't we just going to be swapping one bug for another?
> 
> Have anyone in possession of Davinchi tested most recent kernel with
> this revert?

I don't think it is related to daVinci, in fact we have seen the very
same issue with OMAP4.
Fwiw this was my initial patch
http://lkml.iu.edu/hypermail/linux/kernel/1404.0/01603.html
Grant based his change on this.

Note the part in the commit message:
"The issue has been observed on embedded systems with CONFIG_PREEMPT
enabled, audio support built as modules and using nfsroot for root
filesystem."

as I recall I was only able to reproduce the stall with nfsroot and
buildroot fs. The timings were different compared to MMC rootfs.

Anyways the patch fixes a real race condition which is generic:
We have driverA and driverB as modules. driverB needs driverA to be
registered to a framework (no direct, hard dependency).

1. driverA starts to probe
2. driverB starts to probe
3. driverB can not continue as driverA is not registered itself to the
framework -> driverB will defer
4. driverA registers to the framework
5. driverA probes successfully
6. driver core will prepare the deferred probe list (note: driverB is
still in it's probe, not in the deferred list)
7. driverB returns from it's probe with -EPROBE_DEFER

and here we are, driverB is alone in the deferred list and the list is
not going to be tried as driverA and driverB were the last ones to
probe.

So, driverB is in the deferred list and stays there until other driver
probes successfully as the deferred driver list only tried when a driver
loads successfully (to see if that satisfy any of the deferred driver).

We have evidence that this has happened, it is a generic issue given
that now days we have more frameworks drivers are relying on.

>>> Do I understand correctly that in your case you have two modules
>>> (dwc3-pci and extcon-intel-mrfld) in a deferred probe loop, iow both of
>>> the drivers returns -EPROBE_DEFER and they just spin?
>>
>>> If both is deferring, how this supposed to work?
>>
>> I'm struggling to follow the original explanation too :(
> 
> Sorry, guys, I confused a nit myself. The bug is there, but
> exxplanation is not fully corrent, indeed. I'll come back with more
> details later.
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
diff mbox series

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 169412ee4ae8..9a966e45fda5 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,7 +53,6 @@ 
 static DEFINE_MUTEX(deferred_probe_mutex);
 static LIST_HEAD(deferred_probe_pending_list);
 static LIST_HEAD(deferred_probe_active_list);
-static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
 static struct dentry *deferred_devices;
 static bool initcalls_done;
 
@@ -143,17 +142,6 @@  static bool driver_deferred_probe_enable = false;
  * This functions moves all devices from the pending list to the active
  * list and schedules the deferred probe workqueue to process them.  It
  * should be called anytime a driver is successfully bound to a device.
- *
- * Note, there is a race condition in multi-threaded probe. In the case where
- * more than one device is probing at the same time, it is possible for one
- * probe to complete successfully while another is about to defer. If the second
- * depends on the first, then it will get put on the pending list after the
- * trigger event has already occurred and will be stuck there.
- *
- * The atomic 'deferred_trigger_count' is used to determine if a successful
- * trigger has occurred in the midst of probing a driver. If the trigger count
- * changes in the midst of a probe, then deferred processing should be triggered
- * again.
  */
 static void driver_deferred_probe_trigger(void)
 {
@@ -166,7 +154,6 @@  static void driver_deferred_probe_trigger(void)
 	 * into the active list so they can be retried by the workqueue
 	 */
 	mutex_lock(&deferred_probe_mutex);
-	atomic_inc(&deferred_trigger_count);
 	list_splice_tail_init(&deferred_probe_pending_list,
 			      &deferred_probe_active_list);
 	mutex_unlock(&deferred_probe_mutex);
@@ -434,19 +421,9 @@  EXPORT_SYMBOL_GPL(device_bind_driver);
 static atomic_t probe_count = ATOMIC_INIT(0);
 static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
 
-static void driver_deferred_probe_add_trigger(struct device *dev,
-					      int local_trigger_count)
-{
-	driver_deferred_probe_add(dev);
-	/* Did a trigger occur while probing? Need to re-trigger if yes */
-	if (local_trigger_count != atomic_read(&deferred_trigger_count))
-		driver_deferred_probe_trigger();
-}
-
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	int ret = -EPROBE_DEFER;
-	int local_trigger_count = atomic_read(&deferred_trigger_count);
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
 			   !drv->suppress_bind_attrs;
 
@@ -463,7 +440,7 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 
 	ret = device_links_check_suppliers(dev);
 	if (ret == -EPROBE_DEFER)
-		driver_deferred_probe_add_trigger(dev, local_trigger_count);
+		driver_deferred_probe_add(dev);
 	if (ret)
 		return ret;
 
@@ -559,7 +536,7 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 	case -EPROBE_DEFER:
 		/* Driver requested deferred probing */
 		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
-		driver_deferred_probe_add_trigger(dev, local_trigger_count);
+		driver_deferred_probe_add(dev);
 		break;
 	case -ENODEV:
 	case -ENXIO: