Message ID | 1416280832-24609-1-git-send-email-inki.dae@samsung.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, 2014-11-18 at 12:20 +0900, Inki Dae wrote:
> This patch makes the deferred probe is tried up to 3 times in maximum.
> However, this is considered only for Exynos drm so I think other SoC
> drivers could also produce same issue. Therefore, the best way to resolve
> this issue, infinite loop incurred by deferred probe, would be that dd core
> is fixed up correctly.

At first sight this seems to make little to no sense. Unless I'm mistaken, this would cause the exynos drm probe to return -ENODEV to the dd core, causing it to stop trying to probe. That obviously breaks your infinite loop, but it also breaks situations where the probe needs to be retried more than 3 times.

I suspect that with this patch, once exynos DRM is loaded and validly needs to defer (in other words, when the required modules do exist but simply aren't loaded yet), it still jumps into an infinite loop, which you break after 3 tries. After that the display will simply never come up, even if everything is in place, because the core doesn't know it should re-probe.
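The objection above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `toy_probe()` and `toy_core_bind()` are hypothetical names, the "core" here simply retries in a loop (the real driver core re-runs deferred probes when other drivers bind), and `EPROBE_DEFER` is defined locally because it is a kernel-internal value. The counter check mirrors the order used in the patch: the cap is tested before the normal return value.

```c
#include <assert.h>
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace errno.h */
#endif

#define MAX_TRY_PROBE_DEFER 3

/* Hypothetical toy driver state; cap == 0 means "no retry cap". */
struct toy_driver {
	unsigned int tries;
	unsigned int cap;
};

/* Models the patched return path: cap check first, then defer or succeed. */
static int toy_probe(struct toy_driver *drv, int deps_ready)
{
	assert(drv);
	if (drv->cap && ++drv->tries > drv->cap) {
		drv->tries = 0;
		return -ENODEV;
	}
	return deps_ready ? 0 : -EPROBE_DEFER;
}

/*
 * Toy "driver core": keeps retrying while the probe defers and gives up
 * on any other error. The dependency becomes ready at attempt `ready_at`.
 * Returns the attempt number that bound the driver, or 0 if it never bound.
 */
static unsigned int toy_core_bind(struct toy_driver *drv,
				  unsigned int ready_at,
				  unsigned int max_rounds)
{
	for (unsigned int attempt = 1; attempt <= max_rounds; attempt++) {
		int ret = toy_probe(drv, attempt >= ready_at);

		if (ret == 0)
			return attempt;
		if (ret != -EPROBE_DEFER)
			return 0;	/* hard error: no further retries */
	}
	return 0;
}
```

In this model a driver whose dependency only appears on the fifth attempt binds when there is no cap, but with the cap it receives -ENODEV on the fourth attempt and is never retried, which is the failure mode described in the reply.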
On 2014-11-18 16:58, Sjoerd Simons wrote:
> On Tue, 2014-11-18 at 12:20 +0900, Inki Dae wrote:
>> This patch makes the deferred probe is tried up to 3 times in maximum.
>> However, this is considered only for Exynos drm so I think other SoC
>> drivers could also produce same issue. Therefore, the best way to resolve
>> this issue, infinite loop incurred by deferred probe, would be that dd core
>> is fixed up correctly.
>
> At first sight this seems to make little to no sense. Unless I'm
> mistaken this would cause the exynos drm probe return -ENODEV to the dd
> core, causing it to stop trying to probe. Which obviously breaks your
> infinite loop, it also breaks situations where the probe needs to be
> retried more than 3 times.

Right, but at least we could avoid the kernel boot failure, which is very critical. Please note that this patch is a temporary measure to avoid the boot failure, even though it could break deferred probe requests from Exynos drm. I will look into the dd core to find a more generic solution: I suspect the problem occurs when a driver is probed in the probe context of another driver, or it might really be a dd core bug.

Thanks,
Inki Dae

> I suspect with this patch once exynos DRM is loaded and actually validly
> needs to defer (iotw when the required modules do exist but simply
> aren't loaded just yet), it still jumps into an infinite loop which you
> break after 3 tries after which the display will simply never come up
> even if everything is in place because the core doesn't know it should
> re-probe....
Hello Inki,

> Right, but at least, we could avoid kernel booting failure which is very
> critical. Please know that this patch is temporary to avoid the kernel
> booting failure although deferred probe request of Exynos drm could be
> broken. For this, I will look into dd core to find out more generic way:
> I suspect that this might be incurred in case that a driver is probed in
> probe context of other driver or it might be really dd core bug.

I tried your patch on top of today's linux-next and I still see the same boot failure reported by Kevin on an Exynos5420 Peach Pit, so $subject does not fix the issue. The boot log is at [0], FYI.

By digging a bit I noticed that this happens when exynos_drm_platform_probe() calls platform_driver_register() to register the Exynos fimd platform driver. The problem is that in __driver_attach() the call to device_lock(dev->parent) never returns, and the thread sleeps forever waiting for the parent device's mutex to be released.

Do you have any idea why this could happen?

If I modify __driver_attach() to grab only the device lock and not its parent's lock, then the thread is able to take its own mutex and the platform driver registration succeeds, but then I see the infinite loop that was reported before, and the workaround in $subject does indeed avoid it.

So we have two issues here, and your patch is only a workaround for the latter.

Best regards,
Javier

[0]:
[    1.324091] [drm] Initialized drm 1.1.0 20060810
[  240.158665] random: nonblocking pool is initialized
[  240.162202] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  240.168493] Not tainted 3.18.0-rc4-next-20141117-00001-g85466f9 #22
[  240.175256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.183064] swapper/0 D c045bb00 0 1 0 0x00000000
[  240.189410] [<c045bb00>] (__schedule) from [<c045c230>] (schedule_preempt_disabled+0x14/0x20)
[  240.197904] [<c045c230>] (schedule_preempt_disabled) from [<c045e998>] (__mutex_lock_slowpath+0x19c/0x3f4)
[  240.207531] [<c045e998>] (__mutex_lock_slowpath) from [<c045ebfc>] (mutex_lock+0xc/0x24)
[  240.215599] [<c045ebfc>] (mutex_lock) from [<c0281718>] (__driver_attach+0x44/0x90)
[  240.223239] [<c0281718>] (__driver_attach) from [<c027ff30>] (bus_for_each_dev+0x54/0x88)
[  240.231387] [<c027ff30>] (bus_for_each_dev) from [<c0280da0>] (bus_add_driver+0xd8/0x1cc)
[  240.239541] [<c0280da0>] (bus_add_driver) from [<c0281d80>] (driver_register+0x78/0xf4)
[  240.247523] [<c0281d80>] (driver_register) from [<c0274324>] (exynos_drm_platform_probe+0x34/0x188)
[  240.256546] [<c0274324>] (exynos_drm_platform_probe) from [<c02829d8>] (platform_drv_probe+0x48/0x98)
[  240.265739] [<c02829d8>] (platform_drv_probe) from [<c02815b4>] (driver_probe_device+0x114/0x234)
[  240.274588] [<c02815b4>] (driver_probe_device) from [<c0281760>] (__driver_attach+0x8c/0x90)
[  240.283003] [<c0281760>] (__driver_attach) from [<c027ff30>] (bus_for_each_dev+0x54/0x88)
[  240.291158] [<c027ff30>] (bus_for_each_dev) from [<c0280da0>] (bus_add_driver+0xd8/0x1cc)
[  240.299311] [<c0280da0>] (bus_add_driver) from [<c0281d80>] (driver_register+0x78/0xf4)
[  240.307293] [<c0281d80>] (driver_register) from [<c02742a4>] (exynos_drm_init+0x84/0xd0)
[  240.315362] [<c02742a4>] (exynos_drm_init) from [<c0008944>] (do_one_initcall+0x80/0x1d0)
[  240.323521] [<c0008944>] (do_one_initcall) from [<c0624d3c>] (kernel_init_freeable+0x108/0x1d4)
[  240.332191] [<c0624d3c>] (kernel_init_freeable) from [<c0457224>] (kernel_init+0x8/0xe4)
[  240.340261] [<c0457224>] (kernel_init) from [<c000e638>] (ret_from_fork+0x14/0x3c)
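The nested call chain in the trace (a driver registered from inside another driver's probe) can be modelled in userspace as follows. This is a hedged sketch, not the kernel's implementation: all `toy_*` names are hypothetical, the lock reports -EDEADLK where a real mutex would sleep forever, and the assumption that the nested device shares a parent whose lock the probing thread already holds is an illustration only; the message above does not state which lock is actually held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: reports -EDEADLK where a real mutex would block. */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return -EDEADLK;
	lock->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

struct toy_device {
	struct toy_device *parent;
	struct toy_device *nested;	/* device attached from within probe */
	struct toy_lock lock;
	int (*probe)(struct toy_device *dev, bool lock_parent);
};

/* Lock order as described for __driver_attach(): parent first, then device. */
static int toy_driver_attach(struct toy_device *dev, bool lock_parent)
{
	bool parent_locked = lock_parent && dev->parent;
	int ret;

	if (parent_locked) {
		ret = toy_lock_acquire(&dev->parent->lock);
		if (ret)
			return ret;
	}
	ret = toy_lock_acquire(&dev->lock);
	if (!ret) {
		ret = dev->probe ? dev->probe(dev, lock_parent) : 0;
		toy_lock_release(&dev->lock);
	}
	if (parent_locked)
		toy_lock_release(&dev->parent->lock);
	return ret;
}

/* Probe that registers a second driver, i.e. attaches another device. */
static int toy_master_probe(struct toy_device *dev, bool lock_parent)
{
	return toy_driver_attach(dev->nested, lock_parent);
}

/* Assumed topology: master and sub-device share one parent ("bus"). */
static int toy_nested_registration(bool lock_parent)
{
	struct toy_device bus = { 0 };
	struct toy_device sub = { .parent = &bus };
	struct toy_device master = {
		.parent = &bus, .nested = &sub, .probe = toy_master_probe,
	};

	return toy_driver_attach(&master, lock_parent);
}
```

With `lock_parent` set, the inner attach tries to take the parent lock that the outer attach still holds and the model returns -EDEADLK; with it cleared, which corresponds to the experiment of taking only the device lock, the nested registration completes.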
On 2014-11-18 19:23, Javier Martinez Canillas wrote:
> Hello Inki,
>
>> Right, but at least, we could avoid kernel booting failure which is very
>> critical. Please know that this patch is temporary to avoid the kernel
>> booting failure although deferred probe request of Exynos drm could be
>> broken. For this, I will look into dd core to find out more generic way:
>> I suspect that this might be incurred in case that a driver is probed in
>> probe context of other driver or it might be really dd core bug.
>
> I gave a try to your patch on top of today's linux-next and I still
> see the same boot failure reported by Kevin on a Exynos5420 Peach Pit
> so $subject does not fix the issue. The boot message is [0] fyi.
>
> By digging a bit I noticed that this happens when the
> exynos_drm_platform_probe() calls platform_driver_register() to
> register the Exynos fimd platform driver. The problem is that in
> __driver_attach() the call to device_lock(dev->parent) never returns
> and the thread sleeps forever waiting for the device parent mutex to
> be released.
>
> Do you have any ideas why this could happen?

As I mentioned above, I guess this issue could occur when a driver is probed in the probe context of another driver. Actually, we ran into the same issue before, but we have not seen it again for some time. Anyway, it needs more checking.

Thanks,
Inki Dae

> If I modify __driver_attach() to only grab the device lock and not its
> parent lock, then the thread is able to hold its own mutex and the
> platform driver registration succeeds but then I see the infinite loop
> that was reported before and the workaround in $subject indeed avoids
> to happen.
>
> So we have two issues here and your patch is only a workaround for the latter.
>
> Best regards,
> Javier
On 11/18/2014 11:23 AM, Javier Martinez Canillas wrote:
> Hello Inki,
>
>> Right, but at least, we could avoid kernel booting failure which is very
>> critical. Please know that this patch is temporary to avoid the kernel
>> booting failure although deferred probe request of Exynos drm could be
>> broken. For this, I will look into dd core to find out more generic way:
>> I suspect that this might be incurred in case that a driver is probed in
>> probe context of other driver or it might be really dd core bug.
>
> I gave a try to your patch on top of today's linux-next and I still
> see the same boot failure reported by Kevin on a Exynos5420 Peach Pit
> so $subject does not fix the issue. The boot message is [0] fyi.
>
> By digging a bit I noticed that this happens when the
> exynos_drm_platform_probe() calls platform_driver_register() to
> register the Exynos fimd platform driver. The problem is that in
> __driver_attach() the call to device_lock(dev->parent) never returns
> and the thread sleeps forever waiting for the device parent mutex to
> be released.
>
> Do you have any ideas why this could happen?
>
> If I modify __driver_attach() to only grab the device lock and not its
> parent lock, then the thread is able to hold its own mutex and the
> platform driver registration succeeds but then I see the infinite loop
> that was reported before and the workaround in $subject indeed avoids
> to happen.
>
> So we have two issues here and your patch is only a workaround for the latter.

This is the same issue Krzysztof reported two weeks ago, and I answered him with my diagnosis [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel.samsung-soc/39804

Regards
Andrzej

> Best regards,
> Javier
Hello Andrzej,

On Tue, Nov 18, 2014 at 11:48 AM, Andrzej Hajda <a.hajda@samsung.com> wrote:
>> So we have two issues here and your patch is only a workaround for the latter.
>
> This is the same issue Krzysztof reported two weeks ago and I answered
> him with my diagnosis[1].
>
> [1]: http://permalink.gmane.org/gmane.linux.kernel.samsung-soc/39804

Great, thanks a lot for finding the cause of this issue!

> Regards
> Andrzej

Best regards,
Javier
On 2014-11-18 12:20, Inki Dae wrote:
> This patch fixes a infinite loop issue incurred when
> it doesn't have a pair of crtc and connector drivers,
> which was reported by Kevin Hilman like below,
> http://www.spinics.net/lists/linux-samsung-soc/msg39050.html
>
> cdev->conn_dev could be NULL by exynos_drm_component_del call in case
> that connector driver is failed while probing after components to crtc
> and connector drivers are added to specific drm_component_list.
> In this case, exynos_drm_match_add returns -EPROBE_DEFER error and
> Exynos drm driver will try the deferred probe over and over again.
>
> This patch makes the deferred probe is tried up to 3 times in maximum.
> However, this is considered only for Exynos drm so I think other SoC
> drivers could also produce same issue. Therefore, the best way to resolve
> this issue, infinite loop incurred by deferred probe, would be that dd core
> is fixed up correctly.

Please ignore this patch. I will soon post another patch set that supports fully separated sub-driver modules.

Thanks,
Inki Dae
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index eab12f0..4d84f3a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -38,6 +38,8 @@
 #define DRIVER_MAJOR	1
 #define DRIVER_MINOR	0

+#define MAX_TRY_PROBE_DEFER	3
+
 static struct platform_device *exynos_drm_pdev;

 static DEFINE_MUTEX(drm_component_lock);
@@ -481,6 +483,7 @@ static struct component_match *exynos_drm_match_add(struct device *dev)
 	struct component_match *match = NULL;
 	struct component_dev *cdev;
 	unsigned int attach_cnt = 0;
+	static unsigned int try_probe_defer;

 	mutex_lock(&drm_component_lock);

@@ -527,6 +530,11 @@ out_lock:

 	mutex_unlock(&drm_component_lock);

+	if (++try_probe_defer > MAX_TRY_PROBE_DEFER) {
+		try_probe_defer = 0;
+		return ERR_PTR(-ENODEV);
+	}
+
 	return attach_cnt ? match : ERR_PTR(-EPROBE_DEFER);
 }

This patch fixes an infinite loop that occurs when there is no matching pair of crtc and connector drivers, as reported by Kevin Hilman:
http://www.spinics.net/lists/linux-samsung-soc/msg39050.html

cdev->conn_dev can be set to NULL by an exynos_drm_component_del call if the connector driver fails while probing after the components for the crtc and connector drivers have been added to drm_component_list. In this case, exynos_drm_match_add returns -EPROBE_DEFER and the Exynos drm driver retries the deferred probe over and over again.

This patch limits the deferred probe to at most 3 attempts. However, this only covers Exynos drm, and I think other SoC drivers could hit the same issue. Therefore, the best way to resolve this infinite loop caused by deferred probing would be to fix the dd core properly.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
---
 drivers/gpu/drm/exynos/exynos_drm_drv.c | 8 ++++++++
 1 file changed, 8 insertions(+)