
soc: qcom: pdr: Fix the potential deadlock

Message ID 20250128080751.3718762-1-mukesh.ojha@oss.qualcomm.com
State Superseded
Series soc: qcom: pdr: Fix the potential deadlock

Commit Message

Mukesh Ojha Jan. 28, 2025, 8:07 a.m. UTC
When client process A calls pdr_add_lookup() to add a lookup for a
service, it schedules the locator work. Later, process B receives a new
server packet indicating that the locator is up and calls
pdr_locator_new_server(), which sets pdr->locator_init_complete to true.
Process A's locator work sees this, takes pdr->list_lock and queries the
domain list. That query times out: the response is queued to the same
qmi->wq, which is an ordered workqueue, and process B cannot finish
handling the new server request because it is blocked on pdr->list_lock,
which process A holds.

       Process A                        Process B

                                     process_scheduled_works()
pdr_add_lookup()                      qmi_data_ready_work()
 process_scheduled_works()             pdr_locator_new_server()
                                         pdr->locator_init_complete=true;
   pdr_locator_work()
    mutex_lock(&pdr->list_lock);

     pdr_locate_service()                  mutex_lock(&pdr->list_lock);

      pdr_get_domain_list()
       pr_err("PDR: %s get domain list
               txn wait failed: %d\n",
               req->service_name,
               ret);

Fix this by removing the unnecessary list iteration from
pdr_locator_new_server(). pdr_locator_work() already iterates over the
lookup list itself, so just call schedule_work() here.

Signed-off-by: Saranya R <quic_sarar@quicinc.com>
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
---
 drivers/soc/qcom/pdr_interface.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Dmitry Baryshkov Jan. 28, 2025, 4:10 p.m. UTC | #1
On Tue, Jan 28, 2025 at 01:37:51PM +0530, Mukesh Ojha wrote:
> [...]
> Signed-off-by: Saranya R <quic_sarar@quicinc.com>
> Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>

Missing Fixes tag.

Mukesh Ojha Jan. 29, 2025, 3:47 p.m. UTC | #2
On Tue, Jan 28, 2025 at 06:10:24PM +0200, Dmitry Baryshkov wrote:
> On Tue, Jan 28, 2025 at 01:37:51PM +0530, Mukesh Ojha wrote:
> > [...]
> > Signed-off-by: Saranya R <quic_sarar@quicinc.com>
> > Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> 
> Missing Fixes tag.

Sure, will add.

-Mukesh

Patch

diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
index 328b6153b2be..71be378d2e43 100644
--- a/drivers/soc/qcom/pdr_interface.c
+++ b/drivers/soc/qcom/pdr_interface.c
@@ -75,7 +75,6 @@  static int pdr_locator_new_server(struct qmi_handle *qmi,
 {
 	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
 					      locator_hdl);
-	struct pdr_service *pds;
 
 	mutex_lock(&pdr->lock);
 	/* Create a local client port for QMI communication */
@@ -87,12 +86,7 @@  static int pdr_locator_new_server(struct qmi_handle *qmi,
 	mutex_unlock(&pdr->lock);
 
 	/* Service pending lookup requests */
-	mutex_lock(&pdr->list_lock);
-	list_for_each_entry(pds, &pdr->lookups, node) {
-		if (pds->need_locator_lookup)
-			schedule_work(&pdr->locator_work);
-	}
-	mutex_unlock(&pdr->list_lock);
+	schedule_work(&pdr->locator_work);
 
 	return 0;
 }