Message ID | 20230318081303.792969-1-zyytlz.wz@163.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition | expand |
On 3/18/23 3:13 AM, Zheng Wang wrote: > In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work > with qedi_recovery_handler and bound &qedi->board_disable_work > with qedi_board_disable_work. > > When it calls qedi_schedule_recovery_handler, it will finally > call schedule_delayed_work to start the work. > > When we call qedi_remove to remove the driver, there > may be a sequence as follows: > > Fix it by finishing the work before cleanup in qedi_remove. > > CPU0 CPU1 > > |qedi_recovery_handler > qedi_remove | > __qedi_remove | > iscsi_host_free | > scsi_host_put | > //free shost | > |iscsi_host_for_each_session > |//use qedi->shost > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > --- > drivers/scsi/qedi/qedi_main.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c > index f2ee49756df8..25223f6f5344 100644 > --- a/drivers/scsi/qedi/qedi_main.c > +++ b/drivers/scsi/qedi/qedi_main.c > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode) > int rval; > u16 retry = 10; > > + /*cancel work*/ This comment is not needed. The name of the functions you are calling have "cancel" and "work" in them so we know. If you want to add a comment explain why the cancel calls are needed here. > + cancel_delayed_work_sync(&qedi->recovery_work); > + cancel_delayed_work_sync(&qedi->board_disable_work); How do you know after you have called cancel_delayed_work_sync that schedule_recovery_handler or schedule_hw_err_handler can't be called? I don't know the qed driver well, but it looks like you could have operations still running, so after you cancel here one of those ops could lead to them scheduling the work again. > + > if (mode == QEDI_MODE_NORMAL) > iscsi_host_remove(qedi->shost, false); > else if (mode == QEDI_MODE_SHUTDOWN)
Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写道: > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work > > with qedi_recovery_handler and bound &qedi->board_disable_work > > with qedi_board_disable_work. > > > > When it calls qedi_schedule_recovery_handler, it will finally > > call schedule_delayed_work to start the work. > > > > When we call qedi_remove to remove the driver, there > > may be a sequence as follows: > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > CPU0 CPU1 > > > > |qedi_recovery_handler > > qedi_remove | > > __qedi_remove | > > iscsi_host_free | > > scsi_host_put | > > //free shost | > > |iscsi_host_for_each_session > > |//use qedi->shost > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > --- > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c > > index f2ee49756df8..25223f6f5344 100644 > > --- a/drivers/scsi/qedi/qedi_main.c > > +++ b/drivers/scsi/qedi/qedi_main.c > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode) > > int rval; > > u16 retry = 10; > > > > + /*cancel work*/ > > This comment is not needed. The name of the functions you are calling have > "cancel" and "work" in them so we know. If you want to add a comment explain > why the cancel calls are needed here. > Hi, Sorry for my late reply and thanks for your advice. Will remove it in the next version of patch. > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > How do you know after you have called cancel_delayed_work_sync that > schedule_recovery_handler or schedule_hw_err_handler can't be called? > I don't know the qed driver well, but it looks like you could have > operations still running, so after you cancel here one of those ops > could lead to them scheduling the work again. > Sorry I didn't know how to make sure there's no more schedule. But I do think this is important. Maybe there're someone else who can give us advice. Best regards, Zheng > > > + > > if (mode == QEDI_MODE_NORMAL) > > iscsi_host_remove(qedi->shost, false); > > else if (mode == QEDI_MODE_SHUTDOWN) >
> -----Original Message----- > From: Zheng Hacker <hackerzheng666@gmail.com> > Sent: Thursday, March 23, 2023 9:15 AM > To: Mike Christie <michael.christie@oracle.com> > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>; > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage- > Upstream <GR-QLogic-Storage-Upstream@marvell.com>; > jejb@linux.ibm.com; martin.petersen@oracle.com; linux- > scsi@vger.kernel.org; linux-kernel@vger.kernel.org; > 1395428693sheep@gmail.com; alex000young@gmail.com > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in > qedi_remove due to race condition > > External Email > > ---------------------------------------------------------------------- > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写 > 道: > > > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > > In qedi_probe, it calls __qedi_probe, which bound > > > &qedi->recovery_work with qedi_recovery_handler and bound > > > &qedi->board_disable_work with qedi_board_disable_work. > > > > > > When it calls qedi_schedule_recovery_handler, it will finally call > > > schedule_delayed_work to start the work. > > > > > > When we call qedi_remove to remove the driver, there may be a > > > sequence as follows: > > > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > > > CPU0 CPU1 > > > > > > |qedi_recovery_handler > > > qedi_remove | > > > __qedi_remove | > > > iscsi_host_free | > > > scsi_host_put | > > > //free shost | > > > |iscsi_host_for_each_session > > > |//use qedi->shost > > > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > --- > > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/scsi/qedi/qedi_main.c > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 > > > 100644 > > > --- a/drivers/scsi/qedi/qedi_main.c > > > +++ b/drivers/scsi/qedi/qedi_main.c > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev > *pdev, int mode) > > > int rval; > > > u16 retry = 10; > > > > > > + /*cancel work*/ > > > > This comment is not needed. The name of the functions you are calling > > have "cancel" and "work" in them so we know. If you want to add a > > comment explain why the cancel calls are needed here. > > > > Hi, > > Sorry for my late reply and thanks for your advice. Will remove it in the next > version of patch. > > > > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > > > > How do you know after you have called cancel_delayed_work_sync that > > schedule_recovery_handler or schedule_hw_err_handler can't be called? > > I don't know the qed driver well, but it looks like you could have > > operations still running, so after you cancel here one of those ops > > could lead to them scheduling the work again. > > > > Sorry I didn't know how to make sure there's no more schedule. But I do > think this is important. Maybe there're someone else who can give us advice. > > Best regards, > Zheng > > Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver. Thanks, Manish
Manish Rangankar <mrangankar@marvell.com> 于2023年3月23日周四 18:17写道: > > > > > -----Original Message----- > > From: Zheng Hacker <hackerzheng666@gmail.com> > > Sent: Thursday, March 23, 2023 9:15 AM > > To: Mike Christie <michael.christie@oracle.com> > > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>; > > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage- > > Upstream <GR-QLogic-Storage-Upstream@marvell.com>; > > jejb@linux.ibm.com; martin.petersen@oracle.com; linux- > > scsi@vger.kernel.org; linux-kernel@vger.kernel.org; > > 1395428693sheep@gmail.com; alex000young@gmail.com > > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in > > qedi_remove due to race condition > > > > External Email > > > > ---------------------------------------------------------------------- > > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写 > > 道: > > > > > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > > > In qedi_probe, it calls __qedi_probe, which bound > > > > &qedi->recovery_work with qedi_recovery_handler and bound > > > > &qedi->board_disable_work with qedi_board_disable_work. > > > > > > > > When it calls qedi_schedule_recovery_handler, it will finally call > > > > schedule_delayed_work to start the work. > > > > > > > > When we call qedi_remove to remove the driver, there may be a > > > > sequence as follows: > > > > > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > > > > > CPU0 CPU1 > > > > > > > > |qedi_recovery_handler > > > > qedi_remove | > > > > __qedi_remove | > > > > iscsi_host_free | > > > > scsi_host_put | > > > > //free shost | > > > > |iscsi_host_for_each_session > > > > |//use qedi->shost > > > > > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > --- > > > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/drivers/scsi/qedi/qedi_main.c > > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 > > > > 100644 > > > > --- a/drivers/scsi/qedi/qedi_main.c > > > > +++ b/drivers/scsi/qedi/qedi_main.c > > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev > > *pdev, int mode) > > > > int rval; > > > > u16 retry = 10; > > > > > > > > + /*cancel work*/ > > > > > > This comment is not needed. The name of the functions you are calling > > > have "cancel" and "work" in them so we know. If you want to add a > > > comment explain why the cancel calls are needed here. > > > > > > > Hi, > > > > Sorry for my late reply and thanks for your advice. Will remove it in the next > > version of patch. > > > > > > > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > > > > > > > How do you know after you have called cancel_delayed_work_sync that > > > schedule_recovery_handler or schedule_hw_err_handler can't be called? > > > I don't know the qed driver well, but it looks like you could have > > > operations still running, so after you cancel here one of those ops > > > could lead to them scheduling the work again. > > > > > > > Sorry I didn't know how to make sure there's no more schedule. But I do > > think this is important. Maybe there're someone else who can give us advice. > > > > Best regards, > > Zheng > > > > > Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and > qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver. > Sorry for my late reply. Will apply that in next version. Best reagrds, Zheng > Thanks, > Manish >
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode) int rval; u16 retry = 10; + /*cancel work*/ + cancel_delayed_work_sync(&qedi->recovery_work); + cancel_delayed_work_sync(&qedi->board_disable_work); + if (mode == QEDI_MODE_NORMAL) iscsi_host_remove(qedi->shost, false); else if (mode == QEDI_MODE_SHUTDOWN)
In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work with qedi_recovery_handler and bound &qedi->board_disable_work with qedi_board_disable_work. When it calls qedi_schedule_recovery_handler, it will finally call schedule_delayed_work to start the work. When we call qedi_remove to remove the driver, there may be a sequence as follows: Fix it by finishing the work before cleanup in qedi_remove. CPU0 CPU1 |qedi_recovery_handler qedi_remove | __qedi_remove | iscsi_host_free | scsi_host_put | //free shost | |iscsi_host_for_each_session |//use qedi->shost Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> --- drivers/scsi/qedi/qedi_main.c | 4 ++++ 1 file changed, 4 insertions(+)