Message ID | 20200409013148.4219-1-xinhui.pan@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/ttm: Schedule out if possibe in bo delayed delete worker | expand |
Am 09.04.20 um 03:31 schrieb xinhui pan: > The delayed delete list is per device which might be very huge. And in > a heavy workload test, the list might always not be empty. That will > trigger any RCU stall warnings or softlockups in non-preemptible kernels > Lets do schedule out if possible in that case. Mhm, I'm not sure if that is actually allowed. This is called from a work item and those are not really supposed to be scheduled away. Maybe rather change the while into while (!list_empty(&bdev->ddestroy) && !should_reschedule(0)). Christian. > > Signed-off-by: xinhui pan <xinhui.pan@amd.com> > --- > drivers/gpu/drm/ttm/ttm_bo.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 9e07c3f75156..b8d853cab33b 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) > } > > ttm_bo_put(bo); > + cond_resched(); > spin_lock(&glob->lru_lock); > } > list_splice_tail(&removed, &bdev->ddestroy);
Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König: > Am 09.04.20 um 03:31 schrieb xinhui pan: > > The delayed delete list is per device which might be very huge. And in > > a heavy workload test, the list might always not be empty. That will > > trigger any RCU stall warnings or softlockups in non-preemptible kernels > > Lets do schedule out if possible in that case. > > Mhm, I'm not sure if that is actually allowed. This is called from a > work item and those are not really supposed to be scheduled away. Huh? Workitems can schedule out just fine, otherwise they would be horribly broken when it comes to sleeping locks. The workqueue code even has measures to keep the workqueues at the expected concurrency level by starting other workitems when one of them goes to sleep. Regards, Lucas > Maybe rather change the while into while (!list_empty(&bdev->ddestroy) > && !should_reschedule(0)). > > Christian. > > > Signed-off-by: xinhui pan <xinhui.pan@amd.com> > > --- > > drivers/gpu/drm/ttm/ttm_bo.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > > index 9e07c3f75156..b8d853cab33b 100644 > > --- a/drivers/gpu/drm/ttm/ttm_bo.c > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > > @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) > > } > > > > ttm_bo_put(bo); > > + cond_resched(); > > spin_lock(&glob->lru_lock); > > } > > list_splice_tail(&removed, &bdev->ddestroy); > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Am 09.04.20 um 15:25 schrieb Lucas Stach: > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König: >> Am 09.04.20 um 03:31 schrieb xinhui pan: >>> The delayed delete list is per device which might be very huge. And in >>> a heavy workload test, the list might always not be empty. That will >>> trigger any RCU stall warnings or softlockups in non-preemptible kernels >>> Lets do schedule out if possible in that case. >> Mhm, I'm not sure if that is actually allowed. This is called from a >> work item and those are not really supposed to be scheduled away. > Huh? Workitems can schedule out just fine, otherwise they would be > horribly broken when it comes to sleeping locks. Let me refine the sentence: Work items are not really supposed to be scheduled purposely. E.g. you shouldn't call schedule() or cond_resched() like in the case here. Getting scheduled away because we wait for a lock is of course perfectly fine. > The workqueue code > even has measures to keep the workqueues at the expected concurrency > level by starting other workitems when one of them goes to sleep. Yeah, and exactly that's what I would say we should avoid here :) In other words work items can be scheduled away, but they should not if not really necessary (e.g. waiting for a lock). Otherwise as you said new threads for work item processing are started up and I don't think we want that. Just returning from the work item and waiting for the next cycle is most likely the better option. Regards, Christian. > > Regards, > Lucas > >> Maybe rather change the while into while (!list_empty(&bdev->ddestroy) >> && !should_reschedule(0)). >> >> Christian. >> >>> Signed-off-by: xinhui pan <xinhui.pan@amd.com> >>> --- >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >>> index 9e07c3f75156..b8d853cab33b 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) >>> } >>> >>> ttm_bo_put(bo); >>> + cond_resched(); >>> spin_lock(&glob->lru_lock); >>> } >>> list_splice_tail(&removed, &bdev->ddestroy); >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868&sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3D&reserved=0
[AMD Official Use Only - Internal Distribution Only] Why we break out the loops when there are pending bos to be released? And I just checked the process_one_work. Right after the work item callback is called, the workqueue itself will call cond_resched. So I think
I think it doesn't matter if workitem schedule out. Even we did not schedule out, the workqueue itself will schedule out later. So it did not break anything with this patch I think.
https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 This is another example of the usage of cond_sched.
> Why we break out the loops when there are pending bos to be released? We do this anyway if we can't acquire the necessary locks. Freeing already deleted BOs is just a very lazy background work. > So it did not break anything with this patch I think. Oh, the patch will certainly work. I'm just not sure if it's the ideal behavior. > https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 > > This is another example of the usage of cond_sched. Yes, and that is also a good example of what I mean here: > if (!mutex_trylock > <https://elixir.bootlin.com/linux/latest/ident/mutex_trylock>(&slab_mutex > <https://elixir.bootlin.com/linux/latest/ident/slab_mutex>)) > /* Give up. Setup the next iteration. */ > goto out; If the function can't acquire the lock immediately it gives up and waits for the next iteration. I think it would be better if we do this in TTM as well if we spend to much time cleaning up old BOs. On the other hand you are right that cond_resched() has the advantage that we could spend more time on cleaning up old BOs if there is nothing else for the CPU TODO. Regards, Christian. Am 09.04.20 um 16:24 schrieb Pan, Xinhui: > https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 > > This is another example of the usage of cond_sched. > ------------------------------------------------------------------------ > *From:* Pan, Xinhui <Xinhui.Pan@amd.com> > *Sent:* Thursday, April 9, 2020 10:11:08 PM > *To:* Lucas Stach <l.stach@pengutronix.de>; > amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, > Christian <Christian.Koenig@amd.com> > *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> > *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed > delete worker > I think it doesn't matter if workitem schedule out. Even we did not > schedule out, the workqueue itself will schedule out later. > So it did not break anything with this patch I think. > ------------------------------------------------------------------------ > *From:* Pan, Xinhui <Xinhui.Pan@amd.com> > *Sent:* Thursday, April 9, 2020 10:07:09 PM > *To:* Lucas Stach <l.stach@pengutronix.de>; > amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, > Christian <Christian.Koenig@amd.com> > *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> > *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed > delete worker > Why we break out the loops when there are pending bos to be released? > > And I just checked the process_one_work. Right after the work item > callback is called, the workqueue itself will call cond_resched. So I > think > ------------------------------------------------------------------------ > *From:* Koenig, Christian <Christian.Koenig@amd.com> > *Sent:* Thursday, April 9, 2020 9:38:24 PM > *To:* Lucas Stach <l.stach@pengutronix.de>; Pan, Xinhui > <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org > <amd-gfx@lists.freedesktop.org> > *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> > *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed > delete worker > Am 09.04.20 um 15:25 schrieb Lucas Stach: > > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König: > >> Am 09.04.20 um 03:31 schrieb xinhui pan: > >>> The delayed delete list is per device which might be very huge. And in > >>> a heavy workload test, the list might always not be empty. That will > >>> trigger any RCU stall warnings or softlockups in non-preemptible > kernels > >>> Lets do schedule out if possible in that case. > >> Mhm, I'm not sure if that is actually allowed. This is called from a > >> work item and those are not really supposed to be scheduled away. > > Huh? Workitems can schedule out just fine, otherwise they would be > > horribly broken when it comes to sleeping locks. > > Let me refine the sentence: Work items are not really supposed to be > scheduled purposely. E.g. you shouldn't call schedule() or > cond_resched() like in the case here. > > Getting scheduled away because we wait for a lock is of course perfectly > fine. > > > The workqueue code > > even has measures to keep the workqueues at the expected concurrency > > level by starting other workitems when one of them goes to sleep. > > Yeah, and exactly that's what I would say we should avoid here :) > > In other words work items can be scheduled away, but they should not if > not really necessary (e.g. waiting for a lock). > > Otherwise as you said new threads for work item processing are started > up and I don't think we want that. > > Just returning from the work item and waiting for the next cycle is most > likely the better option. > > Regards, > Christian. > > > > > Regards, > > Lucas > > > >> Maybe rather change the while into while (!list_empty(&bdev->ddestroy) > >> && !should_reschedule(0)). > >> > >> Christian. > >> > >>> Signed-off-by: xinhui pan <xinhui.pan@amd.com> > >>> --- > >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > b/drivers/gpu/drm/ttm/ttm_bo.c > >>> index 9e07c3f75156..b8d853cab33b 100644 > >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c > >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c > >>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct > ttm_bo_device *bdev, bool remove_all) > >>> } > >>> > >>> ttm_bo_put(bo); > >>> + cond_resched(); > >>> spin_lock(&glob->lru_lock); > >>> } > >>> list_splice_tail(&removed, &bdev->ddestroy); > >> _______________________________________________ > >> dri-devel mailing list > >> dri-devel@lists.freedesktop.org > >> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868&sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3D&reserved=0 >
> 2020年4月9日 22:59,Koenig, Christian <Christian.Koenig@amd.com> 写道: > >> Why we break out the loops when there are pending bos to be released? > > We do this anyway if we can't acquire the necessary locks. Freeing already deleted BOs is just a very lazy background work. That is true. eviction will reclaim the BO resource too. > >> So it did not break anything with this patch I think. > > Oh, the patch will certainly work. I'm just not sure if it's the ideal behavior. > >> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 >> >> This is another example of the usage of cond_sched. > > Yes, and that is also a good example of what I mean here: > >> if (!mutex_trylock(&slab_mutex)) >> >> >> /* Give up. Setup the next iteration. */ >> >> >> goto out; > > If the function can't acquire the lock immediately it gives up and waits for the next iteration. > > I think it would be better if we do this in TTM as well if we spend to much time cleaning up old BOs. fair enough. > > On the other hand you are right that cond_resched() has the advantage that we could spend more time on cleaning up old BOs if there is nothing else for the CPU TODO. > > Regards, > Christian. > > Am 09.04.20 um 16:24 schrieb Pan, Xinhui: >> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 >> >> This is another example of the usage of cond_sched. >> From: Pan, Xinhui <Xinhui.Pan@amd.com> >> Sent: Thursday, April 9, 2020 10:11:08 PM >> To: Lucas Stach <l.stach@pengutronix.de>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com> >> Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker >> >> I think it doesn't matter if workitem schedule out. Even we did not schedule out, the workqueue itself will schedule out later. >> So it did not break anything with this patch I think. >> From: Pan, Xinhui <Xinhui.Pan@amd.com> >> Sent: Thursday, April 9, 2020 10:07:09 PM >> To: Lucas Stach <l.stach@pengutronix.de>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com> >> Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker >> >> Why we break out the loops when there are pending bos to be released? >> >> And I just checked the process_one_work. Right after the work item callback is called, the workqueue itself will call cond_resched. So I think >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Thursday, April 9, 2020 9:38:24 PM >> To: Lucas Stach <l.stach@pengutronix.de>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org> >> Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker >> >> Am 09.04.20 um 15:25 schrieb Lucas Stach: >> > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König: >> >> Am 09.04.20 um 03:31 schrieb xinhui pan: >> >>> The delayed delete list is per device which might be very huge. And in >> >>> a heavy workload test, the list might always not be empty. That will >> >>> trigger any RCU stall warnings or softlockups in non-preemptible kernels >> >>> Lets do schedule out if possible in that case. >> >> Mhm, I'm not sure if that is actually allowed. This is called from a >> >> work item and those are not really supposed to be scheduled away. >> > Huh? Workitems can schedule out just fine, otherwise they would be >> > horribly broken when it comes to sleeping locks. >> >> Let me refine the sentence: Work items are not really supposed to be >> scheduled purposely. E.g. you shouldn't call schedule() or >> cond_resched() like in the case here. >> >> Getting scheduled away because we wait for a lock is of course perfectly >> fine. >> >> > The workqueue code >> > even has measures to keep the workqueues at the expected concurrency >> > level by starting other workitems when one of them goes to sleep. >> >> Yeah, and exactly that's what I would say we should avoid here :) >> >> In other words work items can be scheduled away, but they should not if >> not really necessary (e.g. waiting for a lock). >> >> Otherwise as you said new threads for work item processing are started >> up and I don't think we want that. >> >> Just returning from the work item and waiting for the next cycle is most >> likely the better option. >> >> Regards, >> Christian. >> >> > >> > Regards, >> > Lucas >> > >> >> Maybe rather change the while into while (!list_empty(&bdev->ddestroy) >> >> && !should_reschedule(0)). >> >> >> >> Christian. >> >> >> >>> Signed-off-by: xinhui pan <xinhui.pan@amd.com> >> >>> --- >> >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 + >> >>> 1 file changed, 1 insertion(+) >> >>> >> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >> >>> index 9e07c3f75156..b8d853cab33b 100644 >> >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >> >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >> >>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) >> >>> } >> >>> >> >>> ttm_bo_put(bo); >> >>> + cond_resched(); >> >>> spin_lock(&glob->lru_lock); >> >>> } >> >>> list_splice_tail(&removed, &bdev->ddestroy); >> >> _______________________________________________ >> >> dri-devel mailing list >> >> dri-devel@lists.freedesktop.org >> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868&sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3D&reserved=0 >> >
We can have both of yours, I think. Even switch to use spin_trylock, I think we are ok to have cond_resched() Xinhui added in this patch. That can give more chance to urgent task to use cpu. -David 在 2020/4/9 22:59, Christian König 写道: >> Why we break out the loops when there are pending bos to be released? > > We do this anyway if we can't acquire the necessary locks. Freeing > already deleted BOs is just a very lazy background work. > >> So it did not break anything with this patch I think. > > Oh, the patch will certainly work. I'm just not sure if it's the ideal > behavior. > >> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 >> >> This is another example of the usage of cond_sched. > > Yes, and that is also a good example of what I mean here: > >> if (!mutex_trylock >> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fident%2Fmutex_trylock&data=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795246517&sdata=XACA%2BgpBkJEgPva9c7Wf6Ca1bAOrNaXARf%2B4ze1Mqyw%3D&reserved=0>(&slab_mutex >> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fident%2Fslab_mutex&data=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795256512&sdata=z9uAZriS46hBXuYeVaYkAWb%2BPLYwLcCrK%2BWa4DRO0zw%3D&reserved=0>)) >> /* Give up. Setup the next iteration. */ >> goto out; > > If the function can't acquire the lock immediately it gives up and > waits for the next iteration. > > I think it would be better if we do this in TTM as well if we spend to > much time cleaning up old BOs. > > On the other hand you are right that cond_resched() has the advantage > that we could spend more time on cleaning up old BOs if there is > nothing else for the CPU TODO. > > Regards, > Christian. > > Am 09.04.20 um 16:24 schrieb Pan, Xinhui: >> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026 >> >> This is another example of the usage of cond_sched. >> ------------------------------------------------------------------------ >> *From:* Pan, Xinhui <Xinhui.Pan@amd.com> >> *Sent:* Thursday, April 9, 2020 10:11:08 PM >> *To:* Lucas Stach <l.stach@pengutronix.de>; >> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; >> Koenig, Christian <Christian.Koenig@amd.com> >> *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed >> delete worker >> I think it doesn't matter if workitem schedule out. Even we did not >> schedule out, the workqueue itself will schedule out later. >> So it did not break anything with this patch I think. >> ------------------------------------------------------------------------ >> *From:* Pan, Xinhui <Xinhui.Pan@amd.com> >> *Sent:* Thursday, April 9, 2020 10:07:09 PM >> *To:* Lucas Stach <l.stach@pengutronix.de>; >> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; >> Koenig, Christian <Christian.Koenig@amd.com> >> *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed >> delete worker >> Why we break out the loops when there are pending bos to be released? >> >> And I just checked the process_one_work. Right after the work item >> callback is called, the workqueue itself will call cond_resched. So >> I think >> ------------------------------------------------------------------------ >> *From:* Koenig, Christian <Christian.Koenig@amd.com> >> *Sent:* Thursday, April 9, 2020 9:38:24 PM >> *To:* Lucas Stach <l.stach@pengutronix.de>; Pan, Xinhui >> <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org >> <amd-gfx@lists.freedesktop.org> >> *Cc:* dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org> >> *Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed >> delete worker >> Am 09.04.20 um 15:25 schrieb Lucas Stach: >> > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König: >> >> Am 09.04.20 um 03:31 schrieb xinhui pan: >> >>> The delayed delete list is per device which might be very huge. >> And in >> >>> a heavy workload test, the list might always not be empty. That will >> >>> trigger any RCU stall warnings or softlockups in non-preemptible >> kernels >> >>> Lets do schedule out if possible in that case. >> >> Mhm, I'm not sure if that is actually allowed. This is called from a >> >> work item and those are not really supposed to be scheduled away. >> > Huh? Workitems can schedule out just fine, otherwise they would be >> > horribly broken when it comes to sleeping locks. >> >> Let me refine the sentence: Work items are not really supposed to be >> scheduled purposely. E.g. you shouldn't call schedule() or >> cond_resched() like in the case here. >> >> Getting scheduled away because we wait for a lock is of course perfectly >> fine. >> >> > The workqueue code >> > even has measures to keep the workqueues at the expected concurrency >> > level by starting other workitems when one of them goes to sleep. >> >> Yeah, and exactly that's what I would say we should avoid here :) >> >> In other words work items can be scheduled away, but they should not if >> not really necessary (e.g. waiting for a lock). >> >> Otherwise as you said new threads for work item processing are started >> up and I don't think we want that. >> >> Just returning from the work item and waiting for the next cycle is most >> likely the better option. >> >> Regards, >> Christian. >> >> > >> > Regards, >> > Lucas >> > >> >> Maybe rather change the while into while (!list_empty(&bdev->ddestroy) >> >> && !should_reschedule(0)). >> >> >> >> Christian. >> >> >> >>> Signed-off-by: xinhui pan <xinhui.pan@amd.com> >> >>> --- >> >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 + >> >>> 1 file changed, 1 insertion(+) >> >>> >> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c >> b/drivers/gpu/drm/ttm/ttm_bo.c >> >>> index 9e07c3f75156..b8d853cab33b 100644 >> >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >> >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >> >>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct >> ttm_bo_device *bdev, bool remove_all) >> >>> } >> >>> >> >>> ttm_bo_put(bo); >> >>> + cond_resched(); >> >>> spin_lock(&glob->lru_lock); >> >>> } >> >>> list_splice_tail(&removed, &bdev->ddestroy); >> >> _______________________________________________ >> >> dri-devel mailing list >> >> dri-devel@lists.freedesktop.org >> >> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868&sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3D&reserved=0 >> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795266500&sdata=UGbQTN7vjHhNPidodWhXx4sSqUQjtKp4dbCJcztf6ZM%3D&reserved=0> >> > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795286493&sdata=SoE%2F8sg9vCb0OwxPF%2FPpKLLUKug%2FAhZDyyvDiQamWp4%3D&reserved=0
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 9e07c3f75156..b8d853cab33b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) } ttm_bo_put(bo); + cond_resched(); spin_lock(&glob->lru_lock); } list_splice_tail(&removed, &bdev->ddestroy);
The delayed delete list is per device which might be very huge. And in a heavy workload test, the list might always not be empty. That will trigger any RCU stall warnings or softlockups in non-preemptible kernels Lets do schedule out if possible in that case. Signed-off-by: xinhui pan <xinhui.pan@amd.com> --- drivers/gpu/drm/ttm/ttm_bo.c | 1 + 1 file changed, 1 insertion(+)