Message ID | 20190422103836.4300-1-david1.zhou@amd.com (mailing list archive)
---|---
State | New, archived
Series | ttm: wait mem space if user allow while gpu busy
Acked-by: Prike Liang <Prike.Liang@amd.com>

Thanks,
Prike

-----Original Message-----
From: Chunming Zhou <david1.zhou@amd.com>
Sent: Monday, April 22, 2019 6:39 PM
To: dri-devel@lists.freedesktop.org
Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
Subject: [PATCH] ttm: wait mem space if user allow while gpu busy

heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
--
2.17.1
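[Editor's note] The behaviour the patch introduces is gated by the ttm_operation_ctx the caller already passes into TTM. A minimal sketch of how a caller would opt in by allowing GPU waits; the wrapper function and placement setup are assumptions for illustration, not part of the patch:

	#include <drm/ttm/ttm_bo_api.h>
	#include <drm/ttm/ttm_placement.h>

	/* Editor's sketch: a caller with no_wait_gpu = false would, with the
	 * patch applied, keep retrying eviction when ttm_mem_evict_first()
	 * returns -EBUSY instead of failing immediately. */
	static int validate_allowing_wait(struct ttm_buffer_object *bo,
					  struct ttm_placement *placement)
	{
		struct ttm_operation_ctx ctx = {
			.interruptible = true,
			.no_wait_gpu = false,	/* allow waiting for busy BOs */
		};

		return ttm_bo_validate(bo, placement, &ctx);
	}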
Well that is certainly a NAK because it can lead to deadlock in the memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> [...]
How about adding an extra condition on ctx->resv inline to address your concern? As long as we don't wait on the same user, it shouldn't lead to deadlock.

Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike", "Zhou, David(ChunMing)", dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.
[...]
Well that's not so easy off hand.

The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM, check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.

3. If we have a ticket, we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded, we check if the BO is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question, we try to evict it as we would evict any other BO.

6. If any of the "if"s above fail, we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
> How about adding an extra condition on ctx->resv inline to address your
> concern? As long as we don't wait on the same user, it shouldn't lead to deadlock.
>
> Otherwise, any other idea?
> [...]
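[Editor's note] To make the six steps above more concrete, here is a rough sketch of what steps 3-5 could look like inside drivers/gpu/drm/ttm/ttm_bo.c, written against the reservation_object/TTM structures of roughly that kernel version. It is only an illustration, not the fix that was eventually merged; in particular first_bo_on_lru() is an assumed helper and the real locking has more corner cases:

	/* Editor's illustrative sketch of steps 3-5, assuming the caller already
	 * holds a ww_acquire_ctx ticket.  first_bo_on_lru() is a hypothetical
	 * helper returning the first BO on the manager's LRU (or NULL). */
	static int ttm_mem_evict_wait_busy(struct ttm_bo_device *bdev,
					   struct ttm_mem_type_manager *man,
					   struct ttm_operation_ctx *ctx,
					   struct ww_acquire_ctx *ticket)
	{
		struct ttm_bo_global *glob = bdev->glob;
		struct ttm_buffer_object *busy_bo;
		int ret;

		/* Step 3: reference the first BO on the LRU, then drop the LRU lock. */
		spin_lock(&glob->lru_lock);
		busy_bo = first_bo_on_lru(man);
		if (!busy_bo) {
			spin_unlock(&glob->lru_lock);
			return -EBUSY;
		}
		ttm_bo_get(busy_bo);
		spin_unlock(&glob->lru_lock);

		/* Acquire its reservation with the ticket so the ww_mutex deadlock
		 * detection can return -EDEADLK instead of deadlocking for real. */
		ret = reservation_object_lock(busy_bo->resv, ticket);
		if (ret) {
			ttm_bo_put(busy_bo);
			return ret;
		}

		/* Step 4: the BO may have moved while the LRU lock was dropped. */
		spin_lock(&glob->lru_lock);
		if (busy_bo != first_bo_on_lru(man)) {
			spin_unlock(&glob->lru_lock);
			ret = -EBUSY;			/* step 6: back off */
			goto out_unlock;
		}
		spin_unlock(&glob->lru_lock);

		/* Step 5: evict it like any other BO. */
		ret = ttm_bo_evict(busy_bo, ctx);

	out_unlock:
		reservation_object_unlock(busy_bo->resv);
		ttm_bo_put(busy_bo);
		return ret;
	}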
> 3. If we have a ticket, we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

The BO on the LRU is already locked by the CS user; can it be dropped here by the DC user? And after the DC user grabs its lock with the ticket, how does CS grab it again?

If you think waiting in ttm has this risk, how about just adding a wrapper for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)", "Koenig, Christian", "Liang, Prike", dri-devel@lists.freedesktop.org
CC:

Well that's not so easy off hand.

The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.
[...]
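[Editor's note] For completeness, a fleshed-out version of the wrapper sketched above, using the amdgpu helper signatures of roughly that kernel version. The wrapper name and timeout policy are assumptions for illustration, and, as the next reply notes, this approach was regarded as a band aid rather than a proper fix:

	/* Editor's sketch of the proposed wrapper with the arguments filled in.
	 * amdgpu_bo_reserve/pin/unreserve are real helpers (msleep() comes from
	 * <linux/delay.h>); everything else here is an assumption. */
	static int amdgpu_get_pin_bo_timeout(struct amdgpu_bo *bo, u32 domain,
					     unsigned int timeout_ms)
	{
		int r;

		do {
			r = amdgpu_bo_reserve(bo, false);
			if (r)
				return r;

			r = amdgpu_bo_pin(bo, domain);
			amdgpu_bo_unreserve(bo);
			if (!r)
				return 0;

			msleep(10);		/* back off a little before retrying */
			timeout_ms = timeout_ms > 10 ? timeout_ms - 10 : 0;
		} while (timeout_ms);

		return r;
	}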
> how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band aid for this specific problem, but in general we need to improve the handling in TTM to resolve that kind of resource conflict.

Regards,
Christian.

Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
> > 3. If we have a ticket, we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.
>
> The BO on the LRU is already locked by the CS user; can it be dropped here by
> the DC user? And after the DC user grabs its lock with the ticket, how does CS
> grab it again?
>
> If you think waiting in ttm has this risk, how about just adding a
> wrapper for the pin function as below?
> [...]
I made a patch as attached.

I'm not sure how to write a patch for your proposal; could you make a patch for that if mine isn't enough?

-David

On 2019年04月24日 15:12, Christian König wrote:
>> how about just adding a wrapper for the pin function as below?
> I considered this as well and don't think it will work reliably.
>
> We could use it as a band aid for this specific problem, but in
> general we need to improve the handling in TTM to resolve that kind
> of resource conflict.
>
> Regards,
> Christian.
> [...]

From 184941165665d80ad1992179e7652d7e4d739b2e Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@amd.com>
Date: Wed, 24 Apr 2019 11:35:29 +0800
Subject: [PATCH] ttm: return EBUSY to user

EBUSY means there is no idle BO on the LRU which can be evicted; the error is meaningful to the user.

Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..dce90053f29e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1000,7 +1000,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 		return -EINVAL;
 	}
 
-	return (has_erestartsys) ? -ERESTARTSYS : -ENOMEM;
+	return ret == -EBUSY ? ret : (has_erestartsys) ? -ERESTARTSYS : -ENOMEM;
 }
 EXPORT_SYMBOL(ttm_bo_mem_space);
That would change the semantics of ttm_bo_mem_space() and so could change the return code in an IOCTL as well. Not a good idea, cause that could have a lot of side effects. Instead in the calling DC code you could check if you get an -ENOMEM and then call schedule(). If after the schedule() we see that we have now BOs on the LRU we can try again and see if pinning the frame buffer now succeeds. Christian. Am 24.04.19 um 09:17 schrieb zhoucm1: > > I made a patch as attached. > > I'm not sure how to make patch as your proposal, Could you make a > patch for that if mine isn't enough? > > -David > > On 2019年04月24日 15:12, Christian König wrote: >>> how about just adding a wrapper for pin function as below? >> I considered this as well and don't think it will work reliable. >> >> We could use it as a band aid for this specific problem, but in >> general we need to improve the handling in TTM to resolve those kind >> of resource conflicts. >> >> Regards, >> Christian. >> >> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing): >>> >3. If we have a ticket we grab a reference to the first BO on the >>> LRU, drop the LRU lock and try to grab the reservation lock with the >>> ticket. >>> >>> The BO on LRU is already locked by cs user, can it be dropped here >>> by DC user? and then DC user grab its lock with ticket, how does CS >>> grab it again? >>> >>> If you think waiting in ttm has this risk, how about just adding a >>> wrapper for pin function as below? >>> amdgpu_get_pin_bo_timeout() >>> { >>> do { >>> amdgpo_bo_reserve(); >>> r=amdgpu_bo_pin(); >>> >>> if(!r) >>> break; >>> amdgpu_bo_unreserve(); >>> timeout--; >>> >>> } while(timeout>0); >>> >>> } >>> >>> -------- Original Message -------- >>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >>> From: Christian König >>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" >>> ,dri-devel@lists.freedesktop.org >>> CC: >>> >>> Well that's not so easy of hand. >>> >>> The basic problem here is that when you busy wait at this place you >>> can easily run into situations where application A busy waits for B >>> while B busy waits for A -> deadlock. >>> >>> So what we need here is the deadlock detection logic of the >>> ww_mutex. To use this we at least need to do the following steps: >>> >>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial). >>> >>> 2. If we then run into this EBUSY condition in TTM check if the BO >>> we need memory for (or rather the ww_mutex of its reservation >>> object) has a ticket assigned. >>> >>> 3. If we have a ticket we grab a reference to the first BO on the >>> LRU, drop the LRU lock and try to grab the reservation lock with the >>> ticket. >>> >>> 4. If getting the reservation lock with the ticket succeeded we >>> check if the BO is still the first one on the LRU in question (the >>> BO could have moved). >>> >>> 5. If the BO is still the first one on the LRU in question we try to >>> evict it as we would evict any other BO. >>> >>> 6. If any of the "If's" above fail we just back off and return -EBUSY. >>> >>> Steps 2-5 are certainly not trivial, but doable as far as I can see. >>> >>> Have fun :) >>> Christian. >>> >>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): >>>> How about adding more condition ctx->resv inline to address your >>>> concern? As well as don't wait from same user, shouldn't lead to >>>> deadlock. >>>> >>>> Otherwise, any other idea? 
That would change the semantics of ttm_bo_mem_space() and so could change the return code in an IOCTL as well. Not a good idea, cause that could have a lot of side effects.

Instead in the calling DC code you could check if you get an -ENOMEM and then call schedule().

If after the schedule() we see that we have now BOs on the LRU we can try again and see if pinning the frame buffer now succeeds.

Christian.

Am 24.04.19 um 09:17 schrieb zhoucm1:
> I made a patch as attached.
> I'm not sure how to make patch as your proposal, Could you make a patch for that if mine isn't enough?
> -David
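For illustration only, the retry pattern suggested here might look roughly like the sketch below. The helper name dm_pin_fb_retry() is invented for this sketch, the amdgpu_bo_* calls follow the wrapper pseudo code quoted earlier in the thread, and the loop is left unbounded; the patch attached in the next message bounds the wait with a timeout instead.

/* Sketch only: retry pinning when eviction raced and we got -ENOMEM. */
static int dm_pin_fb_retry(struct amdgpu_bo *rbo, u32 domain)
{
	int r;

	for (;;) {
		r = amdgpu_bo_reserve(rbo, false);
		if (unlikely(r != 0))
			return r;

		r = amdgpu_bo_pin(rbo, domain);
		amdgpu_bo_unreserve(rbo);

		if (r != -ENOMEM)
			return r;	/* pinned, or a non-recoverable error */

		/* Out of space: let other work run so eviction can free
		 * something, then try again, as suggested above. */
		schedule();
	}
}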
how about new attached? -David On 2019年04月24日 15:30, Christian König wrote: > That would change the semantics of ttm_bo_mem_space() and so could > change the return code in an IOCTL as well. Not a good idea, cause > that could have a lot of side effects. > > Instead in the calling DC code you could check if you get an -ENOMEM > and then call schedule(). > > If after the schedule() we see that we have now BOs on the LRU we can > try again and see if pinning the frame buffer now succeeds. > > Christian. > > Am 24.04.19 um 09:17 schrieb zhoucm1: >> >> I made a patch as attached. >> >> I'm not sure how to make patch as your proposal, Could you make a >> patch for that if mine isn't enough? >> >> -David >> >> On 2019年04月24日 15:12, Christian König wrote: >>>> how about just adding a wrapper for pin function as below? >>> I considered this as well and don't think it will work reliable. >>> >>> We could use it as a band aid for this specific problem, but in >>> general we need to improve the handling in TTM to resolve those kind >>> of resource conflicts. >>> >>> Regards, >>> Christian. >>> >>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing): >>>> >3. If we have a ticket we grab a reference to the first BO on the >>>> LRU, drop the LRU lock and try to grab the reservation lock with >>>> the ticket. >>>> >>>> The BO on LRU is already locked by cs user, can it be dropped here >>>> by DC user? and then DC user grab its lock with ticket, how does CS >>>> grab it again? >>>> >>>> If you think waiting in ttm has this risk, how about just adding a >>>> wrapper for pin function as below? >>>> amdgpu_get_pin_bo_timeout() >>>> { >>>> do { >>>> amdgpo_bo_reserve(); >>>> r=amdgpu_bo_pin(); >>>> >>>> if(!r) >>>> break; >>>> amdgpu_bo_unreserve(); >>>> timeout--; >>>> >>>> } while(timeout>0); >>>> >>>> } >>>> >>>> -------- Original Message -------- >>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >>>> From: Christian König >>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" >>>> ,dri-devel@lists.freedesktop.org >>>> CC: >>>> >>>> Well that's not so easy of hand. >>>> >>>> The basic problem here is that when you busy wait at this place you >>>> can easily run into situations where application A busy waits for B >>>> while B busy waits for A -> deadlock. >>>> >>>> So what we need here is the deadlock detection logic of the >>>> ww_mutex. To use this we at least need to do the following steps: >>>> >>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial). >>>> >>>> 2. If we then run into this EBUSY condition in TTM check if the BO >>>> we need memory for (or rather the ww_mutex of its reservation >>>> object) has a ticket assigned. >>>> >>>> 3. If we have a ticket we grab a reference to the first BO on the >>>> LRU, drop the LRU lock and try to grab the reservation lock with >>>> the ticket. >>>> >>>> 4. If getting the reservation lock with the ticket succeeded we >>>> check if the BO is still the first one on the LRU in question (the >>>> BO could have moved). >>>> >>>> 5. If the BO is still the first one on the LRU in question we try >>>> to evict it as we would evict any other BO. >>>> >>>> 6. If any of the "If's" above fail we just back off and return -EBUSY. >>>> >>>> Steps 2-5 are certainly not trivial, but doable as far as I can see. >>>> >>>> Have fun :) >>>> Christian. >>>> >>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): >>>>> How about adding more condition ctx->resv inline to address your >>>>> concern? 
As well as don't wait from same user, shouldn't lead to >>>>> deadlock. >>>>> >>>>> Otherwise, any other idea? >>>>> >>>>> -------- Original Message -------- >>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >>>>> From: Christian König >>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" >>>>> ,dri-devel@lists.freedesktop.org >>>>> CC: >>>>> >>>>> Well that is certainly a NAK because it can lead to deadlock in the >>>>> memory management. >>>>> >>>>> You can't just busy wait with all those locks held. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike: >>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com> >>>>> > >>>>> > Thanks, >>>>> > Prike >>>>> > -----Original Message----- >>>>> > From: Chunming Zhou <david1.zhou@amd.com> >>>>> > Sent: Monday, April 22, 2019 6:39 PM >>>>> > To: dri-devel@lists.freedesktop.org >>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) >>>>> <David1.Zhou@amd.com> >>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy >>>>> > >>>>> > heavy gpu job could occupy memory long time, which could lead to >>>>> other user fail to get memory. >>>>> > >>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11 >>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com> >>>>> > --- >>>>> > drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++-- >>>>> > 1 file changed, 4 insertions(+), 2 deletions(-) >>>>> > >>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c >>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644 >>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c >>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct >>>>> ttm_buffer_object *bo, >>>>> > if (mem->mm_node) >>>>> > break; >>>>> > ret = ttm_mem_evict_first(bdev, mem_type, place, >>>>> ctx); >>>>> > - if (unlikely(ret != 0)) >>>>> > - return ret; >>>>> > + if (unlikely(ret != 0)) { >>>>> > + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY) >>>>> > + return ret; >>>>> > + } >>>>> > } while (1); >>>>> > mem->mem_type = mem_type; >>>>> > return ttm_bo_add_move_fence(bo, man, mem); >>>>> > -- >>>>> > 2.17.1 >>>>> > >>>>> > _______________________________________________ >>>>> > dri-devel mailing list >>>>> > dri-devel@lists.freedesktop.org >>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> >>>>> >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> >>>> >>>> _______________________________________________ >>>> dri-devel mailing list >>>> dri-devel@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >> >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >
From 61174dd44618bbfcb035484a44435b7ed4df86a5 Mon Sep 17 00:00:00 2001 From: Chunming Zhou <david1.zhou@amd.com> Date: Wed, 24 Apr 2019 11:35:29 +0800 Subject: [PATCH] drm/amdgpu: wait memory idle when fb is pinned fail heavy gpu job execution could occupy memory long time, which could lead to other user fail to get memory.
Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed Signed-off-by: Chunming Zhou <david1.zhou@amd.com> --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index a5cacf846e1b..8577b629a640 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -4093,6 +4093,8 @@ static const struct drm_plane_funcs dm_plane_funcs = { .atomic_destroy_state = dm_drm_plane_destroy_state, }; +/* 100s default to wait for pin fb */ +#define DM_PIN_MEM_TIMEOUT 100000000000ULL static int dm_plane_helper_prepare_fb(struct drm_plane *plane, struct drm_plane_state *new_state) { @@ -4103,6 +4105,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old; uint64_t tiling_flags; uint32_t domain; + u64 timeout = nsecs_to_jiffies64(DM_PIN_MEM_TIMEOUT); int r; dm_plane_state_old = to_dm_plane_state(plane->state); @@ -4117,6 +4120,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, obj = new_state->fb->obj[0]; rbo = gem_to_amdgpu_bo(obj); adev = amdgpu_ttm_adev(rbo->tbo.bdev); + +retry: r = amdgpu_bo_reserve(rbo, false); if (unlikely(r != 0)) return r; @@ -4131,8 +4136,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, if (r != -ERESTARTSYS) DRM_ERROR("Failed to pin framebuffer with error %d\n", r); amdgpu_bo_unreserve(rbo); + if (r == -ENOMEM) { + set_current_state(TASK_INTERRUPTIBLE); + if (timeout == 0) + return r; + if (signal_pending(current)) + return r; + timeout = schedule_timeout(timeout); + goto retry; + } return r; } + __set_current_state(TASK_RUNNING); r = amdgpu_ttm_alloc_gart(&rbo->tbo); if (unlikely(r != 0)) {
On Tue, Apr 23, 2019 at 4:42 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Well that's not so easy of hand. > > The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock. > > So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps: > > 1. Reserve the BO in DC using a ww_mutex ticket (trivial). > > 2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned. > > 3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket. > > 4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved). I don't think you actually need this check. Once you're in this slow reclaim mode all hope for performance is pretty much lost (you're thrashing vram terribly), forward progress matters. Also, less code :-) > 5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO. > > 6. If any of the "If's" above fail we just back off and return -EBUSY. Another idea I pondered (but never implemented) is a slow reclaim lru lock. Essentially there'd be two ways to walk the lru and evict bo: - fast path: spinlock + trylock, like today - slow path: ww_mutex lru lock, plus every bo is reserved (nested within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed forward progress. Transition would be that as soon as someone hits an EBUSY, they set the slow reclaim flag (while holding the quick reclaim spinlock quickly, which will drain anyone still stuck in fast reclaim path). Every time fast reclaim acquires the spinlock it needs to check for the slow reclaim flag, and if that's set, fall back to slow reclaim. Transitioning out of slow reclaim would only happen once the thread (with its ww ticket) that hit the EBUSY has completed whatever it was trying to do (either successfully, or failed because even evicting everyone else didn't give you enough vram). Tricky part here is making sure threads still in slow reclaim don't blow up if we switch back. Since only ever one thread can be actually doing slow reclaim (everyone is serialized on the single ww mutex lru lock) this should be doable by checking for the slow reclaim condition once you have the lru ww_mutex and if the slow reclaim condition is lifted, switch back to fast reclaim. The slow reclaim condition might also need to be a full reference count, to handle multiple threads hitting EBUSY/slow reclaim without the book-keeping getting all confused. Upshot of this is that it's guaranteeing forward progress, but the perf cliff might be too steep if this happens too often. You might need to round it off with 1-2 retries when you hit EBUSY, before forcing slow reclaim. -Daniel > Steps 2-5 are certainly not trivial, but doable as far as I can see. > > Have fun :) > Christian. > > Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): > > How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock. > > Otherwise, any other idea?
> > -------- Original Message -------- > Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy > From: Christian König > To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org > CC: > > Well that is certainly a NAK because it can lead to deadlock in the > memory management. > > You can't just busy wait with all those locks held. > > Regards, > Christian. > > Am 23.04.19 um 03:45 schrieb Liang, Prike: > > Acked-by: Prike Liang <Prike.Liang@amd.com> > > > > Thanks, > > Prike > > -----Original Message----- > > From: Chunming Zhou <david1.zhou@amd.com> > > Sent: Monday, April 22, 2019 6:39 PM > > To: dri-devel@lists.freedesktop.org > > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com> > > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy > > > > heavy gpu job could occupy memory long time, which could lead to other user fail to get memory. > > > > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11 > > Signed-off-by: Chunming Zhou <david1.zhou@amd.com> > > --- > > drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644 > > --- a/drivers/gpu/drm/ttm/ttm_bo.c > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, > > if (mem->mm_node) > > break; > > ret = ttm_mem_evict_first(bdev, mem_type, place, ctx); > > - if (unlikely(ret != 0)) > > - return ret; > > + if (unlikely(ret != 0)) { > > + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY) > > + return ret; > > + } > > } while (1); > > mem->mem_type = mem_type; > > return ttm_bo_add_move_fence(bo, man, mem); > > -- > > 2.17.1 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
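A rough sketch of the bookkeeping for such a fast/slow reclaim switch is shown below. Nothing here exists in TTM; all names are invented for illustration, and in a real implementation the slow-path ww_mutex would have to share the ww_class used for buffer reservations so that one ticket covers both (a private class is defined here only to keep the sketch self-contained).

/* Illustration only: none of these objects exist in TTM today. */
static atomic_t slow_reclaim_users = ATOMIC_INIT(0);	/* refcount of threads in slow reclaim */
static DEFINE_SPINLOCK(fast_reclaim_lock);		/* stands in for today's lru_lock */
static DEFINE_WW_CLASS(reclaim_ww_class);		/* real code: share reservation_ww_class */
static struct ww_mutex slow_reclaim_lock;

static void reclaim_locks_init(void)
{
	ww_mutex_init(&slow_reclaim_lock, &reclaim_ww_class);
}

/* The fast path checks this each time it takes the spinlock. */
static bool fast_reclaim_allowed(void)
{
	return atomic_read(&slow_reclaim_users) == 0;
}

/* Called by a thread whose eviction attempt just returned -EBUSY. */
static int enter_slow_reclaim(struct ww_acquire_ctx *ticket)
{
	int ret;

	atomic_inc(&slow_reclaim_users);

	/* Drain anyone still inside the fast path. */
	spin_lock(&fast_reclaim_lock);
	spin_unlock(&fast_reclaim_lock);

	/* Serialize all slow reclaim; the ticket provides deadlock detection. */
	ret = ww_mutex_lock(&slow_reclaim_lock, ticket);
	if (ret)	/* -EDEADLK: back off and let the caller retry later */
		atomic_dec(&slow_reclaim_users);
	return ret;
}

static void leave_slow_reclaim(void)
{
	ww_mutex_unlock(&slow_reclaim_lock);
	atomic_dec(&slow_reclaim_users);
}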
Am 24.04.19 um 10:01 schrieb Daniel Vetter: > On Tue, Apr 23, 2019 at 4:42 PM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> Well that's not so easy of hand. >> >> The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock. >> >> So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps: >> >> 1. Reserve the BO in DC using a ww_mutex ticket (trivial). >> >> 2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned. >> >> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket. >> >> 4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved). > I don't think you actually need this check. Once you're in this slow > reclaim mode all hope for performance is pretty much lost (you're > thrashin vram terribly), forward progress matters. Also, less code :-) This step is not for performance, but for correctness. When you drop the LRU lock and grab the ww_mutex in the slow path you need to double check that this BO wasn't evicted by somebody else in the meantime. >> 5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO. >> >> 6. If any of the "If's" above fail we just back off and return -EBUSY. > Another idea I pondered (but never implemented) is a slow reclaim lru > lock. Essentially there'd be two ways to walk the lru and evict bo: > > - fast path: spinlock + trylock, like today > > - slow path: ww_mutex lru lock, plus every bo is reserved (nested > within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed > forward progress. Of hand I don't see any advantage to this. So what is the benefit? Regards, Christian. > > Transition would be that as soon as someone hits an EBUSY, they set > the slow reclaim flag (while holding the quick reclaim spinlock > quickly, which will drain anyone still stuck in fast reclaim path). > Everytime fast reclaim acquires the spinlock it needs to check for the > slow reclaim flag, and if that's set, fall back to slow reclaim. > > Transitioning out of slow reclaim would only happen once the thread > (with it's ww ticket) that hit the EBUSY has completed whatever it was > trying to do (either successfully, or failed because even evicting > everyone else didn't give you enough vram). Tricky part here is making > sure threads still in slow reclaim don't blow up if we switch back. > Since only ever one thread can be actually doing slow reclaim > (everyone is serialized on the single ww mutex lru lock) should be > doable by checking for the slow reclaim conditiona once you have the > lru ww_mutex and if the slow reclaim condition is lifted, switch back > to fast reclaim. > > The slow reclaim conditiona might also need to be a full reference > count, to handle multiple threads hitting EBUSY/slow reclaim without > the book-keeping getting all confused. > > Upshot of this is that it's guranteeing forward progress, but the perf > cliff might be too steep if this happens too often. You might need to > round it off with 1-2 retries when you hit EBUSY, before forcing slow > reclaim. 
> -Daniel > >> Steps 2-5 are certainly not trivial, but doable as far as I can see. >> >> Have fun :) >> Christian. >> >> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): >> >> How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock. >> >> Otherwise, any other idea? >> >> -------- Original Message -------- >> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >> From: Christian König >> To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org >> CC: >> >> Well that is certainly a NAK because it can lead to deadlock in the >> memory management. >> >> You can't just busy wait with all those locks held. >> >> Regards, >> Christian. >> >> Am 23.04.19 um 03:45 schrieb Liang, Prike: >>> Acked-by: Prike Liang <Prike.Liang@amd.com> >>> >>> Thanks, >>> Prike >>> -----Original Message----- >>> From: Chunming Zhou <david1.zhou@amd.com> >>> Sent: Monday, April 22, 2019 6:39 PM >>> To: dri-devel@lists.freedesktop.org >>> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com> >>> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy >>> >>> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory. >>> >>> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11 >>> Signed-off-by: Chunming Zhou <david1.zhou@amd.com> >>> --- >>> drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, >>> if (mem->mm_node) >>> break; >>> ret = ttm_mem_evict_first(bdev, mem_type, place, ctx); >>> - if (unlikely(ret != 0)) >>> - return ret; >>> + if (unlikely(ret != 0)) { >>> + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY) >>> + return ret; >>> + } >>> } while (1); >>> mem->mem_type = mem_type; >>> return ttm_bo_add_move_fence(bo, man, mem); >>> -- >>> 2.17.1 >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
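To make that ordering concrete, here is a rough, self-contained sketch of steps 3-5 including the recheck discussed above. The types and helpers are simplified stand-ins invented for this sketch, not TTM's real structures; in TTM the reference would be taken with ttm_bo_get() and the lock is the BO's reservation object.

/* Simplified stand-ins for the real TTM structures, for illustration only. */
struct sketch_bo {
	struct kref ref;
	struct ww_mutex resv;		/* stands in for the reservation lock */
	struct list_head lru;
};

static void sketch_bo_release(struct kref *kref)
{
	/* a real driver would free the BO here */
}

static int evict_first_slow(struct list_head *lru, spinlock_t *lru_lock,
			    struct ww_acquire_ctx *ticket)
{
	struct sketch_bo *bo;
	int ret;

	spin_lock(lru_lock);
	bo = list_first_entry_or_null(lru, struct sketch_bo, lru);
	if (!bo) {
		spin_unlock(lru_lock);
		return -EBUSY;
	}
	kref_get(&bo->ref);			/* step 3: take a reference, then drop the LRU lock */
	spin_unlock(lru_lock);

	ret = ww_mutex_lock(&bo->resv, ticket);	/* sleeps; nonzero (e.g. -EDEADLK) means back off */
	if (ret)
		goto out_put;

	spin_lock(lru_lock);
	if (bo != list_first_entry_or_null(lru, struct sketch_bo, lru)) {
		/* step 4: somebody else evicted or moved it while we slept */
		spin_unlock(lru_lock);
		ww_mutex_unlock(&bo->resv);
		ret = -EBUSY;
		goto out_put;
	}
	list_del_init(&bo->lru);		/* step 5: evict it like any other BO */
	spin_unlock(lru_lock);

	/* the actual move/unmap of the buffer would happen here */
	ww_mutex_unlock(&bo->resv);
	ret = 0;

out_put:
	kref_put(&bo->ref, sketch_bo_release);
	return ret;
}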
This is used in a work item, so you don't need to check for signals. And checking if the LRU is populated is mandatory here or otherwise you can run into an endless loop. Christian. Am 24.04.19 um 09:59 schrieb zhoucm1: > > how about new attached? > > > -David > > > On 2019年04月24日 15:30, Christian König wrote: >> That would change the semantics of ttm_bo_mem_space() and so could >> change the return code in an IOCTL as well. Not a good idea, cause >> that could have a lot of side effects. >> >> Instead in the calling DC code you could check if you get an -ENOMEM >> and then call schedule(). >> >> If after the schedule() we see that we have now BOs on the LRU we can >> try again and see if pinning the frame buffer now succeeds. >> >> Christian. >> >> Am 24.04.19 um 09:17 schrieb zhoucm1: >>> >>> I made a patch as attached. >>> >>> I'm not sure how to make patch as your proposal, Could you make a >>> patch for that if mine isn't enough? >>> >>> -David >>> >>> On 2019年04月24日 15:12, Christian König wrote: >>>>> how about just adding a wrapper for pin function as below? >>>> I considered this as well and don't think it will work reliable. >>>> >>>> We could use it as a band aid for this specific problem, but in >>>> general we need to improve the handling in TTM to resolve those >>>> kind of resource conflicts. >>>> >>>> Regards, >>>> Christian. >>>> >>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing): >>>>> >3. If we have a ticket we grab a reference to the first BO on the >>>>> LRU, drop the LRU lock and try to grab the reservation lock with >>>>> the ticket. >>>>> >>>>> The BO on LRU is already locked by cs user, can it be dropped here >>>>> by DC user? and then DC user grab its lock with ticket, how does >>>>> CS grab it again? >>>>> >>>>> If you think waiting in ttm has this risk, how about just adding a >>>>> wrapper for pin function as below? >>>>> amdgpu_get_pin_bo_timeout() >>>>> { >>>>> do { >>>>> amdgpo_bo_reserve(); >>>>> r=amdgpu_bo_pin(); >>>>> >>>>> if(!r) >>>>> break; >>>>> amdgpu_bo_unreserve(); >>>>> timeout--; >>>>> >>>>> } while(timeout>0); >>>>> >>>>> } >>>>> >>>>> -------- Original Message -------- >>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >>>>> From: Christian König >>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" >>>>> ,dri-devel@lists.freedesktop.org >>>>> CC: >>>>> >>>>> Well that's not so easy of hand. >>>>> >>>>> The basic problem here is that when you busy wait at this place >>>>> you can easily run into situations where application A busy waits >>>>> for B while B busy waits for A -> deadlock. >>>>> >>>>> So what we need here is the deadlock detection logic of the >>>>> ww_mutex. To use this we at least need to do the following steps: >>>>> >>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial). >>>>> >>>>> 2. If we then run into this EBUSY condition in TTM check if the BO >>>>> we need memory for (or rather the ww_mutex of its reservation >>>>> object) has a ticket assigned. >>>>> >>>>> 3. If we have a ticket we grab a reference to the first BO on the >>>>> LRU, drop the LRU lock and try to grab the reservation lock with >>>>> the ticket. >>>>> >>>>> 4. If getting the reservation lock with the ticket succeeded we >>>>> check if the BO is still the first one on the LRU in question (the >>>>> BO could have moved). >>>>> >>>>> 5. If the BO is still the first one on the LRU in question we try >>>>> to evict it as we would evict any other BO. >>>>> >>>>> 6. 
If any of the "If's" above fail we just back off and return -EBUSY. >>>>> >>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see. >>>>> >>>>> Have fun :) >>>>> Christian. >>>>> >>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): >>>>>> How about adding more condition ctx->resv inline to address your >>>>>> concern? As well as don't wait from same user, shouldn't lead to >>>>>> deadlock. >>>>>> >>>>>> Otherwise, any other idea? >>>>>> >>>>>> -------- Original Message -------- >>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy >>>>>> From: Christian König >>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" >>>>>> ,dri-devel@lists.freedesktop.org >>>>>> CC: >>>>>> >>>>>> Well that is certainly a NAK because it can lead to deadlock in the >>>>>> memory management. >>>>>> >>>>>> You can't just busy wait with all those locks held. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike: >>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com> >>>>>> > >>>>>> > Thanks, >>>>>> > Prike >>>>>> > -----Original Message----- >>>>>> > From: Chunming Zhou <david1.zhou@amd.com> >>>>>> > Sent: Monday, April 22, 2019 6:39 PM >>>>>> > To: dri-devel@lists.freedesktop.org >>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) >>>>>> <David1.Zhou@amd.com> >>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy >>>>>> > >>>>>> > heavy gpu job could occupy memory long time, which could lead >>>>>> to other user fail to get memory. >>>>>> > >>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11 >>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com> >>>>>> > --- >>>>>> > drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++-- >>>>>> > 1 file changed, 4 insertions(+), 2 deletions(-) >>>>>> > >>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c >>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec >>>>>> 100644 >>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c >>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct >>>>>> ttm_buffer_object *bo, >>>>>> > if (mem->mm_node) >>>>>> > break; >>>>>> > ret = ttm_mem_evict_first(bdev, mem_type, place, >>>>>> ctx); >>>>>> > - if (unlikely(ret != 0)) >>>>>> > - return ret; >>>>>> > + if (unlikely(ret != 0)) { >>>>>> > + if (!ctx || ctx->no_wait_gpu || ret != >>>>>> -EBUSY) >>>>>> > + return ret; >>>>>> > + } >>>>>> > } while (1); >>>>>> > mem->mem_type = mem_type; >>>>>> > return ttm_bo_add_move_fence(bo, man, mem); >>>>>> > -- >>>>>> > 2.17.1 >>>>>> > >>>>>> > _______________________________________________ >>>>>> > dri-devel mailing list >>>>>> > dri-devel@lists.freedesktop.org >>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dri-devel mailing list >>>>>> dri-devel@lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> >>>>> >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> >>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel 
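As a sketch of what those two constraints would mean for the retry loop: the signal check goes away, and the retry is gated on the LRU still having entries. The gating helper used below is purely hypothetical, since TTM does not export such a check to drivers, which is exactly what the follow-up question below asks about.

/* Hypothetical: TTM does not currently provide this to drivers. */
bool ttm_mem_type_lru_has_bos(struct ttm_bo_device *bdev, unsigned int mem_type);

/* Runs from a work item, so no signal_pending() check is needed. */
static int dm_pin_fb_from_work(struct amdgpu_bo *rbo, u32 domain,
			       struct ttm_bo_device *bdev, unsigned int mem_type)
{
	int r;

	for (;;) {
		r = amdgpu_bo_reserve(rbo, false);
		if (unlikely(r != 0))
			return r;

		r = amdgpu_bo_pin(rbo, domain);
		amdgpu_bo_unreserve(rbo);

		if (r != -ENOMEM)
			return r;

		/* Only retry while there is still something left to evict,
		 * otherwise this would loop forever. */
		if (!ttm_mem_type_lru_has_bos(bdev, mem_type))
			return -ENOMEM;
		schedule();
	}
}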
On 2019年04月24日 16:07, Christian König wrote:
> This is used in a work item, so you don't need to check for signals.
will remove.

> And checking if the LRU is populated is mandatory here
How to check it outside of TTM? because the code is in dm.

> or otherwise you can run into an endless loop.
I already add a timeout for that.

-David

> Christian.
>
> Am 24.04.19 um 09:59 schrieb zhoucm1:
>>
>> how about new attached?
>>
>> -David
>>
>> On 2019年04月24日 15:30, Christian König wrote:
>>> That would change the semantics of ttm_bo_mem_space() and so could
>>> change the return code in an IOCTL as well. Not a good idea, cause
>>> that could have a lot of side effects.
>>>
>>> Instead in the calling DC code you could check if you get an -ENOMEM
>>> and then call schedule().
>>>
>>> If after the schedule() we see that we have now BOs on the LRU we
>>> can try again and see if pinning the frame buffer now succeeds.
>>>
>>> Christian.
>>>
>>> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>>>
>>>> I made a patch as attached.
>>>>
>>>> I'm not sure how to make patch as your proposal, Could you make a
>>>> patch for that if mine isn't enough?
>>>>
>>>> -David
>>>>
>>>> On 2019年04月24日 15:12, Christian König wrote:
>>>>>> how about just adding a wrapper for pin function as below?
>>>>> I considered this as well and don't think it will work reliable.
>>>>>
>>>>> We could use it as a band aid for this specific problem, but in
>>>>> general we need to improve the handling in TTM to resolve those
>>>>> kind of resource conflicts.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>>>> > 3. If we have a ticket we grab a reference to the first BO on
>>>>>> > the LRU, drop the LRU lock and try to grab the reservation lock
>>>>>> > with the ticket.
>>>>>>
>>>>>> The BO on LRU is already locked by cs user, can it be dropped
>>>>>> here by DC user? and then DC user grab its lock with ticket, how
>>>>>> does CS grab it again?
>>>>>>
>>>>>> If you think waiting in ttm has this risk, how about just adding
>>>>>> a wrapper for pin function as below?
>>>>>>
>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>> {
>>>>>>     do {
>>>>>>         amdgpu_bo_reserve();
>>>>>>         r = amdgpu_bo_pin();
>>>>>>
>>>>>>         if (!r)
>>>>>>             break;
>>>>>>         amdgpu_bo_unreserve();
>>>>>>         timeout--;
>>>>>>     } while (timeout > 0);
>>>>>> }
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> From: Christian König
>>>>>> To: "Zhou, David(ChunMing)", "Koenig, Christian", "Liang, Prike",
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> CC:
>>>>>>
>>>>>> Well that's not so easy of hand.
>>>>>>
>>>>>> The basic problem here is that when you busy wait at this place
>>>>>> you can easily run into situations where application A busy waits
>>>>>> for B while B busy waits for A -> deadlock.
>>>>>>
>>>>>> So what we need here is the deadlock detection logic of the
>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>
>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>
>>>>>> 2. If we then run into this EBUSY condition in TTM check if the
>>>>>> BO we need memory for (or rather the ww_mutex of its reservation
>>>>>> object) has a ticket assigned.
>>>>>>
>>>>>> 3. If we have a ticket we grab a reference to the first BO on the
>>>>>> LRU, drop the LRU lock and try to grab the reservation lock with
>>>>>> the ticket.
>>>>>>
>>>>>> 4. If getting the reservation lock with the ticket succeeded we
>>>>>> check if the BO is still the first one on the LRU in question
>>>>>> (the BO could have moved).
>>>>>>
>>>>>> 5. If the BO is still the first one on the LRU in question we try
>>>>>> to evict it as we would evict any other BO.
>>>>>>
>>>>>> 6. If any of the "If's" above fail we just back off and return
>>>>>> -EBUSY.
>>>>>>
>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>>>
>>>>>> Have fun :)
>>>>>> Christian.
>>>>>>
>>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>>>> [...]
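[For reference, steps 2-6 quoted above could look roughly like the sketch
below when wired into TTM's eviction retry path. This is only an
illustration of the idea, not an actual patch: the caller is assumed to
have already done the step-2 ticket check, lru_first_bo() and
evict_first_locked() are hypothetical placeholders for the real LRU walk
and eviction code, and the locking details are simplified.]

/*
 * Sketch only -- illustrates steps 2-6 of the proposal above.
 * lru_first_bo() and evict_first_locked() are hypothetical helpers,
 * not existing TTM functions.
 */
static int ttm_try_ticketed_evict(struct ttm_bo_device *bdev,
                                  struct ww_acquire_ctx *ticket)
{
        struct ttm_buffer_object *first;
        int ret;

        /* Step 3: take a reference to the first BO on the LRU and
         * drop the LRU lock before sleeping on its reservation lock. */
        spin_lock(&bdev->glob->lru_lock);
        first = lru_first_bo(bdev);
        if (!first) {
                spin_unlock(&bdev->glob->lru_lock);
                return -EBUSY;
        }
        ttm_bo_get(first);
        spin_unlock(&bdev->glob->lru_lock);

        /* Step 3 continued: lock with the caller's ticket so the
         * ww_mutex deadlock detection can kick in. */
        ret = ww_mutex_lock(&first->resv->lock, ticket);
        if (ret) {
                ttm_bo_put(first);
                return -EBUSY;          /* step 6: back off */
        }

        /* Step 4: the BO could have moved while we were unlocked. */
        spin_lock(&bdev->glob->lru_lock);
        if (first != lru_first_bo(bdev)) {
                spin_unlock(&bdev->glob->lru_lock);
                ww_mutex_unlock(&first->resv->lock);
                ttm_bo_put(first);
                return -EBUSY;          /* step 6: back off */
        }
        spin_unlock(&bdev->glob->lru_lock);

        /* Step 5: evict it as we would evict any other BO. */
        ret = evict_first_locked(first);

        ww_mutex_unlock(&first->resv->lock);
        ttm_bo_put(first);
        return ret;
}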
Am 24.04.19 um 10:11 schrieb zhoucm1:
> On 2019年04月24日 16:07, Christian König wrote:
>> This is used in a work item, so you don't need to check for signals.
> will remove.
>
>> And checking if the LRU is populated is mandatory here
> How to check it outside of TTM? because the code is in dm.

Well just use a static cast on the first entry of the LRU.

We can't upstream that solution anyway, so just a hack should do for now.

>> or otherwise you can run into an endless loop.
> I already add a timeout for that.

Thinking more about it we most likely actually need to grab the lock on
the first BO entry from the LRU.

> -David
> [...]

Christian.
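[A minimal sketch of the retry idea discussed in this part of the thread
for the display side, assuming a hypothetical wrapper
amdgpu_bo_pin_retry(); the -ENOMEM check, the schedule() call and the
timeout are illustrative only and would live in the DM/DC code, not in
TTM.]

/*
 * Sketch only: hypothetical DM-side wrapper, not an existing amdgpu
 * function.  Retry pinning when TTM reports it is out of space,
 * giving evictions a chance to complete in between.
 */
static int amdgpu_bo_pin_retry(struct amdgpu_bo *abo, u32 domain,
                               unsigned long timeout)
{
        unsigned long end = jiffies + timeout;
        int r;

        do {
                r = amdgpu_bo_reserve(abo, true);
                if (r)
                        return r;
                r = amdgpu_bo_pin(abo, domain);
                amdgpu_bo_unreserve(abo);
                if (r != -ENOMEM)
                        return r;       /* pinned, or a hard error */

                /* Let evictions/moves make progress before retrying. */
                schedule();
        } while (time_before(jiffies, end));

        return -ENOMEM;
}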
OK, let's drop mine then. Could you draft a solution for that?

-David

On 2019年04月24日 16:59, Koenig, Christian wrote:
> Am 24.04.19 um 10:11 schrieb zhoucm1:
>> On 2019年04月24日 16:07, Christian König wrote:
>>> This is used in a work item, so you don't need to check for signals.
>> will remove.
>>>
>>> And checking if the LRU is populated is mandatory here
>> How to check it outside of TTM? because the code is in dm.
>
> Well just use a static cast on the first entry of the LRU.
>
> We can't upstream that solution anyway, so just a hack should do for now.
>
>>> or otherwise you can run into an endless loop.
>> I already add a timeout for that.
>
> Thinking more about it we most likely actually need to grab the lock
> on the first BO entry from the LRU.
>
>> -David
>> [...]
No, I'm busy with P2P and recoverable page faults.

Christian.

Am 24.04.19 um 11:03 schrieb zhoucm1:
> OK, let's drop mine then. Could you draft a solution for that?
>
> -David
>
> On 2019年04月24日 16:59, Koenig, Christian wrote:
>> Am 24.04.19 um 10:11 schrieb zhoucm1:
>>> On 2019年04月24日 16:07, Christian König wrote:
>>>> This is used in a work item, so you don't need to check for signals.
>>> will remove.
>>>>
>>>> And checking if the LRU is populated is mandatory here
>>> How to check it outside of TTM? because the code is in dm.
>>
>> Well just use a static cast on the first entry of the LRU.
>>
>> We can't upstream that solution anyway, so just a hack should do for now.
>>
>>>> or otherwise you can run into an endless loop.
>>> I already add a timeout for that.
>>
>> Thinking more about it we most likely actually need to grab the lock
>> on the first BO entry from the LRU.
>> [...]
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
A heavy GPU job can occupy memory for a long time, which can lead to other
users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
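[For context, the behaviour change is opt-in through the operation context
the caller already passes down. A minimal caller-side sketch, illustrative
only; the surrounding buffer object and placement setup is omitted.]

/* With no_wait_gpu = false the patched loop keeps retrying
 * ttm_mem_evict_first() on -EBUSY instead of failing immediately;
 * with no_wait_gpu = true the error is still returned at once. */
struct ttm_operation_ctx ctx = {
        .interruptible = true,
        .no_wait_gpu = false,   /* user allows waiting while the GPU is busy */
};

ret = ttm_bo_validate(bo, &placement, &ctx);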