diff mbox

[4/6] drm/radeon: add buffers to the LRU list from smallest to largest

Message ID 1393439112-10861-4-git-send-email-maraeo@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marek Olšák Feb. 26, 2014, 6:25 p.m. UTC
From: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Michel Dänzer Feb. 27, 2014, 1:22 a.m. UTC | #1
On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote:
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index f28a8d8..d49a3f7 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> [...]
> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>  	unsigned i;
>  
>  	if (!error) {
> +		/* Sort the buffer list from the smallest to largest buffer,
> +		 * which affects the order of buffers in the LRU list.
> +		 * This assures that the smallest buffers are added first
> +		 * to the LRU list, so they are likely to be later evicted
> +		 * first, instead of large buffers whose eviction is more
> +		 * expensive.
> +		 *
> +		 * This slightly lowers the number of bytes moved by TTM
> +		 * per frame under memory pressure.
> +		 */
> +		list_sort(NULL, &parser->validated, cmp_size_smaller_first);
> +
>  		ttm_eu_fence_buffer_objects(&parser->ticket,
>  					    &parser->validated,
>  					    parser->ib.fence);

This seems like a good idea in general, so maybe it should be done in
ttm_eu_fence_buffer_objects() instead, and possibly also in the callers
of ttm_eu_backoff_reservation_locked().
Marek Olšák March 1, 2014, 9:57 p.m. UTC | #2
Only the VMWare driver uses ttm_eu_fence_buffer_objects. Cc'ing
Thomas. What do you think about this, Thomas?

Marek

On Thu, Feb 27, 2014 at 2:22 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote:
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>> index f28a8d8..d49a3f7 100644
>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>> [...]
>> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>>       unsigned i;
>>
>>       if (!error) {
>> +             /* Sort the buffer list from the smallest to largest buffer,
>> +              * which affects the order of buffers in the LRU list.
>> +              * This assures that the smallest buffers are added first
>> +              * to the LRU list, so they are likely to be later evicted
>> +              * first, instead of large buffers whose eviction is more
>> +              * expensive.
>> +              *
>> +              * This slightly lowers the number of bytes moved by TTM
>> +              * per frame under memory pressure.
>> +              */
>> +             list_sort(NULL, &parser->validated, cmp_size_smaller_first);
>> +
>>               ttm_eu_fence_buffer_objects(&parser->ticket,
>>                                           &parser->validated,
>>                                           parser->ib.fence);
>
> This seems like a good idea in general, so maybe it should be done in
> ttm_eu_fence_buffer_objects() instead, and possibly also in the callers
> of ttm_eu_backoff_reservation_locked().
>
>
> --
> Earthling Michel Dänzer            |                  http://www.amd.com
> Libre software enthusiast          |                Mesa and X developer
>
Thomas Hellstrom March 4, 2014, 1:27 p.m. UTC | #3
Hi, Marek!

I'm OOTO this week, so I can't take a closer look until next week but
there are a couple of things that worries me about this.

1) It's optimizing a slowpath (memory pressure) at the cost of making a
fastpath slower by adding a sorting pass.
2) While it might make a difference on VRAM, it will not help much on
fragmentation-free memory types, like vmwgfx GMR and MOB memory types.
3) It seems like the mechanism by which it works is to reduce the amount
of memory evicted "in vain", (buffers that do not contribute to the
final free memory area that's used). If that's really the case, there
are better ways to do this (walk the LRU list pre-determining what
buffers need to be evicted like the intel driver does).

So considering this, I think we should make this sorting pass optional.

/Thomas


On 03/01/2014 10:57 PM, Marek Olšák wrote:
> Only the VMWare driver uses ttm_eu_fence_buffer_objects. Cc'ing
> Thomas. What do you think about this, Thomas?
>
> Marek
>
> On Thu, Feb 27, 2014 at 2:22 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote:
>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>>> index f28a8d8..d49a3f7 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>>> [...]
>>> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>>>       unsigned i;
>>>
>>>       if (!error) {
>>> +             /* Sort the buffer list from the smallest to largest buffer,
>>> +              * which affects the order of buffers in the LRU list.
>>> +              * This assures that the smallest buffers are added first
>>> +              * to the LRU list, so they are likely to be later evicted
>>> +              * first, instead of large buffers whose eviction is more
>>> +              * expensive.
>>> +              *
>>> +              * This slightly lowers the number of bytes moved by TTM
>>> +              * per frame under memory pressure.
>>> +              */
>>> +             list_sort(NULL, &parser->validated, cmp_size_smaller_first);
>>> +
>>>               ttm_eu_fence_buffer_objects(&parser->ticket,
>>>                                           &parser->validated,
>>>                                           parser->ib.fence);
>> This seems like a good idea in general, so maybe it should be done in
>> ttm_eu_fence_buffer_objects() instead, and possibly also in the callers
>> of ttm_eu_backoff_reservation_locked().
>>
>>
>> --
>> Earthling Michel Dänzer            |                  https://urldefense.proofpoint.com/v1/url?u=http://www.amd.com/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=aQIiktj8ewwEsYip5R%2FcVgKb30vgbPORPhMChqLpKg0%3D%0A&s=ccc52fd17cadd5c2a056880e0fc457d6e7258e119ac5fc4545a0aea4b4a0aa34
>> Libre software enthusiast          |                Mesa and X developer
>>
diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index f28a8d8..d49a3f7 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -24,6 +24,7 @@ 
  * Authors:
  *    Jerome Glisse <glisse@freedesktop.org>
  */
+#include <linux/list_sort.h>
 #include <drm/drmP.h>
 #include <drm/radeon_drm.h>
 #include "radeon_reg.h"
@@ -290,6 +291,16 @@  int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	return 0;
 }
 
+static int cmp_size_smaller_first(void *priv, struct list_head *a,
+				  struct list_head *b)
+{
+	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head);
+	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head);
+
+	/* Sort A before B if A is smaller. */
+	return (int)la->bo->tbo.num_pages - (int)lb->bo->tbo.num_pages;
+}
+
 /**
  * cs_parser_fini() - clean parser states
  * @parser:	parser structure holding parsing context.
@@ -303,6 +314,18 @@  static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 	unsigned i;
 
 	if (!error) {
+		/* Sort the buffer list from the smallest to largest buffer,
+		 * which affects the order of buffers in the LRU list.
+		 * This assures that the smallest buffers are added first
+		 * to the LRU list, so they are likely to be later evicted
+		 * first, instead of large buffers whose eviction is more
+		 * expensive.
+		 *
+		 * This slightly lowers the number of bytes moved by TTM
+		 * per frame under memory pressure.
+		 */
+		list_sort(NULL, &parser->validated, cmp_size_smaller_first);
+
 		ttm_eu_fence_buffer_objects(&parser->ticket,
 					    &parser->validated,
 					    parser->ib.fence);