Message ID | 1393439112-10861-4-git-send-email-maraeo@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote: > > diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c > index f28a8d8..d49a3f7 100644 > --- a/drivers/gpu/drm/radeon/radeon_cs.c > +++ b/drivers/gpu/drm/radeon/radeon_cs.c > [...] > @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo > unsigned i; > > if (!error) { > + /* Sort the buffer list from the smallest to largest buffer, > + * which affects the order of buffers in the LRU list. > + * This assures that the smallest buffers are added first > + * to the LRU list, so they are likely to be later evicted > + * first, instead of large buffers whose eviction is more > + * expensive. > + * > + * This slightly lowers the number of bytes moved by TTM > + * per frame under memory pressure. > + */ > + list_sort(NULL, &parser->validated, cmp_size_smaller_first); > + > ttm_eu_fence_buffer_objects(&parser->ticket, > &parser->validated, > parser->ib.fence); This seems like a good idea in general, so maybe it should be done in ttm_eu_fence_buffer_objects() instead, and possibly also in the callers of ttm_eu_backoff_reservation_locked().
Only the VMWare driver uses ttm_eu_fence_buffer_objects. Cc'ing Thomas. What do you think about this, Thomas? Marek On Thu, Feb 27, 2014 at 2:22 AM, Michel Dänzer <michel@daenzer.net> wrote: > On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote: >> >> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c >> index f28a8d8..d49a3f7 100644 >> --- a/drivers/gpu/drm/radeon/radeon_cs.c >> +++ b/drivers/gpu/drm/radeon/radeon_cs.c >> [...] >> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo >> unsigned i; >> >> if (!error) { >> + /* Sort the buffer list from the smallest to largest buffer, >> + * which affects the order of buffers in the LRU list. >> + * This assures that the smallest buffers are added first >> + * to the LRU list, so they are likely to be later evicted >> + * first, instead of large buffers whose eviction is more >> + * expensive. >> + * >> + * This slightly lowers the number of bytes moved by TTM >> + * per frame under memory pressure. >> + */ >> + list_sort(NULL, &parser->validated, cmp_size_smaller_first); >> + >> ttm_eu_fence_buffer_objects(&parser->ticket, >> &parser->validated, >> parser->ib.fence); > > This seems like a good idea in general, so maybe it should be done in > ttm_eu_fence_buffer_objects() instead, and possibly also in the callers > of ttm_eu_backoff_reservation_locked(). > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer >
Hi, Marek! I'm OOTO this week, so I can't take a closer look until next week but there are a couple of things that worries me about this. 1) It's optimizing a slowpath (memory pressure) at the cost of making a fastpath slower by adding a sorting pass. 2) While it might make a difference on VRAM, it will not help much on fragmentation-free memory types, like vmwgfx GMR and MOB memory types. 3) It seems like the mechanism by which it works is to reduce the amount of memory evicted "in vain", (buffers that do not contribute to the final free memory area that's used). If that's really the case, there are better ways to do this (walk the LRU list pre-determining what buffers need to be evicted like the intel driver does). So considering this, I think we should make this sorting pass optional. /Thomas On 03/01/2014 10:57 PM, Marek Olšák wrote: > Only the VMWare driver uses ttm_eu_fence_buffer_objects. Cc'ing > Thomas. What do you think about this, Thomas? > > Marek > > On Thu, Feb 27, 2014 at 2:22 AM, Michel Dänzer <michel@daenzer.net> wrote: >> On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote: >>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c >>> index f28a8d8..d49a3f7 100644 >>> --- a/drivers/gpu/drm/radeon/radeon_cs.c >>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c >>> [...] >>> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo >>> unsigned i; >>> >>> if (!error) { >>> + /* Sort the buffer list from the smallest to largest buffer, >>> + * which affects the order of buffers in the LRU list. >>> + * This assures that the smallest buffers are added first >>> + * to the LRU list, so they are likely to be later evicted >>> + * first, instead of large buffers whose eviction is more >>> + * expensive. >>> + * >>> + * This slightly lowers the number of bytes moved by TTM >>> + * per frame under memory pressure. >>> + */ >>> + list_sort(NULL, &parser->validated, cmp_size_smaller_first); >>> + >>> ttm_eu_fence_buffer_objects(&parser->ticket, >>> &parser->validated, >>> parser->ib.fence); >> This seems like a good idea in general, so maybe it should be done in >> ttm_eu_fence_buffer_objects() instead, and possibly also in the callers >> of ttm_eu_backoff_reservation_locked(). >> >> >> -- >> Earthling Michel Dänzer | https://urldefense.proofpoint.com/v1/url?u=http://www.amd.com/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=aQIiktj8ewwEsYip5R%2FcVgKb30vgbPORPhMChqLpKg0%3D%0A&s=ccc52fd17cadd5c2a056880e0fc457d6e7258e119ac5fc4545a0aea4b4a0aa34 >> Libre software enthusiast | Mesa and X developer >>
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index f28a8d8..d49a3f7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -24,6 +24,7 @@ * Authors: * Jerome Glisse <glisse@freedesktop.org> */ +#include <linux/list_sort.h> #include <drm/drmP.h> #include <drm/radeon_drm.h> #include "radeon_reg.h" @@ -290,6 +291,16 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) return 0; } +static int cmp_size_smaller_first(void *priv, struct list_head *a, + struct list_head *b) +{ + struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head); + struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head); + + /* Sort A before B if A is smaller. */ + return (int)la->bo->tbo.num_pages - (int)lb->bo->tbo.num_pages; +} + /** * cs_parser_fini() - clean parser states * @parser: parser structure holding parsing context. @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo unsigned i; if (!error) { + /* Sort the buffer list from the smallest to largest buffer, + * which affects the order of buffers in the LRU list. + * This assures that the smallest buffers are added first + * to the LRU list, so they are likely to be later evicted + * first, instead of large buffers whose eviction is more + * expensive. + * + * This slightly lowers the number of bytes moved by TTM + * per frame under memory pressure. + */ + list_sort(NULL, &parser->validated, cmp_size_smaller_first); + ttm_eu_fence_buffer_objects(&parser->ticket, &parser->validated, parser->ib.fence);