Message ID: 20141128153929.GB8425@strange.ger.corp.intel.com (mailing list archive)
State: New, archived
> -----Original Message-----
> From: Lespiau, Damien
> Sent: Friday, November 28, 2014 3:39 PM
> To: Gore, Tim
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [RFC] tests/gem_ring_sync_copy: reduce memory usage
>
> On Fri, Nov 28, 2014 at 02:46:24PM +0000, tim.gore@intel.com wrote:
> > From: Tim Gore <tim.gore@intel.com>
> >
> > gem_ring_sync_copy uses a lot of memory and gets OOM killed on smaller
> > systems (eg android devices).
> > Most of the allocation is for "busy work" to keep the render rings
> > busy and for this we can just re-use the same few buffers over and
> > over. This enables the test to be run on low end devices.
> >
> > Signed-off-by: Tim Gore <tim.gore@intel.com>
>
> The one thing I'm wondering is how we can ensure that 32 is indeed enough
> to create a dependency between the two rings.
>
> I'm not sure why you've chosen to remove n_buffers_load (without actually
> removing it from the test data). How about:
>
> diff --git a/tests/gem_ring_sync_copy.c b/tests/gem_ring_sync_copy.c
> index 4a732d2..2561bac 100644
> --- a/tests/gem_ring_sync_copy.c
> +++ b/tests/gem_ring_sync_copy.c
> @@ -331,7 +331,7 @@ igt_main
>  	data.drm_fd = drm_open_any_render();
>  	data.devid = intel_get_drm_devid(data.drm_fd);
>
> -	data.n_buffers_load = 1000;
> +	data.n_buffers_load = 32;
>
>  	data.bufmgr = drm_intel_bufmgr_gem_init(data.drm_fd, 4096);
>  	igt_assert(data.bufmgr);
>
> --
> Damien

n_buffers_load is still used. I am still submitting 1000 buffers to the
ring, it's just that I use the same buffers over and over (hence the
"i % NUM_BUSY_BUFFERS"). So I only allocate 32 buffers, and each gets
copied 1000/32 times, so the ring is kept busy for as long as previously.
Tim

> > ---
> >  tests/gem_ring_sync_copy.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/tests/gem_ring_sync_copy.c b/tests/gem_ring_sync_copy.c
> > index 4a732d2..7257188 100644
> > --- a/tests/gem_ring_sync_copy.c
> > +++ b/tests/gem_ring_sync_copy.c
> > @@ -57,6 +57,7 @@
> >
> >  #define WIDTH 512
> >  #define HEIGHT 512
> > +#define NUM_BUSY_BUFFERS 32
> >
> >  typedef struct {
> >  	int drm_fd;
> > @@ -163,11 +164,13 @@ static void render_busy(data_t *data)
> >  	size_t array_size;
> >  	int i;
> >
> > -	array_size = data->n_buffers_load * sizeof(struct igt_buf);
> > +	/* allocate 32 buffer objects and re-use them as needed */
> > +	array_size = NUM_BUSY_BUFFERS * sizeof(struct igt_buf);
> > +
> >  	data->render.srcs = malloc(array_size);
> >  	data->render.dsts = malloc(array_size);
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		scratch_buf_init(data, &data->render.srcs[i], WIDTH, HEIGHT,
> >  				 0xdeadbeef);
> >  		scratch_buf_init(data, &data->render.dsts[i], WIDTH, HEIGHT,
> > @@ -177,10 +180,10 @@ static void render_busy(data_t *data)
> >  	for (i = 0; i < data->n_buffers_load; i++) {
> >  		data->render.copy(data->batch,
> >  				  NULL,	/* context */
> > -				  &data->render.srcs[i],
> > +				  &data->render.srcs[i % NUM_BUSY_BUFFERS],
> >  				  0, 0,	/* src_x, src_y */
> >  				  WIDTH, HEIGHT,
> > -				  &data->render.dsts[i],
> > +				  &data->render.dsts[i % NUM_BUSY_BUFFERS],
> >  				  0, 0	/* dst_x, dst_y */);
> >  	}
> >  }
> > @@ -189,7 +192,7 @@ static void render_busy_fini(data_t *data)
> >  {
> >  	int i;
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		drm_intel_bo_unreference(data->render.srcs[i].bo);
> >  		drm_intel_bo_unreference(data->render.dsts[i].bo);
> >  	}
> > @@ -225,11 +228,13 @@ static void blitter_busy(data_t *data)
> >  	size_t array_size;
> >  	int i;
> >
> > -	array_size = data->n_buffers_load * sizeof(drm_intel_bo *);
> > +	/* allocate 32 buffer objects and re-use them as needed */
> > +	array_size = NUM_BUSY_BUFFERS * sizeof(drm_intel_bo *);
> > +
> >  	data->blitter.srcs = malloc(array_size);
> >  	data->blitter.dsts = malloc(array_size);
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		data->blitter.srcs[i] = bo_create(data,
> >  						  WIDTH, HEIGHT,
> >  						  0xdeadbeef);
> > @@ -240,8 +245,8 @@ static void blitter_busy(data_t *data)
> >
> >  	for (i = 0; i < data->n_buffers_load; i++) {
> >  		intel_copy_bo(data->batch,
> > -			      data->blitter.srcs[i],
> > -			      data->blitter.dsts[i],
> > +			      data->blitter.srcs[i % NUM_BUSY_BUFFERS],
> > +			      data->blitter.dsts[i % NUM_BUSY_BUFFERS],
> >  			      WIDTH*HEIGHT*4);
> >  	}
> >  }
> > @@ -250,7 +255,7 @@ static void blitter_busy_fini(data_t *data)
> >  {
> >  	int i;
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		drm_intel_bo_unreference(data->blitter.srcs[i]);
> >  		drm_intel_bo_unreference(data->blitter.dsts[i]);
> >  	}
> > --
> > 2.1.3
On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> n_buffers_load is still used. I am still submitting 1000 buffers to the
> ring, it's just that I use the same buffers over and over (hence the
> "i % NUM_BUSY_BUFFERS"). So I only allocate 32 buffers, and each gets
> copied 1000/32 times, so the ring is kept busy for as long as previously.

Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > n_buffers_load is still used. I am still submitting 1000 buffers to the
> > ring, it's just that I use the same buffers over and over (hence the
> > "i % NUM_BUSY_BUFFERS"). So I only allocate 32 buffers, and each gets
> > copied 1000/32 times, so the ring is kept busy for as long as previously.
>
> Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.

The ring is kept as busy, but the queue depth is drastically reduced
(from n_buffers_load to 32). Since both numbers are arbitrary, I am not
averse to the change, but I would feel happier if it was demonstrated
that the new test is still capable of detecting bugs deliberately
introduced into the ring synchronisation code.
-Chris
> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, November 28, 2014 4:20 PM
> To: Lespiau, Damien
> Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory usage
>
> On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
>
> The ring is kept as busy, but the queue depth is drastically reduced
> (from n_buffers_load to 32). Since both numbers are arbitrary, I am not
> averse to the change, but I would feel happier if it was demonstrated
> that the new test is still capable of detecting bugs deliberately
> introduced into the ring synchronisation code.
> -Chris

Excuse a rather novice question, but which queue depth is reduced?

Tim

> --
> Chris Wilson, Intel Open Source Technology Centre
On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > The ring is kept as busy, but the queue depth is drastically reduced
> > (from n_buffers_load to 32). Since both numbers are arbitrary, I am
> > not averse to the change, but I would feel happier if it was
> > demonstrated that the new test is still capable of detecting bugs
> > deliberately introduced into the ring synchronisation code.
> > -Chris
>
> Excuse a rather novice question, but which queue depth is reduced?

We track on the object the last read/write request. If you reuse objects
the effective depth in the read/write queue is reduced, and this queue is
implicitly used when synchronising between rings.
-Chris
> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, November 28, 2014 4:47 PM
> To: Gore, Tim
> Cc: Lespiau, Damien; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory usage
>
> On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > Excuse a rather novice question, but which queue depth is reduced?
>
> We track on the object the last read/write request. If you reuse objects
> the effective depth in the read/write queue is reduced, and this queue
> is implicitly used when synchronising between rings.
> -Chris

OK, thanks. So the test has changed, but in a subtle way. It would seem
that the main thrust of the test is still there, but perhaps we could do
better by checking how much memory we have and then using 1000 buffers
of a size that we can accommodate?

Tim

> --
> Chris Wilson, Intel Open Source Technology Centre
On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > We track on the object the last read/write request. If you reuse
> > objects the effective depth in the read/write queue is reduced, and
> > this queue is implicitly used when synchronising between rings.
> > -Chris
>
> OK, thanks. So the test has changed, but in a subtle way. It would seem
> that the main thrust of the test is still there, but perhaps we could do
> better by checking how much memory we have and then using 1000 buffers
> of a size that we can accommodate?

Yes. You could allocate 1000 single pages and perform lots of little
copies in each page to generate the workload with large queue depths.
Good idea.
-Chris
On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > OK, thanks. So the test has changed, but in a subtle way. It would
> > seem that the main thrust of the test is still there, but perhaps we
> > could do better by checking how much memory we have and then using
> > 1000 buffers of a size that we can accommodate?
>
> Yes. You could allocate 1000 single pages and perform lots of little
> copies in each page to generate the workload with large queue depths.
> Good idea.

I think overall we have way too many copypastas of "throw load with
dependencies onto $engine with $ctx". Volunteers to extract something
autotuning and use it everywhere highly welcome ;-)

I think Ville is working on something for kms_flip (since on vlv the
current workload takes too long and results in timeouts for flips and so
test failures). But not sure whether he'll do the full librarization.

Aside on dependencies: as long as we keep things in the same (implicit)
context we won't reorder with the scheduler. Or at least that's been my
working assumption while writing tests. Only for cross-ctx/fd/engine
stuff do the actual dependencies matter.
-Daniel
On Fri, Nov 28, 2014 at 07:24:18PM +0100, Daniel Vetter wrote:
> On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> > Yes. You could allocate 1000 single pages and perform lots of little
> > copies in each page to generate the workload with large queue depths.
> > Good idea.
>
> I think overall we have way too many copypastas of "throw load with
> dependencies onto $engine with $ctx". Volunteers to extract something
> autotuning and use it everywhere highly welcome ;-)
>
> I think Ville is working on something for kms_flip (since on vlv the
> current workload takes too long and results in timeouts for flips and
> so test failures). But not sure whether he'll do the full librarization.

My current hacked up dummy load stuff still lives in kms_flip. It just
tries to tune for a 1 second delay now. And it still uses two bos to do
the copies, but those are now fixed 2kx2k and only the final copy is
aimed at the fb. Seems to be working reasonably well now.

I'll try to clean it up a bit and post it to the list, but it's getting
a bit late here so that'll have to wait until early next week.
diff --git a/tests/gem_ring_sync_copy.c b/tests/gem_ring_sync_copy.c
index 4a732d2..2561bac 100644
--- a/tests/gem_ring_sync_copy.c
+++ b/tests/gem_ring_sync_copy.c
@@ -331,7 +331,7 @@ igt_main
 	data.drm_fd = drm_open_any_render();
 	data.devid = intel_get_drm_devid(data.drm_fd);
 
-	data.n_buffers_load = 1000;
+	data.n_buffers_load = 32;
 
 	data.bufmgr = drm_intel_bufmgr_gem_init(data.drm_fd, 4096);
 	igt_assert(data.bufmgr);