diff mbox series

[3/6] drm/syncobj: Do not allocate an array to store zeros

Message ID 20250102194418.70383-4-tursulin@igalia.com (mailing list archive)
State New
Headers show
Series A few drm_syncobj optimisations | expand

Commit Message

Tvrtko Ursulin Jan. 2, 2025, 7:44 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

When waiting on syncobjs the current code allocates a temporary array only
to fill it up with all zeros.

We can avoid that by relying on the allocated entry array already being
zero allocated. For the timeline mode we fetch the timeline point values
as we populate the entries array so also do not need this additional
temporary allocation.

"vkgears -present-mailbox" average framerate:

  Before: 21410.1089
  After:  21609.7225

With a disclaimer that measuring with vkgears feels a bit variable,
nevertheless it did not look like noise.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/drm_syncobj.c | 38 +++++++++++++----------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

Comments

Michel Dänzer Jan. 6, 2025, 5:16 p.m. UTC | #1
On 2025-01-02 20:44, Tvrtko Ursulin wrote:
> 
> "vkgears -present-mailbox" average framerate:
> 
>   Before: 21410.1089
>   After:  21609.7225
> 
> With a disclaimer that measuring with vkgears feels a bit variable,
> nevertheless it did not look like noise.

That's ~1% difference. IME the frame rate can easily vary more than that during a single run.

The frame rate surely varies by more than 1 fps during each run, so comparing such large average values down to 4 digits after the decimal point doesn't seem very useful.

Doing multiple (at least 3 or more each) before & after runs and comparing the distribution of individual measured values using something like ministat might confirm it's essentially noise, or give more confidence it's not.
Tvrtko Ursulin Jan. 7, 2025, 9:29 a.m. UTC | #2
On 06/01/2025 17:16, Michel Dänzer wrote:
> On 2025-01-02 20:44, Tvrtko Ursulin wrote:
>>
>> "vkgears -present-mailbox" average framerate:
>>
>>    Before: 21410.1089
>>    After:  21609.7225
>>
>> With a disclaimer that measuring with vkgears feels a bit variable,
>> nevertheless it did not look like noise.
> 
> That's ~1% difference. IME the frame rate can easily vary more than that during a single run.
> 
> The frame rate surely varies by more than 1 fps during each run, so comparing such large average values down to 4 digits after the decimal point doesn't seem very useful.
> 
> Doing multiple (at least 3 or more each) before & after runs and comparing the distribution of individual measured values using something like ministat might confirm it's essentially noise, or give more confidence it's not.

I did multiple runs and I thought from the code changes it would be 
obvious there is some code there which should go.

But fair enough, I agree ministat is common practice so I re-did it. 
Five ~100 second runs each kernel. Absolute numbers are a bit different 
before I turned on some kernel hardening options since.

x before
+ after
+------------------------------------------------------------+
|                          x         +                       |
|                   x      x         +                       |
|                   x      xx      ++++                      |
|                 x x      xx x    ++++                      |
|                 x xx   x xx x+   ++++                      |
|                xxxxx   xxxxxx+   ++++ + +                  |
|                xxxxxxx xxxxxx+x  ++++ +++                  |
|              x xxxxxxxxxxx*xx+* x++++++++   ++             |
|        x x   xxxxxxxxxxxx**x*+*+*++++++++ ++++ +           |
|       xx x   xxxxxxxxxx*x****+***+**+++++ ++++++           |
|x     xxx x   xxxxx*x****x***********+*++**+++++++   +  +  +|
|               |_______A______|                             |
|                             |______A_______|               |
+------------------------------------------------------------+
     N           Min           Max        Median           Avg        Stddev
x 135      21697.58     22809.467     22321.396     22307.707     198.75011
+ 118     22200.746      23277.09       22661.4     22671.442     192.10609
Difference at 95.0% confidence
	363.735 +/- 48.3345
	1.63054% +/- 0.216672%
	(Student's t, pooled s = 195.681)

It's a small difference but every little helps.

Regards,

Tvrtko
Michel Dänzer Jan. 7, 2025, 11:22 a.m. UTC | #3
On 2025-01-07 10:29, Tvrtko Ursulin wrote:
> 
> On 06/01/2025 17:16, Michel Dänzer wrote:
>> On 2025-01-02 20:44, Tvrtko Ursulin wrote:
>>>
>>> "vkgears -present-mailbox" average framerate:
>>>
>>>    Before: 21410.1089
>>>    After:  21609.7225
>>>
>>> With a disclaimer that measuring with vkgears feels a bit variable,
>>> nevertheless it did not look like noise.
>>
>> That's ~1% difference. IME the frame rate can easily vary more than that during a single run.
>>
>> The frame rate surely varies by more than 1 fps during each run, so comparing such large average values down to 4 digits after the decimal point doesn't seem very useful.
>>
>> Doing multiple (at least 3 or more each) before & after runs and comparing the distribution of individual measured values using something like ministat might confirm it's essentially noise, or give more confidence it's not.
> 
> I did multiple runs and I thought from the code changes it would be obvious there is some code there which should go.

FWIW, I didn't question that, in fact I don't have any particular opinion on the actual code changes. Claims of performance differences based on two numbers are just a pet peeve of mine. :)


> But fair enough, I agree ministat is common practice so I re-did it. Five ~100 second runs each kernel. Absolute numbers are a bit different before I turned on some kernel hardening options since.
> 
> x before
> + after
> +------------------------------------------------------------+
> |                          x         +                       |
> |                   x      x         +                       |
> |                   x      xx      ++++                      |
> |                 x x      xx x    ++++                      |
> |                 x xx   x xx x+   ++++                      |
> |                xxxxx   xxxxxx+   ++++ + +                  |
> |                xxxxxxx xxxxxx+x  ++++ +++                  |
> |              x xxxxxxxxxxx*xx+* x++++++++   ++             |
> |        x x   xxxxxxxxxxxx**x*+*+*++++++++ ++++ +           |
> |       xx x   xxxxxxxxxx*x****+***+**+++++ ++++++           |
> |x     xxx x   xxxxx*x****x***********+*++**+++++++   +  +  +|
> |               |_______A______|                             |
> |                             |______A_______|               |
> +------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x 135      21697.58     22809.467     22321.396     22307.707     198.75011
> + 118     22200.746      23277.09       22661.4     22671.442     192.10609
> Difference at 95.0% confidence
>     363.735 +/- 48.3345
>     1.63054% +/- 0.216672%
>     (Student's t, pooled s = 195.681)
> 
> It's a small difference but every little helps.

This gives a lot of confidence in the performance gain, thanks! Would be great if you could incorporate at least a summary of this into a commit log somehow.
Tvrtko Ursulin Jan. 7, 2025, 12:37 p.m. UTC | #4
On 07/01/2025 11:22, Michel Dänzer wrote:
> On 2025-01-07 10:29, Tvrtko Ursulin wrote:
>>
>> On 06/01/2025 17:16, Michel Dänzer wrote:
>>> On 2025-01-02 20:44, Tvrtko Ursulin wrote:
>>>>
>>>> "vkgears -present-mailbox" average framerate:
>>>>
>>>>     Before: 21410.1089
>>>>     After:  21609.7225
>>>>
>>>> With a disclaimer that measuring with vkgears feels a bit variable,
>>>> nevertheless it did not look like noise.
>>>
>>> That's ~1% difference. IME the frame rate can easily vary more than that during a single run.
>>>
>>> The frame rate surely varies by more than 1 fps during each run, so comparing such large average values down to 4 digits after the decimal point doesn't seem very useful.
>>>
>>> Doing multiple (at least 3 or more each) before & after runs and comparing the distribution of individual measured values using something like ministat might confirm it's essentially noise, or give more confidence it's not.
>>
>> I did multiple runs and I thought from the code changes it would be obvious there is some code there which should go.
> 
> FWIW, I didn't question that, in fact I don't have any particular opinion on the actual code changes. Claims of performance differences based on two numbers are just a pet peeve of mine. :)
> 
> 
>> But fair enough, I agree ministat is common practice so I re-did it. Five ~100 second runs each kernel. Absolute numbers are a bit different before I turned on some kernel hardening options since.
>>
>> x before
>> + after
>> +------------------------------------------------------------+
>> |                          x         +                       |
>> |                   x      x         +                       |
>> |                   x      xx      ++++                      |
>> |                 x x      xx x    ++++                      |
>> |                 x xx   x xx x+   ++++                      |
>> |                xxxxx   xxxxxx+   ++++ + +                  |
>> |                xxxxxxx xxxxxx+x  ++++ +++                  |
>> |              x xxxxxxxxxxx*xx+* x++++++++   ++             |
>> |        x x   xxxxxxxxxxxx**x*+*+*++++++++ ++++ +           |
>> |       xx x   xxxxxxxxxx*x****+***+**+++++ ++++++           |
>> |x     xxx x   xxxxx*x****x***********+*++**+++++++   +  +  +|
>> |               |_______A______|                             |
>> |                             |______A_______|               |
>> +------------------------------------------------------------+
>>      N           Min           Max        Median           Avg        Stddev
>> x 135      21697.58     22809.467     22321.396     22307.707     198.75011
>> + 118     22200.746      23277.09       22661.4     22671.442     192.10609
>> Difference at 95.0% confidence
>>      363.735 +/- 48.3345
>>      1.63054% +/- 0.216672%
>>      (Student's t, pooled s = 195.681)
>>
>> It's a small difference but every little helps.
> 
> This gives a lot of confidence in the performance gain, thanks! Would be great if you could incorporate at least a summary of this into a commit log somehow.

Thanks! Now lets hope someone can spare the time to review.

I pasted the graph into a reply to cover letter of v2 of the series. 
That's my best idea since it is a series with a few small improvements 
which add up to a total.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 2fb95d6f6c82..059d6be3ff1f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1027,7 +1027,7 @@  static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
 }
 
 static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
-						  void __user *user_points,
+						  u64 __user *user_points,
 						  uint32_t count,
 						  uint32_t flags,
 						  signed long timeout,
@@ -1035,9 +1035,8 @@  static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 						  ktime_t *deadline)
 {
 	struct syncobj_wait_entry *entries;
-	struct dma_fence *fence;
-	uint64_t *points;
 	uint32_t signaled_count, i;
+	struct dma_fence *fence;
 
 	if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
 		     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1045,24 +1044,14 @@  static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 		lockdep_assert_none_held_once();
 	}
 
-	points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
-	if (points == NULL)
-		return -ENOMEM;
-
-	if (!user_points) {
-		memset(points, 0, count * sizeof(uint64_t));
-
-	} else if (copy_from_user(points, user_points,
-				  sizeof(uint64_t) * count)) {
-		timeout = -EFAULT;
-		goto err_free_points;
-	}
+	if (user_points &&
+	    !access_ok(user_points, count * sizeof(*user_points)))
+		return -EFAULT;
 
 	entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);
-	if (!entries) {
-		timeout = -ENOMEM;
-		goto err_free_points;
-	}
+	if (!entries)
+		return -ENOMEM;
+
 	/* Walk the list of sync objects and initialize entries.  We do
 	 * this up-front so that we can properly return -EINVAL if there is
 	 * a syncobj with a missing fence and then never have the chance of
@@ -1073,9 +1062,13 @@  static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 		struct dma_fence *fence;
 
 		entries[i].task = current;
-		entries[i].point = points[i];
+		if (user_points && get_user(entries[i].point, user_points++)) {
+			timeout = -EFAULT;
+			goto cleanup_entries;
+		}
 		fence = drm_syncobj_fence_get(syncobjs[i]);
-		if (!fence || dma_fence_chain_find_seqno(&fence, points[i])) {
+		if (!fence ||
+		    dma_fence_chain_find_seqno(&fence, entries[i].point)) {
 			dma_fence_put(fence);
 			if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
 				     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1181,9 +1174,6 @@  static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 	}
 	kfree(entries);
 
-err_free_points:
-	kfree(points);
-
 	return timeout;
 }