[0/7] A few drm_syncobj optimisations

Message ID 20250318155424.78552-1-tvrtko.ursulin@igalia.com (mailing list archive)

Message

Tvrtko Ursulin March 18, 2025, 3:54 p.m. UTC
A small set of drm_syncobj optimisations which should make things a tiny bit
more efficient on the CPU side of things.

Improvement seems to be around 1.5% more FPS if observed with "vkgears
-present-mailbox" on a Steam Deck Plasma desktop, but I am reluctant to make a
definitive claim on the numbers since there is some run-to-run variance. But, as
suggested by Michel Dänzer, I did five ~100 second runs on each kernel
to be able to show the ministat analysis.

x before
+ after
+------------------------------------------------------------+
|                          x         +                       |
|                   x      x         +                       |
|                   x      xx      ++++                      |
|                 x x      xx x    ++++                      |
|                 x xx   x xx x+   ++++                      |
|                xxxxx   xxxxxx+   ++++ + +                  |
|                xxxxxxx xxxxxx+x  ++++ +++                  |
|              x xxxxxxxxxxx*xx+* x++++++++   ++             |
|        x x   xxxxxxxxxxxx**x*+*+*++++++++ ++++ +           |
|       xx x   xxxxxxxxxx*x****+***+**+++++ ++++++           |
|x     xxx x   xxxxx*x****x***********+*++**+++++++   +  +  +|
|               |_______A______|                             |
|                             |______A_______|               |
+------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 135      21697.58     22809.467     22321.396     22307.707     198.75011
+ 118     22200.746      23277.09       22661.4     22671.442     192.10609
Difference at 95.0% confidence
    363.735 +/- 48.3345
    1.63054% +/- 0.216672%
    (Student's t, pooled s = 195.681)
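
For readers who want to check the summary by hand, the sketch below recomputes
the pooled standard deviation and the 95% interval from the N, Avg and Stddev
columns above. It is not ministat's code: the function name `pooled_difference`
is made up for illustration, and the critical value 1.96 is an assumption (the
large-sample Student's t value), chosen because it reproduces the printed
figures.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=1.96):
    """Difference of two means with a pooled-variance confidence interval.

    t_crit=1.96 is an assumed large-sample critical value for 95% confidence.
    """
    # Pooled standard deviation: weight each sample variance by its
    # degrees of freedom (n - 1).
    pooled = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    diff = mean2 - mean1
    # Half-width of the confidence interval for the difference of means.
    half_width = t_crit * pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    return {
        "pooled": pooled,
        "diff": diff,
        "half_width": half_width,
        # Relative figures are expressed against the first ("before") mean.
        "diff_pct": diff / mean1 * 100.0,
        "half_width_pct": half_width / mean1 * 100.0,
    }


# Summary rows copied from the table above ("x before", "+ after").
result = pooled_difference(135, 22307.707, 198.75011,
                           118, 22671.442, 192.10609)
print(f"{result['diff']:.3f} +/- {result['half_width']:.4f}")
print(f"{result['diff_pct']:.5f}% +/- {result['half_width_pct']:.6f}%")
print(f"pooled s = {result['pooled']:.3f}")
```

With the two rows above this gives a pooled s of about 195.68 and a difference
of about 363.7 +/- 48.3 FPS, in line with the ministat output.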

Tvrtko Ursulin (7):
  drm/syncobj: Remove unhelpful helper
  drm/syncobj: Do not allocate an array to store zeros when waiting
  drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find
  drm/syncobj: Use put_user in drm_syncobj_query_ioctl
  drm/syncobj: Avoid temporary allocation in
    drm_syncobj_timeline_signal_ioctl
  drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout
  drm/syncobj: Add a fast path to drm_syncobj_array_find

 drivers/gpu/drm/drm_syncobj.c | 281 ++++++++++++++++++----------------
 1 file changed, 147 insertions(+), 134 deletions(-)

Comments

Maíra Canal March 24, 2025, 11:17 p.m. UTC | #1
Hi Tvrtko,

Thanks for this patchset! I applied it to the RPi downstream
kernel 6.13.7 [1] and saw an FPS improvement of approximately 5.85%
with "vkgears -present-mailbox" on the RPi 5.

I did five 100-second runs on each kernel and here are my results:

### 6.13.7

|   Run    |   Min FPS   |   Max FPS   |   Avg FPS   |
|----------|-------------|-------------|-------------|
| Run #1   | 6646.52     | 6874.77     | 6739.313    |
| Run #2   | 5387.04     | 6723.274    | 6046.773    |
| Run #3   | 6230.49     | 6823.47     | 6423.923    |
| Run #4   | 5269.678    | 5870.59     | 5501.858    |
| Run #5   | 5504.54     | 6285.91     | 5859.724    |

* Overall Avg FPS: 6114.318 FPS


### 6.13.7 + DRM Syncobj optimisations

|   Run    |   Min FPS   |   Max FPS   |   Avg FPS   |
|----------|-------------|-------------|-------------|
| Run #1   | 6089.05     | 7296.27     | 6859.724    |
| Run #2   | 6022.48     | 7264        | 6818.518    |
| Run #3   | 5987.68     | 6188.77     | 6041.365    |
| Run #4   | 5699.27     | 6448.99     | 6190.374    |
| Run #5   | 6199.27     | 6791.15     | 6450.900    |

* Overall Avg FPS: 6472.176 FPS
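
The overall averages and the ~5.85% figure follow from the per-run Avg FPS
columns. A small sketch of that arithmetic, with the values copied from the two
tables (the helper name `relative_gain` is only illustrative):

```python
from statistics import mean

# Avg FPS per run, copied from the tables above.
baseline = [6739.313, 6046.773, 6423.923, 5501.858, 5859.724]
patched = [6859.724, 6818.518, 6041.365, 6190.374, 6450.900]


def relative_gain(before, after):
    """Return (mean before, mean after, gain in percent of the before mean)."""
    mean_before = mean(before)
    mean_after = mean(after)
    gain_pct = (mean_after - mean_before) / mean_before * 100.0
    return mean_before, mean_after, gain_pct


mean_before, mean_after, gain_pct = relative_gain(baseline, patched)
print(f"6.13.7:                {mean_before:.3f} FPS")
print(f"6.13.7 + optimisations: {mean_after:.3f} FPS")
print(f"gain:                  {gain_pct:.2f}%")
```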

[1] https://github.com/raspberrypi/linux/tree/rpi-6.13.y

Best Regards,
- Maíra

Tvrtko Ursulin March 25, 2025, 9:57 a.m. UTC | #2
On 24/03/2025 23:17, Maíra Canal wrote:
> Hi Tvrtko,
> 
> Thanks for this patchset! I applied this patchset to the RPi downstream
> kernel 6.13.7 [1] and saw an FPS improvement of approximately 5.85%
> with "vkgears -present-mailbox" on the RPi 5.

Neat, thanks for testing! I am not surprised a slower CPU benefits more.

Btw if you have the raw data it would be nice to feed it to ministat too.
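
As a rough illustration of what such a per-dataset summary involves, the sketch
below computes the same columns ministat prints (N, Min, Max, Median, Avg,
Stddev) from raw samples. It assumes a hypothetical text file with one FPS value
per line; the helper names and the file layout are assumptions, not something
specified in this thread.

```python
import statistics


def load_samples(path):
    """Read one floating-point sample per line, skipping blank lines.

    The one-value-per-line layout is an assumption for this sketch.
    """
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


def summarize(samples):
    """Return the per-dataset columns of a ministat-style summary row."""
    return {
        "N": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Median": statistics.median(samples),
        "Avg": statistics.fmean(samples),
        # Sample standard deviation (n - 1 in the denominator).
        "Stddev": statistics.stdev(samples),
    }


# Tiny made-up data set, only to show the call shape.
print(summarize([1.0, 2.0, 3.0, 4.0]))
```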

Regards,

Tvrtko

Maíra Canal March 25, 2025, 9:10 p.m. UTC | #3
Hi Tvrtko,

On 25/03/25 06:57, Tvrtko Ursulin wrote:
> Neat, thanks for testing! I am not surprised a slower CPU benefits more.
> 
> Btw if you have the raw data it would be nice to feed it to ministat too.

I ran the tests again and collected the raw data. Here is the ministat output:

x no-optimizations.txt
+ syncobjs-optimizations.txt
+---------------------------------------------------------------------------+
|                                 +                 +                       |
|                                 +    +           ++                       |
|                     x           +    +           ++                       |
|                     xx          *   ++x          ++                       |
|                  *  xx         +*+  +*x++        ++                       |
|  x        ++x    *+xxx         +*+ x+*x+*x x     ++   x                   |
|x xxx      ++xxxx *+xxx         +*+ x***+** x   + ++  **   +  + x++        |
|xxxxx    x +***x*x*+**x xxxx* xx+** *******x* x + +++x**x+*+  + **++x xxx x|
|             |__________|______A_M____MA__________|___|                    |
+---------------------------------------------------------------------------+
     N           Min           Max        Median           Avg        Stddev
x  95      5660.033      7371.548      6413.172     6383.4326     431.10036
+  95      5914.994      7209.361      6538.192     6568.3293      345.7754
Difference at 95.0% confidence
	184.897 +/- 111.131
	2.89651% +/- 1.74093%
	(Student's t, pooled s = 390.774)

Best Regards,
- Maíra
