Message ID | 20250318155424.78552-1-tvrtko.ursulin@igalia.com (mailing list archive) |
---|---|
Series | A few drm_syncobj optimisations |
Hi Tvrtko,

Thanks for this patchset! I applied it to the RPi downstream kernel
6.13.7 [1] and saw an FPS improvement of approximately 5.85% with
"vkgears -present-mailbox" on the RPi 5.

I did five 100-second runs on each kernel and here are my results:

### 6.13.7

| Run      | Min FPS     | Max FPS     | Avg FPS     |
|----------|-------------|-------------|-------------|
| Run #1   | 6646.52     | 6874.77     | 6739.313    |
| Run #2   | 5387.04     | 6723.274    | 6046.773    |
| Run #3   | 6230.49     | 6823.47     | 6423.923    |
| Run #4   | 5269.678    | 5870.59     | 5501.858    |
| Run #5   | 5504.54     | 6285.91     | 5859.724    |

* Overall Avg FPS: 6114.318 FPS

### 6.13.7 + DRM Syncobj optimisations

| Run      | Min FPS     | Max FPS     | Avg FPS     |
|----------|-------------|-------------|-------------|
| Run #1   | 6089.05     | 7296.27     | 6859.724    |
| Run #2   | 6022.48     | 7264        | 6818.518    |
| Run #3   | 5987.68     | 6188.77     | 6041.365    |
| Run #4   | 5699.27     | 6448.99     | 6190.374    |
| Run #5   | 6199.27     | 6791.15     | 6450.900    |

* Overall Avg FPS: 6472.176 FPS

[1] https://github.com/raspberrypi/linux/tree/rpi-6.13.y

Best Regards,
- Maíra

On 18/03/25 12:54, Tvrtko Ursulin wrote:
> A small set of drm_syncobj optimisations which should make things a tiny bit
> more efficient on the CPU side of things.
>
> Improvement seems to be around 1.5%* more FPS if observed with "vkgears
> -present-mailbox" on a Steam Deck Plasma desktop, but I am reluctant to make a
> definitive claim on the numbers since there is some run-to-run variance. But, as
> suggested by Michel Dänzer, I did do five ~100 second runs on each kernel
> to be able to show the ministat analysis.
>
> x before
> + after
> [ministat ASCII histogram of the two distributions]
>     N           Min           Max        Median           Avg        Stddev
> x 135      21697.58     22809.467     22321.396     22307.707     198.75011
> + 118     22200.746      23277.09       22661.4     22671.442     192.10609
> Difference at 95.0% confidence
>         363.735 +/- 48.3345
>         1.63054% +/- 0.216672%
>         (Student's t, pooled s = 195.681)
>
> Tvrtko Ursulin (7):
>   drm/syncobj: Remove unhelpful helper
>   drm/syncobj: Do not allocate an array to store zeros when waiting
>   drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find
>   drm/syncobj: Use put_user in drm_syncobj_query_ioctl
>   drm/syncobj: Avoid temporary allocation in
>     drm_syncobj_timeline_signal_ioctl
>   drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout
>   drm/syncobj: Add a fast path to drm_syncobj_array_find
>
>  drivers/gpu/drm/drm_syncobj.c | 281 ++++++++++++++++++----------------
>  1 file changed, 147 insertions(+), 134 deletions(-)
>
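The overall averages and the roughly 5.85% figure in the RPi 5 results can be re-derived from the per-run numbers. A minimal Python sketch, assuming each overall average is the plain mean of the five per-run "Avg FPS" values (the variable names are illustrative, not from the thread):

```python
from statistics import mean

# Per-run "Avg FPS" values copied from the two RPi 5 tables.
baseline = [6739.313, 6046.773, 6423.923, 5501.858, 5859.724]
patched = [6859.724, 6818.518, 6041.365, 6190.374, 6450.900]

baseline_avg = mean(baseline)
patched_avg = mean(patched)

# Relative improvement of the patched kernel over the baseline, in percent.
improvement_pct = (patched_avg - baseline_avg) / baseline_avg * 100.0

print(f"baseline:    {baseline_avg:.3f} FPS")
print(f"patched:     {patched_avg:.3f} FPS")
print(f"improvement: {improvement_pct:.2f}%")
```

With only five averages per kernel and a visible spread between runs, this number says nothing about significance; it only confirms the arithmetic.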
On 24/03/2025 23:17, Maíra Canal wrote:
> Hi Tvrtko,
>
> Thanks for this patchset! I applied this patchset to the RPi downstream
> kernel 6.13.7 [1] and saw an FPS improvement of approximately 5.85%
> with "vkgears -present-mailbox" on the RPi 5.
>
> I did five 100 seconds runs on each kernel and here are my results:

Neat, thanks for testing! I am not surprised a slower CPU benefits more.

Btw if you have the raw data it would be nice to feed it to ministat too.

Regards,

Tvrtko
Hi Tvrtko,

On 25/03/25 06:57, Tvrtko Ursulin wrote:
> Neat, thanks for testing! I am not surprised a slower CPU benefits more.
>
> Btw if you have the raw data it would be nice to feed it to ministat too.

I ran again and collected the raw data. Here is the ministat:

x no-optimizations.txt
+ syncobjs-optimizations.txt
[ministat ASCII histogram of the two distributions]
    N           Min           Max        Median           Avg        Stddev
x  95      5660.033      7371.548      6413.172     6383.4326     431.10036
+  95      5914.994      7209.361      6538.192     6568.3293      345.7754
Difference at 95.0% confidence
        184.897 +/- 111.131
        2.89651% +/- 1.74093%
        (Student's t, pooled s = 390.774)

Best Regards,
- Maíra
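For readers who want to check the "Difference at 95.0% confidence" lines, the figures can be recomputed from the summary statistics alone. This is a sketch of a pooled two-sample Student's t interval. The function name is hypothetical and the critical value of 1.960 is an assumption (the large-sample value), chosen because it reproduces the reported intervals. It is not a statement about how ministat is implemented.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=1.960):
    """Return (difference, half_width, pooled_sd) from summary statistics.

    t_crit is an assumed critical value for a 95% two-sided interval.
    """
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Half-width of the confidence interval for the difference of means.
    half_width = t_crit * pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean2 - mean1, half_width, pooled_sd


# RPi 5 numbers (N, Avg, Stddev) from the ministat output in this message.
rpi = pooled_difference(95, 6383.4326, 431.10036, 95, 6568.3293, 345.7754)
# Steam Deck numbers from the cover letter.
deck = pooled_difference(135, 22307.707, 198.75011, 118, 22671.442, 192.10609)

for name, base_mean, (diff, half, s) in (("RPi 5", 6383.4326, rpi),
                                         ("Steam Deck", 22307.707, deck)):
    print(f"{name}: {diff:.3f} +/- {half:.3f} FPS "
          f"({diff / base_mean * 100:.3f}% +/- {half / base_mean * 100:.3f}%), "
          f"pooled s = {s:.3f}")
```

Both intervals exclude zero. The RPi 5 interval is much wider relative to its mean, which matches the larger standard deviations in that data set.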