Message ID: 20250102194418.70383-4-tursulin@igalia.com (mailing list archive)
State: New
Series: A few drm_syncobj optimisations
On 2025-01-02 20:44, Tvrtko Ursulin wrote:
>
> "vkgears -present-mailbox" average framerate:
>
> Before: 21410.1089
> After:  21609.7225
>
> With a disclaimer that measuring with vkgears feels a bit variable,
> nevertheless it did not look like noise.

That's ~1% difference. IME the frame rate can easily vary more than that
during a single run.

The frame rate surely varies by more than 1 fps during each run, so
comparing such large average values down to 4 digits after the decimal
point doesn't seem very useful.

Doing multiple (at least 3 or more each) before & after runs and comparing
the distribution of individual measured values using something like
ministat might confirm it's essentially noise, or give more confidence
it's not.
On 06/01/2025 17:16, Michel Dänzer wrote:
> On 2025-01-02 20:44, Tvrtko Ursulin wrote:
>>
>> "vkgears -present-mailbox" average framerate:
>>
>> Before: 21410.1089
>> After:  21609.7225
>>
>> With a disclaimer that measuring with vkgears feels a bit variable,
>> nevertheless it did not look like noise.
>
> That's ~1% difference. IME the frame rate can easily vary more than that
> during a single run.
>
> The frame rate surely varies by more than 1 fps during each run, so
> comparing such large average values down to 4 digits after the decimal
> point doesn't seem very useful.
>
> Doing multiple (at least 3 or more each) before & after runs and
> comparing the distribution of individual measured values using something
> like ministat might confirm it's essentially noise, or give more
> confidence it's not.

I did multiple runs, and I thought from the code changes it would be
obvious there is some code there which should go.

But fair enough, I agree ministat is common practice, so I re-did it: five
~100 second runs with each kernel. The absolute numbers differ from before
because I have since turned on some kernel hardening options.

x before
+ after
[ministat distribution plot: both samples roughly bell-shaped, the "after"
sample visibly shifted right, with non-overlapping median boxes]
    N           Min           Max        Median           Avg        Stddev
x 135      21697.58     22809.467     22321.396     22307.707     198.75011
+ 118     22200.746      23277.09       22661.4     22671.442     192.10609
Difference at 95.0% confidence
        363.735 +/- 48.3345
        1.63054% +/- 0.216672%
        (Student's t, pooled s = 195.681)

It's a small difference but every little helps.

Regards,

Tvrtko
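As a sanity check, the confidence interval ministat reports can be re-derived from just the N/Avg/Stddev summary rows using a two-sample Student's t with pooled variance. A minimal Python sketch follows; the 1.960 critical value is an assumption that ministat falls back to the asymptotic 95% entry of its t-table for large degrees of freedom (which matches the printed numbers here):

```python
import math

def pooled_t_interval(n1, avg1, sd1, n2, avg2, sd2, t_crit=1.960):
    """Difference of means with a pooled-variance confidence interval.

    t_crit=1.960 is the asymptotic 95% two-sided Student's t critical
    value, assumed appropriate here since df = n1 + n2 - 2 = 251.
    """
    diff = avg2 - avg1
    # Pooled standard deviation across both samples
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                         / (n1 + n2 - 2))
    # Standard error of the difference between the two means
    se = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, t_crit * se, s_pooled

# Summary figures from the ministat output above
diff, ci, s = pooled_t_interval(135, 22307.707, 198.75011,
                                118, 22671.442, 192.10609)
print(f"{diff:.3f} +/- {ci:.4f} (Student's t, pooled s = {s:.3f})")
```

Running this reproduces the 363.735 +/- 48.33 interval from the thread, so the reported ~1.6% gain is consistent with the summary statistics.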
On 2025-01-07 10:29, Tvrtko Ursulin wrote:
> On 06/01/2025 17:16, Michel Dänzer wrote:
>> Doing multiple (at least 3 or more each) before & after runs and
>> comparing the distribution of individual measured values using
>> something like ministat might confirm it's essentially noise, or give
>> more confidence it's not.
>
> I did multiple runs, and I thought from the code changes it would be
> obvious there is some code there which should go.

FWIW, I didn't question that; in fact I don't have any particular opinion
on the actual code changes. Claims of performance differences based on two
numbers are just a pet peeve of mine. :)

> But fair enough, I agree ministat is common practice, so I re-did it:
> five ~100 second runs with each kernel.
>
> [ministat plot snipped]
>
> Difference at 95.0% confidence
>         363.735 +/- 48.3345
>         1.63054% +/- 0.216672%
>         (Student's t, pooled s = 195.681)
>
> It's a small difference but every little helps.

This gives a lot of confidence in the performance gain, thanks! Would be
great if you could incorporate at least a summary of this into a commit
log somehow.
On 07/01/2025 11:22, Michel Dänzer wrote:
> On 2025-01-07 10:29, Tvrtko Ursulin wrote:
>> I did multiple runs, and I thought from the code changes it would be
>> obvious there is some code there which should go.
>
> FWIW, I didn't question that; in fact I don't have any particular
> opinion on the actual code changes. Claims of performance differences
> based on two numbers are just a pet peeve of mine. :)
>
> [...]
>
> This gives a lot of confidence in the performance gain, thanks! Would be
> great if you could incorporate at least a summary of this into a commit
> log somehow.

Thanks! Now let's hope someone can spare the time to review.

I pasted the graph into a reply to the cover letter of v2 of the series.
That seemed the best option, since it is a series of a few small
improvements which add up to a total.

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 2fb95d6f6c82..059d6be3ff1f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1027,7 +1027,7 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
 }
 
 static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
-						  void __user *user_points,
+						  u64 __user *user_points,
 						  uint32_t count,
 						  uint32_t flags,
 						  signed long timeout,
@@ -1035,9 +1035,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 						  ktime_t *deadline)
 {
 	struct syncobj_wait_entry *entries;
-	struct dma_fence *fence;
-	uint64_t *points;
 	uint32_t signaled_count, i;
+	struct dma_fence *fence;
 
 	if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
 		     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1045,24 +1044,14 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 		lockdep_assert_none_held_once();
 	}
 
-	points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
-	if (points == NULL)
-		return -ENOMEM;
-
-	if (!user_points) {
-		memset(points, 0, count * sizeof(uint64_t));
-
-	} else if (copy_from_user(points, user_points,
-				  sizeof(uint64_t) * count)) {
-		timeout = -EFAULT;
-		goto err_free_points;
-	}
+	if (user_points &&
+	    !access_ok(user_points, count * sizeof(*user_points)))
+		return -EFAULT;
 
 	entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);
-	if (!entries) {
-		timeout = -ENOMEM;
-		goto err_free_points;
-	}
+	if (!entries)
+		return -ENOMEM;
+
 	/* Walk the list of sync objects and initialize entries.  We do
 	 * this up-front so that we can properly return -EINVAL if there is
 	 * a syncobj with a missing fence and then never have the chance of
@@ -1073,9 +1062,13 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 		struct dma_fence *fence;
 
 		entries[i].task = current;
-		entries[i].point = points[i];
+		if (user_points && get_user(entries[i].point, user_points++)) {
+			timeout = -EFAULT;
+			goto cleanup_entries;
+		}
 		fence = drm_syncobj_fence_get(syncobjs[i]);
-		if (!fence || dma_fence_chain_find_seqno(&fence, points[i])) {
+		if (!fence ||
+		    dma_fence_chain_find_seqno(&fence, entries[i].point)) {
 			dma_fence_put(fence);
 			if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
 				     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1181,9 +1174,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 	}
 
 	kfree(entries);
-
-err_free_points:
-	kfree(points);
 
 	return timeout;
 }