Message ID | 20230329-iwlwifi-ptp-avoid-64-bit-div-v1-1-ad8db8d66bc2@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Johannes Berg |
Headers | show |
Series | [wireless-next] wifi: iwlwifi: mvm: Avoid 64-bit division in iwl_mvm_get_crosstimestamp_fw() | expand |
On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote: > > GCC has optimizations for division by a constant that clang does not > implement, so this issue is not visible when building with GCC. Huh yeah, we did 32-bit builds with gcc ... > Using div_u64() would resolve this issue, but Arnd points out that this > can be quite expensive and the timestamp is being read at nanosecond > granularity. Doesn't matter though, all the calculations are based on just the command response from the firmware, which (tries to) take it in a synchronised fashion. So taking more time here would be fine, as far as I can tell. > Nick pointed out that the result of this division is being > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the > division, which elides the need for libcalls. That loses ~7 top bits though, no? I'd be more worried about that, than the time div_u64() takes. johannes
On Wed, Mar 29, 2023 at 07:20:43PM +0200, Johannes Berg wrote: > On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote: > > > > GCC has optimizations for division by a constant that clang does not > > implement, so this issue is not visible when building with GCC. > > Huh yeah, we did 32-bit builds with gcc ... > > > Using div_u64() would resolve this issue, but Arnd points out that this > > can be quite expensive and the timestamp is being read at nanosecond > > granularity. > > Doesn't matter though, all the calculations are based on just the > command response from the firmware, which (tries to) take it in a > synchronised fashion. Okay, that is good information, thanks for providing it! > So taking more time here would be fine, as far as I can tell. > > > Nick pointed out that the result of this division is being > > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the > > division, which elides the need for libcalls. > > That loses ~7 top bits though, no? I'd be more worried about that, than > the time div_u64() takes. Right, I sent this version of the fix to spur discussion around whether or not this was an acceptable approach, rather than having the question sit unanswered in our issue tracker :) I have no problems sending a v2 to use div_u64() and be done with it, which I will do shortly. Thanks for the quick input, cheers! Nathan
On Wed, Mar 29, 2023 at 10:20 AM Johannes Berg <johannes@sipsolutions.net> wrote: > > On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote: > > > > GCC has optimizations for division by a constant that clang does not > > implement, so this issue is not visible when building with GCC. > > Huh yeah, we did 32-bit builds with gcc ... Right, GCC is better about turning division by double-word constant into multiplication by reciprocal. Craig has been improving LLVM, but it seems that some divisors still aren't supported (in this case 100). > > > Using div_u64() would resolve this issue, but Arnd points out that this > > can be quite expensive and the timestamp is being read at nanosecond > > granularity. > > Doesn't matter though, all the calculations are based on just the > command response from the firmware, which (tries to) take it in a > synchronised fashion. > > So taking more time here would be fine, as far as I can tell. div_u64() it is then. > > > Nick pointed out that the result of this division is being > > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the > > division, which elides the need for libcalls. > > That loses ~7 top bits though, no? I'd be more worried about that, than > the time div_u64() takes. The result is still stored in a u32; there is a loss of precision regardless of use of div_u64 or open coded binary operator /. So is the loss of precision before the division as tolerable as after the division?
On Wed, 2023-03-29 at 10:30 -0700, Nick Desaulniers wrote: > > > > > Nick pointed out that the result of this division is being > > > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the > > > division, which elides the need for libcalls. > > > > That loses ~7 top bits though, no? I'd be more worried about that, than > > the time div_u64() takes. > > The result is still stored in a u32; there is a loss of precision > regardless of use of div_u64 or open coded binary operator /. > Right, obviously. > So is > the loss of precision before the division as tolerable as after the > division? For all I can tell this is meant to be 'gp2' with an additional lower bits to reach a unit/granularity of 10ns, basically in FW something like gp2_10ns = gp2 * 100 + subsampling_10ns_unit (and gp2 in FW is a 32-bit value, so it rolls over eventually). But I _think_ we want to make a proper division by 100 to obtain back the original 'gp2' value here. johannes
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c index 5c2bfc8ed88d..cdd6d69c5b68 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c @@ -116,7 +116,7 @@ iwl_mvm_get_crosstimestamp_fw(struct iwl_mvm *mvm, u32 *gp2, u64 *sys_time) gp2_10ns = (u64)le32_to_cpu(resp->gp2_timestamp_hi) << 32 | le32_to_cpu(resp->gp2_timestamp_lo); - *gp2 = gp2_10ns / 100; + *gp2 = (u32)gp2_10ns / 100; *sys_time = (u64)le32_to_cpu(resp->platform_timestamp_hi) << 32 | le32_to_cpu(resp->platform_timestamp_lo);