Message ID | 20241025194753.3070604-1-vadfed@meta.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v2,1/2] bnxt_en: cache only 24 bits of hw counter | expand |
On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote: > > This hardware can provide only 48 bits of cycle counter. We can leave > only 24 bits in the cache to extend RX timestamps from 32 bits to 48 > bits. This make cache writes atomic even on 32 bit platforms and we can > simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The > configuration structure will be also reduced by 4 bytes. ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW counter and for rollover check. With this patch reducing ptp->old_time to 24 bits, we now use the upper 16 bits for the cache and the next 8 bits for the rollover check. I think this will work. But since the field is now 32-bit, why not use the full 32 bits instead of 24 bits? Thanks.
On 25/10/2024 22:31, Michael Chan wrote: > On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote: >> >> This hardware can provide only 48 bits of cycle counter. We can leave >> only 24 bits in the cache to extend RX timestamps from 32 bits to 48 >> bits. This make cache writes atomic even on 32 bit platforms and we can >> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The >> configuration structure will be also reduced by 4 bytes. > > ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW > counter and for rollover check. With this patch reducing > ptp->old_time to 24 bits, we now use the upper 16 bits for the cache > and the next 8 bits for the rollover check. I think this will work. > But since the field is now 32-bit, why not use the full 32 bits > instead of 24 bits? Thanks. As you confirmed that the HW has 48 bits of cycle counter, we have to cache 16 bits only. The other 8 bits are used for the rollover check. We can even use less bits for it. What is the use for another 8 bits? With this patch the rollover will happen every ~16 ms, but will be caught even till ~4s. If we will cache upper 32 bits, the rollover will happen every 64us and we will have to update cache value more often. But at the same time it will not bring any value, because the upper limit will still be ~4s. That's why I don't see any benefits of using all 32 bits.
On Fri, Oct 25, 2024 at 2:53 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > > On 25/10/2024 22:31, Michael Chan wrote: > > On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote: > >> > >> This hardware can provide only 48 bits of cycle counter. We can leave > >> only 24 bits in the cache to extend RX timestamps from 32 bits to 48 > >> bits. This make cache writes atomic even on 32 bit platforms and we can > >> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The > >> configuration structure will be also reduced by 4 bytes. > > > > ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW > > counter and for rollover check. With this patch reducing > > ptp->old_time to 24 bits, we now use the upper 16 bits for the cache > > and the next 8 bits for the rollover check. I think this will work. > > But since the field is now 32-bit, why not use the full 32 bits > > instead of 24 bits? Thanks. > > As you confirmed that the HW has 48 bits of cycle counter, we have to > cache 16 bits only. The other 8 bits are used for the rollover check. We > can even use less bits for it. What is the use for another 8 bits? With > this patch the rollover will happen every ~16 ms, but will be caught > even till ~4s. If we will cache upper 32 bits, the rollover will happen > every 64us and we will have to update cache value more often. But at the > same time it will not bring any value, because the upper limit will > still be ~4s. That's why I don't see any benefits of using all 32 bits. I agree the extra bits have no value other than to fill the 32-bit field. But it should not affect the frequency of caching ptp->old_time. It should be updated in bnxt_ptp_ts_aux_work() at a fixed interval (1 second).
On 25/10/2024 23:13, Michael Chan wrote: > On Fri, Oct 25, 2024 at 2:53 PM Vadim Fedorenko > <vadim.fedorenko@linux.dev> wrote: >> >> On 25/10/2024 22:31, Michael Chan wrote: >>> On Fri, Oct 25, 2024 at 12:48 PM Vadim Fedorenko <vadfed@meta.com> wrote: >>>> >>>> This hardware can provide only 48 bits of cycle counter. We can leave >>>> only 24 bits in the cache to extend RX timestamps from 32 bits to 48 >>>> bits. This make cache writes atomic even on 32 bit platforms and we can >>>> simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The >>>> configuration structure will be also reduced by 4 bytes. >>> >>> ptp->old_time serves 2 purposes: to cache the upper 16 bits of the HW >>> counter and for rollover check. With this patch reducing >>> ptp->old_time to 24 bits, we now use the upper 16 bits for the cache >>> and the next 8 bits for the rollover check. I think this will work. >>> But since the field is now 32-bit, why not use the full 32 bits >>> instead of 24 bits? Thanks. >> >> As you confirmed that the HW has 48 bits of cycle counter, we have to >> cache 16 bits only. The other 8 bits are used for the rollover check. We >> can even use less bits for it. What is the use for another 8 bits? With >> this patch the rollover will happen every ~16 ms, but will be caught >> even till ~4s. If we will cache upper 32 bits, the rollover will happen >> every 64us and we will have to update cache value more often. But at the >> same time it will not bring any value, because the upper limit will >> still be ~4s. That's why I don't see any benefits of using all 32 bits. > > I agree the extra bits have no value other than to fill the 32-bit > field. But it should not affect the frequency of caching > ptp->old_time. It should be updated in bnxt_ptp_ts_aux_work() at a > fixed interval (1 second). Ok, BNXT_LO_TIMER_MASK/BNXT_HI_TIMER_MASK use 24 bits only. I don't see any reason to keep more bits out of 48 bit of counter and I tried to be consistent across the code. It doesn't matter in terms of performance should we shift 16 or 24 bits. But because there will be another version (I forgot to remove one variable), I can change the code to fill 32-bit field if you insist.
On Sat, Oct 26, 2024 at 3:20 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > Ok, BNXT_LO_TIMER_MASK/BNXT_HI_TIMER_MASK use 24 bits only. I don't see > any reason to keep more bits out of 48 bit of counter and I tried to be > consistent across the code. It doesn't matter in terms of performance > should we shift 16 or 24 bits. But because there will be another version > (I forgot to remove one variable), I can change the code to fill 32-bit > field if you insist. 24-bit will be fine. but please document that 8 bits will be used for roll-over check. Thanks.
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c index fa514be87650..c7e626b9098a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c @@ -106,7 +106,7 @@ static void bnxt_ptp_get_current_time(struct bnxt *bp) if (!ptp) return; spin_lock_irqsave(&ptp->ptp_lock, flags); - WRITE_ONCE(ptp->old_time, ptp->current_time); + WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24)); bnxt_refclk_read(bp, NULL, &ptp->current_time); spin_unlock_irqrestore(&ptp->ptp_lock, flags); } @@ -174,7 +174,7 @@ void bnxt_ptp_update_current_time(struct bnxt *bp) struct bnxt_ptp_cfg *ptp = bp->ptp_cfg; bnxt_refclk_read(ptp->bp, NULL, &ptp->current_time); - WRITE_ONCE(ptp->old_time, ptp->current_time); + WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24)); } static int bnxt_ptp_adjphc(struct bnxt_ptp_cfg *ptp, s64 delta) @@ -813,7 +813,7 @@ int bnxt_get_rx_ts_p5(struct bnxt *bp, u64 *ts, u32 pkt_ts) if (!ptp) return -ENODEV; - BNXT_READ_TIME64(ptp, time, ptp->old_time); + time = (u64)(READ_ONCE(ptp->old_time) << 24); *ts = (time & BNXT_HI_TIMER_MASK) | pkt_ts; if (pkt_ts < (time & BNXT_LO_TIMER_MASK)) *ts += BNXT_LO_TIMER_MASK + 1; @@ -1079,7 +1079,7 @@ int bnxt_ptp_init(struct bnxt *bp, bool phc_cfg) spin_lock_irqsave(&ptp->ptp_lock, flags); bnxt_refclk_read(bp, NULL, &ptp->current_time); - WRITE_ONCE(ptp->old_time, ptp->current_time); + WRITE_ONCE(ptp->old_time, (u32)(ptp->current_time >> 24)); spin_unlock_irqrestore(&ptp->ptp_lock, flags); ptp_schedule_worker(ptp->ptp_clock, 0); } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h index f322466ecad3..80046bd314db 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h @@ -106,10 +106,10 @@ struct bnxt_ptp_cfg { /* serialize ts tx request queuing */ spinlock_t ptp_tx_lock; u64 current_time; - u64 old_time; unsigned long next_period; unsigned long next_overflow_check; u32 cmult; + u32 old_time; /* a 23b shift cyclecounter will overflow in ~36 mins. Check overflow every 18 mins. */ #define BNXT_PHC_OVERFLOW_PERIOD (18 * 60 * HZ) @@ -145,20 +145,6 @@ struct bnxt_ptp_cfg { struct bnxt_ptp_stats stats; }; -#if BITS_PER_LONG == 32 -#define BNXT_READ_TIME64(ptp, dst, src) \ -do { \ - unsigned long flags; \ - \ - spin_lock_irqsave(&(ptp)->ptp_lock, flags); \ - (dst) = (src); \ - spin_unlock_irqrestore(&(ptp)->ptp_lock, flags); \ -} while (0) -#else -#define BNXT_READ_TIME64(ptp, dst, src) \ - ((dst) = READ_ONCE(src)) -#endif - #define BNXT_PTP_INC_TX_AVAIL(ptp) \ do { \ spin_lock_bh(&(ptp)->ptp_tx_lock); \
This hardware can provide only 48 bits of cycle counter. We can leave only 24 bits in the cache to extend RX timestamps from 32 bits to 48 bits. This make cache writes atomic even on 32 bit platforms and we can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The configuration structure will be also reduced by 4 bytes. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> --- drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c | 8 ++++---- drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h | 16 +--------------- 2 files changed, 5 insertions(+), 19 deletions(-)