Message ID | 20220128170257.42094-1-yannick.vignon@oss.nxp.com (mailing list archive)
---|---
State | Superseded
Delegated to: | Netdev Maintainers
Series | [net-next] net: stmmac: optimize locking around PTP clock reads
On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:
> Reading the PTP clock is a simple operation requiring only 2 register
> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
> counter-productive:
> * if the task is preempted in-between the 2 reads, the return time value
> could become inconsistent,
> * if the 2nd task preempting the 1st has a higher prio but needs to
> read time as well, it will require 2 context switches, which will pretty
> much always be more costly than just disabling preemption for the duration
> of the 2 reads.
>
> Improve the above situation by:
> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
> clock reads so simultaneous reads do not block each other,

Are you sure the reads don't latch the other register? Otherwise this
code is buggy, it should check for wrap around. (e.g. during 1.99 ->
2.00 transition driver can read .99, then 2, resulting in 2.99).

> * protecting the register reads by local_irq_save/local_irq_restore, to
> ensure the code is not preempted between the 2 reads, even with PREEMPT_RT.

> 	/* Get the TSSS value */
> 	ns = readl(ioaddr + PTP_STNSR);
> 	/* Get the TSS and convert sec time value to nanosecond */
> 	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
On 2/1/2022 6:42 AM, Jakub Kicinski wrote:
> On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:
>> Reading the PTP clock is a simple operation requiring only 2 register
>> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
>> counter-productive:
>> * if the task is preempted in-between the 2 reads, the return time value
>> could become inconsistent,
>> * if the 2nd task preempting the 1st has a higher prio but needs to
>> read time as well, it will require 2 context switches, which will pretty
>> much always be more costly than just disabling preemption for the duration
>> of the 2 reads.
>>
>> Improve the above situation by:
>> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
>> clock reads so simultaneous reads do not block each other,
>
> Are you sure the reads don't latch the other register? Otherwise this
> code is buggy, it should check for wrap around. (e.g. during 1.99 ->
> 2.00 transition driver can read .99, then 2, resulting in 2.99).

Well, we did observe the issue on another device (micro-controller, not
running Linux) using the same IP, and we were wondering how the Linux
driver could work and why we didn't observe the issue... I experimented
again today, and I did observe the problem, so I guess we didn't try
hard enough before. (this time I bypassed the kernel by doing tight read
loops from user-space after mmap'ing the registers).
Going to add another commit to this patch-queue to fix that.

>> * protecting the register reads by local_irq_save/local_irq_restore, to
>> ensure the code is not preempted between the 2 reads, even with PREEMPT_RT.
>
>> 	/* Get the TSSS value */
>> 	ns = readl(ioaddr + PTP_STNSR);
>> 	/* Get the TSS and convert sec time value to nanosecond */
>> 	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
On Tue, 1 Feb 2022 18:54:15 +0100 Yannick Vignon wrote:
> On 2/1/2022 6:42 AM, Jakub Kicinski wrote:
> > On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:
> >> Reading the PTP clock is a simple operation requiring only 2 register
> >> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
> >> counter-productive:
> >> * if the task is preempted in-between the 2 reads, the return time value
> >> could become inconsistent,
> >> * if the 2nd task preempting the 1st has a higher prio but needs to
> >> read time as well, it will require 2 context switches, which will pretty
> >> much always be more costly than just disabling preemption for the duration
> >> of the 2 reads.
> >>
> >> Improve the above situation by:
> >> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
> >> clock reads so simultaneous reads do not block each other,
> >
> > Are you sure the reads don't latch the other register? Otherwise this
> > code is buggy, it should check for wrap around. (e.g. during 1.99 ->
> > 2.00 transition driver can read .99, then 2, resulting in 2.99).
>
> Well, we did observe the issue on another device (micro-controller, not
> running Linux) using the same IP, and we were wondering how the Linux
> driver could work and why we didn't observe the issue... I experimented
> again today, and I did observe the problem, so I guess we didn't try
> hard enough before. (this time I bypassed the kernel by doing tight read
> loops from user-space after mmap'ing the registers).
> Going to add another commit to this patch-queue to fix that.

That's a fix tho, it needs to be a separate change containing a Fixes
tag and targeted at the netdev/net tree. We'd first apply that, and
then the optimizations on top of it into the net-next tree.
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 8e8778cfbbad..5943ff9f21c2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -383,10 +383,10 @@ static int intel_crosststamp(ktime_t *device,
 
 	/* Repeat until the timestamps are from the FIFO last segment */
 	for (i = 0; i < num_snapshot; i++) {
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		read_lock_irqsave(&priv->ptp_lock, flags);
 		stmmac_get_ptptime(priv, ptpaddr, &ptp_time);
 		*device = ns_to_ktime(ptp_time);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		read_unlock_irqrestore(&priv->ptp_lock, flags);
 		get_arttime(priv->mii, intel_priv->mdio_adhoc_addr, &art_time);
 		*system = convert_art_to_tsc(art_time);
 	}
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 5b195d5051d6..57970ae2178d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -263,7 +263,7 @@ struct stmmac_priv {
 	u32 adv_ts;
 	int use_riwt;
 	int irq_wake;
-	spinlock_t ptp_lock;
+	rwlock_t ptp_lock;
 	/* Protects auxiliary snapshot registers from concurrent access. */
 	struct mutex aux_ts_lock;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 074e2cdfb0fa..da22b29f8711 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -145,12 +145,15 @@ static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
 
 static void get_systime(void __iomem *ioaddr, u64 *systime)
 {
+	unsigned long flags;
 	u64 ns;
 
+	local_irq_save(flags);
 	/* Get the TSSS value */
 	ns = readl(ioaddr + PTP_STNSR);
 	/* Get the TSS and convert sec time value to nanosecond */
 	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
+	local_irq_restore(flags);
 
 	if (systime)
 		*systime = ns;
@@ -158,10 +161,13 @@ static void get_systime(void __iomem *ioaddr, u64 *systime)
 
 static void get_ptptime(void __iomem *ptpaddr, u64 *ptp_time)
 {
+	unsigned long flags;
 	u64 ns;
 
+	local_irq_save(flags);
 	ns = readl(ptpaddr + PTP_ATNR);
 	ns += readl(ptpaddr + PTP_ATSR) * NSEC_PER_SEC;
+	local_irq_restore(flags);
 
 	*ptp_time = ns;
 }
@@ -191,9 +197,9 @@ static void timestamp_interrupt(struct stmmac_priv *priv)
 			  GMAC_TIMESTAMP_ATSNS_SHIFT;
 
 	for (i = 0; i < num_snapshot; i++) {
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		read_lock_irqsave(&priv->ptp_lock, flags);
 		get_ptptime(priv->ptpaddr, &ptp_time);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		read_unlock_irqrestore(&priv->ptp_lock, flags);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 0;
 		event.timestamp = ptp_time;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 1c9f02f9c317..e45fb191d8e6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -39,9 +39,9 @@ static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb)
 	diff = div_u64(adj, 1000000000ULL);
 	addend = neg_adj ? (addend - diff) : (addend + diff);
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_config_addend(priv, priv->ptpaddr, addend);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
 }
@@ -86,9 +86,9 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta)
 		mutex_unlock(&priv->plat->est->lock);
 	}
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	/* Caculate new basetime and re-configured EST after PTP time adjust. */
 	if (est_rst) {
@@ -137,9 +137,9 @@ static int stmmac_get_time(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	unsigned long flags;
 	u64 ns = 0;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &ns);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	*ts = ns_to_timespec64(ns);
 
@@ -162,9 +162,9 @@ static int stmmac_set_time(struct ptp_clock_info *ptp,
 		container_of(ptp, struct stmmac_priv, ptp_clock_ops);
 	unsigned long flags;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_init_systime(priv, priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
 }
@@ -194,12 +194,12 @@ static int stmmac_enable(struct ptp_clock_info *ptp,
 		cfg->period.tv_sec = rq->perout.period.sec;
 		cfg->period.tv_nsec = rq->perout.period.nsec;
 
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		write_lock_irqsave(&priv->ptp_lock, flags);
 		ret = stmmac_flex_pps_config(priv, priv->ioaddr,
 					     rq->perout.index, cfg, on,
 					     priv->sub_second_inc,
 					     priv->systime_flags);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		write_unlock_irqrestore(&priv->ptp_lock, flags);
 		break;
 	case PTP_CLK_REQ_EXTTS:
 		priv->plat->ext_snapshot_en = on;
@@ -314,7 +314,7 @@ void stmmac_ptp_register(struct stmmac_priv *priv)
 		stmmac_ptp_clock_ops.n_per_out = priv->dma_cap.pps_out_num;
 		stmmac_ptp_clock_ops.n_ext_ts = priv->dma_cap.aux_snapshot_n;
 
-	spin_lock_init(&priv->ptp_lock);
+	rwlock_init(&priv->ptp_lock);
 	mutex_init(&priv->aux_ts_lock);
 
 	priv->ptp_clock_ops = stmmac_ptp_clock_ops;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
index be3cb63675a5..9f1759593b94 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
@@ -1777,9 +1777,9 @@ static int stmmac_test_tbs(struct stmmac_priv *priv)
 	if (ret)
 		return ret;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &curr_time);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	if (!curr_time) {
 		ret = -EOPNOTSUPP;
@@ -1799,9 +1799,9 @@ static int stmmac_test_tbs(struct stmmac_priv *priv)
 		goto fail_disable;
 
 	/* Check if expected time has elapsed */
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &curr_time);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	if ((curr_time - start_time) < STMMAC_TBS_LT_OFFSET)
 		ret = -EINVAL;