diff mbox series

[net-next] net: stmmac: optimize locking around PTP clock reads

Message ID 20220128170257.42094-1-yannick.vignon@oss.nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: stmmac: optimize locking around PTP clock reads | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 21 this patch: 21
netdev/cc_maintainers warning 2 maintainers not CCed: linux-arm-kernel@lists.infradead.org linux-stm32@st-md-mailman.stormreply.com
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 21 this patch: 21
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 147 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yannick Vignon Jan. 28, 2022, 5:02 p.m. UTC
From: Yannick Vignon <yannick.vignon@nxp.com>

Reading the PTP clock is a simple operation requiring only 2 register
reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
counter-productive:
 * if the task is preempted in-between the 2 reads, the returned time value
could be inconsistent,
 * if the 2nd task preempting the 1st has a higher prio but needs to
read time as well, it will require 2 context switches, which will pretty
much always be more costly than just disabling preemption for the duration
of the 2 reads.

Improve the above situation by:
* replacing the PTP spinlock by a rwlock, and using read_lock for PTP
clock reads so simultaneous reads do not block each other,
* protecting the register reads by local_irq_save/local_irq_restore, to
ensure the code is not preempted between the 2 reads, even with PREEMPT_RT.

Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
---
 .../net/ethernet/stmicro/stmmac/dwmac-intel.c |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 +-
 .../ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 10 +++++++--
 .../net/ethernet/stmicro/stmmac/stmmac_ptp.c  | 22 +++++++++----------
 .../stmicro/stmmac/stmmac_selftests.c         |  8 +++----
 5 files changed, 26 insertions(+), 20 deletions(-)

Comments

Jakub Kicinski Feb. 1, 2022, 5:42 a.m. UTC | #1
On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:
> Reading the PTP clock is a simple operation requiring only 2 register
> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
> counter-productive:
>  * if the task is preempted in-between the 2 reads, the return time value
> could become inconsistent,
>  * if the 2nd task preempting the 1st has a higher prio but needs to
> read time as well, it will require 2 context switches, which will pretty
> much always be more costly than just disabling preemption for the duration
> of the 2 reads.
> 
> Improve the above situation by:
> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
> clock reads so simultaneous reads do not block each other,

Are you sure the reads don't latch the other register? Otherwise this
code is buggy, it should check for wrap around. (e.g. during 1.99 ->
2.00 transition driver can read .99, then 2, resulting in 2.99).

> * protecting the register reads by local_irq_save/local_irq_restore, to
> ensure the code is not preempted between the 2 reads, even with PREEMPT_RT.

>  	/* Get the TSSS value */
>  	ns = readl(ioaddr + PTP_STNSR);
>  	/* Get the TSS and convert sec time value to nanosecond */
>  	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
Yannick Vignon Feb. 1, 2022, 5:54 p.m. UTC | #2
On 2/1/2022 6:42 AM, Jakub Kicinski wrote:
> On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:
>> Reading the PTP clock is a simple operation requiring only 2 register
>> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
>> counter-productive:
>>   * if the task is preempted in-between the 2 reads, the return time value
>> could become inconsistent,
>>   * if the 2nd task preempting the 1st has a higher prio but needs to
>> read time as well, it will require 2 context switches, which will pretty
>> much always be more costly than just disabling preemption for the duration
>> of the 2 reads.
>>
>> Improve the above situation by:
>> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
>> clock reads so simultaneous reads do not block each other,
> 
> Are you sure the reads don't latch the other register? Otherwise this
> code is buggy, it should check for wrap around. (e.g. during 1.99 ->
> 2.00 transition driver can read .99, then 2, resulting in 2.99).
> 

Well, we did observe the issue on another device (a micro-controller, not 
running Linux) using the same IP, and we were wondering how the Linux 
driver could work and why we hadn't observed the issue... I experimented 
again today, and I did observe the problem, so I guess we didn't try 
hard enough before. (This time I bypassed the kernel by doing tight read 
loops from user-space after mmap'ing the registers.)
I'm going to add another commit to this patch queue to fix that.

>> * protecting the register reads by local_irq_save/local_irq_restore, to
>> ensure the code is not preempted between the 2 reads, even with PREEMPT_RT.
> 
>>   	/* Get the TSSS value */
>>   	ns = readl(ioaddr + PTP_STNSR);
>>   	/* Get the TSS and convert sec time value to nanosecond */
>>   	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
Jakub Kicinski Feb. 1, 2022, 7:12 p.m. UTC | #3
On Tue, 1 Feb 2022 18:54:15 +0100 Yannick Vignon wrote:
> On 2/1/2022 6:42 AM, Jakub Kicinski wrote:
> > On Fri, 28 Jan 2022 18:02:57 +0100 Yannick Vignon wrote:  
> >> Reading the PTP clock is a simple operation requiring only 2 register
> >> reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
> >> counter-productive:
> >>   * if the task is preempted in-between the 2 reads, the return time value
> >> could become inconsistent,
> >>   * if the 2nd task preempting the 1st has a higher prio but needs to
> >> read time as well, it will require 2 context switches, which will pretty
> >> much always be more costly than just disabling preemption for the duration
> >> of the 2 reads.
> >>
> >> Improve the above situation by:
> >> * replacing the PTP spinlock by a rwlock, and using read_lock for PTP
> >> clock reads so simultaneous reads do not block each other,  
> > 
> > Are you sure the reads don't latch the other register? Otherwise this
> > code is buggy, it should check for wrap around. (e.g. during 1.99 ->
> > 2.00 transition driver can read .99, then 2, resulting in 2.99).
> 
> Well, we did observe the issue on another device (micro-controller, not 
> running Linux) using the same IP, and we were wondering how the Linux 
> driver could work and why we didn't observe the issue... I experimented 
> again today, and I did observe the problem, so I guess we didn't try 
> hard enough before. (this time I bypassed the kernel by doing tight read 
> loops from user-space after mmap'ing the registers).
> Going to add another commit to this patch-queue to fix that.

That's a fix though, so it needs to be a separate change containing a
Fixes tag and targeted at the netdev/net tree. We'd first apply that, and
then the optimizations on top of it into the net-next tree.

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 8e8778cfbbad..5943ff9f21c2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -383,10 +383,10 @@  static int intel_crosststamp(ktime_t *device,
 
 	/* Repeat until the timestamps are from the FIFO last segment */
 	for (i = 0; i < num_snapshot; i++) {
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		read_lock_irqsave(&priv->ptp_lock, flags);
 		stmmac_get_ptptime(priv, ptpaddr, &ptp_time);
 		*device = ns_to_ktime(ptp_time);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		read_unlock_irqrestore(&priv->ptp_lock, flags);
 		get_arttime(priv->mii, intel_priv->mdio_adhoc_addr, &art_time);
 		*system = convert_art_to_tsc(art_time);
 	}
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 5b195d5051d6..57970ae2178d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -263,7 +263,7 @@  struct stmmac_priv {
 	u32 adv_ts;
 	int use_riwt;
 	int irq_wake;
-	spinlock_t ptp_lock;
+	rwlock_t ptp_lock;
 	/* Protects auxiliary snapshot registers from concurrent access. */
 	struct mutex aux_ts_lock;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 074e2cdfb0fa..da22b29f8711 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -145,12 +145,15 @@  static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
 
 static void get_systime(void __iomem *ioaddr, u64 *systime)
 {
+	unsigned long flags;
 	u64 ns;
 
+	local_irq_save(flags);
 	/* Get the TSSS value */
 	ns = readl(ioaddr + PTP_STNSR);
 	/* Get the TSS and convert sec time value to nanosecond */
 	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
+	local_irq_restore(flags);
 
 	if (systime)
 		*systime = ns;
@@ -158,10 +161,13 @@  static void get_systime(void __iomem *ioaddr, u64 *systime)
 
 static void get_ptptime(void __iomem *ptpaddr, u64 *ptp_time)
 {
+	unsigned long flags;
 	u64 ns;
 
+	local_irq_save(flags);
 	ns = readl(ptpaddr + PTP_ATNR);
 	ns += readl(ptpaddr + PTP_ATSR) * NSEC_PER_SEC;
+	local_irq_restore(flags);
 
 	*ptp_time = ns;
 }
@@ -191,9 +197,9 @@  static void timestamp_interrupt(struct stmmac_priv *priv)
 		       GMAC_TIMESTAMP_ATSNS_SHIFT;
 
 	for (i = 0; i < num_snapshot; i++) {
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		read_lock_irqsave(&priv->ptp_lock, flags);
 		get_ptptime(priv->ptpaddr, &ptp_time);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		read_unlock_irqrestore(&priv->ptp_lock, flags);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 0;
 		event.timestamp = ptp_time;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 1c9f02f9c317..e45fb191d8e6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -39,9 +39,9 @@  static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb)
 	diff = div_u64(adj, 1000000000ULL);
 	addend = neg_adj ? (addend - diff) : (addend + diff);
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_config_addend(priv, priv->ptpaddr, addend);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
 }
@@ -86,9 +86,9 @@  static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta)
 		mutex_unlock(&priv->plat->est->lock);
 	}
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	/* Caculate new basetime and re-configured EST after PTP time adjust. */
 	if (est_rst) {
@@ -137,9 +137,9 @@  static int stmmac_get_time(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	unsigned long flags;
 	u64 ns = 0;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &ns);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	*ts = ns_to_timespec64(ns);
 
@@ -162,9 +162,9 @@  static int stmmac_set_time(struct ptp_clock_info *ptp,
 	    container_of(ptp, struct stmmac_priv, ptp_clock_ops);
 	unsigned long flags;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	write_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_init_systime(priv, priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	write_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
 }
@@ -194,12 +194,12 @@  static int stmmac_enable(struct ptp_clock_info *ptp,
 		cfg->period.tv_sec = rq->perout.period.sec;
 		cfg->period.tv_nsec = rq->perout.period.nsec;
 
-		spin_lock_irqsave(&priv->ptp_lock, flags);
+		write_lock_irqsave(&priv->ptp_lock, flags);
 		ret = stmmac_flex_pps_config(priv, priv->ioaddr,
 					     rq->perout.index, cfg, on,
 					     priv->sub_second_inc,
 					     priv->systime_flags);
-		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		write_unlock_irqrestore(&priv->ptp_lock, flags);
 		break;
 	case PTP_CLK_REQ_EXTTS:
 		priv->plat->ext_snapshot_en = on;
@@ -314,7 +314,7 @@  void stmmac_ptp_register(struct stmmac_priv *priv)
 	stmmac_ptp_clock_ops.n_per_out = priv->dma_cap.pps_out_num;
 	stmmac_ptp_clock_ops.n_ext_ts = priv->dma_cap.aux_snapshot_n;
 
-	spin_lock_init(&priv->ptp_lock);
+	rwlock_init(&priv->ptp_lock);
 	mutex_init(&priv->aux_ts_lock);
 	priv->ptp_clock_ops = stmmac_ptp_clock_ops;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
index be3cb63675a5..9f1759593b94 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
@@ -1777,9 +1777,9 @@  static int stmmac_test_tbs(struct stmmac_priv *priv)
 	if (ret)
 		return ret;
 
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &curr_time);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	if (!curr_time) {
 		ret = -EOPNOTSUPP;
@@ -1799,9 +1799,9 @@  static int stmmac_test_tbs(struct stmmac_priv *priv)
 		goto fail_disable;
 
 	/* Check if expected time has elapsed */
-	spin_lock_irqsave(&priv->ptp_lock, flags);
+	read_lock_irqsave(&priv->ptp_lock, flags);
 	stmmac_get_systime(priv, priv->ptpaddr, &curr_time);
-	spin_unlock_irqrestore(&priv->ptp_lock, flags);
+	read_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	if ((curr_time - start_time) < STMMAC_TBS_LT_OFFSET)
 		ret = -EINVAL;