
[net-next,2/4] tsnep: Fix rotten packets

Message ID: 20221117201440.21183-3-gerhard@engleder-embedded.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: tsnep: Throttle irq, rotten pkts, RX buffer alloc and ethtool_get_channels()

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 0 this patch: 8
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang fail Errors and warnings before: 0 this patch: 10
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 0 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 85 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Gerhard Engleder Nov. 17, 2022, 8:14 p.m. UTC
If PTP synchronisation is done every second, then sporadically the interval
is longer than one second:

ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
      ^^^^^^^ Should be 698.582!

This problem is caused by rotten packets, which are received after
polling but before interrupts are enabled again. This can be fixed by
checking for pending work and rescheduling if necessary after interrupts
have been enabled again.

Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
---
 drivers/net/ethernet/engleder/tsnep_main.c | 59 +++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)
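
The fix follows the usual NAPI completion-race pattern: after napi_complete_done()
and the interrupt re-enable, the rings are checked once more and the poll is
rescheduled if work slipped in meanwhile. A minimal sketch of that pattern, with
example_* placeholders standing in for the driver's own helpers (not the actual
tsnep code):

	#include <linux/netdevice.h>

	/* placeholder helpers, assumed to exist for this sketch */
	static int example_process_rings(struct napi_struct *napi, int budget);
	static bool example_work_pending(void);
	static void example_enable_irq(void);
	static void example_disable_irq(void);

	static int example_poll(struct napi_struct *napi, int budget)
	{
		int done = example_process_rings(napi, budget);

		if (done >= budget)
			return budget;		/* more work, stay scheduled */

		if (napi_complete_done(napi, done)) {
			example_enable_irq();	/* race window opens here */

			/* a packet that arrived after the last ring check but
			 * before the interrupt enable raises no new interrupt,
			 * so check the rings again and reschedule if needed
			 */
			if (example_work_pending()) {
				example_disable_irq();
				napi_schedule(napi);
			}
		}

		return done;
	}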

Comments

Andrew Lunn Nov. 17, 2022, 8:39 p.m. UTC | #1
On Thu, Nov 17, 2022 at 09:14:38PM +0100, Gerhard Engleder wrote:
> If PTP synchronisation is done every second, then sporadically the interval
> is longer than one second:
> 
> ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
> ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
> ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
>       ^^^^^^^ Should be 698.582!
> 
> This problem is caused by rotten packets, which are received after
> polling but before interrupts are enabled again.

Is this a hardware bug? At the end of the interrupt coalescence
period, should it not check the queue and fire an interrupt?

	Andrew
Gerhard Engleder Nov. 18, 2022, 6:13 a.m. UTC | #2
On 17.11.22 21:39, Andrew Lunn wrote:
> On Thu, Nov 17, 2022 at 09:14:38PM +0100, Gerhard Engleder wrote:
>> If PTP synchronisation is done every second, then sporadically the interval
>> is longer than one second:
>>
>> ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
>> ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
>> ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
>>        ^^^^^^^ Should be 698.582!
>>
>> This problem is caused by rotten packets, which are received after
>> polling but before interrupts are enabled again.
> 
> Is this a hardware bug? At the end of the interrupt coalescence
> period, should it not check the queue and fire an interrupt?

In my case, the hardware is not signaled when a descriptor is processed by
the software. The hardware is only signaled if it gets new descriptors
assigned. So the hardware does not know if there are still descriptors
in the RX queue which need to be processed by the software. As a result,
it would only be possible to trigger an interrupt for descriptors which
may have been processed already anyway.

In the end I made the hardware stupid. If interrupts are disabled for
NAPI polling, then interrupt events in the hardware are ignored. If
interrupts are enabled again, then only new interrupt events will
trigger the interrupt. I was afraid that too intelligent hardware would
lead to hardware bugs in this case.

Gerhard
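
Since only software can tell whether the next descriptor has already been
written back, the pending checks added by this patch compare the owner/counter
bits of the write-back descriptor with the value expected for the current ring
pass. A reduced sketch of such an ownership check, with simplified example_*
structures and a placeholder mask (the real layout is in the tsnep_tx_pending()
and tsnep_rx_pending() helpers in the patch below):

	#include <linux/types.h>
	#include <asm/byteorder.h>

	#define EXAMPLE_OWNER_MASK 0x00C0	/* placeholder mask */

	struct example_desc_wb { __le32 properties; };
	struct example_entry   { struct example_desc_wb *desc_wb; u32 properties; };
	struct example_ring    { struct example_entry *entry; int read, write; };

	static bool example_ring_pending(const struct example_ring *ring)
	{
		const struct example_entry *entry;

		if (ring->read == ring->write)
			return false;		/* nothing outstanding to check */

		entry = &ring->entry[ring->read];

		/* hardware toggles the owner bits in the write-back descriptor
		 * when it has filled the entry; a match with the value software
		 * expects means an unprocessed packet is still waiting
		 */
		return (le32_to_cpu(entry->desc_wb->properties) & EXAMPLE_OWNER_MASK) ==
		       (entry->properties & EXAMPLE_OWNER_MASK);
	}
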
Jakub Kicinski Nov. 19, 2022, 1:26 a.m. UTC | #3
On Thu, 17 Nov 2022 21:14:38 +0100 Gerhard Engleder wrote:
> If PTP synchronisation is done every second, then sporadically the interval
> is longer than one second:
> 
> ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
> ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
> ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
>       ^^^^^^^ Should be 698.582!
> 
> This problem is caused by rotten packets, which are received after
> polling but before interrupts are enabled again. This can be fixed by
> checking for pending work and rescheduling if necessary after interrupts
> have been enabled again.
> 
> Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>

This patch needs to go to net separately :(
Packets getting stuck in a queue can cause real issues to users.
Gerhard Engleder Nov. 19, 2022, 8:47 p.m. UTC | #4
On 19.11.22 02:26, Jakub Kicinski wrote:
> On Thu, 17 Nov 2022 21:14:38 +0100 Gerhard Engleder wrote:
>> If PTP synchronisation is done every second, then sporadically the interval
>> is longer than one second:
>>
>> ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
>> ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
>> ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
>>        ^^^^^^^ Should be 698.582!
>>
>> This problem is caused by rotten packets, which are received after
>> polling but before interrupts are enabled again. This can be fixed by
>> checking for pending work and rescheduling if necessary after interrupts
>> have been enabled again.
>>
>> Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
>> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
> 
> This patch needs to go to net separately :(
> Packets getting stuck in a queue can cause real issues to users.

I will post it separately.

Patch

diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index a99320e03279..0aca2ba97757 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -544,6 +544,27 @@  static bool tsnep_tx_poll(struct tsnep_tx *tx, int napi_budget)
 	return (budget != 0);
 }
 
+static bool tsnep_tx_pending(struct tsnep_tx *tx)
+{
+	unsigned long flags;
+	struct tsnep_tx_entry *entry;
+	bool pending = false;
+
+	spin_lock_irqsave(&tx->lock, flags);
+
+	if (tx->read != tx->write) {
+		entry = &tx->entry[tx->read];
+		if ((__le32_to_cpu(entry->desc_wb->properties) &
+		     TSNEP_TX_DESC_OWNER_MASK) ==
+		    (entry->properties & TSNEP_TX_DESC_OWNER_MASK))
+			pending = true;
+	}
+
+	spin_unlock_irqrestore(&tx->lock, flags);
+
+	return pending;
+}
+
 static int tsnep_tx_open(struct tsnep_adapter *adapter, void __iomem *addr,
 			 int queue_index, struct tsnep_tx *tx)
 {
@@ -823,6 +844,21 @@  static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
 	return done;
 }
 
+static bool tsnep_rx_pending(struct tsnep_rx *rx)
+{
+	struct tsnep_rx_entry *entry;
+
+	if (rx->read != rx->write) {
+		entry = &rx->entry[rx->read];
+		if ((__le32_to_cpu(entry->desc_wb->properties) &
+		     TSNEP_DESC_OWNER_COUNTER_MASK) ==
+		    (entry->properties & TSNEP_DESC_OWNER_COUNTER_MASK))
+			return true;
+	}
+
+	return false;
+}
+
 static int tsnep_rx_open(struct tsnep_adapter *adapter, void __iomem *addr,
 			 int queue_index, struct tsnep_rx *rx)
 {
@@ -868,6 +904,17 @@  static void tsnep_rx_close(struct tsnep_rx *rx)
 	tsnep_rx_ring_cleanup(rx);
 }
 
+static bool tsnep_pending(struct tsnep_queue *queue)
+{
+	if (queue->tx && tsnep_tx_pending(queue->tx))
+		return true;
+
+	if (queue->rx && tsnep_rx_pending(queue->rx))
+		return true;
+
+	return false;
+}
+
 static int tsnep_poll(struct napi_struct *napi, int budget)
 {
 	struct tsnep_queue *queue = container_of(napi, struct tsnep_queue,
@@ -888,9 +935,19 @@  static int tsnep_poll(struct napi_struct *napi, int budget)
 	if (!complete)
 		return budget;
 
-	if (likely(napi_complete_done(napi, done)))
+	if (likely(napi_complete_done(napi, done))) {
 		tsnep_enable_irq(queue->adapter, queue->irq_mask);
 
+		/* reschedule if work is already pending, prevent rotten packets
+		 * which are transmitted or received after polling but before
+		 * interrupt enable
+		 */
+		if (tsnep_pending(queue)) {
+			tsnep_disable_irq(queue->adapter, queue->irq_mask);
+			napi_schedule(napi);
+		}
+	}
+
 	return min(done, budget - 1);
 }