Message ID | 20221117201440.21183-3-gerhard@engleder-embedded.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | tsnep: Throttle irq, rotten pkts, RX buffer alloc and ethtool_get_channels() |
On Thu, Nov 17, 2022 at 09:14:38PM +0100, Gerhard Engleder wrote:
> If PTP synchronisation is done every second, then sporadic the interval
> is higher than one second:
>
> ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573
> ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573
> ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573
>       ^^^^^^^ Should be 698.582!
>
> This problem is caused by rotten packets, which are received after
> polling but before interrupts are enabled again.

Is this a hardware bug? At the end of the interrupt coalescence
period, should it not check the queue and fire an interrupt?

Andrew
On 17.11.22 21:39, Andrew Lunn wrote:
> On Thu, Nov 17, 2022 at 09:14:38PM +0100, Gerhard Engleder wrote:
>> If PTP synchronisation is done every second, then sporadic the interval
>> is higher than one second:
>>
>> ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573
>> ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573
>> ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573
>>       ^^^^^^^ Should be 698.582!
>>
>> This problem is caused by rotten packets, which are received after
>> polling but before interrupts are enabled again.
>
> Is this a hardware bug? At the end of the interrupt coalescence
> period, should it not check the queue and fire an interrupt?

In my case, the hardware is not signaled when a descriptor is processed
by the software. The hardware is only signaled when it gets new
descriptors assigned. So the hardware does not know whether there are
still descriptors in the RX queue which need to be processed by the
software. As a result, it would only be possible to trigger an interrupt
for descriptors which may have been processed already anyway.

In the end I made the hardware stupid. If interrupts are disabled for
NAPI polling, then interrupt events in the hardware are ignored. If
interrupts are enabled again, then only new interrupt events will
trigger the interrupt. I was afraid that too intelligent hardware would
lead to hardware bugs in this case.

Gerhard
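The behaviour described in the reply above (events ignored while interrupts are masked, nothing replayed on re-enable) can be illustrated with a small user-space model. This is only a sketch: `struct toy_dev` and all `toy_*` functions are hypothetical names invented for illustration and do not reflect the tsnep register interface; the second `toy_poll()` call merely stands in for `napi_schedule()` followed by the next poll.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the tsnep hardware interface. */
struct toy_dev {
	bool irq_enabled; /* interrupt mask state */
	bool irq_raised;  /* an interrupt reached the CPU */
	int hw_done;      /* descriptors completed by the "hardware" */
	int sw_done;      /* descriptors processed by the "software" */
};

/* Hardware completes one descriptor; the event is dropped while masked. */
static void toy_hw_complete(struct toy_dev *d)
{
	d->hw_done++;
	if (d->irq_enabled)
		d->irq_raised = true;
}

/* Software processes everything completed so far, returns the count. */
static int toy_poll(struct toy_dev *d)
{
	int n = d->hw_done - d->sw_done;

	assert(n >= 0);
	d->sw_done = d->hw_done;
	return n;
}

static bool toy_pending(const struct toy_dev *d)
{
	return d->hw_done != d->sw_done;
}

/*
 * End of one poll cycle with a completion arriving in the race window.
 * Returns true if a descriptor is left stranded without any interrupt.
 */
static bool toy_finish_poll(struct toy_dev *d, bool recheck)
{
	toy_poll(d);
	toy_hw_complete(d);    /* arrives while interrupts are still masked */
	d->irq_enabled = true; /* the earlier event is not replayed */

	if (recheck && toy_pending(d)) {
		d->irq_enabled = false;
		toy_poll(d);   /* stands in for napi_schedule() + next poll */
		d->irq_enabled = true;
	}

	return toy_pending(d) && !d->irq_raised;
}
```

Starting from a zero-initialised `struct toy_dev`, `toy_finish_poll(&d, false)` leaves the descriptor stranded until some later, unrelated event, while `toy_finish_poll(&d, true)` picks it up immediately.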
On Thu, 17 Nov 2022 21:14:38 +0100 Gerhard Engleder wrote:
> If PTP synchronisation is done every second, then sporadic the interval
> is higher than one second:
>
> ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573
> ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573
> ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573
>       ^^^^^^^ Should be 698.582!
>
> This problem is caused by rotten packets, which are received after
> polling but before interrupts are enabled again. This can be fixed by
> checking for pending work and rescheduling if necessary after interrupts
> has been enabled again.
>
> Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>

This patch needs to go to net separately :(
Packets getting stuck in a queue can cause real issues to users.
On 19.11.22 02:26, Jakub Kicinski wrote:
> On Thu, 17 Nov 2022 21:14:38 +0100 Gerhard Engleder wrote:
>> If PTP synchronisation is done every second, then sporadic the interval
>> is higher than one second:
>>
>> ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573
>> ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573
>> ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573
>>       ^^^^^^^ Should be 698.582!
>>
>> This problem is caused by rotten packets, which are received after
>> polling but before interrupts are enabled again. This can be fixed by
>> checking for pending work and rescheduling if necessary after interrupts
>> has been enabled again.
>>
>> Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
>> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
>
> This patch needs to go to net separately :(
> Packets getting stuck in a queue can cause real issues to users.

I will post it separately.
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index a99320e03279..0aca2ba97757 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -544,6 +544,27 @@ static bool tsnep_tx_poll(struct tsnep_tx *tx, int napi_budget)
 	return (budget != 0);
 }
 
+static bool tsnep_tx_pending(struct tsnep_tx *tx)
+{
+	unsigned long flags;
+	struct tsnep_tx_entry *entry;
+	bool pending = false;
+
+	spin_lock_irqsave(&tx->lock, flags);
+
+	if (tx->read != tx->write) {
+		entry = &tx->entry[tx->read];
+		if ((__le32_to_cpu(entry->desc_wb->properties) &
+		     TSNEP_TX_DESC_OWNER_MASK) ==
+		    (entry->properties & TSNEP_TX_DESC_OWNER_MASK))
+			pending = true;
+	}
+
+	spin_unlock_irqrestore(&tx->lock, flags);
+
+	return pending;
+}
+
 static int tsnep_tx_open(struct tsnep_adapter *adapter, void __iomem *addr,
 			 int queue_index, struct tsnep_tx *tx)
 {
@@ -823,6 +844,21 @@ static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
 	return done;
 }
 
+static bool tsnep_rx_pending(struct tsnep_rx *rx)
+{
+	struct tsnep_rx_entry *entry;
+
+	if (rx->read != rx->write) {
+		entry = &rx->entry[rx->read];
+		if ((__le32_to_cpu(entry->desc_wb->properties) &
+		     TSNEP_DESC_OWNER_COUNTER_MASK) ==
+		    (entry->properties & TSNEP_DESC_OWNER_COUNTER_MASK))
+			return true;
+	}
+
+	return false;
+}
+
 static int tsnep_rx_open(struct tsnep_adapter *adapter, void __iomem *addr,
 			 int queue_index, struct tsnep_rx *rx)
 {
@@ -868,6 +904,17 @@ static void tsnep_rx_close(struct tsnep_rx *rx)
 	tsnep_rx_ring_cleanup(rx);
 }
 
+static bool tsnep_pending(struct tsnep_queue *queue)
+{
+	if (queue->tx && tsnep_tx_pending(queue->tx))
+		return true;
+
+	if (queue->rx && tsnep_rx_pending(queue->rx))
+		return true;
+
+	return false;
+}
+
 static int tsnep_poll(struct napi_struct *napi, int budget)
 {
 	struct tsnep_queue *queue = container_of(napi, struct tsnep_queue,
@@ -888,9 +935,19 @@ static int tsnep_poll(struct napi_struct *napi, int budget)
 	if (!complete)
 		return budget;
 
-	if (likely(napi_complete_done(napi, done)))
+	if (likely(napi_complete_done(napi, done))) {
 		tsnep_enable_irq(queue->adapter, queue->irq_mask);
 
+		/* reschedule if work is already pending, prevent rotten packets
+		 * which are transmitted or received after polling but before
+		 * interrupt enable
+		 */
+		if (tsnep_pending(queue)) {
+			tsnep_disable_irq(queue->adapter, queue->irq_mask);
+			napi_schedule(napi);
+		}
+	}
+
 	return min(done, budget - 1);
 }
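The pending checks in the diff above compare the owner bits that software stored when posting a descriptor with the owner bits in the descriptor write-back. A minimal user-space sketch of that comparison follows. Everything here is an assumption made for illustration: the `toy_*` names, the ring size, and the value of `TOY_OWNER_MASK` are hypothetical, the little-endian conversion (`__le32_to_cpu()` in the driver) is omitted, and the way `toy_ring_post()` forces a mismatch is a simplification that may differ from how the real descriptors behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 4u
#define TOY_OWNER_MASK 0xC0000000u /* hypothetical bit position */

struct toy_entry {
	uint32_t properties;    /* written by software when posting */
	uint32_t wb_properties; /* written back by "hardware" on completion */
};

struct toy_ring {
	struct toy_entry entry[TOY_RING_SIZE];
	unsigned int read;  /* next entry software will process */
	unsigned int write; /* next entry software will post */
};

/* Software posts a descriptor tagged with the given owner bits. */
static void toy_ring_post(struct toy_ring *r, uint32_t owner)
{
	struct toy_entry *e = &r->entry[r->write];

	e->properties = owner & TOY_OWNER_MASK;
	/* toy simplification: force a mismatch until completion */
	e->wb_properties = ~e->properties & TOY_OWNER_MASK;
	r->write = (r->write + 1) % TOY_RING_SIZE;
}

/* "Hardware" completes a descriptor by echoing the owner bits. */
static void toy_ring_complete(struct toy_ring *r, unsigned int idx)
{
	assert(idx < TOY_RING_SIZE);
	r->entry[idx].wb_properties = r->entry[idx].properties;
}

/* True if the oldest posted descriptor has completed but is unprocessed. */
static bool toy_ring_pending(const struct toy_ring *r)
{
	const struct toy_entry *e;

	assert(r->read < TOY_RING_SIZE && r->write < TOY_RING_SIZE);
	if (r->read == r->write)
		return false; /* nothing posted */

	e = &r->entry[r->read];
	return (e->wb_properties & TOY_OWNER_MASK) ==
	       (e->properties & TOY_OWNER_MASK);
}
```

In this model `toy_ring_pending()` stays false after `toy_ring_post()` and becomes true only once `toy_ring_complete()` has run for the entry at `read`, which is the condition the patch uses to decide whether to disable the interrupt again and reschedule NAPI.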
If PTP synchronisation is done every second, then sporadic the interval
is higher than one second:

ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573
ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573
ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573
      ^^^^^^^ Should be 698.582!

This problem is caused by rotten packets, which are received after
polling but before interrupts are enabled again. This can be fixed by
checking for pending work and rescheduling if necessary after interrupts
has been enabled again.

Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
---
 drivers/net/ethernet/engleder/tsnep_main.c | 59 +++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)