Message ID | 20250205100524.1138523-6-faizal.abdul.rahim@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | igc: Add support for Frame Preemption feature in IGC | expand |
On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote: > This patch implements the "ethtool --set-mm" callback to trigger the > frame preemption verification handshake. > > Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool > to perform the verification handshake for igc. > The structure fpe.mmsv is set by mmsv in ethtool and should remain > read-only for the driver. > > igc does not use two mmsv callbacks: > a) configure_tx() > - igc lacks registers to configure FPE in the transmit direction. Yes, maybe, but it's still important to handle this. It tells you when the preemptible traffic classes should be sent as preemptible on the wire (i.e. when the verification is either disabled, or it succeeded). There is a selftest called manual_failed_verification() which supposedly tests this exact condition: if verification fails, then packets sent to TC0 are supposed to bump the eMAC's TX counters, even though TC0 is configured as preemptible. Otherwise stated: even if the tc program says that a certain traffic class is preemptible, you don't want to actually send preemptible packets if you haven't verified the link partner can handle them, since it will likely drop them on RX otherwise. The problem with that selftest is that it relies on the driver's ability to report split ethtool counters for eMAC/pMAC, rather than just aggregate. In lack of that, you need to know through some other mechanism whether the link partner received those packets as preemptible or express, and that's kind of hard to add to a selftest. > b) configure_pmac() > - this callback dynamically controls pmac_enabled at runtime. For > example, mmsv calls configure_pmac() and disables pmac_enabled when > the link partner goes down, even if the user previously enabled it. > The intention is to save power but it is not feasible in igc > because it causes an endless adapter reset loop: > > 1) Board A and Board B complete the verification handshake. > Tx mode register for both boards are in TSN mode. > 2) Board B link goes down. > > On Board A: > 3) mmsv calls configure_pmac() with pmac_enabled = false. > 4) configure_pmac() in igc updates a new field based on > pmac_enabled. Driver uses this field in igc_tsn_new_flags() > to indicate that the user enabled/disabled FPE. > 5) configure_pmac() in igc calls igc_tsn_offload_apply() to check > whether an adapter reset is needed. Calls existing logic in > igc_tsn_will_tx_mode_change() and igc_tsn_new_flags(). > 6) Since pmac_enabled is now disabled and no other TSN feature > is active, igc_tsn_will_tx_mode_change() evaluates to true > because Tx mode will switch from TSN to Legacy. > 7) Driver resets the adapter. > 8) Registers are set, and Tx mode switches to Legacy. > 9) When link partner is up, steps 3–8 repeat, but this time > with pmac_enabled = true, reactivating TSN. > igc_tsn_will_tx_mode_change() evaluates to true again, > since Tx mode will switch from Legacy to TSN. > 10) Driver resets the adapter. > 11) Rest adapter completes, registers are set, and Tx mode > switches to TSN. > > On Board B: > 12) Adapter reset on Board A at step 10 causes it to detect its > link partner as down. Is this using fiber/DAC, or twisted pair (RJ45 PHY)? > 13) Repeats steps 3–8. > 14) Once reset adapter on Board A is completed at step 11, it > detects its link partner as up. > 15) Repeats steps 9–11. > > - this cycle repeats indefinitely. To avoid this issue, IGC only uses > mmsv.pmac_enabled to track whether FPE is enabled or disabled. Not that I couldn't eventually tolerate this workaround, but I figure it's worth asking anyway. Isn't there a way to do an adapter reset without losing link in the PHY (assuming it's the PHY that loses link)? Is that a consequence of software design that's not careful enough, or of hardware design? I assume the PHY is a discrete component. Avoiding the need to retrigger auto-negotiation saves a few seconds of waiting, so it's worth pursuing if it's possible at all. > Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> > Co-developed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> > Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> > Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> > --- > diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c > index 817838677817..41cb03816f92 100644 > --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c > +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c > @@ -1781,6 +1782,23 @@ static int igc_ethtool_set_eee(struct net_device *netdev, > return 0; > } > > +static int igc_ethtool_set_mm(struct net_device *netdev, > + struct ethtool_mm_cfg *cmd, > + struct netlink_ext_ack *extack) > +{ > + struct igc_adapter *adapter = netdev_priv(netdev); > + struct fpe_t *fpe = &adapter->fpe; > + > + ethtool_mmsv_set_mm(&fpe->mmsv, cmd); > + > + if (fpe->mmsv.pmac_enabled) > + static_branch_enable(&igc_fpe_enabled); > + else > + static_branch_disable(&igc_fpe_enabled); I don't think the API is correctly used here. If there are 2 igc cards in the system and one gets an ethtool call with pmac_enabled=false, it will break the static branch for the other card. You need to use static_branch_inc() and static_branch_dec(), and react only on changes (old fpe->mmsv.pmac_enabled ^ cmd->pmac_enabled). > + > + return igc_tsn_offload_apply(adapter); > +} > + > static int igc_ethtool_get_link_ksettings(struct net_device *netdev, > struct ethtool_link_ksettings *cmd) > { > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index 44e4f925491f..396410522e01 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -2618,6 +2618,15 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget) > size -= IGC_TS_HDR_LEN; > } > > + if (static_branch_unlikely(&igc_fpe_enabled) && > + igc_fpe_is_verify_or_response(rx_desc, size)) { FWIW, the igc_fpe_enabled static branch is kernel-wide and does not necessarily mean that FPE is enabled on _this_ NIC (that would still be adapter->fpe.mmsv.pmac_enabled). I don't think it's very safe to call ethtool_mmsv events when it doesn't expect them (pmac_enabled = false), at least from a maintainance perspective it would be easier not to do that. I'm just ballparking this, but I think igc_fpe_is_verify_or_response() is also just a bit more computationally expensive than just checking adapter->fpe.mmsv.pmac_enabled for the "normal" case where FPE isn't enabled, so it might make sense to add this as an extra check anyway. > + igc_fpe_lp_event_status(rx_desc, &adapter->fpe.mmsv); > + /* Advance the ring next-to-clean */ > + igc_is_non_eop(rx_ring, rx_desc); > + cleaned_count++; > + continue; > + } > + > if (!skb) { > xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq); > xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring), > diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c > index f0213cfce07d..52ff6f324d79 100644 > --- a/drivers/net/ethernet/intel/igc/igc_tsn.c > +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c > @@ -1,10 +1,153 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Copyright (c) 2019 Intel Corporation */ > > +#include <linux/kernel.h> > #include "igc.h" > +#include "igc_base.h" > #include "igc_hw.h" > #include "igc_tsn.h" > > +DEFINE_STATIC_KEY_FALSE(igc_fpe_enabled); > + > +static int igc_fpe_init_smd_frame(struct igc_ring *ring, > + struct igc_tx_buffer *buffer, > + struct sk_buff *skb) > +{ > + dma_addr_t dma = dma_map_single(ring->dev, skb->data, skb->len, > + DMA_TO_DEVICE); > + > + if (dma_mapping_error(ring->dev, dma)) { > + netdev_err_once(ring->netdev, "Failed to map DMA for TX\n"); > + return -ENOMEM; > + } > + > + buffer->skb = skb; > + buffer->protocol = 0; > + buffer->bytecount = skb->len; > + buffer->gso_segs = 1; > + buffer->time_stamp = jiffies; > + dma_unmap_len_set(buffer, len, skb->len); > + dma_unmap_addr_set(buffer, dma, dma); > + > + return 0; > +} > + > +static int igc_fpe_init_tx_descriptor(struct igc_ring *ring, > + struct sk_buff *skb, int type) > +{ > + struct igc_tx_buffer *buffer; > + union igc_adv_tx_desc *desc; > + u32 cmd_type, olinfo_status; > + int err; > + > + if (!igc_desc_unused(ring)) > + return -EBUSY; > + > + if (type != IGC_TXD_POPTS_SMD_V && type != IGC_TXD_POPTS_SMD_R) > + return -EINVAL; It should really be sufficient to guarantee at compile time that the argument is valid (it never comes from user input, so it's not hard). Error handling for a condition that can't be triggered shouldn't exist, except for the very specific case of future proofing loosely coupled software components (what one would call "API", not the case here). I guess a concern is what happens when, for any reason, you add a new type of packet that you can transmit, and igc_fpe_init_tx_descriptor() isn't updated to handle it yet. If you make "int type" a proper enum type, the compiler will automatically warn when an enum value isn't handled in the "switch (type)" block, because there's no "default" case. I think that's actually better than finding this out at runtime, so I would recommend using an enum for stronger typing. Warning: it's still only a hint and thus not perfect, in that you could pass an out-of-range constant literal like "42" and the compiler wouldn't warn. > + > + buffer = &ring->tx_buffer_info[ring->next_to_use]; > + err = igc_fpe_init_smd_frame(ring, buffer, skb); > + if (err) > + return err; > + > + cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT | > + IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD | > + buffer->bytecount; > + olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT; Still some FIELD_PREP() conversions to do here? > + > + switch (type) { > + case IGC_TXD_POPTS_SMD_V: > + olinfo_status |= (IGC_TXD_POPTS_SMD_V << IGC_TXD_POPTS_SMD_SHIFT); > + break; > + case IGC_TXD_POPTS_SMD_R: > + olinfo_status |= (IGC_TXD_POPTS_SMD_R << IGC_TXD_POPTS_SMD_SHIFT); > + break; > + } What I just said above is a bit diluted by the fact that this doesn't have to be a switch/case block. It's just "olinfo_status |= type << IGC_TXD_POPTS_SMD_SHIFT" (also, parentheses are not necessary). As a mere suggestion, I would write this as: switch (type) { // "type" being an enum case IGC_TXD_POPTS_SMD_V: case IGC_TXD_POPTS_SMD_R: olinfo_status |= FIELD_PREP(IGC_TXD_POPTS_SMD_MASK, type); break; } > + > + desc = IGC_TX_DESC(ring, ring->next_to_use); > + desc->read.cmd_type_len = cpu_to_le32(cmd_type); > + desc->read.olinfo_status = cpu_to_le32(olinfo_status); > + desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma)); > + > + netdev_tx_sent_queue(txring_txq(ring), skb->len); > + > + buffer->next_to_watch = desc; > + ring->next_to_use = (ring->next_to_use + 1) % ring->count; > + > + return 0; > +} > + > +bool igc_fpe_transmitted_smd_v(union igc_adv_tx_desc *tx_desc) > +{ > + u8 smd = FIELD_GET(IGC_TXD_POPTS_SMD_MASK, tx_desc->read.olinfo_status); > + > + return smd == IGC_TXD_POPTS_SMD_V; > +} > + > +static int igc_fpe_xmit_smd_frame(struct igc_adapter *adapter, int type) > +{ > + int cpu = smp_processor_id(); > + struct netdev_queue *nq; > + struct igc_ring *ring; > + struct sk_buff *skb; > + void *data; > + int err; > + > + if (!netif_running(adapter->netdev)) > + return -ENOTCONN; Have you seen a code path that would require protecting the transmission using this conditional? I get the feeling that this is defensive programming when we could give guarantees in other ways. But maybe there are some race conditions I'm not seeing. ethtool_mmsv_send_mpacket() is called from ethtool_mmsv_verify_timer() and from ethtool_mmsv_event_handle(event == ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET). When called from ethtool_mmsv_verify_timer(), it means the verify timer has been armed, which only happens if (netif_running(mmsv->dev)). After netif_running() becomes false (aka __LINK_STATE_START is cleared, aka in ndo_stop()), you have a chance to call ethtool_mmsv_stop(), which stops the verification timer. I see you're not doing that. Is there a reason? > + > + ring = igc_get_tx_ring(adapter, cpu); > + nq = txring_txq(ring); > + > + skb = alloc_skb(SMD_FRAME_SIZE, GFP_KERNEL); > + if (!skb) > + return -ENOMEM; > + > + data = skb_put(skb, SMD_FRAME_SIZE); > + memset(data, 0, SMD_FRAME_SIZE); skb_put_zero() > + > + __netif_tx_lock(nq, cpu); > + > + err = igc_fpe_init_tx_descriptor(ring, skb, type); > + igc_flush_tx_descriptors(ring); > + > + __netif_tx_unlock(nq); > + > + return err; > +} > + > +static void igc_fpe_send_mpacket(struct ethtool_mmsv *mmsv, > + enum ethtool_mpacket type) > +{ > + struct fpe_t *fpe = container_of(mmsv, struct fpe_t, mmsv); > + struct igc_adapter *adapter; > + int err; > + > + adapter = container_of(fpe, struct igc_adapter, fpe); > + > + if (type == ETHTOOL_MPACKET_VERIFY) { > + err = igc_fpe_xmit_smd_frame(adapter, IGC_TXD_POPTS_SMD_V); > + if (err) > + netdev_err(adapter->netdev, "Error sending SMD-V\n"); > + } else if (type == ETHTOOL_MPACKET_RESPONSE) { > + err = igc_fpe_xmit_smd_frame(adapter, IGC_TXD_POPTS_SMD_R); > + if (err) > + netdev_err(adapter->netdev, "Error sending SMD-R frame\n"); Can you please rate limit these prints? > + } > +}
Hi Vladimir, Thanks for the quick review, appreciate your help. On 6/2/2025 1:12 am, Vladimir Oltean wrote: > On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote: >> This patch implements the "ethtool --set-mm" callback to trigger the >> frame preemption verification handshake. >> >> Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool >> to perform the verification handshake for igc. >> The structure fpe.mmsv is set by mmsv in ethtool and should remain >> read-only for the driver. >> >> igc does not use two mmsv callbacks: >> a) configure_tx() >> - igc lacks registers to configure FPE in the transmit direction. > > Yes, maybe, but it's still important to handle this. It tells you when > the preemptible traffic classes should be sent as preemptible on the wire > (i.e. when the verification is either disabled, or it succeeded). > > There is a selftest called manual_failed_verification() which supposedly > tests this exact condition: if verification fails, then packets sent to > TC0 are supposed to bump the eMAC's TX counters, even though TC0 is > configured as preemptible. Otherwise stated: even if the tc program says > that a certain traffic class is preemptible, you don't want to actually > send preemptible packets if you haven't verified the link partner can > handle them, since it will likely drop them on RX otherwise. Even though fpe in tx direction isn't set in igc, it still checks ethtool_mmsv_is_tx_active() before setting a queue as preemptible. This is done in : igc_tsn_enable_offload(struct igc_adapter *adapter) { { .... if (ethtool_mmsv_is_tx_active(&adapter->fpe.mmsv) && ring->preemptible) txqctl |= IGC_TXQCTL_PREEMPTIBLE; Wouldn't this handle the situation mentioned ? Sorry if I miss something here. > The problem with that selftest is that it relies on the driver's ability > to report split ethtool counters for eMAC/pMAC, rather than just aggregate. > In lack of that, you need to know through some other mechanism whether > the link partner received those packets as preemptible or express, and > that's kind of hard to add to a selftest. > >> b) configure_pmac() >> - this callback dynamically controls pmac_enabled at runtime. For >> example, mmsv calls configure_pmac() and disables pmac_enabled when >> the link partner goes down, even if the user previously enabled it. >> The intention is to save power but it is not feasible in igc >> because it causes an endless adapter reset loop: >> >> 1) Board A and Board B complete the verification handshake. >> Tx mode register for both boards are in TSN mode. >> 2) Board B link goes down. >> >> On Board A: >> 3) mmsv calls configure_pmac() with pmac_enabled = false. >> 4) configure_pmac() in igc updates a new field based on >> pmac_enabled. Driver uses this field in igc_tsn_new_flags() >> to indicate that the user enabled/disabled FPE. >> 5) configure_pmac() in igc calls igc_tsn_offload_apply() to check >> whether an adapter reset is needed. Calls existing logic in >> igc_tsn_will_tx_mode_change() and igc_tsn_new_flags(). >> 6) Since pmac_enabled is now disabled and no other TSN feature >> is active, igc_tsn_will_tx_mode_change() evaluates to true >> because Tx mode will switch from TSN to Legacy. >> 7) Driver resets the adapter. >> 8) Registers are set, and Tx mode switches to Legacy. >> 9) When link partner is up, steps 3–8 repeat, but this time >> with pmac_enabled = true, reactivating TSN. >> igc_tsn_will_tx_mode_change() evaluates to true again, >> since Tx mode will switch from Legacy to TSN. >> 10) Driver resets the adapter. >> 11) Rest adapter completes, registers are set, and Tx mode >> switches to TSN. >> >> On Board B: >> 12) Adapter reset on Board A at step 10 causes it to detect its >> link partner as down. > > Is this using fiber/DAC, or twisted pair (RJ45 PHY)? It's using RJ45 PHY. >> 13) Repeats steps 3–8. >> 14) Once reset adapter on Board A is completed at step 11, it >> detects its link partner as up. >> 15) Repeats steps 9–11. >> >> - this cycle repeats indefinitely. To avoid this issue, IGC only uses >> mmsv.pmac_enabled to track whether FPE is enabled or disabled. > > Not that I couldn't eventually tolerate this workaround, but I figure > it's worth asking anyway. Isn't there a way to do an adapter reset > without losing link in the PHY (assuming it's the PHY that loses link)? > Is that a consequence of software design that's not careful enough, or > of hardware design? I assume the PHY is a discrete component. Avoiding > the need to retrigger auto-negotiation saves a few seconds of waiting, > so it's worth pursuing if it's possible at all. > I briefly checked the driver code and the i226 SW User Manual. The code calls igc_reset_task() → igc_reset() → igc_reinit_locked() → igc_down() → igc_reset() → igc_power_down_phy_copper_base(). I suspect igc_power_down_phy_copper_base() contributes to the link loss, but there are likely other factors as well. The SW User Manual states that a software reset (CTRL.DEV_RST) affects several components, including "Port Configuration Autoload from NVM, Data Path, On-die Memories, MAC, PCS, Auto Negotiation and other Port-related Logic, Function Queue Enable, Function Interrupt and Statistics registers, Wake-up Management Registers, and Memory Configuration Registers." Given this, right now, I’m unsure about the feasibility of making this change, though I see the benefits mentioned. Validating this would require significant exploration—i.e., investigating the code, running experiments, reviewing commit history, confirming hardware expectations from the SW manual, and consulting other hardware SMEs. Resetting the adapter is a common mechanism in igc that relies on shared functions, which can be triggered in various scenarios. Modifying this behavior (if feasible) could introduce some risks and would likely require additional testing across those scenarios. Focusing on this right now might delay the current series, which is primarily aimed at enabling Qbu on i225/6. Would it be alright if I explore this separately from this Qbu series? >> Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> >> Co-developed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> >> Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> >> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> >> --- >> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c >> index 817838677817..41cb03816f92 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c >> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c >> @@ -1781,6 +1782,23 @@ static int igc_ethtool_set_eee(struct net_device *netdev, >> return 0; >> } >> >> +static int igc_ethtool_set_mm(struct net_device *netdev, >> + struct ethtool_mm_cfg *cmd, >> + struct netlink_ext_ack *extack) >> +{ >> + struct igc_adapter *adapter = netdev_priv(netdev); >> + struct fpe_t *fpe = &adapter->fpe; >> + >> + ethtool_mmsv_set_mm(&fpe->mmsv, cmd); >> + >> + if (fpe->mmsv.pmac_enabled) >> + static_branch_enable(&igc_fpe_enabled); >> + else >> + static_branch_disable(&igc_fpe_enabled); > > I don't think the API is correctly used here. If there are 2 igc cards > in the system and one gets an ethtool call with pmac_enabled=false, it > will break the static branch for the other card. You need to use > static_branch_inc() and static_branch_dec(), and react only on changes > (old fpe->mmsv.pmac_enabled ^ cmd->pmac_enabled). > Got it, thanks for the explanation, that makes sense, I'll make the change. >> + >> + return igc_tsn_offload_apply(adapter); >> +} >> + >> static int igc_ethtool_get_link_ksettings(struct net_device *netdev, >> struct ethtool_link_ksettings *cmd) >> { >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c >> index 44e4f925491f..396410522e01 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_main.c >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >> @@ -2618,6 +2618,15 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget) >> size -= IGC_TS_HDR_LEN; >> } >> >> + if (static_branch_unlikely(&igc_fpe_enabled) && >> + igc_fpe_is_verify_or_response(rx_desc, size)) { > > FWIW, the igc_fpe_enabled static branch is kernel-wide and does not > necessarily mean that FPE is enabled on _this_ NIC (that would still be > adapter->fpe.mmsv.pmac_enabled). > > I don't think it's very safe to call ethtool_mmsv events when it doesn't > expect them (pmac_enabled = false), at least from a maintainance perspective > it would be easier not to do that. > > I'm just ballparking this, but I think igc_fpe_is_verify_or_response() is > also just a bit more computationally expensive than just checking > adapter->fpe.mmsv.pmac_enabled for the "normal" case where FPE isn't > enabled, so it might make sense to add this as an extra check anyway. > Got it, will update. >> + igc_fpe_lp_event_status(rx_desc, &adapter->fpe.mmsv); >> + /* Advance the ring next-to-clean */ >> + igc_is_non_eop(rx_ring, rx_desc); >> + cleaned_count++; >> + continue; >> + } >> + >> if (!skb) { >> xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq); >> xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring), >> diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c >> index f0213cfce07d..52ff6f324d79 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_tsn.c >> +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c >> @@ -1,10 +1,153 @@ >> // SPDX-License-Identifier: GPL-2.0 >> /* Copyright (c) 2019 Intel Corporation */ >> >> +#include <linux/kernel.h> >> #include "igc.h" >> +#include "igc_base.h" >> #include "igc_hw.h" >> #include "igc_tsn.h" >> >> +DEFINE_STATIC_KEY_FALSE(igc_fpe_enabled); >> + >> +static int igc_fpe_init_smd_frame(struct igc_ring *ring, >> + struct igc_tx_buffer *buffer, >> + struct sk_buff *skb) >> +{ >> + dma_addr_t dma = dma_map_single(ring->dev, skb->data, skb->len, >> + DMA_TO_DEVICE); >> + >> + if (dma_mapping_error(ring->dev, dma)) { >> + netdev_err_once(ring->netdev, "Failed to map DMA for TX\n"); >> + return -ENOMEM; >> + } >> + >> + buffer->skb = skb; >> + buffer->protocol = 0; >> + buffer->bytecount = skb->len; >> + buffer->gso_segs = 1; >> + buffer->time_stamp = jiffies; >> + dma_unmap_len_set(buffer, len, skb->len); >> + dma_unmap_addr_set(buffer, dma, dma); >> + >> + return 0; >> +} >> + >> +static int igc_fpe_init_tx_descriptor(struct igc_ring *ring, >> + struct sk_buff *skb, int type) >> +{ >> + struct igc_tx_buffer *buffer; >> + union igc_adv_tx_desc *desc; >> + u32 cmd_type, olinfo_status; >> + int err; >> + >> + if (!igc_desc_unused(ring)) >> + return -EBUSY; >> + >> + if (type != IGC_TXD_POPTS_SMD_V && type != IGC_TXD_POPTS_SMD_R) >> + return -EINVAL; > > It should really be sufficient to guarantee at compile time that the argument > is valid (it never comes from user input, so it's not hard). Error handling > for a condition that can't be triggered shouldn't exist, except for the Yea that error handling wasn't originally there, it was added due to some internal commments, even though it didn't feel right to me. Will change according to your suggestion. > very specific case of future proofing loosely coupled software components > (what one would call "API", not the case here). > > I guess a concern is what happens when, for any reason, you add a new > type of packet that you can transmit, and igc_fpe_init_tx_descriptor() > isn't updated to handle it yet. If you make "int type" a proper enum > type, the compiler will automatically warn when an enum value isn't > handled in the "switch (type)" block, because there's no "default" case. > I think that's actually better than finding this out at runtime, so I > would recommend using an enum for stronger typing. > > Warning: it's still only a hint and thus not perfect, in that you > could pass an out-of-range constant literal like "42" and the compiler > wouldn't warn. > >> + >> + buffer = &ring->tx_buffer_info[ring->next_to_use]; >> + err = igc_fpe_init_smd_frame(ring, buffer, skb); >> + if (err) >> + return err; >> + >> + cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT | >> + IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD | >> + buffer->bytecount; >> + olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT; > > Still some FIELD_PREP() conversions to do here? > >> + >> + switch (type) { >> + case IGC_TXD_POPTS_SMD_V: >> + olinfo_status |= (IGC_TXD_POPTS_SMD_V << IGC_TXD_POPTS_SMD_SHIFT); >> + break; >> + case IGC_TXD_POPTS_SMD_R: >> + olinfo_status |= (IGC_TXD_POPTS_SMD_R << IGC_TXD_POPTS_SMD_SHIFT); >> + break; >> + } > > What I just said above is a bit diluted by the fact that this doesn't > have to be a switch/case block. It's just "olinfo_status |= type << IGC_TXD_POPTS_SMD_SHIFT" > (also, parentheses are not necessary). > > As a mere suggestion, I would write this as: > > switch (type) { // "type" being an enum > case IGC_TXD_POPTS_SMD_V: > case IGC_TXD_POPTS_SMD_R: > olinfo_status |= FIELD_PREP(IGC_TXD_POPTS_SMD_MASK, type); > break; > } > >> + >> + desc = IGC_TX_DESC(ring, ring->next_to_use); >> + desc->read.cmd_type_len = cpu_to_le32(cmd_type); >> + desc->read.olinfo_status = cpu_to_le32(olinfo_status); >> + desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma)); >> + >> + netdev_tx_sent_queue(txring_txq(ring), skb->len); >> + >> + buffer->next_to_watch = desc; >> + ring->next_to_use = (ring->next_to_use + 1) % ring->count; >> + >> + return 0; >> +} >> + >> +bool igc_fpe_transmitted_smd_v(union igc_adv_tx_desc *tx_desc) >> +{ >> + u8 smd = FIELD_GET(IGC_TXD_POPTS_SMD_MASK, tx_desc->read.olinfo_status); >> + >> + return smd == IGC_TXD_POPTS_SMD_V; >> +} >> + >> +static int igc_fpe_xmit_smd_frame(struct igc_adapter *adapter, int type) >> +{ >> + int cpu = smp_processor_id(); >> + struct netdev_queue *nq; >> + struct igc_ring *ring; >> + struct sk_buff *skb; >> + void *data; >> + int err; >> + >> + if (!netif_running(adapter->netdev)) >> + return -ENOTCONN; > > Have you seen a code path that would require protecting the transmission > using this conditional? > I checked, I'm not sure too. I'll check with Vinicius too later, but I think it's not needed already since this flow is triggered by mmsv that already checks if (netif_running(mmsv->dev)) - as per your explanation below. > I get the feeling that this is defensive programming when we could give > guarantees in other ways. But maybe there are some race conditions I'm > not seeing. > > ethtool_mmsv_send_mpacket() is called from ethtool_mmsv_verify_timer() > and from ethtool_mmsv_event_handle(event == ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET). > > When called from ethtool_mmsv_verify_timer(), it means the verify timer > has been armed, which only happens if (netif_running(mmsv->dev)). After > netif_running() becomes false (aka __LINK_STATE_START is cleared, aka in > ndo_stop()), you have a chance to call ethtool_mmsv_stop(), which stops > the verification timer. I see you're not doing that. Is there a reason? > I missed this - ethtool_mmsv_stop(). Sorry for that, I'll update the code, thanks.
On Thu, Feb 06, 2025 at 10:40:11PM +0800, Abdul Rahim, Faizal wrote: > > Hi Vladimir, > > Thanks for the quick review, appreciate your help. > > On 6/2/2025 1:12 am, Vladimir Oltean wrote: > > On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote: > > > This patch implements the "ethtool --set-mm" callback to trigger the > > > frame preemption verification handshake. > > > > > > Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool > > > to perform the verification handshake for igc. > > > The structure fpe.mmsv is set by mmsv in ethtool and should remain > > > read-only for the driver. > > > > > > igc does not use two mmsv callbacks: > > > a) configure_tx() > > > - igc lacks registers to configure FPE in the transmit direction. > > > > Yes, maybe, but it's still important to handle this. It tells you when > > the preemptible traffic classes should be sent as preemptible on the wire > > (i.e. when the verification is either disabled, or it succeeded). > > > > There is a selftest called manual_failed_verification() which supposedly > > tests this exact condition: if verification fails, then packets sent to > > TC0 are supposed to bump the eMAC's TX counters, even though TC0 is > > configured as preemptible. Otherwise stated: even if the tc program says > > that a certain traffic class is preemptible, you don't want to actually > > send preemptible packets if you haven't verified the link partner can > > handle them, since it will likely drop them on RX otherwise. > > Even though fpe in tx direction isn't set in igc, it still checks > ethtool_mmsv_is_tx_active() before setting a queue as preemptible. > > This is done in : > igc_tsn_enable_offload(struct igc_adapter *adapter) { > { > .... > if (ethtool_mmsv_is_tx_active(&adapter->fpe.mmsv) && > ring->preemptible) > txqctl |= IGC_TXQCTL_PREEMPTIBLE; > > > Wouldn't this handle the situation mentioned ? > Sorry if I miss something here. And what if tx_active becomes true after you had already configured the queues with tc (and the above check caused IGC_TXQCTL_PREEMPTIBLE to not be set)? Shouldn't you set IGC_TXQCTL_PREEMPTIBLE now? Isn't ethtool_mmsv_configure_tx() exactly the function that notifies you of changes to tx_active, and hence, aren't you interested in setting up a callback for it? Or is igc_tsn_reset() -> igc_tsn_enable_offload() called through some other path, after verification succeeds, that I'm not seeing? I don't think so. Maybe coincidentally, but not guaranteed. > I briefly checked the driver code and the i226 SW User Manual. > > The code calls igc_reset_task() → igc_reset() → igc_reinit_locked() → > igc_down() → igc_reset() → igc_power_down_phy_copper_base(). > > I suspect igc_power_down_phy_copper_base() contributes to the link loss, but > there are likely other factors as well. > > The SW User Manual states that a software reset (CTRL.DEV_RST) affects > several components, including "Port Configuration Autoload from NVM, Data > Path, On-die Memories, MAC, PCS, Auto Negotiation and other Port-related > Logic, Function Queue Enable, Function Interrupt and Statistics registers, > Wake-up Management Registers, and Memory Configuration Registers." > > Given this, right now, I’m unsure about the feasibility of making this > change, though I see the benefits mentioned. > Validating this would require significant exploration—i.e., investigating > the code, running experiments, reviewing commit history, confirming hardware > expectations from the SW manual, and consulting other hardware SMEs. > > Resetting the adapter is a common mechanism in igc that relies on shared > functions, which can be triggered in various scenarios. Modifying this > behavior (if feasible) could introduce some risks and would likely require > additional testing across those scenarios. Focusing on this right now might > delay the current series, which is primarily aimed at enabling Qbu on > i225/6. > > Would it be alright if I explore this separately from this Qbu series? Ok - as I said, it's not as if I couldn't eventually tolerate the workaround you chose. We'd be putting a dependency of this feature on some unrelated thing with a high degree of uncertainty. For example, I asked if this is fiber or a twisted pair PHY, because while for the copper PHY the issue might be more tractable (clause 28 autoneg on the media side is completely independent of what the host side NIC is doing - it may reset, it may do whatever - unless the MAC provides a clock that is important for PHY operation), I do wonder if anything at all could be done to avoid the link partner seeing a link loss over fiber, because there, the connection is direct PCS-to-PCS. If you have to reset, you have to reset, and then pick up the pieces, I understand. It's good you're committing to look into this in more depth, though.
On 6/2/2025 11:04 pm, Vladimir Oltean wrote: > On Thu, Feb 06, 2025 at 10:40:11PM +0800, Abdul Rahim, Faizal wrote: >> >> Hi Vladimir, >> >> Thanks for the quick review, appreciate your help. >> >> On 6/2/2025 1:12 am, Vladimir Oltean wrote: >>> On Wed, Feb 05, 2025 at 05:05:20AM -0500, Faizal Rahim wrote: >>>> This patch implements the "ethtool --set-mm" callback to trigger the >>>> frame preemption verification handshake. >>>> >>>> Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool >>>> to perform the verification handshake for igc. >>>> The structure fpe.mmsv is set by mmsv in ethtool and should remain >>>> read-only for the driver. >>>> >>>> igc does not use two mmsv callbacks: >>>> a) configure_tx() >>>> - igc lacks registers to configure FPE in the transmit direction. >>> >>> Yes, maybe, but it's still important to handle this. It tells you when >>> the preemptible traffic classes should be sent as preemptible on the wire >>> (i.e. when the verification is either disabled, or it succeeded). >>> >>> There is a selftest called manual_failed_verification() which supposedly >>> tests this exact condition: if verification fails, then packets sent to >>> TC0 are supposed to bump the eMAC's TX counters, even though TC0 is >>> configured as preemptible. Otherwise stated: even if the tc program says >>> that a certain traffic class is preemptible, you don't want to actually >>> send preemptible packets if you haven't verified the link partner can >>> handle them, since it will likely drop them on RX otherwise. >> >> Even though fpe in tx direction isn't set in igc, it still checks >> ethtool_mmsv_is_tx_active() before setting a queue as preemptible. >> >> This is done in : >> igc_tsn_enable_offload(struct igc_adapter *adapter) { >> { >> .... >> if (ethtool_mmsv_is_tx_active(&adapter->fpe.mmsv) && >> ring->preemptible) >> txqctl |= IGC_TXQCTL_PREEMPTIBLE; >> >> >> Wouldn't this handle the situation mentioned ? >> Sorry if I miss something here. > > And what if tx_active becomes true after you had already configured the > queues with tc (and the above check caused IGC_TXQCTL_PREEMPTIBLE to not > be set)? Shouldn't you set IGC_TXQCTL_PREEMPTIBLE now? Isn't > ethtool_mmsv_configure_tx() exactly the function that notifies you of > changes to tx_active, and hence, aren't you interested in setting up a > callback for it? > Ahh okay, got it. I sent v3 that also included this update. Thanks!
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h index 22ecdac26cf4..705bd4739e3b 100644 --- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -40,6 +40,10 @@ void igc_ethtool_set_ops(struct net_device *); #define IGC_MAX_TX_TSTAMP_REGS 4 +struct fpe_t { + struct ethtool_mmsv mmsv; +}; + enum igc_mac_filter_type { IGC_MAC_FILTER_TYPE_DST = 0, IGC_MAC_FILTER_TYPE_SRC @@ -332,6 +336,8 @@ struct igc_adapter { struct timespec64 period; } perout[IGC_N_PEROUT]; + struct fpe_t fpe; + /* LEDs */ struct mutex led_mutex; struct igc_led_classdev *leds; @@ -389,10 +395,11 @@ extern char igc_driver_name[]; #define IGC_FLAG_TSN_QBV_ENABLED BIT(17) #define IGC_FLAG_TSN_QAV_ENABLED BIT(18) #define IGC_FLAG_TSN_LEGACY_ENABLED BIT(19) +#define IGC_FLAG_TSN_PREEMPT_ENABLED BIT(20) #define IGC_FLAG_TSN_ANY_ENABLED \ (IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_QAV_ENABLED | \ - IGC_FLAG_TSN_LEGACY_ENABLED) + IGC_FLAG_TSN_LEGACY_ENABLED | IGC_FLAG_TSN_PREEMPT_ENABLED) #define IGC_FLAG_RSS_FIELD_IPV4_UDP BIT(6) #define IGC_FLAG_RSS_FIELD_IPV6_UDP BIT(7) @@ -736,7 +743,10 @@ struct igc_nfc_rule *igc_get_nfc_rule(struct igc_adapter *adapter, u32 location); int igc_add_nfc_rule(struct igc_adapter *adapter, struct igc_nfc_rule *rule); void igc_del_nfc_rule(struct igc_adapter *adapter, struct igc_nfc_rule *rule); +void igc_disable_empty_addr_recv(struct igc_adapter *adapter); +int igc_enable_empty_addr_recv(struct igc_adapter *adapter); struct igc_ring *igc_get_tx_ring(struct igc_adapter *adapter, int cpu); +void igc_flush_tx_descriptors(struct igc_ring *ring); void igc_ptp_init(struct igc_adapter *adapter); void igc_ptp_reset(struct igc_adapter *adapter); void igc_ptp_suspend(struct igc_adapter *adapter); diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index b19ac6f30dac..563b154c3e94 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -308,6 +308,10 @@ #define IGC_TXD_DTYP_C 0x00000000 /* Context Descriptor */ #define IGC_TXD_POPTS_IXSM 0x01 /* Insert IP checksum */ #define IGC_TXD_POPTS_TXSM 0x02 /* Insert TCP/UDP checksum */ +#define IGC_TXD_POPTS_SMD_V 0x01 /* Transmitted packet is a SMD-Verify */ +#define IGC_TXD_POPTS_SMD_R 0x02 /* Transmitted packet is a SMD-Response */ +#define IGC_TXD_POPTS_SMD_SHIFT 12 +#define IGC_TXD_POPTS_SMD_MASK 0x3000 #define IGC_TXD_CMD_EOP 0x01000000 /* End of Packet */ #define IGC_TXD_CMD_IC 0x04000000 /* Insert Checksum */ #define IGC_TXD_CMD_DEXT 0x20000000 /* Desc extension (0 = legacy) */ @@ -363,6 +367,8 @@ #define IGC_SRRCTL_TIMER0SEL(timer) (((timer) & 0x3) << 17) /* Receive Descriptor bit definitions */ +#define IGC_RXD_STAT_SMD_TYPE_V 0x01 /* SMD-V Packet */ +#define IGC_RXD_STAT_SMD_TYPE_R 0x02 /* SMD-R Packet */ #define IGC_RXD_STAT_EOP 0x02 /* End of Packet */ #define IGC_RXD_STAT_IXSM 0x04 /* Ignore checksum */ #define IGC_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */ @@ -372,7 +378,8 @@ #define IGC_RXDEXT_STATERR_LB 0x00040000 /* Advanced Receive Descriptor bit definitions */ -#define IGC_RXDADV_STAT_TSIP 0x08000 /* timestamp in packet */ +#define IGC_RXDADV_STAT_SMD_TYPE_MASK 0x06000 +#define IGC_RXDADV_STAT_TSIP 0x08000 /* timestamp in packet */ #define IGC_RXDEXT_STATERR_L4E 0x20000000 #define IGC_RXDEXT_STATERR_IPE 0x40000000 @@ -543,6 +550,7 @@ /* Transmit Scheduling */ #define IGC_TQAVCTRL_TRANSMIT_MODE_TSN 0x00000001 +#define IGC_TQAVCTRL_PREEMPT_ENA 0x00000002 #define IGC_TQAVCTRL_ENHANCED_QAV 0x00000008 #define IGC_TQAVCTRL_FUTSCDDIS 0x00000080 diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c index 817838677817..41cb03816f92 100644 --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c @@ -8,6 +8,7 @@ #include "igc.h" #include "igc_diag.h" +#include "igc_tsn.h" /* forward declaration */ struct igc_stats { @@ -1781,6 +1782,23 @@ static int igc_ethtool_set_eee(struct net_device *netdev, return 0; } +static int igc_ethtool_set_mm(struct net_device *netdev, + struct ethtool_mm_cfg *cmd, + struct netlink_ext_ack *extack) +{ + struct igc_adapter *adapter = netdev_priv(netdev); + struct fpe_t *fpe = &adapter->fpe; + + ethtool_mmsv_set_mm(&fpe->mmsv, cmd); + + if (fpe->mmsv.pmac_enabled) + static_branch_enable(&igc_fpe_enabled); + else + static_branch_disable(&igc_fpe_enabled); + + return igc_tsn_offload_apply(adapter); +} + static int igc_ethtool_get_link_ksettings(struct net_device *netdev, struct ethtool_link_ksettings *cmd) { @@ -2068,6 +2086,7 @@ static const struct ethtool_ops igc_ethtool_ops = { .set_rxfh = igc_ethtool_set_rxfh, .get_ts_info = igc_ethtool_get_ts_info, .get_channels = igc_ethtool_get_channels, + .set_mm = igc_ethtool_set_mm, .set_channels = igc_ethtool_set_channels, .get_priv_flags = igc_ethtool_get_priv_flags, .set_priv_flags = igc_ethtool_set_priv_flags, diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 44e4f925491f..396410522e01 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2529,7 +2529,7 @@ static int igc_xdp_run_prog(struct igc_adapter *adapter, struct xdp_buff *xdp) } /* This function assumes __netif_tx_lock is held by the caller. */ -static void igc_flush_tx_descriptors(struct igc_ring *ring) +void igc_flush_tx_descriptors(struct igc_ring *ring) { /* Once tail pointer is updated, hardware can fetch the descriptors * any time so we issue a write membar here to ensure all memory @@ -2618,6 +2618,15 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget) size -= IGC_TS_HDR_LEN; } + if (static_branch_unlikely(&igc_fpe_enabled) && + igc_fpe_is_verify_or_response(rx_desc, size)) { + igc_fpe_lp_event_status(rx_desc, &adapter->fpe.mmsv); + /* Advance the ring next-to-clean */ + igc_is_non_eop(rx_ring, rx_desc); + cleaned_count++; + continue; + } + if (!skb) { xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq); xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring), @@ -3065,6 +3074,11 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget) if (!(eop_desc->wb.status & cpu_to_le32(IGC_TXD_STAT_DD))) break; + if (static_branch_unlikely(&igc_fpe_enabled) && + igc_fpe_transmitted_smd_v(tx_desc)) + ethtool_mmsv_event_handle(&adapter->fpe.mmsv, + ETHTOOL_MMSV_LD_SENT_VERIFY_MPACKET); + /* Hold the completions while there's a pending tx hardware * timestamp request from XDP Tx metadata. */ @@ -3956,6 +3970,30 @@ static int igc_uc_unsync(struct net_device *netdev, const unsigned char *addr) return 0; } +/** + * igc_enable_empty_addr_recv - Enable rx of packets with all-zeroes MAC address + * @adapter: Pointer to the igc_adapter structure. + * + * Frame preemption verification requires that packets with the all-zeroes + * MAC address are allowed to be received by IGC. This function adds the + * all-zeroes destination address to the list of acceptable addresses. + * + * Return: 0 on success, negative value otherwise. + */ +int igc_enable_empty_addr_recv(struct igc_adapter *adapter) +{ + u8 empty[ETH_ALEN] = { }; + + return igc_add_mac_filter(adapter, IGC_MAC_FILTER_TYPE_DST, empty, -1); +} + +void igc_disable_empty_addr_recv(struct igc_adapter *adapter) +{ + u8 empty[ETH_ALEN] = { }; + + igc_del_mac_filter(adapter, IGC_MAC_FILTER_TYPE_DST, empty); +} + /** * igc_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set * @netdev: network interface device structure @@ -5755,6 +5793,10 @@ static void igc_watchdog_task(struct work_struct *work) */ igc_tsn_adjust_txtime_offset(adapter); + if (adapter->fpe.mmsv.pmac_enabled) + ethtool_mmsv_link_state_handle(&adapter->fpe.mmsv, + true); + if (adapter->link_speed != SPEED_1000) goto no_wait; @@ -5790,6 +5832,10 @@ static void igc_watchdog_task(struct work_struct *work) netdev_info(netdev, "NIC Link is Down\n"); netif_carrier_off(netdev); + if (adapter->fpe.mmsv.pmac_enabled) + ethtool_mmsv_link_state_handle(&adapter->fpe.mmsv, + false); + /* link state has changed, schedule phy info update */ if (!test_bit(__IGC_DOWN, &adapter->state)) mod_timer(&adapter->phy_info_timer, @@ -7110,6 +7156,8 @@ static int igc_probe(struct pci_dev *pdev, igc_tsn_clear_schedule(adapter); + igc_fpe_init(adapter); + /* reset the hardware with the new settings */ igc_reset(adapter); diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index f0213cfce07d..52ff6f324d79 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -1,10 +1,153 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2019 Intel Corporation */ +#include <linux/kernel.h> #include "igc.h" +#include "igc_base.h" #include "igc_hw.h" #include "igc_tsn.h" +DEFINE_STATIC_KEY_FALSE(igc_fpe_enabled); + +static int igc_fpe_init_smd_frame(struct igc_ring *ring, + struct igc_tx_buffer *buffer, + struct sk_buff *skb) +{ + dma_addr_t dma = dma_map_single(ring->dev, skb->data, skb->len, + DMA_TO_DEVICE); + + if (dma_mapping_error(ring->dev, dma)) { + netdev_err_once(ring->netdev, "Failed to map DMA for TX\n"); + return -ENOMEM; + } + + buffer->skb = skb; + buffer->protocol = 0; + buffer->bytecount = skb->len; + buffer->gso_segs = 1; + buffer->time_stamp = jiffies; + dma_unmap_len_set(buffer, len, skb->len); + dma_unmap_addr_set(buffer, dma, dma); + + return 0; +} + +static int igc_fpe_init_tx_descriptor(struct igc_ring *ring, + struct sk_buff *skb, int type) +{ + struct igc_tx_buffer *buffer; + union igc_adv_tx_desc *desc; + u32 cmd_type, olinfo_status; + int err; + + if (!igc_desc_unused(ring)) + return -EBUSY; + + if (type != IGC_TXD_POPTS_SMD_V && type != IGC_TXD_POPTS_SMD_R) + return -EINVAL; + + buffer = &ring->tx_buffer_info[ring->next_to_use]; + err = igc_fpe_init_smd_frame(ring, buffer, skb); + if (err) + return err; + + cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT | + IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD | + buffer->bytecount; + olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT; + + switch (type) { + case IGC_TXD_POPTS_SMD_V: + olinfo_status |= (IGC_TXD_POPTS_SMD_V << IGC_TXD_POPTS_SMD_SHIFT); + break; + case IGC_TXD_POPTS_SMD_R: + olinfo_status |= (IGC_TXD_POPTS_SMD_R << IGC_TXD_POPTS_SMD_SHIFT); + break; + } + + desc = IGC_TX_DESC(ring, ring->next_to_use); + desc->read.cmd_type_len = cpu_to_le32(cmd_type); + desc->read.olinfo_status = cpu_to_le32(olinfo_status); + desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma)); + + netdev_tx_sent_queue(txring_txq(ring), skb->len); + + buffer->next_to_watch = desc; + ring->next_to_use = (ring->next_to_use + 1) % ring->count; + + return 0; +} + +bool igc_fpe_transmitted_smd_v(union igc_adv_tx_desc *tx_desc) +{ + u8 smd = FIELD_GET(IGC_TXD_POPTS_SMD_MASK, tx_desc->read.olinfo_status); + + return smd == IGC_TXD_POPTS_SMD_V; +} + +static int igc_fpe_xmit_smd_frame(struct igc_adapter *adapter, int type) +{ + int cpu = smp_processor_id(); + struct netdev_queue *nq; + struct igc_ring *ring; + struct sk_buff *skb; + void *data; + int err; + + if (!netif_running(adapter->netdev)) + return -ENOTCONN; + + ring = igc_get_tx_ring(adapter, cpu); + nq = txring_txq(ring); + + skb = alloc_skb(SMD_FRAME_SIZE, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + data = skb_put(skb, SMD_FRAME_SIZE); + memset(data, 0, SMD_FRAME_SIZE); + + __netif_tx_lock(nq, cpu); + + err = igc_fpe_init_tx_descriptor(ring, skb, type); + igc_flush_tx_descriptors(ring); + + __netif_tx_unlock(nq); + + return err; +} + +static void igc_fpe_send_mpacket(struct ethtool_mmsv *mmsv, + enum ethtool_mpacket type) +{ + struct fpe_t *fpe = container_of(mmsv, struct fpe_t, mmsv); + struct igc_adapter *adapter; + int err; + + adapter = container_of(fpe, struct igc_adapter, fpe); + + if (type == ETHTOOL_MPACKET_VERIFY) { + err = igc_fpe_xmit_smd_frame(adapter, IGC_TXD_POPTS_SMD_V); + if (err) + netdev_err(adapter->netdev, "Error sending SMD-V\n"); + } else if (type == ETHTOOL_MPACKET_RESPONSE) { + err = igc_fpe_xmit_smd_frame(adapter, IGC_TXD_POPTS_SMD_R); + if (err) + netdev_err(adapter->netdev, "Error sending SMD-R frame\n"); + } +} + +static const struct ethtool_mmsv_ops igc_mmsv_ops = { + .send_mpacket = igc_fpe_send_mpacket, +}; + +void igc_fpe_init(struct igc_adapter *adapter) +{ + struct fpe_t *fpe = &adapter->fpe; + + ethtool_mmsv_init(&adapter->fpe.mmsv, adapter->netdev, &igc_mmsv_ops); +} + static bool is_any_launchtime(struct igc_adapter *adapter) { int i; @@ -49,6 +192,9 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter) if (adapter->strict_priority_enable) new_flags |= IGC_FLAG_TSN_LEGACY_ENABLED; + if (adapter->fpe.mmsv.pmac_enabled) + new_flags |= IGC_FLAG_TSN_PREEMPT_ENABLED; + return new_flags; } @@ -148,7 +294,8 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) tqavctrl = rd32(IGC_TQAVCTRL); tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN | - IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_FUTSCDDIS); + IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_FUTSCDDIS | + IGC_TQAVCTRL_PREEMPT_ENA); wr32(IGC_TQAVCTRL, tqavctrl); @@ -370,10 +517,14 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter) wr32(IGC_TXQCTL(i), txqctl); } - tqavctrl = rd32(IGC_TQAVCTRL) & ~IGC_TQAVCTRL_FUTSCDDIS; + tqavctrl = rd32(IGC_TQAVCTRL) & ~(IGC_TQAVCTRL_FUTSCDDIS | + IGC_TQAVCTRL_PREEMPT_ENA); tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV; + if (adapter->fpe.mmsv.pmac_enabled) + tqavctrl |= IGC_TQAVCTRL_PREEMPT_ENA; + adapter->qbv_count++; cycle = adapter->cycle_time; @@ -434,6 +585,11 @@ int igc_tsn_reset(struct igc_adapter *adapter) unsigned int new_flags; int err = 0; + if (adapter->fpe.mmsv.pmac_enabled) + igc_enable_empty_addr_recv(adapter); + else + igc_disable_empty_addr_recv(adapter); + new_flags = igc_tsn_new_flags(adapter); if (!(new_flags & IGC_FLAG_TSN_ANY_ENABLED)) diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h index 98ec845a86bf..889d70c6d827 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.h +++ b/drivers/net/ethernet/intel/igc/igc_tsn.h @@ -4,9 +4,42 @@ #ifndef _IGC_TSN_H_ #define _IGC_TSN_H_ +#define SMD_FRAME_SIZE 60 + +DECLARE_STATIC_KEY_FALSE(igc_fpe_enabled); + +void igc_fpe_init(struct igc_adapter *adapter); +u32 igc_fpe_get_supported_frag_size(u32 user_frag_size); +bool igc_fpe_transmitted_smd_v(union igc_adv_tx_desc *tx_desc); int igc_tsn_offload_apply(struct igc_adapter *adapter); int igc_tsn_reset(struct igc_adapter *adapter); void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter); bool igc_tsn_is_taprio_activated_by_user(struct igc_adapter *adapter); +static inline void igc_fpe_lp_event_status(union igc_adv_rx_desc *rx_desc, + struct ethtool_mmsv *mmsv) +{ + __le32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error); + int smd; + + smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error); + + if (smd == IGC_RXD_STAT_SMD_TYPE_V) + ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET); + else if (smd == IGC_RXD_STAT_SMD_TYPE_R) + ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_RESPONSE_MPACKET); +} + +static inline bool igc_fpe_is_verify_or_response(union igc_adv_rx_desc *rx_desc, + unsigned int size) +{ + __le32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error); + int smd; + + smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error); + + return ((smd == IGC_RXD_STAT_SMD_TYPE_V || smd == IGC_RXD_STAT_SMD_TYPE_R) && + size == SMD_FRAME_SIZE); +} + #endif /* _IGC_BASE_H */