[net-next,2/2] net: enetc: add support for software TSO

Message ID 20211006201308.2492890-3-ioana.ciornei@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net: enetc: add support for software TSO

Checks

Context                         Check    Description
netdev/cover_letter             success  Series has a cover letter
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           success  CCed 4 of 4 maintainers
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              fail     Errors and warnings before: 3 this patch: 8
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  No Fixes tag
netdev/checkpatch               warning  WARNING: line length of 81 exceeds 80 columns; WARNING: line length of 89 exceeds 80 columns; WARNING: line length of 95 exceeds 80 columns
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 3 this patch: 8
netdev/header_inline            success  No static functions without inline keyword in header files

Commit Message

Ioana Ciornei Oct. 6, 2021, 8:13 p.m. UTC
This patch adds support for driver level TSO in the enetc driver using
the TSO API.

Besides using the usual tso_build_hdr() and tso_build_data(), this
implementation also has to compute the checksum, both IP and L4, for
each resulting segment. This is because the ENETC controller does not
support Tx checksum offload, which is needed in order to take advantage
of TSO.
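
To illustrate the per-segment checksum work described above, here is a minimal user-space sketch, not the driver code. It shows how a one's-complement sum can be accumulated block by block (L4 header first, then each data chunk) and still match a single pass over the whole buffer. The names sum16() and sum16_block_add() are hypothetical stand-ins loosely modelled on what the kernel's csum_partial() and csum_block_add() are used for in this patch; the kernel helpers operate on __wsum values and differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a buffer read as big-endian 16-bit words,
 * folded down to 16 bits. An odd trailing byte is padded with zero.
 */
static uint16_t sum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold the carries back in (end-around carry) */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Add the sum of a block that starts at byte 'offset' of the overall
 * buffer. A block starting at an odd offset has its bytes in the
 * opposite halves of the 16-bit words, so its sum is byte-swapped
 * before being added.
 */
static uint16_t sum16_block_add(uint16_t acc, uint16_t part, size_t offset)
{
	uint32_t sum;

	if (offset & 1)
		part = (uint16_t)((part << 8) | (part >> 8));

	sum = (uint32_t)acc + part;
	sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

A caller would start with sum16() over the L4 header, then call sum16_block_add() once per data chunk while tracking the running offset, which mirrors the role of 'pos' in the patch.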

With the workaround for the ENETC MDIO erratum in place, the Tx path of
the driver is forced to lock/unlock for each skb sent. This is why, even
though we are computing the checksum by hand, we see the following
improvement in TCP termination on the LS1028A SoC, on a single A72 core
running at 1.3GHz:

before: 1.63 Gbits/sec
after:  2.34 Gbits/sec

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 276 +++++++++++++++++-
 drivers/net/ethernet/freescale/enetc/enetc.h  |   4 +
 .../net/ethernet/freescale/enetc/enetc_pf.c   |   5 +-
 .../net/ethernet/freescale/enetc/enetc_vf.c   |   5 +-
 4 files changed, 274 insertions(+), 16 deletions(-)

Comments

Jakub Kicinski Oct. 7, 2021, 12:30 a.m. UTC | #1
On Wed,  6 Oct 2021 23:13:08 +0300 Ioana Ciornei wrote:
> +__wsum enetc_tso_hdr_csum(struct tso_t *tso, struct sk_buff *skb, char *hdr,
> +			  int hdr_len, int *l4_hdr_len)

> +void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso, struct sk_buff *skb,
> +			     char *hdr, int len, __wsum sum)

static x2
Ioana Ciornei Oct. 7, 2021, 6:48 a.m. UTC | #2
On Wed, Oct 06, 2021 at 05:30:21PM -0700, Jakub Kicinski wrote:
> On Wed,  6 Oct 2021 23:13:08 +0300 Ioana Ciornei wrote:
> > +__wsum enetc_tso_hdr_csum(struct tso_t *tso, struct sk_buff *skb, char *hdr,
> > +			  int hdr_len, int *l4_hdr_len)
> 
> > +void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso, struct sk_buff *skb,
> > +			     char *hdr, int len, __wsum sum)
> 
> static x2

Thanks. Forgot about these.
Claudiu Manoil Oct. 7, 2021, 7:59 a.m. UTC | #3
> -----Original Message-----
> From: Ioana Ciornei <ioana.ciornei@nxp.com>
> Sent: Wednesday, October 6, 2021 11:13 PM
[...]
> +static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct
> sk_buff *skb)
> +{
> +	int hdr_len, total_len, data_len;
> +	struct enetc_tx_swbd *tx_swbd;
> +	union enetc_tx_bd *txbd;
> +	struct tso_t tso;
> +	__wsum csum, csum2;
> +	int count = 0, pos;
> +	int err, i;
> +
> +	/* Check that we have enough BDs for this skb */
> +	if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> +		if (net_ratelimit())
> +			netdev_err(tx_ring->ndev, "Not enough BDs for TSO!\n");
> +		return 0;
> +	}
> +

On this path, in case the interface is congested, you will drop the packet in the driver,
and the stack will think transmission was successful and will continue to deliver skbs
to the driver. Is this the right thing to do?
Ioana Ciornei Oct. 7, 2021, 8:33 a.m. UTC | #4
On Thu, Oct 07, 2021 at 07:59:25AM +0000, Claudiu Manoil wrote:
> > -----Original Message-----
> > From: Ioana Ciornei <ioana.ciornei@nxp.com>
> > Sent: Wednesday, October 6, 2021 11:13 PM
> [...]
> > +static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct
> > sk_buff *skb)
> > +{
> > +	int hdr_len, total_len, data_len;
> > +	struct enetc_tx_swbd *tx_swbd;
> > +	union enetc_tx_bd *txbd;
> > +	struct tso_t tso;
> > +	__wsum csum, csum2;
> > +	int count = 0, pos;
> > +	int err, i;
> > +
> > +	/* Check that we have enough BDs for this skb */
> > +	if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> > +		if (net_ratelimit())
> > +			netdev_err(tx_ring->ndev, "Not enough BDs for TSO!\n");
> > +		return 0;
> > +	}
> > +
> 
> On this path, in case the interface is congested, you will drop the packet in the driver,
> and the stack will think transmission was successful and will continue to deliver skbs
> to the driver. Is this the right thing to do?
> 

Good point. I should have mimicked the non-GSO code path when congestion
occurs and stopped the subqueue.

For symmetry I'll also move this check outside of the
enetc_map_tx_tso_buffs() to get the code looking somewhat like this:


	if (skb_is_gso(skb)) {
		if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
			netif_stop_subqueue(ndev, tx_ring->index);
			return NETDEV_TX_BUSY;
		}

		enetc_lock_mdio();
		count = enetc_map_tx_tso_buffs(tx_ring, skb);
		enetc_unlock_mdio();
	} else {
		if (unlikely(skb_shinfo(skb)->nr_frags > ENETC_MAX_SKB_FRAGS))
			if (unlikely(skb_linearize(skb)))
				goto drop_packet_err;

		count = skb_shinfo(skb)->nr_frags + 1; /* fragments + head */
		if (enetc_bd_unused(tx_ring) < ENETC_TXBDS_NEEDED(count)) {
			netif_stop_subqueue(ndev, tx_ring->index);
			return NETDEV_TX_BUSY;
		}

		if (skb->ip_summed == CHECKSUM_PARTIAL) {
			err = skb_csum_hwoffload_help(skb, 0);
			if (err)
				goto drop_packet_err;
		}
		enetc_lock_mdio();
		count = enetc_map_tx_buffs(tx_ring, skb);
		enetc_unlock_mdio();
	}


Ioana
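
The difference between silently dropping and applying back-pressure, as discussed above, can be modelled with a small user-space sketch. Everything here (struct toy_ring, toy_xmit(), toy_complete(), the TOY_TX_* codes and the wake threshold) is a hypothetical model and not the enetc code: when there are not enough free descriptors, the transmit function marks the queue as stopped and reports busy, so the caller keeps the packet and retries later instead of believing it was sent.

```c
#include <stdbool.h>

enum toy_tx_status { TOY_TX_OK, TOY_TX_BUSY };

/* Toy model of a Tx descriptor ring and its queue state */
struct toy_ring {
	int size;	/* total descriptors in the ring */
	int in_use;	/* descriptors currently owned by hardware */
	bool stopped;	/* true while the stack must not submit packets */
};

static int toy_ring_unused(const struct toy_ring *r)
{
	return r->size - r->in_use;
}

/* Submit a packet needing 'descs_needed' descriptors. On congestion the
 * queue is stopped and BUSY is returned; nothing is consumed or dropped.
 */
static enum toy_tx_status toy_xmit(struct toy_ring *r, int descs_needed)
{
	if (toy_ring_unused(r) < descs_needed) {
		r->stopped = true;
		return TOY_TX_BUSY;
	}

	r->in_use += descs_needed;
	return TOY_TX_OK;
}

/* Completion path: release descriptors and wake the queue once at least
 * 'wake_threshold' descriptors are free again.
 */
static void toy_complete(struct toy_ring *r, int descs_done, int wake_threshold)
{
	r->in_use -= descs_done;
	if (r->stopped && toy_ring_unused(r) >= wake_threshold)
		r->stopped = false;
}
```

In this model the packet is never lost on the congested path: the submitter sees TOY_TX_BUSY, waits until the queue is no longer stopped, and submits again.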
Claudiu Manoil Oct. 7, 2021, 9:06 a.m. UTC | #5
> -----Original Message-----
> From: Ioana Ciornei <ioana.ciornei@nxp.com>
> Sent: Thursday, October 7, 2021 11:33 AM
> To: Claudiu Manoil <claudiu.manoil@nxp.com>
> Cc: davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> Vladimir Oltean <vladimir.oltean@nxp.com>
> Subject: Re: [PATCH net-next 2/2] net: enetc: add support for software TSO
> 
> On Thu, Oct 07, 2021 at 07:59:25AM +0000, Claudiu Manoil wrote:
> > > -----Original Message-----
> > > From: Ioana Ciornei <ioana.ciornei@nxp.com>
> > > Sent: Wednesday, October 6, 2021 11:13 PM
> > [...]
> > > +static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct
> > > sk_buff *skb)
> > > +{
> > > +	int hdr_len, total_len, data_len;
> > > +	struct enetc_tx_swbd *tx_swbd;
> > > +	union enetc_tx_bd *txbd;
> > > +	struct tso_t tso;
> > > +	__wsum csum, csum2;
> > > +	int count = 0, pos;
> > > +	int err, i;
> > > +
> > > +	/* Check that we have enough BDs for this skb */
> > > +	if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> > > +		if (net_ratelimit())
> > > +			netdev_err(tx_ring->ndev, "Not enough BDs for TSO!\n");
> > > +		return 0;
> > > +	}
> > > +
> >
> > On this path, in case the interface is congested, you will drop the packet in the driver,
> > and the stack will think transmission was successful and will continue to deliver skbs
> > to the driver. Is this the right thing to do?
> >
> 
> Good point. I should have mimicked the non-GSO code path when
> congestion occurs and stopped the subqueue.
> 
> For symmetry I'll also move this check outside of the
> enetc_map_tx_tso_buffs() to get the code looking somewhat like this:
> 
> 
> 	if (skb_is_gso(skb)) {
> 		if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> 			netif_stop_subqueue(ndev, tx_ring->index);
> 			return NETDEV_TX_BUSY;
> 		}
> 
> 		enetc_lock_mdio();
> 		count = enetc_map_tx_tso_buffs(tx_ring, skb);
> 		enetc_unlock_mdio();
> 	} else {
> 		if (unlikely(skb_shinfo(skb)->nr_frags > ENETC_MAX_SKB_FRAGS))
> 			if (unlikely(skb_linearize(skb)))
> 				goto drop_packet_err;
> 

Ok for handling congestion, the idea is good, but now another issue emerges.
The ENETC_MAX_SKB_FRAGS check is due to a hardware limitation. Enetc cannot
handle more than 15 chained buffer descriptors for transmission (i.e. 13 frags + 1
for the linear part + 1 optional extension BD). This limitation is specified in the
hardware manual now (after I've hit it during development).
So you should add this check on the TSO processing path too, but adapted to that
case, of course, since skb_linearize() would not work with TSO.

Thanks.
Ioana Ciornei Oct. 7, 2021, 9:26 a.m. UTC | #6
On Thu, Oct 07, 2021 at 09:06:45AM +0000, Claudiu Manoil wrote:
> > -----Original Message-----
> > From: Ioana Ciornei <ioana.ciornei@nxp.com>
> > Sent: Thursday, October 7, 2021 11:33 AM
> > To: Claudiu Manoil <claudiu.manoil@nxp.com>
> > Cc: davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> > Vladimir Oltean <vladimir.oltean@nxp.com>
> > Subject: Re: [PATCH net-next 2/2] net: enetc: add support for software TSO
> > 
> > On Thu, Oct 07, 2021 at 07:59:25AM +0000, Claudiu Manoil wrote:
> > > > -----Original Message-----
> > > > From: Ioana Ciornei <ioana.ciornei@nxp.com>
> > > > Sent: Wednesday, October 6, 2021 11:13 PM
> > > [...]
> > > > +static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct
> > > > sk_buff *skb)
> > > > +{
> > > > +	int hdr_len, total_len, data_len;
> > > > +	struct enetc_tx_swbd *tx_swbd;
> > > > +	union enetc_tx_bd *txbd;
> > > > +	struct tso_t tso;
> > > > +	__wsum csum, csum2;
> > > > +	int count = 0, pos;
> > > > +	int err, i;
> > > > +
> > > > +	/* Check that we have enough BDs for this skb */
> > > > +	if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> > > > +		if (net_ratelimit())
> > > > +			netdev_err(tx_ring->ndev, "Not enough BDs for TSO!\n");
> > > > +		return 0;
> > > > +	}
> > > > +
> > >
> > > On this path, in case the interface is congested, you will drop the packet in the driver,
> > > and the stack will think transmission was successful and will continue to deliver skbs
> > > to the driver. Is this the right thing to do?
> > >
> > 
> > Good point. I should have mimicked the non-GSO code path when
> > congestion occurs and stopped the subqueue.
> > 
> > For symmetry I'll also move this check outside of the
> > enetc_map_tx_tso_buffs() to get the code looking somewhat like this:
> > 
> > 
> > 	if (skb_is_gso(skb)) {
> > 		if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
> > 			netif_stop_subqueue(ndev, tx_ring->index);
> > 			return NETDEV_TX_BUSY;
> > 		}
> > 
> > 		enetc_lock_mdio();
> > 		count = enetc_map_tx_tso_buffs(tx_ring, skb);
> > 		enetc_unlock_mdio();
> > 	} else {
> > 		if (unlikely(skb_shinfo(skb)->nr_frags > ENETC_MAX_SKB_FRAGS))
> > 			if (unlikely(skb_linearize(skb)))
> > 				goto drop_packet_err;
> > 
> 
> Ok for handling congestion, the idea is good, but now another issue emerges.
> The ENETC_MAX_SKB_FRAGS check is due to a hardware limitation. Enetc cannot
> handle more than 15 chained buffer descriptors for transmission (i.e. 13 frags + 1
> for the linear part + 1 optional extension BD). This limitation is specified in the
> hardware manual now (after I've hit it during development).

In the TSO processing case this is far less likely to happen, since a
single resulting segment would have to need 15 chained BDs, not the entire
skb. At most, I have seen one frame needing 4-5 BDs: 1 for the header,
1 extension BD in case of VLAN and 1-3 for the data part.

> So you should add this check on the TSO processing path too, but adapted to that
> case, of course, since skb_linearize() would not work with TSO.
> 

Anyhow, I'll add this check directly in the while loop which creates the
chain of BDs for a frame, since that is the only point at which I know how
many BDs are needed for a resulting frame.

Thanks.
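
The per-frame accounting discussed above can be sketched as follows. This is a hypothetical user-space model, not the driver change: TOY_MAX_CHAINED_BDS and both helpers are names introduced for illustration, with the value 15 taken from the limit quoted in the thread and the BD breakdown (1 header BD, 1 optional VLAN extension BD, then data BDs) taken from the reply.

```c
#include <stdbool.h>

/* Limit quoted in the thread: at most 15 chained BDs per frame */
#define TOY_MAX_CHAINED_BDS 15

/* BDs needed by one TSO frame: 1 for the header, 1 optional extension BD
 * when a VLAN tag is inserted, plus one per mapped chunk of payload.
 */
static int toy_frame_bd_count(int data_chunks, bool vlan_ext)
{
	return 1 + (vlan_ext ? 1 : 0) + data_chunks;
}

/* True if the frame's BD chain stays within the hardware limit */
static bool toy_frame_fits(int data_chunks, bool vlan_ext)
{
	return toy_frame_bd_count(data_chunks, vlan_ext) <= TOY_MAX_CHAINED_BDS;
}
```

With the typical numbers from the thread (a VLAN frame with 3 data chunks), the count is 5, well under the limit; the check only fails once a single frame needs more than 13 data chunks with VLAN or 14 without.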

Patch

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index a92bfd660f22..7a8e920725de 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -8,6 +8,7 @@ 
 #include <linux/vmalloc.h>
 #include <linux/ptp_classify.h>
 #include <net/pkt_sched.h>
+#include <net/tso.h>
 
 static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
 {
@@ -314,6 +315,235 @@  static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 	return 0;
 }
 
+static void enetc_map_tx_tso_hdr(struct enetc_bdr *tx_ring, struct sk_buff *skb,
+				 struct enetc_tx_swbd *tx_swbd,
+				 union enetc_tx_bd *txbd, int *i, int hdr_len,
+				 int data_len)
+{
+	union enetc_tx_bd txbd_tmp;
+	u8 flags = 0, e_flags = 0;
+	dma_addr_t addr;
+
+	enetc_clear_tx_bd(&txbd_tmp);
+	addr = tx_ring->tso_headers_dma + *i * TSO_HEADER_SIZE;
+
+	if (skb_vlan_tag_present(skb))
+		flags |= ENETC_TXBD_FLAGS_EX;
+
+	txbd_tmp.addr = cpu_to_le64(addr);
+	txbd_tmp.buf_len = cpu_to_le16(hdr_len);
+
+	/* first BD needs frm_len and offload flags set */
+	txbd_tmp.frm_len = cpu_to_le16(hdr_len + data_len);
+	txbd_tmp.flags = flags;
+
+	/* For the TSO header we do not set the dma address since we do not
+	 * want it unmapped when we do cleanup. We still set len so that we
+	 * count the bytes sent.
+	 */
+	tx_swbd->len = hdr_len;
+	tx_swbd->do_twostep_tstamp = false;
+	tx_swbd->check_wb = false;
+
+	/* Actually write the header in the BD */
+	*txbd = txbd_tmp;
+
+	/* Add extension BD for VLAN */
+	if (flags & ENETC_TXBD_FLAGS_EX) {
+		/* Get the next BD */
+		enetc_bdr_idx_inc(tx_ring, i);
+		txbd = ENETC_TXBD(*tx_ring, *i);
+		tx_swbd = &tx_ring->tx_swbd[*i];
+		prefetchw(txbd);
+
+		/* Setup the VLAN fields */
+		enetc_clear_tx_bd(&txbd_tmp);
+		txbd_tmp.ext.vid = cpu_to_le16(skb_vlan_tag_get(skb));
+		txbd_tmp.ext.tpid = 0; /* < C-TAG */
+		e_flags |= ENETC_TXBD_E_FLAGS_VLAN_INS;
+
+		/* Write the BD */
+		txbd_tmp.ext.e_flags = e_flags;
+		*txbd = txbd_tmp;
+	}
+}
+
+static int enetc_map_tx_tso_data(struct enetc_bdr *tx_ring, struct sk_buff *skb,
+				 struct enetc_tx_swbd *tx_swbd,
+				 union enetc_tx_bd *txbd, char *data,
+				 int size, bool last_bd)
+{
+	union enetc_tx_bd txbd_tmp;
+	dma_addr_t addr;
+	u8 flags = 0;
+
+	enetc_clear_tx_bd(&txbd_tmp);
+
+	addr = dma_map_single(tx_ring->dev, data, size, DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(tx_ring->dev, addr))) {
+		netdev_err(tx_ring->ndev, "DMA map error\n");
+		return -ENOMEM;
+	}
+
+	if (last_bd) {
+		flags |= ENETC_TXBD_FLAGS_F;
+		tx_swbd->is_eof = 1;
+	}
+
+	txbd_tmp.addr = cpu_to_le64(addr);
+	txbd_tmp.buf_len = cpu_to_le16(size);
+	txbd_tmp.flags = flags;
+
+	tx_swbd->dma = addr;
+	tx_swbd->len = size;
+	tx_swbd->dir = DMA_TO_DEVICE;
+
+	*txbd = txbd_tmp;
+
+	return 0;
+}
+
+__wsum enetc_tso_hdr_csum(struct tso_t *tso, struct sk_buff *skb, char *hdr,
+			  int hdr_len, int *l4_hdr_len)
+{
+	int mac_hdr_len = skb_network_offset(skb);
+	struct iphdr *iph = (void *)(hdr + mac_hdr_len);
+	struct tcphdr *tcph;
+	struct udphdr *udph;
+
+	if (tso->tlen == sizeof(struct udphdr)) {
+		udph = (struct udphdr *)(hdr + skb_transport_offset(skb));
+		udph->check = 0;
+	} else {
+		tcph = (struct tcphdr *)(hdr + skb_transport_offset(skb));
+		tcph->check = 0;
+	}
+
+	/* Compute the IP checksum. This is necessary since tso_build_hdr()
+	 * already incremented the IP ID field.
+	 */
+	iph->check = 0;
+	iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
+
+	/* Compute the checksum over the L4 header. */
+	*l4_hdr_len = hdr_len - skb_transport_offset(skb);
+	return csum_partial(hdr + skb_transport_offset(skb), *l4_hdr_len, 0);
+}
+
+void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso, struct sk_buff *skb,
+			     char *hdr, int len, __wsum sum)
+{
+	struct tcphdr *tcph;
+
+	/* Complete the L4 checksum by appending the pseudo-header to the
+	 * already computed checksum.
+	 */
+	tcph = (struct tcphdr *)(hdr + skb_transport_offset(skb));
+	tcph->check = csum_tcpudp_magic(ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
+					len, ip_hdr(skb)->protocol, sum);
+}
+
+static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
+{
+	int hdr_len, total_len, data_len;
+	struct enetc_tx_swbd *tx_swbd;
+	union enetc_tx_bd *txbd;
+	struct tso_t tso;
+	__wsum csum, csum2;
+	int count = 0, pos;
+	int err, i;
+
+	/* Check that we have enough BDs for this skb */
+	if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
+		if (net_ratelimit())
+			netdev_err(tx_ring->ndev, "Not enough BDs for TSO!\n");
+		return 0;
+	}
+
+	/* Initialize the TSO handler, and prepare the first payload */
+	hdr_len = tso_start(skb, &tso);
+	total_len = skb->len - hdr_len;
+	i = tx_ring->next_to_use;
+
+	while (total_len > 0) {
+		char *hdr;
+
+		/* Get the BD */
+		txbd = ENETC_TXBD(*tx_ring, i);
+		tx_swbd = &tx_ring->tx_swbd[i];
+		prefetchw(txbd);
+
+		/* Determine the length of this packet */
+		data_len = min_t(int, skb_shinfo(skb)->gso_size, total_len);
+		total_len -= data_len;
+
+		/* prepare packet headers: MAC + IP + TCP */
+		hdr = tx_ring->tso_headers + i * TSO_HEADER_SIZE;
+		tso_build_hdr(skb, hdr, &tso, data_len, total_len == 0);
+
+		/* compute the csum over the L4 header */
+		csum = enetc_tso_hdr_csum(&tso, skb, hdr, hdr_len, &pos);
+		enetc_map_tx_tso_hdr(tx_ring, skb, tx_swbd, txbd, &i, hdr_len, data_len);
+		count++;
+
+		while (data_len > 0) {
+			int size;
+
+			size = min_t(int, tso.size, data_len);
+
+			/* Advance the index in the BDR */
+			enetc_bdr_idx_inc(tx_ring, &i);
+			txbd = ENETC_TXBD(*tx_ring, i);
+			tx_swbd = &tx_ring->tx_swbd[i];
+			prefetchw(txbd);
+
+			/* Compute the checksum over this segment of data and
+			 * add it to the csum already computed (over the L4
+			 * header and possible other data segments).
+			 */
+			csum2 = csum_partial(tso.data, size, 0);
+			csum = csum_block_add(csum, csum2, pos);
+			pos += size;
+
+			err = enetc_map_tx_tso_data(tx_ring, skb, tx_swbd, txbd,
+						    tso.data, size,
+						    size == data_len);
+			if (err)
+				goto err_map_data;
+
+			data_len -= size;
+			count++;
+			tso_build_data(skb, &tso, size);
+		}
+
+		enetc_tso_complete_csum(tx_ring, &tso, skb, hdr, pos, csum);
+
+		if (total_len == 0)
+			tx_swbd->skb = skb;
+
+		/* Go to the next BD */
+		enetc_bdr_idx_inc(tx_ring, &i);
+	}
+
+	tx_ring->next_to_use = i;
+	enetc_update_tx_ring_tail(tx_ring);
+
+	return count;
+
+err_map_data:
+	dev_err(tx_ring->dev, "DMA map error\n");
+
+	do {
+		tx_swbd = &tx_ring->tx_swbd[i];
+		enetc_free_tx_frame(tx_ring, tx_swbd);
+		if (i == 0)
+			i = tx_ring->bd_count;
+		i--;
+	} while (count--);
+
+	return 0;
+}
+
 static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 				    struct net_device *ndev)
 {
@@ -342,14 +572,17 @@  static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 		return NETDEV_TX_BUSY;
 	}
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		err = skb_csum_hwoffload_help(skb, 0);
-		if (err)
-			goto drop_packet_err;
-	}
-
 	enetc_lock_mdio();
-	count = enetc_map_tx_buffs(tx_ring, skb);
+	if (skb_is_gso(skb)) {
+		count = enetc_map_tx_tso_buffs(tx_ring, skb);
+	} else {
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			err = skb_csum_hwoffload_help(skb, 0);
+			if (err)
+				goto drop_packet_err;
+		}
+		count = enetc_map_tx_buffs(tx_ring, skb);
+	}
 	enetc_unlock_mdio();
 
 	if (unlikely(!count))
@@ -573,7 +806,7 @@  static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
 		if (xdp_frame) {
 			xdp_return_frame(xdp_frame);
 		} else if (skb) {
-			if (unlikely(tx_swbd->skb->cb[0] &
+			if (unlikely(skb->cb[0] &
 				     ENETC_F_TX_ONESTEP_SYNC_TSTAMP)) {
 				/* Start work to release lock for next one-step
 				 * timestamping packet. And send one skb in
@@ -1499,15 +1732,30 @@  static int enetc_alloc_txbdr(struct enetc_bdr *txr)
 		return -ENOMEM;
 
 	err = enetc_dma_alloc_bdr(txr, sizeof(union enetc_tx_bd));
-	if (err) {
-		vfree(txr->tx_swbd);
-		return err;
-	}
+	if (err)
+		goto err_alloc_bdr;
+
+	txr->tso_headers = dma_alloc_coherent(txr->dev,
+					      txr->bd_count * TSO_HEADER_SIZE,
+					      &txr->tso_headers_dma,
+					      GFP_KERNEL);
+	if (err)
+		goto err_alloc_tso;
 
 	txr->next_to_clean = 0;
 	txr->next_to_use = 0;
 
 	return 0;
+
+err_alloc_tso:
+	dma_free_coherent(txr->dev, txr->bd_count * sizeof(union enetc_tx_bd),
+			  txr->bd_base, txr->bd_dma_base);
+	txr->bd_base = NULL;
+err_alloc_bdr:
+	vfree(txr->tx_swbd);
+	txr->tx_swbd = NULL;
+
+	return err;
 }
 
 static void enetc_free_txbdr(struct enetc_bdr *txr)
@@ -1519,6 +1767,10 @@  static void enetc_free_txbdr(struct enetc_bdr *txr)
 
 	size = txr->bd_count * sizeof(union enetc_tx_bd);
 
+	dma_free_coherent(txr->dev, txr->bd_count * TSO_HEADER_SIZE,
+			  txr->tso_headers, txr->tso_headers_dma);
+	txr->tso_headers = NULL;
+
 	dma_free_coherent(txr->dev, size, txr->bd_base, txr->bd_dma_base);
 	txr->bd_base = NULL;
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 08b283347d9c..fb39e406b7fc 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -112,6 +112,10 @@  struct enetc_bdr {
 	dma_addr_t bd_dma_base;
 	u8 tsd_enable; /* Time specific departure */
 	bool ext_en; /* enable h/w descriptor extensions */
+
+	/* DMA buffer for TSO headers */
+	char *tso_headers;
+	dma_addr_t tso_headers_dma;
 } ____cacheline_aligned_in_smp;
 
 static inline void enetc_bdr_idx_inc(struct enetc_bdr *bdr, int *i)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 7ac276f8ee4f..024b610753f2 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -760,11 +760,12 @@  static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 	ndev->hw_features = NETIF_F_SG | NETIF_F_RXCSUM |
 			    NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
 			    NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_LOOPBACK |
-			    NETIF_F_IP_CSUM;
+			    NETIF_F_IP_CSUM | NETIF_F_TSO;
 	ndev->features = NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_RXCSUM |
 			 NETIF_F_HW_VLAN_CTAG_TX |
 			 NETIF_F_HW_VLAN_CTAG_RX |
-			 NETIF_F_IP_CSUM;
+			 NETIF_F_IP_CSUM | NETIF_F_TSO;
+	ndev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 
 	if (si->num_rss)
 		ndev->hw_features |= NETIF_F_RXHASH;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index 2166a436f818..b37a894f139c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -123,11 +123,12 @@  static void enetc_vf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 	ndev->hw_features = NETIF_F_SG | NETIF_F_RXCSUM |
 			    NETIF_F_HW_VLAN_CTAG_TX |
 			    NETIF_F_HW_VLAN_CTAG_RX |
-			    NETIF_F_IP_CSUM;
+			    NETIF_F_IP_CSUM | NETIF_F_TSO;
 	ndev->features = NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_RXCSUM |
 			 NETIF_F_HW_VLAN_CTAG_TX |
 			 NETIF_F_HW_VLAN_CTAG_RX |
-			 NETIF_F_IP_CSUM;
+			 NETIF_F_IP_CSUM | NETIF_F_TSO;
+	ndev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 
 	if (si->num_rss)
 		ndev->hw_features |= NETIF_F_RXHASH;