From patchwork Sat Jul 25 19:20:30 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergei Shtylyov
X-Patchwork-Id: 6865541
From: Sergei Shtylyov
To: netdev@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Subject: [PATCH] ravb: minimize TX data copying
Date: Sat, 25 Jul 2015 22:20:30 +0300
Message-ID: <7692456.7SI5hJLWyQ@wasted.cogentembedded.com>
Organization: Cogent Embedded Inc.
User-Agent: KMail/4.14.9 (Linux/4.0.8-200.fc21.x86_64; KDE/4.14.9; x86_64; ; )
X-Mailing-List: linux-sh@vger.kernel.org

The Renesas Ethernet AVB controller requires that all data be aligned on
a 4-byte boundary. While this is easily achievable for the RX data with
the help of skb_reserve() (we even align on a 128-byte boundary, as
recommended by the manual), we can't do the same with the TX data, which
always comes unaligned from the networking core. Originally we solved
this the easy way, copying each whole packet to a preallocated aligned
buffer; however, it's enough to copy only up to the first 3 bytes of each
packet and do the transfer using 2 TX descriptors instead of just 1.
Here's an implementation of the new TX algorithm, which significantly
reduces the driver's memory requirements.

Signed-off-by: Sergei Shtylyov

---
The patch is against Dave Miller's 'net-next.git' repo.
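For illustration only (this is not part of the patch), the split described
in the changelog can be sketched in plain C. The helper and struct names
below are hypothetical; only DPTR_ALIGN is taken from the patch. The head
is whatever precedes the first 4-byte boundary (0 to 3 bytes) and would be
copied into a small aligned slot; the tail already starts on an aligned
address and can be handed to the hardware in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DPTR_ALIGN 4	/* required data pointer alignment, as in the patch */

/* Hypothetical helper type, not part of the driver */
struct tx_split {
	size_t head_len;	/* bytes to copy into the aligned slot (0..3) */
	size_t tail_len;	/* bytes transferred in place from the packet */
};

/* Hypothetical helper: split a packet at the first DPTR_ALIGN boundary */
static struct tx_split tx_split_at_alignment(const void *data, size_t len)
{
	uintptr_t addr = (uintptr_t)data;
	/* Distance from addr up to the next multiple of DPTR_ALIGN */
	size_t head = (size_t)(-addr & (DPTR_ALIGN - 1));
	struct tx_split s;

	/* Assumption: the frame is longer than the head (it is padded to ETH_ZLEN) */
	assert(len >= head);
	s.head_len = head;
	s.tail_len = len - head;
	return s;
}
```

For a 60-byte frame whose data starts 2 bytes past a 4-byte boundary, this
gives a 2-byte head and a 58-byte tail. When the data happens to be
aligned already, the head length is 0.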
 drivers/net/ethernet/renesas/ravb.h      |    5 +
 drivers/net/ethernet/renesas/ravb_main.c |  104 +++++++++++++++++--------------
 2 files changed, 64 insertions(+), 45 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/ravb.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/ravb.h
+++ net-next/drivers/net/ethernet/renesas/ravb.h
@@ -658,6 +658,8 @@ struct ravb_desc {
 	__le32 dptr;	/* Descriptor pointer */
 };
 
+#define DPTR_ALIGN	4	/* Required data pointer alignment */
+
 enum DIE_DT {
 	/* Frame data */
 	DT_FMID = 0x40,
@@ -739,6 +741,7 @@ enum RAVB_QUEUE {
 #define RX_QUEUE_OFFSET	4
 #define NUM_RX_QUEUE	2
 #define NUM_TX_QUEUE	2
+#define NUM_TX_DESC	2	/* TX descriptors per packet */
 
 struct ravb_tstamp_skb {
 	struct list_head list;
@@ -777,9 +780,9 @@ struct ravb_private {
 	dma_addr_t tx_desc_dma[NUM_TX_QUEUE];
 	struct ravb_ex_rx_desc *rx_ring[NUM_RX_QUEUE];
 	struct ravb_tx_desc *tx_ring[NUM_TX_QUEUE];
+	void *tx_align[NUM_TX_QUEUE];
 	struct sk_buff **rx_skb[NUM_RX_QUEUE];
 	struct sk_buff **tx_skb[NUM_TX_QUEUE];
-	void **tx_buffers[NUM_TX_QUEUE];
 	u32 rx_over_errors;
 	u32 rx_fifo_errors;
 	struct net_device_stats stats[NUM_RX_QUEUE];
Index: net-next/drivers/net/ethernet/renesas/ravb_main.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/ravb_main.c
+++ net-next/drivers/net/ethernet/renesas/ravb_main.c
@@ -195,12 +195,8 @@ static void ravb_ring_free(struct net_de
 	priv->tx_skb[q] = NULL;
 
 	/* Free aligned TX buffers */
-	if (priv->tx_buffers[q]) {
-		for (i = 0; i < priv->num_tx_ring[q]; i++)
-			kfree(priv->tx_buffers[q][i]);
-	}
-	kfree(priv->tx_buffers[q]);
-	priv->tx_buffers[q] = NULL;
+	kfree(priv->tx_align[q]);
+	priv->tx_align[q] = NULL;
 
 	if (priv->rx_ring[q]) {
 		ring_size = sizeof(struct ravb_ex_rx_desc) *
@@ -212,7 +208,7 @@ static void ravb_ring_free(struct net_de
 
 	if (priv->tx_ring[q]) {
 		ring_size = sizeof(struct ravb_tx_desc) *
-			    (priv->num_tx_ring[q] + 1);
+			    (priv->num_tx_ring[q] * NUM_TX_DESC + 1);
 		dma_free_coherent(NULL, ring_size, priv->tx_ring[q],
 				  priv->tx_desc_dma[q]);
 		priv->tx_ring[q] = NULL;
@@ -227,7 +223,8 @@ static void ravb_ring_format(struct net_
 	struct ravb_tx_desc *tx_desc;
 	struct ravb_desc *desc;
 	int rx_ring_size = sizeof(*rx_desc) * priv->num_rx_ring[q];
-	int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q];
+	int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q] *
+			   NUM_TX_DESC;
 	dma_addr_t dma_addr;
 	int i;
 
@@ -260,11 +257,12 @@ static void ravb_ring_format(struct net_
 
 	memset(priv->tx_ring[q], 0, tx_ring_size);
 	/* Build TX ring buffer */
-	for (i = 0; i < priv->num_tx_ring[q]; i++) {
-		tx_desc = &priv->tx_ring[q][i];
+	for (i = 0, tx_desc = priv->tx_ring[q]; i < priv->num_tx_ring[q];
+	     i++, tx_desc++) {
+		tx_desc->die_dt = DT_EEMPTY;
+		tx_desc++;
 		tx_desc->die_dt = DT_EEMPTY;
 	}
-	tx_desc = &priv->tx_ring[q][i];
 	tx_desc->dptr = cpu_to_le32((u32)priv->tx_desc_dma[q]);
 	tx_desc->die_dt = DT_LINKFIX; /* type */
 
@@ -285,7 +283,6 @@ static int ravb_ring_init(struct net_dev
 	struct ravb_private *priv = netdev_priv(ndev);
 	struct sk_buff *skb;
 	int ring_size;
-	void *buffer;
 	int i;
 
 	/* Allocate RX and TX skb rings */
@@ -305,19 +302,11 @@ static int ravb_ring_init(struct net_dev
 	}
 
 	/* Allocate rings for the aligned buffers */
-	priv->tx_buffers[q] = kcalloc(priv->num_tx_ring[q],
-				      sizeof(*priv->tx_buffers[q]), GFP_KERNEL);
-	if (!priv->tx_buffers[q])
+	priv->tx_align[q] = kmalloc(DPTR_ALIGN * priv->num_tx_ring[q] +
+				    DPTR_ALIGN - 1, GFP_KERNEL);
+	if (!priv->tx_align[q])
 		goto error;
 
-	for (i = 0; i < priv->num_tx_ring[q]; i++) {
-		buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL);
-		if (!buffer)
-			goto error;
-		/* Aligned TX buffer */
-		priv->tx_buffers[q][i] = buffer;
-	}
-
 	/* Allocate all RX descriptors. */
 	ring_size = sizeof(struct ravb_ex_rx_desc) * (priv->num_rx_ring[q] + 1);
 	priv->rx_ring[q] = dma_alloc_coherent(NULL, ring_size,
@@ -329,7 +318,8 @@ static int ravb_ring_init(struct net_dev
 	priv->dirty_rx[q] = 0;
 
 	/* Allocate all TX descriptors. */
-	ring_size = sizeof(struct ravb_tx_desc) * (priv->num_tx_ring[q] + 1);
+	ring_size = sizeof(struct ravb_tx_desc) *
+		    (priv->num_tx_ring[q] * NUM_TX_DESC + 1);
 	priv->tx_ring[q] = dma_alloc_coherent(NULL, ring_size,
 					      &priv->tx_desc_dma[q],
 					      GFP_KERNEL);
@@ -443,7 +433,8 @@ static int ravb_tx_free(struct net_devic
 	u32 size;
 
 	for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
-		entry = priv->dirty_tx[q] % priv->num_tx_ring[q];
+		entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
+					     NUM_TX_DESC);
 		desc = &priv->tx_ring[q][entry];
 		if (desc->die_dt != DT_FEMPTY)
 			break;
@@ -451,14 +442,18 @@ static int ravb_tx_free(struct net_devic
 		dma_rmb();
 		size = le16_to_cpu(desc->ds_tagl) & TX_DS;
 		/* Free the original skb. */
-		if (priv->tx_skb[q][entry]) {
+		if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
 			dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr),
 					 size, DMA_TO_DEVICE);
-			dev_kfree_skb_any(priv->tx_skb[q][entry]);
-			priv->tx_skb[q][entry] = NULL;
+			/* Last packet descriptor? */
+			if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
+				entry /= NUM_TX_DESC;
+				dev_kfree_skb_any(priv->tx_skb[q][entry]);
+				priv->tx_skb[q][entry] = NULL;
+				stats->tx_packets++;
+			}
 			free_num++;
 		}
-		stats->tx_packets++;
 		stats->tx_bytes += size;
 		desc->die_dt = DT_EEMPTY;
 	}
@@ -1284,37 +1279,53 @@ static netdev_tx_t ravb_start_xmit(struc
 	u32 dma_addr;
 	void *buffer;
 	u32 entry;
+	u32 len;
 
 	spin_lock_irqsave(&priv->lock, flags);
-	if (priv->cur_tx[q] - priv->dirty_tx[q] >= priv->num_tx_ring[q]) {
+	if (priv->cur_tx[q] - priv->dirty_tx[q] > (priv->num_tx_ring[q] - 1) *
+	    NUM_TX_DESC) {
 		netif_err(priv, tx_queued, ndev,
 			  "still transmitting with the full ring!\n");
 		netif_stop_subqueue(ndev, q);
 		spin_unlock_irqrestore(&priv->lock, flags);
 		return NETDEV_TX_BUSY;
 	}
-	entry = priv->cur_tx[q] % priv->num_tx_ring[q];
-	priv->tx_skb[q][entry] = skb;
+	entry = priv->cur_tx[q] % (priv->num_tx_ring[q] * NUM_TX_DESC);
+	priv->tx_skb[q][entry / NUM_TX_DESC] = skb;
 
 	if (skb_put_padto(skb, ETH_ZLEN))
 		goto drop;
 
-	buffer = PTR_ALIGN(priv->tx_buffers[q][entry], RAVB_ALIGN);
-	memcpy(buffer, skb->data, skb->len);
-	desc = &priv->tx_ring[q][entry];
-	desc->ds_tagl = cpu_to_le16(skb->len);
-	dma_addr = dma_map_single(&ndev->dev, buffer, skb->len, DMA_TO_DEVICE);
+	buffer = PTR_ALIGN(priv->tx_align[q], DPTR_ALIGN) +
+		 entry / NUM_TX_DESC * DPTR_ALIGN;
+	len = PTR_ALIGN(skb->data, DPTR_ALIGN) - skb->data;
+	memcpy(buffer, skb->data, len);
+	dma_addr = dma_map_single(&ndev->dev, buffer, len, DMA_TO_DEVICE);
 	if (dma_mapping_error(&ndev->dev, dma_addr))
 		goto drop;
+
+	desc = &priv->tx_ring[q][entry];
+	desc->ds_tagl = cpu_to_le16(len);
+	desc->dptr = cpu_to_le32(dma_addr);
+
+	buffer = skb->data + len;
+	len = skb->len - len;
+	dma_addr = dma_map_single(&ndev->dev, buffer, len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, dma_addr))
+		goto unmap;
+
+	desc++;
+	desc->ds_tagl = cpu_to_le16(len);
 	desc->dptr = cpu_to_le32(dma_addr);
 
 	/* TX timestamp required */
 	if (q == RAVB_NC) {
 		ts_skb = kmalloc(sizeof(*ts_skb), GFP_ATOMIC);
 		if (!ts_skb) {
-			dma_unmap_single(&ndev->dev, dma_addr, skb->len,
+			desc--;
+			dma_unmap_single(&ndev->dev, dma_addr, len,
 					 DMA_TO_DEVICE);
-			goto drop;
+			goto unmap;
 		}
 		ts_skb->skb = skb;
 		ts_skb->tag = priv->ts_skb_tag++;
@@ -1330,13 +1341,15 @@ static netdev_tx_t ravb_start_xmit(struc
 
 	/* Descriptor type must be set after all the above writes */
 	dma_wmb();
-	desc->die_dt = DT_FSINGLE;
+	desc->die_dt = DT_FEND;
+	desc--;
+	desc->die_dt = DT_FSTART;
 
 	ravb_write(ndev, ravb_read(ndev, TCCR) | (TCCR_TSRQ0 << q), TCCR);
 
-	priv->cur_tx[q]++;
-	if (priv->cur_tx[q] - priv->dirty_tx[q] >= priv->num_tx_ring[q] &&
-	    !ravb_tx_free(ndev, q))
+	priv->cur_tx[q] += NUM_TX_DESC;
+	if (priv->cur_tx[q] - priv->dirty_tx[q] >
+	    (priv->num_tx_ring[q] - 1) * NUM_TX_DESC && !ravb_tx_free(ndev, q))
 		netif_stop_subqueue(ndev, q);
 
 exit:
@@ -1344,9 +1357,12 @@ exit:
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return NETDEV_TX_OK;
 
+unmap:
+	dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr),
+			 le16_to_cpu(desc->ds_tagl), DMA_TO_DEVICE);
 drop:
 	dev_kfree_skb_any(skb);
-	priv->tx_skb[q][entry] = NULL;
+	priv->tx_skb[q][entry / NUM_TX_DESC] = NULL;
 	goto exit;
 }
 
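
For reference, the index arithmetic used throughout the patch can be
restated as a small stand-alone sketch. The helper names are hypothetical
and do not exist in the driver; NUM_TX_DESC and the expressions themselves
mirror the diff. cur_tx and dirty_tx count descriptors, so each packet
advances cur_tx by NUM_TX_DESC, while tx_skb[] and the aligned head slots
are still indexed per packet.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_DESC 2	/* TX descriptors per packet, as in the patch */

/* Descriptor slot for a free-running counter (cur_tx or dirty_tx) */
static uint32_t tx_desc_entry(uint32_t counter, uint32_t num_tx_ring)
{
	assert(num_tx_ring > 0);
	return counter % (num_tx_ring * NUM_TX_DESC);
}

/* Per-packet index (tx_skb[] and aligned head slot) for a descriptor slot */
static uint32_t tx_skb_index(uint32_t entry)
{
	return entry / NUM_TX_DESC;
}

/* Nonzero for the second (DT_FEND) descriptor of a packet */
static int tx_is_last_desc(uint32_t entry)
{
	return entry % NUM_TX_DESC == NUM_TX_DESC - 1;
}

/* Ring-full test, written the way ravb_start_xmit() writes it */
static int tx_ring_full(uint32_t cur_tx, uint32_t dirty_tx,
			uint32_t num_tx_ring)
{
	return cur_tx - dirty_tx > (num_tx_ring - 1) * NUM_TX_DESC;
}
```

With a 64-packet ring there are 128 descriptors: 126 outstanding
descriptors (63 packets) still leave room for one more packet, while 128
outstanding descriptors mean the ring is full.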