diff mbox

[v8,3/3] net: hisilicon: new hip04 ethernet driver

Message ID 5487C357.9030507@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ding Tianhong Dec. 10, 2014, 3:51 a.m. UTC
On 2014/12/8 4:09, Arnd Bergmann wrote:
> On Sunday 07 December 2014 10:49:12 Alexander Graf wrote:
>> On 07.12.14 04:28, Ding Tianhong wrote:
>>> On 2014/12/7 8:42, Alexander Graf wrote:
>>>> On 19.04.14 03:13, Zhangfei Gao wrote:
>>>>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>>>>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>>>>>
>>>>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>>>>
>>>> Is this driver still supposed to go upstream? I presume this was the
>>>> last submission and it's been quite some time ago 
>>>>
>>>
>>> yes, it is really a long time, but The hip04 did not support tx irq, 
>>> we couldn't get any better idea to fix this defect, do you have any suggestion?
>>
>> Well, if hardware doesn't have a TX irq I don't see there's anything we
>> can do to fix that ;).
> 
> I don't know if it's related to the ethernet on hip01, but I would assume
> it is, and that platform is currently being submitted for inclusion, so
> I'd definitely hope to see this driver get merged too eventually.
> 
> IIRC, the last revision of the patch set had basically fixed the problem,
> except for a race that would still allow the napi poll function to exit
> with poll_complete() but a full queue of TX descriptors and no fallback
> to clean them up. There was also still an open question about whether or
> not the driver should use skb_orphan, but I may be misremembering that part.
>  

Hi Arnd:

what about use a state machine to check the tx queue and free the skb, just like:


what do you think of this solution?

Regards
Ding


>> Dave, what's your take here? Should we keep a driver from going upstream
>> just because the hardware is partly broken? I'd really prefer to have an
>> upstream driver on that SoC rather than some random (eventually even
>> more broken) downstream code.
> 
> We can certainly have a slow driver for this hardware, and I'd much
> prefer slow over broken. I'd guess that some of the performance impact
> of the missing interrupts can now be offset with the xmit_more	 logic.
> 
> 	Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>

Comments

Ding Tianhong Dec. 10, 2014, 6:45 a.m. UTC | #1
On 2014/12/10 11:51, Ding Tianhong wrote:
> On 2014/12/8 4:09, Arnd Bergmann wrote:
>> On Sunday 07 December 2014 10:49:12 Alexander Graf wrote:
>>> On 07.12.14 04:28, Ding Tianhong wrote:
>>>> On 2014/12/7 8:42, Alexander Graf wrote:
>>>>> On 19.04.14 03:13, Zhangfei Gao wrote:
>>>>>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>>>>>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>>>>>>
>>>>>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>>>>>
>>>>> Is this driver still supposed to go upstream? I presume this was the
>>>>> last submission and it's been quite some time ago 
>>>>>
>>>>
>>>> yes, it is really a long time, but The hip04 did not support tx irq, 
>>>> we couldn't get any better idea to fix this defect, do you have any suggestion?
>>>
>>> Well, if hardware doesn't have a TX irq I don't see there's anything we
>>> can do to fix that ;).
>>
>> I don't know if it's related to the ethernet on hip01, but I would assume
>> it is, and that platform is currently being submitted for inclusion, so
>> I'd definitely hope to see this driver get merged too eventually.
>>
>> IIRC, the last revision of the patch set had basically fixed the problem,
>> except for a race that would still allow the napi poll function to exit
>> with poll_complete() but a full queue of TX descriptors and no fallback
>> to clean them up. There was also still an open question about whether or
>> not the driver should use skb_orphan, but I may be misremembering that part.
>>  
> 
> Hi Arnd:
> 
> what about use a state machine to check the tx queue and free the skb, just like:
> 
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> index 8593658..71faca8 100644
> --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> @@ -396,9 +396,25 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>         stats->tx_packets++;
>         priv->tx_count++;
> 
> +
> +       queue_delayed_work(priv->wq, &priv->tx_queue, delay);
> +
>         return NETDEV_TX_OK;
>  }
> 
> +static void hip04_tx_queue_monitor(struct work_struct *work)
> +{
> +       struct hip04_priv *priv = container_of(work, struct hip04_priv,
> +                                              queue_work.work);
> +       struct net_device *dev = priv->ndev;
> +       hip04_tx_reclain(ndev, false);
> +
> +       if (TX_QUEUE_IS_EMPRY(ndev))
> +               return;
> +
> +       queue_delayed_work(priv->wq, &priv->tx_queue, delay);
> +}
> +
>  static int hip04_rx_poll(struct napi_struct *napi, int budget)
>  {
>         struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> @@ -736,6 +752,8 @@ static int hip04_mac_probe(struct platform_device *pdev)
>                 goto alloc_fail;
>         }
> 
> +       INIT_DELAYED_WORK(&priv->tx_queue, hip04_tx_queue_monitor);
> +
>         return 0;
> 
> 
> 
> what do you think of this solution?
> 
> Regards
> Ding
> 

Miss this code, I think the best way is skb_orphan(skb), just like the cxgb3 drivers, some hardware
didn't use the tx inq to free dmad Tx packages.

Ding

> 
>>> Dave, what's your take here? Should we keep a driver from going upstream
>>> just because the hardware is partly broken? I'd really prefer to have an
>>> upstream driver on that SoC rather than some random (eventually even
>>> more broken) downstream code.
>>
>> We can certainly have a slow driver for this hardware, and I'd much
>> prefer slow over broken. I'd guess that some of the performance impact
>> of the missing interrupts can now be offset with the xmit_more	 logic.
>>
>> 	Arnd
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>
Arnd Bergmann Dec. 10, 2014, 9:35 a.m. UTC | #2
On Wednesday 10 December 2014 14:45:29 Ding Tianhong wrote:
> 
> Miss this code, I think the best way is skb_orphan(skb), just like the cxgb3 drivers, some hardware
> didn't use the tx inq to free dmad Tx packages.

The problem with skb_orphan is that you are telling the network stack that
the packet is now out of its reach and it will no longer be able to take
queued packets into account. This can work as long as you are guaranteed
to never fill up the tx queue and assume that the network link is always
faster than your CPUs, but then you have to do additional steps, at least:

- turn off flow control (pause frames)
- disable half-duplex, 10mbit and 100mbit connections
- remove the byte queue limit code from the driver
- never call netif_stop_queue or return NETDEV_TX_BUSY from the tx
  function, but instead drop the oldest packets from the tx queue
  to make room, and rely on higher-level protocols (TCP) to throttle
  the output
- for best throughput, don't bother looking at the status of the tx
  queue until it has filled up, then clean up all transmitted packets
  (if any) but at least enough so you can continue sending a bit longer.

The above can work because you know the device is only used in one SoC
that has a relatively slow CPU. I would describe it as "we don't know
how much data we can send, but in doubt we send more and do our best
to make that case as rare as possible". It's also management friendly
because you can reach higher-than-gigabit transmit rates this way
if you ignore the packet loss ;-). The price for this is that you
eventually have to retransmit packets that get dropped.

The alternative to this is to keep all the features of the network
stack in place and to keep retrying the tx cleanup in some way so
you don't have to orphan or drop any skbs. This could be as easy as

- drop the skb_orphan() call
- make hip04_tx_reclaim() return an error if the queue is still full
- do not call napi_complete or turn on the rx irq if hip04_tx_reclaim
  failed

	Arnd
David Miller Dec. 10, 2014, 4:07 p.m. UTC | #3
From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 10 Dec 2014 10:35:20 +0100

> On Wednesday 10 December 2014 14:45:29 Ding Tianhong wrote:
>> 
>> Miss this code, I think the best way is skb_orphan(skb), just like the cxgb3 drivers, some hardware
>> didn't use the tx inq to free dmad Tx packages.
> 
> The problem with skb_orphan is that you are telling the network stack that
> the packet is now out of its reach and it will no longer be able to take
> queued packets into account.

skb_orphan() also does not release netfilter resources attached to the
packet.

Really, all drivers must free TX SKBs in a small, finite, amount of
time after submission.

There is no way around this.
Arnd Bergmann Dec. 10, 2014, 4:36 p.m. UTC | #4
On Wednesday 10 December 2014 11:07:32 David Miller wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> Date: Wed, 10 Dec 2014 10:35:20 +0100
> 
> > On Wednesday 10 December 2014 14:45:29 Ding Tianhong wrote:
> >> 
> >> Miss this code, I think the best way is skb_orphan(skb), just like the cxgb3 drivers, some hardware
> >> didn't use the tx inq to free dmad Tx packages.
> > 
> > The problem with skb_orphan is that you are telling the network stack that
> > the packet is now out of its reach and it will no longer be able to take
> > queued packets into account.
> 
> skb_orphan() also does not release netfilter resources attached to the
> packet.
> 
> Really, all drivers must free TX SKBs in a small, finite, amount of
> time after submission.
> 
> There is no way around this.

I see. Do you see a problem with returning early from napi->poll()
without calling napi_complete() when the TX queue remains full at
the end of the poll function?

Or is there any better alternative for broken hardware that lacks
a TX-complete interrupt?

	Arnd
David Miller Dec. 10, 2014, 5:02 p.m. UTC | #5
From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 10 Dec 2014 17:36:46 +0100

> Or is there any better alternative for broken hardware that lacks
> a TX-complete interrupt?

The only option is an hrtimer or similar.
diff mbox

Patch

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 8593658..71faca8 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -396,9 +396,25 @@  static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
        stats->tx_packets++;
        priv->tx_count++;

+
+       queue_delayed_work(priv->wq, &priv->tx_queue, delay);
+
        return NETDEV_TX_OK;
 }

+static void hip04_tx_queue_monitor(struct work_struct *work)
+{
+       struct hip04_priv *priv = container_of(work, struct hip04_priv,
+                                              queue_work.work);
+       struct net_device *dev = priv->ndev;
+       hip04_tx_reclain(ndev, false);
+
+       if (TX_QUEUE_IS_EMPRY(ndev))
+               return;
+
+       queue_delayed_work(priv->wq, &priv->tx_queue, delay);
+}
+
 static int hip04_rx_poll(struct napi_struct *napi, int budget)
 {
        struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
@@ -736,6 +752,8 @@  static int hip04_mac_probe(struct platform_device *pdev)
                goto alloc_fail;
        }

+       INIT_DELAYED_WORK(&priv->tx_queue, hip04_tx_queue_monitor);
+
        return 0;