Message ID | 20141201072802.GB21731@1wt.eu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Willy, Good day. Thank you for the quick feedback. > In the mean time you can apply the attached patch. I haven't submitted it yet only by lack of time :-( Should we apply this patch on: 1. 3.13.9? or 2. 3.13.9 with cd71e246c16b30e3f396a85943d5f596202737ba and reverted 4f3a4f701b59a3e4b5c8503ac3d905c0a326f922? This patch is expected resolve the low throughput and the kernel crash as well? # Not just the "No buffer space available" error? Thank you. Regards, Maggie Roxas On Mon, Dec 1, 2014 at 3:28 PM, Willy Tarreau <w@1wt.eu> wrote: > Hi Maggie, > > On Mon, Dec 01, 2014 at 02:26:49PM +0800, Maggie Mae Roxas wrote: >> Hi Willy, Thomas. >> Good day. >> >> I am reopening this discussion because we found an unusual behavior >> after using this combination that we thought was OK as discussed in >> the previous messages of this thread: >> >> > - use 3.13.9 mvneta.c >> > - apply cd71e246c16b30e3f396a85943d5f596202737ba >> > - revert 4f3a4f701b59a3e4b5c8503ac3d905c0a326f922 >> >> Specifically, if we apply above, the "No buffer space available" error >> during continuous ping does NOT occur anymore. >> # Attached: with_patch_3_13_9_no_buffer_space_solved.txt >> >> However, after continuous and further testing, we encounter the ff. issues: >> 1. Low throughput during iperf when Armada 370 device is set as iperf >> client. For example, in 1000Mbits/s, we only get below 140Mbits/s. > > Yes that was the intent of the original fix. > > We recently diagnosed the issue related to "no buffer space available". > What happens is that the "ping" utility uses a very small socket buffer. > It sends a few packets, and the NIC doesn't send interrupts until the > TX interrupt count is reached, so the Tx skbs are not freed and the > socket buffers remain full. > > The only solution at the moment is to make the NIC emit an IRQ for each > Tx packet. I'm still trying to find a better way to do this (either find > a way to make the NIC emit an IRQ once the Tx queue is empty or adjust > the IRQ delay when adding more packets, though it creates a race condition). > > In the mean time you can apply the attached patch. I haven't submitted it > yet only by lack of time :-( > > Best regards, > Willy >
On Mon, Dec 01, 2014 at 04:27:46PM +0800, Maggie Mae Roxas wrote: > Hi Willy, > Good day. > > Thank you for the quick feedback. > > > In the mean time you can apply the attached patch. I haven't submitted it yet only by lack of time :-( > Should we apply this patch on: > 1. 3.13.9? or > 2. 3.13.9 with cd71e246c16b30e3f396a85943d5f596202737ba and reverted > 4f3a4f701b59a3e4b5c8503ac3d905c0a326f922? > > This patch is expected resolve the low throughput and the kernel crash as well? > # Not just the "No buffer space available" error? Yes absolutely. The low throughput is caused by the use of a timer instead of an interrupt to flush Tx descriptors. The "No buffer space available" is caused by the Tx coalesce of 16 which only flushes the buffers after 16 packets have been emitted. When socket buffers are too small for 16 packets (eg: ping) you get the error above. Thus setting Tx coalesce to 1 fixes it for all situations. It's slightly less performant than coalesce 16 but you can change it using ethtool if you want (4 still works with ping and shows better performance). Best regards, Willy
Dear Willy Tarreau, On Mon, 1 Dec 2014 10:28:51 +0100, Willy Tarreau wrote: > > This patch is expected resolve the low throughput and the kernel crash as well? > > # Not just the "No buffer space available" error? > > Yes absolutely. The low throughput is caused by the use of a timer instead > of an interrupt to flush Tx descriptors. The "No buffer space available" > is caused by the Tx coalesce of 16 which only flushes the buffers after 16 > packets have been emitted. When socket buffers are too small for 16 packets > (eg: ping) you get the error above. Thus setting Tx coalesce to 1 fixes it > for all situations. It's slightly less performant than coalesce 16 but you > can change it using ethtool if you want (4 still works with ping and shows > better performance). If I understood correctly, on RX the interrupt coalescing can be done every X packets, or after N milliseconds. However, on TX, it's only after Y packets, there is no way to configure a delay. But in any case, with NAPI implemented in software, are these hardware interrupt coalescing features very important? As soon as the number of interrupts becomes high, the kernel will disable the interrupt and switch to polling, no? Best regards, Thomas
Hi Thomas, On Mon, Dec 01, 2014 at 10:32:21AM +0100, Thomas Petazzoni wrote: > If I understood correctly, on RX the interrupt coalescing can be done > every X packets, or after N milliseconds. However, on TX, it's only > after Y packets, there is no way to configure a delay. That was my understanding of the datasheet as well. > But in any case, with NAPI implemented in software, are these hardware > interrupt coalescing features very important? As soon as the number of > interrupts becomes high, the kernel will disable the interrupt and > switch to polling, no? Absolutely, but despite this, the interrupts still impact the system's performance, because instead of waking up the driver once the Tx queue is about to be empty (let's say twice per Tx queue), we wake up as often as the system supports it. The net effect is a performance loss of about 5% on small packets, which is not huge of course, but would rather be spent doing some more useful stuff. Uri gave me a contact at Marvell who knows this device well, I'll ask him if there's no other way to work with this chip. Sending an interrupt at least when the Tx queue is empty would be nice :-/ Best regards, Willy
Hi Thomas, Willy. Good day. Thank you for your assistance as always. We are currently testing the attached patch + 3.13.9. We'll let you know the results as soon as we've conducted enough tests - latest tomorrow. Will keep you posted. Regards, Maggie Roxas On Mon, Dec 1, 2014 at 5:58 PM, Willy Tarreau <w@1wt.eu> wrote: > Hi Thomas, > > On Mon, Dec 01, 2014 at 10:32:21AM +0100, Thomas Petazzoni wrote: >> If I understood correctly, on RX the interrupt coalescing can be done >> every X packets, or after N milliseconds. However, on TX, it's only >> after Y packets, there is no way to configure a delay. > > That was my understanding of the datasheet as well. > >> But in any case, with NAPI implemented in software, are these hardware >> interrupt coalescing features very important? As soon as the number of >> interrupts becomes high, the kernel will disable the interrupt and >> switch to polling, no? > > Absolutely, but despite this, the interrupts still impact the system's > performance, because instead of waking up the driver once the Tx queue > is about to be empty (let's say twice per Tx queue), we wake up as often > as the system supports it. The net effect is a performance loss of about > 5% on small packets, which is not huge of course, but would rather be > spent doing some more useful stuff. > > Uri gave me a contact at Marvell who knows this device well, I'll ask > him if there's no other way to work with this chip. Sending an interrupt > at least when the Tx queue is empty would be nice :-/ > > Best regards, > Willy >
Hi Willy, Thomas, Good day. This is just to confirm that the patch Willy sent yesterday, patched to 3.13.9 alone, resolves these issues: - "No buffer space available" around 17th packet - Low throughput on iperf as client (ie, ~130 Mbits/s on 1000 Mbits/s) - Kernel crash We confirmed it after quite some testing. # Full logs at: # as iperf client: https://app.box.com/s/5pzh753dmmnqt4q8s5iy # as iperf server: https://app.box.com/s/t9ohxrcphpmnfer9dtj9 Thank you very much for your assistance again!!! :) Regards, Maggie Roxas On Mon, Dec 1, 2014 at 6:15 PM, Maggie Mae Roxas <maggie.mae.roxas@gmail.com> wrote: > Hi Thomas, Willy. > Good day. > > Thank you for your assistance as always. > > We are currently testing the attached patch + 3.13.9. > We'll let you know the results as soon as we've conducted enough tests > - latest tomorrow. > > Will keep you posted. > > Regards, > Maggie Roxas > > > On Mon, Dec 1, 2014 at 5:58 PM, Willy Tarreau <w@1wt.eu> wrote: >> Hi Thomas, >> >> On Mon, Dec 01, 2014 at 10:32:21AM +0100, Thomas Petazzoni wrote: >>> If I understood correctly, on RX the interrupt coalescing can be done >>> every X packets, or after N milliseconds. However, on TX, it's only >>> after Y packets, there is no way to configure a delay. >> >> That was my understanding of the datasheet as well. >> >>> But in any case, with NAPI implemented in software, are these hardware >>> interrupt coalescing features very important? As soon as the number of >>> interrupts becomes high, the kernel will disable the interrupt and >>> switch to polling, no? >> >> Absolutely, but despite this, the interrupts still impact the system's >> performance, because instead of waking up the driver once the Tx queue >> is about to be empty (let's say twice per Tx queue), we wake up as often >> as the system supports it. The net effect is a performance loss of about >> 5% on small packets, which is not huge of course, but would rather be >> spent doing some more useful stuff. >> >> Uri gave me a contact at Marvell who knows this device well, I'll ask >> him if there's no other way to work with this chip. Sending an interrupt >> at least when the Tx queue is empty would be nice :-/ >> >> Best regards, >> Willy >>
Hi Maggie, On Tue, Dec 02, 2014 at 12:09:05PM +0800, Maggie Mae Roxas wrote: > Hi Willy, Thomas, > Good day. > > This is just to confirm that the patch Willy sent yesterday, patched > to 3.13.9 alone, resolves these issues: > - "No buffer space available" around 17th packet > - Low throughput on iperf as client (ie, ~130 Mbits/s on 1000 Mbits/s) > - Kernel crash > > We confirmed it after quite some testing. > # Full logs at: > # as iperf client: https://app.box.com/s/5pzh753dmmnqt4q8s5iy > # as iperf server: https://app.box.com/s/t9ohxrcphpmnfer9dtj9 > > Thank you very much for your assistance again!!! :) thanks for your valuable feedback, I'll add your tested-by and will send it to netdev for inclusion before I switch to something else and forget again! Willy
Hi Willy, Good day. No problem. Thanks again! Regards, Maggie Roxas On Tue, Dec 2, 2014 at 2:56 PM, Willy Tarreau <w@1wt.eu> wrote: > Hi Maggie, > > On Tue, Dec 02, 2014 at 12:09:05PM +0800, Maggie Mae Roxas wrote: >> Hi Willy, Thomas, >> Good day. >> >> This is just to confirm that the patch Willy sent yesterday, patched >> to 3.13.9 alone, resolves these issues: >> - "No buffer space available" around 17th packet >> - Low throughput on iperf as client (ie, ~130 Mbits/s on 1000 Mbits/s) >> - Kernel crash >> >> We confirmed it after quite some testing. >> # Full logs at: >> # as iperf client: https://app.box.com/s/5pzh753dmmnqt4q8s5iy >> # as iperf server: https://app.box.com/s/t9ohxrcphpmnfer9dtj9 >> >> Thank you very much for your assistance again!!! :) > > thanks for your valuable feedback, I'll add your tested-by and will > send it to netdev for inclusion before I switch to something else > and forget again! > > Willy >
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 4762994..35bfba7 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -214,7 +214,7 @@ /* Various constants */ /* Coalescing */ -#define MVNETA_TXDONE_COAL_PKTS 16 +#define MVNETA_TXDONE_COAL_PKTS 1 #define MVNETA_RX_COAL_PKTS 32 #define MVNETA_RX_COAL_USEC 100