Message ID | 20220511125941.55812-3-ihuguet@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | sfc: fix some efx_separate_tx_channels errors | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 8 of 8 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 30 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote: > If efx_separate_tx_channels is used, some error messages and backtraces > are shown in the logs (see below). This is because during channels > start, all queues in the channels are init asumming that they exist, but > they might not if efx_separate_tx_channels is used: some channels only > have RX queues and others only have TX queues. Thanks for reporting this. At first glance I suspect there may be more callers of efx_for_each_channel_tx_queue() which is why it is not yet working for you even with this fix. Probably we need to fix those macros themselves. I'm having a closer look, but it will take some time. Martin > > Avoid that by checking if the channel has TX, RX or both queues. > However, even with this patch the NIC is unusable when using > efx_separate_tx_channels, so there are more problems that I've not > identified. These messages are still shown at probe time many times: > sfc 0000:03:00.0 (unnamed net_device) (uninitialized): MC command 0x92 inlen 8 failed rc=-71 (raw=0) arg=0 > sfc 0000:03:00.0 (unnamed net_device) (uninitialized): failed to link VI 4294967295 to PIO buffer 1 (-71) > > Those messages were also shown before these patch. > > And then this other message and backtrace were also shown many times, > but now they're not: > sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0 > ------------[ cut here ]------------ > netdevice: ens6f0np0: failed to initialise TXQ -1 > WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc] > [...] stripped > RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc] > [...] stripped > Call Trace: > efx_init_tx_queue+0xaa/0xf0 [sfc] > efx_start_channels+0x49/0x120 [sfc] > efx_start_all+0x1f8/0x430 [sfc] > efx_net_open+0x5a/0xe0 [sfc] > __dev_open+0xd0/0x190 > __dev_change_flags+0x1b3/0x220 > dev_change_flags+0x21/0x60 > [...] > > At remove time, these messages were shown. Now they're neither shown: > sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues > sfc 0000:03:00.0 ens6f0np0: failed to flush queues > > Fixes: 7ec3de426014 ("sfc: move datapath management code") > Reported-by: Tianhao Zhao <tizhao@redhat.com> > Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> > --- > drivers/net/ethernet/sfc/efx_channels.c | 22 +++++++++++++--------- > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c > index da2db6791907..b6b960e2021c 100644 > --- a/drivers/net/ethernet/sfc/efx_channels.c > +++ b/drivers/net/ethernet/sfc/efx_channels.c > @@ -1139,17 +1139,21 @@ void efx_start_channels(struct efx_nic *efx) > struct efx_channel *channel; > > efx_for_each_channel_rev(channel, efx) { > - efx_for_each_channel_tx_queue(tx_queue, channel) { > - efx_init_tx_queue(tx_queue); > - atomic_inc(&efx->active_queues); > + if (channel->channel >= efx->tx_channel_offset) { > + efx_for_each_channel_tx_queue(tx_queue, channel) { > + efx_init_tx_queue(tx_queue); > + atomic_inc(&efx->active_queues); > + } > } > > - efx_for_each_channel_rx_queue(rx_queue, channel) { > - efx_init_rx_queue(rx_queue); > - atomic_inc(&efx->active_queues); > - efx_stop_eventq(channel); > - efx_fast_push_rx_descriptors(rx_queue, false); > - efx_start_eventq(channel); > + if (channel->channel < efx->n_rx_channels) { > + efx_for_each_channel_rx_queue(rx_queue, channel) { > + efx_init_rx_queue(rx_queue); > + atomic_inc(&efx->active_queues); > + efx_stop_eventq(channel); > + efx_fast_push_rx_descriptors(rx_queue, false); > + efx_start_eventq(channel); > + } > } > > WARN_ON(channel->rx_pkt_n_frags); > -- > 2.34.1
On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote: > On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote: > > If efx_separate_tx_channels is used, some error messages and backtraces > > are shown in the logs (see below). This is because during channels > > start, all queues in the channels are init asumming that they exist, but > > they might not if efx_separate_tx_channels is used: some channels only > > have RX queues and others only have TX queues. > > Thanks for reporting this. At first glance I suspect there may be more callers > of efx_for_each_channel_tx_queue() which is why it is not yet working for you > even with this fix. > Probably we need to fix those macros themselves. > > I'm having a closer look, but it will take some time. It was easier than I thought. With the patch below I do not get any errors, and ping works. I did not have to touch efx_for_each_channel_rx_queue(). Can you give this a try and report if it works for you? Martin --- drivers/net/ethernet/sfc/net_driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h index 318db906a154..723bbeea5d0c 100644 --- a/drivers/net/ethernet/sfc/net_driver.h +++ b/drivers/net/ethernet/sfc/net_driver.h @@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel) static inline bool efx_channel_has_tx_queues(struct efx_channel *channel) { - return true; + return channel && channel->channel >= channel->efx->tx_channel_offset; } static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel)
On Fri, May 13, 2022 at 2:37 PM Martin Habets <habetsm.xilinx@gmail.com> wrote: > > On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote: > > On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote: > > > If efx_separate_tx_channels is used, some error messages and backtraces > > > are shown in the logs (see below). This is because during channels > > > start, all queues in the channels are init asumming that they exist, but > > > they might not if efx_separate_tx_channels is used: some channels only > > > have RX queues and others only have TX queues. > > > > Thanks for reporting this. At first glance I suspect there may be more callers > > of efx_for_each_channel_tx_queue() which is why it is not yet working for you > > even with this fix. > > Probably we need to fix those macros themselves. > > > > I'm having a closer look, but it will take some time. > > It was easier than I thought. With the patch below I do not get any errors, > and ping works. I did not have to touch efx_for_each_channel_rx_queue(). > Can you give this a try and report if it works for you? Martin, this is working fine for me. Module loads and unloads without errors, and I can ping and run an iperf3 test also without errors. How do you want to do it? Should I send this patch on your behalf within my patch series? Or do you want to send it yourself first? > > Martin > --- > drivers/net/ethernet/sfc/net_driver.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h > index 318db906a154..723bbeea5d0c 100644 > --- a/drivers/net/ethernet/sfc/net_driver.h > +++ b/drivers/net/ethernet/sfc/net_driver.h > @@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel) > > static inline bool efx_channel_has_tx_queues(struct efx_channel *channel) > { > - return true; > + return channel && channel->channel >= channel->efx->tx_channel_offset; > } > > static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel) >
On Tue, May 24, 2022 at 05:36:49PM +0200, Íñigo Huguet wrote: > On Fri, May 13, 2022 at 2:37 PM Martin Habets <habetsm.xilinx@gmail.com> wrote: > > > > On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote: > > > On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote: > > > > If efx_separate_tx_channels is used, some error messages and backtraces > > > > are shown in the logs (see below). This is because during channels > > > > start, all queues in the channels are init asumming that they exist, but > > > > they might not if efx_separate_tx_channels is used: some channels only > > > > have RX queues and others only have TX queues. > > > > > > Thanks for reporting this. At first glance I suspect there may be more callers > > > of efx_for_each_channel_tx_queue() which is why it is not yet working for you > > > even with this fix. > > > Probably we need to fix those macros themselves. > > > > > > I'm having a closer look, but it will take some time. > > > > It was easier than I thought. With the patch below I do not get any errors, > > and ping works. I did not have to touch efx_for_each_channel_rx_queue(). > > Can you give this a try and report if it works for you? > > Martin, this is working fine for me. Module loads and unloads without > errors, and I can ping and run an iperf3 test also without errors. > > How do you want to do it? Should I send this patch on your behalf > within my patch series? Or do you want to send it yourself first? IMO you did the hard work so I did not want to steal your thunder. Please send it as part of your series, I'll Ack it. Martin > > > > Martin > > --- > > drivers/net/ethernet/sfc/net_driver.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h > > index 318db906a154..723bbeea5d0c 100644 > > --- a/drivers/net/ethernet/sfc/net_driver.h > > +++ b/drivers/net/ethernet/sfc/net_driver.h > > @@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel) > > > > static inline bool efx_channel_has_tx_queues(struct efx_channel *channel) > > { > > - return true; > > + return channel && channel->channel >= channel->efx->tx_channel_offset; > > } > > > > static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel) > > > > > -- > Íñigo Huguet
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index da2db6791907..b6b960e2021c 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -1139,17 +1139,21 @@ void efx_start_channels(struct efx_nic *efx) struct efx_channel *channel; efx_for_each_channel_rev(channel, efx) { - efx_for_each_channel_tx_queue(tx_queue, channel) { - efx_init_tx_queue(tx_queue); - atomic_inc(&efx->active_queues); + if (channel->channel >= efx->tx_channel_offset) { + efx_for_each_channel_tx_queue(tx_queue, channel) { + efx_init_tx_queue(tx_queue); + atomic_inc(&efx->active_queues); + } } - efx_for_each_channel_rx_queue(rx_queue, channel) { - efx_init_rx_queue(rx_queue); - atomic_inc(&efx->active_queues); - efx_stop_eventq(channel); - efx_fast_push_rx_descriptors(rx_queue, false); - efx_start_eventq(channel); + if (channel->channel < efx->n_rx_channels) { + efx_for_each_channel_rx_queue(rx_queue, channel) { + efx_init_rx_queue(rx_queue); + atomic_inc(&efx->active_queues); + efx_stop_eventq(channel); + efx_fast_push_rx_descriptors(rx_queue, false); + efx_start_eventq(channel); + } } WARN_ON(channel->rx_pkt_n_frags);
If efx_separate_tx_channels is used, some error messages and backtraces are shown in the logs (see below). This is because during channels start, all queues in the channels are init asumming that they exist, but they might not if efx_separate_tx_channels is used: some channels only have RX queues and others only have TX queues. Avoid that by checking if the channel has TX, RX or both queues. However, even with this patch the NIC is unusable when using efx_separate_tx_channels, so there are more problems that I've not identified. These messages are still shown at probe time many times: sfc 0000:03:00.0 (unnamed net_device) (uninitialized): MC command 0x92 inlen 8 failed rc=-71 (raw=0) arg=0 sfc 0000:03:00.0 (unnamed net_device) (uninitialized): failed to link VI 4294967295 to PIO buffer 1 (-71) Those messages were also shown before these patch. And then this other message and backtrace were also shown many times, but now they're not: sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0 ------------[ cut here ]------------ netdevice: ens6f0np0: failed to initialise TXQ -1 WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc] [...] stripped RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc] [...] stripped Call Trace: efx_init_tx_queue+0xaa/0xf0 [sfc] efx_start_channels+0x49/0x120 [sfc] efx_start_all+0x1f8/0x430 [sfc] efx_net_open+0x5a/0xe0 [sfc] __dev_open+0xd0/0x190 __dev_change_flags+0x1b3/0x220 dev_change_flags+0x21/0x60 [...] At remove time, these messages were shown. Now they're neither shown: sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues sfc 0000:03:00.0 ens6f0np0: failed to flush queues Fixes: 7ec3de426014 ("sfc: move datapath management code") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> --- drivers/net/ethernet/sfc/efx_channels.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)