Message ID | 20201123141256.14208-1-tariqt@nvidia.com |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] netdevice.h: Fix unintentional disable of ALL_FOR_ALL features on upper device |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 7841 this patch: 7841 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 96 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 8262 this patch: 8262 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Mon, Nov 23, 2020 at 3:13 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> Calling netdev_increment_features() on upper/master device from
> netdev_add_tso_features() implies unintentional clearance of ALL_FOR_ALL
> features supported by all slaves. Fix it by passing ALL_FOR_ALL in
> addition to ALL_TSO.
>
> Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")

I think you should give more details to your bug report, because
netdev_add_tso_features() is used from different places.

Thanks.
On 11/23/2020 4:55 PM, Eric Dumazet wrote:
> I think you should give more details to your bug report, because
> netdev_add_tso_features() is used from different
> places.
>
> Thanks.

Right. I'll include these in the re-spin:
Fixes: 247f6d0f8667 ("team: allow TSO being set on master")
Fixes: f902e8812ef6 ("bridge: Add ability to enable TSO")

I wonder though if netdev_increment_features() is expected to clear
features that are not part of the mask.

Thanks.
On Mon, Nov 23, 2020 at 5:15 PM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
> Right. I'll include these in the re-spin:
> Fixes: 247f6d0f8667 ("team: allow TSO being set on master")
> Fixes: f902e8812ef6 ("bridge: Add ability to enable TSO")

I was more thinking about what exact issue you had, and how we can
reproduce it, and test the fix.

>
> I wonder though if netdev_increment_features() is expected to clear
> features that are not part of the mask.

Well, the 'increment' part was suggesting the function was adding
flags, not removing them.

We might ask Herbert Xu if we :

1) Need to comment the function, or change its name to be more descriptive.
2) Change the behavior (as you suggested)
3) Other choice.
On 11/24/2020 12:48 PM, Eric Dumazet wrote:
> I was more thinking about what exact issue you had, and how we can
> reproduce it, and test the fix.

Issue reproduction is very simple:
1) Pick any of the features under ALL_FOR_ALL, like tx-nocache-copy.
2) Turn it on for all slaves.
3) Turn it on for the bond.

You'll still not be able to use it on the bond:

    tx-nocache-copy: off [requested on]

The reason is that the call to netdev_add_tso_features() acts as a
"dummy" slave that has this feature bit cleared, which breaks the
ALL_FOR_ALL logic.

>> I wonder though if netdev_increment_features() is expected to clear
>> features that are not part of the mask.
>
> Well, the 'increment' part was suggesting the function was adding
> flags, not removing them.

Yes, that's confusing... Although ALL_FOR_ALL logic is just about
removing, unlike ONE_FOR_ALL.
On Tue, Nov 24, 2020 at 11:48:35AM +0100, Eric Dumazet wrote:
>
> Well, the 'increment' part was suggesting the function was adding
> flags, not removing them.

The idea of the increment part is that we're adding a constituent
device, not that we're adding features. There have always been
features which were conjunctions, i.e., they must be supported by
all underlying devices for them to be enabled on the virtual device.

Your use of the increment function is unusual, as you're not adding
features that belong to one underlying device, but rather you're
trying to enable a feature on the virtual device unconditionally.

> We might ask Herbert Xu if we :
>
> 1) Need to comment the function, or change its name to be more descriptive.
> 2) Change the behavior (as you suggested)
> 3) Other choice.

I think Tariq's patch is fine, although a comment should be added
to netdev_add_tso_features as this use of the increment function
is nonstandard.

Thanks,
On 11/25/2020 5:25 AM, Herbert Xu wrote:
> I think Tariq's patch is fine, although a comment should be added
> to netdev_add_tso_features as this use of the increment function
> is nonstandard.

Thanks Herbert, I'll add a comment and re-spin.
On Wed, Nov 25, 2020 at 10:06 AM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
> On 11/25/2020 5:25 AM, Herbert Xu wrote:
> > Your use of the increment function is unusual, as you're not adding
> > features that belong to one underlying device, but rather you're
> > trying to enable a feature on the virtual device unconditionally.

This was not the intent.

We can still disable TSO on the bonding device if desired.

    lpk51:~# for i in bond0 eth1 eth2; do ethtool -k $i|grep tcp-segmentation-offload; done
    tcp-segmentation-offload: on
    tcp-segmentation-offload: on
    tcp-segmentation-offload: on
    lpk51:~# ethtool -K bond0 tso off
    Actual changes:
    tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: off
    large-receive-offload: off [requested on]
    lpk51:~# for i in bond0 eth1 eth2; do ethtool -k $i|grep tcp-segmentation-offload; done
    tcp-segmentation-offload: off
    tcp-segmentation-offload: on
    tcp-segmentation-offload: on

The intent was that we could have :

    lpk51:~# ethtool -K bond0 tso on
    Actual changes:
    tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: on
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
    lpk51:~# ethtool -K eth1 tso off
    lpk51:~# ethtool -K eth2 tso off
    lpk51:~# for i in bond0 eth1 eth2; do ethtool -k $i|grep tcp-segmentation-offload; done
    tcp-segmentation-offload: on
    tcp-segmentation-offload: off
    tcp-segmentation-offload: off
    lpk51:~#

> > I think Tariq's patch is fine, although a comment should be added
> > to netdev_add_tso_features as this use of the increment function
> > is nonstandard.
> >
>
> Thanks Herbert, I'll add a comment and re-spin.

I think we should remove the use of netdev_increment_features() and
replace it with something else, because there is too much confusion.
On 11/25/2020 11:27 AM, Eric Dumazet wrote:
> This was not the intent.
>
> We can still disable TSO on the bonding device if desired.
[...]
> The intent was that we could have :
[...]

IIUC, we want to let the bond TSO feature bit be totally independent,
not affected by slaves.

If so, I think that:
First we should take NETIF_F_GSO_SOFTWARE (or just NETIF_F_ALL_TSO) out
of NETIF_F_ONE_FOR_ALL.
Then, make sure it is set in bond_setup (it is already done, as part of
BOND_VLAN_FEATURES).

I think this new logic is good for all other upper devices, which will
be affected by the change in NETIF_F_ONE_FOR_ALL.

> I think we should remove the use of netdev_increment_features() and
> replace it with something else,
> because there is too much confusion.

I think it would be best.
I can prepare the patch I described above if you agree with it.
```diff
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18dec08439f9..a9d5e4bb829b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4748,7 +4748,7 @@ netdev_features_t netdev_increment_features(netdev_features_t all,
 static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
 							netdev_features_t mask)
 {
-	return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
+	return netdev_increment_features(features, NETIF_F_ALL_TSO | NETIF_F_ALL_FOR_ALL, mask);
 }
 
 int __netdev_update_features(struct net_device *dev);
```
Calling netdev_increment_features() on upper/master device from
netdev_add_tso_features() implies unintentional clearance of ALL_FOR_ALL
features supported by all slaves. Fix it by passing ALL_FOR_ALL in
addition to ALL_TSO.

Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

---
 include/linux/netdevice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Hi,

I know that netdev_increment_features() does not set any feature that's
unmasked in the mask argument. I wonder why it can clear them, though.
Was it meant to be like this? If not, then the proper fix should be in
netdev_increment_features(), not in netdev_add_tso_features().