diff mbox series

[net-next] net/sched: cls_flower: fix resetting of ether proto mask

Message ID 20210617161435.8853-1-vadym.kochan@plvision.eu (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net/sched: cls_flower: fix resetting of ether proto mask | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 1 maintainers not CCed: jiri@resnulli.us
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Vadym Kochan June 17, 2021, 4:14 p.m. UTC
In case of matching 'protocol' the rule does not work:

    tc filter add dev $DEV ingress prio 1 protocol $PROTO flower skip_sw action drop

so clear the ether proto mask only for CVLAN case.

The issue was observed by testing on Marvell Prestera Switchdev driver
with recent 'flower' offloading feature.

Fixes: 0dca2c7404a9 ("net/sched: cls_flower: Remove match on n_proto")
CC: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
---
RFC -> PATCH:
    Removed extra empty line.
    
 net/sched/cls_flower.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Vladimir Oltean June 17, 2021, 4:41 p.m. UTC | #1
On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:
> In case of matching 'protocol' the rule does not work:
> 
>     tc filter add dev $DEV ingress prio 1 protocol $PROTO flower skip_sw action drop
> 
> so clear the ether proto mask only for CVLAN case.
> 
> The issue was observed by testing on Marvell Prestera Switchdev driver
> with recent 'flower' offloading feature.
> 
> Fixes: 0dca2c7404a9 ("net/sched: cls_flower: Remove match on n_proto")
> CC: Boris Sukholitko <boris.sukholitko@broadcom.com>
> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> ---

You resent the patch very quickly, did you find out why Boris' patch was
working in the first place?
Vladimir Oltean June 17, 2021, 7:51 p.m. UTC | #2
On Thu, Jun 17, 2021 at 07:41:55PM +0300, Vladimir Oltean wrote:
> On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:
> > In case of matching 'protocol' the rule does not work:
> >
> >     tc filter add dev $DEV ingress prio 1 protocol $PROTO flower skip_sw action drop
> >
> > so clear the ether proto mask only for CVLAN case.
> >
> > The issue was observed by testing on Marvell Prestera Switchdev driver
> > with recent 'flower' offloading feature.
> >
> > Fixes: 0dca2c7404a9 ("net/sched: cls_flower: Remove match on n_proto")
> > CC: Boris Sukholitko <boris.sukholitko@broadcom.com>
> > Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> > ---
>
> You resent the patch very quickly, did you find out why Boris' patch was
> working in the first place?

I guess not.

The question day is, really, "what is skb->protocol?"

Boris said that the main classification routine - __tcf_classify -
already selects the adequate classifier (a struct tcf_proto) from the
classifier chain on the ingress qdisc of the interface. The selected
struct tcf_proto has a protocol field equal to the skb->protocol of the
packet.

So given three filters:

tc qdisc add dev sw1p0 clsact
tc filter add dev sw1p0 ingress handle 10 protocol ipv4 flower src_ip 192.168.1.1 skip_hw action drop
tc filter add dev sw1p0 ingress handle 10 protocol ipv4 flower src_ip 192.168.1.2 skip_hw action drop
tc filter add dev sw1p0 ingress handle 10 protocol ipv4 flower src_ip 192.168.1.3 skip_hw action drop
tc filter add dev sw1p0 ingress handle 10 protocol ipv4 flower src_ip 192.168.1.4 skip_hw action drop
tc filter add dev sw1p0 ingress handle 11 protocol 0x8864 flower skip_hw action drop
tc filter add dev sw1p0 ingress handle 12 protocol 0x8862 flower skip_hw action drop

there will be 3 tcf_proto structures: one for protocol ipv4 (0x0800),
one for 0x8864 and one for 0x8862.

Boris points out that skb_flow_dissect() does not always set the
key->basic.n_proto value to the EtherType, but rather to the innermost
protocol, if the EtherType is for a 'tunneling' protocol like PPPoE.
That's just how the flow dissector works.

But Boris goes on to argue that he wants to match packets with EtherType
0x8864, and the flow dissector is stopping him from doing that, since it
is coded up to look inside the PPPoE session header, and return either
ETH_P_IP or ETH_P_IPV6.

How does Boris solve the problem?
He removes the key->basic.n_proto from the set of keys used by the flow dissector
instantiated by the flower classifier associated with this struct tcf_proto.
Because the protocol is already taken into account by virtue of __tcf_classify()
looking at skb_protocol(skb), it appears to be redundant to give the
flow dissector a chance to have its own shot at what the skb protocol is.

Why does classification by protocol still work?
Because, for example, this filter:

tc filter add dev sw1p0 ingress handle 11 protocol 0x8864 flower skip_hw action drop

will be the only filter of the tcf_proto classifier for protocol 0x8864
which is found by __tcf_classify(). By the time the flower's fl_classify()
is called, the tp_proto for 0x8864 will have only one mask, and despite
the lack of any valid flow dissector keys, fl_mask_lookup() will always
find that one filter.

If there are multiple filters for the same tcf_proto classifier, I guess
it's random which one will match.

What does Boris break?
Offloading, of course. fl_hw_replace_filter() and fl_reoffload() create
a struct flow_cls_offload with a rule->match.mask member derived from
the mask of the software classifier: &f->mask->key - that same mask that
is used for initializing the flow dissector keys, and the one from which
Boris removed the basic.n_proto member because it was bothering him.


Now, having understood the problem, it is clear that Boris's patch needs
more work to not break offloading. If we decide that Boris's approach is
good and the classifier protocol does mean something close to the
EtherType (although again, still not always the EtherType, see below),
then we probably need to keep a different fl_flow_mask/fl_flow_key
structure for passing to the flow dissector compared to the
fl_flow_mask/fl_flow_key we pass to the offloading driver.

However, I think we need to take a step back and see how the situation
should be dealt with.

I tried to see what does "man tc.8" say about the "protocol" field, and
of course, it says nothing - I guess it's just too obvious to mention.
But the tc classifiers use skb->protocol, not the EtherType, and
skb->protocol is derived from eth_type_trans(), which in itself does
some really magik tricks. For example, packets received on a DSA master
will have a skb->protocol of 0x00F8, regardless of the EtherType of the
actual packet. DSA then registers a ptype_handler for this magik value,
and that's how it sniffs and processes those packets.  But no one really
expected this information to leak to user space in this way, I mean if
you add a software filter on a DSA master it will need to be for
protocol 0xf8, but that isn't documented anywhere unless you read the
code (and of course, offloaded filters behave totally differently
because they have no idea of eth_type_trans and its tricks).
Then, DSA has adjustment code in the flow dissector again such that the
DSA header, if present, is transparent and just skipped over - revealing
the basic.n_proto value behind that (ETH_P_IP or whatever) and other
headers. So.. you can match on IP headers using tc on a DSA master?
Well, no, because the protocol is 0xf8, not ipv4, so the tc user space
parser won't allow you to set IPv4 keys for the flower classifier :)

But maybe we decide that no, we really need an unadulterated EtherType
to match in tc-flower. Jamal considered that the generic tc filter protocol
is good enough for that, but it clearly isn't. The trouble is - even
if we make the tc program continue to pass the TCA_FLOWER_KEY_ETH_TYPE
netlink attribute as a potentially different value compared to the
filter protocol (right now they are kept in sync by user space) - the
flow dissector will not give us the EtherType in basic.n_proto, but
rather the innermost protocol in the case of tunnelling protocols.
So maybe it is the flow dissector we need to fix, to make it give us an
additional pure EtherType if asked for, make tc-flower use that
dissector key instead, and then revert Jamal's user space patch, and we
should all install our tc filters as:

tc filter add dev sw1p0 ingress handle 11 protocol all flower eth_type 0x8864 skip_hw action drop

?

Or maybe just be like you, say I don't care about any of that, I just
want it to behave as before, and simply revert Boris's patch. Ok, maybe
for various reasons we decide to go that route, for example if we can't
decide on anything better, and there is an obvious regression being
introduced - broken offloading. But at the very least, your revert needs
work too. Please use straight "git revert -s 0dca2c7404a9" and don't
just do partial reverts. I don't know whether intentionally or not, but
you left the C-VLAN case broken - the mask->key.basic.n_proto is still
being set to zero, despite there being information in the netlink
attribute.

In any case, please do a better job of describing the change you are
making and why you are making it this way.
Boris Sukholitko June 21, 2021, 8:32 a.m. UTC | #3
On Thu, Jun 17, 2021 at 10:51:02PM +0300, Vladimir Oltean wrote:
> On Thu, Jun 17, 2021 at 07:41:55PM +0300, Vladimir Oltean wrote:
> > On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:

[snip excellent problem analysis]

> So maybe it is the flow dissector we need to fix, to make it give us an
> additional pure EtherType if asked for, make tc-flower use that
> dissector key instead, and then revert Jamal's user space patch, and we
> should all install our tc filters as:
> 
> tc filter add dev sw1p0 ingress handle 11 protocol all flower eth_type 0x8864 skip_hw action drop
> 
> ?

I like this solution. To be more explicit, the plan becomes:

1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
2. Have skb flow dissector use it.
3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
   anymore. cls_flower takes basic.n_proto from struct tcf_proto.
4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.

IMHO this neatly solves non-vlan protocol match case.

What should we do with the VLANs then? Should we have vlan_pure_ethtype
and cvlan_pure_ethtype as additional keys?

> 
> Or maybe just be like you, say I don't care about any of that, I just
> want it to behave as before, and simply revert Boris's patch. Ok, maybe

FTR I fully support reverting the patch. Please accept my apologies for
breaking the HW offload and big thanks to Vadym for finding it.

I will send the revert shortly.

Thanks,
Boris.
Vladimir Oltean June 21, 2021, 10:09 a.m. UTC | #4
On Mon, Jun 21, 2021 at 11:32:27AM +0300, Boris Sukholitko wrote:
> On Thu, Jun 17, 2021 at 10:51:02PM +0300, Vladimir Oltean wrote:
> > On Thu, Jun 17, 2021 at 07:41:55PM +0300, Vladimir Oltean wrote:
> > > On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:
> 
> [snip excellent problem analysis]
> 
> > So maybe it is the flow dissector we need to fix, to make it give us an
> > additional pure EtherType if asked for, make tc-flower use that
> > dissector key instead, and then revert Jamal's user space patch, and we
> > should all install our tc filters as:
> > 
> > tc filter add dev sw1p0 ingress handle 11 protocol all flower eth_type 0x8864 skip_hw action drop
> > 
> > ?
> 
> I like this solution. To be more explicit, the plan becomes:
> 
> 1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
> 2. Have skb flow dissector use it.
> 3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
>    anymore. cls_flower takes basic.n_proto from struct tcf_proto.
> 4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
> 5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.
> 
> IMHO this neatly solves non-vlan protocol match case.
> 
> What should we do with the VLANs then? Should we have vlan_pure_ethtype
> and cvlan_pure_ethtype as additional keys?

Yeah, I don't know about the "_pure_" part (the current name of the
options in tc user space seems fine), but the flow dissector should have
some parsing keys for the C-VLAN and S-VLAN EthType too, since the
FLOW_DISSECTOR_KEY_ETH_TYPE should match on, well, the EtherType.

> > 
> > Or maybe just be like you, say I don't care about any of that, I just
> > want it to behave as before, and simply revert Boris's patch. Ok, maybe
> 
> FTR I fully support reverting the patch. Please accept my apologies for
> breaking the HW offload and big thanks to Vadym for finding it.
> 
> I will send the revert shortly.
> 
> Thanks,
> Boris.

Thanks.

Please note that I haven't used tc for long enough to know what changes
are for its own good, so there is still place for expert feedback from
the maintainers, but this solution seems common sense to me.
Jamal Hadi Salim June 21, 2021, 2:04 p.m. UTC | #5
On 2021-06-21 4:32 a.m., Boris Sukholitko wrote:
> On Thu, Jun 17, 2021 at 10:51:02PM +0300, Vladimir Oltean wrote:
>> On Thu, Jun 17, 2021 at 07:41:55PM +0300, Vladimir Oltean wrote:
>>> On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:
> 
> [snip excellent problem analysis]
> 
>> So maybe it is the flow dissector we need to fix, to make it give us an
>> additional pure EtherType if asked for, make tc-flower use that
>> dissector key instead, and then revert Jamal's user space patch, and we
>> should all install our tc filters as:
>>
>> tc filter add dev sw1p0 ingress handle 11 protocol all flower eth_type 0x8864 skip_hw action drop
>>
>> ?
> 
> I like this solution. To be more explicit, the plan becomes:
> 
> 1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
> 2. Have skb flow dissector use it.
> 3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
>     anymore. cls_flower takes basic.n_proto from struct tcf_proto.
> 4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
> 5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.
> 
> IMHO this neatly solves non-vlan protocol match case.
> 
> What should we do with the VLANs then? Should we have vlan_pure_ethtype
> and cvlan_pure_ethtype as additional keys?
> 

I didnt see the original patch you sent until after it was applied
and the cursory 30 second glance didnt say much to me.

Vlans unfortunately are a speacial beast: You will have to retrieve
the proto differently.

Q: Was this always broken? Example look at Toke's change here:
commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4

cheers,
jamal
Boris Sukholitko June 22, 2021, 1:13 p.m. UTC | #6
Hi Jamal,

On Mon, Jun 21, 2021 at 10:04:41AM -0400, Jamal Hadi Salim wrote:
> On 2021-06-21 4:32 a.m., Boris Sukholitko wrote:
> > On Thu, Jun 17, 2021 at 10:51:02PM +0300, Vladimir Oltean wrote:
> > > On Thu, Jun 17, 2021 at 07:41:55PM +0300, Vladimir Oltean wrote:
> > > > On Thu, Jun 17, 2021 at 07:14:35PM +0300, Vadym Kochan wrote:
> > 
> > [snip excellent problem analysis]
> > 
> > > So maybe it is the flow dissector we need to fix, to make it give us an
> > > additional pure EtherType if asked for, make tc-flower use that
> > > dissector key instead, and then revert Jamal's user space patch, and we
> > > should all install our tc filters as:
> > > 
> > > tc filter add dev sw1p0 ingress handle 11 protocol all flower eth_type 0x8864 skip_hw action drop
> > > 
> > > ?
> > 
> > I like this solution. To be more explicit, the plan becomes:
> > 
> > 1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
> > 2. Have skb flow dissector use it.
> > 3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
> >     anymore. cls_flower takes basic.n_proto from struct tcf_proto.
> > 4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
> > 5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.
> > 
> > IMHO this neatly solves non-vlan protocol match case.
> > 
> > What should we do with the VLANs then? Should we have vlan_pure_ethtype
> > and cvlan_pure_ethtype as additional keys?
> > 
> 
> I didnt see the original patch you sent until after it was applied
> and the cursory 30 second glance didnt say much to me.
> 
> Vlans unfortunately are a speacial beast: You will have to retrieve
> the proto differently.

Do you by any chance have some naming suggestion? Does
vlan_pure_ethtype sound ok? What about vlan_{orig, pkt, raw, hdr}_ethtype?

> 
> Q: Was this always broken? Example look at Toke's change here:
> commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4
> 

IMHO we've always had this problem. I did some archeology on this
code and didn't see anything which might have caused the bug.

Toke's change doesn't look related because in fl_classify it does:

	skb_key.basic.n_proto = skb_protocol(skb, false);

before running the dissector. In case of a known tunnelling protocol (such
as ETH_P_PPP_SES) the n_proto will be overriden by __skb_flow_dissect.

Thanks,
Boris.

> cheers,
> jamal
Jamal Hadi Salim June 22, 2021, 2:17 p.m. UTC | #7
Hi Boris,

On 2021-06-22 9:13 a.m., Boris Sukholitko wrote:
> Hi Jamal,
> 
> On Mon, Jun 21, 2021 at 10:04:41AM -0400, Jamal Hadi Salim wrote:
>> On 2021-06-21 4:32 a.m., Boris Sukholitko wrote:

[..]
>>> I like this solution. To be more explicit, the plan becomes:
>>>
>>> 1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
>>> 2. Have skb flow dissector use it.
>>> 3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
>>>      anymore. cls_flower takes basic.n_proto from struct tcf_proto.
>>> 4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
>>> 5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.
>>>
>>> IMHO this neatly solves non-vlan protocol match case.
>>>
>>> What should we do with the VLANs then? Should we have vlan_pure_ethtype
>>> and cvlan_pure_ethtype as additional keys?
>>>
>>
>> I didnt see the original patch you sent until after it was applied
>> and the cursory 30 second glance didnt say much to me.
>>
>> Vlans unfortunately are a speacial beast: You will have to retrieve
>> the proto differently.
> 
> Do you by any chance have some naming suggestion? Does
> vlan_pure_ethtype sound ok? What about vlan_{orig, pkt, raw, hdr}_ethtype?
> 

The distinction is in getting the inner vs outer proto, correct?

Before we go there let me push back a little since no other
classifier has this problem. IIUC:
For the hardware offload current scheme works fine. On the
non-offload side, the issue seems to be that classify() was
not getting the proper protocol?

I pointed to Toke's change since it tries to generalize the
solution.  you'd use that approach
(eg setting to true).

Would that solve the issue?

If not maybe we need a naming convention for inner vs out proto.
Should be noted that user space semantics require that the "current
protocol" be specified in the policy. i.e if you have an inner
protocol and you need both looked at then you would have two rules
like this:

1) tc filter ... protocol outer .... action-to-strip-outer-header \
action reclassify
2) tc filter ... protoco inner ... action blah

The action-to-strip-outer-header will set properly the skb->proto
after moving the data pointers so that #2 will match it.

>>
>> Q: Was this always broken? Example look at Toke's change here:
>> commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4
>>
> 
> IMHO we've always had this problem. I did some archeology on this
> code and didn't see anything which might have caused the bug.
> 

Suprised it took this long to find out.

> Toke's change doesn't look related because in fl_classify it does:
> 
> 	skb_key.basic.n_proto = skb_protocol(skb, false);
> 
> before running the dissector. In case of a known tunnelling protocol (such
> as ETH_P_PPP_SES) the n_proto will be overriden by __skb_flow_dissect.
> 

Toke's change may have left things as they were before
but generally to get vlan protos you'd do things differently.

This is because when vlan offloading was merged it skewed
things a little and we had to live with that.

Flower is unique in its use of the dissector which other classifiers
dont. Is this resolvable by having the fix in the  dissector?

cheers,
jamal
Boris Sukholitko June 22, 2021, 3:22 p.m. UTC | #8
On Tue, Jun 22, 2021 at 10:17:45AM -0400, Jamal Hadi Salim wrote:
> Hi Boris,
> 
> On 2021-06-22 9:13 a.m., Boris Sukholitko wrote:
> > Hi Jamal,
> > 
> > On Mon, Jun 21, 2021 at 10:04:41AM -0400, Jamal Hadi Salim wrote:
> > > On 2021-06-21 4:32 a.m., Boris Sukholitko wrote:
> 
> [..]
> > > > I like this solution. To be more explicit, the plan becomes:
> > > > 
> > > > 1. Add FLOW_DISSECTOR_KEY_ETH_TYPE and struct flow_dissector_key_eth_type.
> > > > 2. Have skb flow dissector use it.
> > > > 3. Userspace does not set TCA_FLOWER_KEY_ETH_TYPE automagically
> > > >      anymore. cls_flower takes basic.n_proto from struct tcf_proto.
> > > > 4. Add eth_type to the userspace and use it to set TCA_FLOWER_KEY_ETH_TYPE
> > > > 5. Existence of TCA_FLOWER_KEY_ETH_TYPE triggers new eth_type dissector.
> > > > 
> > > > IMHO this neatly solves non-vlan protocol match case.
> > > > 
> > > > What should we do with the VLANs then? Should we have vlan_pure_ethtype
> > > > and cvlan_pure_ethtype as additional keys?
> > > > 
> > > 
> > > I didnt see the original patch you sent until after it was applied
> > > and the cursory 30 second glance didnt say much to me.
> > > 
> > > Vlans unfortunately are a speacial beast: You will have to retrieve
> > > the proto differently.
> > 
> > Do you by any chance have some naming suggestion? Does
> > vlan_pure_ethtype sound ok? What about vlan_{orig, pkt, raw, hdr}_ethtype?
> > 
> 
> The distinction is in getting the inner vs outer proto, correct?

Yes. To be more explicit: the outer protocol (ETH_P_PPP_SES in this case) is
invisible to the user due to __skb_flow_dissect drilling down
to find the inner protocol.

> 
> Before we go there let me push back a little since no other
> classifier has this problem. IIUC:
> For the hardware offload current scheme works fine. On the
> non-offload side, the issue seems to be that classify() was
> not getting the proper protocol?

Yes. Talking specifically about flower's fl_classify and the following
rule (0x8864 is ETH_P_PPP_SES):

tc filter add dev eth0 ingress protocol 0x8864 flower action simple sdata hi6

skb_flow_dissect sets skb_key.basic.n_proto to the inner protocol
contained inside the PPP tunnel. fl_mask_lookup will fail finding the
outer protocol configured by the user.

> 
> I pointed to Toke's change since it tries to generalize the
> solution.  you'd use that approach
> (eg setting to true).
> 
> Would that solve the issue?
> 

I don't see how it might help ...

> If not maybe we need a naming convention for inner vs out proto.
> Should be noted that user space semantics require that the "current
> protocol" be specified in the policy. i.e if you have an inner
> protocol and you need both looked at then you would have two rules
> like this:
> 
> 1) tc filter ... protocol outer .... action-to-strip-outer-header \
> action reclassify
> 2) tc filter ... protoco inner ... action blah
> 
> The action-to-strip-outer-header will set properly the skb->proto
> after moving the data pointers so that #2 will match it.

It looks to me that there is no way to match on outer protocol such as
ETH_P_PPP_SES at least in flower. Although other filters (e.g. matchall)
will work, VLAN packets containing ETH_P_PPP_SES will require flower and
still will not match.

> 
> > > 
> > > Q: Was this always broken? Example look at Toke's change here:
> > > commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4
> > > 
> > 
> > IMHO we've always had this problem. I did some archeology on this
> > code and didn't see anything which might have caused the bug.
> > 
> 
> Suprised it took this long to find out.
> 
> > Toke's change doesn't look related because in fl_classify it does:
> > 
> > 	skb_key.basic.n_proto = skb_protocol(skb, false);
> > 
> > before running the dissector. In case of a known tunnelling protocol (such
> > as ETH_P_PPP_SES) the n_proto will be overriden by __skb_flow_dissect.
> > 
> 
> Toke's change may have left things as they were before
> but generally to get vlan protos you'd do things differently.
> 
> This is because when vlan offloading was merged it skewed
> things a little and we had to live with that.
> 
> Flower is unique in its use of the dissector which other classifiers
> dont. Is this resolvable by having the fix in the  dissector?

Yes, the solution suggested by Vladimir and elaborated by myself
involves extending the dissector to keep the outer protocol and having
flower eth_type match on it. This is the "plan" being quoted above.

I believe this is the solution for the non-vlan tagged traffic. For the
vlans we already have [c]vlan_ethtype keys taken. Therefore we'll need
new [c]vlan_outer_ethtype keys.

Thanks,
Boris.

> 
> cheers,
> jamal
>
Jamal Hadi Salim June 24, 2021, 8:07 p.m. UTC | #9
Hi Boris,

Apologies for the latency.

On 2021-06-22 11:22 a.m., Boris Sukholitko wrote:
> On Tue, Jun 22, 2021 at 10:17:45AM -0400, Jamal Hadi Salim wrote:

[..]

>>> Do you by any chance have some naming suggestion? Does
>>> vlan_pure_ethtype sound ok? What about vlan_{orig, pkt, raw, hdr}_ethtype?
>>>
>>
>> The distinction is in getting the inner vs outer proto, correct?
> 
> Yes. To be more explicit: the outer protocol (ETH_P_PPP_SES in this case) is
> invisible to the user due to __skb_flow_dissect drilling down
> to find the inner protocol.

Ok, seems this is going to be problematic for flower for more than
just ETH_P_PPP_SES, no? i.e anything that has an inner proto.
IIUC, basically what you end up seeing in fl_classify() is
the PPP protocol that is extracted by the dissector?

> Yes. Talking specifically about flower's fl_classify and the following
> rule (0x8864 is ETH_P_PPP_SES):
> 
> tc filter add dev eth0 ingress protocol 0x8864 flower action simple sdata hi6
> 
> skb_flow_dissect sets skb_key.basic.n_proto to the inner protocol
> contained inside the PPP tunnel. fl_mask_lookup will fail finding the
> outer protocol configured by the user.
> 

For vlans it seems that flower tries to "rectify" the situation
with skb_protocol() (that why i pointed to that function) - but the
situation in this case cant be rectified inside fl_classify().

Just quick glance of the dissector code though seems to indicate
skb->protocol is untouched, no? i.e if you have a simple pppoe with
ppp protocol == ipv4, skb->protocol should still be 0x8864 on it
(even when skb_key.basic.n_proto has ipv4).


> It looks to me that there is no way to match on outer protocol such as
> ETH_P_PPP_SES at least in flower. Although other filters (e.g. matchall)
> will work, VLAN packets containing ETH_P_PPP_SES will require flower and
> still will not match.

This is a consequence of flower using flow_dissector and flow
dissector loosing information..

>> This is because when vlan offloading was merged it skewed
>> things a little and we had to live with that.
>>
>> Flower is unique in its use of the dissector which other classifiers
>> dont. Is this resolvable by having the fix in the  dissector?
> 
> Yes, the solution suggested by Vladimir and elaborated by myself
> involves extending the dissector to keep the outer protocol and having
> flower eth_type match on it. This is the "plan" being quoted above.
> 
 >
> I believe this is the solution for the non-vlan tagged traffic. For the
> vlans we already have [c]vlan_ethtype keys taken. Therefore we'll need
> new [c]vlan_outer_ethtype keys.
> 

I think that would work in the short term but what happens when you
have vlan in vlan carrying pppoe? i.e how much max nesting can you
assume and what would you call these fields?

cheers,
jamal
Boris Sukholitko June 28, 2021, 6:24 a.m. UTC | #10
Hi Jamal,

On Thu, Jun 24, 2021 at 04:07:56PM -0400, Jamal Hadi Salim wrote:
> Hi Boris,
> 
> Apologies for the latency.
> 

Apparently mine is not great either. :)

> On 2021-06-22 11:22 a.m., Boris Sukholitko wrote:
> > On Tue, Jun 22, 2021 at 10:17:45AM -0400, Jamal Hadi Salim wrote:
> 
> [..]
> 
> > > > Do you by any chance have some naming suggestion? Does
> > > > vlan_pure_ethtype sound ok? What about vlan_{orig, pkt, raw, hdr}_ethtype?
> > > > 
> > > 
> > > The distinction is in getting the inner vs outer proto, correct?
> > 
> > Yes. To be more explicit: the outer protocol (ETH_P_PPP_SES in this case) is
> > invisible to the user due to __skb_flow_dissect drilling down
> > to find the inner protocol.
> 
> Ok, seems this is going to be problematic for flower for more than
> just ETH_P_PPP_SES, no? i.e anything that has an inner proto.
> IIUC, basically what you end up seeing in fl_classify() is
> the PPP protocol that is extracted by the dissector?
> 

Yes. It looks like the problem for all of tunnel protocols.

> > Yes. Talking specifically about flower's fl_classify and the following
> > rule (0x8864 is ETH_P_PPP_SES):
> > 
> > tc filter add dev eth0 ingress protocol 0x8864 flower action simple sdata hi6
> > 
> > skb_flow_dissect sets skb_key.basic.n_proto to the inner protocol
> > contained inside the PPP tunnel. fl_mask_lookup will fail finding the
> > outer protocol configured by the user.
> > 
> 
> For vlans it seems that flower tries to "rectify" the situation
> with skb_protocol() (that why i pointed to that function) - but the
> situation in this case cant be rectified inside fl_classify().
> 
> Just quick glance of the dissector code though seems to indicate
> skb->protocol is untouched, no? i.e if you have a simple pppoe with
> ppp protocol == ipv4, skb->protocol should still be 0x8864 on it
> (even when skb_key.basic.n_proto has ipv4).
> 

The skb->protocol is probably untouched. Unfortunately I don't see
how this may help us... 

> 
> > It looks to me that there is no way to match on outer protocol such as
> > ETH_P_PPP_SES at least in flower. Although other filters (e.g. matchall)
> > will work, VLAN packets containing ETH_P_PPP_SES will require flower and
> > still will not match.
> 
> This is a consequence of flower using flow_dissector and flow
> dissector loosing information..
> 

Yes.

> > > This is because when vlan offloading was merged it skewed
> > > things a little and we had to live with that.
> > > 
> > > Flower is unique in its use of the dissector which other classifiers
> > > dont. Is this resolvable by having the fix in the  dissector?
> > 
> > Yes, the solution suggested by Vladimir and elaborated by myself
> > involves extending the dissector to keep the outer protocol and having
> > flower eth_type match on it. This is the "plan" being quoted above.
> > 
> >
> > I believe this is the solution for the non-vlan tagged traffic. For the
> > vlans we already have [c]vlan_ethtype keys taken. Therefore we'll need
> > new [c]vlan_outer_ethtype keys.
> > 
> 
> I think that would work in the short term but what happens when you
> have vlan in vlan carrying pppoe? i.e how much max nesting can you
> assume and what would you call these fields?
> 

Currently flower matches on 2 level of vlans: e.g. it has vlan_prio for
the outer vlan and cvlan_prio for the inner vlan. I have no ambition to
go beyond that. Thus only two keys will be needed: vlan_outer_ethtype
and cvlan_outer_ethtype, I think...

In your vlan in vlan ppoe example, I hope that
cvlan_outer_ethtype == ETH_P_PPP_SES will match.

Thanks,
Boris.

> cheers,
> jamal
>
diff mbox series

Patch

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2e704c7a105a..3e1b0012bce4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1534,10 +1534,12 @@  static int fl_set_key(struct net *net, struct nlattr **tb,
 					mask->basic.n_proto = cpu_to_be16(0);
 				} else {
 					key->basic.n_proto = ethertype;
+					mask->basic.n_proto = cpu_to_be16(~0);
 				}
 			}
 		} else {
 			key->basic.n_proto = ethertype;
+			mask->basic.n_proto = cpu_to_be16(~0);
 		}
 	}