diff mbox series

[net-next] net/sched: cls_flower: Add orig_ethtype

Message ID 20210830080800.18591-1-boris.sukholitko@broadcom.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net/sched: cls_flower: Add orig_ethtype | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 7 maintainers not CCed: wangqing@vivo.com eranbe@nvidia.com wenxu@ucloud.cn alobakin@pm.me vladimir.oltean@nxp.com gustavoars@kernel.org zhangkaiheb@126.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 6248 this patch: 6248
netdev/kdoc success Errors and warnings before: 7 this patch: 7
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 86 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 6310 this patch: 6310
netdev/header_inline success Link

Commit Message

Boris Sukholitko Aug. 30, 2021, 8:08 a.m. UTC
The following flower filter fails to match packets:

tc filter add dev eth0 ingress protocol 0x8864 flower \
    action simple sdata hi64

The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
being dissected by __skb_flow_dissect and it's internal protocol is
being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
tunnel is transparent to the callers of __skb_flow_dissect.

OTOH, in the filters above, cls_flower configures its key->basic.n_proto
to the ETH_P_PPP_SES value configured by the user. Matching on this key
fails because of __skb_flow_dissect "transparency" mentioned above.

Therefore there is no way currently to match on such packets using
flower.

To fix the issue add new orig_ethtype key to the flower along with the
necessary changes to the flow dissector etc.

To filter the ETH_P_PPP_SES packets the command becomes:

tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
    action simple sdata hi64

Corresponding iproute2 patch follows.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
---
 include/net/flow_dissector.h |  9 +++++++++
 include/uapi/linux/pkt_cls.h |  1 +
 net/core/flow_dissector.c    | 12 ++++++++++++
 net/sched/cls_flower.c       | 15 +++++++++++++++
 4 files changed, 37 insertions(+)

Comments

Vladimir Oltean Aug. 30, 2021, 9 a.m. UTC | #1
Hi Boris,

On Mon, Aug 30, 2021 at 11:08:00AM +0300, Boris Sukholitko wrote:
> The following flower filter fails to match packets:
>
> tc filter add dev eth0 ingress protocol 0x8864 flower \
>     action simple sdata hi64
>
> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
> being dissected by __skb_flow_dissect and it's internal protocol is
> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
> tunnel is transparent to the callers of __skb_flow_dissect.
>
> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
> to the ETH_P_PPP_SES value configured by the user. Matching on this key
> fails because of __skb_flow_dissect "transparency" mentioned above.
>
> Therefore there is no way currently to match on such packets using
> flower.
>
> To fix the issue add new orig_ethtype key to the flower along with the
> necessary changes to the flow dissector etc.
>
> To filter the ETH_P_PPP_SES packets the command becomes:
>
> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
>     action simple sdata hi64
>
> Corresponding iproute2 patch follows.
>
> Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
> ---

It is very good that you've followed up this discussion with a patch:
https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/

I don't seem to see, however, in that discussion, what was the reasoning
that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
opposed to using TCA_FLOWER_KEY_ORIG_ETH_TYPE?

Can you explain in English what is the objective meaning of
TCA_FLOWER_KEY_ORIG_ETH_TYPE, other than "what I need to solve my problem"?
Maybe an entry in the man page section in your iproute2 patch?

How will the VLAN case be dealt with? What is the current status quo on
vlan_ethtype, will a tc-flower key of "vlan_ethtype $((ETH_P_PPP_SES))"
match a VLAN-tagged PPP session packet or not, will the flow dissector
still drill deep inside the packet? I guess this is the reason why you
introduced another variant of the ETH_TYPE netlink attribute, to be
symmetric with what could be done for VLAN? But I don't see VLAN changes?
Vladimir Oltean Aug. 30, 2021, 9:04 a.m. UTC | #2
On Mon, Aug 30, 2021 at 12:00:03PM +0300, Vladimir Oltean wrote:
> I don't seem to see, however, in that discussion, what was the reasoning
> that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
> opposed to using TCA_FLOWER_KEY_ORIG_ETH_TYPE?

Typo, of course, I meant "as opposed to TCA_FLOWER_KEY_ETH_TYPE".
Boris Sukholitko Aug. 30, 2021, 9:18 a.m. UTC | #3
Hi Vladimir,

On Mon, Aug 30, 2021 at 12:00:03PM +0300, Vladimir Oltean wrote:
[snip]
> 
> It is very good that you've followed up this discussion with a patch:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/
> 
> I don't seem to see, however, in that discussion, what was the reasoning
> that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
> opposed to using TCA_FLOWER_KEY_ETH_TYPE?

While trying to implement the plan from:

https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/#24263965

I've came upon the conclusion that it is better to make new orig_ethtype key
rather than reusing TCA_FLOWER_KEY_ETH_TYPE name. The changes I've
proposed there seem of a dubious value now. IMHO, of course :)

> 
> Can you explain in English what is the objective meaning of
> TCA_FLOWER_KEY_ORIG_ETH_TYPE, other than "what I need to solve my problem"?

The orig part in the name means that the match is done on the original
protocol field of the packet, before dissector manipulation.

> Maybe an entry in the man page section in your iproute2 patch?

Yes, sure, good catch! I'll send V2 of the iproute2 patch shortly.

> 
> How will the VLAN case be dealt with? What is the current status quo on
> vlan_ethtype, will a tc-flower key of "vlan_ethtype $((ETH_P_PPP_SES))"
> match a VLAN-tagged PPP session packet or not, will the flow dissector
> still drill deep inside the packet? I guess this is the reason why you
> introduced another variant of the ETH_TYPE netlink attribute, to be
> symmetric with what could be done for VLAN? But I don't see VLAN changes?

For VLAN, I intend to add [c]vlan_orig_ethtype keys. I intend to send those
(to-be-written :)) patches separately.

Thanks,
Boris.
Vladimir Oltean Aug. 30, 2021, 9:21 a.m. UTC | #4
On Mon, Aug 30, 2021 at 12:18:13PM +0300, Boris Sukholitko wrote:
> Hi Vladimir,
>
> On Mon, Aug 30, 2021 at 12:00:03PM +0300, Vladimir Oltean wrote:
> [snip]
> >
> > It is very good that you've followed up this discussion with a patch:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/
> >
> > I don't seem to see, however, in that discussion, what was the reasoning
> > that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
> > opposed to using TCA_FLOWER_KEY_ETH_TYPE?
>
> While trying to implement the plan from:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/#24263965
>
> I've came upon the conclusion that it is better to make new orig_ethtype key
> rather than reusing TCA_FLOWER_KEY_ETH_TYPE name. The changes I've
> proposed there seem of a dubious value now. IMHO, of course :)
>
> >
> > Can you explain in English what is the objective meaning of
> > TCA_FLOWER_KEY_ORIG_ETH_TYPE, other than "what I need to solve my problem"?
>
> The orig part in the name means that the match is done on the original
> protocol field of the packet, before dissector manipulation.
>
> > Maybe an entry in the man page section in your iproute2 patch?
>
> Yes, sure, good catch! I'll send V2 of the iproute2 patch shortly.
>
> >
> > How will the VLAN case be dealt with? What is the current status quo on
> > vlan_ethtype, will a tc-flower key of "vlan_ethtype $((ETH_P_PPP_SES))"
> > match a VLAN-tagged PPP session packet or not, will the flow dissector
> > still drill deep inside the packet? I guess this is the reason why you
> > introduced another variant of the ETH_TYPE netlink attribute, to be
> > symmetric with what could be done for VLAN? But I don't see VLAN changes?
>
> For VLAN, I intend to add [c]vlan_orig_ethtype keys. I intend to send those
> (to-be-written :)) patches separately.

Wait a minute, don't hurry! We haven't even discussed offloading.
So if I am writing a driver which offloads tc-flower, do I match on
ETH_TYPE or on ORIG_ETH_TYPE? To me, the EtherType is, well, the EtherType...
Boris Sukholitko Aug. 30, 2021, 9:42 a.m. UTC | #5
On Mon, Aug 30, 2021 at 12:21:28PM +0300, Vladimir Oltean wrote:
> On Mon, Aug 30, 2021 at 12:18:13PM +0300, Boris Sukholitko wrote:
> > Hi Vladimir,
> >
> > On Mon, Aug 30, 2021 at 12:00:03PM +0300, Vladimir Oltean wrote:
> > [snip]
> > >
> > > It is very good that you've followed up this discussion with a patch:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/
> > >
> > > I don't seem to see, however, in that discussion, what was the reasoning
> > > that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
> > > opposed to using TCA_FLOWER_KEY_ETH_TYPE?
> >
> > While trying to implement the plan from:
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/#24263965
> >
> > I've came upon the conclusion that it is better to make new orig_ethtype key
> > rather than reusing TCA_FLOWER_KEY_ETH_TYPE name. The changes I've
> > proposed there seem of a dubious value now. IMHO, of course :)
> >
> > >
> > > Can you explain in English what is the objective meaning of
> > > TCA_FLOWER_KEY_ORIG_ETH_TYPE, other than "what I need to solve my problem"?
> >
> > The orig part in the name means that the match is done on the original
> > protocol field of the packet, before dissector manipulation.
> >
> > > Maybe an entry in the man page section in your iproute2 patch?
> >
> > Yes, sure, good catch! I'll send V2 of the iproute2 patch shortly.
> >
> > >
> > > How will the VLAN case be dealt with? What is the current status quo on
> > > vlan_ethtype, will a tc-flower key of "vlan_ethtype $((ETH_P_PPP_SES))"
> > > match a VLAN-tagged PPP session packet or not, will the flow dissector
> > > still drill deep inside the packet? I guess this is the reason why you
> > > introduced another variant of the ETH_TYPE netlink attribute, to be
> > > symmetric with what could be done for VLAN? But I don't see VLAN changes?
> >
> > For VLAN, I intend to add [c]vlan_orig_ethtype keys. I intend to send those
> > (to-be-written :)) patches separately.
> 
> Wait a minute, don't hurry! We haven't even discussed offloading.
> So if I am writing a driver which offloads tc-flower, do I match on
> ETH_TYPE or on ORIG_ETH_TYPE? To me, the EtherType is, well, the EtherType...

AFAIK, the offloads are using basic.n_proto key now. This means matching
on the innermost protocol (i.e. after stripping various tunnels, vlan
etc.). Notice, how the offload driver has no access to the original
'protocol' setting.

ORIG_ETH_TYPE if given, asks to match on the original protocol as it
appears in the unmodified packet. This gives the offload driver writers
ability to match on it if the need arises.

Thanks,
Boris.
Vladimir Oltean Aug. 30, 2021, 10:13 a.m. UTC | #6
On Mon, Aug 30, 2021 at 12:42:40PM +0300, Boris Sukholitko wrote:
> On Mon, Aug 30, 2021 at 12:21:28PM +0300, Vladimir Oltean wrote:
> > On Mon, Aug 30, 2021 at 12:18:13PM +0300, Boris Sukholitko wrote:
> > > Hi Vladimir,
> > >
> > > On Mon, Aug 30, 2021 at 12:00:03PM +0300, Vladimir Oltean wrote:
> > > [snip]
> > > >
> > > > It is very good that you've followed up this discussion with a patch:
> > > > https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/
> > > >
> > > > I don't seem to see, however, in that discussion, what was the reasoning
> > > > that led to the introduction of a new TCA_FLOWER_KEY_ORIG_ETH_TYPE as
> > > > opposed to using TCA_FLOWER_KEY_ETH_TYPE?
> > >
> > > While trying to implement the plan from:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210617161435.8853-1-vadym.kochan@plvision.eu/#24263965
> > >
> > > I've came upon the conclusion that it is better to make new orig_ethtype key
> > > rather than reusing TCA_FLOWER_KEY_ETH_TYPE name. The changes I've
> > > proposed there seem of a dubious value now. IMHO, of course :)
> > >
> > > >
> > > > Can you explain in English what is the objective meaning of
> > > > TCA_FLOWER_KEY_ORIG_ETH_TYPE, other than "what I need to solve my problem"?
> > >
> > > The orig part in the name means that the match is done on the original
> > > protocol field of the packet, before dissector manipulation.
> > >
> > > > Maybe an entry in the man page section in your iproute2 patch?
> > >
> > > Yes, sure, good catch! I'll send V2 of the iproute2 patch shortly.
> > >
> > > >
> > > > How will the VLAN case be dealt with? What is the current status quo on
> > > > vlan_ethtype, will a tc-flower key of "vlan_ethtype $((ETH_P_PPP_SES))"
> > > > match a VLAN-tagged PPP session packet or not, will the flow dissector
> > > > still drill deep inside the packet? I guess this is the reason why you
> > > > introduced another variant of the ETH_TYPE netlink attribute, to be
> > > > symmetric with what could be done for VLAN? But I don't see VLAN changes?
> > >
> > > For VLAN, I intend to add [c]vlan_orig_ethtype keys. I intend to send those
> > > (to-be-written :)) patches separately.
> >
> > Wait a minute, don't hurry! We haven't even discussed offloading.
> > So if I am writing a driver which offloads tc-flower, do I match on
> > ETH_TYPE or on ORIG_ETH_TYPE? To me, the EtherType is, well, the EtherType...
>
> AFAIK, the offloads are using basic.n_proto key now. This means matching
> on the innermost protocol (i.e. after stripping various tunnels, vlan
> etc.). Notice, how the offload driver has no access to the original
> 'protocol' setting.
>
> ORIG_ETH_TYPE if given, asks to match on the original protocol as it
> appears in the unmodified packet. This gives the offload driver writers
> ability to match on it if the need arises.

That is correct. The fact that drivers offload EtherType matching based
on basic.n_proto seems wrong in the sense that it does not line up with
what the software dissector does, even though it is the best that the
API offers them, and most probably matches the intention.

And in the case of vlan_ethtype and cvlan_ethtype, I see that these are
passed along to the offloading driver through the same basic.n_proto,
which is... interesting?

But nonetheless, can you please make a compelling argument for introducing
a new set of ORIG netlink attributes instead of using the ones that exist?
Even if you don't implement the ORIG netlink attributes for VLAN, which
is fine if you don't need them, it would be good if you could document
your thought process, walk the reader through the solution as far as you've
explored it even if only mentally. Rewrite the commit message is what
I'm saying. You might stop responding to emails one day, and the changes
need to speak for themselves. My main complaint is that too little is
said, too much is being implied. This is not only you btw, but you
should not add to that situation.
Jamal Hadi Salim Aug. 31, 2021, 1:48 a.m. UTC | #7
On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
> The following flower filter fails to match packets:
> 
> tc filter add dev eth0 ingress protocol 0x8864 flower \
>      action simple sdata hi64
> 
> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
> being dissected by __skb_flow_dissect and it's internal protocol is
> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
> tunnel is transparent to the callers of __skb_flow_dissect.
> 
> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
> to the ETH_P_PPP_SES value configured by the user. Matching on this key
> fails because of __skb_flow_dissect "transparency" mentioned above.
> 
> Therefore there is no way currently to match on such packets using
> flower.
> 
> To fix the issue add new orig_ethtype key to the flower along with the
> necessary changes to the flow dissector etc.
> 
> To filter the ETH_P_PPP_SES packets the command becomes:
> 
> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
>      action simple sdata hi64

Where's "protocol" on the above command line is. Probably a typo?

The main culprit is clearly the flow dissector parsing. I am not sure
if in general flowdisc to deal with deeper hierarchies/tunnels
without constantly adding a lot more hacks. Imagine if you had an
ethernet packet with double vlan tags and encapsulating a pppoe packet
(i.e 3 or more layers of ethernet) - that would be a nightmare.
IMO, your approach is adding yet another bandaid.

Would it make sense for the setting of the
skb_key.basic.n_proto  to be from tp->protocol for
your specific case in classify().

Which means your original setup:
  tc filter add dev eth0 ingress protocol 0x8864 flower \
      action simple sdata hi64

should continue to work if i am not mistaken. Vlans would
continue to be a speacial case.

I dont know what that would break though...

cheers,
jamal
Boris Sukholitko Aug. 31, 2021, 12:04 p.m. UTC | #8
Hi Jamal,

On Mon, Aug 30, 2021 at 09:48:38PM -0400, Jamal Hadi Salim wrote:
> On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
> > The following flower filter fails to match packets:
> > 
> > tc filter add dev eth0 ingress protocol 0x8864 flower \
> >      action simple sdata hi64
> > 
> > The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
> > being dissected by __skb_flow_dissect and it's internal protocol is
> > being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
> > tunnel is transparent to the callers of __skb_flow_dissect.
> > 
> > OTOH, in the filters above, cls_flower configures its key->basic.n_proto
> > to the ETH_P_PPP_SES value configured by the user. Matching on this key
> > fails because of __skb_flow_dissect "transparency" mentioned above.
> > 
> > Therefore there is no way currently to match on such packets using
> > flower.
> > 
> > To fix the issue add new orig_ethtype key to the flower along with the
> > necessary changes to the flow dissector etc.
> > 
> > To filter the ETH_P_PPP_SES packets the command becomes:
> > 
> > tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
> >      action simple sdata hi64
> 
> Where's "protocol" on the above command line is. Probably a typo?

There is no need for protocol there. We intend to match on the tunnel
protocol existence only, disregarding its contents. Therefore
orig_ethtype key is sufficient.

> 
> The main culprit is clearly the flow dissector parsing. I am not sure
> if in general flowdisc to deal with deeper hierarchies/tunnels
> without constantly adding a lot more hacks. Imagine if you had an
> ethernet packet with double vlan tags and encapsulating a pppoe packet
> (i.e 3 or more layers of ethernet) - that would be a nightmare.

Of course there is no limit to our imagination :) However I would argue
that in the RealWorld(tm) the number of such nesting cases is
neglectable.

The evidence is that TC and flower are being actively used. Double VLAN
tags configurations notwithstading. IMHO, the fact that I've been
unlucky to hit this tunnel corner case does not mean that there is a
design problem with the flower.

AFAICS, the current meaning for the protocol field in TC is:

match the innermost layer 2 type through the tunnels that the kernel
knows about.

And it seems to me that this semantic is perfectly fine and does not
require fixing. Maybe be we need to mention it in the docs...

> IMO, your approach is adding yet another bandaid.

Could you please elaborate why do you see this approach as a bandaid? 

The patch in question adds another key to the other ~50 that exists in the
flower currently. Two more similar keys will be done for single and
double VLAN case. As a result, my users will gain the ability to match
packets that are impossible to match right now.

In difference with the TC protocol field, orig_ethtype answers the
question:

what is the original eth type of the packet, independent of the further
kernel processing.

IMHO, the above definition is also quite exact and has the right to
exist because we do not have such ability in the current kernel.

> 
> Would it make sense for the setting of the
> skb_key.basic.n_proto  to be from tp->protocol for
> your specific case in classify().
> 
> Which means your original setup:
>  tc filter add dev eth0 ingress protocol 0x8864 flower \
>      action simple sdata hi64
> 
> should continue to work if i am not mistaken. Vlans would
> continue to be a speacial case.
> 
> I dont know what that would break though...
> 

I think right off the bat, it will break the following user
configuration (untested!):

tc filter add dev eth0 ingress protocol ipv4 flower \
      action simple sdata hi64

Currently, the above rule matches ipv4 packets encapsulated in
ETH_P_PPP_SES protocol. The special casing will break it.

Thanks,
Boris.

> cheers,
> jamal
Jamal Hadi Salim Aug. 31, 2021, 1:18 p.m. UTC | #9
On 2021-08-31 8:04 a.m., Boris Sukholitko wrote:
> Hi Jamal,
> 
> On Mon, Aug 30, 2021 at 09:48:38PM -0400, Jamal Hadi Salim wrote:
>> On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
>>> The following flower filter fails to match packets:
>>>
>>> tc filter add dev eth0 ingress protocol 0x8864 flower \
>>>       action simple sdata hi64
>>>
>>> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
>>> being dissected by __skb_flow_dissect and it's internal protocol is
>>> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
>>> tunnel is transparent to the callers of __skb_flow_dissect.
>>>
>>> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
>>> to the ETH_P_PPP_SES value configured by the user. Matching on this key
>>> fails because of __skb_flow_dissect "transparency" mentioned above.
>>>
>>> Therefore there is no way currently to match on such packets using
>>> flower.
>>>
>>> To fix the issue add new orig_ethtype key to the flower along with the
>>> necessary changes to the flow dissector etc.
>>>
>>> To filter the ETH_P_PPP_SES packets the command becomes:
>>>
>>> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
>>>       action simple sdata hi64
>>
>> Where's "protocol" on the above command line is. Probably a typo?
> 
> There is no need for protocol there. We intend to match on the tunnel
> protocol existence only, disregarding its contents. Therefore
> orig_ethtype key is sufficient.
> 

Hold on: The command line requires you specify the root (parse)
node with keyword "protocol". i.e something like:
tc filter add dev eth0 ingress protocol 0x8864 flower ...

You are suggesting a new syntax - which only serves this one
use case that solves your problem and is only needed for flower
but no other tc classifier.

>>
>> The main culprit is clearly the flow dissector parsing. I am not sure
>> if in general flowdisc to deal with deeper hierarchies/tunnels
>> without constantly adding a lot more hacks. Imagine if you had an
>> ethernet packet with double vlan tags and encapsulating a pppoe packet
>> (i.e 3 or more layers of ethernet) - that would be a nightmare.
> 
> Of course there is no limit to our imagination :) However I would argue
> that in the RealWorld(tm) the number of such nesting cases is
> neglectable.
>

Youd be very suprised on how some telcos use things like vlan tags
as "path metadata" and/or mpls labels and the layering involved in that.
i.e There is another world out there;-> More important: there is
nothing illegitimate in any standard about multi-layer tunnels as
such.

> The evidence is that TC and flower are being actively used. Double VLAN
> tags configurations notwithstading. IMHO, the fact that I've been
> unlucky to hit this tunnel corner case does not mean that there is a
> design problem with the flower.
> 

You have _not_ been unlucky - it is a design issue with flow dissector
and the wrapping around flower. Just waiting to happen for more
other use cases..

> AFAICS, the current meaning for the protocol field in TC is:
> 
> match the innermost layer 2 type through the tunnels that the kernel
> knows about.
> 

"protocol" describes where you start parsing and matching in
the header tree - for tc that starts with the outer ethertype.
The frame has to make sense.

> And it seems to me that this semantic is perfectly fine and does not
> require fixing. Maybe be we need to mention it in the docs...
> 

Sorry, I disagree.

>> IMO, your approach is adding yet another bandaid.
> 
> Could you please elaborate why do you see this approach as a bandaid?
> 
> The patch in question adds another key to the other ~50 that exists in the
> flower currently. Two more similar keys will be done for single and
> double VLAN case. As a result, my users will gain the ability to match
> packets that are impossible to match right now.
> 

You are changing the semantics of the tooling for your single
use case at the expense of the general syntax. No other tc classifier
needs that keyword and no other tunneling feature within flower
needs it. Note:
It was a mistake allowing the speacial casing of vlans to work
around the dissector's shortcomings. Lets not add more now
"fixed" by user space.
Hacks are already happening with flower in user space
(at some point someone removed differentiation of "protocol"
and flower's "ethertype" to accomodate some h/w offload concern).

> In difference with the TC protocol field, orig_ethtype answers the
> question:
> 
> what is the original eth type of the packet, independent of the further
> kernel processing.
> 

"protocol" points to the root of the parse/match tree i.e where
to start matching.
Tom (Cced) has a nice diagram somewhere which i cant find right now.
There are certainly issues with namespace overlap. If you have multiple
"ethertypes" encapsulated in a parse tree, giving them appropriate
names would help. Some of the flower syntax allows that nicely but
only to two levels..


> IMHO, the above definition is also quite exact and has the right to
> exist because we do not have such ability in the current kernel.
> 
>>
>> Would it make sense for the setting of the
>> skb_key.basic.n_proto  to be from tp->protocol for
>> your specific case in classify().
>>
>> Which means your original setup:
>>   tc filter add dev eth0 ingress protocol 0x8864 flower \
>>       action simple sdata hi64
>>
>> should continue to work if i am not mistaken. Vlans would
>> continue to be a speacial case.
>>
>> I dont know what that would break though...
>>
> 
> I think right off the bat, it will break the following user
> configuration (untested!):
> 
> tc filter add dev eth0 ingress protocol ipv4 flower \
>        action simple sdata hi64
>

I dont see how from the code inspection. tp->protocol compare
already happened by the time flower classify() is invoked.
i.e it is correct value specified in configuration.
Dont use your update to iproute2. Just the kernel change
is needed.
I am not worried about your use case - just other tunneling
scenarios; we are going to have to run all tdc testcases and
a few more to validate.

> Currently, the above rule matches ipv4 packets encapsulated in
> ETH_P_PPP_SES protocol. The special casing will break it.

Can you describe the rules?
I think you need something like:
tc filter add dev eth0 ingress protocol 0x8864 flower \
...
ppp_proto ip \
dst_ip ...\
src_ip ....

This will require you introduce new attributes for ppp encaped
inside pppoe; i dont think there will be namespace collision in
the case of ip src/dst because it will be tied to ppp_proto

cheers,
jamal
Boris Sukholitko Aug. 31, 2021, 2:03 p.m. UTC | #10
On Tue, Aug 31, 2021 at 09:18:16AM -0400, Jamal Hadi Salim wrote:
[snip convincing points, to get to the solution]
> I think you need something like:
> tc filter add dev eth0 ingress protocol 0x8864 flower \
> ...
> ppp_proto ip \
> dst_ip ...\
> src_ip ....
> 
> This will require you introduce new attributes for ppp encaped
> inside pppoe; i dont think there will be namespace collision in
> the case of ip src/dst because it will be tied to ppp_proto
> 

I like this ppp_proto approach and going to implement it.

Thanks,
Boris.
Ido Schimmel Sept. 2, 2021, 6:48 a.m. UTC | #11
On Tue, Aug 31, 2021 at 09:18:16AM -0400, Jamal Hadi Salim wrote:
> You have _not_ been unlucky - it is a design issue with flow dissector
> and the wrapping around flower. Just waiting to happen for more
> other use cases..

I agree. I think the fundamental problem is that flower does not set
'FLOW_DISSECTOR_F_STOP_AT_ENCAP' and simply lets the flow dissector
parse as deep as possible. For example, 'dst_ip' will match on the
inner most destination IP which is not intuitive and probably different
than what most hardware implementations do.

This behavior is also very error prone because it means that if the
kernel learns to dissect a new tunnel protocol, filters can be suddenly
broken (match on outer field now matches on inner field).

I don't think that changing the default behavior is a solution as it can
break user space. Maybe adding a 'stop_encap' flag to flower that user
space will have to set?
Jamal Hadi Salim Sept. 3, 2021, 10:52 p.m. UTC | #12
On 2021-09-02 2:48 a.m., Ido Schimmel wrote:
> On Tue, Aug 31, 2021 at 09:18:16AM -0400, Jamal Hadi Salim wrote:
>> You have _not_ been unlucky - it is a design issue with flow dissector
>> and the wrapping around flower. Just waiting to happen for more
>> other use cases..
> 
> I agree. I think the fundamental problem is that flower does not set
> 'FLOW_DISSECTOR_F_STOP_AT_ENCAP' and simply lets the flow dissector
> parse as deep as possible. For example, 'dst_ip' will match on the
> inner most destination IP which is not intuitive and probably different
> than what most hardware implementations do.
> 
> This behavior is also very error prone because it means that if the
> kernel learns to dissect a new tunnel protocol, filters can be suddenly
> broken (match on outer field now matches on inner field).
> 

indeed, lots of ambiguity with multiple appearing headers of the same
type (eg ethernet/ethernet/ethernet or ip/ip/udp/vxlan/ip/...).

> I don't think that changing the default behavior is a solution as it can
> break user space. Maybe adding a 'stop_encap' flag to flower that user
> space will have to set?


Yes, this would work for the case of one simple rule that Boris posted
(small addition to user space).
For the rest of the data he was trying to match (ip headers) further
parsing would be needed before matching.
Unfortunately,  there is a lot of _ambiguity_ in those kind of
scenarios. Today's approach in TC is you pop some header then advance
the packet cursor - and the next rule picks up where the first one left 
off (i.e something like action "pppoe pop" would be needed).
The suggestion i made to Boris was to make it parse everything pppoe has
to offer in one rule - but that would not be advancing any skb data
pointers and would possibly require that one extra change i suggested
to set protocol to tp->protocol; such an approach is probably closest
to what hardware would do (i.e parse everything you need then match).
I am not sure which approach is less intrusive; imo, the challenge here
is perhaps the flow dissector is getting messy as a generic parser.
Maybe Tom and co can post patches for Panda which handles these
issues much more smoothly... Tom?

On your point on the hardware: interesting, guess I never thought of
possible inconsistencies. IIUC, as it stands today the software version
may end up having very different result than a supposedly equivalent
hw offload.
Would it make sense to make the hardware parsing also programmable
from software so there is consistency?

cheers,
jamal
Tom Herbert Sept. 4, 2021, 2:08 p.m. UTC | #13
On Tue, Aug 31, 2021 at 6:18 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2021-08-31 8:04 a.m., Boris Sukholitko wrote:
> > Hi Jamal,
> >
> > On Mon, Aug 30, 2021 at 09:48:38PM -0400, Jamal Hadi Salim wrote:
> >> On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
> >>> The following flower filter fails to match packets:
> >>>
> >>> tc filter add dev eth0 ingress protocol 0x8864 flower \
> >>>       action simple sdata hi64
> >>>
> >>> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
> >>> being dissected by __skb_flow_dissect and it's internal protocol is
> >>> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
> >>> tunnel is transparent to the callers of __skb_flow_dissect.
> >>>
> >>> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
> >>> to the ETH_P_PPP_SES value configured by the user. Matching on this key
> >>> fails because of __skb_flow_dissect "transparency" mentioned above.
> >>>
> >>> Therefore there is no way currently to match on such packets using
> >>> flower.
> >>>
> >>> To fix the issue add new orig_ethtype key to the flower along with the
> >>> necessary changes to the flow dissector etc.
> >>>
> >>> To filter the ETH_P_PPP_SES packets the command becomes:
> >>>
> >>> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
> >>>       action simple sdata hi64
> >>
> >> Where's "protocol" on the above command line is. Probably a typo?
> >
> > There is no need for protocol there. We intend to match on the tunnel
> > protocol existence only, disregarding its contents. Therefore
> > orig_ethtype key is sufficient.
> >
>
> Hold on: The command line requires you specify the root (parse)
> node with keyword "protocol". i.e something like:
> tc filter add dev eth0 ingress protocol 0x8864 flower ...
>
> You are suggesting a new syntax - which only serves this one
> use case that solves your problem and is only needed for flower
> but no other tc classifier.
>
> >>
> >> The main culprit is clearly the flow dissector parsing. I am not sure
> >> if in general flowdisc to deal with deeper hierarchies/tunnels
> >> without constantly adding a lot more hacks. Imagine if you had an
> >> ethernet packet with double vlan tags and encapsulating a pppoe packet
> >> (i.e 3 or more layers of ethernet) - that would be a nightmare.
> >
> > Of course there is no limit to our imagination :) However I would argue
> > that in the RealWorld(tm) the number of such nesting cases is
> > neglectable.
> >
>
> Youd be very suprised on how some telcos use things like vlan tags
> as "path metadata" and/or mpls labels and the layering involved in that.
> i.e There is another world out there;-> More important: there is
> nothing illegitimate in any standard about multi-layer tunnels as
> such.
>
> > The evidence is that TC and flower are being actively used. Double VLAN
> > tags configurations notwithstading. IMHO, the fact that I've been
> > unlucky to hit this tunnel corner case does not mean that there is a
> > design problem with the flower.
> >
>
> You have _not_ been unlucky - it is a design issue with flow dissector
> and the wrapping around flower. Just waiting to happen for more
> other use cases..
>
> > AFAICS, the current meaning for the protocol field in TC is:
> >
> > match the innermost layer 2 type through the tunnels that the kernel
> > knows about.
> >
>
> "protocol" describes where you start parsing and matching in
> the header tree - for tc that starts with the outer ethertype.
> The frame has to make sense.
>
> > And it seems to me that this semantic is perfectly fine and does not
> > require fixing. Maybe be we need to mention it in the docs...
> >
>
> Sorry, I disagree.
>
> >> IMO, your approach is adding yet another bandaid.
> >
> > Could you please elaborate why do you see this approach as a bandaid?
> >
> > The patch in question adds another key to the other ~50 that exists in the
> > flower currently. Two more similar keys will be done for single and
> > double VLAN case. As a result, my users will gain the ability to match
> > packets that are impossible to match right now.
> >
>
> You are changing the semantics of the tooling for your single
> use case at the expense of the general syntax. No other tc classifier
> needs that keyword and no other tunneling feature within flower
> needs it. Note:
> It was a mistake allowing the speacial casing of vlans to work
> around the dissector's shortcomings. Lets not add more now
> "fixed" by user space.
> Hacks are already happening with flower in user space
> (at some point someone removed differentiation of "protocol"
> and flower's "ethertype" to accomodate some h/w offload concern).
>
> > In difference with the TC protocol field, orig_ethtype answers the
> > question:
> >
> > what is the original eth type of the packet, independent of the further
> > kernel processing.
> >
>
> "protocol" points to the root of the parse/match tree i.e where
> to start matching.
> Tom (Cced) has a nice diagram somewhere which i cant find right now.
> There are certainly issues with namespace overlap. If you have multiple
> "ethertypes" encapsulated in a parse tree, giving them appropriate
> names would help. Some of the flower syntax allows that nicely but
> only to two levels..

I attached the diagram that Jamal is referring to (see attached). This
is a protocol graph of the various networking protocols parsed by flow
dissector. As complex as this is, there are still several other
protocols that users might be interested to parse and create filter
rules for like UDP encapsulations, TCP options, and HBH and
destination options. Red back links in the diagram indicate protocol
encapsulations which drive the number of possible protocol
combinations in a packet is combinatorial. Protocols are only getting
more and more complex and users are using more protocols and more
combinations of encapsulations, so I don't believe that continuing to
incrementally add to flow dissector or add support in tc-flower one
protocol at a time can scale. We need to rethink this. Also note, this
is no longer just a software problem either, more and more hardware is
built with advanced functionality like parsers and to leverage that
from Linux we essentially want the ability to offload flow dissector
with some assurance that hardware can parse all the same protocols and
produce the exact same metadata output. The end goal is pretty obvious
to me, a user should be able to make filter rules about the arbitrary
protocol and protocol combinations they are using, including custom
protocols, and in that they expect the highest performance possible
for their given environment.

IMO, the only way we can accomplish this is to have a common
programming model where the _exact_ same_ parser program used in
software is offloaded as is the hardware and guarantees the same
effects. This is a one out the outcomes of PANDA, see
https://www.youtube.com/watch?v=zVnmVDSEoXc&list=PLrninrcyMo3L-hsJv23hFyDGRaeBY1EJO&index=23.

Tom

>
>
> > IMHO, the above definition is also quite exact and has the right to
> > exist because we do not have such ability in the current kernel.
> >
> >>
> >> Would it make sense for the setting of the
> >> skb_key.basic.n_proto  to be from tp->protocol for
> >> your specific case in classify().
> >>
> >> Which means your original setup:
> >>   tc filter add dev eth0 ingress protocol 0x8864 flower \
> >>       action simple sdata hi64
> >>
> >> should continue to work if i am not mistaken. Vlans would
> >> continue to be a speacial case.
> >>
> >> I dont know what that would break though...
> >>
> >
> > I think right off the bat, it will break the following user
> > configuration (untested!):
> >
> > tc filter add dev eth0 ingress protocol ipv4 flower \
> >        action simple sdata hi64
> >
>
> I dont see how from the code inspection. tp->protocol compare
> already happened by the time flower classify() is invoked.
> i.e it is correct value specified in configuration.
> Dont use your update to iproute2. Just the kernel change
> is needed.
> I am not worried about your use case - just other tunneling
> scenarios; we are going to have to run all tdc testcases and
> a few more to validate.
>
> > Currently, the above rule matches ipv4 packets encapsulated in
> > ETH_P_PPP_SES protocol. The special casing will break it.
>
> Can you describe the rules?
> I think you need something like:
> tc filter add dev eth0 ingress protocol 0x8864 flower \
> ...
> ppp_proto ip \
> dst_ip ...\
> src_ip ....
>
> This will require you introduce new attributes for ppp encaped
> inside pppoe; i dont think there will be namespace collision in
> the case of ip src/dst because it will be tied to ppp_proto
>
> cheers,
> jamal
diff mbox series

Patch

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index ffd386ea0dbb..083245a2d408 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -251,6 +251,14 @@  struct flow_dissector_key_hash {
 	u32 hash;
 };
 
+/**
+ * struct flow_dissector_key_orig_ethtype:
+ * @orig_ethtype: eth type as it appears in the packet
+ */
+struct flow_dissector_key_orig_ethtype {
+	__be16	orig_ethtype;
+};
+
 enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -280,6 +288,7 @@  enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
 	FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
 	FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
+	FLOW_DISSECTOR_KEY_ORIG_ETH_TYPE, /* struct flow_dissector_key_orig_ethtype */
 
 	FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 025c40fef93d..238dee49f450 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -583,6 +583,7 @@  enum {
 	TCA_FLOWER_KEY_HASH,		/* u32 */
 	TCA_FLOWER_KEY_HASH_MASK,	/* u32 */
 
+	TCA_FLOWER_KEY_ORIG_ETH_TYPE,	/* be16 */
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 4b2415d34873..23051e0d02fd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -924,6 +924,7 @@  bool __skb_flow_dissect(const struct net *net,
 	struct flow_dissector_key_vlan *key_vlan;
 	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
+	__be16 orig_proto = proto;
 	bool mpls_el = false;
 	int mpls_lse = 0;
 	int num_hdrs = 0;
@@ -934,6 +935,7 @@  bool __skb_flow_dissect(const struct net *net,
 		data = skb->data;
 		proto = skb_vlan_tag_present(skb) ?
 			 skb->vlan_proto : skb->protocol;
+		orig_proto = proto;
 		nhoff = skb_network_offset(skb);
 		hlen = skb_headlen(skb);
 #if IS_ENABLED(CONFIG_NET_DSA)
@@ -1032,6 +1034,16 @@  bool __skb_flow_dissect(const struct net *net,
 		memcpy(key_eth_addrs, &eth->h_dest, sizeof(*key_eth_addrs));
 	}
 
+	if (dissector_uses_key(flow_dissector,
+			       FLOW_DISSECTOR_KEY_ORIG_ETH_TYPE)) {
+		struct flow_dissector_key_orig_ethtype *orig_ethtype;
+
+		orig_ethtype = skb_flow_dissector_target(flow_dissector,
+							 FLOW_DISSECTOR_KEY_ORIG_ETH_TYPE,
+							 target_container);
+		orig_ethtype->orig_ethtype = orig_proto;
+	}
+
 proto_again:
 	fdret = FLOW_DISSECT_RET_CONTINUE;
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d7869a984881..bf6e88819f7d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -70,6 +70,7 @@  struct fl_flow_key {
 	} tp_range;
 	struct flow_dissector_key_ct ct;
 	struct flow_dissector_key_hash hash;
+	struct flow_dissector_key_orig_ethtype orig_ethtype;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -710,6 +711,7 @@  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
 	[TCA_FLOWER_FLAGS]		= { .type = NLA_U32 },
 	[TCA_FLOWER_KEY_HASH]		= { .type = NLA_U32 },
 	[TCA_FLOWER_KEY_HASH_MASK]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ORIG_ETH_TYPE]	= { .type = NLA_U16 },
 
 };
 
@@ -1696,6 +1698,11 @@  static int fl_set_key(struct net *net, struct nlattr **tb,
 		       &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
 		       sizeof(key->hash.hash));
 
+	fl_set_key_val(tb, &key->orig_ethtype.orig_ethtype,
+		       TCA_FLOWER_KEY_ORIG_ETH_TYPE,
+		       &mask->orig_ethtype.orig_ethtype, TCA_FLOWER_UNSPEC,
+		       sizeof(key->orig_ethtype.orig_ethtype));
+
 	if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
 		ret = fl_set_enc_opt(tb, key, mask, extack);
 		if (ret)
@@ -1812,6 +1819,8 @@  static void fl_init_dissector(struct flow_dissector *dissector,
 			     FLOW_DISSECTOR_KEY_CT, ct);
 	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 			     FLOW_DISSECTOR_KEY_HASH, hash);
+	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+			     FLOW_DISSECTOR_KEY_ORIG_ETH_TYPE, orig_ethtype);
 
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -3037,6 +3046,12 @@  static int fl_dump_key(struct sk_buff *skb, struct net *net,
 			     sizeof(key->hash.hash)))
 		goto nla_put_failure;
 
+	if (fl_dump_key_val(skb, &key->orig_ethtype.orig_ethtype,
+			    TCA_FLOWER_KEY_ORIG_ETH_TYPE,
+			    &mask->orig_ethtype.orig_ethtype, TCA_FLOWER_UNSPEC,
+			    sizeof(key->orig_ethtype.orig_ethtype)))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure: