[v10,net-next,3/3] net/sched: act_frag: add implicit packet fragment support.

Message ID 1605151497-29986-4-git-send-email-wenxu@ucloud.cn (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net/sched: fix over mtu packet of defrag in

Checks

Context                          Check     Description
netdev/cover_letter              success   Link
netdev/fixes_present             success   Link
netdev/patch_count               success   Link
netdev/tree_selection            success   Clearly marked for net-next
netdev/subject_prefix            success   Link
netdev/source_inline             success   Was 0 now: 0
netdev/verify_signedoff          success   Link
netdev/module_param              success   Was 0 now: 0
netdev/build_32bit               fail      Errors and warnings before: 257 this patch: 258
netdev/kdoc                      success   Errors and warnings before: 0 this patch: 0
netdev/verify_fixes              success   Link
netdev/checkpatch                warning   WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_allmodconfig_warn   fail      Errors and warnings before: 257 this patch: 258
netdev/header_inline             success   Link
netdev/stable                    success   Stable not CCed

Commit Message

wenxu Nov. 12, 2020, 3:24 a.m. UTC
From: wenxu <wenxu@ucloud.cn>

Currently the kernel tc subsystem can do conntrack in act_ct. But when several
fragmented packets go through act_ct, the function tcf_ct_handle_fragments
defragments them into one big packet. If the last action is a mirred redirect
to a device, the reassembled packet may exceed the MTU of the target
device.

This patch adds support for an xmit hook to mirred, which gets executed before
transmitting the packet. Then, when act_ct gets loaded, it configures that hook.
The frag xmit hook may be reused by other modules.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v2: Fix the crash when the act_frag module is not loaded
v3: modify the Kconfig description, put tcf_xmit_hook_is_enabled
in tcf_dev_queue_xmit, and use an atomic xchg for tcf_xmit_hook
v4: use skb_protocol and fix lines exceeding 80 columns
v5: no change
v6: protect the tcf_xmit_hook with rcu lock
v7-v10: fix __rcu warning 

 include/net/act_api.h  |  18 ++++++
 net/sched/Kconfig      |  13 ++++
 net/sched/Makefile     |   1 +
 net/sched/act_api.c    |  44 +++++++++++++
 net/sched/act_ct.c     |   7 +++
 net/sched/act_frag.c   | 164 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/sched/act_mirred.c |   2 +-
 7 files changed, 248 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/act_frag.c
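
The hook mechanism described in the commit message can be illustrated with a
small user-space sketch. This is not the kernel code from the patch: struct pkt,
set_xmit_hook(), clear_xmit_hook(), hooked_xmit() and DEMO_MTU are hypothetical
names chosen for illustration, and there is no locking or RCU here. It only
shows the idea of a reference-counted hook that sits in front of the real
transmit function and may split an oversized packet before sending.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the length matters here. */
struct pkt {
	unsigned int len;
};

typedef int (*xmit_fn)(struct pkt *p);
typedef int (*xmit_hook_fn)(struct pkt *p, xmit_fn xmit);

#define DEMO_MTU 1500u           /* assumed device MTU for the demo */

static xmit_hook_fn hook;        /* currently installed hook, or NULL */
static unsigned int hook_users;  /* number of callers that installed it */

/* Install a hook; a second, different hook is refused with -EBUSY. */
static int set_xmit_hook(xmit_hook_fn h)
{
	if (hook_users && hook != h)
		return -EBUSY;
	hook = h;
	hook_users++;
	return 0;
}

/* Drop one reference; the hook is removed when the last user leaves. */
static void clear_xmit_hook(void)
{
	if (hook_users && --hook_users == 0)
		hook = NULL;
}

/* Transmit through the hook if one is installed, else directly. */
static int hooked_xmit(struct pkt *p, xmit_fn xmit)
{
	return hook ? hook(p, xmit) : xmit(p);
}

/* Example hook: send DEMO_MTU-sized pieces until the rest fits. */
static int split_hook(struct pkt *p, xmit_fn xmit)
{
	while (p->len > DEMO_MTU) {
		struct pkt frag = { .len = DEMO_MTU };
		int err = xmit(&frag);

		if (err)
			return err;
		p->len -= DEMO_MTU;
	}
	return xmit(p);
}

/* A second hook, used only to show the -EBUSY case. */
static int other_hook(struct pkt *p, xmit_fn xmit)
{
	return xmit(p);
}

static unsigned int demo_sent;   /* packets seen by the fake device */

static int count_xmit(struct pkt *p)
{
	(void)p;
	demo_sent++;
	return 0;
}

static void demo_xmit_hook(void)
{
	struct pkt p = { .len = 3000 };

	demo_sent = 0;
	assert(hooked_xmit(&p, count_xmit) == 0);  /* no hook: sent as is */
	assert(demo_sent == 1);

	assert(set_xmit_hook(split_hook) == 0);
	assert(set_xmit_hook(other_hook) == -EBUSY);
	assert(hooked_xmit(&p, count_xmit) == 0);  /* split into 2 pieces */
	assert(demo_sent == 3);

	clear_xmit_hook();
	assert(hook == NULL);
}
```

Calling demo_xmit_hook() from a test or from main() exercises all three cases:
no hook, a hook that splits, and a rejected second hook.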

Comments

Jakub Kicinski Nov. 12, 2020, 10:20 p.m. UTC | #1
On Thu, 12 Nov 2020 11:24:57 +0800 wenxu@ucloud.cn wrote:
> v7-v10: fix __rcu warning 

Are you reposting stuff just to get it build tested?

This is absolutely unacceptable.
Marcelo Ricardo Leitner Nov. 13, 2020, 2:25 a.m. UTC | #2
On Thu, Nov 12, 2020 at 02:20:58PM -0800, Jakub Kicinski wrote:
> On Thu, 12 Nov 2020 11:24:57 +0800 wenxu@ucloud.cn wrote:
> > v7-v10: fix __rcu warning 
> 
> Are you reposting stuff just to get it build tested?
> 
> This is absolutely unacceptable.

I don't know if that's the case, but maybe we could have a shadow
mailing list just for that? So that bots would monitor and would run
(almost) the same tests as they do here. Then when patches are posted
here, a list that people actually subscribe to, they are already more
ready. The bots would have to email an "ok" as well, but that's
implementation detail already. Not that developers shouldn't test
before posting, but the bots are already doing some tests that may be
beyond what one can think of testing before posting.
Jakub Kicinski Nov. 13, 2020, 5:04 p.m. UTC | #3
On Thu, 12 Nov 2020 23:25:22 -0300 Marcelo Ricardo Leitner wrote:
> On Thu, Nov 12, 2020 at 02:20:58PM -0800, Jakub Kicinski wrote:
> > On Thu, 12 Nov 2020 11:24:57 +0800 wenxu@ucloud.cn wrote:  
> > > v7-v10: fix __rcu warning   
> > 
> > Are you reposting stuff just to get it build tested?
> > 
> > This is absolutely unacceptable.  
> 
> I don't know if that's the case, but maybe we could have a shadow
> mailing list just for that? So that bots would monitor and would run
> (almost) the same tests are they do here. Then when patches are posted
> here, a list that people actually subscribe, they are already more
> ready. The bots would have to email an "ok" as well, but that's
> implementation detail already. Not that developers shouldn't test
> before posting, but the bots are already doing some tests that may be
> beyond of what one can think of testing before posting.

The code for the entire system is right here:

https://github.com/kuba-moo/nipa

It depends on a patchwork instance to report results to.

I have a script there to feed patches in locally from a maildir but
haven't tested that in a while so it's probably broken. You can also
just run the build bash script without running the whole bot:

https://github.com/kuba-moo/nipa/blob/master/tests/patch/build_allmodconfig_warn/build_allmodconfig.sh

Hardly rocket science.

I have no preference on what people do to test their code, and I'm
happy to take patches for the bot, too.

But we can't have people posting 11 versions of patches to netdev which
is already too high traffic for people to follow.

Not to mention that someone needs to pay for the CPU cycles of the bot,
and we don't want to block getting results for legitimate patches.
Cong Wang Nov. 14, 2020, 6:05 p.m. UTC | #4
On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index 9c79fb9..dff3c40 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -1541,8 +1541,14 @@ static int __init ct_init_module(void)
>         if (err)
>                 goto err_register;
>
> +       err = tcf_set_xmit_hook(tcf_frag_xmit_hook);

Yeah, this approach is certainly much better than extending act_mirred.
Just one comment below.


> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
> new file mode 100644
> index 0000000..3a7ab92
> --- /dev/null
> +++ b/net/sched/act_frag.c

It is kinda confusing to see this is a module. It provides some
wrappers and hooks dev_queue_xmit(), so it belongs more to
the core tc code than any modularized code. How about putting
this into net/sched/sch_generic.c?

Thanks.
Marcelo Ricardo Leitner Nov. 14, 2020, 10:46 p.m. UTC | #5
On Sat, Nov 14, 2020 at 10:05:39AM -0800, Cong Wang wrote:
> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index 9c79fb9..dff3c40 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -1541,8 +1541,14 @@ static int __init ct_init_module(void)
> >         if (err)
> >                 goto err_register;
> >
> > +       err = tcf_set_xmit_hook(tcf_frag_xmit_hook);
> 
> Yeah, this approach is certainly much better than extending act_mirred.
> Just one comment below.

Nice. :-)

> 
> 
> > diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
> > new file mode 100644
> > index 0000000..3a7ab92
> > --- /dev/null
> > +++ b/net/sched/act_frag.c
> 
> It is kinda confusing to see this is a module. It provides some
> wrappers and hooks the dev_xmit_queue(), it belongs more to
> the core tc code than any modularized code. How about putting
> this into net/sched/sch_generic.c?

Davide had shared similar concerns with regard to the new module too.
The main idea behind the new module was to keep it as
isolated/contained as possible, and only so. So thumbs up from my
side. 

To be clear, you're only talking about the module itself, right? It
would still need to have the Kconfig to enable this feature, or not?

Thanks,
Marcelo
wenxu Nov. 15, 2020, 1:05 p.m. UTC | #6
On 2020/11/15 2:05, Cong Wang wrote:
> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
>> new file mode 100644
>> index 0000000..3a7ab92
>> --- /dev/null
>> +++ b/net/sched/act_frag.c
> It is kinda confusing to see this is a module. It provides some
> wrappers and hooks the dev_xmit_queue(), it belongs more to
> the core tc code than any modularized code. How about putting
> this into net/sched/sch_generic.c?
>
> Thanks.

All the operations in act_frag form a single L3 action, so we put them
in a single module to keep it as isolated/contained as possible.

Maybe putting this in a single file is better than a module? Built into the tc core code or not?

Enable this feature in Kconfig with NET_ACT_FRAG?

+config NET_ACT_FRAG
+	bool "Packet fragmentation"
+	depends on NET_CLS_ACT
+	help
+         Say Y here to allow fragmenting big packets when outputting
+         with the mirred action.
+
+	  If unsure, say N.


>
Jamal Hadi Salim Nov. 15, 2020, 4:26 p.m. UTC | #7
This nagged me:
What happens if not all the frags make it out?
Should you at least return an error code (from tcf_fragment?)
and get the action err counters incremented?

cheers,
jamal

On 2020-11-15 8:05 a.m., wenxu wrote:
> 
> On 2020/11/15 2:05, Cong Wang wrote:
>> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
>>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
>>> new file mode 100644
>>> index 0000000..3a7ab92
>>> --- /dev/null
>>> +++ b/net/sched/act_frag.c
>> It is kinda confusing to see this is a module. It provides some
>> wrappers and hooks the dev_xmit_queue(), it belongs more to
>> the core tc code than any modularized code. How about putting
>> this into net/sched/sch_generic.c?
>>
>> Thanks.
> 
> All the operations in the act_frag  are single L3 action.
> 
> So we put in a single module. to keep it as isolated/contained as possible
> 
> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
> 
> Enable this feature in Kconifg with NET_ACT_FRAG?
> 
> +config NET_ACT_FRAG
> +	bool "Packet fragmentation"
> +	depends on NET_CLS_ACT
> +	help
> +         Say Y here to allow fragmenting big packets when outputting
> +         with the mirred action.
> +
> +	  If unsure, say N.
> 
> 
>>
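
Jamal's question about fragments that fail to go out can be made concrete with
a user-space sketch. It is only an illustration under stated assumptions:
send_fragments(), struct frag_stats and the two fake transmit functions are
hypothetical names, not part of the patch, and real IP fragmentation has extra
rules (for example 8-byte alignment of fragment offsets) that are ignored here.
The point is that the per-fragment transmit result is checked, counted, and
returned to the caller instead of being dropped.

```c
#include <assert.h>
#include <errno.h>

typedef int (*frag_xmit_fn)(unsigned int frag_len);

/* Hypothetical counters, standing in for the action's statistics. */
struct frag_stats {
	unsigned int sent;    /* fragments handed to the device */
	unsigned int errors;  /* fragments whose transmit failed */
};

/*
 * Split len bytes into pieces of at most max_frag bytes and send each one.
 * This sketch stops at the first failure and returns that error.
 */
static int send_fragments(unsigned int len, unsigned int max_frag,
			  frag_xmit_fn xmit, struct frag_stats *st)
{
	if (max_frag == 0)
		return -EINVAL;

	while (len > 0) {
		unsigned int n = len < max_frag ? len : max_frag;
		int err = xmit(n);

		if (err) {
			st->errors++;
			return err;
		}
		st->sent++;
		len -= n;
	}
	return 0;
}

/* Fake device that always accepts the fragment. */
static int xmit_ok(unsigned int frag_len)
{
	(void)frag_len;
	return 0;
}

static unsigned int xmit_calls;

/* Fake device that rejects the second fragment it is given. */
static int xmit_fail_second(unsigned int frag_len)
{
	(void)frag_len;
	return ++xmit_calls == 2 ? -ENOBUFS : 0;
}

static void demo_frag_errors(void)
{
	struct frag_stats st = { 0, 0 };

	/* 3000 bytes in 1400-byte pieces: 1400 + 1400 + 200. */
	assert(send_fragments(3000, 1400, xmit_ok, &st) == 0);
	assert(st.sent == 3 && st.errors == 0);

	/* Second fragment fails: the error reaches the caller. */
	xmit_calls = 0;
	assert(send_fragments(3000, 1400, xmit_fail_second, &st) == -ENOBUFS);
	assert(st.sent == 4 && st.errors == 1);
}
```

Whether to stop at the first failed fragment or keep sending the rest is a
design choice of this sketch; either way the caller sees a non-zero return and
can bump an error counter.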
wenxu Nov. 16, 2020, 1:09 a.m. UTC | #8
On 11/16/2020 12:26 AM, Jamal Hadi Salim wrote:
> This nagged me:
> What happens if all the frags dont make it out?
> Should you at least return an error code(from tcf_fragment?)
> and get the action err counters incremented?
Thanks, Will do.
>
> cheers,
> jamal
>
> On 2020-11-15 8:05 a.m., wenxu wrote:
>>
>> On 2020/11/15 2:05, Cong Wang wrote:
>>> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
>>>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
>>>> new file mode 100644
>>>> index 0000000..3a7ab92
>>>> --- /dev/null
>>>> +++ b/net/sched/act_frag.c
>>> It is kinda confusing to see this is a module. It provides some
>>> wrappers and hooks the dev_xmit_queue(), it belongs more to
>>> the core tc code than any modularized code. How about putting
>>> this into net/sched/sch_generic.c?
>>>
>>> Thanks.
>>
>> All the operations in the act_frag  are single L3 action.
>>
>> So we put in a single module. to keep it as isolated/contained as possible
>>
>> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
>>
>> Enable this feature in Kconifg with NET_ACT_FRAG?
>>
>> +config NET_ACT_FRAG
>> +    bool "Packet fragmentation"
>> +    depends on NET_CLS_ACT
>> +    help
>> +         Say Y here to allow fragmenting big packets when outputting
>> +         with the mirred action.
>> +
>> +      If unsure, say N.
>>
>>
>>>
>
>
Cong Wang Nov. 16, 2020, 6:57 p.m. UTC | #9
On Sat, Nov 14, 2020 at 2:46 PM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> Davide had shared similar concerns with regards of the new module too.
> The main idea behind the new module was to keep it as
> isolated/contained as possible, and only so. So thumbs up from my
> side.
>
> To be clear, you're only talking about the module itself, right? It
> would still need to have the Kconfig to enable this feature, or not?

Both. The code itself doesn't look like a module, and it doesn't
look like an optional feature for act_ct either, does it? If not, there is
no need to have a user visible Kconfig, we just select it, or no Kconfig
at all.

Thanks.
Cong Wang Nov. 16, 2020, 7:01 p.m. UTC | #10
On Sun, Nov 15, 2020 at 5:06 AM wenxu <wenxu@ucloud.cn> wrote:
>
>
> On 2020/11/15 2:05, Cong Wang wrote:
> > On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
> >> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
> >> new file mode 100644
> >> index 0000000..3a7ab92
> >> --- /dev/null
> >> +++ b/net/sched/act_frag.c
> > It is kinda confusing to see this is a module. It provides some
> > wrappers and hooks the dev_xmit_queue(), it belongs more to
> > the core tc code than any modularized code. How about putting
> > this into net/sched/sch_generic.c?
> >
> > Thanks.
>
> All the operations in the act_frag  are single L3 action.
>
> So we put in a single module. to keep it as isolated/contained as possible

Yeah, but you hook dev_queue_xmit() which is L2.

>
> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
>
> Enable this feature in Kconifg with NET_ACT_FRAG?

Sort of... If this is not an optional feature, that is a must-have
feature for act_ct, we should just get rid of this Kconfig.

Also, you need to depend on CONFIG_INET somewhere to use the IP
fragment, no?

Thanks.
wenxu Nov. 17, 2020, 4:01 a.m. UTC | #11
On 11/17/2020 3:01 AM, Cong Wang wrote:
> On Sun, Nov 15, 2020 at 5:06 AM wenxu <wenxu@ucloud.cn> wrote:
>>
>> On 2020/11/15 2:05, Cong Wang wrote:
>>> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
>>>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
>>>> new file mode 100644
>>>> index 0000000..3a7ab92
>>>> --- /dev/null
>>>> +++ b/net/sched/act_frag.c
>>> It is kinda confusing to see this is a module. It provides some
>>> wrappers and hooks the dev_xmit_queue(), it belongs more to
>>> the core tc code than any modularized code. How about putting
>>> this into net/sched/sch_generic.c?
>>>
>>> Thanks.
>> All the operations in the act_frag  are single L3 action.
>>
>> So we put in a single module. to keep it as isolated/contained as possible
> Yeah, but you hook dev_queue_xmit() which is L2.
>
>> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
>>
>> Enable this feature in Kconifg with NET_ACT_FRAG?
> Sort of... If this is not an optional feature, that is a must-have
> feature for act_ct,
> we should just get rid of this Kconfig.
>
> Also, you need to depend on CONFIG_INET somewhere to use the IP
> fragment, no?
>
> Thanks.

Maybe act_frag should be renamed to sch_frag and built into the kernel.

This function can be used by the whole tc subsystem. There is no need for
CONFIG_INET. The sched system depends on NET.

>
Cong Wang Nov. 17, 2020, 10:43 p.m. UTC | #12
On Mon, Nov 16, 2020 at 8:06 PM wenxu <wenxu@ucloud.cn> wrote:
>
>
> On 11/17/2020 3:01 AM, Cong Wang wrote:
> > On Sun, Nov 15, 2020 at 5:06 AM wenxu <wenxu@ucloud.cn> wrote:
> >>
> >> On 2020/11/15 2:05, Cong Wang wrote:
> >>> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
> >>>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
> >>>> new file mode 100644
> >>>> index 0000000..3a7ab92
> >>>> --- /dev/null
> >>>> +++ b/net/sched/act_frag.c
> >>> It is kinda confusing to see this is a module. It provides some
> >>> wrappers and hooks the dev_xmit_queue(), it belongs more to
> >>> the core tc code than any modularized code. How about putting
> >>> this into net/sched/sch_generic.c?
> >>>
> >>> Thanks.
> >> All the operations in the act_frag  are single L3 action.
> >>
> >> So we put in a single module. to keep it as isolated/contained as possible
> > Yeah, but you hook dev_queue_xmit() which is L2.
> >
> >> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
> >>
> >> Enable this feature in Kconifg with NET_ACT_FRAG?
> > Sort of... If this is not an optional feature, that is a must-have
> > feature for act_ct,
> > we should just get rid of this Kconfig.
> >
> > Also, you need to depend on CONFIG_INET somewhere to use the IP
> > fragment, no?
> >
> > Thanks.
>
> Maybe the act_frag should rename to sch_frag and buildin kernel.

sch_frag still sounds like a module. ;) This is why I proposed putting
it into sch_generic.c.

>
> This fcuntion can be used for all tc subsystem. There is no need for
>
> CONFIG_INET. The sched system depends on NET.

CONFIG_INET is different from CONFIG_NET, right?

Thanks.
wenxu Nov. 17, 2020, 11:21 p.m. UTC | #13
On 2020/11/18 6:43, Cong Wang wrote:
> On Mon, Nov 16, 2020 at 8:06 PM wenxu <wenxu@ucloud.cn> wrote:
>>
>> On 11/17/2020 3:01 AM, Cong Wang wrote:
>>> On Sun, Nov 15, 2020 at 5:06 AM wenxu <wenxu@ucloud.cn> wrote:
>>>> On 2020/11/15 2:05, Cong Wang wrote:
>>>>> On Wed, Nov 11, 2020 at 9:44 PM <wenxu@ucloud.cn> wrote:
>>>>>> diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
>>>>>> new file mode 100644
>>>>>> index 0000000..3a7ab92
>>>>>> --- /dev/null
>>>>>> +++ b/net/sched/act_frag.c
>>>>> It is kinda confusing to see this is a module. It provides some
>>>>> wrappers and hooks the dev_xmit_queue(), it belongs more to
>>>>> the core tc code than any modularized code. How about putting
>>>>> this into net/sched/sch_generic.c?
>>>>>
>>>>> Thanks.
>>>> All the operations in the act_frag  are single L3 action.
>>>>
>>>> So we put in a single module. to keep it as isolated/contained as possible
>>> Yeah, but you hook dev_queue_xmit() which is L2.
>>>
>>>> Maybe put this in a single file is better than a module? Buildin in the tc core code or not.
>>>>
>>>> Enable this feature in Kconifg with NET_ACT_FRAG?
>>> Sort of... If this is not an optional feature, that is a must-have
>>> feature for act_ct,
>>> we should just get rid of this Kconfig.
>>>
>>> Also, you need to depend on CONFIG_INET somewhere to use the IP
>>> fragment, no?
>>>
>>> Thanks.
>> Maybe the act_frag should rename to sch_frag and buildin kernel.
> sch_frag still sounds like a module. ;) This is why I proposed putting
> it into sch_generic.c.
>
>> This fcuntion can be used for all tc subsystem. There is no need for
>>
>> CONFIG_INET. The sched system depends on NET.
> CONFIG_INET is different from CONFIG_NET, right?

You are right. ip_do_fragment depends on this!

>
> Thanks.
>
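
The CONFIG_INET point above can be sketched as a compile-time stub, which is a
common way to handle an optional dependency. This is a user-space illustration
only, not what the thread settled on: CONFIG_INET is an ordinary preprocessor
symbol here (pass -DCONFIG_INET to the compiler to enable it), and frag_count()
and HAVE_IP_FRAG are hypothetical names. The stub returns an error rather than
0, so a caller cannot mistake "not built in" for "sent".

```c
#include <assert.h>
#include <errno.h>

#ifdef CONFIG_INET
#define HAVE_IP_FRAG 1

/* With IP support: number of fragments needed for len bytes. */
static int frag_count(unsigned int len, unsigned int max_frag)
{
	if (max_frag == 0)
		return -EINVAL;
	return (int)((len + max_frag - 1) / max_frag);
}
#else
#define HAVE_IP_FRAG 0

/* Without IP support: report that fragmentation is unavailable. */
static int frag_count(unsigned int len, unsigned int max_frag)
{
	(void)len;
	(void)max_frag;
	return -EOPNOTSUPP;
}
#endif

/* The expected result depends on how the file was compiled. */
static void check_frag_count(void)
{
	int want = HAVE_IP_FRAG ? 2 : -EOPNOTSUPP;

	assert(frag_count(3000, 1500) == want);
}
```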
Patch

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8721492..87ea1df 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -239,6 +239,24 @@  int tcf_action_check_ctrlact(int action, struct tcf_proto *tp,
 			     struct netlink_ext_ack *newchain);
 struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
 					 struct tcf_chain *newchain);
+
+typedef int xmit_hook_func(struct sk_buff *skb,
+			   int (*xmit)(struct sk_buff *skb));
+
+int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+int tcf_set_xmit_hook(xmit_hook_func *xmit_hook);
+void tcf_clear_xmit_hook(void);
+
+#if IS_ENABLED(CONFIG_NET_ACT_FRAG)
+int tcf_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
+#else
+static inline int tcf_frag_xmit_hook(struct sk_buff *skb,
+				     int (*xmit)(struct sk_buff *skb))
+{
+	return 0;
+}
+#endif
+
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d8..9a240c7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -974,9 +974,22 @@  config NET_ACT_TUNNEL_KEY
 	  To compile this code as a module, choose M here: the
 	  module will be called act_tunnel_key.
 
+config NET_ACT_FRAG
+	tristate "Packet fragmentation"
+	depends on NET_CLS_ACT
+	help
+         Say Y here to allow fragmenting big packets when outputting
+         with the mirred action.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_frag.
+
 config NET_ACT_CT
 	tristate "connection tracking tc action"
 	depends on NET_CLS_ACT && NF_CONNTRACK && NF_NAT && NF_FLOW_TABLE
+	depends on NET_ACT_FRAG
 	help
 	  Say Y here to allow sending the packets to conntrack module.
 
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a..c146186 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -29,6 +29,7 @@  obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)	+= act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
+obj-$(CONFIG_NET_ACT_FRAG)	+= act_frag.o
 obj-$(CONFIG_NET_ACT_CT)	+= act_ct.o
 obj-$(CONFIG_NET_ACT_GATE)	+= act_gate.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f66417d..93868b7 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -22,6 +22,50 @@ 
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static xmit_hook_func __rcu *tcf_xmit_hook;
+static DEFINE_SPINLOCK(tcf_xmit_hook_lock);
+static u16 tcf_xmit_hook_count;
+
+int tcf_set_xmit_hook(xmit_hook_func *xmit_hook)
+{
+	spin_lock(&tcf_xmit_hook_lock);
+	if (!tcf_xmit_hook_count) {
+		rcu_assign_pointer(tcf_xmit_hook, xmit_hook);
+	} else if (xmit_hook != tcf_xmit_hook) {
+		spin_unlock(&tcf_xmit_hook_lock);
+		return -EBUSY;
+	}
+
+	tcf_xmit_hook_count++;
+	spin_unlock(&tcf_xmit_hook_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tcf_set_xmit_hook);
+
+void tcf_clear_xmit_hook(void)
+{
+	spin_lock(&tcf_xmit_hook_lock);
+	if (--tcf_xmit_hook_count == 0)
+		rcu_assign_pointer(tcf_xmit_hook, NULL);
+	spin_unlock(&tcf_xmit_hook_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(tcf_clear_xmit_hook);
+
+int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb))
+{
+	xmit_hook_func *xmit_hook;
+
+	xmit_hook = rcu_dereference(tcf_xmit_hook);
+	if (xmit_hook)
+		return xmit_hook(skb, xmit);
+	else
+		return xmit(skb);
+}
+EXPORT_SYMBOL_GPL(tcf_dev_queue_xmit);
+
 static void tcf_action_goto_chain_exec(const struct tc_action *a,
 				       struct tcf_result *res)
 {
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 9c79fb9..dff3c40 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1541,8 +1541,14 @@  static int __init ct_init_module(void)
 	if (err)
 		goto err_register;
 
+	err = tcf_set_xmit_hook(tcf_frag_xmit_hook);
+	if (err)
+		goto err_action;
+
 	return 0;
 
+err_action:
+	tcf_unregister_action(&act_ct_ops, &ct_net_ops);
 err_register:
 	tcf_ct_flow_tables_uninit();
 err_tbl_init:
@@ -1552,6 +1558,7 @@  static int __init ct_init_module(void)
 
 static void __exit ct_cleanup_module(void)
 {
+	tcf_clear_xmit_hook();
 	tcf_unregister_action(&act_ct_ops, &ct_net_ops);
 	tcf_ct_flow_tables_uninit();
 	destroy_workqueue(act_ct_wq);
diff --git a/net/sched/act_frag.c b/net/sched/act_frag.c
new file mode 100644
index 0000000..3a7ab92
--- /dev/null
+++ b/net/sched/act_frag.c
@@ -0,0 +1,164 @@ 
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+#include <net/netlink.h>
+#include <net/act_api.h>
+#include <net/dst.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+
+struct tcf_frag_data {
+	unsigned long dst;
+	struct qdisc_skb_cb cb;
+	__be16 inner_protocol;
+	u16 vlan_tci;
+	__be16 vlan_proto;
+	unsigned int l2_len;
+	u8 l2_data[VLAN_ETH_HLEN];
+	int (*xmit)(struct sk_buff *skb);
+};
+
+static DEFINE_PER_CPU(struct tcf_frag_data, tcf_frag_data_storage);
+
+static int tcf_frag_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	struct tcf_frag_data *data = this_cpu_ptr(&tcf_frag_data_storage);
+
+	if (skb_cow_head(skb, data->l2_len) < 0) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	__skb_dst_copy(skb, data->dst);
+	*qdisc_skb_cb(skb) = data->cb;
+	skb->inner_protocol = data->inner_protocol;
+	if (data->vlan_tci & VLAN_CFI_MASK)
+		__vlan_hwaccel_put_tag(skb, data->vlan_proto,
+				       data->vlan_tci & ~VLAN_CFI_MASK);
+	else
+		__vlan_hwaccel_clear_tag(skb);
+
+	/* Reconstruct the MAC header.  */
+	skb_push(skb, data->l2_len);
+	memcpy(skb->data, &data->l2_data, data->l2_len);
+	skb_postpush_rcsum(skb, skb->data, data->l2_len);
+	skb_reset_mac_header(skb);
+
+	data->xmit(skb);
+
+	return 0;
+}
+
+static void tcf_frag_prepare_frag(struct sk_buff *skb,
+				  int (*xmit)(struct sk_buff *skb))
+{
+	unsigned int hlen = skb_network_offset(skb);
+	struct tcf_frag_data *data;
+
+	data = this_cpu_ptr(&tcf_frag_data_storage);
+	data->dst = skb->_skb_refdst;
+	data->cb = *qdisc_skb_cb(skb);
+	data->xmit = xmit;
+	data->inner_protocol = skb->inner_protocol;
+	if (skb_vlan_tag_present(skb))
+		data->vlan_tci = skb_vlan_tag_get(skb) | VLAN_CFI_MASK;
+	else
+		data->vlan_tci = 0;
+	data->vlan_proto = skb->vlan_proto;
+	data->l2_len = hlen;
+	memcpy(&data->l2_data, skb->data, hlen);
+
+	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+	skb_pull(skb, hlen);
+}
+
+static unsigned int
+tcf_frag_dst_get_mtu(const struct dst_entry *dst)
+{
+	return dst->dev->mtu;
+}
+
+static struct dst_ops tcf_frag_dst_ops = {
+	.family = AF_UNSPEC,
+	.mtu = tcf_frag_dst_get_mtu,
+};
+
+static int tcf_fragment(struct net *net, struct sk_buff *skb,
+			u16 mru, int (*xmit)(struct sk_buff *skb))
+{
+	if (skb_network_offset(skb) > VLAN_ETH_HLEN) {
+		net_warn_ratelimited("L2 header too long to fragment\n");
+		goto err;
+	}
+
+	if (skb_protocol(skb, true) == htons(ETH_P_IP)) {
+		struct dst_entry tcf_frag_dst;
+		unsigned long orig_dst;
+
+		tcf_frag_prepare_frag(skb, xmit);
+		dst_init(&tcf_frag_dst, &tcf_frag_dst_ops, NULL, 1,
+			 DST_OBSOLETE_NONE, DST_NOCOUNT);
+		tcf_frag_dst.dev = skb->dev;
+
+		orig_dst = skb->_skb_refdst;
+		skb_dst_set_noref(skb, &tcf_frag_dst);
+		IPCB(skb)->frag_max_size = mru;
+
+		ip_do_fragment(net, skb->sk, skb, tcf_frag_xmit);
+		refdst_drop(orig_dst);
+	} else if (skb_protocol(skb, true) == htons(ETH_P_IPV6)) {
+		unsigned long orig_dst;
+		struct rt6_info tcf_frag_rt;
+
+		tcf_frag_prepare_frag(skb, xmit);
+		memset(&tcf_frag_rt, 0, sizeof(tcf_frag_rt));
+		dst_init(&tcf_frag_rt.dst, &tcf_frag_dst_ops, NULL, 1,
+			 DST_OBSOLETE_NONE, DST_NOCOUNT);
+		tcf_frag_rt.dst.dev = skb->dev;
+
+		orig_dst = skb->_skb_refdst;
+		skb_dst_set_noref(skb, &tcf_frag_rt.dst);
+		IP6CB(skb)->frag_max_size = mru;
+
+		ipv6_stub->ipv6_fragment(net, skb->sk, skb, tcf_frag_xmit);
+		refdst_drop(orig_dst);
+	} else {
+		net_warn_ratelimited("Fail frag %s: eth=%x, MRU=%d, MTU=%d\n",
+				     netdev_name(skb->dev),
+				     ntohs(skb_protocol(skb, true)), mru,
+				     skb->dev->mtu);
+		goto err;
+	}
+
+	qdisc_skb_cb(skb)->mru = 0;
+	return 0;
+err:
+	kfree_skb(skb);
+	return -1;
+}
+
+int tcf_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb))
+{
+	u16 mru = qdisc_skb_cb(skb)->mru;
+	int err;
+
+	if (mru && skb->len > mru + skb->dev->hard_header_len)
+		err = tcf_fragment(dev_net(skb->dev), skb, mru, xmit);
+	else
+		err = xmit(skb);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(tcf_frag_xmit_hook);
+
+static int __init frag_init_module(void)
+{
+	return 0;
+}
+
+static void __exit frag_cleanup_module(void)
+{
+}
+
+module_init(frag_init_module);
+module_exit(frag_cleanup_module);
+MODULE_AUTHOR("wenxu <wenxu@ucloud.cn>");
+MODULE_LICENSE("GPL v2");
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 17d0095..7153c67 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -210,7 +210,7 @@  static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
 	int err;
 
 	if (!want_ingress)
-		err = dev_queue_xmit(skb);
+		err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
 	else
 		err = netif_receive_skb(skb);