Message ID | a2a8af1622dff2bfd51d446aa8da2c1d2f6f543c.1611304190.git.lukas@wunner.de (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Netfilter egress hook | expand |
On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote: > > sch_handle_egress() returns either the skb or NULL to signal to its > caller __dev_queue_xmit() whether a packet should continue to be > processed. > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > NULL pointer deref right at its top. > > But the compiler doesn't know that. So if sch_handle_egress() signals > success by returning the skb, the "if (!skb) goto out;" statement > results in a gratuitous NULL pointer check in the Assembler output. > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > NULL skb. This also eliminates another gratuitous NULL pointer check in > __dev_queue_xmit() > qdisc_pkt_len_init() > skb_header_pointer() > __skb_header_pointer() > > The speedup is barely measurable: > Before: 1877 1875 1878 1874 1882 1873 Mb/sec > After: 1877 1877 1880 1883 1888 1886 Mb/sec > > However we're about to add a netfilter egress hook to __dev_queue_xmit() > and without the micro-optimization, it will result in a performance > degradation which is indeed measurable: > With netfilter hook: 1853 1852 1850 1848 1849 1851 Mb/sec > With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec > > The performance degradation is caused by a JNE instruction ("if (skb)") > being flipped to a JE instruction ("if (!skb)") once the netfilter hook > is added. The micro-optimization removes the test and jump instructions > altogether. > > Measurements were performed on a Core i7-3615QM. Reproducer: > ip link add dev foo type dummy > ip link set dev foo up > tc qdisc add dev foo clsact > tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' > modprobe pktgen > echo "add_device foo" > /proc/net/pktgen/kpktgend_3 > samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 > > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Thomas Graf <tgraf@suug.ch> > --- > net/core/dev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 7afbb642e203..4c16b9932823 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev, > * the BH enable code must have IRQs enabled so that it will not deadlock. > * --BLG > */ > +__attribute__((nonnull(1))) > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > { > struct net_device *dev = skb->dev; > -- > 2.29.2 > Interesting ! It is a bit sad the compilers do not automatically get this knowledge from the very first instruction : struct net_device *dev = skb->dev; I see this also makes qdisc_pkt_len_init() slightly faster because this removes the if (!skb) test from __skb_header_pointer() I guess we also could add this patch to benefit all skb_header_pointer() callers : diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5f60c9e907c9d8eae1e85ae0329838383e3325df..db8774c50cc6ab99deaecdb1dfc31dc5306b9a37 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3627,6 +3627,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, return buffer; } +__attribute__((nonnull(1))) static inline void * __must_check skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) {
On Fri, 22 Jan 2021 09:47:01 +0100 Lukas Wunner wrote: > sch_handle_egress() returns either the skb or NULL to signal to its > caller __dev_queue_xmit() whether a packet should continue to be > processed. > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > NULL pointer deref right at its top. > > But the compiler doesn't know that. So if sch_handle_egress() signals > success by returning the skb, the "if (!skb) goto out;" statement > results in a gratuitous NULL pointer check in the Assembler output. Which exact compiler are we talking about it? Did you report this? As Eric pointed the compiler should be able to figure this out quite easily. > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > NULL skb. This also eliminates another gratuitous NULL pointer check in > __dev_queue_xmit() > qdisc_pkt_len_init() > skb_header_pointer() > __skb_header_pointer() > > The speedup is barely measurable: > Before: 1877 1875 1878 1874 1882 1873 Mb/sec > After: 1877 1877 1880 1883 1888 1886 Mb/sec > > However we're about to add a netfilter egress hook to __dev_queue_xmit() > and without the micro-optimization, it will result in a performance > degradation which is indeed measurable: > With netfilter hook: 1853 1852 1850 1848 1849 1851 Mb/sec > With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec > > The performance degradation is caused by a JNE instruction ("if (skb)") > being flipped to a JE instruction ("if (!skb)") once the netfilter hook > is added. The micro-optimization removes the test and jump instructions > altogether. > > Measurements were performed on a Core i7-3615QM. Reproducer: > ip link add dev foo type dummy > ip link set dev foo up > tc qdisc add dev foo clsact > tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' > modprobe pktgen > echo "add_device foo" > /proc/net/pktgen/kpktgend_3 > samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 > > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Thomas Graf <tgraf@suug.ch> > --- > net/core/dev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 7afbb642e203..4c16b9932823 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev, > * the BH enable code must have IRQs enabled so that it will not deadlock. > * --BLG > */ > +__attribute__((nonnull(1))) > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > { > struct net_device *dev = skb->dev;
On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote: > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote: > > sch_handle_egress() returns either the skb or NULL to signal to its > > caller __dev_queue_xmit() whether a packet should continue to be > > processed. > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > > NULL pointer deref right at its top. > > > > But the compiler doesn't know that. So if sch_handle_egress() signals > > success by returning the skb, the "if (!skb) goto out;" statement > > results in a gratuitous NULL pointer check in the Assembler output. > > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > > NULL skb. [...] > > we're about to add a netfilter egress hook to __dev_queue_xmit() > > and without the micro-optimization, it will result in a performance > > degradation which is indeed measurable: [...] > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > +__attribute__((nonnull(1))) > > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > > { > > struct net_device *dev = skb->dev; > > It is a bit sad the compilers do not automatically get this knowledge > from the very first instruction : > > struct net_device *dev = skb->dev; The compiler (gcc) is capable of doing that, but the feature was disabled by: commit a3ca86aea507904148870946d599e07a340b39bf Author: Eugene Teo <eteo@redhat.com> Date: Wed Jul 15 14:59:10 2009 +0800 Add '-fno-delete-null-pointer-checks' to gcc CFLAGS If -fno-delete-null-pointer-checks is dropped from the top-level Makefile then the gratuitous NULL pointer checks disappear from the Assembler output, obviating the need to litter hot paths with __attribute__((nonnull(1))) annotations. Taking a closer look at that commit, its rationale appears questionable: It says that broken code such as ... struct agnx_priv *priv = dev->priv; if (!dev) return; ... would result in the NULL pointer check being optimized away. The commit message claims that keeping the NULL pointer check in "makes it harder to abuse" the broken code. I don't see how that's the case: If dev is NULL, the NULL pointer dereference at the function's top causes termination of the task in kernel/exit.c:do_exit(). So the NULL pointer check is never reached by the task. If on the other hand dev is non-NULL, the task isn't terminated but then the NULL pointer check is unnecessary as well. So the point of the commit remains elusive to me. I could submit an RFC patch which drops -fno-delete-null-pointer-checks and see if any security folks cry foul. Thoughts? Thanks, Lukas
On Sat, Jan 23, 2021 at 07:26:24PM -0800, Jakub Kicinski wrote: > On Fri, 22 Jan 2021 09:47:01 +0100 Lukas Wunner wrote: > > sch_handle_egress() returns either the skb or NULL to signal to its > > caller __dev_queue_xmit() whether a packet should continue to be > > processed. > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > > NULL pointer deref right at its top. > > > > But the compiler doesn't know that. So if sch_handle_egress() signals > > success by returning the skb, the "if (!skb) goto out;" statement > > results in a gratuitous NULL pointer check in the Assembler output. > > Which exact compiler are we talking about it? Did you report this? > As Eric pointed the compiler should be able to figure this out quite > easily. I tested with gcc 8, 9, 10. No need to report as it's the expected behavior with -fno-delete-null-pointer-checks, whose motivation appears questionable though (per my preceding e-mail). Thanks, Lukas
On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote: > On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote: > > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote: > > > sch_handle_egress() returns either the skb or NULL to signal to its > > > caller __dev_queue_xmit() whether a packet should continue to be > > > processed. > > > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > > > NULL pointer deref right at its top. > > > > > > But the compiler doesn't know that. So if sch_handle_egress() signals > > > success by returning the skb, the "if (!skb) goto out;" statement > > > results in a gratuitous NULL pointer check in the Assembler output. > > > > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > > > NULL skb. > [...] > > > we're about to add a netfilter egress hook to __dev_queue_xmit() > > > and without the micro-optimization, it will result in a performance > > > degradation which is indeed measurable: > [...] > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > +__attribute__((nonnull(1))) > > > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > > > { > > > struct net_device *dev = skb->dev; > > > > It is a bit sad the compilers do not automatically get this knowledge > > from the very first instruction : > > > > struct net_device *dev = skb->dev; > > The compiler (gcc) is capable of doing that, but the feature was disabled by: > > commit a3ca86aea507904148870946d599e07a340b39bf > Author: Eugene Teo <eteo@redhat.com> > Date: Wed Jul 15 14:59:10 2009 +0800 > > Add '-fno-delete-null-pointer-checks' to gcc CFLAGS > > If -fno-delete-null-pointer-checks is dropped from the top-level Makefile > then the gratuitous NULL pointer checks disappear from the Assembler output, > obviating the need to litter hot paths with __attribute__((nonnull(1))) > annotations. > > Taking a closer look at that commit, its rationale appears questionable: > It says that broken code such as ... > > struct agnx_priv *priv = dev->priv; > > if (!dev) > return; > > ... would result in the NULL pointer check being optimized away. > The commit message claims that keeping the NULL pointer check in > "makes it harder to abuse" the broken code. > > I don't see how that's the case: If dev is NULL, the NULL pointer > dereference at the function's top causes termination of the task > in kernel/exit.c:do_exit(). So the NULL pointer check is never > reached by the task. If on the other hand dev is non-NULL, > the task isn't terminated but then the NULL pointer check is > unnecessary as well. > > So the point of the commit remains elusive to me. I could submit > an RFC patch which drops -fno-delete-null-pointer-checks and see > if any security folks cry foul. Thoughts? I wonder if modern compilers can't simply warn about this particular case. Not to mention our static checkers.. Dan, do you think the concern from the above-quoted commit is still valid? Is this something that smatch flags these days? We're apparently paying a real performance price in networking for tying compiler's hands with -fno-delete-null-pointer-checks
On Mon, Jan 25, 2021 at 11:39:08AM -0800, Jakub Kicinski wrote: > On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote: > > On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote: > > > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote: > > > > sch_handle_egress() returns either the skb or NULL to signal to its > > > > caller __dev_queue_xmit() whether a packet should continue to be > > > > processed. > > > > > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > > > > NULL pointer deref right at its top. > > > > > > > > But the compiler doesn't know that. So if sch_handle_egress() signals > > > > success by returning the skb, the "if (!skb) goto out;" statement > > > > results in a gratuitous NULL pointer check in the Assembler output. > > > > > > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > > > > NULL skb. > > [...] > > > > we're about to add a netfilter egress hook to __dev_queue_xmit() > > > > and without the micro-optimization, it will result in a performance > > > > degradation which is indeed measurable: > > [...] > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > +__attribute__((nonnull(1))) > > > > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > > > > { > > > > struct net_device *dev = skb->dev; > > > > > > It is a bit sad the compilers do not automatically get this knowledge > > > from the very first instruction : > > > > > > struct net_device *dev = skb->dev; > > > > The compiler (gcc) is capable of doing that, but the feature was disabled by: > > > > commit a3ca86aea507904148870946d599e07a340b39bf > > Author: Eugene Teo <eteo@redhat.com> > > Date: Wed Jul 15 14:59:10 2009 +0800 > > > > Add '-fno-delete-null-pointer-checks' to gcc CFLAGS > > > > If -fno-delete-null-pointer-checks is dropped from the top-level Makefile > > then the gratuitous NULL pointer checks disappear from the Assembler output, > > obviating the need to litter hot paths with __attribute__((nonnull(1))) > > annotations. > > > > Taking a closer look at that commit, its rationale appears questionable: > > It says that broken code such as ... > > > > struct agnx_priv *priv = dev->priv; > > > > if (!dev) > > return; > > > > ... would result in the NULL pointer check being optimized away. > > The commit message claims that keeping the NULL pointer check in > > "makes it harder to abuse" the broken code. > > > > I don't see how that's the case: If dev is NULL, the NULL pointer > > dereference at the function's top causes termination of the task > > in kernel/exit.c:do_exit(). So the NULL pointer check is never > > reached by the task. If on the other hand dev is non-NULL, > > the task isn't terminated but then the NULL pointer check is > > unnecessary as well. > > > > So the point of the commit remains elusive to me. I could submit > > an RFC patch which drops -fno-delete-null-pointer-checks and see > > if any security folks cry foul. Thoughts? > This was a famous tun.c bug back in the day. In those days we weren't careful to disallow remapping NULL to a different pointer. See /proc/sys/vm/mmap_min_addr. The exploit was to remap NULL to be a valid user controlled pointer. It should have been impossible to exploit because the code had a check for NULL, but the compiler optimized it away. https://lwn.net/Articles/342330/ > I wonder if modern compilers can't simply warn about this particular > case. Not to mention our static checkers.. > > > Dan, do you think the concern from the above-quoted commit is still > valid? Is this something that smatch flags these days? We're apparently > paying a real performance price in networking for tying compiler's hands > with -fno-delete-null-pointer-checks If I had to guess why GCC doesn't warn about this I would say that probably it's because a lot of macros have NULL checks built in. Most static analysis tools have a warning about inconsistent NULL checks but Smatch won't warn about it unless it can lead to a NULL dereference. The fact that pointless NULL checks slow down the code has never bothered anyone up to now. regards, dan carpenter
On Tue, Jan 26, 2021 at 11:58:17AM +0300, Dan Carpenter wrote: > On Mon, Jan 25, 2021 at 11:39:08AM -0800, Jakub Kicinski wrote: > > On Sun, 24 Jan 2021 11:33:01 +0100 Lukas Wunner wrote: > > > On Fri, Jan 22, 2021 at 10:40:05AM +0100, Eric Dumazet wrote: > > > > On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@wunner.de> wrote: > > > > > sch_handle_egress() returns either the skb or NULL to signal to its > > > > > caller __dev_queue_xmit() whether a packet should continue to be > > > > > processed. > > > > > > > > > > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a > > > > > NULL pointer deref right at its top. > > > > > > > > > > But the compiler doesn't know that. So if sch_handle_egress() signals > > > > > success by returning the skb, the "if (!skb) goto out;" statement > > > > > results in a gratuitous NULL pointer check in the Assembler output. > > > > > > > > > > Avoid by telling the compiler that __dev_queue_xmit() is never passed a > > > > > NULL skb. > > > [...] > > > > > we're about to add a netfilter egress hook to __dev_queue_xmit() > > > > > and without the micro-optimization, it will result in a performance > > > > > degradation which is indeed measurable: > > > [...] > > > > > --- a/net/core/dev.c > > > > > +++ b/net/core/dev.c > > > > > +__attribute__((nonnull(1))) > > > > > static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > > > > > { > > > > > struct net_device *dev = skb->dev; > > > > > > > > It is a bit sad the compilers do not automatically get this knowledge > > > > from the very first instruction : > > > > > > > > struct net_device *dev = skb->dev; > > > > > > The compiler (gcc) is capable of doing that, but the feature was disabled by: > > > > > > commit a3ca86aea507904148870946d599e07a340b39bf > > > Author: Eugene Teo <eteo@redhat.com> > > > Date: Wed Jul 15 14:59:10 2009 +0800 > > > > > > Add '-fno-delete-null-pointer-checks' to gcc CFLAGS > > > > > > If -fno-delete-null-pointer-checks is dropped from the top-level Makefile > > > then the gratuitous NULL pointer checks disappear from the Assembler output, > > > obviating the need to litter hot paths with __attribute__((nonnull(1))) > > > annotations. > > > > > > Taking a closer look at that commit, its rationale appears questionable: > > > It says that broken code such as ... > > > > > > struct agnx_priv *priv = dev->priv; > > > > > > if (!dev) > > > return; > > > > > > ... would result in the NULL pointer check being optimized away. > > > The commit message claims that keeping the NULL pointer check in > > > "makes it harder to abuse" the broken code. > > > > > > I don't see how that's the case: If dev is NULL, the NULL pointer > > > dereference at the function's top causes termination of the task > > > in kernel/exit.c:do_exit(). So the NULL pointer check is never > > > reached by the task. If on the other hand dev is non-NULL, > > > the task isn't terminated but then the NULL pointer check is > > > unnecessary as well. > > > > > > So the point of the commit remains elusive to me. I could submit > > > an RFC patch which drops -fno-delete-null-pointer-checks and see > > > if any security folks cry foul. Thoughts? > > This was a famous tun.c bug back in the day. In those days we weren't > careful to disallow remapping NULL to a different pointer. See > /proc/sys/vm/mmap_min_addr. The exploit was to remap NULL to be a valid > user controlled pointer. It should have been impossible to exploit > because the code had a check for NULL, but the compiler optimized it > away. > > https://lwn.net/Articles/342330/ Thanks Dan, that's valuable information. I'll add it to the commit message when respinning the series. So to summarize, keeping seemingly useless NULL pointer checks in is a mitigation in case accesses to the zero page don't result in an exception because an attacker managed to map something to that page. I'm assuming for now that selectively dropping those NULL pointer checks from hotpaths is acceptable if otherwise a measurable performance degradation occurs. Maintainers please speak up if you disagree. Thanks, Lukas
diff --git a/net/core/dev.c b/net/core/dev.c index 7afbb642e203..4c16b9932823 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev, * the BH enable code must have IRQs enabled so that it will not deadlock. * --BLG */ +__attribute__((nonnull(1))) static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) { struct net_device *dev = skb->dev;
sch_handle_egress() returns either the skb or NULL to signal to its caller __dev_queue_xmit() whether a packet should continue to be processed. The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a NULL pointer deref right at its top. But the compiler doesn't know that. So if sch_handle_egress() signals success by returning the skb, the "if (!skb) goto out;" statement results in a gratuitous NULL pointer check in the Assembler output. Avoid by telling the compiler that __dev_queue_xmit() is never passed a NULL skb. This also eliminates another gratuitous NULL pointer check in __dev_queue_xmit() qdisc_pkt_len_init() skb_header_pointer() __skb_header_pointer() The speedup is barely measurable: Before: 1877 1875 1878 1874 1882 1873 Mb/sec After: 1877 1877 1880 1883 1888 1886 Mb/sec However we're about to add a netfilter egress hook to __dev_queue_xmit() and without the micro-optimization, it will result in a performance degradation which is indeed measurable: With netfilter hook: 1853 1852 1850 1848 1849 1851 Mb/sec With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec The performance degradation is caused by a JNE instruction ("if (skb)") being flipped to a JE instruction ("if (!skb)") once the netfilter hook is added. The micro-optimization removes the test and jump instructions altogether. Measurements were performed on a Core i7-3615QM. Reproducer: ip link add dev foo type dummy ip link set dev foo up tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' modprobe pktgen echo "add_device foo" > /proc/net/pktgen/kpktgend_3 samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+)