Message ID | 20210829183608.2297877-1-me@ubique.spb.ru
---|---
Series | bpfilter
On 8/29/21 12:35 PM, Dmitrii Banshchikov wrote:
> The patchset is based on the patches from David S. Miller [1] and
> Daniel Borkmann [2].
>
> The main goal of the patchset is to prepare bpfilter for
> iptables' configuration blob parsing and code generation.

The referenced patches are from 2018. Since then, and since this is
bpf-next, places like [1] indicate that we are moving on from iptables
towards nftables.

Any thoughts?

[1] https://wiki.archlinux.org/title/Iptables

>
> The patchset introduces data structures and code for matches,
> targets, rules and tables. Besides that, code generation is
> introduced.
>
> The first version of the code generation supports only "inline"
> mode - all chains and their rules emit instructions in a linear
> fashion. The plan for the code generation is to introduce a
> bpf_map_for_each subprogram that will handle all rules that
> aren't generated in inline mode due to the verifier's limit. This
> shall allow handling arbitrarily large rule sets.
>
> Things that are not implemented yet:
> 1) The process of switching from the previous BPF programs to the
>    new set isn't atomic.
> 2) The code generation for the FORWARD chain isn't supported.
> 3) Counters setsockopts() are not handled.
> 4) No support of device ifindex - it's hardcoded.
> 5) No helper subprog for counters update.
>
> Another problem is using iptables' blobs for tests and filter
> table initialization. While it saves lines, something more
> maintainable should be done here.
>
> The plan for the next iteration:
> 1) Handle large rule sets via bpf_map_for_each
> 2) Add a helper program for counters update
> 3) Handle iptables' counters setsockopts()
> 4) Handle ifindex
> 5) Add TCP match
>
> Patch 1 adds definitions of the used types.
> Patch 2 adds logging to bpfilter.
> Patch 3 adds the bpfilter header to tools.
> Patch 4 adds an associative map.
> Patch 5 adds the code generation basis.
> Patches 6/7/8/9 add code for matches, targets, rules and the table.
> Patch 10 adds code generation for the table.
> Patch 11 handles hooked setsockopt(2) calls.
> Patch 12 adds the filter table.
> Patch 13 uses the prepared code in main().
> ...
>
> 1. https://lore.kernel.org/patchwork/patch/902785/
> 2. https://lore.kernel.org/patchwork/patch/902783/
> 3. https://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf
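For a concrete picture of what the generated filter logic amounts to, here is
a rough C-level sketch of the semantics of a single rule such as
`iptables -A INPUT -p udp -m udp --dport 1500 -j DROP`. This is only an
illustration under assumptions: bpfilter's generator emits raw BPF
instructions rather than compiling C, and the XDP attach point, section and
function names below are chosen for the example, not taken from the patchset.

```
// SPDX-License-Identifier: GPL-2.0
/* Illustrative sketch only: roughly the check that one generated rule
 * ("-p udp -m udp --dport 1500 -j DROP") performs on each packet.
 * bpfilter emits equivalent raw BPF instructions instead of compiling C;
 * the XDP attach point here is an assumption made for the example.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_udp_dport_1500(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;
	struct udphdr *udph;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (iph->protocol != IPPROTO_UDP)
		return XDP_PASS;
	/* Keep the sketch simple: only handle headers without IP options. */
	if (iph->ihl != 5)
		return XDP_PASS;

	udph = (void *)(iph + 1);
	if ((void *)(udph + 1) > data_end)
		return XDP_PASS;

	if (udph->dest == bpf_htons(1500))
		return XDP_DROP;	/* the rule's DROP verdict */

	return XDP_PASS;		/* chain's default policy: ACCEPT */
}

char LICENSE[] SEC("license") = "GPL";
```

Compiled with clang -O2 -target bpf, this is the sort of linear
compare-and-verdict sequence that the "inline" mode would emit per rule.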
On Sun, Aug 29, 2021 at 01:13:53PM -0600, Raymond Burkholder wrote:
> On 8/29/21 12:35 PM, Dmitrii Banshchikov wrote:
> > The patchset is based on the patches from David S. Miller [1] and
> > Daniel Borkmann [2].
> >
> > The main goal of the patchset is to prepare bpfilter for
> > iptables' configuration blob parsing and code generation.
>
> The referenced patches are from 2018. Since then, and since this is
> bpf-next, places like [1] indicate that we are moving on from iptables
> towards nftables.
>
> Any thoughts?

I'm not sure what kind of thoughts you expect. If your question is why
use an outdated interface, the answer is: this is just a starting point,
and nothing prevents us from having our own interface in the future.

>
> [1] https://wiki.archlinux.org/title/Iptables
>
On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:

[..]

> And here are some performance tests.
>
> The environment consists of two machines (sender and receiver)
> connected with a 10Gbps link via a switch. The sender uses DPDK to
> simulate QUIC packets (89 bytes long) from random IPs. The switch
> measures the generated traffic to be about 7066377568 bits/sec,
> 9706553 packets/sec.
>
> The receiver is a 2 socket 2680v3 + HT and uses either iptables,
> nft or bpfilter to filter out UDP traffic.
>
> Two tests were made. Two rulesets (the default policy was ACCEPT)
> were used in each test:
>
> ```
> iptables -A INPUT -p udp -m udp --dport 1500 -j DROP
> ```
> and
> ```
> iptables -A INPUT -s 1.1.1.1/32 -p udp -m udp --dport 1000 -j DROP
> iptables -A INPUT -s 2.2.2.2/32 -p udp -m udp --dport 2000 -j DROP
> ...
> iptables -A INPUT -s 31.31.31.31/32 -p udp -m udp --dport 31000 -j DROP
> iptables -A INPUT -p udp -m udp --dport 1500 -j DROP
> ```
>
> The first test measures the performance of the receiver via
> stress-ng [3] in bogo-ops. The upper-bound value (no firewall and no
> traffic) for bogo-ops is 8148-8210. The lower-bound value (traffic
> but no firewall) is 6567-6643.
> The stress-ng command used: stress-ng -t60 -c48 --metrics-brief.
>
> The second test measures the number of dropped packets. The
> receiver keeps only 1 CPU online and disables all others
> (maxcpus=1 and the number of cores per socket set to 1 in the
> BIOS). The number of dropped packets is collected via
> iptables-legacy -nvL, iptables -nvL and bpftool map dump id.
>
> Test 1: bogo-ops (the more the better)
>            iptables     nft          bpfilter
> 1 rule:    6474-6554    6483-6515    7996-8008
> 32 rules:  6374-6433    5761-5804    7997-8042
>
> Test 2: number of dropped packets (the more the better)
>            iptables     nft          bpfilter
> 1 rule:    234M-241M    220M         900M+
> 32 rules:  186M-196M    97M-98M      900M+
>
> Please let me know if you see a gap in the testing environment.

General perf testing will depend on the nature of the use case
you are trying to target.
What is the nature of the app? Is it just receiving packets and
counting? Does it exemplify something real in your network, or is it
purely benchmarking? Both are valid.
What else can it do (e.g. are you interested in latency accounting etc.)?
What I have seen in practice for iptables deployments is a default
drop and in general an accept list. Per-ruleset IP address aggregation
is typically achieved via ipset. So your mileage may vary...

Having said that:
Our testing [1] approach is typically for a worst-case scenario,
i.e. we make sure the rulesets are structured such that all of the
linear rulesets will be iterated and we eventually hit the default
ruleset.
We also try to reduce variability in the results. A lot of small
things could affect your reproducibility, so we try to avoid them.
For example, from what you described:
You are sending from a random IP - that means each packet will hit
a random ruleset (for the case of 32 rulesets). And some rules will
likely be hit more often than others. The likelihood of reproducing the
same results for multiple runs gets lower as you increase the number
of rulesets.

From a collection perspective:
Looking at the nature of the CPU utilization is important:
softirq vs system calls vs user app.
Your test workload seems to be very specific to host ingress.
So in reality you are more constrained by kernel->user syscalls
(which will be hidden if you are mostly dropping in the kernel
as opposed to letting packets go to user space).
Something is not clear from your email:
You seem to indicate that no traffic was running in test 1.
If so, why would 32 rulesets give different results than 1?

cheers,
jamal

[1] https://netdevconf.info/0x15/session.html?Linux-ACL-Performance-Analysis
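On the counters point (the cover letter notes that counters setsockopts() and
a counters-update helper are still missing, and the test above reads drop
counts via `bpftool map dump id`), below is a minimal sketch of the common
pattern of keeping per-rule hit counters in a per-CPU BPF map. The map name,
key layout, attach type and update helper are assumptions made for this
illustration, not bpfilter's actual data layout.

```
// SPDX-License-Identifier: GPL-2.0
/* Illustrative sketch only: per-rule hit counters kept in a per-CPU
 * array so that user space can read them (e.g. via bpftool map dump).
 * Map name, key layout and the update helper are assumptions for this
 * example, not bpfilter's actual layout.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_RULES 32

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, MAX_RULES);
	__type(key, __u32);
	__type(value, __u64);
} rule_counters SEC(".maps");

/* Bump the counter of the rule that matched the current packet. */
static __always_inline void count_hit(__u32 rule_idx)
{
	__u64 *cnt = bpf_map_lookup_elem(&rule_counters, &rule_idx);

	if (cnt)
		(*cnt)++;	/* per-CPU value, so a plain increment is fine */
}

SEC("xdp")
int count_only(struct xdp_md *ctx)
{
	count_hit(0);		/* pretend rule 0 matched this packet */
	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

The per-CPU values can then be dumped (and summed) from user space with
`bpftool map dump name rule_counters`.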
On Mon, Aug 30, 2021 at 09:56:18PM -0400, Jamal Hadi Salim wrote:
> On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:
>
> [..]
>
> > And here are some performance tests.
> >
> > [..]
> >
> > Please let me know if you see a gap in the testing environment.
>
> General perf testing will depend on the nature of the use case
> you are trying to target.
> What is the nature of the app? Is it just receiving packets and
> counting? Does it exemplify something real in your network, or is it
> purely benchmarking? Both are valid.
> What else can it do (e.g. are you interested in latency accounting etc.)?
> What I have seen in practice for iptables deployments is a default
> drop and in general an accept list. Per-ruleset IP address aggregation
> is typically achieved via ipset. So your mileage may vary...

This was pure benchmarking, with a single goal: to show that there
might exist scenarios where using bpfilter may provide some performance
benefit.

> Having said that:
> Our testing [1] approach is typically for a worst-case scenario,
> i.e. we make sure the rulesets are structured such that all of the
> linear rulesets will be iterated and we eventually hit the default
> ruleset.
> We also try to reduce variability in the results. A lot of small
> things could affect your reproducibility, so we try to avoid them.
> For example, from what you described:
> You are sending from a random IP - that means each packet will hit
> a random ruleset (for the case of 32 rulesets). And some rules will
> likely be hit more often than others. The likelihood of reproducing the
> same results for multiple runs gets lower as you increase the number
> of rulesets.
> From a collection perspective:
> Looking at the nature of the CPU utilization is important:
> softirq vs system calls vs user app.
> Your test workload seems to be very specific to host ingress.
> So in reality you are more constrained by kernel->user syscalls
> (which will be hidden if you are mostly dropping in the kernel
> as opposed to letting packets go to user space).
>
> Something is not clear from your email:
> You seem to indicate that no traffic was running in test 1.
> If so, why would 32 rulesets give different results than 1?

I mentioned the lower- and upper-bound values for bogo-ops on the
machine. The lower bound is when there is traffic and no firewall
at all. The upper bound is when there is no firewall and no
traffic. The first test then measures bogo-ops for the two rule sets,
with traffic running, for each of iptables, nft and bpfilter.

>
> cheers,
> jamal
>
> [1] https://netdevconf.info/0x15/session.html?Linux-ACL-Performance-Analysis
On 2021-08-31 8:48 a.m., Dmitrii Banshchikov wrote:
> On Mon, Aug 30, 2021 at 09:56:18PM -0400, Jamal Hadi Salim wrote:
> > On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:
> >
> > Something is not clear from your email:
> > You seem to indicate that no traffic was running in test 1.
> > If so, why would 32 rulesets give different results than 1?
>
> I mentioned the lower- and upper-bound values for bogo-ops on the
> machine. The lower bound is when there is traffic and no firewall
> at all. The upper bound is when there is no firewall and no
> traffic. The first test then measures bogo-ops for the two rule sets,
> with traffic running, for each of iptables, nft and bpfilter.

Thanks, I totally misread that.

I did look at stress-ng initially, to use it to stress the system and
emulate some real-world setup (polluting caches, etc.) while testing,
but the variability of the results was bad - so I dropped it early on.
Maybe we should look at it again.

cheers,
jamal