Message ID | 20220815071332.627393-1-yuzhao@google.com (mailing list archive) |
---|---|
Series | Multi-Gen LRU Framework |
TLDR
====
RAM utilization  Throughput (95% CI)  P99 Latency (95% CI)
----------------------------------------------------------
~90%             NS                   NS
~110%            +[12, 16]%           -[20, 22]%

Abbreviations
=============
CI:  confidence interval
NS:  no statistically significant difference
DUT: device under test
ATE: automatic test equipment

Rationale
=========
1. OpenWrt is the most popular distro for WiFi routers; many of its
   targets use big endianness [1].
2. 4 out of the top 5 bestselling WiFi routers in the US use MIPS [2];
   MIPS uses software-managed TLB.
3. Memcached is the best available memory benchmark on OpenWrt;
   admittedly such a use case is very limited in the real world.

Hardware
========
DUT: Ubiquiti EdgeRouter (ER-8) [3]

DUT # cat /proc/cpuinfo
system type             : UBNT_E200 (CN6120p1.1-800-NSP)
machine                 : Unknown
processor               : 0
cpu model               : Cavium Octeon II V0.1
BogoMIPS                : 1600.00
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 128
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 2, address/irw mask: [0x0ffc, 0x0ffb]
isa                     : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented        :
Options implemented     : tlb rixiex 4kex octeon_cache 32fpr prefetch mcheck ejtag llsc rixi lpa vtag_icache userlocal perf_cntr_intr_bit perf
shadow register sets    : 1
kscratch registers      : 3
package                 : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 1
cpu model               : Cavium Octeon II V0.1
BogoMIPS                : 1600.00
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 128
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 2, address/irw mask: [0x0ffc, 0x0ffb]
isa                     : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented        :
Options implemented     : tlb rixiex 4kex octeon_cache 32fpr prefetch mcheck ejtag llsc rixi lpa vtag_icache userlocal perf_cntr_intr_bit perf
shadow register sets    : 1
kscratch registers      : 3
package                 : 0
core                    : 1
VCED exceptions         : not available
VCEI exceptions         : not available

DUT # cat /proc/meminfo
MemTotal:        1991964 kB
MemFree:         1917304 kB
MemAvailable:    1896856 kB
Buffers:               4 kB
Cached:            33464 kB
SwapCached:            0 kB
Active:             1316 kB
Inactive:          33500 kB
Active(anon):       1316 kB
Inactive(anon):    33496 kB
Active(file):          0 kB
Inactive(file):        4 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        995324 kB
SwapFree:         995324 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1360 kB
Mapped:             2688 kB
Shmem:             33464 kB
KReclaimable:       8244 kB
Slab:              19772 kB
SReclaimable:       8244 kB
SUnreclaim:        11528 kB
KernelStack:        1056 kB
PageTables:          336 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1991304 kB
Committed_AS:      38916 kB
VmallocTotal:   1069547512 kB
VmallocUsed:        4856 kB
VmallocChunk:          0 kB
Percpu:              272 kB

Software
========
DUT # cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='22.03.0-rc6'
DISTRIB_REVISION='r19590-042d558536'
DISTRIB_TARGET='octeon/generic'
DISTRIB_ARCH='mips64_octeonplus'
DISTRIB_DESCRIPTION='OpenWrt 22.03.0-rc6 r19590-042d558536'
DISTRIB_TAINTS='no-all no-ipv6'

DUT # uname -a
Linux OpenWrt 6.0.0-rc3+ #0 SMP Sun Jul 31 15:12:47 2022 mips64 GNU/Linux

DUT # cat /proc/swaps
Filename      Type        Size     Used  Priority
/dev/zram0    partition   995324   0     100

DUT # memcached -V
memcached 1.6.9

DUT # cat /etc/config/memcached
config memcached
    option user 'memcached'
    option maxconn '1024'
    option listen '0.0.0.0'
    option port '11211'
    option memory '6400'

ATE $ memtier_benchmark -v
memtier_benchmark 1.3.0
Copyright (C) 2011-2022 Redis Ltd.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Procedure
=========
ATE $ cat run_benchmark_matrix.sh
run_memtier_benchmark()
{
    # boot to kernel $3

    # populate dataset
    memtier_benchmark/memtier_benchmark -s $DUT_IP -p 11211 \
        -P memcache_binary -n allkeys -c 1 --ratio 1:0 --pipeline 8 \
        --key-minimum=1 --key-maximum=$2 --key-pattern=P:P \
        -d 1000

    # access dataset using Gaussian pattern
    memtier_benchmark/memtier_benchmark -s $DUT_IP -p 11211 \
        -P memcache_binary --test-time $1 -c 1 --ratio 0:1 \
        --pipeline 8 --key-minimum=1 --key-maximum=$2 \
        --key-pattern=G:G --randomize --distinct-client-seed

    # collect results
}

run_duration_secs=1200
mem_utils_90_110=(1600000 2000000)
kernels=("baseline" "patched")

for mem_util in ${mem_utils_90_110[@]}; do
    for kernel in ${kernels[@]}; do
        run_memtier_benchmark $run_duration_secs $mem_util $kernel
    done
done

Results
=======
Baseline 90% RAM utilization
------------------------------------------------------------
Ops/sec   Avg. Lat.  p50 Lat.  p99 Lat.  p99.9 Lat.  KB/sec
------------------------------------------------------------
48550.71  0.65687    0.48700   2.84700   5.56700     1812.25
48600.55  0.65629    0.48700   2.86300   5.59900     1814.11
48562.37  0.65674    0.48700   2.84700   5.50300     1812.68
48556.66  0.65688    0.48700   2.84700   5.53500     1812.47
48619.50  0.65600    0.48700   2.87900   5.63100     1814.82
48579.74  0.65654    0.48700   2.84700   5.56700     1813.33
48593.25  0.65764    0.48700   2.86300   5.56700     1814.10
48535.52  0.65716    0.48700   2.86300   5.56700     1811.68
48587.24  0.65645    0.48700   2.83100   5.50300     1813.61
48541.92  0.65704    0.48700   2.81500   5.47100     1811.92

MGLRU 90% RAM utilization
------------------------------------------------------------
Ops/sec   Avg. Lat.  p50 Lat.  p99 Lat.  p99.9 Lat.  KB/sec
------------------------------------------------------------
48622.38  0.65594    0.48700   2.81500   5.47100     1814.92
48537.74  0.65715    0.48700   2.84700   5.53500     1811.76
48586.82  0.65646    0.48700   2.84700   5.50300     1813.59
48552.44  0.65695    0.48700   2.83100   5.43900     1812.31
48557.35  0.65680    0.49500   2.83100   5.53500     1812.49
48625.48  0.65593    0.48700   2.81500   5.43900     1815.04
48655.75  0.65557    0.48700   2.84700   5.53500     1816.17
48625.67  0.65595    0.48700   2.84700   5.53500     1815.04
48622.22  0.65600    0.48700   2.84700   5.47100     1814.91
48617.10  0.65610    0.48700   2.84700   5.56700     1814.73

Baseline 110% RAM utilization
------------------------------------------------------------
Ops/sec   Avg. Lat.  p50 Lat.  p99 Lat.  p99.9 Lat.  KB/sec
------------------------------------------------------------
19813.79  1.61245    0.63100   17.79100  31.74300    744.91
20328.29  1.57158    0.62300   17.27900  31.10300    764.25
20104.12  1.58913    0.62300   17.40700  31.10300    755.82
20342.03  1.57053    0.61500   17.27900  30.84700    764.77
19688.05  1.62268    0.62300   17.91900  31.35900    740.18
19607.31  1.62943    0.63900   17.91900  31.23100    737.15
19250.96  1.65963    0.65500   17.91900  31.10300    723.75
20182.79  1.58290    0.63100   17.40700  30.84700    758.78
20181.88  1.58299    0.63100   17.40700  30.84700    758.75
20615.90  1.54963    0.62300   17.02300  30.84700    775.06

MGLRU 110% RAM utilization
------------------------------------------------------------
Ops/sec   Avg. Lat.  p50 Lat.  p99 Lat.  p99.9 Lat.  KB/sec
------------------------------------------------------------
22911.33  1.39405    0.61500   13.69500  28.79900    861.36
22339.08  1.42989    0.61500   14.07900  30.07900    839.85
23394.22  1.36521    0.59900   13.56700  29.05500    879.51
22521.48  1.41830    0.61500   13.88700  29.82300    846.70
22678.10  1.40818    0.61500   13.82300  29.69500    852.59
22344.50  1.42952    0.61500   14.07900  29.95100    840.05
23245.65  1.37406    0.60700   13.50300  28.92700    873.93
23140.17  1.38032    0.59900   13.69500  29.18300    869.96
23003.34  1.38856    0.61500   13.63100  29.05500    864.82
22937.52  1.39253    0.61500   13.69500  29.43900    862.35

Flame graphs
------------
Baseline: https://drive.google.com/file/d/1-Ac4HMPAyZIqxtvKerUTqNNAgBLhpX9R
MGLRU:    https://drive.google.com/file/d/1-9x0W2yIYeiRvXWiYRzL6niTqW7zCVPX

References
==========
[1] https://openwrt.org/docs/platforms/start
[2] https://www.amazon.com/bestsellers/pc/300189
[3] https://openwrt.org/toh/ubiquiti/edgerouter
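
As a rough cross-check of the TLDR figures, the relative change in the mean of each metric at ~110% RAM utilization can be recomputed from the result tables. This is only a sketch: the email does not say how the 95% CIs were derived, so the script below compares plain sample means and does not reproduce the intervals. The helper names `mean` and `pct_change` are made up for illustration and are not part of run_benchmark_matrix.sh.

```shell
#!/bin/sh
# Hypothetical helpers; not part of run_benchmark_matrix.sh.

# mean N1 N2 ...: print the arithmetic mean of the arguments.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# pct_change BASE NEW: print the relative change from BASE to NEW in percent.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Ops/sec at ~110% RAM utilization, copied from the Results tables.
base_ops=$(mean 19813.79 20328.29 20104.12 20342.03 19688.05 \
                19607.31 19250.96 20182.79 20181.88 20615.90)
mglru_ops=$(mean 22911.33 22339.08 23394.22 22521.48 22678.10 \
                 22344.50 23245.65 23140.17 23003.34 22937.52)

# p99 latency at ~110% RAM utilization, copied from the Results tables.
base_p99=$(mean 17.791 17.279 17.407 17.279 17.919 \
                17.919 17.919 17.407 17.407 17.023)
mglru_p99=$(mean 13.695 14.079 13.567 13.887 13.823 \
                 14.079 13.503 13.695 13.631 13.695)

echo "throughput change:  $(pct_change "$base_ops" "$mglru_ops")%"
echo "p99 latency change: $(pct_change "$base_p99" "$mglru_p99")%"
```

The two point estimates come out at about +14% for throughput and about -21.5% for p99 latency, both inside the intervals reported in the TLDR.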
On Wed, Aug 31, 2022, at 6:17 AM, Yu Zhao wrote:
>
> Rationale
> =========
> 1. OpenWrt is the most popular distro for WiFi routers; many of its
>    targets use big endianness [1].
> 2. 4 out of the top 5 bestselling WiFi routers in the US use MIPS [2];
>    MIPS uses software-managed TLB.
> 3. Memcached is the best available memory benchmark on OpenWrt;
>    admittedly such a use case is very limited in the real world.
>
> Hardware
> ========
> DUT: Ubiquiti EdgeRouter (ER-8) [3]

I don't know if it makes any difference to your findings, but I would
point out that the test hardware is representative neither of most
devices supported by OpenWrt nor of those on the Amazon best-seller
list that I see looking from Germany:

Five of the top-10 devices on that list are arm64 (little-endian,
hardware TLB walker, typically 512MB of RAM); the others are mips32
(typically only 128MB, mostly single-core), and only the oldest of them
(Archer C7) is big-endian.

I would not expect endianness to make any difference, but the 16x
smaller memory of typical MIPS devices (ath79, mt76) might.

      Arnd
On 8/30/22 21:17, Yu Zhao wrote:
> TLDR
> ====
> RAM utilization  Throughput (95% CI)  P99 Latency (95% CI)
> ----------------------------------------------------------
> ~90%             NS                   NS
> ~110%            +[12, 16]%           -[20, 22]%

I'll give you points for thinking out of the box on this one. This is a
piece of hardware where both latency and bandwidth theoretically matter.

I've got a slightly older but similar piece of Ubiquiti hardware with
512MB of RAM. It doesn't run OpenWrt, fwiw. Maybe my firmware is a bit
outdated. *But*, most of the heavy lifting for packet flow on these
systems is done in hardware. They have some hardware acceleration to be
able to _route_ at gigabit speeds, so they're probably not quite as
sensitive to software hiccups as lower-end routers.

That said, my system at least does not typically have *any* memory
pressure. Right now, it hasn't even filled free memory with page cache
and it's been up for over a month:

# cat /proc/meminfo
MemTotal:         491552 kB
MemFree:          160188 kB
MemAvailable:     373088 kB
Cached:           151004 kB

I think a better tl;dr would be:

    MGLRU doesn't help much or cause any regressions on this
    hardware. Under (atypical) synthetic memory pressure, MGLRU did
    show some modest but measurable throughput and latency benefits.

In other words, this provides more of a data point that MGLRU doesn't
hurt medium-ish sized embedded systems. I think you could make an even
stronger case with even smaller hardware or something that actually
sees memory pressure on a regular basis in the wild.
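
The "no memory pressure" observation above can be approximated on other devices by comparing MemAvailable with MemTotal. A minimal sketch, assuming a kernel that reports MemAvailable in /proc/meminfo; the function name `mem_available_pct` is hypothetical, and the file path is a parameter only so that the function can be tried on a saved copy of the file. A high percentage suggests little pressure at that moment; it says nothing about peaks between samples.

```shell
#!/bin/sh
# mem_available_pct [FILE]: print MemAvailable as a percentage of MemTotal.
# FILE defaults to /proc/meminfo. Hypothetical helper for illustration.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total > 0) {
                 printf "%.1f\n", avail / total * 100
             } else {
                 exit 1
             }
         }' "${1:-/proc/meminfo}"
}

# Report the current value when running on Linux.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(mem_available_pct)% of MemTotal"
fi
```

Fed the figures quoted above (MemTotal 491552 kB, MemAvailable 373088 kB), this reports about 76% of memory as available.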
On Tue, Aug 30, 2022 at 10:17 PM Yu Zhao <yuzhao@google.com> wrote:
>
> TLDR
> ====
> RAM utilization  Throughput (95% CI)  P99 Latency (95% CI)
> ----------------------------------------------------------
> ~90%             NS                   NS
> ~110%            +[12, 16]%           -[20, 22]%
>
> Abbreviations
> =============
> CI:  confidence interval
> NS:  no statistically significant difference
> DUT: device under test
> ATE: automatic test equipment
>
> Rationale
> =========
> 1. OpenWrt is the most popular distro for WiFi routers; many of its
>    targets use big endianness [1].
> 2. 4 out of the top 5 bestselling WiFi routers in the US use MIPS [2];
>    MIPS uses software-managed TLB.
> 3. Memcached is the best available memory benchmark on OpenWrt;
>    admittedly such a use case is very limited in the real world.

Thanks.

My goal is to encourage MM people to extend their test coverage to some
commonly used but less tested configurations. I carefully constructed
this benchmark to balance its representativeness against the effort
needed to reproduce it.

When I wear my MM hat, I see ER-8 as the ideal choice because it comes
with a serial port, a replaceable memory DIMM, and two cores, one of
which can be disabled. The same SoC is also what the Debian MIPS port
mainly uses for its testing [1]. So if I need help, I might be able to
get it from them.

From OpenWrt's / MIPS OEMs' POVs, I do see ER-8 as an uninteresting
platform. Currently the best selling WiFi router on Amazon US is Archer
A7, a knockoff of Archer C7. The latter comes with not only the serial
port header but also the JTAG header, and that's what I use. But I
seriously doubt that showing how I work on C7 would encourage MM people
to try it.

I snapped a picture of it during lunch:
https://drive.google.com/file/d/1rYBwLOyMqBSr6WKUZd7Gbf9RfwA641X5/

And other boards I routinely test the MM performance on:
https://drive.google.com/file/d/1yBMx9OPWw-5czvz3maNUy6WBFwPvAqG5/

Dating all the way back to this vintage:
https://drive.google.com/file/d/12N21qiWSoyJgZwVkwAhY8_5Fj4dKftqD/

[1] https://wiki.debian.org/MIPSPort
I'd like to move mglru into the mm-stable branch late this week.

I'm not terribly happy about the level of review nor the carefulness of
the code commenting (these things are related) and I have a note here
that "mm: multi-gen LRU: admin guide" is due for an update and everyone
is at conference anyway. But let's please try to push things along anyway.
On Sun, Sep 11, 2022 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I'd like to move mglru into the mm-stable branch late this week.
>
> I'm not terribly happy about the level of review nor the carefulness of
> the code commenting (these things are related) and I have a note here
> that "mm: multi-gen LRU: admin guide" is due for an update and everyone
> is at conference anyway. But let's please try to push things along anyway.

Thanks for the heads-up. Will add as many comments as I can and wrap
it up by the end of tomorrow.
On Thu, Sep 15, 2022 at 11:56 AM Yu Zhao <yuzhao@google.com> wrote:
>
> On Sun, Sep 11, 2022 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > I'd like to move mglru into the mm-stable branch late this week.
> >
> > I'm not terribly happy about the level of review nor the carefulness of
> > the code commenting (these things are related) and I have a note here
> > that "mm: multi-gen LRU: admin guide" is due for an update and everyone
> > is at conference anyway. But let's please try to push things along anyway.
>
> Thanks for the heads-up. Will add as many comments as I can and wrap
> it up by the end of tomorrow.

I've posted v15 which can replace what mm-unstable currently has.
Apologies for the delay: an unexpected lockdep warning from the maple
tree forced me to restart all the tests [1].

Let me also post the incremental patches after this email, in case you
strongly prefer to add them on top of v14.

[1] https://lore.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com/
On Sun, 18 Sep 2022 14:40:01 -0600 Yu Zhao <yuzhao@google.com> wrote:

> Let me also post the incremental patches after this email, in case you
> strongly prefer to add them on top of v14.

Thanks, helpful. I have one question regarding 03/11.

The final two updates look pretty substantial. I guess I'll do a
series replacement and let this and mapletree sit another week.