Message ID | 35FD53F367049845BC99AC72306C23D1044A02027E0E@CNBJMBX05.corpusers.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 3 Feb 2015 16:42:14 +0800 "Wang, Yalin" <Yalin.Wang@sonymobile.com> wrote: > I make a change in kernel to test hit/miss ratio: Neat, thanks. > > ... > > After use the phone some time: > root@D5303:/ # cat /proc/meminfo > VmallocUsed: 10348 kB > VmallocChunk: 75632 kB > __set_bit_miss_count:10002 __set_bit_success_count:1096661 > __clear_bit_miss_count:359484 __clear_bit_success_count:3674617 > __test_and_set_bit_miss_count:7 __test_and_set_bit_success_count:221 > __test_and_clear_bit_miss_count:924611 __test_and_clear_bit_success_count:193 > > __test_and_clear_bit_miss_count has a very high miss rate. > In fact, I think set/clear/test_and_set(clear)_bit atomic version can also > Be investigated to see its miss ratio, > I have not tested the atomic version, > Because it reside in different architectures. Hopefully misses in test_and_X_bit are not a problem. The CPU implementation would be pretty stupid to go and dirty the cacheline when it knows it didn't change anything. But maybe I'm wrong about that. That we're running clear_bit against a cleared bit 10% of the time is a bit alarming. I wonder where that's coming from. The enormous miss count in test_and_clear_bit() might indicate an inefficiency somewhere.
> -----Original Message----- > From: Andrew Morton [mailto:akpm@linux-foundation.org] > Sent: Tuesday, February 03, 2015 6:59 PM > To: Wang, Yalin > Cc: 'Kirill A. Shutemov'; 'arnd@arndb.de'; 'linux-arch@vger.kernel.org'; > 'linux-kernel@vger.kernel.org'; 'linux@arm.linux.org.uk'; 'linux-arm- > kernel@lists.infradead.org' > Subject: Re: [RFC] change non-atomic bitops method > > On Tue, 3 Feb 2015 16:42:14 +0800 "Wang, Yalin" <Yalin.Wang@sonymobile.com> > wrote: > > > I make a change in kernel to test hit/miss ratio: > > Neat, thanks. > > > > > ... > > > > After use the phone some time: > > root@D5303:/ # cat /proc/meminfo > > VmallocUsed: 10348 kB > > VmallocChunk: 75632 kB > > __set_bit_miss_count:10002 __set_bit_success_count:1096661 > > __clear_bit_miss_count:359484 __clear_bit_success_count:3674617 > > __test_and_set_bit_miss_count:7 __test_and_set_bit_success_count:221 > > __test_and_clear_bit_miss_count:924611 > __test_and_clear_bit_success_count:193 > > > > __test_and_clear_bit_miss_count has a very high miss rate. > > In fact, I think set/clear/test_and_set(clear)_bit atomic version can > also > > Be investigated to see its miss ratio, > > I have not tested the atomic version, > > Because it reside in different architectures. > > Hopefully misses in test_and_X_bit are not a problem. The CPU > implementation would be pretty stupid to go and dirty the cacheline > when it knows it didn't change anything. But maybe I'm wrong about > that. > > That we're running clear_bit against a cleared bit 10% of the time is a > bit alarming. I wonder where that's coming from. > > The enormous miss count in test_and_clear_bit() might indicate an > inefficiency somewhere. I te-test the patch on 3.10 kernel. The result like this: VmallocChunk: 251498164 kB __set_bit_miss_count:11730 __set_bit_success_count:1036316 __clear_bit_miss_count:209640 __clear_bit_success_count:4806556 __test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121 __test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445 __clear_bit miss rate is a little high, I check the log, and most miss coming from this code: <6>[ 442.701798] [<ffffffc00021d084>] warn_slowpath_fmt+0x4c/0x58 <6>[ 442.701805] [<ffffffc0002461a8>] __clear_bit+0x98/0xa4 <6>[ 442.701813] [<ffffffc0003126ac>] __alloc_fd+0xc8/0x124 <6>[ 442.701821] [<ffffffc000312768>] get_unused_fd_flags+0x28/0x34 <6>[ 442.701828] [<ffffffc0002f9370>] do_sys_open+0x10c/0x1c0 <6>[ 442.701835] [<ffffffc0002f9458>] SyS_openat+0xc/0x18 In __clear_close_on_exec(fd, fdt); <6>[ 442.695354] [<ffffffc00021d084>] warn_slowpath_fmt+0x4c/0x58 <6>[ 442.695359] [<ffffffc0002461a8>] __clear_bit+0x98/0xa4 <6>[ 442.695367] [<ffffffc000312340>] dup_fd+0x1d4/0x280 <6>[ 442.695375] [<ffffffc00021b07c>] copy_process.part.56+0x42c/0xe38 <6>[ 442.695382] [<ffffffc00021bb9c>] do_fork+0xe0/0x360 <6>[ 442.695389] [<ffffffc00021beb4>] SyS_clone+0x10/0x1c In __clear_open_fd(open_files - i, new_fdt); Do we need test_bit() before clear_bit()at these 2 place? Thanks
On Mon, Feb 09 2015, "Wang, Yalin" <Yalin.Wang@sonymobile.com> wrote: > I te-test the patch on 3.10 kernel. > The result like this: > > VmallocChunk: 251498164 kB > __set_bit_miss_count:11730 __set_bit_success_count:1036316 > __clear_bit_miss_count:209640 __clear_bit_success_count:4806556 > __test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121 > __test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445 > > __clear_bit miss rate is a little high, > I check the log, and most miss coming from this code: > > <6>[ 442.701798] [<ffffffc00021d084>] warn_slowpath_fmt+0x4c/0x58 > <6>[ 442.701805] [<ffffffc0002461a8>] __clear_bit+0x98/0xa4 > <6>[ 442.701813] [<ffffffc0003126ac>] __alloc_fd+0xc8/0x124 > <6>[ 442.701821] [<ffffffc000312768>] get_unused_fd_flags+0x28/0x34 > <6>[ 442.701828] [<ffffffc0002f9370>] do_sys_open+0x10c/0x1c0 > <6>[ 442.701835] [<ffffffc0002f9458>] SyS_openat+0xc/0x18 > In __clear_close_on_exec(fd, fdt); > > > > <6>[ 442.695354] [<ffffffc00021d084>] warn_slowpath_fmt+0x4c/0x58 > <6>[ 442.695359] [<ffffffc0002461a8>] __clear_bit+0x98/0xa4 > <6>[ 442.695367] [<ffffffc000312340>] dup_fd+0x1d4/0x280 > <6>[ 442.695375] [<ffffffc00021b07c>] copy_process.part.56+0x42c/0xe38 > <6>[ 442.695382] [<ffffffc00021bb9c>] do_fork+0xe0/0x360 > <6>[ 442.695389] [<ffffffc00021beb4>] SyS_clone+0x10/0x1c > In __clear_open_fd(open_files - i, new_fdt); > > Do we need test_bit() before clear_bit()at these 2 place? > In the second case, new_fdt->open_fds has just been filled by a memcpy, and no-one can possibly have written to that cache line in the meantime. In the first case, testing is also likely wasteful if fdt->max_fds is less than half the number of bits in a cacheline (fdt->close_on_exec and fdt->open_fds are always contiguous, and the latter is unconditionally written to). Rasmus
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 80e4645..a82937b 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -2,6 +2,7 @@ #include <linux/hugetlb.h> #include <linux/init.h> #include <linux/kernel.h> +#include <linux/module.h> #include <linux/mm.h> #include <linux/mman.h> #include <linux/mmzone.h> @@ -15,6 +16,41 @@ #include <asm/pgtable.h> #include "internal.h" +atomic_t __set_bit_success_count = ATOMIC_INIT(0); +atomic_t __set_bit_miss_count = ATOMIC_INIT(0); +EXPORT_SYMBOL_GPL(__set_bit_success_count); +EXPORT_SYMBOL_GPL(__set_bit_miss_count); + +atomic_t __clear_bit_success_count = ATOMIC_INIT(0); +atomic_t __clear_bit_miss_count = ATOMIC_INIT(0); +EXPORT_SYMBOL_GPL(__clear_bit_success_count); +EXPORT_SYMBOL_GPL(__clear_bit_miss_count); + +atomic_t __test_and_set_bit_success_count = ATOMIC_INIT(0); +atomic_t __test_and_set_bit_miss_count = ATOMIC_INIT(0); +EXPORT_SYMBOL_GPL(__test_and_set_bit_success_count); +EXPORT_SYMBOL_GPL(__test_and_set_bit_miss_count); + +atomic_t __test_and_clear_bit_success_count = ATOMIC_INIT(0); +atomic_t __test_and_clear_bit_miss_count = ATOMIC_INIT(0); +EXPORT_SYMBOL_GPL(__test_and_clear_bit_success_count); +EXPORT_SYMBOL_GPL(__test_and_clear_bit_miss_count); + +/* + * atomic bitops + */ +atomic_t set_bit_success_count = ATOMIC_INIT(0); +atomic_t set_bit_miss_count = ATOMIC_INIT(0); + +atomic_t clear_bit_success_count = ATOMIC_INIT(0); +atomic_t clear_bit_miss_count = ATOMIC_INIT(0); + +atomic_t test_and_set_bit_success_count = ATOMIC_INIT(0); +atomic_t test_and_set_bit_miss_count = ATOMIC_INIT(0); + +atomic_t test_and_clear_bit_success_count = ATOMIC_INIT(0); +atomic_t test_and_clear_bit_miss_count = ATOMIC_INIT(0); + void __attribute__((weak)) arch_report_meminfo(struct seq_file *m) { } @@ -165,6 +201,18 @@ static int meminfo_proc_show(struct seq_file *m, void *v) HPAGE_PMD_NR) #endif ); + seq_printf(m, "__set_bit_miss_count:%d __set_bit_success_count:%d\n" + "__clear_bit_miss_count:%d __clear_bit_success_count:%d\n" + "__test_and_set_bit_miss_count:%d __test_and_set_bit_success_count:%d\n" + "__test_and_clear_bit_miss_count:%d __test_and_clear_bit_success_count:%d\n", + atomic_read(&__set_bit_miss_count), atomic_read(&__set_bit_success_count), + atomic_read(&__clear_bit_miss_count), atomic_read(&__clear_bit_success_count), + + atomic_read(&__test_and_set_bit_miss_count), + atomic_read(&__test_and_set_bit_success_count), + + atomic_read(&__test_and_clear_bit_miss_count), + atomic_read(&__test_and_clear_bit_success_count)); hugetlb_report_meminfo(m); diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h index 697cc2b..1895133 100644 --- a/include/asm-generic/bitops/non-atomic.h +++ b/include/asm-generic/bitops/non-atomic.h @@ -2,7 +2,18 @@ #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ #include <asm/types.h> +#include <asm/atomic.h> +extern atomic_t __set_bit_success_count; +extern atomic_t __set_bit_miss_count; +extern atomic_t __clear_bit_success_count; +extern atomic_t __clear_bit_miss_count; + +extern atomic_t __test_and_set_bit_success_count; +extern atomic_t __test_and_set_bit_miss_count; + +extern atomic_t __test_and_clear_bit_success_count; +extern atomic_t __test_and_clear_bit_miss_count; /** * __set_bit - Set a bit in memory * @nr: the bit to set @@ -17,7 +28,13 @@ static inline void __set_bit(int nr, volatile unsigned long *addr) unsigned long mask = BIT_MASK(nr); unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); - *p |= mask; + if ((*p & mask) == 0) { + atomic_inc(&__set_bit_success_count); + *p |= mask; + } else { + atomic_inc(&__set_bit_miss_count); + } + } static inline void __clear_bit(int nr, volatile unsigned long *addr) @@ -25,7 +42,12 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr) unsigned long mask = BIT_MASK(nr); unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); - *p &= ~mask; + if ((*p & mask) != 0) { + atomic_inc(&__clear_bit_success_count); + *p &= ~mask; + } else { + atomic_inc(&__clear_bit_miss_count); + } } /** @@ -60,7 +82,12 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); unsigned long old = *p; - *p = old | mask; + if ((old & mask) == 0) { + atomic_inc(&__test_and_set_bit_success_count); + *p = old | mask; + } else { + atomic_inc(&__test_and_set_bit_miss_count); + } return (old & mask) != 0; } @@ -79,7 +106,12 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); unsigned long old = *p; - *p = old & ~mask; + if ((old & mask) != 0) { + atomic_inc(&__test_and_clear_bit_success_count); + *p = old & ~mask; + } else { + atomic_inc(&__test_and_clear_bit_miss_count); + } return (old & mask) != 0; } --- After use the phone some time: