Message ID | 1510802067-18609-2-git-send-email-byungchul.park@lge.com (mailing list archive)
---|---
State | New, archived |
[I have only briefly looked at the patches so I might have missed some details.]

On Thu 16-11-17 12:14:25, Byungchul Park wrote:
> Although lock_page() and its family can cause deadlock, lockdep has not
> worked with them, because unlock_page() might be called in a different
> context from the acquire context, which violates lockdep's assumption.
>
> Now that CONFIG_LOCKDEP_CROSSRELEASE has been introduced, lockdep can
> work with page locks.

I definitely agree that debugging page_lock deadlocks is a major PITA
but your implementation seems prohibitively expensive.

[...]

> @@ -218,6 +222,10 @@ struct page {
>  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
>  	int _last_cpupid;
>  #endif
> +
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> +	struct lockdep_map_cross map;
> +#endif
>  }

Now you are adding

struct lockdep_map_cross {
	struct lockdep_map map;              /*     0    40 */
	struct cross_lock  xlock;            /*    40    56 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */

	/* size: 96, cachelines: 2, members: 2 */
	/* last cacheline: 32 bytes */
};

for each struct page. So you are doubling its size. Who is going to
enable this config option? You are moving this to page_ext in a later
patch, which is a good step, but it doesn't go far enough because it
still consumes those resources. Is there any problem with making this
controllable from the kernel command line, the way we do for
page_owner for example?

Also it would be really great if you could give us some measurements
of the runtime overhead. I do not expect it to be very large, but this
is something people are usually interested in when enabling debugging
features.
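[For reference, a boot-time switch in the style of page_owner could look
roughly like the sketch below. This is a minimal illustration only: the
parameter name "lockdep_pagelock" and both symbols are assumptions, not
part of the posted patch.]

/*
 * Sketch of a page_owner-style boot parameter.  All names here are
 * hypothetical; the posted patch does not contain this code.
 */
#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/string.h>
#include <linux/errno.h>

static bool lockdep_pagelock_requested __initdata;
DEFINE_STATIC_KEY_FALSE(lockdep_pagelock_inited);

static int __init early_lockdep_pagelock_param(char *buf)
{
	if (!buf)
		return -EINVAL;
	if (!strcmp(buf, "on"))
		lockdep_pagelock_requested = true;
	return 0;
}
early_param("lockdep_pagelock", early_lockdep_pagelock_param);

/* Flip the static key once, after the backing storage is set up. */
static int __init lockdep_pagelock_init(void)
{
	if (lockdep_pagelock_requested)
		static_branch_enable(&lockdep_pagelock_inited);
	return 0;
}
early_initcall(lockdep_pagelock_init);

[The lock_page_acquire()/lock_page_release() hooks would then bail out
early unless static_branch_unlikely(&lockdep_pagelock_inited) is true,
so the disabled case costs only a patched-out branch, which is what
makes the page_owner approach cheap when unused.]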
On 11/16/2017 9:02 PM, Michal Hocko wrote:
> for each struct page. So you are doubling its size. Who is going to
> enable this config option? You are moving this to page_ext in a later
> patch, which is a good step, but it doesn't go far enough because it
> still consumes those resources. Is there any problem with making this
> controllable from the kernel command line, the way we do for
> page_owner for example?

Sure. I will add it.

> Also it would be really great if you could give us some measurements
> of the runtime overhead. I do not expect it to be very large, but this
> is something people are usually interested in when enabling debugging
> features.

The major overhead would come from the additional memory consumed by
the 'lockdep_map's.

Do you want me to measure the overhead due to the additional memory
consumption, or do you expect another kind of overhead? Could you tell
me what kind of result you want to see?
On Thu 16-11-17 21:48:05, Byungchul Park wrote:
> On 11/16/2017 9:02 PM, Michal Hocko wrote:
> > for each struct page. So you are doubling its size. Who is going to
> > enable this config option? You are moving this to page_ext in a later
> > patch, which is a good step, but it doesn't go far enough because it
> > still consumes those resources. Is there any problem with making this
> > controllable from the kernel command line, the way we do for
> > page_owner for example?
>
> Sure. I will add it.
>
> > Also it would be really great if you could give us some measurements
> > of the runtime overhead. I do not expect it to be very large, but
> > this is something people are usually interested in when enabling
> > debugging features.
>
> The major overhead would come from the additional memory consumed by
> the 'lockdep_map's.

Yes.

> Do you want me to measure the overhead due to the additional memory
> consumption, or do you expect another kind of overhead?

I would also be interested in how much impact this has on performance.
I do not expect it to be too large, but it would be nice to have some
numbers for a cache-cold parallel kbuild or other page-lock-heavy
workloads.
On Thu, Nov 16, 2017 at 02:07:46PM +0100, Michal Hocko wrote:
> On Thu 16-11-17 21:48:05, Byungchul Park wrote:
[...]
> > Do you want me to measure the overhead due to the additional memory
> > consumption, or do you expect another kind of overhead?
>
> I would also be interested in how much impact this has on performance.
> I do not expect it to be too large, but it would be nice to have some
> numbers for a cache-cold parallel kbuild or other page-lock-heavy
> workloads.

Hello Michal,

I measured a cache-cold parallel kbuild on my qemu machine. The results
vary quite a bit so I cannot be sure, but I see no meaningful difference
between before and after applying crossrelease to page locks.

Actually, I expect little overhead in lock_page() and unlock_page()
themselves even after applying crossrelease to page locks; I only
expect a bit of overhead from the additional memory consumed by the
'lockdep_map's per page.

I ran the following within "QEMU x86_64, 4GB memory, 4 cpus":

   make clean
   echo 3 > drop_caches
   time make -j4

The results are:

# w/o page lock tracking

At the 1st try,
real    5m28.105s
user    17m52.716s
sys     3m8.871s

At the 2nd try,
real    5m27.023s
user    17m50.134s
sys     3m9.289s

At the 3rd try,
real    5m22.837s
user    17m34.514s
sys     3m8.097s

# w/ page lock tracking

At the 1st try,
real    5m18.158s
user    17m18.200s
sys     3m8.639s

At the 2nd try,
real    5m19.329s
user    17m19.982s
sys     3m8.345s

At the 3rd try,
real    5m19.626s
user    17m21.363s
sys     3m9.869s

I think there's no meaningful difference on my small machine.

--
Thanks,
Byungchul
On Fri 24-11-17 12:02:36, Byungchul Park wrote:
[...]
> I measured a cache-cold parallel kbuild on my qemu machine. The results
> vary quite a bit so I cannot be sure, but I see no meaningful difference
> between before and after applying crossrelease to page locks.
>
> Actually, I expect little overhead in lock_page() and unlock_page()
> themselves even after applying crossrelease to page locks; I only
> expect a bit of overhead from the additional memory consumed by the
> 'lockdep_map's per page.
>
> I ran the following within "QEMU x86_64, 4GB memory, 4 cpus":
>
>    make clean
>    echo 3 > drop_caches
>    time make -j4

Maybe FS people will help you find a more representative workload. E.g.
a linear cache-cold file read should be good as well. Maybe there are
some tests in fstests (or however they call xfstests these days).

> The results are:
[...]
> I think there's no meaningful difference on my small machine.

Yeah, this doesn't seem to indicate anything. Maybe moving the build to
shmem to rule out the IO cost would tell more. But as I've said
previously, I do not really expect this to be very visible. It was more
a matter of my curiosity than an acceptance requirement. I think it is
much more important to make this runtime configurable, because almost
nobody is going to enable the feature if it is only build time. The
cost is just too high.
On Fri 24-11-17 09:11:49, Michal Hocko wrote:
> On Fri 24-11-17 12:02:36, Byungchul Park wrote:
[...]
> > I ran the following within "QEMU x86_64, 4GB memory, 4 cpus":
> >
> >    make clean
> >    echo 3 > drop_caches
> >    time make -j4
>
> Maybe FS people will help you find a more representative workload. E.g.
> a linear cache-cold file read should be good as well. Maybe there are
> some tests in fstests (or however they call xfstests these days).

So a relatively good test of page handling costs is to mmap a cache-hot
file and measure the time to fault in all the pages in the mapping. That
way IO and the filesystem stay out of the way and you measure only the
page table lookup, the page handling (taking the page reference and
locking the page), and the instantiation of the new PTE. Of these, the
page handling is actually the significant part.

								Honza
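[For concreteness, a minimal userspace sketch of the measurement Jan
describes could look as follows. It assumes the file named on the
command line has already been read once so its pages are cache hot; the
default name "testfile" is a placeholder.]

/* mmap a cache-hot file and time faulting in every page of the mapping */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "testfile"; /* placeholder */
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile char sum = 0;	/* keep the loads from being optimized away */
	struct timespec t0, t1;
	struct stat st;
	char *map;
	off_t off;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}
	map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off = 0; off < st.st_size; off += pagesz)
		sum += map[off];	/* one minor fault per page */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("faulted %ld pages in %.3f ms\n", (long)(st.st_size / pagesz),
	       (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6);
	munmap(map, st.st_size);
	close(fd);
	return 0;
}

[Each iteration takes a minor fault that looks up the page in the page
cache, takes a reference, locks the page, and instantiates the PTE, so
the loop isolates exactly the page-handling path that per-page lockdep
tracking would perturb.]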
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c85f11d..263b861 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -17,6 +17,10 @@
 
 #include <asm/mmu.h>
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
+
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
 #endif
@@ -218,6 +222,10 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map_cross map;
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e08b533..35b4f67 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -15,6 +15,9 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
 
 /*
  * Bits in mapping->flags.
@@ -457,26 +460,91 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 	return pgoff;
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p)						\
+do {									\
+	static struct lock_class_key __key;				\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map,	\
+			"(PG_locked)" #p, &__key, 0);			\
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+	page = compound_head(page);
+	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+			       try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+	page = compound_head(page);
+	/*
+	 * lock_commit_crosslock() is necessary for crosslocks.
+	 */
+	lock_commit_crosslock((struct lockdep_map *)&page->map);
+	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);
 
-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+	lock_page_release(page);
+	do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
 {
 	page = compound_head(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
+static inline int trylock_page(struct page *page)
+{
+	if (do_raw_trylock_page(page)) {
+		lock_page_acquire(page, 1);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock_page may only be called if we have the page's inode pinned.
  */
 static inline void lock_page(struct page *page)
 {
 	might_sleep();
-	if (!trylock_page(page))
+
+	if (!do_raw_trylock_page(page))
 		__lock_page(page);
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and current lock can be ordered like:
+	 *
+	 *	CONTEXT 1		CONTEXT 2
+	 *	---------		---------
+	 *	lock A (cross)
+	 *	acquire A
+	 *	  X = atomic_inc_return(&cross_gen_id)
+	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+	 *				acquire B
+	 *				  Y = atomic_read_acquire(&cross_gen_id)
+	 *				lock B
+	 *
+	 * so that 'lock A and then lock B' can be seen globally,
+	 * if X <= Y.
+	 */
+	lock_page_acquire(page, 0);
 }
 
 /*
@@ -486,9 +554,20 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
+	int ret;
+
 	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
+
+	if (!do_raw_trylock_page(page)) {
+		ret = __lock_page_killable(page);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and other locks can be ordered.
+	 */
+	lock_page_acquire(page, 0);
 	return 0;
 }
 
@@ -503,7 +582,17 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
+		/*
+		 * acquire() must be after actual lock operation for crosslocks.
+		 * This way a crosslock and other locks can be ordered.
+		 */
+		lock_page_acquire(page, 0);
+		return 1;
+	}
+
+	return 0;
 }
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2b439a5..2e8c679 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1094,6 +1094,7 @@ config PROVE_LOCKING
 	select DEBUG_LOCK_ALLOC
 	select LOCKDEP_CROSSRELEASE
 	select LOCKDEP_COMPLETIONS
+	select LOCKDEP_PAGELOCK
 	select TRACE_IRQFLAGS
 	default n
 	help
@@ -1179,6 +1180,12 @@ config LOCKDEP_COMPLETIONS
 	  A deadlock caused by wait_for_completion() and complete() can be
 	  detected by lockdep using crossrelease feature.
 
+config LOCKDEP_PAGELOCK
+	bool
+	help
+	  PG_locked lock is a kind of crosslock. Using crossrelease feature,
+	  PG_locked lock can work with lockdep.
+
 config BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK
 	bool "Enable the boot parameter, crossrelease_fullstack"
 	depends on LOCKDEP_CROSSRELEASE
diff --git a/mm/filemap.c b/mm/filemap.c
index 594d73f..870d442 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1099,7 +1099,7 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
  * portably (architectures that do LL/SC can test any bit, while x86 can
  * test the sign bit).
  */
-void unlock_page(struct page *page)
+void do_raw_unlock_page(struct page *page)
 {
 	BUILD_BUG_ON(PG_waiters != 7);
 	page = compound_head(page);
@@ -1107,7 +1107,7 @@ void unlock_page(struct page *page)
 	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
 		wake_up_page_bit(page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(do_raw_unlock_page);
 
 /**
  * end_page_writeback - end writeback against a page
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c..8436b28 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5371,6 +5371,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+		lock_page_init(pfn_to_page(pfn));
+#endif
 	}
 }
Although lock_page() and its family can cause deadlock, lockdep has not
worked with them, because unlock_page() might be called in a different
context from the acquire context, which violates lockdep's assumption.

Now that CONFIG_LOCKDEP_CROSSRELEASE has been introduced, lockdep can
work with page locks.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h |   8 ++++
 include/linux/pagemap.h  | 101 ++++++++++++++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug        |   7 ++++
 mm/filemap.c             |   4 +-
 mm/page_alloc.c          |   3 ++
 5 files changed, 115 insertions(+), 8 deletions(-)
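[To illustrate the class of bug this targets, here is a hypothetical
ABBA scenario. It is not taken from the patch or from any real driver;
the mutex 'm' and both functions are invented for illustration. Classic
lockdep cannot record the first dependency because PG_locked may be
released by a context other than the one that acquired it; with the
crosslock tracking above, it can.]

#include <linux/mutex.h>
#include <linux/pagemap.h>

static DEFINE_MUTEX(m);

static void context_a(struct page *page)
{
	lock_page(page);	/* crosslock: may be released elsewhere */
	mutex_lock(&m);		/* establishes: page lock -> m */
	mutex_unlock(&m);
	/* unlock_page() happens later, e.g. from an IO completion path */
}

static void context_b(struct page *page)
{
	mutex_lock(&m);
	lock_page(page);	/* establishes: m -> page lock */
	unlock_page(page);
	mutex_unlock(&m);
	/* the two orders form an ABBA cycle that lockdep can now report */
}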