Message ID | 20190528120453.27374-1-npiggin@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [1/4] mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist
On Tue, May 28, 2019 at 5:08 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> The kernel currently clamps large system hashes to MAX_ORDER when
> hashdist is not set, which is rather arbitrary.

I think the *really* arbitrary part here is "hashdist".

If you enable NUMA support, hashdist is just set to 1 by default on 64-bit, whether the machine actually has any NUMA characteristics or not. So you take that vmalloc() TLB overhead whether you need it or not.

So I think your series looks sane, and should help the vmalloc case for big hash allocations, but I also think that this whole alloc_large_system_hash() function should be smarter in general.

Yes, it's called "alloc_large_system_hash()", but it's used on small and perfectly normal-sized systems too, and often for hashes that are not all that big.

Yes, we tend to try to make some of those hashes large (the dentry one in particular), but we also use this for small stuff.

For example, on my machine I have several network hashes that have order 6-8 sizes, none of which really make any sense to use vmalloc space for (and which are smaller than a large page, so your patch series wouldn't help them).

So on the whole I have no issues with this series, but I do think we should maybe fix that crazy "if (hashdist)" case. Hmm?

Linus
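For reference, the default Linus is describing looks roughly like the following. This is a paraphrased sketch modeled on the historical `HASHDIST_DEFAULT` and `set_hashdist()` definitions in the kernel, not a verbatim quote; the exact guards and file locations have moved around between releases.

```c
/*
 * Paraphrased sketch (not verbatim kernel source): any 64-bit NUMA
 * build starts with hashdist = 1, regardless of the machine's actual
 * topology, and only the hashdist= boot parameter overrides it.
 */
#if defined(CONFIG_NUMA) && defined(CONFIG_64BIT)
#define HASHDIST_DEFAULT 1
#else
#define HASHDIST_DEFAULT 0
#endif

int hashdist = HASHDIST_DEFAULT;

/* Parse the hashdist= kernel command-line parameter. */
static int __init set_hashdist(char *str)
{
	if (!str)
		return 0;
	hashdist = simple_strtoul(str, &str, 0);
	return 1;
}
__setup("hashdist=", set_hashdist);
```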
Linus Torvalds's on June 1, 2019 4:30 am:
> On Tue, May 28, 2019 at 5:08 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> The kernel currently clamps large system hashes to MAX_ORDER when
>> hashdist is not set, which is rather arbitrary.
>
> I think the *really* arbitrary part here is "hashdist".
>
> If you enable NUMA support, hashdist is just set to 1 by default on
> 64-bit, whether the machine actually has any NUMA characteristics or
> not. So you take that vmalloc() TLB overhead whether you need it or
> not.

Yeah, that's strange; it seems to be an oversight that nobody ever picked up. Patch 2/4 actually fixed that exactly the way you said.

> So I think your series looks sane, and should help the vmalloc case
> for big hash allocations, but I also think that this whole
> alloc_large_system_hash() function should be smarter in general.
>
> Yes, it's called "alloc_large_system_hash()", but it's used on small
> and perfectly normal-sized systems too, and often for hashes that are
> not all that big.
>
> Yes, we tend to try to make some of those hashes large (the dentry one
> in particular), but we also use this for small stuff.
>
> For example, on my machine I have several network hashes that have
> order 6-8 sizes, none of which really make any sense to use vmalloc
> space for (and which are smaller than a large page, so your patch
> series wouldn't help them).
>
> So on the whole I have no issues with this series, but I do think we
> should maybe fix that crazy "if (hashdist)" case. Hmm?

Yes, agreed. Even after this series, with 2MB mappings it's still a bit sad that we can't use the linear map for the non-NUMA case. My laptop has a 32MB dentry cache and a 16MB inode cache, so doing a bunch of name lookups is quite a waste of TLB entries (although at least with 2MB pages it doesn't blow the TLB completely).

We might be able to go a step further and use the memblock allocator for those as well, or reserve some boot CMA for that common case, or just use the linear map for these hashes. I'll look into that.

Thanks,
Nick
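Patch 2/4 itself is not quoted in this thread, but the fix Nick refers to plausibly amounts to a boot-time check along these lines: once the memory nodes are known, drop hashdist back to 0 on machines where only one node actually has memory. This is a hedged sketch of that idea, not the actual hunk; `num_node_state()` and `N_MEMORY` are real kernel symbols, but the function name and placement here are illustrative.

```c
/*
 * Sketch of the idea behind patch 2/4 (not the actual hunk): hashdist
 * defaults to 1 on 64-bit NUMA builds, so clear it during boot when
 * the machine turns out to have a single memory node. Single-node
 * systems then keep the linear-mapped hash allocation path.
 */
static void __init fixup_hashdist(void)
{
#ifdef CONFIG_NUMA
	/* Only distribute hashes if more than one node has memory. */
	if (num_node_state(N_MEMORY) == 1)
		hashdist = 0;
#endif
}
```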
```diff
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..dd419a074141 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8029,7 +8029,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 			else
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
-		} else if (hashdist) {
+		} else if (get_order(size) >= MAX_ORDER || hashdist) {
 			table = __vmalloc(size, gfp_flags, PAGE_KERNEL);
 		} else {
 			/*
@@ -8037,10 +8037,8 @@
 			 * some pages at the end of hash table which
 			 * alloc_pages_exact() automatically does
 			 */
-			if (get_order(size) < MAX_ORDER) {
-				table = alloc_pages_exact(size, gfp_flags);
-				kmemleak_alloc(table, size, 1, gfp_flags);
-			}
+			table = alloc_pages_exact(size, gfp_flags);
+			kmemleak_alloc(table, size, 1, gfp_flags);
 		}
 	} while (!table && size > PAGE_SIZE && --log2qty);
 
```
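To make the threshold in the first hunk concrete, here is a small standalone sketch of the size-to-order mapping. It re-implements the kernel's get_order() in userspace and assumes 4 KiB pages (PAGE_SHIFT == 12) and MAX_ORDER == 11, the common x86-64 values at the time; both are configuration-dependent, so treat the cutoffs as illustrative.

```c
#include <stdio.h>

/* Assumed values for illustration; both are config-dependent. */
#define PAGE_SHIFT 12
#define MAX_ORDER  11

/* Userspace re-implementation of the kernel's get_order(): the
 * smallest order such that 2^order pages covers `size` bytes. */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	unsigned long sizes[] = { 1UL << 20, 4UL << 20, 8UL << 20, 32UL << 20 };

	for (int i = 0; i < 4; i++) {
		int order = get_order(sizes[i]);

		/* Orders >= MAX_ORDER exceed the largest contiguous block
		 * the buddy allocator can hand out, so the patched code
		 * routes them to __vmalloc() even when !hashdist. */
		printf("%3lu MiB -> order %2d -> %s\n", sizes[i] >> 20, order,
		       order >= MAX_ORDER ? "vmalloc" : "alloc_pages_exact");
	}
	return 0;
}
```

With those assumed values, 4 MiB (order 10) is the largest buddy allocation; an 8 MiB hash is order 11 and flips to __vmalloc() instead of being silently clamped, which is exactly the behavior change the patch makes.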
The kernel currently clamps large system hashes to MAX_ORDER when
hashdist is not set, which is rather arbitrary.

vmalloc space is limited on 32-bit machines, but this shouldn't result
in much more of it being used, because small physical memory already
limits system hash sizes there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/page_alloc.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)