Message ID: 1515531365-37423-5-git-send-email-keescook@chromium.org (mailing list archive)
State: New, archived
On Tue, 9 Jan 2018, Kees Cook wrote:

> +struct kmem_cache *kmem_cache_create_usercopy(const char *name,
> +			size_t size, size_t align, slab_flags_t flags,
> +			size_t useroffset, size_t usersize,
> +			void (*ctor)(void *));

Hmmm... At some point we should switch kmem_cache_create to pass a struct
containing all the parameters. Otherwise the API will blow up with
additional functions.

> index 2181719fd907..70c4b4bb4d1f 100644
> --- a/include/linux/stddef.h
> +++ b/include/linux/stddef.h
> @@ -19,6 +19,8 @@ enum {
>  #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
>  #endif
>
> +#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
> +
>  /**
>   * offsetofend(TYPE, MEMBER)
>   *

Have a separate patch for adding this functionality? It's not a slab
maintainer file.

Rest looks ok.

Acked-by: Christoph Lameter <cl@linux.com>
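A quick illustration of the macro under discussion, since it is about to move to its own patch: like offsetof(), sizeof_field() works purely on the type, so no object instance is needed. The struct below is invented purely for illustration:

	struct example {
		int a;
		char buf[16];
	};

	size_t n = sizeof_field(struct example, buf);	/* n == 16 */

The (((structure *)0)->field) trick only "dereferences" a NULL pointer inside sizeof(), whose operand is unevaluated, so this is well-defined.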
On Wed, Jan 10, 2018 at 10:28 AM, Christopher Lameter <cl@linux.com> wrote:
> On Tue, 9 Jan 2018, Kees Cook wrote:
>
>> +struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>> +			size_t size, size_t align, slab_flags_t flags,
>> +			size_t useroffset, size_t usersize,
>> +			void (*ctor)(void *));
>
> Hmmm... At some point we should switch kmem_cache_create to pass a struct
> containing all the parameters. Otherwise the API will blow up with
> additional functions.
>
>> index 2181719fd907..70c4b4bb4d1f 100644
>> --- a/include/linux/stddef.h
>> +++ b/include/linux/stddef.h
>> @@ -19,6 +19,8 @@ enum {
>>  #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
>>  #endif
>>
>> +#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
>> +
>>  /**
>>   * offsetofend(TYPE, MEMBER)
>>   *
>
> Have a separate patch for adding this functionality? It's not a slab
> maintainer file.

Good idea; I've done this now.

> Rest looks ok.
>
> Acked-by: Christoph Lameter <cl@linux.com>

Thanks!

-Kees
From: Christopher Lameter
> Sent: 10 January 2018 18:28
> On Tue, 9 Jan 2018, Kees Cook wrote:
>
> > +struct kmem_cache *kmem_cache_create_usercopy(const char *name,
> > +			size_t size, size_t align, slab_flags_t flags,
> > +			size_t useroffset, size_t usersize,
> > +			void (*ctor)(void *));
>
> Hmmm... At some point we should switch kmem_cache_create to pass a struct
> containing all the parameters. Otherwise the API will blow up with
> additional functions.

Or add an extra function to 'configure' the kmem_cache with the
extra parameters.

	David
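To make David's two-step idea concrete, a minimal sketch — struct foo, its data field, and kmem_cache_set_usercopy() are hypothetical names, not anything in the posted series:

	struct kmem_cache *s;

	/* Create the cache exactly as today... */
	s = kmem_cache_create("foo", sizeof(struct foo),
			      __alignof__(struct foo),
			      SLAB_HWCACHE_ALIGN, NULL);

	/* ...then opt in to a usercopy whitelist afterwards
	 * (hypothetical helper): */
	if (s)
		kmem_cache_set_usercopy(s, offsetof(struct foo, data),
					sizeof_field(struct foo, data));

One wrinkle with configuring after creation: the cache may already have been merged with a compatible one by then, whereas the posted patch passes the region at create time and simply makes whitelisted caches unmergeable.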
On Fri, 12 Jan 2018, David Laight wrote:

> > Hmmm... At some point we should switch kmem_cache_create to pass a struct
> > containing all the parameters. Otherwise the API will blow up with
> > additional functions.
>
> Or add an extra function to 'configure' the kmem_cache with the
> extra parameters.

We probably need even more configurability if we add callbacks for object
reclaim / moving.
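The reclaim/moving callbacks alluded to here reappear later in this thread as kmem_isolate_func and kmem_migrate_func. As a rough sketch of what such hooks might look like (the signatures are guesses for illustration; nothing like this is in tree):

	/* Pin a batch of objects so they cannot be freed mid-move. */
	typedef void *kmem_isolate_func(struct kmem_cache *s, void **objs,
					int nr);

	/* Relocate the pinned objects, e.g. onto pages of another node. */
	typedef void kmem_migrate_func(struct kmem_cache *s, void **objs,
				       int nr, int node, void *private);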
On Wed, Jan 10, 2018 at 12:28:23PM -0600, Christopher Lameter wrote:
> On Tue, 9 Jan 2018, Kees Cook wrote:
> > +struct kmem_cache *kmem_cache_create_usercopy(const char *name,
> > +			size_t size, size_t align, slab_flags_t flags,
> > +			size_t useroffset, size_t usersize,
> > +			void (*ctor)(void *));
>
> Hmmm... At some point we should switch kmem_cache_create to pass a struct
> containing all the parameters. Otherwise the API will blow up with
> additional functions.

Obviously I agree with you. I'm inclined to not let that delay Kees'
patches; we can fix the few places that use this API later. At this
point, my proposal for the ultimate form would be:

struct kmem_cache_attr {
	const char name[32];
	void (*ctor)(void *);
	unsigned int useroffset;
	unsigned int user_size;
};

kmem_create_cache_attr(const struct kmem_cache_attr *attr, unsigned int size,
			unsigned int align, slab_flags_t flags)

(my rationale is that everything in attr should be const, but size, align
and flags all get modified by the slab code).
On Sun, 14 Jan 2018, Matthew Wilcox wrote:

> > Hmmm... At some point we should switch kmem_cache_create to pass a struct
> > containing all the parameters. Otherwise the API will blow up with
> > additional functions.
>
> Obviously I agree with you. I'm inclined to not let that delay Kees'
> patches; we can fix the few places that use this API later. At this
> point, my proposal for the ultimate form would be:

Right. That's why I said "at some point"....

> struct kmem_cache_attr {
> 	const char name[32];

Want to avoid the string reference mess that occurred in the past?
Is that really necessary? But it would limit the size of the name.

> 	void (*ctor)(void *);
> 	unsigned int useroffset;
> 	unsigned int user_size;
> };
>
> kmem_create_cache_attr(const struct kmem_cache_attr *attr, unsigned int size,
> 			unsigned int align, slab_flags_t flags)
>
> (my rationale is that everything in attr should be const, but size, align
> and flags all get modified by the slab code).

Thought about putting all the parameters into the kmem_cache_attr struct.

So

struct kmem_cache_attr {
	char *name;
	size_t size;
	size_t align;
	slab_flags_t flags;
	unsigned int useroffset;
	unsigned int usersize;
	void (*ctor)(void *);
	kmem_isolate_func *isolate;
	kmem_migrate_func *migrate;
	...
}
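To see the attr-based shape in use, a caller under Matthew's variant might look roughly like this — the struct layout and function name come from the sketches above, none of which exists as an API, and struct foo / foo_ctor are invented:

	/* 'static const' puts the whole attr, ctor pointer included,
	 * into rodata, which also answers the function-pointer concern
	 * raised below. */
	static const struct kmem_cache_attr foo_attr = {
		.name       = "foo",
		.ctor       = foo_ctor,
		.useroffset = offsetof(struct foo, data),
		.user_size  = sizeof_field(struct foo, data),
	};

	static struct kmem_cache *foo_cachep;

	foo_cachep = kmem_create_cache_attr(&foo_attr, sizeof(struct foo),
					    __alignof__(struct foo),
					    SLAB_HWCACHE_ALIGN);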
On Tue, Jan 16, 2018 at 09:21:30AM -0600, Christopher Lameter wrote:
> > struct kmem_cache_attr {
> > 	const char name[32];
>
> Want to avoid the string reference mess that occurred in the past?
> Is that really necessary? But it would limit the size of the name.

I think that's a good thing! /proc/slabinfo really starts to get grotty
above 16 bytes. I'd like to chop off "_cache" from the name of every
single slab! If ext4_allocation_context has to become ext4_alloc_ctx,
I don't think we're going to lose any valuable information.

My real intent was to reduce the number of allocations; if we can make it
not necessary to kstrdup the name, I think that'd be appreciated by our
CONFIG_TINY friends.

> > (my rationale is that everything in attr should be const, but size, align
> > and flags all get modified by the slab code).
>
> Thought about putting all the parameters into the kmem_cache_attr struct.
>
> So
>
> struct kmem_cache_attr {
> 	char *name;
> 	size_t size;
> 	size_t align;
> 	slab_flags_t flags;
> 	unsigned int useroffset;
> 	unsigned int usersize;
> 	void (*ctor)(void *);
> 	kmem_isolate_func *isolate;
> 	kmem_migrate_func *migrate;
> 	...
> }

In these slightly-more-security-conscious days, it's considered poor
practice to have function pointers in writable memory. That was why
I wanted to make the kmem_cache_attr const.

Also, there's no need for 'size' and 'align' to be size_t. Slab should
never support allocations above 4GB in size. I'm not even keen on seeing
allocations above 64kB, but I see my laptop has six 512kB allocations (!),
three 256kB allocations and seven 128kB allocations, so I must reluctantly
concede that using an unsigned int is necessary. If I were really into
bitshaving, I might force all allocations to be a multiple of 32 bytes
in size, and then we could use 16 bits to represent an allocation between
32 bytes and 2MB, but I think that tips us beyond the complexity boundary.
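For anyone checking the bit-shaving arithmetic: at 32-byte granularity a 16-bit field stores size >> 5, so the largest encodable size is 65535 * 32 bytes, just shy of 2MB. A sketch of the scheme being dismissed (hypothetical helpers, never actually proposed):

	static inline u16 slab_size_encode(unsigned int size)
	{
		return size >> 5;	/* size rounded up to 32 first */
	}

	static inline unsigned int slab_size_decode(u16 enc)
	{
		return (unsigned int)enc << 5;
	}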
On Tue, 16 Jan 2018, Matthew Wilcox wrote:

> I think that's a good thing! /proc/slabinfo really starts to get grotty
> above 16 bytes. I'd like to chop off "_cache" from the name of every
> single slab! If ext4_allocation_context has to become ext4_alloc_ctx,
> I don't think we're going to lose any valuable information.

Ok, so we are going to cut off at 16 characters? Sounds good to me.

> > struct kmem_cache_attr {
> > 	char *name;
> > 	size_t size;
> > 	size_t align;
> > 	slab_flags_t flags;
> > 	unsigned int useroffset;
> > 	unsigned int usersize;
> > 	void (*ctor)(void *);
> > 	kmem_isolate_func *isolate;
> > 	kmem_migrate_func *migrate;
> > 	...
> > }
>
> In these slightly-more-security-conscious days, it's considered poor
> practice to have function pointers in writable memory. That was why
> I wanted to make the kmem_cache_attr const.

Sure, this data is never changed. It can be const.

> Also, there's no need for 'size' and 'align' to be size_t. Slab should
> never support allocations above 4GB in size. I'm not even keen on seeing
> allocations above 64kB, but I see my laptop has six 512kB allocations (!),
> three 256kB allocations and seven 128kB allocations, so I must reluctantly
> concede that using an unsigned int is necessary. If I were really into
> bitshaving, I might force all allocations to be a multiple of 32 bytes
> in size, and then we could use 16 bits to represent an allocation between
> 32 bytes and 2MB, but I think that tips us beyond the complexity boundary.

I am not married to either way of specifying the sizes. unsigned int
would be fine with me. SLUB falls back to the page allocator anyway for
anything above 2*PAGE_SIZE, and I think we can do the same for the other
allocators as well. Zeroing or initializing such a large memory chunk is
much more expensive than the allocation itself, so it does not make much
sense to have that directly supported in the slab allocators.

Some platforms support 64K page size and I could envision a 2M page size
at some point. So I think we cannot use 16 bits there.

If no one objects then I can use unsigned int there again.
On Tue, Jan 16, 2018 at 10:54:27AM -0600, Christopher Lameter wrote:
> On Tue, 16 Jan 2018, Matthew Wilcox wrote:
>
> > I think that's a good thing! /proc/slabinfo really starts to get grotty
> > above 16 bytes. I'd like to chop off "_cache" from the name of every
> > single slab! If ext4_allocation_context has to become ext4_alloc_ctx,
> > I don't think we're going to lose any valuable information.
>
> Ok, so we are going to cut off at 16 characters? Sounds good to me.

Excellent!

> > > struct kmem_cache_attr {
> > > 	char *name;
> > > 	size_t size;
> > > 	size_t align;
> > > 	slab_flags_t flags;
> > > 	unsigned int useroffset;
> > > 	unsigned int usersize;
> > > 	void (*ctor)(void *);
> > > 	kmem_isolate_func *isolate;
> > > 	kmem_migrate_func *migrate;
> > > 	...
> > > }
> >
> > In these slightly-more-security-conscious days, it's considered poor
> > practice to have function pointers in writable memory. That was why
> > I wanted to make the kmem_cache_attr const.
>
> Sure, this data is never changed. It can be const.

It's changed at initialisation. Look:

kmem_cache_create(const char *name, size_t size, size_t align,
		slab_flags_t flags, void (*ctor)(void *))
	s = create_cache(cache_name, size, size,
			 calculate_alignment(flags, align, size),
			 flags, ctor, NULL, NULL);

The 'align' that ends up in s->align is not the user-specified align.
It's also dependent on runtime information (cache_line_size()), so it
can't be calculated at compile time.

'flags' also gets mangled:

	flags &= CACHE_CREATE_MASK;

> I am not married to either way of specifying the sizes. unsigned int
> would be fine with me. SLUB falls back to the page allocator anyway for
> anything above 2*PAGE_SIZE, and I think we can do the same for the other
> allocators as well. Zeroing or initializing such a large memory chunk is
> much more expensive than the allocation itself, so it does not make much
> sense to have that directly supported in the slab allocators.

The only slabs larger than 4kB on my system right now are:

kvm_vcpu           0     0  19136    1    8 : tunables    8    4    0 : slabdata      0      0      0
net_namespace      1     1   6080    1    2 : tunables    8    4    0 : slabdata      1      1      0

(other than the fake slabs for kmalloc)

> Some platforms support 64K page size and I could envision a 2M page size
> at some point. So I think we cannot use 16 bits there.
>
> If no one objects then I can use unsigned int there again.

unsigned int would be my preference.
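For reference, the mangling Matthew points at lives in calculate_alignment() in mm/slab_common.c; as of this era it reads approximately as follows (treat this as a sketch rather than the exact revision):

	unsigned long calculate_alignment(slab_flags_t flags,
			unsigned long align, unsigned long size)
	{
		/*
		 * With SLAB_HWCACHE_ALIGN, align to a cache line if the
		 * object is big enough to use one; the requested align
		 * still acts as a lower bound.
		 */
		if (flags & SLAB_HWCACHE_ALIGN) {
			unsigned long ralign = cache_line_size();

			while (size <= ralign / 2)
				ralign /= 2;
			align = max(align, ralign);
		}

		if (align < ARCH_SLAB_MINALIGN)
			align = ARCH_SLAB_MINALIGN;

		return ALIGN(align, sizeof(void *));
	}

cache_line_size() is the runtime dependency: on several architectures it is probed at boot, which is why the final alignment cannot be a compile-time constant.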
On Tue, 16 Jan 2018, Matthew Wilcox wrote:

> > Sure, this data is never changed. It can be const.
>
> It's changed at initialisation. Look:
>
> kmem_cache_create(const char *name, size_t size, size_t align,
> 		slab_flags_t flags, void (*ctor)(void *))
> 	s = create_cache(cache_name, size, size,
> 			 calculate_alignment(flags, align, size),
> 			 flags, ctor, NULL, NULL);
>
> The 'align' that ends up in s->align is not the user-specified align.
> It's also dependent on runtime information (cache_line_size()), so it
> can't be calculated at compile time.

Then we would need another align field in struct kmem_cache that takes
the changed value?

> 'flags' also gets mangled:
>
> 	flags &= CACHE_CREATE_MASK;

Well, ok, then that also belongs in kmem_cache and the original value
stays in kmem_cache_attr.

> unsigned int would be my preference.

Great.
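A rough sketch of the split being converged on — the field names are illustrative only, not the real struct kmem_cache layout:

	struct kmem_cache {
		/* caller-supplied, fully const, may live in rodata */
		const struct kmem_cache_attr *attr;

		/* values the slab core computes or mangles at creation */
		unsigned int size;	/* attr->size, padded as needed */
		unsigned int align;	/* from calculate_alignment() */
		slab_flags_t flags;	/* attr->flags & CACHE_CREATE_MASK */

		/* ... per-cpu and per-node data as today ... */
	};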
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ca11d5affacf..518f72bf565e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -135,9 +135,13 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			slab_flags_t,
-			void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+			size_t align, slab_flags_t flags,
+			void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+			size_t size, size_t align, slab_flags_t flags,
+			size_t useroffset, size_t usersize,
+			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
@@ -153,9 +157,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-		sizeof(struct __struct), __alignof__(struct __struct),\
-		(__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+		kmem_cache_create(#__struct, sizeof(struct __struct),	\
+			__alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from userspace, use this
+ * macro instead of KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)		\
+		kmem_cache_create_usercopy(#__struct,			\
+			sizeof(struct __struct),			\
+			__alignof__(struct __struct), (__flags),	\
+			offsetof(struct __struct, __field),		\
+			sizeof_field(struct __struct, __field), NULL)
 
 /*
  * Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 072e46e9e1d5..7385547c04b1 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -85,6 +85,9 @@ struct kmem_cache {
 	unsigned int *random_seq;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 0adae162dc8f..8ad99c47b19c 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -135,6 +135,9 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 2181719fd907..70c4b4bb4d1f 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -19,6 +19,8 @@ enum {
 #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
 #endif
 
+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
 /**
  * offsetofend(TYPE, MEMBER)
  *
diff --git a/mm/slab.c b/mm/slab.c
index 1a33ba68df3d..f1ead7b7909d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 		offsetof(struct kmem_cache, node) +
 				  nr_node_ids * sizeof(struct kmem_cache_node *),
-				  SLAB_HWCACHE_ALIGN);
+				  SLAB_HWCACHE_ALIGN, 0, 0);
 	list_add(&kmem_cache->list, &slab_caches);
 	slab_state = PARTIAL;
 
diff --git a/mm/slab.h b/mm/slab.h
index ad657ffa44e5..8f3030788e01 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -22,6 +22,8 @@ struct kmem_cache {
 	unsigned int size;	/* The aligned/padded/added on size */
 	unsigned int align;	/* Alignment as calculated */
 	slab_flags_t flags;	/* Active flags on the slab */
+	size_t useroffset;	/* Usercopy region offset */
+	size_t usersize;	/* Usercopy region size */
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
@@ -97,7 +99,8 @@ int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
 			slab_flags_t flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
-			size_t size, slab_flags_t flags);
+			size_t size, slab_flags_t flags, size_t useroffset,
+			size_t usersize);
 
 int slab_unmergeable(struct kmem_cache *s);
 
 struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c8cb36774ba1..fc3e66bdce75 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -281,6 +281,9 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->usersize)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -366,12 +369,16 @@ unsigned long calculate_alignment(slab_flags_t flags,
 
 static struct kmem_cache *create_cache(const char *name,
 		size_t object_size, size_t size, size_t align,
-		slab_flags_t flags, void (*ctor)(void *),
+		slab_flags_t flags, size_t useroffset,
+		size_t usersize, void (*ctor)(void *),
 		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	struct kmem_cache *s;
 	int err;
 
+	if (WARN_ON(useroffset + usersize > object_size))
+		useroffset = usersize = 0;
+
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
 	if (!s)
@@ -382,6 +389,8 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = size;
 	s->align = align;
 	s->ctor = ctor;
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	err = init_memcg_params(s, memcg, root_cache);
 	if (err)
@@ -406,11 +415,13 @@ static struct kmem_cache *create_cache(const char *name,
 }
 
 /*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
  * @size: The size of objects to be created in this cache.
  * @align: The required alignment for the objects.
  * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
  * @ctor: A constructor for the objects.
  *
  * Returns a ptr to the cache on success, NULL on failure.
@@ -430,8 +441,9 @@ static struct kmem_cache *create_cache(const char *name,
  * as davem.
  */
 struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		slab_flags_t flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+		slab_flags_t flags, size_t useroffset, size_t usersize,
+		void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	const char *cache_name;
@@ -462,7 +474,13 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	/* Fail closed on bad usersize or useroffset values. */
+	if (WARN_ON(!usersize && useroffset) ||
+	    WARN_ON(size < usersize || size - usersize < useroffset))
+		usersize = useroffset = 0;
+
+	if (!usersize)
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
 	if (s)
 		goto out_unlock;
 
@@ -474,7 +492,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 	s = create_cache(cache_name, size, size,
 			 calculate_alignment(flags, align, size),
-			 flags, ctor, NULL, NULL);
+			 flags, useroffset, usersize, ctor, NULL, NULL);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
 		kfree_const(cache_name);
@@ -500,6 +518,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	}
 	return s;
 }
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+		slab_flags_t flags, void (*ctor)(void *))
+{
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+					  ctor);
+}
 EXPORT_SYMBOL(kmem_cache_create);
 
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -612,6 +639,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	s = create_cache(cache_name, root_cache->object_size,
 			 root_cache->size, root_cache->align,
 			 root_cache->flags & CACHE_CREATE_MASK,
+			 root_cache->useroffset, root_cache->usersize,
 			 root_cache->ctor, memcg, root_cache);
 	/*
 	 * If we could not create a memcg cache, do not complain, because
@@ -879,13 +907,15 @@ bool slab_is_available(void)
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
-		slab_flags_t flags)
+		slab_flags_t flags, size_t useroffset, size_t usersize)
 {
 	int err;
 
 	s->name = name;
 	s->size = s->object_size = size;
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	slab_init_memcg_params(s);
 
@@ -906,7 +936,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags);
+	create_boot_cache(s, name, size, flags, 0, size);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index 8c82872cc8ef..8738a8d8bf8e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4183,7 +4183,7 @@ void __init kmem_cache_init(void)
 	kmem_cache = &boot_kmem_cache;
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
 	register_hotmemory_notifier(&slab_memory_callback_nb);
 
@@ -4193,7 +4193,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-			SLAB_HWCACHE_ALIGN);
+			SLAB_HWCACHE_ALIGN, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 
@@ -5063,6 +5063,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5437,6 +5443,7 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
+	&usersize_attr.attr,
 	NULL
 };
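To close with a concrete picture of the interface this patch adds: a hypothetical driver whitelisting a single field of its per-object struct for copy_to_user()/copy_from_user(). Everything named foo_* below is invented; only KMEM_CACHE_USERCOPY(), kmem_cache_create_usercopy() and sizeof_field() come from the patch:

	struct foo_request {
		spinlock_t lock;		/* kernel-internal */
		struct list_head queue;		/* kernel-internal */
		char payload[128];		/* may reach userspace */
	};

	static struct kmem_cache *foo_cachep;

	static int __init foo_init(void)
	{
		/* Whitelist only 'payload' via the convenience macro.
		 * This expands to:
		 *
		 * kmem_cache_create_usercopy("foo_request",
		 *	sizeof(struct foo_request),
		 *	__alignof__(struct foo_request), SLAB_HWCACHE_ALIGN,
		 *	offsetof(struct foo_request, payload),
		 *	sizeof_field(struct foo_request, payload), NULL);
		 */
		foo_cachep = KMEM_CACHE_USERCOPY(foo_request,
						 SLAB_HWCACHE_ALIGN, payload);
		return foo_cachep ? 0 : -ENOMEM;
	}

Two behavioural consequences follow directly from the diff: a cache with a non-zero usersize is never merged with another (slab_unmergeable() returns 1 for it), and under SLUB the region size is visible in /sys/kernel/slab/<cache>/usersize via the new read-only attribute.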