Message ID: 20210502180755.445-2-longman@redhat.com (mailing list archive)
State: New, archived
Series: [1/2] mm: memcg/slab: Prevent recursive kfree() loop
On 5/2/21 8:07 PM, Waiman Long wrote:
> The obj_cgroup array (memcg_data) embedded in the page structure is
> allocated at the first instance an accounted memory allocation happens.
> With the right size object, it is possible that the allocated obj_cgroup
> array comes from the same slab that requires memory accounting. If this
> happens, the slab will never become empty again as there is at least one
> object left (the obj_cgroup array) in the slab.
>
> With instrumentation code added to detect this situation, I got 76
> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
> So this can really happen.
>
> To avoid the creation of these unfreeable slabs, a check is added to
> memcg_alloc_page_obj_cgroups() to detect that and double the size
> of the array in case it happens, to make sure that it comes from a
> different kmemcache.
>
> This change, however, does not completely eliminate the presence
> of unfreeable slabs, which can still happen if a circular obj_cgroup
> array dependency is formed.

Hm, this looks like only a half fix then.

I'm afraid the proper fix is for kmemcg to create its own set of caches for
the arrays. It would also solve the recursive kfree() issue.

> Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/memcontrol.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b0695d3aa530..44852ac048c3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2876,12 +2876,24 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
>  	 */
>  	objects = max(objs_per_slab_page(s, page),
>  		      (int)(sizeof(struct rcu_head)/sizeof(void *)));
> -
> +retry:
>  	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
>  			   page_to_nid(page));
>  	if (!vec)
>  		return -ENOMEM;
>
> +	/*
> +	 * The allocated vector should not come from the same slab.
> +	 * Otherwise, this slab will never become empty. Double the size
> +	 * in this case to make sure that the vector comes from a different
> +	 * kmemcache.
> +	 */
> +	if (unlikely(virt_to_head_page(vec) == page)) {
> +		kfree(vec);
> +		objects *= 2;
> +		goto retry;
> +	}
> +
>  	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
>  	if (new_page) {
>  		/*
On 5/3/21 8:22 AM, Vlastimil Babka wrote:
> On 5/2/21 8:07 PM, Waiman Long wrote:
>> The obj_cgroup array (memcg_data) embedded in the page structure is
>> allocated at the first instance an accounted memory allocation happens.
>> With the right size object, it is possible that the allocated obj_cgroup
>> array comes from the same slab that requires memory accounting. If this
>> happens, the slab will never become empty again as there is at least one
>> object left (the obj_cgroup array) in the slab.
>>
>> With instrumentation code added to detect this situation, I got 76
>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>> So this can really happen.
>>
>> To avoid the creation of these unfreeable slabs, a check is added to
>> memcg_alloc_page_obj_cgroups() to detect that and double the size
>> of the array in case it happens, to make sure that it comes from a
>> different kmemcache.
>>
>> This change, however, does not completely eliminate the presence
>> of unfreeable slabs, which can still happen if a circular obj_cgroup
>> array dependency is formed.
> Hm, this looks like only a half fix then.
> I'm afraid the proper fix is for kmemcg to create its own set of caches
> for the arrays. It would also solve the recursive kfree() issue.

Right, this is a possible solution. However, the objcg pointers array
shouldn't need that much memory. Creating its own set of kmemcaches may
seem like overkill.

Cheers,
Longman
On 5/3/21 4:20 PM, Waiman Long wrote:
> On 5/3/21 8:22 AM, Vlastimil Babka wrote:
>> On 5/2/21 8:07 PM, Waiman Long wrote:
>>> The obj_cgroup array (memcg_data) embedded in the page structure is
>>> allocated at the first instance an accounted memory allocation happens.
>>> With the right size object, it is possible that the allocated obj_cgroup
>>> array comes from the same slab that requires memory accounting. If this
>>> happens, the slab will never become empty again as there is at least one
>>> object left (the obj_cgroup array) in the slab.
>>>
>>> With instrumentation code added to detect this situation, I got 76
>>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>>> So this can really happen.
>>>
>>> To avoid the creation of these unfreeable slabs, a check is added to
>>> memcg_alloc_page_obj_cgroups() to detect that and double the size
>>> of the array in case it happens, to make sure that it comes from a
>>> different kmemcache.
>>>
>>> This change, however, does not completely eliminate the presence
>>> of unfreeable slabs, which can still happen if a circular obj_cgroup
>>> array dependency is formed.
>> Hm, this looks like only a half fix then.
>> I'm afraid the proper fix is for kmemcg to create its own set of caches
>> for the arrays. It would also solve the recursive kfree() issue.
>
> Right, this is a possible solution. However, the objcg pointers array
> shouldn't need that much memory. Creating its own set of kmemcaches may
> seem like overkill.

Well, if we go that way, there might be additional benefits.

Depending on gfp flags, kmalloc() would allocate from:

- kmalloc-* caches that never have kmemcg objects, and thus can be used
  for the objcg pointer arrays
- kmalloc-cg-* caches that have only kmemcg unreclaimable objects
- kmalloc-rcl-* and dma-kmalloc-*, which can stay with on-demand
  memcg_alloc_page_obj_cgroups()

This way we fully solve the issues that this patchset solves. In addition,
we get better separation between kmemcg and !kmemcg and thus save memory:
the array is no longer allocated as soon as a single accounted object
appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful data
and 8 bytes for the obj_cgroup pointer.

Vlastimil

> Cheers,
> Longman
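For illustration, here is a minimal sketch of the gfp-based cache selection
described above. The enum values and the kmalloc_type() helper are assumed
names for this sketch, not code from this thread (the 96- and 192-byte size
classes and the dma-kmalloc-* type are left out for brevity):

#include <linux/gfp.h>

/*
 * Sketch only: pick a cache set based on the allocation's gfp flags.
 * All names here are assumptions for illustration.
 */
enum kmalloc_cache_type {
	KMALLOC_NORMAL,		/* kmalloc-*: never holds kmemcg objects */
	KMALLOC_CGROUP,		/* kmalloc-cg-*: kmemcg unreclaimable objects */
	KMALLOC_RECLAIM,	/* kmalloc-rcl-*: reclaimable objects */
	NR_KMALLOC_TYPES
};

static inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
{
	if (flags & __GFP_ACCOUNT)
		return KMALLOC_CGROUP;		/* accounted objects */
	if (flags & __GFP_RECLAIMABLE)
		return KMALLOC_RECLAIM;
	return KMALLOC_NORMAL;			/* objcg pointer arrays land here */
}

Under such a split, the objcg pointer arrays themselves are allocated without
__GFP_ACCOUNT, so they always come from kmalloc-* and can never share a slab
with the accounted objects they describe.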
On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 5/3/21 4:20 PM, Waiman Long wrote:
> > On 5/3/21 8:22 AM, Vlastimil Babka wrote:
> >> On 5/2/21 8:07 PM, Waiman Long wrote:
> >>> The obj_cgroup array (memcg_data) embedded in the page structure is
> >>> allocated at the first instance an accounted memory allocation happens.
> >>> [...]
> >> Hm, this looks like only a half fix then.
> >> I'm afraid the proper fix is for kmemcg to create its own set of caches
> >> for the arrays. It would also solve the recursive kfree() issue.
> >
> > Right, this is a possible solution. However, the objcg pointers array
> > shouldn't need that much memory. Creating its own set of kmemcaches may
> > seem like overkill.
>
> Well, if we go that way, there might be additional benefits.
>
> Depending on gfp flags, kmalloc() would allocate from:
>
> - kmalloc-* caches that never have kmemcg objects, and thus can be used
>   for the objcg pointer arrays
> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
> - kmalloc-rcl-* and dma-kmalloc-*, which can stay with on-demand
>   memcg_alloc_page_obj_cgroups()
>
> This way we fully solve the issues that this patchset solves. In addition,
> we get better separation between kmemcg and !kmemcg and thus save memory:
> the array is no longer allocated as soon as a single accounted object
> appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful data
> and 8 bytes for the obj_cgroup pointer.
>

Yes, this seems like a better approach.
On 5/3/21 12:24 PM, Shakeel Butt wrote:
> On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> On 5/3/21 4:20 PM, Waiman Long wrote:
>>> On 5/3/21 8:22 AM, Vlastimil Babka wrote:
>>>> On 5/2/21 8:07 PM, Waiman Long wrote:
>>>>> The obj_cgroup array (memcg_data) embedded in the page structure is
>>>>> allocated at the first instance an accounted memory allocation happens.
>>>>> [...]
>>>> Hm, this looks like only a half fix then.
>>>> I'm afraid the proper fix is for kmemcg to create its own set of caches
>>>> for the arrays. It would also solve the recursive kfree() issue.
>>> Right, this is a possible solution. However, the objcg pointers array
>>> shouldn't need that much memory. Creating its own set of kmemcaches may
>>> seem like overkill.
>> Well, if we go that way, there might be additional benefits.
>>
>> Depending on gfp flags, kmalloc() would allocate from:
>>
>> - kmalloc-* caches that never have kmemcg objects, and thus can be used
>>   for the objcg pointer arrays
>> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
>> - kmalloc-rcl-* and dma-kmalloc-*, which can stay with on-demand
>>   memcg_alloc_page_obj_cgroups()
>>
>> This way we fully solve the issues that this patchset solves. In addition,
>> we get better separation between kmemcg and !kmemcg and thus save memory:
>> the array is no longer allocated as soon as a single accounted object
>> appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful data
>> and 8 bytes for the obj_cgroup pointer.
>>
> Yes, this seems like a better approach.

OK, I will try to go this route then if there is no objection from others.

From slabinfo, the objs/slab numbers range from 4 to 512. That means we need
kmalloc-cg-{32,64,128,256,512,1k,2k,4k}, an init function to set up the new
kmemcaches, and an allocation function that uses the proper kmemcaches to
allocate from.

Cheers,
Longman
On 5/3/21 1:21 PM, Waiman Long wrote:
> On 5/3/21 12:24 PM, Shakeel Butt wrote:
>> On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>> On 5/3/21 4:20 PM, Waiman Long wrote:
>>>> [...]
>>> Well, if we go that way, there might be additional benefits.
>>>
>>> Depending on gfp flags, kmalloc() would allocate from:
>>>
>>> - kmalloc-* caches that never have kmemcg objects, and thus can be used
>>>   for the objcg pointer arrays
>>> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
>>> - kmalloc-rcl-* and dma-kmalloc-*, which can stay with on-demand
>>>   memcg_alloc_page_obj_cgroups()
>>>
>>> [...]
>> Yes, this seems like a better approach.
>>
> OK, I will try to go this route then if there is no objection from others.
>
> From slabinfo, the objs/slab numbers range from 4 to 512. That means we need
> kmalloc-cg-{32,64,128,256,512,1k,2k,4k}, an init function to set up the new
> kmemcaches, and an allocation function that uses the proper kmemcaches to
> allocate from.

I think I had misinterpreted the kmalloc-* setup. In this case, the
kmalloc-cg-* caches should have the same set of sizes as kmalloc-*.

Cheers,
Longman
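A rough sketch of the init path described above, assuming one kmalloc-cg-<size>
cache per power-of-two kmalloc size class. The function and array names are
hypothetical; kmem_cache_create() and kasprintf() are existing kernel APIs, and
the 96- and 192-byte classes are omitted for brevity:

#include <linux/slab.h>
#include <linux/kernel.h>

/* Hypothetical array of kmemcg-only caches, one per size class. */
static struct kmem_cache *kmalloc_cg_caches[KMALLOC_SHIFT_HIGH + 1];

static void __init create_kmalloc_cg_caches(void)
{
	unsigned int i;

	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
		unsigned int size = 1U << i;
		const char *name = kasprintf(GFP_NOWAIT, "kmalloc-cg-%u", size);

		/* Same sizes as kmalloc-*, but reserved for accounted objects. */
		kmalloc_cg_caches[i] = kmem_cache_create(name, size, 0,
							 SLAB_HWCACHE_ALIGN, NULL);
	}
}

The matching allocation helper would then route __GFP_ACCOUNT requests to
kmalloc_cg_caches[] and everything else to the regular kmalloc-* caches.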
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b0695d3aa530..44852ac048c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2876,12 +2876,24 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	 */
 	objects = max(objs_per_slab_page(s, page),
 		      (int)(sizeof(struct rcu_head)/sizeof(void *)));
-
+retry:
 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
 			   page_to_nid(page));
 	if (!vec)
 		return -ENOMEM;
 
+	/*
+	 * The allocated vector should not come from the same slab.
+	 * Otherwise, this slab will never become empty. Double the size
+	 * in this case to make sure that the vector comes from a different
+	 * kmemcache.
+	 */
+	if (unlikely(virt_to_head_page(vec) == page)) {
+		kfree(vec);
+		objects *= 2;
+		goto retry;
+	}
+
 	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
 	if (new_page) {
 		/*
The obj_cgroup array (memcg_data) embedded in the page structure is
allocated at the first instance an accounted memory allocation happens.
With the right size object, it is possible that the allocated obj_cgroup
array comes from the same slab that requires memory accounting. If this
happens, the slab will never become empty again as there is at least one
object left (the obj_cgroup array) in the slab.

With instrumentation code added to detect this situation, I got 76 hits
on the kmalloc-192 slab when booting up a test kernel on a VM. So this
can really happen.

To avoid the creation of these unfreeable slabs, a check is added to
memcg_alloc_page_obj_cgroups() to detect that and double the size of the
array in case it happens, to make sure that it comes from a different
kmemcache.

This change, however, does not completely eliminate the presence of
unfreeable slabs, which can still happen if a circular obj_cgroup array
dependency is formed.

Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
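As a back-of-the-envelope illustration of the self-slab case described above
(the specific numbers are assumed for illustration, not taken from the patch):

#include <stdio.h>

/*
 * Worked example: on a 4 KiB slab page, kmalloc-192 holds
 * 4096 / 192 = 21 objects, so the objcg pointer array needs
 * 21 * 8 = 168 bytes, which kmalloc rounds up to the same
 * 192-byte size class.  Doubling the count moves the array
 * into a different size class, breaking the self-reference.
 */
int main(void)
{
	unsigned int objects = 4096 / 192;	/* 21 objects per slab page */
	unsigned int vec_bytes = objects * 8;	/* 168 -> kmalloc-192 itself */

	printf("objcg array: %u bytes (same 192-byte class)\n", vec_bytes);
	printf("after doubling: %u bytes (different class)\n", 2 * vec_bytes);
	return 0;
}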