Message ID | 1684479370.5483281.1527680579781.JavaMail.zimbra@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > ----- Original Message ----- > > From: "Michal Hocko" <mhocko@suse.com> > > To: "Chunyu Hu" <chuhu@redhat.com> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, > > "catalin marinas" <catalin.marinas@arm.com> > > Sent: Wednesday, May 30, 2018 6:46:37 PM > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote: > > [...] > > > I'm trying to reuse the make_it_fail field in task for fault injection. As > > > adding > > > an extra memory alloc flag is not thought so good, I think adding task > > > flag > > > is either? > > > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations > > without an additional maint. overhead. Anyway, you should really think > > about how to guarantee trackability for atomic allocation requests. You > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really > > Sure. While I'm using task->make_it_fail, I'm still in the direction of > making kmemleak avoid fault inject with task flag instead of page alloc flag. > > > want to have a pre-populated pool of objects for those requests. The > > obvious question is how to balance such a pool. It ain't easy to track > > memory by allocating more memory... > > This solution is going to make kmemleak trace really nofail. We can think > later. > > while I'm thinking about if fault inject can be disabled via flag in task. > > Actually, I'm doing something like below, the disable_fault_inject() is > just setting a flag in task->make_it_fail. But this will depend on if > fault injection accept a change like this. CCing Akinobu You still seem to be missing my point I am afraid (or I am ;). So say that you want to track a GFP_NOWAIT allocation request. So create_object will get called with that gfp mask and no matter what you try here your tracking object will be allocated in a weak allocation context as well and disable kmemleak. So it only takes a more heavy memory pressure and the tracing is gone...
Hi Michal, I'm catching up with this thread. On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > From: "Michal Hocko" <mhocko@suse.com> > > > want to have a pre-populated pool of objects for those requests. The > > > obvious question is how to balance such a pool. It ain't easy to track > > > memory by allocating more memory... > > > > This solution is going to make kmemleak trace really nofail. We can think > > later. > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > just setting a flag in task->make_it_fail. But this will depend on if > > fault injection accept a change like this. CCing Akinobu > > You still seem to be missing my point I am afraid (or I am ;). So say > that you want to track a GFP_NOWAIT allocation request. So create_object > will get called with that gfp mask and no matter what you try here your > tracking object will be allocated in a weak allocation context as well > and disable kmemleak. So it only takes a more heavy memory pressure and > the tracing is gone... create_object() indeed gets the originating gfp but it only cares whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out all the other flags when allocating a struct kmemleak_object (and adds some of its own). This has worked ok so far. There is a higher risk of GFP_ATOMIC allocations failing but I haven't seen issues with kmemleak unless you run it under heavy memory pressure (and kmemleak just disables itself). With fault injection, such pressure is simulated with the side effect of rendering kmemleak unusable. Kmemleak could implement its own allocation mechanism (maybe even skipping the slab altogether) but I feel it's overkill just to cope with the specific case of fault injection. Also, it's not easy to figure out how to balance such pool and it may end up calling alloc_pages() which in turn can inject a fault. Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault injection is enabled? Otherwise, I'd prefer some per-thread temporary fault injection disabling.
On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > Hi Michal, > > I'm catching up with this thread. > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > From: "Michal Hocko" <mhocko@suse.com> > > > > want to have a pre-populated pool of objects for those requests. The > > > > obvious question is how to balance such a pool. It ain't easy to track > > > > memory by allocating more memory... > > > > > > This solution is going to make kmemleak trace really nofail. We can think > > > later. > > > > > > while I'm thinking about if fault inject can be disabled via flag in task. > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > just setting a flag in task->make_it_fail. But this will depend on if > > > fault injection accept a change like this. CCing Akinobu > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > that you want to track a GFP_NOWAIT allocation request. So create_object > > will get called with that gfp mask and no matter what you try here your > > tracking object will be allocated in a weak allocation context as well > > and disable kmemleak. So it only takes a more heavy memory pressure and > > the tracing is gone... > > create_object() indeed gets the originating gfp but it only cares > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > all the other flags when allocating a struct kmemleak_object (and adds > some of its own). > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > allocations failing but I haven't seen issues with kmemleak unless you > run it under heavy memory pressure (and kmemleak just disables itself). > With fault injection, such pressure is simulated with the side effect of > rendering kmemleak unusable. > > Kmemleak could implement its own allocation mechanism (maybe even > skipping the slab altogether) but I feel it's overkill just to cope with > the specific case of fault injection. Also, it's not easy to figure out > how to balance such pool and it may end up calling alloc_pages() which > in turn can inject a fault. > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > injection is enabled? > > Otherwise, I'd prefer some per-thread temporary fault injection > disabling. Well, there are two issues (which boil down to the one in the end) here. Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL context is something to care about. A weaker allocation context can and will lead to kmemleak meta data allocation failures regardless of the fault injection. The way how those objects are allocated directly in the allacator context makes this really hard and inherently subtle. I can see only two ways around. Either you have a placeholder for "this object is not tracked so do not throw false positives" or have a preallocated pool to use if the direct context allocation failes for whatever reason. Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind of problems.
----- Original Message ----- > From: "Michal Hocko" <mhocko@suse.com> > To: "Catalin Marinas" <catalin.marinas@arm.com> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Friday, June 1, 2018 2:41:04 AM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > > Hi Michal, > > > > I'm catching up with this thread. > > > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > > want to have a pre-populated pool of objects for those requests. The > > > > > obvious question is how to balance such a pool. It ain't easy to > > > > > track > > > > > memory by allocating more memory... > > > > > > > > This solution is going to make kmemleak trace really nofail. We can > > > > think > > > > later. > > > > > > > > while I'm thinking about if fault inject can be disabled via flag in > > > > task. > > > > > > > > Actually, I'm doing something like below, the disable_fault_inject() is > > > > just setting a flag in task->make_it_fail. But this will depend on if > > > > fault injection accept a change like this. CCing Akinobu > > > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > > that you want to track a GFP_NOWAIT allocation request. So create_object > > > will get called with that gfp mask and no matter what you try here your > > > tracking object will be allocated in a weak allocation context as well > > > and disable kmemleak. So it only takes a more heavy memory pressure and > > > the tracing is gone... > > > > create_object() indeed gets the originating gfp but it only cares > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > > all the other flags when allocating a struct kmemleak_object (and adds > > some of its own). > > > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > > allocations failing but I haven't seen issues with kmemleak unless you > > run it under heavy memory pressure (and kmemleak just disables itself). > > With fault injection, such pressure is simulated with the side effect of > > rendering kmemleak unusable. > > > > Kmemleak could implement its own allocation mechanism (maybe even > > skipping the slab altogether) but I feel it's overkill just to cope with > > the specific case of fault injection. Also, it's not easy to figure out > > how to balance such pool and it may end up calling alloc_pages() which > > in turn can inject a fault. it would benefit kmemleak trace, I see in my test that kmemleak even can work in user pressure cases, such as in my test, stress-ng to consume nearly all the swap space. kmemleak is still working. but 1M objects pool is consuming around 400M + memory. So this is just a experiment try, as you said, how to balance it's size is the issue or ther issues has to be resolved, such as when to add pool, the speed, how big, and so on ... And I fault injected 20000 times fail_page_alloc, and 2148 times happened in create_object, and in such a case, kmemleak is till working after the 2000+ calloc failures. [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc -l 2148 [60498.299412] FAULT_INJECTION: forcing a failure. name fail_page_alloc, interval 0, probability 80, space 0, times 2 So this way is not just for fault injection, it's about making kmemleak a bit stronger under memory failure case. It would be an exciting experience we see if kmemleak still work even after mem pressure, as a user, I experienced the good usage. > > > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > > injection is enabled? Maybe I can have a try on this.. > > > > Otherwise, I'd prefer some per-thread temporary fault injection > > disabling. I tried in make_it_fail flag, kmemleak can avoid fault injection, but I can see kmemleak diabled itself... > > Well, there are two issues (which boil down to the one in the end) here. > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL > context is something to care about. A weaker allocation context can and > will lead to kmemleak meta data allocation failures regardless of the > fault injection. The way how those objects are allocated directly in the > allacator context makes this really hard and inherently subtle. I can > see only two ways around. Either you have a placeholder for "this object > is not tracked so do not throw false positives" or have a preallocated > pool to use if the direct context allocation failes for whatever reason. > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind > of problems. > -- > Michal Hocko > SUSE Labs >
----- Original Message ----- > From: "Chunyu Hu" <chuhu@redhat.com> > To: "Catalin Marinas" <catalin.marinas@arm.com> > Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> > Sent: Friday, June 1, 2018 9:50:20 AM > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > ----- Original Message ----- > > From: "Michal Hocko" <mhocko@suse.com> > > To: "Catalin Marinas" <catalin.marinas@arm.com> > > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" > > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, > > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" > > <akinobu.mita@gmail.com> > > Sent: Friday, June 1, 2018 2:41:04 AM > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL > > > > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: > > > Hi Michal, > > > > > > I'm catching up with this thread. > > > > > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: > > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: > > > > > From: "Michal Hocko" <mhocko@suse.com> > > > > > > want to have a pre-populated pool of objects for those requests. > > > > > > The > > > > > > obvious question is how to balance such a pool. It ain't easy to > > > > > > track > > > > > > memory by allocating more memory... > > > > > > > > > > This solution is going to make kmemleak trace really nofail. We can > > > > > think > > > > > later. > > > > > > > > > > while I'm thinking about if fault inject can be disabled via flag in > > > > > task. > > > > > > > > > > Actually, I'm doing something like below, the disable_fault_inject() > > > > > is > > > > > just setting a flag in task->make_it_fail. But this will depend on if > > > > > fault injection accept a change like this. CCing Akinobu > > > > > > > > You still seem to be missing my point I am afraid (or I am ;). So say > > > > that you want to track a GFP_NOWAIT allocation request. So > > > > create_object > > > > will get called with that gfp mask and no matter what you try here your > > > > tracking object will be allocated in a weak allocation context as well > > > > and disable kmemleak. So it only takes a more heavy memory pressure and > > > > the tracing is gone... > > > > > > create_object() indeed gets the originating gfp but it only cares > > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out > > > all the other flags when allocating a struct kmemleak_object (and adds > > > some of its own). > > > > > > This has worked ok so far. There is a higher risk of GFP_ATOMIC > > > allocations failing but I haven't seen issues with kmemleak unless you > > > run it under heavy memory pressure (and kmemleak just disables itself). > > > With fault injection, such pressure is simulated with the side effect of > > > rendering kmemleak unusable. > > > > > > Kmemleak could implement its own allocation mechanism (maybe even > > > skipping the slab altogether) but I feel it's overkill just to cope with > > > the specific case of fault injection. Also, it's not easy to figure out > > > how to balance such pool and it may end up calling alloc_pages() which > > > in turn can inject a fault. > > > it would benefit kmemleak trace, I see in my test that kmemleak even can work > in > user pressure cases, such as in my test, stress-ng to consume > nearly all the swap space. kmemleak is still working. but 1M objects pool > is consuming around 400M + memory. So this is just a experiment try, as you > said, how to balance it's size is the issue or ther issues has to be > resolved, > such as when to add pool, the speed, how big, and so on ... > > And I fault injected 20000 times fail_page_alloc, and 2148 times happened > in create_object, and in such a case, kmemleak is till working after the > 2000+ calloc failures. > > [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc > -l > 2148 > > [60498.299412] FAULT_INJECTION: forcing a failure. > name fail_page_alloc, interval 0, probability 80, space 0, times 2 > > So this way is not just for fault injection, it's about making kmemleak > a bit stronger under memory failure case. It would be an exciting experience > we > see if kmemleak still work even after mem pressure, as a user, I experienced > the good usage. > > > > > > > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a > > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault > > > injection is enabled? > > Maybe I can have a try on this.. looks like I tried this in my first report thread, and it failed. As it can sleep in irq() .. https://marc.info/?l=linux-mm&m=152482400222806&w=2 > > > > > > > Otherwise, I'd prefer some per-thread temporary fault injection > > > disabling. > > I tried in make_it_fail flag, kmemleak can avoid fault injection, but I > can see kmemleak diabled itself... > > > > > Well, there are two issues (which boil down to the one in the end) here. > > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL > > context is something to care about. A weaker allocation context can and > > will lead to kmemleak meta data allocation failures regardless of the > > fault injection. The way how those objects are allocated directly in the > > allacator context makes this really hard and inherently subtle. I can > > see only two ways around. Either you have a placeholder for "this object > > is not tracked so do not throw false positives" or have a preallocated > > pool to use if the direct context allocation failes for whatever reason. > > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind > > of problems. > > -- > > Michal Hocko > > SUSE Labs > > > > -- > Regards, > Chunyu Hu > >
On Fri, Jun 1, 2018 at 6:53 AM, Chunyu Hu <chuhu@redhat.com> wrote: > ----- Original Message ----- >> From: "Chunyu Hu" <chuhu@redhat.com> >> To: "Catalin Marinas" <catalin.marinas@arm.com> >> Cc: "Michal Hocko" <mhocko@suse.com>, "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, >> dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" <akinobu.mita@gmail.com> >> Sent: Friday, June 1, 2018 9:50:20 AM >> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL >> >> >> >> ----- Original Message ----- >> > From: "Michal Hocko" <mhocko@suse.com> >> > To: "Catalin Marinas" <catalin.marinas@arm.com> >> > Cc: "Chunyu Hu" <chuhu@redhat.com>, "Tetsuo Handa" >> > <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, >> > dvyukov@google.com, linux-mm@kvack.org, "Akinobu Mita" >> > <akinobu.mita@gmail.com> >> > Sent: Friday, June 1, 2018 2:41:04 AM >> > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL >> > >> > On Thu 31-05-18 16:22:26, Catalin Marinas wrote: >> > > Hi Michal, >> > > >> > > I'm catching up with this thread. >> > > >> > > On Wed, May 30, 2018 at 02:38:26PM +0200, Michal Hocko wrote: >> > > > On Wed 30-05-18 07:42:59, Chunyu Hu wrote: >> > > > > From: "Michal Hocko" <mhocko@suse.com> >> > > > > > want to have a pre-populated pool of objects for those requests. >> > > > > > The >> > > > > > obvious question is how to balance such a pool. It ain't easy to >> > > > > > track >> > > > > > memory by allocating more memory... >> > > > > >> > > > > This solution is going to make kmemleak trace really nofail. We can >> > > > > think >> > > > > later. >> > > > > >> > > > > while I'm thinking about if fault inject can be disabled via flag in >> > > > > task. >> > > > > >> > > > > Actually, I'm doing something like below, the disable_fault_inject() >> > > > > is >> > > > > just setting a flag in task->make_it_fail. But this will depend on if >> > > > > fault injection accept a change like this. CCing Akinobu >> > > > >> > > > You still seem to be missing my point I am afraid (or I am ;). So say >> > > > that you want to track a GFP_NOWAIT allocation request. So >> > > > create_object >> > > > will get called with that gfp mask and no matter what you try here your >> > > > tracking object will be allocated in a weak allocation context as well >> > > > and disable kmemleak. So it only takes a more heavy memory pressure and >> > > > the tracing is gone... >> > > >> > > create_object() indeed gets the originating gfp but it only cares >> > > whether it was GFP_KERNEL or GFP_ATOMIC. gfp_kmemleak_mask() masks out >> > > all the other flags when allocating a struct kmemleak_object (and adds >> > > some of its own). >> > > >> > > This has worked ok so far. There is a higher risk of GFP_ATOMIC >> > > allocations failing but I haven't seen issues with kmemleak unless you >> > > run it under heavy memory pressure (and kmemleak just disables itself). >> > > With fault injection, such pressure is simulated with the side effect of >> > > rendering kmemleak unusable. >> > > >> > > Kmemleak could implement its own allocation mechanism (maybe even >> > > skipping the slab altogether) but I feel it's overkill just to cope with >> > > the specific case of fault injection. Also, it's not easy to figure out >> > > how to balance such pool and it may end up calling alloc_pages() which >> > > in turn can inject a fault. >> >> >> it would benefit kmemleak trace, I see in my test that kmemleak even can work >> in >> user pressure cases, such as in my test, stress-ng to consume >> nearly all the swap space. kmemleak is still working. but 1M objects pool >> is consuming around 400M + memory. So this is just a experiment try, as you >> said, how to balance it's size is the issue or ther issues has to be >> resolved, >> such as when to add pool, the speed, how big, and so on ... >> >> And I fault injected 20000 times fail_page_alloc, and 2148 times happened >> in create_object, and in such a case, kmemleak is till working after the >> 2000+ calloc failures. >> >> [root@dhcp-12-244 fail_page_alloc]# grep create_object /var/log/messages | wc >> -l >> 2148 >> >> [60498.299412] FAULT_INJECTION: forcing a failure. >> name fail_page_alloc, interval 0, probability 80, space 0, times 2 >> >> So this way is not just for fault injection, it's about making kmemleak >> a bit stronger under memory failure case. It would be an exciting experience >> we >> see if kmemleak still work even after mem pressure, as a user, I experienced >> the good usage. >> >> >> > > >> > > Could we tweak gfp_kmemleak_mask() to still pass __GFP_NOFAIL but in a >> > > compatible way (e.g. by setting __GFP_DIRECT_RECLAIM) when fault >> > > injection is enabled? >> >> Maybe I can have a try on this.. > > looks like I tried this in my first report thread, and it failed. As it > can sleep in irq() .. > > https://marc.info/?l=linux-mm&m=152482400222806&w=2 > >> >> > > >> > > Otherwise, I'd prefer some per-thread temporary fault injection >> > > disabling. >> >> I tried in make_it_fail flag, kmemleak can avoid fault injection, but I >> can see kmemleak diabled itself... >> >> > >> > Well, there are two issues (which boil down to the one in the end) here. >> > Fault injection or a GFP_NOWAIT or any other weaker than GFP_KERNEL >> > context is something to care about. A weaker allocation context can and >> > will lead to kmemleak meta data allocation failures regardless of the >> > fault injection. The way how those objects are allocated directly in the >> > allacator context makes this really hard and inherently subtle. I can >> > see only two ways around. Either you have a placeholder for "this object >> > is not tracked so do not throw false positives" or have a preallocated >> > pool to use if the direct context allocation failes for whatever reason. >> > Abusing __GFP_NOFAIL is simply a crude hack which will lead to all kind >> > of problems. FWIW this problem is traditionally solved in dynamic analysis tools by embedding meta info right in headers of heap blocks. All of KASAN, KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then an object is either allocated or not. If caller has something to prevent allocations from failing in any context, then the same will be true for KMEMLEAK meta data. On a related note, we could also consider switching to lib/stackdepot.c for alloc stack memorization to reduce header size. stackdepot is not 100% proof against allocation failures in atomic contexts, but it does eager memory preallocation and seems to work well in practice.
On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: [...] > FWIW this problem is traditionally solved in dynamic analysis tools by > embedding meta info right in headers of heap blocks. All of KASAN, > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > an object is either allocated or not. If caller has something to > prevent allocations from failing in any context, then the same will be > true for KMEMLEAK meta data. > This makes much more sense, of course. I thought there were some fundamental reasons why kmemleak needs to have an off-object tracking which makes the whole thing much more complicated of course.
On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: > On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: > [...] > > FWIW this problem is traditionally solved in dynamic analysis tools by > > embedding meta info right in headers of heap blocks. All of KASAN, > > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > > an object is either allocated or not. If caller has something to > > prevent allocations from failing in any context, then the same will be > > true for KMEMLEAK meta data. > > This makes much more sense, of course. I thought there were some > fundamental reasons why kmemleak needs to have an off-object tracking > which makes the whole thing much more complicated of course. Kmemleak needs to track all memory blocks that may contain pointers (otherwise the dependency graph cannot be correctly tracked leading to lots of false positives). Not all these objects come from the slab allocator, for example it tracks certain alloc_pages() blocks, all of memblock_alloc(). An option would be to use separate metadata only for non-slab objects, though I'd have to see how intrusive this is for mm/sl*b.c. Also there is RCU freeing for the kmemleak metadata to avoid locking when traversing the internal lists. If the metadata is in the slab object itself, we'd have to either defer its freeing or add some bigger lock to kmemleak.
On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: >> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: >> [...] >> > FWIW this problem is traditionally solved in dynamic analysis tools by >> > embedding meta info right in headers of heap blocks. All of KASAN, >> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then >> > an object is either allocated or not. If caller has something to >> > prevent allocations from failing in any context, then the same will be >> > true for KMEMLEAK meta data. >> >> This makes much more sense, of course. I thought there were some >> fundamental reasons why kmemleak needs to have an off-object tracking >> which makes the whole thing much more complicated of course. > > Kmemleak needs to track all memory blocks that may contain pointers > (otherwise the dependency graph cannot be correctly tracked leading to > lots of false positives). Not all these objects come from the slab > allocator, for example it tracks certain alloc_pages() blocks, all of > memblock_alloc(). I understand that this will make KMEMLEAK tracking non-uniform, but heap objects are the most important class of allocations. page struct already contains stackdepot id if CONFIG_PAGE_OWNER is enabled. Do we need anything else other than stack trace for pages? I don't know about memblock's. > An option would be to use separate metadata only for non-slab objects, > though I'd have to see how intrusive this is for mm/sl*b.c. Also there > is RCU freeing for the kmemleak metadata to avoid locking when > traversing the internal lists. If the metadata is in the slab object > itself, we'd have to either defer its freeing or add some bigger lock to > kmemleak. This relates to scanning without slopped world, right? In our experience with large-scale systematic testing any tool with false positives can't be used in practice in systematic way. KMEMLEAK false positives do not allow to enable it on syzbot. We know there are tons of leaks, we have the tool, but we are not detecting leaks. I don't know who/how uses KMEMLEAK in non-stop-the-world mode, but stop-the-world is pretty much a requirement for deployment for us. And it would also solve the problem with disappearing under our feet heap blocks, right? FWIW In LeakSanitizer we don't specifically keep track of heap blocks. Instead we stop the world and then ask memory allocator for metainfo. I would expect that sl*b also have all required info, maybe in not O(1) accessible form, so it may require some preprocessing (e.g. collecting all free objects in a slab and then subtracting it from set of all objects in the slab to get set of allocated objects). But I understand that all of this turns this from "add a flag" to almost a complete rewrite of the tool...
On Mon, Jun 04, 2018 at 05:36:31PM +0200, Dmitry Vyukov wrote: > On Mon, Jun 4, 2018 at 5:08 PM, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Jun 04, 2018 at 02:42:10PM +0200, Michal Hocko wrote: > >> On Mon 04-06-18 10:41:39, Dmitry Vyukov wrote: > >> [...] > >> > FWIW this problem is traditionally solved in dynamic analysis tools by > >> > embedding meta info right in headers of heap blocks. All of KASAN, > >> > KMSAN, slub debug, LeakSanitizer, asan, valgrind work this way. Then > >> > an object is either allocated or not. If caller has something to > >> > prevent allocations from failing in any context, then the same will be > >> > true for KMEMLEAK meta data. > >> > >> This makes much more sense, of course. I thought there were some > >> fundamental reasons why kmemleak needs to have an off-object tracking > >> which makes the whole thing much more complicated of course. > > > > Kmemleak needs to track all memory blocks that may contain pointers > > (otherwise the dependency graph cannot be correctly tracked leading to > > lots of false positives). Not all these objects come from the slab > > allocator, for example it tracks certain alloc_pages() blocks, all of > > memblock_alloc(). > > I understand that this will make KMEMLEAK tracking non-uniform, but > heap objects are the most important class of allocations. > page struct already contains stackdepot id if CONFIG_PAGE_OWNER is > enabled. Do we need anything else other than stack trace for pages? > I don't know about memblock's. Well, it needs most of the other stuff that's in struct kmemleak_object (list_head, rb_node, some counters, spinlock_t). > > An option would be to use separate metadata only for non-slab objects, > > though I'd have to see how intrusive this is for mm/sl*b.c. Also there > > is RCU freeing for the kmemleak metadata to avoid locking when > > traversing the internal lists. If the metadata is in the slab object > > itself, we'd have to either defer its freeing or add some bigger lock to > > kmemleak. > > This relates to scanning without slopped world, right? Initially the RCU mechanism was added to defer kmemleak freeing its metadata with another recursive call into the slab freeing routine (since it does this when the tracked object is freed). This came in handy for other lists traversal in kmemleak. For the actual memory scanning, there is some fine-grained locking per metadata object as we want to block the freeing until the scanning of the specific object completes (e.g. vfree() must not unmap the object during scanning). > In our > experience with large-scale systematic testing any tool with false > positives can't be used in practice in systematic way. KMEMLEAK false > positives do not allow to enable it on syzbot. We know there are tons > of leaks, we have the tool, but we are not detecting leaks. I don't > know who/how uses KMEMLEAK in non-stop-the-world mode, but > stop-the-world is pretty much a requirement for deployment for us. And > it would also solve the problem with disappearing under our feet heap > blocks, right? A hard requirement during the early kmemleak development was not to actually stop the world (as it can even take minutes to complete the scanning). It employs various heuristics to deal with false positives like checksumming, delaying the actual reporting, waiting for an object to be detected as a leak in two successive scans while its checksum is the same. While not ideal, it works most of the time. Now, there was indeed a recent requirement to implement stop-the-world scanning via a "stopscan" command to /sys/kernel/debug/kmemleak (using stop_machine()) but I never got around to implementing it. This would be very useful for non-interactive sessions like automated testing. > FWIW In LeakSanitizer we don't specifically keep track of heap blocks. > Instead we stop the world and then ask memory allocator for metainfo. > I would expect that sl*b also have all required info, maybe in not > O(1) accessible form, so it may require some preprocessing (e.g. > collecting all free objects in a slab and then subtracting it from set > of all objects in the slab to get set of allocated objects). > But I understand that all of this turns this from "add a flag" to > almost a complete rewrite of the tool... As I said above, background scanning is still a requirement but we could add a stopscan command on top, should be too hard.
--- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -111,6 +111,7 @@ #include <linux/kasan.h> #include <linux/kmemleak.h> #include <linux/memory_hotplug.h> +#include <linux/fault-inject.h> /* * Kmemleak configuration and common defines. @@ -126,7 +127,7 @@ /* GFP bitmask for kmemleak internal allocations */ #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ __GFP_NORETRY | __GFP_NOMEMALLOC | \ - __GFP_NOWARN | __GFP_NOFAIL) + __GFP_NOWARN) /* scanning area inside a memory block */ struct kmemleak_scan_area { @@ -551,12 +552,15 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, struct kmemleak_object *object, *parent; struct rb_node **link, *rb_parent; + disable_fault_inject(); object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); if (!object) { pr_warn("Cannot allocate a kmemleak_object structure\n"); kmemleak_disable(); + enable_fault_inject(); return NULL; } + enable_fault_inject(); INIT_LIST_HEAD(&object->object_list); INIT_LIST_HEAD(&object->gray_list);