
kmemleak memory scanning

Message ID YMe8ktUsdtwFKHuF@nuc10 (mailing list archive)
State New, archived
Series kmemleak memory scanning

Commit Message

Rustam Kovhaev June 14, 2021, 8:31 p.m. UTC
Hello Catalin, Andrew!

While troubleshooting a false positive syzbot kmemleak report, I noticed
some interesting behavior in kmemleak, and I wonder whether it is
behavior by design that should be documented, or something to improve.
Apologies if some of the questions do not make sense; I am still going
through the kmemleak code.

a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
the actual contents (page_address(page)) of the page.
If we allocate an object with kmalloc(), then allocate a page with
alloc_page() and put the kmalloc pointer somewhere inside that page,
kmemleak will report the kmalloc pointer as a false positive.
Should we improve kmemleak and make it scan page contents, or would
that bring too many false negatives?

b) When a kmemleak object gets created (kmemleak.c:598), it starts with
a checksum of 0. By the time the user requests a kmemleak "scan" via
debugfs, update_checksum() will almost certainly compute a different
value, so during the first scan kmemleak won't report the object as an
orphan even if it found no reference to it: because the checksum
changed, it proceeds to updating object->count (kmemleak.c:1502) and
the object is treated as referenced for that scan. The user therefore
has to initiate a second "scan" via debugfs, and only then will
kmemleak produce the report.
Should we document this?

Below I am attaching a simplified reproducer for the false positive
kmemleak report (a).
I could have done it in a module, but I found it easier and faster to
test as a syscall, so that I did not have to modprobe/modprobe -r.

Thank you very much!

---
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 include/linux/syscalls.h               |  1 +
 mm/Makefile                            |  2 +-
 mm/kmemleak_test.c                     | 45 ++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 mm/kmemleak_test.c

Comments

Dmitry Vyukov June 15, 2021, 5:15 a.m. UTC | #1
On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
>
> hello Catalin, Andrew!
>
> while troubleshooting a false positive syzbot kmemleak report i have
> noticed an interesting behavior in kmemleak and i wonder whether it is
> behavior by design and should be documented, or maybe something to
> improve.
> apologies if some of the questions do not make sense, i am still going
> through kmemleak code..
>
> a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> the actual contents (page_address(page)) of the page.
> if we allocate an object with kmalloc(), then allocate page with
> alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> kmemleak will report kmalloc pointer as a false positive.
> should we improve kmemleak and make it scan page contents?
> or will this bring too many false negatives?

Hi Rustam,

Nice debugging!
I assume lots of pages are allocated for slab, and we don't want to
scan the whole page if only a few slab objects are alive on it.
However, alloc_pages() can be called directly by other kernel code as
well. I grepped for kmemleak annotations around existing calls to
alloc_pages() but did not find any.
Does it require an explicit kmemleak_alloc() after allocating the page
and kmemleak_free() before freeing the page?
If there is more than one use case for this, I guess we could add some
GFP flag for it.





> b) when kmemleak object gets created (kmemleak.c:598) it gets checksum
> of 0, by the time user requests kmemleak "scan" via debugfs the pointer
> will be most likely changed to some value by the kernel and during
> first scan kmemleak won't report the object as orphan even if it did not
> find any reference to it, because it will execute update_checksum() and
> after that will proceed to updating object->count (kmemleak.c:1502).
> and so the user will have to initiate a second "scan" via debugfs and
> only then kmemleak will produce the report.
> should we document this?
>
> below i am attaching a simplified reproducer for the false positive
> kmemleak report (a).
> i could have done it in the module, but i found it to be easier and
> faster to test when done in a syscall, so that i did not have to
> modprobe/modprobe -r.
>
> tyvm!
>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  include/linux/syscalls.h               |  1 +
>  mm/Makefile                            |  2 +-
>  mm/kmemleak_test.c                     | 45 ++++++++++++++++++++++++++
>  4 files changed, 48 insertions(+), 1 deletion(-)
>  create mode 100644 mm/kmemleak_test.c
>
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index ce18119ea0d0..da967a87eb78 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -343,6 +343,7 @@
>  332    common  statx                   sys_statx
>  333    common  io_pgetevents           sys_io_pgetevents
>  334    common  rseq                    sys_rseq
> +335    common  kmemleak_test           sys_kmemleak_test
>  # don't use numbers 387 through 423, add new calls after the last
>  # 'common' entry
>  424    common  pidfd_send_signal       sys_pidfd_send_signal
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 050511e8f1f8..0602308aabf4 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -1029,6 +1029,7 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
>                           unsigned mask, struct statx __user *buffer);
>  asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
>                          int flags, uint32_t sig);
> +asmlinkage long sys_kmemleak_test(void);
>  asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
>  asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
>                                int to_dfd, const char __user *to_path,
> diff --git a/mm/Makefile b/mm/Makefile
> index bf71e295e9f6..878783838fa1 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -97,7 +97,7 @@ obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
>  obj-$(CONFIG_GUP_TEST) += gup_test.o
>  obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
>  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> -obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> +obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o kmemleak_test.o
>  obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
>  obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
>  obj-$(CONFIG_PAGE_OWNER) += page_owner.o
> diff --git a/mm/kmemleak_test.c b/mm/kmemleak_test.c
> new file mode 100644
> index 000000000000..828246e20b7f
> --- /dev/null
> +++ b/mm/kmemleak_test.c
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <linux/sched.h>
> +#include <linux/syscalls.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +
> +struct kmemleak_check {
> +       unsigned long canary;
> +       struct work_struct work;
> +       struct page **pages;
> +};
> +
> +static void work_func(struct work_struct *work)
> +{
> +       struct page **pages;
> +       struct kmemleak_check *ptr;
> +
> +       set_current_state(TASK_INTERRUPTIBLE);
> +       schedule_timeout(3600*HZ);
> +
> +       ptr = container_of(work, struct kmemleak_check, work);
> +       pages = ptr->pages;
> +       __free_page(pages[0]);
> +       kvfree(pages);
> +}
> +
> +SYSCALL_DEFINE0(kmemleak_test)
> +{
> +       struct page **pages, *page;
> +       struct kmemleak_check *ptr;
> +
> +       pages = kzalloc(sizeof(*pages), GFP_KERNEL);
> +       page = alloc_page(GFP_KERNEL);
> +       pages[0] = page;
> +       ptr = page_address(page);
> +       ptr->canary = 0x00FF00FF00FF00FF;
> +       ptr->pages = pages;
> +       pr_info("DEBUG: pages %px page %px ptr %px\n", pages, page, ptr);
> +
> +       INIT_WORK(&ptr->work, work_func);
> +       schedule_work(&ptr->work);
> +
> +       return 0;
> +}
> --
> 2.30.2
>
David Hildenbrand June 15, 2021, 8:12 a.m. UTC | #2
On 14.06.21 22:31, Rustam Kovhaev wrote:
> hello Catalin, Andrew!
> 
> while troubleshooting a false positive syzbot kmemleak report i have
> noticed an interesting behavior in kmemleak and i wonder whether it is
> behavior by design and should be documented, or maybe something to
> improve.

Hi,

See below regarding documentation.

> apologies if some of the questions do not make sense, i am still going
> through kmemleak code..
> 
> a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> the actual contents (page_address(page)) of the page.
> if we allocate an object with kmalloc(), then allocate page with
> alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> kmemleak will report kmalloc pointer as a false positive.
> should we improve kmemleak and make it scan page contents?
> or will this bring too many false negatives?

I looked into this a while ago to see which parts of the kernel end up 
reading random physical page content and was happy to see that kmemleak 
does *not* scan random physical memory :)

We have to be very careful when reading random physical page content;
in virtualized environments it is really undesirable, as is touching
memory holes or memory with problematic semantics like GART memory.

The doc (Documentation/dev-tools/kmemleak.rst) states "Page allocations
and ioremap are not tracked.", which I think covers the alloc_page()
example you gave.
Catalin Marinas June 15, 2021, 10:15 a.m. UTC | #3
Hi Rustam,

On Mon, Jun 14, 2021 at 01:31:14PM -0700, Rustam Kovhaev wrote:
> a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> the actual contents (page_address(page)) of the page.
> if we allocate an object with kmalloc(), then allocate page with
> alloc_page(), and if we put kmalloc pointer somewhere inside that page, 
> kmemleak will report kmalloc pointer as a false positive.
> should we improve kmemleak and make it scan page contents?
> or will this bring too many false negatives?

This is indeed on purpose; otherwise (1) we'd get a lot of false
negatives and (2) the scanning would take significantly longer. There
are a lot more pages allocated for user processes than used in the
kernel, and we don't need to scan them all.

We do have a few places where we explicitly call kmemleak_alloc():
neigh_hash_alloc(), alloc_page_ext(), blk_mq_alloc_rqs(),
early_amd_iommu_init().
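
For reference, a rough sketch of that annotation pattern, loosely
modeled on blk_mq_alloc_rqs(); the driver_buf_* helpers are made up
purely for illustration, only alloc_page()/__free_page() and
kmemleak_alloc()/kmemleak_free() are real APIs:

#include <linux/gfp.h>
#include <linux/kmemleak.h>
#include <linux/mm.h>

static struct page *driver_buf_alloc(void)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return NULL;
	/* Register the page contents so kmemleak scans them for pointers. */
	kmemleak_alloc(page_address(page), PAGE_SIZE, 1, GFP_KERNEL);
	return page;
}

static void driver_buf_free(struct page *page)
{
	/* Remove the kmemleak object before the page goes back to the allocator. */
	kmemleak_free(page_address(page));
	__free_page(page);
}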

> b) when kmemleak object gets created (kmemleak.c:598) it gets checksum
> of 0, by the time user requests kmemleak "scan" via debugfs the pointer
> will be most likely changed to some value by the kernel and during
> first scan kmemleak won't report the object as orphan even if it did not
> find any reference to it, because it will execute update_checksum() and
> after that will proceed to updating object->count (kmemleak.c:1502).
> and so the user will have to initiate a second "scan" via debugfs and
> only then kmemleak will produce the report.
> should we document this?

That's a mitigation against false positives. Lots of objects that get
allocated just prior to a memory scan have a tendency to be reported as
leaks before they get referenced via e.g. a list (and the in-object
list_head structure updated). So the checksum needs to be identical in
two consecutive scans for an object to be reported as a leak.

We should probably document this.
Rustam Kovhaev June 16, 2021, 6:25 p.m. UTC | #4
On Tue, Jun 15, 2021 at 07:15:24AM +0200, Dmitry Vyukov wrote:
> On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
> >
> > hello Catalin, Andrew!
> >
> > while troubleshooting a false positive syzbot kmemleak report i have
> > noticed an interesting behavior in kmemleak and i wonder whether it is
> > behavior by design and should be documented, or maybe something to
> > improve.
> > apologies if some of the questions do not make sense, i am still going
> > through kmemleak code..
> >
> > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > the actual contents (page_address(page)) of the page.
> > if we allocate an object with kmalloc(), then allocate page with
> > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > kmemleak will report kmalloc pointer as a false positive.
> > should we improve kmemleak and make it scan page contents?
> > or will this bring too many false negatives?
> 
> Hi Rustam,
> 
> Nice debugging!
> I assume lots of pages are allocated for slab and we don't want to
> scan the whole page if only a few slab objects are alive on the page.
> However alloc_pages() can be called by end kernel code as well.
> I grepped for any kmemleak annotations around existing calls to
> alloc_pages, but did not find any...
> Does it require an explicit kmemleak_alloc() after allocating the page
> and kmemleak_free () before freeing the page?

Hi Dmitry, thank you!
Yes, as Catalin has pointed out, there are a few places where we call
kmemleak_alloc()/kmemleak_free() explicitly so that the pages get
scanned, like in blk_mq_alloc_rqs().

> If there are more than one use case for this, I guess we could add
> some GFP flag for this maybe.

And this way kernel users won't have to use the kmemleak functions
mentioned above, including some or most kmemleak_not_leak() calls, and
kmemleak will basically be "transparent" to them? They would only need
the GFP flag to instruct kmemleak to scan the page contents?
It sounds like a good idea to me.
Rustam Kovhaev June 16, 2021, 6:27 p.m. UTC | #5
On Tue, Jun 15, 2021 at 10:12:06AM +0200, David Hildenbrand wrote:
> On 14.06.21 22:31, Rustam Kovhaev wrote:
> > hello Catalin, Andrew!
> > 
> > while troubleshooting a false positive syzbot kmemleak report i have
> > noticed an interesting behavior in kmemleak and i wonder whether it is
> > behavior by design and should be documented, or maybe something to
> > improve.
> 
> Hi,
> 
> See below regarding documentation.
> 
> > apologies if some of the questions do not make sense, i am still going
> > through kmemleak code..
> > 
> > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > the actual contents (page_address(page)) of the page.
> > if we allocate an object with kmalloc(), then allocate page with
> > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > kmemleak will report kmalloc pointer as a false positive.
> > should we improve kmemleak and make it scan page contents?
> > or will this bring too many false negatives?
> 
> I looked into this a while ago to see which parts of the kernel end up
> reading random physical page content and was happy to see that kmemleak does
> *not* scan random physical memory :)
> 
> We have to be very careful when reading random physical page content,
> especially in virt environments this is really undesired, or when dealing
> with memory holes, memory with problematic semantics like gart memory ...

I see, makes sense, thank you for the info!

> 
> The doc (Documentation/dev-tools/kmemleak.rst) states "Page allocations and
> ioremap are not tracked.", which includes the alloc_page() example you gave
> I think.

I see it now, thank you!
Rustam Kovhaev June 16, 2021, 6:36 p.m. UTC | #6
Hi Catalin,

On Tue, Jun 15, 2021 at 11:15:15AM +0100, Catalin Marinas wrote:
> Hi Rustam,
> 
> On Mon, Jun 14, 2021 at 01:31:14PM -0700, Rustam Kovhaev wrote:
> > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > the actual contents (page_address(page)) of the page.
> > if we allocate an object with kmalloc(), then allocate page with
> > alloc_page(), and if we put kmalloc pointer somewhere inside that page, 
> > kmemleak will report kmalloc pointer as a false positive.
> > should we improve kmemleak and make it scan page contents?
> > or will this bring too many false negatives?
> 
> This is indeed on purpose otherwise (1) we'd get a lot of false
> negatives and (2) the scanning would take significantly longer. There
> are a lot more pages allocated for user processes than used in the
> kernel, we don't need to scan them all.
> 
> We do have a few places where we explicitly call kmemleak_alloc():
> neigh_hash_alloc(), alloc_page_ext(), blk_mq_alloc_rqs(),
> early_amd_iommu_init().

Makes sense, thank you very much!

> > b) when kmemleak object gets created (kmemleak.c:598) it gets checksum
> > of 0, by the time user requests kmemleak "scan" via debugfs the pointer
> > will be most likely changed to some value by the kernel and during
> > first scan kmemleak won't report the object as orphan even if it did not
> > find any reference to it, because it will execute update_checksum() and
> > after that will proceed to updating object->count (kmemleak.c:1502).
> > and so the user will have to initiate a second "scan" via debugfs and
> > only then kmemleak will produce the report.
> > should we document this?
> 
> That's a mitigation against false positives. Lot's of objects that get
> allocated just prior to a memory scan have a tendency to be reported as
> leaks before they get referenced via e.g. a list (and the in-object
> list_head structure updated). So you'd need to get the checksum
> identical in two consecutive scans to report it as a leak.
> 
> We should probably document this.

Thanks, I'll send a documentation patch for this.
Rustam Kovhaev June 24, 2021, 5:36 p.m. UTC | #7
On Wed, Jun 16, 2021 at 11:25:22AM -0700, Rustam Kovhaev wrote:
> On Tue, Jun 15, 2021 at 07:15:24AM +0200, Dmitry Vyukov wrote:
> > On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
> > >
> > > hello Catalin, Andrew!
> > >
> > > while troubleshooting a false positive syzbot kmemleak report i have
> > > noticed an interesting behavior in kmemleak and i wonder whether it is
> > > behavior by design and should be documented, or maybe something to
> > > improve.
> > > apologies if some of the questions do not make sense, i am still going
> > > through kmemleak code..
> > >
> > > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > > the actual contents (page_address(page)) of the page.
> > > if we allocate an object with kmalloc(), then allocate page with
> > > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > > kmemleak will report kmalloc pointer as a false positive.
> > > should we improve kmemleak and make it scan page contents?
> > > or will this bring too many false negatives?
> > 
> > Hi Rustam,
> > 
> > Nice debugging!
> > I assume lots of pages are allocated for slab and we don't want to
> > scan the whole page if only a few slab objects are alive on the page.
> > However alloc_pages() can be called by end kernel code as well.
> > I grepped for any kmemleak annotations around existing calls to
> > alloc_pages, but did not find any...
> > Does it require an explicit kmemleak_alloc() after allocating the page
> > and kmemleak_free () before freeing the page?
> 
> hi Dmitry, thank you!
> yes, as Catalin has pointed out, there are a few places where we call
> kmemleak_alloc()/kmemleak_free() explicitly in order for the pages to be
> scanned, like in blk_mq_alloc_rqs()
> 
> > If there are more than one use case for this, I guess we could add
> > some GFP flag for this maybe.
> 
> and this way kernel users won't have to use kmemleak fuctions mentioned
> above including some or most kmemleak_not_leak() calls and basically
> kmemleak will be kind of "transparent" to them? and they will only need
> to use the GFP flag to instruct kmemleak to scan the page contents?
> it sounds like a good idea to me..
> 

I've been thinking about this, and it seems like in the scenario where
we want kmemleak to scan only part of the page, we would have to either
do separate alloc_page() calls with different flags or use
kmemleak_scan_area() to limit the memory scan area. Maybe this approach
won't simplify things and will produce more code instead of reducing it.
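
A rough, untested sketch of the kmemleak_scan_area() variant for the
reproducer's page, built on the reproducer's struct kmemleak_check
(check_alloc() is a hypothetical helper; it assumes the page is first
registered with kmemleak_alloc(), and that once a scan area is added
kmemleak scans only that area of the object rather than the whole page):

#include <linux/gfp.h>
#include <linux/kmemleak.h>
#include <linux/mm.h>

static struct kmemleak_check *check_alloc(void)
{
	struct page *page = alloc_page(GFP_KERNEL);
	struct kmemleak_check *ptr;

	if (!page)
		return NULL;
	ptr = page_address(page);
	/* Track the whole page, but only scan the struct at its start. */
	kmemleak_alloc(ptr, PAGE_SIZE, 1, GFP_KERNEL);
	kmemleak_scan_area(ptr, sizeof(*ptr), GFP_KERNEL);
	return ptr;
}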
Catalin Marinas June 25, 2021, 3:01 p.m. UTC | #8
On Thu, Jun 24, 2021 at 10:36:50AM -0700, Rustam Kovhaev wrote:
> On Wed, Jun 16, 2021 at 11:25:22AM -0700, Rustam Kovhaev wrote:
> > On Tue, Jun 15, 2021 at 07:15:24AM +0200, Dmitry Vyukov wrote:
> > > On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
> > > >
> > > > hello Catalin, Andrew!
> > > >
> > > > while troubleshooting a false positive syzbot kmemleak report i have
> > > > noticed an interesting behavior in kmemleak and i wonder whether it is
> > > > behavior by design and should be documented, or maybe something to
> > > > improve.
> > > > apologies if some of the questions do not make sense, i am still going
> > > > through kmemleak code..
> > > >
> > > > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > > > the actual contents (page_address(page)) of the page.
> > > > if we allocate an object with kmalloc(), then allocate page with
> > > > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > > > kmemleak will report kmalloc pointer as a false positive.
> > > > should we improve kmemleak and make it scan page contents?
> > > > or will this bring too many false negatives?
> > > 
> > > Hi Rustam,
> > > 
> > > Nice debugging!
> > > I assume lots of pages are allocated for slab and we don't want to
> > > scan the whole page if only a few slab objects are alive on the page.
> > > However alloc_pages() can be called by end kernel code as well.
> > > I grepped for any kmemleak annotations around existing calls to
> > > alloc_pages, but did not find any...
> > > Does it require an explicit kmemleak_alloc() after allocating the page
> > > and kmemleak_free () before freeing the page?
> > 
> > hi Dmitry, thank you!
> > yes, as Catalin has pointed out, there are a few places where we call
> > kmemleak_alloc()/kmemleak_free() explicitly in order for the pages to be
> > scanned, like in blk_mq_alloc_rqs()
> > 
> > > If there are more than one use case for this, I guess we could add
> > > some GFP flag for this maybe.
> > 
> > and this way kernel users won't have to use kmemleak fuctions mentioned
> > above including some or most kmemleak_not_leak() calls and basically
> > kmemleak will be kind of "transparent" to them? and they will only need
> > to use the GFP flag to instruct kmemleak to scan the page contents?
> > it sounds like a good idea to me..
> > 
> 
> i've been thinking about this and it seems like in the scenario where we
> want kmemleak to scan only some part of the page, we will have to either
> do separate alloc_page() calls with different flags or use 
> kmemleak_scan_area() to limit the memory scan area. maybe this approach
> won't simplify things and will produce more code instead of reducing it

Since page allocation is not tracked by kmemleak, you can always do an
explicit kmemleak_alloc() call with a smaller size than a full page.
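
For the reproducer's syscall that could look roughly like the untested
fragment below; only the first sizeof(struct kmemleak_check) bytes of
the page would then be tracked and scanned:

	page = alloc_page(GFP_KERNEL);
	ptr = page_address(page);
	/* Track only the struct that actually holds kernel pointers. */
	kmemleak_alloc(ptr, sizeof(*ptr), 1, GFP_KERNEL);
	/* ... use the page ... */
	kmemleak_free(ptr);
	__free_page(page);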
Rustam Kovhaev June 25, 2021, 3:27 p.m. UTC | #9
Hi Catalin,

On Fri, Jun 25, 2021 at 04:01:33PM +0100, Catalin Marinas wrote:
> On Thu, Jun 24, 2021 at 10:36:50AM -0700, Rustam Kovhaev wrote:
> > On Wed, Jun 16, 2021 at 11:25:22AM -0700, Rustam Kovhaev wrote:
> > > On Tue, Jun 15, 2021 at 07:15:24AM +0200, Dmitry Vyukov wrote:
> > > > On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
> > > > >
> > > > > hello Catalin, Andrew!
> > > > >
> > > > > while troubleshooting a false positive syzbot kmemleak report i have
> > > > > noticed an interesting behavior in kmemleak and i wonder whether it is
> > > > > behavior by design and should be documented, or maybe something to
> > > > > improve.
> > > > > apologies if some of the questions do not make sense, i am still going
> > > > > through kmemleak code..
> > > > >
> > > > > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > > > > the actual contents (page_address(page)) of the page.
> > > > > if we allocate an object with kmalloc(), then allocate page with
> > > > > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > > > > kmemleak will report kmalloc pointer as a false positive.
> > > > > should we improve kmemleak and make it scan page contents?
> > > > > or will this bring too many false negatives?
> > > > 
> > > > Hi Rustam,
> > > > 
> > > > Nice debugging!
> > > > I assume lots of pages are allocated for slab and we don't want to
> > > > scan the whole page if only a few slab objects are alive on the page.
> > > > However alloc_pages() can be called by end kernel code as well.
> > > > I grepped for any kmemleak annotations around existing calls to
> > > > alloc_pages, but did not find any...
> > > > Does it require an explicit kmemleak_alloc() after allocating the page
> > > > and kmemleak_free () before freeing the page?
> > > 
> > > hi Dmitry, thank you!
> > > yes, as Catalin has pointed out, there are a few places where we call
> > > kmemleak_alloc()/kmemleak_free() explicitly in order for the pages to be
> > > scanned, like in blk_mq_alloc_rqs()
> > > 
> > > > If there are more than one use case for this, I guess we could add
> > > > some GFP flag for this maybe.
> > > 
> > > and this way kernel users won't have to use kmemleak fuctions mentioned
> > > above including some or most kmemleak_not_leak() calls and basically
> > > kmemleak will be kind of "transparent" to them? and they will only need
> > > to use the GFP flag to instruct kmemleak to scan the page contents?
> > > it sounds like a good idea to me..
> > > 
> > 
> > i've been thinking about this and it seems like in the scenario where we
> > want kmemleak to scan only some part of the page, we will have to either
> > do separate alloc_page() calls with different flags or use 
> > kmemleak_scan_area() to limit the memory scan area. maybe this approach
> > won't simplify things and will produce more code instead of reducing it
> 
> Since page allocation is not tracked by kmemleak, you can always do an
> explicit kmemleak_alloc() call with a smaller size than a full page.
> 
Right, but if I understood Dmitry's idea correctly, he was thinking
about using a new GFP flag, like GFP_KMEMLEAK, and burying the
kmemleak_alloc() call in the page allocator.
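
From the caller's side that could look roughly like this purely
hypothetical fragment (no __GFP_KMEMLEAK flag exists today; it is made
up here only to illustrate the idea of the page allocator doing the
kmemleak_alloc()/kmemleak_free() calls itself):

	/*
	 * Hypothetical: __GFP_KMEMLEAK stands in for the idea of the page
	 * allocator registering the page with kmemleak on behalf of the caller.
	 */
	page = alloc_page(GFP_KERNEL | __GFP_KMEMLEAK);
	ptr = page_address(page);
	ptr->pages = pages;	/* a pointer stored here would then be scanned */
	/* ... */
	__free_page(page);	/* would also remove the kmemleak object */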
Dmitry Vyukov June 25, 2021, 3:36 p.m. UTC | #10
On Fri, Jun 25, 2021 at 5:28 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
>
> Hi Catalin,
>
> On Fri, Jun 25, 2021 at 04:01:33PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 24, 2021 at 10:36:50AM -0700, Rustam Kovhaev wrote:
> > > On Wed, Jun 16, 2021 at 11:25:22AM -0700, Rustam Kovhaev wrote:
> > > > On Tue, Jun 15, 2021 at 07:15:24AM +0200, Dmitry Vyukov wrote:
> > > > > On Mon, Jun 14, 2021 at 10:31 PM Rustam Kovhaev <rkovhaev@gmail.com> wrote:
> > > > > >
> > > > > > hello Catalin, Andrew!
> > > > > >
> > > > > > while troubleshooting a false positive syzbot kmemleak report i have
> > > > > > noticed an interesting behavior in kmemleak and i wonder whether it is
> > > > > > behavior by design and should be documented, or maybe something to
> > > > > > improve.
> > > > > > apologies if some of the questions do not make sense, i am still going
> > > > > > through kmemleak code..
> > > > > >
> > > > > > a) kmemleak scans struct page (kmemleak.c:1462), but it does not scan
> > > > > > the actual contents (page_address(page)) of the page.
> > > > > > if we allocate an object with kmalloc(), then allocate page with
> > > > > > alloc_page(), and if we put kmalloc pointer somewhere inside that page,
> > > > > > kmemleak will report kmalloc pointer as a false positive.
> > > > > > should we improve kmemleak and make it scan page contents?
> > > > > > or will this bring too many false negatives?
> > > > >
> > > > > Hi Rustam,
> > > > >
> > > > > Nice debugging!
> > > > > I assume lots of pages are allocated for slab and we don't want to
> > > > > scan the whole page if only a few slab objects are alive on the page.
> > > > > However alloc_pages() can be called by end kernel code as well.
> > > > > I grepped for any kmemleak annotations around existing calls to
> > > > > alloc_pages, but did not find any...
> > > > > Does it require an explicit kmemleak_alloc() after allocating the page
> > > > > and kmemleak_free () before freeing the page?
> > > >
> > > > hi Dmitry, thank you!
> > > > yes, as Catalin has pointed out, there are a few places where we call
> > > > kmemleak_alloc()/kmemleak_free() explicitly in order for the pages to be
> > > > scanned, like in blk_mq_alloc_rqs()
> > > >
> > > > > If there are more than one use case for this, I guess we could add
> > > > > some GFP flag for this maybe.
> > > >
> > > > and this way kernel users won't have to use kmemleak fuctions mentioned
> > > > above including some or most kmemleak_not_leak() calls and basically
> > > > kmemleak will be kind of "transparent" to them? and they will only need
> > > > to use the GFP flag to instruct kmemleak to scan the page contents?
> > > > it sounds like a good idea to me..
> > > >
> > >
> > > i've been thinking about this and it seems like in the scenario where we
> > > want kmemleak to scan only some part of the page, we will have to either
> > > do separate alloc_page() calls with different flags or use
> > > kmemleak_scan_area() to limit the memory scan area. maybe this approach
> > > won't simplify things and will produce more code instead of reducing it
> >
> > Since page allocation is not tracked by kmemleak, you can always do an
> > explicit kmemleak_alloc() call with a smaller size than a full page.
> >
> right, but if i understood Dmitry's idea correctly, he was thinking
> about using a new GFP flag, like GFP_KMEMLEAK, and burying
> kmemleak_alloc() in page allocator

I don't have a strong opinion either way, especially since we have
only one use case so far.

Patch

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index ce18119ea0d0..da967a87eb78 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@ 
 332	common	statx			sys_statx
 333	common	io_pgetevents		sys_io_pgetevents
 334	common	rseq			sys_rseq
+335	common	kmemleak_test		sys_kmemleak_test
 # don't use numbers 387 through 423, add new calls after the last
 # 'common' entry
 424	common	pidfd_send_signal	sys_pidfd_send_signal
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 050511e8f1f8..0602308aabf4 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1029,6 +1029,7 @@  asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_kmemleak_test(void);
 asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 			       int to_dfd, const char __user *to_path,
diff --git a/mm/Makefile b/mm/Makefile
index bf71e295e9f6..878783838fa1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -97,7 +97,7 @@  obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
 obj-$(CONFIG_GUP_TEST) += gup_test.o
 obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
 obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
-obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
+obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o kmemleak_test.o
 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
 obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
 obj-$(CONFIG_PAGE_OWNER) += page_owner.o
diff --git a/mm/kmemleak_test.c b/mm/kmemleak_test.c
new file mode 100644
index 000000000000..828246e20b7f
--- /dev/null
+++ b/mm/kmemleak_test.c
@@ -0,0 +1,45 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+
+struct kmemleak_check {
+	unsigned long canary;
+	struct work_struct work;
+	struct page **pages;
+};
+
+static void work_func(struct work_struct *work)
+{
+	struct page **pages;
+	struct kmemleak_check *ptr;
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_timeout(3600*HZ);
+
+	ptr = container_of(work, struct kmemleak_check, work);
+	pages = ptr->pages;
+	__free_page(pages[0]);
+	kvfree(pages);
+}
+
+SYSCALL_DEFINE0(kmemleak_test)
+{
+	struct page **pages, *page;
+	struct kmemleak_check *ptr;
+
+	pages = kzalloc(sizeof(*pages), GFP_KERNEL);
+	page = alloc_page(GFP_KERNEL);
+	pages[0] = page;
+	ptr = page_address(page);
+	ptr->canary = 0x00FF00FF00FF00FF;
+	ptr->pages = pages;
+	pr_info("DEBUG: pages %px page %px ptr %px\n", pages, page, ptr);
+
+	INIT_WORK(&ptr->work, work_func);
+	schedule_work(&ptr->work);
+
+	return 0;
+}