diff mbox series

[bpf] lib/buildid: handle memfd_secret() files in build_id_parse()

Message ID 20241014235631.1229438-1-andrii@kernel.org (mailing list archive)
State New

Commit Message

Andrii Nakryiko Oct. 14, 2024, 11:56 p.m. UTC
From memfd_secret(2) manpage:

  The memory areas backing the file created with memfd_secret(2) are
  visible only to the processes that have access to the file descriptor.
  The memory region is removed from the kernel page tables and only the
  page tables of the processes holding the file descriptor map the
  corresponding physical memory. (Thus, the pages in the region can't be
  accessed by the kernel itself, so that, for example, pointers to the
  region can't be passed to system calls.)

We need to handle this special case gracefully in the build ID fetching
code. Return -EACCES whenever a secretmem file is passed to the
build_id_parse() family of APIs. Original report and repro can be found in [0].

  [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/

Reported-by: Yi Lai <yi1.lai@intel.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 lib/buildid.c | 5 +++++
 1 file changed, 5 insertions(+)
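
The memfd_secret(2) behavior quoted in the commit message can be exercised from userspace with a sketch like the one below. It is not the reproducer from [0]; the helper names (secretmem_create, secretmem_map_page, secretmem_unmap_page) are hypothetical, and the raw syscall is used on the assumption that the C library may not provide a memfd_secret() wrapper. On kernels or headers without secretmem support the helpers simply fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Create a secretmem file descriptor via the raw syscall.
 * Returns the fd, or -1 with errno set (ENOSYS when the headers or the
 * running kernel lack memfd_secret support).
 */
static int secretmem_create(void)
{
#ifdef SYS_memfd_secret
	return (int)syscall(SYS_memfd_secret, 0);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Map one page of secret memory. On success returns the mapping and stores
 * the fd in *fd_out; on failure returns NULL and leaves *fd_out at -1.
 */
static char *secretmem_map_page(int *fd_out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int fd;

	assert(fd_out != NULL);
	*fd_out = -1;

	fd = secretmem_create();
	if (fd < 0)
		return NULL;

	/* secretmem files start empty; size the file before mapping it */
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return NULL;
	}

	/* secretmem only supports shared mappings */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	*fd_out = fd;
	return p;
}

/* Undo secretmem_map_page(): drop the mapping, then the fd. */
static void secretmem_unmap_page(char *p, int fd)
{
	munmap(p, (size_t)sysconf(_SC_PAGESIZE));
	close(fd);
}
```

A mapping obtained this way is readable and writable by the owning process, while the file behind it is the kind of file that this patch makes the build_id_parse() family refuse with -EACCES.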

Comments

Shakeel Butt Oct. 15, 2024, midnight UTC | #1
On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> From memfd_secret(2) manpage:
> 
>   The memory areas backing the file created with memfd_secret(2) are
>   visible only to the processes that have access to the file descriptor.
>   The memory region is removed from the kernel page tables and only the
>   page tables of the processes holding the file descriptor map the
>   corresponding physical memory. (Thus, the pages in the region can't be
>   accessed by the kernel itself, so that, for example, pointers to the
>   region can't be passed to system calls.)
> 
> We need to handle this special case gracefully in build ID fetching
> code. Return -EACCESS whenever secretmem file is passed to build_id_parse()
> family of APIs. Original report and repro can be found in [0].
> 
>   [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
> 
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Shakeel Butt Oct. 16, 2024, 6:39 p.m. UTC | #2
Ccing a couple more folks who are doing similar work (ASI, guest_memfd).

Folks, what is the generic way to check if a given mapping has folios
unmapped from kernel address space?

On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> From memfd_secret(2) manpage:
> 
>   The memory areas backing the file created with memfd_secret(2) are
>   visible only to the processes that have access to the file descriptor.
>   The memory region is removed from the kernel page tables and only the
>   page tables of the processes holding the file descriptor map the
>   corresponding physical memory. (Thus, the pages in the region can't be
>   accessed by the kernel itself, so that, for example, pointers to the
>   region can't be passed to system calls.)
> 
> We need to handle this special case gracefully in build ID fetching
> code. Return -EACCESS whenever secretmem file is passed to build_id_parse()
> family of APIs. Original report and repro can be found in [0].
> 
>   [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
> 
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  lib/buildid.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/buildid.c b/lib/buildid.c
> index 290641d92ac1..f0e6facf61c5 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -5,6 +5,7 @@
>  #include <linux/elf.h>
>  #include <linux/kernel.h>
>  #include <linux/pagemap.h>
> +#include <linux/secretmem.h>
>  
>  #define BUILD_ID 3
>  
> @@ -64,6 +65,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
>  
>  	freader_put_folio(r);
>  
> +	/* reject secretmem folios created with memfd_secret() */
> +	if (secretmem_mapping(r->file->f_mapping))
> +		return -EACCES;
> +
>  	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
>  
>  	/* if sleeping is allowed, wait for the page, if necessary */
> -- 
> 2.43.5
>
Yosry Ahmed Oct. 16, 2024, 7:59 p.m. UTC | #3
On Wed, Oct 16, 2024 at 11:39 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
>
> Folks, what is the generic way to check if a given mapping has folios
> unmapped from kernel address space?

I suppose you mean specifically if a folio is not mapped in the direct
map, because a folio can also be mapped in other regions of the kernel
address space (e.g. vmalloc).

From my perspective of working on ASI on the x86 side, I think
lookup_address() is the right API to use. It returns a PTE and you can
check if it is present.

Based on that, I would say that the generic way is perhaps
kernel_page_present(), which does the above on x86, not sure about
other architectures. It seems like kernel_page_present() always
returns true with !CONFIG_ARCH_HAS_SET_DIRECT_MAP, which assumes that
unmapping folios from the direct map uses set_direct_map_*().

For secretmem, it seems like set_direct_map_*() is indeed the method
used to unmap folios. I am not sure if the same stands for
guest_memfd, but I don't see why not.

ASI does not use set_direct_map_*(), but it doesn't matter in this
context, read below if you care about the reasoning.

ASI does not unmap folios from the direct map in the kernel address
space, but it creates a new "restricted" address space that has the
folios unmapped from the direct map by default. However, I don't think
this is relevant here. IIUC, the purpose of this patch is to check if
the folio is accessible by the kernel, which should be true even in
the ASI restricted address space, because ASI will just transparently
switch to the unrestricted kernel address space where the folio is
mapped if needed.

I hope this helps.


>
> On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> > From memfd_secret(2) manpage:
> >
> >   The memory areas backing the file created with memfd_secret(2) are
> >   visible only to the processes that have access to the file descriptor.
> >   The memory region is removed from the kernel page tables and only the
> >   page tables of the processes holding the file descriptor map the
> >   corresponding physical memory. (Thus, the pages in the region can't be
> >   accessed by the kernel itself, so that, for example, pointers to the
> >   region can't be passed to system calls.)
> >
> > We need to handle this special case gracefully in build ID fetching
> > code. Return -EACCESS whenever secretmem file is passed to build_id_parse()
> > family of APIs. Original report and repro can be found in [0].
> >
> >   [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
> >
> > Reported-by: Yi Lai <yi1.lai@intel.com>
> > Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  lib/buildid.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/lib/buildid.c b/lib/buildid.c
> > index 290641d92ac1..f0e6facf61c5 100644
> > --- a/lib/buildid.c
> > +++ b/lib/buildid.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/elf.h>
> >  #include <linux/kernel.h>
> >  #include <linux/pagemap.h>
> > +#include <linux/secretmem.h>
> >
> >  #define BUILD_ID 3
> >
> > @@ -64,6 +65,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> >
> >       freader_put_folio(r);
> >
> > +     /* reject secretmem folios created with memfd_secret() */
> > +     if (secretmem_mapping(r->file->f_mapping))
> > +             return -EACCES;
> > +
> >       r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
> >
> >       /* if sleeping is allowed, wait for the page, if necessary */
> > --
> > 2.43.5
> >
Shakeel Butt Oct. 16, 2024, 9:45 p.m. UTC | #4
On Wed, Oct 16, 2024 at 12:59:13PM GMT, Yosry Ahmed wrote:
> On Wed, Oct 16, 2024 at 11:39 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > Ccing couple more folks who are doing similar work (ASI, guest_memfd)
> >
> > Folks, what is the generic way to check if a given mapping has folios
> > unmapped from kernel address space?
> 
> I suppose you mean specifically if a folio is not mapped in the direct
> map, because a folio can also be mapped in other regions of the kernel
> address space (e.g. vmalloc).
> 
> From my perspective of working on ASI on the x86 side, I think
> lookup_address() is
> the right API to use. It returns a PTE and you can check if it is
> present.
> 
> Based on that, I would say that the generic way is perhaps
> kernel_page_present(), which does the above on x86, not sure about
> other architectures. It seems like kernel_page_present() always
> returns true with !CONFIG_ARCH_HAS_SET_DIRECT_MAP, which assumes that
> unmapping folios from the direct map uses set_direct_map_*().
> 
> For secretmem, it seems like set_direct_map_*() is indeed the method
> used to unmap folios. I am not sure if the same stands for
> guest_memfd, but I don't see why not.
> 
> ASI does not use set_direct_map_*(), but it doesn't matter in this
> context, read below if you care about the reasoning.
> 
> ASI does not unmap folios from the direct map in the kernel address
> space, but it creates a new "restricted" address space that has the
> folios unmapped from the direct map by default. However, I don't think
> this is relevant here. IIUC, the purpose of this patch is to check if
> the folio is accessible by the kernel, which should be true even in
> the ASI restricted address space, because ASI will just transparently
> switch to the unrestricted kernel address space where the folio is
> mapped if needed.
> 
> I hope this helps.
> 

Thanks a lot. This is really helpful.
David Hildenbrand Oct. 17, 2024, 9:17 a.m. UTC | #5
On 16.10.24 20:39, Shakeel Butt wrote:
> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
> 
> Folks, what is the generic way to check if a given mapping has folios
> unmapped from kernel address space?


Can't we just look up the mapping and refuse those folios that really
shouldn't be looked at?

See gup_fast_folio_allowed() where we refuse secretmem_mapping().
Shakeel Butt Oct. 17, 2024, 4:22 p.m. UTC | #6
On Thu, Oct 17, 2024 at 11:17:19AM GMT, David Hildenbrand wrote:
> On 16.10.24 20:39, Shakeel Butt wrote:
> > Ccing couple more folks who are doing similar work (ASI, guest_memfd)
> > 
> > Folks, what is the generic way to check if a given mapping has folios
> > unmapped from kernel address space?
> 
> 
> Can't we just lookup the mapping and refuse these folios that really
> shouldn't be looked at?
> 
> See gup_fast_folio_allowed() where we refuse secretmem_mapping().

That is exactly what this patch is doing. See [1]. The reason I asked
this question is that I see parallel efforts related to guest_memfd
and ASI that are going to unmap folios from the direct map. (Yosry already
explained that ASI is a bit different.) We want a more robust and
future-proof solution.


[1] https://lore.kernel.org/all/20241014235631.1229438-1-andrii@kernel.org/

> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
David Hildenbrand Oct. 17, 2024, 5:41 p.m. UTC | #7
On 17.10.24 18:22, Shakeel Butt wrote:
> On Thu, Oct 17, 2024 at 11:17:19AM GMT, David Hildenbrand wrote:
>> On 16.10.24 20:39, Shakeel Butt wrote:
>>> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
>>>
>>> Folks, what is the generic way to check if a given mapping has folios
>>> unmapped from kernel address space?
>>
>>
>> Can't we just lookup the mapping and refuse these folios that really
>> shouldn't be looked at?
>>
>> See gup_fast_folio_allowed() where we refuse secretmem_mapping().
> 
> That is exactly what this patch is doing. See [1].

Hah! I should have looked at the full patch not just the discussion 
where I was CCed :)

> The reason I asked
> this question was because I see parallel efforts related to guest_memfd
> and ASI are going to unmap folios from direct map. (Yosry already
> explained ASI is a bit different). We want a more robust and future
> proof solution.

There was a discussion a while ago about having the abstraction of 
inaccessible mappings.

See 
https://lore.kernel.org/all/c87a4ba0-b9c4-4044-b0c3-c1112601494f@redhat.com/

It would be a more future-proof replacement of the secretmem checks.
Patch

diff --git a/lib/buildid.c b/lib/buildid.c
index 290641d92ac1..f0e6facf61c5 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@ 
 #include <linux/elf.h>
 #include <linux/kernel.h>
 #include <linux/pagemap.h>
+#include <linux/secretmem.h>
 
 #define BUILD_ID 3
 
@@ -64,6 +65,10 @@  static int freader_get_folio(struct freader *r, loff_t file_off)
 
 	freader_put_folio(r);
 
+	/* reject secretmem folios created with memfd_secret() */
+	if (secretmem_mapping(r->file->f_mapping))
+		return -EACCES;
+
 	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
 
 	/* if sleeping is allowed, wait for the page, if necessary */
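
The ordering in the hunk above matters: the secretmem check runs after the previous folio is released and before the page cache lookup, so no folio from a secretmem mapping is ever obtained. The following is only a userspace model of that control flow, not kernel code. Every type and function in it (model_mapping, model_reader, model_lookup, model_get_folio) is a hypothetical stand-in, with the boolean flag playing the role of secretmem_mapping() and the counter playing the role of filemap_get_folio().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_mapping {
	bool secretmem;   /* models secretmem_mapping() returning true */
	int lookups;      /* counts simulated page cache lookups */
};

struct model_reader {
	struct model_mapping *mapping;
	const void *folio;   /* currently held folio, or NULL */
};

/* Models filemap_get_folio(): records the lookup, returns a dummy folio. */
static const void *model_lookup(struct model_mapping *m)
{
	m->lookups++;
	return m;
}

/* Models the patched freader_get_folio(): reject before any lookup. */
static int model_get_folio(struct model_reader *r)
{
	assert(r != NULL && r->mapping != NULL);

	r->folio = NULL;   /* models freader_put_folio() */

	/* models the new check: secretmem is refused up front */
	if (r->mapping->secretmem)
		return -EACCES;

	r->folio = model_lookup(r->mapping);
	return 0;
}
```

In this model a reader pointed at a mapping with the secretmem flag set gets -EACCES with the lookup counter still at zero, while a regular mapping proceeds to the lookup as before.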