Message ID | 20241014235631.1229438-1-andrii@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [bpf] lib/buildid: handle memfd_secret() files in build_id_parse() |
On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> From memfd_secret(2) manpage:
>
> The memory areas backing the file created with memfd_secret(2) are
> visible only to the processes that have access to the file descriptor.
> The memory region is removed from the kernel page tables and only the
> page tables of the processes holding the file descriptor map the
> corresponding physical memory. (Thus, the pages in the region can't be
> accessed by the kernel itself, so that, for example, pointers to the
> region can't be passed to system calls.)
>
> We need to handle this special case gracefully in build ID fetching
> code. Return -EACCES whenever secretmem file is passed to build_id_parse()
> family of APIs. Original report and repro can be found in [0].
>
> [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
>
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Ccing couple more folks who are doing similar work (ASI, guest_memfd)

Folks, what is the generic way to check if a given mapping has folios
unmapped from kernel address space?

On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> From memfd_secret(2) manpage:
>
> The memory areas backing the file created with memfd_secret(2) are
> visible only to the processes that have access to the file descriptor.
> The memory region is removed from the kernel page tables and only the
> page tables of the processes holding the file descriptor map the
> corresponding physical memory. (Thus, the pages in the region can't be
> accessed by the kernel itself, so that, for example, pointers to the
> region can't be passed to system calls.)
>
> We need to handle this special case gracefully in build ID fetching
> code. Return -EACCES whenever secretmem file is passed to build_id_parse()
> family of APIs. Original report and repro can be found in [0].
>
> [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
>
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  lib/buildid.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/lib/buildid.c b/lib/buildid.c
> index 290641d92ac1..f0e6facf61c5 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -5,6 +5,7 @@
>  #include <linux/elf.h>
>  #include <linux/kernel.h>
>  #include <linux/pagemap.h>
> +#include <linux/secretmem.h>
>
>  #define BUILD_ID 3
>
> @@ -64,6 +65,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
>
>  	freader_put_folio(r);
>
> +	/* reject secretmem folios created with memfd_secret() */
> +	if (secretmem_mapping(r->file->f_mapping))
> +		return -EACCES;
> +
>  	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
>
>  	/* if sleeping is allowed, wait for the page, if necessary */
> --
> 2.43.5
>

On Wed, Oct 16, 2024 at 11:39 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
>
> Folks, what is the generic way to check if a given mapping has folios
> unmapped from kernel address space?

I suppose you mean specifically if a folio is not mapped in the direct
map, because a folio can also be mapped in other regions of the kernel
address space (e.g. vmalloc).

From my perspective of working on ASI on the x86 side, I think
lookup_address() is the right API to use. It returns a PTE and you can
check if it is present.

Based on that, I would say that the generic way is perhaps
kernel_page_present(), which does the above on x86, not sure about
other architectures. It seems like kernel_page_present() always
returns true with !CONFIG_ARCH_HAS_SET_DIRECT_MAP, which assumes that
unmapping folios from the direct map uses set_direct_map_*().

For secretmem, it seems like set_direct_map_*() is indeed the method
used to unmap folios. I am not sure if the same stands for
guest_memfd, but I don't see why not.

ASI does not use set_direct_map_*(), but it doesn't matter in this
context, read below if you care about the reasoning.

ASI does not unmap folios from the direct map in the kernel address
space, but it creates a new "restricted" address space that has the
folios unmapped from the direct map by default. However, I don't think
this is relevant here. IIUC, the purpose of this patch is to check if
the folio is accessible by the kernel, which should be true even in
the ASI restricted address space, because ASI will just transparently
switch to the unrestricted kernel address space where the folio is
mapped if needed.

I hope this helps.

>
> On Mon, Oct 14, 2024 at 04:56:31PM GMT, Andrii Nakryiko wrote:
> > From memfd_secret(2) manpage:
> >
> > The memory areas backing the file created with memfd_secret(2) are
> > visible only to the processes that have access to the file descriptor.
> > The memory region is removed from the kernel page tables and only the
> > page tables of the processes holding the file descriptor map the
> > corresponding physical memory. (Thus, the pages in the region can't be
> > accessed by the kernel itself, so that, for example, pointers to the
> > region can't be passed to system calls.)
> >
> > We need to handle this special case gracefully in build ID fetching
> > code. Return -EACCES whenever secretmem file is passed to build_id_parse()
> > family of APIs. Original report and repro can be found in [0].
> >
> > [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
> >
> > Reported-by: Yi Lai <yi1.lai@intel.com>
> > Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  lib/buildid.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/lib/buildid.c b/lib/buildid.c
> > index 290641d92ac1..f0e6facf61c5 100644
> > --- a/lib/buildid.c
> > +++ b/lib/buildid.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/elf.h>
> >  #include <linux/kernel.h>
> >  #include <linux/pagemap.h>
> > +#include <linux/secretmem.h>
> >
> >  #define BUILD_ID 3
> >
> > @@ -64,6 +65,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> >
> >  	freader_put_folio(r);
> >
> > +	/* reject secretmem folios created with memfd_secret() */
> > +	if (secretmem_mapping(r->file->f_mapping))
> > +		return -EACCES;
> > +
> >  	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
> >
> >  	/* if sleeping is allowed, wait for the page, if necessary */
> > --
> > 2.43.5
> >

On Wed, Oct 16, 2024 at 12:59:13PM GMT, Yosry Ahmed wrote:
> On Wed, Oct 16, 2024 at 11:39 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > Ccing couple more folks who are doing similar work (ASI, guest_memfd)
> >
> > Folks, what is the generic way to check if a given mapping has folios
> > unmapped from kernel address space?
>
> I suppose you mean specifically if a folio is not mapped in the direct
> map, because a folio can also be mapped in other regions of the kernel
> address space (e.g. vmalloc).
>
> From my perspective of working on ASI on the x86 side, I think
> lookup_address() is the right API to use. It returns a PTE and you can
> check if it is present.
>
> Based on that, I would say that the generic way is perhaps
> kernel_page_present(), which does the above on x86, not sure about
> other architectures. It seems like kernel_page_present() always
> returns true with !CONFIG_ARCH_HAS_SET_DIRECT_MAP, which assumes that
> unmapping folios from the direct map uses set_direct_map_*().
>
> For secretmem, it seems like set_direct_map_*() is indeed the method
> used to unmap folios. I am not sure if the same stands for
> guest_memfd, but I don't see why not.
>
> ASI does not use set_direct_map_*(), but it doesn't matter in this
> context, read below if you care about the reasoning.
>
> ASI does not unmap folios from the direct map in the kernel address
> space, but it creates a new "restricted" address space that has the
> folios unmapped from the direct map by default. However, I don't think
> this is relevant here. IIUC, the purpose of this patch is to check if
> the folio is accessible by the kernel, which should be true even in
> the ASI restricted address space, because ASI will just transparently
> switch to the unrestricted kernel address space where the folio is
> mapped if needed.
>
> I hope this helps.
>

Thanks a lot. This is really helpful.

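Editorial note: the behaviour Yosry describes for kernel_page_present() can be modelled in userspace to make the two cases explicit. The following is only a toy sketch, not kernel code: every `toy_*` name, the `TOY_NR_PAGES` constant and the `has_set_direct_map` flag are hypothetical stand-ins invented for illustration. It assumes, as the message above suggests, that the check reads a per-page "present" bit when the architecture supports direct-map manipulation and unconditionally reports true otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8	/* hypothetical size of the modelled direct map */

/* One "present" bit per page: a stand-in for direct-map PTEs. */
struct toy_direct_map {
	bool has_set_direct_map;	/* models CONFIG_ARCH_HAS_SET_DIRECT_MAP */
	bool present[TOY_NR_PAGES];
};

/* Start with every page mapped, as in a freshly built direct map. */
void toy_direct_map_init(struct toy_direct_map *dm, bool has_set_direct_map)
{
	dm->has_set_direct_map = has_set_direct_map;
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		dm->present[i] = true;
}

/*
 * Models unmapping a page via set_direct_map_*(). In this toy the call
 * simply has no effect when the architecture hook is absent.
 */
void toy_set_direct_map_invalid(struct toy_direct_map *dm, size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	if (dm->has_set_direct_map)
		dm->present[pfn] = false;
}

/* Models the check: look up the entry and test whether it is present. */
bool toy_kernel_page_present(const struct toy_direct_map *dm, size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	if (!dm->has_set_direct_map)
		return true;	/* nothing could have been unmapped */
	return dm->present[pfn];
}
```

In this model a page unmapped through `toy_set_direct_map_invalid()` is reported as not present, while a map initialised with `has_set_direct_map == false` reports every page as present no matter what was requested.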
On 16.10.24 20:39, Shakeel Butt wrote:
> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
>
> Folks, what is the generic way to check if a given mapping has folios
> unmapped from kernel address space?

Can't we just lookup the mapping and refuse these folios that really
shouldn't be looked at?

See gup_fast_folio_allowed() where we refuse secretmem_mapping().

On Thu, Oct 17, 2024 at 11:17:19AM GMT, David Hildenbrand wrote:
> On 16.10.24 20:39, Shakeel Butt wrote:
> > Ccing couple more folks who are doing similar work (ASI, guest_memfd)
> >
> > Folks, what is the generic way to check if a given mapping has folios
> > unmapped from kernel address space?
>
>
> Can't we just lookup the mapping and refuse these folios that really
> shouldn't be looked at?
>
> See gup_fast_folio_allowed() where we refuse secretmem_mapping().

That is exactly what this patch is doing. See [1]. The reason I asked
this question was because I see parallel efforts related to guest_memfd
and ASI are going to unmap folios from direct map. (Yosry already
explained ASI is a bit different). We want a more robust and future
proof solution.

[1] https://lore.kernel.org/all/20241014235631.1229438-1-andrii@kernel.org/

>
>
> --
> Cheers,
>
> David / dhildenb
>

On 17.10.24 18:22, Shakeel Butt wrote:
> On Thu, Oct 17, 2024 at 11:17:19AM GMT, David Hildenbrand wrote:
>> On 16.10.24 20:39, Shakeel Butt wrote:
>>> Ccing couple more folks who are doing similar work (ASI, guest_memfd)
>>>
>>> Folks, what is the generic way to check if a given mapping has folios
>>> unmapped from kernel address space?
>>
>>
>> Can't we just lookup the mapping and refuse these folios that really
>> shouldn't be looked at?
>>
>> See gup_fast_folio_allowed() where we refuse secretmem_mapping().
>
> That is exactly what this patch is doing. See [1].

Hah! I should have looked at the full patch not just the discussion
where I was CCed :)

> The reason I asked
> this question was because I see parallel efforts related to guest_memfd
> and ASI are going to unmap folios from direct map. (Yosry already
> explained ASI is a bit different). We want a more robust and future
> proof solution.

There was a discussion a while ago about having the abstraction of
inaccessible mappings. See

https://lore.kernel.org/all/c87a4ba0-b9c4-4044-b0c3-c1112601494f@redhat.com/

It would be a more future-proof replacement of the secretmem checks.

diff --git a/lib/buildid.c b/lib/buildid.c
index 290641d92ac1..f0e6facf61c5 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
 #include <linux/elf.h>
 #include <linux/kernel.h>
 #include <linux/pagemap.h>
+#include <linux/secretmem.h>
 
 #define BUILD_ID 3
 
@@ -64,6 +65,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
 
 	freader_put_folio(r);
 
+	/* reject secretmem folios created with memfd_secret() */
+	if (secretmem_mapping(r->file->f_mapping))
+		return -EACCES;
+
 	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
 
 	/* if sleeping is allowed, wait for the page, if necessary */

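Editorial note: the ordering the diff introduces (test the mapping type first, look up the folio only afterwards) can be exercised outside the kernel with a small mock. This is a sketch under stated assumptions, not the code in lib/buildid.c: `struct mock_mapping`, `struct mock_reader`, `mock_page_cache` and `mock_get_folio()` are hypothetical stand-ins for `struct address_space`, `struct freader`, the page cache and `freader_get_folio()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space; not the kernel type. */
struct mock_mapping {
	bool is_secretmem;	/* models what secretmem_mapping() would report */
};

/* Hypothetical stand-in for the state kept in struct freader. */
struct mock_reader {
	const struct mock_mapping *mapping;
	const void *folio;	/* non-NULL once a "folio" has been looked up */
};

/* Pretend page-cache contents returned by the mock lookup. */
const char mock_page_cache[] = "pretend folio contents";

/*
 * Mirrors the ordering in the patch: the mapping type is tested before any
 * lookup, so no folio is ever obtained for a secretmem-backed file.
 */
int mock_get_folio(struct mock_reader *r)
{
	assert(r != NULL && r->mapping != NULL);

	r->folio = NULL;		/* models freader_put_folio() */

	if (r->mapping->is_secretmem)
		return -EACCES;		/* same errno as the patch */

	r->folio = mock_page_cache;	/* models filemap_get_folio() */
	return 0;
}
```

With a reader whose mapping has `is_secretmem` set, `mock_get_folio()` returns `-EACCES` and leaves `folio` as NULL; with an ordinary mapping it returns 0 and `folio` points at the mock contents.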
From memfd_secret(2) manpage:

The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)

We need to handle this special case gracefully in build ID fetching
code. Return -EACCES whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].

[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/

Reported-by: Yi Lai <yi1.lai@intel.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 lib/buildid.c | 5 +++++
 1 file changed, 5 insertions(+)
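
Editorial note: for readers unfamiliar with the kind of file the commit message refers to, the sketch below shows one way userspace might create and use a secretmem file. It is not the reproducer from [0]. The helper name `secretmem_roundtrip()` is hypothetical, the syscall is invoked through syscall(2), and the code assumes a Linux system; when the headers do not define `SYS_memfd_secret`, or the kernel or environment refuses the call, it just reports a negative errno value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Create a secretmem file, map one page of it, and round-trip a few bytes
 * through the mapping. Returns 0 on success or a negative errno value on
 * failure (for example -ENOSYS when the syscall is unavailable).
 */
int secretmem_roundtrip(void)
{
#ifdef SYS_memfd_secret
	static const char msg[] = "secret";
	char copy[sizeof(msg)];
	long page = sysconf(_SC_PAGESIZE);
	int fd, err;
	char *p;

	assert(page > 0);

	fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return -errno;

	/* The file starts empty; give it one page before mapping it. */
	if (ftruncate(fd, page) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		close(fd);
		return -err;
	}

	/* Userspace holding the fd can read and write the pages normally. */
	memcpy(p, msg, sizeof(msg));
	memcpy(copy, p, sizeof(msg));
	err = memcmp(copy, msg, sizeof(msg)) ? EIO : 0;

	munmap(p, (size_t)page);
	close(fd);
	return -err;
#else
	return -ENOSYS;	/* headers too old to know the syscall number */
#endif
}
```

A file created this way is what secretmem_mapping() identifies on the kernel side; with the patch applied, the build_id_parse() family is expected to fail with -EACCES for it rather than touch its folios.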