mbox series

[RFC,v5,0/7] mseal system mappings

Message ID 20250212032155.1276806-1-jeffxu@google.com (mailing list archive)
Headers show
Series mseal system mappings | expand

Message

Jeff Xu Feb. 12, 2025, 3:21 a.m. UTC
From: Jeff Xu <jeffxu@chromium.org>

The commit message in the first patch contains the full description of
this series.

------------------
History:

V5
  - Remove kernel cmd line (Lorenzo Stoakes)
  - Add test info (Lorenzo Stoakes)
  - Add threat model info (Lorenzo Stoakes)
  - Fix x86 selftest: test_mremap_vdso
  - Restrict code change to ARM64/x86-64/UM arch only.
  - Add userprocess.h to include seal_system_mapping().
  - Remove sealing vsyscall.
  - Split the patch.

V4:
  https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/

V3:
  https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/

V2:
  https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/

V1:
  https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/

Jeff Xu (7):
  mseal, system mappings: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal, system mappings: enable x86-64
  mseal, system mappings: enable arm64
  mseal, system mappings: enable uml architecture
  mseal, system mappings: uprobe mapping
  mseal, system mappings: update mseal.rst

 Documentation/userspace-api/mseal.rst         |  5 +++
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/kernel/vdso.c                      | 23 +++++++----
 arch/um/Kconfig                               |  1 +
 arch/x86/Kconfig                              |  1 +
 arch/x86/entry/vdso/vma.c                     | 17 ++++++---
 arch/x86/um/vdso/vma.c                        |  7 +++-
 include/linux/userprocess.h                   | 18 +++++++++
 init/Kconfig                                  | 18 +++++++++
 kernel/events/uprobes.c                       |  6 ++-
 security/Kconfig                              | 18 +++++++++
 .../testing/selftests/x86/test_mremap_vdso.c  | 38 +++++++++++++++++++
 12 files changed, 137 insertions(+), 16 deletions(-)
 create mode 100644 include/linux/userprocess.h

Comments

Lorenzo Stoakes Feb. 12, 2025, 11:24 a.m. UTC | #1
On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
>
> The commit message in the first patch contains the full description of
> this series.

Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
this obviously isn't urgent, just be nice when we un-RFC.

Thanks for sending as RFC, appreciated, keen to figure out a way forward
with this series and this gives us space to discuss.

One thing that came up recently with the LWN article (...!) was that rr is
also impacted by this [0].

I think with this behind a config flag we're fine (this refers to my
'opt-in' comment in the reply on LWN) as my concerns about this being
enabled in a broken way without an explicit kernel configuration are
addressed, and actually we do expose a means by which a user can detect if
the VDSO for instance is sealed via /proc/$pid/[s]maps.

So tools like rr and such can be updated to check for this. I wonder if we
ought to try to liaise with the known problematic ones?

It'd be nice to update the documentation to have a list of 'known
problematic userland software with sealed VDSO' so we make people aware.

Hopefully we are acheiving the opt-in nature of the thing here, but it
makes me wonder whether we need a prctl() interface to optionally disable
even if the system has it enabled as a whole.

That way, rr for instance can just turn it off for debugging purposes. To
be clear I am not trying to add additional extranous tasks here - my one
and only motive is to ensure that the feature works and we address concerns
about any possible breakage.

And I _want the series to land_ :>) I suspect we're close now.

I am tied up with a number of other tasks at the moment so apologies if I
take a second to get back to this series, but just wanted to say thanks for
addressing my various points, and that I will definitely provide review
(it's on the whiteboard, the only true guarantee I will do something :P).

I will also come back to your testing series which I owe you review on,
which is equally on the same whiteboard...

Thanks, Lorenzo

[0]:https://lwn.net/Articles/1007984/

>
> ------------------
> History:
>
> V5
>   - Remove kernel cmd line (Lorenzo Stoakes)
>   - Add test info (Lorenzo Stoakes)
>   - Add threat model info (Lorenzo Stoakes)
>   - Fix x86 selftest: test_mremap_vdso
>   - Restrict code change to ARM64/x86-64/UM arch only.
>   - Add userprocess.h to include seal_system_mapping().
>   - Remove sealing vsyscall.
>   - Split the patch.
>
> V4:
>   https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
>
> V3:
>   https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
>
> V2:
>   https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
>
> V1:
>   https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
>
> Jeff Xu (7):
>   mseal, system mappings: kernel config and header change
>   selftests: x86: test_mremap_vdso: skip if vdso is msealed
>   mseal, system mappings: enable x86-64
>   mseal, system mappings: enable arm64
>   mseal, system mappings: enable uml architecture
>   mseal, system mappings: uprobe mapping
>   mseal, system mappings: update mseal.rst
>
>  Documentation/userspace-api/mseal.rst         |  5 +++
>  arch/arm64/Kconfig                            |  1 +
>  arch/arm64/kernel/vdso.c                      | 23 +++++++----
>  arch/um/Kconfig                               |  1 +
>  arch/x86/Kconfig                              |  1 +
>  arch/x86/entry/vdso/vma.c                     | 17 ++++++---
>  arch/x86/um/vdso/vma.c                        |  7 +++-
>  include/linux/userprocess.h                   | 18 +++++++++
>  init/Kconfig                                  | 18 +++++++++
>  kernel/events/uprobes.c                       |  6 ++-
>  security/Kconfig                              | 18 +++++++++
>  .../testing/selftests/x86/test_mremap_vdso.c  | 38 +++++++++++++++++++
>  12 files changed, 137 insertions(+), 16 deletions(-)
>  create mode 100644 include/linux/userprocess.h
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
Pedro Falcato Feb. 12, 2025, 12:37 p.m. UTC | #2
On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > The commit message in the first patch contains the full description of
> > this series.
>
> Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> this obviously isn't urgent, just be nice when we un-RFC.
>
> Thanks for sending as RFC, appreciated, keen to figure out a way forward
> with this series and this gives us space to discuss.
>
> One thing that came up recently with the LWN article (...!) was that rr is
> also impacted by this [0].
>
> I think with this behind a config flag we're fine (this refers to my
> 'opt-in' comment in the reply on LWN) as my concerns about this being
> enabled in a broken way without an explicit kernel configuration are
> addressed, and actually we do expose a means by which a user can detect if
> the VDSO for instance is sealed via /proc/$pid/[s]maps.
>
> So tools like rr and such can be updated to check for this. I wonder if we
> ought to try to liaise with the known problematic ones?
>
> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.
>
> Hopefully we are acheiving the opt-in nature of the thing here, but it
> makes me wonder whether we need a prctl() interface to optionally disable
> even if the system has it enabled as a whole.

Just noting that (as we discussed off-list) doing prctl() would not
work, because that would effectively be an munseal for those vdso
regions.
Possibly something like a personality() flag (that's *not* inherited
when AT_SECURE/secureexec). But personalities have other issues...

FWIW, although it would (at the moment) be hard to pull off in the
libc, I still much prefer it to playing these weird games with CONFIG
options and kernel command line options and prctl and personality and
whatnot. It seems to me like we're trying to stick policy where it
doesn't belong.
Lorenzo Stoakes Feb. 12, 2025, 2:01 p.m. UTC | #3
(sorry I really am struggling to reply to mail as lore still seems to be
broken).

On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > The commit message in the first patch contains the full description of
> > > this series.
> >
> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > this obviously isn't urgent, just be nice when we un-RFC.
> >
> > Thanks for sending as RFC, appreciated, keen to figure out a way forward
> > with this series and this gives us space to discuss.
> >
> > One thing that came up recently with the LWN article (...!) was that rr is
> > also impacted by this [0].
> >
> > I think with this behind a config flag we're fine (this refers to my
> > 'opt-in' comment in the reply on LWN) as my concerns about this being
> > enabled in a broken way without an explicit kernel configuration are
> > addressed, and actually we do expose a means by which a user can detect if
> > the VDSO for instance is sealed via /proc/$pid/[s]maps.
> >
> > So tools like rr and such can be updated to check for this. I wonder if we
> > ought to try to liaise with the known problematic ones?
> >
> > It'd be nice to update the documentation to have a list of 'known
> > problematic userland software with sealed VDSO' so we make people aware.
> >
> > Hopefully we are acheiving the opt-in nature of the thing here, but it
> > makes me wonder whether we need a prctl() interface to optionally disable
> > even if the system has it enabled as a whole.
>
> Just noting that (as we discussed off-list) doing prctl() would not
> work, because that would effectively be an munseal for those vdso
> regions.
> Possibly something like a personality() flag (that's *not* inherited
> when AT_SECURE/secureexec). But personalities have other issues...

Thanks, yeah that's a good point, it would have to be implemented as a
personality or something similar otherwise you're essentially relying on
'unsealing' which can't be permitted.

I'm not sure how useful that'd be for the likes of rr though. But I suppose
if it makes everything exec'd by a child inherit it then maybe that works
for a debugging session etc.?

>
> FWIW, although it would (at the moment) be hard to pull off in the
> libc, I still much prefer it to playing these weird games with CONFIG
> options and kernel command line options and prctl and personality and
> whatnot. It seems to me like we're trying to stick policy where it
> doesn't belong.

The problem is, as a security feature, you don't want to make it trivially
easy to disable.

I mean we _need_ a config option to be able to strictly enforce only making
the feature enable-able on architectures and configuration option
combinations that work.

But if there is userspace that will be broken, we really have to have some
way of avoiding the disconnect between somebody making policy decision at
the kernel level and somebody trying to run something.

Because I can easily envision somebody enabling this as a 'good security
feature' for a distro release or such, only for somebody else to later try
rr, CRIU, or whatever else and for it to just not work or fail subtly and
to have no idea why.

I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
it via a tunable, so then it can become sysctl.d policy for instance.

The CONFIG_ flag dependency is critical because we don't want to enable
this on arches that have not been tested against it.

It's vital at any rate that we document everywhere we can that _this might
break some userland that depends on remapping the VDSO_.

>
> --
> Pedro
Johannes Berg Feb. 12, 2025, 2:08 p.m. UTC | #4
On Wed, 2025-02-12 at 14:01 +0000, Lorenzo Stoakes wrote:
> Thanks, yeah that's a good point, it would have to be implemented as a
> personality or something similar otherwise you're essentially relying on
> 'unsealing' which can't be permitted.
> 
> I'm not sure how useful that'd be for the likes of rr though. But I suppose
> if it makes everything exec'd by a child inherit it then maybe that works
> for a debugging session etc.?

For whatever that's worth, ARCH=um should not need 'unsealing' or 'not
sealing' it for *itself*, but rather only for the *children* it starts,
which are for the userspace processes inside of it. Which I suppose
could actually start without a VDSO in the first place, but I don't
think that's possible now?

Which I'll note should not have access to the host, so in a way this
outer security feature (sealing) breaks the inner ARCH=um security, I
think.

johannes
Kees Cook Feb. 12, 2025, 10:05 p.m. UTC | #5
On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > The commit message in the first patch contains the full description of
> > this series.
> 
> Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> this obviously isn't urgent, just be nice when we un-RFC.

I advised Jeff against this because I've found it can sometimes cause
"thread splitting" in that some people reply to the cover letter, and
some people reply to the first patch, etc. I've tended to try to keep
cover letters very general, with the bulk of the prose in the first
patch.

> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.

I like this idea! Probably in mseal.rst, as the Kconfig help already
points there.

-Kees
Jeff Xu Feb. 13, 2025, 2:19 p.m. UTC | #6
On Wed, Feb 12, 2025 at 3:24 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:

> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.
>
Sure. It will be added in the next version.

>
> And I _want the series to land_ :>) I suspect we're close now.
>
Thank you for reviewing the patch and giving support.

> I am tied up with a number of other tasks at the moment so apologies if I
> take a second to get back to this series, but just wanted to say thanks for
> addressing my various points, and that I will definitely provide review
> (it's on the whiteboard, the only true guarantee I will do something :P).
>
> I will also come back to your testing series which I owe you review on,
> which is equally on the same whiteboard...
>
Thank you for the heads up.

-Jeff
Jeff Xu Feb. 13, 2025, 2:20 p.m. UTC | #7
On Wed, Feb 12, 2025 at 2:05 PM Kees Cook <kees@kernel.org> wrote:
>
> > It'd be nice to update the documentation to have a list of 'known
> > problematic userland software with sealed VDSO' so we make people aware.
>
> I like this idea! Probably in mseal.rst, as the Kconfig help already
> points there.
>
Will update mseal.rst to include the above suggestion in the next version.

Thanks
-Jeff

> -Kees


>
> --
> Kees Cook
Liam R. Howlett Feb. 13, 2025, 6:35 p.m. UTC | #8
* Kees Cook <kees@kernel.org> [250212 17:05]:
> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > The commit message in the first patch contains the full description of
> > > this series.
> > 
> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > this obviously isn't urgent, just be nice when we un-RFC.
> 
> I advised Jeff against this because I've found it can sometimes cause
> "thread splitting" in that some people reply to the cover letter, and
> some people reply to the first patch, etc. I've tended to try to keep
> cover letters very general, with the bulk of the prose in the first
> patch.

Interesting idea, but I think thread splitting is less of a concern than
diluting the meaning of a patch by including a lengthy change log with a
fraction of the text being about the code that follows.

I think this is the reason for a cover letter in the first place; not
just version control.  After all, we could tack the version information
into the first patch too and avoid it being in the final commit message.

Thanks,
Liam
Kees Cook Feb. 13, 2025, 7:34 p.m. UTC | #9
On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
>* Kees Cook <kees@kernel.org> [250212 17:05]:
>> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
>> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
>> > > From: Jeff Xu <jeffxu@chromium.org>
>> > >
>> > > The commit message in the first patch contains the full description of
>> > > this series.
>> > 
>> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
>> > this obviously isn't urgent, just be nice when we un-RFC.
>> 
>> I advised Jeff against this because I've found it can sometimes cause
>> "thread splitting" in that some people reply to the cover letter, and
>> some people reply to the first patch, etc. I've tended to try to keep
>> cover letters very general, with the bulk of the prose in the first
>> patch.
>
>Interesting idea, but I think thread splitting is less of a concern than
>diluting the meaning of a patch by including a lengthy change log with a
>fraction of the text being about the code that follows.
>
>I think this is the reason for a cover letter in the first place; not
>just version control.  After all, we could tack the version information
>into the first patch too and avoid it being in the final commit message.

Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder.

-Kees
Pedro Falcato Feb. 13, 2025, 7:59 p.m. UTC | #10
On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> (sorry I really am struggling to reply to mail as lore still seems to be
> broken).
>
> On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > From: Jeff Xu <jeffxu@chromium.org>
> > > >
> > > > The commit message in the first patch contains the full description of
> > > > this series.
> > >
> > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > > this obviously isn't urgent, just be nice when we un-RFC.
> > >
> > > Thanks for sending as RFC, appreciated, keen to figure out a way forward
> > > with this series and this gives us space to discuss.
> > >
> > > One thing that came up recently with the LWN article (...!) was that rr is
> > > also impacted by this [0].
> > >
> > > I think with this behind a config flag we're fine (this refers to my
> > > 'opt-in' comment in the reply on LWN) as my concerns about this being
> > > enabled in a broken way without an explicit kernel configuration are
> > > addressed, and actually we do expose a means by which a user can detect if
> > > the VDSO for instance is sealed via /proc/$pid/[s]maps.
> > >
> > > So tools like rr and such can be updated to check for this. I wonder if we
> > > ought to try to liaise with the known problematic ones?
> > >
> > > It'd be nice to update the documentation to have a list of 'known
> > > problematic userland software with sealed VDSO' so we make people aware.
> > >
> > > Hopefully we are acheiving the opt-in nature of the thing here, but it
> > > makes me wonder whether we need a prctl() interface to optionally disable
> > > even if the system has it enabled as a whole.
> >
> > Just noting that (as we discussed off-list) doing prctl() would not
> > work, because that would effectively be an munseal for those vdso
> > regions.
> > Possibly something like a personality() flag (that's *not* inherited
> > when AT_SECURE/secureexec). But personalities have other issues...
>
> Thanks, yeah that's a good point, it would have to be implemented as a
> personality or something similar otherwise you're essentially relying on
> 'unsealing' which can't be permitted.
>
> I'm not sure how useful that'd be for the likes of rr though. But I suppose
> if it makes everything exec'd by a child inherit it then maybe that works
> for a debugging session etc.?
>
> >
> > FWIW, although it would (at the moment) be hard to pull off in the
> > libc, I still much prefer it to playing these weird games with CONFIG
> > options and kernel command line options and prctl and personality and
> > whatnot. It seems to me like we're trying to stick policy where it
> > doesn't belong.
>
> The problem is, as a security feature, you don't want to make it trivially
> easy to disable.
>
> I mean we _need_ a config option to be able to strictly enforce only making
> the feature enable-able on architectures and configuration option
> combinations that work.
>
> But if there is userspace that will be broken, we really have to have some
> way of avoiding the disconnect between somebody making policy decision at
> the kernel level and somebody trying to run something.
>
> Because I can easily envision somebody enabling this as a 'good security
> feature' for a distro release or such, only for somebody else to later try
> rr, CRIU, or whatever else and for it to just not work or fail subtly and
> to have no idea why.

Ok so I went looking around for the glibc patchset. It seems they're
moving away from tunables and there was a nice
GNU_PROPERTY_MEMORY_SEAL added to binutils.
So my proposal is to parse this property on the binfmt_elf.c side, and
mm would use this to know if we should seal these mappings. This seems
to tackle compatibility problems,
and glibc isn't sealing programs without this program header anyway. Thoughts?

>
> I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
> it via a tunable, so then it can become sysctl.d policy for instance.

sysctl is also an option but the idea of dropping a random feature
behind a CONFIG_ that's unusable by lots of people (including the
general GNU/Linux ecosystem) is really really unappealing to me.
Liam R. Howlett Feb. 13, 2025, 8:10 p.m. UTC | #11
* Kees Cook <kees@kernel.org> [250213 14:34]:
> 
> 
> On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> >* Kees Cook <kees@kernel.org> [250212 17:05]:
> >> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> >> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> >> > > From: Jeff Xu <jeffxu@chromium.org>
> >> > >
> >> > > The commit message in the first patch contains the full description of
> >> > > this series.
> >> > 
> >> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> >> > this obviously isn't urgent, just be nice when we un-RFC.
> >> 
> >> I advised Jeff against this because I've found it can sometimes cause
> >> "thread splitting" in that some people reply to the cover letter, and
> >> some people reply to the first patch, etc. I've tended to try to keep
> >> cover letters very general, with the bulk of the prose in the first
> >> patch.
> >
> >Interesting idea, but I think thread splitting is less of a concern than
> >diluting the meaning of a patch by including a lengthy change log with a
> >fraction of the text being about the code that follows.
> >
> >I think this is the reason for a cover letter in the first place; not
> >just version control.  After all, we could tack the version information
> >into the first patch too and avoid it being in the final commit message.
> 
> Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder.

Yes, rationales in the cover letter.  I like the way the akpm's tree
does things because it's the best of both worlds.  There is also a
separation of the cover letter with the actual commit message on the
first patch.

Having the full cover letter on patch 1 makes it difficult to understand
*that* patch on its own during review.

I've also gotten emails from Linus asking why in the ____ing ____ I did
it this way when I said why in the cover letter.. to that note I like
the patches to have _all_ the necessary details for that one patch,
including the sometimes "this is changed in the very next patch" lines
to spell out in-transit patches, or what ever else is needed from the
cover letter/context.

Taking this example, we have a 111 line cover letter and a patch that
adds a new file with a single function, and two kconfig options.  The
justification and reason for the patch is in the middle of that huge
block of text.  That seems silly.

That is to say:
Cover letters have the rationale and over-arching reason.
Patches have more than enough details to code inspect and know why this
patch is necessary.

Thanks,
Liam
Kees Cook Feb. 13, 2025, 8:47 p.m. UTC | #12
On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > (sorry I really am struggling to reply to mail as lore still seems to be
> > broken).
> >
> > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > >
> > > > > The commit message in the first patch contains the full description of
> > > > > this series.
> > > >
> > [...]
> > >
> > > FWIW, although it would (at the moment) be hard to pull off in the
> > > libc, I still much prefer it to playing these weird games with CONFIG
> > > options and kernel command line options and prctl and personality and
> > > whatnot. It seems to me like we're trying to stick policy where it
> > > doesn't belong.
> >
> > The problem is, as a security feature, you don't want to make it trivially
> > easy to disable.
> >
> > I mean we _need_ a config option to be able to strictly enforce only making
> > the feature enable-able on architectures and configuration option
> > combinations that work.
> >
> > But if there is userspace that will be broken, we really have to have some
> > way of avoiding the disconnect between somebody making policy decision at
> > the kernel level and somebody trying to run something.
> >
> > Because I can easily envision somebody enabling this as a 'good security
> > feature' for a distro release or such, only for somebody else to later try
> > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > to have no idea why.
> 
> Ok so I went looking around for the glibc patchset. It seems they're
> moving away from tunables and there was a nice
> GNU_PROPERTY_MEMORY_SEAL added to binutils.
> So my proposal is to parse this property on the binfmt_elf.c side, and
> mm would use this to know if we should seal these mappings. This seems
> to tackle compatibility problems,
> and glibc isn't sealing programs without this program header anyway. Thoughts?

It seems to me that doing this ties it to the binary, rather than
execution context, which may want to seal/not-seal, etc. I have a sense
that it's be better as a secure bit, or prctl, or something like that. The
properties seem to be better suited for "this binary _can_ do a thing"
or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
maybe there's more to this I'm not considering?

> > I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
> > it via a tunable, so then it can become sysctl.d policy for instance.
> 
> sysctl is also an option but the idea of dropping a random feature
> behind a CONFIG_ that's unusable by lots of people (including the
> general GNU/Linux ecosystem) is really really unappealing to me.

I agree 100%, but I think we need to make small steps. Behind a CONFIG
means we get it implemented, and then we can look at how to make it more
flexible. I'm motivated to figure this out because I've long wanted to
have a boot param to disable CRIU since I have distro systems that I
don't use CRIU on, and I don't want the (very small) interface changes
it makes available into seccomp filter visibility. And if CRIU could be
run-time based, so could system mapping sealing. :)
Pedro Falcato Feb. 18, 2025, 11:18 p.m. UTC | #13
On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
>
> On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > (sorry I really am struggling to reply to mail as lore still seems to be
> > > broken).
> > >
> > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > >
> > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > > >
> > > > > > The commit message in the first patch contains the full description of
> > > > > > this series.
> > > > >
> > > [...]
> > > >
> > > > FWIW, although it would (at the moment) be hard to pull off in the
> > > > libc, I still much prefer it to playing these weird games with CONFIG
> > > > options and kernel command line options and prctl and personality and
> > > > whatnot. It seems to me like we're trying to stick policy where it
> > > > doesn't belong.
> > >
> > > The problem is, as a security feature, you don't want to make it trivially
> > > easy to disable.
> > >
> > > I mean we _need_ a config option to be able to strictly enforce only making
> > > the feature enable-able on architectures and configuration option
> > > combinations that work.
> > >
> > > But if there is userspace that will be broken, we really have to have some
> > > way of avoiding the disconnect between somebody making policy decision at
> > > the kernel level and somebody trying to run something.
> > >
> > > Because I can easily envision somebody enabling this as a 'good security
> > > feature' for a distro release or such, only for somebody else to later try
> > > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > > to have no idea why.
> >
> > Ok so I went looking around for the glibc patchset. It seems they're
> > moving away from tunables and there was a nice
> > GNU_PROPERTY_MEMORY_SEAL added to binutils.
> > So my proposal is to parse this property on the binfmt_elf.c side, and
> > mm would use this to know if we should seal these mappings. This seems
> > to tackle compatibility problems,
> > and glibc isn't sealing programs without this program header anyway. Thoughts?
>
> It seems to me that doing this ties it to the binary, rather than
> execution context, which may want to seal/not-seal, etc. I have a sense
> that it's be better as a secure bit, or prctl, or something like that. The
> properties seem to be better suited for "this binary _can_ do a thing"
> or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
> maybe there's more to this I'm not considering?

Doesn't this exactly kind of Just Work though? "This binary can
do/tolerate sealing". I would blindly guess that we don't have very
opinionated shared libraries that do this kind of shenanigans
unilaterally, so that's probably not something we really need to worry
about (though I admittedly need to read through the glibc patchset,
and nail down what they're thinking about doing with linking
mseal-ready and mseal-non-ready ELF execs/shared objects together).
The problem with something like prctl is that we either indirectly
provide some kind of limited form of munseal, or we require some sort
of handover (like personality(2) + execve(2)), which both sound like a
huge PITA and still don't solve any of our backwards compat issues...
all binaries would need to be patched with this
prctl/personality()/whatever call, and old ones wouldn't work.

The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
the toolchain folks could shed some light?), but it sounds like it'd
fit perfectly.
I suspect we probably want to parse this on the kernel's side anyway
(to seal the main program/interp's segments)[1], then extending them
to the kernel system mappings should be somewhat trivial...
I don't think we'll ever get a program that can't cope with sealing
the system mappings but can cope with sealing itself (and if we do, we
just won't seal the entire thing and that's _okay_).

Deploying mseal-ready programs could then be done in a phased way by
distros. e.g chromeOS and android could simply enable the
corresponding linker option in LDFLAGS and let it rip. Other more
mainstream distros could obviously take a little longer or test/deploy
this on all programs not named gVisor and/or after CRIU is okay with
all of this. We then might not need a user-configurable CONFIG_ (only
an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
everyone is happy.

I glanced through libc-alpha again and it seems like the glibc folks
also seem to have reached the same idea, but I'd love to hear from
Adhemerval.

Am I missing anything?


[1] we should probably nail this responsibility handover down before
glibc msealing (or bionic) makes it to a release. It'd probably be a
little nicer if we could mseal these segments from the kernel instead
of forcing the libc to take care of this, now that we have this
property.
Adhemerval Zanella Netto Feb. 19, 2025, 1:46 p.m. UTC | #14
On 18/02/25 20:18, Pedro Falcato wrote:
> On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
>>
>> On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
>>> On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
>>> <lorenzo.stoakes@oracle.com> wrote:
>>>>
>>>> (sorry I really am struggling to reply to mail as lore still seems to be
>>>> broken).
>>>>
>>>> On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
>>>>> On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
>>>>> <lorenzo.stoakes@oracle.com> wrote:
>>>>>>
>>>>>> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
>>>>>>> From: Jeff Xu <jeffxu@chromium.org>
>>>>>>>
>>>>>>> The commit message in the first patch contains the full description of
>>>>>>> this series.
>>>>>>
>>>> [...]
>>>>>
>>>>> FWIW, although it would (at the moment) be hard to pull off in the
>>>>> libc, I still much prefer it to playing these weird games with CONFIG
>>>>> options and kernel command line options and prctl and personality and
>>>>> whatnot. It seems to me like we're trying to stick policy where it
>>>>> doesn't belong.
>>>>
>>>> The problem is, as a security feature, you don't want to make it trivially
>>>> easy to disable.
>>>>
>>>> I mean we _need_ a config option to be able to strictly enforce only making
>>>> the feature enable-able on architectures and configuration option
>>>> combinations that work.
>>>>
>>>> But if there is userspace that will be broken, we really have to have some
>>>> way of avoiding the disconnect between somebody making policy decision at
>>>> the kernel level and somebody trying to run something.
>>>>
>>>> Because I can easily envision somebody enabling this as a 'good security
>>>> feature' for a distro release or such, only for somebody else to later try
>>>> rr, CRIU, or whatever else and for it to just not work or fail subtly and
>>>> to have no idea why.
>>>
>>> Ok so I went looking around for the glibc patchset. It seems they're
>>> moving away from tunables and there was a nice
>>> GNU_PROPERTY_MEMORY_SEAL added to binutils.
>>> So my proposal is to parse this property on the binfmt_elf.c side, and
>>> mm would use this to know if we should seal these mappings. This seems
>>> to tackle compatibility problems,
>>> and glibc isn't sealing programs without this program header anyway. Thoughts?
>>
>> It seems to me that doing this ties it to the binary, rather than
>> execution context, which may want to seal/not-seal, etc. I have a sense
>> that it's be better as a secure bit, or prctl, or something like that. The
>> properties seem to be better suited for "this binary _can_ do a thing"
>> or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
>> maybe there's more to this I'm not considering?
> 
> Doesn't this exactly kind of Just Work though? "This binary can
> do/tolerate sealing". I would blindly guess that we don't have very
> opinionated shared libraries that do this kind of shenanigans
> unilaterally, so that's probably not something we really need to worry
> about (though I admittedly need to read through the glibc patchset,
> and nail down what they're thinking about doing with linking
> mseal-ready and mseal-non-ready ELF execs/shared objects together).
> The problem with something like prctl is that we either indirectly
> provide some kind of limited form of munseal, or we require some sort
> of handover (like personality(2) + execve(2)), which both sound like a
> huge PITA and still don't solve any of our backwards compat issues...
> all binaries would need to be patched with this
> prctl/personality()/whatever call, and old ones wouldn't work.
> 
> The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
> the toolchain folks could shed some light?), but it sounds like it'd
> fit perfectly.
> I suspect we probably want to parse this on the kernel's side anyway
> (to seal the main program/interp's segments)[1], then extending them
> to the kernel system mappings should be somewhat trivial...
> I don't think we'll ever get a program that can't cope with sealing
> the system mappings but can cope with sealing itself (and if we do, we
> just won't seal the entire thing and that's _okay_).
> 
> Deploying mseal-ready programs could then be done in a phased way by
> distros. e.g chromeOS and android could simply enable the
> corresponding linker option in LDFLAGS and let it rip. Other more
> mainstream distros could obviously take a little longer or test/deploy
> this on all programs not named gVisor and/or after CRIU is okay with
> all of this. We then might not need a user-configurable CONFIG_ (only
> an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
> everyone is happy.
> 
> I glanced through libc-alpha again and it seems like the glibc folks
> also seem to have reached the same idea, but I'd love to hear from
> Adhemerval.
> 
> Am I missing anything?

Hi Pedro,

After discussing with CRIU developers, I plan to change how glibc will
handle GNU_PROPERTY_MEMORY_SEAL to allow a more smooth enablement in
distros (the latest version is at [1]).

The idea is only enable memory sealing iff the main binary (either 
ET_EXEC or PIE) has the new sealing attribute.  If it were the case, 
the loader will still check if the dependency has the attribute and 
seal accordingly.

So if for any reason the binary does not support memory sealing at all,
like CRIU for the snapshot restore phase, it just need to be built
with -Wl,-z,nomemory-seal.  It is also for the case for libraries,
although I think this will be rare.

I will also work on adding a way to enable partially non-sealing
sections, since Florian has hinted that he wants to use on some libgcc
metadata (similar to how RELRO '.data.rel.ro' works).  My initial plan
is just mimic what OpenBSD does with PT_OPENBSD_MUTABLE, which only
works for ET_DYN (and I think it should not be a problem). But it is
a userland problem, so no kernel support would be required.

> 
> 
> [1] we should probably nail this responsibility handover down before
> glibc msealing (or bionic) makes it to a release. It'd probably be a
> little nicer if we could mseal these segments from the kernel instead
> of forcing the libc to take care of this, now that we have this
> property.
> 

Keep in mind that GNU_PROPERTY_MEMORY_SEAL is currently a glibc extension 
and I am not sure if other runtime would be willing to adopt as well.  
So its presence can not be implied that sealing will be applied by runtime.

[1] https://patchwork.sourceware.org/project/glibc/list/?series=43524
enh Feb. 19, 2025, 5:17 p.m. UTC | #15
On Tue, Feb 18, 2025 at 6:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
> >
> > On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> > > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > (sorry I really am struggling to reply to mail as lore still seems to be
> > > > broken).
> > > >
> > > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > > > >
> > > > > > > The commit message in the first patch contains the full description of
> > > > > > > this series.
> > > > > >
> > > > [...]
> > > > >
> > > > > FWIW, although it would (at the moment) be hard to pull off in the
> > > > > libc, I still much prefer it to playing these weird games with CONFIG
> > > > > options and kernel command line options and prctl and personality and
> > > > > whatnot. It seems to me like we're trying to stick policy where it
> > > > > doesn't belong.
> > > >
> > > > The problem is, as a security feature, you don't want to make it trivially
> > > > easy to disable.
> > > >
> > > > I mean we _need_ a config option to be able to strictly enforce only making
> > > > the feature enable-able on architectures and configuration option
> > > > combinations that work.
> > > >
> > > > But if there is userspace that will be broken, we really have to have some
> > > > way of avoiding the disconnect between somebody making policy decision at
> > > > the kernel level and somebody trying to run something.
> > > >
> > > > Because I can easily envision somebody enabling this as a 'good security
> > > > feature' for a distro release or such, only for somebody else to later try
> > > > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > > > to have no idea why.
> > >
> > > Ok so I went looking around for the glibc patchset. It seems they're
> > > moving away from tunables and there was a nice
> > > GNU_PROPERTY_MEMORY_SEAL added to binutils.
> > > So my proposal is to parse this property on the binfmt_elf.c side, and
> > > mm would use this to know if we should seal these mappings. This seems
> > > to tackle compatibility problems,
> > > and glibc isn't sealing programs without this program header anyway. Thoughts?
> >
> > It seems to me that doing this ties it to the binary, rather than
> > execution context, which may want to seal/not-seal, etc. I have a sense
> > that it's be better as a secure bit, or prctl, or something like that. The
> > properties seem to be better suited for "this binary _can_ do a thing"
> > or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
> > maybe there's more to this I'm not considering?
>
> Doesn't this exactly kind of Just Work though? "This binary can
> do/tolerate sealing". I would blindly guess that we don't have very
> opinionated shared libraries that do this kind of shenanigans
> unilaterally, so that's probably not something we really need to worry
> about (though I admittedly need to read through the glibc patchset,
> and nail down what they're thinking about doing with linking
> mseal-ready and mseal-non-ready ELF execs/shared objects together).
> The problem with something like prctl is that we either indirectly
> provide some kind of limited form of munseal, or we require some sort
> of handover (like personality(2) + execve(2)), which both sound like a
> huge PITA and still don't solve any of our backwards compat issues...
> all binaries would need to be patched with this
> prctl/personality()/whatever call, and old ones wouldn't work.
>
> The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
> the toolchain folks could shed some light?), but it sounds like it'd
> fit perfectly.
> I suspect we probably want to parse this on the kernel's side anyway
> (to seal the main program/interp's segments)[1], then extending them
> to the kernel system mappings should be somewhat trivial...
> I don't think we'll ever get a program that can't cope with sealing
> the system mappings but can cope with sealing itself (and if we do, we
> just won't seal the entire thing and that's _okay_).
>
> Deploying mseal-ready programs could then be done in a phased way by
> distros. e.g chromeOS and android could simply enable the
> corresponding linker option in LDFLAGS and let it rip. Other more
> mainstream distros could obviously take a little longer or test/deploy
> this on all programs not named gVisor and/or after CRIU is okay with
> all of this. We then might not need a user-configurable CONFIG_ (only
> an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
> everyone is happy.
>
> I glanced through libc-alpha again and it seems like the glibc folks
> also seem to have reached the same idea, but I'd love to hear from
> Adhemerval.
>
> Am I missing anything?

Android's a bit interesting because there isn't really a "binary" in
the usual sense. an Android app is basically a shared library of JNI
code dlopen()ed into a clone() of an already-initialized Java runtime
(the "zygote").

that said, i'm not expecting sealing the vdso to be problematic (even
if i'm not sure how useful it is to do so, being unaware of any
exploit that's ever used this?).

for me the tricky part is when it's used for regular user shared
libraries which seems the most convincing use case to me, albeit
mainly for app compat reasons ["stopping apps from poking at
implementation details they shouldn't, which later causes breakage if
the implementation detail changes"]. the irony being that it'll
_cause_ app compat problems by breaking all such things all at once.
so that'll be "fun" for someone to try to roll out!

it also breaks dlclose() (unless your dlopen() was with RTLD_NODELETE,
which is not the default on Android either). that's fine for OS
libraries that have already been loaded by the zygote, but hard to
make a default. that said, i do expect games and banking apps to be
interested to try this when they hear about it.

anyway, in general i think an ELF property per-library sounds
reasonable, matching what we have for arm64 BTI. i have no strong
opinion on the system mappings and what if anything executables should
do, for the reasons given above, but the elf property sounds
reasonable.

> [1] we should probably nail this responsibility handover down before
> glibc msealing (or bionic) makes it to a release. It'd probably be a
> little nicer if we could mseal these segments from the kernel instead
> of forcing the libc to take care of this, now that we have this
> property.
>
> --
> Pedro