diff mbox series

[v11,bpf-next,02/17] bpf: add BPF token delegation mount options to BPF FS

Message ID 20231127190409.2344550-3-andrii@kernel.org (mailing list archive)
State Superseded
Headers show
Series BPF token and BPF FS-based delegation | expand

Commit Message

Andrii Nakryiko Nov. 27, 2023, 7:03 p.m. UTC
Add few new mount options to BPF FS that allow to specify that a given
BPF FS instance allows creation of BPF token (added in the next patch),
and what sort of operations are allowed under BPF token. As such, we get
4 new mount options, each is a bit mask
  - `delegate_cmds` allow to specify which bpf() syscall commands are
    allowed with BPF token derived from this BPF FS instance;
  - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
    a set of allowable BPF map types that could be created with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
    a set of allowable BPF program types that could be loaded with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
    a set of allowable BPF program attach types that could be loaded with
    BPF token; delegate_progs and delegate_attachs are meant to be used
    together, as full BPF program type is, in general, determined
    through both program type and program attach type.

Currently, these mount options accept the following forms of values:
  - a special value "any", that enables all possible values of a given
  bit set;
  - numeric value (decimal or hexadecimal, determined by kernel
  automatically) that specifies a bit mask value directly;
  - all the values for a given mount option are combined, if specified
  multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
  delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
  mask.

Ideally, more convenient (for humans) symbolic form derived from
corresponding UAPI enums would be accepted (e.g., `-o
delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
it requires a bunch of UAPI header churn, so I postponed it until this
feature lands upstream or at least there is a definite consensus that
this feature is acceptable and is going to make it, just to minimize
amount of wasted effort and not increase amount of non-essential code to
be reviewed.

Attentive reader will notice that BPF FS is now marked as
FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
user namespace as long as the process has sufficient *namespaced*
capabilities within that user namespace. But in reality we still
restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
to allow creating BPF FS context object (i.e., fsopen("bpf")) from
inside unprivileged process inside non-init userns, to capture that
userns as the owning userns. It will still be required to pass this
context object back to privileged process to instantiate and mount it.

This manipulation is important, because capturing non-init userns as the
owning userns of BPF FS instance (super block) allows to use that userns
to constraint BPF token to that userns later on (see next patch). So
creating BPF FS with delegation inside unprivileged userns will restrict
derived BPF token objects to only "work" inside that intended userns,
making it scoped to a intended "container". Also, setting these
delegation options requires capable(CAP_SYS_ADMIN), so unprivileged
process cannot set this up without involvement of a privileged process.

There is a set of selftests at the end of the patch set that simulates
this sequence of steps and validates that everything works as intended.
But careful review is requested to make sure there are no missed gaps in
the implementation and testing.

This somewhat subtle set of aspects is the result of previous
discussions ([0]) about various user namespace implications and
interactions with BPF token functionality and is necessary to contain
BPF token inside intended user namespace.

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h | 10 +++++
 kernel/bpf/inode.c  | 91 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 91 insertions(+), 10 deletions(-)

Comments

Christian Brauner Nov. 30, 2023, 2:18 p.m. UTC | #1
On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
> Add few new mount options to BPF FS that allow to specify that a given
> BPF FS instance allows creation of BPF token (added in the next patch),
> and what sort of operations are allowed under BPF token. As such, we get
> 4 new mount options, each is a bit mask
>   - `delegate_cmds` allow to specify which bpf() syscall commands are
>     allowed with BPF token derived from this BPF FS instance;
>   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
>     a set of allowable BPF map types that could be created with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
>     a set of allowable BPF program types that could be loaded with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
>     a set of allowable BPF program attach types that could be loaded with
>     BPF token; delegate_progs and delegate_attachs are meant to be used
>     together, as full BPF program type is, in general, determined
>     through both program type and program attach type.
> 
> Currently, these mount options accept the following forms of values:
>   - a special value "any", that enables all possible values of a given
>   bit set;
>   - numeric value (decimal or hexadecimal, determined by kernel
>   automatically) that specifies a bit mask value directly;
>   - all the values for a given mount option are combined, if specified
>   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
>   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
>   mask.
> 
> Ideally, more convenient (for humans) symbolic form derived from
> corresponding UAPI enums would be accepted (e.g., `-o
> delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> it requires a bunch of UAPI header churn, so I postponed it until this
> feature lands upstream or at least there is a definite consensus that
> this feature is acceptable and is going to make it, just to minimize
> amount of wasted effort and not increase amount of non-essential code to
> be reviewed.
> 
> Attentive reader will notice that BPF FS is now marked as
> FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> user namespace as long as the process has sufficient *namespaced*
> capabilities within that user namespace. But in reality we still
> restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> inside unprivileged process inside non-init userns, to capture that
> userns as the owning userns. It will still be required to pass this
> context object back to privileged process to instantiate and mount it.
> 
> This manipulation is important, because capturing non-init userns as the
> owning userns of BPF FS instance (super block) allows to use that userns
> to constraint BPF token to that userns later on (see next patch). So
> creating BPF FS with delegation inside unprivileged userns will restrict
> derived BPF token objects to only "work" inside that intended userns,
> making it scoped to a intended "container". Also, setting these
> delegation options requires capable(CAP_SYS_ADMIN), so unprivileged
> process cannot set this up without involvement of a privileged process.
> 
> There is a set of selftests at the end of the patch set that simulates
> this sequence of steps and validates that everything works as intended.
> But careful review is requested to make sure there are no missed gaps in
> the implementation and testing.
> 
> This somewhat subtle set of aspects is the result of previous
> discussions ([0]) about various user namespace implications and
> interactions with BPF token functionality and is necessary to contain
> BPF token inside intended user namespace.
> 
>   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

I still think this is a little weird because this isn't really
unprivileged bpf and it isn't really safe bpf as well.

All this does is allow an administrator to punch a big fat hole into an
unprivileged container so workloads get to play with their favorite toy.

I think that having a way to have signed bpf programs in addition to
this would be much more interesting to generic workloads that don't know
who or what they can trust.

And there's a few things to remember:

* This absolutely isn't a safety mechanism.
* This absolutely isn't safe to enable generically in containers.
* This is a workaround and not a solution to unprivileged bpf.

And this is an ACK solely on the code of this patch, not the concept.
Acked-by: Christian Brauner <brauner@kernel.org> (reluctantly)
Simon Horman Nov. 30, 2023, 4:36 p.m. UTC | #2
On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:

...

> @@ -764,7 +817,10 @@ static int bpf_get_tree(struct fs_context *fc)
>  
>  static void bpf_free_fc(struct fs_context *fc)
>  {
> -	kfree(fc->fs_private);
> +	struct bpf_mount_opts *opts = fc->s_fs_info;
> +
> +	if (opts)
> +		kfree(opts);
>  }

Hi Andrii,

as it looks like there will be a v12, I have a minor nit to report: There
is no need to check if opts is non-NULL because kfree() is basically a
no-op if it's argument is NULL.

So perhaps this can become (completely untested!):

static void bpf_free_fc(struct fs_context *fc)
{
	kfree(fc->s_fs_info);
}

...
Andrii Nakryiko Nov. 30, 2023, 6:02 p.m. UTC | #3
On Thu, Nov 30, 2023 at 6:18 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
> > Add few new mount options to BPF FS that allow to specify that a given
> > BPF FS instance allows creation of BPF token (added in the next patch),
> > and what sort of operations are allowed under BPF token. As such, we get
> > 4 new mount options, each is a bit mask
> >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> >     allowed with BPF token derived from this BPF FS instance;
> >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> >     a set of allowable BPF map types that could be created with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> >     a set of allowable BPF program types that could be loaded with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> >     a set of allowable BPF program attach types that could be loaded with
> >     BPF token; delegate_progs and delegate_attachs are meant to be used
> >     together, as full BPF program type is, in general, determined
> >     through both program type and program attach type.
> >
> > Currently, these mount options accept the following forms of values:
> >   - a special value "any", that enables all possible values of a given
> >   bit set;
> >   - numeric value (decimal or hexadecimal, determined by kernel
> >   automatically) that specifies a bit mask value directly;
> >   - all the values for a given mount option are combined, if specified
> >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> >   mask.
> >
> > Ideally, more convenient (for humans) symbolic form derived from
> > corresponding UAPI enums would be accepted (e.g., `-o
> > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > it requires a bunch of UAPI header churn, so I postponed it until this
> > feature lands upstream or at least there is a definite consensus that
> > this feature is acceptable and is going to make it, just to minimize
> > amount of wasted effort and not increase amount of non-essential code to
> > be reviewed.
> >
> > Attentive reader will notice that BPF FS is now marked as
> > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > user namespace as long as the process has sufficient *namespaced*
> > capabilities within that user namespace. But in reality we still
> > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > inside unprivileged process inside non-init userns, to capture that
> > userns as the owning userns. It will still be required to pass this
> > context object back to privileged process to instantiate and mount it.
> >
> > This manipulation is important, because capturing non-init userns as the
> > owning userns of BPF FS instance (super block) allows to use that userns
> > to constraint BPF token to that userns later on (see next patch). So
> > creating BPF FS with delegation inside unprivileged userns will restrict
> > derived BPF token objects to only "work" inside that intended userns,
> > making it scoped to a intended "container". Also, setting these
> > delegation options requires capable(CAP_SYS_ADMIN), so unprivileged
> > process cannot set this up without involvement of a privileged process.
> >
> > There is a set of selftests at the end of the patch set that simulates
> > this sequence of steps and validates that everything works as intended.
> > But careful review is requested to make sure there are no missed gaps in
> > the implementation and testing.
> >
> > This somewhat subtle set of aspects is the result of previous
> > discussions ([0]) about various user namespace implications and
> > interactions with BPF token functionality and is necessary to contain
> > BPF token inside intended user namespace.
> >
> >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
>
> I still think this is a little weird because this isn't really
> unprivileged bpf and it isn't really safe bpf as well.
>
> All this does is allow an administrator to punch a big fat hole into an
> unprivileged container so workloads get to play with their favorite toy.
>
> I think that having a way to have signed bpf programs in addition to
> this would be much more interesting to generic workloads that don't know
> who or what they can trust.

Absolutely, that is the trust part of the puzzle. Song's LSM+fsverity
patch set is a step in that direction.

>
> And there's a few things to remember:
>
> * This absolutely isn't a safety mechanism.
> * This absolutely isn't safe to enable generically in containers.

100% agreed, this should be used only with trusted workloads (however
the trust is established in any particular setup: signing, fsverity,
whatnot)

> * This is a workaround and not a solution to unprivileged bpf.
>
> And this is an ACK solely on the code of this patch, not the concept.
> Acked-by: Christian Brauner <brauner@kernel.org> (reluctantly)

Thank you anyways! I appreciate your efforts to make sure this doesn't
create unintended security holes.
Andrii Nakryiko Nov. 30, 2023, 6:03 p.m. UTC | #4
On Thu, Nov 30, 2023 at 8:37 AM Simon Horman <horms@kernel.org> wrote:
>
> On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
>
> ...
>
> > @@ -764,7 +817,10 @@ static int bpf_get_tree(struct fs_context *fc)
> >
> >  static void bpf_free_fc(struct fs_context *fc)
> >  {
> > -     kfree(fc->fs_private);
> > +     struct bpf_mount_opts *opts = fc->s_fs_info;
> > +
> > +     if (opts)
> > +             kfree(opts);
> >  }
>
> Hi Andrii,
>
> as it looks like there will be a v12, I have a minor nit to report: There
> is no need to check if opts is non-NULL because kfree() is basically a
> no-op if it's argument is NULL.
>
> So perhaps this can become (completely untested!):
>
> static void bpf_free_fc(struct fs_context *fc)
> {
>         kfree(fc->s_fs_info);
> }
>

sure, I can drop the check, I wasn't sure if it's canonical or not to
check the argument for NULL before calling kfree(). For user-space
it's definitely quite expected to not have to check for null before
calling free().


> ...
Andrii Nakryiko Nov. 30, 2023, 6:13 p.m. UTC | #5
On Thu, Nov 30, 2023 at 10:03 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 30, 2023 at 8:37 AM Simon Horman <horms@kernel.org> wrote:
> >
> > On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
> >
> > ...
> >
> > > @@ -764,7 +817,10 @@ static int bpf_get_tree(struct fs_context *fc)
> > >
> > >  static void bpf_free_fc(struct fs_context *fc)
> > >  {
> > > -     kfree(fc->fs_private);
> > > +     struct bpf_mount_opts *opts = fc->s_fs_info;
> > > +
> > > +     if (opts)
> > > +             kfree(opts);
> > >  }
> >
> > Hi Andrii,
> >
> > as it looks like there will be a v12, I have a minor nit to report: There
> > is no need to check if opts is non-NULL because kfree() is basically a
> > no-op if it's argument is NULL.
> >
> > So perhaps this can become (completely untested!):
> >
> > static void bpf_free_fc(struct fs_context *fc)
> > {
> >         kfree(fc->s_fs_info);
> > }
> >
>
> sure, I can drop the check, I wasn't sure if it's canonical or not to
> check the argument for NULL before calling kfree(). For user-space
> it's definitely quite expected to not have to check for null before
> calling free().

Heh, turns out I already simplified this, but it's in the next patch.
I'll move it into patch #2, though, where it actually belongs.

>
>
> > ...
Simon Horman Dec. 5, 2023, 9:13 a.m. UTC | #6
On Thu, Nov 30, 2023 at 10:13:41AM -0800, Andrii Nakryiko wrote:
> On Thu, Nov 30, 2023 at 10:03 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Nov 30, 2023 at 8:37 AM Simon Horman <horms@kernel.org> wrote:
> > >
> > > On Mon, Nov 27, 2023 at 11:03:54AM -0800, Andrii Nakryiko wrote:
> > >
> > > ...
> > >
> > > > @@ -764,7 +817,10 @@ static int bpf_get_tree(struct fs_context *fc)
> > > >
> > > >  static void bpf_free_fc(struct fs_context *fc)
> > > >  {
> > > > -     kfree(fc->fs_private);
> > > > +     struct bpf_mount_opts *opts = fc->s_fs_info;
> > > > +
> > > > +     if (opts)
> > > > +             kfree(opts);
> > > >  }
> > >
> > > Hi Andrii,
> > >
> > > as it looks like there will be a v12, I have a minor nit to report: There
> > > is no need to check if opts is non-NULL because kfree() is basically a
> > > no-op if it's argument is NULL.
> > >
> > > So perhaps this can become (completely untested!):
> > >
> > > static void bpf_free_fc(struct fs_context *fc)
> > > {
> > >         kfree(fc->s_fs_info);
> > > }
> > >
> >
> > sure, I can drop the check, I wasn't sure if it's canonical or not to
> > check the argument for NULL before calling kfree(). For user-space
> > it's definitely quite expected to not have to check for null before
> > calling free().
> 
> Heh, turns out I already simplified this, but it's in the next patch.
> I'll move it into patch #2, though, where it actually belongs.

Thanks. I do believe that for kernel code not checking for NULL here
is preferred.
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eb447b0a9423..ec8e5ca75168 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1570,6 +1570,16 @@  struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_mount_opts {
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 1aafb2ff2e95..53313a95fdc6 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -20,6 +20,7 @@ 
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/kstrtox.h>
 #include "preload/bpf_preload.h"
 
 enum bpf_type {
@@ -599,10 +600,31 @@  EXPORT_SYMBOL(bpf_prog_get_type_path);
  */
 static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
+	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
+
+	if (opts->delegate_cmds == ~0ULL)
+		seq_printf(m, ",delegate_cmds=any");
+	else if (opts->delegate_cmds)
+		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
+
+	if (opts->delegate_maps == ~0ULL)
+		seq_printf(m, ",delegate_maps=any");
+	else if (opts->delegate_maps)
+		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
+
+	if (opts->delegate_progs == ~0ULL)
+		seq_printf(m, ",delegate_progs=any");
+	else if (opts->delegate_progs)
+		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
+
+	if (opts->delegate_attachs == ~0ULL)
+		seq_printf(m, ",delegate_attachs=any");
+	else if (opts->delegate_attachs)
+		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
 	return 0;
 }
 
@@ -626,22 +648,27 @@  static const struct super_operations bpf_super_ops = {
 
 enum {
 	OPT_MODE,
+	OPT_DELEGATE_CMDS,
+	OPT_DELEGATE_MAPS,
+	OPT_DELEGATE_PROGS,
+	OPT_DELEGATE_ATTACHS,
 };
 
 static const struct fs_parameter_spec bpf_fs_parameters[] = {
 	fsparam_u32oct	("mode",			OPT_MODE),
+	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
+	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
+	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
+	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
 	{}
 };
 
-struct bpf_mount_opts {
-	umode_t mode;
-};
-
 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = fc->s_fs_info;
 	struct fs_parse_result result;
-	int opt;
+	int opt, err;
+	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -665,6 +692,28 @@  static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_MODE:
 		opts->mode = result.uint_32 & S_IALLUGO;
 		break;
+	case OPT_DELEGATE_CMDS:
+	case OPT_DELEGATE_MAPS:
+	case OPT_DELEGATE_PROGS:
+	case OPT_DELEGATE_ATTACHS:
+		if (strcmp(param->string, "any") == 0) {
+			msk = ~0ULL;
+		} else {
+			err = kstrtou64(param->string, 0, &msk);
+			if (err)
+				return err;
+		}
+		/* Setting delegation mount options requires privileges */
+		if (msk && !capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		switch (opt) {
+		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
+		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
+		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
+		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
+		default: return -EINVAL;
+		}
+		break;
 	}
 
 	return 0;
@@ -739,10 +788,14 @@  static int populate_bpffs(struct dentry *parent)
 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = sb->s_fs_info;
 	struct inode *inode;
 	int ret;
 
+	/* Mounting an instance of BPF FS requires privileges */
+	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -764,7 +817,10 @@  static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	kfree(fc->fs_private);
+	struct bpf_mount_opts *opts = fc->s_fs_info;
+
+	if (opts)
+		kfree(opts);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
@@ -786,17 +842,32 @@  static int bpf_init_fs_context(struct fs_context *fc)
 
 	opts->mode = S_IRWXUGO;
 
-	fc->fs_private = opts;
+	/* start out with no BPF token delegation enabled */
+	opts->delegate_cmds = 0;
+	opts->delegate_maps = 0;
+	opts->delegate_progs = 0;
+	opts->delegate_attachs = 0;
+
+	fc->s_fs_info = opts;
 	fc->ops = &bpf_context_ops;
 	return 0;
 }
 
+static void bpf_kill_super(struct super_block *sb)
+{
+	struct bpf_mount_opts *opts = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	kfree(opts);
+}
+
 static struct file_system_type bpf_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "bpf",
 	.init_fs_context = bpf_init_fs_context,
 	.parameters	= bpf_fs_parameters,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= bpf_kill_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 static int __init bpf_init(void)