diff mbox series

[bpf-next,v4,1/3] bpf: Add skb dynptrs

Message ID 20220822235649.2218031-2-joannelkoong@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Add skb + xdp dynptrs | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 105798 this patch: 105798
netdev/cc_maintainers warning 12 maintainers not CCed: john.fastabend@gmail.com song@kernel.org sdf@google.com hawk@kernel.org martin.lau@linux.dev davem@davemloft.net edumazet@google.com kpsingh@kernel.org jolsa@kernel.org pabeni@redhat.com haoluo@google.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 173 this patch: 173
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 107672 this patch: 107672
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for Kernel LATEST on z15 with gcc

Commit Message

Joanne Koong Aug. 22, 2022, 11:56 p.m. UTC
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is
read-only. For reads and writes through the bpf_dynptr_read() and
bpf_dynptr_write() interfaces, this supports reading and writing into
data in the non-linear paged buffers. For data slices (through the
bpf_dynptr_data() interface), if the data is in a paged buffer, the user
must first call bpf_skb_pull_data() to pull the data into the linear
portion. The returned data slice from a call to bpf_dynptr_data() is of
reg type PTR_TO_PACKET | PTR_MAYBE_NULL.

Any bpf_dynptr_write() automatically invalidates any prior data slices
to the skb dynptr. This is because a bpf_dynptr_write() may be writing
to data in a paged buffer, so it will need to pull the buffer first into
the head. The reason it needs to be pulled instead of writing directly to
the paged buffers is because they may be cloned (only the head of the skb
is by default uncloned). As such, any bpf_dynptr_write() will
automatically have its prior data slices invalidated, even if the write
is to data in the skb head (the verifier has no way of differentiating
whether the write is to the head or paged buffers during program load
time). Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well. Whenever such a helper call is made,
the verifier marks any PTR_TO_PACKET reg type (which includes skb dynptr
slices since they are PTR_TO_PACKETs) as unknown. The stack trace for
this is check_helper_call() -> clear_all_pkt_pointers() ->
__clear_all_pkt_pointers() -> mark_reg_unknown()

For examples of how skb dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 83 +++++++++++++++++-----------
 include/linux/filter.h         |  4 ++
 include/uapi/linux/bpf.h       | 40 ++++++++++++--
 kernel/bpf/helpers.c           | 81 +++++++++++++++++++++++++---
 kernel/bpf/verifier.c          | 99 ++++++++++++++++++++++++++++------
 net/core/filter.c              | 53 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 40 ++++++++++++--
 7 files changed, 335 insertions(+), 65 deletions(-)

Comments

kernel test robot Aug. 23, 2022, 11:22 p.m. UTC | #1
Hi Joanne,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm-buildonly-randconfig-r002-20220823 (https://download.01.org/0day-ci/archive/20220824/202208240728.59W00MTW-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a2c8a74d8f0b7fd0b0008dc9bc5ccf9887317f36
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022
        git checkout a2c8a74d8f0b7fd0b0008dc9bc5ccf9887317f36
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: kernel/bpf/helpers.o: in function `bpf_dynptr_read':
>> helpers.c:(.text+0xb1c): undefined reference to `__bpf_skb_load_bytes'
   arm-linux-gnueabi-ld: kernel/bpf/helpers.o: in function `bpf_dynptr_write':
>> helpers.c:(.text+0xc30): undefined reference to `__bpf_skb_store_bytes'
kernel test robot Aug. 23, 2022, 11:53 p.m. UTC | #2
Hi Joanne,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: csky-randconfig-r022-20220823 (https://download.01.org/0day-ci/archive/20220824/202208240751.BRPS1SoF-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a2c8a74d8f0b7fd0b0008dc9bc5ccf9887317f36
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022
        git checkout a2c8a74d8f0b7fd0b0008dc9bc5ccf9887317f36
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=csky SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   csky-linux-ld: kernel/bpf/helpers.o: in function `____bpf_dynptr_read':
>> kernel/bpf/helpers.c:1543: undefined reference to `__bpf_skb_load_bytes'
   csky-linux-ld: kernel/bpf/helpers.o: in function `bpf_dynptr_read':
   kernel/bpf/helpers.c:1522: undefined reference to `__bpf_skb_load_bytes'
   csky-linux-ld: kernel/bpf/helpers.o: in function `____bpf_dynptr_write':
>> kernel/bpf/helpers.c:1584: undefined reference to `__bpf_skb_store_bytes'
   csky-linux-ld: kernel/bpf/helpers.o: in function `bpf_dynptr_write':
   kernel/bpf/helpers.c:1561: undefined reference to `__bpf_skb_store_bytes'


vim +1543 kernel/bpf/helpers.c

  1521	
  1522	BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src,
  1523		   u32, offset, u64, flags)
  1524	{
  1525		enum bpf_dynptr_type type;
  1526		int err;
  1527	
  1528		if (!src->data || flags)
  1529			return -EINVAL;
  1530	
  1531		err = bpf_dynptr_check_off_len(src, offset, len);
  1532		if (err)
  1533			return err;
  1534	
  1535		type = bpf_dynptr_get_type(src);
  1536	
  1537		switch (type) {
  1538		case BPF_DYNPTR_TYPE_LOCAL:
  1539		case BPF_DYNPTR_TYPE_RINGBUF:
  1540			memcpy(dst, src->data + src->offset + offset, len);
  1541			return 0;
  1542		case BPF_DYNPTR_TYPE_SKB:
> 1543			return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
  1544		default:
  1545			WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
  1546			return -EFAULT;
  1547		}
  1548	}
  1549	
  1550	static const struct bpf_func_proto bpf_dynptr_read_proto = {
  1551		.func		= bpf_dynptr_read,
  1552		.gpl_only	= false,
  1553		.ret_type	= RET_INTEGER,
  1554		.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
  1555		.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
  1556		.arg3_type	= ARG_PTR_TO_DYNPTR,
  1557		.arg4_type	= ARG_ANYTHING,
  1558		.arg5_type	= ARG_ANYTHING,
  1559	};
  1560	
  1561	BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
  1562		   u32, len, u64, flags)
  1563	{
  1564		enum bpf_dynptr_type type;
  1565		int err;
  1566	
  1567		if (!dst->data || bpf_dynptr_is_rdonly(dst))
  1568			return -EINVAL;
  1569	
  1570		err = bpf_dynptr_check_off_len(dst, offset, len);
  1571		if (err)
  1572			return err;
  1573	
  1574		type = bpf_dynptr_get_type(dst);
  1575	
  1576		switch (type) {
  1577		case BPF_DYNPTR_TYPE_LOCAL:
  1578		case BPF_DYNPTR_TYPE_RINGBUF:
  1579			if (flags)
  1580				return -EINVAL;
  1581			memcpy(dst->data + dst->offset + offset, src, len);
  1582			return 0;
  1583		case BPF_DYNPTR_TYPE_SKB:
> 1584			return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
  1585						     flags);
  1586		default:
  1587			WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
  1588			return -EFAULT;
  1589		}
  1590	}
  1591
Andrii Nakryiko Aug. 24, 2022, 6:27 p.m. UTC | #3
On Mon, Aug 22, 2022 at 4:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Add skb dynptrs, which are dynptrs whose underlying pointer points
> to a skb. The dynptr acts on skb data. skb dynptrs have two main
> benefits. One is that they allow operations on sizes that are not
> statically known at compile-time (eg variable-sized accesses).
> Another is that parsing the packet data through dynptrs (instead of
> through direct access of skb->data and skb->data_end) can be more
> ergonomic and less brittle (eg does not need manual if checking for
> being within bounds of data_end).
>
> For bpf prog types that don't support writes on skb data, the dynptr is
> read-only. For reads and writes through the bpf_dynptr_read() and
> bpf_dynptr_write() interfaces, this supports reading and writing into
> data in the non-linear paged buffers. For data slices (through the
> bpf_dynptr_data() interface), if the data is in a paged buffer, the user
> must first call bpf_skb_pull_data() to pull the data into the linear
> portion. The returned data slice from a call to bpf_dynptr_data() is of
> reg type PTR_TO_PACKET | PTR_MAYBE_NULL.
>
> Any bpf_dynptr_write() automatically invalidates any prior data slices
> to the skb dynptr. This is because a bpf_dynptr_write() may be writing
> to data in a paged buffer, so it will need to pull the buffer first into
> the head. The reason it needs to be pulled instead of writing directly to
> the paged buffers is because they may be cloned (only the head of the skb
> is by default uncloned). As such, any bpf_dynptr_write() will
> automatically have its prior data slices invalidated, even if the write
> is to data in the skb head (the verifier has no way of differentiating
> whether the write is to the head or paged buffers during program load
> time). Please note as well that any other helper calls that change the
> underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
> slices of the skb dynptr as well. Whenever such a helper call is made,
> the verifier marks any PTR_TO_PACKET reg type (which includes skb dynptr
> slices since they are PTR_TO_PACKETs) as unknown. The stack trace for
> this is check_helper_call() -> clear_all_pkt_pointers() ->
> __clear_all_pkt_pointers() -> mark_reg_unknown()
>
> For examples of how skb dynptrs can be used, please see the attached
> selftests.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            | 83 +++++++++++++++++-----------
>  include/linux/filter.h         |  4 ++
>  include/uapi/linux/bpf.h       | 40 ++++++++++++--
>  kernel/bpf/helpers.c           | 81 +++++++++++++++++++++++++---
>  kernel/bpf/verifier.c          | 99 ++++++++++++++++++++++++++++------
>  net/core/filter.c              | 53 ++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 40 ++++++++++++--
>  7 files changed, 335 insertions(+), 65 deletions(-)
>

[...]

> @@ -1521,9 +1532,19 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src
>         if (err)
>                 return err;
>
> -       memcpy(dst, src->data + src->offset + offset, len);
> +       type = bpf_dynptr_get_type(src);
>
> -       return 0;
> +       switch (type) {
> +       case BPF_DYNPTR_TYPE_LOCAL:
> +       case BPF_DYNPTR_TYPE_RINGBUF:
> +               memcpy(dst, src->data + src->offset + offset, len);
> +               return 0;
> +       case BPF_DYNPTR_TYPE_SKB:
> +               return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
> +       default:
> +               WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);

nit: probably better to WARN_ONCE?

> +               return -EFAULT;
> +       }
>  }
>
>  static const struct bpf_func_proto bpf_dynptr_read_proto = {
> @@ -1540,18 +1561,32 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = {
>  BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
>            u32, len, u64, flags)
>  {
> +       enum bpf_dynptr_type type;
>         int err;
>
> -       if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
> +       if (!dst->data || bpf_dynptr_is_rdonly(dst))
>                 return -EINVAL;
>
>         err = bpf_dynptr_check_off_len(dst, offset, len);
>         if (err)
>                 return err;
>
> -       memcpy(dst->data + dst->offset + offset, src, len);
> +       type = bpf_dynptr_get_type(dst);
>
> -       return 0;
> +       switch (type) {
> +       case BPF_DYNPTR_TYPE_LOCAL:
> +       case BPF_DYNPTR_TYPE_RINGBUF:
> +               if (flags)
> +                       return -EINVAL;
> +               memcpy(dst->data + dst->offset + offset, src, len);
> +               return 0;
> +       case BPF_DYNPTR_TYPE_SKB:
> +               return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
> +                                            flags);
> +       default:
> +               WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);

ditto about WARN_ONCE

> +               return -EFAULT;
> +       }
>  }
>
>  static const struct bpf_func_proto bpf_dynptr_write_proto = {

[...]

> +static enum bpf_dynptr_type stack_slot_get_dynptr_info(struct bpf_verifier_env *env,
> +                                                      struct bpf_reg_state *reg,
> +                                                      int *ref_obj_id)
>  {
>         struct bpf_func_state *state = func(env, reg);
>         int spi = get_spi(reg->off);
>
> -       return state->stack[spi].spilled_ptr.id;
> +       if (ref_obj_id)
> +               *ref_obj_id = state->stack[spi].spilled_ptr.id;
> +
> +       return state->stack[spi].spilled_ptr.dynptr.type;
>  }
>
>  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> @@ -6056,7 +6075,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                         case DYNPTR_TYPE_RINGBUF:
>                                 err_extra = "ringbuf ";
>                                 break;
> -                       default:
> +                       case DYNPTR_TYPE_SKB:
> +                               err_extra = "skb ";
>                                 break;
>                         }
>
> @@ -7149,6 +7169,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  {
>         enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>         const struct bpf_func_proto *fn = NULL;
> +       enum bpf_dynptr_type dynptr_type;

compiler technically can complain about use of uninitialized
dynptr_type, maybe initialize it to BPF_DYNPTR_TYPE_INVALID ?

>         enum bpf_return_type ret_type;
>         enum bpf_type_flag ret_flag;
>         struct bpf_reg_state *regs;
> @@ -7320,24 +7341,43 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                         }
>                 }
>                 break;
> -       case BPF_FUNC_dynptr_data:
> -               for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> -                       if (arg_type_is_dynptr(fn->arg_type[i])) {
> -                               if (meta.ref_obj_id) {
> -                                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> -                                       return -EFAULT;
> -                               }
> -                               /* Find the id of the dynptr we're tracking the reference of */
> -                               meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
> -                               break;
> -                       }
> +       case BPF_FUNC_dynptr_write:
> +       {
> +               struct bpf_reg_state *reg;
> +
> +               reg = get_dynptr_arg_reg(fn, regs);
> +               if (!reg) {
> +                       verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");

s/bpf_dynptr_data/bpf_dynptr_write/

> +                       return -EFAULT;
>                 }
> -               if (i == MAX_BPF_FUNC_REG_ARGS) {
> +
> +               /* bpf_dynptr_write() for skb-type dynptrs may pull the skb, so we must
> +                * invalidate all data slices associated with it
> +                */
> +               if (stack_slot_get_dynptr_info(env, reg, NULL) == BPF_DYNPTR_TYPE_SKB)
> +                       changes_data = true;
> +
> +               break;
> +       }
> +       case BPF_FUNC_dynptr_data:
> +       {
> +               struct bpf_reg_state *reg;
> +
> +               reg = get_dynptr_arg_reg(fn, regs);
> +               if (!reg) {
>                         verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
>                         return -EFAULT;
>                 }
> +
> +               if (meta.ref_obj_id) {
> +                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> +                       return -EFAULT;
> +               }
> +
> +               dynptr_type = stack_slot_get_dynptr_info(env, reg, &meta.ref_obj_id);
>                 break;
>         }
> +       }
>
>         if (err)
>                 return err;
> @@ -7397,8 +7437,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                 break;
>         case RET_PTR_TO_ALLOC_MEM:
>                 mark_reg_known_zero(env, regs, BPF_REG_0);
> -               regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> -               regs[BPF_REG_0].mem_size = meta.mem_size;
> +
> +               if (func_id == BPF_FUNC_dynptr_data &&
> +                   dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> +                       regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag;
> +                       regs[BPF_REG_0].range = meta.mem_size;
> +               } else {
> +                       regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> +                       regs[BPF_REG_0].mem_size = meta.mem_size;
> +               }
>                 break;
>         case RET_PTR_TO_MEM_OR_BTF_ID:
>         {
> @@ -14141,6 +14188,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>                         goto patch_call_imm;
>                 }
>
> +               if (insn->imm == BPF_FUNC_dynptr_from_skb) {
> +                       bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);
> +
> +                       insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, is_rdonly);
> +                       insn_buf[1] = *insn;
> +                       cnt = 2;

This might have been discussed before, sorry if I missed it. But why
this special rewrite to provide read-only flag vs having two
implementations of bpf_dynptr_from_skb() depending on program types?
If program type allows read+write access, return
bpf_dynptr_from_skb_rdwr(), for those that have read-only access -
bpf_dynptr_from_skb_rdonly(), and for others return NULL proto
(disable helper)?

You can then use it very similarly for bpf_dynptr_from_skb_meta().

> +
> +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +                       if (!new_prog)
> +                               return -ENOMEM;
> +
> +                       delta += cnt - 1;
> +                       env->prog = new_prog;
> +                       prog = new_prog;
> +                       insn = new_prog->insnsi + i + delta;
> +                       goto patch_call_imm;
> +               }
> +

[...]

>  BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
>  {
>         return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
> @@ -7726,6 +7763,8 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_socket_uid_proto;
>         case BPF_FUNC_perf_event_output:
>                 return &bpf_skb_event_output_proto;
> +       case BPF_FUNC_dynptr_from_skb:
> +               return &bpf_dynptr_from_skb_proto;
>         default:
>                 return bpf_sk_base_func_proto(func_id);
>         }
> @@ -7909,6 +7948,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_tcp_raw_check_syncookie_ipv6_proto;
>  #endif
>  #endif
> +       case BPF_FUNC_dynptr_from_skb:
> +               return &bpf_dynptr_from_skb_proto;
>         default:
>                 return bpf_sk_base_func_proto(func_id);
>         }
> @@ -8104,6 +8145,8 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>         case BPF_FUNC_skc_lookup_tcp:
>                 return &bpf_skc_lookup_tcp_proto;
>  #endif
> +       case BPF_FUNC_dynptr_from_skb:
> +               return &bpf_dynptr_from_skb_proto;
>         default:
>                 return bpf_sk_base_func_proto(func_id);
>         }
> @@ -8142,6 +8185,8 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_smp_processor_id_proto;
>         case BPF_FUNC_skb_under_cgroup:
>                 return &bpf_skb_under_cgroup_proto;
> +       case BPF_FUNC_dynptr_from_skb:
> +               return &bpf_dynptr_from_skb_proto;

so like here you'd return read-only prototype for dynptr_from_skb
(seems like LWT programs have only read-only access, according to
may_access_direct_pkt_data), but in cases where
may_access_direct_pkt_data() allows read-write access we'd choose rdwr
variant of dynptr_from_skb?

>         default:
>                 return bpf_sk_base_func_proto(func_id);
>         }

[...]
Kumar Kartikeya Dwivedi Aug. 24, 2022, 11:25 p.m. UTC | #4
On Wed, 24 Aug 2022 at 20:42, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Aug 22, 2022 at 4:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Add skb dynptrs, which are dynptrs whose underlying pointer points
> > to a skb. The dynptr acts on skb data. skb dynptrs have two main
> > benefits. One is that they allow operations on sizes that are not
> > statically known at compile-time (eg variable-sized accesses).
> > Another is that parsing the packet data through dynptrs (instead of
> > through direct access of skb->data and skb->data_end) can be more
> > ergonomic and less brittle (eg does not need manual if checking for
> > being within bounds of data_end).
> >
> > For bpf prog types that don't support writes on skb data, the dynptr is
> > read-only. For reads and writes through the bpf_dynptr_read() and
> > bpf_dynptr_write() interfaces, this supports reading and writing into
> > data in the non-linear paged buffers. For data slices (through the
> > bpf_dynptr_data() interface), if the data is in a paged buffer, the user
> > must first call bpf_skb_pull_data() to pull the data into the linear
> > portion. The returned data slice from a call to bpf_dynptr_data() is of
> > reg type PTR_TO_PACKET | PTR_MAYBE_NULL.
> >
> > Any bpf_dynptr_write() automatically invalidates any prior data slices
> > to the skb dynptr. This is because a bpf_dynptr_write() may be writing
> > to data in a paged buffer, so it will need to pull the buffer first into
> > the head. The reason it needs to be pulled instead of writing directly to
> > the paged buffers is because they may be cloned (only the head of the skb
> > is by default uncloned). As such, any bpf_dynptr_write() will
> > automatically have its prior data slices invalidated, even if the write
> > is to data in the skb head (the verifier has no way of differentiating
> > whether the write is to the head or paged buffers during program load
> > time). Please note as well that any other helper calls that change the
> > underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
> > slices of the skb dynptr as well. Whenever such a helper call is made,
> > the verifier marks any PTR_TO_PACKET reg type (which includes skb dynptr
> > slices since they are PTR_TO_PACKETs) as unknown. The stack trace for
> > this is check_helper_call() -> clear_all_pkt_pointers() ->
> > __clear_all_pkt_pointers() -> mark_reg_unknown()
> >
> > For examples of how skb dynptrs can be used, please see the attached
> > selftests.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            | 83 +++++++++++++++++-----------
> >  include/linux/filter.h         |  4 ++
> >  include/uapi/linux/bpf.h       | 40 ++++++++++++--
> >  kernel/bpf/helpers.c           | 81 +++++++++++++++++++++++++---
> >  kernel/bpf/verifier.c          | 99 ++++++++++++++++++++++++++++------
> >  net/core/filter.c              | 53 ++++++++++++++++--
> >  tools/include/uapi/linux/bpf.h | 40 ++++++++++++--
> >  7 files changed, 335 insertions(+), 65 deletions(-)
> >
>
> [...]
>
> > @@ -1521,9 +1532,19 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src
> >         if (err)
> >                 return err;
> >
> > -       memcpy(dst, src->data + src->offset + offset, len);
> > +       type = bpf_dynptr_get_type(src);
> >
> > -       return 0;
> > +       switch (type) {
> > +       case BPF_DYNPTR_TYPE_LOCAL:
> > +       case BPF_DYNPTR_TYPE_RINGBUF:
> > +               memcpy(dst, src->data + src->offset + offset, len);
> > +               return 0;
> > +       case BPF_DYNPTR_TYPE_SKB:
> > +               return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
> > +       default:
> > +               WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
>
> nit: probably better to WARN_ONCE?
>
> > +               return -EFAULT;
> > +       }
> >  }
> >
> >  static const struct bpf_func_proto bpf_dynptr_read_proto = {
> > @@ -1540,18 +1561,32 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = {
> >  BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
> >            u32, len, u64, flags)
> >  {
> > +       enum bpf_dynptr_type type;
> >         int err;
> >
> > -       if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
> > +       if (!dst->data || bpf_dynptr_is_rdonly(dst))
> >                 return -EINVAL;
> >
> >         err = bpf_dynptr_check_off_len(dst, offset, len);
> >         if (err)
> >                 return err;
> >
> > -       memcpy(dst->data + dst->offset + offset, src, len);
> > +       type = bpf_dynptr_get_type(dst);
> >
> > -       return 0;
> > +       switch (type) {
> > +       case BPF_DYNPTR_TYPE_LOCAL:
> > +       case BPF_DYNPTR_TYPE_RINGBUF:
> > +               if (flags)
> > +                       return -EINVAL;
> > +               memcpy(dst->data + dst->offset + offset, src, len);
> > +               return 0;
> > +       case BPF_DYNPTR_TYPE_SKB:
> > +               return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
> > +                                            flags);
> > +       default:
> > +               WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
>
> ditto about WARN_ONCE
>
> > +               return -EFAULT;
> > +       }
> >  }
> >
> >  static const struct bpf_func_proto bpf_dynptr_write_proto = {
>
> [...]
>
> > +static enum bpf_dynptr_type stack_slot_get_dynptr_info(struct bpf_verifier_env *env,
> > +                                                      struct bpf_reg_state *reg,
> > +                                                      int *ref_obj_id)
> >  {
> >         struct bpf_func_state *state = func(env, reg);
> >         int spi = get_spi(reg->off);
> >
> > -       return state->stack[spi].spilled_ptr.id;
> > +       if (ref_obj_id)
> > +               *ref_obj_id = state->stack[spi].spilled_ptr.id;
> > +
> > +       return state->stack[spi].spilled_ptr.dynptr.type;
> >  }
> >
> >  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > @@ -6056,7 +6075,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >                         case DYNPTR_TYPE_RINGBUF:
> >                                 err_extra = "ringbuf ";
> >                                 break;
> > -                       default:
> > +                       case DYNPTR_TYPE_SKB:
> > +                               err_extra = "skb ";
> >                                 break;
> >                         }
> >
> > @@ -7149,6 +7169,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >  {
> >         enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> >         const struct bpf_func_proto *fn = NULL;
> > +       enum bpf_dynptr_type dynptr_type;
>
> compiler technically can complain about use of uninitialized
> dynptr_type, maybe initialize it to BPF_DYNPTR_TYPE_INVALID ?
>
> >         enum bpf_return_type ret_type;
> >         enum bpf_type_flag ret_flag;
> >         struct bpf_reg_state *regs;
> > @@ -7320,24 +7341,43 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                         }
> >                 }
> >                 break;
> > -       case BPF_FUNC_dynptr_data:
> > -               for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > -                       if (arg_type_is_dynptr(fn->arg_type[i])) {
> > -                               if (meta.ref_obj_id) {
> > -                                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> > -                                       return -EFAULT;
> > -                               }
> > -                               /* Find the id of the dynptr we're tracking the reference of */
> > -                               meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
> > -                               break;
> > -                       }
> > +       case BPF_FUNC_dynptr_write:
> > +       {
> > +               struct bpf_reg_state *reg;
> > +
> > +               reg = get_dynptr_arg_reg(fn, regs);
> > +               if (!reg) {
> > +                       verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
>
> s/bpf_dynptr_data/bpf_dynptr_write/
>
> > +                       return -EFAULT;
> >                 }
> > -               if (i == MAX_BPF_FUNC_REG_ARGS) {
> > +
> > +               /* bpf_dynptr_write() for skb-type dynptrs may pull the skb, so we must
> > +                * invalidate all data slices associated with it
> > +                */
> > +               if (stack_slot_get_dynptr_info(env, reg, NULL) == BPF_DYNPTR_TYPE_SKB)
> > +                       changes_data = true;
> > +
> > +               break;
> > +       }
> > +       case BPF_FUNC_dynptr_data:
> > +       {
> > +               struct bpf_reg_state *reg;
> > +
> > +               reg = get_dynptr_arg_reg(fn, regs);
> > +               if (!reg) {
> >                         verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
> >                         return -EFAULT;
> >                 }
> > +
> > +               if (meta.ref_obj_id) {
> > +                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> > +                       return -EFAULT;
> > +               }
> > +
> > +               dynptr_type = stack_slot_get_dynptr_info(env, reg, &meta.ref_obj_id);
> >                 break;
> >         }
> > +       }
> >
> >         if (err)
> >                 return err;
> > @@ -7397,8 +7437,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                 break;
> >         case RET_PTR_TO_ALLOC_MEM:
> >                 mark_reg_known_zero(env, regs, BPF_REG_0);
> > -               regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> > -               regs[BPF_REG_0].mem_size = meta.mem_size;
> > +
> > +               if (func_id == BPF_FUNC_dynptr_data &&
> > +                   dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> > +                       regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag;
> > +                       regs[BPF_REG_0].range = meta.mem_size;
> > +               } else {
> > +                       regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> > +                       regs[BPF_REG_0].mem_size = meta.mem_size;
> > +               }
> >                 break;
> >         case RET_PTR_TO_MEM_OR_BTF_ID:
> >         {
> > @@ -14141,6 +14188,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         goto patch_call_imm;
> >                 }
> >
> > +               if (insn->imm == BPF_FUNC_dynptr_from_skb) {
> > +                       bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);
> > +
> > +                       insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, is_rdonly);
> > +                       insn_buf[1] = *insn;
> > +                       cnt = 2;
>
> This might have been discussed before, sorry if I missed it. But why
> this special rewrite to provide read-only flag vs having two
> implementations of bpf_dynptr_from_skb() depending on program types?
> If program type allows read+write access, return
> bpf_dynptr_from_skb_rdwr(), for those that have read-only access -
> bpf_dynptr_from_skb_rdonly(), and for others return NULL proto
> (disable helper)?
>
> You can then use it very similarly for bpf_dynptr_from_skb_meta().
>

Related question, it seems we know statically if dynptr is read only
or not, so why even do all this hidden parameter passing and instead
just reject writes directly? You only need to be able to set
MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
seems simpler than checking it at runtime. Verifier already handles
MEM_RDONLY generically, you only need to add the guard for
check_packet_acces (and check_helper_mem_access for meta->raw_mode
under pkt case), and rejecting dynptr_write seems like a if condition.
Joanne Koong Aug. 25, 2022, 9:02 p.m. UTC | #5
On Wed, Aug 24, 2022 at 4:26 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, 24 Aug 2022 at 20:42, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Aug 22, 2022 at 4:57 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > Add skb dynptrs, which are dynptrs whose underlying pointer points
> > > to a skb. The dynptr acts on skb data. skb dynptrs have two main
> > > benefits. One is that they allow operations on sizes that are not
> > > statically known at compile-time (eg variable-sized accesses).
> > > Another is that parsing the packet data through dynptrs (instead of
> > > through direct access of skb->data and skb->data_end) can be more
> > > ergonomic and less brittle (eg does not need manual if checking for
> > > being within bounds of data_end).
> > >
> > > For bpf prog types that don't support writes on skb data, the dynptr is
> > > read-only. For reads and writes through the bpf_dynptr_read() and
> > > bpf_dynptr_write() interfaces, this supports reading and writing into
> > > data in the non-linear paged buffers. For data slices (through the
> > > bpf_dynptr_data() interface), if the data is in a paged buffer, the user
> > > must first call bpf_skb_pull_data() to pull the data into the linear
> > > portion. The returned data slice from a call to bpf_dynptr_data() is of
> > > reg type PTR_TO_PACKET | PTR_MAYBE_NULL.
> > >
> > > Any bpf_dynptr_write() automatically invalidates any prior data slices
> > > to the skb dynptr. This is because a bpf_dynptr_write() may be writing
> > > to data in a paged buffer, so it will need to pull the buffer first into
> > > the head. The reason it needs to be pulled instead of writing directly to
> > > the paged buffers is because they may be cloned (only the head of the skb
> > > is by default uncloned). As such, any bpf_dynptr_write() will
> > > automatically have its prior data slices invalidated, even if the write
> > > is to data in the skb head (the verifier has no way of differentiating
> > > whether the write is to the head or paged buffers during program load
> > > time). Please note as well that any other helper calls that change the
> > > underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
> > > slices of the skb dynptr as well. Whenever such a helper call is made,
> > > the verifier marks any PTR_TO_PACKET reg type (which includes skb dynptr
> > > slices since they are PTR_TO_PACKETs) as unknown. The stack trace for
> > > this is check_helper_call() -> clear_all_pkt_pointers() ->
> > > __clear_all_pkt_pointers() -> mark_reg_unknown()
> > >
> > > For examples of how skb dynptrs can be used, please see the attached
> > > selftests.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  include/linux/bpf.h            | 83 +++++++++++++++++-----------
> > >  include/linux/filter.h         |  4 ++
> > >  include/uapi/linux/bpf.h       | 40 ++++++++++++--
> > >  kernel/bpf/helpers.c           | 81 +++++++++++++++++++++++++---
> > >  kernel/bpf/verifier.c          | 99 ++++++++++++++++++++++++++++------
> > >  net/core/filter.c              | 53 ++++++++++++++++--
> > >  tools/include/uapi/linux/bpf.h | 40 ++++++++++++--
> > >  7 files changed, 335 insertions(+), 65 deletions(-)
> > >
> >
> > [...]
> >
> > > @@ -1521,9 +1532,19 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src
> > >         if (err)
> > >                 return err;
> > >
> > > -       memcpy(dst, src->data + src->offset + offset, len);
> > > +       type = bpf_dynptr_get_type(src);
> > >
> > > -       return 0;
> > > +       switch (type) {
> > > +       case BPF_DYNPTR_TYPE_LOCAL:
> > > +       case BPF_DYNPTR_TYPE_RINGBUF:
> > > +               memcpy(dst, src->data + src->offset + offset, len);
> > > +               return 0;
> > > +       case BPF_DYNPTR_TYPE_SKB:
> > > +               return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
> > > +       default:
> > > +               WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
> >
> > nit: probably better to WARN_ONCE?
> >
> > > +               return -EFAULT;
> > > +       }
> > >  }
> > >
> > >  static const struct bpf_func_proto bpf_dynptr_read_proto = {
> > > @@ -1540,18 +1561,32 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = {
> > >  BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
> > >            u32, len, u64, flags)
> > >  {
> > > +       enum bpf_dynptr_type type;
> > >         int err;
> > >
> > > -       if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
> > > +       if (!dst->data || bpf_dynptr_is_rdonly(dst))
> > >                 return -EINVAL;
> > >
> > >         err = bpf_dynptr_check_off_len(dst, offset, len);
> > >         if (err)
> > >                 return err;
> > >
> > > -       memcpy(dst->data + dst->offset + offset, src, len);
> > > +       type = bpf_dynptr_get_type(dst);
> > >
> > > -       return 0;
> > > +       switch (type) {
> > > +       case BPF_DYNPTR_TYPE_LOCAL:
> > > +       case BPF_DYNPTR_TYPE_RINGBUF:
> > > +               if (flags)
> > > +                       return -EINVAL;
> > > +               memcpy(dst->data + dst->offset + offset, src, len);
> > > +               return 0;
> > > +       case BPF_DYNPTR_TYPE_SKB:
> > > +               return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
> > > +                                            flags);
> > > +       default:
> > > +               WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
> >
> > ditto about WARN_ONCE
> >
> > > +               return -EFAULT;
> > > +       }
> > >  }
> > >
> > >  static const struct bpf_func_proto bpf_dynptr_write_proto = {
> >
> > [...]
> >
> > > +static enum bpf_dynptr_type stack_slot_get_dynptr_info(struct bpf_verifier_env *env,
> > > +                                                      struct bpf_reg_state *reg,
> > > +                                                      int *ref_obj_id)
> > >  {
> > >         struct bpf_func_state *state = func(env, reg);
> > >         int spi = get_spi(reg->off);
> > >
> > > -       return state->stack[spi].spilled_ptr.id;
> > > +       if (ref_obj_id)
> > > +               *ref_obj_id = state->stack[spi].spilled_ptr.id;
> > > +
> > > +       return state->stack[spi].spilled_ptr.dynptr.type;
> > >  }
> > >
> > >  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > > @@ -6056,7 +6075,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > >                         case DYNPTR_TYPE_RINGBUF:
> > >                                 err_extra = "ringbuf ";
> > >                                 break;
> > > -                       default:
> > > +                       case DYNPTR_TYPE_SKB:
> > > +                               err_extra = "skb ";
> > >                                 break;
> > >                         }
> > >
> > > @@ -7149,6 +7169,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > >  {
> > >         enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> > >         const struct bpf_func_proto *fn = NULL;
> > > +       enum bpf_dynptr_type dynptr_type;
> >
> > compiler technically can complain about use of uninitialized
> > dynptr_type, maybe initialize it to BPF_DYNPTR_TYPE_INVALID ?
> >
> > >         enum bpf_return_type ret_type;
> > >         enum bpf_type_flag ret_flag;
> > >         struct bpf_reg_state *regs;
> > > @@ -7320,24 +7341,43 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > >                         }
> > >                 }
> > >                 break;
> > > -       case BPF_FUNC_dynptr_data:
> > > -               for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > > -                       if (arg_type_is_dynptr(fn->arg_type[i])) {
> > > -                               if (meta.ref_obj_id) {
> > > -                                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> > > -                                       return -EFAULT;
> > > -                               }
> > > -                               /* Find the id of the dynptr we're tracking the reference of */
> > > -                               meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
> > > -                               break;
> > > -                       }
> > > +       case BPF_FUNC_dynptr_write:
> > > +       {
> > > +               struct bpf_reg_state *reg;
> > > +
> > > +               reg = get_dynptr_arg_reg(fn, regs);
> > > +               if (!reg) {
> > > +                       verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
> >
> > s/bpf_dynptr_data/bpf_dynptr_write/
> >
> > > +                       return -EFAULT;
> > >                 }
> > > -               if (i == MAX_BPF_FUNC_REG_ARGS) {
> > > +
> > > +               /* bpf_dynptr_write() for skb-type dynptrs may pull the skb, so we must
> > > +                * invalidate all data slices associated with it
> > > +                */
> > > +               if (stack_slot_get_dynptr_info(env, reg, NULL) == BPF_DYNPTR_TYPE_SKB)
> > > +                       changes_data = true;
> > > +
> > > +               break;
> > > +       }
> > > +       case BPF_FUNC_dynptr_data:
> > > +       {
> > > +               struct bpf_reg_state *reg;
> > > +
> > > +               reg = get_dynptr_arg_reg(fn, regs);
> > > +               if (!reg) {
> > >                         verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
> > >                         return -EFAULT;
> > >                 }
> > > +
> > > +               if (meta.ref_obj_id) {
> > > +                       verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
> > > +                       return -EFAULT;
> > > +               }
> > > +
> > > +               dynptr_type = stack_slot_get_dynptr_info(env, reg, &meta.ref_obj_id);
> > >                 break;
> > >         }
> > > +       }
> > >
> > >         if (err)
> > >                 return err;
> > > @@ -7397,8 +7437,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > >                 break;
> > >         case RET_PTR_TO_ALLOC_MEM:
> > >                 mark_reg_known_zero(env, regs, BPF_REG_0);
> > > -               regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> > > -               regs[BPF_REG_0].mem_size = meta.mem_size;
> > > +
> > > +               if (func_id == BPF_FUNC_dynptr_data &&
> > > +                   dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> > > +                       regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag;
> > > +                       regs[BPF_REG_0].range = meta.mem_size;
> > > +               } else {
> > > +                       regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> > > +                       regs[BPF_REG_0].mem_size = meta.mem_size;
> > > +               }
> > >                 break;
> > >         case RET_PTR_TO_MEM_OR_BTF_ID:
> > >         {
> > > @@ -14141,6 +14188,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> > >                         goto patch_call_imm;
> > >                 }
> > >
> > > +               if (insn->imm == BPF_FUNC_dynptr_from_skb) {
> > > +                       bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);
> > > +
> > > +                       insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, is_rdonly);
> > > +                       insn_buf[1] = *insn;
> > > +                       cnt = 2;
> >
> > This might have been discussed before, sorry if I missed it. But why
> > this special rewrite to provide read-only flag vs having two
> > implementations of bpf_dynptr_from_skb() depending on program types?
> > If program type allows read+write access, return
> > bpf_dynptr_from_skb_rdwr(), for those that have read-only access -
> > bpf_dynptr_from_skb_rdonly(), and for others return NULL proto
> > (disable helper)?

Ooh I love this idea, thanks Andrii! I'll add this in v5.

> >
> > You can then use it very similarly for bpf_dynptr_from_skb_meta().
> >
>
> Related question, it seems we know statically if dynptr is read only
> or not, so why even do all this hidden parameter passing and instead
> just reject writes directly? You only need to be able to set
> MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> seems simpler than checking it at runtime. Verifier already handles
> MEM_RDONLY generically, you only need to add the guard for
> check_packet_acces (and check_helper_mem_access for meta->raw_mode
> under pkt case), and rejecting dynptr_write seems like a if condition.

There will be other helper functions that do writes (eg memcpy to
dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
dynptrs, ...) so it's more scalable if we reject these at runtime
rather than enforce these at the verifier level. I also think it's
cleaner to keep the verifier logic as simple as possible and do the
checking in the helper.

There's a prior discussion about this in v1 [0] as well.

[0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#mf3fe3965bc1852b07b8f2d306d09818b35acf3c1
Kumar Kartikeya Dwivedi Aug. 26, 2022, 12:18 a.m. UTC | #6
On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> [...]
> >
> > Related question, it seems we know statically if dynptr is read only
> > or not, so why even do all this hidden parameter passing and instead
> > just reject writes directly? You only need to be able to set
> > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > seems simpler than checking it at runtime. Verifier already handles
> > MEM_RDONLY generically, you only need to add the guard for
> > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > under pkt case), and rejecting dynptr_write seems like a if condition.
>
> There will be other helper functions that do writes (eg memcpy to
> dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> dynptrs, ...) so it's more scalable if we reject these at runtime
> rather than enforce these at the verifier level. I also think it's
> cleaner to keep the verifier logic as simple as possible and do the
> checking in the helper.

I won't be pushing this further, since you know what you plan to add
in the future better, but I still disagree.

I'm guessing there might be dynptrs where this read only property is
set dynamically at runtime, which is why you want to go this route?
I.e. you might not know statically whether dynptr is read only or not?

My main confusion is the inconsistency here.

Right now the patch implicitly relies on may_access_direct_pkt_data to
protect slices returned from dynptr_data, instead of setting
MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
needed. So indirectly, you are relying on knowing statically whether
the dynptr is read only or not. But then you also set this bit at
runtime.

So you reject some cases at load time, and the rest of them only at
runtime. Direct writes to dynptr slice fails load, writes through
helper does not (only fails at runtime).

Also, dynptr_data needs to know whether dynptr is read only
statically, to protect writes to its returned pointer, unless you
decide to introduce another helper for the dynamic rdonly bit case
(like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
works for some rdonly dynptrs (known to be rdonly statically, like
this skb one), but not for others.

I also don't agree about the complexity or scalability part, all the
infra and precedence is already there. We already have similar checks
for meta->raw_mode where we reject writes to read only pointers in
check_helper_mem_access.
Joanne Koong Aug. 26, 2022, 6:44 p.m. UTC | #7
On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > [...]
> > >
> > > Related question, it seems we know statically if dynptr is read only
> > > or not, so why even do all this hidden parameter passing and instead
> > > just reject writes directly? You only need to be able to set
> > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > seems simpler than checking it at runtime. Verifier already handles
> > > MEM_RDONLY generically, you only need to add the guard for
> > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > under pkt case), and rejecting dynptr_write seems like a if condition.
> >
> > There will be other helper functions that do writes (eg memcpy to
> > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > dynptrs, ...) so it's more scalable if we reject these at runtime
> > rather than enforce these at the verifier level. I also think it's
> > cleaner to keep the verifier logic as simple as possible and do the
> > checking in the helper.
>
> I won't be pushing this further, since you know what you plan to add
> in the future better, but I still disagree.
>
> I'm guessing there might be dynptrs where this read only property is
> set dynamically at runtime, which is why you want to go this route?
> I.e. you might not know statically whether dynptr is read only or not?
>
> My main confusion is the inconsistency here.
>
> Right now the patch implicitly relies on may_access_direct_pkt_data to
> protect slices returned from dynptr_data, instead of setting
> MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> needed. So indirectly, you are relying on knowing statically whether
> the dynptr is read only or not. But then you also set this bit at
> runtime.
>
> So you reject some cases at load time, and the rest of them only at
> runtime. Direct writes to dynptr slice fails load, writes through
> helper does not (only fails at runtime).
>
> Also, dynptr_data needs to know whether dynptr is read only
> statically, to protect writes to its returned pointer, unless you
> decide to introduce another helper for the dynamic rdonly bit case
> (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> works for some rdonly dynptrs (known to be rdonly statically, like
> this skb one), but not for others.
>
> I also don't agree about the complexity or scalability part, all the
> infra and precedence is already there. We already have similar checks
> for meta->raw_mode where we reject writes to read only pointers in
> check_helper_mem_access.

My point about scalability is that if we reject bpf_dynptr_write() at
load time, then we must reject any future dynptr helper that does any
writing at load time as well, to be consistent.

I don't feel strongly about whether we reject at load time or run
time. Rejecting at load time instead of runtime doesn't seem that
useful to me, but there's a good chance I'm wrong here since Martin
stated that he prefers rejecting at load time as well.

As for the added complexity part, what I mean is that we'll need to
keep track of some more stuff to support this, such as whether the
dynptr is read only and which helper functions need to check whether
the dynptr is read only or not.

On the whole, I think given that both Martin and you have specified
your preferences for this, we should reject at load time instead of
runtime. I'll make this change for v5 :)
Kumar Kartikeya Dwivedi Aug. 26, 2022, 6:51 p.m. UTC | #8
On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > [...]
> > > >
> > > > Related question, it seems we know statically if dynptr is read only
> > > > or not, so why even do all this hidden parameter passing and instead
> > > > just reject writes directly? You only need to be able to set
> > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > seems simpler than checking it at runtime. Verifier already handles
> > > > MEM_RDONLY generically, you only need to add the guard for
> > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > >
> > > There will be other helper functions that do writes (eg memcpy to
> > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > rather than enforce these at the verifier level. I also think it's
> > > cleaner to keep the verifier logic as simple as possible and do the
> > > checking in the helper.
> >
> > I won't be pushing this further, since you know what you plan to add
> > in the future better, but I still disagree.
> >
> > I'm guessing there might be dynptrs where this read only property is
> > set dynamically at runtime, which is why you want to go this route?
> > I.e. you might not know statically whether dynptr is read only or not?
> >
> > My main confusion is the inconsistency here.
> >
> > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > protect slices returned from dynptr_data, instead of setting
> > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > needed. So indirectly, you are relying on knowing statically whether
> > the dynptr is read only or not. But then you also set this bit at
> > runtime.
> >
> > So you reject some cases at load time, and the rest of them only at
> > runtime. Direct writes to dynptr slice fails load, writes through
> > helper does not (only fails at runtime).
> >
> > Also, dynptr_data needs to know whether dynptr is read only
> > statically, to protect writes to its returned pointer, unless you
> > decide to introduce another helper for the dynamic rdonly bit case
> > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > works for some rdonly dynptrs (known to be rdonly statically, like
> > this skb one), but not for others.
> >
> > I also don't agree about the complexity or scalability part, all the
> > infra and precedence is already there. We already have similar checks
> > for meta->raw_mode where we reject writes to read only pointers in
> > check_helper_mem_access.
>
> My point about scalability is that if we reject bpf_dynptr_write() at
> load time, then we must reject any future dynptr helper that does any
> writing at load time as well, to be consistent.
>
> I don't feel strongly about whether we reject at load time or run
> time. Rejecting at load time instead of runtime doesn't seem that
> useful to me, but there's a good chance I'm wrong here since Martin
> stated that he prefers rejecting at load time as well.
>
> As for the added complexity part, what I mean is that we'll need to
> keep track of some more stuff to support this, such as whether the
> dynptr is read only and which helper functions need to check whether
> the dynptr is read only or not.

What I'm trying to understand is how dynptr_data is supposed to work
if this dynptr read only bit is only known at runtime. Or will it be
always known statically so that it can set returned pointer as read
only? Because then it doesn't seem it is required or useful to track
the readonly bit at runtime.

It is fine if _everything_ checks it at runtime, but that doesn't seem
possible, hence the question. We would need a new slice helper that
only returns read-only slices, because dynptr_data can return rw
slices currently and it is already UAPI so changing that is not
possible anymore.
Joanne Koong Aug. 26, 2022, 7:49 p.m. UTC | #9
On Fri, Aug 26, 2022 at 11:52 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > [...]
> > > > >
> > > > > Related question, it seems we know statically if dynptr is read only
> > > > > or not, so why even do all this hidden parameter passing and instead
> > > > > just reject writes directly? You only need to be able to set
> > > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > > seems simpler than checking it at runtime. Verifier already handles
> > > > > MEM_RDONLY generically, you only need to add the guard for
> > > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > > >
> > > > There will be other helper functions that do writes (eg memcpy to
> > > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > > rather than enforce these at the verifier level. I also think it's
> > > > cleaner to keep the verifier logic as simple as possible and do the
> > > > checking in the helper.
> > >
> > > I won't be pushing this further, since you know what you plan to add
> > > in the future better, but I still disagree.
> > >
> > > I'm guessing there might be dynptrs where this read only property is
> > > set dynamically at runtime, which is why you want to go this route?
> > > I.e. you might not know statically whether dynptr is read only or not?
> > >
> > > My main confusion is the inconsistency here.
> > >
> > > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > > protect slices returned from dynptr_data, instead of setting
> > > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > > needed. So indirectly, you are relying on knowing statically whether
> > > the dynptr is read only or not. But then you also set this bit at
> > > runtime.
> > >
> > > So you reject some cases at load time, and the rest of them only at
> > > runtime. Direct writes to dynptr slice fails load, writes through
> > > helper does not (only fails at runtime).
> > >
> > > Also, dynptr_data needs to know whether dynptr is read only
> > > statically, to protect writes to its returned pointer, unless you
> > > decide to introduce another helper for the dynamic rdonly bit case
> > > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > > works for some rdonly dynptrs (known to be rdonly statically, like
> > > this skb one), but not for others.
> > >
> > > I also don't agree about the complexity or scalability part, all the
> > > infra and precedence is already there. We already have similar checks
> > > for meta->raw_mode where we reject writes to read only pointers in
> > > check_helper_mem_access.
> >
> > My point about scalability is that if we reject bpf_dynptr_write() at
> > load time, then we must reject any future dynptr helper that does any
> > writing at load time as well, to be consistent.
> >
> > I don't feel strongly about whether we reject at load time or run
> > time. Rejecting at load time instead of runtime doesn't seem that
> > useful to me, but there's a good chance I'm wrong here since Martin
> > stated that he prefers rejecting at load time as well.
> >
> > As for the added complexity part, what I mean is that we'll need to
> > keep track of some more stuff to support this, such as whether the
> > dynptr is read only and which helper functions need to check whether
> > the dynptr is read only or not.
>
> What I'm trying to understand is how dynptr_data is supposed to work
> if this dynptr read only bit is only known at runtime. Or will it be
> always known statically so that it can set returned pointer as read
> only? Because then it doesn't seem it is required or useful to track
> the readonly bit at runtime.

I think it'll always be known statically whether the dynptr is
read-only or not. If we make all writable dynptr helper functions
reject read-only dynptrs at load time instead of run time, then yes we
can remove the read-only bit in the bpf_dynptr_kern struct.

There's also the question of whether this constraint (eg all read-only
writes are rejected at load time) is too rigid - for example, what if
in the future we want to add a helper function where if a certain
condition is met, then we write some number of bytes, else we read
some number of bytes? This would be not possible to add then, since
we'll only know at runtime whether the condition is met.

I personally lean towards rejecting helper function writes at runtime,
but if you think it's a non-trivial benefit to reject at load time
instead, I'm fine going with that.

>
> It is fine if _everything_ checks it at runtime, but that doesn't seem
> possible, hence the question. We would need a new slice helper that
> only returns read-only slices, because dynptr_data can return rw
> slices currently and it is already UAPI so changing that is not
> possible anymore.

I don't agree that if bpf_dynptr_write() is checked at runtime, then
bpf_dynptr_data must also be checked at runtime to be consistent. I
think it's fine if writes through helper functions are rejected at
runtime, and writes through direct access are rejected at load time.
That doesn't seem inconsistent to me.
Kumar Kartikeya Dwivedi Aug. 26, 2022, 8:54 p.m. UTC | #10
On Fri, 26 Aug 2022 at 21:49, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, Aug 26, 2022 at 11:52 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > [...]
> > > > > >
> > > > > > Related question, it seems we know statically if dynptr is read only
> > > > > > or not, so why even do all this hidden parameter passing and instead
> > > > > > just reject writes directly? You only need to be able to set
> > > > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > > > seems simpler than checking it at runtime. Verifier already handles
> > > > > > MEM_RDONLY generically, you only need to add the guard for
> > > > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > > > >
> > > > > There will be other helper functions that do writes (eg memcpy to
> > > > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > > > rather than enforce these at the verifier level. I also think it's
> > > > > cleaner to keep the verifier logic as simple as possible and do the
> > > > > checking in the helper.
> > > >
> > > > I won't be pushing this further, since you know what you plan to add
> > > > in the future better, but I still disagree.
> > > >
> > > > I'm guessing there might be dynptrs where this read only property is
> > > > set dynamically at runtime, which is why you want to go this route?
> > > > I.e. you might not know statically whether dynptr is read only or not?
> > > >
> > > > My main confusion is the inconsistency here.
> > > >
> > > > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > > > protect slices returned from dynptr_data, instead of setting
> > > > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > > > needed. So indirectly, you are relying on knowing statically whether
> > > > the dynptr is read only or not. But then you also set this bit at
> > > > runtime.
> > > >
> > > > So you reject some cases at load time, and the rest of them only at
> > > > runtime. Direct writes to dynptr slice fails load, writes through
> > > > helper does not (only fails at runtime).
> > > >
> > > > Also, dynptr_data needs to know whether dynptr is read only
> > > > statically, to protect writes to its returned pointer, unless you
> > > > decide to introduce another helper for the dynamic rdonly bit case
> > > > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > > > works for some rdonly dynptrs (known to be rdonly statically, like
> > > > this skb one), but not for others.
> > > >
> > > > I also don't agree about the complexity or scalability part, all the
> > > > infra and precedence is already there. We already have similar checks
> > > > for meta->raw_mode where we reject writes to read only pointers in
> > > > check_helper_mem_access.
> > >
> > > My point about scalability is that if we reject bpf_dynptr_write() at
> > > load time, then we must reject any future dynptr helper that does any
> > > writing at load time as well, to be consistent.
> > >
> > > I don't feel strongly about whether we reject at load time or run
> > > time. Rejecting at load time instead of runtime doesn't seem that
> > > useful to me, but there's a good chance I'm wrong here since Martin
> > > stated that he prefers rejecting at load time as well.
> > >
> > > As for the added complexity part, what I mean is that we'll need to
> > > keep track of some more stuff to support this, such as whether the
> > > dynptr is read only and which helper functions need to check whether
> > > the dynptr is read only or not.
> >
> > What I'm trying to understand is how dynptr_data is supposed to work
> > if this dynptr read only bit is only known at runtime. Or will it be
> > always known statically so that it can set returned pointer as read
> > only? Because then it doesn't seem it is required or useful to track
> > the readonly bit at runtime.
>
> I think it'll always be known statically whether the dynptr is
> read-only or not. If we make all writable dynptr helper functions
> reject read-only dynptrs at load time instead of run time, then yes we
> can remove the read-only bit in the bpf_dynptr_kern struct.
>
> There's also the question of whether this constraint (eg all read-only
> writes are rejected at load time) is too rigid - for example, what if
> in the future we want to add a helper function where if a certain
> condition is met, then we write some number of bytes, else we read
> some number of bytes? This would be not possible to add then, since
> we'll only know at runtime whether the condition is met.
>
> I personally lean towards rejecting helper function writes at runtime,
> but if you think it's a non-trivial benefit to reject at load time
> instead, I'm fine going with that.
>

My personal opinion is this:

When I am working with a statically known read only dynptr, it is like
declaring a variable const. Every function expecting it to be
non-const should fail compilation, and trying to mutate the variables
through writes should also fail compilation. For BPF compilation is
analogous to program load.

It might be that said variable is not const, then those operations may
fail at runtime due to some other reason. Being dynamically read-only
is then a runtime failure condition, which will cause failure at
runtime. Both are distinct cases in my mind, and it is fine to fail at
runtime when we don't know. In general, you save a lot of time of the
user in my opinion (esp. people new to things) if you reject known
incorrectness as early as possible.

E.g. load a dynptr from a map, where the field accepts storing both
read only and non-read only ones. Then it is expected that writes may
fail at runtime. That also allows you to switch read-only ness at
runtime back to rw. But if the field only expects rdonly dynptr,
verifier knows that the type is const statically, so it triggers
failures for writes at load time instead.

Taking this a step further, you may even store rw dynptr to a map
field expecting rdonly dynptr. That's like returning a const pointer
from a function for a rw memory, where you want to limit access of the
user, even better if you do it statically. Then functions trying to
write to dynptr loaded from said map field will fail load itself,
while others having access to rw dynptr can do it just fine.

When the verifier does not know, it does not know. There will be such
cases when you make const-ness a runtime property.

> >
> > It is fine if _everything_ checks it at runtime, but that doesn't seem
> > possible, hence the question. We would need a new slice helper that
> > only returns read-only slices, because dynptr_data can return rw
> > slices currently and it is already UAPI so changing that is not
> > possible anymore.
>
> I don't agree that if bpf_dynptr_write() is checked at runtime, then
> bpf_dynptr_data must also be checked at runtime to be consistent. I
> think it's fine if writes through helper functions are rejected at
> runtime, and writes through direct access are rejected at load time.
> That doesn't seem inconsistent to me.

My point was more that dynptr_data cannot propagate runtime
read-only-ness to its returned pointer. The verifier has to know
statically, at which point I don't see why we can't just reject other
cases at load anyway.

When we have dynptrs which have const-ness as a runtime property, it
is ok to support that by failing at runtime (but then you'll have a
hard time deciding how you want dynptr_data to work, most likely
you'll need another helper which returns only a rdonly slice, when it
fails, we call dynptr_data for rw slice).

But as I said before, I don't know how dynptr is going to evolve in
the future, so you'll have a better idea, and I'll leave it up to you
decide how you want to design its API. Enough words exchanged about
this :).
Andrii Nakryiko Aug. 27, 2022, 5:36 a.m. UTC | #11
On Fri, Aug 26, 2022 at 1:54 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Fri, 26 Aug 2022 at 21:49, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Fri, Aug 26, 2022 at 11:52 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> > > > <memxor@gmail.com> wrote:
> > > > >
> > > > > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > [...]
> > > > > > >
> > > > > > > Related question, it seems we know statically if dynptr is read only
> > > > > > > or not, so why even do all this hidden parameter passing and instead
> > > > > > > just reject writes directly? You only need to be able to set
> > > > > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > > > > seems simpler than checking it at runtime. Verifier already handles
> > > > > > > MEM_RDONLY generically, you only need to add the guard for
> > > > > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > > > > >
> > > > > > There will be other helper functions that do writes (eg memcpy to
> > > > > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > > > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > > > > rather than enforce these at the verifier level. I also think it's
> > > > > > cleaner to keep the verifier logic as simple as possible and do the
> > > > > > checking in the helper.
> > > > >
> > > > > I won't be pushing this further, since you know what you plan to add
> > > > > in the future better, but I still disagree.
> > > > >
> > > > > I'm guessing there might be dynptrs where this read only property is
> > > > > set dynamically at runtime, which is why you want to go this route?
> > > > > I.e. you might not know statically whether dynptr is read only or not?
> > > > >
> > > > > My main confusion is the inconsistency here.
> > > > >
> > > > > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > > > > protect slices returned from dynptr_data, instead of setting
> > > > > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > > > > needed. So indirectly, you are relying on knowing statically whether
> > > > > the dynptr is read only or not. But then you also set this bit at
> > > > > runtime.
> > > > >
> > > > > So you reject some cases at load time, and the rest of them only at
> > > > > runtime. Direct writes to dynptr slice fails load, writes through
> > > > > helper does not (only fails at runtime).
> > > > >
> > > > > Also, dynptr_data needs to know whether dynptr is read only
> > > > > statically, to protect writes to its returned pointer, unless you
> > > > > decide to introduce another helper for the dynamic rdonly bit case
> > > > > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > > > > works for some rdonly dynptrs (known to be rdonly statically, like
> > > > > this skb one), but not for others.
> > > > >
> > > > > I also don't agree about the complexity or scalability part, all the
> > > > > infra and precedence is already there. We already have similar checks
> > > > > for meta->raw_mode where we reject writes to read only pointers in
> > > > > check_helper_mem_access.
> > > >
> > > > My point about scalability is that if we reject bpf_dynptr_write() at
> > > > load time, then we must reject any future dynptr helper that does any
> > > > writing at load time as well, to be consistent.
> > > >
> > > > I don't feel strongly about whether we reject at load time or run
> > > > time. Rejecting at load time instead of runtime doesn't seem that
> > > > useful to me, but there's a good chance I'm wrong here since Martin
> > > > stated that he prefers rejecting at load time as well.
> > > >
> > > > As for the added complexity part, what I mean is that we'll need to
> > > > keep track of some more stuff to support this, such as whether the
> > > > dynptr is read only and which helper functions need to check whether
> > > > the dynptr is read only or not.
> > >
> > > What I'm trying to understand is how dynptr_data is supposed to work
> > > if this dynptr read only bit is only known at runtime. Or will it be
> > > always known statically so that it can set returned pointer as read
> > > only? Because then it doesn't seem it is required or useful to track
> > > the readonly bit at runtime.
> >
> > I think it'll always be known statically whether the dynptr is
> > read-only or not. If we make all writable dynptr helper functions
> > reject read-only dynptrs at load time instead of run time, then yes we
> > can remove the read-only bit in the bpf_dynptr_kern struct.
> >
> > There's also the question of whether this constraint (eg all read-only
> > writes are rejected at load time) is too rigid - for example, what if
> > in the future we want to add a helper function where if a certain
> > condition is met, then we write some number of bytes, else we read
> > some number of bytes? This would be not possible to add then, since
> > we'll only know at runtime whether the condition is met.
> >
> > I personally lean towards rejecting helper function writes at runtime,
> > but if you think it's a non-trivial benefit to reject at load time
> > instead, I'm fine going with that.
> >
>
> My personal opinion is this:
>
> When I am working with a statically known read only dynptr, it is like
> declaring a variable const. Every function expecting it to be
> non-const should fail compilation, and trying to mutate the variables
> through writes should also fail compilation. For BPF compilation is
> analogous to program load.
>
> It might be that said variable is not const, then those operations may
> fail at runtime due to some other reason. Being dynamically read-only
> is then a runtime failure condition, which will cause failure at
> runtime. Both are distinct cases in my mind, and it is fine to fail at
> runtime when we don't know. In general, you save a lot of time of the
> user in my opinion (esp. people new to things) if you reject known
> incorrectness as early as possible.
>
> E.g. load a dynptr from a map, where the field accepts storing both
> read only and non-read only ones. Then it is expected that writes may
> fail at runtime. That also allows you to switch read-only ness at
> runtime back to rw. But if the field only expects rdonly dynptr,
> verifier knows that the type is const statically, so it triggers
> failures for writes at load time instead.
>
> Taking this a step further, you may even store rw dynptr to a map
> field expecting rdonly dynptr. That's like returning a const pointer
> from a function for a rw memory, where you want to limit access of the
> user, even better if you do it statically. Then functions trying to
> write to dynptr loaded from said map field will fail load itself,
> while others having access to rw dynptr can do it just fine.
>
> When the verifier does not know, it does not know. There will be such
> cases when you make const-ness a runtime property.
>
> > >
> > > It is fine if _everything_ checks it at runtime, but that doesn't seem
> > > possible, hence the question. We would need a new slice helper that
> > > only returns read-only slices, because dynptr_data can return rw
> > > slices currently and it is already UAPI so changing that is not
> > > possible anymore.
> >
> > I don't agree that if bpf_dynptr_write() is checked at runtime, then
> > bpf_dynptr_data must also be checked at runtime to be consistent. I
> > think it's fine if writes through helper functions are rejected at
> > runtime, and writes through direct access are rejected at load time.
> > That doesn't seem inconsistent to me.
>
> My point was more that dynptr_data cannot propagate runtime
> read-only-ness to its returned pointer. The verifier has to know
> statically, at which point I don't see why we can't just reject other
> cases at load anyway.

I think the right answer here is to not make bpf_dynptr_data() return
direct pointer of changing read-only-ness. Maybe the right answer here
is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
read-only?

By saying that read-only-ness of dynptr should be statically known and
rejecting some dynptr functions at load time places us at the mercy of
verifier's complete knowledge of application logic, which is exactly
against the spirit of dynptr.

It's only slightly tangential, but I still dread my experience proving
to BPF verifier that some value is strictly greater than zero for BPF
helper that expected ARG_CONST_SIZE (not ARG_CONST_SIZE_OR_ZERO).
There were also cases were absolutely correct program had to be
mangled just to prove to BPF verifier that it indeed can return just 0
or 1, etc. This is not to bash BPF verifier, but just to point out
that sometimes unnecessary strictness adds nothing but unnecessary
pain to user. So, let's not reject anything at load, we can check all
that at runtime and return NULL.

But bpf_dynptr_data_rdonly() seems useful for cases where we know we
are not going to write

>
> When we have dynptrs which have const-ness as a runtime property, it
> is ok to support that by failing at runtime (but then you'll have a
> hard time deciding how you want dynptr_data to work, most likely
> you'll need another helper which returns only a rdonly slice, when it
> fails, we call dynptr_data for rw slice).
>
> But as I said before, I don't know how dynptr is going to evolve in
> the future, so you'll have a better idea, and I'll leave it up to you
> decide how you want to design its API. Enough words exchanged about
> this :).

directionally, dynptr is about offloading decisions to runtime, so I
think avoiding unnecessary restrictions at verification time is the
right trade off
Kumar Kartikeya Dwivedi Aug. 27, 2022, 7:11 a.m. UTC | #12
On Sat, 27 Aug 2022 at 07:37, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Aug 26, 2022 at 1:54 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Fri, 26 Aug 2022 at 21:49, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Fri, Aug 26, 2022 at 11:52 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> > > > > <memxor@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > [...]
> > > > > > > >
> > > > > > > > Related question, it seems we know statically if dynptr is read only
> > > > > > > > or not, so why even do all this hidden parameter passing and instead
> > > > > > > > just reject writes directly? You only need to be able to set
> > > > > > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > > > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > > > > > seems simpler than checking it at runtime. Verifier already handles
> > > > > > > > MEM_RDONLY generically, you only need to add the guard for
> > > > > > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > > > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > > > > > >
> > > > > > > There will be other helper functions that do writes (eg memcpy to
> > > > > > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > > > > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > > > > > rather than enforce these at the verifier level. I also think it's
> > > > > > > cleaner to keep the verifier logic as simple as possible and do the
> > > > > > > checking in the helper.
> > > > > >
> > > > > > I won't be pushing this further, since you know what you plan to add
> > > > > > in the future better, but I still disagree.
> > > > > >
> > > > > > I'm guessing there might be dynptrs where this read only property is
> > > > > > set dynamically at runtime, which is why you want to go this route?
> > > > > > I.e. you might not know statically whether dynptr is read only or not?
> > > > > >
> > > > > > My main confusion is the inconsistency here.
> > > > > >
> > > > > > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > > > > > protect slices returned from dynptr_data, instead of setting
> > > > > > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > > > > > needed. So indirectly, you are relying on knowing statically whether
> > > > > > the dynptr is read only or not. But then you also set this bit at
> > > > > > runtime.
> > > > > >
> > > > > > So you reject some cases at load time, and the rest of them only at
> > > > > > runtime. Direct writes to dynptr slice fails load, writes through
> > > > > > helper does not (only fails at runtime).
> > > > > >
> > > > > > Also, dynptr_data needs to know whether dynptr is read only
> > > > > > statically, to protect writes to its returned pointer, unless you
> > > > > > decide to introduce another helper for the dynamic rdonly bit case
> > > > > > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > > > > > works for some rdonly dynptrs (known to be rdonly statically, like
> > > > > > this skb one), but not for others.
> > > > > >
> > > > > > I also don't agree about the complexity or scalability part, all the
> > > > > > infra and precedence is already there. We already have similar checks
> > > > > > for meta->raw_mode where we reject writes to read only pointers in
> > > > > > check_helper_mem_access.
> > > > >
> > > > > My point about scalability is that if we reject bpf_dynptr_write() at
> > > > > load time, then we must reject any future dynptr helper that does any
> > > > > writing at load time as well, to be consistent.
> > > > >
> > > > > I don't feel strongly about whether we reject at load time or run
> > > > > time. Rejecting at load time instead of runtime doesn't seem that
> > > > > useful to me, but there's a good chance I'm wrong here since Martin
> > > > > stated that he prefers rejecting at load time as well.
> > > > >
> > > > > As for the added complexity part, what I mean is that we'll need to
> > > > > keep track of some more stuff to support this, such as whether the
> > > > > dynptr is read only and which helper functions need to check whether
> > > > > the dynptr is read only or not.
> > > >
> > > > What I'm trying to understand is how dynptr_data is supposed to work
> > > > if this dynptr read only bit is only known at runtime. Or will it be
> > > > always known statically so that it can set returned pointer as read
> > > > only? Because then it doesn't seem it is required or useful to track
> > > > the readonly bit at runtime.
> > >
> > > I think it'll always be known statically whether the dynptr is
> > > read-only or not. If we make all writable dynptr helper functions
> > > reject read-only dynptrs at load time instead of run time, then yes we
> > > can remove the read-only bit in the bpf_dynptr_kern struct.
> > >
> > > There's also the question of whether this constraint (eg all read-only
> > > writes are rejected at load time) is too rigid - for example, what if
> > > in the future we want to add a helper function where if a certain
> > > condition is met, then we write some number of bytes, else we read
> > > some number of bytes? This would be not possible to add then, since
> > > we'll only know at runtime whether the condition is met.
> > >
> > > I personally lean towards rejecting helper function writes at runtime,
> > > but if you think it's a non-trivial benefit to reject at load time
> > > instead, I'm fine going with that.
> > >
> >
> > My personal opinion is this:
> >
> > When I am working with a statically known read only dynptr, it is like
> > declaring a variable const. Every function expecting it to be
> > non-const should fail compilation, and trying to mutate the variables
> > through writes should also fail compilation. For BPF compilation is
> > analogous to program load.
> >
> > It might be that said variable is not const, then those operations may
> > fail at runtime due to some other reason. Being dynamically read-only
> > is then a runtime failure condition, which will cause failure at
> > runtime. Both are distinct cases in my mind, and it is fine to fail at
> > runtime when we don't know. In general, you save a lot of time of the
> > user in my opinion (esp. people new to things) if you reject known
> > incorrectness as early as possible.
> >
> > E.g. load a dynptr from a map, where the field accepts storing both
> > read only and non-read only ones. Then it is expected that writes may
> > fail at runtime. That also allows you to switch read-only ness at
> > runtime back to rw. But if the field only expects rdonly dynptr,
> > verifier knows that the type is const statically, so it triggers
> > failures for writes at load time instead.
> >
> > Taking this a step further, you may even store rw dynptr to a map
> > field expecting rdonly dynptr. That's like returning a const pointer
> > from a function for a rw memory, where you want to limit access of the
> > user, even better if you do it statically. Then functions trying to
> > write to dynptr loaded from said map field will fail load itself,
> > while others having access to rw dynptr can do it just fine.
> >
> > When the verifier does not know, it does not know. There will be such
> > cases when you make const-ness a runtime property.
> >
> > > >
> > > > It is fine if _everything_ checks it at runtime, but that doesn't seem
> > > > possible, hence the question. We would need a new slice helper that
> > > > only returns read-only slices, because dynptr_data can return rw
> > > > slices currently and it is already UAPI so changing that is not
> > > > possible anymore.
> > >
> > > I don't agree that if bpf_dynptr_write() is checked at runtime, then
> > > bpf_dynptr_data must also be checked at runtime to be consistent. I
> > > think it's fine if writes through helper functions are rejected at
> > > runtime, and writes through direct access are rejected at load time.
> > > That doesn't seem inconsistent to me.
> >
> > My point was more that dynptr_data cannot propagate runtime
> > read-only-ness to its returned pointer. The verifier has to know
> > statically, at which point I don't see why we can't just reject other
> > cases at load anyway.
>
> I think the right answer here is to not make bpf_dynptr_data() return
> direct pointer of changing read-only-ness. Maybe the right answer here
> is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> read-only?

Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
dynptr?

And yes, you're kind of painting yourself in a corner if you allow
dynptr_data to work with statically ro dynptrs now, ascertaining the
ro bit for the returned slice, and then later you plan to add dynptrs
whose ro vs rw is not known to the verifier statically. Then it works
well for statically known ones (returning both ro and rw slices), but
has to return NULL at runtime for statically unknown read only ones,
and always rw slice in verifier state for them.

>
> By saying that read-only-ness of dynptr should be statically known and
> rejecting some dynptr functions at load time places us at the mercy of
> verifier's complete knowledge of application logic, which is exactly
> against the spirit of dynptr.
>

Right, that might be too restrictive if we require them to be
statically read only.

But it's not about forcing it to be statically ro, it is more about
rejecting load when we know the program is incorrect (e.g. the types
are incorrect when passed to helpers), otherwise we throw the error at
runtime anyway, which seems to be the convention afaicu. But maybe I
missed the memo and we gradually want to move away from such strict
static checks.

I view the situation here similar to if we were rejecting direct
writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
runtime check in the helper writing to it as rw memory arg. It's as if
we pretend it's part of the 'type' of the register when doing direct
writes, but then ignore it while matching it against the said helper's
argument type.

> It's only slightly tangential, but I still dread my experience proving
> to BPF verifier that some value is strictly greater than zero for BPF
> helper that expected ARG_CONST_SIZE (not ARG_CONST_SIZE_OR_ZERO).
> There were also cases were absolutely correct program had to be
> mangled just to prove to BPF verifier that it indeed can return just 0
> or 1, etc. This is not to bash BPF verifier, but just to point out
> that sometimes unnecessary strictness adds nothing but unnecessary
> pain to user. So, let's not reject anything at load, we can check all
> that at runtime and return NULL.
>
> But bpf_dynptr_data_rdonly() seems useful for cases where we know we
> are not going to write
>
> >
> > When we have dynptrs which have const-ness as a runtime property, it
> > is ok to support that by failing at runtime (but then you'll have a
> > hard time deciding how you want dynptr_data to work, most likely
> > you'll need another helper which returns only a rdonly slice, when it
> > fails, we call dynptr_data for rw slice).
> >
> > But as I said before, I don't know how dynptr is going to evolve in
> > the future, so you'll have a better idea, and I'll leave it up to you
> > decide how you want to design its API. Enough words exchanged about
> > this :).
>
> directionally, dynptr is about offloading decisions to runtime, so I
> think avoiding unnecessary restrictions at verification time is the
> right trade off

Fair enough, I guess I've made my point. Let's go with what you both
feel would be best.
Andrii Nakryiko Aug. 27, 2022, 5:21 p.m. UTC | #13
On Sat, Aug 27, 2022 at 12:12 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, 27 Aug 2022 at 07:37, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Aug 26, 2022 at 1:54 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Fri, 26 Aug 2022 at 21:49, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 26, 2022 at 11:52 AM Kumar Kartikeya Dwivedi
> > > > <memxor@gmail.com> wrote:
> > > > >
> > > > > On Fri, 26 Aug 2022 at 20:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 25, 2022 at 5:19 PM Kumar Kartikeya Dwivedi
> > > > > > <memxor@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, 25 Aug 2022 at 23:02, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > Related question, it seems we know statically if dynptr is read only
> > > > > > > > > or not, so why even do all this hidden parameter passing and instead
> > > > > > > > > just reject writes directly? You only need to be able to set
> > > > > > > > > MEM_RDONLY on dynptr_data returned PTR_TO_PACKETs, and reject
> > > > > > > > > dynptr_write when dynptr type is xdp/skb (and ctx is only one). That
> > > > > > > > > seems simpler than checking it at runtime. Verifier already handles
> > > > > > > > > MEM_RDONLY generically, you only need to add the guard for
> > > > > > > > > check_packet_acces (and check_helper_mem_access for meta->raw_mode
> > > > > > > > > under pkt case), and rejecting dynptr_write seems like a if condition.
> > > > > > > >
> > > > > > > > There will be other helper functions that do writes (eg memcpy to
> > > > > > > > dynptrs, strncpy to dynptrs, probe read user to dynptrs, hashing
> > > > > > > > dynptrs, ...) so it's more scalable if we reject these at runtime
> > > > > > > > rather than enforce these at the verifier level. I also think it's
> > > > > > > > cleaner to keep the verifier logic as simple as possible and do the
> > > > > > > > checking in the helper.
> > > > > > >
> > > > > > > I won't be pushing this further, since you know what you plan to add
> > > > > > > in the future better, but I still disagree.
> > > > > > >
> > > > > > > I'm guessing there might be dynptrs where this read only property is
> > > > > > > set dynamically at runtime, which is why you want to go this route?
> > > > > > > I.e. you might not know statically whether dynptr is read only or not?
> > > > > > >
> > > > > > > My main confusion is the inconsistency here.
> > > > > > >
> > > > > > > Right now the patch implicitly relies on may_access_direct_pkt_data to
> > > > > > > protect slices returned from dynptr_data, instead of setting
> > > > > > > MEM_RDONLY on the returned PTR_TO_PACKET. Which is fine, it's not
> > > > > > > needed. So indirectly, you are relying on knowing statically whether
> > > > > > > the dynptr is read only or not. But then you also set this bit at
> > > > > > > runtime.
> > > > > > >
> > > > > > > So you reject some cases at load time, and the rest of them only at
> > > > > > > runtime. Direct writes to dynptr slice fails load, writes through
> > > > > > > helper does not (only fails at runtime).
> > > > > > >
> > > > > > > Also, dynptr_data needs to know whether dynptr is read only
> > > > > > > statically, to protect writes to its returned pointer, unless you
> > > > > > > decide to introduce another helper for the dynamic rdonly bit case
> > > > > > > (like dynptr_data_rdonly). Then you have a mismatch, where dynptr_data
> > > > > > > works for some rdonly dynptrs (known to be rdonly statically, like
> > > > > > > this skb one), but not for others.
> > > > > > >
> > > > > > > I also don't agree about the complexity or scalability part, all the
> > > > > > > infra and precedence is already there. We already have similar checks
> > > > > > > for meta->raw_mode where we reject writes to read only pointers in
> > > > > > > check_helper_mem_access.
> > > > > >
> > > > > > My point about scalability is that if we reject bpf_dynptr_write() at
> > > > > > load time, then we must reject any future dynptr helper that does any
> > > > > > writing at load time as well, to be consistent.
> > > > > >
> > > > > > I don't feel strongly about whether we reject at load time or run
> > > > > > time. Rejecting at load time instead of runtime doesn't seem that
> > > > > > useful to me, but there's a good chance I'm wrong here since Martin
> > > > > > stated that he prefers rejecting at load time as well.
> > > > > >
> > > > > > As for the added complexity part, what I mean is that we'll need to
> > > > > > keep track of some more stuff to support this, such as whether the
> > > > > > dynptr is read only and which helper functions need to check whether
> > > > > > the dynptr is read only or not.
> > > > >
> > > > > What I'm trying to understand is how dynptr_data is supposed to work
> > > > > if this dynptr read only bit is only known at runtime. Or will it be
> > > > > always known statically so that it can set returned pointer as read
> > > > > only? Because then it doesn't seem it is required or useful to track
> > > > > the readonly bit at runtime.
> > > >
> > > > I think it'll always be known statically whether the dynptr is
> > > > read-only or not. If we make all writable dynptr helper functions
> > > > reject read-only dynptrs at load time instead of run time, then yes we
> > > > can remove the read-only bit in the bpf_dynptr_kern struct.
> > > >
> > > > There's also the question of whether this constraint (eg all read-only
> > > > writes are rejected at load time) is too rigid - for example, what if
> > > > in the future we want to add a helper function where if a certain
> > > > condition is met, then we write some number of bytes, else we read
> > > > some number of bytes? This would be not possible to add then, since
> > > > we'll only know at runtime whether the condition is met.
> > > >
> > > > I personally lean towards rejecting helper function writes at runtime,
> > > > but if you think it's a non-trivial benefit to reject at load time
> > > > instead, I'm fine going with that.
> > > >
> > >
> > > My personal opinion is this:
> > >
> > > When I am working with a statically known read only dynptr, it is like
> > > declaring a variable const. Every function expecting it to be
> > > non-const should fail compilation, and trying to mutate the variables
> > > through writes should also fail compilation. For BPF compilation is
> > > analogous to program load.
> > >
> > > It might be that said variable is not const, then those operations may
> > > fail at runtime due to some other reason. Being dynamically read-only
> > > is then a runtime failure condition, which will cause failure at
> > > runtime. Both are distinct cases in my mind, and it is fine to fail at
> > > runtime when we don't know. In general, you save a lot of time of the
> > > user in my opinion (esp. people new to things) if you reject known
> > > incorrectness as early as possible.
> > >
> > > E.g. load a dynptr from a map, where the field accepts storing both
> > > read only and non-read only ones. Then it is expected that writes may
> > > fail at runtime. That also allows you to switch read-only ness at
> > > runtime back to rw. But if the field only expects rdonly dynptr,
> > > verifier knows that the type is const statically, so it triggers
> > > failures for writes at load time instead.
> > >
> > > Taking this a step further, you may even store rw dynptr to a map
> > > field expecting rdonly dynptr. That's like returning a const pointer
> > > from a function for a rw memory, where you want to limit access of the
> > > user, even better if you do it statically. Then functions trying to
> > > write to dynptr loaded from said map field will fail load itself,
> > > while others having access to rw dynptr can do it just fine.
> > >
> > > When the verifier does not know, it does not know. There will be such
> > > cases when you make const-ness a runtime property.
> > >
> > > > >
> > > > > It is fine if _everything_ checks it at runtime, but that doesn't seem
> > > > > possible, hence the question. We would need a new slice helper that
> > > > > only returns read-only slices, because dynptr_data can return rw
> > > > > slices currently and it is already UAPI so changing that is not
> > > > > possible anymore.
> > > >
> > > > I don't agree that if bpf_dynptr_write() is checked at runtime, then
> > > > bpf_dynptr_data must also be checked at runtime to be consistent. I
> > > > think it's fine if writes through helper functions are rejected at
> > > > runtime, and writes through direct access are rejected at load time.
> > > > That doesn't seem inconsistent to me.
> > >
> > > My point was more that dynptr_data cannot propagate runtime
> > > read-only-ness to its returned pointer. The verifier has to know
> > > statically, at which point I don't see why we can't just reject other
> > > cases at load anyway.
> >
> > I think the right answer here is to not make bpf_dynptr_data() return
> > direct pointer of changing read-only-ness. Maybe the right answer here
> > is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> > non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> > read-only?
>
> Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
> work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
> dynptr?

Right, that's what I proposed:

  "bpf_dynptr_data_rdonly(), that will return NULL for non-read-only dynptr"

so if you pass read-write dynptr, it will return NULL (because it's
unsafe to take writable direct pointer).

bpf_dynptr_data_rdonly() should still work fine with both rdonly and
read-write dynptr.
bpf_dynptr_data() only works (in the sense returns non-NULL) for
read-write dynptr only.


>
> And yes, you're kind of painting yourself in a corner if you allow
> dynptr_data to work with statically ro dynptrs now, ascertaining the
> ro bit for the returned slice, and then later you plan to add dynptrs
> whose ro vs rw is not known to the verifier statically. Then it works
> well for statically known ones (returning both ro and rw slices), but
> has to return NULL at runtime for statically unknown read only ones,
> and always rw slice in verifier state for them.

Right, will be both inconsistent and puzzling.

>
> >
> > By saying that read-only-ness of dynptr should be statically known and
> > rejecting some dynptr functions at load time places us at the mercy of
> > verifier's complete knowledge of application logic, which is exactly
> > against the spirit of dynptr.
> >
>
> Right, that might be too restrictive if we require them to be
> statically read only.
>
> But it's not about forcing it to be statically ro, it is more about
> rejecting load when we know the program is incorrect (e.g. the types
> are incorrect when passed to helpers), otherwise we throw the error at
> runtime anyway, which seems to be the convention afaicu. But maybe I
> missed the memo and we gradually want to move away from such strict
> static checks.
>
> I view the situation here similar to if we were rejecting direct
> writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
> runtime check in the helper writing to it as rw memory arg. It's as if
> we pretend it's part of the 'type' of the register when doing direct
> writes, but then ignore it while matching it against the said helper's
> argument type.

I disagree, it's not the same. bpf_dynptr_data/bpf_dynptr_data_rdonly
turns completely dynamic dynptr into static slice of memory. Only
after that point it makes sense for BPF verifier to reject something.
Until then it's not incorrect. BPF program will always have to deal
with some dynptr operations potentially failing. dynptr can always be
NULL internally, you can still call bpf_dynptr_xxx() operations on it,
they will just do nothing and return error. That doesn't make BPF
program incorrect.

As I said, dynptr is about shifting static verification to runtime
checks for stuff that BPF verifier can't or shouldn't verify
statically. Like dynptr's dynamic size, for example. We've used a
special data/data_end pointer types to try to solve this for skb, but
it is quite painful in practice and relies on compiler generating
"good" sequence of instructions understandable to BPF verifier.

dynptr takes a different approach, shifts validation to runtime and
"interfaces" with static verification through bpf_dynptr_data() and
similar APIs.

>
> > It's only slightly tangential, but I still dread my experience proving
> > to BPF verifier that some value is strictly greater than zero for BPF
> > helper that expected ARG_CONST_SIZE (not ARG_CONST_SIZE_OR_ZERO).
> > There were also cases were absolutely correct program had to be
> > mangled just to prove to BPF verifier that it indeed can return just 0
> > or 1, etc. This is not to bash BPF verifier, but just to point out
> > that sometimes unnecessary strictness adds nothing but unnecessary
> > pain to user. So, let's not reject anything at load, we can check all
> > that at runtime and return NULL.
> >
> > But bpf_dynptr_data_rdonly() seems useful for cases where we know we
> > are not going to write
> >
> > >
> > > When we have dynptrs which have const-ness as a runtime property, it
> > > is ok to support that by failing at runtime (but then you'll have a
> > > hard time deciding how you want dynptr_data to work, most likely
> > > you'll need another helper which returns only a rdonly slice, when it
> > > fails, we call dynptr_data for rw slice).
> > >
> > > But as I said before, I don't know how dynptr is going to evolve in
> > > the future, so you'll have a better idea, and I'll leave it up to you
> > > decide how you want to design its API. Enough words exchanged about
> > > this :).
> >
> > directionally, dynptr is about offloading decisions to runtime, so I
> > think avoiding unnecessary restrictions at verification time is the
> > right trade off
>
> Fair enough, I guess I've made my point. Let's go with what you both
> feel would be best.

Sounds good. I tried to point out the "philosophy" behind dynptr. It
doesn't preclude making the BPF verifier smarter and track more
things. But it's a deliberate design of dynptr to shift more checks to
runtime because in a lot of cases it makes more sense than "fighting
BPF verifier". This "fighting verifier" phrase is like a meme now.
I've heard this exact phrase from multiple unrelated engineers. BPF
verifier should be helpful, we should engineers having to "fight" it,
so overly strict checks from verifier should be avoided if they don't
compromise correctness, IMO.
Kumar Kartikeya Dwivedi Aug. 27, 2022, 6:32 p.m. UTC | #14
On Sat, 27 Aug 2022 at 19:22, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 12:12 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> > [...]
> > >
> > > I think the right answer here is to not make bpf_dynptr_data() return
> > > direct pointer of changing read-only-ness. Maybe the right answer here
> > > is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> > > non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> > > read-only?
> >
> > Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
> > work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
> > dynptr?
>
> Right, that's what I proposed:
>
>   "bpf_dynptr_data_rdonly(), that will return NULL for non-read-only dynptr"
>
> so if you pass read-write dynptr, it will return NULL (because it's
> unsafe to take writable direct pointer).
>
> bpf_dynptr_data_rdonly() should still work fine with both rdonly and
> read-write dynptr.
> bpf_dynptr_data() only works (in the sense returns non-NULL) for
> read-write dynptr only.
>
>
> >
> > And yes, you're kind of painting yourself in a corner if you allow
> > dynptr_data to work with statically ro dynptrs now, ascertaining the
> > ro bit for the returned slice, and then later you plan to add dynptrs
> > whose ro vs rw is not known to the verifier statically. Then it works
> > well for statically known ones (returning both ro and rw slices), but
> > has to return NULL at runtime for statically unknown read only ones,
> > and always rw slice in verifier state for them.
>
> Right, will be both inconsistent and puzzling.
>
> >
> > >
> > > By saying that read-only-ness of dynptr should be statically known and
> > > rejecting some dynptr functions at load time places us at the mercy of
> > > verifier's complete knowledge of application logic, which is exactly
> > > against the spirit of dynptr.
> > >
> >
> > Right, that might be too restrictive if we require them to be
> > statically read only.
> >
> > But it's not about forcing it to be statically ro, it is more about
> > rejecting load when we know the program is incorrect (e.g. the types
> > are incorrect when passed to helpers), otherwise we throw the error at
> > runtime anyway, which seems to be the convention afaicu. But maybe I
> > missed the memo and we gradually want to move away from such strict
> > static checks.
> >
> > I view the situation here similar to if we were rejecting direct
> > writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
> > runtime check in the helper writing to it as rw memory arg. It's as if
> > we pretend it's part of the 'type' of the register when doing direct
> > writes, but then ignore it while matching it against the said helper's
> > argument type.
>
> I disagree, it's not the same. bpf_dynptr_data/bpf_dynptr_data_rdonly
> turns completely dynamic dynptr into static slice of memory. Only
> after that point it makes sense for BPF verifier to reject something.
> Until then it's not incorrect. BPF program will always have to deal
> with some dynptr operations potentially failing. dynptr can always be
> NULL internally, you can still call bpf_dynptr_xxx() operations on it,
> they will just do nothing and return error. That doesn't make BPF
> program incorrect.

Let me just explain one last time why I'm unable to swallow this
'completely dynamic dynptr' explanation, because it is not treated as
completely dynamic by all dynptr helpers.

No pushback, but it would be great if you could further help me wrap
my head around this, so that we're in sync for future discussions.

So you say you may not know the type of dynptr (read-only, rw, local,
ringbuf, etc.). Hence you want to treat every dynptr as 'dynamic
dynptr' you know nothing about even when you do know some information
about it statically (e.g. if it's on stack). You don't want to reject
things early at load even if you have all the info to do so. You want
operations on it to fail at runtime instead.

If you cannot track ro vs rw in the future statically, you won't be be
able to track local or ringbuf or skb either (since both are part of
the type of the dynptr). If you can, you can just as well encode
const-ness as part of the type where you declare it (e.g. in a map
field where the value is assigned dynamically, where you say
dynptr_ringbuf etc.). Type comprises local vs skb vs ringbuf | ro vs
rw for me. But maybe I could be wrong.

So following this line of reasoning, will you be relaxing the argument
type of helpers like bpf_ringbuf_submit_dynptr? Right now they take
'dynamic dynptr' as arg, but also expect DYNPTR_TYPE_RINGBUF, so you
reject load when it's not a ringbuf dynptr. Will it then fallback to
checking the type at runtime when the type will not be known? But then
it will also permit passing local or skb dynptr in the future which it
rejects right now.

I'm just hoping you are able to see why it's looking a bit
inconsistent to me. If DYNPTR_TYPE_RINGBUF has to be checked, it felt
to me like DYNPTR_RDONLY is as much part of that kind of type safety
wrt helpers. It would be set on the dynptr when skb passed to
dynptr_from_skb is rdonly in some program types, along with
DYNPTR_TYPE_SKB, and expect to be matched when invoking helpers on
such dynptrs.

It is fine to push checks to runtime, especially when you won't know
the type, because verifier cannot reasonably track the dynptr type
then.

But right now there's still some level of state you maintain in the
verifier about dynptrs (like it's type), and it seems to me like some
helpers are using that state to reject things at load time, while some
other helper will ignore it and fallback to runtime checks.

I hope this is a clear enough description to atleast justify why I'm
(still) a bit confused.
Kumar Kartikeya Dwivedi Aug. 27, 2022, 7:16 p.m. UTC | #15
On Sat, 27 Aug 2022 at 20:32, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Sat, 27 Aug 2022 at 19:22, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 27, 2022 at 12:12 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > > [...]
> > > >
> > > > I think the right answer here is to not make bpf_dynptr_data() return
> > > > direct pointer of changing read-only-ness. Maybe the right answer here
> > > > is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> > > > non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> > > > read-only?
> > >
> > > Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
> > > work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
> > > dynptr?
> >
> > Right, that's what I proposed:
> >
> >   "bpf_dynptr_data_rdonly(), that will return NULL for non-read-only dynptr"
> >
> > so if you pass read-write dynptr, it will return NULL (because it's
> > unsafe to take writable direct pointer).
> >
> > bpf_dynptr_data_rdonly() should still work fine with both rdonly and
> > read-write dynptr.
> > bpf_dynptr_data() only works (in the sense returns non-NULL) for
> > read-write dynptr only.
> >
> >
> > >
> > > And yes, you're kind of painting yourself in a corner if you allow
> > > dynptr_data to work with statically ro dynptrs now, ascertaining the
> > > ro bit for the returned slice, and then later you plan to add dynptrs
> > > whose ro vs rw is not known to the verifier statically. Then it works
> > > well for statically known ones (returning both ro and rw slices), but
> > > has to return NULL at runtime for statically unknown read only ones,
> > > and always rw slice in verifier state for them.
> >
> > Right, will be both inconsistent and puzzling.
> >
> > >
> > > >
> > > > By saying that read-only-ness of dynptr should be statically known and
> > > > rejecting some dynptr functions at load time places us at the mercy of
> > > > verifier's complete knowledge of application logic, which is exactly
> > > > against the spirit of dynptr.
> > > >
> > >
> > > Right, that might be too restrictive if we require them to be
> > > statically read only.
> > >
> > > But it's not about forcing it to be statically ro, it is more about
> > > rejecting load when we know the program is incorrect (e.g. the types
> > > are incorrect when passed to helpers), otherwise we throw the error at
> > > runtime anyway, which seems to be the convention afaicu. But maybe I
> > > missed the memo and we gradually want to move away from such strict
> > > static checks.
> > >
> > > I view the situation here similar to if we were rejecting direct
> > > writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
> > > runtime check in the helper writing to it as rw memory arg. It's as if
> > > we pretend it's part of the 'type' of the register when doing direct
> > > writes, but then ignore it while matching it against the said helper's
> > > argument type.
> >
> > I disagree, it's not the same. bpf_dynptr_data/bpf_dynptr_data_rdonly
> > turns completely dynamic dynptr into static slice of memory. Only
> > after that point it makes sense for BPF verifier to reject something.
> > Until then it's not incorrect. BPF program will always have to deal
> > with some dynptr operations potentially failing. dynptr can always be
> > NULL internally, you can still call bpf_dynptr_xxx() operations on it,
> > they will just do nothing and return error. That doesn't make BPF
> > program incorrect.
>
> Let me just explain one last time why I'm unable to swallow this
> 'completely dynamic dynptr' explanation, because it is not treated as
> completely dynamic by all dynptr helpers.
>
> No pushback, but it would be great if you could further help me wrap
> my head around this, so that we're in sync for future discussions.
>
> So you say you may not know the type of dynptr (read-only, rw, local,
> ringbuf, etc.). Hence you want to treat every dynptr as 'dynamic
> dynptr' you know nothing about even when you do know some information
> about it statically (e.g. if it's on stack). You don't want to reject
> things early at load even if you have all the info to do so. You want
> operations on it to fail at runtime instead.
>
> If you cannot track ro vs rw in the future statically, you won't be be
> able to track local or ringbuf or skb either (since both are part of
> the type of the dynptr). If you can, you can just as well encode
> const-ness as part of the type where you declare it (e.g. in a map
> field where the value is assigned dynamically, where you say
> dynptr_ringbuf etc.). Type comprises local vs skb vs ringbuf | ro vs
> rw for me. But maybe I could be wrong.
>
> So following this line of reasoning, will you be relaxing the argument
> type of helpers like bpf_ringbuf_submit_dynptr? Right now they take
> 'dynamic dynptr' as arg, but also expect DYNPTR_TYPE_RINGBUF, so you
> reject load when it's not a ringbuf dynptr. Will it then fallback to
> checking the type at runtime when the type will not be known? But then
> it will also permit passing local or skb dynptr in the future which it
> rejects right now.
>
> I'm just hoping you are able to see why it's looking a bit
> inconsistent to me. If DYNPTR_TYPE_RINGBUF has to be checked, it felt
> to me like DYNPTR_RDONLY is as much part of that kind of type safety
> wrt helpers. It would be set on the dynptr when skb passed to
> dynptr_from_skb is rdonly in some program types, along with
> DYNPTR_TYPE_SKB, and expect to be matched when invoking helpers on
> such dynptrs.
>
> It is fine to push checks to runtime, especially when you won't know
> the type, because verifier cannot reasonably track the dynptr type
> then.
>

To further expand on this point, in my mind when you have actual
'completely dynamic dynptrs', you will most likely track them as
DYNPTR_TYPE_UNKNOWN in the verifier. Then you would capture free bits
to encode the types at runtime, and start checking that in helpers.

All helpers will start taking such DYNPTR_TYPE_UNKNOWN, or you can
have a casting helper like we already have for normal pointers.
No need for extra verifier complexity to teach it to recognize type
when it is unknown. It will then be checked dynamically at runtime for
dynptr operations.

Being strictly type safe by default for helper arguments is not going
to lead you to 'fighting the verifier'. Very much the opposite, the
verifier is helping you catch errors that will rather occur at runtime
anyway.

> But right now there's still some level of state you maintain in the
> verifier about dynptrs (like it's type), and it seems to me like some
> helpers are using that state to reject things at load time, while some
> other helper will ignore it and fallback to runtime checks.
>
> I hope this is a clear enough description to atleast justify why I'm
> (still) a bit confused.
Andrii Nakryiko Aug. 27, 2022, 11:03 p.m. UTC | #16
On Sat, Aug 27, 2022 at 11:33 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, 27 Aug 2022 at 19:22, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 27, 2022 at 12:12 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > > [...]
> > > >
> > > > I think the right answer here is to not make bpf_dynptr_data() return
> > > > direct pointer of changing read-only-ness. Maybe the right answer here
> > > > is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> > > > non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> > > > read-only?
> > >
> > > Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
> > > work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
> > > dynptr?
> >
> > Right, that's what I proposed:
> >
> >   "bpf_dynptr_data_rdonly(), that will return NULL for non-read-only dynptr"
> >
> > so if you pass read-write dynptr, it will return NULL (because it's
> > unsafe to take writable direct pointer).
> >
> > bpf_dynptr_data_rdonly() should still work fine with both rdonly and
> > read-write dynptr.
> > bpf_dynptr_data() only works (in the sense returns non-NULL) for
> > read-write dynptr only.
> >
> >
> > >
> > > And yes, you're kind of painting yourself in a corner if you allow
> > > dynptr_data to work with statically ro dynptrs now, ascertaining the
> > > ro bit for the returned slice, and then later you plan to add dynptrs
> > > whose ro vs rw is not known to the verifier statically. Then it works
> > > well for statically known ones (returning both ro and rw slices), but
> > > has to return NULL at runtime for statically unknown read only ones,
> > > and always rw slice in verifier state for them.
> >
> > Right, will be both inconsistent and puzzling.
> >
> > >
> > > >
> > > > By saying that read-only-ness of dynptr should be statically known and
> > > > rejecting some dynptr functions at load time places us at the mercy of
> > > > verifier's complete knowledge of application logic, which is exactly
> > > > against the spirit of dynptr.
> > > >
> > >
> > > Right, that might be too restrictive if we require them to be
> > > statically read only.
> > >
> > > But it's not about forcing it to be statically ro, it is more about
> > > rejecting load when we know the program is incorrect (e.g. the types
> > > are incorrect when passed to helpers), otherwise we throw the error at
> > > runtime anyway, which seems to be the convention afaicu. But maybe I
> > > missed the memo and we gradually want to move away from such strict
> > > static checks.
> > >
> > > I view the situation here similar to if we were rejecting direct
> > > writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
> > > runtime check in the helper writing to it as rw memory arg. It's as if
> > > we pretend it's part of the 'type' of the register when doing direct
> > > writes, but then ignore it while matching it against the said helper's
> > > argument type.
> >
> > I disagree, it's not the same. bpf_dynptr_data/bpf_dynptr_data_rdonly
> > turns completely dynamic dynptr into static slice of memory. Only
> > after that point it makes sense for BPF verifier to reject something.
> > Until then it's not incorrect. BPF program will always have to deal
> > with some dynptr operations potentially failing. dynptr can always be
> > NULL internally, you can still call bpf_dynptr_xxx() operations on it,
> > they will just do nothing and return error. That doesn't make BPF
> > program incorrect.
>
> Let me just explain one last time why I'm unable to swallow this
> 'completely dynamic dynptr' explanation, because it is not treated as
> completely dynamic by all dynptr helpers.
>
> No pushback, but it would be great if you could further help me wrap
> my head around this, so that we're in sync for future discussions.
>
> So you say you may not know the type of dynptr (read-only, rw, local,
> ringbuf, etc.). Hence you want to treat every dynptr as 'dynamic

No, that's not what I said. I didn't talk about dynptr type (ringbuf
vs skb vs local). Type we have to track statically precisely to limit
what kind of helpers can be used on dynptrs of specific type.

But note that there are helpers that don't care about dynptr and, in
theory, should work with any dynptr. Because dynptr is an abstraction
of "logically continuous range of memory". So in some cases even types
don't matter.

But regardless, I just don't think about read-only bit as part of
dynptr types. I don't see why it has to be statically known and even
why it has to stay as read-only or read-write for the entire duration
of dynptr lifetime. Why can't we allow flipping dynptr from read-write
to read-only before passing it to some custom BPF subprog, potentially
provided from some BPF library which you don't control; just to make
sure that it cannot really modify underlying memory (even if it is
fundamentally modifiable, like with ringbuf)?

So the point I'm trying to communicate is that things that don't
**need** to be knowable statically to BPF verifier - **should not** be
knowable and should be checked at runtime only. With any dynptr helper
that is interfacing its runtime nature into static thing that verifier
enforces (which is what bpf_dynptr_data/bpf_dynptr_data_rdonly are),
it will always be "fallible" and users will have to expect that they
might return NULL or do nothing, or whatever way to deal with failure
case.


And I believe dynptr read-onliness is not something that BPF verifier
should be tracking statically. PTR_TO_MEM -- yes, dynptr -- no. That's
it. Dynptr type/kind -- yes, track statically, it's way more important
to determine what you can at all do with dynptr.


> dynptr' you know nothing about even when you do know some information
> about it statically (e.g. if it's on stack). You don't want to reject
> things early at load even if you have all the info to do so. You want
> operations on it to fail at runtime instead.
>
> If you cannot track ro vs rw in the future statically, you won't be be
> able to track local or ringbuf or skb either (since both are part of
> the type of the dynptr). If you can, you can just as well encode
> const-ness as part of the type where you declare it (e.g. in a map
> field where the value is assigned dynamically, where you say
> dynptr_ringbuf etc.). Type comprises local vs skb vs ringbuf | ro vs
> rw for me. But maybe I could be wrong.

I don't consider read-only to be a part of type. It's more like offset
and size to me, rather than type. And even if we can track constness
(e.g., we can teach BPF verifier to recognize that hypothetical
bpf_dynptr_set_read_only() makes dynptr into dynptr_rdonly, but why?)

>
> So following this line of reasoning, will you be relaxing the argument
> type of helpers like bpf_ringbuf_submit_dynptr? Right now they take
> 'dynamic dynptr' as arg, but also expect DYNPTR_TYPE_RINGBUF, so you
> reject load when it's not a ringbuf dynptr. Will it then fallback to
> checking the type at runtime when the type will not be known? But then
> it will also permit passing local or skb dynptr in the future which it
> rejects right now.

No, which is exactly why we track dynptr type statically. Because some
operations don't make sense and they have additional semantics (like
with ringbuf submit).

>
> I'm just hoping you are able to see why it's looking a bit
> inconsistent to me. If DYNPTR_TYPE_RINGBUF has to be checked, it felt
> to me like DYNPTR_RDONLY is as much part of that kind of type safety
> wrt helpers. It would be set on the dynptr when skb passed to
> dynptr_from_skb is rdonly in some program types, along with
> DYNPTR_TYPE_SKB, and expect to be matched when invoking helpers on
> such dynptrs.

I understand your view point. But you are taking "completely dynamic"
nature of dynptr too far (not tracking dynptr type) in one case, or
making it too restrictive (tracking read-only state) in another. We
can argue what substitutes "inconsistent", but I'm just leaning
towards slightly different tradeoffs. I don't see a huge value in
preventing bpf_dynptr_write() to be called on read-only dynptr
**statically**. I've ran into various cases where I'd rather BPF
verifier not pretend to be all-knowing and too helpful, because it
sometimes turns into a game of proving to BPF verifier that "I know
what I'm doing, and it's correct, so please just let me do it".

>
> It is fine to push checks to runtime, especially when you won't know
> the type, because verifier cannot reasonably track the dynptr type
> then.
>
> But right now there's still some level of state you maintain in the
> verifier about dynptrs (like it's type), and it seems to me like some
> helpers are using that state to reject things at load time, while some
> other helper will ignore it and fallback to runtime checks.

Exactly. There are helpers that can fail no matter what, and
read-onliness is just another condition they will have to check and
reject.

And some helpers really don't and shouldn't care about the nature of
dynptr (like bpf_dynptr_get_size(), for example).

>
> I hope this is a clear enough description to atleast justify why I'm
> (still) a bit confused.

I don't think you are confused, you just prefer a different tradeoff.
Hopefully I gave a few more arguments in favor of less static
treatment of dynptr where it doesn't hurt correctness.

P.S. I'm going away on vacation, so unlikely to be able to continue
this discussion in a timely manner going forward.
Kumar Kartikeya Dwivedi Aug. 27, 2022, 11:47 p.m. UTC | #17
On Sun, 28 Aug 2022 at 01:03, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 11:33 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Sat, 27 Aug 2022 at 19:22, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Sat, Aug 27, 2022 at 12:12 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > > [...]
> > > > >
> > > > > I think the right answer here is to not make bpf_dynptr_data() return
> > > > > direct pointer of changing read-only-ness. Maybe the right answer here
> > > > > is another helper, bpf_dynptr_data_rdonly(), that will return NULL for
> > > > > non-read-only dynptr and PTR_TO_MEM | MEM_RDONLY if dynptr is indeed
> > > > > read-only?
> > > >
> > > > Shouldn't it be the other way around? bpf_dynptr_data_rdonly() should
> > > > work for both ro and rw dynptrs, and bpf_dynptr_data() only for rw
> > > > dynptr?
> > >
> > > Right, that's what I proposed:
> > >
> > >   "bpf_dynptr_data_rdonly(), that will return NULL for non-read-only dynptr"
> > >
> > > so if you pass read-write dynptr, it will return NULL (because it's
> > > unsafe to take writable direct pointer).
> > >
> > > bpf_dynptr_data_rdonly() should still work fine with both rdonly and
> > > read-write dynptr.
> > > bpf_dynptr_data() only works (in the sense returns non-NULL) for
> > > read-write dynptr only.
> > >
> > >
> > > >
> > > > And yes, you're kind of painting yourself in a corner if you allow
> > > > dynptr_data to work with statically ro dynptrs now, ascertaining the
> > > > ro bit for the returned slice, and then later you plan to add dynptrs
> > > > whose ro vs rw is not known to the verifier statically. Then it works
> > > > well for statically known ones (returning both ro and rw slices), but
> > > > has to return NULL at runtime for statically unknown read only ones,
> > > > and always rw slice in verifier state for them.
> > >
> > > Right, will be both inconsistent and puzzling.
> > >
> > > >
> > > > >
> > > > > By saying that read-only-ness of dynptr should be statically known and
> > > > > rejecting some dynptr functions at load time places us at the mercy of
> > > > > verifier's complete knowledge of application logic, which is exactly
> > > > > against the spirit of dynptr.
> > > > >
> > > >
> > > > Right, that might be too restrictive if we require them to be
> > > > statically read only.
> > > >
> > > > But it's not about forcing it to be statically ro, it is more about
> > > > rejecting load when we know the program is incorrect (e.g. the types
> > > > are incorrect when passed to helpers), otherwise we throw the error at
> > > > runtime anyway, which seems to be the convention afaicu. But maybe I
> > > > missed the memo and we gradually want to move away from such strict
> > > > static checks.
> > > >
> > > > I view the situation here similar to if we were rejecting direct
> > > > writes to PTR_TO_MEM | MEM_RDONLY at load time, but offloading as
> > > > runtime check in the helper writing to it as rw memory arg. It's as if
> > > > we pretend it's part of the 'type' of the register when doing direct
> > > > writes, but then ignore it while matching it against the said helper's
> > > > argument type.
> > >
> > > I disagree, it's not the same. bpf_dynptr_data/bpf_dynptr_data_rdonly
> > > turns completely dynamic dynptr into static slice of memory. Only
> > > after that point it makes sense for BPF verifier to reject something.
> > > Until then it's not incorrect. BPF program will always have to deal
> > > with some dynptr operations potentially failing. dynptr can always be
> > > NULL internally, you can still call bpf_dynptr_xxx() operations on it,
> > > they will just do nothing and return error. That doesn't make BPF
> > > program incorrect.
> >
> > Let me just explain one last time why I'm unable to swallow this
> > 'completely dynamic dynptr' explanation, because it is not treated as
> > completely dynamic by all dynptr helpers.
> >
> > No pushback, but it would be great if you could further help me wrap
> > my head around this, so that we're in sync for future discussions.
> >
> > So you say you may not know the type of dynptr (read-only, rw, local,
> > ringbuf, etc.). Hence you want to treat every dynptr as 'dynamic
>
> No, that's not what I said. I didn't talk about dynptr type (ringbuf
> vs skb vs local). Type we have to track statically precisely to limit
> what kind of helpers can be used on dynptrs of specific type.
>
> But note that there are helpers that don't care about dynptr and, in
> theory, should work with any dynptr. Because dynptr is an abstraction
> of "logically continuous range of memory". So in some cases even types
> don't matter.
>
> But regardless, I just don't think about read-only bit as part of
> dynptr types. I don't see why it has to be statically known and even
> why it has to stay as read-only or read-write for the entire duration
> of dynptr lifetime. Why can't we allow flipping dynptr from read-write
> to read-only before passing it to some custom BPF subprog, potentially
> provided from some BPF library which you don't control; just to make
> sure that it cannot really modify underlying memory (even if it is
> fundamentally modifiable, like with ringbuf)?
>
> So the point I'm trying to communicate is that things that don't
> **need** to be knowable statically to BPF verifier - **should not** be
> knowable and should be checked at runtime only. With any dynptr helper
> that is interfacing its runtime nature into static thing that verifier
> enforces (which is what bpf_dynptr_data/bpf_dynptr_data_rdonly are),
> it will always be "fallible" and users will have to expect that they
> might return NULL or do nothing, or whatever way to deal with failure
> case.
>
>
> And I believe dynptr read-onliness is not something that BPF verifier
> should be tracking statically. PTR_TO_MEM -- yes, dynptr -- no. That's
> it. Dynptr type/kind -- yes, track statically, it's way more important
> to determine what you can at all do with dynptr.
>

Understood, it's more clear now how you're thinking about this :),
especially that you don't consider read-only bit to be a property of
dynptr type. I think that was the main point I was not following.

And yes, I guess it's about different tradeoffs. Taking your BPF
library example, my idea would be that we will anyway be verifying the
global or static subprog in the library by the BTF of its function
prototype. The verifier has to verify safety of the subprog by setting
up argument types if it is global.

There, I find it more appropriate for global ones to declare const
bpf_dynptr * as the argument type if the intent is to not write to the
dynptr. Then user can safely pass both rw and ro dynptrs to it,
without having to flip it each time at runtime just to ensure safety.
If it is static it will also be able to catch incorrect writes for
callers.

But indeed there may be cases where you want to control this at
runtime, IMPO that should be orthogonal to type level const-ness.
Those will be rw at the type level but may change const-ness at
runtime, then operations start failing at runtime.

Anyway, I guess we are done here. Again, thanks for sticking along and
taking the time to describe it in detail :).

>
> [...]
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 39bd36359c1e..30615d1a0c13 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -407,11 +407,14 @@  enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* DYNPTR points to sk_buff */
+	DYNPTR_TYPE_SKB		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
 
-#define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF)
+#define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB)
 
 /* Max number of base types. */
 #define BPF_BASE_TYPE_LIMIT	(1UL << BPF_BASE_TYPE_BITS)
@@ -903,6 +906,36 @@  static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
 	return bpf_func(ctx, insnsi);
 }
 
+/* the implementation of the opaque uapi struct bpf_dynptr */
+struct bpf_dynptr_kern {
+	void *data;
+	/* Size represents the number of usable bytes of dynptr data.
+	 * If for example the offset is at 4 for a local dynptr whose data is
+	 * of type u64, the number of usable bytes is 4.
+	 *
+	 * The upper 8 bits are reserved. It is as follows:
+	 * Bits 0 - 23 = size
+	 * Bits 24 - 30 = dynptr type
+	 * Bit 31 = whether dynptr is read-only
+	 */
+	u32 size;
+	u32 offset;
+} __aligned(8);
+
+enum bpf_dynptr_type {
+	BPF_DYNPTR_TYPE_INVALID,
+	/* Points to memory that is local to the bpf program */
+	BPF_DYNPTR_TYPE_LOCAL,
+	/* Underlying data is a ringbuf record */
+	BPF_DYNPTR_TYPE_RINGBUF,
+	/* Underlying data is a sk_buff */
+	BPF_DYNPTR_TYPE_SKB,
+	/* Underlying data is a xdp_buff */
+	BPF_DYNPTR_TYPE_XDP,
+};
+
+int bpf_dynptr_check_size(u32 size);
+
 #ifdef CONFIG_BPF_JIT
 int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
 int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
@@ -1975,6 +2008,12 @@  static inline bool has_current_bpf_ctx(void)
 {
 	return !!current->bpf_ctx;
 }
+
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
+		     enum bpf_dynptr_type type, u32 offset, u32 size);
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
+void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -2188,6 +2227,19 @@  static inline bool has_current_bpf_ctx(void)
 {
 	return false;
 }
+
+static inline void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
+				   enum bpf_dynptr_type type, u32 offset, u32 size)
+{
+}
+
+static inline void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
+{
+}
+
+static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
+{
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
@@ -2548,35 +2600,6 @@  int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
 
-/* the implementation of the opaque uapi struct bpf_dynptr */
-struct bpf_dynptr_kern {
-	void *data;
-	/* Size represents the number of usable bytes of dynptr data.
-	 * If for example the offset is at 4 for a local dynptr whose data is
-	 * of type u64, the number of usable bytes is 4.
-	 *
-	 * The upper 8 bits are reserved. It is as follows:
-	 * Bits 0 - 23 = size
-	 * Bits 24 - 30 = dynptr type
-	 * Bit 31 = whether dynptr is read-only
-	 */
-	u32 size;
-	u32 offset;
-} __aligned(8);
-
-enum bpf_dynptr_type {
-	BPF_DYNPTR_TYPE_INVALID,
-	/* Points to memory that is local to the bpf program */
-	BPF_DYNPTR_TYPE_LOCAL,
-	/* Underlying data is a ringbuf record */
-	BPF_DYNPTR_TYPE_RINGBUF,
-};
-
-void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
-		     enum bpf_dynptr_type type, u32 offset, u32 size);
-void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
-int bpf_dynptr_check_size(u32 size);
-
 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
 void bpf_cgroup_atype_put(int cgroup_atype);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a5f21dc3c432..649063d9cbfd 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1532,4 +1532,8 @@  static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind
 	return XDP_REDIRECT;
 }
 
+int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len);
+int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from,
+			  u32 len, u64 flags);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 934a2a8beb87..320e6b95d95c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5253,11 +5253,22 @@  union bpf_attr {
  *	Description
  *		Write *len* bytes from *src* into *dst*, starting from *offset*
  *		into *dst*.
- *		*flags* is currently unused.
+ *
+ *		*flags* must be 0 except for skb-type dynptrs.
+ *
+ *		For skb-type dynptrs:
+ *		    *  All data slices of the dynptr are automatically
+ *		       invalidated after **bpf_dynptr_write**\ (). If you wish to
+ *		       avoid this, please perform the write using direct data slices
+ *		       instead.
+ *
+ *		    *  For *flags*, please see the flags accepted by
+ *		       **bpf_skb_store_bytes**\ ().
  *	Return
  *		0 on success, -E2BIG if *offset* + *len* exceeds the length
  *		of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst*
- *		is a read-only dynptr or if *flags* is not 0.
+ *		is a read-only dynptr or if *flags* is not correct. For skb-type dynptrs,
+ *		other errors correspond to errors returned by **bpf_skb_store_bytes**\ ().
  *
  * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
  *	Description
@@ -5265,10 +5276,20 @@  union bpf_attr {
  *
  *		*len* must be a statically known value. The returned data slice
  *		is invalidated whenever the dynptr is invalidated.
+ *
+ *		For skb-type dynptrs:
+ *		    * If *offset* + *len* extends into the skb's paged buffers,
+ *		      the user should manually pull the skb with **bpf_skb_pull_data**\ ()
+ *		      and try again.
+ *
+ *		    * The data slice is automatically invalidated anytime
+ *		      **bpf_dynptr_write**\ () or a helper call that changes
+ *		      the underlying packet buffer (eg **bpf_skb_pull_data**\ ())
+ *		      is called.
  *	Return
  *		Pointer to the underlying dynptr data, NULL if the dynptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
- *		is out of bounds.
+ *		is out of bounds or in a paged buffer for skb-type dynptrs.
  *
  * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len)
  *	Description
@@ -5355,6 +5376,18 @@  union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to the data in *skb*. *skb* must be the BPF program
+ *		context. Depending on program type, the dynptr may be read-only.
+ *
+ *		Calls that change the *skb*'s underlying packet buffer
+ *		(eg **bpf_skb_pull_data**\ ()) do not invalidate the dynptr, but
+ *		they do invalidate any data slices associated with the dynptr.
+ *
+ *		*flags* is currently unused, it must be 0 for now.
+ *	Return
+ *		0 on success or -EINVAL if flags is not 0.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5566,6 +5599,7 @@  union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(dynptr_from_skb),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3c1b9bbcf971..471a01a9b6ae 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1437,11 +1437,21 @@  static bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
 	return ptr->size & DYNPTR_RDONLY_BIT;
 }
 
+void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
+{
+	ptr->size |= DYNPTR_RDONLY_BIT;
+}
+
 static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
 {
 	ptr->size |= type << DYNPTR_TYPE_SHIFT;
 }
 
+static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr)
+{
+	return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT;
+}
+
 static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
 {
 	return ptr->size & DYNPTR_SIZE_MASK;
@@ -1512,6 +1522,7 @@  static const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
 BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src,
 	   u32, offset, u64, flags)
 {
+	enum bpf_dynptr_type type;
 	int err;
 
 	if (!src->data || flags)
@@ -1521,9 +1532,19 @@  BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src
 	if (err)
 		return err;
 
-	memcpy(dst, src->data + src->offset + offset, len);
+	type = bpf_dynptr_get_type(src);
 
-	return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		memcpy(dst, src->data + src->offset + offset, len);
+		return 0;
+	case BPF_DYNPTR_TYPE_SKB:
+		return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
+	default:
+		WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
+		return -EFAULT;
+	}
 }
 
 static const struct bpf_func_proto bpf_dynptr_read_proto = {
@@ -1540,18 +1561,32 @@  static const struct bpf_func_proto bpf_dynptr_read_proto = {
 BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
 	   u32, len, u64, flags)
 {
+	enum bpf_dynptr_type type;
 	int err;
 
-	if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
+	if (!dst->data || bpf_dynptr_is_rdonly(dst))
 		return -EINVAL;
 
 	err = bpf_dynptr_check_off_len(dst, offset, len);
 	if (err)
 		return err;
 
-	memcpy(dst->data + dst->offset + offset, src, len);
+	type = bpf_dynptr_get_type(dst);
 
-	return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		if (flags)
+			return -EINVAL;
+		memcpy(dst->data + dst->offset + offset, src, len);
+		return 0;
+	case BPF_DYNPTR_TYPE_SKB:
+		return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
+					     flags);
+	default:
+		WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
+		return -EFAULT;
+	}
 }
 
 static const struct bpf_func_proto bpf_dynptr_write_proto = {
@@ -1567,6 +1602,9 @@  static const struct bpf_func_proto bpf_dynptr_write_proto = {
 
 BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
 {
+	enum bpf_dynptr_type type;
+	bool is_rdonly;
+	void *data;
 	int err;
 
 	if (!ptr->data)
@@ -1576,10 +1614,37 @@  BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len
 	if (err)
 		return 0;
 
-	if (bpf_dynptr_is_rdonly(ptr))
-		return 0;
+	type = bpf_dynptr_get_type(ptr);
+
+	/* Only skb dynptrs can get read-only data slices, because the
+	 * verifier enforces PTR_TO_PACKET accesses
+	 */
+	is_rdonly = bpf_dynptr_is_rdonly(ptr);
+
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		if (is_rdonly)
+			return 0;
+
+		data = ptr->data;
+		break;
+	case BPF_DYNPTR_TYPE_SKB:
+	{
+		struct sk_buff *skb = ptr->data;
 
-	return (unsigned long)(ptr->data + ptr->offset + offset);
+		/* if the data is paged, the caller needs to pull it first */
+		if (ptr->offset + offset + len > skb->len - skb->data_len)
+			return 0;
+
+		data = skb->data;
+		break;
+	}
+	default:
+		WARN(true, "bpf_dynptr_data: unknown dynptr type %d\n", type);
+		return 0;
+	}
+	return (unsigned long)(data + ptr->offset + offset);
 }
 
 static const struct bpf_func_proto bpf_dynptr_data_proto = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2c1f8069f7b7..1ea295f47525 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -684,6 +684,8 @@  static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type)
 		return BPF_DYNPTR_TYPE_LOCAL;
 	case DYNPTR_TYPE_RINGBUF:
 		return BPF_DYNPTR_TYPE_RINGBUF;
+	case DYNPTR_TYPE_SKB:
+		return BPF_DYNPTR_TYPE_SKB;
 	default:
 		return BPF_DYNPTR_TYPE_INVALID;
 	}
@@ -5826,12 +5828,29 @@  int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
 }
 
-static u32 stack_slot_get_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static struct bpf_reg_state *get_dynptr_arg_reg(const struct bpf_func_proto *fn,
+						struct bpf_reg_state *regs)
+{
+	int i;
+
+	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
+		if (arg_type_is_dynptr(fn->arg_type[i]))
+			return &regs[BPF_REG_1 + i];
+
+	return NULL;
+}
+
+static enum bpf_dynptr_type stack_slot_get_dynptr_info(struct bpf_verifier_env *env,
+						       struct bpf_reg_state *reg,
+						       int *ref_obj_id)
 {
 	struct bpf_func_state *state = func(env, reg);
 	int spi = get_spi(reg->off);
 
-	return state->stack[spi].spilled_ptr.id;
+	if (ref_obj_id)
+		*ref_obj_id = state->stack[spi].spilled_ptr.id;
+
+	return state->stack[spi].spilled_ptr.dynptr.type;
 }
 
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
@@ -6056,7 +6075,8 @@  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			case DYNPTR_TYPE_RINGBUF:
 				err_extra = "ringbuf ";
 				break;
-			default:
+			case DYNPTR_TYPE_SKB:
+				err_extra = "skb ";
 				break;
 			}
 
@@ -7149,6 +7169,7 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	const struct bpf_func_proto *fn = NULL;
+	enum bpf_dynptr_type dynptr_type;
 	enum bpf_return_type ret_type;
 	enum bpf_type_flag ret_flag;
 	struct bpf_reg_state *regs;
@@ -7320,24 +7341,43 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			}
 		}
 		break;
-	case BPF_FUNC_dynptr_data:
-		for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
-			if (arg_type_is_dynptr(fn->arg_type[i])) {
-				if (meta.ref_obj_id) {
-					verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
-					return -EFAULT;
-				}
-				/* Find the id of the dynptr we're tracking the reference of */
-				meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
-				break;
-			}
+	case BPF_FUNC_dynptr_write:
+	{
+		struct bpf_reg_state *reg;
+
+		reg = get_dynptr_arg_reg(fn, regs);
+		if (!reg) {
+			verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
+			return -EFAULT;
 		}
-		if (i == MAX_BPF_FUNC_REG_ARGS) {
+
+		/* bpf_dynptr_write() for skb-type dynptrs may pull the skb, so we must
+		 * invalidate all data slices associated with it
+		 */
+		if (stack_slot_get_dynptr_info(env, reg, NULL) == BPF_DYNPTR_TYPE_SKB)
+			changes_data = true;
+
+		break;
+	}
+	case BPF_FUNC_dynptr_data:
+	{
+		struct bpf_reg_state *reg;
+
+		reg = get_dynptr_arg_reg(fn, regs);
+		if (!reg) {
 			verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
 			return -EFAULT;
 		}
+
+		if (meta.ref_obj_id) {
+			verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
+			return -EFAULT;
+		}
+
+		dynptr_type = stack_slot_get_dynptr_info(env, reg, &meta.ref_obj_id);
 		break;
 	}
+	}
 
 	if (err)
 		return err;
@@ -7397,8 +7437,15 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		break;
 	case RET_PTR_TO_ALLOC_MEM:
 		mark_reg_known_zero(env, regs, BPF_REG_0);
-		regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
-		regs[BPF_REG_0].mem_size = meta.mem_size;
+
+		if (func_id == BPF_FUNC_dynptr_data &&
+		    dynptr_type == BPF_DYNPTR_TYPE_SKB) {
+			regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag;
+			regs[BPF_REG_0].range = meta.mem_size;
+		} else {
+			regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
+			regs[BPF_REG_0].mem_size = meta.mem_size;
+		}
 		break;
 	case RET_PTR_TO_MEM_OR_BTF_ID:
 	{
@@ -14141,6 +14188,24 @@  static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}
 
+		if (insn->imm == BPF_FUNC_dynptr_from_skb) {
+			bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);
+
+			insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, is_rdonly);
+			insn_buf[1] = *insn;
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = new_prog;
+			prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto patch_call_imm;
+		}
+
 		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
 		 * and other inlining handlers are currently limited to 64 bit
 		 * only.
diff --git a/net/core/filter.c b/net/core/filter.c
index 1acfaffeaf32..5b204b42fb3e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1681,8 +1681,8 @@  static inline void bpf_pull_mac_rcsum(struct sk_buff *skb)
 		skb_postpull_rcsum(skb, skb_mac_header(skb), skb->mac_len);
 }
 
-BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset,
-	   const void *, from, u32, len, u64, flags)
+int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from,
+			  u32 len, u64 flags)
 {
 	void *ptr;
 
@@ -1707,6 +1707,12 @@  BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset,
 	return 0;
 }
 
+BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset,
+	   const void *, from, u32, len, u64, flags)
+{
+	return __bpf_skb_store_bytes(skb, offset, from, len, flags);
+}
+
 static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
 	.func		= bpf_skb_store_bytes,
 	.gpl_only	= false,
@@ -1718,8 +1724,7 @@  static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset,
-	   void *, to, u32, len)
+int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len)
 {
 	void *ptr;
 
@@ -1738,6 +1743,12 @@  BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset,
 	return -EFAULT;
 }
 
+BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset,
+	   void *, to, u32, len)
+{
+	return __bpf_skb_load_bytes(skb, offset, to, len);
+}
+
 static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.func		= bpf_skb_load_bytes,
 	.gpl_only	= false,
@@ -1849,6 +1860,32 @@  static const struct bpf_func_proto bpf_skb_pull_data_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+/* is_rdonly is set by the verifier */
+BPF_CALL_4(bpf_dynptr_from_skb, struct sk_buff *, skb, u64, flags,
+	   struct bpf_dynptr_kern *, ptr, u32, is_rdonly)
+{
+	if (flags) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB, 0, skb->len);
+
+	if (is_rdonly)
+		bpf_dynptr_set_rdonly(ptr);
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_dynptr_from_skb_proto = {
+	.func		= bpf_dynptr_from_skb,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_SKB | MEM_UNINIT,
+};
+
 BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
 {
 	return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
@@ -7726,6 +7763,8 @@  sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
+	case BPF_FUNC_dynptr_from_skb:
+		return &bpf_dynptr_from_skb_proto;
 	default:
 		return bpf_sk_base_func_proto(func_id);
 	}
@@ -7909,6 +7948,8 @@  tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_raw_check_syncookie_ipv6_proto;
 #endif
 #endif
+	case BPF_FUNC_dynptr_from_skb:
+		return &bpf_dynptr_from_skb_proto;
 	default:
 		return bpf_sk_base_func_proto(func_id);
 	}
@@ -8104,6 +8145,8 @@  sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skc_lookup_tcp:
 		return &bpf_skc_lookup_tcp_proto;
 #endif
+	case BPF_FUNC_dynptr_from_skb:
+		return &bpf_dynptr_from_skb_proto;
 	default:
 		return bpf_sk_base_func_proto(func_id);
 	}
@@ -8142,6 +8185,8 @@  lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
+	case BPF_FUNC_dynptr_from_skb:
+		return &bpf_dynptr_from_skb_proto;
 	default:
 		return bpf_sk_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1d6085e15fc8..3f1800a2b77c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5253,11 +5253,22 @@  union bpf_attr {
  *	Description
  *		Write *len* bytes from *src* into *dst*, starting from *offset*
  *		into *dst*.
- *		*flags* is currently unused.
+ *
+ *		*flags* must be 0 except for skb-type dynptrs.
+ *
+ *		For skb-type dynptrs:
+ *		    *  All data slices of the dynptr are automatically
+ *		       invalidated after **bpf_dynptr_write**\ (). If you wish to
+ *		       avoid this, please perform the write using direct data slices
+ *		       instead.
+ *
+ *		    *  For *flags*, please see the flags accepted by
+ *		       **bpf_skb_store_bytes**\ ().
  *	Return
  *		0 on success, -E2BIG if *offset* + *len* exceeds the length
  *		of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst*
- *		is a read-only dynptr or if *flags* is not 0.
+ *		is a read-only dynptr or if *flags* is not correct. For skb-type dynptrs,
+ *		other errors correspond to errors returned by **bpf_skb_store_bytes**\ ().
  *
  * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
  *	Description
@@ -5265,10 +5276,20 @@  union bpf_attr {
  *
  *		*len* must be a statically known value. The returned data slice
  *		is invalidated whenever the dynptr is invalidated.
+ *
+ *		For skb-type dynptrs:
+ *		    * If *offset* + *len* extends into the skb's paged buffers,
+ *		      the user should manually pull the skb with **bpf_skb_pull_data**\ ()
+ *		      and try again.
+ *
+ *		    * The data slice is automatically invalidated anytime
+ *		      **bpf_dynptr_write**\ () or a helper call that changes
+ *		      the underlying packet buffer (eg **bpf_skb_pull_data**\ ())
+ *		      is called.
  *	Return
  *		Pointer to the underlying dynptr data, NULL if the dynptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
- *		is out of bounds.
+ *		is out of bounds or in a paged buffer for skb-type dynptrs.
  *
  * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len)
  *	Description
@@ -5355,6 +5376,18 @@  union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to the data in *skb*. *skb* must be the BPF program
+ *		context. Depending on program type, the dynptr may be read-only.
+ *
+ *		Calls that change the *skb*'s underlying packet buffer
+ *		(eg **bpf_skb_pull_data**\ ()) do not invalidate the dynptr, but
+ *		they do invalidate any data slices associated with the dynptr.
+ *
+ *		*flags* is currently unused, it must be 0 for now.
+ *	Return
+ *		0 on success or -EINVAL if flags is not 0.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5566,6 +5599,7 @@  union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(dynptr_from_skb),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper