diff mbox series

[bpf-next,2/5] bpf: Add a bpf_snprintf helper

Message ID 20210310220211.1454516-3-revest@chromium.org (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Add a snprintf eBPF helper | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 9 maintainers not CCed: netdev@vger.kernel.org joe@cilium.io linux-api@vger.kernel.org mingo@redhat.com kafai@fb.com rostedt@goodmis.org john.fastabend@gmail.com songliubraving@fb.com quentin@isovalent.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 12264 this patch: 12268
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning CHECK: Alignment should match open parenthesis CHECK: No space is necessary after a cast CHECK: Please use a blank line after function/struct/union/enum declarations CHECK: spaces preferred around that '+' (ctx:VxV) WARNING: line length of 87 exceeds 80 columns
netdev/build_allmodconfig_warn fail Errors and warnings before: 12768 this patch: 12770
netdev/header_inline success Link

Commit Message

Florent Revest March 10, 2021, 10:02 p.m. UTC
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:

To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.

Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a
NULL-terminated read-only map, we don't need a format string length arg.

Because the format-string is known at verification time, we also move
most of the format string validation, currently done in formatting
helper calls, into the verifier logic. This makes debugging easier and
also slightly improves the runtime performance.

Signed-off-by: Florent Revest <revest@chromium.org>
---
 include/linux/bpf.h            |   4 +
 include/uapi/linux/bpf.h       |  28 +++++++
 kernel/bpf/verifier.c          | 137 +++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c       | 110 ++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  28 +++++++
 5 files changed, 307 insertions(+)

Comments

kernel test robot March 11, 2021, 12:14 a.m. UTC | #1
Hi Florent,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Florent-Revest/bpf-Add-a-ARG_PTR_TO_CONST_STR-argument-type/20210311-070306
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: openrisc-randconfig-r023-20210308 (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c06d419a45370b897b7383625f0435873a7458fa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Florent-Revest/bpf-Add-a-ARG_PTR_TO_CONST_STR-argument-type/20210311-070306
        git checkout c06d419a45370b897b7383625f0435873a7458fa
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=openrisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/bpf/verifier.c: In function 'check_func_arg':
   kernel/bpf/verifier.c:4918:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    4918 |   map_ptr = (char *)(map_addr);
         |             ^
   kernel/bpf/verifier.c: At top level:
>> kernel/bpf/verifier.c:5735:5: warning: no previous prototype for 'check_bpf_snprintf_call' [-Wmissing-prototypes]
    5735 | int check_bpf_snprintf_call(struct bpf_verifier_env *env,
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/verifier.c: In function 'check_bpf_snprintf_call':
   kernel/bpf/verifier.c:5758:8: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    5758 |  fmt = (char *)fmt_addr;
         |        ^
   In file included from include/linux/bpf_verifier.h:9,
                    from kernel/bpf/verifier.c:12:
   kernel/bpf/verifier.c: In function 'jit_subprogs':
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'unsigned int (*)(const void *, const struct bpf_insn *)' to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:11865:16: note: in expansion of macro 'BPF_CAST_CALL'
   11865 |    insn->imm = BPF_CAST_CALL(func[subprog]->bpf_func) -
         |                ^~~~~~~~~~~~~
   kernel/bpf/verifier.c: In function 'do_misc_fixups':
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'void * (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12273:17: note: in expansion of macro 'BPF_CAST_CALL'
   12273 |     insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *, void *, u64)' {aka 'int (* const)(struct bpf_map *, void *, void *, long long unsigned int)'} to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12277:17: note: in expansion of macro 'BPF_CAST_CALL'
   12277 |     insn->imm = BPF_CAST_CALL(ops->map_update_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12281:17: note: in expansion of macro 'BPF_CAST_CALL'
   12281 |     insn->imm = BPF_CAST_CALL(ops->map_delete_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *, u64)' {aka 'int (* const)(struct bpf_map *, void *, long long unsigned int)'} to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12285:17: note: in expansion of macro 'BPF_CAST_CALL'
   12285 |     insn->imm = BPF_CAST_CALL(ops->map_push_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12289:17: note: in expansion of macro 'BPF_CAST_CALL'
   12289 |     insn->imm = BPF_CAST_CALL(ops->map_pop_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12293:17: note: in expansion of macro 'BPF_CAST_CALL'
   12293 |     insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
         |                 ^~~~~~~~~~~~~
   include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, u32,  u64)' {aka 'int (* const)(struct bpf_map *, unsigned int,  long long unsigned int)'} to 'u64 (*)(u64,  u64,  u64,  u64,  u64)' {aka 'long long unsigned int (*)(long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int,  long long unsigned int)'} [-Wcast-function-type]
     363 |   ((u64 (*)(u64, u64, u64, u64, u64))(x))
         |    ^
   kernel/bpf/verifier.c:12297:17: note: in expansion of macro 'BPF_CAST_CALL'
   12297 |     insn->imm = BPF_CAST_CALL(ops->map_redirect) -
         |                 ^~~~~~~~~~~~~


vim +/check_bpf_snprintf_call +5735 kernel/bpf/verifier.c

  5734	
> 5735	int check_bpf_snprintf_call(struct bpf_verifier_env *env,
  5736				    struct bpf_reg_state *regs)
  5737	{
  5738		struct bpf_reg_state *fmt_reg = &regs[BPF_REG_3];
  5739		struct bpf_reg_state *data_len_reg = &regs[BPF_REG_5];
  5740		struct bpf_map *fmt_map = fmt_reg->map_ptr;
  5741		int err, fmt_map_off, i, fmt_cnt = 0, memcpy_cnt = 0, num_args;
  5742		u64 fmt_addr;
  5743		char *fmt;
  5744	
  5745		/* data must be an array of u64 so data_len must be a multiple of 8 */
  5746		if (data_len_reg->var_off.value & 7)
  5747			return -EINVAL;
  5748		num_args = data_len_reg->var_off.value / 8;
  5749	
  5750		/* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
  5751		 * and map_direct_value_addr is set.
  5752		 */
  5753		fmt_map_off = fmt_reg->off + fmt_reg->var_off.value;
  5754		err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr,
  5755							  fmt_map_off);
  5756		if (err)
  5757			return err;
  5758		fmt = (char *)fmt_addr;
  5759	
  5760		/* We are also guaranteed that fmt+fmt_map_off is NULL terminated, we
  5761		 * can focus on validating the format specifiers.
  5762		 */
  5763		for (i = fmt_map_off; fmt[i] != '\0'; i++) {
  5764			if ((!isprint(fmt[i]) && !isspace(fmt[i])) ||
  5765			    !isascii(fmt[i])) {
  5766				verbose(env, "only printable ascii for now\n");
  5767				return -EINVAL;
  5768			}
  5769	
  5770			if (fmt[i] != '%')
  5771				continue;
  5772	
  5773			if (fmt[i + 1] == '%') {
  5774				i++;
  5775				continue;
  5776			}
  5777	
  5778			if (fmt_cnt >= MAX_SNPRINTF_VARARGS) {
  5779				verbose(env, "too many format specifiers\n");
  5780				return -E2BIG;
  5781			}
  5782	
  5783			if (fmt_cnt >= num_args) {
  5784				verbose(env, "not enough parameters to print\n");
  5785				return -EINVAL;
  5786			}
  5787	
  5788			/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
  5789			i++;
  5790	
  5791			/* skip optional "[0 +-][num]" width formating field */
  5792			while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-' ||
  5793			       fmt[i] == ' ')
  5794				i++;
  5795			if (fmt[i] >= '1' && fmt[i] <= '9') {
  5796				i++;
  5797				while (fmt[i] >= '0' && fmt[i] <= '9')
  5798					i++;
  5799			}
  5800	
  5801			if (fmt[i] == 's') {
  5802				if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) {
  5803					verbose(env, "too many buffer copies\n");
  5804					return -E2BIG;
  5805				}
  5806	
  5807				fmt_cnt++;
  5808				memcpy_cnt++;
  5809				continue;
  5810			}
  5811	
  5812			if (fmt[i] == 'p') {
  5813				if (fmt[i + 1] == 0 ||  fmt[i + 1] == 'K' ||
  5814				    fmt[i + 1] == 'x' || fmt[i + 1] == 'B' ||
  5815				    fmt[i + 1] == 's' || fmt[i + 1] == 'S') {
  5816					fmt_cnt++;
  5817					continue;
  5818				}
  5819	
  5820				/* only support "%pI4", "%pi4", "%pI6" and "%pi6". */
  5821				if (fmt[i + 1] != 'i' && fmt[i + 1] != 'I') {
  5822					verbose(env, "invalid specifier %%p%c\n",
  5823						fmt[i+1]);
  5824					return -EINVAL;
  5825				}
  5826				if (fmt[i + 2] != '4' && fmt[i + 2] != '6') {
  5827					verbose(env, "invalid specifier %%p%c%c\n",
  5828						fmt[i+1], fmt[i+2]);
  5829					return -EINVAL;
  5830				}
  5831	
  5832				if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) {
  5833					verbose(env, "too many buffer copies\n");
  5834					return -E2BIG;
  5835				}
  5836	
  5837				i += 2;
  5838				fmt_cnt++;
  5839				memcpy_cnt++;
  5840				continue;
  5841			}
  5842	
  5843			if (fmt[i] == 'l') {
  5844				i++;
  5845				if (fmt[i] == 'l')
  5846					i++;
  5847			}
  5848	
  5849			if (fmt[i] != 'i' && fmt[i] != 'd' && fmt[i] != 'u' &&
  5850			    fmt[i] != 'x' && fmt[i] != 'X') {
  5851				verbose(env, "invalid format specifier %%%c\n", fmt[i]);
  5852				return -EINVAL;
  5853			}
  5854	
  5855			fmt_cnt++;
  5856		}
  5857	
  5858		if (fmt_cnt != num_args) {
  5859			verbose(env, "too many parameters to print\n");
  5860			return -EINVAL;
  5861		}
  5862	
  5863		return 0;
  5864	}
  5865	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot March 11, 2021, 3:12 a.m. UTC | #2
Hi Florent,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Florent-Revest/bpf-Add-a-ARG_PTR_TO_CONST_STR-argument-type/20210311-070306
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-s022-20210310 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-262-g5e674421-dirty
        # https://github.com/0day-ci/linux/commit/c06d419a45370b897b7383625f0435873a7458fa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Florent-Revest/bpf-Add-a-ARG_PTR_TO_CONST_STR-argument-type/20210311-070306
        git checkout c06d419a45370b897b7383625f0435873a7458fa
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> kernel/bpf/verifier.c:5735:5: sparse: sparse: symbol 'check_bpf_snprintf_call' was not declared. Should it be static?
   kernel/bpf/verifier.c:11865:76: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12273:81: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12277:81: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12281:81: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12285:79: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12289:78: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12293:79: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12297:78: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c:12341:38: sparse: sparse: subtraction of functions? Share your drugs

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Andrii Nakryiko March 16, 2021, 1:25 a.m. UTC | #3
On Wed, Mar 10, 2021 at 2:02 PM Florent Revest <revest@chromium.org> wrote:
>
> The implementation takes inspiration from the existing bpf_trace_printk
> helper but there are a few differences:
>
> To allow for a large number of format-specifiers, parameters are
> provided in an array, like in bpf_seq_printf.
>
> Because the output string takes two arguments and the array of
> parameters also takes two arguments, the format string needs to fit in
> one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a
> NULL-terminated read-only map, we don't need a format string length arg.
>
> Because the format-string is known at verification time, we also move
> most of the format string validation, currently done in formatting
> helper calls, into the verifier logic. This makes debugging easier and
> also slightly improves the runtime performance.
>
> Signed-off-by: Florent Revest <revest@chromium.org>
> ---
>  include/linux/bpf.h            |   4 +
>  include/uapi/linux/bpf.h       |  28 +++++++
>  kernel/bpf/verifier.c          | 137 +++++++++++++++++++++++++++++++++
>  kernel/trace/bpf_trace.c       | 110 ++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  28 +++++++
>  5 files changed, 307 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 7b5319d75b3e..d78175c9a887 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1902,6 +1902,10 @@ extern const struct bpf_func_proto bpf_task_storage_get_proto;
>  extern const struct bpf_func_proto bpf_task_storage_delete_proto;
>  extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
>
> +#define MAX_SNPRINTF_VARARGS           12
> +#define MAX_SNPRINTF_MEMCPY            6
> +#define MAX_SNPRINTF_STR_LEN           128
> +
>  const struct bpf_func_proto *bpf_tracing_func_proto(
>         enum bpf_func_id func_id, const struct bpf_prog *prog);
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2d3036e292a9..3cbdc8ae00e7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4660,6 +4660,33 @@ union bpf_attr {
>   *     Return
>   *             The number of traversed map elements for success, **-EINVAL** for
>   *             invalid **flags**.
> + *
> + * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)

bpf_snprintf_btf calls out and out_size str and str_size, let's be consistent?

> + *     Description
> + *             Outputs a string into the **out** buffer of size **out_size**
> + *             based on a format string stored in a read-only map pointed by
> + *             **fmt**.
> + *
> + *             Each format specifier in **fmt** corresponds to one u64 element
> + *             in the **data** array. For strings and pointers where pointees
> + *             are accessed, only the pointer values are stored in the *data*
> + *             array. The *data_len* is the size of *data* in bytes.
> + *
> + *             Formats **%s** and **%p{i,I}{4,6}** require to read kernel
> + *             memory. Reading kernel memory may fail due to either invalid
> + *             address or valid address but requiring a major memory fault. If
> + *             reading kernel memory fails, the string for **%s** will be an
> + *             empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
> + *             Not returning error to bpf program is consistent with what
> + *             **bpf_trace_printk**\ () does for now.
> + *
> + *     Return
> + *             The strictly positive length of the printed string, including
> + *             the trailing NUL character. If the return value is greater than
> + *             **out_size**, **out** contains a truncated string, without a
> + *             trailing NULL character.

this deviates from the behavior in other BPF helpers dealing with
strings. and it's extremely inconvenient for users to get
non-zero-terminated string. I think we should always zero-terminate.

> + *
> + *             Or **-EBUSY** if the per-CPU memory copy buffer is busy.
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -4827,6 +4854,7 @@ union bpf_attr {
>         FN(sock_from_file),             \
>         FN(check_mtu),                  \
>         FN(for_each_map_elem),          \
> +       FN(snprintf),                   \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c99b2b67dc8d..3ab549df817b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5732,6 +5732,137 @@ static int check_reference_leak(struct bpf_verifier_env *env)
>         return state->acquired_refs ? -EINVAL : 0;
>  }
>
> +int check_bpf_snprintf_call(struct bpf_verifier_env *env,
> +                           struct bpf_reg_state *regs)
> +{

can we please extra the printf format string parsing/checking logic
and re-use them across all functions? We now have at least 4 variants
of it, it's not great to say the least. I hope it's possible to
generalize it in such a way that the same function will parse the
string, and will record each expected argument and it's type, with
whatever extra flags we need to. That should make the printing part
simpler as well, as it will just follow "directions" from the parsing
part? Devil is in the details, of course :) But it's worthwhile to try
at least.

> +       struct bpf_reg_state *fmt_reg = &regs[BPF_REG_3];
> +       struct bpf_reg_state *data_len_reg = &regs[BPF_REG_5];
> +       struct bpf_map *fmt_map = fmt_reg->map_ptr;
> +       int err, fmt_map_off, i, fmt_cnt = 0, memcpy_cnt = 0, num_args;
> +       u64 fmt_addr;
> +       char *fmt;
> +
> +       /* data must be an array of u64 so data_len must be a multiple of 8 */
> +       if (data_len_reg->var_off.value & 7)
> +               return -EINVAL;
> +       num_args = data_len_reg->var_off.value / 8;
> +
> +       /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
> +        * and map_direct_value_addr is set.
> +        */
> +       fmt_map_off = fmt_reg->off + fmt_reg->var_off.value;
> +       err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr,
> +                                                 fmt_map_off);
> +       if (err)
> +               return err;
> +       fmt = (char *)fmt_addr;
> +

[...] not fun to read this part over and over :)

> +       }
> +
> +       return 0;
> +}
> +
>  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                              int *insn_idx_p)
>  {
> @@ -5846,6 +5977,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                         return -EINVAL;
>         }
>
> +       if (func_id == BPF_FUNC_snprintf) {
> +               err = check_bpf_snprintf_call(env, regs);
> +               if (err < 0)
> +                       return err;
> +       }
> +
>         /* reset caller saved regs */
>         for (i = 0; i < CALLER_SAVED_REGS; i++) {
>                 mark_reg_not_init(env, regs, caller_saved[i]);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 0d23755c2747..7b80759c10a9 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1271,6 +1271,114 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
>         .arg5_type      = ARG_ANYTHING,
>  };
>
> +struct bpf_snprintf_buf {
> +       char buf[MAX_SNPRINTF_MEMCPY][MAX_SNPRINTF_STR_LEN];
> +};
> +static DEFINE_PER_CPU(struct bpf_snprintf_buf, bpf_snprintf_buf);
> +static DEFINE_PER_CPU(int, bpf_snprintf_buf_used);
> +
> +BPF_CALL_5(bpf_snprintf, char *, out, u32, out_size, char *, fmt, u64 *, args,
> +          u32, args_len)
> +{
> +       int err, i, buf_used, copy_size, fmt_cnt = 0, memcpy_cnt = 0;
> +       u64 params[MAX_SNPRINTF_VARARGS];
> +       struct bpf_snprintf_buf *bufs;
> +
> +       buf_used = this_cpu_inc_return(bpf_snprintf_buf_used);
> +       if (WARN_ON_ONCE(buf_used > 1)) {
> +               err = -EBUSY;
> +               goto out;
> +       }
> +
> +       bufs = this_cpu_ptr(&bpf_snprintf_buf);
> +
> +       /*
> +        * The verifier has already done most of the heavy-work for us in
> +        * check_bpf_snprintf_call. We know that fmt is well formatted and that
> +        * args_len is valid. The only task left is to convert some of the
> +        * arguments. For the %s and %pi* specifiers, we need to read buffers
> +        * from a kernel address during the helper call.
> +        */
> +       for (i = 0; fmt[i] != '\0'; i++) {

same function should hopefully be reused here

> +       }
> +
> +       /* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give
> +        * all of them to snprintf().
> +        */
> +       err = snprintf(out, out_size, fmt, params[0], params[1], params[2],
> +                      params[3], params[4], params[5], params[6], params[7],
> +                      params[8], params[9], params[10], params[11]) + 1;
> +
> +out:
> +       this_cpu_dec(bpf_snprintf_buf_used);
> +       return err;
> +}
> +
> +static const struct bpf_func_proto bpf_snprintf_proto = {
> +       .func           = bpf_snprintf,
> +       .gpl_only       = true,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_PTR_TO_MEM,
> +       .arg2_type      = ARG_CONST_SIZE,

can we mark is CONST_SIZE_OR_ZERO and just do nothing on zero at
runtime? I still have scars from having to deal (prove, actually) with
ARG_CONST_SIZE (> 0) limitations in perf_event_output. No need to make
anyone's life harder, if it's easy to just do something sensible on
zero (i.e., do nothing, but emit desired amount of bytes).

> +       .arg3_type      = ARG_PTR_TO_CONST_STR,
> +       .arg4_type      = ARG_PTR_TO_MEM,
> +       .arg5_type      = ARG_CONST_SIZE_OR_ZERO,
> +};
> +
>  const struct bpf_func_proto *
>  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -1373,6 +1481,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_task_storage_delete_proto;
>         case BPF_FUNC_for_each_map_elem:
>                 return &bpf_for_each_map_elem_proto;
> +       case BPF_FUNC_snprintf:
> +               return &bpf_snprintf_proto;

why just tracing? can't all BPF programs use this functionality?

>         default:
>                 return NULL;
>         }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 2d3036e292a9..3cbdc8ae00e7 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -4660,6 +4660,33 @@ union bpf_attr {
>   *     Return
>   *             The number of traversed map elements for success, **-EINVAL** for
>   *             invalid **flags**.
> + *
> + * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)
> + *     Description
> + *             Outputs a string into the **out** buffer of size **out_size**
> + *             based on a format string stored in a read-only map pointed by
> + *             **fmt**.
> + *
> + *             Each format specifier in **fmt** corresponds to one u64 element
> + *             in the **data** array. For strings and pointers where pointees
> + *             are accessed, only the pointer values are stored in the *data*
> + *             array. The *data_len* is the size of *data* in bytes.
> + *
> + *             Formats **%s** and **%p{i,I}{4,6}** require to read kernel
> + *             memory. Reading kernel memory may fail due to either invalid
> + *             address or valid address but requiring a major memory fault. If
> + *             reading kernel memory fails, the string for **%s** will be an
> + *             empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
> + *             Not returning error to bpf program is consistent with what
> + *             **bpf_trace_printk**\ () does for now.
> + *
> + *     Return
> + *             The strictly positive length of the printed string, including
> + *             the trailing NUL character. If the return value is greater than
> + *             **out_size**, **out** contains a truncated string, without a
> + *             trailing NULL character.
> + *
> + *             Or **-EBUSY** if the per-CPU memory copy buffer is busy.
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -4827,6 +4854,7 @@ union bpf_attr {
>         FN(sock_from_file),             \
>         FN(check_mtu),                  \
>         FN(for_each_map_elem),          \
> +       FN(snprintf),                   \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> --
> 2.30.1.766.gb4fecdf3b7-goog
>
Florent Revest March 16, 2021, 1:18 p.m. UTC | #4
On Tue, Mar 16, 2021 at 2:25 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Mar 10, 2021 at 2:02 PM Florent Revest <revest@chromium.org> wrote:
> >
> > The implementation takes inspiration from the existing bpf_trace_printk
> > helper but there are a few differences:
> >
> > To allow for a large number of format-specifiers, parameters are
> > provided in an array, like in bpf_seq_printf.
> >
> > Because the output string takes two arguments and the array of
> > parameters also takes two arguments, the format string needs to fit in
> > one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a
> > NULL-terminated read-only map, we don't need a format string length arg.
> >
> > Because the format-string is known at verification time, we also move
> > most of the format string validation, currently done in formatting
> > helper calls, into the verifier logic. This makes debugging easier and
> > also slightly improves the runtime performance.
> >
> > Signed-off-by: Florent Revest <revest@chromium.org>
> > ---
> >  include/linux/bpf.h            |   4 +
> >  include/uapi/linux/bpf.h       |  28 +++++++
> >  kernel/bpf/verifier.c          | 137 +++++++++++++++++++++++++++++++++
> >  kernel/trace/bpf_trace.c       | 110 ++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  28 +++++++
> >  5 files changed, 307 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 7b5319d75b3e..d78175c9a887 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1902,6 +1902,10 @@ extern const struct bpf_func_proto bpf_task_storage_get_proto;
> >  extern const struct bpf_func_proto bpf_task_storage_delete_proto;
> >  extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
> >
> > +#define MAX_SNPRINTF_VARARGS           12
> > +#define MAX_SNPRINTF_MEMCPY            6
> > +#define MAX_SNPRINTF_STR_LEN           128
> > +
> >  const struct bpf_func_proto *bpf_tracing_func_proto(
> >         enum bpf_func_id func_id, const struct bpf_prog *prog);
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 2d3036e292a9..3cbdc8ae00e7 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -4660,6 +4660,33 @@ union bpf_attr {
> >   *     Return
> >   *             The number of traversed map elements for success, **-EINVAL** for
> >   *             invalid **flags**.
> > + *
> > + * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)
>
> bpf_snprintf_btf calls out and out_size str and str_size, let's be consistent?
>
> > + *     Description
> > + *             Outputs a string into the **out** buffer of size **out_size**
> > + *             based on a format string stored in a read-only map pointed by
> > + *             **fmt**.
> > + *
> > + *             Each format specifier in **fmt** corresponds to one u64 element
> > + *             in the **data** array. For strings and pointers where pointees
> > + *             are accessed, only the pointer values are stored in the *data*
> > + *             array. The *data_len* is the size of *data* in bytes.
> > + *
> > + *             Formats **%s** and **%p{i,I}{4,6}** require to read kernel
> > + *             memory. Reading kernel memory may fail due to either invalid
> > + *             address or valid address but requiring a major memory fault. If
> > + *             reading kernel memory fails, the string for **%s** will be an
> > + *             empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
> > + *             Not returning error to bpf program is consistent with what
> > + *             **bpf_trace_printk**\ () does for now.
> > + *
> > + *     Return
> > + *             The strictly positive length of the printed string, including
> > + *             the trailing NUL character. If the return value is greater than
> > + *             **out_size**, **out** contains a truncated string, without a
> > + *             trailing NULL character.
>
> this deviates from the behavior in other BPF helpers dealing with
> strings. and it's extremely inconvenient for users to get
> non-zero-terminated string. I think we should always zero-terminate.
>
> > + *
> > + *             Or **-EBUSY** if the per-CPU memory copy buffer is busy.
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -4827,6 +4854,7 @@ union bpf_attr {
> >         FN(sock_from_file),             \
> >         FN(check_mtu),                  \
> >         FN(for_each_map_elem),          \
> > +       FN(snprintf),                   \
> >         /* */
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c99b2b67dc8d..3ab549df817b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -5732,6 +5732,137 @@ static int check_reference_leak(struct bpf_verifier_env *env)
> >         return state->acquired_refs ? -EINVAL : 0;
> >  }
> >
> > +int check_bpf_snprintf_call(struct bpf_verifier_env *env,
> > +                           struct bpf_reg_state *regs)
> > +{
>
> can we please extra the printf format string parsing/checking logic
> and re-use them across all functions? We now have at least 4 variants
> of it, it's not great to say the least. I hope it's possible to
> generalize it in such a way that the same function will parse the
> string, and will record each expected argument and it's type, with
> whatever extra flags we need to. That should make the printing part
> simpler as well, as it will just follow "directions" from the parsing
> part? Devil is in the details, of course :) But it's worthwhile to try
> at least.

Eheh this is gonna be fun, I'll try it out and see if I can come up
with something ~decent. :)

Thanks for the thorough review! I agree with all your points and will
address them in v2.

> > +       struct bpf_reg_state *fmt_reg = &regs[BPF_REG_3];
> > +       struct bpf_reg_state *data_len_reg = &regs[BPF_REG_5];
> > +       struct bpf_map *fmt_map = fmt_reg->map_ptr;
> > +       int err, fmt_map_off, i, fmt_cnt = 0, memcpy_cnt = 0, num_args;
> > +       u64 fmt_addr;
> > +       char *fmt;
> > +
> > +       /* data must be an array of u64 so data_len must be a multiple of 8 */
> > +       if (data_len_reg->var_off.value & 7)
> > +               return -EINVAL;
> > +       num_args = data_len_reg->var_off.value / 8;
> > +
> > +       /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
> > +        * and map_direct_value_addr is set.
> > +        */
> > +       fmt_map_off = fmt_reg->off + fmt_reg->var_off.value;
> > +       err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr,
> > +                                                 fmt_map_off);
> > +       if (err)
> > +               return err;
> > +       fmt = (char *)fmt_addr;
> > +
>
> [...] not fun to read this part over and over :)
>
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                              int *insn_idx_p)
> >  {
> > @@ -5846,6 +5977,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                         return -EINVAL;
> >         }
> >
> > +       if (func_id == BPF_FUNC_snprintf) {
> > +               err = check_bpf_snprintf_call(env, regs);
> > +               if (err < 0)
> > +                       return err;
> > +       }
> > +
> >         /* reset caller saved regs */
> >         for (i = 0; i < CALLER_SAVED_REGS; i++) {
> >                 mark_reg_not_init(env, regs, caller_saved[i]);
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 0d23755c2747..7b80759c10a9 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1271,6 +1271,114 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
> >         .arg5_type      = ARG_ANYTHING,
> >  };
> >
> > +struct bpf_snprintf_buf {
> > +       char buf[MAX_SNPRINTF_MEMCPY][MAX_SNPRINTF_STR_LEN];
> > +};
> > +static DEFINE_PER_CPU(struct bpf_snprintf_buf, bpf_snprintf_buf);
> > +static DEFINE_PER_CPU(int, bpf_snprintf_buf_used);
> > +
> > +BPF_CALL_5(bpf_snprintf, char *, out, u32, out_size, char *, fmt, u64 *, args,
> > +          u32, args_len)
> > +{
> > +       int err, i, buf_used, copy_size, fmt_cnt = 0, memcpy_cnt = 0;
> > +       u64 params[MAX_SNPRINTF_VARARGS];
> > +       struct bpf_snprintf_buf *bufs;
> > +
> > +       buf_used = this_cpu_inc_return(bpf_snprintf_buf_used);
> > +       if (WARN_ON_ONCE(buf_used > 1)) {
> > +               err = -EBUSY;
> > +               goto out;
> > +       }
> > +
> > +       bufs = this_cpu_ptr(&bpf_snprintf_buf);
> > +
> > +       /*
> > +        * The verifier has already done most of the heavy-work for us in
> > +        * check_bpf_snprintf_call. We know that fmt is well formatted and that
> > +        * args_len is valid. The only task left is to convert some of the
> > +        * arguments. For the %s and %pi* specifiers, we need to read buffers
> > +        * from a kernel address during the helper call.
> > +        */
> > +       for (i = 0; fmt[i] != '\0'; i++) {
>
> same function should hopefully be reused here
>
> > +       }
> > +
> > +       /* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give
> > +        * all of them to snprintf().
> > +        */
> > +       err = snprintf(out, out_size, fmt, params[0], params[1], params[2],
> > +                      params[3], params[4], params[5], params[6], params[7],
> > +                      params[8], params[9], params[10], params[11]) + 1;
> > +
> > +out:
> > +       this_cpu_dec(bpf_snprintf_buf_used);
> > +       return err;
> > +}
> > +
> > +static const struct bpf_func_proto bpf_snprintf_proto = {
> > +       .func           = bpf_snprintf,
> > +       .gpl_only       = true,
> > +       .ret_type       = RET_INTEGER,
> > +       .arg1_type      = ARG_PTR_TO_MEM,
> > +       .arg2_type      = ARG_CONST_SIZE,
>
> can we mark is CONST_SIZE_OR_ZERO and just do nothing on zero at
> runtime? I still have scars from having to deal (prove, actually) with
> ARG_CONST_SIZE (> 0) limitations in perf_event_output. No need to make
> anyone's life harder, if it's easy to just do something sensible on
> zero (i.e., do nothing, but emit desired amount of bytes).
>
> > +       .arg3_type      = ARG_PTR_TO_CONST_STR,
> > +       .arg4_type      = ARG_PTR_TO_MEM,
> > +       .arg5_type      = ARG_CONST_SIZE_OR_ZERO,
> > +};
> > +
> >  const struct bpf_func_proto *
> >  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >  {
> > @@ -1373,6 +1481,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >                 return &bpf_task_storage_delete_proto;
> >         case BPF_FUNC_for_each_map_elem:
> >                 return &bpf_for_each_map_elem_proto;
> > +       case BPF_FUNC_snprintf:
> > +               return &bpf_snprintf_proto;
>
> why just tracing? can't all BPF programs use this functionality?
>
> >         default:
> >                 return NULL;
> >         }
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 2d3036e292a9..3cbdc8ae00e7 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -4660,6 +4660,33 @@ union bpf_attr {
> >   *     Return
> >   *             The number of traversed map elements for success, **-EINVAL** for
> >   *             invalid **flags**.
> > + *
> > + * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)
> > + *     Description
> > + *             Outputs a string into the **out** buffer of size **out_size**
> > + *             based on a format string stored in a read-only map pointed by
> > + *             **fmt**.
> > + *
> > + *             Each format specifier in **fmt** corresponds to one u64 element
> > + *             in the **data** array. For strings and pointers where pointees
> > + *             are accessed, only the pointer values are stored in the *data*
> > + *             array. The *data_len* is the size of *data* in bytes.
> > + *
> > + *             Formats **%s** and **%p{i,I}{4,6}** require to read kernel
> > + *             memory. Reading kernel memory may fail due to either invalid
> > + *             address or valid address but requiring a major memory fault. If
> > + *             reading kernel memory fails, the string for **%s** will be an
> > + *             empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
> > + *             Not returning error to bpf program is consistent with what
> > + *             **bpf_trace_printk**\ () does for now.
> > + *
> > + *     Return
> > + *             The strictly positive length of the printed string, including
> > + *             the trailing NUL character. If the return value is greater than
> > + *             **out_size**, **out** contains a truncated string, without a
> > + *             trailing NULL character.
> > + *
> > + *             Or **-EBUSY** if the per-CPU memory copy buffer is busy.
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -4827,6 +4854,7 @@ union bpf_attr {
> >         FN(sock_from_file),             \
> >         FN(check_mtu),                  \
> >         FN(for_each_map_elem),          \
> > +       FN(snprintf),                   \
> >         /* */
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > --
> > 2.30.1.766.gb4fecdf3b7-goog
> >
Alexei Starovoitov March 23, 2021, 3:21 a.m. UTC | #5
On Wed, Mar 10, 2021 at 11:02:08PM +0100, Florent Revest wrote:
>  
> +struct bpf_snprintf_buf {
> +	char buf[MAX_SNPRINTF_MEMCPY][MAX_SNPRINTF_STR_LEN];
> +};
> +static DEFINE_PER_CPU(struct bpf_snprintf_buf, bpf_snprintf_buf);
> +static DEFINE_PER_CPU(int, bpf_snprintf_buf_used);
> +
> +BPF_CALL_5(bpf_snprintf, char *, out, u32, out_size, char *, fmt, u64 *, args,
> +	   u32, args_len)
> +{
> +	int err, i, buf_used, copy_size, fmt_cnt = 0, memcpy_cnt = 0;
> +	u64 params[MAX_SNPRINTF_VARARGS];
> +	struct bpf_snprintf_buf *bufs;
> +
> +	buf_used = this_cpu_inc_return(bpf_snprintf_buf_used);
> +	if (WARN_ON_ONCE(buf_used > 1)) {

this can trigger only if the helper itself gets preempted and
another bpf prog will run on the same cpu and will call into this helper
again, right?
If so, how about adding preempt_disable here to avoid this case?
It won't prevent the case where kprobe is inside snprintf core,
so the counter is still needed, but it wouldn't trigger by accident.
Also since bufs are not used always, how about grabbing the
buffers only when %p or %s are seen in fmt?
After snprintf() is done it would conditionally do:
if (bufs_were_used) {
   this_cpu_dec(bpf_snprintf_buf_used);
   preempt_enable();
}
This way simple bpf_snprintf won't ever hit EBUSY.

> +		err = -EBUSY;
> +		goto out;
> +	}
> +
> +	bufs = this_cpu_ptr(&bpf_snprintf_buf);
> +
> +	/*
> +	 * The verifier has already done most of the heavy-work for us in
> +	 * check_bpf_snprintf_call. We know that fmt is well formatted and that
> +	 * args_len is valid. The only task left is to convert some of the
> +	 * arguments. For the %s and %pi* specifiers, we need to read buffers
> +	 * from a kernel address during the helper call.
> +	 */
> +	for (i = 0; fmt[i] != '\0'; i++) {
> +		if (fmt[i] != '%')
> +			continue;
> +
> +		if (fmt[i + 1] == '%') {
> +			i++;
> +			continue;
> +		}
> +
> +		/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
> +		i++;
> +
> +		/* skip optional "[0 +-][num]" width formating field */
> +		while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-' ||
> +		       fmt[i] == ' ')
> +			i++;
> +		if (fmt[i] >= '1' && fmt[i] <= '9') {
> +			i++;
> +			while (fmt[i] >= '0' && fmt[i] <= '9')
> +				i++;
> +		}
> +
> +		if (fmt[i] == 's') {
> +			void *unsafe_ptr = (void *)(long)args[fmt_cnt];
> +
> +			err = strncpy_from_kernel_nofault(bufs->buf[memcpy_cnt],
> +							  unsafe_ptr,
> +							  MAX_SNPRINTF_STR_LEN);
> +			if (err < 0)
> +				bufs->buf[memcpy_cnt][0] = '\0';
> +			params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];

how about:
char buf[512]; instead?
instead of memcpy_cnt++ remember how many bytes of the buf were used and
copy next arg after that.
The scratch space would be used more efficiently.
The helper would potentially return ENOSPC if the first string printed via %s
consumed most of the 512 space and the second string doesn't fit.
But the verifier-time if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) can be removed.
Ten small %s will work fine.

We can allocate a page per-cpu when this helper is used by prog and free
that page when all progs with bpf_snprintf are unloaded.
But extra complexity is probably not worth it. I would start with 512 per-cpu.
It's going to be enough for most users.

Overall looks great. Cannot wait for v2 :)
Florent Revest March 23, 2021, 2:04 p.m. UTC | #6
On Tue, Mar 23, 2021 at 4:21 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Mar 10, 2021 at 11:02:08PM +0100, Florent Revest wrote:
> >
> > +struct bpf_snprintf_buf {
> > +     char buf[MAX_SNPRINTF_MEMCPY][MAX_SNPRINTF_STR_LEN];
> > +};
> > +static DEFINE_PER_CPU(struct bpf_snprintf_buf, bpf_snprintf_buf);
> > +static DEFINE_PER_CPU(int, bpf_snprintf_buf_used);
> > +
> > +BPF_CALL_5(bpf_snprintf, char *, out, u32, out_size, char *, fmt, u64 *, args,
> > +        u32, args_len)
> > +{
> > +     int err, i, buf_used, copy_size, fmt_cnt = 0, memcpy_cnt = 0;
> > +     u64 params[MAX_SNPRINTF_VARARGS];
> > +     struct bpf_snprintf_buf *bufs;
> > +
> > +     buf_used = this_cpu_inc_return(bpf_snprintf_buf_used);
> > +     if (WARN_ON_ONCE(buf_used > 1)) {
>
> this can trigger only if the helper itself gets preempted and
> another bpf prog will run on the same cpu and will call into this helper
> again, right?
> If so, how about adding preempt_disable here to avoid this case?

Ah, neat, that sounds like a good idea indeed. This was really just
cargo-culted from bpf_seq_printf but as part of my grand unification
attempt for the various printf-like helpers, I can try to make it use
preempt_disable as well yes.

> It won't prevent the case where kprobe is inside snprintf core,
> so the counter is still needed, but it wouldn't trigger by accident.

Good point, I will keep it around then.

> Also since bufs are not used always, how about grabbing the
> buffers only when %p or %s are seen in fmt?
> After snprintf() is done it would conditionally do:
> if (bufs_were_used) {
>    this_cpu_dec(bpf_snprintf_buf_used);
>    preempt_enable();
> }
> This way simple bpf_snprintf won't ever hit EBUSY.

Absolutely, it would be nice. :)

> > +             err = -EBUSY;
> > +             goto out;
> > +     }
> > +
> > +     bufs = this_cpu_ptr(&bpf_snprintf_buf);
> > +
> > +     /*
> > +      * The verifier has already done most of the heavy-work for us in
> > +      * check_bpf_snprintf_call. We know that fmt is well formatted and that
> > +      * args_len is valid. The only task left is to convert some of the
> > +      * arguments. For the %s and %pi* specifiers, we need to read buffers
> > +      * from a kernel address during the helper call.
> > +      */
> > +     for (i = 0; fmt[i] != '\0'; i++) {
> > +             if (fmt[i] != '%')
> > +                     continue;
> > +
> > +             if (fmt[i + 1] == '%') {
> > +                     i++;
> > +                     continue;
> > +             }
> > +
> > +             /* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
> > +             i++;
> > +
> > +             /* skip optional "[0 +-][num]" width formating field */
> > +             while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-' ||
> > +                    fmt[i] == ' ')
> > +                     i++;
> > +             if (fmt[i] >= '1' && fmt[i] <= '9') {
> > +                     i++;
> > +                     while (fmt[i] >= '0' && fmt[i] <= '9')
> > +                             i++;
> > +             }
> > +
> > +             if (fmt[i] == 's') {
> > +                     void *unsafe_ptr = (void *)(long)args[fmt_cnt];
> > +
> > +                     err = strncpy_from_kernel_nofault(bufs->buf[memcpy_cnt],
> > +                                                       unsafe_ptr,
> > +                                                       MAX_SNPRINTF_STR_LEN);
> > +                     if (err < 0)
> > +                             bufs->buf[memcpy_cnt][0] = '\0';
> > +                     params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
>
> how about:
> char buf[512]; instead?
> instead of memcpy_cnt++ remember how many bytes of the buf were used and
> copy next arg after that.
> The scratch space would be used more efficiently.
> The helper would potentially return ENOSPC if the first string printed via %s
> consumed most of the 512 space and the second string doesn't fit.
> But the verifier-time if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) can be removed.
> Ten small %s will work fine.

Cool! That is also a good idea :)

> We can allocate a page per-cpu when this helper is used by prog and free
> that page when all progs with bpf_snprintf are unloaded.
> But extra complexity is probably not worth it. I would start with 512 per-cpu.
> It's going to be enough for most users.

Yes, let's maybe keep that for later. I think there is already enough
complexity going into the printf-like helpers unification patch.

> Overall looks great. Cannot wait for v2 :)

Ahah wait until you see that patch! :D
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7b5319d75b3e..d78175c9a887 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1902,6 +1902,10 @@  extern const struct bpf_func_proto bpf_task_storage_get_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 
+#define MAX_SNPRINTF_VARARGS		12
+#define MAX_SNPRINTF_MEMCPY		6
+#define MAX_SNPRINTF_STR_LEN		128
+
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2d3036e292a9..3cbdc8ae00e7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4660,6 +4660,33 @@  union bpf_attr {
  *	Return
  *		The number of traversed map elements for success, **-EINVAL** for
  *		invalid **flags**.
+ *
+ * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)
+ *	Description
+ *		Outputs a string into the **out** buffer of size **out_size**
+ *		based on a format string stored in a read-only map pointed by
+ *		**fmt**.
+ *
+ *		Each format specifier in **fmt** corresponds to one u64 element
+ *		in the **data** array. For strings and pointers where pointees
+ *		are accessed, only the pointer values are stored in the *data*
+ *		array. The *data_len* is the size of *data* in bytes.
+ *
+ *		Formats **%s** and **%p{i,I}{4,6}** require to read kernel
+ *		memory. Reading kernel memory may fail due to either invalid
+ *		address or valid address but requiring a major memory fault. If
+ *		reading kernel memory fails, the string for **%s** will be an
+ *		empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
+ *		Not returning error to bpf program is consistent with what
+ *		**bpf_trace_printk**\ () does for now.
+ *
+ *	Return
+ *		The strictly positive length of the printed string, including
+ *		the trailing NUL character. If the return value is greater than
+ *		**out_size**, **out** contains a truncated string, without a
+ *		trailing NULL character.
+ *
+ *		Or **-EBUSY** if the per-CPU memory copy buffer is busy.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4827,6 +4854,7 @@  union bpf_attr {
 	FN(sock_from_file),		\
 	FN(check_mtu),			\
 	FN(for_each_map_elem),		\
+	FN(snprintf),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c99b2b67dc8d..3ab549df817b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5732,6 +5732,137 @@  static int check_reference_leak(struct bpf_verifier_env *env)
 	return state->acquired_refs ? -EINVAL : 0;
 }
 
+int check_bpf_snprintf_call(struct bpf_verifier_env *env,
+			    struct bpf_reg_state *regs)
+{
+	struct bpf_reg_state *fmt_reg = &regs[BPF_REG_3];
+	struct bpf_reg_state *data_len_reg = &regs[BPF_REG_5];
+	struct bpf_map *fmt_map = fmt_reg->map_ptr;
+	int err, fmt_map_off, i, fmt_cnt = 0, memcpy_cnt = 0, num_args;
+	u64 fmt_addr;
+	char *fmt;
+
+	/* data must be an array of u64 so data_len must be a multiple of 8 */
+	if (data_len_reg->var_off.value & 7)
+		return -EINVAL;
+	num_args = data_len_reg->var_off.value / 8;
+
+	/* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
+	 * and map_direct_value_addr is set.
+	 */
+	fmt_map_off = fmt_reg->off + fmt_reg->var_off.value;
+	err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr,
+						  fmt_map_off);
+	if (err)
+		return err;
+	fmt = (char *)fmt_addr;
+
+	/* We are also guaranteed that fmt+fmt_map_off is NULL terminated, we
+	 * can focus on validating the format specifiers.
+	 */
+	for (i = fmt_map_off; fmt[i] != '\0'; i++) {
+		if ((!isprint(fmt[i]) && !isspace(fmt[i])) ||
+		    !isascii(fmt[i])) {
+			verbose(env, "only printable ascii for now\n");
+			return -EINVAL;
+		}
+
+		if (fmt[i] != '%')
+			continue;
+
+		if (fmt[i + 1] == '%') {
+			i++;
+			continue;
+		}
+
+		if (fmt_cnt >= MAX_SNPRINTF_VARARGS) {
+			verbose(env, "too many format specifiers\n");
+			return -E2BIG;
+		}
+
+		if (fmt_cnt >= num_args) {
+			verbose(env, "not enough parameters to print\n");
+			return -EINVAL;
+		}
+
+		/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
+		i++;
+
+		/* skip optional "[0 +-][num]" width formating field */
+		while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-' ||
+		       fmt[i] == ' ')
+			i++;
+		if (fmt[i] >= '1' && fmt[i] <= '9') {
+			i++;
+			while (fmt[i] >= '0' && fmt[i] <= '9')
+				i++;
+		}
+
+		if (fmt[i] == 's') {
+			if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) {
+				verbose(env, "too many buffer copies\n");
+				return -E2BIG;
+			}
+
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		if (fmt[i] == 'p') {
+			if (fmt[i + 1] == 0 ||  fmt[i + 1] == 'K' ||
+			    fmt[i + 1] == 'x' || fmt[i + 1] == 'B' ||
+			    fmt[i + 1] == 's' || fmt[i + 1] == 'S') {
+				fmt_cnt++;
+				continue;
+			}
+
+			/* only support "%pI4", "%pi4", "%pI6" and "%pi6". */
+			if (fmt[i + 1] != 'i' && fmt[i + 1] != 'I') {
+				verbose(env, "invalid specifier %%p%c\n",
+					fmt[i+1]);
+				return -EINVAL;
+			}
+			if (fmt[i + 2] != '4' && fmt[i + 2] != '6') {
+				verbose(env, "invalid specifier %%p%c%c\n",
+					fmt[i+1], fmt[i+2]);
+				return -EINVAL;
+			}
+
+			if (memcpy_cnt >= MAX_SNPRINTF_MEMCPY) {
+				verbose(env, "too many buffer copies\n");
+				return -E2BIG;
+			}
+
+			i += 2;
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		if (fmt[i] == 'l') {
+			i++;
+			if (fmt[i] == 'l')
+				i++;
+		}
+
+		if (fmt[i] != 'i' && fmt[i] != 'd' && fmt[i] != 'u' &&
+		    fmt[i] != 'x' && fmt[i] != 'X') {
+			verbose(env, "invalid format specifier %%%c\n", fmt[i]);
+			return -EINVAL;
+		}
+
+		fmt_cnt++;
+	}
+
+	if (fmt_cnt != num_args) {
+		verbose(env, "too many parameters to print\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
@@ -5846,6 +5977,12 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return -EINVAL;
 	}
 
+	if (func_id == BPF_FUNC_snprintf) {
+		err = check_bpf_snprintf_call(env, regs);
+		if (err < 0)
+			return err;
+	}
+
 	/* reset caller saved regs */
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
 		mark_reg_not_init(env, regs, caller_saved[i]);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0d23755c2747..7b80759c10a9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1271,6 +1271,114 @@  const struct bpf_func_proto bpf_snprintf_btf_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
+struct bpf_snprintf_buf {
+	char buf[MAX_SNPRINTF_MEMCPY][MAX_SNPRINTF_STR_LEN];
+};
+static DEFINE_PER_CPU(struct bpf_snprintf_buf, bpf_snprintf_buf);
+static DEFINE_PER_CPU(int, bpf_snprintf_buf_used);
+
+BPF_CALL_5(bpf_snprintf, char *, out, u32, out_size, char *, fmt, u64 *, args,
+	   u32, args_len)
+{
+	int err, i, buf_used, copy_size, fmt_cnt = 0, memcpy_cnt = 0;
+	u64 params[MAX_SNPRINTF_VARARGS];
+	struct bpf_snprintf_buf *bufs;
+
+	buf_used = this_cpu_inc_return(bpf_snprintf_buf_used);
+	if (WARN_ON_ONCE(buf_used > 1)) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	bufs = this_cpu_ptr(&bpf_snprintf_buf);
+
+	/*
+	 * The verifier has already done most of the heavy-work for us in
+	 * check_bpf_snprintf_call. We know that fmt is well formatted and that
+	 * args_len is valid. The only task left is to convert some of the
+	 * arguments. For the %s and %pi* specifiers, we need to read buffers
+	 * from a kernel address during the helper call.
+	 */
+	for (i = 0; fmt[i] != '\0'; i++) {
+		if (fmt[i] != '%')
+			continue;
+
+		if (fmt[i + 1] == '%') {
+			i++;
+			continue;
+		}
+
+		/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
+		i++;
+
+		/* skip optional "[0 +-][num]" width formating field */
+		while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-' ||
+		       fmt[i] == ' ')
+			i++;
+		if (fmt[i] >= '1' && fmt[i] <= '9') {
+			i++;
+			while (fmt[i] >= '0' && fmt[i] <= '9')
+				i++;
+		}
+
+		if (fmt[i] == 's') {
+			void *unsafe_ptr = (void *)(long)args[fmt_cnt];
+
+			err = strncpy_from_kernel_nofault(bufs->buf[memcpy_cnt],
+							  unsafe_ptr,
+							  MAX_SNPRINTF_STR_LEN);
+			if (err < 0)
+				bufs->buf[memcpy_cnt][0] = '\0';
+			params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
+
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		if (fmt[i] == 'p' && (fmt[i + 1] == 'i' || fmt[i + 1] == 'I')) {
+			copy_size = (fmt[i + 2] == '4') ? 4 : 16;
+
+			err = copy_from_kernel_nofault(bufs->buf[memcpy_cnt],
+						(void *) (long) args[fmt_cnt],
+						copy_size);
+			if (err < 0)
+				memset(bufs->buf[memcpy_cnt], 0, copy_size);
+			params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
+
+			i += 2;
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		params[fmt_cnt] = args[fmt_cnt];
+		fmt_cnt++;
+	}
+
+	/* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give
+	 * all of them to snprintf().
+	 */
+	err = snprintf(out, out_size, fmt, params[0], params[1], params[2],
+		       params[3], params[4], params[5], params[6], params[7],
+		       params[8], params[9], params[10], params[11]) + 1;
+
+out:
+	this_cpu_dec(bpf_snprintf_buf_used);
+	return err;
+}
+
+static const struct bpf_func_proto bpf_snprintf_proto = {
+	.func		= bpf_snprintf,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_PTR_TO_CONST_STR,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1373,6 +1481,8 @@  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_task_storage_delete_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
+	case BPF_FUNC_snprintf:
+		return &bpf_snprintf_proto;
 	default:
 		return NULL;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2d3036e292a9..3cbdc8ae00e7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4660,6 +4660,33 @@  union bpf_attr {
  *	Return
  *		The number of traversed map elements for success, **-EINVAL** for
  *		invalid **flags**.
+ *
+ * long bpf_snprintf(char *out, u32 out_size, const char *fmt, u64 *data, u32 data_len)
+ *	Description
+ *		Outputs a string into the **out** buffer of size **out_size**
+ *		based on a format string stored in a read-only map pointed by
+ *		**fmt**.
+ *
+ *		Each format specifier in **fmt** corresponds to one u64 element
+ *		in the **data** array. For strings and pointers where pointees
+ *		are accessed, only the pointer values are stored in the *data*
+ *		array. The *data_len* is the size of *data* in bytes.
+ *
+ *		Formats **%s** and **%p{i,I}{4,6}** require to read kernel
+ *		memory. Reading kernel memory may fail due to either invalid
+ *		address or valid address but requiring a major memory fault. If
+ *		reading kernel memory fails, the string for **%s** will be an
+ *		empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
+ *		Not returning error to bpf program is consistent with what
+ *		**bpf_trace_printk**\ () does for now.
+ *
+ *	Return
+ *		The strictly positive length of the printed string, including
+ *		the trailing NUL character. If the return value is greater than
+ *		**out_size**, **out** contains a truncated string, without a
+ *		trailing NULL character.
+ *
+ *		Or **-EBUSY** if the per-CPU memory copy buffer is busy.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4827,6 +4854,7 @@  union bpf_attr {
 	FN(sock_from_file),		\
 	FN(check_mtu),			\
 	FN(for_each_map_elem),		\
+	FN(snprintf),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper