From patchwork Tue Aug 15 17:47:08 2023
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13354037
X-Patchwork-Delegate: bpf@iogearbox.net
From: thinker.li@gmail.com
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org,
    kernel-team@meta.com, andrii@kernel.org, sdf@google.com, yonghong.song@linux.dev
Cc: sinquersw@gmail.com, kuifeng@meta.com, Kui-Feng Lee
Subject: [RFC bpf-next v3 1/5] bpf: enable sleepable BPF programs attached to
 cgroup/{get,set}sockopt.
Date: Tue, 15 Aug 2023 10:47:08 -0700
Message-Id: <20230815174712.660956-2-thinker.li@gmail.com>
In-Reply-To: <20230815174712.660956-1-thinker.li@gmail.com>
References: <20230815174712.660956-1-thinker.li@gmail.com>

From: Kui-Feng Lee

Enable sleepable cgroup/{get,set}sockopt hooks.

Sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
receive a pointer to the optval in user space instead of a kernel copy.
When a user space buffer is passed, ctx->user_optval and
ctx->user_optval_end point to the beginning and end of that buffer;
when a kernel space buffer is passed, ctx->optval and ctx->optval_end
describe a kernel copy. A program receives a user space buffer if
ctx->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER is set; otherwise it receives
a kernel space buffer.

BPF programs should not read from or write to a user space buffer
directly. They should access it through bpf_copy_from_user() and
bpf_copy_to_user(), provided in the following patches.
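For illustration, a minimal sketch (not part of this patch) of a sleepable program honoring the flag described above; the ".s" section name comes from the libbpf patch later in this series, the program and buffer names are made up, and the sockopt-specific copy helpers are the ones this series says it introduces in later patches:

```c
/* Hypothetical sketch: a sleepable setsockopt program that checks
 * BPF_SOCKOPT_FLAG_OPTVAL_USER and copies the user-space optval into a
 * local buffer with bpf_copy_from_user() instead of touching it directly.
 */
SEC("cgroup/setsockopt.s")
int sleepable_setsockopt(struct bpf_sockopt *ctx)
{
	char buf[16] = {};
	__u64 len;

	if (!(ctx->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER))
		return 1; /* kernel copy: ctx->optval may be accessed directly */

	len = ctx->user_optval_end - ctx->user_optval;
	if (len > sizeof(buf))
		len = sizeof(buf);
	if (bpf_copy_from_user(buf, len, ctx->user_optval))
		return 0; /* reject the setsockopt() on copy failure */

	/* ... inspect or rewrite buf, then write it back with
	 * bpf_copy_to_user() from the later patches ... */
	return 1;
}
```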
Signed-off-by: Kui-Feng Lee
---
 include/linux/bpf.h    |   6 ++
 include/linux/filter.h |   6 ++
 kernel/bpf/cgroup.c    | 207 ++++++++++++++++++++++++++++++++---------
 kernel/bpf/verifier.c  |   5 +-
 4 files changed, 177 insertions(+), 47 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb..edb35bcfa548 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1769,9 +1769,15 @@ struct bpf_prog_array_item {

 struct bpf_prog_array {
 	struct rcu_head rcu;
+	u32 flags;
 	struct bpf_prog_array_item items[];
 };

+enum bpf_prog_array_flags {
+	BPF_PROG_ARRAY_F_SLEEPABLE	= 1 << 0,
+	BPF_PROG_ARRAY_F_NON_SLEEPABLE	= 1 << 1,
+};
+
 struct bpf_empty_prog_array {
 	struct bpf_prog_array hdr;
 	struct bpf_prog *null_prog;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 761af6b3cf2b..2aa2a96526de 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1337,12 +1337,18 @@ struct bpf_sockopt_kern {
 	s32		level;
 	s32		optname;
 	s32		optlen;
+	u32		flags;
 	/* for retval in struct bpf_cg_run_ctx */
 	struct task_struct *current_task;
 	/* Temporary "register" for indirect stores to ppos. */
 	u64 tmp_reg;
 };

+enum bpf_sockopt_kern_flags {
+	/* optval is a pointer to user space memory */
+	BPF_SOCKOPT_FLAG_OPTVAL_USER	= (1U << 0),
+};
+
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len);

 struct bpf_sk_lookup_kern {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..b977768a28e5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -28,25 +28,46 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
  * function pointer.
  */
 static __always_inline int
-bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
-		      enum cgroup_bpf_attach_type atype,
-		      const void *ctx, bpf_prog_run_fn run_prog,
-		      int retval, u32 *ret_flags)
+bpf_prog_run_array_cg_cb(const struct cgroup_bpf *cgrp,
+			 enum cgroup_bpf_attach_type atype,
+			 const void *ctx, bpf_prog_run_fn run_prog,
+			 int retval, u32 *ret_flags,
+			 int (*progs_cb)(void *, const struct bpf_prog_array *),
+			 void *progs_cb_arg)
 {
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_cg_run_ctx run_ctx;
+	bool do_sleepable;
 	u32 func_ret;
+	int err;
+
+	do_sleepable =
+		atype == CGROUP_SETSOCKOPT || atype == CGROUP_GETSOCKOPT;

 	run_ctx.retval = retval;
 	migrate_disable();
-	rcu_read_lock();
+	if (do_sleepable) {
+		might_fault();
+		rcu_read_lock_trace();
+	} else
+		rcu_read_lock();
 	array = rcu_dereference(cgrp->effective[atype]);
 	item = &array->items[0];
+
+	if (progs_cb) {
+		err = progs_cb(progs_cb_arg, array);
+		if (err)
+			return err;
+	}
+
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	while ((prog = READ_ONCE(item->prog))) {
+		if (do_sleepable && !prog->aux->sleepable)
+			rcu_read_lock();
+
 		run_ctx.prog_item = item;
 		func_ret = run_prog(prog, ctx);
 		if (ret_flags) {
@@ -56,13 +77,29 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
 		if (!func_ret && !IS_ERR_VALUE((long)run_ctx.retval))
 			run_ctx.retval = -EPERM;
 		item++;
+
+		if (do_sleepable && !prog->aux->sleepable)
+			rcu_read_unlock();
 	}
 	bpf_reset_run_ctx(old_run_ctx);
-	rcu_read_unlock();
+	if (do_sleepable)
+		rcu_read_unlock_trace();
+	else
+		rcu_read_unlock();
 	migrate_enable();
 	return run_ctx.retval;
 }

+static __always_inline int
+bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
+		      enum cgroup_bpf_attach_type atype,
+		      const void *ctx, bpf_prog_run_fn run_prog,
+		      int retval, u32 *ret_flags)
+{
+	return bpf_prog_run_array_cg_cb(cgrp, atype, ctx, run_prog, retval,
+					ret_flags, NULL, NULL);
+}
+
 unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
 				       const struct bpf_insn *insn)
 {
@@ -307,7 +344,7 @@ static void cgroup_bpf_release(struct work_struct *work)
 			old_array = rcu_dereference_protected(
 					cgrp->bpf.effective[atype],
 					lockdep_is_held(&cgroup_mutex));
-			bpf_prog_array_free(old_array);
+			bpf_prog_array_free_sleepable(old_array);
 		}

 		list_for_each_entry_safe(storage, stmp, storages, list_cg) {
@@ -402,6 +439,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
 				   enum cgroup_bpf_attach_type atype,
 				   struct bpf_prog_array **array)
 {
+	bool has_non_sleepable = false, has_sleepable = false;
 	struct bpf_prog_array_item *item;
 	struct bpf_prog_array *progs;
 	struct bpf_prog_list *pl;
@@ -434,10 +472,19 @@ static int compute_effective_progs(struct cgroup *cgrp,
 			item->prog = prog_list_prog(pl);
 			bpf_cgroup_storages_assign(item->cgroup_storage,
 						   pl->storage);
+			if (item->prog->aux->sleepable)
+				has_sleepable = true;
+			else
+				has_non_sleepable = true;
 			cnt++;
 		}
 	} while ((p = cgroup_parent(p)));

+	if (has_non_sleepable)
+		progs->flags |= BPF_PROG_ARRAY_F_NON_SLEEPABLE;
+	if (has_sleepable)
+		progs->flags |= BPF_PROG_ARRAY_F_SLEEPABLE;
+
 	*array = progs;
 	return 0;
 }
@@ -451,7 +498,7 @@ static void activate_effective_progs(struct cgroup *cgrp,
 	/* free prog array after grace period, since __cgroup_bpf_run_*()
 	 * might be still walking the array
 	 */
-	bpf_prog_array_free(old_array);
+	bpf_prog_array_free_sleepable(old_array);
 }

 /**
@@ -491,7 +538,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
 	return 0;
 cleanup:
 	for (i = 0; i < NR; i++)
-		bpf_prog_array_free(arrays[i]);
+		bpf_prog_array_free_sleepable(arrays[i]);

 	for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p))
 		cgroup_bpf_put(p);
@@ -525,7 +572,7 @@ static int update_effective_progs(struct cgroup *cgrp,
 		if (percpu_ref_is_zero(&desc->bpf.refcnt)) {
 			if (unlikely(desc->bpf.inactive)) {
-				bpf_prog_array_free(desc->bpf.inactive);
+				bpf_prog_array_free_sleepable(desc->bpf.inactive);
 				desc->bpf.inactive = NULL;
 			}
 			continue;
@@ -544,7 +591,7 @@ static int update_effective_progs(struct cgroup *cgrp,
 	css_for_each_descendant_pre(css, &cgrp->self) {
 		struct cgroup *desc = container_of(css, struct cgroup, self);

-		bpf_prog_array_free(desc->bpf.inactive);
+		bpf_prog_array_free_sleepable(desc->bpf.inactive);
 		desc->bpf.inactive = NULL;
 	}
@@ -1740,7 +1787,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 #ifdef CONFIG_NET
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen,
-			     struct bpf_sockopt_buf *buf)
+			     struct bpf_sockopt_buf *buf, bool force_alloc)
 {
 	if (unlikely(max_optlen < 0))
 		return -EINVAL;
@@ -1752,7 +1799,7 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen,
 		max_optlen = PAGE_SIZE;
 	}

-	if (max_optlen <= sizeof(buf->data)) {
+	if (max_optlen <= sizeof(buf->data) && !force_alloc) {
 		/* When the optval fits into BPF_SOCKOPT_KERN_BUF_SIZE
 		 * bytes avoid the cost of kzalloc.
 		 */
@@ -1773,7 +1820,8 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen,
 static void sockopt_free_buf(struct bpf_sockopt_kern *ctx,
 			     struct bpf_sockopt_buf *buf)
 {
-	if (ctx->optval == buf->data)
+	if (ctx->optval == buf->data ||
+	    ctx->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER)
 		return;
 	kfree(ctx->optval);
 }
@@ -1781,7 +1829,47 @@ static void sockopt_free_buf(struct bpf_sockopt_kern *ctx,
 static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
 				  struct bpf_sockopt_buf *buf)
 {
-	return ctx->optval != buf->data;
+	return ctx->optval != buf->data &&
+		!(ctx->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER);
+}
+
+struct filter_sockopt_cb_args {
+	struct bpf_sockopt_kern *ctx;
+	struct bpf_sockopt_buf *buf;
+	int max_optlen;
+};
+
+static int filter_setsockopt_progs_cb(void *arg,
+				      const struct bpf_prog_array *progs)
+{
+	struct filter_sockopt_cb_args *cb_args = arg;
+	struct bpf_sockopt_kern *ctx = cb_args->ctx;
+	char *optval = ctx->optval;
+	int max_optlen;
+
+	if (!(progs->flags & BPF_PROG_ARRAY_F_NON_SLEEPABLE))
+		return 0;
+
+	/* Allocate a bit more than the initial user buffer for
+	 * BPF program. The canonical use case is overriding
+	 * TCP_CONGESTION(nv) to TCP_CONGESTION(cubic).
+	 */
+	max_optlen = max_t(int, 16, ctx->optlen);
+	/* We need to force allocating from heap if there are sleepable
+	 * programs since they may create dynptrs from ctx->optval. In
+	 * this case, dynptrs will try to free the buffer that is actually
+	 * on the stack without this flag.
+	 */
+	max_optlen = sockopt_alloc_buf(ctx, max_optlen, cb_args->buf,
+				       progs->flags & BPF_PROG_ARRAY_F_SLEEPABLE);
+	if (max_optlen < 0)
+		return max_optlen;
+
+	if (copy_from_user(ctx->optval, optval,
+			   min(ctx->optlen, max_optlen)) != 0)
+		return -EFAULT;
+
+	return 0;
 }

 int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
@@ -1795,27 +1883,22 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		.level = *level,
 		.optname = *optname,
 	};
+	struct filter_sockopt_cb_args cb_args = {
+		.ctx = &ctx,
+		.buf = &buf,
+	};
 	int ret, max_optlen;

-	/* Allocate a bit more than the initial user buffer for
-	 * BPF program. The canonical use case is overriding
-	 * TCP_CONGESTION(nv) to TCP_CONGESTION(cubic).
-	 */
-	max_optlen = max_t(int, 16, *optlen);
-	max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
-	if (max_optlen < 0)
-		return max_optlen;
-
+	max_optlen = *optlen;
 	ctx.optlen = *optlen;
-
-	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
-		ret = -EFAULT;
-		goto out;
-	}
+	ctx.optval = optval;
+	ctx.optval_end = optval + *optlen;
+	ctx.flags = BPF_SOCKOPT_FLAG_OPTVAL_USER;

 	lock_sock(sk);
-	ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_SETSOCKOPT,
-				    &ctx, bpf_prog_run, 0, NULL);
+	ret = bpf_prog_run_array_cg_cb(&cgrp->bpf, CGROUP_SETSOCKOPT,
+				       &ctx, bpf_prog_run, 0, NULL,
+				       filter_setsockopt_progs_cb, &cb_args);
 	release_sock(sk);

 	if (ret)
@@ -1824,7 +1907,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	if (ctx.optlen == -1) {
 		/* optlen set to -1, bypass kernel */
 		ret = 1;
-	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
+	} else if (ctx.optlen > (ctx.optval_end - ctx.optval) ||
+		   ctx.optlen < -1) {
 		/* optlen is out of bounds */
 		if (*optlen > PAGE_SIZE && ctx.optlen >= 0) {
 			pr_info_once("bpf setsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
@@ -1846,6 +1930,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	 */
 	if (ctx.optlen != 0) {
 		*optlen = ctx.optlen;
+		if (ctx.flags & BPF_SOCKOPT_FLAG_OPTVAL_USER)
+			return 0;
 		/* We've used bpf_sockopt_kern->buf as an intermediary
 		 * storage, but the BPF program indicates that we need
 		 * to pass this data to the kernel setsockopt handler.
@@ -1874,6 +1960,33 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	return ret;
 }

+static int filter_getsockopt_progs_cb(void *arg,
+				      const struct bpf_prog_array *progs)
+{
+	struct filter_sockopt_cb_args *cb_args = arg;
+	struct bpf_sockopt_kern *ctx = cb_args->ctx;
+	int max_optlen;
+	char *optval;
+
+	if (!(progs->flags & BPF_PROG_ARRAY_F_NON_SLEEPABLE))
+		return 0;
+
+	optval = ctx->optval;
+	max_optlen = sockopt_alloc_buf(ctx, cb_args->max_optlen,
+				       cb_args->buf, false);
+	if (max_optlen < 0)
+		return max_optlen;
+
+	if (copy_from_user(ctx->optval, optval,
+			   min(ctx->optlen, max_optlen)) != 0)
+		return -EFAULT;
+
+	ctx->flags = 0;
+	cb_args->max_optlen = max_optlen;
+
+	return 0;
+}
+
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 				       int optname, char __user *optval,
 				       int __user *optlen, int max_optlen,
@@ -1887,15 +2000,16 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		.optname = optname,
 		.current_task = current,
 	};
+	struct filter_sockopt_cb_args cb_args = {
+		.ctx = &ctx,
+		.buf = &buf,
+		.max_optlen = max_optlen,
+	};
 	int orig_optlen;
 	int ret;

 	orig_optlen = max_optlen;
 	ctx.optlen = max_optlen;
-	max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
-	if (max_optlen < 0)
-		return max_optlen;
-
 	if (!retval) {
 		/* If kernel getsockopt finished successfully,
 		 * copy whatever was returned to the user back
@@ -1914,18 +2028,19 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 			goto out;
 		}
 		orig_optlen = ctx.optlen;
-
-		if (copy_from_user(ctx.optval, optval,
-				   min(ctx.optlen, max_optlen)) != 0) {
-			ret = -EFAULT;
-			goto out;
-		}
 	}

+	ctx.optval = optval;
+	ctx.optval_end = optval + max_optlen;
+	ctx.flags = BPF_SOCKOPT_FLAG_OPTVAL_USER;
+
 	lock_sock(sk);
-	ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_GETSOCKOPT,
-				    &ctx, bpf_prog_run, retval, NULL);
+	ret = bpf_prog_run_array_cg_cb(&cgrp->bpf, CGROUP_GETSOCKOPT,
+				       &ctx, bpf_prog_run, retval, NULL,
+				       filter_getsockopt_progs_cb,
+				       &cb_args);
 	release_sock(sk);

+	max_optlen = cb_args.max_optlen;
 	if (ret < 0)
 		goto out;
@@ -1942,7 +2057,9 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	}

 	if (ctx.optlen != 0) {
-		if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+		if (optval &&
+		    !(ctx.flags & BPF_SOCKOPT_FLAG_OPTVAL_USER) &&
+		    copy_to_user(optval, ctx.optval, ctx.optlen)) {
 			ret = -EFAULT;
 			goto out;
 		}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c998..61be432b9420 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19549,7 +19549,8 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_CGROUP_SOCKOPT;
 }

 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -19571,7 +19572,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}

 	if (prog->aux->sleepable && !can_be_sleepable(prog)) {
-		verbose(env, "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");
+		verbose(env, "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, cgroup, and struct_ops programs can be sleepable\n");
 		return -EINVAL;
 	}

From patchwork Tue Aug 15 17:47:09 2023
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13354038
X-Patchwork-Delegate: bpf@iogearbox.net
From: thinker.li@gmail.com
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org,
    kernel-team@meta.com, andrii@kernel.org, sdf@google.com, yonghong.song@linux.dev
Cc: sinquersw@gmail.com, kuifeng@meta.com, Kui-Feng Lee
Subject: [RFC bpf-next v3 2/5] libbpf: add sleepable sections for {get,set}sockopt()
Date: Tue, 15 Aug 2023 10:47:09 -0700
Message-Id: <20230815174712.660956-3-thinker.li@gmail.com>
In-Reply-To: <20230815174712.660956-1-thinker.li@gmail.com>
References: <20230815174712.660956-1-thinker.li@gmail.com>

From: Kui-Feng Lee

Enable libbpf users to define sleepable programs attached to
{get,set}sockopt() hooks. Sleepable programs should be defined with
SEC("cgroup/getsockopt.s") and SEC("cgroup/setsockopt.s") respectively.
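As a rough user-space companion to the new sections, loading and attaching such a program through libbpf could look like the sketch below. The object file name, program name, and cgroup path are invented for illustration, and error handling is elided:

```c
/* Hypothetical loader sketch using the standard libbpf API. */
#include <fcntl.h>
#include <bpf/libbpf.h>

void attach_sleepable_setsockopt(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;
	int cg_fd;

	obj = bpf_object__open_file("sockopt_sleepable.bpf.o", NULL);
	bpf_object__load(obj);

	/* The SEC("cgroup/setsockopt.s") annotation is what marks the
	 * program sleepable; libbpf applies SEC_SLEEPABLE at load time. */
	prog = bpf_object__find_program_by_name(obj, "sleepable_setsockopt");

	cg_fd = open("/sys/fs/cgroup/test", O_RDONLY);
	link = bpf_program__attach_cgroup(prog, cg_fd);
	(void)link;
}
```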
Signed-off-by: Kui-Feng Lee
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b14a4376a86e..ddd6dc166e3e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8766,7 +8766,9 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("cgroup/getsockname6",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETSOCKNAME, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/sysctl",	CGROUP_SYSCTL, BPF_CGROUP_SYSCTL, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_GETSOCKOPT, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/getsockopt.s",	CGROUP_SOCKOPT, BPF_CGROUP_GETSOCKOPT, SEC_ATTACHABLE | SEC_SLEEPABLE),
 	SEC_DEF("cgroup/setsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/setsockopt.s",	CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT, SEC_ATTACHABLE | SEC_SLEEPABLE),
 	SEC_DEF("cgroup/dev",		CGROUP_DEVICE, BPF_CGROUP_DEVICE, SEC_ATTACHABLE_OPT),
 	SEC_DEF("struct_ops+",		STRUCT_OPS, 0, SEC_NONE),
 	SEC_DEF("struct_ops.s+",	STRUCT_OPS, 0, SEC_SLEEPABLE),

From patchwork Tue Aug 15 17:47:10 2023
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13354039
X-Patchwork-Delegate: bpf@iogearbox.net
From: thinker.li@gmail.com
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org,
    kernel-team@meta.com, andrii@kernel.org, sdf@google.com, yonghong.song@linux.dev
Cc: sinquersw@gmail.com, kuifeng@meta.com, Kui-Feng Lee
Subject: [RFC bpf-next v3 3/5] bpf: Prevent BPF programs from accessing the
 buffer pointed to by user_optval.
Date: Tue, 15 Aug 2023 10:47:10 -0700
Message-Id: <20230815174712.660956-4-thinker.li@gmail.com>
In-Reply-To: <20230815174712.660956-1-thinker.li@gmail.com>
References: <20230815174712.660956-1-thinker.li@gmail.com>

From: Kui-Feng Lee

Since the buffer pointed to by ctx->user_optval is in user space, BPF
programs running in kernel space should not access it directly. They
should use bpf_copy_from_user() and bpf_copy_to_user() to move data
between user and kernel space.

Signed-off-by: Kui-Feng Lee
---
 kernel/bpf/cgroup.c   | 16 +++++++++--
 kernel/bpf/verifier.c | 66 +++++++++++++++++++++----------------------
 2 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b977768a28e5..425094e071ba 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2494,12 +2494,24 @@ static bool cg_sockopt_is_valid_access(int off, int size,
 	case offsetof(struct bpf_sockopt, optval):
 		if (size != sizeof(__u64))
 			return false;
-		info->reg_type = PTR_TO_PACKET;
+		if (prog->aux->sleepable)
+			/* Prohibit access to the memory pointed by optval
+			 * in sleepable programs.
+			 */
+			info->reg_type = PTR_TO_PACKET | MEM_USER;
+		else
+			info->reg_type = PTR_TO_PACKET;
 		break;
 	case offsetof(struct bpf_sockopt, optval_end):
 		if (size != sizeof(__u64))
 			return false;
-		info->reg_type = PTR_TO_PACKET_END;
+		if (prog->aux->sleepable)
+			/* Prohibit access to the memory pointed by
+			 * optval_end in sleepable programs.
+			 */
+			info->reg_type = PTR_TO_PACKET_END | MEM_USER;
+		else
+			info->reg_type = PTR_TO_PACKET_END;
 		break;
 	case offsetof(struct bpf_sockopt, retval):
 		if (size != size_default)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 61be432b9420..936a171ea976 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13373,7 +13373,7 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *vstate,
 	 * dst_reg->off is known < MAX_PACKET_OFF, therefore it fits in a u16.
 	 */
 	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
-		if (reg->type == type && reg->id == dst_reg->id)
+		if (base_type(reg->type) == type && reg->id == dst_reg->id)
 			/* keep the maximum range already checked */
 			reg->range = max(reg->range, new_range);
 	}));
@@ -13926,84 +13926,84 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 	switch (BPF_OP(insn->code)) {
 	case BPF_JGT:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
-		    (dst_reg->type == PTR_TO_PACKET_META &&
+		if ((base_type(dst_reg->type) == PTR_TO_PACKET &&
+		     base_type(src_reg->type) == PTR_TO_PACKET_END) ||
+		    (base_type(dst_reg->type) == PTR_TO_PACKET_META &&
 		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
 			/* pkt_data' > pkt_end, pkt_meta' > pkt_data */
 			find_good_pkt_pointers(this_branch, dst_reg,
-					       dst_reg->type, false);
+					       base_type(dst_reg->type), false);
 			mark_pkt_end(other_branch, insn->dst_reg, true);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
-			    src_reg->type == PTR_TO_PACKET) ||
+		} else if ((base_type(dst_reg->type) == PTR_TO_PACKET_END &&
+			    base_type(src_reg->type) == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    base_type(src_reg->type) == PTR_TO_PACKET_META)) {
 			/* pkt_end > pkt_data', pkt_data > pkt_meta' */
 			find_good_pkt_pointers(other_branch, src_reg,
-					       src_reg->type, true);
+					       base_type(src_reg->type), true);
 			mark_pkt_end(this_branch, insn->src_reg, false);
 		} else {
 			return false;
 		}
 		break;
 	case BPF_JLT:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
-		    (dst_reg->type == PTR_TO_PACKET_META &&
+		if ((base_type(dst_reg->type) == PTR_TO_PACKET &&
+		     base_type(src_reg->type) == PTR_TO_PACKET_END) ||
+		    (base_type(dst_reg->type) == PTR_TO_PACKET_META &&
 		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
 			/* pkt_data' < pkt_end, pkt_meta' < pkt_data */
 			find_good_pkt_pointers(other_branch, dst_reg,
-					       dst_reg->type, true);
+					       base_type(dst_reg->type), true);
 			mark_pkt_end(this_branch, insn->dst_reg, false);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
-			    src_reg->type == PTR_TO_PACKET) ||
+		} else if ((base_type(dst_reg->type) == PTR_TO_PACKET_END &&
+			    base_type(src_reg->type) == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    base_type(src_reg->type) == PTR_TO_PACKET_META)) {
 			/* pkt_end < pkt_data', pkt_data > pkt_meta' */
 			find_good_pkt_pointers(this_branch, src_reg,
-					       src_reg->type, false);
+					       base_type(src_reg->type), false);
 			mark_pkt_end(other_branch, insn->src_reg, true);
 		} else {
 			return false;
 		}
 		break;
 	case BPF_JGE:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
-		    (dst_reg->type == PTR_TO_PACKET_META &&
+		if ((base_type(dst_reg->type) == PTR_TO_PACKET &&
+		     base_type(src_reg->type) == PTR_TO_PACKET_END) ||
+		    (base_type(dst_reg->type) == PTR_TO_PACKET_META &&
 		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
 			/* pkt_data' >= pkt_end, pkt_meta' >= pkt_data */
 			find_good_pkt_pointers(this_branch, dst_reg,
-					       dst_reg->type, true);
+					       base_type(dst_reg->type), true);
 			mark_pkt_end(other_branch, insn->dst_reg, false);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
-			    src_reg->type == PTR_TO_PACKET) ||
+		} else if ((base_type(dst_reg->type) == PTR_TO_PACKET_END &&
+			    base_type(src_reg->type) == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    base_type(src_reg->type) == PTR_TO_PACKET_META)) {
 			/* pkt_end >= pkt_data', pkt_data >= pkt_meta' */
 			find_good_pkt_pointers(other_branch, src_reg,
-					       src_reg->type, false);
+					       base_type(src_reg->type), false);
 			mark_pkt_end(this_branch, insn->src_reg, true);
 		} else {
 			return false;
 		}
 		break;
 	case BPF_JLE:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
-		    (dst_reg->type == PTR_TO_PACKET_META &&
+		if ((base_type(dst_reg->type) == PTR_TO_PACKET &&
+		     base_type(src_reg->type) == PTR_TO_PACKET_END) ||
+		    (base_type(dst_reg->type) == PTR_TO_PACKET_META &&
 		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
 			/* pkt_data' <= pkt_end, pkt_meta' <= pkt_data */
 			find_good_pkt_pointers(other_branch, dst_reg,
-					       dst_reg->type, false);
+					       base_type(dst_reg->type), false);
 			mark_pkt_end(this_branch, insn->dst_reg, true);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
-			    src_reg->type == PTR_TO_PACKET) ||
+		} else if ((base_type(dst_reg->type) == PTR_TO_PACKET_END &&
+			    base_type(src_reg->type) == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    base_type(src_reg->type) == PTR_TO_PACKET_META)) {
 			/* pkt_end <= pkt_data', pkt_data <= pkt_meta' */
 			find_good_pkt_pointers(this_branch, src_reg,
-					       src_reg->type, true);
+					       base_type(src_reg->type), true);
 			mark_pkt_end(other_branch, insn->src_reg, false);
 		} else {
 			return false;

From patchwork Tue Aug 15 17:47:11 2023
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13354040
X-Patchwork-Delegate: bpf@iogearbox.net
From: thinker.li@gmail.com
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com, yonghong.song@linux.dev
Cc: sinquersw@gmail.com, kuifeng@meta.com, Kui-Feng Lee
Subject: [RFC bpf-next v3 4/5] bpf: Add a new dynptr type for CGRUP_SOCKOPT.
Date: Tue, 15 Aug 2023 10:47:11 -0700
Message-Id: <20230815174712.660956-5-thinker.li@gmail.com>
In-Reply-To: <20230815174712.660956-1-thinker.li@gmail.com>
References: <20230815174712.660956-1-thinker.li@gmail.com>
X-Patchwork-State: RFC

From: Kui-Feng Lee

The new dynptr type (BPF_DYNPTR_TYPE_CGROUP_SOCKOPT) will be used by
BPF programs to create a buffer that can be installed on ctx to replace
the existing optval or user_optval. Installation is only allowed if
ctx->flags & BPF_SOCKOPT_FLAG_OPTVAL_REPLACE is true.
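The buffer-replacement rule described above can be sketched as a small user-space model. This is illustrative only: the MODEL_* flags, struct model_sockopt, and model_install() are hypothetical stand-ins that mirror the BPF_SOCKOPT_FLAG_* bits and the ownership handling of bpf_sockopt_dynptr_install() in this patch; they are not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the BPF_SOCKOPT_FLAG_* bits in this patch. */
enum {
	MODEL_OPTVAL_USER    = 1u << 0, /* optval points to user memory */
	MODEL_OPTVAL_REPLACE = 1u << 1, /* replacing optval is allowed  */
	MODEL_OPTVAL_DYNPTR  = 1u << 2, /* optval referenced by dynptr  */
};

struct model_sockopt {
	unsigned int flags;
	char *optval;
	char *optval_end;
	int optlen;
};

/* Mirrors the install path: refuse unless replacement is allowed, free a
 * previously installed kernel buffer that nothing else owns, then take
 * ownership of the new buffer and clear the USER/DYNPTR flags.
 */
int model_install(struct model_sockopt *s, char *buf, int size)
{
	if (!(s->flags & MODEL_OPTVAL_REPLACE) || !buf)
		return -1; /* -EINVAL in the kernel */

	if (s->optval &&
	    !(s->flags & (MODEL_OPTVAL_USER | MODEL_OPTVAL_DYNPTR)))
		free(s->optval);

	s->optval = buf;
	s->optval_end = buf + size;
	s->optlen = size;
	s->flags &= ~(MODEL_OPTVAL_USER | MODEL_OPTVAL_DYNPTR);
	return 0;
}
```

After a successful install the context no longer carries the USER flag, which is why a later program in the cgroup chain sees a kernel buffer it may access directly.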
It is enabled only for sleepable programs on the cgroup/setsockopt hook.
BPF programs can install a new buffer held by a dynptr to increase the
size of the optval passed to setsockopt(). Installation is not enabled
for cgroup/getsockopt, since a buffer created by the user program cannot
be enlarged to return data from getsockopt().

Signed-off-by: Kui-Feng Lee
---
 include/linux/bpf.h    |   7 +-
 include/linux/filter.h |   4 +
 kernel/bpf/btf.c       |   3 +
 kernel/bpf/cgroup.c    |   5 +-
 kernel/bpf/helpers.c   | 197 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c  |  47 +++++++++-
 6 files changed, 259 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index edb35bcfa548..b9e4d7752555 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -663,12 +663,15 @@ enum bpf_type_flag {
 	/* DYNPTR points to xdp_buff */
 	DYNPTR_TYPE_XDP = BIT(16 + BPF_BASE_TYPE_BITS),

+	/* DYNPTR points to optval buffer of bpf_sockopt */
+	DYNPTR_TYPE_CGROUP_SOCKOPT = BIT(17 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
 };

 #define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
-				 | DYNPTR_TYPE_XDP)
+				 | DYNPTR_TYPE_XDP | DYNPTR_TYPE_CGROUP_SOCKOPT)

 /* Max number of base types.
*/ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -1206,6 +1209,8 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_SKB, /* Underlying data is a xdp_buff */ BPF_DYNPTR_TYPE_XDP, + /* Underlying data is for the optval of a cgroup sock */ + BPF_DYNPTR_TYPE_CGROUP_SOCKOPT, }; int bpf_dynptr_check_size(u32 size); diff --git a/include/linux/filter.h b/include/linux/filter.h index 2aa2a96526de..df12fddd2f21 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1347,6 +1347,10 @@ struct bpf_sockopt_kern { enum bpf_sockopt_kern_flags { /* optval is a pointer to user space memory */ BPF_SOCKOPT_FLAG_OPTVAL_USER = (1U << 0), + /* able to install new optval */ + BPF_SOCKOPT_FLAG_OPTVAL_REPLACE = (1U << 1), + /* optval is referenced by a dynptr */ + BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR = (1U << 2), }; int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len); diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 249657c466dd..6d6a040688be 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -217,6 +217,7 @@ enum btf_kfunc_hook { BTF_KFUNC_HOOK_SOCKET_FILTER, BTF_KFUNC_HOOK_LWT, BTF_KFUNC_HOOK_NETFILTER, + BTF_KFUNC_HOOK_CGROUP_SOCKOPT, BTF_KFUNC_HOOK_MAX, }; @@ -7846,6 +7847,8 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) return BTF_KFUNC_HOOK_LWT; case BPF_PROG_TYPE_NETFILTER: return BTF_KFUNC_HOOK_NETFILTER; + case BPF_PROG_TYPE_CGROUP_SOCKOPT: + return BTF_KFUNC_HOOK_CGROUP_SOCKOPT; default: return BTF_KFUNC_HOOK_MAX; } diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 425094e071ba..164dee8753cf 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1865,6 +1865,8 @@ static int filter_setsockopt_progs_cb(void *arg, if (max_optlen < 0) return max_optlen; + ctx->flags = BPF_SOCKOPT_FLAG_OPTVAL_REPLACE; + if (copy_from_user(ctx->optval, optval, min(ctx->optlen, max_optlen)) != 0) return -EFAULT; @@ -1893,7 +1895,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, ctx.optlen = 
*optlen;
 	ctx.optval = optval;
 	ctx.optval_end = optval + *optlen;
-	ctx.flags = BPF_SOCKOPT_FLAG_OPTVAL_USER;
+	ctx.flags = BPF_SOCKOPT_FLAG_OPTVAL_USER |
+		    BPF_SOCKOPT_FLAG_OPTVAL_REPLACE;

 	lock_sock(sk);
 	ret = bpf_prog_run_array_cg_cb(&cgrp->bpf, CGROUP_SETSOCKOPT,

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..fc38aff02654 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1537,6 +1537,7 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern
 	switch (type) {
 	case BPF_DYNPTR_TYPE_LOCAL:
 	case BPF_DYNPTR_TYPE_RINGBUF:
+	case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT:
 		/* Source and destination may possibly overlap, hence use memmove to
 		 * copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
 		 * pointing to overlapping PTR_TO_MAP_VALUE regions.
@@ -1582,6 +1583,7 @@ BPF_CALL_5(bpf_dynptr_write, const struct bpf_dynptr_kern *, dst, u32, offset, v
 	switch (type) {
 	case BPF_DYNPTR_TYPE_LOCAL:
 	case BPF_DYNPTR_TYPE_RINGBUF:
+	case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT:
 		if (flags)
 			return -EINVAL;
 		/* Source and destination may possibly overlap, hence use memmove to
@@ -1634,6 +1636,7 @@ BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u3
 	switch (type) {
 	case BPF_DYNPTR_TYPE_LOCAL:
 	case BPF_DYNPTR_TYPE_RINGBUF:
+	case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT:
 		return (unsigned long)(ptr->data + ptr->offset + offset);
 	case BPF_DYNPTR_TYPE_SKB:
 	case BPF_DYNPTR_TYPE_XDP:
@@ -2261,6 +2264,7 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
 	switch (type) {
 	case BPF_DYNPTR_TYPE_LOCAL:
 	case BPF_DYNPTR_TYPE_RINGBUF:
+	case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT:
 		return ptr->data + ptr->offset + offset;
 	case BPF_DYNPTR_TYPE_SKB:
 		if (buffer__opt)
@@ -2429,6 +2433,185 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
 	rcu_read_unlock();
 }

+/* Create a buffer of the given size for a {set,get}sockopt BPF filter.
+ *
+ * This kfunc is only available for sleepable contexts.
The dynptr should be released by
+ * bpf_sockopt_dynptr_install() or bpf_sockopt_dynptr_release().
+ */
+__bpf_kfunc int bpf_sockopt_dynptr_alloc(struct bpf_sockopt *sopt, int size,
+					 struct bpf_dynptr_kern *ptr__uninit)
+{
+	void *optval;
+	int err;
+
+	bpf_dynptr_set_null(ptr__uninit);
+
+	err = bpf_dynptr_check_size(size);
+	if (err)
+		return err;
+
+	optval = kzalloc(size, GFP_KERNEL);
+	if (!optval)
+		return -ENOMEM;
+
+	bpf_dynptr_init(ptr__uninit, optval,
+			BPF_DYNPTR_TYPE_CGROUP_SOCKOPT, 0, size);
+
+	return size;
+}
+
+/* Install the buffer of the dynptr into the sockopt context.
+ *
+ * This kfunc is only available for sleepable contexts. The dynptr should be
+ * allocated by bpf_sockopt_dynptr_alloc(). The dynptr is invalid after
+ * returning from this function successfully.
+ */
+__bpf_kfunc int bpf_sockopt_dynptr_install(struct bpf_sockopt *sopt,
+					   struct bpf_dynptr_kern *ptr)
+{
+	struct bpf_sockopt_kern *sopt_kern = (struct bpf_sockopt_kern *)sopt;
+
+	if (!(sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_REPLACE) ||
+	    bpf_dynptr_get_type(ptr) != BPF_DYNPTR_TYPE_CGROUP_SOCKOPT ||
+	    !ptr->data)
+		return -EINVAL;
+
+	if (sopt_kern->optval == ptr->data &&
+	    !(sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER)) {
+		/* This dynptr is initialized by bpf_sockopt_dynptr_from()
+		 * and the optval is not overwritten by
+		 * bpf_sockopt_dynptr_install() yet.
+		 */
+		bpf_dynptr_set_null(ptr);
+		sopt_kern->flags &= ~BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR;
+		return 0;
+	}
+
+	if (sopt_kern->optval &&
+	    !(sopt_kern->flags & (BPF_SOCKOPT_FLAG_OPTVAL_USER |
+				  BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR)))
+		kfree(sopt_kern->optval);
+
+	sopt_kern->optval = ptr->data;
+	sopt_kern->optval_end = ptr->data + __bpf_dynptr_size(ptr);
+	sopt_kern->optlen = __bpf_dynptr_size(ptr);
+	sopt_kern->flags &= ~(BPF_SOCKOPT_FLAG_OPTVAL_USER |
+			      BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+__bpf_kfunc int bpf_sockopt_dynptr_release(struct bpf_sockopt *sopt,
+					   struct bpf_dynptr_kern *ptr)
+{
+	struct bpf_sockopt_kern *sopt_kern = (struct bpf_sockopt_kern *)sopt;
+
+	if (bpf_dynptr_get_type(ptr) != BPF_DYNPTR_TYPE_CGROUP_SOCKOPT ||
+	    !ptr->data)
+		return -EINVAL;
+
+	if (sopt_kern->optval == ptr->data &&
+	    !(sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER))
+		/* This dynptr is initialized by bpf_sockopt_dynptr_from()
+		 * and the optval is not overwritten by
+		 * bpf_sockopt_dynptr_install() yet.
+		 */
+		sopt_kern->flags &= ~BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR;
+	else
+		kfree(ptr->data);
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+/* Initialize a sockopt dynptr from a user or installed optval pointer.
+ *
+ * sopt->optval can be a user pointer or a kernel pointer. A kernel pointer
+ * can be a buffer allocated by the caller of the BPF program or a buffer
+ * installed by other BPF programs through bpf_sockopt_dynptr_install().
+ *
+ * At most one dynptr shall be created by this function at any moment, or
+ * it will return -EINVAL. You can create another dynptr with this function
+ * after releasing the previous one with bpf_sockopt_dynptr_release().
+ *
+ * A dynptr that is initialized when optval is a user pointer is an
+ * exception. In this case, the dynptr will point to a kernel buffer with
+ * the same content as the user buffer.
To simplify the code, users should
+ * always make sure that at most one dynptr initialized by this function
+ * exists at any moment.
+ */
+__bpf_kfunc int bpf_sockopt_dynptr_from(struct bpf_sockopt *sopt,
+					struct bpf_dynptr_kern *ptr__uninit,
+					unsigned int size)
+{
+	struct bpf_sockopt_kern *sopt_kern = (struct bpf_sockopt_kern *)sopt;
+	int err;
+
+	bpf_dynptr_set_null(ptr__uninit);
+
+	if (size > (sopt_kern->optval_end - sopt_kern->optval))
+		return -EINVAL;
+
+	if (size == 0)
+		size = min(sopt_kern->optlen,
+			   (int)(sopt_kern->optval_end - sopt_kern->optval));
+
+	if (sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR)
+		return -EINVAL;
+
+	if (sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER) {
+		err = bpf_sockopt_dynptr_alloc(sopt, sopt_kern->optlen,
+					       ptr__uninit);
+		if (err >= 0)
+			err = copy_from_user(ptr__uninit->data,
+					     sopt_kern->optval,
+					     size);
+		return err;
+	}
+
+	bpf_dynptr_init(ptr__uninit, sopt_kern->optval,
+			BPF_DYNPTR_TYPE_CGROUP_SOCKOPT, 0,
+			size);
+	sopt_kern->flags |= BPF_SOCKOPT_FLAG_OPTVAL_DYNPTR;
+
+	return size;
+}
+
+/**
+ * int bpf_sockopt_dynptr_copy_to(struct bpf_sockopt *sopt,
+ *				  struct bpf_dynptr_kern *ptr)
+ *	Description
+ *		Copy data from *ptr* to *sopt->optval*.
+ *	Return
+ *		>= 0 on success, or a negative error in case of failure.
+ */
+__bpf_kfunc int bpf_sockopt_dynptr_copy_to(struct bpf_sockopt *sopt,
+					   struct bpf_dynptr_kern *ptr)
+{
+	__u32 size = bpf_dynptr_size(ptr);
+
+	struct bpf_sockopt_kern *sopt_kern = (struct bpf_sockopt_kern *)sopt;
+	int ret;
+
+	if (size > (sopt_kern->optval_end - sopt_kern->optval))
+		return -EINVAL;
+
+	if (sopt_kern->flags & BPF_SOCKOPT_FLAG_OPTVAL_USER) {
+		ret = copy_to_user(sopt_kern->optval, ptr->data,
+				   size);
+		if (unlikely(ret))
+			return -EFAULT;
+	} else {
+		/* Use memmove() in case of optval & ptr overlap.
*/ + memmove(sopt_kern->optval, ptr->data, size); + ret = size; + } + + return ret; +} + __diag_pop(); BTF_SET8_START(generic_btf_ids) @@ -2494,6 +2677,19 @@ static const struct btf_kfunc_id_set common_kfunc_set = { .set = &common_btf_ids, }; +BTF_SET8_START(cgroup_common_btf_ids) +BTF_ID_FLAGS(func, bpf_sockopt_dynptr_copy_to, KF_SLEEPABLE) +BTF_ID_FLAGS(func, bpf_sockopt_dynptr_alloc, KF_SLEEPABLE) +BTF_ID_FLAGS(func, bpf_sockopt_dynptr_install, KF_SLEEPABLE) +BTF_ID_FLAGS(func, bpf_sockopt_dynptr_release, KF_SLEEPABLE) +BTF_ID_FLAGS(func, bpf_sockopt_dynptr_from, KF_SLEEPABLE) +BTF_SET8_END(cgroup_common_btf_ids) + +static const struct btf_kfunc_id_set cgroup_kfunc_set = { + .owner = THIS_MODULE, + .set = &cgroup_common_btf_ids, +}; + static int __init kfunc_init(void) { int ret; @@ -2513,6 +2709,7 @@ static int __init kfunc_init(void) ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCKOPT, &cgroup_kfunc_set); ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors, ARRAY_SIZE(generic_dtors), THIS_MODULE); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 936a171ea976..83d65a6e1309 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -745,6 +745,8 @@ static const char *dynptr_type_str(enum bpf_dynptr_type type) return "skb"; case BPF_DYNPTR_TYPE_XDP: return "xdp"; + case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT: + return "cgroup_sockopt"; case BPF_DYNPTR_TYPE_INVALID: return ""; default: @@ -826,6 +828,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) return BPF_DYNPTR_TYPE_SKB; case DYNPTR_TYPE_XDP: return BPF_DYNPTR_TYPE_XDP; + case DYNPTR_TYPE_CGROUP_SOCKOPT: + return BPF_DYNPTR_TYPE_CGROUP_SOCKOPT; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -842,6 
+846,8 @@ static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type) return DYNPTR_TYPE_SKB; case BPF_DYNPTR_TYPE_XDP: return DYNPTR_TYPE_XDP; + case BPF_DYNPTR_TYPE_CGROUP_SOCKOPT: + return DYNPTR_TYPE_CGROUP_SOCKOPT; default: return 0; } @@ -849,7 +855,8 @@ static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type) static bool dynptr_type_refcounted(enum bpf_dynptr_type type) { - return type == BPF_DYNPTR_TYPE_RINGBUF; + return type == BPF_DYNPTR_TYPE_RINGBUF || + type == BPF_DYNPTR_TYPE_CGROUP_SOCKOPT; } static void __mark_dynptr_reg(struct bpf_reg_state *reg, @@ -10271,6 +10278,10 @@ enum special_kfunc_type { KF_bpf_dynptr_slice, KF_bpf_dynptr_slice_rdwr, KF_bpf_dynptr_clone, + KF_bpf_sockopt_dynptr_alloc, + KF_bpf_sockopt_dynptr_install, + KF_bpf_sockopt_dynptr_release, + KF_bpf_sockopt_dynptr_from, }; BTF_SET_START(special_kfunc_set) @@ -10291,6 +10302,10 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) +BTF_ID(func, bpf_sockopt_dynptr_alloc) +BTF_ID(func, bpf_sockopt_dynptr_install) +BTF_ID(func, bpf_sockopt_dynptr_release) +BTF_ID(func, bpf_sockopt_dynptr_from) BTF_SET_END(special_kfunc_set) BTF_ID_LIST(special_kfunc_list) @@ -10313,6 +10328,10 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) +BTF_ID(func, bpf_sockopt_dynptr_alloc) +BTF_ID(func, bpf_sockopt_dynptr_install) +BTF_ID(func, bpf_sockopt_dynptr_release) +BTF_ID(func, bpf_sockopt_dynptr_from) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -10966,6 +10985,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ arg_type |= OBJ_RELEASE; break; case KF_ARG_PTR_TO_DYNPTR: + if (meta->func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_install] || + meta->func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_release]) { + int ref_obj_id = 
dynptr_ref_obj_id(env, reg); + + if (ref_obj_id < 0) { + verbose(env, "R%d is not a valid dynptr\n", regno); + return -EINVAL; + } + + /* Required by check_func_arg_reg_off() */ + arg_type |= ARG_PTR_TO_DYNPTR | OBJ_RELEASE; + meta->release_regno = regno; + } + break; case KF_ARG_PTR_TO_ITER: case KF_ARG_PTR_TO_LIST_HEAD: case KF_ARG_PTR_TO_LIST_NODE: @@ -11053,6 +11086,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ verbose(env, "verifier internal error: missing ref obj id for parent of clone\n"); return -EFAULT; } + } else if ((meta->func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_alloc] || + meta->func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_from]) && + (dynptr_arg_type & MEM_UNINIT)) { + dynptr_arg_type |= DYNPTR_TYPE_CGROUP_SOCKOPT; } ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type, clone_ref_obj_id); @@ -11361,7 +11398,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. 
*/
 	if (meta.release_regno) {
-		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
+		verbose(env, "release refcounted PTR_TO_BTF_ID %s\n",
+			meta.func_name);
+		if (meta.func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_install] ||
+		    meta.func_id == special_kfunc_list[KF_bpf_sockopt_dynptr_release])
+			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
+		else
+			err = release_reference(env, regs[meta.release_regno].ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, meta.func_id);

From patchwork Tue Aug 15 17:47:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kui-Feng Lee
X-Patchwork-Id: 13354041
X-Patchwork-Delegate: bpf@iogearbox.net
From: thinker.li@gmail.com
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com, yonghong.song@linux.dev
Cc: sinquersw@gmail.com, kuifeng@meta.com, Kui-Feng Lee
Subject: [RFC bpf-next v3 5/5] selftests/bpf: Add test cases for sleepable BPF programs of the CGROUP_SOCKOPT type
Date: Tue, 15 Aug 2023 10:47:12 -0700
Message-Id: <20230815174712.660956-6-thinker.li@gmail.com>
In-Reply-To:
<20230815174712.660956-1-thinker.li@gmail.com>
References: <20230815174712.660956-1-thinker.li@gmail.com>
X-Patchwork-State: RFC

From: Kui-Feng Lee

Run the same tests as for the non-sleepable programs.

Signed-off-by: Kui-Feng Lee
---
 .../testing/selftests/bpf/bpf_experimental.h  |  36 +++
 tools/testing/selftests/bpf/bpf_kfuncs.h      |  41 +++
 .../selftests/bpf/prog_tests/sockopt_sk.c     | 112 +++++++-
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 257 ++++++++++++++++++
 .../selftests/bpf/verifier/sleepable.c        |   2 +-
 5 files changed, 445 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 209811b1993a..9b5dfefe65dc 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -131,4 +131,40 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod
  */
 extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;

+/*
+ * Description
+ *	Copy data from *ptr* to *sopt->optval*.
+ * Return
+ *	>= 0 on success, or a negative error in case of failure.
+ */
+extern int bpf_sockopt_dynptr_copy_to(struct bpf_sockopt *sopt,
+				      struct bpf_dynptr *ptr) __ksym;
+
+/* Description
+ *	Allocate a buffer of 'size' bytes to be installed as optval.
+ * Returns + * > 0 on success, the size of the allocated buffer + * -ENOMEM or -EINVAL on failure + */ +extern int bpf_sockopt_dynptr_alloc(struct bpf_sockopt *sopt, int size, + struct bpf_dynptr *ptr__uninit) __ksym; + +/* Description + * Install the buffer pointed to by 'ptr' as optval. + * Returns + * 0 on success + * -EINVAL if the buffer is too small + */ +extern int bpf_sockopt_dynptr_install(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr) __ksym; + +/* Description + * Release the buffer allocated by bpf_sockopt_dynptr_alloc. + * Returns + * 0 on success + * -EINVAL if the buffer was not allocated by bpf_sockopt_dynptr_alloc + */ +extern int bpf_sockopt_dynptr_release(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr) __ksym; + #endif diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h index 642dda0e758a..772040225257 100644 --- a/tools/testing/selftests/bpf/bpf_kfuncs.h +++ b/tools/testing/selftests/bpf/bpf_kfuncs.h @@ -41,4 +41,45 @@ extern bool bpf_dynptr_is_rdonly(const struct bpf_dynptr *ptr) __ksym; extern __u32 bpf_dynptr_size(const struct bpf_dynptr *ptr) __ksym; extern int bpf_dynptr_clone(const struct bpf_dynptr *ptr, struct bpf_dynptr *clone__init) __ksym; +extern int bpf_sockopt_dynptr_copy_to(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr) __ksym; + +/* Description + * Allocate a buffer of 'size' bytes for being installed as optval. + * Returns + * > 0 on success, the size of the allocated buffer + * -ENOMEM or -EINVAL on failure + */ +extern int bpf_sockopt_dynptr_alloc(struct bpf_sockopt *sopt, int size, + struct bpf_dynptr *ptr__uninit) __ksym; + +/* Description + * Install the buffer pointed to by 'ptr' as optval. + * Returns + * 0 on success + * -EINVAL if the buffer is too small + */ +extern int bpf_sockopt_dynptr_install(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr) __ksym; + +/* Description + * Release the buffer allocated by bpf_sockopt_dynptr_alloc. 
+ * Returns + * 0 on success + * -EINVAL if the buffer was not allocated by bpf_sockopt_dynptr_alloc + */ +extern int bpf_sockopt_dynptr_release(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr) __ksym; + +/* Description + * Initialize a dynptr to access the content of optval passing + * to {get,set}sockopt()s. + * Returns + * > 0 on success, the size of the allocated buffer + * -ENOMEM or -EINVAL on failure + */ +extern int bpf_sockopt_dynptr_from(struct bpf_sockopt *sopt, + struct bpf_dynptr *ptr__uninit, + unsigned int size) __ksym; + #endif diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c index 05d0e07da394..85255648747f 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c @@ -92,6 +92,7 @@ static int getsetsockopt(void) } if (buf.u8[0] != 0x01) { log_err("Unexpected buf[0] 0x%02x != 0x01", buf.u8[0]); + log_err("optlen %d", optlen); goto err; } @@ -220,7 +221,7 @@ static int getsetsockopt(void) return -1; } -static void run_test(int cgroup_fd) +static void run_test_nonsleepable(int cgroup_fd) { struct sockopt_sk *skel; @@ -246,6 +247,106 @@ static void run_test(int cgroup_fd) sockopt_sk__destroy(skel); } +static void run_test_nonsleepable_mixed(int cgroup_fd) +{ + struct sockopt_sk *skel; + + skel = sockopt_sk__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_load")) + goto cleanup; + + skel->bss->page_size = getpagesize(); + skel->bss->skip_sleepable = 1; + + skel->links._setsockopt_s = + bpf_program__attach_cgroup(skel->progs._setsockopt_s, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._setsockopt_s, "setsockopt_link (sleepable)")) + goto cleanup; + + skel->links._getsockopt_s = + bpf_program__attach_cgroup(skel->progs._getsockopt_s, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._getsockopt_s, "getsockopt_link (sleepable)")) + goto cleanup; + + skel->links._setsockopt = + 
bpf_program__attach_cgroup(skel->progs._setsockopt, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._setsockopt, "setsockopt_link")) + goto cleanup; + + skel->links._getsockopt = + bpf_program__attach_cgroup(skel->progs._getsockopt, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._getsockopt, "getsockopt_link")) + goto cleanup; + + ASSERT_OK(getsetsockopt(), "getsetsockopt"); + +cleanup: + sockopt_sk__destroy(skel); +} + +static void run_test_sleepable(int cgroup_fd) +{ + struct sockopt_sk *skel; + + skel = sockopt_sk__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_load")) + goto cleanup; + + skel->bss->page_size = getpagesize(); + + skel->links._setsockopt_s = + bpf_program__attach_cgroup(skel->progs._setsockopt_s, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._setsockopt_s, "setsockopt_link")) + goto cleanup; + + skel->links._getsockopt_s = + bpf_program__attach_cgroup(skel->progs._getsockopt_s, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._getsockopt_s, "getsockopt_link")) + goto cleanup; + + ASSERT_OK(getsetsockopt(), "getsetsockopt"); + +cleanup: + sockopt_sk__destroy(skel); +} + +static void run_test_sleepable_mixed(int cgroup_fd) +{ + struct sockopt_sk *skel; + + skel = sockopt_sk__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_load")) + goto cleanup; + + skel->bss->page_size = getpagesize(); + skel->bss->skip_nonsleepable = 1; + + skel->links._setsockopt = + bpf_program__attach_cgroup(skel->progs._setsockopt, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._setsockopt, "setsockopt_link (nonsleepable)")) + goto cleanup; + + skel->links._getsockopt = + bpf_program__attach_cgroup(skel->progs._getsockopt, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._getsockopt, "getsockopt_link (nonsleepable)")) + goto cleanup; + + skel->links._setsockopt_s = + bpf_program__attach_cgroup(skel->progs._setsockopt_s, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links._setsockopt_s, "setsockopt_link")) + goto cleanup; + + skel->links._getsockopt_s = + 
+		bpf_program__attach_cgroup(skel->progs._getsockopt_s, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links._getsockopt_s, "getsockopt_link"))
+		goto cleanup;
+
+	ASSERT_OK(getsetsockopt(), "getsetsockopt");
+
+cleanup:
+	sockopt_sk__destroy(skel);
+}
+
 void test_sockopt_sk(void)
 {
 	int cgroup_fd;
@@ -254,6 +355,13 @@ void test_sockopt_sk(void)
 	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /sockopt_sk"))
 		return;
 
-	run_test(cgroup_fd);
+	if (test__start_subtest("nonsleepable"))
+		run_test_nonsleepable(cgroup_fd);
+	if (test__start_subtest("sleepable"))
+		run_test_sleepable(cgroup_fd);
+	if (test__start_subtest("nonsleepable_mixed"))
+		run_test_nonsleepable_mixed(cgroup_fd);
+	if (test__start_subtest("sleepable_mixed"))
+		run_test_sleepable_mixed(cgroup_fd);
 	close(cgroup_fd);
 }
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index cb990a7d3d45..efacd3b88c40 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -5,10 +5,16 @@
 #include
 #include
 
+typedef int bool;
+#include "bpf_kfuncs.h"
+
 char _license[] SEC("license") = "GPL";
 
 int page_size = 0; /* userspace should set it */
 
+int skip_sleepable = 0;
+int skip_nonsleepable = 0;
+
 #ifndef SOL_TCP
 #define SOL_TCP IPPROTO_TCP
 #endif
@@ -34,6 +40,9 @@ int _getsockopt(struct bpf_sockopt *ctx)
 	struct sockopt_sk *storage;
 	struct bpf_sock *sk;
 
+	if (skip_nonsleepable)
+		return 1;
+
 	/* Bypass AF_NETLINK. */
 	sk = ctx->sk;
 	if (sk && sk->family == AF_NETLINK)
@@ -136,6 +145,134 @@ int _getsockopt(struct bpf_sockopt *ctx)
 	return 1;
 }
 
+SEC("cgroup/getsockopt.s")
+int _getsockopt_s(struct bpf_sockopt *ctx)
+{
+	struct tcp_zerocopy_receive *zcvr;
+	struct bpf_dynptr optval_dynptr;
+	struct sockopt_sk *storage;
+	__u8 *optval, *optval_end;
+	struct bpf_sock *sk;
+	char buf[1];
+	__u64 addr;
+	int ret;
+
+	if (skip_sleepable)
+		return 1;
+
+	/* Bypass AF_NETLINK.
+	 */
+	sk = ctx->sk;
+	if (sk && sk->family == AF_NETLINK)
+		return 1;
+
+	optval = ctx->optval;
+	optval_end = ctx->optval_end;
+
+	/* Make sure bpf_get_netns_cookie is callable.
+	 */
+	if (bpf_get_netns_cookie(NULL) == 0)
+		return 0;
+
+	if (bpf_get_netns_cookie(ctx) == 0)
+		return 0;
+
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
+		/* Not interested in SOL_IP:IP_TOS;
+		 * let next BPF program in the cgroup chain or kernel
+		 * handle it.
+		 */
+		return 1;
+	}
+
+	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
+		/* Not interested in SOL_SOCKET:SO_SNDBUF;
+		 * let next BPF program in the cgroup chain or kernel
+		 * handle it.
+		 */
+		return 1;
+	}
+
+	if (ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION) {
+		/* Not interested in SOL_TCP:TCP_CONGESTION;
+		 * let next BPF program in the cgroup chain or kernel
+		 * handle it.
+		 */
+		return 1;
+	}
+
+	if (ctx->level == SOL_TCP && ctx->optname == TCP_ZEROCOPY_RECEIVE) {
+		/* Verify that TCP_ZEROCOPY_RECEIVE triggers.
+		 * It has a custom implementation for performance
+		 * reasons.
+		 */
+
+		bpf_sockopt_dynptr_from(ctx, &optval_dynptr, sizeof(*zcvr));
+		zcvr = bpf_dynptr_data(&optval_dynptr, 0, sizeof(*zcvr));
+		addr = zcvr ? zcvr->address : 0;
+		bpf_sockopt_dynptr_release(ctx, &optval_dynptr);
+
+		return addr != 0 ? 0 : 1;
+	}
+
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval + 1 > optval_end)
+			return 0; /* bounds check */
+
+		ctx->retval = 0; /* Reset system call return value to zero */
+
+		/* Always export 0x55 */
+		buf[0] = 0x55;
+		ret = bpf_sockopt_dynptr_alloc(ctx, 1, &optval_dynptr);
+		if (ret >= 0) {
+			bpf_dynptr_write(&optval_dynptr, 0, buf, 1, 0);
+			ret = bpf_sockopt_dynptr_copy_to(ctx, &optval_dynptr);
+		}
+		bpf_sockopt_dynptr_release(ctx, &optval_dynptr);
+		if (ret < 0)
+			return 0;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != page_size && 0)
+			return 0; /* unexpected data size */
+
+		return 1;
+	}
+
+	if (ctx->level != SOL_CUSTOM)
+		return 0; /* deny everything except custom level */
+
+	if (optval + 1 > optval_end)
+		return 0; /* bounds check */
+
+	storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0; /* couldn't get sk storage */
+
+	if (!ctx->retval)
+		return 0; /* kernel should not have handled
+			   * SOL_CUSTOM, something is wrong!
+			   */
+	ctx->retval = 0; /* Reset system call return value to zero */
+
+	buf[0] = storage->val;
+	ret = bpf_sockopt_dynptr_alloc(ctx, 1, &optval_dynptr);
+	if (ret >= 0) {
+		bpf_dynptr_write(&optval_dynptr, 0, buf, 1, 0);
+		ret = bpf_sockopt_dynptr_copy_to(ctx, &optval_dynptr);
+	}
+	bpf_sockopt_dynptr_release(ctx, &optval_dynptr);
+	if (ret < 0)
+		return 0;
+	ctx->optlen = 1;
+
+	return 1;
+}
+
 SEC("cgroup/setsockopt")
 int _setsockopt(struct bpf_sockopt *ctx)
 {
@@ -144,6 +281,9 @@ int _setsockopt(struct bpf_sockopt *ctx)
 	struct sockopt_sk *storage;
 	struct bpf_sock *sk;
 
+	if (skip_nonsleepable)
+		return 1;
+
 	/* Bypass AF_NETLINK. */
 	sk = ctx->sk;
 	if (sk && sk->family == AF_NETLINK)
@@ -236,3 +376,120 @@ int _setsockopt(struct bpf_sockopt *ctx)
 	ctx->optlen = 0;
 	return 1;
 }
+
+SEC("cgroup/setsockopt.s")
+int _setsockopt_s(struct bpf_sockopt *ctx)
+{
+	struct bpf_dynptr optval_buf;
+	struct sockopt_sk *storage;
+	__u8 *optval, *optval_end;
+	struct bpf_sock *sk;
+	__u8 tmp_u8;
+	__u32 tmp;
+	int ret;
+
+	if (skip_sleepable)
+		return 1;
+
+	optval = ctx->optval;
+	optval_end = ctx->optval_end;
+
+	/* Bypass AF_NETLINK. */
+	sk = ctx->sk;
+	if (sk && sk->family == AF_NETLINK)
+		return -1;
+
+	/* Make sure bpf_get_netns_cookie is callable.
+	 */
+	if (bpf_get_netns_cookie(NULL) == 0)
+		return 0;
+
+	if (bpf_get_netns_cookie(ctx) == 0)
+		return 0;
+
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
+		/* Not interested in SOL_IP:IP_TOS;
+		 * let next BPF program in the cgroup chain or kernel
+		 * handle it.
+		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
+		return 1;
+	}
+
+	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
+		/* Overwrite SO_SNDBUF value */
+
+		ret = bpf_sockopt_dynptr_alloc(ctx, sizeof(__u32),
+					       &optval_buf);
+		if (ret < 0)
+			bpf_sockopt_dynptr_release(ctx, &optval_buf);
+		else {
+			tmp = 0x55AA;
+			bpf_dynptr_write(&optval_buf, 0, &tmp, sizeof(tmp), 0);
+			ret = bpf_sockopt_dynptr_install(ctx, &optval_buf);
+		}
+
+		return ret >= 0 ? 1 : 0;
+	}
+
+	if (ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION) {
+		/* Always use cubic */
+
+		ret = bpf_sockopt_dynptr_alloc(ctx, 5, &optval_buf);
+		if (ret < 0) {
+			bpf_sockopt_dynptr_release(ctx, &optval_buf);
+			return 0;
+		}
+		bpf_dynptr_write(&optval_buf, 0, "cubic", 5, 0);
+		ret = bpf_sockopt_dynptr_install(ctx, &optval_buf);
+		if (ret < 0)
+			return 0;
+		ctx->optlen = 5;
+
+		return 1;
+	}
+
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		/* Original optlen is larger than PAGE_SIZE.
+		 */
+		if (ctx->optlen != page_size * 2)
+			return 0; /* unexpected data size */
+
+		ret = bpf_sockopt_dynptr_alloc(ctx, 1, &optval_buf);
+		if (ret < 0) {
+			bpf_sockopt_dynptr_release(ctx, &optval_buf);
+			return 0;
+		}
+		tmp_u8 = 0;
+		bpf_dynptr_write(&optval_buf, 0, &tmp_u8, 1, 0);
+		ret = bpf_sockopt_dynptr_install(ctx, &optval_buf);
+		if (ret < 0)
+			return 0;
+		ctx->optlen = 1;
+
+		return 1;
+	}
+
+	if (ctx->level != SOL_CUSTOM)
+		return 0; /* deny everything except custom level */
+
+	if (optval + 1 > optval_end)
+		return 0; /* bounds check */
+
+	storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0; /* couldn't get sk storage */
+
+	bpf_sockopt_dynptr_from(ctx, &optval_buf, sizeof(__u8));
+	optval = bpf_dynptr_data(&optval_buf, 0, sizeof(__u8));
+	if (optval) {
+		storage->val = *optval;
+		ctx->optlen = -1; /* BPF has consumed this option, don't call
+				   * kernel setsockopt handler.
+				   */
+	}
+	bpf_sockopt_dynptr_release(ctx, &optval_buf);
+
+	return optval ? 1 : 0;
+}
+
diff --git a/tools/testing/selftests/bpf/verifier/sleepable.c b/tools/testing/selftests/bpf/verifier/sleepable.c
index 1f0d2bdc673f..4b6c1117ec9f 100644
--- a/tools/testing/selftests/bpf/verifier/sleepable.c
+++ b/tools/testing/selftests/bpf/verifier/sleepable.c
@@ -85,7 +85,7 @@
 	.expected_attach_type = BPF_TRACE_RAW_TP,
 	.kfunc = "sched_switch",
 	.result = REJECT,
-	.errstr = "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable",
+	.errstr = "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, cgroup, and struct_ops programs can be sleepable",
 	.flags = BPF_F_SLEEPABLE,
 	.runs = -1,
 },