Message ID | AM6PR03MB584846B635B10C59EFAF596099542@AM6PR03MB5848.eurprd03.prod.outlook.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | bpf/crib: Add open-coded style process file iterator and file related CRIB kfuncs | expand |
On Tue, Oct 29, 2024 at 5:15 PM Juntong Deng <juntong.deng@outlook.com> wrote: > > This patch adds the open-coded iterator style process file iterator > kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all > files opened by the specified process. > > In addition, this patch adds bpf_iter_task_file_get_fd() getter to get > the file descriptor corresponding to the file in the current iteration. > > The reference to struct file acquired by the previous > bpf_iter_task_file_next() is released in the next > bpf_iter_task_file_next(), and the last reference is released in the > last bpf_iter_task_file_next() that returns NULL. > > In the bpf_iter_task_file_destroy(), if the iterator does not iterate to > the end, then the last struct file reference is released at this time. > > Signed-off-by: Juntong Deng <juntong.deng@outlook.com> > --- > kernel/bpf/Makefile | 1 + > kernel/bpf/crib/Makefile | 3 ++ > kernel/bpf/crib/crib.c | 29 +++++++++++ > kernel/bpf/crib/files.c | 105 +++++++++++++++++++++++++++++++++++++++ > 4 files changed, 138 insertions(+) > create mode 100644 kernel/bpf/crib/Makefile > create mode 100644 kernel/bpf/crib/crib.c > create mode 100644 kernel/bpf/crib/files.c > > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > index 105328f0b9c0..933d36264e5e 100644 > --- a/kernel/bpf/Makefile > +++ b/kernel/bpf/Makefile > @@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o > obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o > obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o > obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o > +obj-$(CONFIG_BPF_SYSCALL) += crib/ > diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile > new file mode 100644 > index 000000000000..4e1bae1972dd > --- /dev/null > +++ b/kernel/bpf/crib/Makefile > @@ -0,0 +1,3 @@ > +# SPDX-License-Identifier: GPL-2.0 > + > +obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o > diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c > new file mode 100644 > index 000000000000..e6536ee9a845 > --- /dev/null > +++ b/kernel/bpf/crib/crib.c > @@ -0,0 +1,29 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Checkpoint/Restore In eBPF (CRIB) > + */ > + > +#include <linux/bpf.h> > +#include <linux/btf.h> > +#include <linux/btf_ids.h> > + > +BTF_KFUNCS_START(bpf_crib_kfuncs) > + > +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) > +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) > +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) > +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) This is in no way CRIB-specific, right? So I'd drop the CRIB reference and move code next to task_file BPF iterator program type implementation, this is a generic functionality. Even more so, given Namhyung's recent work on adding kmem_cache iterator (both program type and open-coded iterator), it seems like it should be possible to cut down on code duplication by using open-coded iterator logic inside the BPF iterator program. Now that you are adding task_file open-coded iterator, can you please check if it can be reused. See kernel/bpf/task_iter.c (and I think that's where your code should live as well, btw). pw-bot: cr > + > +BTF_KFUNCS_END(bpf_crib_kfuncs) > + > +static const struct btf_kfunc_id_set bpf_crib_kfunc_set = { > + .owner = THIS_MODULE, > + .set = &bpf_crib_kfuncs, > +}; > + > +static int __init bpf_crib_init(void) > +{ > + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set); > +} > + > +late_initcall(bpf_crib_init); > diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c > new file mode 100644 > index 000000000000..ececf150303f > --- /dev/null > +++ b/kernel/bpf/crib/files.c > @@ -0,0 +1,105 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include <linux/btf.h> > +#include <linux/file.h> > +#include <linux/fdtable.h> > +#include <linux/net.h> > + > +struct bpf_iter_task_file { > + __u64 __opaque[3]; > +} __aligned(8); > + > +struct bpf_iter_task_file_kern { > + struct task_struct *task; > + struct file *file; > + int fd; > +} __aligned(8); > + > +__bpf_kfunc_start_defs(); > + > +/** > + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, > + * used to iterate over all files opened by a specified task > + * > + * @it: the new bpf_iter_task_file to be created > + * @task: a pointer pointing to a task to be iterated over > + */ > +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, > + struct task_struct *task) > +{ > + struct bpf_iter_task_file_kern *kit = (void *)it; > + > + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); > + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != > + __alignof__(struct bpf_iter_task_file)); > + > + kit->task = task; > + kit->fd = -1; > + kit->file = NULL; > + > + return 0; > +} > + > +/** > + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file > + * > + * bpf_iter_task_file_next acquires a reference to the returned struct file. > + * > + * The reference to struct file acquired by the previous > + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), > + * and the last reference is released in the last bpf_iter_task_file_next() > + * that returns NULL. > + * > + * @it: the bpf_iter_task_file to be checked > + * > + * @returns a pointer to the struct file of the next file if further files > + * are available, otherwise returns NULL > + */ > +__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it) > +{ > + struct bpf_iter_task_file_kern *kit = (void *)it; > + > + if (kit->file) > + fput(kit->file); > + > + kit->fd++; > + > + rcu_read_lock(); > + kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd); > + rcu_read_unlock(); > + > + return kit->file; > +} > + > +/** > + * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to > + * the file in the current iteration > + * > + * @it: the bpf_iter_task_file to be checked > + * > + * @returns the file descriptor > + */ > +__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter) > +{ > + struct bpf_iter_task_file_kern *kit = (void *)it__iter; > + > + return kit->fd; > +} > + I don't think we need this. It's probably better to return a pointer to a small struct representing "item" returned from this iterator. Something like struct bpf_iter_task_file_item { struct task_struct *task; struct file *file; int fd; }; You can then embed this struct into struct bpf_iter_task_file and return a pointer to it on each next() call (avoiding memory allocation) (naming just for illustrative purposes, I spent 0 seconds thinking about it) > +/** > + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file > + * > + * If the iterator does not iterate to the end, then the last > + * struct file reference is released at this time. > + * > + * @it: the bpf_iter_task_file to be destroyed > + */ > +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) > +{ > + struct bpf_iter_task_file_kern *kit = (void *)it; > + > + if (kit->file) > + fput(kit->file); > +} > + > +__bpf_kfunc_end_defs(); > -- > 2.39.5 >
On 2024/11/1 19:06, Andrii Nakryiko wrote: > On Tue, Oct 29, 2024 at 5:15 PM Juntong Deng <juntong.deng@outlook.com> wrote: >> >> This patch adds the open-coded iterator style process file iterator >> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all >> files opened by the specified process. >> >> In addition, this patch adds bpf_iter_task_file_get_fd() getter to get >> the file descriptor corresponding to the file in the current iteration. >> >> The reference to struct file acquired by the previous >> bpf_iter_task_file_next() is released in the next >> bpf_iter_task_file_next(), and the last reference is released in the >> last bpf_iter_task_file_next() that returns NULL. >> >> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to >> the end, then the last struct file reference is released at this time. >> >> Signed-off-by: Juntong Deng <juntong.deng@outlook.com> >> --- >> kernel/bpf/Makefile | 1 + >> kernel/bpf/crib/Makefile | 3 ++ >> kernel/bpf/crib/crib.c | 29 +++++++++++ >> kernel/bpf/crib/files.c | 105 +++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 138 insertions(+) >> create mode 100644 kernel/bpf/crib/Makefile >> create mode 100644 kernel/bpf/crib/crib.c >> create mode 100644 kernel/bpf/crib/files.c >> >> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile >> index 105328f0b9c0..933d36264e5e 100644 >> --- a/kernel/bpf/Makefile >> +++ b/kernel/bpf/Makefile >> @@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o >> obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o >> obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o >> obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o >> +obj-$(CONFIG_BPF_SYSCALL) += crib/ >> diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile >> new file mode 100644 >> index 000000000000..4e1bae1972dd >> --- /dev/null >> +++ b/kernel/bpf/crib/Makefile >> @@ -0,0 +1,3 @@ >> +# SPDX-License-Identifier: GPL-2.0 >> + >> +obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o >> diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c >> new file mode 100644 >> index 000000000000..e6536ee9a845 >> --- /dev/null >> +++ b/kernel/bpf/crib/crib.c >> @@ -0,0 +1,29 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Checkpoint/Restore In eBPF (CRIB) >> + */ >> + >> +#include <linux/bpf.h> >> +#include <linux/btf.h> >> +#include <linux/btf_ids.h> >> + >> +BTF_KFUNCS_START(bpf_crib_kfuncs) >> + >> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) >> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) >> +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) >> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) > > This is in no way CRIB-specific, right? So I'd drop the CRIB reference > and move code next to task_file BPF iterator program type > implementation, this is a generic functionality. > > Even more so, given Namhyung's recent work on adding kmem_cache > iterator (both program type and open-coded iterator), it seems like it > should be possible to cut down on code duplication by using open-coded > iterator logic inside the BPF iterator program. Now that you are > adding task_file open-coded iterator, can you please check if it can > be reused. See kernel/bpf/task_iter.c (and I think that's where your > code should live as well, btw). > > pw-bot: cr > Thanks for your reply! Yes, I agree that it would be better to put the task_file open-coded iterator together with the traditional task_file iterator (in the same file). I will move it in the next patch series. >> + >> +BTF_KFUNCS_END(bpf_crib_kfuncs) >> + >> +static const struct btf_kfunc_id_set bpf_crib_kfunc_set = { >> + .owner = THIS_MODULE, >> + .set = &bpf_crib_kfuncs, >> +}; >> + >> +static int __init bpf_crib_init(void) >> +{ >> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set); >> +} >> + >> +late_initcall(bpf_crib_init); >> diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c >> new file mode 100644 >> index 000000000000..ececf150303f >> --- /dev/null >> +++ b/kernel/bpf/crib/files.c >> @@ -0,0 +1,105 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> + >> +#include <linux/btf.h> >> +#include <linux/file.h> >> +#include <linux/fdtable.h> >> +#include <linux/net.h> >> + >> +struct bpf_iter_task_file { >> + __u64 __opaque[3]; >> +} __aligned(8); >> + >> +struct bpf_iter_task_file_kern { >> + struct task_struct *task; >> + struct file *file; >> + int fd; >> +} __aligned(8); >> + >> +__bpf_kfunc_start_defs(); >> + >> +/** >> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, >> + * used to iterate over all files opened by a specified task >> + * >> + * @it: the new bpf_iter_task_file to be created >> + * @task: a pointer pointing to a task to be iterated over >> + */ >> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, >> + struct task_struct *task) >> +{ >> + struct bpf_iter_task_file_kern *kit = (void *)it; >> + >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != >> + __alignof__(struct bpf_iter_task_file)); >> + >> + kit->task = task; >> + kit->fd = -1; >> + kit->file = NULL; >> + >> + return 0; >> +} >> + >> +/** >> + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file >> + * >> + * bpf_iter_task_file_next acquires a reference to the returned struct file. >> + * >> + * The reference to struct file acquired by the previous >> + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), >> + * and the last reference is released in the last bpf_iter_task_file_next() >> + * that returns NULL. >> + * >> + * @it: the bpf_iter_task_file to be checked >> + * >> + * @returns a pointer to the struct file of the next file if further files >> + * are available, otherwise returns NULL >> + */ >> +__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it) >> +{ >> + struct bpf_iter_task_file_kern *kit = (void *)it; >> + >> + if (kit->file) >> + fput(kit->file); >> + >> + kit->fd++; >> + >> + rcu_read_lock(); >> + kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd); >> + rcu_read_unlock(); >> + >> + return kit->file; >> +} >> + >> +/** >> + * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to >> + * the file in the current iteration >> + * >> + * @it: the bpf_iter_task_file to be checked >> + * >> + * @returns the file descriptor >> + */ >> +__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter) >> +{ >> + struct bpf_iter_task_file_kern *kit = (void *)it__iter; >> + >> + return kit->fd; >> +} >> + > > I don't think we need this. It's probably better to return a pointer > to a small struct representing "item" returned from this iterator. > Something like > > struct bpf_iter_task_file_item { > struct task_struct *task; > struct file *file; > int fd; > }; > > You can then embed this struct into struct bpf_iter_task_file and > return a pointer to it on each next() call (avoiding memory > allocation) > > > (naming just for illustrative purposes, I spent 0 seconds thinking about it) > Yes, I agree that it is feasible. But there is a question here, should we expose the internal state structure of the iterator (If we want to embed) ? I guess that we need two versions of data structures struct bpf_iter_xxx and struct bpf_iter_xxx_kern is for the purpose of encapsulation? With two versions of the data structure, users can only manipulate the iterator using the iterator kfuncs, avoiding users from directly accessing the internal state. After we decide to return struct bpf_iter_task_file_item, these members will not be able to change and users can directly access/change the internal state of the iterator. >> +/** >> + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file >> + * >> + * If the iterator does not iterate to the end, then the last >> + * struct file reference is released at this time. >> + * >> + * @it: the bpf_iter_task_file to be destroyed >> + */ >> +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) >> +{ >> + struct bpf_iter_task_file_kern *kit = (void *)it; >> + >> + if (kit->file) >> + fput(kit->file); >> +} >> + >> +__bpf_kfunc_end_defs(); >> -- >> 2.39.5 >>
On Fri, Nov 1, 2024 at 1:22 PM Juntong Deng <juntong.deng@outlook.com> wrote: > > On 2024/11/1 19:06, Andrii Nakryiko wrote: > > On Tue, Oct 29, 2024 at 5:15 PM Juntong Deng <juntong.deng@outlook.com> wrote: > >> > >> This patch adds the open-coded iterator style process file iterator > >> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all > >> files opened by the specified process. > >> > >> In addition, this patch adds bpf_iter_task_file_get_fd() getter to get > >> the file descriptor corresponding to the file in the current iteration. > >> > >> The reference to struct file acquired by the previous > >> bpf_iter_task_file_next() is released in the next > >> bpf_iter_task_file_next(), and the last reference is released in the > >> last bpf_iter_task_file_next() that returns NULL. > >> > >> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to > >> the end, then the last struct file reference is released at this time. > >> > >> Signed-off-by: Juntong Deng <juntong.deng@outlook.com> > >> --- > >> kernel/bpf/Makefile | 1 + > >> kernel/bpf/crib/Makefile | 3 ++ > >> kernel/bpf/crib/crib.c | 29 +++++++++++ > >> kernel/bpf/crib/files.c | 105 +++++++++++++++++++++++++++++++++++++++ > >> 4 files changed, 138 insertions(+) > >> create mode 100644 kernel/bpf/crib/Makefile > >> create mode 100644 kernel/bpf/crib/crib.c > >> create mode 100644 kernel/bpf/crib/files.c > >> > >> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > >> index 105328f0b9c0..933d36264e5e 100644 > >> --- a/kernel/bpf/Makefile > >> +++ b/kernel/bpf/Makefile > >> @@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o > >> obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o > >> obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o > >> obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o > >> +obj-$(CONFIG_BPF_SYSCALL) += crib/ > >> diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile > >> new file mode 100644 > >> index 000000000000..4e1bae1972dd > >> --- /dev/null > >> +++ b/kernel/bpf/crib/Makefile > >> @@ -0,0 +1,3 @@ > >> +# SPDX-License-Identifier: GPL-2.0 > >> + > >> +obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o > >> diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c > >> new file mode 100644 > >> index 000000000000..e6536ee9a845 > >> --- /dev/null > >> +++ b/kernel/bpf/crib/crib.c > >> @@ -0,0 +1,29 @@ > >> +// SPDX-License-Identifier: GPL-2.0 > >> +/* > >> + * Checkpoint/Restore In eBPF (CRIB) > >> + */ > >> + > >> +#include <linux/bpf.h> > >> +#include <linux/btf.h> > >> +#include <linux/btf_ids.h> > >> + > >> +BTF_KFUNCS_START(bpf_crib_kfuncs) > >> + > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) > > > > This is in no way CRIB-specific, right? So I'd drop the CRIB reference > > and move code next to task_file BPF iterator program type > > implementation, this is a generic functionality. > > > > Even more so, given Namhyung's recent work on adding kmem_cache > > iterator (both program type and open-coded iterator), it seems like it > > should be possible to cut down on code duplication by using open-coded > > iterator logic inside the BPF iterator program. Now that you are > > adding task_file open-coded iterator, can you please check if it can > > be reused. See kernel/bpf/task_iter.c (and I think that's where your > > code should live as well, btw). > > > > pw-bot: cr > > > > Thanks for your reply! > > Yes, I agree that it would be better to put the task_file open-coded > iterator together with the traditional task_file iterator (in the > same file). > > I will move it in the next patch series. > > >> + > >> +BTF_KFUNCS_END(bpf_crib_kfuncs) > >> + > >> +static const struct btf_kfunc_id_set bpf_crib_kfunc_set = { > >> + .owner = THIS_MODULE, > >> + .set = &bpf_crib_kfuncs, > >> +}; > >> + > >> +static int __init bpf_crib_init(void) > >> +{ > >> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set); > >> +} > >> + > >> +late_initcall(bpf_crib_init); > >> diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c > >> new file mode 100644 > >> index 000000000000..ececf150303f > >> --- /dev/null > >> +++ b/kernel/bpf/crib/files.c > >> @@ -0,0 +1,105 @@ > >> +// SPDX-License-Identifier: GPL-2.0 > >> + > >> +#include <linux/btf.h> > >> +#include <linux/file.h> > >> +#include <linux/fdtable.h> > >> +#include <linux/net.h> > >> + > >> +struct bpf_iter_task_file { > >> + __u64 __opaque[3]; > >> +} __aligned(8); > >> + > >> +struct bpf_iter_task_file_kern { > >> + struct task_struct *task; > >> + struct file *file; > >> + int fd; > >> +} __aligned(8); > >> + > >> +__bpf_kfunc_start_defs(); > >> + > >> +/** > >> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, > >> + * used to iterate over all files opened by a specified task > >> + * > >> + * @it: the new bpf_iter_task_file to be created > >> + * @task: a pointer pointing to a task to be iterated over > >> + */ > >> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, > >> + struct task_struct *task) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); > >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != > >> + __alignof__(struct bpf_iter_task_file)); > >> + > >> + kit->task = task; > >> + kit->fd = -1; > >> + kit->file = NULL; > >> + > >> + return 0; > >> +} > >> + > >> +/** > >> + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file > >> + * > >> + * bpf_iter_task_file_next acquires a reference to the returned struct file. > >> + * > >> + * The reference to struct file acquired by the previous > >> + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), > >> + * and the last reference is released in the last bpf_iter_task_file_next() > >> + * that returns NULL. > >> + * > >> + * @it: the bpf_iter_task_file to be checked > >> + * > >> + * @returns a pointer to the struct file of the next file if further files > >> + * are available, otherwise returns NULL > >> + */ > >> +__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + if (kit->file) > >> + fput(kit->file); > >> + > >> + kit->fd++; > >> + > >> + rcu_read_lock(); > >> + kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd); > >> + rcu_read_unlock(); > >> + > >> + return kit->file; > >> +} > >> + > >> +/** > >> + * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to > >> + * the file in the current iteration > >> + * > >> + * @it: the bpf_iter_task_file to be checked > >> + * > >> + * @returns the file descriptor > >> + */ > >> +__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it__iter; > >> + > >> + return kit->fd; > >> +} > >> + > > > > I don't think we need this. It's probably better to return a pointer > > to a small struct representing "item" returned from this iterator. > > Something like > > > > struct bpf_iter_task_file_item { > > struct task_struct *task; > > struct file *file; > > int fd; > > }; > > > > You can then embed this struct into struct bpf_iter_task_file and > > return a pointer to it on each next() call (avoiding memory > > allocation) > > > > > > (naming just for illustrative purposes, I spent 0 seconds thinking about it) > > > > Yes, I agree that it is feasible. > > But there is a question here, should we expose the internal state > structure of the iterator (If we want to embed) ? > > I guess that we need two versions of data structures struct bpf_iter_xxx > and struct bpf_iter_xxx_kern is for the purpose of encapsulation? Yes, that's what we do for iterator state structure, and you should do that as well. bpf_iter_xxx one will be opaque (see other examples, we literally add `u64 __opaque[N];` there). But this bpf_iter_task_file_item will be sort of internal API that is returned from bpf_iter_task_file_next(). So you'll have something like struct bpf_iter_task_file { .... additional state ... struct bpf_iter_task_file_item item; }; then you have struct bpf_iter_task_file_item bpf_iter_task_file_next(struct bpf_iter_task_file *it) { struct bpf_iter_task_file_kern *kit = (void *)it; ... kit->item.task = <sometask>; kit->item.file = <file>; /* and so on */ return &kit->item; } > > With two versions of the data structure, users can only manipulate > the iterator using the iterator kfuncs, avoiding users from directly > accessing the internal state. > > After we decide to return struct bpf_iter_task_file_item, these members > will not be able to change and users can directly access/change the > internal state of the iterator. Yes, you have to carefully set up bpf_iter_task_file_item, but you could expand it in the future without breaking any old users, because you only return it by pointer and with BPF CO-RE all the field shifts will be correctly handled. > > >> +/** > >> + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file > >> + * > >> + * If the iterator does not iterate to the end, then the last > >> + * struct file reference is released at this time. > >> + * > >> + * @it: the bpf_iter_task_file to be destroyed > >> + */ > >> +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + if (kit->file) > >> + fput(kit->file); > >> +} > >> + > >> +__bpf_kfunc_end_defs(); > >> -- > >> 2.39.5 > >> >
On 2024/11/6 20:02, Andrii Nakryiko wrote: > On Fri, Nov 1, 2024 at 1:22 PM Juntong Deng <juntong.deng@outlook.com> wrote: >> Yes, users can access file->f_op, but there seems to be no way for >> users to get references to struct file_operations for the various file >> types? For example, how does a user get a reference to socket_file_ops? > > See [0]. Libbpf will find it for the BPF program from kallsyms. > > [0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_ksyms.c#L13-L18 > Thanks for telling me this method. Yes, then we don't need bpf_get_file_ops_type(). >> >> Yes, I agree that it is feasible. >> >> But there is a question here, should we expose the internal state >> structure of the iterator (If we want to embed) ? >> >> I guess that we need two versions of data structures struct bpf_iter_xxx >> and struct bpf_iter_xxx_kern is for the purpose of encapsulation? > > Yes, that's what we do for iterator state structure, and you should do > that as well. bpf_iter_xxx one will be opaque (see other examples, we > literally add `u64 __opaque[N];` there). > > But this bpf_iter_task_file_item will be sort of internal API that is > returned from bpf_iter_task_file_next(). So you'll have something like > > struct bpf_iter_task_file { > .... additional state ... > struct bpf_iter_task_file_item item; > }; > > then you have > > struct bpf_iter_task_file_item bpf_iter_task_file_next(struct > bpf_iter_task_file *it) > { > struct bpf_iter_task_file_kern *kit = (void *)it; > > ... > kit->item.task = <sometask>; > kit->item.file = <file>; /* and so on */ > > return &kit->item; > } > >> >> With two versions of the data structure, users can only manipulate >> the iterator using the iterator kfuncs, avoiding users from directly >> accessing the internal state. >> >> After we decide to return struct bpf_iter_task_file_item, these members >> will not be able to change and users can directly access/change the >> internal state of the iterator. > > Yes, you have to carefully set up bpf_iter_task_file_item, but you > could expand it in the future without breaking any old users, because > you only return it by pointer and with BPF CO-RE all the field shifts > will be correctly handled. > You are right, in the next patch series version I will use struct bpf_iter_task_file_item. Could you please also give some feedback on "2. Where should we put CRIB related kfuncs?" in the discussion section of the v3 patch series cover letter [0]? [0]: https://lore.kernel.org/bpf/AM6PR03MB58488FD29EB0D0B89D52AABB99532@AM6PR03MB5848.eurprd03.prod.outlook.com/T/#t Then I can fix all the things in the v4 patch series. Many thanks.
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 105328f0b9c0..933d36264e5e 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o +obj-$(CONFIG_BPF_SYSCALL) += crib/ diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile new file mode 100644 index 000000000000..4e1bae1972dd --- /dev/null +++ b/kernel/bpf/crib/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c new file mode 100644 index 000000000000..e6536ee9a845 --- /dev/null +++ b/kernel/bpf/crib/crib.c @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Checkpoint/Restore In eBPF (CRIB) + */ + +#include <linux/bpf.h> +#include <linux/btf.h> +#include <linux/btf_ids.h> + +BTF_KFUNCS_START(bpf_crib_kfuncs) + +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) + +BTF_KFUNCS_END(bpf_crib_kfuncs) + +static const struct btf_kfunc_id_set bpf_crib_kfunc_set = { + .owner = THIS_MODULE, + .set = &bpf_crib_kfuncs, +}; + +static int __init bpf_crib_init(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set); +} + +late_initcall(bpf_crib_init); diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c new file mode 100644 index 000000000000..ececf150303f --- /dev/null +++ b/kernel/bpf/crib/files.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/btf.h> +#include <linux/file.h> +#include <linux/fdtable.h> +#include <linux/net.h> + +struct bpf_iter_task_file { + __u64 __opaque[3]; +} __aligned(8); + +struct bpf_iter_task_file_kern { + struct task_struct *task; + struct file *file; + int fd; +} __aligned(8); + +__bpf_kfunc_start_defs(); + +/** + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, + * used to iterate over all files opened by a specified task + * + * @it: the new bpf_iter_task_file to be created + * @task: a pointer pointing to a task to be iterated over + */ +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, + struct task_struct *task) +{ + struct bpf_iter_task_file_kern *kit = (void *)it; + + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != + __alignof__(struct bpf_iter_task_file)); + + kit->task = task; + kit->fd = -1; + kit->file = NULL; + + return 0; +} + +/** + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file + * + * bpf_iter_task_file_next acquires a reference to the returned struct file. + * + * The reference to struct file acquired by the previous + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), + * and the last reference is released in the last bpf_iter_task_file_next() + * that returns NULL. + * + * @it: the bpf_iter_task_file to be checked + * + * @returns a pointer to the struct file of the next file if further files + * are available, otherwise returns NULL + */ +__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it) +{ + struct bpf_iter_task_file_kern *kit = (void *)it; + + if (kit->file) + fput(kit->file); + + kit->fd++; + + rcu_read_lock(); + kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd); + rcu_read_unlock(); + + return kit->file; +} + +/** + * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to + * the file in the current iteration + * + * @it: the bpf_iter_task_file to be checked + * + * @returns the file descriptor + */ +__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter) +{ + struct bpf_iter_task_file_kern *kit = (void *)it__iter; + + return kit->fd; +} + +/** + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file + * + * If the iterator does not iterate to the end, then the last + * struct file reference is released at this time. + * + * @it: the bpf_iter_task_file to be destroyed + */ +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) +{ + struct bpf_iter_task_file_kern *kit = (void *)it; + + if (kit->file) + fput(kit->file); +} + +__bpf_kfunc_end_defs();
This patch adds the open-coded iterator style process file iterator kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all files opened by the specified process. In addition, this patch adds bpf_iter_task_file_get_fd() getter to get the file descriptor corresponding to the file in the current iteration. The reference to struct file acquired by the previous bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), and the last reference is released in the last bpf_iter_task_file_next() that returns NULL. In the bpf_iter_task_file_destroy(), if the iterator does not iterate to the end, then the last struct file reference is released at this time. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> --- kernel/bpf/Makefile | 1 + kernel/bpf/crib/Makefile | 3 ++ kernel/bpf/crib/crib.c | 29 +++++++++++ kernel/bpf/crib/files.c | 105 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 138 insertions(+) create mode 100644 kernel/bpf/crib/Makefile create mode 100644 kernel/bpf/crib/crib.c create mode 100644 kernel/bpf/crib/files.c