Message ID | 20220906170301.256206-2-roberto.sassu@huaweicloud.com (mailing list archive) |
---|---|
State | New |
Series | bpf: Add fd modes check for map iter and extend libbpf |
On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu <roberto.sassu@huaweicloud.com> wrote:
>
> From: Roberto Sassu <roberto.sassu@huawei.com>
>
> Commit 6e71b04a82248 ("bpf: Add file mode configuration into bpf maps")
> added the BPF_F_RDONLY and BPF_F_WRONLY flags, to let user space specify
> whether it will just read or modify a map.
>
> Map access control is done in two steps. First, when user space wants to
> obtain a map fd, it provides to the kernel the eBPF-defined flags, which
> are converted into open flags and passed to the security_bpf_map() security
> hook for evaluation by LSMs.
>
> Second, if user space successfully obtained an fd, it passes that fd to the
> kernel when it requests a map operation (e.g. lookup or update). The kernel
> first checks if the fd has the modes required to perform the requested
> operation and, if yes, continues the execution and returns the result to
> user space.
>
> While the fd modes check was added for map_*_elem() functions, it is
> currently missing for map iterators, added more recently with commit
> a5cbe05a6673 ("bpf: Implement bpf iterator for map elements"). A map
> iterator executes a chosen eBPF program for each key/value pair of a map
> and allows that program to read and/or modify them.
>
> Whether a map iterator allows only read or also write depends on whether
> the MEM_RDONLY flag in the ctx_arg_info member of the bpf_iter_reg
> structure is set. Also, write needs to be supported at verifier level (for
> example, it is currently not supported for sock maps).
>
> Since map iterators obtain a map from a user space fd with
> bpf_map_get_with_uref(), add the new req_modes parameter to that function,
> so that map iterators can provide the required fd modes to access a map. If
> the user space fd doesn't include the required modes,
> bpf_map_get_with_uref() returns with an error, and the map iterator will
> not be created.
>
> If a map iterator marks both the key and value as read-only, it calls
> bpf_map_get_with_uref() with FMODE_CAN_READ as value for req_modes. If it
> also allows write access to either the key or the value, it calls that
> function with FMODE_CAN_READ | FMODE_CAN_WRITE as value for req_modes,
> regardless of whether or not the write is supported by the verifier (the
> write is intentionally allowed).
>
> bpf_fd_probe_obj() does not require any fd mode, as the fd is only used for
> the purpose of finding the eBPF object type, for pinning the object to the
> bpffs filesystem.
>
> Finally, it is worth to mention that the fd modes check was not added for
> the cgroup iterator, although it registers an attach_target method like the
> other iterators. The reason is that the fd is not the only way for user
> space to reference a cgroup object (also by ID and by path). For the
> protection to be effective, all reference methods need to be evaluated
> consistently. This work is deferred to a separate patch.

I think the current behavior is fine.
File permissions don't apply at iterator level or prog level.
fmode_can_read/write are for syscall commands only.
To be fair we've added them to lookup/delete commands
and it was more of a pain to maintain and no confirmed good use.

On Tue, 2022-09-06 at 11:21 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > [...]
>
> I think the current behavior is fine.
> File permissions don't apply at iterator level or prog level.

+ Chenbo, linux-security-module

Well, if you write a security module to prevent writes on a map, and
user space is able to do it anyway with an iterator, what is the
purpose of the security module then?

> fmode_can_read/write are for syscall commands only.
> To be fair we've added them to lookup/delete commands
> and it was more of a pain to maintain and no confirmed good use.

I think a good use would be requesting the right permission for the
type of operation that needs to be performed, e.g. read-only permission
when you have a read-like operation like a lookup or dump.

By always requesting read-write permission, for all operations,
security modules won't be able to distinguish which operation has to be
denied to satisfy the policy.

One example of that is that, when there is a security module preventing
writes on maps (will be that uncommon?), bpftool is not able to show
the full list of maps because it asks for read-write permission for
getting the map info.

Freezing the map is not a solution, if you want to allow certain
subjects to continuously update the protected map at run-time.

Roberto

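The point about requested permissions can be illustrated with a small userspace model of how the eBPF file flags might select the access mode that an LSM gets to evaluate. This is only a sketch: `model_file_flag()` and the `MODEL_*` names are hypothetical, the bit values are assumed to mirror the UAPI flags, and the "no flag means read-write" default is taken from the behavior described in this thread rather than copied from kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-ins for BPF_F_RDONLY / BPF_F_WRONLY (values assumed). */
#define MODEL_BPF_F_RDONLY (1U << 3)
#define MODEL_BPF_F_WRONLY (1U << 4)

/*
 * Hypothetical helper: translate eBPF file flags into an open(2) access
 * mode. Asking for read-only and write-only at the same time is rejected.
 */
int model_file_flag(unsigned int flags)
{
	if ((flags & MODEL_BPF_F_RDONLY) && (flags & MODEL_BPF_F_WRONLY))
		return -EINVAL;
	if (flags & MODEL_BPF_F_RDONLY)
		return O_RDONLY;
	if (flags & MODEL_BPF_F_WRONLY)
		return O_WRONLY;
	/* No flag given: read-write access is requested. */
	return O_RDWR;
}

/* Usage example: a caller that passes no flags asks for read-write. */
void model_file_flag_selfcheck(void)
{
	assert(model_file_flag(0) == O_RDWR);
	assert(model_file_flag(MODEL_BPF_F_RDONLY) == O_RDONLY);
}
```

Under this model, a tool that never sets a read-only flag always presents a read-write request to the security hook, even for a lookup or a dump.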
On Wed, Sep 7, 2022 at 1:03 AM Roberto Sassu <roberto.sassu@huaweicloud.com> wrote:
>
> On Tue, 2022-09-06 at 11:21 -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > [...]
> >
> > I think the current behavior is fine.
> > File permissions don't apply at iterator level or prog level.
>
> + Chenbo, linux-security-module
>
> Well, if you write a security module to prevent writes on a map, and
> user space is able to do it anyway with an iterator, what is the
> purpose of the security module then?

sounds like a broken "security module" and nothing else.

> > fmode_can_read/write are for syscall commands only.
> > To be fair we've added them to lookup/delete commands
> > and it was more of a pain to maintain and no confirmed good use.
>
> I think a good use would be requesting the right permission for the
> type of operation that needs to be performed, e.g. read-only permission
> when you have a read-like operation like a lookup or dump.
>
> By always requesting read-write permission, for all operations,
> security modules won't be able to distinguish which operation has to be
> denied to satisfy the policy.
>
> One example of that is that, when there is a security module preventing
> writes on maps (will be that uncommon?),

lsm that prevents writes into bpf maps?
That's a convoluted design.
You can try to implement such an lsm, but
expect lots of challenges.

> bpftool is not able to show
> the full list of maps because it asks for read-write permission for
> getting the map info.

completely orthogonal issue.

> Freezing the map is not a solution, if you want to allow certain
> subjects to continuously update the protected map at run-time.
>
> Roberto
>

On Wed, 2022-09-07 at 09:02 -0700, Alexei Starovoitov wrote:
> [...]
> > Well, if you write a security module to prevent writes on a map,
> > and
> > user space is able to do it anyway with an iterator, what is the
> > purpose of the security module then?
>
> sounds like a broken "security module" and nothing else.

Ok, if a custom security module does not convince you, let me make a
small example with SELinux.

I created a small map iterator that sets every value of a map to 5:

    SEC("iter/bpf_map_elem")
    int write_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
    {
    	u32 *key = ctx->key;
    	u8 *val = ctx->value;

    	if (key == NULL || val == NULL)
    		return 0;

    	*val = 5;
    	return 0;
    }

I create and pin a map:

    # bpftool map create /sys/fs/bpf/map type array key 4 value 1 entries 1 name test

Initially, the content of the map looks like:

    # bpftool map dump pinned /sys/fs/bpf/map
    key: 00 00 00 00  value: 00
    Found 1 element

I then created a new SELinux type bpftool_test_t, which has only read
permission on maps:

    # sesearch -A -s bpftool_test_t -t unconfined_t -c bpf
    allow bpftool_test_t unconfined_t:bpf map_read;

So, what I expect is that this type is not able to write to the map.

Indeed, the current bpftool is not able to do it:

    # strace -f -etrace=bpf runcon -t bpftool_test_t bpftool iter pin writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=0}, 144) = -1 EACCES (Permission denied)
    Error: bpf obj get (/sys/fs/bpf): Permission denied

This happens because the current bpftool requests to access the map
with read-write permission, and SELinux denies it:

    # cat /var/log/audit/audit.log|audit2allow

    #============= bpftool_test_t ==============
    allow bpftool_test_t unconfined_t:bpf map_write;

The command failed, and the content of the map is still:

    # bpftool map dump pinned /sys/fs/bpf/map
    key: 00 00 00 00  value: 00
    Found 1 element

Now, what I will do is to use a slightly modified version of bpftool
which requests read-only access to the map instead:

    # strace -f -etrace=bpf runcon -t bpftool_test_t ./bpftool iter pin writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=BPF_F_RDONLY}, 16) = 3
    libbpf: elf: skipping unrecognized data section(5) .eh_frame
    libbpf: elf: skipping relo section(6) .rel.eh_frame for section(5) .eh_frame

    ...

    bpf(BPF_LINK_CREATE, {link_create={prog_fd=4, target_fd=0, attach_type=BPF_TRACE_ITER, flags=0}, ...}, 48) = 5
    bpf(BPF_OBJ_PIN, {pathname="/sys/fs/bpf/iter", bpf_fd=5, file_flags=0}, 16) = 0

That worked, because SELinux grants read-only permission to
bpftool_test_t. However, the map iterator does not check how the fd was
obtained, and thus allows the iterator to be created.

At this point, we have write access, despite not having the right to do
it:

    # cat /sys/fs/bpf/iter
    # bpftool map dump pinned /sys/fs/bpf/map
    key: 00 00 00 00  value: 05
    Found 1 element

The iterator updated the map value.

The patch I'm proposing checks how the map fd was obtained, and if its
modes are compatible with the operations an attached program is allowed
to do. If the fd does not have the required modes, eBPF denies the
creation of the map iterator.

After patching the kernel, I try to run the modified bpftool again:

    # strace -f -etrace=bpf runcon -t bpftool_test_t ./bpftool iter pin writer.o /sys/fs/bpf/iter map pinned /sys/fs/bpf/map
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/map", bpf_fd=0, file_flags=BPF_F_RDONLY}, 16) = 3
    libbpf: elf: skipping unrecognized data section(5) .eh_frame
    libbpf: elf: skipping relo section(6) .rel.eh_frame for section(5) .eh_frame

    ...

    bpf(BPF_LINK_CREATE, {link_create={prog_fd=4, target_fd=0, attach_type=BPF_TRACE_ITER, flags=0}, ...}, 48) = -1 EPERM (Operation not permitted)
    libbpf: prog 'write_bpf_hash_map': failed to attach to iterator: Operation not permitted
    Error: attach_iter failed for program write_bpf_hash_map

The map iterator cannot be created and the map is not updated:

    # bpftool map dump pinned /sys/fs/bpf/map
    key: 00 00 00 00  value: 00
    Found 1 element

Roberto

On Thu, Sep 8, 2022 at 6:59 AM Roberto Sassu <roberto.sassu@huaweicloud.com> wrote:
>
> [...]
>
> That worked, because SELinux grants read-only permission to
> bpftool_test_t. However, the map iterator does not check how the fd was
> obtained, and thus allows the iterator to be created.
>
> At this point, we have write access, despite not having the right to do
> it:

That is a wrong assumption to begin with.
Having an fd to a bpf object (map, link, prog) allows access.
read/write sort-of applicable to maps, but not so much
to progs, links.
That file based read/write flag is only for user processes.
bpf progs always had separate flags for that.
See BPF_F_RDONLY vs BPF_F_RDONLY_PROG.
One doesn't imply the other.

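The distinction drawn in the last reply, between the flag that restricts user processes and the flag that restricts BPF programs, can be modeled in a few lines of userspace C. This is a rough sketch under stated assumptions: the `MODEL_*` names and both helper functions are hypothetical, the bit values are assumed to mirror the UAPI flags, and treating both flags as bits of a single word is a simplification made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two flag families (values assumed). */
#define MODEL_BPF_F_RDONLY      (1U << 3) /* syscall side: user space may only read */
#define MODEL_BPF_F_RDONLY_PROG (1U << 7) /* program side: BPF progs may only read  */

/* Write access as seen by a user process going through the syscall. */
bool model_syscall_may_write(unsigned int flags)
{
	return !(flags & MODEL_BPF_F_RDONLY);
}

/* Write access as seen by a BPF program; a separate bit decides this. */
bool model_prog_may_write(unsigned int flags)
{
	return !(flags & MODEL_BPF_F_RDONLY_PROG);
}

/* Usage example: a syscall-side read-only flag says nothing about programs. */
void model_flag_families_selfcheck(void)
{
	assert(!model_syscall_may_write(MODEL_BPF_F_RDONLY));
	assert(model_prog_may_write(MODEL_BPF_F_RDONLY));
}
```

In this model, restricting one side leaves the other untouched, which is one way to read "one doesn't imply the other".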
    diff --git a/include/linux/bpf.h b/include/linux/bpf.h
    index 9c1674973e03..6cd2ca910553 100644
    --- a/include/linux/bpf.h
    +++ b/include/linux/bpf.h
    @@ -1628,7 +1628,7 @@ bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_ma
     void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
     
     struct bpf_map *bpf_map_get(u32 ufd);
    -struct bpf_map *bpf_map_get_with_uref(u32 ufd);
    +struct bpf_map *bpf_map_get_with_uref(u32 ufd, fmode_t req_modes);
     struct bpf_map *__bpf_map_get(struct fd f);
     void bpf_map_inc(struct bpf_map *map);
     void bpf_map_inc_with_uref(struct bpf_map *map);
    diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
    index 4f841e16779e..862e1caa8b0f 100644
    --- a/kernel/bpf/inode.c
    +++ b/kernel/bpf/inode.c
    @@ -71,7 +71,7 @@ static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type)
     {
     	void *raw;
     
    -	raw = bpf_map_get_with_uref(ufd);
    +	raw = bpf_map_get_with_uref(ufd, 0);
     	if (!IS_ERR(raw)) {
     		*type = BPF_TYPE_MAP;
     		return raw;
    diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
    index b0fa190b0979..1143f8960135 100644
    --- a/kernel/bpf/map_iter.c
    +++ b/kernel/bpf/map_iter.c
    @@ -110,7 +110,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
     	if (!linfo->map.map_fd)
     		return -EBADF;
     
    -	map = bpf_map_get_with_uref(linfo->map.map_fd);
    +	map = bpf_map_get_with_uref(linfo->map.map_fd,
    +				    FMODE_CAN_READ | FMODE_CAN_WRITE);
     	if (IS_ERR(map))
     		return PTR_ERR(map);
     
    diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
    index 4e9d4622aef7..4a2063d8e99c 100644
    --- a/kernel/bpf/syscall.c
    +++ b/kernel/bpf/syscall.c
    @@ -1232,7 +1232,7 @@ struct bpf_map *bpf_map_get(u32 ufd)
     }
     EXPORT_SYMBOL(bpf_map_get);
     
    -struct bpf_map *bpf_map_get_with_uref(u32 ufd)
    +struct bpf_map *bpf_map_get_with_uref(u32 ufd, fmode_t req_modes)
     {
     	struct fd f = fdget(ufd);
     	struct bpf_map *map;
    @@ -1241,7 +1241,13 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
     	if (IS_ERR(map))
     		return map;
     
    +	if ((map_get_sys_perms(map, f) & req_modes) != req_modes) {
    +		map = ERR_PTR(-EPERM);
    +		goto out;
    +	}
    +
     	bpf_map_inc_with_uref(map);
    +out:
     	fdput(f);
     
     	return map;
    diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
    index 1b7f385643b4..bf9c6afed8ac 100644
    --- a/net/core/bpf_sk_storage.c
    +++ b/net/core/bpf_sk_storage.c
    @@ -897,7 +897,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
     	if (!linfo->map.map_fd)
     		return -EBADF;
     
    -	map = bpf_map_get_with_uref(linfo->map.map_fd);
    +	map = bpf_map_get_with_uref(linfo->map.map_fd,
    +				    FMODE_CAN_READ | FMODE_CAN_WRITE);
     	if (IS_ERR(map))
     		return PTR_ERR(map);
     
    diff --git a/net/core/sock_map.c b/net/core/sock_map.c
    index a660baedd9e7..7f7375dc39b2 100644
    --- a/net/core/sock_map.c
    +++ b/net/core/sock_map.c
    @@ -1636,7 +1636,8 @@ static int sock_map_iter_attach_target(struct bpf_prog *prog,
     	if (!linfo->map.map_fd)
     		return -EBADF;
     
    -	map = bpf_map_get_with_uref(linfo->map.map_fd);
    +	map = bpf_map_get_with_uref(linfo->map.map_fd,
    +				    FMODE_CAN_READ | FMODE_CAN_WRITE);
     	if (IS_ERR(map))
     		return PTR_ERR(map);
     
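
The core of the patch is the subset test `(perms & req_modes) != req_modes` added to `bpf_map_get_with_uref()`. A minimal userspace sketch of that test follows. The names `model_check_modes()`, `MODEL_CAN_READ` and `MODEL_CAN_WRITE` are hypothetical, their bit values are illustrative rather than the kernel's `FMODE_*` values, and the permissions that the patch obtains from `map_get_sys_perms()` are passed in here as a plain bitmask.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for FMODE_CAN_READ / FMODE_CAN_WRITE. */
#define MODEL_CAN_READ  (1U << 0)
#define MODEL_CAN_WRITE (1U << 1)

/*
 * Return 0 when every mode in req_modes is present in fd_perms,
 * -EPERM otherwise. This mirrors the comparison added by the patch.
 */
int model_check_modes(unsigned int fd_perms, unsigned int req_modes)
{
	if ((fd_perms & req_modes) != req_modes)
		return -EPERM;
	return 0;
}

/* Usage example: a read-only fd cannot back a read-write iterator. */
void model_check_modes_selfcheck(void)
{
	assert(model_check_modes(MODEL_CAN_READ,
				 MODEL_CAN_READ | MODEL_CAN_WRITE) == -EPERM);
	/* req_modes == 0, as passed by bpf_fd_probe_obj(), always succeeds. */
	assert(model_check_modes(MODEL_CAN_READ, 0) == 0);
}
```

Because the test is a subset check rather than an equality check, an fd with more permissions than required still passes, and a caller that requires nothing is never rejected.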