Message ID | 20200616074934.1600036-1-keescook@chromium.org (mailing list archive) |
---|---|
Series | seccomp: Implement constant action bitmaps |
On Tue, Jun 16, 2020 at 12:49 AM Kees Cook <keescook@chromium.org> wrote:
>
> Hi,
>
> In order to build this mapping at filter attach time, each filter is
> executed for every syscall (under each possible architecture), and
> checked for any accesses of struct seccomp_data that are not the "arch"
> nor "nr" (syscall) members. If only "arch" and "nr" are examined, then
> there is a constant mapping for that syscall, and bitmaps can be updated
> accordingly. If any accesses happen outside of those struct members,
> seccomp must not bypass filter execution for that syscall, since program
> state will be used to determine filter action result.
>
> During syscall action probing, in order to determine whether other members
> of struct seccomp_data are being accessed during a filter execution,
> the struct is placed across a page boundary with the "arch" and "nr"
> members in the first page, and everything else in the second page. The
> "page accessed" flag is cleared in the second page's PTE, and the filter
> is run. If the "page accessed" flag appears as set after running the
> filter, we can determine that the filter looked beyond the "arch" and
> "nr" members, and exclude that syscall from the constant action bitmaps.

This is... evil. I don't know how I feel about it. It's also
potentially quite slow.

I don't suppose you could, instead, instrument the BPF code to get at
this without TLB hackery? Or maybe try to do some real symbolic
execution of the BPF code?

--Andy
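The page-straddling layout quoted above depends on where "nr" and "arch" sit inside struct seccomp_data. The following is only an illustrative userspace sketch of that arithmetic, not code from the patch series. It mirrors the UAPI struct layout with ctypes and computes a base address that puts "nr" and "arch" at the end of one page and everything else at the start of the next. The 4 KiB page size and the helper names `straddle_base` and `page_of` are assumptions made for the example.

```python
import ctypes

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class SeccompData(ctypes.Structure):
    """Userspace mirror of struct seccomp_data from <linux/seccomp.h>."""
    _fields_ = [
        ("nr", ctypes.c_int32),                    # syscall number
        ("arch", ctypes.c_uint32),                 # AUDIT_ARCH_* value
        ("instruction_pointer", ctypes.c_uint64),
        ("args", ctypes.c_uint64 * 6),
    ]


def straddle_base(second_page_addr):
    """Hypothetical helper: base address for the struct such that the first
    member after "arch" begins exactly at second_page_addr."""
    split = SeccompData.instruction_pointer.offset
    return second_page_addr - split


def page_of(addr):
    """Page frame index of a byte address (illustrative only)."""
    return addr // PAGE_SIZE


base = straddle_base(2 * PAGE_SIZE)
last_arch_byte = base + SeccompData.arch.offset + ctypes.sizeof(ctypes.c_uint32) - 1
first_other_byte = base + SeccompData.instruction_pointer.offset
```

With this placement, any load that touches the instruction pointer or the arguments lands in the second page, which is the page whose "accessed" flag the quoted description proposes to inspect.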
On Tue, Jun 16, 2020 at 10:01:43AM -0700, Andy Lutomirski wrote:
> On Tue, Jun 16, 2020 at 12:49 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > Hi,
> >
> > In order to build this mapping at filter attach time, each filter is
> > executed for every syscall (under each possible architecture), and
> > checked for any accesses of struct seccomp_data that are not the "arch"
> > nor "nr" (syscall) members. If only "arch" and "nr" are examined, then
> > there is a constant mapping for that syscall, and bitmaps can be updated
> > accordingly. If any accesses happen outside of those struct members,
> > seccomp must not bypass filter execution for that syscall, since program
> > state will be used to determine filter action result.
> >
> > During syscall action probing, in order to determine whether other members
> > of struct seccomp_data are being accessed during a filter execution,
> > the struct is placed across a page boundary with the "arch" and "nr"
> > members in the first page, and everything else in the second page. The
> > "page accessed" flag is cleared in the second page's PTE, and the filter
> > is run. If the "page accessed" flag appears as set after running the
> > filter, we can determine that the filter looked beyond the "arch" and
> > "nr" members, and exclude that syscall from the constant action bitmaps.
>
> This is... evil. I don't know how I feel about it. It's also

Thank you! ;)

> potentially quite slow.

I got the impression that (worst-case: a "full" filter for every
arch/syscall combo) ~900 _local_ TLB flushes per filter attach wouldn't be
very slow at all. (And the code is optimized to avoid needless flushes.)

> I don't suppose you could, instead, instrument the BPF code to get at
> this without TLB hackery? Or maybe try to do some real symbolic
> execution of the BPF code?

I think the "simple emulator" path[1] might get us a realistically large
coverage. I'm going to try it out, and see what it looks like.

-Kees

[1] https://lore.kernel.org/lkml/202006160757.99FD9B785@keescook/
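For readers following the thread, the "simple emulator" idea can be modeled roughly as below. This is a minimal Python sketch under stated assumptions, not the code from [1] and not kernel code. It walks a classic BPF program for one fixed (arch, nr) pair, accepts only loads of "nr" (offset 0) and "arch" (offset 4) plus constant-compare jumps and constant returns, and gives up on anything else. The names `Insn`, `constant_action`, `build_allow_bitmap` and the sample `FILTER` are hypothetical; the opcode and return values are the standard cBPF and seccomp encodings.

```python
from collections import namedtuple

# One classic BPF instruction: opcode, jump-if-true, jump-if-false, immediate.
Insn = namedtuple("Insn", "code jt jf k")

# Standard cBPF opcode encodings (class | size/op | mode/source).
BPF_LD_W_ABS = 0x20    # A = word at absolute offset k of seccomp_data
BPF_JMP_JA = 0x05      # pc += k
BPF_JMP_JEQ_K = 0x15   # jump on A == k
BPF_JMP_JGT_K = 0x25   # jump on A > k
BPF_JMP_JGE_K = 0x35   # jump on A >= k
BPF_JMP_JSET_K = 0x45  # jump on A & k
BPF_RET_K = 0x06       # return constant k

SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E

OFF_NR, OFF_ARCH = 0, 4  # offsets inside struct seccomp_data

JUMP_TESTS = {
    BPF_JMP_JEQ_K: lambda a, k: a == k,
    BPF_JMP_JGT_K: lambda a, k: a > k,
    BPF_JMP_JGE_K: lambda a, k: a >= k,
    BPF_JMP_JSET_K: lambda a, k: (a & k) != 0,
}


def constant_action(prog, arch, nr):
    """Return the filter's action if it depends only on arch and nr,
    or None if the filter does anything this sketch does not model."""
    acc = 0
    pc = 0
    while pc < len(prog):  # cBPF jumps only go forward, so this terminates
        insn = prog[pc]
        pc += 1
        if insn.code == BPF_LD_W_ABS:
            if insn.k == OFF_NR:
                acc = nr
            elif insn.k == OFF_ARCH:
                acc = arch
            else:
                return None  # reads ip or args: result is not constant
        elif insn.code == BPF_JMP_JA:
            pc += insn.k
        elif insn.code in JUMP_TESTS:
            pc += insn.jt if JUMP_TESTS[insn.code](acc, insn.k) else insn.jf
        elif insn.code == BPF_RET_K:
            return insn.k
        else:
            return None  # unmodeled instruction: be conservative
    return None


def build_allow_bitmap(prog, arch, nr_max):
    """Set bit nr when syscall nr is unconditionally allowed under arch."""
    bitmap = 0
    for nr in range(nr_max):
        if constant_action(prog, arch, nr) == SECCOMP_RET_ALLOW:
            bitmap |= 1 << nr
    return bitmap


# Sample filter: kill foreign arch, kill execve (59), allow write (1) only
# when the low word of args[0] is 1, allow everything else.
FILTER = [
    Insn(BPF_LD_W_ABS, 0, 0, OFF_ARCH),
    Insn(BPF_JMP_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),
    Insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_THREAD),
    Insn(BPF_LD_W_ABS, 0, 0, OFF_NR),
    Insn(BPF_JMP_JEQ_K, 0, 3, 1),
    Insn(BPF_LD_W_ABS, 0, 0, 16),   # args[0], low word on little-endian
    Insn(BPF_JMP_JEQ_K, 3, 0, 1),
    Insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_THREAD),
    Insn(BPF_JMP_JEQ_K, 0, 1, 59),
    Insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_THREAD),
    Insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
]
```

In this model, write(2) is excluded from the bitmap because its action depends on an argument, while syscalls whose path only compares "arch" and "nr" get a constant result. Giving up on any unmodeled instruction trades coverage for safety: the filter would still be run in full for those syscalls.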