[RFC,security-next,0/4] Introducing Hornet LSM

Message ID 20250321164537.16719-1-bboscaccy@linux.microsoft.com (mailing list archive)

Message

Blaise Boscaccy March 21, 2025, 4:45 p.m. UTC
This patch series introduces the Hornet LSM.

Hornet takes a simple approach to light-skeleton-based eBPF signature
verification. Signature data can be easily generated for the binary
data that is generated via bpftool gen -L. This signature can be
appended to a skeleton executable via scripts/sign-ebpf. Hornet checks
the signature against a binary buffer containing the lskel
instructions that the loader maps use. Maps are frozen to prevent
TOCTOU bugs where a sufficiently privileged user could rewrite map
data between the calls to BPF_PROG_LOAD and
BPF_PROG_RUN. Additionally, both sparse-array-based and
fd_array_cnt-based map fd arrays are supported for signature
verification.


Blaise Boscaccy (4):
  security: Hornet LSM
  hornet: Introduce sign-ebpf
  hornet: Add an example lskel data extactor script
  selftests/hornet: Add a selftest for the hornet LSM

 Documentation/admin-guide/LSM/Hornet.rst     |  51 +++
 crypto/asymmetric_keys/pkcs7_verify.c        |  10 +
 include/linux/kernel_read_file.h             |   1 +
 include/linux/verification.h                 |   1 +
 include/uapi/linux/lsm.h                     |   1 +
 scripts/Makefile                             |   1 +
 scripts/hornet/Makefile                      |   5 +
 scripts/hornet/extract-skel.sh               |  29 ++
 scripts/hornet/sign-ebpf.c                   | 420 +++++++++++++++++++
 security/Kconfig                             |   3 +-
 security/Makefile                            |   1 +
 security/hornet/Kconfig                      |  11 +
 security/hornet/Makefile                     |   4 +
 security/hornet/hornet_lsm.c                 | 239 +++++++++++
 tools/testing/selftests/Makefile             |   1 +
 tools/testing/selftests/hornet/Makefile      |  51 +++
 tools/testing/selftests/hornet/loader.c      |  21 +
 tools/testing/selftests/hornet/trivial.bpf.c |  33 ++
 18 files changed, 882 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/LSM/Hornet.rst
 create mode 100644 scripts/hornet/Makefile
 create mode 100755 scripts/hornet/extract-skel.sh
 create mode 100644 scripts/hornet/sign-ebpf.c
 create mode 100644 security/hornet/Kconfig
 create mode 100644 security/hornet/Makefile
 create mode 100644 security/hornet/hornet_lsm.c
 create mode 100644 tools/testing/selftests/hornet/Makefile
 create mode 100644 tools/testing/selftests/hornet/loader.c
 create mode 100644 tools/testing/selftests/hornet/trivial.bpf.c

Comments

Paul Moore March 21, 2025, 9:43 p.m. UTC | #1
On Fri, Mar 21, 2025 at 12:45 PM Blaise Boscaccy
<bboscaccy@linux.microsoft.com> wrote:
>
> This patch series introduces the Hornet LSM.
>
> Hornet takes a simple approach to light-skeleton-based eBPF signature
> verification. Signature data can be easily generated for the binary
> data that is generated via bpftool gen -L. This signature can be
> appended to a skeleton executable via scripts/sign-ebpf. Hornet checks
> the signature against a binary buffer containing the lskel
> instructions that the loader maps use. Maps are frozen to prevent
> TOCTOU bugs where a sufficiently privileged user could rewrite map
> data between the calls to BPF_PROG_LOAD and
> BPF_PROG_RUN. Additionally, both sparse-array-based and
> fd_array_cnt-based map fd arrays are supported for signature
> verification.
>
> Blaise Boscaccy (4):
>   security: Hornet LSM
>   hornet: Introduce sign-ebpf
>   hornet: Add an example lskel data extactor script
>   selftests/hornet: Add a selftest for the hornet LSM

Thanks Blaise, I noticed a few minor things, but nothing critical.  As
I understand it, you'll be presenting Hornet at LSFMMBPF next week?
Assuming that's the case, I'm going to hold off on reviewing this
until we hear how that went next week; please report back after the
conference.

However, to be clear, the Hornet LSM proposed here seems very
reasonable to me and I would have no conceptual objections to merging
it upstream.  Based on off-list discussions I believe there is a lot
of demand for something like this, and I believe many people will be
happy to have BPF signature verification in-tree.
Jarkko Sakkinen March 22, 2025, 5:22 p.m. UTC | #2
On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
> This patch series introduces the Hornet LSM.
> 
> Hornet takes a simple approach to light-skeleton-based eBPF signature

Can you define "light-skeleton-based" before using the term.

This is the first time in my life when I hear about it.

> verification. Signature data can be easily generated for the binary

s/easily//

Useless word having no measure.

> data that is generated via bpftool gen -L. This signature can be

I have no idea what that command does.

"Signature data can be generated for the binary data as follows:

bpftool gen -L

<explanation>"

Here you'd need to answer a couple of unknowns:

1. What is in exact terms "signature data"?
2. What does "bpftool gen -L" do?

This feedback maps to other examples too in the cover letter.

BR, Jarkko
Paul Moore March 22, 2025, 8:44 p.m. UTC | #3
On Sat, Mar 22, 2025 at 1:22 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
> > This patch series introduces the Hornet LSM.
> >
> > Hornet takes a simple approach to light-skeleton-based eBPF signature
>
> Can you define "light-skeleton-based" before using the term.
>
> This is the first time in my life when I hear about it.

I was in the same situation a few months ago when I first heard about it :)

Blaise can surely provide a much better answer than what I'm about to
write, but since Blaise is going to be at LSFMMBPF this coming week I
suspect he might not have a lot of time to respond to email in the
next few days so I thought I would do my best to try and answer :)

An eBPF "light skeleton" is basically a BPF loader program and while
I'm sure there are several uses for a light skeleton, or lskel for
brevity, the single use case that we are interested in here, and the
one that Hornet deals with, is the idea of using a lskel to enable
signature verification of BPF programs as it seems to be the one way
that has been deemed acceptable by the BPF maintainers.

Once again, skipping over a lot of details, the basic idea is that you
take your original BPF program (A), feed it into a BPF userspace tool
to encapsulate the original program A into a BPF map and generate a
corresponding light skeleton BPF program (B), and then finally sign
the resulting binary containing the lskel program (B) and map
corresponding to the original program A.  At runtime, the lskel binary
is loaded into the kernel, and if Hornet is enabled, the signature of
both the lskel program A and original program B is verified.  If the
signature verification passes, lskel program A performs the necessary
BPF CO-RE transforms on BPF program A stored in the BPF map and then
attempts to load the original BPF program B, all from within the
kernel, and with the map frozen to prevent tampering from userspace.

Hopefully that helps fill in some gaps until someone more
knowledgeable can provide a better answer and/or correct any mistakes
in my explanation above ;)
Paul Moore March 22, 2025, 8:48 p.m. UTC | #4
On Sat, Mar 22, 2025 at 4:44 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Sat, Mar 22, 2025 at 1:22 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
> > > This patch series introduces the Hornet LSM.
> > >
> > > Hornet takes a simple approach to light-skeleton-based eBPF signature
> >
> > Can you define "light-skeleton-based" before using the term.
> >
> > This is the first time in my life when I hear about it.
>
> I was in the same situation a few months ago when I first heard about it :)
>
> Blaise can surely provide a much better answer than what I'm about to
> write, but since Blaise is going to be at LSFMMBPF this coming week I
> suspect he might not have a lot of time to respond to email in the
> next few days so I thought I would do my best to try and answer :)
>
> An eBPF "light skeleton" is basically a BPF loader program and while
> I'm sure there are several uses for a light skeleton, or lskel for
> brevity, the single use case that we are interested in here, and the
> one that Hornet deals with, is the idea of using a lskel to enable
> signature verification of BPF programs as it seems to be the one way
> that has been deemed acceptable by the BPF maintainers.
>
> Once again, skipping over a lot of details, the basic idea is that you
> take your original BPF program (A), feed it into a BPF userspace tool
> to encapsulate the original program A into a BPF map and generate a
> corresponding light skeleton BPF program (B), and then finally sign
> the resulting binary containing the lskel program (B) and map
> corresponding to the original program A.

Forgive me, I mixed up my "A" and "B" above :/

> At runtime, the lskel binary
> is loaded into the kernel, and if Hornet is enabled, the signature of
> both the lskel program A and original program B is verified.

... and I did again here

> If the
> signature verification passes, lskel program A performs the necessary
> BPF CO-RE transforms on BPF program A stored in the BPF map and then
> attempts to load the original BPF program B, all from within the
> kernel, and with the map frozen to prevent tampering from userspace.

... and once more here because why not? :)

> Hopefully that helps fill in some gaps until someone more
> knowledgeable can provide a better answer and/or correct any mistakes
> in my explanation above ;)
Jarkko Sakkinen March 22, 2025, 9:42 p.m. UTC | #5
On Sat, Mar 22, 2025 at 04:44:13PM -0400, Paul Moore wrote:
> On Sat, Mar 22, 2025 at 1:22 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
> > > This patch series introduces the Hornet LSM.
> > >
> > > Hornet takes a simple approach to light-skeleton-based eBPF signature
> >
> > Can you define "light-skeleton-based" before using the term.
> >
> > This is the first time in my life when I hear about it.
> 
> I was in the same situation a few months ago when I first heard about it :)
> 
> Blaise can surely provide a much better answer than what I'm about to
> write, but since Blaise is going to be at LSFMMBPF this coming week I
> suspect he might not have a lot of time to respond to email in the
> next few days so I thought I would do my best to try and answer :)

Yeah, I don't think there is anything seriously wrong with the feature
itself, but it speaks a language that would fit the eBPF subsystem list,
not this one :-)

I.e. assume only very basic knowledge of eBPF and explain what the
things mentioned actually do. For example, the bpftool statement should
be spelled out fully.

> 
> An eBPF "light skeleton" is basically a BPF loader program and while
> I'm sure there are several uses for a light skeleton, or lskel for
> brevity, the single use case that we are interested in here, and the
> one that Hornet deals with, is the idea of using a lskel to enable
> signature verification of BPF programs as it seems to be the one way
> that has been deemed acceptable by the BPF maintainers.

I got some grip on it, but IMHO the term should only be used in the
commit message if it is defined first :-)

> 
> Once again, skipping over a lot of details, the basic idea is that you
> take your original BPF program (A), feed it into a BPF userspace tool
> to encapsulate the original program A into a BPF map and generate a
> corresponding light skeleton BPF program (B), and then finally sign
> the resulting binary containing the lskel program (B) and map
> corresponding to the original program A.  At runtime, the lskel binary
> is loaded into the kernel, and if Hornet is enabled, the signature of
> both the lskel program A and original program B is verified.  If the
> signature verification passes, lskel program A performs the necessary
> BPF CO-RE transforms on BPF program A stored in the BPF map and then
> attempts to load the original BPF program B, all from within the
> kernel, and with the map frozen to prevent tampering from userspace.

When you speak about corresponding lskel program what does that
program contain? Is it some kind of new version of the same
program with modifications, or?

I did not know what BPF CO-RE is either, but I googled it ;-)

> 
> Hopefully that helps fill in some gaps until someone more
> knowledgeable can provide a better answer and/or correct any mistakes
> in my explanation above ;)

Sure... Thanks for the explanations!

> 
> -- 
> paul-moore.com

BR, Jarkko
Jarkko Sakkinen March 22, 2025, 9:43 p.m. UTC | #6
On Sat, Mar 22, 2025 at 04:48:14PM -0400, Paul Moore wrote:
> On Sat, Mar 22, 2025 at 4:44 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Sat, Mar 22, 2025 at 1:22 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
> > > > This patch series introduces the Hornet LSM.
> > > >
> > > > Hornet takes a simple approach to light-skeleton-based eBPF signature
> > >
> > > Can you define "light-skeleton-based" before using the term.
> > >
> > > This is the first time in my life when I hear about it.
> >
> > I was in the same situation a few months ago when I first heard about it :)
> >
> > Blaise can surely provide a much better answer than what I'm about to
> > write, but since Blaise is going to be at LSFMMBPF this coming week I
> > suspect he might not have a lot of time to respond to email in the
> > next few days so I thought I would do my best to try and answer :)
> >
> > An eBPF "light skeleton" is basically a BPF loader program and while
> > I'm sure there are several uses for a light skeleton, or lskel for
> > brevity, the single use case that we are interested in here, and the
> > one that Hornet deals with, is the idea of using a lskel to enable
> > signature verification of BPF programs as it seems to be the one way
> > that has been deemed acceptable by the BPF maintainers.
> >
> > Once again, skipping over a lot of details, the basic idea is that you
> > take your original BPF program (A), feed it into a BPF userspace tool
> > to encapsulate the original program A into a BPF map and generate a
> > corresponding light skeleton BPF program (B), and then finally sign
> > the resulting binary containing the lskel program (B) and map
> > corresponding to the original program A.
> 
> Forgive me, I mixed up my "A" and "B" above :/
> 
> > At runtime, the lskel binary
> > is loaded into the kernel, and if Hornet is enabled, the signature of
> > both the lskel program A and original program B is verified.
> 
> ... and I did again here
> 
> > If the
> > signature verification passes, lskel program A performs the necessary
> > BPF CO-RE transforms on BPF program A stored in the BPF map and then
> > attempts to load the original BPF program B, all from within the
> > kernel, and with the map frozen to prevent tampering from userspace.
> 
> ... and once more here because why not? :)

No worries I was able to decipher this :-)

> 
> > Hopefully that helps fill in some gaps until someone more
> > knowledgeable can provide a better answer and/or correct any mistakes
> > in my explanation above ;)
> 
> -- 
> paul-moore.com

BR, Jarkko
Blaise Boscaccy March 31, 2025, 8:57 p.m. UTC | #7
Jarkko Sakkinen <jarkko@kernel.org> writes:

Hi Jarkko,

Thanks for the comments. Paul did a very nice job providing some
background info; allow me to provide some additional data.

> On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
>> This patch series introduces the Hornet LSM.
>> 
>> Hornet takes a simple approach to light-skeleton-based eBPF signature
>
> Can you define "light-skeleton-based" before using the term.
>
> This is the first time in my life when I hear about it.
>

Sure. Here is the patchset where this stuff got introduced if you are
curious.
https://lore.kernel.org/bpf/20220209054315.73833-1-alexei.starovoitov@gmail.com/

eBPF has similar requirements to that of modules when it comes to
loading: find kallsyms addresses, fix up ELF relocations, some
struct field offset handling stuff called CO-RE (compile once,
run everywhere), and some other miscellaneous bookkeeping.  During eBPF
program compilation, pseudo-values get written to the immediate operands
of instructions.  During loading, those pseudo-values get rewritten with
concrete addresses or data applicable to the currently running system,
e.g. a kallsyms address or an fd for a map. This needs to happen before
the instructions for a bpf program are loaded into the kernel via the
bpf() syscall.

Unlike modules, an in-kernel loader unfortunately doesn't
exist. Typically, the instruction rewriting is done dynamically in
userspace via libbpf (or the rust/go/python loader). What skeletons do
is generate a script of required instruction-rewriting operations which
then gets played back at load-time against a hard-coded blob of raw
instruction data. This removes the need to distribute source-code or
object files.
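
To make the playback idea concrete, here is a minimal Python sketch. It
is not libbpf's or bpftool's actual code. It assumes the little-endian
layout of struct bpf_insn and uses the real opcode for a 64-bit immediate
load (0x18) and the BPF_PSEUDO_MAP_FD marker (1). The helper names and
the (index, value) fixup format are hypothetical, chosen only for
illustration.

```python
import struct

# struct bpf_insn: u8 code, u8 regs (dst in the low nibble, src in the
# high nibble on little-endian), s16 off, s32 imm: 8 bytes per slot.
INSN = struct.Struct("<BBhi")

BPF_LD_IMM64 = 0x18      # BPF_LD | BPF_IMM | BPF_DW
BPF_PSEUDO_MAP_FD = 1    # src_reg marker: imm is a placeholder for a map fd


def encode(code, dst, src, off, imm):
    """Pack one instruction slot."""
    return INSN.pack(code, (src << 4) | dst, off, imm)


def apply_fixups(blob, fixups):
    """Replay (insn_index, value) fixups against a raw instruction blob.

    Hypothetical format: each fixup overwrites the imm field of one slot.
    """
    out = bytearray(blob)
    for index, value in fixups:
        code, regs, off, _ = INSN.unpack_from(out, index * INSN.size)
        INSN.pack_into(out, index * INSN.size, code, regs, off, value)
    return bytes(out)


# Compile time: "r1 = <map placeholder>"; a 64-bit immediate load
# occupies two instruction slots.
blob = encode(BPF_LD_IMM64, 1, BPF_PSEUDO_MAP_FD, 0, 0) + encode(0, 0, 0, 0, 0)

# Load time: assume the map was created and received fd 7 on this system.
patched = apply_fixups(blob, [(0, 7)])
```

The blob stays constant across systems; only the fixup values differ per
load, which is why the blob is a natural thing to sign.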

There are two flavors of skeletons: normal skeletons and light
skeletons. Normal skeletons utilize relocation logic that lives in
libbpf, and the relocations/instruction rewriting happen in userspace.
The second flavor, light skeletons, uses a small eBPF program that
contains the relocation lookup logic. As it's running in the kernel,
it unpacks the target program, performs the instruction rewriting, and
loads the target program. Light skeletons are currently utilized for
some drivers and BPF_PRELOAD functionality, since they can operate
without userspace.

Light skeletons were recommended in various mailing list discussions as
the preferred path to performing signature verification. There are some
PoCs floating around that used light skeletons in concert with
fs-verity/IMA and eBPF LSMs. We took a slightly different approach with
Hornet, by utilizing the existing PKCS#7 signing scheme that is used for
kernel modules.
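
For readers unfamiliar with that scheme, the Python sketch below shows
the appended-signature layout that kernel modules use: payload, PKCS#7
blob, a 12-byte struct module_signature, then a magic string. Whether
Hornet and scripts/sign-ebpf use exactly this trailer is an assumption
here; the thread only says the module signing scheme is reused. The
function names are hypothetical, and the PKCS#7 blob is treated as
opaque bytes.

```python
import struct

MAGIC = b"~Module signature appended~\n"
# struct module_signature: algo, hash, id_type, signer_len, key_id_len,
# 3 pad bytes, then a big-endian 32-bit signature length.
TRAILER = struct.Struct(">5B3xI")
PKEY_ID_PKCS7 = 2


def append_signature(payload, pkcs7_blob):
    """Return payload + signature + trailer + magic."""
    info = TRAILER.pack(0, 0, PKEY_ID_PKCS7, 0, 0, len(pkcs7_blob))
    return payload + pkcs7_blob + info + MAGIC


def split_signature(image):
    """Split a signed image back into (payload, pkcs7_blob)."""
    if not image.endswith(MAGIC):
        raise ValueError("no appended signature")
    body = image[:-len(MAGIC)]
    if len(body) < TRAILER.size:
        raise ValueError("truncated trailer")
    fields = TRAILER.unpack(body[-TRAILER.size:])
    id_type, sig_len = fields[2], fields[5]
    if id_type != PKEY_ID_PKCS7:
        raise ValueError("not a PKCS#7 signature")
    body = body[:-TRAILER.size]
    if sig_len > len(body):
        raise ValueError("signature length exceeds image")
    cut = len(body) - sig_len
    return body[:cut], body[cut:]
```

Because the signature is parsed from the end of the file, the payload in
front of it can be consumed unchanged by tools that ignore the trailer.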

>> verification. Signature data can be easily generated for the binary
>
> s/easily//
>
> Useless word having no measure.
>

Ack, thanks.


>> data that is generated via bpftool gen -L. This signature can be
>
> I have no idea what that command does.
>
> "Signature data can be generated for the binary data as follows:
>
> bpftool gen -L
>
> <explanation>"
>
> Here you'd need to answer a couple of unknowns:
>
> 1. What is in exact terms "signature data"?

That is a PKCS#7 signature of a data buffer containing the raw
instructions of an eBPF program, followed by the initial values of any
maps used by the program. 
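
As a rough illustration of that buffer, the Python sketch below
concatenates raw instruction bytes with the initial map contents and
hashes the result. The exact ordering of maps and any padding are
assumptions, the function name is hypothetical, and the SHA-256 digest
only stands in for the PKCS#7 signing step, which is not shown.

```python
import hashlib


def build_signed_buffer(insns, map_values):
    """Raw instructions followed by the initial value of each map."""
    return insns + b"".join(map_values)


insns = bytes(16)                      # placeholder: two 8-byte instructions
maps = [b"\x01\x02\x03\x04", b"init"]  # placeholder initial map contents

buffer = build_signed_buffer(insns, maps)
digest = hashlib.sha256(buffer).hexdigest()

# Changing a single map byte changes what would be signed.
tampered = build_signed_buffer(insns, [b"\x01\x02\x03\x05", b"init"])
tampered_digest = hashlib.sha256(tampered).hexdigest()
```

Covering the map contents as well as the instructions matters because
the loader program reads its input from those maps.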

> 2. What does "bpftool gen -L" do?
>

eBPF programs often have two parts: an orchestrator/loader program that
provides load -> attach/run -> i/o -> teardown logic, and the in-kernel
program.

That command is used to generate a skeleton which can be used by the
orchestrator program. Skeletons get generated as a C header file that
contains various autogenerated functions that open and load bpf programs
as described above. That header file ends up being included in a
userspace orchestrator program or possibly a kernel module.

> This feedback maps to other examples too in the cover letter.
>
> BR, Jarkko


I'll rework this with some definitions of the eBPF subsystem jargon
along with your suggestions.

-blaise
Jarkko Sakkinen April 1, 2025, 3:50 p.m. UTC | #8
On Mon, Mar 31, 2025 at 01:57:15PM -0700, Blaise Boscaccy wrote:
> There are two flavors of skeletons: normal skeletons and light
> skeletons. Normal skeletons utilize relocation logic that lives in
> libbpf, and the relocations/instruction rewriting happen in userspace.
> The second flavor, light skeletons, uses a small eBPF program that
> contains the relocation lookup logic. As it's running in the kernel,
> it unpacks the target program, performs the instruction rewriting, and
> loads the target program. Light skeletons are currently utilized for
> some drivers and BPF_PRELOAD functionality, since they can operate
> without userspace.
> 
> Light skeletons were recommended in various mailing list discussions as
> the preferred path to performing signature verification. There are some
> PoCs floating around that used light skeletons in concert with
> fs-verity/IMA and eBPF LSMs. We took a slightly different approach with
> Hornet, by utilizing the existing PKCS#7 signing scheme that is used for
> kernel modules.

Right, because in the normal skeletons relocation logic remains
unsigned?

I have to admit I don't fully grasp how the relocation process translates
into an eBPF program, but I do get how it is better for signatures if it
does :-)

> 
> >> verification. Signature data can be easily generated for the binary
> >
> > s/easily//
> >
> > Useless word having no measure.
> >
> 
> Ack, thanks.
> 
> 
> >> data that is generated via bpftool gen -L. This signature can be
> >
> > I have no idea what that command does.
> >
> > "Signature data can be generated for the binary data as follows:
> >
> > bpftool gen -L
> >
> > <explanation>"
> >
> > Here you'd need to answer a couple of unknowns:
> >
> > 1. What is in exact terms "signature data"?
> 
> That is a PKCS#7 signature of a data buffer containing the raw
> instructions of an eBPF program, followed by the initial values of any
> maps used by the program. 

Got it, thanks. This motivates me to refine my TPM2 asymmetric keys
series so that TPM2 could anchor these :-)

https://lore.kernel.org/linux-integrity/20240528210823.28798-1-jarkko@kernel.org/


> 
> > 2. What does "bpftool gen -L" do?
> >
> 
> eBPF programs often have two parts: an orchestrator/loader program that
> provides load -> attach/run -> i/o -> teardown logic, and the in-kernel
> program.
> 
> That command is used to generate a skeleton which can be used by the
> orchestrator program. Skeletons get generated as a C header file that
> contains various autogenerated functions that open and load bpf programs
> as described above. That header file ends up being included in a
> userspace orchestrator program or possibly a kernel module.

I did read the man page now too, but thanks for the commentary!

> 
> > This feedback maps to other examples too in the cover letter.
> >
> > BR, Jarkko
> 
> 
> I'll rework this with some definitions of the eBPF subsystem jargon
> along with your suggestions.

Yeah, you should be able to put the gist in a nutshell a good deal better :-)

> 
> -blaise

BR, Jarkko
Blaise Boscaccy April 1, 2025, 6:56 p.m. UTC | #9
Jarkko Sakkinen <jarkko@kernel.org> writes:

> On Mon, Mar 31, 2025 at 01:57:15PM -0700, Blaise Boscaccy wrote:
>> There are two flavors of skeletons: normal skeletons and light
>> skeletons. Normal skeletons utilize relocation logic that lives in
>> libbpf, and the relocations/instruction rewriting happen in userspace.
>> The second flavor, light skeletons, uses a small eBPF program that
>> contains the relocation lookup logic. As it's running in the kernel,
>> it unpacks the target program, performs the instruction rewriting, and
>> loads the target program. Light skeletons are currently utilized for
>> some drivers and BPF_PRELOAD functionality, since they can operate
>> without userspace.
>> 
>> Light skeletons were recommended in various mailing list discussions as
>> the preferred path to performing signature verification. There are some
>> PoCs floating around that used light skeletons in concert with
>> fs-verity/IMA and eBPF LSMs. We took a slightly different approach with
>> Hornet, by utilizing the existing PKCS#7 signing scheme that is used for
>> kernel modules.
>
> Right, because in the normal skeletons relocation logic remains
> unsigned?
>

Yup, Exactly. 

> I have to admit I don't fully grasp how the relocation process translates
> into an eBPF program, but I do get how it is better for signatures if it
> does :-)
>
>> 
>> >> verification. Signature data can be easily generated for the binary
>> >
>> > s/easily//
>> >
>> > Useless word having no measure.
>> >
>> 
>> Ack, thanks.
>> 
>> 
>> >> data that is generated via bpftool gen -L. This signature can be
>> >
>> > I have no idea what that command does.
>> >
>> > "Signature data can be generated for the binary data as follows:
>> >
>> > bpftool gen -L
>> >
>> > <explanation>"
>> >
>> > Here you'd need to answer a couple of unknowns:
>> >
>> > 1. What is in exact terms "signature data"?
>> 
>> That is a PKCS#7 signature of a data buffer containing the raw
>> instructions of an eBPF program, followed by the initial values of any
>> maps used by the program. 
>
> Got it, thanks. This motivates me to refine my TPM2 asymmetric keys
> series so that TPM2 could anchor these :-)
>
> https://lore.kernel.org/linux-integrity/20240528210823.28798-1-jarkko@kernel.org/
>
>

Oooh. That would be very nice :) 

>> 
>> > 2. What does "bpftool gen -L" do?
>> >
>> 
>> eBPF programs often have two parts: an orchestrator/loader program that
>> provides load -> attach/run -> i/o -> teardown logic, and the in-kernel
>> program.
>> 
>> That command is used to generate a skeleton which can be used by the
>> orchestrator program. Skeletons get generated as a C header file that
>> contains various autogenerated functions that open and load bpf programs
>> as described above. That header file ends up being included in a
>> userspace orchestrator program or possibly a kernel module.
>
> I did read the man page now too, but thanks for the commentary!
>
>> 
>> > This feedback maps to other examples too in the cover letter.
>> >
>> > BR, Jarkko
>> 
>> 
>> I'll rework this with some definitions of the eBPF subsystem jargon
>> along with your suggestions.
>
> Yeah, you should be able to put the gist in a nutshell a good deal better :-)
>
>> 
>> -blaise
>
> BR, Jarkko