diff mbox

[V2,1/3] seccomp: add generic code for jitted seccomp filters.

Message ID 1363618233-6375-2-git-send-email-nschichan@freebox.fr (mailing list archive)
State New, archived
Headers show

Commit Message

Nicolas Schichan March 18, 2013, 2:50 p.m. UTC
Architecture must select HAVE_SECCOMP_FILTER_JIT and implement
seccomp_jit_compile() and seccomp_jit_free() if they intend to support
jitted seccomp filters.

struct seccomp_filter has been moved to <linux/seccomp.h> to make its
content available to the jit compilation code.

In a way similar to the net BPF, the jit compilation code is expected
to updates struct seccomp_filter.bpf_func pointer to the generated
code.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
---
 arch/Kconfig            |   14 ++++++++++++++
 include/linux/seccomp.h |   41 +++++++++++++++++++++++++++++++++++++++++
 kernel/seccomp.c        |   34 +++++-----------------------------
 3 files changed, 60 insertions(+), 29 deletions(-)

Comments

Kees Cook April 1, 2013, 9:53 p.m. UTC | #1
On Mon, Mar 18, 2013 at 7:50 AM, Nicolas Schichan <nschichan@freebox.fr> wrote:
> Architecture must select HAVE_SECCOMP_FILTER_JIT and implement
> seccomp_jit_compile() and seccomp_jit_free() if they intend to support
> jitted seccomp filters.
>
> struct seccomp_filter has been moved to <linux/seccomp.h> to make its
> content available to the jit compilation code.
>
> In a way similar to the net BPF, the jit compilation code is expected
> to updates struct seccomp_filter.bpf_func pointer to the generated
> code.
>
> Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>

Acked-by: Kees Cook <keescook@chromium.org>

I'd love to see this for x86 too. I suspect it'd be a small change
after this series lands.

Thanks,

-Kees
Will Drewry April 4, 2013, 7:58 p.m. UTC | #2
On Mon, Apr 1, 2013 at 4:53 PM, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Mar 18, 2013 at 7:50 AM, Nicolas Schichan <nschichan@freebox.fr> wrote:
>> Architecture must select HAVE_SECCOMP_FILTER_JIT and implement
>> seccomp_jit_compile() and seccomp_jit_free() if they intend to support
>> jitted seccomp filters.
>>
>> struct seccomp_filter has been moved to <linux/seccomp.h> to make its
>> content available to the jit compilation code.
>>
>> In a way similar to the net BPF, the jit compilation code is expected
>> to updates struct seccomp_filter.bpf_func pointer to the generated
>> code.
>>
>> Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
>
> Acked-by: Kees Cook <keescook@chromium.org>
>
> I'd love to see this for x86 too. I suspect it'd be a small change
> after this series lands.

Agreed - and thanks for working through the necessary changes!

Acked-By: Will Drewry <wad@chromium.org>
(for the series)
Andrew Morton April 17, 2013, 9:56 p.m. UTC | #3
On Mon, 18 Mar 2013 15:50:30 +0100 Nicolas Schichan <nschichan@freebox.fr> wrote:

> Architecture must select HAVE_SECCOMP_FILTER_JIT and implement
> seccomp_jit_compile() and seccomp_jit_free() if they intend to support
> jitted seccomp filters.
> 
> struct seccomp_filter has been moved to <linux/seccomp.h> to make its
> content available to the jit compilation code.
> 
> In a way similar to the net BPF, the jit compilation code is expected
> to updates struct seccomp_filter.bpf_func pointer to the generated
> code.
> 

This patch is killing me.

> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -6,6 +6,7 @@
>  #ifdef CONFIG_SECCOMP
>  
>  #include <linux/thread_info.h>
> +#include <linux/filter.h>
>  #include <asm/seccomp.h>

In file included from include/linux/compat.h:18,
                 from include/linux/filter.h:9,
                 from include/linux/seccomp.h:9,
                 from include/linux/sched.h:39,
                 from arch/x86/kernel/asm-offsets.c:9:
/usr/src/25/arch/x86/include/asm/compat.h: In function 'arch_compat_alloc_user_space':
/usr/src/25/arch/x86/include/asm/compat.h:301: error: dereferencing pointer to incomplete type

Problem is, compat.h's arch_compat_alloc_user_space() needs sched.h for
task_struct but as you can see from the above include tree, sched.h
includes seccomp.h and everything falls over.  The preprocessed code
contains the definition of arch_compat_alloc_user_space() *before* the
definition of task_struct.

This is a basic x86_64 "make clean; make allmodconfig; make".

I had a few slashes at fixing it but got all frustrated.
Nicolas Schichan April 22, 2013, 12:31 p.m. UTC | #4
On 04/17/2013 11:56 PM, Andrew Morton wrote:
> This patch is killing me.
>
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -6,6 +6,7 @@
>>   #ifdef CONFIG_SECCOMP
>>
>>   #include <linux/thread_info.h>
>> +#include <linux/filter.h>
>>   #include <asm/seccomp.h>
>
> In file included from include/linux/compat.h:18,
>                   from include/linux/filter.h:9,
>                   from include/linux/seccomp.h:9,
>                   from include/linux/sched.h:39,
>                   from arch/x86/kernel/asm-offsets.c:9:
> /usr/src/25/arch/x86/include/asm/compat.h: In function 'arch_compat_alloc_user_space':
> /usr/src/25/arch/x86/include/asm/compat.h:301: error: dereferencing pointer to incomplete type
>
> Problem is, compat.h's arch_compat_alloc_user_space() needs sched.h for
> task_struct but as you can see from the above include tree, sched.h
> includes seccomp.h and everything falls over.  The preprocessed code
> contains the definition of arch_compat_alloc_user_space() *before* the
> definition of task_struct.
>
> This is a basic x86_64 "make clean; make allmodconfig; make".

Hi,

Would including <uapi/linux/filter.h> instead of <linux/filter.h> in seccomp.h 
be an acceptable solution ?

I have tried that and (with an additional forward declaration of struct 
sk_buff) an x86_64 "make clean; make allmodconfig" run finishes succesfully.

If that's ok with you, I can resend the serie with that fix.

Regards,
Andrew Morton April 23, 2013, 11:43 p.m. UTC | #5
On Mon, 22 Apr 2013 14:31:20 +0200 Nicolas Schichan <nschichan@freebox.fr> wrote:

> On 04/17/2013 11:56 PM, Andrew Morton wrote:
> > This patch is killing me.
> >
> >> --- a/include/linux/seccomp.h
> >> +++ b/include/linux/seccomp.h
> >> @@ -6,6 +6,7 @@
> >>   #ifdef CONFIG_SECCOMP
> >>
> >>   #include <linux/thread_info.h>
> >> +#include <linux/filter.h>
> >>   #include <asm/seccomp.h>
> >
> > In file included from include/linux/compat.h:18,
> >                   from include/linux/filter.h:9,
> >                   from include/linux/seccomp.h:9,
> >                   from include/linux/sched.h:39,
> >                   from arch/x86/kernel/asm-offsets.c:9:
> > /usr/src/25/arch/x86/include/asm/compat.h: In function 'arch_compat_alloc_user_space':
> > /usr/src/25/arch/x86/include/asm/compat.h:301: error: dereferencing pointer to incomplete type
> >
> > Problem is, compat.h's arch_compat_alloc_user_space() needs sched.h for
> > task_struct but as you can see from the above include tree, sched.h
> > includes seccomp.h and everything falls over.  The preprocessed code
> > contains the definition of arch_compat_alloc_user_space() *before* the
> > definition of task_struct.
> >
> > This is a basic x86_64 "make clean; make allmodconfig; make".
> 
> Hi,
> 
> Would including <uapi/linux/filter.h> instead of <linux/filter.h> in seccomp.h 
> be an acceptable solution ?
> 
> I have tried that and (with an additional forward declaration of struct 
> sk_buff) an x86_64 "make clean; make allmodconfig" run finishes succesfully.
> 
> If that's ok with you, I can resend the serie with that fix.

It would be better to make the code and include tangle less complex,
rather than more complex.

Did we really need to move the `struct seccomp_filter' definition into
the header file?  afaict that wasn't really necessary - we can add a
few helper functions to kernel/seccomp.c and then have the remote code
treat seccomp_filter in an opaque fashion rather than directly poking
at its internals.


btw, what on earth is going on with seccomp_jit_free()?  It does
disturbing undocumented typecasting and it punts the module_free into a
kernel thread for mysterious, undocumented and possibly buggy reasons.

I realize it just copies bpf_jit_free().  The same observations apply there.
Nicolas Schichan April 24, 2013, 3:52 p.m. UTC | #6
On 04/24/2013 01:43 AM, Andrew Morton wrote:
> On Mon, 22 Apr 2013 14:31:20 +0200 Nicolas Schichan <nschichan@freebox.fr> wrote:
>> Would including <uapi/linux/filter.h> instead of <linux/filter.h> in seccomp.h
>> be an acceptable solution ?
>>
>> I have tried that and (with an additional forward declaration of struct
>> sk_buff) an x86_64 "make clean; make allmodconfig" run finishes succesfully.
>>
>> If that's ok with you, I can resend the serie with that fix.
>
> It would be better to make the code and include tangle less complex,
> rather than more complex.
>
> Did we really need to move the `struct seccomp_filter' definition into
> the header file?  afaict that wasn't really necessary - we can add a
> few helper functions to kernel/seccomp.c and then have the remote code
> treat seccomp_filter in an opaque fashion rather than directly poking
> at its internals.

Hi,

I will resend a V3 of the patch serie with the accessors.

> btw, what on earth is going on with seccomp_jit_free()?  It does
> disturbing undocumented typecasting and it punts the module_free into a
> kernel thread for mysterious, undocumented and possibly buggy reasons.
>
> I realize it just copies bpf_jit_free().  The same observations apply there.

The reason for this hack for both seccomp filters and socket filters is that 
{seccomp,bpf}_jit_free are called from a softirq. module_free() cannot be 
called directly from softirq, as it will in turn call vfree() which will 
BUG_ON() if in_interrupt() is non zero. So to call module_free(), it is 
therefore required to be in a process context, which is provided by the work 
struct.

Here is the call stack for the socket filter case:

[<c0087bf8>] (vfree+0x28/0x2c) from [<c001fc5c>] (bpf_jit_free+0x10/0x18)
[<c001fc5c>] (bpf_jit_free+0x10/0x18) from 
[<c0231188>](sk_filter_release_rcu+0x10/0x1c)
[<c0231188>] (sk_filter_release_rcu+0x10/0x1c) from [<c0060bb4>] 
(__rcu_process_callbacks+0x98/0xac)
[<c0060bb4>] (__rcu_process_callbacks+0x98/0xac) from [<c0060bd8>] 
(rcu_process_callbacks+0x10/0x20)
[<c0060bd8>] (rcu_process_callbacks+0x10/0x20) from [<c0029498>] 
(__do_softirq+0xbc/0x194)
[<c0029498>] (__do_softirq+0xbc/0x194) from [<c00295b0>] (run_ksoftirqd+0x40/0x64)
[<c00295b0>] (run_ksoftirqd+0x40/0x64) from [<c0041954>] 
(smpboot_thread_fn+0x150/0x15c)
[<c0041954>] (smpboot_thread_fn+0x150/0x15c) from [<c003bb2c>] (kthread+0xa4/0xb0)
[<c003bb2c>] (kthread+0xa4/0xb0) from [<c0013450>] (ret_from_fork+0x14/0x24)

Here is the call stack for the seccomp filter case:

[<c0087c28>] (vfree+0x28/0x2c) from [<c0060834>] (put_seccomp_filter+0x6c/0x84)
[<c0060834>] (put_seccomp_filter+0x6c/0x84) from [<c0020db4>] 
(free_task+0x30/0x50)
[<c0020db4>] (free_task+0x30/0x50) from [<c0060be4>] 
(__rcu_process_callbacks+0x98/0xac)
[<c0060be4>] (__rcu_process_callbacks+0x98/0xac) from [<c0060c08>] 
(rcu_process_callbacks+0x10/0x20)
[<c0060c08>] (rcu_process_callbacks+0x10/0x20) from [<c00294c8>] 
(__do_softirq+0xbc/0x194)
[<c00294c8>] (__do_softirq+0xbc/0x194) from [<c00295e0>] (run_ksoftirqd+0x40/0x64)
[<c00295e0>] (run_ksoftirqd+0x40/0x64) from [<c0041984>] 
(smpboot_thread_fn+0x150/0x15c)
[<c0041984>] (smpboot_thread_fn+0x150/0x15c) from [<c003bb5c>] (kthread+0xa4/0xb0)
[<c003bb5c>] (kthread+0xa4/0xb0) from [<c0013450>] (ret_from_fork+0x14/0x24)

Regards,
diff mbox

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index 5a1779c..1284367 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -339,6 +339,10 @@  config HAVE_ARCH_SECCOMP_FILTER
 	  - secure_computing return value is checked and a return value of -1
 	    results in the system call being skipped immediately.
 
+# Used by archs to tell that they support SECCOMP_FILTER_JIT
+config HAVE_SECCOMP_FILTER_JIT
+	bool
+
 config SECCOMP_FILTER
 	def_bool y
 	depends on HAVE_ARCH_SECCOMP_FILTER && SECCOMP && NET
@@ -349,6 +353,16 @@  config SECCOMP_FILTER
 
 	  See Documentation/prctl/seccomp_filter.txt for details.
 
+config SECCOMP_FILTER_JIT
+	bool "enable Seccomp filter Just In Time compiler"
+	depends on HAVE_SECCOMP_FILTER_JIT && BPF_JIT && SECCOMP_FILTER
+	help
+	  Seccomp syscall filtering capabilities are normally handled
+	  by an interpreter. This option allows kernel to generate a native
+	  code when filter is loaded in memory. This should speedup
+	  syscall filtering. Note : Admin should enable this feature
+	  changing /proc/sys/net/core/bpf_jit_enable
+
 config HAVE_CONTEXT_TRACKING
 	bool
 	help
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 6f19cfd..a216ab7 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -6,6 +6,7 @@ 
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
+#include <linux/filter.h>
 #include <asm/seccomp.h>
 
 struct seccomp_filter;
@@ -47,6 +48,46 @@  static inline int seccomp_mode(struct seccomp *s)
 	return s->mode;
 }
 
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ *         get/put helpers should be used when accessing an instance
+ *         outside of a lifetime-guarded section.  In general, this
+ *         is only needed for handling filters shared across tasks.
+ * @prev: points to a previously installed, or inherited, filter
+ * @len: the number of instructions in the program
+ * @bpc_func: points to either sk_run_filter or the code generated
+ *         by the BPF JIT.
+ * @insns: the BPF program instructions to evaluate
+ *
+ * seccomp_filter objects are organized in a tree linked via the @prev
+ * pointer.  For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.filter, the most recently attached or inherited filter.
+ * However, multiple filters may share a @prev node, by way of fork(), which
+ * results in a unidirectional tree existing in memory.  This is similar to
+ * how namespaces work.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+	atomic_t usage;
+	struct seccomp_filter *prev;
+	unsigned short len;  /* Instruction count */
+	unsigned int (*bpf_func)(const struct sk_buff *skb,
+				 const struct sock_filter *filter);
+	struct sock_filter insns[];
+};
+
+#ifdef CONFIG_SECCOMP_FILTER_JIT
+extern void seccomp_jit_compile(struct seccomp_filter *fp);
+extern void seccomp_jit_free(struct seccomp_filter *fp);
+#else
+static inline void seccomp_jit_compile(struct seccomp_filter *fp) { }
+static inline void seccomp_jit_free(struct seccomp_filter *fp) { }
+#endif
+
 #else /* CONFIG_SECCOMP */
 
 #include <linux/errno.h>
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b7a1004..a1aadaa 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -30,34 +30,6 @@ 
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
 
-/**
- * struct seccomp_filter - container for seccomp BPF programs
- *
- * @usage: reference count to manage the object lifetime.
- *         get/put helpers should be used when accessing an instance
- *         outside of a lifetime-guarded section.  In general, this
- *         is only needed for handling filters shared across tasks.
- * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insns: the BPF program instructions to evaluate
- *
- * seccomp_filter objects are organized in a tree linked via the @prev
- * pointer.  For any task, it appears to be a singly-linked list starting
- * with current->seccomp.filter, the most recently attached or inherited filter.
- * However, multiple filters may share a @prev node, by way of fork(), which
- * results in a unidirectional tree existing in memory.  This is similar to
- * how namespaces work.
- *
- * seccomp_filter objects should never be modified after being attached
- * to a task_struct (other than @usage).
- */
-struct seccomp_filter {
-	atomic_t usage;
-	struct seccomp_filter *prev;
-	unsigned short len;  /* Instruction count */
-	struct sock_filter insns[];
-};
-
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
@@ -213,7 +185,7 @@  static u32 seccomp_run_filters(int syscall)
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (f = current->seccomp.filter; f; f = f->prev) {
-		u32 cur_ret = sk_run_filter(NULL, f->insns);
+		u32 cur_ret = f->bpf_func(NULL, f->insns);
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
 	}
@@ -275,6 +247,9 @@  static long seccomp_attach_filter(struct sock_fprog *fprog)
 	if (ret)
 		goto fail;
 
+	filter->bpf_func = sk_run_filter;
+	seccomp_jit_compile(filter);
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
@@ -332,6 +307,7 @@  void put_seccomp_filter(struct task_struct *tsk)
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
 		orig = orig->prev;
+		seccomp_jit_free(freeme);
 		kfree(freeme);
 	}
 }