Message ID | 20201025134540.3770-1-john.wood@gmx.com (mailing list archive) |
---|---|
Headers | show |
Series | Fork brute force attack mitigation | expand |
Hi, Has anyone had time to review this patch serie? Any comments on this? Regards. On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote: > Attacks against vulnerable userspace applications with the purpose to break > ASLR or bypass canaries traditionaly use some level of brute force with the > help of the fork system call. This is possible since when creating a new > process using fork its memory contents are the same as those of the parent > process (the process that called the fork system call). So, the attacker > can test the memory infinite times to find the correct memory values or the > correct memory addresses without worrying about crashing the application. > > Based on the above scenario it would be nice to have this detected and > mitigated, and this is the goal of this patch serie. > > Other implementations > --------------------- > > The public version of grsecurity, as a summary, is based on the idea of > delay the fork system call if a child died due to a fatal error. This has > some issues: > > 1.- Bad practices: Add delays to the kernel is, in general, a bad idea. > > 2.- Weak points: This protection can be bypassed using two different > methods since it acts only when the fork is called after a child has > crashed. > > 2.1.- Bypass 1: So, it would still be possible for an attacker to fork > a big amount of children (in the order of thousands), then probe > all of them, and finally wait the protection time before repeat > the steps. > > 2.2.- Bypass 2: This method is based on the idea that the protection > doesn't act if the parent crashes. So, it would still be possible > for an attacker to fork a process and probe itself. Then, fork > the child process and probe itself again. This way, these steps > can be repeated infinite times without any mitigation. > > This implementation > ------------------- > > The main idea behind this implementation is to improve the existing ones > focusing on the weak points annotated before. The solution for the first > bypass method is to detect a fast crash rate instead of only one simple > crash. For the second bypass method the solution is to detect both the > crash of parent and child processes. Moreover, as a mitigation method it is > better to kill all the offending tasks involve in the attack instead of use > delays. > > So, the solution to the two bypass methods previously commented is to use > some statistical data shared across all the processes that can have the > same memory contents. Or in other words, a statistical data shared between > all the fork hierarchy processes after an execve system call. > > The purpose of these statistics is to compute the application crash period > in order to detect an attack. This crash period is the time between the > execve system call and the first fault or the time between two consecutives > faults, but this has a drawback. If an application crashes once quickly > from the execve system call or crashes twice in a short period of time for > some reason, a false positive attack will be triggered. To avoid this > scenario the shared statistical data holds a list of the i last crashes > timestamps and the application crash period is computed as follows: > > crash_period = (n_last_timestamp - n_minus_i_timestamp) / i; > > This ways, the size of the last crashes timestamps list allows to fine > tuning the detection sensibility. > > When this crash period falls under a certain threshold there is a clear > signal that something malicious is happening. Once detected, the mitigation > only kills the processes that share the same statistical data and so, all > the tasks that can have the same memory contents. This way, an attack is > rejected. > > 1.- Per system enabling: This feature can be enabled at build time using > the CONFIG_SECURITY_FORK_BRUTE option or using the visual config > application under the following menu: > > Security options ---> Fork brute force attack detection and mitigation > > 2.- Per process enabling/disabling: To allow that specific applications can > turn off or turn on the detection and mitigation of a fork brute force > attack when required, there are two new prctls. > > prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0) > prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0) > > 3.- Fine tuning: To customize the detection's sensibility there are two new > sysctl attributes that allow to set the last crashes timestamps list > size and the application crash period threshold (in milliseconds). Both > are accessible through the following files respectively. > > /proc/sys/kernel/brute/timestamps_list_size > /proc/sys/kernel/brute/crash_period_threshold > > The list size allows to avoid false positives due to crashes unrelated > with a real attack. The period threshold sets the time limit to detect > an attack. And, since a fork brute force attack will be detected if the > application crash period falls under this threshold, the higher this > value, the more sensitive the detection will be. > > So, knowing all this information I will explain now the different patches: > > The 1/8 patch defines a new LSM hook to get the fatal signal of a task. > This will be useful during the attack detection phase. > > The 2/8 patch defines a new LSM and manages the statistical data shared by > all the fork hierarchy processes. > > The 3/8 patch adds the sysctl attributes to fine tuning the detection. > > Patchs 4/8 and 5/8 detect and mitigate a fork brute force attack. > > Patch 6/8 adds the prctls to allow per process enabling/disabling. > > Patch 7/8 adds the documentation to explain this implementation. > > Patch 8/8 updates the maintainers file. > > This patch series is a task of the KSPP [1] and can also be accessed from > my github tree [2] in the "brute_v2" branch. > > [1] https://github.com/KSPP/linux/issues/39 > [2] https://github.com/johwood/linux/ > > The first version can be found in: > > https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@chromium.org/ > > Changelog RFC -> v2 > ------------------- > - Rename this feature with a more appropiate name (Jann Horn, Kees Cook). > - Convert the code to an LSM (Kees Cook). > - Add locking to avoid data races (Jann Horn). > - Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees > Cook). > - Add the last crashes timestamps list to avoid false positives in the > attack detection (Jann Horn). > - Use "period" instead of "rate" (Jann Horn). > - Other minor changes suggested (Jann Horn, Kees Cook). > > John Wood (8): > security: Add LSM hook at the point where a task gets a fatal signal > security/brute: Define a LSM and manage statistical data > security/brute: Add sysctl attributes to allow detection fine tuning > security/brute: Detect a fork brute force attack > security/brute: Mitigate a fork brute force attack > security/brute: Add prctls to enable/disable the fork attack detection > Documentation: Add documentation for the Brute LSM > MAINTAINERS: Add a new entry for the Brute LSM > > Documentation/admin-guide/LSM/Brute.rst | 118 ++++ > Documentation/admin-guide/LSM/index.rst | 1 + > MAINTAINERS | 7 + > include/brute/brute.h | 16 + > include/linux/lsm_hook_defs.h | 1 + > include/linux/lsm_hooks.h | 4 + > include/linux/security.h | 4 + > include/uapi/linux/prctl.h | 4 + > kernel/signal.c | 1 + > kernel/sys.c | 8 + > security/Kconfig | 11 +- > security/Makefile | 4 + > security/brute/Kconfig | 13 + > security/brute/Makefile | 2 + > security/brute/brute.c | 749 ++++++++++++++++++++++++ > security/security.c | 5 + > 16 files changed, 943 insertions(+), 5 deletions(-) > create mode 100644 Documentation/admin-guide/LSM/Brute.rst > create mode 100644 include/brute/brute.h > create mode 100644 security/brute/Kconfig > create mode 100644 security/brute/Makefile > create mode 100644 security/brute/brute.c > > -- > 2.25.1 >
On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote: > Attacks against vulnerable userspace applications with the purpose to break > ASLR or bypass canaries traditionaly use some level of brute force with the > help of the fork system call. This is possible since when creating a new > process using fork its memory contents are the same as those of the parent > process (the process that called the fork system call). So, the attacker > can test the memory infinite times to find the correct memory values or the > correct memory addresses without worrying about crashing the application. > > Based on the above scenario it would be nice to have this detected and > mitigated, and this is the goal of this patch serie. Thanks for preparing the v2! I spent some time looking at this today, and I really like how it has been rearranged into an LSM. This feels much more natural. Various notes: The locking isn't right; it'll trip with CONFIG_PROVE_LOCKING=y. Here's the giant splat: [ 8.205146] brute: Fork brute force attack detected [ 8.206821] [ 8.207317] ===================================================== [ 8.209392] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 8.211852] 5.10.0-rc3 #2 Not tainted [ 8.213215] ----------------------------------------------------- [ 8.215213] runc/2505 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 8.217387] ffffffff97206098 (tasklist_lock){.+.+}-{2:2}, at: brute_task_fatal_signal.cold+0x49/0xe0 [ 8.219554] [ 8.219554] and this task is already holding: [ 8.220891] ffff9a6a10eb8318 (&stats->lock){+.-.}-{2:2}, at: brute_task_fatal_signal+0x97/0x210 [ 8.222856] which would create a new lock dependency: [ 8.223943] (&stats->lock){+.-.}-{2:2} -> (tasklist_lock){.+.+}-{2:2} [ 8.225360] [ 8.225360] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 8.227081] (&stats->lock){+.-.}-{2:2} [ 8.227084] [ 8.227084] ... which became SOFTIRQ-irq-safe at: [ 8.228732] lock_acquire+0x13f/0x3f0 [ 8.229381] _raw_spin_lock+0x2c/0x40 [ 8.230024] brute_task_free+0x22/0x90 [ 8.230619] security_task_free+0x22/0x50 [ 8.231260] __put_task_struct+0x58/0x140 [ 8.231924] rcu_core+0x2bb/0x620 [ 8.232492] __do_softirq+0x156/0x4d9 [ 8.233080] asm_call_irq_on_stack+0x12/0x20 [ 8.233759] do_softirq_own_stack+0x5b/0x70 [ 8.234425] irq_exit_rcu+0x9f/0xe0 [ 8.235221] sysvec_apic_timer_interrupt+0x43/0xa0 [ 8.235976] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 8.236794] _raw_write_unlock_irq+0x2c/0x40 [ 8.237465] copy_process+0x15e6/0x1cb0 [ 8.237966] kernel_clone+0x9b/0x3f0 [ 8.238413] kernel_thread+0x55/0x70 [ 8.238861] call_usermodehelper_exec_work+0x77/0xb0 [ 8.239466] process_one_work+0x23e/0x580 [ 8.239969] worker_thread+0x55/0x3c0 [ 8.240425] kthread+0x141/0x160 [ 8.240833] ret_from_fork+0x22/0x30 [ 8.241276] [ 8.241276] to a SOFTIRQ-irq-unsafe lock: [ 8.241939] (tasklist_lock){.+.+}-{2:2} [ 8.241940] [ 8.241940] ... which became SOFTIRQ-irq-unsafe at: [ 8.243180] ... [ 8.243182] lock_acquire+0x13f/0x3f0 [ 8.243853] _raw_read_lock+0x5d/0x70 [ 8.244307] do_wait+0xd2/0x2e0 [ 8.244706] kernel_wait+0x49/0x90 [ 8.245126] call_usermodehelper_exec_work+0x61/0xb0 [ 8.245748] process_one_work+0x23e/0x580 [ 8.246238] worker_thread+0x55/0x3c0 [ 8.246685] kthread+0x141/0x160 [ 8.247086] ret_from_fork+0x22/0x30 [ 8.247522] [ 8.247522] other info that might help us debug this: [ 8.247522] [ 8.248480] Possible interrupt unsafe locking scenario: [ 8.248480] [ 8.249289] CPU0 CPU1 [ 8.249839] ---- ---- [ 8.250386] lock(tasklist_lock); [ 8.250811] local_irq_disable(); [ 8.251525] lock(&stats->lock); [ 8.252227] lock(tasklist_lock); [ 8.252938] <Interrupt> [ 8.253249] lock(&stats->lock); [ 8.253663] [ 8.253663] *** DEADLOCK *** [ 8.253663] [ 8.254362] 1 lock held by runc/2505: [ 8.254800] #0: ffff9a6a10eb8318 (&stats->lock){+.-.}-{2:2}, at: brute_task_fatal_signal+0x97/0x210 [ 8.255872] [ 8.255872] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 8.256931] -> (&stats->lock){+.-.}-{2:2} { [ 8.257432] HARDIRQ-ON-W at: [ 8.257809] lock_acquire+0x13f/0x3f0 [ 8.258442] _raw_spin_lock+0x2c/0x40 [ 8.259069] brute_task_free+0x22/0x90 [ 8.259710] security_task_free+0x22/0x50 [ 8.260384] __put_task_struct+0x58/0x140 [ 8.261078] rcu_core+0x2bb/0x620 [ 8.261677] __do_softirq+0x156/0x4d9 [ 8.262304] asm_call_irq_on_stack+0x12/0x20 [ 8.263010] do_softirq_own_stack+0x5b/0x70 [ 8.263699] irq_exit_rcu+0x9f/0xe0 [ 8.264315] sysvec_apic_timer_interrupt+0x43/0xa0 [ 8.265085] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 8.265886] _raw_write_unlock_irq+0x2c/0x40 [ 8.266580] copy_process+0x15e6/0x1cb0 [ 8.267218] kernel_clone+0x9b/0x3f0 [ 8.267832] kernel_thread+0x55/0x70 [ 8.268444] call_usermodehelper_exec_work+0x77/0xb0 [ 8.269234] process_one_work+0x23e/0x580 [ 8.269912] worker_thread+0x55/0x3c0 [ 8.270546] kthread+0x141/0x160 [ 8.271144] ret_from_fork+0x22/0x30 [ 8.271771] IN-SOFTIRQ-W at: [ 8.272146] lock_acquire+0x13f/0x3f0 [ 8.272780] _raw_spin_lock+0x2c/0x40 [ 8.273398] brute_task_free+0x22/0x90 [ 8.274022] security_task_free+0x22/0x50 [ 8.274690] __put_task_struct+0x58/0x140 [ 8.275358] rcu_core+0x2bb/0x620 [ 8.275937] __do_softirq+0x156/0x4d9 [ 8.276559] asm_call_irq_on_stack+0x12/0x20 [ 8.277262] do_softirq_own_stack+0x5b/0x70 [ 8.277948] irq_exit_rcu+0x9f/0xe0 [ 8.278561] sysvec_apic_timer_interrupt+0x43/0xa0 [ 8.279324] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 8.280121] _raw_write_unlock_irq+0x2c/0x40 [ 8.280845] copy_process+0x15e6/0x1cb0 [ 8.281491] kernel_clone+0x9b/0x3f0 [ 8.282117] kernel_thread+0x55/0x70 [ 8.282734] call_usermodehelper_exec_work+0x77/0xb0 [ 8.283514] process_one_work+0x23e/0x580 [ 8.284181] worker_thread+0x55/0x3c0 [ 8.284802] kthread+0x141/0x160 [ 8.285381] ret_from_fork+0x22/0x30 [ 8.285998] INITIAL USE at: [ 8.286365] lock_acquire+0x13f/0x3f0 [ 8.286975] _raw_spin_lock_irqsave+0x3b/0x60 [ 8.287665] brute_share_stats+0x17/0x70 [ 8.288311] brute_task_alloc+0x65/0x70 [ 8.288948] security_task_alloc+0x44/0xd0 [ 8.289605] copy_process+0x789/0x1cb0 [ 8.290223] kernel_clone+0x9b/0x3f0 [ 8.290819] kernel_thread+0x55/0x70 [ 8.291421] rest_init+0x21/0x258 [ 8.291995] start_kernel+0x566/0x587 [ 8.292611] secondary_startup_64_no_verify+0xc2/0xcb [ 8.293384] } [ 8.293586] ... key at: [<ffffffff984ce3a0>] __key.0+0x0/0x10 [ 8.294324] ... acquired at: [ 8.294679] lock_acquire+0x13f/0x3f0 [ 8.295133] _raw_read_lock+0x5d/0x70 [ 8.295589] brute_task_fatal_signal.cold+0x49/0xe0 [ 8.296185] security_task_fatal_signal+0x22/0x30 [ 8.296772] get_signal+0x3e0/0xcf0 [ 8.297212] arch_do_signal+0x30/0x880 [ 8.297682] exit_to_user_mode_prepare+0xfc/0x170 [ 8.298263] syscall_exit_to_user_mode+0x38/0x240 [ 8.298844] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.299464] [ 8.299648] [ 8.299648] the dependencies between the lock to be acquired [ 8.299648] and SOFTIRQ-irq-unsafe lock: [ 8.300969] -> (tasklist_lock){.+.+}-{2:2} { [ 8.301481] HARDIRQ-ON-R at: [ 8.301857] lock_acquire+0x13f/0x3f0 [ 8.302482] _raw_read_lock+0x5d/0x70 [ 8.303110] do_wait+0xd2/0x2e0 [ 8.303677] kernel_wait+0x49/0x90 [ 8.304266] call_usermodehelper_exec_work+0x61/0xb0 [ 8.305038] process_one_work+0x23e/0x580 [ 8.305700] worker_thread+0x55/0x3c0 [ 8.306320] kthread+0x141/0x160 [ 8.306895] ret_from_fork+0x22/0x30 [ 8.307506] SOFTIRQ-ON-R at: [ 8.307880] lock_acquire+0x13f/0x3f0 [ 8.308505] _raw_read_lock+0x5d/0x70 [ 8.309125] do_wait+0xd2/0x2e0 [ 8.309689] kernel_wait+0x49/0x90 [ 8.310272] call_usermodehelper_exec_work+0x61/0xb0 [ 8.311044] process_one_work+0x23e/0x580 [ 8.311709] worker_thread+0x55/0x3c0 [ 8.312335] kthread+0x141/0x160 [ 8.312910] ret_from_fork+0x22/0x30 [ 8.313529] INITIAL USE at: [ 8.313897] lock_acquire+0x13f/0x3f0 [ 8.314515] _raw_write_lock_irq+0x34/0x50 [ 8.315187] copy_process+0x11d5/0x1cb0 [ 8.315834] kernel_clone+0x9b/0x3f0 [ 8.316446] kernel_thread+0x55/0x70 [ 8.317057] rest_init+0x21/0x258 [ 8.317636] start_kernel+0x566/0x587 [ 8.318250] secondary_startup_64_no_verify+0xc2/0xcb [ 8.319033] INITIAL READ USE at: [ 8.319452] lock_acquire+0x13f/0x3f0 [ 8.320117] _raw_read_lock+0x5d/0x70 [ 8.320777] do_wait+0xd2/0x2e0 [ 8.321387] kernel_wait+0x49/0x90 [ 8.322031] call_usermodehelper_exec_work+0x61/0xb0 [ 8.322852] process_one_work+0x23e/0x580 [ 8.323557] worker_thread+0x55/0x3c0 [ 8.324220] kthread+0x141/0x160 [ 8.324837] ret_from_fork+0x22/0x30 [ 8.325490] } [ 8.325694] ... key at: [<ffffffff97206098>] tasklist_lock+0x18/0x40 [ 8.326509] ... acquired at: [ 8.326871] lock_acquire+0x13f/0x3f0 [ 8.327330] _raw_read_lock+0x5d/0x70 [ 8.327790] brute_task_fatal_signal.cold+0x49/0xe0 [ 8.328392] security_task_fatal_signal+0x22/0x30 [ 8.328958] get_signal+0x3e0/0xcf0 [ 8.329388] arch_do_signal+0x30/0x880 [ 8.329855] exit_to_user_mode_prepare+0xfc/0x170 [ 8.330434] syscall_exit_to_user_mode+0x38/0x240 [ 8.331010] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.331622] [ 8.331801] [ 8.331801] stack backtrace: [ 8.332322] CPU: 2 PID: 2505 Comm: runc Not tainted 5.10.0-rc3 #2 [ 8.333037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 8.334088] Call Trace: [ 8.334385] dump_stack+0x77/0x97 [ 8.334779] check_irq_usage.cold+0x279/0x2ec [ 8.335292] ? check_noncircular+0x75/0x110 [ 8.335783] __lock_acquire+0x12e0/0x2590 [ 8.336256] lock_acquire+0x13f/0x3f0 [ 8.336696] ? brute_task_fatal_signal.cold+0x49/0xe0 [ 8.337284] _raw_read_lock+0x5d/0x70 [ 8.337721] ? brute_task_fatal_signal.cold+0x49/0xe0 [ 8.338315] brute_task_fatal_signal.cold+0x49/0xe0 [ 8.338885] security_task_fatal_signal+0x22/0x30 [ 8.339435] get_signal+0x3e0/0xcf0 [ 8.339849] arch_do_signal+0x30/0x880 [ 8.340290] ? rcu_read_lock_sched_held+0x3f/0x70 [ 8.340853] ? kfree+0x25d/0x2c0 [ 8.341241] exit_to_user_mode_prepare+0xfc/0x170 [ 8.341796] syscall_exit_to_user_mode+0x38/0x240 [ 8.342350] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.342945] RIP: 0033:0x461923 [ 8.343310] Code: 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14 4c 8b 54 24 18 4c 8b 44 24 20 44 8b 4c 24 28 b8 ca 00 00 00 0f 05 <89> 44 24 30 c3 cc cc cc cc cc cc cc cc 8b 7c 24 08 48 8b 74 24 10 [ 8.345488] RSP: 002b:00007fd62affcda8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca [ 8.346385] RAX: fffffffffffffe00 RBX: 000000c000032e00 RCX: 0000000000461923 [ 8.347225] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000b3b898 [ 8.348071] RBP: 00007fd62affcdf0 R08: 0000000000000000 R09: 0000000000000000 [ 8.349006] R10: 0000000000000000 R11: 0000000000000286 R12: 000000c000001680 [ 8.349838] R13: 00007ffd85b25fcf R14: 00007ffd85b260d0 R15: 00007fd62affcfc0 I think it should be possible to using existing task locking semantics to manage the statistics, but I'll need to take a closer look. > Other implementations > --------------------- > > The public version of grsecurity, as a summary, is based on the idea of > delay the fork system call if a child died due to a fatal error. This has > some issues: > > 1.- Bad practices: Add delays to the kernel is, in general, a bad idea. > > 2.- Weak points: This protection can be bypassed using two different > methods since it acts only when the fork is called after a child has > crashed. > > 2.1.- Bypass 1: So, it would still be possible for an attacker to fork > a big amount of children (in the order of thousands), then probe > all of them, and finally wait the protection time before repeat > the steps. > > 2.2.- Bypass 2: This method is based on the idea that the protection > doesn't act if the parent crashes. So, it would still be possible > for an attacker to fork a process and probe itself. Then, fork > the child process and probe itself again. This way, these steps > can be repeated infinite times without any mitigation. It's good to clarify what the expected behaviors should be; however, while working with the resulting system, it wasn't clear what the threat model was for this defense. I think we need two things: clear descriptions of what is expected to be detected (and what is not), and a set of self-tests that can be used to validate those expectations. Also, what, specifically, does "fatal error" cover? Is it strictly fatal signals? (i.e. "error" might refer to exit code, for example.) > > This implementation > ------------------- > > The main idea behind this implementation is to improve the existing ones > focusing on the weak points annotated before. The solution for the first > bypass method is to detect a fast crash rate instead of only one simple > crash. For the second bypass method the solution is to detect both the > crash of parent and child processes. Moreover, as a mitigation method it is > better to kill all the offending tasks involve in the attack instead of use > delays. > > So, the solution to the two bypass methods previously commented is to use > some statistical data shared across all the processes that can have the > same memory contents. Or in other words, a statistical data shared between > all the fork hierarchy processes after an execve system call. Hm, is this already tracked in some other way? i.e. the family hierarchy of the mm struct? They're only shared on clone, but get totally copied on fork(). Should that be the place where this is tracked instead? (i.e. I could fork and totally rearrange my VMAs.) > > The purpose of these statistics is to compute the application crash period > in order to detect an attack. This crash period is the time between the > execve system call and the first fault or the time between two consecutives > faults, but this has a drawback. If an application crashes once quickly > from the execve system call or crashes twice in a short period of time for > some reason, a false positive attack will be triggered. To avoid this > scenario the shared statistical data holds a list of the i last crashes > timestamps and the application crash period is computed as follows: > > crash_period = (n_last_timestamp - n_minus_i_timestamp) / i; > > This ways, the size of the last crashes timestamps list allows to fine > tuning the detection sensibility. Instead of a list, can't the rate just be calculated on an on-going basis? > When this crash period falls under a certain threshold there is a clear > signal that something malicious is happening. Once detected, the mitigation > only kills the processes that share the same statistical data and so, all > the tasks that can have the same memory contents. This way, an attack is > rejected. Here's where I think the threat model needs some more work. The above describes what I think is a less common situation. I expect the common attack to hold still with a single value, and let the fork/exec spin until the value lines up. (i.e. a fork is required.) Here are the threat scenarios that come to mind for me: 1- launching (fork/exec) a setuid process repeatedly until you get a desirable memory layout (e.g. what Stack Clash[1] did). 2- connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until you get a desirable memory layout (e.g. what CTFs do for simple network service[2]). 3- launching processes _without_ exec (e.g. Android Zygote[3]), and exposing state to attack a sibling. 4- connecting to a fork()ing network daemon (e.g. apache) repeatedly until you expose the previously-shared memory layout of all the other children (e.g. kind of related to HeartBleed[3], though that was a direct exposure not a crasher). In each case, a privilege boundary is being crossed (setuid in the first, priv-changes in the 2nd, and network-to-local in the latter two), so I suspect that kind of detail will need to play a part in the design to help avoid false positives. Regardless, when I tested this series, 1 and 3 isn't detected, since they pass through an execve(), and I think that needs to be covered as well. [1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [2] https://github.com/BSidesPDX/CTF-2017/blob/master/pwn/200-leek/src/leek.service [3] https://link.springer.com/article/10.1007/s10207-018-00425-8 [4] https://heartbleed.com/ > > 1.- Per system enabling: This feature can be enabled at build time using > the CONFIG_SECURITY_FORK_BRUTE option or using the visual config > application under the following menu: > > Security options ---> Fork brute force attack detection and mitigation (there is a built-in boot time disabling too, by changing the lsm= bootparam) > > 2.- Per process enabling/disabling: To allow that specific applications can > turn off or turn on the detection and mitigation of a fork brute force > attack when required, there are two new prctls. > > prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0) > prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0) How do you see this being used? > 3.- Fine tuning: To customize the detection's sensibility there are two new > sysctl attributes that allow to set the last crashes timestamps list > size and the application crash period threshold (in milliseconds). Both > are accessible through the following files respectively. > > /proc/sys/kernel/brute/timestamps_list_size > /proc/sys/kernel/brute/crash_period_threshold > > The list size allows to avoid false positives due to crashes unrelated > with a real attack. The period threshold sets the time limit to detect > an attack. And, since a fork brute force attack will be detected if the > application crash period falls under this threshold, the higher this > value, the more sensitive the detection will be. I wonder if these will be needed as we narrow in on the specific threat model (i.e. there will be enough additional signal to obviate this tuning). And in testing I found another false positive that I haven't fully diagnosed. I found that at boot, with Docker installed, "runc" would immediately trip the mitigation. With some debugging added, I looks like runc had several forked processes that got SIGKILLed in quick succession, and then the entire group got killed by Brute. I haven't narrowed down what runc is doing here, but it makes me wonder if there might need to be an exception for user-space delivered signals, as opposed to kernel-delivered signals... Thanks again for the work! I'm liking the idea of getting a solid protection for this. It's been a long-standing hole in upstream. :) -Kees
On Tue, Nov 10, 2020 at 04:10:49PM -0800, Kees Cook wrote: > On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote: > > Attacks against vulnerable userspace applications with the purpose to break > > ASLR or bypass canaries traditionaly use some level of brute force with the > > help of the fork system call. This is possible since when creating a new > > process using fork its memory contents are the same as those of the parent > > process (the process that called the fork system call). So, the attacker > > can test the memory infinite times to find the correct memory values or the > > correct memory addresses without worrying about crashing the application. > > > > Based on the above scenario it would be nice to have this detected and > > mitigated, and this is the goal of this patch serie. > > Thanks for preparing the v2! I spent some time looking at this today, > and I really like how it has been rearranged into an LSM. This feels > much more natural. Thank you very much for reviewing this work. > Various notes: > > The locking isn't right; it'll trip with CONFIG_PROVE_LOCKING=y. > Here's the giant splat: > > [ 8.205146] brute: Fork brute force attack detected > [ 8.206821] > [ 8.207317] ===================================================== > [ 8.209392] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected > [ 8.211852] 5.10.0-rc3 #2 Not tainted > [ 8.213215] ----------------------------------------------------- > [...] Interesting... My ".config" also have this option "on" but this warning never appeared. Under what circumstances did it happen? I tested this patchset with two custom test programs with no problem. Maybe my tests are not good enough :( > I think it should be possible to using existing task locking semantics > to manage the statistics, but I'll need to take a closer look. > > > Other implementations > > --------------------- > > > > The public version of grsecurity, as a summary, is based on the idea of > > delay the fork system call if a child died due to a fatal error. This has > > some issues: > > > > 1.- Bad practices: Add delays to the kernel is, in general, a bad idea. > > > > 2.- Weak points: This protection can be bypassed using two different > > methods since it acts only when the fork is called after a child has > > crashed. > > > > 2.1.- Bypass 1: So, it would still be possible for an attacker to fork > > a big amount of children (in the order of thousands), then probe > > all of them, and finally wait the protection time before repeat > > the steps. > > > > 2.2.- Bypass 2: This method is based on the idea that the protection > > doesn't act if the parent crashes. So, it would still be possible > > for an attacker to fork a process and probe itself. Then, fork > > the child process and probe itself again. This way, these steps > > can be repeated infinite times without any mitigation. > > It's good to clarify what the expected behaviors should be; however, > while working with the resulting system, it wasn't clear what the threat > model was for this defense. I think we need two things: clear > descriptions of what is expected to be detected (and what is not), and a > set of self-tests that can be used to validate those expectations. Ok, I will make the necessary changes in order to clarify the commented points. I will also add the self-tests. Thanks for the suggestion. > Also, what, specifically, does "fatal error" cover? Is it strictly fatal > signals? (i.e. "error" might refer to exit code, for example.) Ok, understood. > > > > This implementation > > ------------------- > > > > The main idea behind this implementation is to improve the existing ones > > focusing on the weak points annotated before. The solution for the first > > bypass method is to detect a fast crash rate instead of only one simple > > crash. For the second bypass method the solution is to detect both the > > crash of parent and child processes. Moreover, as a mitigation method it is > > better to kill all the offending tasks involve in the attack instead of use > > delays. > > > > So, the solution to the two bypass methods previously commented is to use > > some statistical data shared across all the processes that can have the > > same memory contents. Or in other words, a statistical data shared between > > all the fork hierarchy processes after an execve system call. > > Hm, is this already tracked in some other way? i.e. the family hierarchy > of the mm struct? They're only shared on clone, but get totally copied on > fork(). Should that be the place where this is tracked instead? (i.e. I > could fork and totally rearrange my VMAs.) Great, I will give it a try and study this hierarchy. If this is feasible, the correct path shoud be to create a new security blob hold by the mm struct? Also, I think that new LSM hooks will be needed to notify the allocation, release and copy of this structure. What do you think, am I in the right direction? > > The purpose of these statistics is to compute the application crash period > > in order to detect an attack. This crash period is the time between the > > execve system call and the first fault or the time between two consecutives > > faults, but this has a drawback. If an application crashes once quickly > > from the execve system call or crashes twice in a short period of time for > > some reason, a false positive attack will be triggered. To avoid this > > scenario the shared statistical data holds a list of the i last crashes > > timestamps and the application crash period is computed as follows: > > > > crash_period = (n_last_timestamp - n_minus_i_timestamp) / i; > > > > This ways, the size of the last crashes timestamps list allows to fine > > tuning the detection sensibility. > > Instead of a list, can't the rate just be calculated on an on-going > basis? The first version computed the application crash period on the fly (without a list) using the accumulated time and the number of faults. period = time_since_execve / faults_since_execve; But this method has two major drawbacks. The first (noted by Jann Horn), is that if an application runs for a long time (say, a month) without any crash, and the detection threshold is set to 30 s (default value), an attacker can break the app approximately: 30*24*60*60/30 = 86400 times before the detection is triggered. The second drawback is that if the first fault is fast (from the execve system call) the detection will be triggered instantly. This is a problem if an application fails quickly for reasons that are not related to a real attack. I think to avoid the first drawback we can compute the crash period using a kind of weighted average (setting more weight for the last time between faults). But the second drawback does not go away. However, with the list we don't have any of the commented drawbacks (only a larger memory footprint and more calculation time). Any ideas to remove the list are welcome ;) > > When this crash period falls under a certain threshold there is a clear > > signal that something malicious is happening. Once detected, the mitigation > > only kills the processes that share the same statistical data and so, all > > the tasks that can have the same memory contents. This way, an attack is > > rejected. > > Here's where I think the threat model needs some more work. The above > describes what I think is a less common situation. I expect the common > attack to hold still with a single value, and let the fork/exec spin > until the value lines up. (i.e. a fork is required.) > > Here are the threat scenarios that come to mind for me: > > 1- launching (fork/exec) a setuid process repeatedly until you get a > desirable memory layout (e.g. what Stack Clash[1] did). > > 2- connecting to an exec()ing network daemon (e.g. xinetd) repeatedly > until you get a desirable memory layout (e.g. what CTFs do for simple > network service[2]). > > 3- launching processes _without_ exec (e.g. Android Zygote[3]), and > exposing state to attack a sibling. > > 4- connecting to a fork()ing network daemon (e.g. apache) repeatedly > until you expose the previously-shared memory layout of all the other > children (e.g. kind of related to HeartBleed[3], though that was a > direct exposure not a crasher). > > In each case, a privilege boundary is being crossed (setuid in the first, > priv-changes in the 2nd, and network-to-local in the latter two), so I > suspect that kind of detail will need to play a part in the design to > help avoid false positives. Ok, I will try to play with privilege boundary crossing in the next version. > Regardless, when I tested this series, 1 and 3 isn't detected, since > they pass through an execve(), and I think that needs to be covered as > well. Thanks, I will work on it and review the execve case. Also, thanks for showing me real examples of fork brute force attacks ;) > [1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt > [2] https://github.com/BSidesPDX/CTF-2017/blob/master/pwn/200-leek/src/leek.service > [3] https://link.springer.com/article/10.1007/s10207-018-00425-8 > [4] https://heartbleed.com/ > > > > > 1.- Per system enabling: This feature can be enabled at build time using > > the CONFIG_SECURITY_FORK_BRUTE option or using the visual config > > application under the following menu: > > > > Security options ---> Fork brute force attack detection and mitigation > > (there is a built-in boot time disabling too, by changing the lsm= > bootparam) Thanks, I will add it to the documentation. > > 2.- Per process enabling/disabling: To allow that specific applications can > > turn off or turn on the detection and mitigation of a fork brute force > > attack when required, there are two new prctls. > > > > prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0) > > prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0) > > How do you see this being used? Maybe the PR_SECURITY_FORK_BRUTE_DISABLE prctl can be used by applications like "runc" (commented below) if we cannot exclude this false positive. I mean during phases that may need fork and kill in a quick way. Then, during other phases, the protection could be enabled again. Makes sense? > > 3.- Fine tuning: To customize the detection's sensibility there are two new > > sysctl attributes that allow to set the last crashes timestamps list > > size and the application crash period threshold (in milliseconds). Both > > are accessible through the following files respectively. > > > > /proc/sys/kernel/brute/timestamps_list_size > > /proc/sys/kernel/brute/crash_period_threshold > > > > The list size allows to avoid false positives due to crashes unrelated > > with a real attack. The period threshold sets the time limit to detect > > an attack. And, since a fork brute force attack will be detected if the > > application crash period falls under this threshold, the higher this > > value, the more sensitive the detection will be. > > I wonder if these will be needed as we narrow in on the specific threat > model (i.e. there will be enough additional signal to obviate this > tuning). I agree. If possible, I will remove as much configuration as possible. Thanks. > And in testing I found another false positive that I haven't fully > diagnosed. I found that at boot, with Docker installed, "runc" would > immediately trip the mitigation. With some debugging added, I looks like > runc had several forked processes that got SIGKILLed in quick succession, > and then the entire group got killed by Brute. I haven't narrowed down > what runc is doing here, but it makes me wonder if there might need > to be an exception for user-space delivered signals, as opposed to > kernel-delivered signals... I will try to work on this as well. > Thanks again for the work! I'm liking the idea of getting a solid > protection for this. It's been a long-standing hole in upstream. :) It's a pleasure to work on this protection. This is the perfect task to learn a lot. Thank you very much for the review and all the comments. I know this review was streamed on twitch but, due to my main language is not english, I cannot follow speech the way I want. Will this video be uploaded to youtube? There I can turn on the subtitles. This way I can learn and improve more things. Thanks again. > > -Kees > > -- > Kees Cook Regards, John Wood