diff mbox series

[v2,5/5] audit: do not use exclusive wait in audit_receive()

Message ID 20230511052116.19452-6-eiichi.tsukata@nutanix.com (mailing list archive)
State Rejected
Delegated to: Paul Moore
Headers show
Series audit: refactors and fixes for potential deadlocks | expand

Commit Message

Eiichi Tsukata May 11, 2023, 5:21 a.m. UTC
kauditd thread issues wake_up() before it goes to sleep. The wake_up()
call wakes up only one process as waiter side uses exclusive wait.
This can be problematic when there are multiple processes (one is in
audit_receive() and others are in audit_log_start()) waiting on
audit_backlog_wait queue.

For example, if there are two processes waiting:

  Process (A): in audit_receive()
  Process (B): in audit_log_start()

And (A) is at the head of the wait queue. Then kauditd's wake_up() only
wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
result, (B) can be blocked for up to backlog_wait_time.

To prevent the issue, use non-exclusive wait in audit_receive() so that
kauditd can wake up all waiters in audit_receive().

Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

Comments

Paul Moore May 19, 2023, 8:54 p.m. UTC | #1
On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> call wakes up only one process as waiter side uses exclusive wait.
> This can be problematic when there are multiple processes (one is in
> audit_receive() and others are in audit_log_start()) waiting on
> audit_backlog_wait queue.
> 
> For example, if there are two processes waiting:
> 
>   Process (A): in audit_receive()
>   Process (B): in audit_log_start()
> 
> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> result, (B) can be blocked for up to backlog_wait_time.
> 
> To prevent the issue, use non-exclusive wait in audit_receive() so that
> kauditd can wake up all waiters in audit_receive().
> 
> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)

This was also discussed in the last patchset.

--
paul-moore.com
Eiichi Tsukata May 22, 2023, 4:44 a.m. UTC | #2
> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>> call wakes up only one process as waiter side uses exclusive wait.
>> This can be problematic when there are multiple processes (one is in
>> audit_receive() and others are in audit_log_start()) waiting on
>> audit_backlog_wait queue.
>> 
>> For example, if there are two processes waiting:
>> 
>>  Process (A): in audit_receive()
>>  Process (B): in audit_log_start()
>> 
>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>> result, (B) can be blocked for up to backlog_wait_time.
>> 
>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>> kauditd can wake up all waiters in audit_receive().
>> 
>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 17 +++++++++++------
>> 1 file changed, 11 insertions(+), 6 deletions(-)
> 
> This was also discussed in the last patchset.
> 
> 

This bug is much easily reproducible on real environments and can cause problematic
user space failure like SSH connection timeout.
Let’s not keep the bug unfixed.
(Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Other ideas in my minds are:

(1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid 
  wake_up() tries to wake up a waiter in audit_log_start().

(2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent 
  audit_receive() from unnecessarily waiting for so long time. 

BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
I’d appreciate if you could tell me the reason behind that value.

Eiichi
Eiichi Tsukata May 23, 2023, 3:58 a.m. UTC | #3
> On May 22, 2023, at 13:44, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> 
> 
>> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
>> 
>> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>>> 
>>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>>> call wakes up only one process as waiter side uses exclusive wait.
>>> This can be problematic when there are multiple processes (one is in
>>> audit_receive() and others are in audit_log_start()) waiting on
>>> audit_backlog_wait queue.
>>> 
>>> For example, if there are two processes waiting:
>>> 
>>> Process (A): in audit_receive()
>>> Process (B): in audit_log_start()
>>> 
>>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>>> result, (B) can be blocked for up to backlog_wait_time.
>>> 
>>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>>> kauditd can wake up all waiters in audit_receive().
>>> 
>>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>>> ---
>>> kernel/audit.c | 17 +++++++++++------
>>> 1 file changed, 11 insertions(+), 6 deletions(-)
>> 
>> This was also discussed in the last patchset.
>> 
>> 
> 
> This bug is much easily reproducible on real environments and can cause problematic
> user space failure like SSH connection timeout.
> Let’s not keep the bug unfixed.
> (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)
> 
> Other ideas in my minds are:
> 
> (1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid 
>  wake_up() tries to wake up a waiter in audit_log_start().
> 
> (2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent 
>  audit_receive() from unnecessarily waiting for so long time. 
> 
> BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> I’d appreciate if you could tell me the reason behind that value.
> 
> Eiichi

I came up with a better idea:

(3) Move wait_for_kauditd() in audit_receive() *before* audit_ctl_lock() 
 and restrict penalty only for msg_type which can queue a new audit record. (AUDIT_USER, AUDIT_TRIM, AUDIT_MAKE_EQUIV, ..)

Originally, it’s not reasonable to give penalty for innocent operation
like AUDIT_GET.
This approach makes successive audit_log_end() wake up kauditd.
Also it prevents audit_log_end() from queueing skb ignoring backlog_limit.

Eiichi
Paul Moore May 23, 2023, 9:07 p.m. UTC | #4
On Mon, May 22, 2023 at 11:58 PM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 22, 2023, at 13:44, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> >> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >>>
> >>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> >>> call wakes up only one process as waiter side uses exclusive wait.
> >>> This can be problematic when there are multiple processes (one is in
> >>> audit_receive() and others are in audit_log_start()) waiting on
> >>> audit_backlog_wait queue.
> >>>
> >>> For example, if there are two processes waiting:
> >>>
> >>> Process (A): in audit_receive()
> >>> Process (B): in audit_log_start()
> >>>
> >>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> >>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> >>> result, (B) can be blocked for up to backlog_wait_time.
> >>>
> >>> To prevent the issue, use non-exclusive wait in audit_receive() so that
> >>> kauditd can wake up all waiters in audit_receive().
> >>>
> >>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> >>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> >>> ---
> >>> kernel/audit.c | 17 +++++++++++------
> >>> 1 file changed, 11 insertions(+), 6 deletions(-)
> >>
> >> This was also discussed in the last patchset.
> >
> > This bug is much easily reproducible on real environments and can cause problematic
> > user space failure like SSH connection timeout.
> > Let’s not keep the bug unfixed.
> > (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Good.  Resolving your issues through audit runtime configuration is
the proper solution to this.

> > BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> > I’d appreciate if you could tell me the reason behind that value.

I do not recall the original logic behind that value.  It is likely
that the original value predated my maintenance of the audit
subsystem.
diff mbox series

Patch

diff --git a/kernel/audit.c b/kernel/audit.c
index 6b0cc0459984..ef48210343ae 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -631,22 +631,27 @@  static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
 /**
  * wait_for_kauditd - Wait for kauditd to drain the queue
  * @stime: time to sleep
+ * @exclusive: use exclusive wait if true
  *
  * Description:
  * Waits for kauditd to drain the queue then adds duration of sleep time to
  * audit_backlog_wait_time_actual as cumulative sum.
  * Returns remaining time to sleep.
  */
-static long wait_for_kauditd(long stime)
+static long wait_for_kauditd(long stime, bool exclusive)
 {
 	long rtime;
 	DEFINE_WAIT(wait);
 
-	prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
-				  TASK_UNINTERRUPTIBLE);
+	if (exclusive)
+		prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+	else
+		prepare_to_wait(&audit_backlog_wait, &wait,
+				TASK_UNINTERRUPTIBLE);
 
 	/* need to check if the queue is full again because kauditd might have
-	 * flushed the queue and went to sleep after prepare_to_wait_exclusive()
+	 * flushed the queue and went to sleep after prepare_to_wait_*()
 	 */
 	if (audit_queue_full(&audit_queue)) {
 		rtime = schedule_timeout(stime);
@@ -1601,7 +1606,7 @@  static void audit_receive(struct sk_buff  *skb)
 	if (audit_queue_full(&audit_queue)) {
 		/* wake kauditd to try and flush the queue */
 		wake_up_interruptible(&kauditd_wait);
-		wait_for_kauditd(audit_backlog_wait_time);
+		wait_for_kauditd(audit_backlog_wait_time, false);
 	}
 }
 
@@ -1900,7 +1905,7 @@  struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 			/* sleep if we are allowed and we haven't exhausted our
 			 * backlog wait limit */
 			if (gfpflags_allow_blocking(gfp_mask) && stime > 0) {
-				stime = wait_for_kauditd(stime);
+				stime = wait_for_kauditd(stime, true);
 			} else {
 				if (audit_rate_check() && printk_ratelimit())
 					pr_warn("audit_backlog=%d > audit_backlog_limit=%d\n",