diff mbox series

[v2,4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive()

Message ID 20230511052116.19452-5-eiichi.tsukata@nutanix.com (mailing list archive)
State Rejected
Delegated to: Paul Moore
Headers show
Series audit: refactors and fixes for potential deadlocks | expand

Commit Message

Eiichi Tsukata May 11, 2023, 5:21 a.m. UTC
Commit 7ffb8e317bae ("audit: we don't need to
__set_current_state(TASK_RUNNING)") accidentally moved queue full check
before add_wait_queue_exclusive() which introduced the following race:

    CPU1                           CPU2
  ========                       ========
  (in audit_log_start())         (in kauditd_thread())

  @audit_queue is full
                                 wake_up(&audit_backlog_wait)
                                 wait_event_freezable()
  add_wait_queue_exclusive()
  ...
  schedule_timeout()

Once this happens, both audit_log_start() and kauditd_thread() can cause
deadlock for up to backlog_wait_time waiting for each other. To prevent
the race, this patch adds @audit_queue full check after
prepare_to_wait_exclusive() and call schedule_timeout() only if the
queue is full.

Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Paul Moore May 19, 2023, 8:54 p.m. UTC | #1
On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> Commit 7ffb8e317bae ("audit: we don't need to
> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> before add_wait_queue_exclusive() which introduced the following race:
> 
>     CPU1                           CPU2
>   ========                       ========
>   (in audit_log_start())         (in kauditd_thread())
> 
>   @audit_queue is full
>                                  wake_up(&audit_backlog_wait)
>                                  wait_event_freezable()
>   add_wait_queue_exclusive()
>   ...
>   schedule_timeout()
> 
> Once this happens, both audit_log_start() and kauditd_thread() can cause
> deadlock for up to backlog_wait_time waiting for each other. To prevent
> the race, this patch adds @audit_queue full check after
> prepare_to_wait_exclusive() and call schedule_timeout() only if the
> queue is full.
> 
> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

I discussed my concerns with this patch in the last patchset, and I
believe they still apply here.

--
paul-moore.com
Eiichi Tsukata May 22, 2023, 4:28 a.m. UTC | #2
> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> Commit 7ffb8e317bae ("audit: we don't need to
>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>> before add_wait_queue_exclusive() which introduced the following race:
>> 
>>    CPU1                           CPU2
>>  ========                       ========
>>  (in audit_log_start())         (in kauditd_thread())
>> 
>>  @audit_queue is full
>>                                 wake_up(&audit_backlog_wait)
>>                                 wait_event_freezable()
>>  add_wait_queue_exclusive()
>>  ...
>>  schedule_timeout()
>> 
>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>> the race, this patch adds @audit_queue full check after
>> prepare_to_wait_exclusive() and call schedule_timeout() only if the
>> queue is full.
>> 
>> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
> 
> I discussed my concerns with this patch in the last patchset, and I
> believe they still apply here.
> 

Please refer to the implementation of ___wait_event().
It checks the condition *after* prepare_to_wait_event().

Another similar example in the kernel code is unix_wait_for_peer().
It checks unix_recvq_full() after prepare_to_wait_exclusive().

I’m assuming this is a logical bug needs to be fixed.

Eiichi
Paul Moore May 23, 2023, 9:02 p.m. UTC | #3
On Mon, May 22, 2023 at 12:28 AM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> > On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >>
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >>    CPU1                           CPU2
> >>  ========                       ========
> >>  (in audit_log_start())         (in kauditd_thread())
> >>
> >>  @audit_queue is full
> >>                                 wake_up(&audit_backlog_wait)
> >>                                 wait_event_freezable()
> >>  add_wait_queue_exclusive()
> >>  ...
> >>  schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds @audit_queue full check after
> >> prepare_to_wait_exclusive() and call schedule_timeout() only if the
> >> queue is full.
> >>
> >> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
> >> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> >> ---
> >> kernel/audit.c | 12 ++++++++++--
> >> 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > I discussed my concerns with this patch in the last patchset, and I
> > believe they still apply here.
> >
>
> Please refer to the implementation of ___wait_event().
> It checks the condition *after* prepare_to_wait_event().
>
> Another similar example in the kernel code is unix_wait_for_peer().
> It checks unix_recvq_full() after prepare_to_wait_exclusive().
>
> I’m assuming this is a logical bug needs to be fixed.

I disagree, see my previous comments.  The fixes you've presented do
not eliminate the possibility of rescheduling which could result in
some of the issues you've described.  The proper fix for systems which
are sensitive to long scheduling delays such as this is to adjust your
audit runtime configuration so that audit does not block userspace.
Suggestions include removing the backlog limit and/or shortening the
wait time.
diff mbox series

Patch

diff --git a/kernel/audit.c b/kernel/audit.c
index bcbb0ba33c84..6b0cc0459984 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -644,8 +644,16 @@  static long wait_for_kauditd(long stime)
 
 	prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
 				  TASK_UNINTERRUPTIBLE);
-	rtime = schedule_timeout(stime);
-	atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
+
+	/* need to check if the queue is full again because kauditd might have
+	 * flushed the queue and went to sleep after prepare_to_wait_exclusive()
+	 */
+	if (audit_queue_full(&audit_queue)) {
+		rtime = schedule_timeout(stime);
+		atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
+	} else {
+		rtime = 0;
+	}
 	finish_wait(&audit_backlog_wait, &wait);
 
 	return rtime;