Message ID: 20230508075812.76077-1-eiichi.tsukata@nutanix.com
Series: audit: refactor and fix for potential deadlock

On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> Commit 7ffb8e317bae ("audit: we don't need to
> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> before add_wait_queue_exclusive() which introduced the following race:
>
>           CPU1                            CPU2
>         ========                        ========
>   (in audit_log_start())          (in kauditd_thread())
>
>   queue is full
>                                   wake_up(&audit_backlog_wait)
>                                   wait_event_freezable()
>   add_wait_queue_exclusive()
>   ...
>   schedule_timeout()
>
> Once this happens, both audit_log_start() and kauditd_thread() can cause
> deadlock for up to backlog_wait_time waiting for each other. To prevent
> the race, this patch adds queue full check after
> prepare_to_wait_exclusive().

Have you seen this occur in practice?  The first thing that audit_log_start()
does when detecting that @audit_queue is full is to wake kauditd_thread,
which should start immediately draining @audit_queue (even if a daemon is not
listening, the @audit_queue will be drained into one of the other queues).
While it is hard to predict scheduling behavior, it seems unlikely that
kauditd_thread would be able to process the entire queue before
audit_log_start() progresses from waking kauditd_thread() to sleeping.

In the absolute worst case, the task attempting to emit an audit record
sleeps for either the configured wait time or until it is awakened by
kauditd_thread() which could be triggered by another task attempting to emit
an audit record.

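For reference, the shape of the "enqueue first, then re-check" pattern that the
patch description refers to looks roughly like the sketch below. This is
illustrative only, not the actual diff; audit_backlog_sleep() and the
queue_is_full() callback are made-up placeholders for the real queue-length
test in audit_log_start().

#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Illustrative only: the queue-full test (a stand-in callback here, not a
 * real kernel symbol) is evaluated only after the task is already on the
 * wait queue, so a wake_up() issued between the first check and the
 * enqueue can no longer be missed.
 */
static long audit_backlog_sleep(wait_queue_head_t *wq, long timeout,
				bool (*queue_is_full)(void))
{
	DEFINE_WAIT(wait);

	/* go onto the wait queue first (exclusive: one waiter per wake) ... */
	prepare_to_wait_exclusive(wq, &wait, TASK_UNINTERRUPTIBLE);

	/* ... then re-check: if kauditd already drained the queue and issued
	 * its wake_up(), we observe that here and skip the sleep entirely */
	if (queue_is_full())
		timeout = schedule_timeout(timeout);

	finish_wait(wq, &wait);
	return timeout;
}

The point of the ordering is that once the task is on the wait queue, a
wake_up() from kauditd_thread() can no longer slip in unnoticed between the
full-queue test and the sleep.
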
> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> <eiichi.tsukata@nutanix.com> wrote:
>> Commit 7ffb8e317bae ("audit: we don't need to
>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>> before add_wait_queue_exclusive() which introduced the following race:
>>
>>           CPU1                            CPU2
>>         ========                        ========
>>   (in audit_log_start())          (in kauditd_thread())
>>
>>   queue is full
>>                                   wake_up(&audit_backlog_wait)
>>                                   wait_event_freezable()
>>   add_wait_queue_exclusive()
>>   ...
>>   schedule_timeout()
>>
>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>> the race, this patch adds queue full check after
>> prepare_to_wait_exclusive().
>
> Have you seen this occur in practice?

Yes, we hit this issue multiple times, though it's pretty rare, as you
mentioned. In our case, sshd got stuck in audit_log_user_message(), which
caused an SSH connection timeout.

Eiichi

> On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>
>> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
>>
>> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
>> <eiichi.tsukata@nutanix.com> wrote:
>>> Commit 7ffb8e317bae ("audit: we don't need to
>>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>>> before add_wait_queue_exclusive() which introduced the following race:
>>>
>>>           CPU1                            CPU2
>>>         ========                        ========
>>>   (in audit_log_start())          (in kauditd_thread())
>>>
>>>   queue is full
>>>                                   wake_up(&audit_backlog_wait)
>>>                                   wait_event_freezable()
>>>   add_wait_queue_exclusive()
>>>   ...
>>>   schedule_timeout()
>>>
>>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>>> the race, this patch adds queue full check after
>>> prepare_to_wait_exclusive().
>>
>> Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it's pretty rare, as you
> mentioned. In our case, sshd got stuck in audit_log_user_message(), which
> caused an SSH connection timeout.

I found another case.

kauditd_thread issues wake_up(&audit_backlog_wait) only once each time it
wakes up. As the waiter side uses add_wait_queue_exclusive(), only one
process is woken up at a time.

If kauditd wakes up a process which is sleeping in audit_log_start(), then
the process will queue an skb and issue wake_up_interruptible(&kauditd_wait).
No problem. But if kauditd wakes up a process which is sleeping in
audit_receive(), then the process won't try to wake up kauditd. In this case,
other processes sleeping in audit_log_start() keep sleeping even if kauditd
has flushed the queue.

At this point I'm planning to use a non-exclusive wait in audit_receive()
in v2. Let me know if we should use wake_up_all() in kauditd or if you have
a better solution.

Eiichi

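To make the exclusive/non-exclusive distinction concrete, here is a hedged
sketch of the two waiter styles being discussed (illustrative only, not the
proposed v2 change; backlog_wait and the two function names are stand-ins for
audit_backlog_wait and the real call sites):

#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(backlog_wait);	/* stand-in for audit_backlog_wait */

/* Exclusive waiter, as in audit_log_start(): WQ_FLAG_EXCLUSIVE means a
 * plain wake_up() pops at most one of these off the queue per call. */
static void backlog_wait_exclusive(long timeout)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(&backlog_wait, &wait);
	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	remove_wait_queue(&backlog_wait, &wait);
}

/* Non-exclusive waiter (one option floated above for audit_receive()):
 * these are always woken by wake_up(), in addition to at most one
 * exclusive waiter. */
static void backlog_wait_nonexclusive(long timeout)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue(&backlog_wait, &wait);
	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	remove_wait_queue(&backlog_wait, &wait);
}

With waiters registered this way, a single wake_up() wakes every
non-exclusive waiter plus at most one exclusive waiter, while wake_up_all()
wakes everything queued.
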
On Mon, May 8, 2023 at 9:49 PM Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> > On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
> > On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> > <eiichi.tsukata@nutanix.com> wrote:
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >>           CPU1                            CPU2
> >>         ========                        ========
> >>   (in audit_log_start())          (in kauditd_thread())
> >>
> >>   queue is full
> >>                                   wake_up(&audit_backlog_wait)
> >>                                   wait_event_freezable()
> >>   add_wait_queue_exclusive()
> >>   ...
> >>   schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds queue full check after
> >> prepare_to_wait_exclusive().
> >
> > Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it's pretty rare, as you
> mentioned. In our case, sshd got stuck in audit_log_user_message(), which
> caused an SSH connection timeout.

Sorry for the delay, I was away at a conference last week.

Regardless of how we tweak the wait, there is always going to be a chance
that rescheduling the task with a timeout is going to impact the
application. If you have sensitive applications where this is proving to be
a problem, I would suggest tuning your backlog size, the wait time, or
removing the backlog limit entirely.

--
paul-moore.com

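For context, the knobs being suggested correspond to runtime-tunable values
in kernel/audit.c. The sketch below is a simplified model, not the real code
(audit_queue_is_full() is a made-up helper; the defaults shown are believed
to match recent mainline): a backlog limit of 0 disables the queue-full check
entirely, and the backlog wait time caps how long one audit_log_start() call
may sleep.

#include <linux/jiffies.h>
#include <linux/skbuff.h>

/*
 * Simplified model of the audit backlog tunables: a limit of 0 means the
 * queue-full check is skipped, so callers never sleep; the wait time
 * bounds how long a single record emission may be stalled.
 */
static u32 audit_backlog_limit = 64;		/* records; 0 == unlimited */
static u32 audit_backlog_wait_time = 60 * HZ;	/* jiffies */

static bool audit_queue_is_full(const struct sk_buff_head *queue)
{
	return audit_backlog_limit &&
	       skb_queue_len(queue) > audit_backlog_limit;
}

From userspace these are typically adjusted through the audit netlink
interface, e.g. with auditctl's backlog limit option (auditctl -b) and, in
newer auditctl versions, its backlog wait time option.
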
On Wed, May 10, 2023 at 4:09 AM Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> > On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
> >> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> >> <eiichi.tsukata@nutanix.com> wrote:
> >>> Commit 7ffb8e317bae ("audit: we don't need to
> >>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >>> before add_wait_queue_exclusive() which introduced the following race:
> >>>
> >>>           CPU1                            CPU2
> >>>         ========                        ========
> >>>   (in audit_log_start())          (in kauditd_thread())
> >>>
> >>>   queue is full
> >>>                                   wake_up(&audit_backlog_wait)
> >>>                                   wait_event_freezable()
> >>>   add_wait_queue_exclusive()
> >>>   ...
> >>>   schedule_timeout()
> >>>
> >>> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >>> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >>> the race, this patch adds queue full check after
> >>> prepare_to_wait_exclusive().
> >>
> >> Have you seen this occur in practice?
> >
> > Yes, we hit this issue multiple times, though it's pretty rare, as you
> > mentioned. In our case, sshd got stuck in audit_log_user_message(), which
> > caused an SSH connection timeout.
>
> I found another case.
>
> kauditd_thread issues wake_up(&audit_backlog_wait) only once each time it
> wakes up. As the waiter side uses add_wait_queue_exclusive(), only one
> process is woken up at a time.
>
> If kauditd wakes up a process which is sleeping in audit_log_start(), then
> the process will queue an skb and issue wake_up_interruptible(&kauditd_wait).
> No problem. But if kauditd wakes up a process which is sleeping in
> audit_receive(), then the process won't try to wake up kauditd. In this case,
> other processes sleeping in audit_log_start() keep sleeping even if kauditd
> has flushed the queue.
>
> At this point I'm planning to use a non-exclusive wait in audit_receive()
> in v2. Let me know if we should use wake_up_all() in kauditd or if you have
> a better solution.

The nice part about marking the waiters as WQ_FLAG_EXCLUSIVE is that we
avoid the thundering herd problem. The waiter was held in the first place
because the system was under high levels of audit pressure, so it stands to
reason that a slow wake-up of the waiters is a wise choice to avoid
re-entering another audit pressure state.

I'll take a look at your v2 patch, but as things stand I still believe that
this is better addressed via runtime configuration (see my previous email).
However, some of the refactoring you've done might be nice, I'll reply back
on that in your v2 patch. Thanks.

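For reference, the two wake-up primitives being weighed here differ only in
how many exclusive waiters they are willing to wake; abbreviated from
include/linux/wait.h (recent mainline), where the third argument to
__wake_up() is nr_exclusive and 0 means no limit:

/* Abbreviated from include/linux/wait.h: nr_exclusive is the third
 * argument, so wake_up() stops after one exclusive waiter while
 * wake_up_all() wakes every waiter on the queue. */
#define wake_up(x)	__wake_up(x, TASK_NORMAL, 1, NULL)
#define wake_up_all(x)	__wake_up(x, TASK_NORMAL, 0, NULL)

With exclusive waiters, wake_up() therefore gives the deliberately slow,
one-task-at-a-time wake-up described above, whereas wake_up_all() would
release every queued task at once.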