mbox series

[RFC,0/2] watch_queue: Fix pipe allocation and accounting

Message ID b34d5d5f-f936-4781-82d3-6a69fdec9b61@redhat.com (mailing list archive)
Headers show
Series watch_queue: Fix pipe allocation and accounting | expand

Message

Eric Sandeen Feb. 14, 2025, 3:34 p.m. UTC
(Giant disclaimer: I had not even heard of watch_queue.c until this week.
So this might be garbage, but I think at least the bug is real and I think
the analysis is correct.)

We got a bug report stating that doing this in a loop:

        pipe2(pipefd, O_NOTIFICATION_PIPE);
        ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
        close(pipefd[0]);
        close(pipefd[1]);

as an unprivileged user would eventually yield -EPERM. The bug actually
turned up when running
https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey

Analysis:

The -EPERM happens because the too_many_pipe_buffers_soft() test in
watch_queue_set_size() fails. That fails because user->pipe_bufs has
underflowed. user->pipe_bufs has underflowed because (abbreviated) ...

sys_pipe2
        __do_pipe_flags
                create_pipe_files
                        get_pipe_inode
                                alloc_pipe_info
                                        pipe_bufs = PIPE_DEF_BUFFERS; // (16)
                                        // charge 16 bufs to user
                                        account_pipe_buffers(user, 0, pipe_bufs);
                                        pipe->nr_accounted = pipe_bufs; // (16)

so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16)

Then, the ioctl:

IOC_WATCH_QUEUE_SET_SIZE
        watch_queue_set_size(nr_notes)
                nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8)
                // reduce bufs charged to user for this pipe from 16 to 8
                account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
                nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
                pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes));
                        if (!pipe_has_watch_queue(pipe))
                                pipe->nr_accounted = nr_slots;

Two things to note here:

Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed.
Also, pipe_resize_ring() resized to the number of notifications, not the
number of pages, though pipe_resize_ring() seems to expect a page count?

At this point wve hae charged 8 pages to user->pipe_bufs, but
pipe->nr_accounted is still 16. And, it seems that we have actually
allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring()
with nr_notes not nr_pages.

Finally, we close the pipes:

pipe_release
        put_pipe_info
                free_pipe_info
                        account_pipe_buffers(pipe->user, pipe->nr_accounted, 0);

This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though
per above we had only actually charged 8 to user->pipe_bufs, so it's a
net reduction. Do that in a loop, and eventually user->pipe_bufs underflows.

Maybe all that should be in the commit messages, but I'll try to reduce it for
now, any comments are appreciated.

(This does pass the handful of watch queue tests in LTP.)

thanks,
-Eric

Comments

Eric Sandeen Feb. 14, 2025, 4:28 p.m. UTC | #1
On 2/14/25 9:34 AM, Eric Sandeen wrote:
> (Giant disclaimer: I had not even heard of watch_queue.c until this week.
> So this might be garbage, but I think at least the bug is real and I think
> the analysis is correct.)
> 
> We got a bug report stating that doing this in a loop:
> 
>         pipe2(pipefd, O_NOTIFICATION_PIPE);
>         ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);

I should have specified that BUF_SIZE is 256, which is why
watch_queue_set_size adjusts the accounting to
(256 / WATCH_QUEUE_NOTES_PER_PAGE) == 8 in my analysis.

The maximum allowed in IOC_WATCH_QUEUE_SET_SIZE is 512, which won't
expose the bug, but slightly smaller numbers (~480, which leads
to < 16 pages), will.

Thanks,
-Eric

>         close(pipefd[0]);
>         close(pipefd[1]);
> 
> as an unprivileged user would eventually yield -EPERM. The bug actually
> turned up when running
> https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey
> 
> Analysis:
> 
> The -EPERM happens because the too_many_pipe_buffers_soft() test in
> watch_queue_set_size() fails. That fails because user->pipe_bufs has
> underflowed. user->pipe_bufs has underflowed because (abbreviated) ...
> 
> sys_pipe2
>         __do_pipe_flags
>                 create_pipe_files
>                         get_pipe_inode
>                                 alloc_pipe_info
>                                         pipe_bufs = PIPE_DEF_BUFFERS; // (16)
>                                         // charge 16 bufs to user
>                                         account_pipe_buffers(user, 0, pipe_bufs);
>                                         pipe->nr_accounted = pipe_bufs; // (16)
> 
> so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16)
> 
> Then, the ioctl:
> 
> IOC_WATCH_QUEUE_SET_SIZE
>         watch_queue_set_size(nr_notes)
>                 nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8)
>                 // reduce bufs charged to user for this pipe from 16 to 8
>                 account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
>                 nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
>                 pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes));
>                         if (!pipe_has_watch_queue(pipe))
>                                 pipe->nr_accounted = nr_slots;
> 
> Two things to note here:
> 
> Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed.
> Also, pipe_resize_ring() resized to the number of notifications, not the
> number of pages, though pipe_resize_ring() seems to expect a page count?
> 
> At this point wve hae charged 8 pages to user->pipe_bufs, but
> pipe->nr_accounted is still 16. And, it seems that we have actually
> allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring()
> with nr_notes not nr_pages.
> 
> Finally, we close the pipes:
> 
> pipe_release
>         put_pipe_info
>                 free_pipe_info
>                         account_pipe_buffers(pipe->user, pipe->nr_accounted, 0);
> 
> This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though
> per above we had only actually charged 8 to user->pipe_bufs, so it's a
> net reduction. Do that in a loop, and eventually user->pipe_bufs underflows.
> 
> Maybe all that should be in the commit messages, but I'll try to reduce it for
> now, any comments are appreciated.
> 
> (This does pass the handful of watch queue tests in LTP.)
> 
> thanks,
> -Eric
> 
>