Message ID | 20240221051608.43241-1-dmantipov@yandex.ru (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] net: smc: fix fasync leak in smc_release() | expand |
On 2024/2/21 13:16, Dmitry Antipov wrote: > I've tracked https://syzkaller.appspot.com/bug?extid=5f1acda7e06a2298fae6 > down to the problem which may be illustrated by the following pseudocode: > > int sock; > > /* thread 1 */ > > while (1) { > struct msghdr msg = { ... }; > sock = socket(AF_SMC, SOCK_STREAM, 0); > sendmsg(sock, &msg, MSG_FASTOPEN); > close(sock); > } > > /* thread 2 */ > > while (1) { > int on = 1; > ioctl(sock, FIOASYNC, &on); > on = 0; > ioctl(sock, FIOASYNC, &on); > } > > That is, something in thread 1 may cause 'smc_switch_to_fallback()' and > swap kernel sockets (of 'struct smc_sock') behind 'sock' between 'ioctl()' > calls in thread 2, so this becomes an attempt to add fasync entry to one > socket but remove from another one. When 'sock' is closing, '__fput()' > calls 'f_op->fasync()' _before_ 'f_op->release()', and it's too late to > revert the trick performed by 'smc_switch_to_fallback()' in 'smc_release()' > and below. Finally we end up with leaked 'struct fasync_struct' object > linked to the base socket, and this object is noticed by '__sock_release()' > ("fasync list not empty"). Of course using 'fasync_remove_entry()' in such > a way is extremely ugly, but what else we can do without touching generic > socket code, '__fput()', etc.? Comments are highly appreciated. > Hi, Dmitry. Just to confirm if I understand correctly: 1. on = 1; ioctl(sock, FIOASYNC, &on), a fasync entry is added to smc->sk.sk_socket->wq.fasync_list; 2. Then fallback happend, and swapped the socket: smc->clcsock->file = smc->sk.sk_socket->file; smc->clcsock->file->private_data = smc->clcsock; smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list; smc->sk.sk_socket->wq.fasync_list = NULL; 3. on = 0; ioctl(sock, FIOASYNC, &on), the fasync entry is removed from smc->clcsock->wq.fasync_list, (Is there a race between 2 and 3 ?) 4. Then close the file, __fput() calls file->f_op->fasync(-1, file, 0), then sock_fasync() calls fasync_helper(fd, filp, on, &wq->fasync_list) and fasync_remove_entry() removes entries in smc->clcsock->wq.fasync_list. Now smc->clcsock->wq.fasync_list is empty. 5. __fput() calls file->f_op->release(inode, file), then sock_close calls __sock_release, then ops->release calls smc_release(), and __smc_release() calls smc_restore_fallback_changes() to restore socket: if (smc->clcsock->file) { /* non-accepted sockets have no file yet */ smc->clcsock->file->private_data = smc->sk.sk_socket; smc->clcsock->file = NULL; smc_fback_restore_callbacks(smc); } 6. Then back to __sock_release, check if sock->wq.fasync_list (that is smc->sk.sk_socket->wq.fasync_list) is empty and it is empty. So in which step we leaked the fasync_struct entry in smc->sk.sk_socket->wq.fasync_list? Looks like I missed something, could you please point it to me? Thanks! > Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> > --- > net/smc/af_smc.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c > index 0f53a5c6fd9d..68cde9db5d2f 100644 > --- a/net/smc/af_smc.c > +++ b/net/smc/af_smc.c > @@ -337,9 +337,13 @@ static int smc_release(struct socket *sock) > else > lock_sock(sk); > > - if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE && > - !smc->use_fallback) > + if (smc->use_fallback) { > + /* FIXME: ugly and should be done in some other way */ > + if (sock->wq.fasync_list) > + fasync_remove_entry(sock->file, &sock->wq.fasync_list); > + } else if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE) { > smc_close_active_abort(smc); > + } > > rc = __smc_release(smc); >
On Wed, 2024-02-21 at 21:09 +0800, Wen Gu wrote: > 1. on = 1; ioctl(sock, FIOASYNC, &on), a fasync entry is added to > smc->sk.sk_socket->wq.fasync_list; > > 2. Then fallback happend, and swapped the socket: > smc->clcsock->file = smc->sk.sk_socket->file; > smc->clcsock->file->private_data = smc->clcsock; > smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list; > smc->sk.sk_socket->wq.fasync_list = NULL; > > 3. on = 0; ioctl(sock, FIOASYNC, &on), the fasync entry is removed > from smc->clcsock->wq.fasync_list, > (Is there a race between 2 and 3 ?) 1) IIUC yes. The following sequence from smc_switch_to_fallback(): smc->clcsock->file = smc->sk.sk_socket->file; smc->clcsock->file->private_data = smc->clcsock; is executed with locked smc->sk.sk_socket but unlocked smc->clcsock. This way, struct socket *sock = filp->private_data; in sock_fasync() introduces an undefined behavior (because an actual value of 'private_data' is unpredictable). So there are two possible scenarios. When on = 1; ioctl(sock, FIOASYNC, &on); on = 0; ioctl(sock, FIOASYNC, &on); actually modifies fasync list of the same socket, this works as expected. If kernel sockets behind 'sock' gets swapped between calls to ioctl(), fasync list of the first socket has an entry which can't be removed by the second ioctl(). > 4. Then close the file, __fput() calls file->f_op->fasync(-1, file, 0), > then sock_fasync() calls fasync_helper(fd, filp, on, &wq->fasync_list) > and fasync_remove_entry() removes entries in smc->clcsock->wq.fasync_list. > Now smc->clcsock->wq.fasync_list is empty. 2) No. In the second (bad) scenario from the above, an attempt to remove fasync entry from smc->clcsock->wq.fasync_list always fails because fasync entry was actually linked to smc->sk.sk_socket->wq.fasync_list. Note sock_fasync() doesn't check the value returned from fasync_helper(). How dumb. > 5. __fput() calls file->f_op->release(inode, file), then sock_close calls > __sock_release, then ops->release calls smc_release(), and __smc_release() > calls smc_restore_fallback_changes() to restore socket: > if (smc->clcsock->file) { /* non-accepted sockets have no file yet */ > smc->clcsock->file->private_data = smc->sk.sk_socket; > smc->clcsock->file = NULL; > smc_fback_restore_callbacks(smc); > } 3) Yes. And it's too late because __fput() calls file->f_op->fasync(-1, ..., 0) _before_ file->f_op->release(). So even if you have sockets swapped back, no one will take care about freeing fasync list. > 6. Then back to __sock_release, check if sock->wq.fasync_list (that is > smc->sk.sk_socket->wq.fasync_list) is empty and it is empty. 4) No. It's not empty because an attempt to remove fasync entry has failed at [2] just because it was made against the wrong socket. For your convenience, there is a human-readable reproducer loosely modeled around the one generated by syzkaller. You can try it on any system running recently enough kernel with CONFIG_SMC enabled (root is not required), and receiving a few (or may be a lot of) "__sock_release: fasync list not empty" messages clearly indicates an issue. NOTE: this shouldn't crash the system and/or make it unusable, but remember that each message comes with a small kernel memory leak. Dmitry #include <signal.h> #include <unistd.h> #include <pthread.h> #include <sys/ioctl.h> #include <sys/socket.h> int sock; void * loop0 (void *arg) { struct msghdr msg = { 0 }; while (1) { sock = socket (AF_SMC, SOCK_STREAM, 0); sendmsg (sock, &msg, MSG_FASTOPEN); close (sock); } return NULL; } void * loop1 (void *arg) { int on; while (1) { on = 1; ioctl (sock, FIOASYNC, &on); on = 0; ioctl (sock, FIOASYNC, &on); } return NULL; } int main (int argc, char *argv[]) { pthread_t a, b; struct sigaction sa = { 0 }; sa.sa_handler = SIG_IGN; sigaction (SIGIO, &sa, NULL); pthread_create (&a, NULL, loop0, NULL); pthread_create (&b, NULL, loop1, NULL); pause (); pthread_join (a, NULL); pthread_join (b, NULL); return 0; }
On 2024/2/21 23:02, Antipov, Dmitriy wrote: > On Wed, 2024-02-21 at 21:09 +0800, Wen Gu wrote: > >> 1. on = 1; ioctl(sock, FIOASYNC, &on), a fasync entry is added to >> smc->sk.sk_socket->wq.fasync_list; >> >> 2. Then fallback happend, and swapped the socket: >> smc->clcsock->file = smc->sk.sk_socket->file; >> smc->clcsock->file->private_data = smc->clcsock; >> smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list; >> smc->sk.sk_socket->wq.fasync_list = NULL; >> >> 3. on = 0; ioctl(sock, FIOASYNC, &on), the fasync entry is removed >> from smc->clcsock->wq.fasync_list, >> (Is there a race between 2 and 3 ?) > > 1) IIUC yes. The following sequence from smc_switch_to_fallback(): > > smc->clcsock->file = smc->sk.sk_socket->file; > smc->clcsock->file->private_data = smc->clcsock; > > is executed with locked smc->sk.sk_socket but unlocked smc->clcsock. > This way, > > struct socket *sock = filp->private_data; > > in sock_fasync() introduces an undefined behavior (because an > actual value of 'private_data' is unpredictable). So there are > two possible scenarios. When > > on = 1; ioctl(sock, FIOASYNC, &on); > on = 0; ioctl(sock, FIOASYNC, &on); > > actually modifies fasync list of the same socket, this works as > expected. If kernel sockets behind 'sock' gets swapped between > calls to ioctl(), fasync list of the first socket has an entry > which can't be removed by the second ioctl(). > Thank you for the explanation, Dmitriy. So the race could be: sock_fasync | smc_switch_to_fallback ------------------------------------------------------------------ /* smc->sk.sk_socket->wq.fasync_list| is empty at the beginning */ | | /* attempt to add fasync_struct */ | sock = filp->private_data; | (smc->sk.sk_socket) | /* fallback happens */ | lock_sock(sk); | file->private_data = smc->clcsock | smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list | smc->sk.sk_socket->wq.fasync_list = NULL | release_sock(sk); lock_sock(sk); | fasync_helper(on=1) | (successed to add the entry in | smc->sk.sk_socket->wq.fasync_list) | release_sock(sk); | | /* the fasync_struct entry can't be | removed later, since | file->private_data has been changed | to clcsock */ | Now clcsock->wq.fasync_list is empty and the fasync_struct entry is always left in smc->sk.sk_socket->wq.fasync_list. If we only remove fasync_struct entries in smc->sk.sk_socket->wq.fasync_list during smc_release, it indeed solves the leak, but it fails to address the problem of fasync_struct entries being incorrectly inserted into the old socket (they should have been placed in smc->clcsock->wq.fasync_list if fallback happens). One solution to this issue I can think of is to check whether filp->private_data has been changed when the sock_fasync holds the sock lock, but it inevitably changes the general code.. diff --git a/net/socket.c b/net/socket.c index ed3df2f749bf..a28435195854 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1443,6 +1443,11 @@ static int sock_fasync(int fd, struct file *filp, int on) return -EINVAL; lock_sock(sk); + /* filp->private_data has changed */ + if (on && unlikely(sock != filp->private_data)) { + release_sock(sk); + return -EAGAIN; + } fasync_helper(fd, filp, on, &wq->fasync_list); if (!wq->fasync_list) Let's see if anyone else has a better idea. Best regards, Wen Gu > >> 4. Then close the file, __fput() calls file->f_op->fasync(-1, file, 0), >> then sock_fasync() calls fasync_helper(fd, filp, on, &wq->fasync_list) >> and fasync_remove_entry() removes entries in smc->clcsock->wq.fasync_list. >> Now smc->clcsock->wq.fasync_list is empty. > > 2) No. In the second (bad) scenario from the above, an attempt to remove > fasync entry from smc->clcsock->wq.fasync_list always fails because > fasync entry was actually linked to smc->sk.sk_socket->wq.fasync_list. > > Note sock_fasync() doesn't check the value returned from fasync_helper(). > How dumb. > >> 5. __fput() calls file->f_op->release(inode, file), then sock_close calls >> __sock_release, then ops->release calls smc_release(), and __smc_release() >> calls smc_restore_fallback_changes() to restore socket: >> if (smc->clcsock->file) { /* non-accepted sockets have no file yet */ >> smc->clcsock->file->private_data = smc->sk.sk_socket; >> smc->clcsock->file = NULL; >> smc_fback_restore_callbacks(smc); >> } > > 3) Yes. And it's too late because __fput() calls file->f_op->fasync(-1, ..., > 0) _before_ file->f_op->release(). So even if you have sockets swapped back, > no one will take care about freeing fasync list. > > >> 6. Then back to __sock_release, check if sock->wq.fasync_list (that is >> smc->sk.sk_socket->wq.fasync_list) is empty and it is empty. > > 4) No. It's not empty because an attempt to remove fasync entry has failed > at [2] just because it was made against the wrong socket. > > > For your convenience, there is a human-readable reproducer loosely modeled > around the one generated by syzkaller. You can try it on any system running > recently enough kernel with CONFIG_SMC enabled (root is not required), and > receiving a few (or may be a lot of) "__sock_release: fasync list not empty" > messages clearly indicates an issue. NOTE: this shouldn't crash the system > and/or make it unusable, but remember that each message comes with a small > kernel memory leak. > > Dmitry > > #include <signal.h> > #include <unistd.h> > #include <pthread.h> > #include <sys/ioctl.h> > #include <sys/socket.h> > > int sock; > > void * > loop0 (void *arg) > { > struct msghdr msg = { 0 }; > > while (1) > { > sock = socket (AF_SMC, SOCK_STREAM, 0); > sendmsg (sock, &msg, MSG_FASTOPEN); > close (sock); > } > return NULL; > } > > void * > loop1 (void *arg) > { > int on; > > while (1) > { > on = 1; > ioctl (sock, FIOASYNC, &on); > on = 0; > ioctl (sock, FIOASYNC, &on); > } > > return NULL; > } > > int > main (int argc, char *argv[]) > { > pthread_t a, b; > struct sigaction sa = { 0 }; > > sa.sa_handler = SIG_IGN; > sigaction (SIGIO, &sa, NULL); > > pthread_create (&a, NULL, loop0, NULL); > pthread_create (&b, NULL, loop1, NULL); > > pause (); > > pthread_join (a, NULL); > pthread_join (b, NULL); > > return 0; > } > >
On 2/23/24 06:36, Wen Gu wrote: > One solution to this issue I can think of is to check whether > filp->private_data has been changed when the sock_fasync holds the sock lock, > but it inevitably changes the general code.. > > diff --git a/net/socket.c b/net/socket.c > index ed3df2f749bf..a28435195854 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -1443,6 +1443,11 @@ static int sock_fasync(int fd, struct file *filp, int on) > return -EINVAL; > > lock_sock(sk); > + /* filp->private_data has changed */ > + if (on && unlikely(sock != filp->private_data)) { > + release_sock(sk); > + return -EAGAIN; > + } > fasync_helper(fd, filp, on, &wq->fasync_list); > > if (!wq->fasync_list) > > Let's see if anyone else has a better idea. IIUC this is not a solution just because it decreases the probability of the race but doesn't eliminate it completely - an underlying socket switch (e.g. changing 'filp->private_data') may happen when 'fasync_helper()' is in progress. Dmitry
On 2024/3/5 00:35, Dmitry Antipov wrote: > On 2/23/24 06:36, Wen Gu wrote: > >> One solution to this issue I can think of is to check whether >> filp->private_data has been changed when the sock_fasync holds the sock lock, >> but it inevitably changes the general code.. >> >> diff --git a/net/socket.c b/net/socket.c >> index ed3df2f749bf..a28435195854 100644 >> --- a/net/socket.c >> +++ b/net/socket.c >> @@ -1443,6 +1443,11 @@ static int sock_fasync(int fd, struct file *filp, int on) >> return -EINVAL; >> >> lock_sock(sk); >> + /* filp->private_data has changed */ >> + if (on && unlikely(sock != filp->private_data)) { >> + release_sock(sk); >> + return -EAGAIN; >> + } >> fasync_helper(fd, filp, on, &wq->fasync_list); >> >> if (!wq->fasync_list) >> >> Let's see if anyone else has a better idea. > > IIUC this is not a solution just because it decreases the probability of the race > but doesn't eliminate it completely - an underlying socket switch (e.g. changing > 'filp->private_data') may happen when 'fasync_helper()' is in progress. > Hi Dmitry, IIUC, the fallback (or more precisely the private_data change) essentially always happens when the lock_sock(smc->sk) is held, except in smc_listen_work() or smc_listen_decline(), but at that moment, userspace program can not yet acquire this new socket to add fasync entries to the fasync_list. So IMHO, the above patch should work, since it checks the private_data under the lock_sock(sk). But if I missed something, please correct me. And I wonder if you can still see the leak with the patch above through your reproducer or syzbot's reproducer? I once ran your reproducer for about 50 mins and didn't see the leak. Thanks! > Dmitry > >
On 3/6/24 17:45, Wen Gu wrote: > IIUC, the fallback (or more precisely the private_data change) essentially > always happens when the lock_sock(smc->sk) is held, except in smc_listen_work() > or smc_listen_decline(), but at that moment, userspace program can not yet > acquire this new socket to add fasync entries to the fasync_list. > > So IMHO, the above patch should work, since it checks the private_data under > the lock_sock(sk). But if I missed something, please correct me. Well, the whole picture is somewhat more complicated. Consider the following diagram (an underlying kernel socket is in [], e.g. [smc->sk]): Thread 0 Thread 1 ioctl(sock, FIOASYNC, [1]) ... sock = filp->private_data; lock_sock(sock [smc->sk]); sock_fasync(sock, ..., 1) ; new fasync_struct linked to smc->sk release_sock(sock [smc->sk]); ... lock_sock([smc->sk]); ... smc_switch_to_fallback() ... smc->clcsock->file->private_data = smc->clcsock; ... release_sock([smc->sk]); ioctl(sock, FIOASYNC, [0]) ... sock = filp->private_data; lock_sock(sock [smc->clcsock]); sock_fasync(sock, ..., 0) ; nothing to unlink from smc->clcsock ; since fasync entry was linked to smc->sk release_sock(sock [smc->clcsock]); ... close(sock [smc->clcsock]); __fput(...); file->f_op->fasync(sock, [0]) ; always failed - ; should use ; smc->sk instead file->f_op->release() ... smc_restore_fallback_changes() ... file->private_data = smc->sk.sk_socket; That is, smc_restore_fallback_changes() restores filp->private_data to smc->sk. If __fput() would have called file->f_op->release() _before_ file->f_op->fasync(), the fix would be as simple as adding smc->sk.sk_socket->wq.fasync_list = smc->clcsock->wq.fasync_list; to smc_restore_fallback_changes(). But since file->f_op->fasync() is called before file->f_op->release(), the former always makes an attempt to unlink fasync entry from smc->clcsock instead of smc->sk, thus introducing the memory leak. And an idea with shared wait queue was intended in attempt to eliminate this chicken-egg lookalike problem completely. Dmitry
On 06/03/2024 19:07, Dmitry Antipov wrote: > > Well, the whole picture is somewhat more complicated. Consider the > following diagram (an underlying kernel socket is in [], e.g. [smc->sk]): > > Thread 0 Thread 1 > > ioctl(sock, FIOASYNC, [1]) > ... > sock = filp->private_data; > lock_sock(sock [smc->sk]); > sock_fasync(sock, ..., 1) ; new fasync_struct linked to smc->sk > release_sock(sock [smc->sk]); > ... > lock_sock([smc->sk]); > ... > smc_switch_to_fallback() > ... > smc->clcsock->file->private_data = > smc->clcsock; > ... > release_sock([smc->sk]); > ioctl(sock, FIOASYNC, [0]) > ... > sock = filp->private_data; > lock_sock(sock [smc->clcsock]); > sock_fasync(sock, ..., 0) ; nothing to unlink from smc->clcsock > ; since fasync entry was linked to smc->sk > release_sock(sock [smc->clcsock]); > ... > close(sock [smc->clcsock]); > __fput(...); > file->f_op->fasync(sock, [0]) ; > always failed - > ; > should use > ; > smc->sk instead > file->f_op->release() > ... > smc_restore_fallback_changes() > ... > file->private_data = smc->sk.sk_socket; Thank you Dmitry for your detailed explanations. It helped me a lot trying to understand the problem. And I'm still in the progress of trying to understand it. From my naive point of view: Would it help if we catch the ioctl and check if there is an active fallback and either stall or route the ioctl to the correct socket? I've seen that there is an ioctl function handle in the proto_ops. So on a very abstract level we could do the following: 1. Indicate an active Fallback at the start of the fallback to tcp. 2. Catch ioctls. 3. Check if there is an active fallback. NO: Pass the ioctl. YES: Wait for the fallback to complete and pass after. If this blocks too much we can obviously do some finer checks there. E.g.: Just check if the private data is already at attached to the socket. Do you think this would be a suiteable solution? I'm going to look into your proposal Dmitry to see how you solved the problem and to understand it better. Thanks - Jan > > That is, smc_restore_fallback_changes() restores filp->private_data to > smc->sk. If __fput() would have called file->f_op->release() _before_ > file->f_op->fasync(), the fix would be as simple as adding > > smc->sk.sk_socket->wq.fasync_list = smc->clcsock->wq.fasync_list; > > to smc_restore_fallback_changes(). But since file->f_op->fasync() is called > before file->f_op->release(), the former always makes an attempt to > unlink fasync > entry from smc->clcsock instead of smc->sk, thus introducing the memory > leak. > > And an idea with shared wait queue was intended in attempt to eliminate > this chicken-egg lookalike problem completely. > > Dmitry >
On 06/03/2024 19:07, Dmitry Antipov wrote: > On 3/6/24 17:45, Wen Gu wrote: > >> IIUC, the fallback (or more precisely the private_data change) >> essentially >> always happens when the lock_sock(smc->sk) is held, except in >> smc_listen_work() >> or smc_listen_decline(), but at that moment, userspace program can not >> yet >> acquire this new socket to add fasync entries to the fasync_list. >> >> So IMHO, the above patch should work, since it checks the private_data >> under >> the lock_sock(sk). But if I missed something, please correct me. > > Well, the whole picture is somewhat more complicated. Consider the > following diagram (an underlying kernel socket is in [], e.g. [smc->sk]): > > Thread 0 Thread 1 > > ioctl(sock, FIOASYNC, [1]) > ... > sock = filp->private_data; > lock_sock(sock [smc->sk]); > sock_fasync(sock, ..., 1) ; new fasync_struct linked to smc->sk > release_sock(sock [smc->sk]); > ... > lock_sock([smc->sk]); > ... > smc_switch_to_fallback() > ... > smc->clcsock->file->private_data = > smc->clcsock; > ... > release_sock([smc->sk]); > ioctl(sock, FIOASYNC, [0]) > ... > sock = filp->private_data; > lock_sock(sock [smc->clcsock]); > sock_fasync(sock, ..., 0) ; nothing to unlink from smc->clcsock > ; since fasync entry was linked to smc->sk > release_sock(sock [smc->clcsock]); > ... > close(sock [smc->clcsock]); > __fput(...); > file->f_op->fasync(sock, [0]) ; > always failed - > ; > should use > ; > smc->sk instead > file->f_op->release() > ... > smc_restore_fallback_changes() > ... > file->private_data = smc->sk.sk_socket; > > That is, smc_restore_fallback_changes() restores filp->private_data to > smc->sk. If __fput() would have called file->f_op->release() _before_ > file->f_op->fasync(), the fix would be as simple as adding > > smc->sk.sk_socket->wq.fasync_list = smc->clcsock->wq.fasync_list; > > to smc_restore_fallback_changes(). But since file->f_op->fasync() is called > before file->f_op->release(), the former always makes an attempt to > unlink fasync > entry from smc->clcsock instead of smc->sk, thus introducing the memory > leak. > > And an idea with shared wait queue was intended in attempt to eliminate > this chicken-egg lookalike problem completely. > > Dmitry > Me and Gerd had another look at this. The infrastructure for what i proposed in the last E-Mail regarding the ioctl function handler is already there (af_smc.c#smc_ioctl). There we already check if we are in a active fallback to send the ioctls to the clcsock instead of the sk socket. ``` lock_sock(&smc->sk); if (smc->use_fallback) { if (!smc->clcsock) { release_sock(&smc->sk); return -EBADF; } answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); release_sock(&smc->sk); return answ; } ``` We think it might be an option to secure the path in this function with the smc->clcsock_release_lock. ``` lock_sock(&smc->sk); if (smc->use_fallback) { if (!smc->clcsock) { release_sock(&smc->sk); return -EBADF; } + mutex_lock(&smc->clcsock_release_lock); answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); + mutex_unlock(&smc->clcsock_release_lock); release_sock(&smc->sk); return answ; } ``` What do yo think about this? I'm going to test this idea and see if we canget rid of the leak this way. Thanks - Jan & Gerd
On Thu, 2024-03-07 at 10:57 +0100, Jan Karcher wrote: > We think it might be an option to secure the path in this function with > the smc->clcsock_release_lock. > > ``` > lock_sock(&smc->sk); > if (smc->use_fallback) { > if (!smc->clcsock) { > release_sock(&smc->sk); > return -EBADF; > } > + mutex_lock(&smc->clcsock_release_lock); > answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); > + mutex_unlock(&smc->clcsock_release_lock); > release_sock(&smc->sk); > return answ; > } > ``` > > What do yo think about this? You're trying to fix it on the wrong path. FIOASYNC is a generic rather than protocol-specific thing. So userspace 'ioctl(sock, FIOASYNC, [])' call is handled with: -> sys_ioctl() -> do_vfs_ioctl() -> ioctl_fioasync() -> filp->f_op->fasync() (which is sock_fasync() for all sockets) rather than 'sock->ops->ioctl(...)'. Dmitry
On 2024/3/7 02:07, Dmitry Antipov wrote: > On 3/6/24 17:45, Wen Gu wrote: > >> IIUC, the fallback (or more precisely the private_data change) essentially >> always happens when the lock_sock(smc->sk) is held, except in smc_listen_work() >> or smc_listen_decline(), but at that moment, userspace program can not yet >> acquire this new socket to add fasync entries to the fasync_list. >> >> So IMHO, the above patch should work, since it checks the private_data under >> the lock_sock(sk). But if I missed something, please correct me. > > Well, the whole picture is somewhat more complicated. Consider the > following diagram (an underlying kernel socket is in [], e.g. [smc->sk]): > > Thread 0 Thread 1 > > ioctl(sock, FIOASYNC, [1]) > ... > sock = filp->private_data; > lock_sock(sock [smc->sk]); > sock_fasync(sock, ..., 1) ; new fasync_struct linked to smc->sk > release_sock(sock [smc->sk]); > ... > lock_sock([smc->sk]); > ... > smc_switch_to_fallback() > ... > smc->clcsock->file->private_data = smc->clcsock; > ... > release_sock([smc->sk]); > ioctl(sock, FIOASYNC, [0]) > ... > sock = filp->private_data; > lock_sock(sock [smc->clcsock]); > sock_fasync(sock, ..., 0) ; nothing to unlink from smc->clcsock > ; since fasync entry was linked to smc->sk > release_sock(sock [smc->clcsock]); I don't understand why the fasync entry now can't be removed from clcsock->wq.fasync_list? since the fasync entry has been moved to clcsock->wq.fasync_list during fallback. The process you described above is: 1) An fasync entry was added into smc->sk.sk_socket->wq.fasync_list; 2) then fallback occurs, and the fasync entry is moved to clcsock->wq.fasync_list, and file->private_data is changed to smc->clcsock; 3) lastly we removed the fasync entry from clcsock->wq.fasync_list. It can be reproduced by follows, right? #include <signal.h> #include <unistd.h> #include <pthread.h> #include <sys/ioctl.h> #include <sys/socket.h> #include <stdio.h> int main (int argc, char *argv[]) { struct msghdr msg = { 0 }; int sock; int on; sock = socket(AF_SMC, SOCK_STREAM, 0); /* add fasync entry */ on = 1; ioctl(sock, FIOASYNC, &on); /* fallback */ sendmsg(sock, &msg, MSG_FASTOPEN); /* remove fasync entry */ on = 0; ioctl(sock, FIOASYNC, &on); close(sock); return 0; } and I added some prints in the kernel for quick debug: diff --git a/fs/fcntl.c b/fs/fcntl.c index c80a6acad742..79b8e435c380 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -880,6 +880,7 @@ int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp) call_rcu(&fa->fa_rcu, fasync_free_rcu); filp->f_flags &= ~FASYNC; result = 1; + pr_warn("%s: wq->fasync_list %pK, fasync entry %pK\n", __func__, &(*fapp), fa); break; } spin_unlock(&fasync_lock); @@ -932,6 +933,7 @@ struct fasync_struct *fasync_insert_entry(int fd, struct file *filp, struct fasy new->fa_next = *fapp; rcu_assign_pointer(*fapp, new); filp->f_flags |= FASYNC; + pr_warn("%s: wq->fasync_list %pK, fasync entry %pK\n", __func__, &(*fapp), new); out: spin_unlock(&fasync_lock); diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 4b52b3b159c0..3b9737d42dbd 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -925,6 +925,9 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code) smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list; smc->sk.sk_socket->wq.fasync_list = NULL; + pr_warn("%s: smc->sk wq.fasync_list %pK, clcsock->wq.fasync_list %pK\n", + __func__, &smc->sk.sk_socket->wq.fasync_list, + &smc->clcsock->wq.fasync_list); /* There might be some wait entries remaining * in smc sk->sk_wq and they should be woken up ran the reproducer, and dmesg shows: [] fasync_insert_entry: wq->fasync_list ffff96fdc0425e98, fasync entry ffff96fdccc62690 [] smc: smc_switch_to_fallback: smc->sk wq.fasync_list ffff96fdc0425e98, clcsock->wq.fasync_list ffff96fdc0426ed8 [] fasync_remove_entry: wq->fasync_list ffff96fdc0426ed8, fasync entry ffff96fdccc62690 It shows that 1. an fasync entry ffff96fdccc62690 is added into ffff96fdc0425e98 (smc->sk wq.fasync_list) 2. then fallback, all the fasync entris in smc->sk wq.fasync_list will be moved to clcsock->wq.fasync_list. 3. then the fasync entry ffff96fdccc62690 (the one in #1) is removed from ffff96fdc0426ed8 (clcsock->wq.fasync_list) What's wrong with this? In fact, I think what does cause this leak (maybe one of causes) is the race I discribed through the diagram in https://lore.kernel.org/netdev/19d7d71b-c911-45cc-9671-235d98720be6@linux.alibaba.com/ 1) sock_fasync() got the filp->private_data->wq (that is smc->sk.sk_socket->wq) and record it in 'wq'; 2) meanwhile, fallback occurs and filp->private_data changed, and from now on, user can only operate the clcsock based on file->private_data; 3) (race here) and sock_fasync() keep going and add an entry to 'wq'->fasync_list (that is smc->sk.sk_socket->wq); This fasync entry is the one that we can't removed later, since we start to operate clcsock after fallback. Thanks! > ... > close(sock [smc->clcsock]); > __fput(...); > file->f_op->fasync(sock, [0]) ; always failed - > ; should use > ; smc->sk instead > file->f_op->release() > ... > smc_restore_fallback_changes() > ... > file->private_data = smc->sk.sk_socket; > > That is, smc_restore_fallback_changes() restores filp->private_data to > smc->sk. If __fput() would have called file->f_op->release() _before_ > file->f_op->fasync(), the fix would be as simple as adding > > smc->sk.sk_socket->wq.fasync_list = smc->clcsock->wq.fasync_list; > > to smc_restore_fallback_changes(). But since file->f_op->fasync() is called > before file->f_op->release(), the former always makes an attempt to unlink fasync > entry from smc->clcsock instead of smc->sk, thus introducing the memory leak. > > And an idea with shared wait queue was intended in attempt to eliminate > this chicken-egg lookalike problem completely. > > Dmitry
On Thu, 2024-03-07 at 13:21 +0300, Dmitry Antipov wrote: > On Thu, 2024-03-07 at 10:57 +0100, Jan Karcher wrote: > > > We think it might be an option to secure the path in this function with > > the smc->clcsock_release_lock. > > > > ``` > > lock_sock(&smc->sk); > > if (smc->use_fallback) { > > if (!smc->clcsock) { > > release_sock(&smc->sk); > > return -EBADF; > > } > > + mutex_lock(&smc->clcsock_release_lock); > > answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); > > + mutex_unlock(&smc->clcsock_release_lock); > > release_sock(&smc->sk); > > return answ; > > } > > ``` > > > > What do yo think about this? > > You're trying to fix it on the wrong path. FIOASYNC is a generic rather > than protocol-specific thing. So userspace 'ioctl(sock, FIOASYNC, [])' > call is handled with: > > -> sys_ioctl() > -> do_vfs_ioctl() > -> ioctl_fioasync() > -> filp->f_op->fasync() (which is sock_fasync() for all sockets) > > rather than 'sock->ops->ioctl(...)'. Any progress on this? Dmitry
On 2024/3/26 16:18, Antipov, Dmitriy wrote: > On Thu, 2024-03-07 at 13:21 +0300, Dmitry Antipov wrote: > >> On Thu, 2024-03-07 at 10:57 +0100, Jan Karcher wrote: >> >>> We think it might be an option to secure the path in this function with >>> the smc->clcsock_release_lock. >>> >>> ``` >>> lock_sock(&smc->sk); >>> if (smc->use_fallback) { >>> if (!smc->clcsock) { >>> release_sock(&smc->sk); >>> return -EBADF; >>> } >>> + mutex_lock(&smc->clcsock_release_lock); >>> answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); >>> + mutex_unlock(&smc->clcsock_release_lock); >>> release_sock(&smc->sk); >>> return answ; >>> } >>> ``` >>> >>> What do yo think about this? >> >> You're trying to fix it on the wrong path. FIOASYNC is a generic rather >> than protocol-specific thing. So userspace 'ioctl(sock, FIOASYNC, [])' >> call is handled with: >> >> -> sys_ioctl() >> -> do_vfs_ioctl() >> -> ioctl_fioasync() >> -> filp->f_op->fasync() (which is sock_fasync() for all sockets) >> >> rather than 'sock->ops->ioctl(...)'. > > Any progress on this? Hi Dmitry, In my opinion, first we need to figure out what the root cause(race) of this leak is. I am not very convinced about your analysis[1] and gave some my thoughts about it[2]. I would appreciate if you give your response about that to make this issue clearer and get everyone on the same page (including SMC maintainers). Then we can see if your other proposal[3] is a proper solution to the issue or if anyone has a better idea. [1] https://lore.kernel.org/netdev/35584a9f-f4c2-423a-8bb8-2c729cedb6fe@yandex.ru/ [2] https://lore.kernel.org/netdev/a88a0731-6cbe-4987-b1e9-afa51f9ab057@linux.alibaba.com/ [3] https://lore.kernel.org/netdev/625c9519-7ae6-43a3-a5d0-81164ad7fd0e@yandex.ru/ Thanks. > > Dmitry >
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 0f53a5c6fd9d..68cde9db5d2f 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -337,9 +337,13 @@ static int smc_release(struct socket *sock) else lock_sock(sk); - if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE && - !smc->use_fallback) + if (smc->use_fallback) { + /* FIXME: ugly and should be done in some other way */ + if (sock->wq.fasync_list) + fasync_remove_entry(sock->file, &sock->wq.fasync_list); + } else if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE) { smc_close_active_abort(smc); + } rc = __smc_release(smc);
I've tracked https://syzkaller.appspot.com/bug?extid=5f1acda7e06a2298fae6 down to the problem which may be illustrated by the following pseudocode: int sock; /* thread 1 */ while (1) { struct msghdr msg = { ... }; sock = socket(AF_SMC, SOCK_STREAM, 0); sendmsg(sock, &msg, MSG_FASTOPEN); close(sock); } /* thread 2 */ while (1) { int on = 1; ioctl(sock, FIOASYNC, &on); on = 0; ioctl(sock, FIOASYNC, &on); } That is, something in thread 1 may cause 'smc_switch_to_fallback()' and swap kernel sockets (of 'struct smc_sock') behind 'sock' between 'ioctl()' calls in thread 2, so this becomes an attempt to add fasync entry to one socket but remove from another one. When 'sock' is closing, '__fput()' calls 'f_op->fasync()' _before_ 'f_op->release()', and it's too late to revert the trick performed by 'smc_switch_to_fallback()' in 'smc_release()' and below. Finally we end up with leaked 'struct fasync_struct' object linked to the base socket, and this object is noticed by '__sock_release()' ("fasync list not empty"). Of course using 'fasync_remove_entry()' in such a way is extremely ugly, but what else we can do without touching generic socket code, '__fput()', etc.? Comments are highly appreciated. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> --- net/smc/af_smc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)