Message ID | 20210420154140.80034-1-kuniyu@amazon.co.jp (mailing list archive) |
---|---|
Series | Socket migration for SO_REUSEPORT. |
On 4/20/21 5:41 PM, Kuniyuki Iwashima wrote:
> The SO_REUSEPORT option allows sockets to listen on the same port and to
> accept connections evenly. However, there is a defect in the current
> implementation [1]. When a SYN packet is received, the connection is tied
> to a listening socket. Accordingly, when the listener is closed, in-flight
> requests during the three-way handshake and child sockets in the accept
> queue are dropped even if other listeners on the same port could accept
> such connections.
>
> This situation can happen when various server management tools restart
> server (such as nginx) processes. For instance, when we change nginx
> configurations and restart it, it spins up new workers that respect the new
> configuration and closes all listeners on the old workers, resulting in the
> in-flight ACK of 3WHS is responded by RST.
>
> The SO_REUSEPORT option is excellent to improve scalability.

This was before the SYN processing was made lockless.

I really wonder if we still need SO_REUSEPORT for TCP ?

Eventually a new accept() system call where different threads
can express how they want to choose the children sockets would
be less invasive.

Instead of having many listeners, have one listener and eventually multiple
accept queues to improve scalability of accept() phase.
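For readers unfamiliar with the setup under discussion, here is a minimal
sketch of several listeners sharing one TCP port through SO_REUSEPORT. It
assumes Linux and Python 3; the helper names make_reuseport_listener and
accept_from_any are hypothetical and appear in neither the patchset nor the
thread. It only illustrates that each new connection is tied to exactly one
listener of the group, which is why closing that listener loses the
connection.

```python
import select
import socket


def make_reuseport_listener(port, host="127.0.0.1"):
    """Create a TCP listener that shares its port via SO_REUSEPORT (Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket of the group, before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def accept_from_any(listeners, timeout=1.0):
    """Accept one connection from whichever listener the kernel selected.

    Returns (listener, connection), or (None, None) if nothing arrived
    within the timeout.
    """
    ready, _, _ = select.select(listeners, [], [], timeout)
    if not ready:
        return None, None
    conn, _peer = ready[0].accept()
    return ready[0], conn
```

A caller would create the first listener with port 0, read the assigned port
with getsockname(), and pass that port to the second call. A connection
queued on one listener is not visible to the other.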
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Apr 2021 18:43:36 +0200
> On 4/20/21 5:41 PM, Kuniyuki Iwashima wrote:
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation [1]. When a SYN packet is received, the connection is tied
> > to a listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners on the same port could accept
> > such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, resulting in the
> > in-flight ACK of 3WHS is responded by RST.
> >
> > The SO_REUSEPORT option is excellent to improve scalability.
>
> This was before the SYN processing was made lockless.
>
> I really wonder if we still need SO_REUSEPORT for TCP ?

I'm sorry, this might be misleading. This was an old topic in v3.5. Also,
scalability and performance are not the primary reasons to use SO_REUSEPORT
now. There are cases that need SO_REUSEPORT for other reasons.

If servers take both UDP and TCP requests (for example, a proxy for QUIC and
HTTP/2), it is nice to have the same eBPF mechanism to handle UDP and TCP.

Also, about reloading configurations: some applications want to keep
reloading simple by replacing processes. Then, even with the new accept()
syscall, I think migration (of the queue or of the children) would still be
needed. If it were done by something like fd passing, it might not work when
the process dies in the middle of the fd passing. So, I think it is better
to do the migration in the kernel without interaction with the old process.
In this respect, SO_REUSEPORT is good because we can bind a new process
without interaction with the old process. And with this patchset, we can
migrate requests by calling close()/shutdown() on the old listener.

>
> Eventually a new accept() system call where different threads
> can express how they want to choose the children sockets would
> be less invasive.
>
> Instead of having many listeners, have one listener and eventually multiple
> accept queues to improve scalability of accept() phase.

It sounds interesting. Could you elaborate on the idea?

And sorry, I couldn't quite understand what "invasive" means. Does it mean
that the new accept() would require fewer changes, or have a simpler API, or
something else? Also, I wonder whether the new accept() would have
flexibility similar to what eBPF offers.
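As an illustration of the fd-passing alternative mentioned above, here is a
minimal sketch of handing a listening socket over a Unix-domain socket with
SCM_RIGHTS. It assumes Linux and Python 3.9+ (for socket.send_fds and
socket.recv_fds); the helper names send_listener and recv_listener are
hypothetical and are not part of the patchset. Note that the handover needs
the old owner to be alive and cooperating for the send to happen, which is
the dependency the message above wants to avoid.

```python
import socket


def send_listener(channel, listener):
    """Old owner: pass the listening socket's fd over a Unix-domain socket."""
    # At least one byte of ordinary data must accompany the ancillary fd.
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def recv_listener(channel):
    """New owner: receive the fd and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise RuntimeError("sender went away before passing the listener")
    # Family, type, and protocol are detected from the fd itself.
    return socket.socket(fileno=fds[0])
```

Because the received fd refers to the same listening socket, its accept
queue stays intact once the old owner closes its own copy. In a real
deployment the two ends of the channel would live in different processes.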