| Message ID | 8fcb0ad6f324008ccadfd1811d91b3145bbf95fd.1605199807.git.pabeni@redhat.com |
|---|---|
| State | Superseded |
| Delegated to | Netdev Maintainers |
| Series | mptcp: improve multiple xmit streams support |
| Context | Check | Description |
|---|---|---|
| netdev/cover_letter | success | Link |
| netdev/fixes_present | success | Link |
| netdev/patch_count | success | Link |
| netdev/tree_selection | success | Clearly marked for net-next |
| netdev/subject_prefix | success | Link |
| netdev/source_inline | success | Was 0 now: 0 |
| netdev/verify_signedoff | success | Link |
| netdev/module_param | success | Was 0 now: 0 |
| netdev/build_32bit | success | Errors and warnings before: 1527; this patch: 1527 |
| netdev/kdoc | success | Errors and warnings before: 0; this patch: 0 |
| netdev/verify_fixes | success | Link |
| netdev/checkpatch | fail | CHECK: Alignment should match open parenthesis; ERROR: do not use assignment in if condition |
| netdev/build_allmodconfig_warn | success | Errors and warnings before: 1513; this patch: 1513 |
| netdev/header_inline | success | Link |
| netdev/stable | success | Stable not CCed |
On Thu, 12 Nov 2020 18:45:21 +0100 Paolo Abeni wrote:
> +		skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
> +				tcp_rtx_and_write_queues_empty(sk));

No good reason to misalign this, AFAICT.
On Thu, 12 Nov 2020 15:08:31 -0800 Jakub Kicinski wrote:
> On Thu, 12 Nov 2020 18:45:21 +0100 Paolo Abeni wrote:
> > +		skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
> > +				tcp_rtx_and_write_queues_empty(sk));
>
> No good reason to misalign this, AFAICT.

Maybe not worth respinning just for this. I thought there were build warnings, but it seems it's mostly sparse getting confused.

Is there a chance someone could look into adding annotations to socket locking?
On Thu, 2020-11-12 at 15:12 -0800, Jakub Kicinski wrote:
> On Thu, 12 Nov 2020 15:08:31 -0800 Jakub Kicinski wrote:
> > On Thu, 12 Nov 2020 18:45:21 +0100 Paolo Abeni wrote:
> > > +		skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
> > > +				tcp_rtx_and_write_queues_empty(sk));
> >
> > No good reason to misalign this, AFAICT.
>
> Maybe not worth respinning just for this. I thought there were build
> warnings, but it seems it's mostly sparse getting confused.

Thanks for looking into this!

The misalignment comes from the original TCP code, which I tried to keep as unmodified as possible to simplify the review. Anyhow, I had to drop an indentation level, so there is really no excuse on my part. I'll address this in the next iteration, if other changes are needed.

> Is there a chance someone could look into adding annotations to socket
> locking?

Annotating lock_sock_fast()/unlock_sock_fast() as if they unconditionally acquired/released the socket spinlock removes the warning related to the fast lock - at least for me ;). Hopefully that does not interact badly with lockdep, but perhaps it is a bit too extreme/rusty? Something like the following:

---
diff --git a/include/net/sock.h b/include/net/sock.h
index fbd2ba2f48c0..26db18024b74 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1591,7 +1591,8 @@ void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-bool lock_sock_fast(struct sock *sk);
+bool lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock);
+
 /**
  * unlock_sock_fast - complement of lock_sock_fast
  * @sk: socket
@@ -1601,11 +1602,14 @@ bool lock_sock_fast(struct sock *sk);
  * If slow mode is on, we call regular release_sock()
  */
 static inline void unlock_sock_fast(struct sock *sk, bool slow)
+	__releases(&sk->sk_lock.slock)
 {
-	if (slow)
+	if (slow) {
 		release_sock(sk);
-	else
+		__release(&sk->sk_lock.slock);
+	} else {
 		spin_unlock_bh(&sk->sk_lock.slock);
+	}
 }
 
 /* Used by processes to "lock" a socket state, so that
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..9badbe7bb4e4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3078,7 +3078,7 @@ EXPORT_SYMBOL(release_sock);
  *
  *   sk_lock.slock unlocked, owned = 1, BH enabled
  */
-bool lock_sock_fast(struct sock *sk)
+bool lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock)
 {
 	might_sleep();
 	spin_lock_bh(&sk->sk_lock.slock);
@@ -3096,6 +3096,7 @@ bool lock_sock_fast(struct sock *sk)
 	 * The sk_lock has mutex_lock() semantics here:
 	 */
 	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+	__acquire(&sk->sk_lock.slock);
 	local_bh_enable();
 	return true;
 }
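For context, the fast-lock pair being annotated follows this usage pattern (a minimal sketch of the existing kernel idiom, not part of the proposed diff; the caller function is hypothetical):

/* lock_sock_fast() returns true when it had to fall back to the slow
 * path (socket owned by user context); that flag must be handed back
 * to unlock_sock_fast() so it releases whichever lock is really held.
 */
void example_touch_socket_state(struct sock *sk)
{
	bool slow = lock_sock_fast(sk);

	/* ... update socket state that must not race with the owner ... */

	unlock_sock_fast(sk, slow);
}

This conditional behavior is exactly what confuses sparse: depending on the return value, a different lock is held on exit, which the proposal above papers over by pretending the spinlock is held in both cases.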
On Fri, 13 Nov 2020 11:38:13 +0100 Paolo Abeni wrote:
> > Is there a chance someone could look into adding annotations to socket
> > locking?
>
> Annotating lock_sock_fast()/unlock_sock_fast() as if they unconditionally
> acquired/released the socket spinlock removes the warning related to the
> fast lock - at least for me ;).
>
> Hopefully that does not interact badly with lockdep, but perhaps it is a
> bit too extreme/rusty?

I'm not a sparse expert; do we need both __acquire and __acquires? Would you mind submitting this officially and CCing the sparse ML?
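For reference, the two annotations operate at different levels, which is why the proposal above uses both. A simplified sketch, modeled on the definitions in include/linux/compiler_types.h:

/* __acquires()/__releases() are function-level contracts: sparse
 * checks that the function's entry/exit lock-context counts match the
 * declaration.  __acquire()/__release() are statement-level: they
 * adjust sparse's context counter at that point in a function body.
 * Outside of sparse (__CHECKER__), all four compile away to nothing.
 */
#ifdef __CHECKER__
# define __acquires(x)	__attribute__((context(x, 0, 1)))  /* exits holding x */
# define __releases(x)	__attribute__((context(x, 1, 0)))  /* exits not holding x */
# define __acquire(x)	__context__(x, 1)                  /* counter += 1 here */
# define __release(x)	__context__(x, -1)                 /* counter -= 1 here */
#else
# define __acquires(x)
# define __releases(x)
# define __acquire(x)	(void)0
# define __release(x)	(void)0
#endif

So in the proposed diff, __acquires() on lock_sock_fast() promises callers the slock context is entered, while the __acquire() inside the body (and the __release() on the release_sock() branch of unlock_sock_fast()) keep sparse's counter balanced on the paths where the spinlock is not literally held.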
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4aba0f069b05..374d0a2acc4b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -322,6 +322,7 @@ void tcp_shutdown(struct sock *sk, int how);
 int tcp_v4_early_demux(struct sk_buff *skb);
 int tcp_v4_rcv(struct sk_buff *skb);
 
+void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb);
 int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size);
@@ -329,6 +330,8 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
+struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
+			       struct page *page, int offset, size_t *size);
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			 size_t size, int flags);
 int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b2bc3d7fe9e8..391705aaa80e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -954,7 +954,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
  * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
  * users.
  */
-static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
+void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb && !skb->len) {
 		tcp_unlink_write_queue(skb, sk);
@@ -964,6 +964,68 @@ static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
+struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
+			       struct page *page, int offset, size_t *size)
+{
+	struct sk_buff *skb = tcp_write_queue_tail(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
+	bool can_coalesce;
+	int copy, i;
+
+	if (!skb || (copy = size_goal - skb->len) <= 0 ||
+	    !tcp_skb_can_collapse_to(skb)) {
+new_segment:
+		if (!sk_stream_memory_free(sk))
+			return NULL;
+
+		skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
+				tcp_rtx_and_write_queues_empty(sk));
+		if (!skb)
+			return NULL;
+
+#ifdef CONFIG_TLS_DEVICE
+		skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
+#endif
+		skb_entail(sk, skb);
+		copy = size_goal;
+	}
+
+	if (copy > *size)
+		copy = *size;
+
+	i = skb_shinfo(skb)->nr_frags;
+	can_coalesce = skb_can_coalesce(skb, i, page, offset);
+	if (!can_coalesce && i >= sysctl_max_skb_frags) {
+		tcp_mark_push(tp, skb);
+		goto new_segment;
+	}
+	if (!sk_wmem_schedule(sk, copy))
+		return NULL;
+
+	if (can_coalesce) {
+		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+	} else {
+		get_page(page);
+		skb_fill_page_desc(skb, i, page, offset, copy);
+	}
+
+	if (!(flags & MSG_NO_SHARED_FRAGS))
+		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+
+	skb->len += copy;
+	skb->data_len += copy;
+	skb->truesize += copy;
+	sk_wmem_queued_add(sk, copy);
+	sk_mem_charge(sk, copy);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	WRITE_ONCE(tp->write_seq, tp->write_seq + copy);
+	TCP_SKB_CB(skb)->end_seq += copy;
+	tcp_skb_pcount_set(skb, 0);
+
+	*size = copy;
+	return skb;
+}
+
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			 size_t size, int flags)
 {
@@ -999,60 +1061,13 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		goto out_err;
 
 	while (size > 0) {
-		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		int copy, i;
-		bool can_coalesce;
-
-		if (!skb || (copy = size_goal - skb->len) <= 0 ||
-		    !tcp_skb_can_collapse_to(skb)) {
-new_segment:
-			if (!sk_stream_memory_free(sk))
-				goto wait_for_space;
-
-			skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
-					tcp_rtx_and_write_queues_empty(sk));
-			if (!skb)
-				goto wait_for_space;
-
-#ifdef CONFIG_TLS_DEVICE
-			skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
-#endif
-			skb_entail(sk, skb);
-			copy = size_goal;
-		}
+		struct sk_buff *skb;
+		size_t copy = size;
 
-		if (copy > size)
-			copy = size;
-
-		i = skb_shinfo(skb)->nr_frags;
-		can_coalesce = skb_can_coalesce(skb, i, page, offset);
-		if (!can_coalesce && i >= sysctl_max_skb_frags) {
-			tcp_mark_push(tp, skb);
-			goto new_segment;
-		}
-		if (!sk_wmem_schedule(sk, copy))
+		skb = tcp_build_frag(sk, size_goal, flags, page, offset, &copy);
+		if (!skb)
 			goto wait_for_space;
 
-		if (can_coalesce) {
-			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
-		} else {
-			get_page(page);
-			skb_fill_page_desc(skb, i, page, offset, copy);
-		}
-
-		if (!(flags & MSG_NO_SHARED_FRAGS))
-			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
-
-		skb->len += copy;
-		skb->data_len += copy;
-		skb->truesize += copy;
-		sk_wmem_queued_add(sk, copy);
-		sk_mem_charge(sk, copy);
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		WRITE_ONCE(tp->write_seq, tp->write_seq + copy);
-		TCP_SKB_CB(skb)->end_seq += copy;
-		tcp_skb_pcount_set(skb, 0);
-
 		if (!copied)
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
This will be needed by the next patch, as MPTCP needs to handle the error/memory-allocation-needed path directly. No functional changes intended.

Additionally, let the MPTCP code access the tcp_remove_empty_skb() helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/tcp.h |   3 ++
 net/ipv4/tcp.c    | 119 ++++++++++++++++++++++++++--------------------
 2 files changed, 70 insertions(+), 52 deletions(-)
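To see why exporting the helper matters, here is a hedged sketch (not code from this series) of how a caller such as MPTCP could use tcp_build_frag(), mirroring the loop left in do_tcp_sendpages() above but keeping the error handling in its own hands; the function name and error policy are hypothetical:

/* Hypothetical caller sketch: tcp_build_frag() returns NULL when no
 * skb could be allocated or the memory charge failed, so the caller,
 * not do_tcp_sendpages()'s wait_for_space label, decides what to do.
 */
static ssize_t example_push_page(struct sock *sk, struct page *page,
				 int offset, size_t len, int flags)
{
	struct sk_buff *skb;
	size_t copy = len;
	int size_goal;

	tcp_send_mss(sk, &size_goal, flags);

	skb = tcp_build_frag(sk, size_goal, flags, page, offset, &copy);
	if (!skb)
		return -ENOMEM;	/* caller-specific: wait for memory, push queued data, retry */

	/* on success, @copy holds the number of bytes actually queued */
	return copy;
}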