From patchwork Fri Nov 26 12:00:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 12640493 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C9FFC433EF for ; Fri, 26 Nov 2021 12:02:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377252AbhKZMFg (ORCPT ); Fri, 26 Nov 2021 07:05:36 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:51597 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346209AbhKZMDe (ORCPT ); Fri, 26 Nov 2021 07:03:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1637928021; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=iDrWAu3qsdqC1G+K85PyPx5yYTMiKAdZ80dSk9v+8F0=; b=It+wURrBZXh/rNrvlqHUeIYy5MMoXUYHsl4OJ1YU8rIeVk1EgrrVgkyXqeGB9Xivhek77b qi1acxtNAOjT+nmOc3vItqruiAKfTUQdBh2OVeabZ4w8y+8bkFpjVjOf4YAkgGhSQ3/CeF 6YyWKmfw0wCavUUo4MzqyfebMfAzqCg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-462-cOqX1pLqNsaqirqpeSZu6Q-1; Fri, 26 Nov 2021 07:00:20 -0500 X-MC-Unique: cOqX1pLqNsaqirqpeSZu6Q-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8DE9E2F20; Fri, 26 Nov 2021 12:00:19 +0000 (UTC) Received: from gerbillo.fritz.box (unknown [10.39.194.163]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8421E5F4ED; Fri, 26 Nov 2021 12:00:18 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: Steffen Froemer , Eric Dumazet Subject: [PATCH net] tcp: fix page frag corruption on page fault Date: Fri, 26 Nov 2021 13:00:01 +0100 Message-Id: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Steffen reported a TCP stream corruption for HTTP requests served by the apache web-server using a cifs mount-point and memory mapping the relevant file. The root cause is quite similar to the one addressed by commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from memory reclaim"). Here the nested access to the task page frag is caused by a page fault on the (mmapped) user-space memory buffer coming from the cifs file. The page fault handler performs an smb transaction on a different socket, inside the same process context. Since sk->sk_allaction for such socket does not prevent the usage for the task_frag, the nested allocation modify "under the hood" the page frag in use by the outer sendmsg call, corrupting the stream. The overall relevant stack trace looks like the following: httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked: ffffffff91461d91 tcp_sendmsg_locked+0x1 ffffffff91462b57 tcp_sendmsg+0x27 ffffffff9139814e sock_sendmsg+0x3e ffffffffc06dfe1d smb_send_kvec+0x28 [...] ffffffffc06cfaf8 cifs_readpages+0x213 ffffffff90e83c4b read_pages+0x6b ffffffff90e83f31 __do_page_cache_readahead+0x1c1 ffffffff90e79e98 filemap_fault+0x788 ffffffff90eb0458 __do_fault+0x38 ffffffff90eb5280 do_fault+0x1a0 ffffffff90eb7c84 __handle_mm_fault+0x4d4 ffffffff90eb8093 handle_mm_fault+0xc3 ffffffff90c74f6d __do_page_fault+0x1ed ffffffff90c75277 do_page_fault+0x37 ffffffff9160111e page_fault+0x1e ffffffff9109e7b5 copyin+0x25 ffffffff9109eb40 _copy_from_iter_full+0xe0 ffffffff91462370 tcp_sendmsg_locked+0x5e0 ffffffff91462370 tcp_sendmsg_locked+0x5e0 ffffffff91462b57 tcp_sendmsg+0x27 ffffffff9139815c sock_sendmsg+0x4c ffffffff913981f7 sock_write_iter+0x97 ffffffff90f2cc56 do_iter_readv_writev+0x156 ffffffff90f2dff0 do_iter_write+0x80 ffffffff90f2e1c3 vfs_writev+0xa3 ffffffff90f2e27c do_writev+0x5c ffffffff90c042bb do_syscall_64+0x5b ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65 A possible solution would be adding the __GFP_MEMALLOC flag to the cifs allocation. That looks dangerous, as the memory allocated by the cifs fs will not be free soon and such allocation will not allow for more memory to be freed. Instead, this patch changes the tcp_sendmsg() code to avoid touching the page frag after performing the copy from the user-space buffer. Any page fault or memory reclaim operation there is now free to touch the task page fragment without corrupting the state used by the outer sendmsg(). As a downside, if the user-space copy fails, there will be some additional atomic operations due to the reference counting on the faulty fragment, but that looks acceptable for a slow error path. Reported-by: Steffen Froemer Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Paolo Abeni --- net/ipv4/tcp.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index bbb3d39c69af..2d85636c1577 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1304,6 +1304,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) bool merge = true; int i = skb_shinfo(skb)->nr_frags; struct page_frag *pfrag = sk_page_frag(sk); + unsigned int offset; if (!sk_page_frag_refill(sk, pfrag)) goto wait_for_space; @@ -1331,14 +1332,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (!sk_wmem_schedule(sk, copy)) goto wait_for_space; - err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb, - pfrag->page, - pfrag->offset, - copy); - if (err) - goto do_error; - - /* Update the skb. */ + /* Update the skb before accessing the user space buffer + * so that we leave the task frag in a consistent state. + * Just in case the page_fault handler need to use it + */ + offset = pfrag->offset; if (merge) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { @@ -1347,6 +1345,12 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) page_ref_inc(pfrag->page); } pfrag->offset += copy; + + err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb, + pfrag->page, + offset, copy); + if (err) + goto do_error; } else { /* First append to a fragless skb builds initial * pure zerocopy skb