[net-next,v5,0/4] virtio_net: rx enable premapped mode by default

Message ID	20240511031404.30903-1-xuanzhuo@linux.alibaba.com (mailing list archive)
Headers	Received: from out30-124.freemail.mail.aliyun.com (out30-124.freemail.mail.aliyun.com [115.124.30.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 627153716D for <netdev@vger.kernel.org>; Sat, 11 May 2024 03:14:13 +0000 (UTC)
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, virtualization@lists.linux.dev
Subject: [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default
Date: Sat, 11 May 2024 11:14:00 +0800
Message-Id: <20240511031404.30903-1-xuanzhuo@linux.alibaba.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series	virtio_net: rx enable premapped mode by default
[net-next,v5,0/4] virtio_net: rx enable premapped mode by default
[net-next,v5,1/4] virtio_ring: enable premapped mode whatever use_dma_api
[net-next,v5,2/4] virtio_net: big mode skip the unmap check
[net-next,v5,3/4] virtio_net: rx remove premapped failover code
[net-next,v5,4/4] virtio_net: remove the misleading comment

Xuan Zhuo May 11, 2024, 3:14 a.m. UTC

For virtio drivers, premapped mode can be enabled regardless of the
value of use_dma_api, because virtio core provides its own DMA APIs.
The driver can therefore enable premapped mode unconditionally.

This patch set makes the big mode of virtio-net support premapped mode
and enables premapped mode for rx by default.

Based on the following points, we do not use page pool to manage these
pages:

1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
we can only prevent the page pool from performing DMA operations, and
let the driver perform DMA operations on the allocated pages.
2. But when the page pool releases the page, we have no chance to
execute dma unmap.
3. A solution to #2 is to execute dma unmap every time before putting
the page back to the page pool. (This is wasteful; we do not otherwise
execute unmap so frequently.)
4. But there is another problem: we still need to use page.dma_addr to
save the dma address. Using page.dma_addr while using page pool is
unsafe behavior.
5. And we need space to chain the pages submitted at once to virtio core.
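
Points 1-5 above can be illustrated with a toy model. The Python sketch below is not kernel code and is not the implementation from this series; `ToyDmaApi`, `Page`, `chain_pages`, and `release_chain` are hypothetical names invented for illustration. It assumes the driver stores the DMA address and the chain link in per-page fields, and it only counts live mappings to show what happens when the release path does or does not call unmap.

```python
from dataclasses import dataclass
from typing import Optional


class ToyDmaApi:
    """Hypothetical stand-in for a DMA mapping layer; it only tracks live mappings."""

    def __init__(self) -> None:
        self._next_addr = 0x1000
        self.live = set()

    def map(self) -> int:
        # Hand out a fake bus address and remember that it is mapped.
        addr = self._next_addr
        self._next_addr += 0x1000
        self.live.add(addr)
        return addr

    def unmap(self, addr: int) -> None:
        self.live.remove(addr)


@dataclass
class Page:
    dma_addr: Optional[int] = None  # point 4: address stored with the page
    next: Optional["Page"] = None   # point 5: link used to chain pages


def chain_pages(dma: ToyDmaApi, count: int) -> Optional[Page]:
    """Map `count` pages on the driver side (point 1) and link them into one chain."""
    head = None
    for _ in range(count):
        head = Page(dma_addr=dma.map(), next=head)
    return head


def release_chain(dma: ToyDmaApi, head: Optional[Page], driver_unmaps: bool) -> int:
    """Walk the chain and release every page; return the number of pages released.

    With driver_unmaps=False this mimics a release path that never gets the
    chance to call unmap (point 2), so the mappings stay live.
    """
    released = 0
    page = head
    while page is not None:
        if driver_unmaps:
            dma.unmap(page.dma_addr)
            page.dma_addr = None
        page = page.next
        released += 1
    return released
```

In this model, releasing a chain without the driver-side unmap leaves every mapping in `dma.live`, which is the situation point 2 describes; releasing with `driver_unmaps=True` brings the count back to zero.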

More:
https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Why do we not use the page space to store the dma address?
http://lore.kernel.org/all/CACGkMEuyeJ9mMgYnnB42=hw6umNuo=agn7VBqBqYPd7GN=+39Q@mail.gmail.com

Please review.

v5:
1. Address the comments from Larysa Zaremba
http://lore.kernel.org/all/20240508063718.69806-1-xuanzhuo@linux.alibaba.com

v4:
1. For the conflict, switch to the net-next branch

v3:
1. big mode still uses the mode in which virtio core does the dma map/unmap

v2:
1. make gcc happy in page_chain_get_dma()
http://lore.kernel.org/all/202404221325.SX5ChRGP-lkp@intel.com

v1:
1. discussed using page pool
2. use dma sync to replace the unmap for the first page

Thanks.

Xuan Zhuo (4):
virtio_ring: enable premapped mode whatever use_dma_api
virtio_net: big mode skip the unmap check
virtio_net: rx remove premapped failover code
virtio_net: remove the misleading comment

drivers/net/virtio_net.c | 90 +++++++++++++++---------------------
drivers/virtio/virtio_ring.c | 7 +--
2 files changed, 38 insertions(+), 59 deletions(-)

--
2.32.0.3.g01195cf9f

patchwork-bot+netdevbpf@kernel.org May 14, 2024, 12:20 a.m. UTC | #1

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 11 May 2024 11:14:00 +0800 you wrote:
> Actually, for the virtio drivers, we can enable premapped mode whatever
> the value of use_dma_api. Because we provide the virtio dma apis.
> So the driver can enable premapped mode unconditionally.
> 
> This patch set makes the big mode of virtio-net to support premapped mode.
> And enable premapped mode for rx by default.
> 
> [...]

Here is the summary with links:
  - [net-next,v5,1/4] virtio_ring: enable premapped mode whatever use_dma_api
    https://git.kernel.org/netdev/net-next/c/f9dac92ba908
  - [net-next,v5,2/4] virtio_net: big mode skip the unmap check
    https://git.kernel.org/netdev/net-next/c/a377ae542d8d
  - [net-next,v5,3/4] virtio_net: rx remove premapped failover code
    https://git.kernel.org/netdev/net-next/c/defd28aa5acb
  - [net-next,v5,4/4] virtio_net: remove the misleading comment
    https://git.kernel.org/netdev/net-next/c/9719f039d328

You are awesome, thank you!

Thorsten Leemhuis Aug. 15, 2024, 7:14 a.m. UTC | #2

[side note: the message I have been replying to at least when downloaded
from lore has two message-ids, one of them identical to an older
message, which is why this looks odd in the lore archives:
https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]

On 14.08.24 08:59, Michael S. Tsirkin wrote:
> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
> patch.
> 
> Note2: untested, posting for Darren to help with testing.
> 
> Turns out unconditionally enabling premapped 
> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
> sysctl net.core.high_order_alloc_disable=1
> 
> where crashes and scp failures were reported (scp a file 100M in size to VM):
> [...]

TWIMC, there is a regression report on lore and I wonder if this might
be related or the same problem, as it also mentioned a "get_swap_device:
Bad swap file entry" error:
https://bugzilla.kernel.org/show_bug.cgi?id=219154

To quote:

"""
Hello,

I've encountered repeated crashes or freezes when a KVM VM receives
large amounts of data over the network while the system is under memory
load and performing I/O operations. The crashes sometimes occur in the
filesystem code (ext4 and btrfs, at least), but they also happen in
other locations.

This issue occurs on my custom builds using kernel versions v6.10 to
v6.11-rc2, with virtio network and disk drivers, and either Ubuntu 22.04
or Debian 12 user space.

The same kernel build did not crash on an Azure VM, which does not use
the virtio network driver. Since this issue only appears when receiving
data, I suspect there could be an issue related to the virtio interface
or receive buffer handling.

This issue did not occur on the Debian backport kernel 6.9.7-1~bpo12+1
amd64.

Steps to Reproduce:
1. Setup a small VM on a KVM host.
   I tested this on an x86_64 KVM VM with 1 CPU, 512 MB RAM, 2 GB SWAP
(the smallest configuration from Vultr), using a Debian 12 user space,
virtio disk, and virtio net.
2. Induce high memory and I/O load. Run the following command:
   stress --vm 2 --hdd 1
   (Adjust --vm to occupy all the RAM)
   This slows down the system but does not cause a crash.
3. Send large data to the VM.
   I used `iperf3 -s` on the VM and sent data using `iperf3 -c` from
another host. The system crashes within a few seconds to a few minutes.
(The reverse direction `iperf3 -c -R` did not cause a crash.)

The OOPS messages are mostly general protection faults, but sometimes I
see "Bad pagetable" or other errors, such as:
Oops: general protection fault, probably for non-canonical address
0x2f9b7fa5e2bde696: 0000 [#1] PREEMPT SMP PTI
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
Oops: Bad pagetable: 000d [#1] PREEMPT SMP PTI

In some cases, dmesg contains something like:
UBSAN: shift-out-of-bounds in lib/xarray.c:158:34

When the system freezes without crashing, I sometimes found BUG
messages such as:
get_swap_device: Bad swap file entry 3403b0f5b2584992
BUG: Bad page map in process stress  pte:c42f93fac0299e1d pmd:0d9b2047
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_ANONPAGES val:2
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_SWAPENTS val:-1

Thanks.
"""
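
When comparing logs from different reporters, it can help to count which of the symptoms quoted above appear in a captured dmesg. The following is a minimal Python sketch, assuming the log has been saved as plain text; the search strings are copied from the quoted report, while the category labels and the `classify` helper are hypothetical names made up for this example.

```python
import re

# Patterns copied from the messages quoted in the report above; the category
# labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "general_protection_fault": re.compile(r"Oops: general protection fault"),
    "bad_pagetable": re.compile(r"Oops: Bad pagetable"),
    "ubsan_shift": re.compile(r"UBSAN: shift-out-of-bounds"),
    "bad_swap_entry": re.compile(r"get_swap_device: Bad swap file entry"),
    "bad_page_map": re.compile(r"BUG: Bad page map"),
    "bad_rss_counter": re.compile(r"BUG: Bad rss-counter-state"),
}


def classify(log_text: str) -> dict:
    """Count how many lines of `log_text` match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A matching signature only says that two logs show similar symptoms; as the rest of this thread shows, confirming a common cause still needs a bisection or a revert test.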

Ciao, Thorsten

Darren Kenny Aug. 15, 2024, 10:22 a.m. UTC | #3

On Thursday, 2024-08-15 at 09:14:27 +02, Linux regression tracking (Thorsten Leemhuis) wrote:
> [side note: the message I have been replying to at least when downloaded
> from lore has two message-ids, one of them identical to an older
> message, which is why this looks odd in the lore archives:
> https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]
>

Yes, I saw that too, hence I responded to patch 1 in the series, rather
than the cover letter.

> On 14.08.24 08:59, Michael S. Tsirkin wrote:
>> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
>> patch.
>> 
>> Note2: untested, posting for Darren to help with testing.
>> 
>> Turns out unconditionally enabling premapped 
>> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
>> sysctl net.core.high_order_alloc_disable=1
>> 
>> where crashes and scp failures were reported (scp a file 100M in size to VM):
>> [...]
>
> TWIMC, there is a regression report on lore and I wonder if this might
> be related or the same problem, as it also mentioned a "get_swap_device:
> Bad swap file entry" error:
> https://bugzilla.kernel.org/show_bug.cgi?id=219154
>

I took a look at the stack traces; they don't look similar to what I was
seeing, but I wasn't running with an ASAN enabled in the kernel.

Most of the traces that I was seeing looked like those in the e-mail
from Si-Wei:

  https://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com/

We could trigger it only when the sysctl value was set like:

- net.core.high_order_alloc_disable=1

And it would immediately panic on any relatively large download, e.g.
wget of a few RPMS, or similar.
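
To check whether a given guest is running with this setting, the value can be read from procfs. This is a minimal Python sketch, assuming the sysctl is exposed at the usual `/proc/sys/net/core/high_order_alloc_disable` path; the helper name `high_order_alloc_disabled` is made up for this example, and a missing or unreadable file is treated as "not set".

```python
from pathlib import Path

# Usual procfs location of the net.core.high_order_alloc_disable sysctl.
SYSCTL_PATH = Path("/proc/sys/net/core/high_order_alloc_disable")


def high_order_alloc_disabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True if the sysctl file at `path` holds a non-zero value."""
    try:
        return int(path.read_text().strip()) != 0
    except (OSError, ValueError):
        # Missing file (older kernel, non-Linux host) or unexpected content.
        return False


if __name__ == "__main__":
    print("high_order_alloc_disable set:", high_order_alloc_disabled())
```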

Best I can suggest would be to try reverting them in a custom kernel
and see if it fixes this problem too.

Thanks,

Darren.

Michael S. Tsirkin Aug. 15, 2024, 3:23 p.m. UTC | #4

On Thu, Aug 15, 2024 at 09:14:27AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> [side note: the message I have been replying to at least when downloaded
> from lore has two message-ids, one of them identical to an older
> message, which is why this looks odd in the lore archives:
> https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]

Sorry, could you clarify - which message has two message IDs?

Michael S. Tsirkin Aug. 15, 2024, 3:28 p.m. UTC | #5

On Thu, Aug 15, 2024 at 11:23:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 15, 2024 at 09:14:27AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [side note: the message I have been replying to at least when downloaded
> > from lore has two message-ids, one of them identical to an older
> > message, which is why this looks odd in the lore archives:
> > https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]
> 
> Sorry, could you clarify - which message has two message IDs?

Ouch. The one I sent had a bad message Id :(
Dunno how it happened, I guess I was mucking with it
manually and corrupted it. Really sorry.

Thorsten Leemhuis Aug. 16, 2024, 5:03 a.m. UTC | #6

On 15.08.24 12:22, Darren Kenny wrote:
> On Thursday, 2024-08-15 at 09:14:27 +02, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 14.08.24 08:59, Michael S. Tsirkin wrote:
>>> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
>>> patch.
>>>
>>> Note2: untested, posting for Darren to help with testing.
>>>
>>> Turns out unconditionally enabling premapped 
>>> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
>>> sysctl net.core.high_order_alloc_disable=1
>>>
>>> where crashes and scp failures were reported (scp a file 100M in size to VM):
>>> [...]
>>
>> TWIMC, there is a regression report on lore

Obviously I meant bugzilla here, sorry.

>> and I wonder if this might
>> be related or the same problem, as it also mentioned a "get_swap_device:
>> Bad swap file entry" error:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> 
> I took a look at the stack traces, they don't look similar to what I was
> seeing, but I wasn't running with an ASAN enabled in the kernel.
> [...]

Yeah, but in the end it seems it is the same problem: The reporter,
Takero Funaki (now CCed) meanwhile performed a bisection that ended up
on f9dac92ba908 (virtio_ring: enable premapped mode regardless of
use_dma_api) -- and later confirmed in bugzilla that reverting the three
patches resolved the problem. Feel free to CC Takero on further mails
about this.

Ciao, Thorsten

#regzbot report:
https://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com/
#regzbot dup: https://bugzilla.kernel.org/show_bug.cgi?id=219154
