Message ID | 20250105213036.288356-1-atomlin@atomlin.com (mailing list archive) |
---|---|
Series | vmxnet3: Adjust maximum Rx ring buffer size |
On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> attempt a large memory allocation request for physically contiguous memory,
> to hold the Rx Data ring that could exceed the maximum page-order supported
> by the system.

I think CMA should be a bit orthogonal to the warning.

Off the top of my head the usual way to solve the warning is to add
__GFP_NOWARN to the allocations which trigger it. And then handle
the error gracefully.

On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
>> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
>> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
>> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
>> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
>> whereby CONFIG_CMA is not enabled. I think it does not make sense to
>> attempt a large memory allocation request for physically contiguous memory,
>> to hold the Rx Data ring that could exceed the maximum page-order supported
>> by the system.
>
> I think CMA should be a bit orthogonal to the warning.
>
> Off the top of my head the usual way to solve the warning is to add
> __GFP_NOWARN to the allocations which trigger it. And then handle
> the error gracefully.

That IMHO should really be the default for any driver that calls
__netdev_alloc_skb() under the hood, we should not really have to
specify __GFP_NOWARN, rather if people want it, they should specify it.

On Mon, 6 Jan 2025 15:51:10 -0800 Florian Fainelli wrote:
> On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> > On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
> >> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> >> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
> >> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> >> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> >> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> >> attempt a large memory allocation request for physically contiguous memory,
> >> to hold the Rx Data ring that could exceed the maximum page-order supported
> >> by the system.
> >
> > I think CMA should be a bit orthogonal to the warning.
> >
> > Off the top of my head the usual way to solve the warning is to add
> > __GFP_NOWARN to the allocations which trigger it. And then handle
> > the error gracefully.
>
> That IMHO should really be the default for any driver that calls
> __netdev_alloc_skb() under the hood, we should not really have to
> specify __GFP_NOWARN, rather if people want it, they should specify it.

True, although TBH I don't fully understand why this flag exists
in the first place. Is it just supposed to be catching programming
errors, or is it due to potential DoS implications of users triggering
large allocations?

On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> True, although TBH I don't fully understand why this flag exists
> in the first place. Is it just supposed to be catching programming
> errors, or is it due to potential DoS implications of users triggering
> large allocations?

Jakub,

I suspect that introducing __GFP_NOWARN would mask the issue, no?
I think the warning was useful. Otherwise it would be rather difficult to
establish precisely why the Rx Data ring was disabled. In this particular
case, if I understand correctly, the intended size of the Rx Data ring was
simply too large due to the size of the maximum supported Rx Data buffer.

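The size argument above can be checked with a little arithmetic. The following is only a user-space sketch, not driver code: it assumes 4 KiB pages, a maximum page order of 10 (a 4 MiB contiguous block), and that the Rx Data ring needs one data buffer per Ring 0 descriptor in a single contiguous allocation. The `sketch_*` names and `SKETCH_*` constants are hypothetical; only the values 4096 and 2048 come from the ethtool command quoted in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the thread: 4 KiB pages and a maximum
 * page order of 10, i.e. at most 4 MiB of physically contiguous memory. */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_MAX_PAGE_ORDER 10U

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes. */
static unsigned int sketch_order_for(size_t bytes)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(bytes > 0);
    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Bytes needed for the data ring, assuming one data buffer per
 * Ring 0 descriptor (an assumption about the layout). */
static size_t sketch_rx_data_ring_bytes(size_t ring0_entries,
                                        size_t data_buf_bytes)
{
    return ring0_entries * data_buf_bytes;
}

/* Non-zero when the request is larger than the assumed maximum order. */
static int sketch_exceeds_max_order(size_t bytes)
{
    return sketch_order_for(bytes) > SKETCH_MAX_PAGE_ORDER;
}
```

Under these assumptions, `rx 4096` with `rx-mini 2048` asks for 4096 * 2048 bytes = 8 MiB, which is order 11 and one step past the assumed limit, while halving either value would fit in an order-10 block.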
On Tue, 7 Jan 2025 22:55:38 +0000 (GMT) Aaron Tomlin wrote:
> On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> > True, although TBH I don't fully understand why this flag exists
> > in the first place. Is it just supposed to be catching programming
> > errors, or is it due to potential DoS implications of users triggering
> > large allocations?
>
> Jakub,
>
> I suspect that introducing __GFP_NOWARN would mask the issue, no?
> I think the warning was useful. Otherwise it would be rather difficult to
> establish precisely why the Rx Data ring was disable. In this particular
> case, if I understand correctly, the intended size of the Rx Data ring was
> simply too large due to the size of the maximum supported Rx Data buffer.

This is a bit of a weird driver. But we should distinguish the default
ring size, which yes, should not be too large, from the max ring size,
which can be large, although a user who sets a large size runs the risk
that the allocations fail and the device does not open.

This driver seems to read the default size from the hypervisor, is that
the value that is too large in your case? Maybe we should min() it with
something reasonable? The max allowed to be set via ethtool can remain
high IMO.
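
The split between a capped default and a high user-settable maximum could look roughly like the sketch below. It is a user-space illustration, not the vmxnet3 code: every `sketch_*` function and `SKETCH_*` constant is hypothetical, the 1024 cap is an arbitrary stand-in for "something reasonable", and 4096 is simply the largest `rx` value used in the ethtool command quoted earlier in the thread.

```c
#include <assert.h>

/* Hypothetical limits for illustration; not the driver's real constants. */
#define SKETCH_RING_MAX    4096U /* ceiling still accepted from ethtool */
#define SKETCH_DEFAULT_CAP 1024U /* arbitrary "reasonable" default cap  */

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Default ring size: take the hypervisor's suggestion, but never
 * start with more than the cap. */
static unsigned int sketch_default_ring_size(unsigned int hv_default)
{
    assert(SKETCH_DEFAULT_CAP <= SKETCH_RING_MAX);
    return sketch_min(hv_default, SKETCH_DEFAULT_CAP);
}

/* User request via ethtool: only the high ceiling applies. Returns 0 if
 * the size is acceptable and -1 if it is out of range. A large accepted
 * size may still fail to allocate later, which the caller must handle. */
static int sketch_check_user_ring_size(unsigned int requested)
{
    if (requested == 0 || requested > SKETCH_RING_MAX)
        return -1;
    return 0;
}
```

With these made-up numbers, a hypervisor default of 4096 would be reduced to 1024 at probe time, while an explicit request for 4096 would still be accepted and left to succeed or fail at allocation time.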