[RFC,0/1] vmxnet3: Adjust maximum Rx ring buffer size

Message ID 20250105213036.288356-1-atomlin@atomlin.com (mailing list archive)

Message

Aaron Tomlin Jan. 5, 2025, 9:30 p.m. UTC
Hi Ronak, Paolo,

I managed to trigger the MAX_PAGE_ORDER warning in the context of function
__alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
whereby CONFIG_CMA is not enabled. I think it does not make sense to
attempt a large memory allocation request for physically contiguous memory,
to hold the Rx Data ring that could exceed the maximum page-order supported
by the system.
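
The arithmetic behind the report can be sketched in userspace C. This is only an illustration under stated assumptions: 4 KiB pages, a largest buddy-allocator order of 10, and an Rx Data ring sized as the Ring 0 size times the Rx Data buffer size. The `sketch_*` names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the thread: 4 KiB pages and a largest
 * page order of 10 (4 MiB of physically contiguous memory). */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_MAX_PAGE_ORDER 10U

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes.
 * Userspace stand-in for the kernel's get_order(). */
unsigned int sketch_get_order(size_t bytes)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(bytes > 0);
    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Rx Data ring size in bytes, assuming one data buffer per Ring 0 entry. */
size_t sketch_rxdata_ring_bytes(size_t ring0_size, size_t desc_size)
{
    return ring0_size * desc_size;
}

/* Non-zero when the request needs a higher order than the assumed maximum. */
int sketch_exceeds_max_order(size_t bytes)
{
    return sketch_get_order(bytes) > SKETCH_MAX_PAGE_ORDER;
}
```

With rx 4096 and rx-mini 2048 the product is 8 MiB, which under these assumptions needs order 11 and so exceeds the assumed maximum of 10. Halving either value brings the request back to order 10.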

I am not familiar with drivers/net/vmxnet3 related code.
Please let me know your thoughts. Thank you.



Aaron Tomlin (1):
  vmxnet3: Adjust maximum Rx ring buffer size

 drivers/net/vmxnet3/vmxnet3_defs.h | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jakub Kicinski Jan. 6, 2025, 11:47 p.m. UTC | #1
On Sun,  5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> attempt a large memory allocation request for physically contiguous memory,
> to hold the Rx Data ring that could exceed the maximum page-order supported
> by the system.

I think CMA should be a bit orthogonal to the warning.

Off the top of my head the usual way to solve the warning is to add
__GFP_NOWARN to the allocations which trigger it. And then handle
the error gracefully.
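
A rough userspace model of that pattern, for illustration only. The allocator, the `MOCK_GFP_NOWARN` flag, the size limit and the `mock_rxq` structure are all hypothetical stand-ins; this is not kernel code and not the vmxnet3 implementation. The caller passes a "no warning" flag, checks for NULL, and continues without the Rx Data ring rather than failing outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MOCK_GFP_NOWARN 0x1u           /* hypothetical stand-in for __GFP_NOWARN */
#define MOCK_MAX_BYTES  (4096UL << 10) /* assumed largest contiguous allocation */

int mock_warnings; /* counts warnings the mock allocator printed */

/* Mock allocator: oversized requests fail, and warn unless told not to. */
void *mock_alloc_contig(size_t bytes, unsigned int flags)
{
    if (bytes > MOCK_MAX_BYTES) {
        if (!(flags & MOCK_GFP_NOWARN)) {
            mock_warnings++;
            fprintf(stderr, "WARNING: %zu bytes exceeds max order\n", bytes);
        }
        return NULL;
    }
    return malloc(bytes);
}

struct mock_rxq {
    void *data_ring;
    bool rxdataring_enabled;
};

/* Try to set up the data ring. On failure, print one driver-level line
 * and carry on with the data ring disabled. */
int mock_rxq_create(struct mock_rxq *rq, size_t ring0_size, size_t desc_size)
{
    size_t bytes = ring0_size * desc_size;

    rq->data_ring = mock_alloc_contig(bytes, MOCK_GFP_NOWARN);
    rq->rxdataring_enabled = rq->data_ring != NULL;
    if (!rq->rxdataring_enabled)
        fprintf(stderr, "rx data ring disabled: cannot allocate %zu bytes\n",
                bytes);
    return 0;
}
```

In this model the allocator stays quiet, and the single message printed by the caller still records why the data ring is off.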

Florian Fainelli Jan. 6, 2025, 11:51 p.m. UTC | #2
On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> On Sun,  5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
>> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
>> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
>> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
>> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
>> whereby CONFIG_CMA is not enabled. I think it does not make sense to
>> attempt a large memory allocation request for physically contiguous memory,
>> to hold the Rx Data ring that could exceed the maximum page-order supported
>> by the system.
> 
> I think CMA should be a bit orthogonal to the warning.
> 
> Off the top of my head the usual way to solve the warning is to add
> __GFP_NOWARN to the allocations which trigger it. And then handle
> the error gracefully.

That IMHO should really be the default for any driver that calls 
__netdev_alloc_skb() under the hood, we should not really have to 
specify __GFP_NOWARN, rather if people want it, they should specify it.

Jakub Kicinski Jan. 7, 2025, 12:57 a.m. UTC | #3
On Mon, 6 Jan 2025 15:51:10 -0800 Florian Fainelli wrote:
> On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> > On Sun,  5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:  
> >> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> >> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
> >> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> >> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> >> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> >> attempt a large memory allocation request for physically contiguous memory,
> >> to hold the Rx Data ring that could exceed the maximum page-order supported
> >> by the system.  
> > 
> > I think CMA should be a bit orthogonal to the warning.
> > 
> > Off the top of my head the usual way to solve the warning is to add
> > __GFP_NOWARN to the allocations which trigger it. And then handle
> > the error gracefully.  
> 
> That IMHO should really be the default for any driver that calls 
> __netdev_alloc_skb() under the hood, we should not really have to 
> specify __GFP_NOWARN, rather if people want it, they should specify it.

True, although TBH I don't fully understand why this flag exists
in the first place. Is it just supposed to be catching programming
errors, or is it due to potential DoS implications of users triggering
large allocations?

Aaron Tomlin Jan. 7, 2025, 10:55 p.m. UTC | #4
On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> True, although TBH I don't fully understand why this flag exists
> in the first place. Is it just supposed to be catching programming
> errors, or is it due to potential DoS implications of users triggering
> large allocations?

Jakub,

I suspect that introducing __GFP_NOWARN would mask the issue, no?
I think the warning was useful. Otherwise it would be rather difficult to
establish precisely why the Rx Data ring was disabled. In this particular
case, if I understand correctly, the intended size of the Rx Data ring was
simply too large due to the size of the maximum supported Rx Data buffer.

Jakub Kicinski Jan. 7, 2025, 11:46 p.m. UTC | #5
On Tue, 7 Jan 2025 22:55:38 +0000 (GMT) Aaron Tomlin wrote:
> On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> > True, although TBH I don't fully understand why this flag exists
> > in the first place. Is it just supposed to be catching programming
> > errors, or is it due to potential DoS implications of users triggering
> > large allocations?  
> 
> Jakub,
> 
> I suspect that introducing __GFP_NOWARN would mask the issue, no?
> I think the warning was useful. Otherwise it would be rather difficult to
> establish precisely why the Rx Data ring was disabled. In this particular
> case, if I understand correctly, the intended size of the Rx Data ring was
> simply too large due to the size of the maximum supported Rx Data buffer.

This is a bit of a weird driver. But we should distinguish the default
ring size, which yes, should not be too large, and max ring size which
can be large, but a user setting a large size runs the risk that the
allocations will fail and the device will not open.

This driver seems to read the default size from the hypervisor, is that
the value that is too large in your case? Maybe we should min() it with
something reasonable? The max allowed to be set via ethtool can remain
high IMO

Florian Fainelli Jan. 8, 2025, 4:53 p.m. UTC | #6
On 1/6/25 16:57, Jakub Kicinski wrote:
> On Mon, 6 Jan 2025 15:51:10 -0800 Florian Fainelli wrote:
>> On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
>>> On Sun,  5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
>>>> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
>>>> __alloc_pages_noprof() with /usr/sbin/ethtool --set-ring rx 4096 rx-mini
>>>> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
>>>> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
>>>> whereby CONFIG_CMA is not enabled. I think it does not make sense to
>>>> attempt a large memory allocation request for physically contiguous memory,
>>>> to hold the Rx Data ring that could exceed the maximum page-order supported
>>>> by the system.
>>>
>>> I think CMA should be a bit orthogonal to the warning.
>>>
>>> Off the top of my head the usual way to solve the warning is to add
>>> __GFP_NOWARN to the allocations which trigger it. And then handle
>>> the error gracefully.
>>
>> That IMHO should really be the default for any driver that calls
>> __netdev_alloc_skb() under the hood, we should not really have to
>> specify __GFP_NOWARN, rather if people want it, they should specify it.
> 
> True, although TBH I don't fully understand why this flag exists
> in the first place. Is it just supposed to be catching programming
> errors, or is it due to potential DoS implications of users triggering
> large allocations?
> 

There is some value IMHO in printing when allocations fail, where they 
came from, their gfp_t flags and page order so you can track high order 
offenders in hot paths (one of our Wi-Fi drivers was notorious for doing
that and having verbose out of memory dumps by default definitely
helped). Once you fix those however, hogging the system while dumping 
lines and lines of information onto a slow console tends to be worse 
than the recovery from out of memory itself. One could argue that 
triggering an OOM plus dumping information can result in a DoS, so that 
should be frowned upon...

Ronak Doshi Jan. 8, 2025, 5:24 p.m. UTC | #7
On Tue, Jan 7, 2025 at 3:46 PM Jakub Kicinski <kuba@kernel.org> wrote:

>This driver seems to read the default size from the hypervisor, is that
>the value that is too large in your case?
The default should be 128, which is way less than the max value.

Thanks,
Ronak

Aaron Tomlin Jan. 8, 2025, 9:05 p.m. UTC | #8
On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> This is a bit of a weird driver. But we should distinguish the default
> ring size, which yes, should not be too large, and max ring size which
> can be large but user setting a large size risks the fact the
> allocations will fail and device will not open.
>
> This driver seems to read the default size from the hypervisor, is that
> the value that is too large in your case? Maybe we should min() it with
> something reasonable? The max allowed to be set via ethtool can remain
> high IMO
>

See vmxnet3_get_ringparam(). If I understand correctly, since commit
50a5ce3e7116a ("vmxnet3: add receive data ring support"), if the specified
VMXNET3 adapter has support for the Rx Data ring feature then the maximum
Rx Data buffer size is reported as VMXNET3_RXDATA_DESC_MAX_SIZE (i.e. 2048)
by 'ethtool'. Furthermore, see vmxnet3_set_ringparam(). A user-specified Rx
mini value cannot be more than VMXNET3_RXDATA_DESC_MAX_SIZE. Indeed the Rx
mini value in the context of VMXNET3 would be the size of the Rx Data ring
buffer. See the following excerpt from vmxnet3_set_ringparam(). As far as I
can tell, the Rx Data buffer cannot be more than
VMXNET3_RXDATA_DESC_MAX_SIZE:

static int
vmxnet3_set_ringparam(struct net_device *netdev,
                      struct ethtool_ringparam *param,
                      struct kernel_ethtool_ringparam *kernel_param,
                      struct netlink_ext_ack *extack)
{
        :
        new_rxdata_desc_size =
                (param->rx_mini_pending + VMXNET3_RXDATA_DESC_SIZE_MASK) &
                ~VMXNET3_RXDATA_DESC_SIZE_MASK;
        new_rxdata_desc_size = min_t(u16, new_rxdata_desc_size,
                                     VMXNET3_RXDATA_DESC_MAX_SIZE);
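
The round-up-and-clamp step in that excerpt can be tried out in userspace with the sketch below. The 64-byte alignment is an assumption about VMXNET3_RXDATA_DESC_SIZE_MASK, the 2048 limit is the value quoted above, and the `SKETCH_*` and `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RXDATA_DESC_SIZE_ALIGN 64u   /* assumed alignment */
#define SKETCH_RXDATA_DESC_SIZE_MASK  (SKETCH_RXDATA_DESC_SIZE_ALIGN - 1u)
#define SKETCH_RXDATA_DESC_MAX_SIZE   2048u /* value quoted in the thread */

/* The mask trick only works for power-of-two alignments. */
static_assert((SKETCH_RXDATA_DESC_SIZE_ALIGN &
               (SKETCH_RXDATA_DESC_SIZE_ALIGN - 1u)) == 0,
              "alignment must be a power of two");

/* Round the requested rx-mini value up to the alignment, then clamp it
 * to the maximum, mirroring the two statements in the excerpt. The
 * 16-bit result models the u16 used by min_t(); this sketch assumes
 * that values far above the maximum are rejected before this point. */
uint16_t sketch_rxdata_desc_size(uint32_t rx_mini_pending)
{
    uint32_t rounded = (rx_mini_pending + SKETCH_RXDATA_DESC_SIZE_MASK) &
                       ~SKETCH_RXDATA_DESC_SIZE_MASK;
    uint16_t size = (uint16_t)rounded;

    return size < SKETCH_RXDATA_DESC_MAX_SIZE
               ? size
               : (uint16_t)SKETCH_RXDATA_DESC_MAX_SIZE;
}
```

Under these assumptions a request of 100 becomes 128, and a request of 2048 stays at 2048, so the clamp alone never produces a buffer larger than the maximum.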


Have I missed something?