diff mbox series

[net-next,v2] net: mana: Implement get_ringparam/set_ringparam for mana

Message ID 1722358895-13430-1-git-send-email-shradhagupta@linux.microsoft.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] net: mana: Implement get_ringparam/set_ringparam for mana | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success Errors and warnings before: 10 this patch: 10
netdev/cc_maintainers success CCed 14 of 14 maintainers
netdev/build_clang success Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 43 this patch: 43
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-31--12-00 (tests: 707)

Commit Message

Shradha Gupta July 30, 2024, 5:01 p.m. UTC
Currently the values of WQs for RX and TX queues for MANA devices
are hardcoded to default sizes.
Allow configuring these values for MANA devices as ringparam
configuration(get/set) through ethtool_ops.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
---
 Changes in v2:
 * Removed unnecessary validations in mana_set_ringparam()
 * Fixed codespell error
 * Improved error message to indicate issue with the parameter
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 20 +++---
 .../ethernet/microsoft/mana/mana_ethtool.c    | 66 +++++++++++++++++++
 include/net/mana/mana.h                       | 21 +++++-
 3 files changed, 96 insertions(+), 11 deletions(-)

Comments

Naman Jain July 31, 2024, 8:49 a.m. UTC | #1
On 7/30/2024 10:31 PM, Shradha Gupta wrote:
> Currently the values of WQs for RX and TX queues for MANA devices
> are hardcoded to default sizes.
> Allow configuring these values for MANA devices as ringparam
> configuration(get/set) through ethtool_ops.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Long Li <longli@microsoft.com>
> ---
>   Changes in v2:
>   * Removed unnecessary validations in mana_set_ringparam()
>   * Fixed codespell error
>   * Improved error message to indicate issue with the parameter
> ---
>   drivers/net/ethernet/microsoft/mana/mana_en.c | 20 +++---
>   .../ethernet/microsoft/mana/mana_ethtool.c    | 66 +++++++++++++++++++
>   include/net/mana/mana.h                       | 21 +++++-
>   3 files changed, 96 insertions(+), 11 deletions(-)
> 

 From what I understand, we are adding support for "ethtool -G --set-
ring"  command.
Please correct me if I am wrong.

Maybe it would be good to capture the benefit/purpose of this patch in
the commit msg, as in which use-cases/scenarios we are now trying to
support that previously were not supported. The "why?" part basically.



Regards,
Naman Jain
Jakub Kicinski Aug. 1, 2024, 12:15 a.m. UTC | #2
On Tue, 30 Jul 2024 10:01:35 -0700 Shradha Gupta wrote:
> +	err1 = mana_detach(ndev, false);
> +	if (err1) {
> +		netdev_err(ndev, "mana_detach failed: %d\n", err1);
> +		return err1;
> +	}
> +
> +	apc->tx_queue_size = new_tx;
> +	apc->rx_queue_size = new_rx;
> +	err1 = mana_attach(ndev);
> +	if (!err1)
> +		return 0;
> +
> +	netdev_err(ndev, "mana_attach failed: %d\n", err1);
> +
> +	/* Try rolling back to the older values */
> +	apc->tx_queue_size = old_tx;
> +	apc->rx_queue_size = old_rx;
> +	err2 = mana_attach(ndev);

If system is under memory pressure there's no guarantee you'll get 
the memory back, even if you revert to the old counts.
We strongly recommend you refactor the code to hold onto the old memory
until you're sure new config works.
Shradha Gupta Aug. 1, 2024, 3:49 a.m. UTC | #3
On Wed, Jul 31, 2024 at 02:19:34PM +0530, Naman Jain wrote:
> 
> 
> On 7/30/2024 10:31 PM, Shradha Gupta wrote:
> >Currently the values of WQs for RX and TX queues for MANA devices
> >are hardcoded to default sizes.
> >Allow configuring these values for MANA devices as ringparam
> >configuration(get/set) through ethtool_ops.
> >
> >Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> >Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> >Reviewed-by: Long Li <longli@microsoft.com>
> >---
> >  Changes in v2:
> >  * Removed unnecessary validations in mana_set_ringparam()
> >  * Fixed codespell error
> >  * Improved error message to indicate issue with the parameter
> >---
> >  drivers/net/ethernet/microsoft/mana/mana_en.c | 20 +++---
> >  .../ethernet/microsoft/mana/mana_ethtool.c    | 66 +++++++++++++++++++
> >  include/net/mana/mana.h                       | 21 +++++-
> >  3 files changed, 96 insertions(+), 11 deletions(-)
> >
> 
> From what I understand, we are adding support for "ethtool -G --set-
> ring"  command.
> Please correct me if I am wrong.
> 
> Maybe it would be good to capture the benefit/purpose of this patch in
> the commit msg, as in which use-cases/scenarios we are now trying to
> support that previously were not supported. The "why?" part basically.

Hi Naman,
Thanks for your comment. 
It is a pretty standard support for network drivers to allow changing
TX/RX queue sizes. We are working on improving customizations in MANA
driver based on VM configurations. This patch is a part of that series.
Hope that makes things more clear.

regards,
Shradha
> 
> 
> 
> Regards,
> Naman Jain
Shradha Gupta Aug. 1, 2024, 3:50 a.m. UTC | #4
On Wed, Jul 31, 2024 at 05:15:18PM -0700, Jakub Kicinski wrote:
> On Tue, 30 Jul 2024 10:01:35 -0700 Shradha Gupta wrote:
> > +	err1 = mana_detach(ndev, false);
> > +	if (err1) {
> > +		netdev_err(ndev, "mana_detach failed: %d\n", err1);
> > +		return err1;
> > +	}
> > +
> > +	apc->tx_queue_size = new_tx;
> > +	apc->rx_queue_size = new_rx;
> > +	err1 = mana_attach(ndev);
> > +	if (!err1)
> > +		return 0;
> > +
> > +	netdev_err(ndev, "mana_attach failed: %d\n", err1);
> > +
> > +	/* Try rolling back to the older values */
> > +	apc->tx_queue_size = old_tx;
> > +	apc->rx_queue_size = old_rx;
> > +	err2 = mana_attach(ndev);
> 
> If system is under memory pressure there's no guarantee you'll get 
> the memory back, even if you revert to the old counts.
> We strongly recommend you refactor the code to hold onto the old memory
> until you're sure new config works.
Okay, that makes sense. Let me try to make that change

Thanks,
Shradha.
> -- 
> pw-bot: cr
Jakub Kicinski Aug. 1, 2024, 2:16 p.m. UTC | #5
On Wed, 31 Jul 2024 20:49:05 -0700 Shradha Gupta wrote:
> It is a pretty standard support for network drivers to allow changing
> TX/RX queue sizes. We are working on improving customizations in MANA
> driver based on VM configurations. This patch is a part of that series.
> Hope that makes things more clear.

Simple reconfiguration must not run the risk of taking the system off
network.
Shradha Gupta Aug. 2, 2024, 4:29 a.m. UTC | #6
On Thu, Aug 01, 2024 at 07:16:49AM -0700, Jakub Kicinski wrote:
> On Wed, 31 Jul 2024 20:49:05 -0700 Shradha Gupta wrote:
> > It is a pretty standard support for network drivers to allow changing
> > TX/RX queue sizes. We are working on improving customizations in MANA
> > driver based on VM configurations. This patch is a part of that series.
> > Hope that makes things more clear.
> 
> Simple reconfiguration must not run the risk of taking the system off
> network.
Agreed. I'm making the changes and send a new version. Thanks
Zhu Yanjun Aug. 3, 2024, 6:09 p.m. UTC | #7
在 2024/7/31 1:01, Shradha Gupta 写道:
> Currently the values of WQs for RX and TX queues for MANA devices
> are hardcoded to default sizes.
> Allow configuring these values for MANA devices as ringparam
> configuration(get/set) through ethtool_ops.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Long Li <longli@microsoft.com>
> ---
>   Changes in v2:
>   * Removed unnecessary validations in mana_set_ringparam()
>   * Fixed codespell error
>   * Improved error message to indicate issue with the parameter
> ---
>   drivers/net/ethernet/microsoft/mana/mana_en.c | 20 +++---
>   .../ethernet/microsoft/mana/mana_ethtool.c    | 66 +++++++++++++++++++
>   include/net/mana/mana.h                       | 21 +++++-
>   3 files changed, 96 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d2f07e179e86..598ac62be47d 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -618,7 +618,7 @@ static int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu)
>   
>   	dev = mpc->ac->gdma_dev->gdma_context->dev;
>   
> -	num_rxb = mpc->num_queues * RX_BUFFERS_PER_QUEUE;
> +	num_rxb = mpc->num_queues * mpc->rx_queue_size;
>   
>   	WARN(mpc->rxbufs_pre, "mana rxbufs_pre exists\n");
>   	mpc->rxbufs_pre = kmalloc_array(num_rxb, sizeof(void *), GFP_KERNEL);
> @@ -1899,14 +1899,15 @@ static int mana_create_txq(struct mana_port_context *apc,
>   		return -ENOMEM;
>   
>   	/*  The minimum size of the WQE is 32 bytes, hence
> -	 *  MAX_SEND_BUFFERS_PER_QUEUE represents the maximum number of WQEs
> +	 *  apc->tx_queue_size represents the maximum number of WQEs
>   	 *  the SQ can store. This value is then used to size other queues
>   	 *  to prevent overflow.
> +	 *  Also note that the txq_size is always going to be MANA_PAGE_ALIGNED,
> +	 *  as tx_queue_size is always a power of 2.
>   	 */
> -	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> -	BUILD_BUG_ON(!MANA_PAGE_ALIGNED(txq_size));
> +	txq_size = apc->tx_queue_size * 32;

Not sure if the following is needed or not.
"
WARN_ON(!MANA_PAGE_ALIGNED(txq_size));
"

Zhu Yanjun

>   
> -	cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> +	cq_size = apc->tx_queue_size * COMP_ENTRY_SIZE;
>   	cq_size = MANA_PAGE_ALIGN(cq_size);
>   
>   	gc = gd->gdma_context;
> @@ -2145,10 +2146,11 @@ static int mana_push_wqe(struct mana_rxq *rxq)
>   
>   static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
>   {
> +	struct mana_port_context *mpc = netdev_priv(rxq->ndev);
>   	struct page_pool_params pprm = {};
>   	int ret;
>   
> -	pprm.pool_size = RX_BUFFERS_PER_QUEUE;
> +	pprm.pool_size = mpc->rx_queue_size;
>   	pprm.nid = gc->numa_node;
>   	pprm.napi = &rxq->rx_cq.napi;
>   	pprm.netdev = rxq->ndev;
> @@ -2180,13 +2182,13 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
>   
>   	gc = gd->gdma_context;
>   
> -	rxq = kzalloc(struct_size(rxq, rx_oobs, RX_BUFFERS_PER_QUEUE),
> +	rxq = kzalloc(struct_size(rxq, rx_oobs, apc->rx_queue_size),
>   		      GFP_KERNEL);
>   	if (!rxq)
>   		return NULL;
>   
>   	rxq->ndev = ndev;
> -	rxq->num_rx_buf = RX_BUFFERS_PER_QUEUE;
> +	rxq->num_rx_buf = apc->rx_queue_size;
>   	rxq->rxq_idx = rxq_idx;
>   	rxq->rxobj = INVALID_MANA_HANDLE;
>   
> @@ -2734,6 +2736,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
>   	apc->ndev = ndev;
>   	apc->max_queues = gc->max_num_queues;
>   	apc->num_queues = gc->max_num_queues;
> +	apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
> +	apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
>   	apc->port_handle = INVALID_MANA_HANDLE;
>   	apc->pf_filter_handle = INVALID_MANA_HANDLE;
>   	apc->port_idx = port_idx;
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 146d5db1792f..34707da6ff68 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -369,6 +369,70 @@ static int mana_set_channels(struct net_device *ndev,
>   	return err;
>   }
>   
> +static void mana_get_ringparam(struct net_device *ndev,
> +			       struct ethtool_ringparam *ring,
> +			       struct kernel_ethtool_ringparam *kernel_ring,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +
> +	ring->rx_pending = apc->rx_queue_size;
> +	ring->tx_pending = apc->tx_queue_size;
> +	ring->rx_max_pending = MAX_RX_BUFFERS_PER_QUEUE;
> +	ring->tx_max_pending = MAX_TX_BUFFERS_PER_QUEUE;
> +}
> +
> +static int mana_set_ringparam(struct net_device *ndev,
> +			      struct ethtool_ringparam *ring,
> +			      struct kernel_ethtool_ringparam *kernel_ring,
> +			      struct netlink_ext_ack *extack)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 new_tx, new_rx;
> +	u32 old_tx, old_rx;
> +	int err1, err2;
> +
> +	old_tx = apc->tx_queue_size;
> +	old_rx = apc->rx_queue_size;
> +	new_tx = clamp_t(u32, ring->tx_pending, MIN_TX_BUFFERS_PER_QUEUE, MAX_TX_BUFFERS_PER_QUEUE);
> +	new_rx = clamp_t(u32, ring->rx_pending, MIN_RX_BUFFERS_PER_QUEUE, MAX_RX_BUFFERS_PER_QUEUE);
> +
> +	if (!is_power_of_2(new_tx)) {
> +		netdev_err(ndev, "%s:Tx:%d not supported. Needs to be a power of 2\n",
> +			   __func__, new_tx);
> +		return -EINVAL;
> +	}
> +
> +	if (!is_power_of_2(new_rx)) {
> +		netdev_err(ndev, "%s:Rx:%d not supported. Needs to be a power of 2\n",
> +			   __func__, new_rx);
> +		return -EINVAL;
> +	}
> +
> +	err1 = mana_detach(ndev, false);
> +	if (err1) {
> +		netdev_err(ndev, "mana_detach failed: %d\n", err1);
> +		return err1;
> +	}
> +
> +	apc->tx_queue_size = new_tx;
> +	apc->rx_queue_size = new_rx;
> +	err1 = mana_attach(ndev);
> +	if (!err1)
> +		return 0;
> +
> +	netdev_err(ndev, "mana_attach failed: %d\n", err1);
> +
> +	/* Try rolling back to the older values */
> +	apc->tx_queue_size = old_tx;
> +	apc->rx_queue_size = old_rx;
> +	err2 = mana_attach(ndev);
> +	if (err2)
> +		netdev_err(ndev, "mana_reattach failed: %d\n", err2);
> +
> +	return err1;
> +}
> +
>   const struct ethtool_ops mana_ethtool_ops = {
>   	.get_ethtool_stats	= mana_get_ethtool_stats,
>   	.get_sset_count		= mana_get_sset_count,
> @@ -380,4 +444,6 @@ const struct ethtool_ops mana_ethtool_ops = {
>   	.set_rxfh		= mana_set_rxfh,
>   	.get_channels		= mana_get_channels,
>   	.set_channels		= mana_set_channels,
> +	.get_ringparam          = mana_get_ringparam,
> +	.set_ringparam          = mana_set_ringparam,
>   };
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 6439fd8b437b..8f922b389883 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -38,9 +38,21 @@ enum TRI_STATE {
>   
>   #define COMP_ENTRY_SIZE 64
>   
> -#define RX_BUFFERS_PER_QUEUE 512
> +/* This Max value for RX buffers is derived from __alloc_page()'s max page
> + * allocation calculation. It allows maximum 2^(MAX_ORDER -1) pages. RX buffer
> + * size beyond this value gets rejected by __alloc_page() call.
> + */
> +#define MAX_RX_BUFFERS_PER_QUEUE 8192
> +#define DEF_RX_BUFFERS_PER_QUEUE 512
> +#define MIN_RX_BUFFERS_PER_QUEUE 128
>   
> -#define MAX_SEND_BUFFERS_PER_QUEUE 256
> +/* This max value for TX buffers is derived as the maximum allocatable
> + * pages supported on host per guest through testing. TX buffer size beyond
> + * this value is rejected by the hardware.
> + */
> +#define MAX_TX_BUFFERS_PER_QUEUE 16384
> +#define DEF_TX_BUFFERS_PER_QUEUE 256
> +#define MIN_TX_BUFFERS_PER_QUEUE 128
>   
>   #define EQ_SIZE (8 * MANA_PAGE_SIZE)
>   
> @@ -285,7 +297,7 @@ struct mana_recv_buf_oob {
>   	void *buf_va;
>   	bool from_pool; /* allocated from a page pool */
>   
> -	/* SGL of the buffer going to be sent has part of the work request. */
> +	/* SGL of the buffer going to be sent as part of the work request. */
>   	u32 num_sge;
>   	struct gdma_sge sgl[MAX_RX_WQE_SGL_ENTRIES];
>   
> @@ -437,6 +449,9 @@ struct mana_port_context {
>   	unsigned int max_queues;
>   	unsigned int num_queues;
>   
> +	unsigned int rx_queue_size;
> +	unsigned int tx_queue_size;
> +
>   	mana_handle_t port_handle;
>   	mana_handle_t pf_filter_handle;
>
Stephen Hemminger Aug. 3, 2024, 6:31 p.m. UTC | #8
On Sun, 4 Aug 2024 02:09:21 +0800
Zhu Yanjun <yanjun.zhu@linux.dev> wrote:

> >   
> >   	/*  The minimum size of the WQE is 32 bytes, hence
> > -	 *  MAX_SEND_BUFFERS_PER_QUEUE represents the maximum number of WQEs
> > +	 *  apc->tx_queue_size represents the maximum number of WQEs
> >   	 *  the SQ can store. This value is then used to size other queues
> >   	 *  to prevent overflow.
> > +	 *  Also note that the txq_size is always going to be MANA_PAGE_ALIGNED,
> > +	 *  as tx_queue_size is always a power of 2.
> >   	 */
> > -	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> > -	BUILD_BUG_ON(!MANA_PAGE_ALIGNED(txq_size));
> > +	txq_size = apc->tx_queue_size * 32;  
> 
> Not sure if the following is needed or not.
> "
> WARN_ON(!MANA_PAGE_ALIGNED(txq_size));
> "
> 
> Zhu Yanjun

On many systems warn is set to panic the system.
Any constraint like this should be enforced where user input
is managed. In this patch, that would be earlier in mana_set_ringparam().
Looking there, the only requirement is that txq_size is between
the min/max buffers per queue and a power of 2.
Shradha Gupta Aug. 5, 2024, 3:48 a.m. UTC | #9
On Sat, Aug 03, 2024 at 11:31:54AM -0700, Stephen Hemminger wrote:
> On Sun, 4 Aug 2024 02:09:21 +0800
> Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> 
> > >   
> > >   	/*  The minimum size of the WQE is 32 bytes, hence
> > > -	 *  MAX_SEND_BUFFERS_PER_QUEUE represents the maximum number of WQEs
> > > +	 *  apc->tx_queue_size represents the maximum number of WQEs
> > >   	 *  the SQ can store. This value is then used to size other queues
> > >   	 *  to prevent overflow.
> > > +	 *  Also note that the txq_size is always going to be MANA_PAGE_ALIGNED,
> > > +	 *  as tx_queue_size is always a power of 2.
> > >   	 */
> > > -	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> > > -	BUILD_BUG_ON(!MANA_PAGE_ALIGNED(txq_size));
> > > +	txq_size = apc->tx_queue_size * 32;  
> > 
> > Not sure if the following is needed or not.
> > "
> > WARN_ON(!MANA_PAGE_ALIGNED(txq_size));
> > "
> > 
> > Zhu Yanjun
> 
> On many systems warn is set to panic the system.
> Any constraint like this should be enforced where user input
> is managed. In this patch, that would be earlier in mana_set_ringparam().
> Looking there, the only requirement is that txq_size is between
> the min/max buffers per queue and a power of 2.

Thanks Stephen, that's right.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d2f07e179e86..598ac62be47d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -618,7 +618,7 @@  static int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu)
 
 	dev = mpc->ac->gdma_dev->gdma_context->dev;
 
-	num_rxb = mpc->num_queues * RX_BUFFERS_PER_QUEUE;
+	num_rxb = mpc->num_queues * mpc->rx_queue_size;
 
 	WARN(mpc->rxbufs_pre, "mana rxbufs_pre exists\n");
 	mpc->rxbufs_pre = kmalloc_array(num_rxb, sizeof(void *), GFP_KERNEL);
@@ -1899,14 +1899,15 @@  static int mana_create_txq(struct mana_port_context *apc,
 		return -ENOMEM;
 
 	/*  The minimum size of the WQE is 32 bytes, hence
-	 *  MAX_SEND_BUFFERS_PER_QUEUE represents the maximum number of WQEs
+	 *  apc->tx_queue_size represents the maximum number of WQEs
 	 *  the SQ can store. This value is then used to size other queues
 	 *  to prevent overflow.
+	 *  Also note that the txq_size is always going to be MANA_PAGE_ALIGNED,
+	 *  as tx_queue_size is always a power of 2.
 	 */
-	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
-	BUILD_BUG_ON(!MANA_PAGE_ALIGNED(txq_size));
+	txq_size = apc->tx_queue_size * 32;
 
-	cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
+	cq_size = apc->tx_queue_size * COMP_ENTRY_SIZE;
 	cq_size = MANA_PAGE_ALIGN(cq_size);
 
 	gc = gd->gdma_context;
@@ -2145,10 +2146,11 @@  static int mana_push_wqe(struct mana_rxq *rxq)
 
 static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
 {
+	struct mana_port_context *mpc = netdev_priv(rxq->ndev);
 	struct page_pool_params pprm = {};
 	int ret;
 
-	pprm.pool_size = RX_BUFFERS_PER_QUEUE;
+	pprm.pool_size = mpc->rx_queue_size;
 	pprm.nid = gc->numa_node;
 	pprm.napi = &rxq->rx_cq.napi;
 	pprm.netdev = rxq->ndev;
@@ -2180,13 +2182,13 @@  static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 
 	gc = gd->gdma_context;
 
-	rxq = kzalloc(struct_size(rxq, rx_oobs, RX_BUFFERS_PER_QUEUE),
+	rxq = kzalloc(struct_size(rxq, rx_oobs, apc->rx_queue_size),
 		      GFP_KERNEL);
 	if (!rxq)
 		return NULL;
 
 	rxq->ndev = ndev;
-	rxq->num_rx_buf = RX_BUFFERS_PER_QUEUE;
+	rxq->num_rx_buf = apc->rx_queue_size;
 	rxq->rxq_idx = rxq_idx;
 	rxq->rxobj = INVALID_MANA_HANDLE;
 
@@ -2734,6 +2736,8 @@  static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc->ndev = ndev;
 	apc->max_queues = gc->max_num_queues;
 	apc->num_queues = gc->max_num_queues;
+	apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
+	apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
 	apc->port_handle = INVALID_MANA_HANDLE;
 	apc->pf_filter_handle = INVALID_MANA_HANDLE;
 	apc->port_idx = port_idx;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 146d5db1792f..34707da6ff68 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -369,6 +369,70 @@  static int mana_set_channels(struct net_device *ndev,
 	return err;
 }
 
+static void mana_get_ringparam(struct net_device *ndev,
+			       struct ethtool_ringparam *ring,
+			       struct kernel_ethtool_ringparam *kernel_ring,
+			       struct netlink_ext_ack *extack)
+{
+	struct mana_port_context *apc = netdev_priv(ndev);
+
+	ring->rx_pending = apc->rx_queue_size;
+	ring->tx_pending = apc->tx_queue_size;
+	ring->rx_max_pending = MAX_RX_BUFFERS_PER_QUEUE;
+	ring->tx_max_pending = MAX_TX_BUFFERS_PER_QUEUE;
+}
+
+static int mana_set_ringparam(struct net_device *ndev,
+			      struct ethtool_ringparam *ring,
+			      struct kernel_ethtool_ringparam *kernel_ring,
+			      struct netlink_ext_ack *extack)
+{
+	struct mana_port_context *apc = netdev_priv(ndev);
+	u32 new_tx, new_rx;
+	u32 old_tx, old_rx;
+	int err1, err2;
+
+	old_tx = apc->tx_queue_size;
+	old_rx = apc->rx_queue_size;
+	new_tx = clamp_t(u32, ring->tx_pending, MIN_TX_BUFFERS_PER_QUEUE, MAX_TX_BUFFERS_PER_QUEUE);
+	new_rx = clamp_t(u32, ring->rx_pending, MIN_RX_BUFFERS_PER_QUEUE, MAX_RX_BUFFERS_PER_QUEUE);
+
+	if (!is_power_of_2(new_tx)) {
+		netdev_err(ndev, "%s:Tx:%d not supported. Needs to be a power of 2\n",
+			   __func__, new_tx);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(new_rx)) {
+		netdev_err(ndev, "%s:Rx:%d not supported. Needs to be a power of 2\n",
+			   __func__, new_rx);
+		return -EINVAL;
+	}
+
+	err1 = mana_detach(ndev, false);
+	if (err1) {
+		netdev_err(ndev, "mana_detach failed: %d\n", err1);
+		return err1;
+	}
+
+	apc->tx_queue_size = new_tx;
+	apc->rx_queue_size = new_rx;
+	err1 = mana_attach(ndev);
+	if (!err1)
+		return 0;
+
+	netdev_err(ndev, "mana_attach failed: %d\n", err1);
+
+	/* Try rolling back to the older values */
+	apc->tx_queue_size = old_tx;
+	apc->rx_queue_size = old_rx;
+	err2 = mana_attach(ndev);
+	if (err2)
+		netdev_err(ndev, "mana_reattach failed: %d\n", err2);
+
+	return err1;
+}
+
 const struct ethtool_ops mana_ethtool_ops = {
 	.get_ethtool_stats	= mana_get_ethtool_stats,
 	.get_sset_count		= mana_get_sset_count,
@@ -380,4 +444,6 @@  const struct ethtool_ops mana_ethtool_ops = {
 	.set_rxfh		= mana_set_rxfh,
 	.get_channels		= mana_get_channels,
 	.set_channels		= mana_set_channels,
+	.get_ringparam          = mana_get_ringparam,
+	.set_ringparam          = mana_set_ringparam,
 };
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..8f922b389883 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -38,9 +38,21 @@  enum TRI_STATE {
 
 #define COMP_ENTRY_SIZE 64
 
-#define RX_BUFFERS_PER_QUEUE 512
+/* This Max value for RX buffers is derived from __alloc_page()'s max page
+ * allocation calculation. It allows maximum 2^(MAX_ORDER -1) pages. RX buffer
+ * size beyond this value gets rejected by __alloc_page() call.
+ */
+#define MAX_RX_BUFFERS_PER_QUEUE 8192
+#define DEF_RX_BUFFERS_PER_QUEUE 512
+#define MIN_RX_BUFFERS_PER_QUEUE 128
 
-#define MAX_SEND_BUFFERS_PER_QUEUE 256
+/* This max value for TX buffers is derived as the maximum allocatable
+ * pages supported on host per guest through testing. TX buffer size beyond
+ * this value is rejected by the hardware.
+ */
+#define MAX_TX_BUFFERS_PER_QUEUE 16384
+#define DEF_TX_BUFFERS_PER_QUEUE 256
+#define MIN_TX_BUFFERS_PER_QUEUE 128
 
 #define EQ_SIZE (8 * MANA_PAGE_SIZE)
 
@@ -285,7 +297,7 @@  struct mana_recv_buf_oob {
 	void *buf_va;
 	bool from_pool; /* allocated from a page pool */
 
-	/* SGL of the buffer going to be sent has part of the work request. */
+	/* SGL of the buffer going to be sent as part of the work request. */
 	u32 num_sge;
 	struct gdma_sge sgl[MAX_RX_WQE_SGL_ENTRIES];
 
@@ -437,6 +449,9 @@  struct mana_port_context {
 	unsigned int max_queues;
 	unsigned int num_queues;
 
+	unsigned int rx_queue_size;
+	unsigned int tx_queue_size;
+
 	mana_handle_t port_handle;
 	mana_handle_t pf_filter_handle;