diff mbox series

[net-next,13/15] net/smc: introduce loopback-ism DMB type control

Message ID 20240111120036.109903-14-guwen@linux.alibaba.com (mailing list archive)
State Deferred
Delegated to: Netdev Maintainers
Series net/smc: implement loopback-ism used by SMC-D

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1090 this patch: 1090
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 1108 this patch: 1108
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1114 this patch: 1114
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 149 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Wen Gu Jan. 11, 2024, noon UTC
This provides a way to {get|set} the type of DMB offered by loopback-ism,
i.e. whether it is physically or virtually contiguous memory.

echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_type # physically
echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_type # virtually

The settings take effect after re-activating loopback-ism by:

echo 0 > /sys/devices/virtual/smc/loopback-ism/active
echo 1 > /sys/devices/virtual/smc/loopback-ism/active

After this, the link group and DMBs related to loopback-ism will be
flushed, and the DMBs created afterwards will be of the desired type.

The motivation for this control is that a physically contiguous DMB has
the best performance but is usually expensive, while a virtually
contiguous DMB is cheap and performs well in most scenarios. However, if
sndbuf and DMB are merged, a virtual DMB is accessed concurrently in Tx
and Rx, and lock contention on find_vmap_area becomes a bottleneck when
there are many CPUs and CONFIG_HARDENED_USERCOPY is set (see link
below). So an option is provided.
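
The choice this knob controls can be sketched as follows. This is a
condensed illustration of the smc_lo_register_dmb() hunk in the patch
below, with error handling trimmed; smc_lo_alloc_dmb() is a hypothetical
helper name used only for this sketch:

```c
/* Condensed sketch of the allocation choice; not the exact patch code. */
static void *smc_lo_alloc_dmb(unsigned int dmb_type, int dmb_len)
{
	if (dmb_type == SMC_LO_DMB_PHYS) {
		/* physically contiguous: an order-N block of pages */
		struct page *pages;

		pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
				    __GFP_NOMEMALLOC | __GFP_COMP |
				    __GFP_NORETRY | __GFP_ZERO,
				    get_order(dmb_len));
		return pages ? page_address(pages) : NULL;
	}
	/* virtually contiguous: cheaper to allocate, but usercopy
	 * hardening may perform a vmap_area lookup on every copy
	 */
	return vzalloc(dmb_len);
}
```

Either result can be freed with kvfree(), which is why the unregister
path below switches from kfree() to kvfree().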

Link: https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_loopback.c | 80 +++++++++++++++++++++++++++++++++++-------
 net/smc/smc_loopback.h |  6 ++++
 2 files changed, 74 insertions(+), 12 deletions(-)

Comments

Wenjia Zhang Feb. 16, 2024, 2:25 p.m. UTC | #1
On 11.01.24 13:00, Wen Gu wrote:
> This provides a way to {get|set} the type of DMB offered by loopback-ism,
> i.e. whether it is physically or virtually contiguous memory.
> 
> echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_type # physically
> echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_type # virtually
> 
> The settings take effect after re-activating loopback-ism by:
> 
> echo 0 > /sys/devices/virtual/smc/loopback-ism/active
> echo 1 > /sys/devices/virtual/smc/loopback-ism/active
> 
> After this, the link group and DMBs related to loopback-ism will be
> flushed, and the DMBs created afterwards will be of the desired type.
> 
> The motivation for this control is that a physically contiguous DMB has
> the best performance but is usually expensive, while a virtually
> contiguous DMB is cheap and performs well in most scenarios. However, if
> sndbuf and DMB are merged, a virtual DMB is accessed concurrently in Tx
> and Rx, and lock contention on find_vmap_area becomes a bottleneck when
> there are many CPUs and CONFIG_HARDENED_USERCOPY is set (see link
> below). So an option is provided.
> 
I'm curious why you say that the physically contiguous DMB has the best
performance, because we saw even slightly better performance with the
virtual one than with the physical one.
Wen Gu Feb. 20, 2024, 3:19 a.m. UTC | #2
On 2024/2/16 22:25, Wenjia Zhang wrote:
> 
> 
> On 11.01.24 13:00, Wen Gu wrote:
>> This provides a way to {get|set} the type of DMB offered by loopback-ism,
>> i.e. whether it is physically or virtually contiguous memory.
>>
>> echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_type # physically
>> echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_type # virtually
>>
>> The settings take effect after re-activating loopback-ism by:
>>
>> echo 0 > /sys/devices/virtual/smc/loopback-ism/active
>> echo 1 > /sys/devices/virtual/smc/loopback-ism/active
>>
>> After this, the link group and DMBs related to loopback-ism will be
>> flushed, and the DMBs created afterwards will be of the desired type.
>>
>> The motivation for this control is that a physically contiguous DMB has
>> the best performance but is usually expensive, while a virtually
>> contiguous DMB is cheap and performs well in most scenarios. However, if
>> sndbuf and DMB are merged, a virtual DMB is accessed concurrently in Tx
>> and Rx, and lock contention on find_vmap_area becomes a bottleneck when
>> there are many CPUs and CONFIG_HARDENED_USERCOPY is set (see link
>> below). So an option is provided.
>>
> I'm curious why you say that the physically contiguous DMB has the best
> performance, because we saw even slightly better performance with the
> virtual one than with the physical one.

Hi Wenjia, you can find examples here:

https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/

Excerpted from above:
"
In 48 CPUs qemu environment, the Requests/s increased by 5 times:
- nginx
- wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80

                  vzalloced shmem      vzalloced shmem(with this patch set)
Requests/sec          113536.56            583729.93


But it also has some overhead, compared to using kzalloced shared memory
or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:

                  kzalloced shmem      vzalloced shmem(unset CONFIG_HARDENED_USERCOPY)
Requests/sec          831950.39            805164.78
"

Without CONFIG_HARDENED_USERCOPY, the performance of the physical DMB and
the virtual DMB is basically the same (the physical DMB is a bit better).
With CONFIG_HARDENED_USERCOPY set in an environment with many CPUs, such
as the 48 CPUs here, merging sndbuf and DMB makes the find_vmap_area lock
contention heavy and performance drops noticeably. So I said the physical
DMB has the best performance, since it guarantees good performance in all
known environments.
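
The contention described above comes from the hardened-usercopy check.
Roughly, paraphrased from check_heap_object() in mm/usercopy.c of kernels
from that era (the exact flow varies by version; elisions are marked
with ...):

```c
/* Simplified sketch: with CONFIG_HARDENED_USERCOPY, every
 * copy_{to,from}_user() spanning a vmalloc'ed buffer validates the
 * range against its vmap_area, which serializes on a global lock.
 */
static void check_heap_object(const void *ptr, unsigned long n, bool to_user)
{
	...
	if (is_vmalloc_addr(ptr)) {
		/* concurrent Tx/Rx copies on many CPUs contend on the
		 * lock taken inside find_vmap_area()
		 */
		struct vmap_area *area = find_vmap_area((unsigned long)ptr);

		if (!area)
			usercopy_abort("vmalloc", "no area", to_user, 0, n);
		...
		return;
	}
	...
}
```

With a physically contiguous DMB the copy targets linear-map memory, so
this branch, and the lock behind it, are never taken.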


By the way, we discussed the memory cost before (see [1]), but I found
that when we use s390 ISM (or do not merge sndbuf and DMB), the sndbuf
also consumes physically contiguous memory:

static struct smc_buf_desc *smcd_new_buf_create(struct smc_link_group *lgr,
						bool is_dmb, int bufsize)
{
<...>
	if (is_dmb) {
<...>
	} else {
		buf_desc->cpu_addr = kzalloc(bufsize, GFP_KERNEL |
					     __GFP_NOWARN | __GFP_NORETRY |
					     __GFP_NOMEMALLOC);
		if (!buf_desc->cpu_addr) {
			kfree(buf_desc);
			return ERR_PTR(-EAGAIN);
		}
		buf_desc->len = bufsize;
	}
<...>
}

So I wonder whether it is really necessary to use virtual DMBs in
loopback-ism. Maybe we can always use physical DMBs in loopback-ism; then
there is no need for the dmb_type or dmb_copy knobs.

[1] https://lore.kernel.org/netdev/d6facfd5-e083-ffc7-05e5-2e8f3ef17735@linux.alibaba.com/


Thanks!

Patch

diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index a89dbf84aea5..2e734f8e08f5 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -13,6 +13,7 @@ 
 
 #include <linux/device.h>
 #include <linux/types.h>
+#include <linux/vmalloc.h>
 #include <net/smc.h>
 
 #include "smc_cdc.h"
@@ -24,6 +25,7 @@ 
 #define SMC_DMA_ADDR_INVALID	(~(dma_addr_t)0)
 
 static const char smc_lo_dev_name[] = "loopback-ism";
+static unsigned int smc_lo_dmb_type = SMC_LO_DMB_PHYS;
 static struct smc_lo_dev *lo_dev;
 static struct class *smc_class;
 
@@ -124,8 +126,50 @@  static ssize_t active_store(struct device *dev,
 	return count;
 }
 static DEVICE_ATTR_RW(active);
+
+static ssize_t dmb_type_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct smc_lo_dev *ldev =
+		container_of(dev, struct smc_lo_dev, dev);
+	const char *type;
+
+	switch (ldev->dmb_type) {
+	case SMC_LO_DMB_PHYS:
+		type = "Physically contiguous buffer";
+		break;
+	case SMC_LO_DMB_VIRT:
+		type = "Virtually contiguous buffer";
+		break;
+	default:
+		type = "Unknown type";
+	}
+
+	return sysfs_emit(buf, "%d: %s\n", ldev->dmb_type, type);
+}
+
+static ssize_t dmb_type_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	unsigned int dmb_type;
+	int ret;
+
+	ret = kstrtouint(buf, 0, &dmb_type);
+	if (ret)
+		return ret;
+
+	if (dmb_type != SMC_LO_DMB_PHYS &&
+	    dmb_type != SMC_LO_DMB_VIRT)
+		return -EINVAL;
+
+	smc_lo_dmb_type = dmb_type; /* re-activate to take effect */
+	return count;
+}
+static DEVICE_ATTR_RW(dmb_type);
 static struct attribute *smc_lo_attrs[] = {
 	&dev_attr_active.attr,
+	&dev_attr_dmb_type.attr,
 	&dev_attr_xfer_bytes.attr,
 	&dev_attr_dmbs_cnt.attr,
 	NULL,
@@ -170,8 +214,7 @@  static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
 {
 	struct smc_lo_dmb_node *dmb_node, *tmp_node;
 	struct smc_lo_dev *ldev = smcd->priv;
-	int sba_idx, order, rc;
-	struct page *pages;
+	int sba_idx, rc;
 
 	/* check space for new dmb */
 	for_each_clear_bit(sba_idx, ldev->sba_idx_mask, SMC_LO_MAX_DMBS) {
@@ -188,16 +231,27 @@  static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
 	}
 
 	dmb_node->sba_idx = sba_idx;
-	order = get_order(dmb->dmb_len);
-	pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
-			    __GFP_NOMEMALLOC | __GFP_COMP |
-			    __GFP_NORETRY | __GFP_ZERO,
-			    order);
-	if (!pages) {
-		rc = -ENOMEM;
-		goto err_node;
+	if (ldev->dmb_type == SMC_LO_DMB_PHYS) {
+		struct page *pages;
+		int order;
+
+		order = get_order(dmb->dmb_len);
+		pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
+				    __GFP_NOMEMALLOC | __GFP_COMP |
+				    __GFP_NORETRY | __GFP_ZERO,
+				    order);
+		if (!pages) {
+			rc = -ENOMEM;
+			goto err_node;
+		}
+		dmb_node->cpu_addr = (void *)page_address(pages);
+	} else {
+		dmb_node->cpu_addr = vzalloc(dmb->dmb_len);
+		if (!dmb_node->cpu_addr) {
+			rc = -ENOMEM;
+			goto err_node;
+		}
 	}
-	dmb_node->cpu_addr = (void *)page_address(pages);
 	dmb_node->len = dmb->dmb_len;
 	dmb_node->dma_addr = SMC_DMA_ADDR_INVALID;
 
@@ -251,7 +305,7 @@  static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
 	write_unlock(&ldev->dmb_ht_lock);
 
 	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
-	kfree(dmb_node->cpu_addr);
+	kvfree(dmb_node->cpu_addr);
 	kfree(dmb_node);
 	SMC_LO_STAT_DMBS_DEC(ldev);
 
@@ -396,6 +450,7 @@  static int smcd_lo_register_dev(struct smc_lo_dev *ldev)
 	ldev->smcd = smcd;
 	smcd->priv = ldev;
 	smc_ism_set_v2_capable();
+	ldev->dmb_type = smc_lo_dmb_type;
 	mutex_lock(&smcd_dev_list.mutex);
 	list_add(&smcd->list, &smcd_dev_list.list);
 	mutex_unlock(&smcd_dev_list.mutex);
@@ -419,6 +474,7 @@  static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
 	mutex_unlock(&smcd_dev_list.mutex);
 	kfree(smcd->conn);
 	kfree(smcd);
+	ldev->dmb_type = smc_lo_dmb_type;
 	smc_lo_clear_stats(ldev);
 }
 
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
index d4572ca42f08..8ee5c6805fc4 100644
--- a/net/smc/smc_loopback.h
+++ b/net/smc/smc_loopback.h
@@ -23,6 +23,11 @@ 
 #define SMC_LO_DMBS_HASH_BITS	12
 #define SMC_LO_CHID		0xFFFF
 
+enum {
+	SMC_LO_DMB_PHYS,
+	SMC_LO_DMB_VIRT,
+};
+
 struct smc_lo_dmb_node {
 	struct hlist_node list;
 	u64 token;
@@ -41,6 +46,7 @@  struct smc_lo_dev {
 	struct smcd_dev *smcd;
 	struct device dev;
 	u8 active;
+	u8 dmb_type;
 	u16 chid;
 	struct smcd_gid local_gid;
 	struct smc_lo_dev_stats64 __percpu *stats;