Message ID | 20240514182050.20931-3-ronak.doshi@broadcom.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | vmxnet3: upgrade to version 9 | expand |
On Tue, May 14, 2024 at 11:20:47AM -0700, Ronak Doshi wrote: > This patch enhances vmxnet3 to support latency measurement. > This support will help to track the latency in packet processing > between guest virtual nic driver and host. For this purpose, we > introduce a new timestamp ring in vmxnet3 which will be per Tx/Rx > queue. This ring will be used to carry timestamp of the packets > which will be used to calculate the latency. > > Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> > Acked-by: Guolin Yang <guolin.yang@broadcom.com> ... > index b3f3136cc8be..74cb63e3d311 100644 > --- a/drivers/net/vmxnet3/vmxnet3_drv.c > +++ b/drivers/net/vmxnet3/vmxnet3_drv.c > @@ -143,6 +143,29 @@ vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter) > netif_stop_subqueue(adapter->netdev, (tq - adapter->tx_queue)); > } > > +static u64 > +vmxnet3_get_cycles(int pmc) > +{ > + u32 low, high; > + > + asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); > + return (low | ((u_int64_t)high << 32)); > +} Hi Ronak, This seems to open-code the rdpmc macro. And it also seems to exclude compilation of this driver other than for x86. This seems undesirable as, in general, networking drivers are supposed to be architecture independent. I'd say, doubly so, for software devices. Moreover, rdpmc outside of x86 architecture-specific code seems highly unusual to me. So I wonder if there is a better approach to the problem at hand. If not, I would suggest making this feature optional and only compiled for x86. That might mean factoring it out into a different file. I'm unsure. If not, I think the driver's Kconfig needs to be updated to reflect that it can only be compiled for x86. ...
Hi Ronak, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Ronak-Doshi/vmxnet3-prepare-for-version-9-changes/20240515-023833 base: net-next/main patch link: https://lore.kernel.org/r/20240514182050.20931-3-ronak.doshi%40broadcom.com patch subject: [PATCH net-next 2/4] vmxnet3: add latency measurement support in vmxnet3 config: csky-randconfig-r081-20240516 (https://download.01.org/0day-ci/archive/20240516/202405160624.WdmYFkNm-lkp@intel.com/config) compiler: csky-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405160624.WdmYFkNm-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202405160624.WdmYFkNm-lkp@intel.com/ All errors (new ones prefixed by >>): In function 'vmxnet3_get_cycles', inlined from 'vmxnet3_rq_rx_complete' at drivers/net/vmxnet3/vmxnet3_drv.c:1675:25: >> drivers/net/vmxnet3/vmxnet3_drv.c:151:9: error: impossible constraint in 'asm' 151 | asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); | ^~~ vim +/asm +151 drivers/net/vmxnet3/vmxnet3_drv.c 145 146 static u64 147 vmxnet3_get_cycles(int pmc) 148 { 149 u32 low, high; 150 > 151 asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); 152 return (low | ((u_int64_t)high << 32)); 153 } 154
Hi Ronak, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Ronak-Doshi/vmxnet3-prepare-for-version-9-changes/20240515-023833 base: net-next/main patch link: https://lore.kernel.org/r/20240514182050.20931-3-ronak.doshi%40broadcom.com patch subject: [PATCH net-next 2/4] vmxnet3: add latency measurement support in vmxnet3 config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20240516/202405160926.GplTaou0-lkp@intel.com/config) compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project d3455f4ddd16811401fa153298fadd2f59f6914e) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405160926.GplTaou0-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202405160926.GplTaou0-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from include/linux/elf.h:6: In file included from arch/s390/include/asm/elf.h:173: In file included from arch/s390/include/asm/mmu_context.h:11: In file included from arch/s390/include/asm/pgalloc.h:18: In file included from include/linux/mm.h:2210: include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 508 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 509 | item]; | ~~~~ include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 515 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 516 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 527 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 528 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 536 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 537 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/vmxnet3/vmxnet3_drv.c:28: In file included from include/net/ip6_checksum.h:27: In file included from include/net/ip.h:22: In file included from include/linux/ip.h:16: In file included from include/linux/skbuff.h:28: In file included from include/linux/dma-mapping.h:11: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:78: include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 547 | val = __raw_readb(PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 560 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); | ~~~~~~~~~~ ^ include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu' 37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x)) | ^ include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ In file included from drivers/net/vmxnet3/vmxnet3_drv.c:28: In file included from include/net/ip6_checksum.h:27: In file included from include/net/ip.h:22: In file included from include/linux/ip.h:16: In file included from include/linux/skbuff.h:28: In file included from include/linux/dma-mapping.h:11: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:78: include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 573 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); | ~~~~~~~~~~ ^ include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu' 35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x)) | ^ include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32' 115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x)) | ^ In file included from drivers/net/vmxnet3/vmxnet3_drv.c:28: In file included from include/net/ip6_checksum.h:27: In file included from include/net/ip.h:22: In file included from include/linux/ip.h:16: In file included from include/linux/skbuff.h:28: In file included from include/linux/dma-mapping.h:11: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:78: include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 584 | __raw_writeb(value, PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 594 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 604 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 692 | readsb(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 700 | readsw(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 708 | readsl(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 717 | writesb(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 726 | writesw(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 735 | writesl(PCI_IOBASE + addr, buffer, count); | ~~~~~~~~~~ ^ >> drivers/net/vmxnet3/vmxnet3_drv.c:151:51: error: invalid input constraint 'c' in asm 151 | asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); | ^ drivers/net/vmxnet3/vmxnet3_drv.c:3973:46: warning: shift count >= width of type [-Wshift-count-overflow] 3973 | err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); | ^~~~~~~~~~~~~~~~ include/linux/dma-mapping.h:77:54: note: expanded from macro 'DMA_BIT_MASK' 77 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) | ^ ~~~ 18 warnings and 1 error generated. vim +/c +151 drivers/net/vmxnet3/vmxnet3_drv.c 145 146 static u64 147 vmxnet3_get_cycles(int pmc) 148 { 149 u32 low, high; 150 > 151 asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); 152 return (low | ((u_int64_t)high << 32)); 153 } 154
Hi Ronak, kernel test robot noticed the following build warnings: [auto build test WARNING on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Ronak-Doshi/vmxnet3-prepare-for-version-9-changes/20240515-023833 base: net-next/main patch link: https://lore.kernel.org/r/20240514182050.20931-3-ronak.doshi%40broadcom.com patch subject: [PATCH net-next 2/4] vmxnet3: add latency measurement support in vmxnet3 config: loongarch-allmodconfig (https://download.01.org/0day-ci/archive/20240516/202405161123.iWtQjhIw-lkp@intel.com/config) compiler: loongarch64-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405161123.iWtQjhIw-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202405161123.iWtQjhIw-lkp@intel.com/ All warnings (new ones prefixed by >>): In function 'vmxnet3_get_cycles', inlined from 'vmxnet3_rq_rx_complete' at drivers/net/vmxnet3/vmxnet3_drv.c:1675:25: >> drivers/net/vmxnet3/vmxnet3_drv.c:151:9: warning: 'asm' operand 2 probably does not match constraints 151 | asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); | ^~~ drivers/net/vmxnet3/vmxnet3_drv.c:151:9: error: impossible constraint in 'asm' In function 'vmxnet3_get_cycles', inlined from 'vmxnet3_tq_xmit.isra' at drivers/net/vmxnet3/vmxnet3_drv.c:1316:28: >> drivers/net/vmxnet3/vmxnet3_drv.c:151:9: warning: 'asm' operand 2 probably does not match constraints 151 | asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); | ^~~ vim +/asm +151 drivers/net/vmxnet3/vmxnet3_drv.c 145 146 static u64 147 vmxnet3_get_cycles(int pmc) 148 { 149 u32 low, high; 150 > 151 asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); 152 return (low | ((u_int64_t)high << 32)); 153 } 154
> On Wed, May 15, 2024 at 3:46 AM Simon Horman <horms@kernel.org> wrote: > If not, I would suggest making this feature optional and only compiled > for x86. That might mean factoring it out into a different file. I'm > unsure. I can move the rdpmc code under #if defined(__i386__) || defined(__x86_64__) so that it will be no-op for other architectures. Will that be fine?
On Mon, May 20, 2024 at 12:19:43PM -0700, Ronak Doshi wrote: > > On Wed, May 15, 2024 at 3:46 AM Simon Horman <horms@kernel.org> wrote: > > If not, I would suggest making this feature optional and only compiled > > for x86. That might mean factoring it out into a different file. I'm > > unsure. > I can move the rdpmc code under #if defined(__i386__) || > defined(__x86_64__) so that it will be no-op for other architectures. > Will that be fine? Hi Ronak, I think that it would be an good improvement, as long as the result is a working driver for other architectures. Perhaps #ifdef CONFIG_X86 can be used as the guard. I also would suggest using rdpmc helpers if possible.
diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h index 67387bb7aa24..dcf1cf8e7a86 100644 --- a/drivers/net/vmxnet3/vmxnet3_defs.h +++ b/drivers/net/vmxnet3/vmxnet3_defs.h @@ -80,6 +80,8 @@ enum { #define VMXNET3_IO_TYPE(addr) ((addr) >> 24) #define VMXNET3_IO_REG(addr) ((addr) & 0xFFFFFF) +#define VMXNET3_PMC_PSEUDO_TSC 0x10003 + enum { VMXNET3_CMD_FIRST_SET = 0xCAFE0000, VMXNET3_CMD_ACTIVATE_DEV = VMXNET3_CMD_FIRST_SET, @@ -123,6 +125,7 @@ enum { VMXNET3_CMD_GET_RESERVED4, VMXNET3_CMD_GET_MAX_CAPABILITIES, VMXNET3_CMD_GET_DCR0_REG, + VMXNET3_CMD_GET_TSRING_DESC_SIZE, }; /* @@ -254,6 +257,24 @@ struct Vmxnet3_RxDesc { #define VMXNET3_RCD_HDR_INNER_SHIFT 13 +struct Vmxnet3TSInfo { + u64 tsData:56; + u64 tsType:4; + u64 tsi:1; //bit to indicate to set ts + u64 pad:3; + u64 pad2; +}; + +struct Vmxnet3_TxTSDesc { + struct Vmxnet3TSInfo ts; + u64 pad[14]; +}; + +struct Vmxnet3_RxTSDesc { + struct Vmxnet3TSInfo ts; + u64 pad[14]; +}; + struct Vmxnet3_RxCompDesc { #ifdef __BIG_ENDIAN_BITFIELD u32 ext2:1; @@ -427,6 +448,13 @@ union Vmxnet3_GenericDesc { #define VMXNET3_RXDATA_DESC_SIZE_ALIGN 64 #define VMXNET3_RXDATA_DESC_SIZE_MASK (VMXNET3_RXDATA_DESC_SIZE_ALIGN - 1) +/* Rx TS Ring buffer size must be a multiple of 64 bytes */ +#define VMXNET3_RXTS_DESC_SIZE_ALIGN 64 +#define VMXNET3_RXTS_DESC_SIZE_MASK (VMXNET3_RXTS_DESC_SIZE_ALIGN - 1) +/* Tx TS Ring buffer size must be a multiple of 64 bytes */ +#define VMXNET3_TXTS_DESC_SIZE_ALIGN 64 +#define VMXNET3_TXTS_DESC_SIZE_MASK (VMXNET3_TXTS_DESC_SIZE_ALIGN - 1) + /* Max ring size */ #define VMXNET3_TX_RING_MAX_SIZE 4096 #define VMXNET3_TC_RING_MAX_SIZE 4096 @@ -439,6 +467,9 @@ union Vmxnet3_GenericDesc { #define VMXNET3_RXDATA_DESC_MAX_SIZE 2048 +#define VMXNET3_TXTS_DESC_MAX_SIZE 256 +#define VMXNET3_RXTS_DESC_MAX_SIZE 256 + /* a list of reasons for queue stop */ enum { @@ -546,6 +577,24 @@ struct Vmxnet3_RxQueueConf { }; +struct Vmxnet3_LatencyConf { + u16 sampleRate; + u16 pad; +}; + +struct Vmxnet3_TxQueueTSConf { + __le64 txTSRingBasePA; + __le16 txTSRingDescSize; /* size of tx timestamp ring buffer */ + u16 pad; + struct Vmxnet3_LatencyConf latencyConf; +}; + +struct Vmxnet3_RxQueueTSConf { + __le64 rxTSRingBasePA; + __le16 rxTSRingDescSize; /* size of rx timestamp ring buffer */ + u16 pad[3]; +}; + enum vmxnet3_intr_mask_mode { VMXNET3_IMM_AUTO = 0, VMXNET3_IMM_ACTIVE = 1, @@ -679,7 +728,8 @@ struct Vmxnet3_TxQueueDesc { /* Driver read after a GET command */ struct Vmxnet3_QueueStatus status; struct UPT1_TxStats stats; - u8 _pad[88]; /* 128 aligned */ + struct Vmxnet3_TxQueueTSConf tsConf; + u8 _pad[72]; /* 128 aligned */ }; @@ -689,7 +739,8 @@ struct Vmxnet3_RxQueueDesc { /* Driver read after a GET commad */ struct Vmxnet3_QueueStatus status; struct UPT1_RxStats stats; - u8 __pad[88]; /* 128 aligned */ + struct Vmxnet3_RxQueueTSConf tsConf; + u8 __pad[72]; /* 128 aligned */ }; struct Vmxnet3_SetPolling { diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index b3f3136cc8be..74cb63e3d311 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -143,6 +143,29 @@ vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter) netif_stop_subqueue(adapter->netdev, (tq - adapter->tx_queue)); } +static u64 +vmxnet3_get_cycles(int pmc) +{ + u32 low, high; + + asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); + return (low | ((u_int64_t)high << 32)); +} + +static bool +vmxnet3_apply_timestamp(struct vmxnet3_tx_queue *tq, u16 rate) +{ + if (rate > 0) { + if (tq->tsPktCount == 1) { + if (rate != 1) + tq->tsPktCount = rate; + return true; + } + tq->tsPktCount--; + } + return false; +} + /* Check if capability is supported by UPT device or * UPT is even requested */ @@ -498,6 +521,12 @@ vmxnet3_tq_destroy(struct vmxnet3_tx_queue *tq, tq->data_ring.base, tq->data_ring.basePA); tq->data_ring.base = NULL; } + if (tq->ts_ring.base) { + dma_free_coherent(&adapter->pdev->dev, + tq->tx_ring.size * tq->tx_ts_desc_size, + tq->ts_ring.base, tq->ts_ring.basePA); + tq->ts_ring.base = NULL; + } if (tq->comp_ring.base) { dma_free_coherent(&adapter->pdev->dev, tq->comp_ring.size * sizeof(struct Vmxnet3_TxCompDesc), @@ -535,6 +564,11 @@ vmxnet3_tq_init(struct vmxnet3_tx_queue *tq, memset(tq->data_ring.base, 0, tq->data_ring.size * tq->txdata_desc_size); + if (tq->ts_ring.base) { + memset(tq->ts_ring.base, 0, + tq->tx_ring.size * tq->tx_ts_desc_size); + } + /* reset the tx comp ring contents to 0 and reset comp ring states */ memset(tq->comp_ring.base, 0, tq->comp_ring.size * sizeof(struct Vmxnet3_TxCompDesc)); @@ -573,6 +607,18 @@ vmxnet3_tq_create(struct vmxnet3_tx_queue *tq, goto err; } + if (tq->tx_ts_desc_size != 0) { + tq->ts_ring.base = dma_alloc_coherent(&adapter->pdev->dev, + tq->tx_ring.size * tq->tx_ts_desc_size, + &tq->ts_ring.basePA, GFP_KERNEL); + if (!tq->ts_ring.base) { + netdev_err(adapter->netdev, "failed to allocate tx ts ring\n"); + tq->tx_ts_desc_size = 0; + } + } else { + tq->ts_ring.base = NULL; + } + tq->comp_ring.base = dma_alloc_coherent(&adapter->pdev->dev, tq->comp_ring.size * sizeof(struct Vmxnet3_TxCompDesc), &tq->comp_ring.basePA, GFP_KERNEL); @@ -861,6 +907,11 @@ vmxnet3_map_pkt(struct sk_buff *skb, struct vmxnet3_tx_ctx *ctx, /* set the last buf_info for the pkt */ tbi->skb = skb; tbi->sop_idx = ctx->sop_txd - tq->tx_ring.base; + if (tq->tx_ts_desc_size != 0) { + ctx->ts_txd = (struct Vmxnet3_TxTSDesc *)((u8 *)tq->ts_ring.base + + tbi->sop_idx * tq->tx_ts_desc_size); + ctx->ts_txd->ts.tsi = 0; + } return 0; } @@ -968,7 +1019,7 @@ vmxnet3_parse_hdr(struct sk_buff *skb, struct vmxnet3_tx_queue *tq, skb_headlen(skb)); } - if (skb->len <= VMXNET3_HDR_COPY_SIZE) + if (skb->len <= tq->txdata_desc_size) ctx->copy_size = skb->len; /* make sure headers are accessible directly */ @@ -1259,6 +1310,14 @@ vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq, gdesc->txd.tci = skb_vlan_tag_get(skb); } + if (tq->tx_ts_desc_size != 0 && + adapter->latencyConf->sampleRate != 0) { + if (vmxnet3_apply_timestamp(tq, adapter->latencyConf->sampleRate)) { + ctx.ts_txd->ts.tsData = vmxnet3_get_cycles(VMXNET3_PMC_PSEUDO_TSC); + ctx.ts_txd->ts.tsi = 1; + } + } + /* Ensure that the write to (&gdesc->txd)->gen will be observed after * all other writes to &gdesc->txd. */ @@ -1608,6 +1667,15 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, skip_page_frags = false; ctx->skb = rbi->skb; + if (rq->rx_ts_desc_size != 0 && rcd->ext2) { + struct Vmxnet3_RxTSDesc *ts_rxd; + + ts_rxd = (struct Vmxnet3_RxTSDesc *)((u8 *)rq->ts_ring.base + + idx * rq->rx_ts_desc_size); + ts_rxd->ts.tsData = vmxnet3_get_cycles(VMXNET3_PMC_PSEUDO_TSC); + ts_rxd->ts.tsi = 1; + } + rxDataRingUsed = VMXNET3_RX_DATA_RING(adapter, rcd->rqID); len = rxDataRingUsed ? rcd->len : rbi->len; @@ -2007,6 +2075,13 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq, rq->data_ring.base = NULL; } + if (rq->ts_ring.base) { + dma_free_coherent(&adapter->pdev->dev, + rq->rx_ring[0].size * rq->rx_ts_desc_size, + rq->ts_ring.base, rq->ts_ring.basePA); + rq->ts_ring.base = NULL; + } + if (rq->comp_ring.base) { dma_free_coherent(&adapter->pdev->dev, rq->comp_ring.size * sizeof(struct Vmxnet3_RxCompDesc), @@ -2090,6 +2165,11 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq, } vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter); + if (rq->ts_ring.base) { + memset(rq->ts_ring.base, 0, + rq->rx_ring[0].size * rq->rx_ts_desc_size); + } + /* reset the comp ring */ rq->comp_ring.next2proc = 0; memset(rq->comp_ring.base, 0, rq->comp_ring.size * @@ -2160,6 +2240,21 @@ vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter) rq->data_ring.desc_size = 0; } + if (rq->rx_ts_desc_size != 0) { + sz = rq->rx_ring[0].size * rq->rx_ts_desc_size; + rq->ts_ring.base = + dma_alloc_coherent(&adapter->pdev->dev, sz, + &rq->ts_ring.basePA, + GFP_KERNEL); + if (!rq->ts_ring.base) { + netdev_err(adapter->netdev, + "rx ts ring will be disabled\n"); + rq->rx_ts_desc_size = 0; + } + } else { + rq->ts_ring.base = NULL; + } + sz = rq->comp_ring.size * sizeof(struct Vmxnet3_RxCompDesc); rq->comp_ring.base = dma_alloc_coherent(&adapter->pdev->dev, sz, &rq->comp_ring.basePA, @@ -2759,6 +2854,8 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter) struct Vmxnet3_DSDevReadExt *devReadExt = &shared->devReadExt; struct Vmxnet3_TxQueueConf *tqc; struct Vmxnet3_RxQueueConf *rqc; + struct Vmxnet3_TxQueueTSConf *tqtsc; + struct Vmxnet3_RxQueueTSConf *rqtsc; int i; memset(shared, 0, sizeof(*shared)); @@ -2815,6 +2912,11 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter) tqc->compRingSize = cpu_to_le32(tq->comp_ring.size); tqc->ddLen = cpu_to_le32(0); tqc->intrIdx = tq->comp_ring.intr_idx; + if (VMXNET3_VERSION_GE_9(adapter)) { + tqtsc = &adapter->tqd_start[i].tsConf; + tqtsc->txTSRingBasePA = cpu_to_le64(tq->ts_ring.basePA); + tqtsc->txTSRingDescSize = cpu_to_le16(tq->tx_ts_desc_size); + } } /* rx queue settings */ @@ -2837,6 +2939,11 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter) rqc->rxDataRingDescSize = cpu_to_le16(rq->data_ring.desc_size); } + if (VMXNET3_VERSION_GE_9(adapter)) { + rqtsc = &adapter->rqd_start[i].tsConf; + rqtsc->rxTSRingBasePA = cpu_to_le64(rq->ts_ring.basePA); + rqtsc->rxTSRingDescSize = cpu_to_le16(rq->rx_ts_desc_size); + } } #ifdef VMXNET3_RSS @@ -3299,6 +3406,8 @@ vmxnet3_create_queues(struct vmxnet3_adapter *adapter, u32 tx_ring_size, tq->stopped = true; tq->adapter = adapter; tq->qid = i; + tq->tx_ts_desc_size = adapter->tx_ts_desc_size; + tq->tsPktCount = 1; err = vmxnet3_tq_create(tq, adapter); /* * Too late to change num_tx_queues. We cannot do away with @@ -3320,6 +3429,7 @@ vmxnet3_create_queues(struct vmxnet3_adapter *adapter, u32 tx_ring_size, rq->shared = &adapter->rqd_start[i].ctrl; rq->adapter = adapter; rq->data_ring.desc_size = rxdata_desc_size; + rq->rx_ts_desc_size = adapter->rx_ts_desc_size; err = vmxnet3_rq_create(rq, adapter); if (err) { if (i == 0) { @@ -3361,14 +3471,15 @@ vmxnet3_open(struct net_device *netdev) if (VMXNET3_VERSION_GE_3(adapter)) { unsigned long flags; u16 txdata_desc_size; + u32 ret; spin_lock_irqsave(&adapter->cmd_lock, flags); VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_TXDATA_DESC_SIZE); - txdata_desc_size = VMXNET3_READ_BAR1_REG(adapter, - VMXNET3_REG_CMD); + ret = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD); spin_unlock_irqrestore(&adapter->cmd_lock, flags); + txdata_desc_size = ret & 0xffff; if ((txdata_desc_size < VMXNET3_TXDATA_DESC_MIN_SIZE) || (txdata_desc_size > VMXNET3_TXDATA_DESC_MAX_SIZE) || (txdata_desc_size & VMXNET3_TXDATA_DESC_SIZE_MASK)) { @@ -3377,10 +3488,42 @@ vmxnet3_open(struct net_device *netdev) } else { adapter->txdata_desc_size = txdata_desc_size; } + if (VMXNET3_VERSION_GE_9(adapter)) + adapter->rxdata_desc_size = (ret >> 16) & 0xffff; } else { adapter->txdata_desc_size = sizeof(struct Vmxnet3_TxDataDesc); } + if (VMXNET3_VERSION_GE_9(adapter)) { + unsigned long flags; + u16 tx_ts_desc_size = 0; + u16 rx_ts_desc_size = 0; + u32 ret; + + spin_lock_irqsave(&adapter->cmd_lock, flags); + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, + VMXNET3_CMD_GET_TSRING_DESC_SIZE); + ret = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD); + spin_unlock_irqrestore(&adapter->cmd_lock, flags); + if (ret > 0) { + tx_ts_desc_size = (ret & 0xff); + rx_ts_desc_size = ((ret >> 16) & 0xff); + } + if (tx_ts_desc_size > VMXNET3_TXTS_DESC_MAX_SIZE || + tx_ts_desc_size & VMXNET3_TXTS_DESC_SIZE_MASK) { + tx_ts_desc_size = 0; + } + if (rx_ts_desc_size > VMXNET3_RXTS_DESC_MAX_SIZE || + rx_ts_desc_size & VMXNET3_RXTS_DESC_SIZE_MASK) { + rx_ts_desc_size = 0; + } + adapter->tx_ts_desc_size = tx_ts_desc_size; + adapter->rx_ts_desc_size = rx_ts_desc_size; + } else { + adapter->tx_ts_desc_size = 0; + adapter->rx_ts_desc_size = 0; + } + err = vmxnet3_create_queues(adapter, adapter->tx_ring_size, adapter->rx_ring_size, @@ -3992,6 +4135,9 @@ vmxnet3_probe_device(struct pci_dev *pdev, } adapter->rqd_start = (struct Vmxnet3_RxQueueDesc *)(adapter->tqd_start + adapter->num_tx_queues); + if (VMXNET3_VERSION_GE_9(adapter)) { + adapter->latencyConf = &adapter->tqd_start->tsConf.latencyConf; + } adapter->pm_conf = dma_alloc_coherent(&adapter->pdev->dev, sizeof(struct Vmxnet3_PMConf), diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h index 2926dfe8514f..68358e71526c 100644 --- a/drivers/net/vmxnet3/vmxnet3_int.h +++ b/drivers/net/vmxnet3/vmxnet3_int.h @@ -193,6 +193,11 @@ struct vmxnet3_tx_data_ring { dma_addr_t basePA; }; +struct vmxnet3_tx_ts_ring { + struct Vmxnet3_TxTSDesc *base; + dma_addr_t basePA; +}; + #define VMXNET3_MAP_NONE 0 #define VMXNET3_MAP_SINGLE BIT(0) #define VMXNET3_MAP_PAGE BIT(1) @@ -245,6 +250,7 @@ struct vmxnet3_tx_ctx { u32 copy_size; /* # of bytes copied into the data ring */ union Vmxnet3_GenericDesc *sop_txd; union Vmxnet3_GenericDesc *eop_txd; + struct Vmxnet3_TxTSDesc *ts_txd; }; struct vmxnet3_tx_queue { @@ -254,6 +260,7 @@ struct vmxnet3_tx_queue { struct vmxnet3_cmd_ring tx_ring; struct vmxnet3_tx_buf_info *buf_info; struct vmxnet3_tx_data_ring data_ring; + struct vmxnet3_tx_ts_ring ts_ring; struct vmxnet3_comp_ring comp_ring; struct Vmxnet3_TxQueueCtrl *shared; struct vmxnet3_tq_driver_stats stats; @@ -262,6 +269,8 @@ struct vmxnet3_tx_queue { * stopped */ int qid; u16 txdata_desc_size; + u16 tx_ts_desc_size; + u16 tsPktCount; } ____cacheline_aligned; enum vmxnet3_rx_buf_type { @@ -309,6 +318,11 @@ struct vmxnet3_rx_data_ring { u16 desc_size; }; +struct vmxnet3_rx_ts_ring { + struct Vmxnet3_RxTSDesc *base; + dma_addr_t basePA; +}; + struct vmxnet3_rx_queue { char name[IFNAMSIZ + 8]; /* To identify interrupt */ struct vmxnet3_adapter *adapter; @@ -316,6 +330,7 @@ struct vmxnet3_rx_queue { struct vmxnet3_cmd_ring rx_ring[2]; struct vmxnet3_rx_data_ring data_ring; struct vmxnet3_comp_ring comp_ring; + struct vmxnet3_rx_ts_ring ts_ring; struct vmxnet3_rx_ctx rx_ctx; u32 qid; /* rqID in RCD for buffer from 1st ring */ u32 qid2; /* rqID in RCD for buffer from 2nd ring */ @@ -325,6 +340,7 @@ struct vmxnet3_rx_queue { struct vmxnet3_rq_driver_stats stats; struct page_pool *page_pool; struct xdp_rxq_info xdp_rxq; + u16 rx_ts_desc_size; } ____cacheline_aligned; #define VMXNET3_DEVICE_MAX_TX_QUEUES 32 @@ -434,6 +450,10 @@ struct vmxnet3_adapter { u16 rx_prod_offset; u16 rx_prod2_offset; struct bpf_prog __rcu *xdp_bpf_prog; + struct Vmxnet3_LatencyConf *latencyConf; + /* Size of buffer in the ts ring */ + u16 tx_ts_desc_size; + u16 rx_ts_desc_size; }; #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \