From patchwork Mon Sep 10 16:49:10 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10594537
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH v2 for-next 1/3] IB/hfi1: Move rvt_cq_wc struct into uapi directory
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn, Kamenee Arumugam
Date: Mon, 10 Sep 2018 09:49:10 -0700
Message-ID: <20180910164905.15907.47183.stgit@scvm10.sc.intel.com>
In-Reply-To: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com>
References: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com>

From: Kamenee Arumugam

The rvt_cq_wc struct elements are shared between rdmavt and the
providers, but the struct does not live in the uapi directory. As per
the comment in the link below, the hfi1 driver should not be using a
kernel-only header for a structure that is shared with user space;
such structures belong in a uapi header.

https://patchwork.kernel.org/patch/10323739/

Move the rvt_cq_wc struct into the rvt-abi.h header in the uapi
directory. Create an rvt_k_cq_wc kernel struct to separate it from the
user version.
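The shape of the change: the old rvt_cq_wc union that overlaid uqueue[]
and kqueue[] is split into a user struct and a kernel struct, and every
reader now dispatches on cq->ip. A minimal sketch of that dispatch,
assuming the rvt_cq fields the diff below introduces (these helpers are
illustrative; the patch adds equivalent ones to hfi1's qp.c):

/*
 * Illustrative only: after this patch a CQ carries either a user ring
 * (cq->queue, mmap'ed, struct ib_uverbs_wc entries) or a kernel ring
 * (cq->kqueue, struct ib_wc entries); cq->ip is the discriminator.
 */
static u32 cq_head(struct rvt_cq *cq)
{
    return cq->ip ? cq->queue->head : cq->kqueue->head;
}

static u32 cq_tail(struct rvt_cq *cq)
{
    return cq->ip ? cq->queue->tail : cq->kqueue->tail;
}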
Signed-off-by: Kamenee Arumugam
Reviewed-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
---
 drivers/infiniband/hw/hfi1/qp.c   |  32 ++
 drivers/infiniband/sw/rdmavt/cq.c | 190 ++++++++++++++++++++++++-------------
 include/rdma/rdmavt_cq.h          |  10 +-
 include/uapi/rdma/rvt-abi.h       |  74 ++++++++++++++
 4 files changed, 230 insertions(+), 76 deletions(-)
 create mode 100644 include/uapi/rdma/rvt-abi.h

diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
index 9b1e84a..352fef6 100644
--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -543,6 +543,34 @@ static int qp_idle(struct rvt_qp *qp)
 }
 
 /**
+ * ib_cq_tail - Return tail index of cq buffer
+ * @send_cq - The cq for send
+ *
+ * This is called in qp_iter_print to get tail
+ * of cq buffer.
+ */
+static u32 ib_cq_tail(struct ib_cq *send_cq)
+{
+    return ibcq_to_rvtcq(send_cq)->ip ?
+           ibcq_to_rvtcq(send_cq)->queue->tail :
+           ibcq_to_rvtcq(send_cq)->kqueue->tail;
+}
+
+/**
+ * ib_cq_head - Return head index of cq buffer
+ * @send_cq - The cq for send
+ *
+ * This is called in qp_iter_print to get head
+ * of cq buffer.
+ */
+static u32 ib_cq_head(struct ib_cq *send_cq)
+{
+    return ibcq_to_rvtcq(send_cq)->ip ?
+           ibcq_to_rvtcq(send_cq)->queue->head :
+           ibcq_to_rvtcq(send_cq)->kqueue->head;
+}
+
+/**
  * qp_iter_print - print the qp information to seq_file
  * @s: the seq_file to emit the qp information on
  * @iter: the iterator for the qp hash list
@@ -602,8 +630,8 @@ void qp_iter_print(struct seq_file *s, struct rvt_qp_iter *iter)
            sde ? sde->this_idx : 0,
            send_context,
            send_context ? send_context->sw_index : 0,
-           ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->head,
-           ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->tail,
+           ib_cq_head(qp->ibqp.send_cq),
+           ib_cq_tail(qp->ibqp.send_cq),
            qp->pid,
            qp->s_state,
            qp->s_ack_state,
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index 4f1544a..6406a08 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 /*
  * Copyright(c) 2016 - 2018 Intel Corporation.
  *
@@ -63,19 +64,33 @@
  */
 void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
 {
-    struct rvt_cq_wc *wc;
+    struct ib_uverbs_wc *uqueue = NULL;
+    struct ib_wc *kqueue = NULL;
+    struct rvt_cq_wc *u_wc = NULL;
+    struct rvt_k_cq_wc *k_wc = NULL;
     unsigned long flags;
     u32 head;
     u32 next;
+    u32 tail;
 
     spin_lock_irqsave(&cq->lock, flags);
 
+    if (cq->ip) {
+        u_wc = cq->queue;
+        uqueue = &u_wc->uqueue[0];
+        head = u_wc->head;
+        tail = u_wc->tail;
+    } else {
+        k_wc = cq->kqueue;
+        kqueue = &k_wc->kqueue[0];
+        head = k_wc->head;
+        tail = k_wc->tail;
+    }
+
     /*
-     * Note that the head pointer might be writable by user processes.
-     * Take care to verify it is a sane value.
+     * Note that the head pointer might be writable by
+     * user processes.Take care to verify it is a sane value.
     */
-    wc = cq->queue;
-    head = wc->head;
     if (head >= (unsigned)cq->ibcq.cqe) {
         head = cq->ibcq.cqe;
         next = 0;
@@ -83,7 +98,7 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
         next = head + 1;
     }
 
-    if (unlikely(next == wc->tail)) {
+    if (unlikely(next == tail)) {
         spin_unlock_irqrestore(&cq->lock, flags);
         if (cq->ibcq.event_handler) {
             struct ib_event ev;
@@ -96,27 +111,28 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
         return;
     }
     trace_rvt_cq_enter(cq, entry, head);
-    if (cq->ip) {
-        wc->uqueue[head].wr_id = entry->wr_id;
-        wc->uqueue[head].status = entry->status;
-        wc->uqueue[head].opcode = entry->opcode;
-        wc->uqueue[head].vendor_err = entry->vendor_err;
-        wc->uqueue[head].byte_len = entry->byte_len;
-        wc->uqueue[head].ex.imm_data = entry->ex.imm_data;
-        wc->uqueue[head].qp_num = entry->qp->qp_num;
-        wc->uqueue[head].src_qp = entry->src_qp;
-        wc->uqueue[head].wc_flags = entry->wc_flags;
-        wc->uqueue[head].pkey_index = entry->pkey_index;
-        wc->uqueue[head].slid = ib_lid_cpu16(entry->slid);
-        wc->uqueue[head].sl = entry->sl;
-        wc->uqueue[head].dlid_path_bits = entry->dlid_path_bits;
-        wc->uqueue[head].port_num = entry->port_num;
+    if (uqueue) {
+        uqueue[head].wr_id = entry->wr_id;
+        uqueue[head].status = entry->status;
+        uqueue[head].opcode = entry->opcode;
+        uqueue[head].vendor_err = entry->vendor_err;
+        uqueue[head].byte_len = entry->byte_len;
+        uqueue[head].ex.imm_data = entry->ex.imm_data;
+        uqueue[head].qp_num = entry->qp->qp_num;
+        uqueue[head].src_qp = entry->src_qp;
+        uqueue[head].wc_flags = entry->wc_flags;
+        uqueue[head].pkey_index = entry->pkey_index;
+        uqueue[head].slid = ib_lid_cpu16(entry->slid);
+        uqueue[head].sl = entry->sl;
+        uqueue[head].dlid_path_bits = entry->dlid_path_bits;
+        uqueue[head].port_num = entry->port_num;
         /* Make sure entry is written before the head index. */
         smp_wmb();
+        u_wc->head = next;
     } else {
-        wc->kqueue[head] = *entry;
+        kqueue[head] = *entry;
+        k_wc->head = next;
     }
-    wc->head = next;
 
     if (cq->notify == IB_CQ_NEXT_COMP ||
         (cq->notify == IB_CQ_SOLICITED &&
@@ -183,7 +199,8 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
 {
     struct rvt_dev_info *rdi = ib_to_rvt(ibdev);
     struct rvt_cq *cq;
-    struct rvt_cq_wc *wc;
+    struct rvt_cq_wc *u_wc = NULL;
+    struct rvt_k_cq_wc *k_wc = NULL;
     struct ib_cq *ret;
     u32 sz;
     unsigned int entries = attr->cqe;
@@ -212,17 +229,22 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
      * We need to use vmalloc() in order to support mmap and large
      * numbers of entries.
      */
-    sz = sizeof(*wc);
-    if (udata && udata->outlen >= sizeof(__u64))
-        sz += sizeof(struct ib_uverbs_wc) * (entries + 1);
-    else
-        sz += sizeof(struct ib_wc) * (entries + 1);
-    wc = udata ?
-        vmalloc_user(sz) :
-        vzalloc_node(sz, rdi->dparms.node);
-    if (!wc) {
-        ret = ERR_PTR(-ENOMEM);
-        goto bail_cq;
+    if (udata && udata->outlen >= sizeof(__u64)) {
+        sz = sizeof(struct ib_uverbs_wc) * (entries + 1);
+        sz += sizeof(*u_wc);
+        u_wc = vmalloc_user(sz);
+        if (!u_wc) {
+            ret = ERR_PTR(-ENOMEM);
+            goto bail_cq;
+        }
+    } else {
+        sz = sizeof(struct ib_wc) * (entries + 1);
+        sz += sizeof(*k_wc);
+        k_wc = vzalloc_node(sz, rdi->dparms.node);
+        if (!k_wc) {
+            ret = ERR_PTR(-ENOMEM);
+            goto bail_cq;
+        }
     }
 
     /*
@@ -232,7 +254,7 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
     if (udata && udata->outlen >= sizeof(__u64)) {
         int err;
 
-        cq->ip = rvt_create_mmap_info(rdi, sz, context, wc);
+        cq->ip = rvt_create_mmap_info(rdi, sz, context, u_wc);
         if (!cq->ip) {
             ret = ERR_PTR(-ENOMEM);
             goto bail_wc;
@@ -279,7 +301,10 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
     cq->notify = RVT_CQ_NONE;
     spin_lock_init(&cq->lock);
     INIT_WORK(&cq->comptask, send_complete);
-    cq->queue = wc;
+    if (u_wc)
+        cq->queue = u_wc;
+    else
+        cq->kqueue = k_wc;
 
     ret = &cq->ibcq;
 
@@ -289,7 +314,8 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
 bail_ip:
     kfree(cq->ip);
 bail_wc:
-    vfree(wc);
+    vfree(u_wc);
+    vfree(k_wc);
 bail_cq:
     kfree(cq);
 done:
@@ -346,9 +372,15 @@ int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags)
     if (cq->notify != IB_CQ_NEXT_COMP)
         cq->notify = notify_flags & IB_CQ_SOLICITED_MASK;
 
-    if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) &&
-        cq->queue->head != cq->queue->tail)
-        ret = 1;
+    if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) {
+        if (cq->queue) {
+            if (cq->queue->head != cq->queue->tail)
+                ret = 1;
+        } else {
+            if (cq->kqueue->head != cq->kqueue->tail)
+                ret = 1;
+        }
+    }
 
     spin_unlock_irqrestore(&cq->lock, flags);
 
@@ -364,12 +396,14 @@ int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags)
 int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 {
     struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
-    struct rvt_cq_wc *old_wc;
-    struct rvt_cq_wc *wc;
     u32 head, tail, n;
     int ret;
     u32 sz;
     struct rvt_dev_info *rdi = cq->rdi;
+    struct rvt_cq_wc *u_wc = NULL;
+    struct rvt_cq_wc *old_u_wc = NULL;
+    struct rvt_k_cq_wc *k_wc = NULL;
+    struct rvt_k_cq_wc *old_k_wc = NULL;
 
     if (cqe < 1 || cqe > rdi->dparms.props.max_cqe)
         return -EINVAL;
@@ -377,17 +411,19 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
     /*
      * Need to use vmalloc() if we want to support large #s of entries.
      */
-    sz = sizeof(*wc);
-    if (udata && udata->outlen >= sizeof(__u64))
-        sz += sizeof(struct ib_uverbs_wc) * (cqe + 1);
-    else
-        sz += sizeof(struct ib_wc) * (cqe + 1);
-    wc = udata ?
-        vmalloc_user(sz) :
-        vzalloc_node(sz, rdi->dparms.node);
-    if (!wc)
-        return -ENOMEM;
-
+    if (udata && udata->outlen >= sizeof(__u64)) {
+        sz = sizeof(struct ib_uverbs_wc) * (cqe + 1);
+        sz += sizeof(*u_wc);
+        u_wc = vmalloc_user(sz);
+        if (!u_wc)
+            return -ENOMEM;
+    } else {
+        sz = sizeof(struct ib_wc) * (cqe + 1);
+        sz += sizeof(*k_wc);
+        k_wc = vzalloc_node(sz, rdi->dparms.node);
+        if (!k_wc)
+            return -ENOMEM;
+    }
     /* Check that we can write the offset to mmap. */
     if (udata && udata->outlen >= sizeof(__u64)) {
         __u64 offset = 0;
@@ -402,11 +438,18 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
      * Make sure head and tail are sane since they
      * might be user writable.
     */
-    old_wc = cq->queue;
-    head = old_wc->head;
+    if (udata && udata->outlen >= sizeof(__u64)) {
+        old_u_wc = cq->queue;
+        head = old_u_wc->head;
+        tail = old_u_wc->tail;
+    } else {
+        old_k_wc = cq->kqueue;
+        head = old_k_wc->head;
+        tail = old_k_wc->tail;
+    }
+
     if (head > (u32)cq->ibcq.cqe)
         head = (u32)cq->ibcq.cqe;
-    tail = old_wc->tail;
     if (tail > (u32)cq->ibcq.cqe)
         tail = (u32)cq->ibcq.cqe;
     if (head < tail)
@@ -418,27 +461,36 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
         goto bail_unlock;
     }
     for (n = 0; tail != head; n++) {
-        if (cq->ip)
-            wc->uqueue[n] = old_wc->uqueue[tail];
+        if (udata && udata->outlen >= sizeof(__u64))
+            u_wc->uqueue[n] = old_u_wc->uqueue[tail];
         else
-            wc->kqueue[n] = old_wc->kqueue[tail];
+            k_wc->kqueue[n] = old_k_wc->kqueue[tail];
         if (tail == (u32)cq->ibcq.cqe)
             tail = 0;
         else
             tail++;
     }
     cq->ibcq.cqe = cqe;
-    wc->head = n;
-    wc->tail = 0;
-    cq->queue = wc;
+    if (u_wc) {
+        u_wc->head = n;
+        u_wc->tail = 0;
+        cq->queue = u_wc;
+    } else {
+        k_wc->head = n;
+        k_wc->tail = 0;
+        cq->kqueue = k_wc;
+    }
     spin_unlock_irq(&cq->lock);
 
-    vfree(old_wc);
+    if (u_wc)
+        vfree(old_u_wc);
+    else
+        vfree(old_k_wc);
 
     if (cq->ip) {
         struct rvt_mmap_info *ip = cq->ip;
 
-        rvt_update_mmap_info(rdi, ip, sz, wc);
+        rvt_update_mmap_info(rdi, ip, sz, u_wc);
 
         /*
          * Return the offset to mmap.
@@ -462,7 +514,9 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 bail_unlock:
     spin_unlock_irq(&cq->lock);
 bail_free:
-    vfree(wc);
+    vfree(u_wc);
+    vfree(k_wc);
+
     return ret;
 }
 
@@ -480,7 +534,7 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 {
     struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
-    struct rvt_cq_wc *wc;
+    struct rvt_k_cq_wc *wc;
     unsigned long flags;
     int npolled;
     u32 tail;
@@ -491,7 +545,7 @@ int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 
     spin_lock_irqsave(&cq->lock, flags);
 
-    wc = cq->queue;
+    wc = cq->kqueue;
     tail = wc->tail;
     if (tail > (u32)cq->ibcq.cqe)
         tail = (u32)cq->ibcq.cqe;
diff --git a/include/rdma/rdmavt_cq.h b/include/rdma/rdmavt_cq.h
index 75dc65c..9f25be0 100644
--- a/include/rdma/rdmavt_cq.h
+++ b/include/rdma/rdmavt_cq.h
@@ -59,20 +59,17 @@
  * notifications are armed.
  */
 #define RVT_CQ_NONE      (IB_CQ_NEXT_COMP + 1)
+#include <rdma/rvt-abi.h>
 
 /*
  * This structure is used to contain the head pointer, tail pointer,
  * and completion queue entries as a single memory allocation so
  * it can be mmap'ed into user space.
  */
-struct rvt_cq_wc {
+struct rvt_k_cq_wc {
     u32 head;               /* index of next entry to fill */
     u32 tail;               /* index of next ib_poll_cq() entry */
-    union {
-        /* these are actually size ibcq.cqe + 1 */
-        struct ib_uverbs_wc uqueue[0];
-        struct ib_wc kqueue[0];
-    };
+    struct ib_wc kqueue[0];
 };
 
 /*
@@ -88,6 +85,7 @@ struct rvt_cq {
     struct rvt_dev_info *rdi;
     struct rvt_cq_wc *queue;
     struct rvt_mmap_info *ip;
+    struct rvt_k_cq_wc *kqueue;
 };
 
 static inline struct rvt_cq *ibcq_to_rvtcq(struct ib_cq *ibcq)
diff --git a/include/uapi/rdma/rvt-abi.h b/include/uapi/rdma/rvt-abi.h
new file mode 100644
index 0000000..8cff1df
--- /dev/null
+++ b/include/uapi/rdma/rvt-abi.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Copyright(c) 2015 - 2018 Intel Corporation.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This file contains defines, structures, etc. that are used
+ * to communicate between kernel and user code.
+ */
+
+#ifndef RVT_ABI_USER_H
+#define RVT_ABI_USER_H
+
+#include <rdma/ib_user_verbs.h>
+#ifndef RDMA_ATOMIC_UAPI
+#define RDMA_ATOMIC_UAPI(_type, _name) _type _name
+#endif
+/*
+ * This structure is used to contain the head pointer, tail pointer,
+ * and completion queue entries as a single memory allocation so
+ * it can be mmap'ed into user space.
+ */
+struct rvt_cq_wc {
+    /* index of next entry to fill */
+    RDMA_ATOMIC_UAPI(u32, head);
+    /* index of next ib_poll_cq() entry */
+    RDMA_ATOMIC_UAPI(u32, tail);
+
+    /* these are actually size ibcq.cqe + 1 */
+    struct ib_uverbs_wc uqueue[0];
+};
+
+#endif /* RVT_ABI_USER_H */
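The RDMA_ATOMIC_UAPI() fallback above expands to a plain field, so
kernel builds are unchanged; the macro exists so that a consumer can
supply a stronger definition before including the header. A minimal
self-contained sketch of that idea, assuming a hypothetical C11
user-space consumer (the _Atomic mapping and the demo struct are
illustrative, not part of the patch):

/* Hypothetical consumer: give RDMA_ATOMIC_UAPI() real atomic semantics
 * so loads and stores of head/tail on a shared mmap'ed ring are atomic.
 */
#include <stdatomic.h>
#include <stdint.h>

#define RDMA_ATOMIC_UAPI(_type, _name) _Atomic _type _name

struct demo_ring {
    RDMA_ATOMIC_UAPI(uint32_t, head); /* expands to: _Atomic uint32_t head; */
    RDMA_ATOMIC_UAPI(uint32_t, tail);
};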
From patchwork Mon Sep 10 16:49:19 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10594535
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH v2 for-next 2/3] IB/{hfi1, rdmavt, qib}: Fit completions into single aligned cache-line
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn, Sebastian Sanchez,
 Kamenee Arumugam
Date: Mon, 10 Sep 2018 09:49:19 -0700
Message-ID: <20180910164915.15907.82884.stgit@scvm10.sc.intel.com>
In-Reply-To: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com>
References: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com>

From: Sebastian Sanchez

The struct ib_wc uses two cache-lines per completion, and it is
unaligned. The structure used to fit within one cache-line, but it was
expanded by the fields added in the following patches:

dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
c865f24628b9 ("IB/core: Add rdma_network_type to wc")

These new fields are only needed for ethernet and for HCAs that don't
provide the network type needed to search for the proper GID in the
GID table. Since the structure now spans two cache-lines, more
cache-lines are dirtied per work completion entry.

Create a kernel-only rvt_wc structure that fits in a single aligned
cache-line. This reduces the cache-lines used per completion and
eliminates any cache-line push-pull by aligning the size to a
cache-line.

Cache-aligning the new kernel completion queue expands struct
rvt_cq_wc, breaking the ABI for the user completion queue. Therefore,
decouple the kernel completion queue from struct rvt_cq_wc to prevent
this.
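Whether struct rvt_wc actually fits a single cache-line can be checked
at compile time. A minimal sketch, assuming the rvt_wc definition from
the diff below and the usual 64-byte L1_CACHE_BYTES; the check itself
is illustrative and not part of the patch:

#include <linux/build_bug.h>
#include <linux/cache.h>

/*
 * ____cacheline_aligned_in_smp rounds sizeof(struct rvt_wc) up to a
 * multiple of the cache-line size, so the single-line claim holds only
 * if the raw fields fit within L1_CACHE_BYTES.
 */
static inline void rvt_wc_fits_one_cacheline(void)
{
    BUILD_BUG_ON(sizeof(struct rvt_wc) > L1_CACHE_BYTES);
}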
Reviewed-by: Mike Marciniszyn
Reviewed-by: Sebastian Sanchez
Signed-off-by: Sebastian Sanchez
Signed-off-by: Kamenee Arumugam
Signed-off-by: Dennis Dalessandro
---
 drivers/infiniband/hw/hfi1/rc.c         |  2 +
 drivers/infiniband/hw/hfi1/ruc.c        |  2 +
 drivers/infiniband/hw/hfi1/uc.c         |  2 +
 drivers/infiniband/hw/hfi1/ud.c         |  4 +-
 drivers/infiniband/hw/qib/qib_rc.c      |  2 +
 drivers/infiniband/hw/qib/qib_ruc.c     |  2 +
 drivers/infiniband/hw/qib/qib_uc.c      |  2 +
 drivers/infiniband/hw/qib/qib_ud.c      |  4 +-
 drivers/infiniband/sw/rdmavt/cq.c       | 57 ++++++++++++++++++++-----------
 drivers/infiniband/sw/rdmavt/qp.c       |  6 ++-
 drivers/infiniband/sw/rdmavt/trace_cq.h |  6 ++-
 include/rdma/rdmavt_cq.h                | 29 +++++++++++++++-
 include/rdma/rdmavt_qp.h                |  2 +
 13 files changed, 81 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 9bd63ab..975d91a 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -2041,7 +2041,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
     u32 hdrsize = packet->hlen;
     u32 psn = ib_bth_get_psn(packet->ohdr);
     u32 pad = packet->pad;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 pmtu = qp->pmtu;
     int diff;
     struct ib_reth *reth;
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c
index 5f56f3c..019713a 100644
--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -173,7 +173,7 @@ static void ruc_loopback(struct rvt_qp *sqp)
     struct rvt_swqe *wqe;
     struct rvt_sge *sge;
     unsigned long flags;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u64 sdata;
     atomic64_t *maddr;
     enum ib_wc_status send_status;
diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c
index e254dce..5b5e057 100644
--- a/drivers/infiniband/hw/hfi1/uc.c
+++ b/drivers/infiniband/hw/hfi1/uc.c
@@ -312,7 +312,7 @@ void hfi1_uc_rcv(struct hfi1_packet *packet)
     u32 hdrsize = packet->hlen;
     u32 psn;
     u32 pad = packet->pad;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 pmtu = qp->pmtu;
     struct ib_reth *reth;
     int ret;
diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c
index 70d39fc..8a4785e 100644
--- a/drivers/infiniband/hw/hfi1/ud.c
+++ b/drivers/infiniband/hw/hfi1/ud.c
@@ -79,7 +79,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
     unsigned long flags;
     struct rvt_sge_state ssge;
     struct rvt_sge *sge;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 length;
     enum ib_qp_type sqptype, dqptype;
 
@@ -867,7 +867,7 @@ static int opa_smp_check(struct hfi1_ibport *ibp, u16 pkey, u8 sc5,
 void hfi1_ud_rcv(struct hfi1_packet *packet)
 {
     u32 hdrsize = packet->hlen;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 src_qp;
     u16 pkey;
     int mgmt_pkey_idx = -1;
diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index f35fdeb..a121d2c 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -1744,7 +1744,7 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct ib_header *hdr,
     u32 hdrsize;
     u32 psn;
     u32 pad;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 pmtu = qp->pmtu;
     int diff;
     struct ib_reth *reth;
diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c
index f8a7de7..1230c37 100644
--- a/drivers/infiniband/hw/qib/qib_ruc.c
+++ b/drivers/infiniband/hw/qib/qib_ruc.c
@@ -191,7 +191,7 @@ static void qib_ruc_loopback(struct rvt_qp *sqp)
     struct rvt_swqe *wqe;
     struct rvt_sge *sge;
     unsigned long flags;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u64 sdata;
     atomic64_t *maddr;
     enum ib_wc_status send_status;
diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c
index 3e54bc1..fa7dd3a 100644
--- a/drivers/infiniband/hw/qib/qib_uc.c
+++ b/drivers/infiniband/hw/qib/qib_uc.c
@@ -242,7 +242,7 @@ void qib_uc_rcv(struct qib_ibport *ibp, struct ib_header *hdr,
     u32 hdrsize;
     u32 psn;
     u32 pad;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 pmtu = qp->pmtu;
     struct ib_reth *reth;
     int ret;
diff --git a/drivers/infiniband/hw/qib/qib_ud.c b/drivers/infiniband/hw/qib/qib_ud.c
index f8d029a..580ebc8 100644
--- a/drivers/infiniband/hw/qib/qib_ud.c
+++ b/drivers/infiniband/hw/qib/qib_ud.c
@@ -58,7 +58,7 @@ static void qib_ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
     unsigned long flags;
     struct rvt_sge_state ssge;
     struct rvt_sge *sge;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 length;
     enum ib_qp_type sqptype, dqptype;
 
@@ -434,7 +434,7 @@ void qib_ud_rcv(struct qib_ibport *ibp, struct ib_header *hdr,
     int opcode;
     u32 hdrsize;
     u32 pad;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     u32 qkey;
     u32 src_qp;
     u16 dlid;
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index 6406a08..e729335 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -62,10 +62,10 @@
  *
  * This may be called with qp->s_lock held.
  */
-void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
+void rvt_cq_enter(struct rvt_cq *cq, struct rvt_wc *entry, bool solicited)
 {
     struct ib_uverbs_wc *uqueue = NULL;
-    struct ib_wc *kqueue = NULL;
+    struct rvt_wc *kqueue = NULL;
     struct rvt_cq_wc *u_wc = NULL;
     struct rvt_k_cq_wc *k_wc = NULL;
     unsigned long flags;
@@ -230,6 +230,8 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
      * numbers of entries.
      */
     if (udata && udata->outlen >= sizeof(__u64)) {
+        int err;
+
         sz = sizeof(struct ib_uverbs_wc) * (entries + 1);
         sz += sizeof(*u_wc);
         u_wc = vmalloc_user(sz);
@@ -237,23 +239,11 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
             ret = ERR_PTR(-ENOMEM);
             goto bail_cq;
         }
-    } else {
-        sz = sizeof(struct ib_wc) * (entries + 1);
-        sz += sizeof(*k_wc);
-        k_wc = vzalloc_node(sz, rdi->dparms.node);
-        if (!k_wc) {
-            ret = ERR_PTR(-ENOMEM);
-            goto bail_cq;
-        }
-    }
-
-    /*
-     * Return the address of the WC as the offset to mmap.
-     * See rvt_mmap() for details.
-     */
-    if (udata && udata->outlen >= sizeof(__u64)) {
-        int err;
 
+        /*
+         * Return the address of the WC as the offset to mmap.
+         * See rvt_mmap() for details.
+         */
         cq->ip = rvt_create_mmap_info(rdi, sz, context, u_wc);
         if (!cq->ip) {
             ret = ERR_PTR(-ENOMEM);
             goto bail_wc;
@@ -266,6 +256,14 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev,
             ret = ERR_PTR(err);
             goto bail_ip;
         }
+    } else {
+        sz = sizeof(struct rvt_wc) * (entries + 1);
+        sz += sizeof(*k_wc);
+        k_wc = vzalloc_node(sz, rdi->dparms.node);
+        if (!k_wc) {
+            ret = ERR_PTR(-ENOMEM);
+            goto bail_cq;
+        }
     }
 
     spin_lock_irq(&rdi->n_cqs_lock);
@@ -343,6 +341,7 @@ int rvt_destroy_cq(struct ib_cq *ibcq)
         kref_put(&cq->ip->ref, rvt_release_mmap_info);
     else
         vfree(cq->queue);
+    vfree(cq->kqueue);
     kfree(cq);
 
     return 0;
@@ -418,7 +417,7 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
         if (!u_wc)
             return -ENOMEM;
     } else {
-        sz = sizeof(struct ib_wc) * (cqe + 1);
+        sz = sizeof(struct rvt_wc) * (cqe + 1);
         sz += sizeof(*k_wc);
         k_wc = vzalloc_node(sz, rdi->dparms.node);
         if (!k_wc)
@@ -520,6 +519,24 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
     return ret;
 }
 
+static void copy_rvt_wc_to_ib_wc(struct ib_wc *ibwc, struct rvt_wc *rvtwc)
+{
+    ibwc->wr_id = rvtwc->wr_id;
+    ibwc->status = rvtwc->status;
+    ibwc->opcode = rvtwc->opcode;
+    ibwc->vendor_err = rvtwc->vendor_err;
+    ibwc->byte_len = rvtwc->byte_len;
+    ibwc->qp = rvtwc->qp;
+    ibwc->ex.invalidate_rkey = rvtwc->ex.invalidate_rkey;
+    ibwc->src_qp = rvtwc->src_qp;
+    ibwc->wc_flags = rvtwc->wc_flags;
+    ibwc->slid = rvtwc->slid;
+    ibwc->pkey_index = rvtwc->pkey_index;
+    ibwc->sl = rvtwc->sl;
+    ibwc->dlid_path_bits = rvtwc->dlid_path_bits;
+    ibwc->port_num = rvtwc->port_num;
+}
+
 /**
  * rvt_poll_cq - poll for work completion entries
  * @ibcq: the completion queue to poll
@@ -554,7 +571,7 @@ int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
             break;
         /* The kernel doesn't need a RMB since it has the lock. */
         trace_rvt_cq_poll(cq, &wc->kqueue[tail], npolled);
-        *entry = wc->kqueue[tail];
+        copy_rvt_wc_to_ib_wc(entry, &wc->kqueue[tail]);
         if (tail >= cq->ibcq.cqe)
             tail = 0;
         else
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 5ce403c..e2cc827 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1049,7 +1049,7 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd,
  */
 int rvt_error_qp(struct rvt_qp *qp, enum ib_wc_status err)
 {
-    struct ib_wc wc;
+    struct rvt_wc wc;
     int ret = 0;
     struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device);
 
@@ -1573,7 +1573,7 @@ int rvt_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
             return -ENOMEM;
         }
         if (unlikely(qp_err_flush)) {
-            struct ib_wc wc;
+            struct rvt_wc wc;
 
             memset(&wc, 0, sizeof(wc));
             wc.qp = &qp->ibqp;
@@ -1996,7 +1996,7 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
 static int init_sge(struct rvt_qp *qp, struct rvt_rwqe *wqe)
 {
     int i, j, ret;
-    struct ib_wc wc;
+    struct rvt_wc wc;
     struct rvt_lkey_table *rkt;
     struct rvt_pd *pd;
     struct rvt_sge_state *ss;
diff --git a/drivers/infiniband/sw/rdmavt/trace_cq.h b/drivers/infiniband/sw/rdmavt/trace_cq.h
index df8e1ad..832308d 100644
--- a/drivers/infiniband/sw/rdmavt/trace_cq.h
+++ b/drivers/infiniband/sw/rdmavt/trace_cq.h
@@ -109,7 +109,7 @@ DECLARE_EVENT_CLASS(
     rvt_cq_entry_template,
-    TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+    TP_PROTO(struct rvt_cq *cq, struct rvt_wc *wc, u32 idx),
     TP_ARGS(cq, wc, idx),
     TP_STRUCT__entry(
         RDI_DEV_ENTRY(cq->rdi)
@@ -143,12 +143,12 @@ DEFINE_EVENT(
     rvt_cq_entry_template, rvt_cq_enter,
-    TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+    TP_PROTO(struct rvt_cq *cq, struct rvt_wc *wc, u32 idx),
     TP_ARGS(cq, wc, idx));
 
 DEFINE_EVENT(
     rvt_cq_entry_template, rvt_cq_poll,
-    TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+    TP_PROTO(struct rvt_cq *cq, struct rvt_wc *wc, u32 idx),
     TP_ARGS(cq, wc, idx));
 
 #endif /* __RVT_TRACE_CQ_H */
diff --git a/include/rdma/rdmavt_cq.h b/include/rdma/rdmavt_cq.h
index 9f25be0..a5d3e3c 100644
--- a/include/rdma/rdmavt_cq.h
+++ b/include/rdma/rdmavt_cq.h
@@ -62,6 +62,30 @@
 #include <rdma/rvt-abi.h>
 
 /*
+ * If any fields within struct rvt_wc change, the function
+ * copy_rvt_wc_to_ib_wc() should be updated.
+ */
+struct rvt_wc {
+    u64 wr_id;
+    enum ib_wc_status status;
+    enum ib_wc_opcode opcode;
+    u32 vendor_err;
+    u32 byte_len;
+    struct ib_qp *qp;
+    union {
+        __be32 imm_data;
+        u32 invalidate_rkey;
+    } ex;
+    u32 src_qp;
+    int wc_flags;
+    u32 slid;
+    u16 pkey_index;
+    u8 sl;
+    u8 dlid_path_bits;
+    u8 port_num; /* valid only for DR SMPs on switches */
+} ____cacheline_aligned_in_smp;
+
+/*
  * This structure is used to contain the head pointer, tail pointer,
  * and completion queue entries as a single memory allocation so
  * it can be mmap'ed into user space.
@@ -69,7 +93,8 @@
 struct rvt_k_cq_wc {
     u32 head;               /* index of next entry to fill */
     u32 tail;               /* index of next ib_poll_cq() entry */
-    struct ib_wc kqueue[0];
+    /* this is actually size ibcq.cqe + 1 */
+    struct rvt_wc kqueue[0];
 };
 
 /*
@@ -93,6 +118,6 @@ struct rvt_cq {
     return container_of(ibcq, struct rvt_cq, ibcq);
 }
 
-void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited);
+void rvt_cq_enter(struct rvt_cq *cq, struct rvt_wc *entry, bool solicited);
 
 #endif          /* DEF_RDMAVT_INCCQH */
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index 927f6d5..125db26 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -589,7 +589,7 @@ static inline void rvt_qp_swqe_complete(
     if (!(qp->s_flags & RVT_S_SIGNAL_REQ_WR) ||
         (wqe->wr.send_flags & IB_SEND_SIGNALED) ||
         status != IB_WC_SUCCESS) {
-        struct ib_wc wc;
+        struct rvt_wc wc;
 
         memset(&wc, 0, sizeof(wc));
         wc.wr_id = wqe->wr.wr_id;
Ruhl" , Mike Marciniszyn , Ira Weiny Date: Mon, 10 Sep 2018 09:49:27 -0700 Message-ID: <20180910164924.15907.54127.stgit@scvm10.sc.intel.com> In-Reply-To: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com> References: <20180910164804.15907.64096.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Michael J. Ruhl The post_send() path determines if it should post directly or, schedule the post for later. The current logic is: if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold post the send else schedule This can allow large requests to call the send engine directly. Large requests can potentially produce a large number of packets prior to returning to the caller, blocking the caller from posting more requests, and allowing better parallel processing. Allow the driver(s) more say in this logic (pass call_send to the driver, rather than examining a return value). Update hfi1/qib logic to schedule the send engine if an RC or UC message is larger than the QP MTU size. Reviewed-by: Mike Marciniszyn Reviewed-by: Ira Weiny Signed-off-by: Michael J. Ruhl Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/qp.c | 12 +++++------- drivers/infiniband/hw/hfi1/verbs.h | 3 ++- drivers/infiniband/hw/qib/qib_qp.c | 17 +++++++---------- drivers/infiniband/hw/qib/qib_verbs.h | 3 ++- drivers/infiniband/sw/rdmavt/qp.c | 14 ++++++++------ include/rdma/rdma_vt.h | 10 ++++++++-- 6 files changed, 32 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c index 352fef6..17ccd47 100644 --- a/drivers/infiniband/hw/hfi1/qp.c +++ b/drivers/infiniband/hw/hfi1/qp.c @@ -285,17 +285,13 @@ void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr, * hfi1_check_send_wqe - validate wqe * @qp - The qp * @wqe - The built wqe - * - * validate wqe. This is called - * prior to inserting the wqe into - * the ring but after the wqe has been - * setup. + * @call_send - Determine if the send should be posted or scheduled. 
Reviewed-by: Mike Marciniszyn
Reviewed-by: Ira Weiny
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
---
 drivers/infiniband/hw/hfi1/qp.c       | 12 +++++-------
 drivers/infiniband/hw/hfi1/verbs.h    |  3 ++-
 drivers/infiniband/hw/qib/qib_qp.c    | 17 +++++++----------
 drivers/infiniband/hw/qib/qib_verbs.h |  3 ++-
 drivers/infiniband/sw/rdmavt/qp.c     | 14 ++++++++------
 include/rdma/rdma_vt.h                | 10 ++++++++--
 6 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
index 352fef6..17ccd47 100644
--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -285,17 +285,13 @@ void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
  * hfi1_check_send_wqe - validate wqe
  * @qp - The qp
  * @wqe - The built wqe
- *
- * validate wqe.  This is called
- * prior to inserting the wqe into
- * the ring but after the wqe has been
- * setup.
+ * @call_send - Determine if the send should be posted or scheduled.
  *
  * Returns 0 on success, -EINVAL on failure
  *
  */
 int hfi1_check_send_wqe(struct rvt_qp *qp,
-                        struct rvt_swqe *wqe)
+                        struct rvt_swqe *wqe, bool *call_send)
 {
     struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
     struct rvt_ah *ah;
@@ -305,6 +301,8 @@ int hfi1_check_send_wqe(struct rvt_qp *qp,
     case IB_QPT_UC:
         if (wqe->length > 0x80000000U)
             return -EINVAL;
+        if (wqe->length > qp->pmtu)
+            *call_send = false;
         break;
     case IB_QPT_SMI:
         ah = ibah_to_rvtah(wqe->ud_wr.ah);
@@ -321,7 +319,7 @@ int hfi1_check_send_wqe(struct rvt_qp *qp,
     default:
         break;
     }
-    return wqe->length <= piothreshold;
+    return 0;
 }
 
 /**
diff --git a/drivers/infiniband/hw/hfi1/verbs.h b/drivers/infiniband/hw/hfi1/verbs.h
index a4d0650..269ec33 100644
--- a/drivers/infiniband/hw/hfi1/verbs.h
+++ b/drivers/infiniband/hw/hfi1/verbs.h
@@ -343,7 +343,8 @@ int hfi1_check_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
 void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
                     int attr_mask, struct ib_udata *udata);
 void hfi1_restart_rc(struct rvt_qp *qp, u32 psn, int wait);
-int hfi1_check_send_wqe(struct rvt_qp *qp, struct rvt_swqe *wqe);
+int hfi1_check_send_wqe(struct rvt_qp *qp, struct rvt_swqe *wqe,
+                        bool *call_send);
 
 extern const u32 rc_only_opcode;
 extern const u32 uc_only_opcode;
diff --git a/drivers/infiniband/hw/qib/qib_qp.c b/drivers/infiniband/hw/qib/qib_qp.c
index 344e401..a81905d 100644
--- a/drivers/infiniband/hw/qib/qib_qp.c
+++ b/drivers/infiniband/hw/qib/qib_qp.c
@@ -378,25 +378,22 @@ void qib_flush_qp_waiters(struct rvt_qp *qp)
  * qib_check_send_wqe - validate wr/wqe
  * @qp - The qp
  * @wqe - The built wqe
+ * @call_send - Determine if the send should be posted or scheduled
  *
- * validate wr/wqe.  This is called
- * prior to inserting the wqe into
- * the ring but after the wqe has been
- * setup.
- *
- * Returns 1 to force direct progress, 0 otherwise, -EINVAL on failure
+ * Returns 0 on success, -EINVAL on failure
  */
 int qib_check_send_wqe(struct rvt_qp *qp,
-                       struct rvt_swqe *wqe)
+                       struct rvt_swqe *wqe, bool *call_send)
 {
     struct rvt_ah *ah;
-    int ret = 0;
 
     switch (qp->ibqp.qp_type) {
     case IB_QPT_RC:
     case IB_QPT_UC:
         if (wqe->length > 0x80000000U)
             return -EINVAL;
+        if (wqe->length > qp->pmtu)
+            *call_send = false;
         break;
     case IB_QPT_SMI:
     case IB_QPT_GSI:
@@ -405,12 +402,12 @@ int qib_check_send_wqe(struct rvt_qp *qp,
         if (wqe->length > (1 << ah->log_pmtu))
             return -EINVAL;
         /* progress hint */
-        ret = 1;
+        *call_send = true;
         break;
     default:
         break;
     }
-    return ret;
+    return 0;
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 666613e..3d7b744 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -303,7 +303,8 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct ib_header *hdr,
 
 int qib_check_ah(struct ib_device *ibdev, struct rdma_ah_attr *ah_attr);
 
-int qib_check_send_wqe(struct rvt_qp *qp, struct rvt_swqe *wqe);
+int qib_check_send_wqe(struct rvt_qp *qp, struct rvt_swqe *wqe,
+                       bool *call_send);
 
 struct ib_ah *qib_create_qp0_ah(struct qib_ibport *ibp, u16 dlid);
 
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index e2cc827..8da0ebc 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1718,7 +1718,7 @@ static inline int rvt_qp_is_avail(
  */
 static int rvt_post_one_wr(struct rvt_qp *qp,
                            const struct ib_send_wr *wr,
-                           int *call_send)
+                           bool *call_send)
 {
     struct rvt_swqe *wqe;
     u32 next;
@@ -1825,11 +1825,9 @@ static int rvt_post_one_wr(struct rvt_qp *qp,
 
     /* general part of wqe valid - allow for driver checks */
     if (rdi->driver_f.check_send_wqe) {
-        ret = rdi->driver_f.check_send_wqe(qp, wqe);
+        ret = rdi->driver_f.check_send_wqe(qp, wqe, call_send);
         if (ret < 0)
             goto bail_inval_free;
-        if (ret)
-            *call_send = ret;
     }
 
     log_pmtu = qp->log_pmtu;
@@ -1897,7 +1895,7 @@ int rvt_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
     struct rvt_qp *qp = ibqp_to_rvtqp(ibqp);
     struct rvt_dev_info *rdi = ib_to_rvt(ibqp->device);
     unsigned long flags = 0;
-    int call_send;
+    bool call_send;
     unsigned nreq = 0;
     int err = 0;
 
@@ -1930,7 +1928,11 @@ int rvt_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 bail:
     spin_unlock_irqrestore(&qp->s_hlock, flags);
     if (nreq) {
-        if (call_send)
+        /*
+         * Only call do_send if there is exactly one packet, and the
+         * driver said it was ok.
+         */
+        if (nreq == 1 && call_send)
             rdi->driver_f.do_send(qp);
         else
             rdi->driver_f.schedule_send_no_lock(qp);
diff --git a/include/rdma/rdma_vt.h b/include/rdma/rdma_vt.h
index e79229a..e32facd 100644
--- a/include/rdma/rdma_vt.h
+++ b/include/rdma/rdma_vt.h
@@ -214,8 +214,14 @@ struct rvt_driver_provided {
     void (*schedule_send)(struct rvt_qp *qp);
     void (*schedule_send_no_lock)(struct rvt_qp *qp);
 
-    /* Driver specific work request checking */
-    int (*check_send_wqe)(struct rvt_qp *qp, struct rvt_swqe *wqe);
+    /*
+     * Validate the wqe.  This needs to be done prior to inserting the
+     * wqe into the ring, but after the wqe has been set up.  Allow for
+     * driver specific work request checking by providing a callback.
+     * call_send indicates if the wqe should be posted or scheduled.
+     */
+    int (*check_send_wqe)(struct rvt_qp *qp, struct rvt_swqe *wqe,
+                          bool *call_send);
 
     /*
      * Sometimes rdmavt needs to kick the driver's send progress. That is