From patchwork Wed Dec 5 15:10:39 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10714367
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH for-next v2 1/4] IB/{hfi1, qib}: Cleanup open coded sge sizing
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, "Michael J. Ruhl" , Mike Marciniszyn
Date: Wed, 05 Dec 2018 07:10:39 -0800
Message-ID: <20181205151026.27132.54027.stgit@scvm10.sc.intel.com>
In-Reply-To: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
References: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
User-Agent: StGit/0.17.1-18-g2e886-dirty

From: Michael J. Ruhl

Sge sizing is open coded in several places, which is a maintenance problem. The open coded method is already encapsulated in a helper routine, introduced with commit 1198fcea8a78 ("IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt"). Update all call sites that carry the open coded path to use the helper routine instead.
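For reference, the helper being substituted reduces to the min-of-three computation each call site used to spell out by hand. A sketch reconstructed from the deleted lines below (named *_sketch to flag that it is an illustration, not the rdmavt source):

static inline u32 rvt_get_sge_length_sketch(struct rvt_sge *sge, u32 length)
{
	/* usable length of the current SGE is capped by the caller's
	 * remaining length, then by the SGE's own fragment length
	 */
	u32 len = sge->length;

	if (len > length)
		len = length;
	if (len > sge->sge_length)
		len = sge->sge_length;
	return len;
}

Each hunk below deletes exactly this if-ladder and calls the helper instead.

Reviewed-by: Mike Marciniszyn Signed-off-by: Michael J.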
Ruhl Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/verbs.c | 12 ++---------- drivers/infiniband/hw/qib/qib_ud.c | 6 +----- drivers/infiniband/hw/qib/qib_verbs.c | 18 +++--------------- 3 files changed, 6 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c index 630f38e..55ec4db 100644 --- a/drivers/infiniband/hw/hfi1/verbs.c +++ b/drivers/infiniband/hw/hfi1/verbs.c @@ -553,11 +553,7 @@ static noinline int build_verbs_ulp_payload( int ret = 0; while (length) { - len = ss->sge.length; - if (len > length) - len = length; - if (len > ss->sge.sge_length) - len = ss->sge.sge_length; + len = rvt_get_sge_length(&ss->sge, length); WARN_ON_ONCE(len == 0); ret = sdma_txadd_kvaddr( sde->dd, @@ -914,12 +910,8 @@ int hfi1_verbs_send_pio(struct rvt_qp *qp, struct hfi1_pkt_state *ps, if (ss) { while (len) { void *addr = ss->sge.vaddr; - u32 slen = ss->sge.length; + u32 slen = rvt_get_sge_length(&ss->sge, len); - if (slen > len) - slen = len; - if (slen > ss->sge.sge_length) - slen = ss->sge.sge_length; rvt_update_sge(ss, slen, false); seg_pio_copy_mid(pbuf, addr, slen); len -= slen; diff --git a/drivers/infiniband/hw/qib/qib_ud.c b/drivers/infiniband/hw/qib/qib_ud.c index 4d4c31e..8dab667 100644 --- a/drivers/infiniband/hw/qib/qib_ud.c +++ b/drivers/infiniband/hw/qib/qib_ud.c @@ -172,12 +172,8 @@ static void qib_ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe) ssge.num_sge = swqe->wr.num_sge; sge = &ssge.sge; while (length) { - u32 len = sge->length; + u32 len = rvt_get_sge_length(sge, length); - if (len > length) - len = length; - if (len > sge->sge_length) - len = sge->sge_length; BUG_ON(len == 0); rvt_copy_sge(qp, &qp->r_sge, sge->vaddr, len, true, false); sge->vaddr += len; diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c index 4b0f576..9cbc644 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.c +++ b/drivers/infiniband/hw/qib/qib_verbs.c @@ -144,12 +144,8 @@ static u32 qib_count_sge(struct rvt_sge_state *ss, u32 length) u32 ndesc = 1; /* count the header */ while (length) { - u32 len = sge.length; + u32 len = rvt_get_sge_length(&sge, length); - if (len > length) - len = length; - if (len > sge.sge_length) - len = sge.sge_length; BUG_ON(len == 0); if (((long) sge.vaddr & (sizeof(u32) - 1)) || (len != length && (len & (sizeof(u32) - 1)))) { @@ -187,12 +183,8 @@ static void qib_copy_from_sge(void *data, struct rvt_sge_state *ss, u32 length) struct rvt_sge *sge = &ss->sge; while (length) { - u32 len = sge->length; + u32 len = rvt_get_sge_length(sge, length); - if (len > length) - len = length; - if (len > sge->sge_length) - len = sge->sge_length; BUG_ON(len == 0); memcpy(data, sge->vaddr, len); sge->vaddr += len; @@ -442,13 +434,9 @@ static void copy_io(u32 __iomem *piobuf, struct rvt_sge_state *ss, u32 last; while (1) { - u32 len = ss->sge.length; + u32 len = rvt_get_sge_length(&ss->sge, length); u32 off; - if (len > length) - len = length; - if (len > ss->sge.sge_length) - len = ss->sge.sge_length; BUG_ON(len == 0); /* If the source address is not aligned, try to align it. 
*/ off = (unsigned long)ss->sge.vaddr & (sizeof(u32) - 1);

From patchwork Wed Dec 5 15:10:52 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10714369
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH for-next v2 2/4] IB/hfi1: Move rvt_cq_wc struct into uapi directory
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Kamenee Arumugam
Date: Wed, 05 Dec 2018 07:10:52 -0800
Message-ID: <20181205151044.27132.1576.stgit@scvm10.sc.intel.com>
In-Reply-To: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
References: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
User-Agent: StGit/0.17.1-18-g2e886-dirty

From: Kamenee Arumugam

The rvt_cq_wc struct elements are shared between rdmavt and the providers, but the struct does not live in the uapi directory. As noted in the comment at https://marc.info/?l=linux-rdma&m=152296522708522&w=2, the hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. To fix that, move the rvt_cq_wc struct into the rvt-abi.h header file and create an rvt_k_cq_wc struct for the kernel completion queue.
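For review convenience, the dispatch this patch introduces in rvt_cq_enter() condenses to the following sketch (simplified from the diff below; locking, overflow, and notify handling elided):

static void rvt_cq_enter_sketch(struct rvt_cq *cq, struct ib_wc *entry,
				u32 head, u32 next)
{
	if (cq->ip) {
		/* user-mapped ring: entries use the uapi ib_uverbs_wc layout */
		struct rvt_cq_wc *u_wc = cq->queue;

		u_wc->uqueue[head].wr_id = entry->wr_id;
		/* ... remaining ib_uverbs_wc fields copied individually ... */
		smp_wmb();	/* publish the entry before the head index */
		u_wc->head = next;
	} else {
		/* kernel-only ring: entries can stay as struct ib_wc */
		struct rvt_k_cq_wc *k_wc = cq->kqueue;

		k_wc->kqueue[head] = *entry;
		k_wc->head = next;
	}
}

Moving the head store into each branch also removes the old shared wc->head update that used to follow the if/else.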
Signed-off-by: Kamenee Arumugam Reviewed-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/qp.c | 32 ++++++ drivers/infiniband/sw/rdmavt/cq.c | 189 ++++++++++++++++++++++++------------- include/rdma/rdmavt_cq.h | 10 +- include/uapi/rdma/rvt-abi.h | 31 ++++++ 4 files changed, 186 insertions(+), 76 deletions(-) create mode 100644 include/uapi/rdma/rvt-abi.h diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c index 5344e89..7e6d0a7 100644 --- a/drivers/infiniband/hw/hfi1/qp.c +++ b/drivers/infiniband/hw/hfi1/qp.c @@ -591,6 +591,34 @@ static int qp_idle(struct rvt_qp *qp) } /** + * ib_cq_tail - Return tail index of cq buffer + * @send_cq - The cq for send + * + * This is called in qp_iter_print to get tail + * of cq buffer. + */ +static u32 ib_cq_tail(struct ib_cq *send_cq) +{ + return ibcq_to_rvtcq(send_cq)->ip ? + ibcq_to_rvtcq(send_cq)->queue->tail : + ibcq_to_rvtcq(send_cq)->kqueue->tail; +} + +/** + * ib_cq_head - Return head index of cq buffer + * @send_cq - The cq for send + * + * This is called in qp_iter_print to get head + * of cq buffer. + */ +static u32 ib_cq_head(struct ib_cq *send_cq) +{ + return ibcq_to_rvtcq(send_cq)->ip ? + ibcq_to_rvtcq(send_cq)->queue->head : + ibcq_to_rvtcq(send_cq)->kqueue->head; +} + +/** * qp_iter_print - print the qp information to seq_file * @s: the seq_file to emit the qp information on * @iter: the iterator for the qp hash list @@ -650,8 +678,8 @@ void qp_iter_print(struct seq_file *s, struct rvt_qp_iter *iter) sde ? sde->this_idx : 0, send_context, send_context ? send_context->sw_index : 0, - ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->head, - ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->tail, + ib_cq_head(qp->ibqp.send_cq), + ib_cq_tail(qp->ibqp.send_cq), qp->pid, qp->s_state, qp->s_ack_state, diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c index 4f1544a..31c1d68 100644 --- a/drivers/infiniband/sw/rdmavt/cq.c +++ b/drivers/infiniband/sw/rdmavt/cq.c @@ -63,19 +63,33 @@ */ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited) { - struct rvt_cq_wc *wc; + struct ib_uverbs_wc *uqueue = NULL; + struct ib_wc *kqueue = NULL; + struct rvt_cq_wc *u_wc = NULL; + struct rvt_k_cq_wc *k_wc = NULL; unsigned long flags; u32 head; u32 next; + u32 tail; spin_lock_irqsave(&cq->lock, flags); + if (cq->ip) { + u_wc = cq->queue; + uqueue = &u_wc->uqueue[0]; + head = u_wc->head; + tail = u_wc->tail; + } else { + k_wc = cq->kqueue; + kqueue = &k_wc->kqueue[0]; + head = k_wc->head; + tail = k_wc->tail; + } + /* - * Note that the head pointer might be writable by user processes. - * Take care to verify it is a sane value. + * Note that the head pointer might be writable by + * user processes.Take care to verify it is a sane value. 
*/ - wc = cq->queue; - head = wc->head; if (head >= (unsigned)cq->ibcq.cqe) { head = cq->ibcq.cqe; next = 0; @@ -83,7 +97,7 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited) next = head + 1; } - if (unlikely(next == wc->tail)) { + if (unlikely(next == tail)) { spin_unlock_irqrestore(&cq->lock, flags); if (cq->ibcq.event_handler) { struct ib_event ev; @@ -96,27 +110,28 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited) return; } trace_rvt_cq_enter(cq, entry, head); - if (cq->ip) { - wc->uqueue[head].wr_id = entry->wr_id; - wc->uqueue[head].status = entry->status; - wc->uqueue[head].opcode = entry->opcode; - wc->uqueue[head].vendor_err = entry->vendor_err; - wc->uqueue[head].byte_len = entry->byte_len; - wc->uqueue[head].ex.imm_data = entry->ex.imm_data; - wc->uqueue[head].qp_num = entry->qp->qp_num; - wc->uqueue[head].src_qp = entry->src_qp; - wc->uqueue[head].wc_flags = entry->wc_flags; - wc->uqueue[head].pkey_index = entry->pkey_index; - wc->uqueue[head].slid = ib_lid_cpu16(entry->slid); - wc->uqueue[head].sl = entry->sl; - wc->uqueue[head].dlid_path_bits = entry->dlid_path_bits; - wc->uqueue[head].port_num = entry->port_num; + if (uqueue) { + uqueue[head].wr_id = entry->wr_id; + uqueue[head].status = entry->status; + uqueue[head].opcode = entry->opcode; + uqueue[head].vendor_err = entry->vendor_err; + uqueue[head].byte_len = entry->byte_len; + uqueue[head].ex.imm_data = entry->ex.imm_data; + uqueue[head].qp_num = entry->qp->qp_num; + uqueue[head].src_qp = entry->src_qp; + uqueue[head].wc_flags = entry->wc_flags; + uqueue[head].pkey_index = entry->pkey_index; + uqueue[head].slid = ib_lid_cpu16(entry->slid); + uqueue[head].sl = entry->sl; + uqueue[head].dlid_path_bits = entry->dlid_path_bits; + uqueue[head].port_num = entry->port_num; /* Make sure entry is written before the head index. */ smp_wmb(); + u_wc->head = next; } else { - wc->kqueue[head] = *entry; + kqueue[head] = *entry; + k_wc->head = next; } - wc->head = next; if (cq->notify == IB_CQ_NEXT_COMP || (cq->notify == IB_CQ_SOLICITED && @@ -183,7 +198,8 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev, { struct rvt_dev_info *rdi = ib_to_rvt(ibdev); struct rvt_cq *cq; - struct rvt_cq_wc *wc; + struct rvt_cq_wc *u_wc = NULL; + struct rvt_k_cq_wc *k_wc = NULL; struct ib_cq *ret; u32 sz; unsigned int entries = attr->cqe; @@ -212,17 +228,22 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev, * We need to use vmalloc() in order to support mmap and large * numbers of entries. */ - sz = sizeof(*wc); - if (udata && udata->outlen >= sizeof(__u64)) - sz += sizeof(struct ib_uverbs_wc) * (entries + 1); - else - sz += sizeof(struct ib_wc) * (entries + 1); - wc = udata ? 
- vmalloc_user(sz) : - vzalloc_node(sz, rdi->dparms.node); - if (!wc) { - ret = ERR_PTR(-ENOMEM); - goto bail_cq; + if (udata && udata->outlen >= sizeof(__u64)) { + sz = sizeof(struct ib_uverbs_wc) * (entries + 1); + sz += sizeof(*u_wc); + u_wc = vmalloc_user(sz); + if (!u_wc) { + ret = ERR_PTR(-ENOMEM); + goto bail_cq; + } + } else { + sz = sizeof(struct ib_wc) * (entries + 1); + sz += sizeof(*k_wc); + k_wc = vzalloc_node(sz, rdi->dparms.node); + if (!k_wc) { + ret = ERR_PTR(-ENOMEM); + goto bail_cq; + } } /* @@ -232,7 +253,7 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev, if (udata && udata->outlen >= sizeof(__u64)) { int err; - cq->ip = rvt_create_mmap_info(rdi, sz, context, wc); + cq->ip = rvt_create_mmap_info(rdi, sz, context, u_wc); if (!cq->ip) { ret = ERR_PTR(-ENOMEM); goto bail_wc; @@ -279,7 +300,10 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev, cq->notify = RVT_CQ_NONE; spin_lock_init(&cq->lock); INIT_WORK(&cq->comptask, send_complete); - cq->queue = wc; + if (u_wc) + cq->queue = u_wc; + else + cq->kqueue = k_wc; ret = &cq->ibcq; @@ -289,7 +313,8 @@ struct ib_cq *rvt_create_cq(struct ib_device *ibdev, bail_ip: kfree(cq->ip); bail_wc: - vfree(wc); + vfree(u_wc); + vfree(k_wc); bail_cq: kfree(cq); done: @@ -346,9 +371,15 @@ int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags) if (cq->notify != IB_CQ_NEXT_COMP) cq->notify = notify_flags & IB_CQ_SOLICITED_MASK; - if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) && - cq->queue->head != cq->queue->tail) - ret = 1; + if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) { + if (cq->queue) { + if (cq->queue->head != cq->queue->tail) + ret = 1; + } else { + if (cq->kqueue->head != cq->kqueue->tail) + ret = 1; + } + } spin_unlock_irqrestore(&cq->lock, flags); @@ -364,12 +395,14 @@ int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags) int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) { struct rvt_cq *cq = ibcq_to_rvtcq(ibcq); - struct rvt_cq_wc *old_wc; - struct rvt_cq_wc *wc; u32 head, tail, n; int ret; u32 sz; struct rvt_dev_info *rdi = cq->rdi; + struct rvt_cq_wc *u_wc = NULL; + struct rvt_cq_wc *old_u_wc = NULL; + struct rvt_k_cq_wc *k_wc = NULL; + struct rvt_k_cq_wc *old_k_wc = NULL; if (cqe < 1 || cqe > rdi->dparms.props.max_cqe) return -EINVAL; @@ -377,17 +410,19 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) /* * Need to use vmalloc() if we want to support large #s of entries. */ - sz = sizeof(*wc); - if (udata && udata->outlen >= sizeof(__u64)) - sz += sizeof(struct ib_uverbs_wc) * (cqe + 1); - else - sz += sizeof(struct ib_wc) * (cqe + 1); - wc = udata ? - vmalloc_user(sz) : - vzalloc_node(sz, rdi->dparms.node); - if (!wc) - return -ENOMEM; - + if (udata && udata->outlen >= sizeof(__u64)) { + sz = sizeof(struct ib_uverbs_wc) * (cqe + 1); + sz += sizeof(*u_wc); + u_wc = vmalloc_user(sz); + if (!u_wc) + return -ENOMEM; + } else { + sz = sizeof(struct ib_wc) * (cqe + 1); + sz += sizeof(*k_wc); + k_wc = vzalloc_node(sz, rdi->dparms.node); + if (!k_wc) + return -ENOMEM; + } /* Check that we can write the offset to mmap. */ if (udata && udata->outlen >= sizeof(__u64)) { __u64 offset = 0; @@ -402,11 +437,18 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) * Make sure head and tail are sane since they * might be user writable. 
*/ - old_wc = cq->queue; - head = old_wc->head; + if (u_wc) { + old_u_wc = cq->queue; + head = old_u_wc->head; + tail = old_u_wc->tail; + } else { + old_k_wc = cq->kqueue; + head = old_k_wc->head; + tail = old_k_wc->tail; + } + if (head > (u32)cq->ibcq.cqe) head = (u32)cq->ibcq.cqe; - tail = old_wc->tail; if (tail > (u32)cq->ibcq.cqe) tail = (u32)cq->ibcq.cqe; if (head < tail) @@ -418,27 +460,36 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) goto bail_unlock; } for (n = 0; tail != head; n++) { - if (cq->ip) - wc->uqueue[n] = old_wc->uqueue[tail]; + if (u_wc) + u_wc->uqueue[n] = old_u_wc->uqueue[tail]; else - wc->kqueue[n] = old_wc->kqueue[tail]; + k_wc->kqueue[n] = old_k_wc->kqueue[tail]; if (tail == (u32)cq->ibcq.cqe) tail = 0; else tail++; } cq->ibcq.cqe = cqe; - wc->head = n; - wc->tail = 0; - cq->queue = wc; + if (u_wc) { + u_wc->head = n; + u_wc->tail = 0; + cq->queue = u_wc; + } else { + k_wc->head = n; + k_wc->tail = 0; + cq->kqueue = k_wc; + } spin_unlock_irq(&cq->lock); - vfree(old_wc); + if (u_wc) + vfree(old_u_wc); + else + vfree(old_k_wc); if (cq->ip) { struct rvt_mmap_info *ip = cq->ip; - rvt_update_mmap_info(rdi, ip, sz, wc); + rvt_update_mmap_info(rdi, ip, sz, u_wc); /* * Return the offset to mmap. @@ -462,7 +513,9 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) bail_unlock: spin_unlock_irq(&cq->lock); bail_free: - vfree(wc); + vfree(u_wc); + vfree(k_wc); + return ret; } @@ -480,7 +533,7 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) { struct rvt_cq *cq = ibcq_to_rvtcq(ibcq); - struct rvt_cq_wc *wc; + struct rvt_k_cq_wc *wc; unsigned long flags; int npolled; u32 tail; @@ -491,7 +544,7 @@ int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) spin_lock_irqsave(&cq->lock, flags); - wc = cq->queue; + wc = cq->kqueue; tail = wc->tail; if (tail > (u32)cq->ibcq.cqe) tail = (u32)cq->ibcq.cqe; diff --git a/include/rdma/rdmavt_cq.h b/include/rdma/rdmavt_cq.h index 75dc65c..9f25be0 100644 --- a/include/rdma/rdmavt_cq.h +++ b/include/rdma/rdmavt_cq.h @@ -59,20 +59,17 @@ * notifications are armed. */ #define RVT_CQ_NONE (IB_CQ_NEXT_COMP + 1) +#include /* * This structure is used to contain the head pointer, tail pointer, * and completion queue entries as a single memory allocation so * it can be mmap'ed into user space. */ -struct rvt_cq_wc { +struct rvt_k_cq_wc { u32 head; /* index of next entry to fill */ u32 tail; /* index of next ib_poll_cq() entry */ - union { - /* these are actually size ibcq.cqe + 1 */ - struct ib_uverbs_wc uqueue[0]; - struct ib_wc kqueue[0]; - }; + struct ib_wc kqueue[0]; }; /* @@ -88,6 +85,7 @@ struct rvt_cq { struct rvt_dev_info *rdi; struct rvt_cq_wc *queue; struct rvt_mmap_info *ip; + struct rvt_k_cq_wc *kqueue; }; static inline struct rvt_cq *ibcq_to_rvtcq(struct ib_cq *ibcq) diff --git a/include/uapi/rdma/rvt-abi.h b/include/uapi/rdma/rvt-abi.h new file mode 100644 index 0000000..c9ad6fa --- /dev/null +++ b/include/uapi/rdma/rvt-abi.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ + +/* + * This file contains defines, structures, etc. that are used + * to communicate between kernel and user code. 
+ */
+#ifndef RVT_ABI_USER_H
+#define RVT_ABI_USER_H
+
+#include <linux/types.h>
+#include <rdma/ib_user_verbs.h>
+#ifndef RDMA_ATOMIC_UAPI
+#define RDMA_ATOMIC_UAPI(_type, _name) _type _name
+#endif
+/*
+ * This structure is used to contain the head pointer, tail pointer,
+ * and completion queue entries as a single memory allocation so
+ * it can be mmap'ed into user space.
+ */
+struct rvt_cq_wc {
+	/* index of next entry to fill */
+	RDMA_ATOMIC_UAPI(u32, head);
+	/* index of next ib_poll_cq() entry */
+	RDMA_ATOMIC_UAPI(u32, tail);
+
+	/* these are actually size ibcq.cqe + 1 */
+	struct ib_uverbs_wc uqueue[0];
+};
+
+#endif /* RVT_ABI_USER_H */
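A note on the RDMA_ATOMIC_UAPI() wrapper above: the fallback in this header expands to plain storage for kernel builds, and the guard leaves room for a user-space consumer to substitute an atomic type for the shared head/tail indices. A hypothetical illustration of that intent (the #else branch is an assumption about how user space could define it; it is not part of this patch):

#ifdef __KERNEL__
/* kernel build: plain storage, ordering handled explicitly */
#define RDMA_ATOMIC_UAPI(_type, _name) _type _name
#else
/* user build: C11 atomics for the mmap'ed head/tail indices */
#include <stdatomic.h>
#define RDMA_ATOMIC_UAPI(_type, _name) _Atomic(_type) _name
#endif

Either expansion keeps the structure layout identical, which is the point of placing the struct in uapi.

From patchwork Wed Dec 5 15:11:05 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10714371
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH for-next v2 3/4] IB/hfi1: Move receive work queue struct into uapi directory
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, "Michael J.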
Ruhl" , Mike Marciniszyn , Kamenee Arumugam Date: Wed, 05 Dec 2018 07:11:05 -0800 Message-ID: <20181205151057.27132.1159.stgit@scvm10.sc.intel.com> In-Reply-To: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com> References: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kamenee Arumugam The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory. Create rvt_krwq kernel struct to separate it from the user version. Reviewed-by: Mike Marciniszyn Reviewed-by: Michael J. Ruhl Signed-off-by: Kamenee Arumugam Signed-off-by: Dennis Dalessandro --- include/rdma/rdmavt_qp.h | 26 +------------------------- include/uapi/rdma/rvt-abi.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 29 insertions(+), 25 deletions(-) diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h index cbafb18..1ef17a6 100644 --- a/include/rdma/rdmavt_qp.h +++ b/include/rdma/rdmavt_qp.h @@ -52,6 +52,7 @@ #include #include #include +#include /* * Atomic bit definitions for r_aflags. */ @@ -177,31 +178,6 @@ struct rvt_swqe { struct rvt_sge sg_list[0]; }; -/* - * Receive work request queue entry. - * The size of the sg_list is determined when the QP (or SRQ) is created - * and stored in qp->r_rq.max_sge (or srq->rq.max_sge). - */ -struct rvt_rwqe { - u64 wr_id; - u8 num_sge; - struct ib_sge sg_list[0]; -}; - -/* - * This structure is used to contain the head pointer, tail pointer, - * and receive work queue entries as a single memory allocation so - * it can be mmap'ed into user space. - * Note that the wq array elements are variable size so you can't - * just index into the array to get the N'th element; - * use get_rwqe_ptr() instead. - */ -struct rvt_rwq { - u32 head; /* new work requests posted to the head */ - u32 tail; /* receives pull requests from here. */ - struct rvt_rwqe wq[0]; -}; - struct rvt_rq { struct rvt_rwq *wq; u32 size; /* size of RWQE array */ diff --git a/include/uapi/rdma/rvt-abi.h b/include/uapi/rdma/rvt-abi.h index c9ad6fa..1d7ca46 100644 --- a/include/uapi/rdma/rvt-abi.h +++ b/include/uapi/rdma/rvt-abi.h @@ -10,6 +10,7 @@ #include #include +#include #ifndef RDMA_ATOMIC_UAPI #define RDMA_ATOMIC_UAPI(_type, _name) _type _name #endif @@ -28,4 +29,31 @@ struct rvt_cq_wc { struct ib_uverbs_wc uqueue[0]; }; +/* + * Receive work request queue entry. + * The size of the sg_list is determined when the QP (or SRQ) is created + * and stored in qp->r_rq.max_sge (or srq->rq.max_sge). + */ +struct rvt_rwqe { + __u64 wr_id; + __u8 num_sge; + __u8 padding[7]; + struct ib_sge sg_list[0]; +}; + +/* + * This structure is used to contain the head pointer, tail pointer, + * and receive work queue entries as a single memory allocation so + * it can be mmap'ed into user space. + * Note that the wq array elements are variable size so you can't + * just index into the array to get the N'th element; + * use get_rwqe_ptr() instead. + */ +struct rvt_rwq { + /* new work requests posted to the head */ + RDMA_ATOMIC_UAPI(u32, head); + /* receives pull requests from here. 
*/
+	RDMA_ATOMIC_UAPI(u32, tail);
+	struct rvt_rwqe wq[0];
+};
 #endif /* RVT_ABI_USER_H */
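One detail worth keeping in mind for this patch and the next: RWQE ring slots are variable sized (a header plus max_sge scatter/gather entries), so the ring cannot be indexed as a plain C array. A sketch of the slot arithmetic, reconstructed from the get_rwqe_ptr() hunk at the end of this series:

static inline struct rvt_rwqe *rwqe_ptr_sketch(struct rvt_rq *rq, unsigned n)
{
	/* each slot is a rvt_rwqe header plus max_sge SGEs, so slot n
	 * lives at a byte offset, not an array index
	 */
	size_t stride = sizeof(struct rvt_rwqe) +
			rq->max_sge * sizeof(struct ib_sge);

	return (struct rvt_rwqe *)((char *)rq->kwq->curr_wq + stride * n);
}

From patchwork Wed Dec 5 15:11:17 2018
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10714373
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH for-next v2 4/4] IB/rdmavt: Fracture single lock used for posting and processing RWQEs
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: Harish Chegondi , linux-rdma@vger.kernel.org, Mike Marciniszyn , Kamenee Arumugam
Date: Wed, 05 Dec 2018 07:11:17 -0800
Message-ID: <20181205151110.27132.15219.stgit@scvm10.sc.intel.com>
In-Reply-To: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
References: <20181205150358.27132.36815.stgit@scvm10.sc.intel.com>
User-Agent: StGit/0.17.1-18-g2e886-dirty

From: Harish Chegondi

Using a single lock for both posting and processing receive work queue entries prevents the two paths from progressing simultaneously and hurts overall performance. Fracture that single lock so the circular buffer can be filled and emptied at the same time: add two new spinlocks, one for the producers and one for the consumers, and define the two indices on two different cache lines.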
The threshold count is used to avoid reading other index in different cache line every time. Signed-off-by: Harish Chegondi Signed-off-by: Kamenee Arumugam Reviewed-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro --- drivers/infiniband/sw/rdmavt/qp.c | 263 ++++++++++++++++++++++++++---------- drivers/infiniband/sw/rdmavt/qp.h | 2 drivers/infiniband/sw/rdmavt/rc.c | 41 +++--- drivers/infiniband/sw/rdmavt/srq.c | 95 ++++++++----- include/rdma/rdmavt_qp.h | 30 ++++ 5 files changed, 307 insertions(+), 124 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index a1bd8cf..72febb8 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -57,6 +57,8 @@ #include "vt.h" #include "trace.h" +#define RVT_RWQ_COUNT_THRESHOLD 16 + static void rvt_rc_timeout(struct timer_list *t); /* @@ -807,6 +809,63 @@ static void rvt_remove_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp) } /** + * rvt_free_rq - free memory allocated for rvt_rq struct + * @rvt_rq: request queue data structure + */ +static void rvt_free_rq(struct rvt_rq *rq) +{ + kvfree(rq->kwq); + rq->kwq = NULL; + vfree(rq->wq); + rq->wq = NULL; +} + +/** + * rvt_alloc_rq - allocate memory for user or kernel buffer + * @rq: receive queue data structure + * @size: number of request queue entries + * @node: The NUMA node + * @user: True if user data is available or not false + * + * Return: If memory allocation filed, return -ENONEM + * This function is used by both shared receive + * queues and non-shared receive queues to allocate + * memory. + */ +int rvt_alloc_rq(struct rvt_rq *rq, u32 size, int node, + struct ib_udata *udata) +{ + if (udata) { + rq->wq = + vmalloc_user(sizeof(struct rvt_rwq) + + size); + if (!rq->wq) + goto bail; + /* need kwq with no buffers */ + rq->kwq = kzalloc_node(sizeof(*rq->kwq), + GFP_KERNEL, node); + if (!rq->kwq) + goto bail; + rq->kwq->curr_wq = rq->wq->wq; + } else { + /* need kwq with buffers */ + rq->kwq = + vzalloc_node(sizeof(struct rvt_krwq) + + size, node); + if (!rq->kwq) + goto bail; + rq->kwq->curr_wq = rq->kwq->wq; + } + + spin_lock_init(&rq->kwq->p_lock); + spin_lock_init(&rq->kwq->c_lock); + return 0; +bail: + rvt_free_rq(rq); + return -ENOMEM; +} + +/** * rvt_init_qp - initialize the QP state to the reset state * @qp: the QP to init or reinit * @type: the QP type @@ -855,9 +914,10 @@ static void rvt_init_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, qp->r_head_ack_queue = 0; qp->s_tail_ack_queue = 0; qp->s_num_rd_atomic = 0; - if (qp->r_rq.wq) { - qp->r_rq.wq->head = 0; - qp->r_rq.wq->tail = 0; + if (qp->r_rq.kwq) { + qp->r_rq.kwq->head = 0; + qp->r_rq.kwq->tail = 0; + qp->r_rq.kwq->count = qp->r_rq.size; } qp->r_sge.num_sge = 0; atomic_set(&qp->s_reserved_used, 0); @@ -1051,17 +1111,12 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, qp->r_rq.max_sge = init_attr->cap.max_recv_sge; sz = (sizeof(struct ib_sge) * qp->r_rq.max_sge) + sizeof(struct rvt_rwqe); - if (udata) - qp->r_rq.wq = vmalloc_user( - sizeof(struct rvt_rwq) + - qp->r_rq.size * sz); - else - qp->r_rq.wq = vzalloc_node( - sizeof(struct rvt_rwq) + - qp->r_rq.size * sz, - rdi->dparms.node); - if (!qp->r_rq.wq) + err = rvt_alloc_rq(&qp->r_rq, qp->r_rq.size * sz, + rdi->dparms.node, udata); + if (err) { + ret = ERR_PTR(err); goto bail_driver_priv; + } } /* @@ -1071,7 +1126,6 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, spin_lock_init(&qp->r_lock); spin_lock_init(&qp->s_hlock); spin_lock_init(&qp->s_lock); - spin_lock_init(&qp->r_rq.lock); 
atomic_set(&qp->refcount, 0); atomic_set(&qp->local_ops_pending, 0); init_waitqueue_head(&qp->wait); @@ -1208,8 +1262,8 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, rvt_free_qpn(&rdi->qp_dev->qpn_table, qp->ibqp.qp_num); bail_rq_wq: - if (!qp->ip) - vfree(qp->r_rq.wq); + vfree(qp->r_rq.wq); + kvfree(qp->r_rq.kwq); bail_driver_priv: rdi->driver_f.qp_priv_free(rdi, qp); @@ -1240,6 +1294,7 @@ int rvt_error_qp(struct rvt_qp *qp, enum ib_wc_status err) struct ib_wc wc; int ret = 0; struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device); + unsigned long flags; lockdep_assert_held(&qp->r_lock); lockdep_assert_held(&qp->s_lock); @@ -1275,30 +1330,40 @@ int rvt_error_qp(struct rvt_qp *qp, enum ib_wc_status err) } wc.status = IB_WC_WR_FLUSH_ERR; - if (qp->r_rq.wq) { - struct rvt_rwq *wq; + if (qp->r_rq.kwq) { u32 head; u32 tail; - - spin_lock(&qp->r_rq.lock); - + struct rvt_rwq *wq = NULL; + struct rvt_krwq *kwq = NULL; + + spin_lock_irqsave(&qp->r_rq.kwq->c_lock, flags); + if (qp->ip) { + wq = qp->r_rq.wq; + head = wq->head; + tail = wq->tail; + } else { + kwq = qp->r_rq.kwq; + head = kwq->head; + tail = kwq->tail; + } /* sanity check pointers before trusting them */ - wq = qp->r_rq.wq; - head = wq->head; if (head >= qp->r_rq.size) head = 0; - tail = wq->tail; if (tail >= qp->r_rq.size) tail = 0; while (tail != head) { - wc.wr_id = rvt_get_rwqe_ptr(&qp->r_rq, tail)->wr_id; + wc.wr_id = + rvt_get_rwqe_ptr(&qp->r_rq, tail)->wr_id; if (++tail >= qp->r_rq.size) tail = 0; - rvt_cq_enter(ibcq_to_rvtcq(qp->ibqp.recv_cq), &wc, 1); + rvt_cq_enter(ibcq_to_rvtcq(qp->ibqp.recv_cq), + &wc, 1); } - wq->tail = tail; - - spin_unlock(&qp->r_rq.lock); + if (qp->ip) + wq->tail = tail; + else + kwq->tail = tail; + spin_unlock_irqrestore(&qp->r_rq.kwq->c_lock, flags); } else if (qp->ibqp.event_handler) { ret = 1; } @@ -1640,8 +1705,7 @@ int rvt_destroy_qp(struct ib_qp *ibqp) if (qp->ip) kref_put(&qp->ip->ref, rvt_release_mmap_info); - else - vfree(qp->r_rq.wq); + kvfree(qp->r_rq.kwq); vfree(qp->s_wq); rdi->driver_f.qp_priv_free(rdi, qp); kfree(qp->s_ack_queue); @@ -1727,8 +1791,9 @@ int rvt_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr) { struct rvt_qp *qp = ibqp_to_rvtqp(ibqp); - struct rvt_rwq *wq = qp->r_rq.wq; unsigned long flags; + struct rvt_krwq *wq = qp->r_rq.kwq; + int qp_err_flush = (ib_rvt_state_ops[qp->state] & RVT_FLUSH_RECV) && !qp->ibqp.srq; @@ -1748,12 +1813,12 @@ int rvt_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, return -EINVAL; } - spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock_irqsave(&qp->r_rq.kwq->p_lock, flags); next = wq->head + 1; if (next >= qp->r_rq.size) next = 0; - if (next == wq->tail) { - spin_unlock_irqrestore(&qp->r_rq.lock, flags); + if (next == READ_ONCE(wq->tail)) { + spin_unlock_irqrestore(&qp->r_rq.kwq->p_lock, flags); *bad_wr = wr; return -ENOMEM; } @@ -1776,10 +1841,10 @@ int rvt_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, * Make sure queue entry is written * before the head index. 
*/ - smp_wmb(); + smp_store_release(&wq->head, next); wq->head = next; } - spin_unlock_irqrestore(&qp->r_rq.lock, flags); + spin_unlock_irqrestore(&qp->r_rq.kwq->p_lock, flags); } return 0; } @@ -2151,7 +2216,7 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr) { struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq); - struct rvt_rwq *wq; + struct rvt_krwq *wq; unsigned long flags; for (; wr; wr = wr->next) { @@ -2164,13 +2229,13 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, return -EINVAL; } - spin_lock_irqsave(&srq->rq.lock, flags); - wq = srq->rq.wq; + spin_lock_irqsave(&srq->rq.kwq->p_lock, flags); + wq = srq->rq.kwq; next = wq->head + 1; if (next >= srq->rq.size) next = 0; - if (next == wq->tail) { - spin_unlock_irqrestore(&srq->rq.lock, flags); + if (next == READ_ONCE(wq->tail)) { + spin_unlock_irqrestore(&srq->rq.kwq->p_lock, flags); *bad_wr = wr; return -ENOMEM; } @@ -2181,9 +2246,9 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, for (i = 0; i < wr->num_sge; i++) wqe->sg_list[i] = wr->sg_list[i]; /* Make sure queue entry is written before the head index. */ - smp_wmb(); + smp_store_release(&wq->head, next); wq->head = next; - spin_unlock_irqrestore(&srq->rq.lock, flags); + spin_unlock_irqrestore(&srq->rq.kwq->p_lock, flags); } return 0; } @@ -2239,6 +2304,50 @@ static int init_sge(struct rvt_qp *qp, struct rvt_rwqe *wqe) return 0; } +/* + * get_count - count numbers of request work queue entries + * in circular buffer + * @rq: data structure for request queue entry + * @tail: tail indices of the circular buffer + * @head: head indices of the circular buffer + * + * Return - total number of entries in the circular buffer + */ +static u32 get_count(struct rvt_rq *rq, u32 tail, u32 head) +{ + u32 count; + + count = head; + + if (count >= rq->size) + count = 0; + if (count < tail) + count += rq->size - tail; + else + count -= tail; + + return count; +} + +/* + * get_rvt_head - get head indices of the circular buffer + * @rq: data structure for request queue entry + * @ip: the QP + * + * Return - head index value + */ +static u32 get_rvt_head(struct rvt_rq *rq, void *ip) +{ + u32 head; + + if (ip) + head = READ_ONCE(rq->wq->head); + else + head = rq->kwq->head; + + return head; +} + /** * rvt_get_rwqe - copy the next RWQE into the QP's RWQE * @qp: the QP @@ -2253,39 +2362,54 @@ int rvt_get_rwqe(struct rvt_qp *qp, bool wr_id_only) { unsigned long flags; struct rvt_rq *rq; + struct rvt_krwq *kwq; struct rvt_rwq *wq; struct rvt_srq *srq; struct rvt_rwqe *wqe; void (*handler)(struct ib_event *, void *); u32 tail; + u32 head; int ret; + void *ip = NULL; if (qp->ibqp.srq) { srq = ibsrq_to_rvtsrq(qp->ibqp.srq); handler = srq->ibsrq.event_handler; rq = &srq->rq; + ip = srq->ip; } else { srq = NULL; handler = NULL; rq = &qp->r_rq; + ip = qp->ip; } - spin_lock_irqsave(&rq->lock, flags); + spin_lock_irqsave(&rq->kwq->c_lock, flags); if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK)) { ret = 0; goto unlock; } + kwq = rq->kwq; + if (ip) { + wq = rq->wq; + tail = wq->tail; + } else { + tail = kwq->tail; + } - wq = rq->wq; - tail = wq->tail; /* Validate tail before using it since it is user writable. 
*/ if (tail >= rq->size) tail = 0; - if (unlikely(tail == wq->head)) { + + if (kwq->count < RVT_RWQ_COUNT_THRESHOLD) { + head = get_rvt_head(rq, ip); + kwq->count = get_count(rq, tail, head); + } + if (unlikely(kwq->count == 0)) { ret = 0; goto unlock; } - /* Make sure entry is read after head index is read. */ + /* Make sure entry is read after the count is read. */ smp_rmb(); wqe = rvt_get_rwqe_ptr(rq, tail); /* @@ -2295,43 +2419,42 @@ int rvt_get_rwqe(struct rvt_qp *qp, bool wr_id_only) */ if (++tail >= rq->size) tail = 0; - wq->tail = tail; + if (ip) + wq->tail = tail; + else + kwq->tail = tail; if (!wr_id_only && !init_sge(qp, wqe)) { ret = -1; goto unlock; } qp->r_wr_id = wqe->wr_id; + kwq->count--; ret = 1; set_bit(RVT_R_WRID_VALID, &qp->r_aflags); if (handler) { - u32 n; - /* * Validate head pointer value and compute * the number of remaining WQEs. */ - n = wq->head; - if (n >= rq->size) - n = 0; - if (n < tail) - n += rq->size - tail; - else - n -= tail; - if (n < srq->limit) { - struct ib_event ev; - - srq->limit = 0; - spin_unlock_irqrestore(&rq->lock, flags); - ev.device = qp->ibqp.device; - ev.element.srq = qp->ibqp.srq; - ev.event = IB_EVENT_SRQ_LIMIT_REACHED; - handler(&ev, srq->ibsrq.srq_context); - goto bail; + if (kwq->count < srq->limit) { + kwq->count = get_count(rq, tail, get_rvt_head(rq, ip)); + if (kwq->count < srq->limit) { + struct ib_event ev; + + srq->limit = 0; + spin_unlock_irqrestore(&rq->kwq->c_lock, flags); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + handler(&ev, srq->ibsrq.srq_context); + goto bail; + } } } + unlock: - spin_unlock_irqrestore(&rq->lock, flags); + spin_unlock_irqrestore(&rq->kwq->c_lock, flags); bail: return ret; } diff --git a/drivers/infiniband/sw/rdmavt/qp.h b/drivers/infiniband/sw/rdmavt/qp.h index 6d88397..01031d7 100644 --- a/drivers/infiniband/sw/rdmavt/qp.h +++ b/drivers/infiniband/sw/rdmavt/qp.h @@ -68,4 +68,6 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr); int rvt_wss_init(struct rvt_dev_info *rdi); void rvt_wss_exit(struct rvt_dev_info *rdi); +int rvt_alloc_rq(struct rvt_rq *rq, u32 size, int node, + struct ib_udata *udata); #endif /* DEF_RVTQP_H */ diff --git a/drivers/infiniband/sw/rdmavt/rc.c b/drivers/infiniband/sw/rdmavt/rc.c index 6131cc5..0ddddaf 100644 --- a/drivers/infiniband/sw/rdmavt/rc.c +++ b/drivers/infiniband/sw/rdmavt/rc.c @@ -105,25 +105,34 @@ __be32 rvt_compute_aeth(struct rvt_qp *qp) u32 min, max, x; u32 credits; struct rvt_rwq *wq = qp->r_rq.wq; + struct rvt_krwq *kwq = qp->r_rq.kwq; u32 head; u32 tail; - /* sanity check pointers before trusting them */ - head = wq->head; - if (head >= qp->r_rq.size) - head = 0; - tail = wq->tail; - if (tail >= qp->r_rq.size) - tail = 0; - /* - * Compute the number of credits available (RWQEs). - * There is a small chance that the pair of reads are - * not atomic, which is OK, since the fuzziness is - * resolved as further ACKs go out. - */ - credits = head - tail; - if ((int)credits < 0) - credits += qp->r_rq.size; + credits = READ_ONCE(kwq->count); + if (credits == 0) { + /* sanity check pointers before trusting them */ + if (qp->ip) { + head = READ_ONCE(wq->head); + tail = READ_ONCE(wq->tail); + } else { + head = READ_ONCE(kwq->head); + tail = READ_ONCE(kwq->tail); + } + if (head >= qp->r_rq.size) + head = 0; + if (tail >= qp->r_rq.size) + tail = 0; + /* + * Compute the number of credits available (RWQEs). 
+ * There is a small chance that the pair of reads are + * not atomic, which is OK, since the fuzziness is + * resolved as further ACKs go out. + */ + credits = head - tail; + if ((int)credits < 0) + credits += qp->r_rq.size; + } /* * Binary search the credit table to find the code to * use. diff --git a/drivers/infiniband/sw/rdmavt/srq.c b/drivers/infiniband/sw/rdmavt/srq.c index 78e06fc..e654643 100644 --- a/drivers/infiniband/sw/rdmavt/srq.c +++ b/drivers/infiniband/sw/rdmavt/srq.c @@ -51,7 +51,7 @@ #include "srq.h" #include "vt.h" - +#include "qp.h" /** * rvt_driver_srq_init - init srq resources on a per driver basis * @rdi: rvt dev structure @@ -80,6 +80,7 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, struct rvt_srq *srq; u32 sz; struct ib_srq *ret; + int err; if (srq_init_attr->srq_type != IB_SRQT_BASIC) return ERR_PTR(-EOPNOTSUPP); @@ -101,11 +102,9 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, srq->rq.max_sge = srq_init_attr->attr.max_sge; sz = sizeof(struct ib_sge) * srq->rq.max_sge + sizeof(struct rvt_rwqe); - srq->rq.wq = udata ? - vmalloc_user(sizeof(struct rvt_rwq) + srq->rq.size * sz) : - vzalloc_node(sizeof(struct rvt_rwq) + srq->rq.size * sz, - dev->dparms.node); - if (!srq->rq.wq) { + err = rvt_alloc_rq(&srq->rq, (srq->rq.size * sz), + dev->dparms.node, udata); + if (err) { ret = ERR_PTR(-ENOMEM); goto bail_srq; } @@ -115,7 +114,6 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, * See rvt_mmap() for details. */ if (udata && udata->outlen >= sizeof(__u64)) { - int err; u32 s = sizeof(struct rvt_rwq) + srq->rq.size * sz; srq->ip = @@ -137,7 +135,6 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, /* * ib_create_srq() will initialize srq->ibsrq. */ - spin_lock_init(&srq->rq.lock); srq->limit = srq_init_attr->attr.srq_limit; spin_lock(&dev->n_srqs_lock); @@ -162,6 +159,7 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, kfree(srq->ip); bail_wq: vfree(srq->rq.wq); + kvfree(srq->rq.kwq); bail_srq: kfree(srq); return ret; @@ -182,30 +180,34 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, { struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq); struct rvt_dev_info *dev = ib_to_rvt(ibsrq->device); - struct rvt_rwq *wq; int ret = 0; + struct rvt_rq tmp_rq; + struct rvt_krwq *kwq; + struct rvt_rwq *wq; if (attr_mask & IB_SRQ_MAX_WR) { - struct rvt_rwq *owq; + struct rvt_krwq *okwq; + struct rvt_rwq *owq = NULL; struct rvt_rwqe *p; u32 sz, size, n, head, tail; /* Check that the requested sizes are below the limits. */ if ((attr->max_wr > dev->dparms.props.max_srq_wr) || ((attr_mask & IB_SRQ_LIMIT) ? - attr->srq_limit : srq->limit) > attr->max_wr) + attr->srq_limit : srq->limit) > attr->max_wr) { return -EINVAL; + } sz = sizeof(struct rvt_rwqe) + srq->rq.max_sge * sizeof(struct ib_sge); size = attr->max_wr + 1; - wq = udata ? - vmalloc_user(sizeof(struct rvt_rwq) + size * sz) : - vzalloc_node(sizeof(struct rvt_rwq) + size * sz, - dev->dparms.node); - if (!wq) - return -ENOMEM; - + memset(&tmp_rq, 0, sizeof(tmp_rq)); + ret = rvt_alloc_rq(&tmp_rq, size * sz, dev->dparms.node, + udata); + if (ret) { + ret = -ENOMEM; + goto bail_free; + } /* Check that we can write the offset to mmap. */ if (udata && udata->inlen >= sizeof(__u64)) { __u64 offset_addr; @@ -223,14 +225,20 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, goto bail_free; } - spin_lock_irq(&srq->rq.lock); + spin_lock_irq(&srq->rq.kwq->c_lock); /* * validate head and tail pointer values and compute * the number of remaining WQEs. 
*/ - owq = srq->rq.wq; - head = owq->head; - tail = owq->tail; + okwq = srq->rq.kwq; + if (udata) { + owq = srq->rq.wq; + head = owq->head; + tail = owq->tail; + } else { + head = okwq->head; + tail = okwq->tail; + } if (head >= srq->rq.size || tail >= srq->rq.size) { ret = -EINVAL; goto bail_unlock; @@ -245,7 +253,14 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, goto bail_unlock; } n = 0; - p = wq->wq; + if (udata) { + wq = tmp_rq.wq; + p = wq->wq; + } else { + kwq = tmp_rq.kwq; + p = kwq->wq; + } + while (tail != head) { struct rvt_rwqe *wqe; int i; @@ -260,22 +275,31 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, if (++tail >= srq->rq.size) tail = 0; } - srq->rq.wq = wq; + + srq->rq.kwq = tmp_rq.kwq; + if (udata) { + srq->rq.wq = tmp_rq.wq; + tmp_rq.wq->head = n; + tmp_rq.wq->tail = 0; + } else { + tmp_rq.kwq->head = n; + tmp_rq.kwq->tail = 0; + } srq->rq.size = size; - wq->head = n; - wq->tail = 0; if (attr_mask & IB_SRQ_LIMIT) srq->limit = attr->srq_limit; - spin_unlock_irq(&srq->rq.lock); + spin_unlock_irq(&srq->rq.kwq->c_lock); - vfree(owq); + if (udata) + vfree(owq); + kvfree(okwq); if (srq->ip) { struct rvt_mmap_info *ip = srq->ip; struct rvt_dev_info *dev = ib_to_rvt(srq->ibsrq.device); u32 s = sizeof(struct rvt_rwq) + size * sz; - rvt_update_mmap_info(dev, ip, s, wq); + rvt_update_mmap_info(dev, ip, s, tmp_rq.wq); /* * Return the offset to mmap. @@ -299,19 +323,21 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, spin_unlock_irq(&dev->pending_lock); } } else if (attr_mask & IB_SRQ_LIMIT) { - spin_lock_irq(&srq->rq.lock); + spin_lock_irq(&srq->rq.kwq->c_lock); if (attr->srq_limit >= srq->rq.size) ret = -EINVAL; else srq->limit = attr->srq_limit; - spin_unlock_irq(&srq->rq.lock); + spin_unlock_irq(&srq->rq.kwq->c_lock); } return ret; bail_unlock: - spin_unlock_irq(&srq->rq.lock); + spin_unlock_irq(&srq->rq.kwq->c_lock); bail_free: - vfree(wq); + kvfree(tmp_rq.kwq); + if (udata) + vfree(tmp_rq.wq); return ret; } @@ -347,8 +373,7 @@ int rvt_destroy_srq(struct ib_srq *ibsrq) spin_unlock(&dev->n_srqs_lock); if (srq->ip) kref_put(&srq->ip->ref, rvt_release_mmap_info); - else - vfree(srq->rq.wq); + kvfree(srq->rq.kwq); kfree(srq); return 0; diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h index 1ef17a6..c40dfff 100644 --- a/include/rdma/rdmavt_qp.h +++ b/include/rdma/rdmavt_qp.h @@ -178,12 +178,36 @@ struct rvt_swqe { struct rvt_sge sg_list[0]; }; +/** + * struct rvt_krwq -kernel struct receive work request + * @p_lock: lock to protect producer of the kernel buffer + * @head: index of next entry to fill + * @c_lock:lock to protect consumer of the kernel buffer + * @tail: index of next entry to pull + * @count: count of total receive enteries posted. + * @rvt_rwqe: struct of receive work request queue entry. + * + * This structure is used to contain the head pointer, + * tail pointer and receive work queue entries for kernel + * mode user. + */ +struct rvt_krwq { + spinlock_t p_lock; /* protect producer */ + u32 head; /* new work requests posted to the head */ + + /* protect consumer */ + spinlock_t c_lock ____cacheline_aligned_in_smp; + u32 tail; /* receives pull requests from here. 
*/ + u32 count; /* approx count of receive entries posted */ + struct rvt_rwqe *curr_wq; + struct rvt_rwqe wq[0]; +}; + struct rvt_rq { struct rvt_rwq *wq; + struct rvt_krwq *kwq; u32 size; /* size of RWQE array */ u8 max_sge; - /* protect changes in this struct */ - spinlock_t lock ____cacheline_aligned_in_smp; }; /* @@ -449,7 +473,7 @@ struct rvt_mcast { static inline struct rvt_rwqe *rvt_get_rwqe_ptr(struct rvt_rq *rq, unsigned n) { return (struct rvt_rwqe *) - ((char *)rq->wq->wq + + ((char *)rq->kwq->curr_wq + (sizeof(struct rvt_rwqe) + rq->max_sge * sizeof(struct ib_sge)) * n); }
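Taken together, the post/consume protocol this series ends up with reduces to the following sketch (simplified from rvt_post_recv() and rvt_get_rwqe() above; WQE copying and error paths elided, and get_count()/get_rvt_head() are the helpers this patch adds):

static int rvt_post_one_rwqe_sketch(struct rvt_rq *rq)
{
	struct rvt_krwq *kwq = rq->kwq;
	unsigned long flags;
	u32 next;

	spin_lock_irqsave(&kwq->p_lock, flags);	/* producers only */
	next = kwq->head + 1;
	if (next >= rq->size)
		next = 0;
	/* the full check is the producer's one read of consumer state */
	if (next == READ_ONCE(kwq->tail)) {
		spin_unlock_irqrestore(&kwq->p_lock, flags);
		return -ENOMEM;
	}
	/* ... fill the RWQE at index kwq->head ... */
	smp_store_release(&kwq->head, next);	/* entry visible before head */
	spin_unlock_irqrestore(&kwq->p_lock, flags);
	return 0;
}

static int rvt_get_one_rwqe_sketch(struct rvt_rq *rq, void *ip)
{
	struct rvt_krwq *kwq = rq->kwq;
	unsigned long flags;

	spin_lock_irqsave(&kwq->c_lock, flags);	/* consumers only */
	/* refresh the cached count only when it runs low, so most
	 * dequeues never touch the producer's cache line
	 */
	if (kwq->count < RVT_RWQ_COUNT_THRESHOLD)
		kwq->count = get_count(rq, kwq->tail, get_rvt_head(rq, ip));
	if (kwq->count == 0) {
		spin_unlock_irqrestore(&kwq->c_lock, flags);
		return 0;	/* ring empty */
	}
	/* ... consume the RWQE at kwq->tail, advance tail ... */
	kwq->count--;
	spin_unlock_irqrestore(&kwq->c_lock, flags);
	return 1;
}

Because the producer only writes head and the consumer only writes tail and count, each lock protects state on its own cache line, and the cached count lets most dequeues proceed without reading head at all.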