[v1,03/15] IB/pvrdma: Add support for Completion Queues

Message ID	1467785688-23229-4-git-send-email-aditr@vmware.com (mailing list archive)
State	Superseded
Headers
Return-Path: <linux-rdma-owner@kernel.org>
From: Adit Ranadive <aditr@vmware.com>
To: <dledford@redhat.com>, <linux-rdma@vger.kernel.org>, <pv-drivers@vmware.com>
CC: Adit Ranadive <aditr@vmware.com>, <jhansen@vmware.com>, <asarwade@vmware.com>, <georgezhang@vmware.com>, <bryantan@vmware.com>
Subject: [PATCH v1 03/15] IB/pvrdma: Add support for Completion Queues
Date: Tue, 5 Jul 2016 23:14:36 -0700
Message-ID: <1467785688-23229-4-git-send-email-aditr@vmware.com>
In-Reply-To: <1467785688-23229-1-git-send-email-aditr@vmware.com>
References: <1467785688-23229-1-git-send-email-aditr@vmware.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/pvrdma/pvrdma_cq.c
new file mode 100644
index 0000000..9a4b42c
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_cq.c
@@ -0,0 +1,436 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <asm/page.h>
+#include <linux/io.h>
+#include <linux/wait.h>
+#include <rdma/ib_addr.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_user_verbs.h>
+
+#include "pvrdma.h"
+#include "pvrdma_user.h"
+
+/**
+ * pvrdma_req_notify_cq - request notification for a completion queue
+ * @ibcq: the completion queue
+ * @notify_flags: notification flags
+ *
+ * @return: 0 for success.
+ */
+int pvrdma_req_notify_cq(struct ib_cq *ibcq,
+			 enum ib_cq_notify_flags notify_flags)
+{
+	struct pvrdma_dev *dev = to_vdev(ibcq->device);
+	struct pvrdma_cq *cq = to_vcq(ibcq);
+	u32 val = cq->cq_handle;
+
+	val |= (notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
+		PVRDMA_UAR_CQ_ARM_SOL : PVRDMA_UAR_CQ_ARM;
+
+	writel(cpu_to_le32(val), dev->driver_uar.map + PVRDMA_UAR_CQ_OFFSET);
+
+	return 0;
+}
+
+/**
+ * pvrdma_create_cq - create completion queue
+ * @ibdev: the device
+ * @attr: completion queue attributes
+ * @context: user context
+ * @udata: user data
+ *
+ * @return: ib_cq completion queue pointer on success,
+ *          otherwise returns negative errno.
+ */
+struct ib_cq *pvrdma_create_cq(struct ib_device *ibdev,
+			       const struct ib_cq_init_attr *attr,
+			       struct ib_ucontext *context,
+			       struct ib_udata *udata)
+{
+	int entries = attr->cqe;
+	struct pvrdma_dev *dev = to_vdev(ibdev);
+	struct pvrdma_cq *cq;
+	int ret;
+	int npages;
+	unsigned long flags;
+	union pvrdma_cmd_req req;
+	union pvrdma_cmd_resp rsp;
+	struct pvrdma_cmd_create_cq *cmd = &req.create_cq;
+	struct pvrdma_cmd_create_cq_resp *resp = &rsp.create_cq_resp;
+	struct pvrdma_create_cq ucmd;
+
+	BUILD_BUG_ON(sizeof(struct pvrdma_cqe) != 64);
+
+	entries = roundup_pow_of_two(entries);
+	if (entries < 1 || entries > dev->dsr->caps.max_cqe)
+		return ERR_PTR(-EINVAL);
+
+	if (!atomic_add_unless(&dev->num_cqs, 1, dev->dsr->caps.max_cq))
+		return ERR_PTR(-EINVAL);
+
+	cq = kzalloc(sizeof(*cq), GFP_KERNEL);
+	if (!cq) {
+		atomic_dec(&dev->num_cqs);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	cq->ibcq.cqe = entries;
+
+	if (context) {
+		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
+			ret = -EFAULT;
+			goto err_cq;
+		}
+
+		cq->umem = ib_umem_get(context, ucmd.buf_addr, ucmd.buf_size,
+				       IB_ACCESS_LOCAL_WRITE, 1);
+		if (IS_ERR(cq->umem)) {
+			ret = PTR_ERR(cq->umem);
+			goto err_cq;
+		}
+
+		npages = ib_umem_page_count(cq->umem);
+	} else {
+		cq->is_kernel = true;
+
+		/* One extra page for shared ring state */
+		npages = 1 + (entries * sizeof(struct pvrdma_cqe) +
+			      PAGE_SIZE - 1) / PAGE_SIZE;
+
+		/* Skip header page. */
+		cq->offset = PAGE_SIZE;
+	}
+
+	if (npages < 0 || npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
+		dev_warn(&dev->pdev->dev,
+			 "overflow pages in completion queue\n");
+		ret = -EINVAL;
+		goto err_umem;
+	}
+
+	ret = pvrdma_page_dir_init(dev, &cq->pdir, npages, cq->is_kernel);
+	if (ret) {
+		dev_warn(&dev->pdev->dev,
+			 "could not allocate page directory\n");
+		goto err_umem;
+	}
+
+	if (cq->is_kernel) {
+		/* Ring state is always the first page. */
+		cq->ring_state = cq->pdir.pages[0];
+	} else {
+		pvrdma_page_dir_insert_umem(&cq->pdir, cq->umem, 0);
+	}
+
+	atomic_set(&cq->refcnt, 1);
+	init_waitqueue_head(&cq->wait);
+	spin_lock_init(&cq->cq_lock);
+
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->hdr.cmd = PVRDMA_CMD_CREATE_CQ;
+	cmd->nchunks = npages;
+	cmd->ctx_handle = (context) ?
+		(u64)to_vucontext(context)->ctx_handle : 0;
+	cmd->cqe = entries;
+	cmd->pdir_dma = cq->pdir.dir_dma;
+	ret = pvrdma_cmd_post(dev, &req, true, &rsp);
+
+	if (ret < 0 || rsp.hdr.ack != PVRDMA_CMD_CREATE_CQ_RESP) {
+		dev_warn(&dev->pdev->dev,
+			 "could not create completion queue\n");
+		goto err_page_dir;
+	}
+
+	cq->ibcq.cqe = resp->cqe;
+	cq->cq_handle = resp->cq_handle;
+	spin_lock_irqsave(&dev->cq_tbl_lock, flags);
+	dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
+	spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
+
+	if (context) {
+		cq->uar = &(to_vucontext(context)->uar);
+
+		/* Copy udata back. */
+		if (ib_copy_to_udata(udata, &cq->cq_handle, sizeof(__u32))) {
+			dev_warn(&dev->pdev->dev,
+				 "failed to copy back udata\n");
+			ret = -EINVAL;
+			goto err_page_dir;
+		}
+	}
+
+	return &cq->ibcq;
+
+err_page_dir:
+	pvrdma_page_dir_cleanup(dev, &cq->pdir);
+err_umem:
+	if (context)
+		ib_umem_release(cq->umem);
+err_cq:
+	atomic_dec(&dev->num_cqs);
+	kfree(cq);
+
+	return ERR_PTR(ret);
+}
+
+static void pvrdma_free_cq(struct pvrdma_dev *dev, struct pvrdma_cq *cq)
+{
+	atomic_dec(&cq->refcnt);
+	wait_event(cq->wait, !atomic_read(&cq->refcnt));
+
+	if (!cq->is_kernel)
+		ib_umem_release(cq->umem);
+
+	pvrdma_page_dir_cleanup(dev, &cq->pdir);
+	kfree(cq);
+}
+
+/**
+ * pvrdma_destroy_cq - destroy completion queue
+ * @cq: the completion queue to destroy.
+ *
+ * @return: 0 for success.
+ */
+int pvrdma_destroy_cq(struct ib_cq *cq)
+{
+	struct pvrdma_cq *vcq = to_vcq(cq);
+	union pvrdma_cmd_req req;
+	union pvrdma_cmd_resp rsp;
+	struct pvrdma_cmd_destroy_cq *cmd = &req.destroy_cq;
+	struct pvrdma_dev *dev = to_vdev(cq->device);
+	unsigned long flags;
+	int ret;
+
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->hdr.cmd = PVRDMA_CMD_DESTROY_CQ;
+	cmd->cq_handle = vcq->cq_handle;
+
+	ret = pvrdma_cmd_post(dev, &req, false, &rsp);
+	if (ret < 0)
+		dev_warn(&dev->pdev->dev,
+			 "could not destroy completion queue\n");
+
+	/* free cq's resources */
+	spin_lock_irqsave(&dev->cq_tbl_lock, flags);
+	dev->cq_tbl[vcq->cq_handle] = NULL;
+	spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
+
+	pvrdma_free_cq(dev, vcq);
+	atomic_dec(&dev->num_cqs);
+
+	return ret;
+}
+
+/**
+ * pvrdma_modify_cq - modify the CQ moderation parameters
+ * @cq: the CQ to modify
+ * @cq_count: number of CQEs that will trigger an event
+ * @cq_period: max period of time in usec before triggering an event
+ *
+ * @return: -EOPNOTSUPP as CQ moderation is not supported.
+ */
+int pvrdma_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct pvrdma_cqe *get_cqe(struct pvrdma_cq *cq, int i)
+{
+	return (struct pvrdma_cqe *)pvrdma_page_dir_get_ptr(
+					&cq->pdir,
+					cq->offset +
+					sizeof(struct pvrdma_cqe) * i);
+}
+
+void pvrdma_flush_cqe(struct pvrdma_qp *qp, struct pvrdma_cq *cq)
+{
+	int head;
+	int has_data;
+
+	if (!cq->is_kernel)
+		return;
+
+	/* Lock held */
+	has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
+					    cq->ibcq.cqe, &head);
+	if (unlikely(has_data > 0)) {
+		int items;
+		int curr;
+		int tail = pvrdma_idx(&cq->ring_state->rx.prod_tail,
+				      cq->ibcq.cqe);
+		struct pvrdma_cqe *cqe;
+		struct pvrdma_cqe *curr_cqe;
+
+		items = (tail > head) ? (tail - head) :
+			(cq->ibcq.cqe - head + tail);
+		curr = --tail;
+		while (items-- > 0) {
+			if (curr < 0)
+				curr = cq->ibcq.cqe - 1;
+			if (tail < 0)
+				tail = cq->ibcq.cqe - 1;
+			curr_cqe = get_cqe(cq, curr);
+			if ((curr_cqe->qp & 0xFFFF) != qp->qp_handle) {
+				if (curr != tail) {
+					cqe = get_cqe(cq, tail);
+					*cqe = *curr_cqe;
+				}
+				tail--;
+			} else {
+				pvrdma_idx_ring_inc(
+					&cq->ring_state->rx.cons_head,
+					cq->ibcq.cqe);
+			}
+			curr--;
+		}
+	}
+}
+
+static int pvrdma_poll_one(struct pvrdma_cq *cq, struct pvrdma_qp **cur_qp,
+			   struct ib_wc *wc)
+{
+	struct pvrdma_dev *dev = to_vdev(cq->ibcq.device);
+	int has_data;
+	unsigned int head;
+	bool tried = false;
+	struct pvrdma_cqe *cqe;
+
+retry:
+	has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
+					    cq->ibcq.cqe, &head);
+	if (has_data == 0) {
+		u32 val;
+
+		if (tried)
+			return -EAGAIN;
+
+		/* Pass down POLL to give physical HCA a chance to poll. */
+		val = cq->cq_handle | PVRDMA_UAR_CQ_POLL;
+		writel(cpu_to_le32(val),
+		       dev->driver_uar.map + PVRDMA_UAR_CQ_OFFSET);
+
+		tried = true;
+		goto retry;
+	} else if (has_data == -1) {
+		return -EINVAL;
+	}
+
+	cqe = get_cqe(cq, head);
+
+	/* Ensure cqe is valid. */
+	rmb();
+	if (dev->qp_tbl[cqe->qp & 0xffff])
+		*cur_qp = (struct pvrdma_qp *)dev->qp_tbl[cqe->qp & 0xffff];
+	else
+		return -EINVAL;
+
+	wc->opcode = pvrdma_wc_opcode_to_ib(cqe->opcode);
+	wc->status = pvrdma_wc_status_to_ib(cqe->status);
+	wc->wr_id = cqe->wr_id;
+	wc->qp = &(*cur_qp)->ibqp;
+	wc->byte_len = cqe->byte_len;
+	wc->ex.imm_data = cqe->imm_data;
+	wc->src_qp = cqe->src_qp;
+	wc->wc_flags = pvrdma_wc_flags_to_ib(cqe->wc_flags);
+	wc->pkey_index = cqe->pkey_index;
+	wc->slid = cqe->slid;
+	wc->sl = cqe->sl;
+	wc->dlid_path_bits = cqe->dlid_path_bits;
+	wc->port_num = cqe->port_num;
+	wc->vendor_err = 0;
+
+	/* Update shared ring state */
+	pvrdma_idx_ring_inc(&cq->ring_state->rx.cons_head, cq->ibcq.cqe);
+
+	return 0;
+}
+
+/**
+ * pvrdma_poll_cq - poll for work completion queue entries
+ * @ibcq: completion queue
+ * @num_entries: the maximum number of entries
+ * @wc: pointer to work completion array
+ *
+ * @return: number of polled completion entries
+ */
+int pvrdma_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
+{
+	struct pvrdma_cq *cq = to_vcq(ibcq);
+	struct pvrdma_qp *cur_qp = NULL;
+	unsigned long flags;
+	int npolled;
+	int ret;
+
+	if (num_entries < 1)
+		return -EINVAL;
+
+	spin_lock_irqsave(&cq->cq_lock, flags);
+	for (npolled = 0; npolled < num_entries; ++npolled) {
+		ret = pvrdma_poll_one(cq, &cur_qp, wc + npolled);
+		if (ret)
+			break;
+	}
+
+	spin_unlock_irqrestore(&cq->cq_lock, flags);
+
+	if ((ret == 0) || (ret == -EAGAIN))
+		return npolled;
+	else
+		return ret;
+}
+
+/**
+ * pvrdma_resize_cq - resize CQ
+ * @ibcq: the completion queue
+ * @entries: CQ entries
+ * @udata: user data
+ *
+ * @return: -EOPNOTSUPP as CQ resize is not supported.
+ */
+int pvrdma_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
+{
+	return -EOPNOTSUPP;
+}
