From patchwork Fri Mar 24 10:45:19 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinpu Wang
X-Patchwork-Id: 9642571
From: Jack Wang
To: linux-block@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: dledford@redhat.com, axboe@kernel.dk, hch@lst.de, mail@fholler.de,
 Milind.dumbare@gmail.com, yun.wang@profitbricks.com, Jack Wang,
 Kleber Souza, Danil Kipnis, Roman Pen
Subject: [PATCH 04/28] ibtrs_clt: add header file for exported interface
Date: Fri, 24 Mar 2017 11:45:19 +0100
Message-Id: <1490352343-20075-5-git-send-email-jinpu.wangl@profitbricks.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1490352343-20075-1-git-send-email-jinpu.wangl@profitbricks.com>
References: <1490352343-20075-1-git-send-email-jinpu.wangl@profitbricks.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Jack Wang

User modules, e.g. ibnbd_client, will use this interface to transfer data
later.

Signed-off-by: Jack Wang
Signed-off-by: Kleber Souza
Signed-off-by: Danil Kipnis
Signed-off-by: Roman Pen
---
 include/rdma/ibtrs_clt.h | 316 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)
 create mode 100644 include/rdma/ibtrs_clt.h

diff --git a/include/rdma/ibtrs_clt.h b/include/rdma/ibtrs_clt.h
new file mode 100644
index 0000000..4fc9b12
--- /dev/null
+++ b/include/rdma/ibtrs_clt.h
@@ -0,0 +1,316 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Pen
+ *          Milind Dumbare
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#if !defined(IBTRS_CLIENT_H)
+#define IBTRS_CLIENT_H
+
+#include
+
+struct ibtrs_session;
+
+/**
+ * ibtrs_clt_open() - Open a session to an ibtrs_server
+ * @addr: The IPv4, IPv6 or GID address of the peer
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @priv: Pointer passed back on &ibtrs_clt_ops->sess_ev() invocation
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: Time between reconnect attempts, in seconds
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ *                          up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success, otherwise PTR_ERR.
+ * -EINVAL: The provided addr could not be resolved to an InfiniBand
+ *          address, the route to the host could not be resolved or
+ *          ibtrs_clt_register() was not called before.
+ */
+struct ibtrs_session *ibtrs_clt_open(const struct sockaddr_storage *addr,
+				     size_t pdu_sz, void *priv,
+				     u8 reconnect_delay_sec, u16 max_segments,
+				     s16 max_reconnect_attempts);
+
+/**
+ * ibtrs_clt_close() - Close a session
+ * @sess: Session handle, freed on return
+ */
+int ibtrs_clt_close(struct ibtrs_session *sess);
+
+/**
+ * enum ibtrs_clt_rdma_ev - Events related to RDMA transfer operations
+ */
+enum ibtrs_clt_rdma_ev {
+	IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL,
+	IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL,
+};
+
+/**
+ * enum ibtrs_clt_sess_ev - Events about connectivity state of a session
+ * @IBTRS_CLT_SESS_EV_RECONNECT: The session was reconnected.
+ * @IBTRS_CLT_SESS_EV_DISCONNECTED: The session was disconnected.
+ * @IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED: Reconnect attempts stopped because
+ *                                         the max. number of reconnect attempts
+ *                                         was reached.
+ */
+enum ibtrs_clt_sess_ev {
+	IBTRS_CLT_SESS_EV_RECONNECT,
+	IBTRS_CLT_SESS_EV_DISCONNECTED,
+	IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED,
+};
+
+/**
+ * struct ibtrs_clt_ops - Callback functions of the user module
+ * @owner:   module that uses ibtrs_server
+ * @rdma_ev: Event notifications for RDMA operations,
+ *           Context: in interrupt (softirq). The function should be relatively fast.
+ *     @priv:  user supplied data that was passed to
+ *             ibtrs_clt_request_rdma_write() or
+ *             ibtrs_clt_rdma_write() before
+ *     @ev:    Occurred event
+ *     @errno: Result of corresponding operation,
+ *             0 on success or negative ERRNO code on error
+ * @sess_ev: Event notification for connection state changes
+ *     @priv:  user supplied data that was passed to
+ *             ibtrs_clt_open()
+ *     @ev:    Occurred event
+ *     @errno: Result of corresponding operation,
+ *             0 on success or negative ERRNO code on error
+ * @recv:    Event notification for InfiniBand message reception
+ *     @priv:  user supplied data that was passed to
+ *             ibtrs_clt_open()
+ *     @msg:   Received data
+ *     @len:   Size of the @msg buffer
+ *
+ * The @recv and @rdma_ev callbacks run on the same CPU that requested the RDMA
+ * operation.
+ */
+
+typedef void (rdma_clt_ev_fn)(void *priv, enum ibtrs_clt_rdma_ev ev, int errno);
+typedef void (sess_clt_ev_fn)(void *priv, enum ibtrs_clt_sess_ev ev, int errno);
+typedef void (recv_clt_fn)(void *priv, const void *msg, size_t len);
+
+struct ibtrs_clt_ops {
+	struct module	*owner;
+	rdma_clt_ev_fn	*rdma_ev;
+	sess_clt_ev_fn	*sess_ev;
+	recv_clt_fn	*recv;
+};
+
+/**
+ * ibtrs_clt_register() - register a user module with ibtrs_client
+ * @ops: callback functions to register
+ *
+ * Return:
+ * 0:         Success
+ * -ENOTSUPP: Registration failed, max. number of supported user modules
+ *            reached
+ */
+int ibtrs_clt_register(const struct ibtrs_clt_ops *ops);
+
+/**
+ * ibtrs_clt_unregister() - unregister a user module from ibtrs_client
+ * @ops: struct that was previously passed to ibtrs_clt_register()
+ *
+ * ibtrs_clt_unregister() must only be called after all sessions that were
+ * created by the user module have been closed.
+ */
+void ibtrs_clt_unregister(const struct ibtrs_clt_ops *ops);
+
+enum {
+	IBTRS_TAG_NOWAIT = 0,
+	IBTRS_TAG_WAIT   = 1,
+};
+
+/**
+ * struct ibtrs_tag - tags a memory allocation for a future RDMA operation
+ */
+struct ibtrs_tag {
+	unsigned int cpu_id;
+	unsigned int mem_id;
+	unsigned int mem_id_mask;
+};
+
+static inline struct ibtrs_tag *ibtrs_tag_from_pdu(void *pdu)
+{
+	return pdu - sizeof(struct ibtrs_tag);
+}
+
+static inline void *ibtrs_tag_to_pdu(struct ibtrs_tag *tag)
+{
+	return tag + 1;
+}
+
+/**
+ * ibtrs_get_tag() - allocates a tag for a future RDMA operation
+ * @sess:     Current session
+ * @cpu_id:   cpu_id to run on
+ * @nr_bytes: Number of bytes to consume per tag
+ * @wait:     Wait type
+ *
+ * Description:
+ *    Allocates a tag for the following RDMA operation. The tag is used
+ *    to preallocate all resources and to propagate memory pressure
+ *    up earlier.
+ *
+ * Context:
+ *    Can sleep if @wait == IBTRS_TAG_WAIT
+ */
+struct ibtrs_tag *ibtrs_get_tag(struct ibtrs_session *sess, int cpu_id,
+				size_t nr_bytes, int wait);
+
+/**
+ * ibtrs_put_tag() - puts an allocated tag
+ * @sess: Current session
+ * @tag:  Tag to be freed
+ *
+ * Context:
+ *    Any
+ */
+void ibtrs_put_tag(struct ibtrs_session *sess, struct ibtrs_tag *tag);
+
+/**
+ * ibtrs_clt_rdma_write() - Transfer data to the server via RDMA.
+ * @sess:     Session
+ * @tag:      Preallocated tag
+ * @priv:     User provided data, passed back on corresponding
+ *            @ibtrs_clt_ops->rdma_ev() event
+ * @vec:      User module message to transfer together with @sg.
+ *            Sum of len of all @vec elements limited to <= IO_MSG_SIZE
+ * @nr:       Number of elements in @vec.
+ * @data_len: Size of data in @sg
+ * @sg:       Data to be transferred, 512B aligned in the receiver's memory
+ * @sg_len:   Number of elements in @sg array
+ *
+ * Return:
+ * 0:  Success
+ * <0: Error
+ *
+ * On completion of the operation a %IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL is
+ * generated. If an error happened on the IBTRS layer for this operation a
+ * %IBTRS_CLT_RDMA_EV_ERROR is generated.
+ */
+int ibtrs_clt_rdma_write(struct ibtrs_session *sess, struct ibtrs_tag *tag,
+			 void *priv, const struct kvec *vec, size_t nr,
+			 size_t data_len, struct scatterlist *sg,
+			 unsigned int sg_len);
+
+/**
+ * ibtrs_clt_request_rdma_write() - Request data transfer from server via RDMA.
+ *
+ * @sess:       Session
+ * @tag:        Preallocated tag
+ * @priv:       User provided data, passed back on corresponding
+ *              @ibtrs_clt_ops->rdma_ev() event
+ * @vec:        Message that is sent to the server together with the request.
+ *              Sum of len of all @vec elements limited to <= IO_MSG_SIZE.
+ * @nr:         Number of elements in @vec.
+ * @result_len: Max. length of data that ibtrs_server will send back
+ * @recv_sg:    Pages in which the response of the server will be stored.
+ * @recv_sg_len: Number of elements in @recv_sg
+ *
+ * Return:
+ * 0:  Success
+ * <0: Error
+ *
+ * IBTRS Client will request a data transfer from Server to Client via RDMA.
+ * The data that the server responds with will be stored in @recv_sg when
+ * the user receives an %IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL event.
+ * If an error occurred on the IBTRS layer a %IBTRS_CLT_RDMA_EV_ERROR is
+ * generated instead.
+ */
+int ibtrs_clt_request_rdma_write(struct ibtrs_session *sess,
+				 struct ibtrs_tag *tag, void *priv,
+				 const struct kvec *vec, size_t nr,
+				 size_t result_len,
+				 struct scatterlist *recv_sg,
+				 unsigned int recv_sg_len);
+
+/**
+ * ibtrs_clt_send() - Send data to server via an InfiniBand message.
+ * @sess: Session
+ * @vec:  Data to transfer
+ * @nr:   Number of elements in @vec
+ *
+ * Return:
+ * 0:  Success
+ * <0: Error:
+ *     -ECOMM  no connection to the server
+ *     -EINVAL message size too big (500 bytes max)
+ *     -EAGAIN ran out of tx buffers - try again later
+ *     - see mlx doc
+ *
+ * The operation is not confirmed. It is the responsibility of the user on the
+ * other side to send an acknowledgment if required.
+ */
+int ibtrs_clt_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr);
+
+/**
+ * struct ibtrs_attrs - IBTRS session attributes
+ */
+struct ibtrs_attrs {
+	u32	queue_depth;
+	u64	mr_page_mask;
+	u32	mr_page_size;
+	u32	mr_max_size;
+	u32	max_pages_per_mr;
+	u32	max_sge;
+	u32	max_io_size;
+	u8	hostname[MAXHOSTNAMELEN];
+};
+
+/**
+ * ibtrs_clt_query() - queries IBTRS session attributes
+ *
+ * Return:
+ * 0 on success
+ * -ECOMM no connection to the server
+ */
+int ibtrs_clt_query(struct ibtrs_session *sess, struct ibtrs_attrs *attr);
+
+#endif
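
Note on the tag/PDU helpers: ibtrs_tag_from_pdu() and ibtrs_tag_to_pdu() imply
that the per-request payload (the PDU of @pdu_sz bytes) sits directly behind
struct ibtrs_tag in one allocation. The userspace sketch below only
illustrates that layout and the pointer round trip. It is not part of the
patch: alloc_tag_with_pdu() and tag_pdu_roundtrip_ok() are hypothetical names,
and how ibtrs actually allocates its tags is not shown in this header, so the
single-calloc allocation is an assumption. The helpers use char * arithmetic
instead of the GNU void * arithmetic used in the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as struct ibtrs_tag in the header above. */
struct ibtrs_tag {
	unsigned int cpu_id;
	unsigned int mem_id;
	unsigned int mem_id_mask;
};

/* Portable equivalent of ibtrs_tag_from_pdu(): step back over the tag. */
static struct ibtrs_tag *tag_from_pdu(void *pdu)
{
	return (struct ibtrs_tag *)((char *)pdu - sizeof(struct ibtrs_tag));
}

/* Equivalent of ibtrs_tag_to_pdu(): the PDU starts right after the tag. */
static void *tag_to_pdu(struct ibtrs_tag *tag)
{
	return tag + 1;
}

/*
 * Hypothetical allocator (assumption, not from the patch): one zeroed
 * buffer holding the tag followed by pdu_sz bytes of user payload.
 */
static struct ibtrs_tag *alloc_tag_with_pdu(size_t pdu_sz)
{
	return calloc(1, sizeof(struct ibtrs_tag) + pdu_sz);
}

/*
 * Hypothetical check: returns 1 if the tag -> PDU -> tag round trip yields
 * the original tag and filling the PDU leaves the tag fields untouched,
 * 0 otherwise (including allocation failure).
 */
static int tag_pdu_roundtrip_ok(size_t pdu_sz)
{
	struct ibtrs_tag *tag = alloc_tag_with_pdu(pdu_sz);
	void *pdu;
	int ok;

	if (!tag)
		return 0;

	tag->mem_id = 42;
	pdu = tag_to_pdu(tag);

	/* The PDU begins exactly sizeof(struct ibtrs_tag) bytes into the buffer. */
	assert((char *)pdu == (char *)tag + sizeof(*tag));

	memset(pdu, 0xab, pdu_sz);
	ok = tag_from_pdu(pdu) == tag && tag->mem_id == 42;

	free(tag);
	return ok;
}
```

Under these assumptions a user module would keep its per-request state in the
PDU and recover the owning tag from the PDU pointer alone, which is presumably
why both conversion directions are exported.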