From patchwork Fri Mar 24 10:45:42 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinpu Wang
X-Patchwork-Id: 9642665
From: Jack Wang
To: linux-block@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: dledford@redhat.com, axboe@kernel.dk, hch@lst.de, mail@fholler.de,
 Milind.dumbare@gmail.com, yun.wang@profitbricks.com, Jack Wang
Subject: [PATCH 27/28] ibnbd: add doc for how to use ibnbd and sysfs interface
Date: Fri, 24 Mar 2017 11:45:42 +0100
Message-Id: <1490352343-20075-28-git-send-email-jinpu.wangl@profitbricks.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1490352343-20075-1-git-send-email-jinpu.wangl@profitbricks.com>
References: <1490352343-20075-1-git-send-email-jinpu.wangl@profitbricks.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Jack Wang

Signed-off-by: Jack Wang
---
 Documentation/IBNBD.txt | 284 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 284 insertions(+)
 create mode 100644 Documentation/IBNBD.txt

diff --git a/Documentation/IBNBD.txt b/Documentation/IBNBD.txt
new file mode 100644
index 0000000..f7f490a
--- /dev/null
+++ b/Documentation/IBNBD.txt
@@ -0,0 +1,284 @@
+InfiniBand Network Block Device (IBNBD)
+=======================================
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules (client and
+server) that allow clients to access a remote storage device on the server
+via an InfiniBand network.
+Mapped storage devices are transparent to the client and behave like any other
+regular storage device.
+
+The data transport between client and server over the InfiniBand network
+is performed by the IBTRS (InfiniBand Transport) kernel modules.
+
+The administration of these modules is done via sysfs. A command-line tool
+(ibnbd-cli) is also available for a more user-friendly experience.
+
+Requirements
+------------
+ - IBTRS kernel modules (available as git-submodule)
+
+Quick Start
+-----------
+Server:
+    # insmod ibtrs/ibtrs_server/ibtrs_server.ko
+    # insmod ibnbd_server/ibnbd_server.ko
+
+Client:
+    # insmod ibtrs/ibtrs_client/ibtrs_client.ko
+    # insmod ibnbd_client/ibnbd_client.ko
+    # echo "server= device_path=" > /sys/kernel/ibnbd/map_device
+
+The block device will become available on the client as
+/dev/ibnbd. It can be used like a local block device.
+
+Client Userspace Interface
+--------------------------
+This chapter describes only the most important files of the userspace interface.
+The full documentation can be found in the Architecture Documentation.
+
+All sysfs files that are not read-only return usage information when they
+are read.
+
+Example:
+    $ cat /sys/kernel/ibnbd/map_device
+
+
+/sys/kernel/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+map_device (RW)
+^^^^^^^^^^^^^^^
+To map a volume on the client, information about the device has to be written
+to:
+    /sys/kernel/ibnbd/map_device
+
+The format of the input is:
+    "server= device_path=
+    [access_mode=
+    - ip:
+    - gid:
+
+device_path Parameter
++++++++++++++++++++++
+A device can be mapped by specifying its path relative to the configured
+dev_search_path on the server side.
+The ibnbd_server prepends the configured dev_search_path to the device_path
+passed in the map operation and tries to open a block device with the
+path dev_search_path/device_path.
+On success, a /dev/ibnbd device file, a /sys/block/ibnbd/ibnbd/
+directory and an entry in /sys/kernel/ibnbd/devices will be created.
+
+access_mode Parameter
++++++++++++++++++++++
+The access_mode parameter specifies whether the device is mapped as read-only
+or read-write. The "migration" access mode has the same effect as "rw" and
+should be used during a VM migration scenario by the client where the VM is
+being migrated to.
+If not specified, 'rw' is used.
+
+input_mode Parameter
+++++++++++++++++++++
+The input_mode parameter specifies the internal I/O processing mode of the
+network block device on the client.
+If not specified, 'mq' mode is used.
+
+io_mode Parameter
++++++++++++++++++
+The io_mode parameter specifies whether the device on the server is opened as
+a block device (blockio) or as a file (fileio).
+When the device is opened as a file, the VFS page cache is used for read I/O
+operations; write I/O operations bypass the page cache and go directly to disk
+(except metadata updates, like file access time).
+When the device is opened as a block device, the block device is accessed
+directly and no VFS page cache is used.
+If not specified, 'fileio' mode is used.
+
+Exit Codes
+++++++++++
+If the device is already mapped, the mapping fails with EEXIST.
+If the input has an invalid format, the write returns EINVAL. If the device
+path cannot be found on the server, the mapping fails with ENOENT.
+
+Examples
+++++++++
+    # echo "server=ip:10.50.100.64 device_path=/dev/ram1 input_mode=mq" > /sys/kernel/ibnbd/map_device
+    # echo "server=ip:10.50.100.64 device_path=3F2504E0-4F89-41D3-9A0C-0305E82C3301" > /sys/kernel/ibnbd/map_device
+
+Finding device file after mapping
++++++++++++++++++++++++++++++++++
+After mapping, the device file can be found by:
+1.) The symlink /sys/kernel/ibnbd/devices/ points to
+    /sys/block/.
+    The last part of the symlink destination is the same as the device name.
+    By extracting the last part of the path, the path to the device
+    /dev/ can be built.
+2.) /dev/block/$(cat /sys/kernel/ibnbd/devices//dev)
+
+How to find the devices/ entry of a device is described in the next chapter
+(devices/ directory).
+
+devices/ (DIRECTORY)
+^^^^^^^^^^^^^^^^^^^^
+For each device mapped on the client a new symbolic link is created as
+/sys/kernel/ibnbd/devices/, which points to the block device created
+by ibnbd (/sys/block/ibnbd/). The name of each link is created as
+follows:
+
+- If the 'device_path' provided during mapping contains slashes ("/"), they are
+  replaced by exclamation marks ("!") and the result is used as the link name.
+  Otherwise, the link name is the same as the 'device_path' provided.
+
+
+Examples
+++++++++
+    /sys/kernel/ibnbd/devices/3F2504E0-4F89-41D3-9A0C-0305E82C3301 -> /sys/block/ibnbd1/
+    /sys/kernel/ibnbd/devices/!dev!ram1 -> /sys/block/ibnbd0/
+
+
+/sys/block/ibnbd/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+unmap_device (RW)
+^^^^^^^^^^^^^^^^^
+To unmap a volume, 'normal' or 'force' has to be written to:
+    /sys/block/ibnbd/ibnbd/unmap_device
+
+When 'normal' is used, the operation will fail with EBUSY if any process is
+using the device.
+When 'force' is used, the device is unmapped even when it is in use.
+All I/Os that are in progress will fail.
+It can happen that the device file (/dev/ibnbdx) still exists after unmapping:
+the kernel could not remove the file because it was in use, but it is marked as
+unused. The device file will be freed when no process refers to it.
+
+In a subsequent IBNBD mapping the remote device can be reused, but
+ibnbd may generate a different device file for it.
+
+Examples
+++++++++
+    # echo "normal" > /sys/block/ibnbd0/ibnbd/unmap_device
+
+state (RO)
+^^^^^^^^^^
+The file contains the current state of the block device. The state file returns
+'open' when the device is successfully mapped from the server and accepting I/O
+requests. When the connection to the server is lost because of an error
+(e.g. link failure), the state file returns 'closed' and all I/O requests
+will fail with -EIO.
+
+session (RO)
+^^^^^^^^^^^^
+IBNBD uses an IBTRS session to transport the data between client and server.
+The file 'session' contains the server address that was used to
+establish the IBTRS session.
+It is the same address that was passed as server parameter to the map_device
+file.
+
+mapping_path (RO)
+^^^^^^^^^^^^^^^^^
+Contains the path that was passed as device_path to the map_device operation.
+
+/sys/kernel/ibtrs/sessions/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The connections to the servers are created and destroyed on demand. When the
+first device is mapped from a server, an IBTRS connection will be created with
+this server and the following directory will be created:
+
+/sys/kernel/ibtrs/sessions//
+
+If the connection establishment fails, detailed error information can be found
+in the kernel log (dmesg).
+
+When the last device is unmapped from a server, the connection will be closed
+and the directory will be deleted.
+
+
+Server Userspace Interface
+--------------------------
+
+/sys/kernel/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+/sys/kernel/ibnbd/devices/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When a client maps a device, a directory entry with the name of the block
+device is created under /sys/kernel/ibnbd/devices/. If the device path provided
+by the client is a symbolic link to a block device, the target block device name
+is used instead of the mapping path name.
+
+block_dev
+^^^^^^^^^
+block_dev is a symlink to the sysfs entry of the exported device.
+
+Examples
+++++++++
+    block_dev -> ../../../../devices/virtual/block/nullb1
+
+revalidate
+^^^^^^^^^^
+When the size of an exported block device changes on the server, the clients
+have to be notified so they can resize the mapped device.
+
+Notification of the clients about a device change is triggered by writing '1'
+to the revalidate file.
+
+Examples
+++++++++
+    # echo 1 > /sys/kernel/ibnbd/devices/nullb1/revalidate
+
+/sys/kernel/ibnbd/devices//clients entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the device is mapped from a client, the following directory will be
+created:
+
+/sys/kernel/ibnbd/devices//clients/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the device is unmapped, the directory will be removed.
+
+read_only
+^^^^^^^^^
+Contains '1' if the device is mapped read-only, otherwise '0'.
+
+mapping_path
+^^^^^^^^^^^^
+Contains the relative device path provided by the user during mapping.
+
+
+IBNBD-Server Module Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+dev_search_path
+^^^^^^^^^^^^^^^
+When a device is mapped from the client, the server generates the path to the
+block device on the server side by concatenating dev_search_path and the
+device_path that was specified in the map_device operation.
+
+The format of the input is
+    path ::= Absolute Linux path name,
+             Max. length depends on PATH_MAX define (usually 4095 chars)
+
+The default dev_search_path is: "/".
+
+Example
++++++++
+
+The configured dev_search_path on the server is: /dev/storage/
+The client maps the device by:
+    # echo "server=ip:10.50.100.64 device_path=3F2504E0-4F89-41D3-9A0C-0305E82C3301" > /sys/kernel/ibnbd/map_device
+
+The server tries to open a block device with the path:
+    /dev/storage/3F2504E0-4F89-41D3-9A0C-0305E82C3301
+
+
+Contact
+-------
+Mailing list: ibnbd@profitbricks.com
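
The two string rules described in the document (how the client names the link
under /sys/kernel/ibnbd/devices/, and which path the server opens) can be
sketched in shell roughly as follows. This is only an illustration: the
function names ibnbd_link_name and ibnbd_server_path are hypothetical and are
not part of ibnbd or ibnbd-cli, and plain concatenation is assumed for the
server path because the text says only that the two values are concatenated.

```shell
#!/bin/sh
# Hypothetical helpers; they only mirror the naming rules stated in the
# documentation and do not touch sysfs or any kernel module.

# Name of the link under /sys/kernel/ibnbd/devices/ for a given device_path:
# every "/" is replaced by "!"; a path without slashes is used unchanged.
ibnbd_link_name() {
    printf '%s\n' "$1" | tr '/' '!'
}

# Path the server tries to open: dev_search_path followed by device_path.
# Assumption: the two strings are joined as-is, without slash normalisation.
ibnbd_server_path() {
    printf '%s%s\n' "$1" "$2"
}

# Values taken from the examples in the document.
ibnbd_link_name /dev/ram1
ibnbd_server_path /dev/storage/ 3F2504E0-4F89-41D3-9A0C-0305E82C3301
```

With the example values from the document this prints "!dev!ram1" and
"/dev/storage/3F2504E0-4F89-41D3-9A0C-0305E82C3301", which match the devices/
and dev_search_path examples.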