From patchwork Thu Jul 25 17:23:19 2019
X-Patchwork-Submitter: Logan Gunthorpe
X-Patchwork-Id: 11059385
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, Jens Axboe,
    Chaitanya Kulkarni, Max Gurtovoy, Stephen Bates, Logan Gunthorpe
Date: Thu, 25 Jul 2019 11:23:19 -0600
Message-Id: <20190725172335.6825-1-logang@deltatee.com>
Subject: [PATCH v6 00/16] nvmet: add target passthru commands support

Hi,

Chaitanya has asked us to take on these patches as we have an interest
in getting them into upstream. To that end, we've done a large amount
of testing, bug fixes and cleanup.

Passthru support for nvmet allows users to export an entire NVMe
controller through NVMe-oF. When exported in this way (as opposed to
exporting each namespace as a block device), all the NVMe commands are
passed to the given controller unmodified, including most admin
commands and Vendor Unique Commands (VUCs). A passthru target will
expose all namespaces for a given device to the remote host.

There are three major non-bugfix changes that we've done to the series:

1) Instead of using a separate special passthru subsystem in configfs,
   simply add a passthru directory that's analogous to the existing
   namespace directories. The directories have very similar attributes
   to namespaces but are mutually exclusive. If a user enables a
   namespace, they can't then enable a passthru controller and vice
   versa. This simplifies the code required to implement passthru
   configfs and IMO creates a much clearer and more uniform interface.

2) Instead of taking a bare controller name (i.e. "nvme1"), take a full
   device path to the controller's char device. This is more consistent
   with the regular namespaces, which take a path, and also allows
   users to make use of udev rules and symlinks to manage their
   controllers instead of the potentially unstable device names.

3) Implement block accounting for the passthru devices. This is so the
   target OS can still track device usage using /proc/diskstats.

Besides these three changes, we've also found a large number of bugs
and crashes and done a bunch of testing with KASAN, lockdep and
kmemleak. A more complete list of changes is given below.

Additionally, we've written some new blktests to test the passthru
code. A branch is available here[1] and can be submitted once these
patches are upstream.

These patches are based off of v5.3-rc1 and a git branch is available
at [2].

Thanks,

Logan

[1] https://github.com/Eideticom/blktests nvmet_passthru
[2] https://github.com/sbates130272/linux-p2pmem/ nvmet_passthru_v6

---

v6 Changes (done by Logan):
  1. Rebased onto v5.3-rc1
  2. Rework configfs interface to simply be a passthru directory
     within the existing subsystem. The directory is similar to
     and consistent with a namespace directory.
  3. Have the configfs take a path instead of a bare controller name
  4. Renamed the main passthru file to io-cmd-passthru.c for
     consistency with the file and block-dev methods.
  5. Cleaned up all the CONFIG_NVME_TARGET_PASSTHRU usage to remove
     all the inline #ifdefs
  6. Restructured nvmet_passthru_make_request() a bit for clearer code
  7. Moved nvme_find_get_ns() call into nvmet_passthru_execute_cmd()
     since calling it in nvmet_req_init() causes a lockdep warning
     due to nvme_find_get_ns() being able to sleep.
  8. Added a check in nvmet_passthru_execute_cmd() to ensure we don't
     violate queue_max_segments or queue_max_hw_sectors and overrode
     mdts to ensure hosts don't intentionally submit commands
     that will exceed these limits.
  9. Reworked the code which ensures there's only one subsystem per
     passthru controller to use an xarray instead of a list, as this is
     simpler and made it easier to fix some bugs triggered by
     disabling subsystems that weren't enabled.
 10. Removed the override of the target cntlid with the passthru
     cntlid; this seemed like a really bad idea, especially in the
     presence of mixed systems, as you could end up with two ctrlrs
     with the same cntlid. For now, commands that depend on cntlid
     are blacklisted.
 11. Implement block accounting for passthru so the target can track
     usage using /proc/diskstats
 12. A number of other minor bug fixes and cleanups

v5 Changes (not sent to list, from Chaitanya):
  1. Added workqueue for admin commands.
  2. Added kconfig option for the pass-thru feature.
  3. Restructure the parsing code according to your suggestion,
     call nvmet_xxx_parse_cmd() from nvmet_passthru_parse_cmd().
  4. Use pass-thru instead of pt.
  5. Several cleanups and add comments at the appropriate locations.
  6. Minimize the code for checking pass-thru ns across all the
     subsystems.
  7. Removed the delays in the ns related admin commands since I was
     not able to reproduce the previous bug.

v4 Changes:
  1. Add request polling interface to the block layer.
  2. Use request polling interface in the NVMe-oF target passthru code
     path.
  3. Add checks suggested by Sagi for creating one target ctrl per
     passthru ctrl.
  4. Don't enable the namespace if it belongs to the configured
     passthru ctrl.
  5. Adjust the code for the latest kernel.

v3 Changes:
  1. Split the addition of passthru command handlers and integration
     into two different patches since we add guards to create one
     target controller per passthru controller. This way it will be
     easier to review the code.
  2. Adjust the code for 4.18.

v2 Changes:
  1. Update the new nvme core controller find API naming and
     changed the string comparison of the ctrl.
  2. Get rid of the newly added #defines for target ctrl values.
  3. Use the newly added structure members in the same patch where
     they are used. Aggregate the passthru command handling support
     and integration with nvmet-core into one patch.
  4. Introduce global NVMe Target subsystem list for connected and
     not connected subsystems on the target side.
  5. Add check when configuring the target ns and target
     passthru ctrl to allow only one target controller to be created
     for one passthru subsystem.
  6. Use the passthru ctrl cntlid when creating the target controller.

Chaitanya Kulkarni (6):
  nvme-core: export existing ctrl and ns interfaces
  nvmet: add return value to nvmet_add_async_event()
  nvmet-passthru: update KConfig with config passthru option
  nvmet-passthru: add passthru code to process commands
  nvmet-core: allow one host per passthru-ctrl
  nvmet-core: don't check the data len for pt-ctrl

Logan Gunthorpe (10):
  chardev: factor out cdev_lookup() helper
  chardev: introduce cdev_get_by_path()
  chardev: export cdev_put()
  nvme-core: introduce nvme_get_by_path()
  nvmet: make nvmet_copy_ns_identifier() non-static
  nvmet-passthru: add enable/disable helpers
  nvmet-configfs: introduce passthru configfs interface
  block: don't check blk_rq_is_passthrough() in blk_do_io_stat()
  block: call blk_account_io_start() in blk_execute_rq_nowait()
  nvmet-passthru: support block accounting

 block/blk-exec.c                      |   2 +
 block/blk-mq.c                        |   2 +-
 block/blk.h                           |   5 +-
 drivers/nvme/host/core.c              |  40 +-
 drivers/nvme/host/nvme.h              |   9 +
 drivers/nvme/target/Kconfig           |  10 +
 drivers/nvme/target/Makefile          |   1 +
 drivers/nvme/target/admin-cmd.c       |   4 +-
 drivers/nvme/target/configfs.c        |  99 ++++
 drivers/nvme/target/core.c            |  41 +-
 drivers/nvme/target/io-cmd-passthru.c | 681 ++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h           |  71 ++-
 fs/char_dev.c                         |  61 ++-
 include/linux/cdev.h                  |   1 +
 14 files changed, 1003 insertions(+), 24 deletions(-)
 create mode 100644 drivers/nvme/target/io-cmd-passthru.c

--
2.20.1
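
As a rough illustration of item 8 in the v6 changes (this is not code from
the series), the sketch below shows one way the limit check and the mdts
override could fit together. The function names fits_limits() and
passthru_mdts() are hypothetical, and the sketch assumes that each segment
carries at least one host page, that a limit of 0 means "not reported", and
that the host page size is 4 KiB. The only facts taken from the NVMe
specification are that MDTS is a power-of-two exponent in units of the
minimum page size, that the minimum page size is 2^(12 + CAP.MPSMIN) bytes,
and that an MDTS of 0 means no limit.

```python
SECTOR_SHIFT = 9  # 512-byte sectors


def fits_limits(nr_segments, nr_sectors, max_segments, max_hw_sectors):
    """Return True if a request stays within both queue limits.

    A limit of 0 is treated as "not reported" (assumption).
    """
    if max_segments and nr_segments > max_segments:
        return False
    if max_hw_sectors and nr_sectors > max_hw_sectors:
        return False
    return True


def passthru_mdts(max_segments, max_hw_sectors, mpsmin, host_page_shift=12):
    """Derive an MDTS value that keeps hosts within the queue limits.

    Hypothetical helper: rounds the tighter of the two limits down to a
    power of two and expresses it in controller pages.
    """
    # Assumption: every segment holds at least one host page, so
    # max_segments pages is a transfer size that always fits.
    seg_limit = max_segments << (host_page_shift - SECTOR_SHIFT)

    # Take the smaller limit, ignoring limits that are not reported.
    limits = [v for v in (seg_limit, max_hw_sectors) if v > 0]
    if not limits:
        return 0  # MDTS == 0 means "no limit" in the NVMe spec
    max_sectors = min(limits)

    ctrl_page_shift = 12 + mpsmin                   # log2(min page size)
    floor_log2 = max_sectors.bit_length() - 1       # round down to 2^n
    mdts = floor_log2 + SECTOR_SHIFT - ctrl_page_shift
    if mdts < 1:
        # 0 would advertise "no limit", so such a small cap is not
        # expressible through MDTS at all.
        raise ValueError("limit too small to express as MDTS")
    return mdts


if __name__ == "__main__":
    # 127 segments of 4 KiB = 1016 sectors, tighter than 2048 sectors.
    print(passthru_mdts(127, 2048, mpsmin=0))   # 6, i.e. 2^6 * 4 KiB
    print(fits_limits(128, 512, 127, 2048))     # False: too many segments
```

Rounding down to a power of two means the advertised limit can be stricter
than the real one (256 KiB instead of 508 KiB above), which is the safe
direction: the per-command check then only has to catch hosts that ignore
the advertised MDTS.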