From patchwork Wed Jul 24 21:57:28 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Verma, Vishal L"
X-Patchwork-Id: 11057647
X-Original-To: linux-nvdimm@lists.01.org
From: Vishal Verma
Subject: [ndctl PATCH v7 00/13] daxctl: add a new reconfigure-device command
Date: Wed, 24 Jul 2019 15:57:28 -0600
Message-Id: <20190724215741.18556-1-vishal.l.verma@intel.com>
X-Mailer: git-send-email 2.20.1
List-Id: "Linux-nvdimm developer list."
Cc: Dave Hansen, Pavel Tatashin

Changes in v7:
- Fix a couple of checkpatch type errors in the new lines added in v6
  (Dan).
- Get rid of daxctl_dev_get_mode. daxctl_dev_get_memory is sufficient to
  both check the mode and allocate the memory related structures on its
  first call. (Dan)
- Due to the above, daxctl_dev_mode is now private to libdaxctl, and not
  part of the API exported through libdaxctl.h
- Add a large enough buffer at init time to construct dynamic paths, and
  avoid asprintf() type allocations for memory blocks at runtime (Dan).

Changes in v6:
- For memory block online/offline operations, the kernel responds with
  an EINVAL both for 'real' errors and when the memory was already in
  the requested state. Since there is a TOCTOU hole between checking the
  state and storing it, just perform a second check if the store results
  in an error. If the check shows the state to be the same as the one
  we're attempting, it means that another agent (usually udev) won the
  race, but we don't care so long as the state change happened, so don't
  report an error.
  (Fan Du)

Changes in v5:
- device.c: correctly set loglevel for daxctl_ctx for --verbose
- drop the subsys caching, its complexity started to exceed its benefit.
  dax-class device models will simply error out during reconfigure.
  (Dan)
- Add a note to the man page for the above.
- Clarify the onlining policy (online_movable) in the man page
- rename "numa_node" to "target_node" in device listings (Dan)
- When printing a device 'mode', assume devdax if !system-ram, avoiding
  a "mode: unknown" situation which can be confusing. (Dan)
- Add a "state: disabled" attribute to the device listing if a driver is
  not bound. This is more apt than the previous "mode: unknown" listing.
- add an api to get 'dev->resource' parsing /proc/iomem as a fallback
  for when the kernel doesn't provide the attribute (Dan)
- convert 'node_*' apis to 'memory_*' apis that act on a new
  daxctl_memory object (Dan)
- online only memory sections belonging to the device in question by
  cross referencing block indices with the dax device resource (Dan)
- Refuse to reconfigure a device that is already in the target mode.
  Until now, reconfiguring a system-ram device back to system-ram would
  result in a 'online memory may not be hot-removed' kernel warning.
- If the device was already in the system-ram mode, skip
  disabling/enabling, but still try to online the memory unless the
  --no-online option is in effect.
- In daxctl_unbind, also 'remove_id' to prevent devices automatically
  binding to the kmem driver on a disable + re-enable, which can be
  surprising (Dan).
- Rewrite the top half of daxctl/device.c to borrow elements from
  ndctl/namespace.c so that it can support growing additional commands
  that operate on devices (online-memory and offline-memory)
- Refactor the bottom half of daxctl/device.c so we only do the
  disabling/offlining steps if the device was enabled.
- Add new commands to online and offline memory sections associated with
  a given dax device (Dan)
- Add a new test - daxctl-device.sh - to test daxctl reconfigure-device,
  online-memory, and offline-memory commands.
- Add an example in documentation demonstrating how to use numactl to
  bind a process to a node surfaced from a dax device (Andy Rudoff)

Changes in v4:
- Don't fail add_dax_dev for kmod failures. Instead fail only when the
  kmod list is actually used, i.e. during daxctl-reconfigure-device

Changes in v3:
- In daxctl_dev_get_mode(), remove the subsystem warning, detect
  dax-class and simply make it return devdax

Changes in v2:
- Add examples to the documentation page (Dave Hansen)
- Clarify documentation regarding the conversion from system-ram to
  devdax
- Remove any references to a persistent config from the documentation -
  those can be added when the feature is added.
- device.c: validate option compatibility
- daxctl-list: display numa_node for device listings
- daxctl-list: display mode for device listings
- make the options more consistent by adding a '-O' short option for
  --attempt-offline

Add a new daxctl-reconfigure-device command that lets us reconfigure DAX
devices back and forth between 'system-ram' and 'device-dax' modes. It
also includes facilities to online any newly hot-plugged memory
(default), and attempt to offline memory before converting away from the
system-ram mode (not default, requires a --attempt-offline option).

Currently missing from this series is a way to persistently store which
devices have been 'marked' for use as system-ram. This depends on a
config system overhaul in ndctl, and patches for those will follow
separately and are independent of this work.

Example invocations:

1. Reconfigure dax0.0 to system-ram mode, don’t online the memory
    # daxctl reconfigure-device --mode=system-ram --no-online dax0.0
    [
      {
        "chardev":"dax0.0",
        "size":16777216000,
        "target_node":2,
        "mode":"system-ram"
      }
    ]

2. Reconfigure dax0.0 to devdax mode, attempt to offline the memory
    # daxctl reconfigure-device --human --mode=devdax --attempt-offline dax0.0
    {
      "chardev":"dax0.0",
      "size":"15.63 GiB (16.78 GB)",
      "target_node":2,
      "mode":"devdax"
    }

3. Reconfigure all dax devices on region0 to system-ram mode
    # daxctl reconfigure-device --mode=system-ram --region=0 all
    [
      {
        "chardev":"dax0.0",
        "size":16777216000,
        "target_node":2,
        "mode":"system-ram"
      },
      {
        "chardev":"dax0.1",
        "size":16777216000,
        "target_node":3,
        "mode":"system-ram"
      }
    ]

These patches can also be found in the 'kmem-pending' branch on github:
https://github.com/pmem/ndctl/tree/kmem-pending

Cc: Dan Williams
Cc: Dave Hansen
Cc: Pavel Tatashin

Vishal Verma (13):
  libdaxctl: add interfaces to get ctx and check device state
  libdaxctl: add interfaces to enable/disable devices
  libdaxctl: add an interface to retrieve the device resource
  libdaxctl: add a 'daxctl_memory' object for memory based operations
  daxctl/list: add target_node for device listings
  daxctl/list: display the mode for a dax device
  daxctl: add a new reconfigure-device command
  Documentation/daxctl: add a man page for daxctl-reconfigure-device
  daxctl: add commands to online and offline memory
  Documentation: Add man pages for daxctl-{on,off}line-memory
  contrib/ndctl: fix region-id completions for daxctl
  contrib/ndctl: add bash-completion for the new daxctl commands
  test: Add a unit test for daxctl-reconfigure-device and friends

 Documentation/daxctl/Makefile.am              |   5 +-
 .../daxctl/daxctl-offline-memory.txt          |  72 ++
 Documentation/daxctl/daxctl-online-memory.txt |  80 ++
 .../daxctl/daxctl-reconfigure-device.txt      | 139 ++++
 Makefile.am                                   |   3 +-
 contrib/ndctl                                 |  38 +-
 daxctl/Makefile.am                            |   2 +
 daxctl/builtin.h                              |   3 +
 daxctl/daxctl.c                               |   3 +
 daxctl/device.c                               | 476 ++++++++++++
 daxctl/lib/Makefile.am                        |   5 +-
 daxctl/lib/libdaxctl-private.h                |  38 +
 daxctl/lib/libdaxctl.c                        | 685 ++++++++++++++++++
 daxctl/lib/libdaxctl.sym                      |  18 +
 daxctl/libdaxctl.h                            |  16 +
 test/Makefile.am                              |   3 +-
 test/common                                   |  19 +-
 test/daxctl-devices.sh                        |  81 +++
 util/iomem.c                                  |  37 +
 util/iomem.h                                  |  12 +
 util/json.c                                   |  22 +
 21 files changed, 1743 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/daxctl/daxctl-offline-memory.txt
 create mode 100644 Documentation/daxctl/daxctl-online-memory.txt
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt
 create mode 100644 daxctl/device.c
 create mode 100755 test/daxctl-devices.sh
 create mode 100644 util/iomem.c
 create mode 100644 util/iomem.h
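
For reviewers who want the v6 change (re-checking the memory block state
after a failed store) in concrete terms, here is a minimal sketch of the
idea. It is not the code from this series: read_attr(), write_attr(),
resolve_store() and set_block_state() are hypothetical helper names used
only for illustration, and the sketch assumes the state attribute is the
kernel's /sys/devices/system/memory/memoryN/state file, where the stored
string (e.g. "online_movable") can differ from the state read back
(e.g. "online").

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read a short attribute into buf and strip the trailing newline. */
int read_attr(const char *path, char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = read(fd, buf, len - 1);
	if (n < 0) {
		int rc = -errno;

		close(fd);
		return rc;
	}
	close(fd);
	buf[n] = '\0';
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Write a string to an attribute; returns 0 or a negative errno. */
int write_attr(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	size_t len = strlen(val);
	ssize_t n;
	int rc = 0;

	if (fd < 0)
		return -errno;
	n = write(fd, val, len);
	if (n < 0)
		rc = -errno;
	else if ((size_t)n != len)
		rc = -EIO;
	close(fd);
	return rc;
}

/*
 * Decide whether a failed store is worth reporting. If the state we
 * read back already matches what we wanted, another agent (usually
 * udev) won the race and the failure is treated as success.
 */
int resolve_store(int store_rc, const char *current, const char *wanted)
{
	assert(wanted != NULL);
	if (store_rc == 0)
		return 0;
	if (current && strcmp(current, wanted) == 0)
		return 0;
	return store_rc;
}

/*
 * Hypothetical wrapper: store 'store' into the state attribute and,
 * only if that fails, perform the second check against 'expect'.
 */
int set_block_state(const char *state_path, const char *store,
		const char *expect)
{
	char cur[64];
	int rc = write_attr(state_path, store);

	if (rc == 0)
		return 0;
	if (read_attr(state_path, cur, sizeof(cur)) < 0)
		return rc; /* cannot re-check, report the original error */
	return resolve_store(rc, cur, expect);
}
```

A caller onlining a block would, under these assumptions, use something
like set_block_state(path, "online_movable", "online"). The second check
runs only on the error path, so the common case still costs one write.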