From patchwork Tue Apr 9 16:52:08 2019
X-Patchwork-Submitter: Wei Liu
X-Patchwork-Id: 10891767
From: Wei Liu
Date: Tue, 9 Apr 2019 17:52:08 +0100
Message-ID: 
<20190409165210.7624-2-wei.liu2@citrix.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190409165210.7624-1-wei.liu2@citrix.com>
References: <20190409165210.7624-1-wei.liu2@citrix.com>
Subject: [Xen-devel] [PATCH v3 1/3] tools: remove tmem code and commands
Cc: Ian Jackson, Wei Liu, Marek Marczykowski-Górecki, Konrad Rzeszutek Wilk

Remove all tmem related code in libxc.

Leave some stubs in libxl in case anyone has linked to those functions
before the removal.

Remove all tmem related commands in xl, all tmem related code in other
utilities we ship.

Signed-off-by: Wei Liu
Acked-by: Konrad Rzeszutek Wilk
Acked-by: Ian Jackson
---
Rebase and fix up conflicts
---
 tools/libxc/Makefile                        |   1 -
 tools/libxc/include/xenctrl.h               |  17 -
 tools/libxc/xc_tmem.c                       | 507 --------------------
 tools/libxl/libxl_tmem.c                    | 119 +----
 tools/misc/Makefile                         |   1 -
 tools/misc/xen-tmem-list-parse.c            | 339 -------------
 tools/python/xen/lowlevel/xc/xc.c           |  87 ----
 tools/xenstat/libxenstat/src/xenstat.c      |  53 +-
 tools/xenstat/libxenstat/src/xenstat.h      |  15 -
 tools/xenstat/libxenstat/src/xenstat_priv.h |   8 -
 tools/xenstat/xentop/xentop.c               |  36 +-
 tools/xl/Makefile                           |   2 +-
 tools/xl/xl.h                               |   6 -
 tools/xl/xl_cmdtable.c                      |  40 --
 tools/xl/xl_tmem.c                          | 251 ----------
 15 files changed, 18 insertions(+), 1464 deletions(-)
 delete mode 100644 tools/libxc/xc_tmem.c
 delete mode 100644 tools/misc/xen-tmem-list-parse.c
 delete mode 100644 tools/xl/xl_tmem.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 668cbce0f8..cbc30001f6 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -30,7 +30,6 @@ CTRL_SRCS-y += xc_tbuf.c
 CTRL_SRCS-y += xc_pm.c
 CTRL_SRCS-y += xc_cpu_hotplug.c
 CTRL_SRCS-y += xc_resume.c
-CTRL_SRCS-y += xc_tmem.c
 CTRL_SRCS-y += xc_vm_event.c
 CTRL_SRCS-y += xc_monitor.c
 CTRL_SRCS-y += xc_mem_paging.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index a3628e56bb..f3deba8b58 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -44,7 +44,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 
@@ -1907,22 +1906,6 @@ int xc_set_cpuidle_max_cstate(xc_interface *xch, uint32_t value);
 int xc_enable_turbo(xc_interface *xch, int cpuid);
 int xc_disable_turbo(xc_interface *xch, int cpuid);
 
-/**
- * tmem operations
- */
-
-int xc_tmem_control_oid(xc_interface *xch, int32_t pool_id, uint32_t subop,
-                        uint32_t cli_id, uint32_t len, uint32_t arg,
-                        struct xen_tmem_oid oid, void *buf);
-int xc_tmem_control(xc_interface *xch,
-                    int32_t pool_id, uint32_t subop, uint32_t cli_id,
-                    uint32_t len, uint32_t arg, void *buf);
-int xc_tmem_auth(xc_interface *xch, int cli_id, char *uuid_str, int enable);
-int xc_tmem_save(xc_interface *xch, uint32_t domid, int live, int fd, int field_marker);
-int xc_tmem_save_extra(xc_interface *xch, uint32_t domid, int fd, int field_marker);
-void xc_tmem_save_done(xc_interface *xch, uint32_t domid);
-int xc_tmem_restore(xc_interface *xch, uint32_t domid, int fd);
-int xc_tmem_restore_extra(xc_interface *xch, uint32_t domid, int fd);
 
 /**
  * altp2m operations
diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
deleted file mode 100644
index a365c74388..0000000000
--- a/tools/libxc/xc_tmem.c
+++ /dev/null
@@ -1,507 +0,0 @@
-/******************************************************************************
- * xc_tmem.c
- *
- * Copyright (C) 2008 Oracle Corp.
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- * - * This library is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with this library; If not, see . - */ - -#include "xc_private.h" -#include -#include -#include - -int xc_tmem_control(xc_interface *xch, - int32_t pool_id, - uint32_t cmd, - uint32_t cli_id, - uint32_t len, - uint32_t arg, - void *buf) -{ - DECLARE_SYSCTL; - DECLARE_HYPERCALL_BOUNCE(buf, len, XC_HYPERCALL_BUFFER_BOUNCE_OUT); - int rc; - - sysctl.cmd = XEN_SYSCTL_tmem_op; - sysctl.u.tmem_op.pool_id = pool_id; - sysctl.u.tmem_op.cmd = cmd; - sysctl.u.tmem_op.cli_id = cli_id; - sysctl.u.tmem_op.len = len; - sysctl.u.tmem_op.arg = arg; - sysctl.u.tmem_op.pad = 0; - sysctl.u.tmem_op.oid.oid[0] = 0; - sysctl.u.tmem_op.oid.oid[1] = 0; - sysctl.u.tmem_op.oid.oid[2] = 0; - - if ( cmd == XEN_SYSCTL_TMEM_OP_SET_CLIENT_INFO || - cmd == XEN_SYSCTL_TMEM_OP_SET_AUTH ) - HYPERCALL_BOUNCE_SET_DIR(buf, XC_HYPERCALL_BUFFER_BOUNCE_IN); - if ( len ) - { - if ( buf == NULL ) - { - errno = EINVAL; - return -1; - } - if ( xc_hypercall_bounce_pre(xch, buf) ) - { - PERROR("Could not bounce buffer for tmem control hypercall"); - return -1; - } - } - - set_xen_guest_handle(sysctl.u.tmem_op.u.buf, buf); - - rc = do_sysctl(xch, &sysctl); - - if ( len ) - xc_hypercall_bounce_post(xch, buf); - - return rc; -} - -int xc_tmem_control_oid(xc_interface *xch, - int32_t pool_id, - uint32_t cmd, - uint32_t cli_id, - uint32_t len, - uint32_t arg, - struct xen_tmem_oid oid, - void *buf) -{ - DECLARE_SYSCTL; - DECLARE_HYPERCALL_BOUNCE(buf, len, XC_HYPERCALL_BUFFER_BOUNCE_OUT); - int rc; - - sysctl.cmd = XEN_SYSCTL_tmem_op; - sysctl.u.tmem_op.pool_id = pool_id; - sysctl.u.tmem_op.cmd = cmd; - sysctl.u.tmem_op.cli_id = cli_id; - sysctl.u.tmem_op.len = len; - 
sysctl.u.tmem_op.arg = arg; - sysctl.u.tmem_op.pad = 0; - sysctl.u.tmem_op.oid = oid; - - if ( len ) - { - if ( buf == NULL ) - { - errno = EINVAL; - return -1; - } - if ( xc_hypercall_bounce_pre(xch, buf) ) - { - PERROR("Could not bounce buffer for tmem control (OID) hypercall"); - return -1; - } - } - - set_xen_guest_handle(sysctl.u.tmem_op.u.buf, buf); - - rc = do_sysctl(xch, &sysctl); - - if ( len ) - xc_hypercall_bounce_post(xch, buf); - - return rc; -} - -static int xc_tmem_uuid_parse(char *uuid_str, uint64_t *uuid_lo, uint64_t *uuid_hi) -{ - char *p = uuid_str; - uint64_t *x = uuid_hi; - int i = 0, digit; - - *uuid_lo = 0; *uuid_hi = 0; - for ( p = uuid_str, i = 0; i != 36 && *p != '\0'; p++, i++ ) - { - if ( (i == 8 || i == 13 || i == 18 || i == 23) ) - { - if ( *p != '-' ) - return -1; - if ( i == 18 ) - x = uuid_lo; - continue; - } - else if ( *p >= '0' && *p <= '9' ) - digit = *p - '0'; - else if ( *p >= 'A' && *p <= 'F' ) - digit = *p - 'A' + 10; - else if ( *p >= 'a' && *p <= 'f' ) - digit = *p - 'a' + 10; - else - return -1; - *x = (*x << 4) | digit; - } - if ( (i != 1 && i != 36) || *p != '\0' ) - return -1; - return 0; -} - -int xc_tmem_auth(xc_interface *xch, - int cli_id, - char *uuid_str, - int enable) -{ - xen_tmem_pool_info_t pool = { - .flags.u.auth = enable, - .id = 0, - .n_pages = 0, - .uuid[0] = 0, - .uuid[1] = 0, - }; - if ( xc_tmem_uuid_parse(uuid_str, &pool.uuid[0], - &pool.uuid[1]) < 0 ) - { - PERROR("Can't parse uuid, use xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"); - return -1; - } - return xc_tmem_control(xch, 0 /* pool_id */, - XEN_SYSCTL_TMEM_OP_SET_AUTH, - cli_id, sizeof(pool), - 0 /* arg */, &pool); -} - -/* Save/restore/live migrate */ - -/* - Note that live migration complicates the save/restore format in - multiple ways: Though saving/migration can only occur when all - tmem pools belonging to the domain-being-saved are frozen and - this ensures that new pools can't be created or existing pools - grown (in number of pages), it is 
possible during a live migration - that pools may be destroyed and pages invalidated while the migration - is in process. As a result, (1) it is not safe to pre-specify counts - for these values precisely, but only as a "max", and (2) a "invalidation" - list (of pools, objects, pages) must be appended when the domain is truly - suspended. - */ - -/* returns 0 if nothing to save, -1 if error saving, 1 if saved successfully */ -int xc_tmem_save(xc_interface *xch, - uint32_t domid, int io_fd, int live, int field_marker) -{ - int marker = field_marker; - int i, j, rc; - uint32_t minusone = -1; - struct tmem_handle *h; - xen_tmem_client_t info; - xen_tmem_pool_info_t *pools; - char *buf = NULL; - - rc = xc_tmem_control(xch, 0, XEN_SYSCTL_TMEM_OP_SAVE_BEGIN, - domid, 0 /* len*/ , live, NULL); - if ( rc ) - { - /* Nothing to save - no tmem enabled. */ - if ( errno == ENOENT ) - return 0; - - return rc; - } - - if ( xc_tmem_control(xch, 0 /* pool_id */, - XEN_SYSCTL_TMEM_OP_GET_CLIENT_INFO, - domid /* cli_id */, sizeof(info), 0 /* arg */, - &info) < 0 ) - return -1; - - /* Nothing to do. */ - if ( !info.nr_pools ) - return 0; - - pools = calloc(info.nr_pools, sizeof(*pools)); - if ( !pools ) - return -1; - - rc = xc_tmem_control(xch, 0 /* pool_id is ignored. */, - XEN_SYSCTL_TMEM_OP_GET_POOLS, - domid /* cli_id */, sizeof(*pools) * info.nr_pools, - 0 /* arg */, pools); - - if ( rc < 0 || (uint32_t)rc > info.nr_pools ) - goto out_memory; - - /* Update it - as we have less pools between the two hypercalls. 
*/ - info.nr_pools = (uint32_t)rc; - - if ( write_exact(io_fd, &marker, sizeof(marker)) ) - goto out_memory; - - if ( write_exact(io_fd, &info, sizeof(info)) ) - goto out_memory; - - if ( write_exact(io_fd, &minusone, sizeof(minusone)) ) - goto out_memory; - - for ( i = 0; i < info.nr_pools; i++ ) - { - uint32_t pagesize; - int bufsize = 0; - int checksum = 0; - xen_tmem_pool_info_t *pool = &pools[i]; - - if ( pool->flags.raw != -1 ) - { - if ( !pool->flags.u.persist ) - pool->n_pages = 0; - - if ( write_exact(io_fd, pool, sizeof(*pool)) ) - goto out_memory; - - if ( !pool->flags.u.persist ) - continue; - - pagesize = 1 << (pool->flags.u.pagebits + 12); - if ( pagesize > bufsize ) - { - bufsize = pagesize + sizeof(struct tmem_handle); - if ( (buf = realloc(buf,bufsize)) == NULL ) - goto out_memory; - } - for ( j = pool->n_pages; j > 0; j-- ) - { - int ret; - if ( (ret = xc_tmem_control( - xch, pool->id, XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE, - domid, bufsize, 0, buf)) > 0 ) - { - h = (struct tmem_handle *)buf; - if ( write_exact(io_fd, &h->oid, sizeof(h->oid)) ) - goto out_memory; - - if ( write_exact(io_fd, &h->index, sizeof(h->index)) ) - goto out_memory; - h++; - checksum += *(char *)h; - if ( write_exact(io_fd, h, pagesize) ) - goto out_memory; - } else if ( ret == 0 ) { - continue; - } else { - /* page list terminator */ - h = (struct tmem_handle *)buf; - h->oid.oid[0] = h->oid.oid[1] = h->oid.oid[2] = -1L; - if ( write_exact(io_fd, &h->oid, sizeof(h->oid)) ) - { - out_memory: - free(pools); - free(buf); - return -1; - } - break; - } - } - DPRINTF("saved %"PRId64" tmem pages for dom=%d pool=%d, checksum=%x\n", - pool->n_pages - j, domid, pool->id, checksum); - } - } - free(pools); - free(buf); - - /* pool list terminator */ - minusone = -1; - if ( write_exact(io_fd, &minusone, sizeof(minusone)) ) - return -1; - - return 1; -} - -/* only called for live migration */ -int xc_tmem_save_extra(xc_interface *xch, uint32_t domid, int io_fd, int field_marker) -{ - 
struct tmem_handle handle; - int marker = field_marker; - uint32_t minusone; - int count = 0, checksum = 0; - - if ( write_exact(io_fd, &marker, sizeof(marker)) ) - return -1; - while ( xc_tmem_control(xch, 0, XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV, domid, - sizeof(handle),0,&handle) > 0 ) { - if ( write_exact(io_fd, &handle.pool_id, sizeof(handle.pool_id)) ) - return -1; - if ( write_exact(io_fd, &handle.oid, sizeof(handle.oid)) ) - return -1; - if ( write_exact(io_fd, &handle.index, sizeof(handle.index)) ) - return -1; - count++; - checksum += handle.pool_id + handle.oid.oid[0] + handle.oid.oid[1] + - handle.oid.oid[2] + handle.index; - } - if ( count ) - DPRINTF("needed %d tmem invalidates, check=%d\n",count,checksum); - minusone = -1; - if ( write_exact(io_fd, &minusone, sizeof(minusone)) ) - return -1; - return 0; -} - -/* only called for live migration */ -void xc_tmem_save_done(xc_interface *xch, uint32_t domid) -{ - xc_tmem_control(xch, 0, XEN_SYSCTL_TMEM_OP_SAVE_END, domid, 0, 0, NULL); -} - -/* restore routines */ - -static int xc_tmem_restore_new_pool( - xc_interface *xch, - int cli_id, - uint32_t pool_id, - uint32_t flags, - uint64_t uuid_lo, - uint64_t uuid_hi) -{ - xen_tmem_pool_info_t pool = { - .flags.raw = flags, - .id = pool_id, - .n_pages = 0, - .uuid[0] = uuid_lo, - .uuid[1] = uuid_hi, - }; - - return xc_tmem_control(xch, pool_id, - XEN_SYSCTL_TMEM_OP_SET_POOLS, - cli_id, sizeof(pool), - 0 /* arg */, &pool); -} - -int xc_tmem_restore(xc_interface *xch, uint32_t domid, int io_fd) -{ - uint32_t minusone; - xen_tmem_client_t info; - int checksum = 0; - unsigned int i; - char *buf = NULL; - - if ( read_exact(io_fd, &info, sizeof(info)) ) - return -1; - - /* We would never save if there weren't any pools! 
*/ - if ( !info.nr_pools ) - return -1; - - if ( xc_tmem_control(xch, 0, XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN, domid, 0, 0, NULL) < 0 ) - return -1; - - if ( xc_tmem_control(xch, 0 /* pool_id */, - XEN_SYSCTL_TMEM_OP_SET_CLIENT_INFO, - domid /* cli_id */, sizeof(info), 0 /* arg */, - &info) < 0 ) - return -1; - - if ( read_exact(io_fd, &minusone, sizeof(minusone)) ) - return -1; - - for ( i = 0; i < info.nr_pools; i++ ) - { - int bufsize = 0, pagesize; - int j; - xen_tmem_pool_info_t pool; - - if ( read_exact(io_fd, &pool, sizeof(pool)) ) - goto out_memory; - - if ( xc_tmem_restore_new_pool(xch, domid, pool.id, pool.flags.raw, - pool.uuid[0], pool.uuid[1]) < 0 ) - goto out_memory; - - if ( pool.n_pages <= 0 ) - continue; - - pagesize = 1 << (pool.flags.u.pagebits + 12); - if ( pagesize > bufsize ) - { - bufsize = pagesize; - if ( (buf = realloc(buf,bufsize)) == NULL ) - goto out_memory; - } - for ( j = pool.n_pages; j > 0; j-- ) - { - struct xen_tmem_oid oid; - uint32_t index; - int rc; - - if ( read_exact(io_fd, &oid, sizeof(oid)) ) - goto out_memory; - - if ( oid.oid[0] == -1L && oid.oid[1] == -1L && oid.oid[2] == -1L ) - break; - if ( read_exact(io_fd, &index, sizeof(index)) ) - goto out_memory; - - if ( read_exact(io_fd, buf, pagesize) ) - goto out_memory; - - checksum += *buf; - if ( (rc = xc_tmem_control_oid( - xch, pool.id, XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE, - domid, bufsize, index, oid, buf)) <= 0 ) - { - DPRINTF("xc_tmem_restore: putting page failed, rc=%d\n",rc); - out_memory: - free(buf); - return -1; - } - } - if ( pool.n_pages ) - DPRINTF("restored %"PRId64" tmem pages for dom=%d pool=%d, check=%x\n", - pool.n_pages - j, domid, pool.id, checksum); - } - free(buf); - - return 0; -} - -/* only called for live migration, must be called after suspend */ -int xc_tmem_restore_extra(xc_interface *xch, uint32_t domid, int io_fd) -{ - uint32_t pool_id; - struct xen_tmem_oid oid; - uint32_t index; - int count = 0; - int checksum = 0; - - while ( read_exact(io_fd, 
&pool_id, sizeof(pool_id)) == 0 && pool_id != -1 )
-    {
-        if ( read_exact(io_fd, &oid, sizeof(oid)) )
-            return -1;
-        if ( read_exact(io_fd, &index, sizeof(index)) )
-            return -1;
-        if ( xc_tmem_control_oid(
-                 xch, pool_id, XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE,
-                 domid, 0, index, oid, NULL) <= 0 )
-            return -1;
-        count++;
-        checksum += pool_id + oid.oid[0] + oid.oid[1] + oid.oid[2] + index;
-    }
-    if ( pool_id != -1 )
-        return -1;
-    if ( count )
-        DPRINTF("invalidated %d tmem pages, check=%d\n",count,checksum);
-
-    return 0;
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/tools/libxl/libxl_tmem.c b/tools/libxl/libxl_tmem.c
index 2bee8d1edf..a553b39738 100644
--- a/tools/libxl/libxl_tmem.c
+++ b/tools/libxl/libxl_tmem.c
@@ -16,146 +16,55 @@
 
 #include "libxl_internal.h"
 
+/* TMEM is gone. Leave some stubs here. */
+
 char *libxl_tmem_list(libxl_ctx *ctx, uint32_t domid, int use_long)
 {
-    int r;
-    char _buf[32768];
     GC_INIT(ctx);
-
-    r = xc_tmem_control(ctx->xch, -1, XEN_SYSCTL_TMEM_OP_LIST, domid, 32768,
-                        use_long, _buf);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not get tmem list");
-        GC_FREE;
-        return NULL;
-    }
-
+    LOGED(ERROR, domid, "Can not get tmem list");
     GC_FREE;
-    return strdup(_buf);
+    return NULL;
 }
 
 int libxl_tmem_freeze(libxl_ctx *ctx, uint32_t domid)
 {
-    int r, rc;
     GC_INIT(ctx);
-
-    r = xc_tmem_control(ctx->xch, -1, XEN_SYSCTL_TMEM_OP_FREEZE, domid, 0, 0,
-                        NULL);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not freeze tmem pools");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    rc = 0;
-out:
+    LOGED(ERROR, domid, "Can not freeze tmem pools");
     GC_FREE;
-    return rc;
+    return ERROR_FAIL;
 }
 
 int libxl_tmem_thaw(libxl_ctx *ctx, uint32_t domid)
 {
-    int r, rc;
     GC_INIT(ctx);
-
-    r = xc_tmem_control(ctx->xch, -1, XEN_SYSCTL_TMEM_OP_THAW, domid, 0, 0,
-                        NULL);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not thaw tmem pools");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    rc = 0;
-out:
+    LOGED(ERROR, domid, "Can not thaw tmem pools");
     GC_FREE;
-    return rc;
-}
-
-static int32_t tmem_setop_from_string(char *set_name, uint32_t val,
-                                      xen_tmem_client_t *info)
-{
-    if (!strcmp(set_name, "weight"))
-        info->weight = val;
-    else if (!strcmp(set_name, "compress"))
-        info->flags.u.compress = val;
-    else
-        return -1;
-
-    return 0;
+    return ERROR_FAIL;
 }
 
 int libxl_tmem_set(libxl_ctx *ctx, uint32_t domid, char* name, uint32_t set)
 {
-    int r, rc;
-    xen_tmem_client_t info;
     GC_INIT(ctx);
-
-    r = xc_tmem_control(ctx->xch, -1 /* pool_id */,
-                        XEN_SYSCTL_TMEM_OP_GET_CLIENT_INFO,
-                        domid, sizeof(info), 0 /* arg */, &info);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not get tmem data!");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-    rc = tmem_setop_from_string(name, set, &info);
-    if (rc == -1) {
-        LOGEVD(ERROR, -1, domid, "Invalid set, valid sets are ");
-        rc = ERROR_INVAL;
-        goto out;
-    }
-    r = xc_tmem_control(ctx->xch, -1 /* pool_id */,
-                        XEN_SYSCTL_TMEM_OP_SET_CLIENT_INFO,
-                        domid, sizeof(info), 0 /* arg */, &info);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not set tmem %s", name);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    rc = 0;
-out:
+    LOGED(ERROR, domid, "Can not set tmem %s", name);
     GC_FREE;
-    return rc;
+    return ERROR_FAIL;
 }
 
 int libxl_tmem_shared_auth(libxl_ctx *ctx, uint32_t domid,
                            char* uuid, int auth)
 {
-    int r, rc;
     GC_INIT(ctx);
-
-    r = xc_tmem_auth(ctx->xch, domid, uuid, auth);
-    if (r < 0) {
-        LOGED(ERROR, domid, "Can not set tmem shared auth");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    rc = 0;
-out:
+    LOGED(ERROR, domid, "Can not set tmem shared auth");
     GC_FREE;
-    return rc;
+    return ERROR_FAIL;
 }
 
 int libxl_tmem_freeable(libxl_ctx *ctx)
 {
-    int r, rc;
     GC_INIT(ctx);
-
-    r = xc_tmem_control(ctx->xch, -1, XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB,
-                        -1, 0, 0, 0);
-    if (r < 0) {
-        LOGE(ERROR, "Can not get tmem freeable memory");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    rc = 0;
-out:
+    LOGE(ERROR, "Can not get tmem freeable memory");
     GC_FREE;
-    return rc;
+    return ERROR_FAIL;
 }
 
 /*
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 51adb6f1a6..d4320dce81 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -22,7 +22,6 @@ INSTALL_SBIN-$(CONFIG_X86) += xen-hvmcrash
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx
 INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd
 INSTALL_SBIN-$(CONFIG_X86) += xen-mfndump
-INSTALL_SBIN += xen-tmem-list-parse
 INSTALL_SBIN += xencov
 INSTALL_SBIN += xenlockprof
 INSTALL_SBIN += xenperf
diff --git a/tools/misc/xen-tmem-list-parse.c b/tools/misc/xen-tmem-list-parse.c
deleted file mode 100644
index f32b107dce..0000000000
--- a/tools/misc/xen-tmem-list-parse.c
+++ /dev/null
@@ -1,339 +0,0 @@
-/*
- * Parse output from tmem-list and reformat to human-readable
- *
- * NOTE: NEVER delete a parse call as this file documents backwards
- * compatibility for older versions of tmem-list and we don't want to
- * accidentally reuse an old tag
- *
- * Copyright (c) 2009, Dan Magenheimer, Oracle Corp.
- */
-
-#include
-#include
-#include
-
-#define BUFSIZE 4096
-#define PAGE_SIZE 4096
-
-unsigned long long parse(char *s,char *match)
-{
-    char *s1 = strstr(s,match);
-    unsigned long long ret;
-
-    if ( s1 == NULL )
-        return 0LL;
-    s1 += 2;
-    if ( *s1++ != ':' )
-        return 0LL;
-    sscanf(s1,"%llu",&ret);
-    return ret;
-}
-
-unsigned long long parse_hex(char *s,char *match)
-{
-    char *s1 = strstr(s,match);
-    unsigned long long ret;
-
-    if ( s1 == NULL )
-        return 0LL;
-    s1 += 2;
-    if ( *s1++ != ':' )
-        return 0LL;
-    sscanf(s1,"%llx",&ret);
-    return ret;
-}
-
-unsigned long long parse2(char *s,char *match1, char *match2)
-{
-    char match[3];
-    match[0] = *match1;
-    match[1] = *match2;
-    match[2] = '\0';
-    return parse(s,match);
-}
-
-void parse_string(char *s,char *match, char *buf, int len)
-{
-    char *s1 = strstr(s,match);
-    int i;
-
-    if ( s1 == NULL )
-        return;
-    s1 += 2;
-    if ( *s1++ != ':' )
-        return;
-    for ( i = 0; i < len; i++ )
-        *buf++ = *s1++;
-}
-
-void parse_sharers(char *s, char *match, char *buf, int len)
-{
-
char *s1 = strstr(s,match); - char *b = buf; - - if ( s1 == NULL ) - return; - while ( s1 ) - { - s1 += 2; - if (*s1++ != ':') - return; - while (*s1 >= '0' && *s1 <= '9') - *b++ = *s1++; - *b++ = ','; - s1 = strstr(s1,match); - } - if ( b != buf ) - *--b = '\0'; -} - -void parse_global(char *s) -{ - unsigned long long total_ops = parse(s,"Tt"); - unsigned long long errored_ops = parse(s,"Te"); - unsigned long long failed_copies = parse(s,"Cf"); - unsigned long long alloc_failed = parse(s,"Af"); - unsigned long long alloc_page_failed = parse(s,"Pf"); - unsigned long long avail_pages = parse(s,"Ta"); - unsigned long long low_on_memory = parse(s,"Lm"); - unsigned long long evicted_pgs = parse(s,"Et"); - unsigned long long evict_attempts = parse(s,"Ea"); - unsigned long long relinq_pgs = parse(s,"Rt"); - unsigned long long relinq_attempts = parse(s,"Ra"); - unsigned long long max_evicts_per_relinq = parse(s,"Rx"); - unsigned long long total_flush_pool = parse(s,"Fp"); - unsigned long long global_eph_count = parse(s,"Ec"); - unsigned long long global_eph_max = parse(s,"Em"); - unsigned long long obj_count = parse(s,"Oc"); - unsigned long long obj_max = parse(s,"Om"); - unsigned long long rtree_node_count = parse(s,"Nc"); - unsigned long long rtree_node_max = parse(s,"Nm"); - unsigned long long pgp_count = parse(s,"Pc"); - unsigned long long pgp_max = parse(s,"Pm"); - unsigned long long page_count = parse(s,"Fc"); - unsigned long long max_page_count = parse(s,"Fm"); - unsigned long long pcd_count = parse(s,"Sc"); - unsigned long long max_pcd_count = parse(s,"Sm"); - unsigned long long pcd_tot_tze_size = parse(s,"Zt"); - unsigned long long pcd_tot_csize = parse(s,"Gz"); - unsigned long long deduped_puts = parse(s,"Gd"); - unsigned long long tot_good_eph_puts = parse(s,"Ep"); - - printf("total tmem ops=%llu (errors=%llu) -- tmem pages avail=%llu\n", - total_ops, errored_ops, avail_pages); - printf("datastructs: objs=%llu (max=%llu) pgps=%llu (max=%llu) " - "nodes=%llu 
(max=%llu) pages=%llu (max=%llu) ", - obj_count, obj_max, pgp_count, pgp_max, - rtree_node_count, rtree_node_max, - page_count,max_page_count); - if (max_pcd_count != 0 && global_eph_count != 0 && tot_good_eph_puts != 0) { - printf("pcds=%llu (max=%llu) ", - pcd_count,max_pcd_count); - printf("deduped: avg=%4.2f%% (curr=%4.2f%%) ", - ((deduped_puts*1.0)/tot_good_eph_puts)*100, - (1.0-(pcd_count*1.0)/global_eph_count)*100); - } - if (pcd_count != 0) - { - if (pcd_tot_tze_size && (pcd_tot_tze_size < pcd_count*PAGE_SIZE)) - printf("tze savings=%4.2f%% ", - (1.0-(pcd_tot_tze_size*1.0)/(pcd_count*PAGE_SIZE))*100); - if (pcd_tot_csize && (pcd_tot_csize < pcd_count*PAGE_SIZE)) - printf("compression savings=%4.2f%% ", - (1.0-(pcd_tot_csize*1.0)/(pcd_count*PAGE_SIZE))*100); - } - printf("\n"); - printf("misc: failed_copies=%llu alloc_failed=%llu alloc_page_failed=%llu " - "low_mem=%llu evicted=%llu/%llu relinq=%llu/%llu, " - "max_evicts_per_relinq=%llu, flush_pools=%llu, " - "eph_count=%llu, eph_max=%llu\n", - failed_copies, alloc_failed, alloc_page_failed, low_on_memory, - evicted_pgs, evict_attempts, relinq_pgs, relinq_attempts, - max_evicts_per_relinq, total_flush_pool, - global_eph_count, global_eph_max); -} - -#define PARSE_CYC_COUNTER(s,x,prefix) unsigned long long \ - x##_count = parse2(s,prefix,"n"), \ - x##_sum_cycles = parse2(s,prefix,"t"), \ - x##_max_cycles = parse2(s,prefix,"x"), \ - x##_min_cycles = parse2(s,prefix,"m") -#define PRINTF_CYC_COUNTER(x,text) \ - if (x##_count) printf(text" avg=%llu, max=%llu, " \ - "min=%llu, samples=%llu\n", \ - x##_sum_cycles ? 
(x##_sum_cycles/x##_count) : 0, \ - x##_max_cycles, x##_min_cycles, x##_count) - -void parse_time_stats(char *s) -{ - PARSE_CYC_COUNTER(s,succ_get,"G"); - PARSE_CYC_COUNTER(s,succ_put,"P"); - PARSE_CYC_COUNTER(s,non_succ_get,"g"); - PARSE_CYC_COUNTER(s,non_succ_put,"p"); - PARSE_CYC_COUNTER(s,flush,"F"); - PARSE_CYC_COUNTER(s,flush_obj,"O"); - PARSE_CYC_COUNTER(s,pg_copy,"C"); - PARSE_CYC_COUNTER(s,compress,"c"); - PARSE_CYC_COUNTER(s,decompress,"d"); - - PRINTF_CYC_COUNTER(succ_get,"succ get cycles:"); - PRINTF_CYC_COUNTER(succ_put,"succ put cycles:"); - PRINTF_CYC_COUNTER(non_succ_get,"failed get cycles:"); - PRINTF_CYC_COUNTER(non_succ_put,"failed put cycles:"); - PRINTF_CYC_COUNTER(flush,"flush cycles:"); - PRINTF_CYC_COUNTER(flush_obj,"flush_obj cycles:"); - PRINTF_CYC_COUNTER(pg_copy,"page copy cycles:"); - PRINTF_CYC_COUNTER(compress,"compression cycles:"); - PRINTF_CYC_COUNTER(decompress,"decompression cycles:"); -} - -void parse_client(char *s) -{ - unsigned long cli_id = parse(s,"CI"); - unsigned long weight = parse(s,"ww"); - unsigned long cap = parse(s,"ca"); - unsigned long compress = parse(s,"co"); - unsigned long frozen = parse(s,"fr"); - unsigned long long eph_count = parse(s,"Ec"); - unsigned long long max_eph_count = parse(s,"Em"); - unsigned long long compressed_pages = parse(s,"cp"); - unsigned long long compressed_sum_size = parse(s,"cb"); - unsigned long long compress_poor = parse(s,"cn"); - unsigned long long compress_nomem = parse(s,"cm"); - unsigned long long total_cycles = parse(s,"Tc"); - unsigned long long succ_eph_gets = parse(s,"Ge"); - unsigned long long succ_pers_puts = parse(s,"Pp"); - unsigned long long succ_pers_gets = parse(s,"Gp"); - - printf("domid%lu: weight=%lu,cap=%lu,compress=%d,frozen=%d," - "total_cycles=%llu,succ_eph_gets=%llu," - "succ_pers_puts=%llu,succ_pers_gets=%llu," - "eph_count=%llu,max_eph=%llu," - "compression ratio=%lu%% (samples=%llu,poor=%llu,nomem=%llu)\n", - cli_id, weight, cap, compress?1:0, frozen?1:0, - 
total_cycles, succ_eph_gets, succ_pers_puts, succ_pers_gets, - eph_count, max_eph_count, - compressed_pages ? (long)((compressed_sum_size*100LL) / - (compressed_pages*PAGE_SIZE)) : 0, - compressed_pages, compress_poor, compress_nomem); - -} - -void parse_pool(char *s) -{ - char pool_type[3]; - unsigned long cli_id = parse(s,"CI"); - unsigned long pool_id = parse(s,"PI"); - unsigned long long pgp_count = parse(s,"Pc"); - unsigned long long max_pgp_count = parse(s,"Pm"); - unsigned long long obj_count = parse(s,"Oc"); - unsigned long long max_obj_count = parse(s,"Om"); - unsigned long long objnode_count = parse(s,"Nc"); - unsigned long long max_objnode_count = parse(s,"Nm"); - unsigned long long good_puts = parse(s,"ps"); - unsigned long long puts = parse(s,"pt"); - unsigned long long no_mem_puts = parse(s,"px"); - unsigned long long dup_puts_flushed = parse(s,"pd"); - unsigned long long dup_puts_replaced = parse(s,"pr"); - unsigned long long found_gets = parse(s,"gs"); - unsigned long long gets = parse(s,"gt"); - unsigned long long flushs_found = parse(s,"fs"); - unsigned long long flushs = parse(s,"ft"); - unsigned long long flush_objs_found = parse(s,"os"); - unsigned long long flush_objs = parse(s,"ot"); - - parse_string(s,"PT",pool_type,2); - pool_type[2] = '\0'; - if (pool_type[1] == 'S') - return; /* no need to repeat print data for shared pools */ - printf("domid%lu,id%lu[%s]:pgp=%llu(max=%llu) obj=%llu(%llu) " - "objnode=%llu(%llu) puts=%llu/%llu/%llu(dup=%llu/%llu) " - "gets=%llu/%llu(%llu%%) " - "flush=%llu/%llu flobj=%llu/%llu\n", - cli_id, pool_id, pool_type, - pgp_count, max_pgp_count, obj_count, max_obj_count, - objnode_count, max_objnode_count, - good_puts, puts, no_mem_puts, - dup_puts_flushed, dup_puts_replaced, - found_gets, gets, - gets ? 
(found_gets*100LL)/gets : 0, - flushs_found, flushs, flush_objs_found, flush_objs); - -} - -void parse_shared_pool(char *s) -{ - char pool_type[3]; - char buf[BUFSIZE]; - unsigned long pool_id = parse(s,"PI"); - unsigned long long uid0 = parse_hex(s,"U0"); - unsigned long long uid1 = parse_hex(s,"U1"); - unsigned long long pgp_count = parse(s,"Pc"); - unsigned long long max_pgp_count = parse(s,"Pm"); - unsigned long long obj_count = parse(s,"Oc"); - unsigned long long max_obj_count = parse(s,"Om"); - unsigned long long objnode_count = parse(s,"Nc"); - unsigned long long max_objnode_count = parse(s,"Nm"); - unsigned long long good_puts = parse(s,"ps"); - unsigned long long puts = parse(s,"pt"); - unsigned long long no_mem_puts = parse(s,"px"); - unsigned long long dup_puts_flushed = parse(s,"pd"); - unsigned long long dup_puts_replaced = parse(s,"pr"); - unsigned long long found_gets = parse(s,"gs"); - unsigned long long gets = parse(s,"gt"); - unsigned long long flushs_found = parse(s,"fs"); - unsigned long long flushs = parse(s,"ft"); - unsigned long long flush_objs_found = parse(s,"os"); - unsigned long long flush_objs = parse(s,"ot"); - - parse_string(s,"PT",pool_type,2); - pool_type[2] = '\0'; - parse_sharers(s,"SC",buf,BUFSIZE); - printf("poolid=%lu[%s] uuid=%llx.%llx, shared-by:%s: " - "pgp=%llu(max=%llu) obj=%llu(%llu) " - "objnode=%llu(%llu) puts=%llu/%llu/%llu(dup=%llu/%llu) " - "gets=%llu/%llu(%llu%%) " - "flush=%llu/%llu flobj=%llu/%llu\n", - pool_id, pool_type, uid0, uid1, buf, - pgp_count, max_pgp_count, obj_count, max_obj_count, - objnode_count, max_objnode_count, - good_puts, puts, no_mem_puts, - dup_puts_flushed, dup_puts_replaced, - found_gets, gets, - gets ? 
(found_gets*100LL)/gets : 0, - flushs_found, flushs, flush_objs_found, flush_objs); -} - -int main(int ac, char **av) -{ - char *p, c; - char buf[BUFSIZE]; - - while ( (p = fgets(buf,BUFSIZE,stdin)) != NULL ) - { - c = *p++; - if ( *p++ != '=' ) - continue; - switch ( c ) - { - case 'G': - parse_global(p); - break; - case 'T': - parse_time_stats(p); - break; - case 'C': - parse_client(p); - break; - case 'P': - parse_pool(p); - break; - case 'S': - parse_shared_pool(p); - break; - default: - continue; - } - } - return 0; -} diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index cc8175a11e..2d97dc5ecd 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -17,7 +17,6 @@ #include #include -#include #include "xc_dom.h" #include #include @@ -1614,71 +1613,6 @@ static PyObject *dom_op(XcObject *self, PyObject *args, return zero; } -static PyObject *pyxc_tmem_control(XcObject *self, - PyObject *args, - PyObject *kwds) -{ - int32_t pool_id; - uint32_t subop; - uint32_t cli_id; - uint32_t len; - uint32_t arg; - char *buf; - char _buffer[32768], *buffer = _buffer; - int rc; - - static char *kwd_list[] = { "pool_id", "subop", "cli_id", "arg1", "arg2", "buf", NULL }; - - if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiiiis", kwd_list, - &pool_id, &subop, &cli_id, &len, &arg, &buf) ) - return NULL; - - if ( (subop == XEN_SYSCTL_TMEM_OP_LIST) && (len > 32768) ) - len = 32768; - - if ( (rc = xc_tmem_control(self->xc_handle, pool_id, subop, cli_id, len, arg, buffer)) < 0 ) - return Py_BuildValue("i", rc); - - switch (subop) { - case XEN_SYSCTL_TMEM_OP_LIST: - return Py_BuildValue("s", buffer); - case XEN_SYSCTL_TMEM_OP_FLUSH: - return Py_BuildValue("i", rc); - case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB: - return Py_BuildValue("i", rc); - case XEN_SYSCTL_TMEM_OP_THAW: - case XEN_SYSCTL_TMEM_OP_FREEZE: - case XEN_SYSCTL_TMEM_OP_DESTROY: - default: - break; - } - - Py_INCREF(zero); - return zero; -} - -static 
PyObject *pyxc_tmem_shared_auth(XcObject *self, - PyObject *args, - PyObject *kwds) -{ - uint32_t cli_id; - uint32_t arg1; - char *uuid_str; - int rc; - - static char *kwd_list[] = { "cli_id", "uuid_str", "arg1", NULL }; - - if ( !PyArg_ParseTupleAndKeywords(args, kwds, "isi", kwd_list, - &cli_id, &uuid_str, &arg1) ) - return NULL; - - if ( (rc = xc_tmem_auth(self->xc_handle, cli_id, uuid_str, arg1)) < 0 ) - return Py_BuildValue("i", rc); - - Py_INCREF(zero); - return zero; -} - static PyObject *pyxc_dom_set_memshr(XcObject *self, PyObject *args) { uint32_t dom; @@ -2497,27 +2431,6 @@ static PyMethodDef pyxc_methods[] = { " dom [int]: Identifier of domain.\n" }, #endif - { "tmem_control", - (PyCFunction)pyxc_tmem_control, - METH_VARARGS | METH_KEYWORDS, "\n" - "Do various control on a tmem pool.\n" - " pool_id [int]: Identifier of the tmem pool (-1 == all).\n" - " subop [int]: Supplementary Operation.\n" - " cli_id [int]: Client identifier (-1 == all).\n" - " len [int]: Length of 'buf'.\n" - " arg [int]: Argument.\n" - " buf [str]: Buffer.\n\n" - "Returns: [int] 0 or [str] tmem info on success; exception on error.\n" }, - - { "tmem_shared_auth", - (PyCFunction)pyxc_tmem_shared_auth, - METH_VARARGS | METH_KEYWORDS, "\n" - "De/authenticate a shared tmem pool.\n" - " cli_id [int]: Client identifier (-1 == all).\n" - " uuid_str [str]: uuid.\n" - " auth [int]: 0|1 .\n" - "Returns: [int] 0 on success; exception on error.\n" }, - { "dom_set_memshr", (PyCFunction)pyxc_dom_set_memshr, METH_VARARGS, "\n" diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c index 8b856b32a3..bba143eb53 100644 --- a/tools/xenstat/libxenstat/src/xenstat.c +++ b/tools/xenstat/libxenstat/src/xenstat.c @@ -145,19 +145,6 @@ static inline unsigned long long parse(char *s, char *match) return ret; } -void domain_get_tmem_stats(xenstat_handle * handle, xenstat_domain * domain) -{ - char buffer[4096]; - - if 
(xc_tmem_control(handle->xc_handle,-1,XEN_SYSCTL_TMEM_OP_LIST,domain->id, - sizeof(buffer),-1,buffer) < 0) - return; - domain->tmem_stats.curr_eph_pages = parse(buffer,"Ec"); - domain->tmem_stats.succ_eph_gets = parse(buffer,"Ge"); - domain->tmem_stats.succ_pers_puts = parse(buffer,"Pp"); - domain->tmem_stats.succ_pers_gets = parse(buffer,"Gp"); -} - xenstat_node *xenstat_get_node(xenstat_handle * handle, unsigned int flags) { #define DOMAIN_CHUNK_SIZE 256 @@ -166,7 +153,6 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, unsigned int flags) xc_domaininfo_t domaininfo[DOMAIN_CHUNK_SIZE]; int new_domains; unsigned int i; - int rc; /* Create the node */ node = (xenstat_node *) calloc(1, sizeof(xenstat_node)); @@ -190,9 +176,7 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, unsigned int flags) node->free_mem = ((unsigned long long)physinfo.free_pages) * handle->page_size; - rc = xc_tmem_control(handle->xc_handle, -1, - XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB, -1, 0, 0, NULL); - node->freeable_mb = (rc < 0) ? 0 : rc; + node->freeable_mb = 0; /* malloc(0) is not portable, so allocate a single domain. This will * be resized below. 
*/ node->domains = malloc(sizeof(xenstat_domain)); @@ -260,7 +244,6 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, unsigned int flags) domain->networks = NULL; domain->num_vbds = 0; domain->vbds = NULL; - domain_get_tmem_stats(handle,domain); domain++; node->num_domains++; @@ -735,40 +718,6 @@ bool xenstat_vbd_error(xenstat_vbd * vbd) return vbd->error; } -/* - * Tmem functions - */ - -xenstat_tmem *xenstat_domain_tmem(xenstat_domain * domain) -{ - return &domain->tmem_stats; -} - -/* Get the current number of ephemeral pages */ -unsigned long long xenstat_tmem_curr_eph_pages(xenstat_tmem *tmem) -{ - return tmem->curr_eph_pages; -} - -/* Get the number of successful ephemeral gets */ -unsigned long long xenstat_tmem_succ_eph_gets(xenstat_tmem *tmem) -{ - return tmem->succ_eph_gets; -} - -/* Get the number of successful persistent puts */ -unsigned long long xenstat_tmem_succ_pers_puts(xenstat_tmem *tmem) -{ - return tmem->succ_pers_puts; -} - -/* Get the number of successful persistent gets */ -unsigned long long xenstat_tmem_succ_pers_gets(xenstat_tmem *tmem) -{ - return tmem->succ_pers_gets; -} - - static char *xenstat_get_domain_name(xenstat_handle *handle, unsigned int domain_id) { char path[80]; diff --git a/tools/xenstat/libxenstat/src/xenstat.h b/tools/xenstat/libxenstat/src/xenstat.h index 9f5053dd92..76a660f321 100644 --- a/tools/xenstat/libxenstat/src/xenstat.h +++ b/tools/xenstat/libxenstat/src/xenstat.h @@ -29,7 +29,6 @@ typedef struct xenstat_node xenstat_node; typedef struct xenstat_vcpu xenstat_vcpu; typedef struct xenstat_network xenstat_network; typedef struct xenstat_vbd xenstat_vbd; -typedef struct xenstat_tmem xenstat_tmem; /* Initialize the xenstat library. Returns a handle to be used with * subsequent calls to the xenstat library, or NULL if an error occurs. 
*/ @@ -72,9 +71,6 @@ unsigned long long xenstat_node_tot_mem(xenstat_node * node); /* Get amount of free memory on a node */ unsigned long long xenstat_node_free_mem(xenstat_node * node); -/* Get amount of tmem freeable memory (in MiB) on a node */ -long xenstat_node_freeable_mb(xenstat_node * node); - /* Find the number of domains existing on a node */ unsigned int xenstat_node_num_domains(xenstat_node * node); @@ -135,9 +131,6 @@ unsigned int xenstat_domain_num_vbds(xenstat_domain *); xenstat_vbd *xenstat_domain_vbd(xenstat_domain * domain, unsigned int vbd); -/* Get the tmem information for a given domain */ -xenstat_tmem *xenstat_domain_tmem(xenstat_domain * domain); - /* * VCPU functions - extract information from a xenstat_vcpu */ @@ -198,12 +191,4 @@ unsigned long long xenstat_vbd_wr_sects(xenstat_vbd * vbd); /* Returns error while getting stats (1 if error happened, 0 otherwise) */ bool xenstat_vbd_error(xenstat_vbd * vbd); -/* - * Tmem functions - extract tmem information - */ -unsigned long long xenstat_tmem_curr_eph_pages(xenstat_tmem *tmem); -unsigned long long xenstat_tmem_succ_eph_gets(xenstat_tmem *tmem); -unsigned long long xenstat_tmem_succ_pers_puts(xenstat_tmem *tmem); -unsigned long long xenstat_tmem_succ_pers_gets(xenstat_tmem *tmem); - #endif /* XENSTAT_H */ diff --git a/tools/xenstat/libxenstat/src/xenstat_priv.h b/tools/xenstat/libxenstat/src/xenstat_priv.h index ebfcd0fff6..4eb44a8ebb 100644 --- a/tools/xenstat/libxenstat/src/xenstat_priv.h +++ b/tools/xenstat/libxenstat/src/xenstat_priv.h @@ -52,13 +52,6 @@ struct xenstat_node { long freeable_mb; }; -struct xenstat_tmem { - unsigned long long curr_eph_pages; - unsigned long long succ_eph_gets; - unsigned long long succ_pers_puts; - unsigned long long succ_pers_gets; -}; - struct xenstat_domain { unsigned int id; char *name; @@ -73,7 +66,6 @@ struct xenstat_domain { xenstat_network *networks; /* Array of length num_networks */ unsigned int num_vbds; xenstat_vbd *vbds; - xenstat_tmem 
tmem_stats; }; struct xenstat_vcpu { diff --git a/tools/xenstat/xentop/xentop.c b/tools/xenstat/xentop/xentop.c index 31ea0e65a6..af11ebfbf7 100644 --- a/tools/xenstat/xentop/xentop.c +++ b/tools/xenstat/xentop/xentop.c @@ -209,7 +209,6 @@ unsigned int iterations = 0; int show_vcpus = 0; int show_networks = 0; int show_vbds = 0; -int show_tmem = 0; int repeat_header = 0; int show_full_name = 0; #define PROMPT_VAL_LEN 80 @@ -362,9 +361,6 @@ static int handle_key(int ch) case 'b': case 'B': show_vbds ^= 1; break; - case 't': case 'T': - show_tmem ^= 1; - break; case 'r': case 'R': repeat_header ^= 1; break; @@ -973,8 +969,8 @@ void do_summary(void) "%u crashed, %u dying, %u shutdown \n", num_domains, run, block, pause, crash, dying, shutdown); - used = xenstat_node_tot_mem(cur_node)-xenstat_node_free_mem(cur_node); - freeable_mb = xenstat_node_freeable_mb(cur_node); + used = xenstat_node_tot_mem(cur_node); + freeable_mb = 0; /* Dump node memory and cpu information */ if ( freeable_mb <= 0 ) @@ -1032,12 +1028,6 @@ void do_bottom_line(void) attr_addstr(show_vbds ? COLOR_PAIR(1) : 0, "ds"); addstr(" "); - /* tmem */ - addch(A_REVERSE | 'T'); - attr_addstr(show_tmem ? COLOR_PAIR(1) : 0, "mem"); - addstr(" "); - - /* vcpus */ addch(A_REVERSE | 'V'); attr_addstr(show_vcpus ? 
COLOR_PAIR(1) : 0, "CPUs"); @@ -1175,23 +1165,6 @@ void do_vbd(xenstat_domain *domain) } } -/* Output all tmem information */ -void do_tmem(xenstat_domain *domain) -{ - xenstat_tmem *tmem = xenstat_domain_tmem(domain); - unsigned long long curr_eph_pages = xenstat_tmem_curr_eph_pages(tmem); - unsigned long long succ_eph_gets = xenstat_tmem_succ_eph_gets(tmem); - unsigned long long succ_pers_puts = xenstat_tmem_succ_pers_puts(tmem); - unsigned long long succ_pers_gets = xenstat_tmem_succ_pers_gets(tmem); - - if (curr_eph_pages | succ_eph_gets | succ_pers_puts | succ_pers_gets) - print("Tmem: Curr eph pages: %8llu Succ eph gets: %8llu " - "Succ pers puts: %8llu Succ pers gets: %8llu\n", - curr_eph_pages, succ_eph_gets, - succ_pers_puts, succ_pers_gets); - -} - static void top(void) { xenstat_domain **domains; @@ -1244,8 +1217,6 @@ static void top(void) do_network(domains[i]); if (show_vbds) do_vbd(domains[i]); - if (show_tmem) - do_tmem(domains[i]); } if (!batch) @@ -1321,9 +1292,6 @@ int main(int argc, char **argv) case 'f': show_full_name = 1; break; - case 't': - show_tmem = 1; - break; } } diff --git a/tools/xl/Makefile b/tools/xl/Makefile index 2769295515..af4912e67a 100644 --- a/tools/xl/Makefile +++ b/tools/xl/Makefile @@ -17,7 +17,7 @@ CFLAGS_XL += -Wshadow XL_OBJS-$(CONFIG_X86) = xl_psr.o XL_OBJS = xl.o xl_cmdtable.o xl_sxp.o xl_utils.o $(XL_OBJS-y) -XL_OBJS += xl_tmem.o xl_parse.o xl_cpupool.o xl_flask.o +XL_OBJS += xl_parse.o xl_cpupool.o xl_flask.o XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o XL_OBJS += xl_info.o xl_console.o xl_misc.o diff --git a/tools/xl/xl.h b/tools/xl/xl.h index cf4202bc89..60bdad8ffb 100644 --- a/tools/xl/xl.h +++ b/tools/xl/xl.h @@ -184,12 +184,6 @@ int main_usbdev_detach(int argc, char **argv); int main_usblist(int argc, char **argv); int main_uptime(int argc, char **argv); int main_claims(int argc, char **argv); -int main_tmem_list(int argc, char **argv); -int 
main_tmem_freeze(int argc, char **argv); -int main_tmem_thaw(int argc, char **argv); -int main_tmem_set(int argc, char **argv); -int main_tmem_shared_auth(int argc, char **argv); -int main_tmem_freeable(int argc, char **argv); int main_network2attach(int argc, char **argv); int main_network2list(int argc, char **argv); int main_network2detach(int argc, char **argv); diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c index 89716badcb..5baa6023aa 100644 --- a/tools/xl/xl_cmdtable.c +++ b/tools/xl/xl_cmdtable.c @@ -443,46 +443,6 @@ struct cmd_spec cmd_table[] = { "", "", }, - { "tmem-list", - &main_tmem_list, 0, 0, - "List tmem pools", - "[-l] [|-a]", - " -l List tmem stats", - }, - { "tmem-freeze", - &main_tmem_freeze, 0, 1, - "Freeze tmem pools", - "[|-a]", - " -a Freeze all tmem", - }, - { "tmem-thaw", - &main_tmem_thaw, 0, 1, - "Thaw tmem pools", - "[|-a]", - " -a Thaw all tmem", - }, - { "tmem-set", - &main_tmem_set, 0, 1, - "Change tmem settings", - "[|-a] [-w[=WEIGHT]|-c[=CAP]|-p[=COMPRESS]]", - " -a Operate on all tmem\n" - " -w WEIGHT Weight (int)\n" - " -p COMPRESS Compress (int)", - }, - { "tmem-shared-auth", - &main_tmem_shared_auth, 0, 1, - "De/authenticate shared tmem pool", - "[|-a] [-u[=UUID] [-A[=AUTH]", - " -a Authenticate for all tmem pools\n" - " -u UUID Specify uuid\n" - " (abcdef01-2345-6789-1234-567890abcdef)\n" - " -A AUTH 0=deauth,1=auth", - }, - { "tmem-freeable", - &main_tmem_freeable, 0, 0, - "Get information about how much freeable memory (MB) is in-use by tmem", - "", - }, { "cpupool-create", &main_cpupoolcreate, 1, 1, "Create a new CPU pool", diff --git a/tools/xl/xl_tmem.c b/tools/xl/xl_tmem.c deleted file mode 100644 index 36214321e6..0000000000 --- a/tools/xl/xl_tmem.c +++ /dev/null @@ -1,251 +0,0 @@ -/* - * Copyright 2009-2017 Citrix Ltd and other contributors - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU Lesser General Public License as published - * by the 
Free Software Foundation; version 2.1 only. with the special - * exception on linking described in file LICENSE. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. - */ - -#include -#include - -#include - -#include "xl.h" -#include "xl_utils.h" - -int main_tmem_list(int argc, char **argv) -{ - uint32_t domid; - const char *dom = NULL; - char *buf = NULL; - int use_long = 0; - int all = 0; - int opt; - - SWITCH_FOREACH_OPT(opt, "al", NULL, "tmem-list", 0) { - case 'l': - use_long = 1; - break; - case 'a': - all = 1; - break; - } - - dom = argv[optind]; - if (!dom && all == 0) { - fprintf(stderr, "You must specify -a or a domain id.\n\n"); - help("tmem-list"); - return 1; - } - - if (all) - domid = INVALID_DOMID; - else - domid = find_domain(dom); - - buf = libxl_tmem_list(ctx, domid, use_long); - if (buf == NULL) - return EXIT_FAILURE; - - printf("%s\n", buf); - free(buf); - return EXIT_SUCCESS; -} - -int main_tmem_freeze(int argc, char **argv) -{ - uint32_t domid; - const char *dom = NULL; - int all = 0; - int opt; - - SWITCH_FOREACH_OPT(opt, "a", NULL, "tmem-freeze", 0) { - case 'a': - all = 1; - break; - } - - dom = argv[optind]; - if (!dom && all == 0) { - fprintf(stderr, "You must specify -a or a domain id.\n\n"); - help("tmem-freeze"); - return EXIT_FAILURE; - } - - if (all) - domid = INVALID_DOMID; - else - domid = find_domain(dom); - - if (libxl_tmem_freeze(ctx, domid) < 0) - return EXIT_FAILURE; - - return EXIT_SUCCESS; -} - -int main_tmem_thaw(int argc, char **argv) -{ - uint32_t domid; - const char *dom = NULL; - int all = 0; - int opt; - - SWITCH_FOREACH_OPT(opt, "a", NULL, "tmem-thaw", 0) { - case 'a': - all = 1; - break; - } - - dom = argv[optind]; - if (!dom && all == 0) { - fprintf(stderr, "You must specify -a or a domain id.\n\n"); - 
help("tmem-thaw"); - return EXIT_FAILURE; - } - - if (all) - domid = INVALID_DOMID; - else - domid = find_domain(dom); - - if (libxl_tmem_thaw(ctx, domid) < 0) - return EXIT_FAILURE; - - return EXIT_SUCCESS; -} - -int main_tmem_set(int argc, char **argv) -{ - uint32_t domid; - const char *dom = NULL; - uint32_t weight = 0, cap = 0, compress = 0; - int opt_w = 0, opt_c = 0, opt_p = 0; - int all = 0; - int opt; - int rc = 0; - - SWITCH_FOREACH_OPT(opt, "aw:c:p:", NULL, "tmem-set", 0) { - case 'a': - all = 1; - break; - case 'w': - weight = strtol(optarg, NULL, 10); - opt_w = 1; - break; - case 'c': - cap = strtol(optarg, NULL, 10); - opt_c = 1; - break; - case 'p': - compress = strtol(optarg, NULL, 10); - opt_p = 1; - break; - } - - dom = argv[optind]; - if (!dom && all == 0) { - fprintf(stderr, "You must specify -a or a domain id.\n\n"); - help("tmem-set"); - return EXIT_FAILURE; - } - - if (all) - domid = INVALID_DOMID; - else - domid = find_domain(dom); - - if (!opt_w && !opt_c && !opt_p) { - fprintf(stderr, "No set value specified.\n\n"); - help("tmem-set"); - return EXIT_FAILURE; - } - - if (opt_w) - rc = libxl_tmem_set(ctx, domid, "weight", weight); - if (opt_c) - rc = libxl_tmem_set(ctx, domid, "cap", cap); - if (opt_p) - rc = libxl_tmem_set(ctx, domid, "compress", compress); - - if (rc < 0) - return EXIT_FAILURE; - - return EXIT_SUCCESS; -} - -int main_tmem_shared_auth(int argc, char **argv) -{ - uint32_t domid; - const char *autharg = NULL; - char *endptr = NULL; - const char *dom = NULL; - char *uuid = NULL; - int auth = -1; - int all = 0; - int opt; - - SWITCH_FOREACH_OPT(opt, "au:A:", NULL, "tmem-shared-auth", 0) { - case 'a': - all = 1; - break; - case 'u': - uuid = optarg; - break; - case 'A': - autharg = optarg; - break; - } - - dom = argv[optind]; - if (!dom && all == 0) { - fprintf(stderr, "You must specify -a or a domain id.\n\n"); - help("tmem-shared-auth"); - return EXIT_FAILURE; - } - - if (all) - domid = INVALID_DOMID; - else - domid = 
find_domain(dom); - - if (uuid == NULL || autharg == NULL) { - fprintf(stderr, "No uuid or auth specified.\n\n"); - help("tmem-shared-auth"); - return EXIT_FAILURE; - } - - auth = strtol(autharg, &endptr, 10); - if (*endptr != '\0') { - fprintf(stderr, "Invalid auth, valid auth are <0|1>.\n\n"); - return EXIT_FAILURE; - } - - if (libxl_tmem_shared_auth(ctx, domid, uuid, auth) < 0) - return EXIT_FAILURE; - - return EXIT_SUCCESS; -} - -int main_tmem_freeable(int argc, char **argv) -{ - int opt; - int mb; - - SWITCH_FOREACH_OPT(opt, "", NULL, "tmem-freeable", 0) { - /* No options */ - } - - mb = libxl_tmem_freeable(ctx); - if (mb == -1) - return EXIT_FAILURE; - - printf("%d\n", mb); - return EXIT_SUCCESS; -} From patchwork Tue Apr 9 16:52:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wei Liu X-Patchwork-Id: 10891769 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F281C17E1 for ; Tue, 9 Apr 2019 16:55:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CBDDA28904 for ; Tue, 9 Apr 2019 16:55:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BE1B028918; Tue, 9 Apr 2019 16:55:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D5BD828904 for ; Tue, 9 Apr 2019 16:55:33 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) 
	by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hDtzl-0001hQ-HL; Tue, 09 Apr 2019 16:53:13 +0000
Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hDtzk-0001h6-11 for xen-devel@lists.xenproject.org; Tue, 09 Apr 2019 16:53:12 +0000
X-Inumbo-ID: f1a34638-5ae7-11e9-8b50-7fdf691fc879
Received: from SMTP03.CITRIX.COM (unknown [162.221.156.55]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id f1a34638-5ae7-11e9-8b50-7fdf691fc879; Tue, 09 Apr 2019 16:53:05 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="5.60,330,1549929600"; d="scan'208";a="83138384"
From: Wei Liu
To:
Date: Tue, 9 Apr 2019 17:52:09 +0100
Message-ID: <20190409165210.7624-3-wei.liu2@citrix.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190409165210.7624-1-wei.liu2@citrix.com>
References: <20190409165210.7624-1-wei.liu2@citrix.com>
MIME-Version: 1.0
Subject: [Xen-devel] [PATCH v3 2/3] xen: remove tmem from hypervisor
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Xen developer discussion
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Cc: Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Daniel De Graaf , Roger Pau Monné
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel"
X-Virus-Scanned: ClamAV using ClamSMTP

This patch removes all tmem-related code and CONFIG_TMEM from the
hypervisor. It also removes the tmem hypercalls from the default XSM
policy.

The remaining code is written as if tmem is disabled and the number of
tmem freeable pages is 0.

We will need to keep public/tmem.h around forever to avoid breaking
guests. Remove the hypervisor-only part and put the guest-visible part
under a Xen version check.

Take the chance to remove trailing whitespace.
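
As a rough illustration of the version check described above (not part of
this patch), the sketch below shows how a public header can keep retired
guest-visible definitions available to old consumers while hiding them
from consumers that request a newer interface version. Xen's public
headers use __XEN_INTERFACE_VERSION__ for checks of this kind; the cutoff
value and every DEMO_* name here are assumptions made for the example,
since the public/tmem.h hunk is not shown in this excerpt.

```c
/*
 * Sketch only: hiding a retired guest ABI behind an interface version
 * check. The cutoff value and all DEMO_* names are hypothetical.
 */

#ifndef DEMO_INTERFACE_VERSION
#define DEMO_INTERFACE_VERSION 0x00041200  /* version this consumer requests */
#endif

#define DEMO_TMEM_CUTOFF 0x00041300        /* assumed cutoff, for illustration */

#if DEMO_INTERFACE_VERSION < DEMO_TMEM_CUTOFF
/* Old consumers still see the guest-visible definitions. */
#define DEMO_TMEM_SPEC_VERSION 1
#define DEMO_TMEM_VISIBLE 1
#else
/* Newer consumers get none of the retired definitions. */
#define DEMO_TMEM_VISIBLE 0
#endif

/* Returns 1 when the legacy definitions were compiled in, 0 otherwise. */
int demo_tmem_visible(void)
{
    return DEMO_TMEM_VISIBLE;
}

/* Returns the legacy spec version, or -1 when it has been hidden. */
int demo_tmem_spec_version(void)
{
#if DEMO_TMEM_VISIBLE
    return DEMO_TMEM_SPEC_VERSION;
#else
    return -1;
#endif
}
```

Building the same file with -DDEMO_INTERFACE_VERSION=0x00041300 makes the
legacy definitions disappear, which is the effect a guard like this is
meant to have on consumers that opt in to a newer interface.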
Signed-off-by: Wei Liu
Acked-by: Jan Beulich
Acked-by: Daniel De Graaf
Acked-by: Konrad Rzeszutek Wilk
---
v3:
1. Fold in changes agreed in v2.
2. Add acks.
3. Rebase and fix up conflicts

v2:
1. remove some more residuals
2. fix errors discovered by Gitlab CI
3. keep public/tmem.h
---
 MAINTAINERS                                  |    8 -
 tools/flask/policy/modules/dom0.te           |    4 +-
 tools/flask/policy/modules/guest_features.te |    3 -
 xen/arch/arm/configs/tiny64.conf             |    1 -
 xen/arch/x86/configs/pvshim_defconfig        |    1 -
 xen/arch/x86/guest/hypercall_page.S          |    2 +-
 xen/arch/x86/hvm/hypercall.c                 |    3 -
 xen/arch/x86/hypercall.c                     |    1 -
 xen/arch/x86/pv/hypercall.c                  |    3 -
 xen/arch/x86/setup.c                         |    8 -
 xen/common/Kconfig                           |   15 -
 xen/common/Makefile                          |    4 -
 xen/common/compat/tmem_xen.c                 |   23 -
 xen/common/domain.c                          |    3 -
 xen/common/memory.c                          |   12 +-
 xen/common/page_alloc.c                      |   54 +-
 xen/common/sysctl.c                          |    5 -
 xen/common/tmem.c                            | 2095 ------------------
 xen/common/tmem_control.c                    |  560 -----
 xen/common/tmem_xen.c                        |  277 ---
 xen/include/Makefile                         |    1 -
 xen/include/public/sysctl.h                  |  108 +-
 xen/include/public/tmem.h                    |   14 +-
 xen/include/xen/hypercall.h                  |    7 -
 xen/include/xen/mm.h                         |    3 -
 xen/include/xen/sched.h                      |    3 -
 xen/include/xen/tmem.h                       |   45 -
 xen/include/xen/tmem_control.h               |   39 -
 xen/include/xen/tmem_xen.h                   |  343 ---
 xen/include/xlat.lst                         |    2 -
 xen/include/xsm/dummy.h                      |    6 -
 xen/include/xsm/xsm.h                        |    6 -
 xen/xsm/dummy.c                              |    1 -
 xen/xsm/flask/hooks.c                        |    9 -
 xen/xsm/flask/policy/access_vectors          |    4 -
 35 files changed, 20 insertions(+), 3653 deletions(-)
 delete mode 100644 xen/common/compat/tmem_xen.c
 delete mode 100644 xen/common/tmem.c
 delete mode 100644 xen/common/tmem_control.c
 delete mode 100644 xen/common/tmem_xen.c
 delete mode 100644 xen/include/xen/tmem.h
 delete mode 100644 xen/include/xen/tmem_control.h
 delete mode 100644 xen/include/xen/tmem_xen.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a208bbe304..1f422d96a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,14 +402,6 @@ F: */configure F: */*.ac F: tools/ -TRANSCENDENT MEMORY (TMEM) -M:
Konrad Rzeszutek Wilk -W: https://oss.oracle.com/projects/tmem -S: Supported -F: xen/common/tmem* -F: xen/include/xen/tmem* -F: docs/misc/tmem* - UNMODIFIED LINUX PV DRIVERS M: Jan Beulich S: Obsolete diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te index a347d664f8..9970f9dc08 100644 --- a/tools/flask/policy/modules/dom0.te +++ b/tools/flask/policy/modules/dom0.te @@ -10,8 +10,8 @@ allow dom0_t xen_t:xen { settime tbufcontrol readconsole clearconsole perfcontrol mtrr_add mtrr_del mtrr_read microcode physinfo quirk writeconsole readapic writeapic privprofile nonprivprofile kexec firmware sleep frequency - getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op - tmem_control getscheduler setscheduler + getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op + getscheduler setscheduler }; allow dom0_t xen_t:xen2 { resource_op psr_cmt_op psr_alloc pmu_ctrl get_symbol diff --git a/tools/flask/policy/modules/guest_features.te b/tools/flask/policy/modules/guest_features.te index fe4835db5b..2797a22761 100644 --- a/tools/flask/policy/modules/guest_features.te +++ b/tools/flask/policy/modules/guest_features.te @@ -1,6 +1,3 @@ -# Allow all domains to use (unprivileged parts of) the tmem hypercall -allow domain_type xen_t:xen tmem_op; - # Allow all domains to use PMU (but not to change its settings --- that's what # pmu_ctrl is for) allow domain_type xen_t:xen2 pmu_use; diff --git a/xen/arch/arm/configs/tiny64.conf b/xen/arch/arm/configs/tiny64.conf index aecc55c95f..cc6d93f2f8 100644 --- a/xen/arch/arm/configs/tiny64.conf +++ b/xen/arch/arm/configs/tiny64.conf @@ -11,7 +11,6 @@ CONFIG_ARM=y # # Common Features # -# CONFIG_TMEM is not set CONFIG_SCHED_CREDIT=y # CONFIG_SCHED_CREDIT2 is not set # CONFIG_SCHED_RTDS is not set diff --git a/xen/arch/x86/configs/pvshim_defconfig b/xen/arch/x86/configs/pvshim_defconfig index a12e3d0465..9710aa6238 100644 --- a/xen/arch/x86/configs/pvshim_defconfig +++ 
b/xen/arch/x86/configs/pvshim_defconfig @@ -11,7 +11,6 @@ CONFIG_NR_CPUS=32 # CONFIG_HVM_FEP is not set # CONFIG_TBOOT is not set # CONFIG_KEXEC is not set -# CONFIG_TMEM is not set # CONFIG_XENOPROF is not set # CONFIG_XSM is not set # CONFIG_SCHED_CREDIT2 is not set diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S index 26afabf909..347fee4ff9 100644 --- a/xen/arch/x86/guest/hypercall_page.S +++ b/xen/arch/x86/guest/hypercall_page.S @@ -58,8 +58,8 @@ DECLARE_HYPERCALL(hvm_op) DECLARE_HYPERCALL(sysctl) DECLARE_HYPERCALL(domctl) DECLARE_HYPERCALL(kexec_op) -DECLARE_HYPERCALL(tmem_op) DECLARE_HYPERCALL(argo_op) +DECLARE_HYPERCALL(xc_reserved_op) DECLARE_HYPERCALL(xenpmu_op) DECLARE_HYPERCALL(arch_0) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 00455ff115..33dd2d99d2 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -135,9 +135,6 @@ static const hypercall_table_t hvm_hypercall_table[] = { HYPERCALL(hvm_op), HYPERCALL(sysctl), HYPERCALL(domctl), -#ifdef CONFIG_TMEM - HYPERCALL(tmem_op), -#endif #ifdef CONFIG_ARGO COMPAT_CALL(argo_op), #endif diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c index cf44b82793..d483dbaa6b 100644 --- a/xen/arch/x86/hypercall.c +++ b/xen/arch/x86/hypercall.c @@ -65,7 +65,6 @@ const hypercall_args_t hypercall_args_table[NR_hypercalls] = ARGS(sysctl, 1), ARGS(domctl, 1), ARGS(kexec_op, 2), - ARGS(tmem_op, 1), #ifdef CONFIG_ARGO ARGS(argo_op, 5), #endif diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c index e9da8419ca..5fdb8f988f 100644 --- a/xen/arch/x86/pv/hypercall.c +++ b/xen/arch/x86/pv/hypercall.c @@ -76,9 +76,6 @@ const hypercall_table_t pv_hypercall_table[] = { #ifdef CONFIG_KEXEC COMPAT_CALL(kexec_op), #endif -#ifdef CONFIG_TMEM - HYPERCALL(tmem_op), -#endif #ifdef CONFIG_ARGO COMPAT_CALL(argo_op), #endif diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 3440794275..aea9181a1f 100644 
--- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #include #include @@ -1485,13 +1484,6 @@ void __init noreturn __start_xen(unsigned long mbi_p) s = pfn_to_paddr(limit + 1); init_domheap_pages(s, e); } - - if ( tmem_enabled() ) - { - printk(XENLOG_WARNING - "TMEM physical RAM limit exceeded, disabling TMEM\n"); - tmem_disable(); - } } else end_boot_allocator(); diff --git a/xen/common/Kconfig b/xen/common/Kconfig index c838506241..7a12346f19 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -88,21 +88,6 @@ config KEXEC If unsure, say Y. -config TMEM - bool "Transcendent Memory Support (deprecated)" if EXPERT = "y" - ---help--- - Transcendent memory allows PV-aware guests to collaborate on memory - usage. Guests can 'swap' their memory to the hypervisor or have an - collective pool of memory shared across guests. The end result is - less memory usage by guests allowing higher guest density. - - You also have to enable it on the Xen commandline by using tmem=1. - - WARNING: This option (and its underlying code) is going to go away - in a future Xen version. - - If unsure, say N. 
- config XENOPROF def_bool y prompt "Xen Oprofile Support" if EXPERT = "y" diff --git a/xen/common/Makefile b/xen/common/Makefile index bca48e6e22..51df0ba844 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -73,10 +73,6 @@ obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma $(lzo-y) unl obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o) -tmem-y := tmem.o tmem_xen.o tmem_control.o -tmem-$(CONFIG_COMPAT) += compat/tmem_xen.o -obj-$(CONFIG_TMEM) += $(tmem-y) - extra-y := symbols-dummy.o subdir-$(CONFIG_COVERAGE) += coverage diff --git a/xen/common/compat/tmem_xen.c b/xen/common/compat/tmem_xen.c deleted file mode 100644 index 5111fd8df6..0000000000 --- a/xen/common/compat/tmem_xen.c +++ /dev/null @@ -1,23 +0,0 @@ -/****************************************************************************** - * tmem_xen.c - * - */ - -#include -#include -#include -#include -#include -#include - -CHECK_tmem_oid; - -/* - * Local variables: - * mode: C - * c-file-style: "BSD" - * c-basic-offset: 4 - * tab-width: 4 - * indent-tabs-mode: nil - * End: - */ diff --git a/xen/common/domain.c b/xen/common/domain.c index 88bbe984bc..90c66079f9 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -41,7 +41,6 @@ #include #include #include -#include #include #ifdef CONFIG_X86 @@ -725,10 +724,8 @@ int domain_kill(struct domain *d) argo_destroy(d); evtchn_destroy(d); gnttab_release_mappings(d); - tmem_destroy(d->tmem_client); vnuma_destroy(d->vnuma); domain_set_outstanding_pages(d, 0); - d->tmem_client = NULL; /* fallthrough */ case DOMDYING_dying: rc = domain_relinquish_resources(d); diff --git a/xen/common/memory.c b/xen/common/memory.c index 86567e6117..20609e153d 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -7,6 +7,7 @@ * Copyright (c) 2003-2005, K A Fraser */ +#include #include #include #include @@ -18,8 +19,6 @@ #include #include #include -#include -#include #include #include #include @@ -250,11 
+249,10 @@ static void populate_physmap(struct memop_args *a) if ( unlikely(!page) ) { - if ( !tmem_enabled() || a->extent_order ) - gdprintk(XENLOG_INFO, - "Could not allocate order=%u extent: id=%d memflags=%#x (%u of %u)\n", - a->extent_order, d->domain_id, a->memflags, - i, a->nr_extents); + gdprintk(XENLOG_INFO, + "Could not allocate order=%u extent: id=%d memflags=%#x (%u of %u)\n", + a->extent_order, d->domain_id, a->memflags, + i, a->nr_extents); goto out; } diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index be44158033..702e3e0b42 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -135,8 +135,6 @@ #include #include #include -#include -#include #include #include #include @@ -451,10 +449,6 @@ static unsigned long node_need_scrub[MAX_NUMNODES]; static unsigned long *avail[MAX_NUMNODES]; static long total_avail_pages; -/* TMEM: Reserve a fraction of memory for mid-size (0 - total_avail_pages + tmem_freeable_pages()) && + if ( (outstanding_claims + request > total_avail_pages) && ((memflags & MEMF_no_refcount) || !d || d->outstanding_pages < request) ) { @@ -948,22 +930,6 @@ static struct page_info *alloc_heap_pages( return NULL; } - /* - * TMEM: When available memory is scarce due to tmem absorbing it, allow - * only mid-size allocations to avoid worst of fragmentation issues. - * Others try tmem pools then fail. This is a workaround until all - * post-dom0-creation-multi-page allocations can be eliminated. - */ - if ( ((order == 0) || (order >= 9)) && - (total_avail_pages <= midsize_alloc_zone_pages) && - tmem_freeable_pages() ) - { - /* Try to free memory from tmem. */ - pg = tmem_relinquish_pages(order, memflags); - spin_unlock(&heap_lock); - return pg; - } - pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d); /* Try getting a dirty buddy if we couldn't get a clean one. 
*/ if ( !pg && !(memflags & MEMF_no_scrub) ) @@ -1443,10 +1409,6 @@ static void free_heap_pages( else pg->u.free.first_dirty = INVALID_DIRTY_IDX; - if ( tmem_enabled() ) - midsize_alloc_zone_pages = max( - midsize_alloc_zone_pages, total_avail_pages / MIDSIZE_ALLOC_FRAC); - /* Merge chunks as far as possible. */ while ( order < MAX_ORDER ) { @@ -1833,11 +1795,6 @@ static unsigned long avail_heap_pages( return free_pages; } -unsigned long total_free_pages(void) -{ - return total_avail_pages - midsize_alloc_zone_pages; -} - void __init end_boot_allocator(void) { unsigned int i; @@ -2264,10 +2221,9 @@ int assign_pages( { if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) ) { - if ( !tmem_enabled() || order != 0 || d->tot_pages != d->max_pages ) - gprintk(XENLOG_INFO, "Over-allocation for domain %u: " - "%u > %u\n", d->domain_id, - d->tot_pages + (1 << order), d->max_pages); + gprintk(XENLOG_INFO, "Over-allocation for domain %u: " + "%u > %u\n", d->domain_id, + d->tot_pages + (1 << order), d->max_pages); rc = -E2BIG; goto out; } diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index c0aa6bde4e..765effde8d 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include @@ -456,10 +455,6 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) } #endif - case XEN_SYSCTL_tmem_op: - ret = tmem_control(&op->u.tmem_op); - break; - case XEN_SYSCTL_livepatch_op: ret = livepatch_op(&op->u.livepatch); if ( ret != -ENOSYS && ret != -EOPNOTSUPP ) diff --git a/xen/common/tmem.c b/xen/common/tmem.c deleted file mode 100644 index c077f87e77..0000000000 --- a/xen/common/tmem.c +++ /dev/null @@ -1,2095 +0,0 @@ -/****************************************************************************** - * tmem.c - * - * Transcendent memory - * - * Copyright (c) 2009, Dan Magenheimer, Oracle Corp. - */ - -/* TODO list: 090129 (updated 100318) - - any better reclamation policy? 
- - use different tlsf pools for each client (maybe each pool) - - test shared access more completely (ocfs2) - - add feedback-driven compression (not for persistent pools though!) - - add data-structure total bytes overhead stats - */ - -#ifdef __XEN__ -#include /* host-specific (eg Xen) code goes here. */ -#endif - -#include -#include -#include -#include -#include -#include - -#define TMEM_SPEC_VERSION 1 - -struct tmem_statistics tmem_stats = { - .global_obj_count = ATOMIC_INIT(0), - .global_pgp_count = ATOMIC_INIT(0), - .global_pcd_count = ATOMIC_INIT(0), - .global_page_count = ATOMIC_INIT(0), - .global_rtree_node_count = ATOMIC_INIT(0), -}; - -/************ CORE DATA STRUCTURES ************************************/ - -struct tmem_object_root { - struct xen_tmem_oid oid; - struct rb_node rb_tree_node; /* Protected by pool->pool_rwlock. */ - unsigned long objnode_count; /* Atomicity depends on obj_spinlock. */ - long pgp_count; /* Atomicity depends on obj_spinlock. */ - struct radix_tree_root tree_root; /* Tree of pages within object. */ - struct tmem_pool *pool; - domid_t last_client; - spinlock_t obj_spinlock; -}; - -struct tmem_object_node { - struct tmem_object_root *obj; - struct radix_tree_node rtn; -}; - -struct tmem_page_descriptor { - union { - struct list_head global_eph_pages; - struct list_head client_inv_pages; - }; - union { - struct { - union { - struct list_head client_eph_pages; - struct list_head pool_pers_pages; - }; - struct tmem_object_root *obj; - } us; - struct xen_tmem_oid inv_oid; /* Used for invalid list only. */ - }; - pagesize_t size; /* 0 == PAGE_SIZE (pfp), -1 == data invalid, - else compressed data (cdata). */ - uint32_t index; - bool eviction_attempted; /* CHANGE TO lifetimes? (settable). */ - union { - struct page_info *pfp; /* Page frame pointer. */ - char *cdata; /* Compressed data. */ - struct tmem_page_content_descriptor *pcd; /* Page dedup. 
*/ - }; - union { - uint64_t timestamp; - uint32_t pool_id; /* Used for invalid list only. */ - }; -}; - -#define PCD_TZE_MAX_SIZE (PAGE_SIZE - (PAGE_SIZE/64)) - -struct tmem_page_content_descriptor { - union { - struct page_info *pfp; /* Page frame pointer. */ - char *cdata; /* If compression_enabled. */ - }; - pagesize_t size; /* If compression_enabled -> 0 *pfp. */ -}; - -static int tmem_initialized = 0; - -struct xmem_pool *tmem_mempool = 0; -unsigned int tmem_mempool_maxalloc = 0; - -DEFINE_SPINLOCK(tmem_page_list_lock); -PAGE_LIST_HEAD(tmem_page_list); -unsigned long tmem_page_list_pages = 0; - -DEFINE_RWLOCK(tmem_rwlock); -static DEFINE_SPINLOCK(eph_lists_spinlock); /* Protects global AND clients. */ -static DEFINE_SPINLOCK(pers_lists_spinlock); - -#define ASSERT_SPINLOCK(_l) ASSERT(spin_is_locked(_l)) -#define ASSERT_WRITELOCK(_l) ASSERT(rw_is_write_locked(_l)) - - atomic_t client_weight_total; - -struct tmem_global tmem_global = { - .ephemeral_page_list = LIST_HEAD_INIT(tmem_global.ephemeral_page_list), - .client_list = LIST_HEAD_INIT(tmem_global.client_list), - .client_weight_total = ATOMIC_INIT(0), -}; - -/* - * There two types of memory allocation interfaces in tmem. - * One is based on xmem_pool and the other is used for allocate a whole page. - * Both of them are based on the lowlevel function __tmem_alloc_page/_thispool(). - * The call trace of alloc path is like below. - * Persistant pool: - * 1.tmem_malloc() - * > xmem_pool_alloc() - * > tmem_persistent_pool_page_get() - * > __tmem_alloc_page_thispool() - * 2.tmem_alloc_page() - * > __tmem_alloc_page_thispool() - * - * Ephemeral pool: - * 1.tmem_malloc() - * > xmem_pool_alloc() - * > tmem_mempool_page_get() - * > __tmem_alloc_page() - * 2.tmem_alloc_page() - * > __tmem_alloc_page() - * - * The free path is done in the same manner. 
- */ -static void *tmem_malloc(size_t size, struct tmem_pool *pool) -{ - void *v = NULL; - - if ( (pool != NULL) && is_persistent(pool) ) { - if ( pool->client->persistent_pool ) - v = xmem_pool_alloc(size, pool->client->persistent_pool); - } - else - { - ASSERT( size < tmem_mempool_maxalloc ); - ASSERT( tmem_mempool != NULL ); - v = xmem_pool_alloc(size, tmem_mempool); - } - if ( v == NULL ) - tmem_stats.alloc_failed++; - return v; -} - -static void tmem_free(void *p, struct tmem_pool *pool) -{ - if ( pool == NULL || !is_persistent(pool) ) - { - ASSERT( tmem_mempool != NULL ); - xmem_pool_free(p, tmem_mempool); - } - else - { - ASSERT( pool->client->persistent_pool != NULL ); - xmem_pool_free(p, pool->client->persistent_pool); - } -} - -static struct page_info *tmem_alloc_page(struct tmem_pool *pool) -{ - struct page_info *pfp = NULL; - - if ( pool != NULL && is_persistent(pool) ) - pfp = __tmem_alloc_page_thispool(pool->client->domain); - else - pfp = __tmem_alloc_page(); - if ( pfp == NULL ) - tmem_stats.alloc_page_failed++; - else - atomic_inc_and_max(global_page_count); - return pfp; -} - -static void tmem_free_page(struct tmem_pool *pool, struct page_info *pfp) -{ - ASSERT(pfp); - if ( pool == NULL || !is_persistent(pool) ) - __tmem_free_page(pfp); - else - __tmem_free_page_thispool(pfp); - atomic_dec_and_assert(global_page_count); -} - -static void *tmem_mempool_page_get(unsigned long size) -{ - struct page_info *pi; - - ASSERT(size == PAGE_SIZE); - if ( (pi = __tmem_alloc_page()) == NULL ) - return NULL; - return page_to_virt(pi); -} - -static void tmem_mempool_page_put(void *page_va) -{ - ASSERT(IS_PAGE_ALIGNED(page_va)); - __tmem_free_page(virt_to_page(page_va)); -} - -static int __init tmem_mempool_init(void) -{ - tmem_mempool = xmem_pool_create("tmem", tmem_mempool_page_get, - tmem_mempool_page_put, PAGE_SIZE, 0, PAGE_SIZE); - if ( tmem_mempool ) - tmem_mempool_maxalloc = xmem_pool_maxalloc(tmem_mempool); - return tmem_mempool != NULL; -} - -/* 
Persistent pools are per-domain. */ -static void *tmem_persistent_pool_page_get(unsigned long size) -{ - struct page_info *pi; - struct domain *d = current->domain; - - ASSERT(size == PAGE_SIZE); - if ( (pi = __tmem_alloc_page_thispool(d)) == NULL ) - return NULL; - ASSERT(IS_VALID_PAGE(pi)); - return page_to_virt(pi); -} - -static void tmem_persistent_pool_page_put(void *page_va) -{ - struct page_info *pi; - - ASSERT(IS_PAGE_ALIGNED(page_va)); - pi = mfn_to_page(_mfn(virt_to_mfn(page_va))); - ASSERT(IS_VALID_PAGE(pi)); - __tmem_free_page_thispool(pi); -} - -/* - * Page content descriptor manipulation routines. - */ -#define NOT_SHAREABLE ((uint16_t)-1UL) - -/************ PAGE DESCRIPTOR MANIPULATION ROUTINES *******************/ - -/* Allocate a struct tmem_page_descriptor and associate it with an object. */ -static struct tmem_page_descriptor *pgp_alloc(struct tmem_object_root *obj) -{ - struct tmem_page_descriptor *pgp; - struct tmem_pool *pool; - - ASSERT(obj != NULL); - ASSERT(obj->pool != NULL); - pool = obj->pool; - if ( (pgp = tmem_malloc(sizeof(struct tmem_page_descriptor), pool)) == NULL ) - return NULL; - pgp->us.obj = obj; - INIT_LIST_HEAD(&pgp->global_eph_pages); - INIT_LIST_HEAD(&pgp->us.client_eph_pages); - pgp->pfp = NULL; - pgp->size = -1; - pgp->index = -1; - pgp->timestamp = get_cycles(); - atomic_inc_and_max(global_pgp_count); - atomic_inc(&pool->pgp_count); - if ( _atomic_read(pool->pgp_count) > pool->pgp_count_max ) - pool->pgp_count_max = _atomic_read(pool->pgp_count); - return pgp; -} - -static struct tmem_page_descriptor *pgp_lookup_in_obj(struct tmem_object_root *obj, uint32_t index) -{ - ASSERT(obj != NULL); - ASSERT_SPINLOCK(&obj->obj_spinlock); - ASSERT(obj->pool != NULL); - return radix_tree_lookup(&obj->tree_root, index); -} - -static void pgp_free_data(struct tmem_page_descriptor *pgp, struct tmem_pool *pool) -{ - pagesize_t pgp_size = pgp->size; - - if ( pgp->pfp == NULL ) - return; - if ( pgp_size ) - tmem_free(pgp->cdata, pool); - 
else - tmem_free_page(pgp->us.obj->pool,pgp->pfp); - if ( pool != NULL && pgp_size ) - { - pool->client->compressed_pages--; - pool->client->compressed_sum_size -= pgp_size; - } - pgp->pfp = NULL; - pgp->size = -1; -} - -static void __pgp_free(struct tmem_page_descriptor *pgp, struct tmem_pool *pool) -{ - pgp->us.obj = NULL; - pgp->index = -1; - tmem_free(pgp, pool); -} - -static void pgp_free(struct tmem_page_descriptor *pgp) -{ - struct tmem_pool *pool = NULL; - - ASSERT(pgp->us.obj != NULL); - ASSERT(pgp->us.obj->pool != NULL); - ASSERT(pgp->us.obj->pool->client != NULL); - - pool = pgp->us.obj->pool; - if ( !is_persistent(pool) ) - { - ASSERT(list_empty(&pgp->global_eph_pages)); - ASSERT(list_empty(&pgp->us.client_eph_pages)); - } - pgp_free_data(pgp, pool); - atomic_dec_and_assert(global_pgp_count); - atomic_dec(&pool->pgp_count); - ASSERT(_atomic_read(pool->pgp_count) >= 0); - pgp->size = -1; - if ( is_persistent(pool) && pool->client->info.flags.u.migrating ) - { - pgp->inv_oid = pgp->us.obj->oid; - pgp->pool_id = pool->pool_id; - return; - } - __pgp_free(pgp, pool); -} - -/* Remove pgp from global/pool/client lists and free it. */ -static void pgp_delist_free(struct tmem_page_descriptor *pgp) -{ - struct client *client; - uint64_t life; - - ASSERT(pgp != NULL); - ASSERT(pgp->us.obj != NULL); - ASSERT(pgp->us.obj->pool != NULL); - client = pgp->us.obj->pool->client; - ASSERT(client != NULL); - - /* Delist pgp. 
*/ - if ( !is_persistent(pgp->us.obj->pool) ) - { - spin_lock(&eph_lists_spinlock); - if ( !list_empty(&pgp->us.client_eph_pages) ) - client->eph_count--; - ASSERT(client->eph_count >= 0); - list_del_init(&pgp->us.client_eph_pages); - if ( !list_empty(&pgp->global_eph_pages) ) - tmem_global.eph_count--; - ASSERT(tmem_global.eph_count >= 0); - list_del_init(&pgp->global_eph_pages); - spin_unlock(&eph_lists_spinlock); - } - else - { - if ( client->info.flags.u.migrating ) - { - spin_lock(&pers_lists_spinlock); - list_add_tail(&pgp->client_inv_pages, - &client->persistent_invalidated_list); - if ( pgp != pgp->us.obj->pool->cur_pgp ) - list_del_init(&pgp->us.pool_pers_pages); - spin_unlock(&pers_lists_spinlock); - } - else - { - spin_lock(&pers_lists_spinlock); - list_del_init(&pgp->us.pool_pers_pages); - spin_unlock(&pers_lists_spinlock); - } - } - life = get_cycles() - pgp->timestamp; - pgp->us.obj->pool->sum_life_cycles += life; - - /* Free pgp. */ - pgp_free(pgp); -} - -/* Called only indirectly by radix_tree_destroy. */ -static void pgp_destroy(void *v) -{ - struct tmem_page_descriptor *pgp = (struct tmem_page_descriptor *)v; - - pgp->us.obj->pgp_count--; - pgp_delist_free(pgp); -} - -static int pgp_add_to_obj(struct tmem_object_root *obj, uint32_t index, struct tmem_page_descriptor *pgp) -{ - int ret; - - ASSERT_SPINLOCK(&obj->obj_spinlock); - ret = radix_tree_insert(&obj->tree_root, index, pgp); - if ( !ret ) - obj->pgp_count++; - return ret; -} - -static struct tmem_page_descriptor *pgp_delete_from_obj(struct tmem_object_root *obj, uint32_t index) -{ - struct tmem_page_descriptor *pgp; - - ASSERT(obj != NULL); - ASSERT_SPINLOCK(&obj->obj_spinlock); - ASSERT(obj->pool != NULL); - pgp = radix_tree_delete(&obj->tree_root, index); - if ( pgp != NULL ) - obj->pgp_count--; - ASSERT(obj->pgp_count >= 0); - - return pgp; -} - -/************ RADIX TREE NODE MANIPULATION ROUTINES *******************/ - -/* Called only indirectly from radix_tree_insert. 
*/ -static struct radix_tree_node *rtn_alloc(void *arg) -{ - struct tmem_object_node *objnode; - struct tmem_object_root *obj = (struct tmem_object_root *)arg; - - ASSERT(obj->pool != NULL); - objnode = tmem_malloc(sizeof(struct tmem_object_node),obj->pool); - if (objnode == NULL) - return NULL; - objnode->obj = obj; - memset(&objnode->rtn, 0, sizeof(struct radix_tree_node)); - if (++obj->pool->objnode_count > obj->pool->objnode_count_max) - obj->pool->objnode_count_max = obj->pool->objnode_count; - atomic_inc_and_max(global_rtree_node_count); - obj->objnode_count++; - return &objnode->rtn; -} - -/* Called only indirectly from radix_tree_delete/destroy. */ -static void rtn_free(struct radix_tree_node *rtn, void *arg) -{ - struct tmem_pool *pool; - struct tmem_object_node *objnode; - - ASSERT(rtn != NULL); - objnode = container_of(rtn,struct tmem_object_node,rtn); - ASSERT(objnode->obj != NULL); - ASSERT_SPINLOCK(&objnode->obj->obj_spinlock); - pool = objnode->obj->pool; - ASSERT(pool != NULL); - pool->objnode_count--; - objnode->obj->objnode_count--; - objnode->obj = NULL; - tmem_free(objnode, pool); - atomic_dec_and_assert(global_rtree_node_count); -} - -/************ POOL OBJECT COLLECTION MANIPULATION ROUTINES *******************/ - -static int oid_compare(struct xen_tmem_oid *left, - struct xen_tmem_oid *right) -{ - if ( left->oid[2] == right->oid[2] ) - { - if ( left->oid[1] == right->oid[1] ) - { - if ( left->oid[0] == right->oid[0] ) - return 0; - else if ( left->oid[0] < right->oid[0] ) - return -1; - else - return 1; - } - else if ( left->oid[1] < right->oid[1] ) - return -1; - else - return 1; - } - else if ( left->oid[2] < right->oid[2] ) - return -1; - else - return 1; -} - -static void oid_set_invalid(struct xen_tmem_oid *oidp) -{ - oidp->oid[0] = oidp->oid[1] = oidp->oid[2] = -1UL; -} - -static unsigned oid_hash(struct xen_tmem_oid *oidp) -{ - return (tmem_hash(oidp->oid[0] ^ oidp->oid[1] ^ oidp->oid[2], - BITS_PER_LONG) & OBJ_HASH_BUCKETS_MASK); -} - 
-/* Searches for object==oid in pool, returns locked object if found. */ -static struct tmem_object_root * obj_find(struct tmem_pool *pool, - struct xen_tmem_oid *oidp) -{ - struct rb_node *node; - struct tmem_object_root *obj; - -restart_find: - read_lock(&pool->pool_rwlock); - node = pool->obj_rb_root[oid_hash(oidp)].rb_node; - while ( node ) - { - obj = container_of(node, struct tmem_object_root, rb_tree_node); - switch ( oid_compare(&obj->oid, oidp) ) - { - case 0: /* Equal. */ - if ( !spin_trylock(&obj->obj_spinlock) ) - { - read_unlock(&pool->pool_rwlock); - goto restart_find; - } - read_unlock(&pool->pool_rwlock); - return obj; - case -1: - node = node->rb_left; - break; - case 1: - node = node->rb_right; - } - } - read_unlock(&pool->pool_rwlock); - return NULL; -} - -/* Free an object that has no more pgps in it. */ -static void obj_free(struct tmem_object_root *obj) -{ - struct tmem_pool *pool; - struct xen_tmem_oid old_oid; - - ASSERT_SPINLOCK(&obj->obj_spinlock); - ASSERT(obj != NULL); - ASSERT(obj->pgp_count == 0); - pool = obj->pool; - ASSERT(pool != NULL); - ASSERT(pool->client != NULL); - ASSERT_WRITELOCK(&pool->pool_rwlock); - if ( obj->tree_root.rnode != NULL ) /* May be a "stump" with no leaves. 
*/ - radix_tree_destroy(&obj->tree_root, pgp_destroy); - ASSERT((long)obj->objnode_count == 0); - ASSERT(obj->tree_root.rnode == NULL); - pool->obj_count--; - ASSERT(pool->obj_count >= 0); - obj->pool = NULL; - old_oid = obj->oid; - oid_set_invalid(&obj->oid); - obj->last_client = TMEM_CLI_ID_NULL; - atomic_dec_and_assert(global_obj_count); - rb_erase(&obj->rb_tree_node, &pool->obj_rb_root[oid_hash(&old_oid)]); - spin_unlock(&obj->obj_spinlock); - tmem_free(obj, pool); -} - -static int obj_rb_insert(struct rb_root *root, struct tmem_object_root *obj) -{ - struct rb_node **new, *parent = NULL; - struct tmem_object_root *this; - - ASSERT(obj->pool); - ASSERT_WRITELOCK(&obj->pool->pool_rwlock); - - new = &(root->rb_node); - while ( *new ) - { - this = container_of(*new, struct tmem_object_root, rb_tree_node); - parent = *new; - switch ( oid_compare(&this->oid, &obj->oid) ) - { - case 0: - return 0; - case -1: - new = &((*new)->rb_left); - break; - case 1: - new = &((*new)->rb_right); - break; - } - } - rb_link_node(&obj->rb_tree_node, parent, new); - rb_insert_color(&obj->rb_tree_node, root); - return 1; -} - -/* - * Allocate, initialize, and insert an tmem_object_root - * (should be called only if find failed). - */ -static struct tmem_object_root * obj_alloc(struct tmem_pool *pool, - struct xen_tmem_oid *oidp) -{ - struct tmem_object_root *obj; - - ASSERT(pool != NULL); - if ( (obj = tmem_malloc(sizeof(struct tmem_object_root), pool)) == NULL ) - return NULL; - pool->obj_count++; - if (pool->obj_count > pool->obj_count_max) - pool->obj_count_max = pool->obj_count; - atomic_inc_and_max(global_obj_count); - radix_tree_init(&obj->tree_root); - radix_tree_set_alloc_callbacks(&obj->tree_root, rtn_alloc, rtn_free, obj); - spin_lock_init(&obj->obj_spinlock); - obj->pool = pool; - obj->oid = *oidp; - obj->objnode_count = 0; - obj->pgp_count = 0; - obj->last_client = TMEM_CLI_ID_NULL; - return obj; -} - -/* Free an object after destroying any pgps in it. 
*/ -static void obj_destroy(struct tmem_object_root *obj) -{ - ASSERT_WRITELOCK(&obj->pool->pool_rwlock); - radix_tree_destroy(&obj->tree_root, pgp_destroy); - obj_free(obj); -} - -/* Destroys all objs in a pool, or only if obj->last_client matches cli_id. */ -static void pool_destroy_objs(struct tmem_pool *pool, domid_t cli_id) -{ - struct rb_node *node; - struct tmem_object_root *obj; - int i; - - write_lock(&pool->pool_rwlock); - pool->is_dying = 1; - for (i = 0; i < OBJ_HASH_BUCKETS; i++) - { - node = rb_first(&pool->obj_rb_root[i]); - while ( node != NULL ) - { - obj = container_of(node, struct tmem_object_root, rb_tree_node); - spin_lock(&obj->obj_spinlock); - node = rb_next(node); - if ( obj->last_client == cli_id ) - obj_destroy(obj); - else - spin_unlock(&obj->obj_spinlock); - } - } - write_unlock(&pool->pool_rwlock); -} - - -/************ POOL MANIPULATION ROUTINES ******************************/ - -static struct tmem_pool * pool_alloc(void) -{ - struct tmem_pool *pool; - int i; - - if ( (pool = xzalloc(struct tmem_pool)) == NULL ) - return NULL; - for (i = 0; i < OBJ_HASH_BUCKETS; i++) - pool->obj_rb_root[i] = RB_ROOT; - INIT_LIST_HEAD(&pool->persistent_page_list); - rwlock_init(&pool->pool_rwlock); - return pool; -} - -static void pool_free(struct tmem_pool *pool) -{ - pool->client = NULL; - xfree(pool); -} - -/* - * Register new_client as a user of this shared pool and return 0 on succ. 
- */ -static int shared_pool_join(struct tmem_pool *pool, struct client *new_client) -{ - struct share_list *sl; - ASSERT(is_shared(pool)); - - if ( (sl = tmem_malloc(sizeof(struct share_list), NULL)) == NULL ) - return -1; - sl->client = new_client; - list_add_tail(&sl->share_list, &pool->share_list); - if ( new_client->cli_id != pool->client->cli_id ) - tmem_client_info("adding new %s %d to shared pool owned by %s %d\n", - tmem_client_str, new_client->cli_id, tmem_client_str, - pool->client->cli_id); - else if ( pool->shared_count ) - tmem_client_info("inter-guest sharing of shared pool %s by client %d\n", - tmem_client_str, pool->client->cli_id); - ++pool->shared_count; - return 0; -} - -/* Reassign "ownership" of the pool to another client that shares this pool. */ -static void shared_pool_reassign(struct tmem_pool *pool) -{ - struct share_list *sl; - int poolid; - struct client *old_client = pool->client, *new_client; - - ASSERT(is_shared(pool)); - if ( list_empty(&pool->share_list) ) - { - ASSERT(pool->shared_count == 0); - return; - } - old_client->pools[pool->pool_id] = NULL; - sl = list_entry(pool->share_list.next, struct share_list, share_list); - /* - * The sl->client can be old_client if there are multiple shared pools - * within an guest. - */ - pool->client = new_client = sl->client; - for (poolid = 0; poolid < MAX_POOLS_PER_DOMAIN; poolid++) - if (new_client->pools[poolid] == pool) - break; - ASSERT(poolid != MAX_POOLS_PER_DOMAIN); - new_client->eph_count += _atomic_read(pool->pgp_count); - old_client->eph_count -= _atomic_read(pool->pgp_count); - list_splice_init(&old_client->ephemeral_page_list, - &new_client->ephemeral_page_list); - tmem_client_info("reassigned shared pool from %s=%d to %s=%d pool_id=%d\n", - tmem_cli_id_str, old_client->cli_id, tmem_cli_id_str, new_client->cli_id, poolid); - pool->pool_id = poolid; -} - -/* - * Destroy all objects with last_client same as passed cli_id, - * remove pool's cli_id from list of sharers of this pool. 
- */ -static int shared_pool_quit(struct tmem_pool *pool, domid_t cli_id) -{ - struct share_list *sl; - int s_poolid; - - ASSERT(is_shared(pool)); - ASSERT(pool->client != NULL); - - ASSERT_WRITELOCK(&tmem_rwlock); - pool_destroy_objs(pool, cli_id); - list_for_each_entry(sl,&pool->share_list, share_list) - { - if (sl->client->cli_id != cli_id) - continue; - list_del(&sl->share_list); - tmem_free(sl, pool); - --pool->shared_count; - if (pool->client->cli_id == cli_id) - shared_pool_reassign(pool); - if (pool->shared_count) - return pool->shared_count; - for (s_poolid = 0; s_poolid < MAX_GLOBAL_SHARED_POOLS; s_poolid++) - if ( (tmem_global.shared_pools[s_poolid]) == pool ) - { - tmem_global.shared_pools[s_poolid] = NULL; - break; - } - return 0; - } - tmem_client_warn("tmem: no match unsharing pool, %s=%d\n", - tmem_cli_id_str,pool->client->cli_id); - return -1; -} - -/* Flush all data (owned by cli_id) from a pool and, optionally, free it. */ -static void pool_flush(struct tmem_pool *pool, domid_t cli_id) -{ - ASSERT(pool != NULL); - if ( (is_shared(pool)) && (shared_pool_quit(pool,cli_id) > 0) ) - { - tmem_client_warn("tmem: %s=%d no longer using shared pool %d owned by %s=%d\n", - tmem_cli_id_str, cli_id, pool->pool_id, tmem_cli_id_str,pool->client->cli_id); - return; - } - tmem_client_info("Destroying %s-%s tmem pool %s=%d pool_id=%d\n", - is_persistent(pool) ? "persistent" : "ephemeral" , - is_shared(pool) ? 
"shared" : "private", - tmem_cli_id_str, pool->client->cli_id, pool->pool_id); - if ( pool->client->info.flags.u.migrating ) - { - tmem_client_warn("can't destroy pool while %s is live-migrating\n", - tmem_client_str); - return; - } - pool_destroy_objs(pool, TMEM_CLI_ID_NULL); - pool->client->pools[pool->pool_id] = NULL; - pool_free(pool); -} - -/************ CLIENT MANIPULATION OPERATIONS **************************/ - -struct client *client_create(domid_t cli_id) -{ - struct client *client = xzalloc(struct client); - int i, shift; - char name[5]; - struct domain *d; - - tmem_client_info("tmem: initializing tmem capability for %s=%d...", - tmem_cli_id_str, cli_id); - if ( client == NULL ) - { - tmem_client_err("failed... out of memory\n"); - goto fail; - } - - for (i = 0, shift = 12; i < 4; shift -=4, i++) - name[i] = (((unsigned short)cli_id >> shift) & 0xf) + '0'; - name[4] = '\0'; - client->persistent_pool = xmem_pool_create(name, tmem_persistent_pool_page_get, - tmem_persistent_pool_page_put, PAGE_SIZE, 0, PAGE_SIZE); - if ( client->persistent_pool == NULL ) - { - tmem_client_err("failed... can't alloc persistent pool\n"); - goto fail; - } - - d = rcu_lock_domain_by_id(cli_id); - if ( d == NULL ) { - tmem_client_err("failed... 
can't set client\n"); - xmem_pool_destroy(client->persistent_pool); - goto fail; - } - if ( !d->is_dying ) { - d->tmem_client = client; - client->domain = d; - } - rcu_unlock_domain(d); - - client->cli_id = cli_id; - client->info.version = TMEM_SPEC_VERSION; - client->info.maxpools = MAX_POOLS_PER_DOMAIN; - client->info.flags.u.compress = tmem_compression_enabled(); - for ( i = 0; i < MAX_GLOBAL_SHARED_POOLS; i++) - client->shared_auth_uuid[i][0] = - client->shared_auth_uuid[i][1] = -1L; - list_add_tail(&client->client_list, &tmem_global.client_list); - INIT_LIST_HEAD(&client->ephemeral_page_list); - INIT_LIST_HEAD(&client->persistent_invalidated_list); - tmem_client_info("ok\n"); - return client; - - fail: - xfree(client); - return NULL; -} - -static void client_free(struct client *client) -{ - list_del(&client->client_list); - xmem_pool_destroy(client->persistent_pool); - xfree(client); -} - -/* Flush all data from a client and, optionally, free it. */ -static void client_flush(struct client *client) -{ - int i; - struct tmem_pool *pool; - - for (i = 0; i < MAX_POOLS_PER_DOMAIN; i++) - { - if ( (pool = client->pools[i]) == NULL ) - continue; - pool_flush(pool, client->cli_id); - client->pools[i] = NULL; - client->info.nr_pools--; - } - client_free(client); -} - -static bool client_over_quota(const struct client *client) -{ - int total = _atomic_read(tmem_global.client_weight_total); - - ASSERT(client != NULL); - if ( (total == 0) || (client->info.weight == 0) || - (client->eph_count == 0) ) - return false; - - return (((tmem_global.eph_count * 100L) / client->eph_count) > - ((total * 100L) / client->info.weight)); -} - -/************ MEMORY REVOCATION ROUTINES *******************************/ - -static bool tmem_try_to_evict_pgp(struct tmem_page_descriptor *pgp, - bool *hold_pool_rwlock) -{ - struct tmem_object_root *obj = pgp->us.obj; - struct tmem_pool *pool = obj->pool; - - if ( pool->is_dying ) - return false; - if ( spin_trylock(&obj->obj_spinlock) ) - { - 
if ( obj->pgp_count > 1 ) - return true; - if ( write_trylock(&pool->pool_rwlock) ) - { - *hold_pool_rwlock = 1; - return true; - } - spin_unlock(&obj->obj_spinlock); - } - return false; -} - -int tmem_evict(void) -{ - struct client *client = current->domain->tmem_client; - struct tmem_page_descriptor *pgp = NULL, *pgp_del; - struct tmem_object_root *obj; - struct tmem_pool *pool; - int ret = 0; - bool hold_pool_rwlock = false; - - tmem_stats.evict_attempts++; - spin_lock(&eph_lists_spinlock); - if ( (client != NULL) && client_over_quota(client) && - !list_empty(&client->ephemeral_page_list) ) - { - list_for_each_entry(pgp, &client->ephemeral_page_list, us.client_eph_pages) - if ( tmem_try_to_evict_pgp(pgp, &hold_pool_rwlock) ) - goto found; - } - else if ( !list_empty(&tmem_global.ephemeral_page_list) ) - { - list_for_each_entry(pgp, &tmem_global.ephemeral_page_list, global_eph_pages) - if ( tmem_try_to_evict_pgp(pgp, &hold_pool_rwlock) ) - { - client = pgp->us.obj->pool->client; - goto found; - } - } - /* Global_ephemeral_page_list is empty, so we bail out. */ - spin_unlock(&eph_lists_spinlock); - goto out; - -found: - /* Delist. */ - list_del_init(&pgp->us.client_eph_pages); - client->eph_count--; - list_del_init(&pgp->global_eph_pages); - tmem_global.eph_count--; - ASSERT(tmem_global.eph_count >= 0); - ASSERT(client->eph_count >= 0); - spin_unlock(&eph_lists_spinlock); - - ASSERT(pgp != NULL); - obj = pgp->us.obj; - ASSERT(obj != NULL); - ASSERT(obj->pool != NULL); - pool = obj->pool; - - ASSERT_SPINLOCK(&obj->obj_spinlock); - pgp_del = pgp_delete_from_obj(obj, pgp->index); - ASSERT(pgp_del == pgp); - - /* pgp already delist, so call pgp_free directly. 
*/ - pgp_free(pgp); - if ( obj->pgp_count == 0 ) - { - ASSERT_WRITELOCK(&pool->pool_rwlock); - obj_free(obj); - } - else - spin_unlock(&obj->obj_spinlock); - if ( hold_pool_rwlock ) - write_unlock(&pool->pool_rwlock); - tmem_stats.evicted_pgs++; - ret = 1; -out: - return ret; -} - - -/* - * Under certain conditions (e.g. if each client is putting pages for exactly - * one object), once locks are held, freeing up memory may - * result in livelocks and very long "put" times, so we try to ensure there - * is a minimum amount of memory (1MB) available BEFORE any data structure - * locks are held. - */ -static inline bool tmem_ensure_avail_pages(void) -{ - int failed_evict = 10; - unsigned long free_mem; - - do { - free_mem = (tmem_page_list_pages + total_free_pages()) - >> (20 - PAGE_SHIFT); - if ( free_mem ) - return true; - if ( !tmem_evict() ) - failed_evict--; - } while ( failed_evict > 0 ); - - return false; -} - -/************ TMEM CORE OPERATIONS ************************************/ - -static int do_tmem_put_compress(struct tmem_page_descriptor *pgp, xen_pfn_t cmfn, - tmem_cli_va_param_t clibuf) -{ - void *dst, *p; - size_t size; - int ret = 0; - - ASSERT(pgp != NULL); - ASSERT(pgp->us.obj != NULL); - ASSERT_SPINLOCK(&pgp->us.obj->obj_spinlock); - ASSERT(pgp->us.obj->pool != NULL); - ASSERT(pgp->us.obj->pool->client != NULL); - - if ( pgp->pfp != NULL ) - pgp_free_data(pgp, pgp->us.obj->pool); - ret = tmem_compress_from_client(cmfn, &dst, &size, clibuf); - if ( ret <= 0 ) - goto out; - else if ( (size == 0) || (size >= tmem_mempool_maxalloc) ) { - ret = 0; - goto out; - } else if ( (p = tmem_malloc(size,pgp->us.obj->pool)) == NULL ) { - ret = -ENOMEM; - goto out; - } else { - memcpy(p,dst,size); - pgp->cdata = p; - } - pgp->size = size; - pgp->us.obj->pool->client->compressed_pages++; - pgp->us.obj->pool->client->compressed_sum_size += size; - ret = 1; - -out: - return ret; -} - -static int do_tmem_dup_put(struct tmem_page_descriptor *pgp, xen_pfn_t cmfn, - 
tmem_cli_va_param_t clibuf) -{ - struct tmem_pool *pool; - struct tmem_object_root *obj; - struct client *client; - struct tmem_page_descriptor *pgpfound = NULL; - int ret; - - ASSERT(pgp != NULL); - ASSERT(pgp->pfp != NULL); - ASSERT(pgp->size != -1); - obj = pgp->us.obj; - ASSERT_SPINLOCK(&obj->obj_spinlock); - ASSERT(obj != NULL); - pool = obj->pool; - ASSERT(pool != NULL); - client = pool->client; - if ( client->info.flags.u.migrating ) - goto failed_dup; /* No dups allowed when migrating. */ - /* Can we successfully manipulate pgp to change out the data? */ - if ( client->info.flags.u.compress && pgp->size != 0 ) - { - ret = do_tmem_put_compress(pgp, cmfn, clibuf); - if ( ret == 1 ) - goto done; - else if ( ret == 0 ) - goto copy_uncompressed; - else if ( ret == -ENOMEM ) - goto failed_dup; - else if ( ret == -EFAULT ) - goto bad_copy; - } - -copy_uncompressed: - if ( pgp->pfp ) - pgp_free_data(pgp, pool); - if ( ( pgp->pfp = tmem_alloc_page(pool) ) == NULL ) - goto failed_dup; - pgp->size = 0; - ret = tmem_copy_from_client(pgp->pfp, cmfn, tmem_cli_buf_null); - if ( ret < 0 ) - goto bad_copy; - -done: - /* Successfully replaced data, clean up and return success. */ - if ( is_shared(pool) ) - obj->last_client = client->cli_id; - spin_unlock(&obj->obj_spinlock); - pool->dup_puts_replaced++; - pool->good_puts++; - if ( is_persistent(pool) ) - client->succ_pers_puts++; - return 1; - -bad_copy: - tmem_stats.failed_copies++; - goto cleanup; - -failed_dup: - /* - * Couldn't change out the data, flush the old data and return - * -ENOSPC instead of -ENOMEM to differentiate failed _dup_ put. 
- */ - ret = -ENOSPC; -cleanup: - pgpfound = pgp_delete_from_obj(obj, pgp->index); - ASSERT(pgpfound == pgp); - pgp_delist_free(pgpfound); - if ( obj->pgp_count == 0 ) - { - write_lock(&pool->pool_rwlock); - obj_free(obj); - write_unlock(&pool->pool_rwlock); - } else { - spin_unlock(&obj->obj_spinlock); - } - pool->dup_puts_flushed++; - return ret; -} - -static int do_tmem_put(struct tmem_pool *pool, - struct xen_tmem_oid *oidp, uint32_t index, - xen_pfn_t cmfn, tmem_cli_va_param_t clibuf) -{ - struct tmem_object_root *obj = NULL; - struct tmem_page_descriptor *pgp = NULL; - struct client *client; - int ret, newobj = 0; - - ASSERT(pool != NULL); - client = pool->client; - ASSERT(client != NULL); - ret = client->info.flags.u.frozen ? -EFROZEN : -ENOMEM; - pool->puts++; - -refind: - /* Does page already exist (dup)? if so, handle specially. */ - if ( (obj = obj_find(pool, oidp)) != NULL ) - { - if ((pgp = pgp_lookup_in_obj(obj, index)) != NULL) - { - return do_tmem_dup_put(pgp, cmfn, clibuf); - } - else - { - /* No puts allowed into a frozen pool (except dup puts). */ - if ( client->info.flags.u.frozen ) - goto unlock_obj; - } - } - else - { - /* No puts allowed into a frozen pool (except dup puts). */ - if ( client->info.flags.u.frozen ) - return ret; - if ( (obj = obj_alloc(pool, oidp)) == NULL ) - return -ENOMEM; - - write_lock(&pool->pool_rwlock); - /* - * Parallel callers may already allocated obj and inserted to obj_rb_root - * before us. - */ - if ( !obj_rb_insert(&pool->obj_rb_root[oid_hash(oidp)], obj) ) - { - tmem_free(obj, pool); - write_unlock(&pool->pool_rwlock); - goto refind; - } - - spin_lock(&obj->obj_spinlock); - newobj = 1; - write_unlock(&pool->pool_rwlock); - } - - /* When arrive here, we have a spinlocked obj for use. 
*/ - ASSERT_SPINLOCK(&obj->obj_spinlock); - if ( (pgp = pgp_alloc(obj)) == NULL ) - goto unlock_obj; - - ret = pgp_add_to_obj(obj, index, pgp); - if ( ret == -ENOMEM ) - /* Warning: may result in partially built radix tree ("stump"). */ - goto free_pgp; - - pgp->index = index; - pgp->size = 0; - - if ( client->info.flags.u.compress ) - { - ASSERT(pgp->pfp == NULL); - ret = do_tmem_put_compress(pgp, cmfn, clibuf); - if ( ret == 1 ) - goto insert_page; - if ( ret == -ENOMEM ) - { - client->compress_nomem++; - goto del_pgp_from_obj; - } - if ( ret == 0 ) - { - client->compress_poor++; - goto copy_uncompressed; - } - if ( ret == -EFAULT ) - goto bad_copy; - } - -copy_uncompressed: - if ( ( pgp->pfp = tmem_alloc_page(pool) ) == NULL ) - { - ret = -ENOMEM; - goto del_pgp_from_obj; - } - ret = tmem_copy_from_client(pgp->pfp, cmfn, clibuf); - if ( ret < 0 ) - goto bad_copy; - -insert_page: - if ( !is_persistent(pool) ) - { - spin_lock(&eph_lists_spinlock); - list_add_tail(&pgp->global_eph_pages, &tmem_global.ephemeral_page_list); - if (++tmem_global.eph_count > tmem_stats.global_eph_count_max) - tmem_stats.global_eph_count_max = tmem_global.eph_count; - list_add_tail(&pgp->us.client_eph_pages, - &client->ephemeral_page_list); - if (++client->eph_count > client->eph_count_max) - client->eph_count_max = client->eph_count; - spin_unlock(&eph_lists_spinlock); - } - else - { /* is_persistent. */ - spin_lock(&pers_lists_spinlock); - list_add_tail(&pgp->us.pool_pers_pages, - &pool->persistent_page_list); - spin_unlock(&pers_lists_spinlock); - } - - if ( is_shared(pool) ) - obj->last_client = client->cli_id; - - /* Free the obj spinlock. 
*/ - spin_unlock(&obj->obj_spinlock); - pool->good_puts++; - - if ( is_persistent(pool) ) - client->succ_pers_puts++; - else - tmem_stats.tot_good_eph_puts++; - return 1; - -bad_copy: - tmem_stats.failed_copies++; - -del_pgp_from_obj: - ASSERT((obj != NULL) && (pgp != NULL) && (pgp->index != -1)); - pgp_delete_from_obj(obj, pgp->index); - -free_pgp: - pgp_free(pgp); -unlock_obj: - if ( newobj ) - { - write_lock(&pool->pool_rwlock); - obj_free(obj); - write_unlock(&pool->pool_rwlock); - } - else - { - spin_unlock(&obj->obj_spinlock); - } - pool->no_mem_puts++; - return ret; -} - -static int do_tmem_get(struct tmem_pool *pool, - struct xen_tmem_oid *oidp, uint32_t index, - xen_pfn_t cmfn, tmem_cli_va_param_t clibuf) -{ - struct tmem_object_root *obj; - struct tmem_page_descriptor *pgp; - struct client *client = pool->client; - int rc; - - if ( !_atomic_read(pool->pgp_count) ) - return -EEMPTY; - - pool->gets++; - obj = obj_find(pool,oidp); - if ( obj == NULL ) - return 0; - - ASSERT_SPINLOCK(&obj->obj_spinlock); - if (is_shared(pool) || is_persistent(pool) ) - pgp = pgp_lookup_in_obj(obj, index); - else - pgp = pgp_delete_from_obj(obj, index); - if ( pgp == NULL ) - { - spin_unlock(&obj->obj_spinlock); - return 0; - } - ASSERT(pgp->size != -1); - if ( pgp->size != 0 ) - { - rc = tmem_decompress_to_client(cmfn, pgp->cdata, pgp->size, clibuf); - } - else - rc = tmem_copy_to_client(cmfn, pgp->pfp, clibuf); - if ( rc <= 0 ) - goto bad_copy; - - if ( !is_persistent(pool) ) - { - if ( !is_shared(pool) ) - { - pgp_delist_free(pgp); - if ( obj->pgp_count == 0 ) - { - write_lock(&pool->pool_rwlock); - obj_free(obj); - obj = NULL; - write_unlock(&pool->pool_rwlock); - } - } else { - spin_lock(&eph_lists_spinlock); - list_del(&pgp->global_eph_pages); - list_add_tail(&pgp->global_eph_pages,&tmem_global.ephemeral_page_list); - list_del(&pgp->us.client_eph_pages); - list_add_tail(&pgp->us.client_eph_pages,&client->ephemeral_page_list); - spin_unlock(&eph_lists_spinlock); - 
obj->last_client = current->domain->domain_id; - } - } - if ( obj != NULL ) - { - spin_unlock(&obj->obj_spinlock); - } - pool->found_gets++; - if ( is_persistent(pool) ) - client->succ_pers_gets++; - else - client->succ_eph_gets++; - return 1; - -bad_copy: - spin_unlock(&obj->obj_spinlock); - tmem_stats.failed_copies++; - return rc; -} - -static int do_tmem_flush_page(struct tmem_pool *pool, - struct xen_tmem_oid *oidp, uint32_t index) -{ - struct tmem_object_root *obj; - struct tmem_page_descriptor *pgp; - - pool->flushs++; - obj = obj_find(pool,oidp); - if ( obj == NULL ) - goto out; - pgp = pgp_delete_from_obj(obj, index); - if ( pgp == NULL ) - { - spin_unlock(&obj->obj_spinlock); - goto out; - } - pgp_delist_free(pgp); - if ( obj->pgp_count == 0 ) - { - write_lock(&pool->pool_rwlock); - obj_free(obj); - write_unlock(&pool->pool_rwlock); - } else { - spin_unlock(&obj->obj_spinlock); - } - pool->flushs_found++; - -out: - if ( pool->client->info.flags.u.frozen ) - return -EFROZEN; - else - return 1; -} - -static int do_tmem_flush_object(struct tmem_pool *pool, - struct xen_tmem_oid *oidp) -{ - struct tmem_object_root *obj; - - pool->flush_objs++; - obj = obj_find(pool,oidp); - if ( obj == NULL ) - goto out; - write_lock(&pool->pool_rwlock); - obj_destroy(obj); - pool->flush_objs_found++; - write_unlock(&pool->pool_rwlock); - -out: - if ( pool->client->info.flags.u.frozen ) - return -EFROZEN; - else - return 1; -} - -static int do_tmem_destroy_pool(uint32_t pool_id) -{ - struct client *client = current->domain->tmem_client; - struct tmem_pool *pool; - - if ( pool_id >= MAX_POOLS_PER_DOMAIN ) - return 0; - if ( (pool = client->pools[pool_id]) == NULL ) - return 0; - client->pools[pool_id] = NULL; - pool_flush(pool, client->cli_id); - client->info.nr_pools--; - return 1; -} - -int do_tmem_new_pool(domid_t this_cli_id, - uint32_t d_poolid, uint32_t flags, - uint64_t uuid_lo, uint64_t uuid_hi) -{ - struct client *client; - domid_t cli_id; - int persistent = flags & 
TMEM_POOL_PERSIST; - int shared = flags & TMEM_POOL_SHARED; - int pagebits = (flags >> TMEM_POOL_PAGESIZE_SHIFT) - & TMEM_POOL_PAGESIZE_MASK; - int specversion = (flags >> TMEM_POOL_VERSION_SHIFT) - & TMEM_POOL_VERSION_MASK; - struct tmem_pool *pool, *shpool; - int i, first_unused_s_poolid; - - if ( this_cli_id == TMEM_CLI_ID_NULL ) - cli_id = current->domain->domain_id; - else - cli_id = this_cli_id; - tmem_client_info("tmem: allocating %s-%s tmem pool for %s=%d...", - persistent ? "persistent" : "ephemeral" , - shared ? "shared" : "private", tmem_cli_id_str, cli_id); - if ( specversion != TMEM_SPEC_VERSION ) - { - tmem_client_err("failed... unsupported spec version\n"); - return -EPERM; - } - if ( shared && persistent ) - { - tmem_client_err("failed... unable to create a shared-persistant pool\n"); - return -EPERM; - } - if ( pagebits != (PAGE_SHIFT - 12) ) - { - tmem_client_err("failed... unsupported pagesize %d\n", - 1 << (pagebits + 12)); - return -EPERM; - } - if ( flags & TMEM_POOL_PRECOMPRESSED ) - { - tmem_client_err("failed... precompression flag set but unsupported\n"); - return -EPERM; - } - if ( flags & TMEM_POOL_RESERVED_BITS ) - { - tmem_client_err("failed... reserved bits must be zero\n"); - return -EPERM; - } - if ( this_cli_id != TMEM_CLI_ID_NULL ) - { - if ( (client = tmem_client_from_cli_id(this_cli_id)) == NULL - || d_poolid >= MAX_POOLS_PER_DOMAIN - || client->pools[d_poolid] != NULL ) - return -EPERM; - } - else - { - client = current->domain->tmem_client; - ASSERT(client != NULL); - for ( d_poolid = 0; d_poolid < MAX_POOLS_PER_DOMAIN; d_poolid++ ) - if ( client->pools[d_poolid] == NULL ) - break; - if ( d_poolid >= MAX_POOLS_PER_DOMAIN ) - { - tmem_client_err("failed... no more pool slots available for this %s\n", - tmem_client_str); - return -EPERM; - } - } - - if ( (pool = pool_alloc()) == NULL ) - { - tmem_client_err("failed... 
out of memory\n"); - return -ENOMEM; - } - client->pools[d_poolid] = pool; - pool->client = client; - pool->pool_id = d_poolid; - pool->shared = shared; - pool->persistent = persistent; - pool->uuid[0] = uuid_lo; - pool->uuid[1] = uuid_hi; - - /* - * Already created a pool when arrived here, but need some special process - * for shared pool. - */ - if ( shared ) - { - if ( uuid_lo == -1L && uuid_hi == -1L ) - { - tmem_client_info("Invalid uuid, create non shared pool instead!\n"); - pool->shared = 0; - goto out; - } - if ( !tmem_global.shared_auth ) - { - for ( i = 0; i < MAX_GLOBAL_SHARED_POOLS; i++) - if ( (client->shared_auth_uuid[i][0] == uuid_lo) && - (client->shared_auth_uuid[i][1] == uuid_hi) ) - break; - if ( i == MAX_GLOBAL_SHARED_POOLS ) - { - tmem_client_info("Shared auth failed, create non shared pool instead!\n"); - pool->shared = 0; - goto out; - } - } - - /* - * Authorize okay, match a global shared pool or use the newly allocated - * one. - */ - first_unused_s_poolid = MAX_GLOBAL_SHARED_POOLS; - for ( i = 0; i < MAX_GLOBAL_SHARED_POOLS; i++ ) - { - if ( (shpool = tmem_global.shared_pools[i]) != NULL ) - { - if ( shpool->uuid[0] == uuid_lo && shpool->uuid[1] == uuid_hi ) - { - /* Succ to match a global shared pool. */ - tmem_client_info("(matches shared pool uuid=%"PRIx64".%"PRIx64") pool_id=%d\n", - uuid_hi, uuid_lo, d_poolid); - client->pools[d_poolid] = shpool; - if ( !shared_pool_join(shpool, client) ) - { - pool_free(pool); - goto out; - } - else - goto fail; - } - } - else - { - if ( first_unused_s_poolid == MAX_GLOBAL_SHARED_POOLS ) - first_unused_s_poolid = i; - } - } - - /* Failed to find a global shared pool slot. */ - if ( first_unused_s_poolid == MAX_GLOBAL_SHARED_POOLS ) - { - tmem_client_warn("tmem: failed... no global shared pool slots available\n"); - goto fail; - } - /* Add pool to global shared pool. 
*/ - else - { - INIT_LIST_HEAD(&pool->share_list); - pool->shared_count = 0; - if ( shared_pool_join(pool, client) ) - goto fail; - tmem_global.shared_pools[first_unused_s_poolid] = pool; - } - } - -out: - tmem_client_info("pool_id=%d\n", d_poolid); - client->info.nr_pools++; - return d_poolid; - -fail: - pool_free(pool); - return -EPERM; -} - -/************ TMEM CONTROL OPERATIONS ************************************/ - -int tmemc_shared_pool_auth(domid_t cli_id, uint64_t uuid_lo, - uint64_t uuid_hi, bool auth) -{ - struct client *client; - int i, free = -1; - - if ( cli_id == TMEM_CLI_ID_NULL ) - { - tmem_global.shared_auth = auth; - return 1; - } - client = tmem_client_from_cli_id(cli_id); - if ( client == NULL ) - return -EINVAL; - - for ( i = 0; i < MAX_GLOBAL_SHARED_POOLS; i++) - { - if ( auth == 0 ) - { - if ( (client->shared_auth_uuid[i][0] == uuid_lo) && - (client->shared_auth_uuid[i][1] == uuid_hi) ) - { - client->shared_auth_uuid[i][0] = -1L; - client->shared_auth_uuid[i][1] = -1L; - return 1; - } - } - else - { - if ( (client->shared_auth_uuid[i][0] == -1L) && - (client->shared_auth_uuid[i][1] == -1L) ) - { - free = i; - break; - } - } - } - if ( auth == 0 ) - return 0; - else if ( free == -1) - return -ENOMEM; - else - { - client->shared_auth_uuid[free][0] = uuid_lo; - client->shared_auth_uuid[free][1] = uuid_hi; - return 1; - } -} - -static int tmemc_save_subop(int cli_id, uint32_t pool_id, - uint32_t subop, tmem_cli_va_param_t buf, uint32_t arg) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - uint32_t p; - struct tmem_page_descriptor *pgp, *pgp2; - int rc = -ENOENT; - - switch(subop) - { - case XEN_SYSCTL_TMEM_OP_SAVE_BEGIN: - if ( client == NULL ) - break; - for (p = 0; p < MAX_POOLS_PER_DOMAIN; p++) - if ( client->pools[p] != NULL ) - break; - - if ( p == MAX_POOLS_PER_DOMAIN ) - break; - - client->was_frozen = client->info.flags.u.frozen; - client->info.flags.u.frozen = 1; - if ( arg != 0 ) - client->info.flags.u.migrating = 1; - 
rc = 0; - break; - case XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN: - if ( client == NULL ) - rc = client_create(cli_id) ? 0 : -ENOMEM; - else - rc = -EEXIST; - break; - case XEN_SYSCTL_TMEM_OP_SAVE_END: - if ( client == NULL ) - break; - client->info.flags.u.migrating = 0; - if ( !list_empty(&client->persistent_invalidated_list) ) - list_for_each_entry_safe(pgp,pgp2, - &client->persistent_invalidated_list, client_inv_pages) - __pgp_free(pgp, client->pools[pgp->pool_id]); - client->info.flags.u.frozen = client->was_frozen; - rc = 0; - break; - } - return rc; -} - -static int tmemc_save_get_next_page(int cli_id, uint32_t pool_id, - tmem_cli_va_param_t buf, uint32_t bufsize) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - struct tmem_pool *pool = (client == NULL || pool_id >= MAX_POOLS_PER_DOMAIN) - ? NULL : client->pools[pool_id]; - struct tmem_page_descriptor *pgp; - struct xen_tmem_oid *oid; - int ret = 0; - struct tmem_handle h; - - if ( pool == NULL || !is_persistent(pool) ) - return -1; - - if ( bufsize < PAGE_SIZE + sizeof(struct tmem_handle) ) - return -ENOMEM; - - spin_lock(&pers_lists_spinlock); - if ( list_empty(&pool->persistent_page_list) ) - { - ret = -1; - goto out; - } - /* Note: pool->cur_pgp is the pgp last returned by get_next_page. */ - if ( pool->cur_pgp == NULL ) - { - /* Process the first one. */ - pool->cur_pgp = pgp = list_entry((&pool->persistent_page_list)->next, - struct tmem_page_descriptor,us.pool_pers_pages); - } else if ( list_is_last(&pool->cur_pgp->us.pool_pers_pages, - &pool->persistent_page_list) ) - { - /* Already processed the last one in the list. 
*/ - ret = -1; - goto out; - } - pgp = list_entry((&pool->cur_pgp->us.pool_pers_pages)->next, - struct tmem_page_descriptor,us.pool_pers_pages); - pool->cur_pgp = pgp; - oid = &pgp->us.obj->oid; - h.pool_id = pool_id; - BUILD_BUG_ON(sizeof(h.oid) != sizeof(*oid)); - memcpy(&(h.oid), oid, sizeof(h.oid)); - h.index = pgp->index; - if ( copy_to_guest(guest_handle_cast(buf, void), &h, 1) ) - { - ret = -EFAULT; - goto out; - } - guest_handle_add_offset(buf, sizeof(h)); - ret = do_tmem_get(pool, oid, pgp->index, 0, buf); - -out: - spin_unlock(&pers_lists_spinlock); - return ret; -} - -static int tmemc_save_get_next_inv(int cli_id, tmem_cli_va_param_t buf, - uint32_t bufsize) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - struct tmem_page_descriptor *pgp; - struct tmem_handle h; - int ret = 0; - - if ( client == NULL ) - return 0; - if ( bufsize < sizeof(struct tmem_handle) ) - return 0; - spin_lock(&pers_lists_spinlock); - if ( list_empty(&client->persistent_invalidated_list) ) - goto out; - if ( client->cur_pgp == NULL ) - { - pgp = list_entry((&client->persistent_invalidated_list)->next, - struct tmem_page_descriptor,client_inv_pages); - client->cur_pgp = pgp; - } else if ( list_is_last(&client->cur_pgp->client_inv_pages, - &client->persistent_invalidated_list) ) - { - client->cur_pgp = NULL; - ret = 0; - goto out; - } else { - pgp = list_entry((&client->cur_pgp->client_inv_pages)->next, - struct tmem_page_descriptor,client_inv_pages); - client->cur_pgp = pgp; - } - h.pool_id = pgp->pool_id; - BUILD_BUG_ON(sizeof(h.oid) != sizeof(pgp->inv_oid)); - memcpy(&(h.oid), &(pgp->inv_oid), sizeof(h.oid)); - h.index = pgp->index; - ret = 1; - if ( copy_to_guest(guest_handle_cast(buf, void), &h, 1) ) - ret = -EFAULT; -out: - spin_unlock(&pers_lists_spinlock); - return ret; -} - -static int tmemc_restore_put_page(int cli_id, uint32_t pool_id, - struct xen_tmem_oid *oidp, - uint32_t index, tmem_cli_va_param_t buf, - uint32_t bufsize) -{ - struct client *client = 
tmem_client_from_cli_id(cli_id); - struct tmem_pool *pool = (client == NULL || pool_id >= MAX_POOLS_PER_DOMAIN) - ? NULL : client->pools[pool_id]; - - if ( pool == NULL ) - return -1; - if (bufsize != PAGE_SIZE) { - tmem_client_err("tmem: %s: invalid parameter bufsize(%d) != (%ld)\n", - __func__, bufsize, PAGE_SIZE); - return -EINVAL; - } - return do_tmem_put(pool, oidp, index, 0, buf); -} - -static int tmemc_restore_flush_page(int cli_id, uint32_t pool_id, - struct xen_tmem_oid *oidp, - uint32_t index) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - struct tmem_pool *pool = (client == NULL || pool_id >= MAX_POOLS_PER_DOMAIN) - ? NULL : client->pools[pool_id]; - - if ( pool == NULL ) - return -1; - return do_tmem_flush_page(pool,oidp,index); -} - -int do_tmem_control(struct xen_sysctl_tmem_op *op) -{ - int ret; - uint32_t pool_id = op->pool_id; - uint32_t cmd = op->cmd; - struct xen_tmem_oid *oidp = &op->oid; - - ASSERT(rw_is_write_locked(&tmem_rwlock)); - - switch (cmd) - { - case XEN_SYSCTL_TMEM_OP_SAVE_BEGIN: - case XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN: - case XEN_SYSCTL_TMEM_OP_SAVE_END: - ret = tmemc_save_subop(op->cli_id, pool_id, cmd, - guest_handle_cast(op->u.buf, char), op->arg); - break; - case XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE: - ret = tmemc_save_get_next_page(op->cli_id, pool_id, - guest_handle_cast(op->u.buf, char), op->len); - break; - case XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV: - ret = tmemc_save_get_next_inv(op->cli_id, - guest_handle_cast(op->u.buf, char), op->len); - break; - case XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE: - ret = tmemc_restore_put_page(op->cli_id, pool_id, oidp, op->arg, - guest_handle_cast(op->u.buf, char), op->len); - break; - case XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE: - ret = tmemc_restore_flush_page(op->cli_id, pool_id, oidp, op->arg); - break; - default: - ret = -1; - } - - return ret; -} - -/************ EXPORTed FUNCTIONS **************************************/ - -long do_tmem_op(tmem_cli_op_t uops) -{ - struct 
tmem_op op; - struct client *client = current->domain->tmem_client; - struct tmem_pool *pool = NULL; - struct xen_tmem_oid *oidp; - int rc = 0; - - if ( !tmem_initialized ) - return -ENODEV; - - if ( xsm_tmem_op(XSM_HOOK) ) - return -EPERM; - - tmem_stats.total_tmem_ops++; - - if ( client != NULL && client->domain->is_dying ) - { - tmem_stats.errored_tmem_ops++; - return -ENODEV; - } - - if ( unlikely(tmem_get_tmemop_from_client(&op, uops) != 0) ) - { - tmem_client_err("tmem: can't get tmem struct from %s\n", tmem_client_str); - tmem_stats.errored_tmem_ops++; - return -EFAULT; - } - - /* Acquire write lock for all commands at first. */ - write_lock(&tmem_rwlock); - - switch ( op.cmd ) - { - case TMEM_CONTROL: - case TMEM_RESTORE_NEW: - case TMEM_AUTH: - rc = -EOPNOTSUPP; - break; - - default: - /* - * For other commands, create per-client tmem structure dynamically on - * first use by client. - */ - if ( client == NULL ) - { - if ( (client = client_create(current->domain->domain_id)) == NULL ) - { - tmem_client_err("tmem: can't create tmem structure for %s\n", - tmem_client_str); - rc = -ENOMEM; - goto out; - } - } - - if ( op.cmd == TMEM_NEW_POOL || op.cmd == TMEM_DESTROY_POOL ) - { - if ( op.cmd == TMEM_NEW_POOL ) - rc = do_tmem_new_pool(TMEM_CLI_ID_NULL, 0, op.u.creat.flags, - op.u.creat.uuid[0], op.u.creat.uuid[1]); - else - rc = do_tmem_destroy_pool(op.pool_id); - } - else - { - if ( ((uint32_t)op.pool_id >= MAX_POOLS_PER_DOMAIN) || - ((pool = client->pools[op.pool_id]) == NULL) ) - { - tmem_client_err("tmem: operation requested on uncreated pool\n"); - rc = -ENODEV; - goto out; - } - /* Commands that only need read lock. */ - write_unlock(&tmem_rwlock); - read_lock(&tmem_rwlock); - - oidp = &op.u.gen.oid; - switch ( op.cmd ) - { - case TMEM_NEW_POOL: - case TMEM_DESTROY_POOL: - BUG(); /* Done earlier. 
*/ - break; - case TMEM_PUT_PAGE: - if (tmem_ensure_avail_pages()) - rc = do_tmem_put(pool, oidp, op.u.gen.index, op.u.gen.cmfn, - tmem_cli_buf_null); - else - rc = -ENOMEM; - break; - case TMEM_GET_PAGE: - rc = do_tmem_get(pool, oidp, op.u.gen.index, op.u.gen.cmfn, - tmem_cli_buf_null); - break; - case TMEM_FLUSH_PAGE: - rc = do_tmem_flush_page(pool, oidp, op.u.gen.index); - break; - case TMEM_FLUSH_OBJECT: - rc = do_tmem_flush_object(pool, oidp); - break; - default: - tmem_client_warn("tmem: op %d not implemented\n", op.cmd); - rc = -ENOSYS; - break; - } - read_unlock(&tmem_rwlock); - if ( rc < 0 ) - tmem_stats.errored_tmem_ops++; - return rc; - } - break; - - } -out: - write_unlock(&tmem_rwlock); - if ( rc < 0 ) - tmem_stats.errored_tmem_ops++; - return rc; -} - -/* This should be called when the host is destroying a client (domain). */ -void tmem_destroy(void *v) -{ - struct client *client = (struct client *)v; - - if ( client == NULL ) - return; - - if ( !client->domain->is_dying ) - { - printk("tmem: tmem_destroy can only destroy dying client\n"); - return; - } - - write_lock(&tmem_rwlock); - - printk("tmem: flushing tmem pools for %s=%d\n", - tmem_cli_id_str, client->cli_id); - client_flush(client); - - write_unlock(&tmem_rwlock); -} - -#define MAX_EVICTS 10 /* Should be variable or set via XEN_SYSCTL_TMEM_OP_ ?? 
*/ -void *tmem_relinquish_pages(unsigned int order, unsigned int memflags) -{ - struct page_info *pfp; - unsigned long evicts_per_relinq = 0; - int max_evictions = 10; - - if (!tmem_enabled() || !tmem_freeable_pages()) - return NULL; - - tmem_stats.relinq_attempts++; - if ( order > 0 ) - { -#ifndef NDEBUG - printk("tmem_relinquish_page: failing order=%d\n", order); -#endif - return NULL; - } - - while ( (pfp = tmem_page_list_get()) == NULL ) - { - if ( (max_evictions-- <= 0) || !tmem_evict()) - break; - evicts_per_relinq++; - } - if ( evicts_per_relinq > tmem_stats.max_evicts_per_relinq ) - tmem_stats.max_evicts_per_relinq = evicts_per_relinq; - if ( pfp != NULL ) - { - if ( !(memflags & MEMF_tmem) ) - scrub_one_page(pfp); - tmem_stats.relinq_pgs++; - } - - return pfp; -} - -unsigned long tmem_freeable_pages(void) -{ - if ( !tmem_enabled() ) - return 0; - - return tmem_page_list_pages + _atomic_read(freeable_page_count); -} - -/* Called at hypervisor startup. */ -static int __init init_tmem(void) -{ - if ( !tmem_enabled() ) - return 0; - - if ( !tmem_mempool_init() ) - return 0; - - if ( tmem_init() ) - { - printk("tmem: initialized comp=%d\n", tmem_compression_enabled()); - tmem_initialized = 1; - } - else - printk("tmem: initialization FAILED\n"); - - return 0; -} -__initcall(init_tmem); - -/* - * Local variables: - * mode: C - * c-file-style: "BSD" - * c-basic-offset: 4 - * tab-width: 4 - * indent-tabs-mode: nil - * End: - */ diff --git a/xen/common/tmem_control.c b/xen/common/tmem_control.c deleted file mode 100644 index 30bf6fb362..0000000000 --- a/xen/common/tmem_control.c +++ /dev/null @@ -1,560 +0,0 @@ -/* - * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved. - * - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/************ TMEM CONTROL OPERATIONS ************************************/ - -/* Freeze/thaw all pools belonging to client cli_id (all domains if -1). 
*/ -static int tmemc_freeze_pools(domid_t cli_id, int arg) -{ - struct client *client; - bool freeze = arg == XEN_SYSCTL_TMEM_OP_FREEZE; - bool destroy = arg == XEN_SYSCTL_TMEM_OP_DESTROY; - char *s; - - s = destroy ? "destroyed" : ( freeze ? "frozen" : "thawed" ); - if ( cli_id == TMEM_CLI_ID_NULL ) - { - list_for_each_entry(client,&tmem_global.client_list,client_list) - client->info.flags.u.frozen = freeze; - tmem_client_info("tmem: all pools %s for all %ss\n", s, tmem_client_str); - } - else - { - if ( (client = tmem_client_from_cli_id(cli_id)) == NULL) - return -1; - client->info.flags.u.frozen = freeze; - tmem_client_info("tmem: all pools %s for %s=%d\n", - s, tmem_cli_id_str, cli_id); - } - return 0; -} - -static unsigned long tmem_flush_npages(unsigned long n) -{ - unsigned long avail_pages = 0; - - while ( (avail_pages = tmem_page_list_pages) < n ) - { - if ( !tmem_evict() ) - break; - } - if ( avail_pages ) - { - spin_lock(&tmem_page_list_lock); - while ( !page_list_empty(&tmem_page_list) ) - { - struct page_info *pg = page_list_remove_head(&tmem_page_list); - scrub_one_page(pg); - tmem_page_list_pages--; - free_domheap_page(pg); - } - ASSERT(tmem_page_list_pages == 0); - INIT_PAGE_LIST_HEAD(&tmem_page_list); - spin_unlock(&tmem_page_list_lock); - } - return avail_pages; -} - -static int tmemc_flush_mem(domid_t cli_id, uint32_t kb) -{ - uint32_t npages, flushed_pages, flushed_kb; - - if ( cli_id != TMEM_CLI_ID_NULL ) - { - tmem_client_warn("tmem: %s-specific flush not supported yet, use --all\n", - tmem_client_str); - return -1; - } - /* Convert kb to pages, rounding up if necessary. */ - npages = (kb + ((1 << (PAGE_SHIFT-10))-1)) >> (PAGE_SHIFT-10); - flushed_pages = tmem_flush_npages(npages); - flushed_kb = flushed_pages << (PAGE_SHIFT-10); - return flushed_kb; -} - -/* - * These tmemc_list* routines output lots of stats in a format that is - * intended to be program-parseable, not human-readable. 
Further, by - * tying each group of stats to a line format indicator (e.g. G= for - * global stats) and each individual stat to a two-letter specifier - * (e.g. Ec:nnnnn in the G= line says there are nnnnn pages in the - * global ephemeral pool), it should allow the stats reported to be - * forward and backwards compatible as tmem evolves. - */ -#define BSIZE 1024 - -static int tmemc_list_client(struct client *c, tmem_cli_va_param_t buf, - int off, uint32_t len, bool use_long) -{ - char info[BSIZE]; - int i, n = 0, sum = 0; - struct tmem_pool *p; - bool s; - - n = scnprintf(info,BSIZE,"C=CI:%d,ww:%d,co:%d,fr:%d," - "Tc:%"PRIu64",Ge:%ld,Pp:%ld,Gp:%ld%c", - c->cli_id, c->info.weight, c->info.flags.u.compress, c->info.flags.u.frozen, - c->total_cycles, c->succ_eph_gets, c->succ_pers_puts, c->succ_pers_gets, - use_long ? ',' : '\n'); - if (use_long) - n += scnprintf(info+n,BSIZE-n, - "Ec:%ld,Em:%ld,cp:%ld,cb:%"PRId64",cn:%ld,cm:%ld\n", - c->eph_count, c->eph_count_max, - c->compressed_pages, c->compressed_sum_size, - c->compress_poor, c->compress_nomem); - if ( !copy_to_guest_offset(buf, off + sum, info, n + 1) ) - sum += n; - for ( i = 0; i < MAX_POOLS_PER_DOMAIN; i++ ) - { - if ( (p = c->pools[i]) == NULL ) - continue; - s = is_shared(p); - n = scnprintf(info,BSIZE,"P=CI:%d,PI:%d," - "PT:%c%c,U0:%"PRIx64",U1:%"PRIx64"%c", - c->cli_id, p->pool_id, - is_persistent(p) ? 'P' : 'E', s ? 'S' : 'P', - (uint64_t)(s ? p->uuid[0] : 0), - (uint64_t)(s ? p->uuid[1] : 0LL), - use_long ? 
',' : '\n'); - if (use_long) - n += scnprintf(info+n,BSIZE-n, - "Pc:%d,Pm:%d,Oc:%ld,Om:%ld,Nc:%lu,Nm:%lu," - "ps:%lu,pt:%lu,pd:%lu,pr:%lu,px:%lu,gs:%lu,gt:%lu," - "fs:%lu,ft:%lu,os:%lu,ot:%lu\n", - _atomic_read(p->pgp_count), p->pgp_count_max, - p->obj_count, p->obj_count_max, - p->objnode_count, p->objnode_count_max, - p->good_puts, p->puts,p->dup_puts_flushed, p->dup_puts_replaced, - p->no_mem_puts, - p->found_gets, p->gets, - p->flushs_found, p->flushs, p->flush_objs_found, p->flush_objs); - if ( sum + n >= len ) - return sum; - if ( !copy_to_guest_offset(buf, off + sum, info, n + 1) ) - sum += n; - } - return sum; -} - -static int tmemc_list_shared(tmem_cli_va_param_t buf, int off, uint32_t len, - bool use_long) -{ - char info[BSIZE]; - int i, n = 0, sum = 0; - struct tmem_pool *p; - struct share_list *sl; - - for ( i = 0; i < MAX_GLOBAL_SHARED_POOLS; i++ ) - { - if ( (p = tmem_global.shared_pools[i]) == NULL ) - continue; - n = scnprintf(info+n,BSIZE-n,"S=SI:%d,PT:%c%c,U0:%"PRIx64",U1:%"PRIx64, - i, is_persistent(p) ? 'P' : 'E', - is_shared(p) ? 'S' : 'P', - p->uuid[0], p->uuid[1]); - list_for_each_entry(sl,&p->share_list, share_list) - n += scnprintf(info+n,BSIZE-n,",SC:%d",sl->client->cli_id); - n += scnprintf(info+n,BSIZE-n,"%c", use_long ? 
',' : '\n'); - if (use_long) - n += scnprintf(info+n,BSIZE-n, - "Pc:%d,Pm:%d,Oc:%ld,Om:%ld,Nc:%lu,Nm:%lu," - "ps:%lu,pt:%lu,pd:%lu,pr:%lu,px:%lu,gs:%lu,gt:%lu," - "fs:%lu,ft:%lu,os:%lu,ot:%lu\n", - _atomic_read(p->pgp_count), p->pgp_count_max, - p->obj_count, p->obj_count_max, - p->objnode_count, p->objnode_count_max, - p->good_puts, p->puts,p->dup_puts_flushed, p->dup_puts_replaced, - p->no_mem_puts, - p->found_gets, p->gets, - p->flushs_found, p->flushs, p->flush_objs_found, p->flush_objs); - if ( sum + n >= len ) - return sum; - if ( !copy_to_guest_offset(buf, off + sum, info, n + 1) ) - sum += n; - } - return sum; -} - -static int tmemc_list_global_perf(tmem_cli_va_param_t buf, int off, - uint32_t len, bool use_long) -{ - char info[BSIZE]; - int n = 0, sum = 0; - - n = scnprintf(info+n,BSIZE-n,"T="); - n--; /* Overwrite trailing comma. */ - n += scnprintf(info+n,BSIZE-n,"\n"); - if ( sum + n >= len ) - return sum; - if ( !copy_to_guest_offset(buf, off + sum, info, n + 1) ) - sum += n; - return sum; -} - -static int tmemc_list_global(tmem_cli_va_param_t buf, int off, uint32_t len, - bool use_long) -{ - char info[BSIZE]; - int n = 0, sum = off; - - n += scnprintf(info,BSIZE,"G=" - "Tt:%lu,Te:%lu,Cf:%lu,Af:%lu,Pf:%lu,Ta:%lu," - "Lm:%lu,Et:%lu,Ea:%lu,Rt:%lu,Ra:%lu,Rx:%lu,Fp:%lu%c", - tmem_stats.total_tmem_ops, tmem_stats.errored_tmem_ops, tmem_stats.failed_copies, - tmem_stats.alloc_failed, tmem_stats.alloc_page_failed, tmem_page_list_pages, - tmem_stats.low_on_memory, tmem_stats.evicted_pgs, - tmem_stats.evict_attempts, tmem_stats.relinq_pgs, tmem_stats.relinq_attempts, - tmem_stats.max_evicts_per_relinq, - tmem_stats.total_flush_pool, use_long ? 
',' : '\n'); - if (use_long) - n += scnprintf(info+n,BSIZE-n, - "Ec:%ld,Em:%ld,Oc:%d,Om:%d,Nc:%d,Nm:%d,Pc:%d,Pm:%d," - "Fc:%d,Fm:%d,Sc:%d,Sm:%d,Ep:%lu,Gd:%lu,Zt:%lu,Gz:%lu\n", - tmem_global.eph_count, tmem_stats.global_eph_count_max, - _atomic_read(tmem_stats.global_obj_count), tmem_stats.global_obj_count_max, - _atomic_read(tmem_stats.global_rtree_node_count), tmem_stats.global_rtree_node_count_max, - _atomic_read(tmem_stats.global_pgp_count), tmem_stats.global_pgp_count_max, - _atomic_read(tmem_stats.global_page_count), tmem_stats.global_page_count_max, - _atomic_read(tmem_stats.global_pcd_count), tmem_stats.global_pcd_count_max, - tmem_stats.tot_good_eph_puts,tmem_stats.deduped_puts,tmem_stats.pcd_tot_tze_size, - tmem_stats.pcd_tot_csize); - if ( sum + n >= len ) - return sum; - if ( !copy_to_guest_offset(buf, off + sum, info, n + 1) ) - sum += n; - return sum; -} - -static int tmemc_list(domid_t cli_id, tmem_cli_va_param_t buf, uint32_t len, - bool use_long) -{ - struct client *client; - int off = 0; - - if ( cli_id == TMEM_CLI_ID_NULL ) { - off = tmemc_list_global(buf,0,len,use_long); - off += tmemc_list_shared(buf,off,len-off,use_long); - list_for_each_entry(client,&tmem_global.client_list,client_list) - off += tmemc_list_client(client, buf, off, len-off, use_long); - off += tmemc_list_global_perf(buf,off,len-off,use_long); - } - else if ( (client = tmem_client_from_cli_id(cli_id)) == NULL) - return -1; - else - off = tmemc_list_client(client, buf, 0, len, use_long); - - return 0; -} - -static int __tmemc_set_client_info(struct client *client, - XEN_GUEST_HANDLE(xen_tmem_client_t) buf) -{ - domid_t cli_id; - uint32_t old_weight; - xen_tmem_client_t info = { }; - - ASSERT(client); - - if ( copy_from_guest(&info, buf, 1) ) - return -EFAULT; - - if ( info.version != TMEM_SPEC_VERSION ) - return -EOPNOTSUPP; - - if ( info.maxpools > MAX_POOLS_PER_DOMAIN ) - return -ERANGE; - - /* Ignore info.nr_pools. 
*/ - cli_id = client->cli_id; - - if ( info.weight != client->info.weight ) - { - old_weight = client->info.weight; - client->info.weight = info.weight; - tmem_client_info("tmem: weight set to %d for %s=%d\n", - info.weight, tmem_cli_id_str, cli_id); - atomic_sub(old_weight,&tmem_global.client_weight_total); - atomic_add(client->info.weight,&tmem_global.client_weight_total); - } - - - if ( info.flags.u.compress != client->info.flags.u.compress ) - { - client->info.flags.u.compress = info.flags.u.compress; - tmem_client_info("tmem: compression %s for %s=%d\n", - info.flags.u.compress ? "enabled" : "disabled", - tmem_cli_id_str,cli_id); - } - return 0; -} - -static int tmemc_set_client_info(domid_t cli_id, - XEN_GUEST_HANDLE(xen_tmem_client_t) info) -{ - struct client *client; - int ret = -ENOENT; - - if ( cli_id == TMEM_CLI_ID_NULL ) - { - list_for_each_entry(client,&tmem_global.client_list,client_list) - { - ret = __tmemc_set_client_info(client, info); - if (ret) - break; - } - } - else - { - client = tmem_client_from_cli_id(cli_id); - if ( client ) - ret = __tmemc_set_client_info(client, info); - } - return ret; -} - -static int tmemc_get_client_info(int cli_id, - XEN_GUEST_HANDLE(xen_tmem_client_t) info) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - - if ( client ) - { - if ( copy_to_guest(info, &client->info, 1) ) - return -EFAULT; - } - else - { - static const xen_tmem_client_t generic = { - .version = TMEM_SPEC_VERSION, - .maxpools = MAX_POOLS_PER_DOMAIN - }; - - if ( copy_to_guest(info, &generic, 1) ) - return -EFAULT; - } - - return 0; -} - -static int tmemc_get_pool(int cli_id, - XEN_GUEST_HANDLE(xen_tmem_pool_info_t) pools, - uint32_t len) -{ - struct client *client = tmem_client_from_cli_id(cli_id); - unsigned int i, idx; - int rc = 0; - unsigned int nr = len / sizeof(xen_tmem_pool_info_t); - - if ( len % sizeof(xen_tmem_pool_info_t) ) - return -EINVAL; - - if ( nr > MAX_POOLS_PER_DOMAIN ) - return -E2BIG; - - if ( 
!guest_handle_okay(pools, nr) ) - return -EINVAL; - - if ( !client ) - return -EINVAL; - - for ( idx = 0, i = 0; i < MAX_POOLS_PER_DOMAIN; i++ ) - { - struct tmem_pool *pool = client->pools[i]; - xen_tmem_pool_info_t out; - - if ( pool == NULL ) - continue; - - out.flags.raw = (pool->persistent ? TMEM_POOL_PERSIST : 0) | - (pool->shared ? TMEM_POOL_SHARED : 0) | - (POOL_PAGESHIFT << TMEM_POOL_PAGESIZE_SHIFT) | - (TMEM_SPEC_VERSION << TMEM_POOL_VERSION_SHIFT); - out.n_pages = _atomic_read(pool->pgp_count); - out.uuid[0] = pool->uuid[0]; - out.uuid[1] = pool->uuid[1]; - out.id = i; - - /* N.B. 'idx' != 'i'. */ - if ( __copy_to_guest_offset(pools, idx, &out, 1) ) - { - rc = -EFAULT; - break; - } - idx++; - /* Don't try to put more than what was requested. */ - if ( idx >= nr ) - break; - } - - /* And how many we have processed. */ - return rc ? : idx; -} - -static int tmemc_set_pools(int cli_id, - XEN_GUEST_HANDLE(xen_tmem_pool_info_t) pools, - uint32_t len) -{ - unsigned int i; - int rc = 0; - unsigned int nr = len / sizeof(xen_tmem_pool_info_t); - struct client *client = tmem_client_from_cli_id(cli_id); - - if ( len % sizeof(xen_tmem_pool_info_t) ) - return -EINVAL; - - if ( nr > MAX_POOLS_PER_DOMAIN ) - return -E2BIG; - - if ( !guest_handle_okay(pools, nr) ) - return -EINVAL; - - if ( !client ) - { - client = client_create(cli_id); - if ( !client ) - return -ENOMEM; - } - for ( i = 0; i < nr; i++ ) - { - xen_tmem_pool_info_t pool; - - if ( __copy_from_guest_offset(&pool, pools, i, 1 ) ) - return -EFAULT; - - if ( pool.n_pages ) - return -EINVAL; - - rc = do_tmem_new_pool(cli_id, pool.id, pool.flags.raw, - pool.uuid[0], pool.uuid[1]); - if ( rc < 0 ) - break; - - pool.id = rc; - if ( __copy_to_guest_offset(pools, i, &pool, 1) ) - return -EFAULT; - } - - /* And how many we have processed. */ - return rc ? 
: i; -} - -static int tmemc_auth_pools(int cli_id, - XEN_GUEST_HANDLE(xen_tmem_pool_info_t) pools, - uint32_t len) -{ - unsigned int i; - int rc = 0; - unsigned int nr = len / sizeof(xen_tmem_pool_info_t); - struct client *client = tmem_client_from_cli_id(cli_id); - - if ( len % sizeof(xen_tmem_pool_info_t) ) - return -EINVAL; - - if ( nr > MAX_POOLS_PER_DOMAIN ) - return -E2BIG; - - if ( !guest_handle_okay(pools, nr) ) - return -EINVAL; - - if ( !client ) - { - client = client_create(cli_id); - if ( !client ) - return -ENOMEM; - } - - for ( i = 0; i < nr; i++ ) - { - xen_tmem_pool_info_t pool; - - if ( __copy_from_guest_offset(&pool, pools, i, 1 ) ) - return -EFAULT; - - if ( pool.n_pages ) - return -EINVAL; - - rc = tmemc_shared_pool_auth(cli_id, pool.uuid[0], pool.uuid[1], - pool.flags.u.auth); - - if ( rc < 0 ) - break; - - } - - /* And how many we have processed. */ - return rc ? : i; -} - -int tmem_control(struct xen_sysctl_tmem_op *op) -{ - int ret; - uint32_t cmd = op->cmd; - - if ( op->pad != 0 ) - return -EINVAL; - - write_lock(&tmem_rwlock); - - switch (cmd) - { - case XEN_SYSCTL_TMEM_OP_THAW: - case XEN_SYSCTL_TMEM_OP_FREEZE: - case XEN_SYSCTL_TMEM_OP_DESTROY: - ret = tmemc_freeze_pools(op->cli_id, cmd); - break; - case XEN_SYSCTL_TMEM_OP_FLUSH: - ret = tmemc_flush_mem(op->cli_id, op->arg); - break; - case XEN_SYSCTL_TMEM_OP_LIST: - ret = tmemc_list(op->cli_id, - guest_handle_cast(op->u.buf, char), op->len, op->arg); - break; - case XEN_SYSCTL_TMEM_OP_SET_CLIENT_INFO: - ret = tmemc_set_client_info(op->cli_id, op->u.client); - break; - case XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB: - ret = tmem_freeable_pages() >> (20 - PAGE_SHIFT); - break; - case XEN_SYSCTL_TMEM_OP_GET_CLIENT_INFO: - ret = tmemc_get_client_info(op->cli_id, op->u.client); - break; - case XEN_SYSCTL_TMEM_OP_GET_POOLS: - ret = tmemc_get_pool(op->cli_id, op->u.pool, op->len); - break; - case XEN_SYSCTL_TMEM_OP_SET_POOLS: /* TMEM_RESTORE_NEW */ - ret = tmemc_set_pools(op->cli_id, op->u.pool, 
op->len); - break; - case XEN_SYSCTL_TMEM_OP_SET_AUTH: /* TMEM_AUTH */ - ret = tmemc_auth_pools(op->cli_id, op->u.pool, op->len); - break; - default: - ret = do_tmem_control(op); - break; - } - - write_unlock(&tmem_rwlock); - - return ret; -} - -/* - * Local variables: - * mode: C - * c-file-style: "BSD" - * c-basic-offset: 4 - * tab-width: 4 - * indent-tabs-mode: nil - * End: - */ diff --git a/xen/common/tmem_xen.c b/xen/common/tmem_xen.c deleted file mode 100644 index bf7b14f79a..0000000000 --- a/xen/common/tmem_xen.c +++ /dev/null @@ -1,277 +0,0 @@ -/****************************************************************************** - * tmem-xen.c - * - * Xen-specific Transcendent memory - * - * Copyright (c) 2009, Dan Magenheimer, Oracle Corp. - */ - -#include -#include -#include /* compression code */ -#include -#include -#include -#include - -bool __read_mostly opt_tmem; -boolean_param("tmem", opt_tmem); - -bool __read_mostly opt_tmem_compress; -boolean_param("tmem_compress", opt_tmem_compress); - -atomic_t freeable_page_count = ATOMIC_INIT(0); - -/* these are a concurrency bottleneck, could be percpu and dynamically - * allocated iff opt_tmem_compress */ -#define LZO_WORKMEM_BYTES LZO1X_1_MEM_COMPRESS -#define LZO_DSTMEM_PAGES 2 -static DEFINE_PER_CPU_READ_MOSTLY(unsigned char *, workmem); -static DEFINE_PER_CPU_READ_MOSTLY(unsigned char *, dstmem); -static DEFINE_PER_CPU_READ_MOSTLY(void *, scratch_page); - -#if defined(CONFIG_ARM) -static inline void *cli_get_page(xen_pfn_t cmfn, mfn_t *pcli_mfn, - struct page_info **pcli_pfp, bool cli_write) -{ - ASSERT_UNREACHABLE(); - return NULL; -} - -static inline void cli_put_page(void *cli_va, struct page_info *cli_pfp, - mfn_t cli_mfn, bool mark_dirty) -{ - ASSERT_UNREACHABLE(); -} -#else -#include - -static inline void *cli_get_page(xen_pfn_t cmfn, mfn_t *pcli_mfn, - struct page_info **pcli_pfp, bool cli_write) -{ - p2m_type_t t; - struct page_info *page; - - page = get_page_from_gfn(current->domain, cmfn, &t, 
P2M_ALLOC); - if ( !page || t != p2m_ram_rw ) - { - if ( page ) - put_page(page); - return NULL; - } - - if ( cli_write && !get_page_type(page, PGT_writable_page) ) - { - put_page(page); - return NULL; - } - - *pcli_mfn = page_to_mfn(page); - *pcli_pfp = page; - - return map_domain_page(*pcli_mfn); -} - -static inline void cli_put_page(void *cli_va, struct page_info *cli_pfp, - mfn_t cli_mfn, bool mark_dirty) -{ - if ( mark_dirty ) - { - put_page_and_type(cli_pfp); - paging_mark_dirty(current->domain, cli_mfn); - } - else - put_page(cli_pfp); - unmap_domain_page(cli_va); -} -#endif - -int tmem_copy_from_client(struct page_info *pfp, - xen_pfn_t cmfn, tmem_cli_va_param_t clibuf) -{ - mfn_t tmem_mfn, cli_mfn = INVALID_MFN; - char *tmem_va, *cli_va = NULL; - struct page_info *cli_pfp = NULL; - int rc = 1; - - ASSERT(pfp != NULL); - tmem_mfn = page_to_mfn(pfp); - tmem_va = map_domain_page(tmem_mfn); - if ( guest_handle_is_null(clibuf) ) - { - cli_va = cli_get_page(cmfn, &cli_mfn, &cli_pfp, 0); - if ( cli_va == NULL ) - { - unmap_domain_page(tmem_va); - return -EFAULT; - } - } - smp_mb(); - if ( cli_va ) - { - memcpy(tmem_va, cli_va, PAGE_SIZE); - cli_put_page(cli_va, cli_pfp, cli_mfn, 0); - } - else - rc = -EINVAL; - unmap_domain_page(tmem_va); - return rc; -} - -int tmem_compress_from_client(xen_pfn_t cmfn, - void **out_va, size_t *out_len, tmem_cli_va_param_t clibuf) -{ - int ret = 0; - unsigned char *dmem = this_cpu(dstmem); - unsigned char *wmem = this_cpu(workmem); - char *scratch = this_cpu(scratch_page); - struct page_info *cli_pfp = NULL; - mfn_t cli_mfn = INVALID_MFN; - void *cli_va = NULL; - - if ( dmem == NULL || wmem == NULL ) - return 0; /* no buffer, so can't compress */ - if ( guest_handle_is_null(clibuf) ) - { - cli_va = cli_get_page(cmfn, &cli_mfn, &cli_pfp, 0); - if ( cli_va == NULL ) - return -EFAULT; - } - else if ( !scratch ) - return 0; - else if ( copy_from_guest(scratch, clibuf, PAGE_SIZE) ) - return -EFAULT; - smp_mb(); - ret = 
lzo1x_1_compress(cli_va ?: scratch, PAGE_SIZE, dmem, out_len, wmem); - ASSERT(ret == LZO_E_OK); - *out_va = dmem; - if ( cli_va ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 0); - return 1; -} - -int tmem_copy_to_client(xen_pfn_t cmfn, struct page_info *pfp, - tmem_cli_va_param_t clibuf) -{ - mfn_t tmem_mfn, cli_mfn = INVALID_MFN; - char *tmem_va, *cli_va = NULL; - struct page_info *cli_pfp = NULL; - int rc = 1; - - ASSERT(pfp != NULL); - if ( guest_handle_is_null(clibuf) ) - { - cli_va = cli_get_page(cmfn, &cli_mfn, &cli_pfp, 1); - if ( cli_va == NULL ) - return -EFAULT; - } - tmem_mfn = page_to_mfn(pfp); - tmem_va = map_domain_page(tmem_mfn); - - if ( cli_va ) - { - memcpy(cli_va, tmem_va, PAGE_SIZE); - cli_put_page(cli_va, cli_pfp, cli_mfn, 1); - } - else - rc = -EINVAL; - unmap_domain_page(tmem_va); - smp_mb(); - return rc; -} - -int tmem_decompress_to_client(xen_pfn_t cmfn, void *tmem_va, - size_t size, tmem_cli_va_param_t clibuf) -{ - mfn_t cli_mfn = INVALID_MFN; - struct page_info *cli_pfp = NULL; - void *cli_va = NULL; - char *scratch = this_cpu(scratch_page); - size_t out_len = PAGE_SIZE; - int ret; - - if ( guest_handle_is_null(clibuf) ) - { - cli_va = cli_get_page(cmfn, &cli_mfn, &cli_pfp, 1); - if ( cli_va == NULL ) - return -EFAULT; - } - else if ( !scratch ) - return 0; - ret = lzo1x_decompress_safe(tmem_va, size, cli_va ?: scratch, &out_len); - ASSERT(ret == LZO_E_OK); - ASSERT(out_len == PAGE_SIZE); - if ( cli_va ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 1); - else if ( copy_to_guest(clibuf, scratch, PAGE_SIZE) ) - return -EFAULT; - smp_mb(); - return 1; -} - -/****************** XEN-SPECIFIC HOST INITIALIZATION ********************/ -static int dstmem_order, workmem_order; - -static int cpu_callback( - struct notifier_block *nfb, unsigned long action, void *hcpu) -{ - unsigned int cpu = (unsigned long)hcpu; - - switch ( action ) - { - case CPU_UP_PREPARE: { - if ( per_cpu(dstmem, cpu) == NULL ) - per_cpu(dstmem, cpu) = 
alloc_xenheap_pages(dstmem_order, 0); - if ( per_cpu(workmem, cpu) == NULL ) - per_cpu(workmem, cpu) = alloc_xenheap_pages(workmem_order, 0); - if ( per_cpu(scratch_page, cpu) == NULL ) - per_cpu(scratch_page, cpu) = alloc_xenheap_page(); - break; - } - case CPU_DEAD: - case CPU_UP_CANCELED: { - if ( per_cpu(dstmem, cpu) != NULL ) - { - free_xenheap_pages(per_cpu(dstmem, cpu), dstmem_order); - per_cpu(dstmem, cpu) = NULL; - } - if ( per_cpu(workmem, cpu) != NULL ) - { - free_xenheap_pages(per_cpu(workmem, cpu), workmem_order); - per_cpu(workmem, cpu) = NULL; - } - if ( per_cpu(scratch_page, cpu) != NULL ) - { - free_xenheap_page(per_cpu(scratch_page, cpu)); - per_cpu(scratch_page, cpu) = NULL; - } - break; - } - default: - break; - } - - return NOTIFY_DONE; -} - -static struct notifier_block cpu_nfb = { - .notifier_call = cpu_callback -}; - -int __init tmem_init(void) -{ - unsigned int cpu; - - dstmem_order = get_order_from_pages(LZO_DSTMEM_PAGES); - workmem_order = get_order_from_bytes(LZO1X_1_MEM_COMPRESS); - - for_each_online_cpu ( cpu ) - { - void *hcpu = (void *)(long)cpu; - cpu_callback(&cpu_nfb, CPU_UP_PREPARE, hcpu); - } - - register_cpu_notifier(&cpu_nfb); - - return 1; -} diff --git a/xen/include/Makefile b/xen/include/Makefile index 3d14532dbd..c3e0283d34 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -17,7 +17,6 @@ headers-y := \ compat/physdev.h \ compat/platform.h \ compat/sched.h \ - compat/tmem.h \ compat/trace.h \ compat/vcpu.h \ compat/version.h \ diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index c49b4dcc99..02cbd3e5a5 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -34,7 +34,6 @@ #include "xen.h" #include "domctl.h" #include "physdev.h" -#include "tmem.h" #define XEN_SYSCTL_INTERFACE_VERSION 0x00000012 @@ -732,110 +731,6 @@ struct xen_sysctl_psr_alloc { } u; }; -#define XEN_SYSCTL_TMEM_OP_ALL_CLIENTS 0xFFFFU - -#define XEN_SYSCTL_TMEM_OP_THAW 0 -#define 
XEN_SYSCTL_TMEM_OP_FREEZE 1 -#define XEN_SYSCTL_TMEM_OP_FLUSH 2 -#define XEN_SYSCTL_TMEM_OP_DESTROY 3 -#define XEN_SYSCTL_TMEM_OP_LIST 4 -#define XEN_SYSCTL_TMEM_OP_GET_CLIENT_INFO 5 -#define XEN_SYSCTL_TMEM_OP_SET_CLIENT_INFO 6 -#define XEN_SYSCTL_TMEM_OP_GET_POOLS 7 -#define XEN_SYSCTL_TMEM_OP_QUERY_FREEABLE_MB 8 -#define XEN_SYSCTL_TMEM_OP_SET_POOLS 9 -#define XEN_SYSCTL_TMEM_OP_SAVE_BEGIN 10 -#define XEN_SYSCTL_TMEM_OP_SET_AUTH 11 -#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_PAGE 19 -#define XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_INV 20 -#define XEN_SYSCTL_TMEM_OP_SAVE_END 21 -#define XEN_SYSCTL_TMEM_OP_RESTORE_BEGIN 30 -#define XEN_SYSCTL_TMEM_OP_RESTORE_PUT_PAGE 32 -#define XEN_SYSCTL_TMEM_OP_RESTORE_FLUSH_PAGE 33 - -/* - * XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_[PAGE|INV] override the 'buf' in - * xen_sysctl_tmem_op with this structure - sometimes with an extra - * page tackled on. - */ -struct tmem_handle { - uint32_t pool_id; - uint32_t index; - xen_tmem_oid_t oid; -}; - -/* - * XEN_SYSCTL_TMEM_OP_[GET,SAVE]_CLIENT uses the 'client' in - * xen_tmem_op with this structure, which is mostly used during migration. - */ -struct xen_tmem_client { - uint32_t version; /* If mismatched we will get XEN_EOPNOTSUPP. */ - uint32_t maxpools; /* If greater than what hypervisor supports, will get - XEN_ERANGE. */ - uint32_t nr_pools; /* Current amount of pools. Ignored on SET*/ - union { /* See TMEM_CLIENT_[COMPRESS,FROZEN] */ - uint32_t raw; - struct { - uint8_t frozen:1, - compress:1, - migrating:1; - } u; - } flags; - uint32_t weight; -}; -typedef struct xen_tmem_client xen_tmem_client_t; -DEFINE_XEN_GUEST_HANDLE(xen_tmem_client_t); - -/* - * XEN_SYSCTL_TMEM_OP_[GET|SET]_POOLS or XEN_SYSCTL_TMEM_OP_SET_AUTH - * uses the 'pool' array in * xen_sysctl_tmem_op with this structure. - * The XEN_SYSCTL_TMEM_OP_GET_POOLS hypercall will - * return the number of entries in 'pool' or a negative value - * if an error was encountered. 
- * The XEN_SYSCTL_TMEM_OP_SET_[AUTH|POOLS] will return the number of - * entries in 'pool' processed or a negative value if an error - * was encountered. - */ -struct xen_tmem_pool_info { - union { - uint32_t raw; - struct { - uint32_t persist:1, /* See TMEM_POOL_PERSIST. */ - shared:1, /* See TMEM_POOL_SHARED. */ - auth:1, /* See TMEM_POOL_AUTH. */ - rsv1:1, - pagebits:8, /* TMEM_POOL_PAGESIZE_[SHIFT,MASK]. */ - rsv2:12, - version:8; /* TMEM_POOL_VERSION_[SHIFT,MASK]. */ - } u; - } flags; - uint32_t id; /* Less than tmem_client.maxpools. */ - uint64_t n_pages; /* Zero on XEN_SYSCTL_TMEM_OP_SET_[AUTH|POOLS]. */ - uint64_aligned_t uuid[2]; -}; -typedef struct xen_tmem_pool_info xen_tmem_pool_info_t; -DEFINE_XEN_GUEST_HANDLE(xen_tmem_pool_info_t); - -struct xen_sysctl_tmem_op { - uint32_t cmd; /* IN: XEN_SYSCTL_TMEM_OP_* . */ - int32_t pool_id; /* IN: 0 by default unless _SAVE_*, RESTORE_* .*/ - uint32_t cli_id; /* IN: client id, 0 for XEN_SYSCTL_TMEM_QUERY_FREEABLE_MB - for all others can be the domain id or - XEN_SYSCTL_TMEM_OP_ALL_CLIENTS for all. */ - uint32_t len; /* IN: length of 'buf'. If not applicable to use 0. */ - uint32_t arg; /* IN: If not applicable to command use 0. */ - uint32_t pad; /* Padding so structure is the same under 32 and 64. */ - xen_tmem_oid_t oid; /* IN: If not applicable to command use 0s. */ - union { - XEN_GUEST_HANDLE_64(char) buf; /* IN/OUT: Buffer to save/restore */ - XEN_GUEST_HANDLE_64(xen_tmem_client_t) client; /* IN/OUT for */ - /* XEN_SYSCTL_TMEM_OP_[GET,SAVE]_CLIENT. */ - XEN_GUEST_HANDLE_64(xen_tmem_pool_info_t) pool; /* OUT for */ - /* XEN_SYSCTL_TMEM_OP_GET_POOLS. Must have 'len' */ - /* of them. 
*/ - } u; -}; - /* * XEN_SYSCTL_get_cpu_levelling_caps (x86 specific) * @@ -1124,7 +1019,7 @@ struct xen_sysctl { #define XEN_SYSCTL_psr_cmt_op 21 #define XEN_SYSCTL_pcitopoinfo 22 #define XEN_SYSCTL_psr_alloc 23 -#define XEN_SYSCTL_tmem_op 24 +/* #define XEN_SYSCTL_tmem_op 24 */ #define XEN_SYSCTL_get_cpu_levelling_caps 25 #define XEN_SYSCTL_get_cpu_featureset 26 #define XEN_SYSCTL_livepatch_op 27 @@ -1154,7 +1049,6 @@ struct xen_sysctl { struct xen_sysctl_coverage_op coverage_op; struct xen_sysctl_psr_cmt_op psr_cmt_op; struct xen_sysctl_psr_alloc psr_alloc; - struct xen_sysctl_tmem_op tmem_op; struct xen_sysctl_cpu_levelling_caps cpu_levelling_caps; struct xen_sysctl_cpu_featureset cpu_featureset; struct xen_sysctl_livepatch_op livepatch; diff --git a/xen/include/public/tmem.h b/xen/include/public/tmem.h index aa0aafaa9d..c02be9f704 100644 --- a/xen/include/public/tmem.h +++ b/xen/include/public/tmem.h @@ -1,8 +1,8 @@ /****************************************************************************** * tmem.h - * + * * Guest OS interface to Xen Transcendent Memory. 
- * + * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the @@ -29,15 +29,11 @@ #include "xen.h" +#if __XEN_INTERFACE_VERSION__ < 0x00041200 + /* version of ABI */ #define TMEM_SPEC_VERSION 1 -/* Commands to HYPERVISOR_tmem_op() */ -#ifdef __XEN__ -#define TMEM_CONTROL 0 /* Now called XEN_SYSCTL_tmem_op */ -#else -#undef TMEM_CONTROL -#endif #define TMEM_NEW_POOL 1 #define TMEM_DESTROY_POOL 2 #define TMEM_PUT_PAGE 4 @@ -111,6 +107,8 @@ typedef struct tmem_op tmem_op_t; DEFINE_XEN_GUEST_HANDLE(tmem_op_t); #endif +#endif /* __XEN_INTERFACE_VERSION__ < 0x00041200 */ + #endif /* __XEN_PUBLIC_TMEM_H__ */ /* diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h index 6154c48cb8..fc00a67448 100644 --- a/xen/include/xen/hypercall.h +++ b/xen/include/xen/hypercall.h @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -130,12 +129,6 @@ extern long do_xsm_op( XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_xsm_op); -#ifdef CONFIG_TMEM -extern long -do_tmem_op( - XEN_GUEST_HANDLE_PARAM(tmem_op_t) uops); -#endif - #ifdef CONFIG_ARGO extern long do_argo_op( unsigned int cmd, diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h index e971147234..7fc3039593 100644 --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -209,7 +209,6 @@ unsigned long avail_node_heap_pages(unsigned int); unsigned int online_page(unsigned long mfn, uint32_t *status); int offline_page(unsigned long mfn, int broken, uint32_t *status); int query_page_offline(unsigned long mfn, uint32_t *status); -unsigned long total_free_pages(void); void heap_init_late(void); @@ -249,8 +248,6 @@ struct npfec { #define MEMF_no_refcount (1U<<_MEMF_no_refcount) #define _MEMF_populate_on_demand 1 #define MEMF_populate_on_demand (1U<<_MEMF_populate_on_demand) -#define _MEMF_tmem 2 -#define MEMF_tmem 
(1U<<_MEMF_tmem) #define _MEMF_no_dma 3 #define MEMF_no_dma (1U<<_MEMF_no_dma) #define _MEMF_exact_node 4 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 748bb0f2f9..2201faca6b 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -454,9 +454,6 @@ struct domain */ spinlock_t hypercall_deadlock_mutex; - /* transcendent memory, auto-allocated on first tmem op by each domain */ - struct client *tmem_client; - struct lock_profile_qhead profile_head; /* Various vm_events */ diff --git a/xen/include/xen/tmem.h b/xen/include/xen/tmem.h deleted file mode 100644 index 414a14d808..0000000000 --- a/xen/include/xen/tmem.h +++ /dev/null @@ -1,45 +0,0 @@ -/****************************************************************************** - * tmem.h - * - * Transcendent memory - * - * Copyright (c) 2008, Dan Magenheimer, Oracle Corp. - */ - -#ifndef __XEN_TMEM_H__ -#define __XEN_TMEM_H__ - -struct xen_sysctl_tmem_op; - -#ifdef CONFIG_TMEM -extern int tmem_control(struct xen_sysctl_tmem_op *op); -extern void tmem_destroy(void *); -extern void *tmem_relinquish_pages(unsigned int, unsigned int); -extern unsigned long tmem_freeable_pages(void); -#else -static inline int -tmem_control(struct xen_sysctl_tmem_op *op) -{ - return -ENOSYS; -} - -static inline void -tmem_destroy(void *p) -{ - return; -} - -static inline void * -tmem_relinquish_pages(unsigned int x, unsigned int y) -{ - return NULL; -} - -static inline unsigned long -tmem_freeable_pages(void) -{ - return 0; -} -#endif /* CONFIG_TMEM */ - -#endif /* __XEN_TMEM_H__ */ diff --git a/xen/include/xen/tmem_control.h b/xen/include/xen/tmem_control.h deleted file mode 100644 index ad04cf707b..0000000000 --- a/xen/include/xen/tmem_control.h +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved. 
- * - */ - -#ifndef __XEN_TMEM_CONTROL_H__ -#define __XEN_TMEM_CONTROL_H__ - -#ifdef CONFIG_TMEM -#include -/* Variables and functions that tmem_control.c needs from tmem.c */ - -extern struct tmem_statistics tmem_stats; -extern struct tmem_global tmem_global; - -extern rwlock_t tmem_rwlock; - -int tmem_evict(void); -int do_tmem_control(struct xen_sysctl_tmem_op *op); - -struct client *client_create(domid_t cli_id); -int do_tmem_new_pool(domid_t this_cli_id, uint32_t d_poolid, uint32_t flags, - uint64_t uuid_lo, uint64_t uuid_hi); - -int tmemc_shared_pool_auth(domid_t cli_id, uint64_t uuid_lo, - uint64_t uuid_hi, bool auth); -#endif /* CONFIG_TMEM */ - -#endif /* __XEN_TMEM_CONTROL_H__ */ - -/* - * Local variables: - * mode: C - * c-file-style: "BSD" - * c-basic-offset: 4 - * tab-width: 4 - * indent-tabs-mode: nil - * End: - */ diff --git a/xen/include/xen/tmem_xen.h b/xen/include/xen/tmem_xen.h deleted file mode 100644 index 8516a0b131..0000000000 --- a/xen/include/xen/tmem_xen.h +++ /dev/null @@ -1,343 +0,0 @@ -/****************************************************************************** - * tmem_xen.h - * - * Xen-specific Transcendent memory - * - * Copyright (c) 2009, Dan Magenheimer, Oracle Corp. 
- */ - -#ifndef __XEN_TMEM_XEN_H__ -#define __XEN_TMEM_XEN_H__ - -#include /* heap alloc/free */ -#include -#include /* xmalloc/xfree */ -#include /* struct domain */ -#include /* copy_from_guest */ -#include /* hash_long */ -#include /* __map_domain_page */ -#include /* struct rb_root */ -#include /* xsm_tmem_control */ -#include -#ifdef CONFIG_COMPAT -#include -#endif -typedef uint32_t pagesize_t; /* like size_t, must handle largest PAGE_SIZE */ - -#define IS_PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE) -#define IS_VALID_PAGE(_pi) mfn_valid(page_to_mfn(_pi)) - -extern struct page_list_head tmem_page_list; -extern spinlock_t tmem_page_list_lock; -extern unsigned long tmem_page_list_pages; -extern atomic_t freeable_page_count; - -extern int tmem_init(void); -#define tmem_hash hash_long - -extern bool opt_tmem_compress; -static inline bool tmem_compression_enabled(void) -{ - return opt_tmem_compress; -} - -#ifdef CONFIG_TMEM -extern bool opt_tmem; -static inline bool tmem_enabled(void) -{ - return opt_tmem; -} - -static inline void tmem_disable(void) -{ - opt_tmem = false; -} -#else -static inline bool tmem_enabled(void) -{ - return false; -} - -static inline void tmem_disable(void) -{ -} -#endif /* CONFIG_TMEM */ - -/* - * Memory free page list management - */ - -static inline struct page_info *tmem_page_list_get(void) -{ - struct page_info *pi; - - spin_lock(&tmem_page_list_lock); - if ( (pi = page_list_remove_head(&tmem_page_list)) != NULL ) - tmem_page_list_pages--; - spin_unlock(&tmem_page_list_lock); - ASSERT((pi == NULL) || IS_VALID_PAGE(pi)); - return pi; -} - -static inline void tmem_page_list_put(struct page_info *pi) -{ - ASSERT(IS_VALID_PAGE(pi)); - spin_lock(&tmem_page_list_lock); - page_list_add(pi, &tmem_page_list); - tmem_page_list_pages++; - spin_unlock(&tmem_page_list_lock); -} - -/* - * Memory allocation for persistent data - */ -static inline struct page_info *__tmem_alloc_page_thispool(struct domain *d) -{ - struct page_info 
*pi; - - /* note that this tot_pages check is not protected by d->page_alloc_lock, - * so may race and periodically fail in donate_page or alloc_domheap_pages - * That's OK... neither is a problem, though chatty if log_lvl is set */ - if ( d->tot_pages >= d->max_pages ) - return NULL; - - if ( tmem_page_list_pages ) - { - if ( (pi = tmem_page_list_get()) != NULL ) - { - if ( donate_page(d,pi,0) == 0 ) - goto out; - else - tmem_page_list_put(pi); - } - } - - pi = alloc_domheap_pages(d,0,MEMF_tmem); - -out: - ASSERT((pi == NULL) || IS_VALID_PAGE(pi)); - return pi; -} - -static inline void __tmem_free_page_thispool(struct page_info *pi) -{ - struct domain *d = page_get_owner(pi); - - ASSERT(IS_VALID_PAGE(pi)); - if ( (d == NULL) || steal_page(d,pi,0) == 0 ) - tmem_page_list_put(pi); - else - { - scrub_one_page(pi); - ASSERT((pi->count_info & ~(PGC_allocated | 1)) == 0); - free_domheap_pages(pi,0); - } -} - -/* - * Memory allocation for ephemeral (non-persistent) data - */ -static inline struct page_info *__tmem_alloc_page(void) -{ - struct page_info *pi = tmem_page_list_get(); - - if ( pi == NULL) - pi = alloc_domheap_pages(0,0,MEMF_tmem); - - if ( pi ) - atomic_inc(&freeable_page_count); - ASSERT((pi == NULL) || IS_VALID_PAGE(pi)); - return pi; -} - -static inline void __tmem_free_page(struct page_info *pi) -{ - ASSERT(IS_VALID_PAGE(pi)); - tmem_page_list_put(pi); - atomic_dec(&freeable_page_count); -} - -/* "Client" (==domain) abstraction */ -static inline struct client *tmem_client_from_cli_id(domid_t cli_id) -{ - struct client *c; - struct domain *d = rcu_lock_domain_by_id(cli_id); - if (d == NULL) - return NULL; - c = d->tmem_client; - rcu_unlock_domain(d); - return c; -} - -/* these typedefs are in the public/tmem.h interface -typedef XEN_GUEST_HANDLE(void) cli_mfn_t; -typedef XEN_GUEST_HANDLE(char) cli_va_t; -*/ -typedef XEN_GUEST_HANDLE_PARAM(tmem_op_t) tmem_cli_op_t; -typedef XEN_GUEST_HANDLE_PARAM(char) tmem_cli_va_param_t; - -static inline int 
tmem_get_tmemop_from_client(tmem_op_t *op, tmem_cli_op_t uops) -{ -#ifdef CONFIG_COMPAT - if ( is_hvm_vcpu(current) ? hvm_guest_x86_mode(current) != 8 - : is_pv_32bit_vcpu(current) ) - { - int rc; - enum XLAT_tmem_op_u u; - tmem_op_compat_t cop; - - rc = copy_from_guest(&cop, guest_handle_cast(uops, void), 1); - if ( rc ) - return rc; - switch ( cop.cmd ) - { - case TMEM_NEW_POOL: u = XLAT_tmem_op_u_creat; break; - default: u = XLAT_tmem_op_u_gen ; break; - } - XLAT_tmem_op(op, &cop); - return 0; - } -#endif - return copy_from_guest(op, uops, 1); -} - -#define tmem_cli_buf_null guest_handle_from_ptr(NULL, char) -#define TMEM_CLI_ID_NULL ((domid_t)((domid_t)-1L)) -#define tmem_cli_id_str "domid" -#define tmem_client_str "domain" - -int tmem_decompress_to_client(xen_pfn_t, void *, size_t, - tmem_cli_va_param_t); -int tmem_compress_from_client(xen_pfn_t, void **, size_t *, - tmem_cli_va_param_t); - -int tmem_copy_from_client(struct page_info *, xen_pfn_t, tmem_cli_va_param_t); -int tmem_copy_to_client(xen_pfn_t, struct page_info *, tmem_cli_va_param_t); - -#define tmem_client_err(fmt, args...) printk(XENLOG_G_ERR fmt, ##args) -#define tmem_client_warn(fmt, args...) printk(XENLOG_G_WARNING fmt, ##args) -#define tmem_client_info(fmt, args...) printk(XENLOG_G_INFO fmt, ##args) - -/* Global statistics (none need to be locked). 
*/ -struct tmem_statistics { - unsigned long total_tmem_ops; - unsigned long errored_tmem_ops; - unsigned long total_flush_pool; - unsigned long alloc_failed; - unsigned long alloc_page_failed; - unsigned long evicted_pgs; - unsigned long evict_attempts; - unsigned long relinq_pgs; - unsigned long relinq_attempts; - unsigned long max_evicts_per_relinq; - unsigned long low_on_memory; - unsigned long deduped_puts; - unsigned long tot_good_eph_puts; - int global_obj_count_max; - int global_pgp_count_max; - int global_pcd_count_max; - int global_page_count_max; - int global_rtree_node_count_max; - long global_eph_count_max; - unsigned long failed_copies; - unsigned long pcd_tot_tze_size; - unsigned long pcd_tot_csize; - /* Global counters (should use long_atomic_t access). */ - atomic_t global_obj_count; - atomic_t global_pgp_count; - atomic_t global_pcd_count; - atomic_t global_page_count; - atomic_t global_rtree_node_count; -}; - -#define atomic_inc_and_max(_c) do { \ - atomic_inc(&tmem_stats._c); \ - if ( _atomic_read(tmem_stats._c) > tmem_stats._c##_max ) \ - tmem_stats._c##_max = _atomic_read(tmem_stats._c); \ -} while (0) - -#define atomic_dec_and_assert(_c) do { \ - atomic_dec(&tmem_stats._c); \ - ASSERT(_atomic_read(tmem_stats._c) >= 0); \ -} while (0) - -#define MAX_GLOBAL_SHARED_POOLS 16 -struct tmem_global { - struct list_head ephemeral_page_list; /* All pages in ephemeral pools. */ - struct list_head client_list; - struct tmem_pool *shared_pools[MAX_GLOBAL_SHARED_POOLS]; - bool shared_auth; - long eph_count; /* Atomicity depends on eph_lists_spinlock. 
*/ - atomic_t client_weight_total; -}; - -#define MAX_POOLS_PER_DOMAIN 16 - -struct tmem_pool; -struct tmem_page_descriptor; -struct tmem_page_content_descriptor; -struct client { - struct list_head client_list; - struct tmem_pool *pools[MAX_POOLS_PER_DOMAIN]; - struct domain *domain; - struct xmem_pool *persistent_pool; - struct list_head ephemeral_page_list; - long eph_count, eph_count_max; - domid_t cli_id; - xen_tmem_client_t info; - /* For save/restore/migration. */ - bool was_frozen; - struct list_head persistent_invalidated_list; - struct tmem_page_descriptor *cur_pgp; - /* Statistics collection. */ - unsigned long compress_poor, compress_nomem; - unsigned long compressed_pages; - uint64_t compressed_sum_size; - uint64_t total_cycles; - unsigned long succ_pers_puts, succ_eph_gets, succ_pers_gets; - /* Shared pool authentication. */ - uint64_t shared_auth_uuid[MAX_GLOBAL_SHARED_POOLS][2]; -}; - -#define POOL_PAGESHIFT (PAGE_SHIFT - 12) -#define OBJ_HASH_BUCKETS 256 /* Must be power of two. */ -#define OBJ_HASH_BUCKETS_MASK (OBJ_HASH_BUCKETS-1) - -#define is_persistent(_p) (_p->persistent) -#define is_shared(_p) (_p->shared) - -struct tmem_pool { - bool shared; - bool persistent; - bool is_dying; - struct client *client; - uint64_t uuid[2]; /* 0 for private, non-zero for shared. */ - uint32_t pool_id; - rwlock_t pool_rwlock; - struct rb_root obj_rb_root[OBJ_HASH_BUCKETS]; /* Protected by pool_rwlock. */ - struct list_head share_list; /* Valid if shared. */ - int shared_count; /* Valid if shared. */ - /* For save/restore/migration. */ - struct list_head persistent_page_list; - struct tmem_page_descriptor *cur_pgp; - /* Statistics collection. */ - atomic_t pgp_count; - int pgp_count_max; - long obj_count; /* Atomicity depends on pool_rwlock held for write. 
*/ - long obj_count_max; - unsigned long objnode_count, objnode_count_max; - uint64_t sum_life_cycles; - uint64_t sum_evicted_cycles; - unsigned long puts, good_puts, no_mem_puts; - unsigned long dup_puts_flushed, dup_puts_replaced; - unsigned long gets, found_gets; - unsigned long flushs, flushs_found; - unsigned long flush_objs, flush_objs_found; -}; - -struct share_list { - struct list_head share_list; - struct client *client; -}; - -#endif /* __XEN_TMEM_XEN_H__ */ diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst index e330f72580..95f5e5592b 100644 --- a/xen/include/xlat.lst +++ b/xen/include/xlat.lst @@ -135,8 +135,6 @@ ? sched_pin_override sched.h ? sched_remote_shutdown sched.h ? sched_shutdown sched.h -? tmem_oid tmem.h -! tmem_op tmem.h ? t_buf trace.h ? vcpu_get_physid vcpu.h ? vcpu_register_vcpu_info vcpu.h diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index e628b1c6af..01d2814fed 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -433,12 +433,6 @@ static XSM_INLINE int xsm_page_offline(XSM_DEFAULT_ARG uint32_t cmd) return xsm_default_action(action, current->domain, NULL); } -static XSM_INLINE int xsm_tmem_op(XSM_DEFAULT_VOID) -{ - XSM_ASSERT_ACTION(XSM_HOOK); - return xsm_default_action(action, current->domain, NULL); -} - static XSM_INLINE long xsm_do_xsm_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) op) { return -ENOSYS; diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index 8a78d8abd3..9a90355056 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -127,7 +127,6 @@ struct xsm_operations { int (*resource_setup_misc) (void); int (*page_offline)(uint32_t cmd); - int (*tmem_op)(void); long (*do_xsm_op) (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op); #ifdef CONFIG_COMPAT @@ -537,11 +536,6 @@ static inline int xsm_page_offline(xsm_default_t def, uint32_t cmd) return xsm_ops->page_offline(cmd); } -static inline int xsm_tmem_op(xsm_default_t def) -{ - return xsm_ops->tmem_op(); -} - static inline long 
xsm_do_xsm_op (XEN_GUEST_HANDLE_PARAM(xsm_op_t) op) { return xsm_ops->do_xsm_op(op); diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c index 1fe0e746fa..c9a566f2b5 100644 --- a/xen/xsm/dummy.c +++ b/xen/xsm/dummy.c @@ -103,7 +103,6 @@ void __init xsm_fixup_ops (struct xsm_operations *ops) set_to_dummy_if_null(ops, resource_setup_misc); set_to_dummy_if_null(ops, page_offline); - set_to_dummy_if_null(ops, tmem_op); set_to_dummy_if_null(ops, hvm_param); set_to_dummy_if_null(ops, hvm_control); set_to_dummy_if_null(ops, hvm_param_nested); diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index 3d00c747f6..a7d690ac3c 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -810,9 +810,6 @@ static int flask_sysctl(int cmd) return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2, XEN2__PSR_ALLOC, NULL); - case XEN_SYSCTL_tmem_op: - return domain_has_xen(current->domain, XEN__TMEM_CONTROL); - case XEN_SYSCTL_get_cpu_levelling_caps: return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2, XEN2__GET_CPU_LEVELLING_CAPS, NULL); @@ -1178,11 +1175,6 @@ static inline int flask_page_offline(uint32_t cmd) } } -static inline int flask_tmem_op(void) -{ - return domain_has_xen(current->domain, XEN__TMEM_OP); -} - static int flask_add_to_physmap(struct domain *d1, struct domain *d2) { return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__PHYSMAP); @@ -1818,7 +1810,6 @@ static struct xsm_operations flask_ops = { .resource_setup_misc = flask_resource_setup_misc, .page_offline = flask_page_offline, - .tmem_op = flask_tmem_op, .hvm_param = flask_hvm_param, .hvm_control = flask_hvm_param, .hvm_param_nested = flask_hvm_param_nested, diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors index e00448b776..194d743a71 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -67,10 +67,6 @@ class xen lockprof # XEN_SYSCTL_cpupool_op cpupool_op -# tmem hypercall (any access) - tmem_op -# XEN_SYSCTL_tmem_op command 
of tmem (part of sysctl) - tmem_control # XEN_SYSCTL_scheduler_op with XEN_DOMCTL_SCHEDOP_getinfo, XEN_SYSCTL_sched_id, XEN_DOMCTL_SCHEDOP_getvcpuinfo getscheduler # XEN_SYSCTL_scheduler_op with XEN_DOMCTL_SCHEDOP_putinfo, XEN_DOMCTL_SCHEDOP_putvcpuinfo From patchwork Tue Apr 9 16:52:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wei Liu X-Patchwork-Id: 10891765 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C711139A for ; Tue, 9 Apr 2019 16:54:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E1D45252D5 for ; Tue, 9 Apr 2019 16:54:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D5AA42621E; Tue, 9 Apr 2019 16:54:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 392EB252D5 for ; Tue, 9 Apr 2019 16:54:39 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hDtyv-0001TM-TJ; Tue, 09 Apr 2019 16:52:21 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hDtyu-0001SP-TV for xen-devel@lists.xenproject.org; Tue, 09 Apr 2019 16:52:20 +0000 X-Inumbo-ID: d40bee9a-5ae7-11e9-8b81-b314d09c256e Received: from SMTP03.CITRIX.COM (unknown 
[162.221.156.55]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id d40bee9a-5ae7-11e9-8b81-b314d09c256e; Tue, 09 Apr 2019 16:52:16 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.60,330,1549929600"; d="scan'208";a="83138381" From: Wei Liu To: Date: Tue, 9 Apr 2019 17:52:10 +0100 Message-ID: <20190409165210.7624-4-wei.liu2@citrix.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190409165210.7624-1-wei.liu2@citrix.com> References: <20190409165210.7624-1-wei.liu2@citrix.com> MIME-Version: 1.0 Subject: [Xen-devel] [PATCH v3 3/3] docs: remove tmem related text X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Daniel De Graaf Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Wei Liu Acked-by: Daniel De Graaf Acked-by: Konrad Rzeszutek Wilk --- v3: Rebase and fix up conflicts --- docs/man/xl.1.pod.in | 68 --- docs/man/xl.conf.5.pod | 9 +- docs/misc/tmem-internals.html | 789 ------------------------------ docs/misc/xen-command-line.pandoc | 12 - docs/misc/xsm-flask.txt | 36 -- 5 files changed, 2 insertions(+), 912 deletions(-) delete mode 100644 docs/misc/tmem-internals.html diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 4310fcd818..a8dae005b1 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -1677,74 +1677,6 @@ Obtain information of USB devices connected as such via the device model =back -=head1 TRANSCENDENT MEMORY (TMEM) - -=over 4 - -=item B I<[OPTIONS]> I - -List tmem pools. - -B - -=over 4 - -=item B<-l> - -If this parameter is specified, also list tmem stats. - -=back - -=item B I - -Freeze tmem pools. - -=item B I - -Thaw tmem pools. - -=item B I [I] - -Change tmem settings. 
- -B - -=over 4 - -=item B<-w> I - -Weight (int) - -=item B<-p> I - -Compress (int) - -=back - -=item B I [I] - -De/authenticate shared tmem pool. - -B - -=over 4 - -=item B<-u> I - -Specify uuid (abcdef01-2345-6789-1234-567890abcdef) - -=item B<-a> I - -0=auth,1=deauth - -=back - -=item B - -Get information about how much freeable memory (MB) is in-use by tmem. - -=back - =head1 FLASK B is a security framework that defines a mandatory access control policy diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod index 37262a7ef8..b1bde7d657 100644 --- a/docs/man/xl.conf.5.pod +++ b/docs/man/xl.conf.5.pod @@ -148,10 +148,8 @@ The default choice is "xvda". =item B If this option is enabled then when a guest is created there will be an -guarantee that there is memory available for the guest. This is an -particularly acute problem on hosts with memory over-provisioned guests -that use tmem and have self-balloon enabled (which is the default -option). The self-balloon mechanism can deflate/inflate the balloon +guarantee that there is memory available for the guest. +The self-balloon mechanism can deflate/inflate the balloon quickly and the amount of free memory (which C can show) is stale the moment it is printed. When claim is enabled a reservation for the amount of memory (see 'memory' in xl.conf(5)) is set, which is then @@ -163,9 +161,6 @@ If the reservation cannot be meet the guest creation fails immediately instead of taking seconds/minutes (depending on the size of the guest) while the guest is populated. -Note that to enable tmem type guests, one needs to provide C on the -Xen hypervisor argument and as well on the Linux kernel command line. - Default: C<1> =over 4 diff --git a/docs/misc/tmem-internals.html b/docs/misc/tmem-internals.html deleted file mode 100644 index 9b7e70e650..0000000000 --- a/docs/misc/tmem-internals.html +++ /dev/null @@ -1,789 +0,0 @@ -

Transcendent Memory Internals in Xen

-

-by Dan Magenheimer, Oracle Corp.

-

-Draft 0.1 -- Updated: 20100324 -

Overview

-

-This document focuses on the internal implementation of -Transcendent Memory (tmem) on Xen. It assumes -that the reader has a basic knowledge of the terminology, objectives, and -functionality of tmem and also has access to the Xen source code. -It corresponds to the Xen 4.0 release, with -patch added to support page deduplication (V2). -

-The primary responsibilities of the tmem implementation are to: -

    -
  • manage a potentially huge and extremely dynamic -number of memory pages from a potentially large number of clients (domains) -with low memory overhead and proper isolation -
  • provide quick and efficient access to these -pages with as much concurrency as possible -
  • enable efficient reclamation and eviction of pages (e.g. when -memory is fully utilized) -
  • optionally, increase page density through compression and/or -deduplication -
  • where necessary, properly assign and account for -memory belonging to guests to avoid malicious and/or accidental unfairness -and/or denial-of-service -
  • record utilization statistics and make them available to management tools -
-

Source Code Organization

- -

-The source code in Xen that provides the tmem functionality -is divided up into four files: tmem.c, tmem.h, tmem_xen.c, and tmem_xen.h. -The files tmem.c and tmem.h are intended to -be implementation- (and hypervisor-) independent and the other two files -provide the Xen-specific code. This -division is intended to make it easier to port tmem functionality to other -hypervisors, though at this time porting to other hypervisors has not been -attempted. Together, these four files -total less than 4000 lines of C code. -

-Even ignoring the implementation-specific functionality, the -implementation-independent part of tmem has several dependencies on -library functionality (Xen source filenames in parentheses): -

    -
  • -a good fast general-purpose dynamic memory -allocator with bounded response time and efficient use of memory for a very -large number of sub-page allocations. To -achieve this in Xen, the bad old memory allocator was replaced with a -slightly-modified version of TLSF (xmalloc_tlsf.c), first ported to Linux by -Nitin Gupta for compcache. -
  • -good tree data structure libraries, specifically -red-black trees (rbtree.c) and radix trees (radix-tree.c). -Code for these was borrowed from Linux and adapted for tmem and Xen. -
  • -good locking and list code. Both of these existed in Xen and required -little or no change. -
  • -optionally, a good fast lossless compression -library. The Xen implementation added to -support tmem uses LZO1X (lzo.c), also ported to Linux by Nitin Gupta. -
-

-More information about the specific functionality of these -libraries can easily be found through a search engine, via wikipedia, or in the -Xen or Linux source logs so we will not elaborate further here. - -

Prefixes/Abbreviations/Glossary

- -

-The tmem code uses several prefixes and abbreviations. -Knowledge of these will improve code readability: -

    -
  • -tmh == -transcendent memory host. Functions or -data structures that are defined by the implementation-specific code, i.e. the -Xen host code -
  • -tmemc -== transcendent memory control. -Functions or data structures that provide management tool functionality, -rather than core tmem operations. -
  • -cli or -client == client. -The tmem generic term for a domain or a guest OS. -
-

-When used in prose, common tmem operations are indicated -with a different font, such as put -and get. - -

Key Data Structures

- -

-To manage a huge number of pages, efficient data structures -must be carefully selected. -

-Recall that a tmem-enabled guest OS may create one or more -pools with different attributes. It then -puts and gets -pages to/from this pool, identifying the page -with a handle that consists of a pool_id, an -object_id, and a page_id (sometimes -called an index). -This suggests a few obvious core data -structures: -

    -
  • -When a guest OS first calls tmem, a client_t is created to contain -and track all uses of tmem by that guest OS. Among -other things, a client_t keeps pointers -to a fixed number of pools (16 in the current Xen implementation). -
  • -When a guest OS requests a new pool, a pool_t is created. -Some pools are shared and are kept in a -sharelist (sharelist_t) which points -to all the clients that are sharing the pool. -Since an object_id is 64-bits, -a pool_t must be able to keep track -of a potentially very large number of objects. -To do so, it maintains a number of parallel trees (256 in the current -Xen implementation) and a hash algorithm is applied to the object_id -to select the correct tree. -Each tree element points to an object. -Because an object_id usually represents an inode -(a unique file number identifier), and inode numbers -are fairly random, though often "clumpy", a red-black tree -is used. -
  • -When a guest first -puts a page to a pool with an as-yet-unused object_id, an -obj_t is created. Since a page_id is usually an index into a file, -it is often a small number, but may sometimes be very large (up to -32-bits). A radix tree is a good data structure to contain items -with this kind of index distribution. -
  • -When a page is -put, a page descriptor, or pgp_t, is created, which -among other things will point to the storage location where the data is kept. -In the normal case the pointer is to a pfp_t, which is an -implementation-specific datatype representing a physical pageframe in memory -(which in Xen is a "struct page_info"). -When deduplication is enabled, it points to -yet another data structure, a pcd_t -(see below). When compression is enabled -(and deduplication is not), the pointer points directly to the compressed data. -For reasons we will see shortly, each pgp_t that represents -an ephemeral page (that is, a page placed -in an ephemeral pool) is also placed -into two doubly-linked linked lists, one containing all ephemeral pages -put by the same client and one -containing all ephemeral pages across all clients ("global"). -
  • -When deduplication is enabled, multiple pgp_t's may need to point to -the same data, so another data structure (and level of indirection) is used -called a page content descriptor, or pcd_t. -Multiple page descriptors (pgp_t's) may point to the same pcd_t. -The pcd_t, in turn, points to either a pfp_t -(if a full page of data), directly to a -location in memory (if the page has been compressed or trailing zeroes have -been eliminated), or even a NULL pointer (if the page contained all zeroes and -trailing zero elimination is enabled). -
-
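As a minimal sketch of the lookup path described above, the following shows a handle and the selection of one of the 256 per-pool trees from an object_id. The struct layout and the mixing function are assumptions made for illustration; only the bucket count and mask come from the tmem headers, and the actual Xen hash may differ.

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_HASH_BUCKETS      256   /* Must be a power of two. */
#define OBJ_HASH_BUCKETS_MASK (OBJ_HASH_BUCKETS - 1)

/* Hypothetical handle layout: the three fields the text names. */
struct tmem_handle {
    uint32_t pool_id;
    uint64_t object_id;   /* often an inode number */
    uint32_t page_id;     /* often an index into a file */
};

/*
 * Illustrative mixer (an assumption, not the Xen hash): multiply, then
 * fold the high half into the low half so that "clumpy" object ids
 * still spread across the buckets.
 */
static unsigned int obj_bucket(uint64_t object_id)
{
    uint64_t h = object_id * 0x9E3779B97F4A7C15ULL;
    h ^= h >> 32;
    return (unsigned int)(h & OBJ_HASH_BUCKETS_MASK);
}
```

The bucket index picks the red-black tree to search for the obj_t; the page_id is then looked up in that object's radix tree.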

-The most apparent usage of this multi-layer web of data structures -is "top-down" because, in normal operation, the vast majority of tmem -operations invoked by a client are -puts and gets, which require the various -data structures to be walked starting with the client_t, then -a pool_t, then an obj_t, then a pgp_t. -However, there is another highly frequent tmem operation that is not -visible from a client: memory reclamation. -Since tmem attempts to use all spare memory in the system, it must -frequently free up, or evict, -pages. The eviction algorithm will be -explained in more detail later but, in brief, to free memory, ephemeral pages -are removed from the tail of one of the doubly-linked lists, which means that -all of the data structures associated with that page-to-be-removed must be -updated or eliminated and freed. As a -result, each data structure also contains a back-pointer -to its parent, for example every obj_t -contains a pointer to its containing pool_t. -

-This complex web of interconnected data structures is updated constantly and -is thus extremely sensitive to careless code changes which, for example, may -result in unexpected hypervisor crashes or non-obvious memory leaks. -On the other hand, the code is fairly well -modularized so, once understood, it is possible to relatively easily switch out -one kind of data structure for another. -To catch problems as quickly as possible when debug is enabled, most of -the data structures are equipped with sentinels and many inter-function -assumptions are documented and tested dynamically -with assertions. -While these clutter and lengthen the tmem -code substantially, their presence has proven invaluable on many occasions. -

-For completeness, we should also describe a key data structure in the Xen -implementation-dependent code: the tmh_page_list. For security and -performance reasons, pages that are freed due to tmem operations (such -as get) are not immediately put back into Xen's pool -of free memory (aka the Xen heap). -Tmem pages may contain guest-private data that must be scrubbed before -those memory pages are released for the use of other guests. -But if a page is immediately re-used inside of tmem itself, the entire -page is overwritten with new data, so need not be scrubbed. -Since tmem is usually the most frequent -customer of the Xen heap allocation code, it would be a waste of time to scrub -a page, release it to the Xen heap, and then immediately re-allocate it -again. So, instead, tmem maintains -currently-unused pages of memory on its own free list, tmh_page_list, -and returns the pages to Xen only when non-tmem Xen -heap allocation requests would otherwise fail. - -
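The private free list can be sketched as follows. This is a simplified model under stated assumptions: the struct and function names are hypothetical, a static variable stands in for tmh_page_list, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TMEM_PAGE_SIZE 4096

/* Hypothetical page frame; the link is meaningful only while the page is free. */
struct page {
    struct page *next_free;
    unsigned char data[TMEM_PAGE_SIZE];
};

static struct page *free_list;   /* stands in for tmh_page_list */
static size_t free_count;

/* A page freed by a tmem operation is parked, unscrubbed, on the private list. */
static void tmem_page_free(struct page *pg)
{
    pg->next_free = free_list;
    free_list = pg;
    free_count++;
}

/* Reuse inside tmem: the caller overwrites the whole page, so no scrub is needed. */
static struct page *tmem_page_alloc(void)
{
    struct page *pg = free_list;
    if (pg != NULL) {
        free_list = pg->next_free;
        free_count--;
    }
    return pg;
}

/* Release toward the general heap: scrub first so no guest data leaks. */
static struct page *tmem_page_release_to_heap(void)
{
    struct page *pg = tmem_page_alloc();
    if (pg != NULL)
        memset(pg->data, 0, TMEM_PAGE_SIZE);
    return pg;
}
```

The scrub cost is paid only on the path that hands memory to other users, which is the point of keeping the list.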

Scalability/Concurrency

- -

Tmem has been designed to be highly scalable. -Since tmem access is invoked similarly in -many ways to asynchronous disk access, a "big SMP" tmem-aware guest -OS can, and often will, invoke tmem hypercalls simultaneously on many different -physical CPUs. And, of course, multiple -tmem-aware guests may independently and simultaneously invoke tmem -hypercalls. While the normal frequency -of tmem invocations is rarely extremely high, some tmem operations such as data -compression or lookups in a very large tree may take tens of thousands of -cycles or more to complete. Measurements -have shown that normal workloads spend no more than about 0.2% (2% with -compression enabled) of CPU time executing tmem operations. -But those familiar with OS scalability issues -recognize that even this limited execution time can create concurrency problems -in large systems and result in poorly-scalable performance. -

-A good locking strategy is critical to concurrency, but also -must be designed carefully to avoid deadlock and livelock problems. For -debugging purposes, tmem supports a "big kernel lock" which disables -concurrency altogether (enabled in Xen with "tmem_lock", but note -that this functionality is rarely tested and likely has bit-rotted). Infrequent -but invasive tmem hypercalls, such as pool creation or the control operations, -are serialized on a single read-write lock, called tmem_rwlock, -which must be held for writing. All other tmem operations must hold this lock -for reading, so frequent operations such as -put, get, and flush can execute simultaneously -as long as no invasive operations are occurring. -

-Once a pool has been selected, there is a per-pool -read-write lock (pool_rwlock) which -must be held for writing if any transformative operations might occur within -that pool, such as when an obj_t is -created or destroyed. For the highly -frequent operation of finding an obj_t -within a pool, pool_rwlock must be held for reading. -

-Once an object has been selected, there is a per-object -spinlock (obj_spinlock). -This is a spinlock rather than a read-write -lock because nearly all of the most frequent tmem operations (e.g. -put, get, and flush) -are transformative, in -that they add or remove a page within the object. -This lock is generally taken whenever an -object lookup occurs and released when the tmem operation is complete. -

-Next, the per-client and global ephemeral lists are -protected by a single global spinlock (eph_lists_spinlock) -and the per-client persistent lists are also protected by a single global -spinlock (pers_list_spinlock). -And to complete the description of -implementation-independent locks, if page deduplication is enabled, all pages -for which the first byte match are contained in one of 256 trees that are -protected by one of 256 corresponding read-write locks -(pcd_tree_rwlocks). -

-In the Xen-specific code (tmem_xen.c), page frames (e.g. struct page_info) -that have been released are kept in a list (tmh_page_list) that -is protected by a spinlock (tmh_page_list_lock). -There is also an "implied" lock -associated with compression, which is likely the most time-consuming operation -in all of tmem (of course, only when compression is enabled): A compression -buffer is allocated one-per-physical-cpu early in Xen boot and a pointer to -this buffer is returned to implementation-independent code and used without a -lock. -

-The proper method to avoid deadlocks is to take and release -locks in a very specific predetermined order. -Unfortunately, since tmem data structures must simultaneously be -accessed "top-down" ( -put and get) -and "bottom-up" -(memory reclamation), more complex methods must be employed: -A trylock mechanism is used (cf. tmem_try_to_evict_pgp()), -which takes the lock if it is available but returns immediately (rather than -spinning and waiting) if the lock is not available. -When walking the ephemeral list to identify -pages to free, any page that belongs to an object that is locked is simply -skipped. Further, if the page is the -last page belonging to an object, and the pool read-write lock for the pool the -object belongs to is not available (for writing), that object is skipped. -These constraints modify the LRU algorithm -somewhat, but avoid the potential for deadlock. -
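The skip-if-locked walk can be sketched with C11 atomics. This is an illustration only, not the code in tmem_try_to_evict_pgp(): the types and function names are hypothetical, an atomic_flag stands in for obj_spinlock, and the additional pool read-write lock check for the last page of an object is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
    atomic_flag lock;            /* stands in for obj_spinlock */
};

struct eph_page {
    struct eph_page *newer;      /* link toward the most recently used end */
    struct obj *obj;
    bool evicted;
};

/* Take the lock if it is free; never spin. Returns true when acquired. */
static bool obj_trylock(struct obj *o)
{
    return !atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire);
}

static void obj_unlock(struct obj *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/*
 * Walk from the LRU end and evict the first page whose object can be
 * locked without waiting; pages of busy objects are skipped.
 */
static struct eph_page *evict_one(struct eph_page *lru)
{
    for (struct eph_page *pg = lru; pg != NULL; pg = pg->newer) {
        if (pg->evicted || !obj_trylock(pg->obj))
            continue;
        pg->evicted = true;      /* real code would unlink and free here */
        obj_unlock(pg->obj);
        return pg;
    }
    return NULL;                 /* nothing evictable right now */
}
```

Because the walker never waits on a lock held by a top-down operation, the two lock orders cannot deadlock; the cost is that eviction is no longer strictly LRU.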

-Unfortunately, a livelock was still discovered in this approach: -When memory is scarce and each client is -putting a large number of pages -for exactly one object (and thus holding the object spinlock for that object), -memory reclamation takes a very long time to determine that it is unable to -free any pages, and so the time to do a -put (which eventually fails) becomes linear in the -number of pages in the object! To avoid -this situation, a workaround was added to always ensure a minimum amount of -memory (1MB) is available before any object lock is taken for the client -invoking tmem (see tmem_ensure_avail_pages()). -Other such livelocks (and perhaps deadlocks) -may be lurking. -

-A last issue related to concurrency is atomicity of counters. -Tmem gathers a large number of -statistics. Some of these counters are -informational only, while some are critical to tmem operation and must be -incremented and decremented atomically to ensure, for example, that the number -of pages in a tree never goes negative if two concurrent tmem operations access -the counter exactly simultaneously. Some -of the atomic counters are used for debugging (in assertions) and perhaps need -not be atomic; fixing these may increase performance slightly by reducing -cache-coherency traffic. Similarly, some -of the non-atomic counters may yield strange results to management tools, such -as showing the total number of successful -puts as being higher than the number of -puts attempted. -These are left as exercises for future tmem implementors. - -
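The distinction between critical and informational counters might look like the sketch below, written with C11 atomics rather than Xen's atomic_t. The counter and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_long pgp_count;          /* critical: pages currently held */
static unsigned long puts_attempted;   /* informational: plain, may race */

static void put_attempted(void)
{
    puts_attempted++;                  /* lost updates are tolerated here */
}

static void page_added(void)
{
    atomic_fetch_add(&pgp_count, 1);
}

static void page_removed(void)
{
    long before = atomic_fetch_sub(&pgp_count, 1);
    assert(before > 0);                /* the count must never go negative */
}
```

A plain counter such as puts_attempted can lose increments under contention, which is how a tool could see more successful puts than attempted ones.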

Control and Manageability

- -

-Tmem has a control interface to, for example, set various -parameters and obtain statistics. All -tmem control operations funnel through do_tmem_control() -and other functions supporting tmem control operations are prefixed -with tmemc_. - -

-During normal operation, even if only one tmem-aware guest -is running, tmem may absorb nearly all free memory in the system for its own -use. Then if a management tool wishes to -create a new guest (or migrate a guest from another system to this one), it may -notice that there is insufficient "free" memory and fail the creation -(or migration). For this reason, tmem -introduces a new tool-visible class of memory -- freeable memory -- -and provides a control interface to access -it. All ephemeral memory and all pages on the tmh_page_list -are freeable. To properly access freeable -memory, a management tool must follow a sequence of steps: -

    -
  • -freeze -tmem: When tmem is frozen, all -puts fail, which ensures that no -additional memory may be absorbed by tmem. -(See tmemc_freeze_pools(), and -note that individual clients may be frozen, though this functionality may be -used only rarely.) -
  • -query freeable MB: If all freeable memory were released to the Xen -heap, this is the amount of memory (in MB) that would be freed. -See tmh_freeable_pages(). -
  • -flush: -Tmem may be requested to flush, or relinquish, a certain amount of memory, e.g. -back to the Xen heap. This amount is -specified in KB. See tmemc_flush_mem() and tmem_relinquish_npages(). -
  • -At this point the management tool may allocate -the memory, e.g. using Xen's published interfaces. -
  • -thaw -tmem: This terminates the freeze, allowing tmem to accept -puts again. -
-
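The sequence above can be modelled in a few lines. This is a toy model, not the control interface itself: the struct, the function names, and the 4 KB page size are assumptions; the only details taken from the text are the order of the steps and the units (MB for the query, KB for the flush).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_MB 256   /* assuming 4 KB pages */

/* Toy model of tmem state as seen by a management tool (hypothetical). */
struct tmem_model {
    bool frozen;
    size_t freeable_pages;  /* ephemeral pages plus pages on the free list */
    size_t heap_pages;      /* pages available from the general heap */
};

static bool model_put(struct tmem_model *t)
{
    if (t->frozen || t->heap_pages == 0)
        return false;       /* puts fail while frozen */
    t->heap_pages--;
    t->freeable_pages++;
    return true;
}

static size_t model_freeable_mb(const struct tmem_model *t)
{
    return t->freeable_pages / PAGES_PER_MB;
}

/* Flush takes its amount in KB and returns the KB actually released. */
static size_t model_flush_kb(struct tmem_model *t, size_t kb)
{
    size_t pages = kb / 4;
    if (pages > t->freeable_pages)
        pages = t->freeable_pages;
    t->freeable_pages -= pages;
    t->heap_pages += pages;
    return pages * 4;
}

/* The five steps, for a new guest that needs need_mb of memory. */
static bool reserve_for_guest(struct tmem_model *t, size_t need_mb)
{
    bool ok = false;
    size_t have_mb = t->heap_pages / PAGES_PER_MB;

    t->frozen = true;                                       /* 1. freeze */
    if (have_mb + model_freeable_mb(t) >= need_mb) {        /* 2. query */
        if (have_mb < need_mb)
            model_flush_kb(t, (need_mb - have_mb) * 1024);  /* 3. flush */
        t->heap_pages -= need_mb * PAGES_PER_MB;            /* 4. allocate */
        ok = true;
    }
    t->frozen = false;                                      /* 5. thaw */
    return ok;
}
```

Freezing first matters: without it, puts arriving between the query and the allocation could absorb the memory that was just flushed.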

-Extensive tmem statistics are available through tmem's -control interface (see tmemc_list and -the separate source for the "xm tmem-list" command and the -xen-tmem-list-parse tool). To maximize -forward/backward compatibility with future tmem and tools versions, statistical -information is passed via an ASCII interface where each individual counter is -identified by an easily parseable two-letter ASCII sequence. - -

Save/Restore/Migrate

- -

-Another piece of functionality that has a major impact on -the tmem code is support for save/restore of a tmem client and, highly related, -live migration of a tmem client. -Ephemeral pages, by definition, do not need to be saved or -live-migrated, but persistent pages are part of the state of a running VM and -so must be properly preserved. -

-When a save (or live-migrate) of a tmem-enabled VM is initiated, the first step -is for the tmem client to be frozen (see the manageability section). -Next, tmem API version information is -recorded (to avoid possible incompatibility issues as the tmem spec evolves in -the future). Then, certain high-level -tmem structural information specific to the client is recorded, including -information about the existing pools. -Finally, the contents of all persistent pages are recorded. -

-For live-migration, the process is somewhat more complicated. -Ignoring tmem for a moment, recall that in -live migration, the vast majority of the VM's memory is transferred while the -VM is still fully operational. During -each phase, memory pages belonging to the VM that are changed are marked and -then retransmitted during a later phase. -Eventually only a small amount of memory remains, the VM is paused, the -remaining memory is transmitted, and the VM is unpaused on the target machine. -

-The number of persistent tmem pages may be quite large, -possibly even larger than all the other memory used by the VM; so it is -unacceptable to transmit persistent tmem pages during the "paused" -phase of live migration. But if the VM -is still operational, it may be making calls to tmem: -A frozen tmem client will reject any -put operations, but tmem must -still correctly process flushes -(page and object), including implicit flushes due to duplicate -puts. -Fortunately, these operations can only -invalidate tmem pages, not overwrite tmem pages or create new pages. -So, when a live-migrate has been initiated, -the client is frozen. Then during the -"live" phase, tmem transmits all persistent pages, but also records -the handle of all persistent pages that are invalidated. -Then, during the "paused" phase, -only the handles of invalidated persistent pages are transmitted, resulting in -the invalidation on the target machine of any matching pages that were -previously transmitted during the "live" phase. -
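A minimal sketch of the two-phase scheme follows, assuming fixed-size arrays in place of the per-pool and per-client lists. All type and function names here are hypothetical; the real code records pages and invalidations in tmem's own list structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES 8

struct handle {
    uint32_t pool_id;
    uint64_t object_id;
    uint32_t page_id;
};

struct store {                 /* persistent pages held on the target */
    struct handle h[MAX_PAGES];
    bool valid[MAX_PAGES];
    size_t n;
};

struct inval_log {             /* handles invalidated on the source */
    struct handle h[MAX_PAGES];
    size_t n;
};

static bool same_handle(struct handle a, struct handle b)
{
    return a.pool_id == b.pool_id && a.object_id == b.object_id &&
           a.page_id == b.page_id;
}

/* "Live" phase: one persistent page arrives on the target. */
static void transmit_page(struct store *target, struct handle h)
{
    assert(target->n < MAX_PAGES);
    target->h[target->n] = h;
    target->valid[target->n] = true;
    target->n++;
}

/* Source side: a flush arrived while the client was frozen; remember it. */
static void log_invalidation(struct inval_log *log, struct handle h)
{
    assert(log->n < MAX_PAGES);
    log->h[log->n++] = h;
}

/* "Paused" phase: replay only the handles, dropping matching pages. */
static void replay_invalidations(struct store *target,
                                 const struct inval_log *log)
{
    for (size_t i = 0; i < log->n; i++)
        for (size_t j = 0; j < target->n; j++)
            if (same_handle(target->h[j], log->h[i]))
                target->valid[j] = false;
}
```

Only handles cross the wire during the paused phase, so its duration depends on the number of invalidations rather than on the volume of persistent data.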

-For restore (and on the target machine of a live migration), -tmem must be capable of reconstructing the internal state of the client from -the saved/migrated data. However, it is -not the client itself that is put'ing -the pages but the management tools conducting the restore/migration. -This slightly complicates tmem by requiring -new API calls and new functions in the implementation, but the code is -structured so that duplication is minimized. -Once all tmem data structures for the client are reconstructed, all -persistent pages are recreated and, in the case of live-migration, all -invalidations have been processed and the client has been thawed, the restored -client can be resumed. -

-Finally, tmem's data structures must be cluttered a bit to -support save/restore/migration. Notably, -a per-pool list of persistent pages must be maintained and, during live -migration, a per-client list of invalidated pages must be logged. -A reader of the code will note that these -lists are overlaid into space-sensitive data structures as a union, which may -be more error-prone but eliminates significant space waste. - -

Miscellaneous Tmem Topics

- -

-Duplicate puts. -One interesting corner case that -significantly complicates the tmem source code is the possibility -of a duplicate -put, -which occurs when two -puts -are requested with the same handle but with possibly different data. -The tmem API addresses - -put-put-get -coherence explicitly: When a duplicate -put occurs, tmem may react one of two ways: (1) The -put may succeed with the old -data overwritten by the new data, or (2) the -put may be failed with the original data flushed and -neither the old nor the new data accessible. -Tmem may not fail the -put and leave the old data accessible. -

-When tmem has been actively working for an extended period, -system memory may be in short supply and it is possible for a memory allocation -for a page (or even a data structure such as a pgp_t) to fail. Thus, -for a duplicate -put, it may be impossible for tmem to temporarily -simultaneously maintain data structures and data for both the original -put and the duplicate -put. -When the space required for the data is -identical, tmem may be able to overwrite in place the old data with -the new data (option 1). But in some circumstances, such as when data -is being compressed, overwriting is not always possible and option 2 must be -performed. -
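The two permitted outcomes of a duplicate put can be sketched as below. This is an illustration under assumptions: the slot structure and dup_put() are hypothetical, and malloc()/free() stand in for tmem's own allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct slot {
    unsigned char *data;   /* NULL when the handle holds no page */
    size_t len;
};

/*
 * Put new data under a handle that may already hold data.
 * Returns true on success (option 1: new data replaces old).
 * On failure the old data is flushed as well (option 2), so a later
 * get can never return stale contents.
 */
static bool dup_put(struct slot *s, const unsigned char *src, size_t len)
{
    if (s->data != NULL && s->len == len) {
        memcpy(s->data, src, len);   /* same size: overwrite in place */
        return true;
    }
    unsigned char *fresh = malloc(len);
    free(s->data);                   /* the old copy goes either way */
    s->data = fresh;
    s->len = fresh ? len : 0;
    if (fresh == NULL)
        return false;                /* option 2: nothing accessible */
    memcpy(fresh, src, len);
    return true;
}
```

The one outcome the sketch never produces is a failed put that leaves the old data readable, which is the case the API forbids.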

-Page deduplication and trailing-zero elimination. -When page deduplication is enabled -("tmem_dedup" option to Xen), ephemeral pages for which the contents -are identical -- whether the pages belong -to the same client or different clients -- utilize the same pageframe of -memory. In Xen environments where -multiple domains have a highly similar workload, this can save a substantial -amount of memory, allowing a much larger number of ephemeral pages to be -used. Tmem page deduplication uses -methods similar to the KSM implementation in Linux [ref], but differences between -the two are sufficiently great that tmem does not directly leverage the -code. In particular, ephemeral pages in -tmem are never dirtied, so need never be copied-on-write. -Like KSM, however, tmem avoids hashing, -instead employing red-black trees -that use the entire page contents as the lookup -key. There may be better ways to implement this. -
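As a rough sketch of content-keyed lookup, the code below orders nodes by a memcmp() of the whole page and picks one of 256 trees from the first byte, as the locking discussion describes. A plain unbalanced binary tree stands in for the red-black tree, locking is omitted, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEDUP_PAGE_SIZE 4096
#define PCD_TREES       256

struct pcd {                       /* page content descriptor (sketch) */
    struct pcd *left, *right;
    unsigned int refcount;         /* page descriptors sharing this content */
    const unsigned char *content;
};

static struct pcd *pcd_root[PCD_TREES];

/*
 * Find identical content or link in `fresh`. The first byte selects the
 * tree; the full page contents order the nodes within it. Returns the
 * descriptor that now represents the content.
 */
static struct pcd *pcd_lookup_or_insert(struct pcd *fresh)
{
    struct pcd **link = &pcd_root[fresh->content[0]];

    while (*link != NULL) {
        int cmp = memcmp(fresh->content, (*link)->content, DEDUP_PAGE_SIZE);
        if (cmp == 0) {
            (*link)->refcount++;   /* duplicate: share the existing copy */
            return *link;
        }
        link = cmp < 0 ? &(*link)->left : &(*link)->right;
    }
    fresh->left = fresh->right = NULL;
    fresh->refcount = 1;
    *link = fresh;
    return fresh;
}
```

When the returned descriptor is not the one passed in, the caller can free its own copy of the page and point its page descriptor at the shared one.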

-Dedup'ed pages may optionally be compressed -("tmem_compress" and "tmem_dedup" Xen options specified), -to save even more space, at the cost of more time. -Additionally, trailing zero elimination (tze) may be applied to dedup'ed -pages. With tze, pages that contain a -significant number of zeroes at the end of the page are saved without the trailing -zeroes; an all-zero page requires no data to be saved at all. -In certain workloads that utilize a large number -of small files (and for which the last partial page of a file is padded with -zeroes), a significant space savings can be realized without the high cost of -compression/decompression. -
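Trailing-zero elimination itself is simple to sketch. The function names and the 4 KB page size are assumptions for illustration; the real code also has to decide when the saving is large enough to be worth a sub-page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TZE_PAGE_SIZE 4096

/* Number of leading bytes worth storing; 0 means the page is all zeroes. */
static size_t tze_stored_len(const unsigned char *page)
{
    size_t len = TZE_PAGE_SIZE;

    while (len > 0 && page[len - 1] == 0)
        len--;
    return len;
}

/* Rebuild a full page from the stored prefix by zero-filling the tail. */
static void tze_restore(unsigned char *page, const unsigned char *stored,
                        size_t len)
{
    if (len > 0)
        memcpy(page, stored, len);
    memset(page + len, 0, TZE_PAGE_SIZE - len);
}
```

Unlike compression, both directions are a single linear pass over at most one page, which is why tze is cheap for workloads with many zero-padded partial pages.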

-Both compression and tze significantly complicate memory -allocation. This will be discussed more below. -

-Memory accounting. -Accounting is boring, but poor accounting may -result in some interesting problems. In -the implementation-independent code of tmem, most data structures, page frames, -and partial pages (e.g. for compression) are billed to a pool, -and thus to a client. Some infrastructure data structures, such as -pools and clients, are allocated with tmh_alloc_infra(), which does not -require a pool to be specified. Two other -exceptions are page content descriptors (pcd_t) -and sharelists (sharelist_t) which -are explicitly not associated with a pool/client by specifying NULL instead of -a pool_t. -(Note to self: -These should probably just use the tmh_alloc_infra() interface too.) -As we shall see, persistent pool pages and -data structures may need to be handled a bit differently, so the -implementation-independent layer calls a different allocation/free routine for -persistent pages (e.g. tmh_alloc_page_thispool()) -than for ephemeral pages (e.g. tmh_alloc_page()). -

-In the Xen-specific layer, we -disregard the pool_t for ephemeral -pages, as we use the generic Xen heap for all ephemeral pages and data -structures. (Denial-of-service attacks -can be handled in the implementation-independent layer because ephemeral pages -are kept in per-client queues each with a counted length. -See the discussion on weights and caps below.) -However we explicitly bill persistent pages -and data structures against the client/domain that is using them. -(See the calls to the Xen routine alloc_domheap_pages() in tmem_xen.h; if -the first argument is a domain, the pages allocated are billed by Xen to that -domain.) This means that a Xen domain -cannot allocate even a single tmem persistent page when it is currently utilizing -its maximum assigned memory allocation! -This is reasonable for persistent pages because, even though the data is -not directly accessible by the domain, the data is permanently saved until -either the domain flushes it or the domain dies. -

-Note that proper accounting requires (even for ephemeral pools) that the same -pool is referenced when memory is freed as when it was allocated, even if the -ownership of a pool has been moved from one client to another (c.f. shared_pool_reassign()). -The underlying Xen-specific information may -not always enforce this for ephemeral pools, but incorrect alloc/free matching -can cause some difficult-to-find memory leaks and bent pointers. -

-Page deduplication is not possible for persistent pools for -accounting reasons: Imagine a page that is created by persistent pool A, which -belongs to a domain that is currently well under its maximum allocation. -Then the pcd_t is matched by persistent pool B, which is -currently at its maximum. -Then the domain owning pool A is destroyed. -Is B beyond its maximum? -(There may be a clever way around this -problem. Exercise for the reader!) -

-Memory allocation. The implementation-independent layer assumes -there is a good fast general-purpose dynamic memory allocator with bounded -response time and efficient use of memory for a very large number of sub-page -allocations. The old xmalloc memory -allocator in Xen was not a good match for this purpose, so was replaced by the -TLSF allocator. Note that the TLSF -allocator is used only for allocations smaller than a page (and, more -precisely, no larger than tmem_subpage_maxsize()); -full pages are allocated by Xen's normal heap allocator. -

-After the TLSF allocator was integrated into Xen, more work -was required so that each client could allocate memory from a separate -independent pool. (See the call to xmem_pool_create() in -tmh_client_init().) -This allows the data structures allocated for the -purpose of supporting persistent pages to be billed to the same client as the -pages themselves. It also allows partial -(e.g. compressed) pages to be properly billed. -Further, when partial page allocations cause internal fragmentation, -this fragmentation can be isolated per-client. -And, when a domain dies, full pages can be freed, rather than only -partial pages. One other change was -required in the TLSF allocator: In the original version, when a TLSF memory -pool was allocated, the first page of memory was also allocated. -Since, for a persistent pool, this page would -be billed to the client, the allocation of the first page failed if the domain -was started at its maximum memory, and this resulted in a failure to create the -memory pool. To avoid this, the code was -changed to delay the allocation of the first page until first use of the memory -pool. -

-Memory allocation interdependency. -As previously described, -pages of memory must be moveable back and forth between the Xen heap and the -tmem ephemeral lists (and page lists). -When tmem needs a page but doesn't have one, it requests one from the -Xen heap (either indirectly via xmalloc, or directly via Xen's alloc_domheap_pages()). -And when Xen needs a page but doesn't have -one, it requests one from tmem (via a call to tmem_relinquish_pages() in Xen's alloc_heap_pages() in page_alloc.c). -This leads to a potential infinite loop! -To break this loop, a new memory flag (MEMF_tmem) was added to Xen -to flag and disallow the loop. -See tmh_called_from_tmem() -in tmem_relinquish_pages(). -Note that the tmem_relinquish_pages() interface allows for memory requests of -order > 0 (multiple contiguous pages), but the tmem implementation disallows -any requests larger than a single page. -

-LRU page reclamation. -Ephemeral pages generally age in -a queue, and the space associated with the oldest -- or least-recently-used -- page is reclaimed when tmem needs more -memory. But there are a few exceptions -to strict LRU queuing. First is when -removal from a queue is constrained by locks, as previously described. -Second, when an ephemeral pool is shared, unlike a private ephemeral -pool, a -get -does not imply a -flush. -Instead, in a shared pool, a -get -results in the page being promoted to the front of the queue. -Third, when a page that is deduplicated (i.e. -is referenced by more than one pgp_t) -reaches the end of the LRU queue, it is marked as eviction attempted and promoted to the front of the queue; if it -reaches the end of the queue a second time, eviction occurs. -Note that only the pgp_t is evicted; the actual data is only reclaimed if there is no -other pgp_t pointing to the data. -

-All of these modified-LRU algorithms deserve to be studied -carefully against a broad range of workloads. -
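The second-chance rule for deduplicated pages can be sketched as follows. This is an illustration under stated assumptions, not the tmem implementation: struct page_ref is a hypothetical stand-in for a pgp_t, and should_evict_at_tail() is an invented name.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pgp_t; only the fields this sketch needs. */
struct page_ref {
    bool deduplicated;        /* data is referenced by more than one pgp_t */
    bool eviction_attempted;  /* already reached the LRU tail once */
};

/*
 * Called when an entry reaches the end of the LRU queue.
 * Returns true if the entry should be evicted now, or false if the
 * caller should promote it to the front of the queue instead.
 */
static bool should_evict_at_tail(struct page_ref *p)
{
    if (p->deduplicated && !p->eviction_attempted) {
        /* First visit for a deduplicated page: mark it and spare it once. */
        p->eviction_attempted = true;
        return false;
    }
    return true;
}
```

Even when this returns true, only the reference is dropped; as the text notes, the underlying data is reclaimed only when no other pgp_t still points to it.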

-Internal fragmentation. -When -compression or tze is enabled, allocations between a half-page and a full-page -in size are very common and this places a great deal of pressure on even the -best memory allocator. Additionally, -problems may be caused for memory reclamation: When one tmem ephemeral page is -evicted, only a fragment of a physical page of memory might be reclaimed. -As a result, when compression or tze is -enabled, it may take a very large number of eviction attempts to free up a full -contiguous page of memory and so, to avoid near-infinite loops and livelocks, eviction -must be assumed to be able to fail. -While all memory allocation paths in tmem are resilient to failure, very -complex corner cases may eventually occur. -As a result, compression and tze are disabled by default and should be -used with caution until they have been tested with a much broader set of -workloads. (Note to self: The -code needs work.) -

-Weights and caps. -Because -of the just-discussed LRU-based eviction algorithms, a client that uses tmem at -a very high frequency can quickly swamp tmem so that it provides little benefit -to a client that uses it less frequently. -To reduce the possibility of this denial-of-service, limits can be -specified via management tools that are enforced internally by tmem. -On Xen, the "xm tmem-set" command -can specify "weight=<weight>" or "cap=<cap>" -for any client. If weight is non-zero -for a client and the current percentage of ephemeral pages in use by the client -exceeds its share (as measured by the sum of weights of all clients), the next -page chosen for eviction is selected from the requesting client's ephemeral -queue, instead of the global ephemeral queue that contains pages from all -clients. (See client_over_quota().) -Setting a cap for a client is currently a no-op. -
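The weight comparison described above might look roughly like the sketch below. It is not the body of client_over_quota(); the function name over_weighted_share() and its parameters are assumptions, and the test simply compares the client's fraction of ephemeral pages against its fraction of the total weight.

```c
#include <stdbool.h>

/*
 * Hypothetical check: is the client using more than its weighted share
 * of ephemeral pages? A zero weight means the client is never limited.
 */
static bool over_weighted_share(unsigned long client_pages,
                                unsigned long total_pages,
                                unsigned int weight,
                                unsigned int total_weight)
{
    if (weight == 0 || total_weight == 0 || total_pages == 0)
        return false;

    /*
     * client_pages / total_pages > weight / total_weight,
     * cross-multiplied to avoid integer division.
     */
    return (unsigned long long)client_pages * total_weight >
           (unsigned long long)total_pages * weight;
}
```

When the check is true, the eviction victim would be taken from the requesting client's own ephemeral queue rather than from the global queue.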

-Shared pools and authentication. -When tmem was first proposed to the linux kernel mailing list -(LKML), there was concern expressed about security of shared ephemeral -pools. The initial tmem implementation only -required a client to provide a 128-bit UUID to identify a shared pool, and the -linux-side tmem implementation obtained this UUID from the superblock of the -shared filesystem (in ocfs2). It was -pointed out on LKML that the UUID was essentially a security key and any -malicious domain that guessed it would have access to any data from the shared -filesystem that found its way into tmem. -Ocfs2 has only very limited security; it is assumed that anyone who can -access the filesystem bits on the shared disk can mount the filesystem and use -it. But in a virtualized data center, -higher isolation requirements may apply. -As a result, management tools must explicitly authenticate (or may -explicitly deny) shared pool access to any client. -On Xen, this is done with the "xl -tmem-shared-auth" command. -

-32-bit implementation. -There was some effort put into getting tmem working on a 32-bit Xen. -However, the Xen heap is limited in size on -32-bit Xen so tmem did not work very well. -There are still 32-bit ifdefs in some places in the code, but things may -have bit-rotted so using tmem on a 32-bit Xen is not recommended. - -

Known Issues

- -

Fragmentation. When tmem -is active, all physical memory becomes fragmented -into individual pages. However, the Xen -memory allocator allows memory to be requested in multi-page contiguous -quantities, called order>0 allocations. -(e.g. 2^order so -order==4 is sixteen contiguous pages.) -In some cases, a request for a larger order will fail gracefully if no -matching contiguous allocation is available from Xen. -As of Xen 4.0, however, there are several -critical order>0 allocation requests that do not fail gracefully. -Notably, when a domain is created, an -order==4 structure is required or the domain creation will fail. -And shadow paging requires many order==2 -allocations; if these fail, a PV live-migration may fail. -There are likely other such issues. -
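For reference, the order arithmetic used above works out as in this small sketch. The helper names are invented for illustration, and the page size is passed in rather than assumed.

```c
/* Number of contiguous pages in an order-N allocation: 2^order. */
static unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/* Size in bytes of an order-N allocation for a given page size. */
static unsigned long bytes_for_order(unsigned int order,
                                     unsigned long page_size)
{
    return pages_for_order(order) * page_size;
}
```

With 4096-byte pages, order==2 is four pages (16 KiB), order==4 is sixteen pages (64 KiB), and order==9 is 512 pages (2 MiB).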

-But, fragmentation can occur even without tmem if any domU does -any extensive ballooning; tmem just accelerates the fragmentation. -So the fragmentation problem must be solved -anyway. The best solution is to disallow -order>0 allocations altogether in Xen -- or at least ensure that any attempt -to allocate order>0 can fail gracefully, e.g. by falling back to a sequence -of single page allocations. However this restriction may require a major rewrite -in some of Xen's most sensitive code. -(Note that order>0 allocations during Xen boot and early in domain0 -launch are safe and, if dom0 does not enable tmem, any order>0 allocation by -dom0 is safe, until the first domU is created.) -

-Until Xen can be rewritten to be fragmentation-safe, a small hack -was added in the Xen page -allocator. (See the comment " -memory is scarce" in alloc_heap_pages().) -Briefly, a portion of memory is pre-reserved -for allocations where order>0 and order<9. -(Domain creation uses 2MB pages, but fails -gracefully, and there are no other known order==9 allocations or order>9 -allocations currently in Xen.) -

-NUMA. Tmem assumes that -all memory pages are equal and any RAM page can store a page of data for any -client. This has potential performance -consequences in any NUMA machine where access to far memory is significantly slower than access to near memory. -On nearly all of today's servers, however, -access time to far memory is still -much faster than access to disk or network-based storage, and tmem's primary performance -advantage comes from the fact that paging and swapping are reduced. -So, the current tmem implementation ignores -NUMA-ness; future tmem design for NUMA machines is an exercise left for the -reader. - -

Bibliography

- -

-(needs work)